Skip to content

A repository that contains implementation of a Real-Time Vehicle Data Processing Pipeline that efficiently manages and analyzes vehicle data through a cohesive system.

License

Notifications You must be signed in to change notification settings

night-fury-me/real-time-vehicle-data-processing

Repository files navigation

This repository contains real-time data processing data pipeline using technologies like - gRPC, Apache Kafka, Apache Flink, Google BigQuery. The base framework for this project was adapted from the flink-with-python repository.

Real-time Vehicle Data Processing Pipeline

The Real-Time Vehicle Data Processing Pipeline efficiently manages and analyzes vehicle data through a cohesive system. The Vehicle C++ Client collects real-time telemetry data and transmits it via gRPC. This data is then ingested by the Kafka Data Producer, which streams it into Kafka topics. Apache Flink processes the streaming data, performs aggregations and real-time analytics, and writes the results to Google BigQuery for comprehensive storage and analysis. This pipeline ensures seamless data flow and insightful analysis, optimizing vehicle management and performance.

Image description

TODO

  • Develop vehicle client that sends data via gRPC in C++ (to generate dummy data)
  • Develop gRPC server (written in Python) to recieve data from vechicle client application
  • Write Producer using Apache Kafka to stream messages containing vehicle data (written in Python)
  • Develop Consumer with Apache Flink to consume stream messages from Kafka Producer (written in Python)
  • Aggregate and average data streams consumed by Flink for a particular time window (e.g., 5 second)
  • Sink the aggregated and averaged stream data for the particular time window in Google BigQuery
  • Connect Tableau with Google BigQuery to do Data Analytics
  • Perform Real-time AI prediction from Flink data stream to detect anomaly and other decision making
  • Notify events (anomaly, warnings, any kind of failure) using AWS SNS to stack holders (Vehicle User, Data Analytics Panel etc.) in real-time

Usage

To start all the containers - Vehicle C++ Client, Kafka Data Producer (gRPC Server), Flink Data Streamer (BigQuery Sink Client)

docker-compose up -d  

Enter into flink consumers shell

docker-compose exec -it data-streamer bash 

Submit a flink job in the flink cluster

/flink/bin/flink run -py /taskscripts/data-streamer.py --jobmanager jobmanager:8081 --target local

Enter into Data Stream Producers shell

docker-compose exec -it data-producer bash 

Run the gRPC Server

Run the Python gRPC server script:

python grpc_server.py

Enter into Vehicle clients shell

docker-compose exec -it vehicle-cpp-client bash

Start the sending vehicle data through gRPC

./client

Turn of all the processes

docker-compose down 

About

A repository that contains implementation of a Real-Time Vehicle Data Processing Pipeline that efficiently manages and analyzes vehicle data through a cohesive system.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published