GitHub - nessmahm/SparkKafka-Real-Time-Brand-Trend-Analysis: A real-time data processing project for analyzing e-commerce sales data. It integrates Apache Kafka 📡 and Apache Spark ⚡ to extract brand information and calculate the top N brands. Results are stored in HBase and displayed in real-time ⏱️. This project enables seamless integration with visualization tools 📊, offering insights into brand trends.

Project Overview

This project is designed to analyze e-commerce sales data in real-time using Apache Kafka, Apache Spark, and HBase. The data stream is ingested from Kafka, processed with Spark to extract and count brand occurrences, and the top N brands are then stored in HBase for further analysis. The results are also displayed in real-time, offering insights into brand trends.

Project Structure

SparkKafkaTopNBrands:
- Reads a stream of messages from a Kafka topic.
- Extracts the brand information from the data and counts occurrences.
- Outputs the top N brands in real-time to the console.
SparkKafkaTopNBrandsHbase :
- Similar to the first file, but also writes the top N brands to HBase for persistent storage.
- The data is written to a specified HBase table, and the system ensures that each batch of top N brands is updated.
- This file provides fault tolerance and ensures that data is consistently stored in HBase.
Kafka Producer:
- This is used to simulate data input into the Kafka topics that Spark consumes for real-time processing.
- It generates e-commerce transaction data with brand details and sends it to the Kafka topic.

Prerequisites

Apache Kafka: For streaming real-time data.
Apache Spark: For processing the streaming data in real-time.
HBase: For storing the processed data.
Java: The code is written in Java and uses Apache Spark's Java API.

How to Run

Start Kafka:
- Set up Kafka with the desired bootstrap server configuration.
- Create topics to be consumed by Spark.
Start HBase:
- Ensure HBase is running and the table brands exists.
Start the Spark Application:
- Compile and run the SparkKafkaTopNBrands or SparkKafkaTopNBrandsHbase class, passing the appropriate Kafka bootstrap server, topics, and consumer group ID as arguments:
```
java -jar SparkKafkaTopNBrands.jar <bootstrap-servers> <topics> <group-id>
```
- The Spark job will process the stream and output the results either to the console (for SparkKafkaTopNBrands) or to HBase (for SparkKafkaTopNBrandsHbase).
Kafka Producer:
- You can also run a Kafka producer to send synthetic data to the Kafka topic:
```
java -jar KafkaProducer.jar <bootstrap-servers> <topic>
```

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.idea		.idea
src/main/java/spark/kafka		src/main/java/spark/kafka
.gitignore		.gitignore
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project Overview

Project Structure

Prerequisites

How to Run

About

Releases

Packages

Languages

nessmahm/SparkKafka-Real-Time-Brand-Trend-Analysis

Folders and files

Latest commit

History

Repository files navigation

Project Overview

Project Structure

Prerequisites

How to Run

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages