A project based on the application of Azure Databricks.
Updated Mar 7, 2023 - Python
This project performs data wrangling, analysis, visualization, and machine-learning prediction of user churn for a hypothetical music app using PySpark.
This project implements a real-time data pipeline with Kafka, Spark, and MongoDB. It generates vehicle data using UXSIM, streams it to a Kafka broker, processes it with Spark, and stores raw and processed data in MongoDB. Queries analyze vehicle counts, speeds, and routes over specified periods.
Contains the code and examples for my article on Medium, which introduces the English SDK for Apache Spark and shows how to combine the power of Apache Spark with large language models (LLMs).
A repository focused on using high-end parallel pipelines to perform ETL across various data sources.
NBA shot predictions with PySpark and SparkML