- Airflow: For scheduling and orchestration of the data pipeline tasks.
- EC2: For running the Python scripts for data extraction and transformation.
- Lambda Functions: For serverless, event-driven processing that moves and transforms data between S3 buckets.
- S3: For storing data at various stages of the pipeline.
- Redshift: For efficient data warehousing and analytics.
- QuickSight: For data visualization and exploration.
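The Airflow side of this architecture could be sketched roughly as follows. This is a minimal illustration, assuming Airflow 2.x with the Amazon provider package installed; the DAG id, schedule, bucket name, object key, and script path are all hypothetical, and the Airflow imports sit inside the function only so the file can be read without Airflow installed.

```python
from datetime import datetime, timedelta

def build_zillow_dag():
    """Sketch of a DAG that runs the extraction and waits for the
    Lambda chain to produce the final CSV. All names are illustrative."""
    # In a real deployment these imports would be at module level.
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    with DAG(
        dag_id="zillow_analytics",                 # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        # Run the extraction script on the EC2 host where Airflow lives.
        extract = BashOperator(
            task_id="extract_zillow_data",
            bash_command="python /home/ec2-user/extract_zillow.py",  # hypothetical path
        )
        # The two Lambdas fire on S3 events, outside Airflow's control,
        # so the DAG simply waits for the transformed CSV to appear.
        wait_for_csv = S3KeySensor(
            task_id="wait_for_transformed_csv",
            bucket_name="zillow-transformed-data",  # hypothetical bucket
            bucket_key="zillow_data.csv",           # hypothetical key
            poke_interval=60,
            timeout=3600,
        )
        extract >> wait_for_csv
    return dag
```

A downstream task (for example, one issuing a Redshift COPY) would then hang off the sensor.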
- Python Script: Extracts data from Zillow in JSON format and stores it in an S3 bucket.
- S3 Bucket (Staging): Stores the initial extracted JSON data.
- AWS Lambda Function 1 (Data Transfer): Triggered by new objects in the staging S3 bucket; copies the JSON data to the processing S3 bucket.
- S3 Bucket (Processing): Holds the JSON data ready for further processing.
- AWS Lambda Function 2 (Data Transformation): Triggered by new objects in the processing S3 bucket; reads the JSON data, converts it to CSV format, and writes the result to the transformed data S3 bucket.
- S3 Bucket (Transformed Data): Stores the final processed data in CSV format.
- Amazon Redshift: Loads the CSV data from the transformed data S3 bucket for warehousing and analytics.
- Amazon QuickSight: Connects to the Redshift data warehouse to visualize and analyze the Zillow data.
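The transformation Lambda (step 2) could look roughly like the sketch below. This is a minimal illustration, not the project's actual code: the destination bucket name is hypothetical, the shape of the Zillow JSON is assumed to be a list of flat records (or a dict with a `results` list), and the `boto3` import is placed inside the handler purely to keep the conversion helper testable without AWS access.

```python
import csv
import io
import json

def json_to_csv(records):
    """Convert a list of flat JSON objects (dicts) into one CSV string,
    using the first record's keys as the header row."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def lambda_handler(event, context):
    """Entry point for an s3:ObjectCreated trigger on the processing bucket."""
    import boto3  # available by default in the AWS Lambda Python runtime
    s3 = boto3.client("s3")

    # The S3 event notification carries the source bucket and object key.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    data = json.loads(body)
    # Assumed payload shape: either a bare list or {"results": [...]}.
    records = data.get("results", []) if isinstance(data, dict) else data

    target_key = key.rsplit(".", 1)[0] + ".csv"
    s3.put_object(
        Bucket="zillow-transformed-data",  # hypothetical destination bucket
        Key=target_key,
        Body=json_to_csv(records).encode("utf-8"),
    )
    return {"statusCode": 200, "body": target_key}
```

The transfer Lambda (step 1) is the same pattern with the body replaced by a single `s3.copy_object` call from the staging bucket to the processing bucket.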