An Airflow DAG that moves files from an S3 bucket to Google Cloud Storage, keeping the same folder partitioning and file formats.
- Avoid billing
- Keep it simple
For this mission we'll deploy Apache Airflow using the Docker Compose method on Google Cloud's Cloud Shell. We could have used Google Cloud's Cloud Composer, but this approach avoids billing.
In this DAG we'll copy data from s3://thecodemancer/Revelo/ to gs://thecodemancer/Revelo/. We could have coded the transfer from scratch, but Airflow ships with a rich set of operators ready to import, configure, and use, so we'll use the S3ToGCSOperator for simplicity.
- Python
- Docker
- Apache Airflow
- AWS
- GCP