ETL Pipeline for Twitter Data using Apache Airflow
Updated Apr 5, 2023 - Python
ETL pipeline (Postgres, BigQuery, CSV, JSON, Google Cloud Storage)
Keywords: Python, Airflow, AWS, S3, Redshift, ETL
Python ETL Data Pipeline with AWS Glue and Athena
Extract, transform, load, and transform
Udacity project within the Data Engineer Nanodegree
Implementing ETL with Python for data integration workflows.
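A minimal sketch of an extract, transform, load workflow in plain Python, assuming an in-memory CSV source and a SQLite target. Both are stand-ins chosen for illustration, and the `name` and `amount` columns and the `sales` table are hypothetical:

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Transform: normalise names and cast amounts, dropping rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row.get("amount", "").strip()
        if not amount:
            continue  # skip incomplete records instead of loading nulls
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(records: list[tuple[str, float]], conn: sqlite3.Connection) -> int:
    """Load: insert the cleaned records into a (hypothetical) sales table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (name, amount) VALUES (?, ?)", records)
    conn.commit()
    return len(records)


if __name__ == "__main__":
    raw = "name,amount\n alice ,10.5\nbob,\nCAROL,3\n"
    conn = sqlite3.connect(":memory:")
    loaded = load(transform(extract(raw)), conn)
    print(loaded, conn.execute("SELECT name, amount FROM sales").fetchall())
```

Keeping each stage a separate function lets an orchestrator such as Airflow run and retry them as individual tasks.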
We build an ETL pipeline using Airflow that downloads data from an AWS S3 bucket; runs a Spark/Spark SQL job on the downloaded data, producing a cleaned-up dataset of orders that missed their delivery deadline; and then uploads the cleaned-up dataset back to the same S3 bucket, in a folder primed for higher-level analytics.
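The cleaning step can be sketched in plain Python. This is not the project's Spark job: it swaps Spark/Spark SQL for the standard-library `csv` module, omits the S3 download and upload, and assumes hypothetical columns `order_id`, `delivery_deadline`, and `delivered_at` holding ISO 8601 timestamps:

```python
import csv
import io
from datetime import datetime


def missed_deadline(rows: list[dict[str, str]], as_of: datetime) -> list[dict[str, str]]:
    """Keep orders delivered after their deadline, or still undelivered once it has passed.

    Column names are hypothetical; timestamps are assumed to be ISO 8601 strings.
    """
    missed = []
    for row in rows:
        deadline = datetime.fromisoformat(row["delivery_deadline"])
        delivered = row["delivered_at"].strip()
        if delivered:
            late = datetime.fromisoformat(delivered) > deadline
        else:
            late = as_of > deadline  # undelivered and already past the deadline
        if late:
            missed.append(row)
    return missed


def to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialise the cleaned rows back to CSV text, ready to be uploaded."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    raw = (
        "order_id,delivery_deadline,delivered_at\n"
        "1,2023-01-05T12:00:00,2023-01-05T11:00:00\n"
        "2,2023-01-05T12:00:00,2023-01-06T09:00:00\n"
        "3,2023-01-04T12:00:00,\n"
    )
    rows = list(csv.DictReader(io.StringIO(raw)))
    late = missed_deadline(rows, as_of=datetime(2023, 1, 10))
    print(to_csv(late, ["order_id", "delivery_deadline", "delivered_at"]))
```

The explicit `as_of` parameter keeps the filter deterministic, so a rerun for a past date produces the same output.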
ETL using application streaming, and building a data lake
Oracle SQL scripts for a data warehouse (created in 2012)
Sort delimited or fixed-width files by a defined key, with data filter options and progress reporting.
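A minimal sketch of keyed sorting with a filter and a progress callback, assuming the whole file fits in memory. The helpers `make_key` and `sort_records` are hypothetical and are not the listed tool's implementation; a sorter for very large files would typically use an external merge sort instead:

```python
from typing import Callable, Iterable, Optional


def make_key(delimiter: Optional[str] = None, field: int = 0,
             start: int = 0, end: Optional[int] = None) -> Callable[[str], str]:
    """Build a sort key: a field index for delimited lines, or a column slice for fixed-width lines.

    Keys are compared as strings; wrap the result (e.g. with int) for numeric ordering.
    """
    if delimiter is not None:
        return lambda line: line.split(delimiter)[field]
    return lambda line: line[start:end]


def sort_records(lines: Iterable[str], key: Callable[[str], str],
                 keep: Optional[Callable[[str], bool]] = None,
                 report: Optional[Callable[[int], None]] = None,
                 every: int = 1000) -> list[str]:
    """Filter lines with `keep`, call `report` every `every` lines read, then sort by `key`."""
    kept = []
    for i, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if keep is None or keep(line):
            kept.append(line)
        if report is not None and i % every == 0:
            report(i)  # progress: number of input lines processed so far
    kept.sort(key=key)
    return kept


if __name__ == "__main__":
    delimited = ["b,2\n", "a,5\n", "c,1\n"]
    print(sort_records(delimited, make_key(",", field=1)))
    fixed = ["XX003", "YY001", "ZZ002"]
    print(sort_records(fixed, make_key(start=2, end=5)))
```

Passing the key, filter, and progress reporter as callables keeps the delimited and fixed-width cases on one code path.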