ETL Pipeline: NYC Taxi Dataset

Live

This project implements an automated ETL (Extract, Transform, Load) pipeline with Apache Airflow, which schedules and monitors the data workflows and loads the results reliably into a database or data warehouse. Once the ETL process has run, a dedicated notebook analyzes the NYC Taxi Trip dataset, exploring insights, trends, and metrics derived from the processed data.
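To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names (`extract`, `transform`, `load`, `run_pipeline`), the `trips` table, the cleaning rules, and the sample rows are all hypothetical, and the column names are assumed to follow the public yellow-taxi trip records. An in-memory SQLite database stands in for PostgreSQL so the example runs without a server. In an Airflow deployment, each function would typically become its own task in a scheduled DAG.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical sample data; column names are assumed, not taken from the project.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,trip_distance,fare_amount
2023-01-01 00:10:00,2023-01-01 00:25:00,3.2,14.5
2023-01-01 01:00:00,2023-01-01 01:05:00,0.0,3.0
2023-01-01 02:00:00,2023-01-01 02:30:00,7.5,-5.0
2023-01-01 03:00:00,2023-01-01 03:20:00,5.1,21.0
"""


def extract(source):
    """Extract: read raw trip records from a CSV file-like object."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: drop implausible trips and derive the trip duration."""
    fmt = "%Y-%m-%d %H:%M:%S"
    cleaned = []
    for row in rows:
        distance = float(row["trip_distance"])
        fare = float(row["fare_amount"])
        # Assumed cleaning rule: skip trips with no distance or a non-positive fare.
        if distance <= 0 or fare <= 0:
            continue
        pickup = datetime.strptime(row["tpep_pickup_datetime"], fmt)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], fmt)
        minutes = (dropoff - pickup).total_seconds() / 60
        cleaned.append((row["tpep_pickup_datetime"], distance, fare, minutes))
    return cleaned


def load(conn, rows):
    """Load: write cleaned rows to the database and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(pickup TEXT, distance REAL, fare REAL, duration_min REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]


def run_pipeline():
    """Run the three stages in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
    rows = transform(extract(io.StringIO(SAMPLE_CSV)))
    return load(conn, rows)


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage as a separate function with plain inputs and outputs means a failed stage can be retried or tested on its own, which is the property an orchestrator relies on when it schedules and monitors the workflow.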

Stack used

Docker, Python, Apache Airflow, NumPy, Pandas, PostgreSQL