This project provides a functional template for orchestrating ETL processes with Airflow, fetching stock ticker price data from Yahoo Finance. It features robust logging with the loguru library and unit tests with pytest. The Airflow setup is minimal, using the `LocalExecutor` to run tasks. The project is set up to send the data to a MySQL database on the host machine but can be customized to support other database systems.
- Download and install MySQL.
- Note that you do not have to store data in MySQL on the host machine, but you will have to adjust the programs accordingly if you choose another setup.
- Set up a MySQL database and create the necessary tables by running `SQL\init\Create_DB_Tables.sql` on your MySQL server.
- Create a `.env` file with the following variables:
  - `DB_HOST`: the database host defined in step 2
  - `DB_PORT`: the database port defined in step 2
  - `DB_USERNAME`: the database username defined in step 2
  - `DB_PASSWORD`: the database password defined in step 2
  - `DB_NAME`: `stock_data`, or a different name if `SQL\init\Create_DB_Tables.sql` was altered
  - `LOG_FILE_PATH`: the file location and name where you would like your log file to be stored. An absolute path is not recommended, as it may cause issues when used from the Docker container.
- Install the local requirements, which are different from the requirements for the Docker containers, by running `pip install -r requirements-local.txt`.
- Populate the `tickers` table with a list of tickers you want to pull data on. The script `etl\src\get_tsx_tickers.py` will pull the most recent list of tickers from the TSX website and load them into the table for you; however, to use it you must install chromedriver and set its absolute path on your machine as `CHROME_DRIVER_PATH` in the `.env` file created in step 3. You do not need to use this list of tickers, so feel free to populate the table with whatever tickers you are interested in, using any method.
- Load historic daily data for these tickers by running `etl\src\get_yfinance_price_data.py` and supplying the date range you are interested in.
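As a reference for the variables listed above, a `.env` file might look like the following. All values here are illustrative placeholders; substitute your own connection details.

```
DB_HOST=localhost
DB_PORT=3306
DB_USERNAME=etl_user
DB_PASSWORD=change_me
DB_NAME=stock_data
LOG_FILE_PATH=logs/etl.log
# Only needed if you use etl\src\get_tsx_tickers.py:
CHROME_DRIVER_PATH=C:\tools\chromedriver\chromedriver.exe
```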
- Download and install Docker Desktop.
- Run `openssl rand -hex 32` to generate a JWT key, and save it under `AIRFLOW_JWT_SECRET` in the `.env` file created in A)3.
- Add `AIRFLOW_UID=50000` to the same file, or customize the value if needed.
- Adjust the start date of `etl\dags\price_history_dag.py` accordingly, as it will backfill by default.
- Initialize the Airflow database by running `docker compose up airflow-init`, and wait for the 0 exit code.
- Start Airflow with `docker compose up`.
- Visit `localhost:8080` to view the Airflow UI.
- Ensure the MySQL database set up in 1.c) is running.
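If `openssl` is not available on your machine, the JWT secret can be generated equivalently with Python's standard library (a minimal sketch; the output format matches `openssl rand -hex 32`):

```python
# Generate a 32-byte random secret, encoded as 64 hex characters,
# suitable for the AIRFLOW_JWT_SECRET entry in the .env file.
import secrets

jwt_secret = secrets.token_hex(32)  # 32 random bytes -> 64 hex chars
print(f"AIRFLOW_JWT_SECRET={jwt_secret}")
```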
- Unpause the `price_history_dag`.
- Let Airflow run and pull price data for you on a daily basis! Note that null rows in the ETL process are stored in a separate table in the database, so the data for these tickers/dates can be pulled at a later date if it was not available.
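The null-row handling above can be illustrated with a small pandas sketch. The column and table names here are illustrative, not the project's actual schema:

```python
# Sketch: separate rows with missing prices from complete rows, so the
# incomplete ones can be routed to a separate table and retried later.
import pandas as pd

def split_null_rows(df: pd.DataFrame, price_col: str = "close"):
    """Return (complete_rows, null_rows) based on a missing price column."""
    mask = df[price_col].isna()
    return df[~mask], df[mask]

prices = pd.DataFrame({
    "ticker": ["SHOP.TO", "RY.TO", "TD.TO"],
    "date": ["2024-01-02"] * 3,
    "close": [101.5, None, 79.3],
})
valid, missing = split_null_rows(prices)
# valid   -> would be loaded into the main price table
# missing -> would be loaded into the separate "null rows" table
```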
- Run the existing tests by navigating to the `test` folder and running `pytest`, and feel free to add others!
- Adjust the DAG and source code to a more granular frequency if intraday/hourly data is needed.
- Play around with the `docker-compose.yaml` file to suit your needs. It currently uses the `LocalExecutor` to run tasks and EST as its configured timezone.
- Create other DAGs to schedule other stock ETL or ML related tasks! I will continue to build other functionality into this library!
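As a starting point for adding your own tests, a new pytest test is just a `test_*` function in a `test_*.py` file under the `test` folder. Both the helper and the test below are hypothetical examples, not part of the project:

```python
# Hypothetical helper: clean up a user-supplied ticker symbol.
def normalize_ticker(symbol: str) -> str:
    """Strip surrounding whitespace and upper-case the ticker."""
    return symbol.strip().upper()

# pytest discovers any function named test_* and runs its assertions.
def test_normalize_ticker():
    assert normalize_ticker("  shop.to ") == "SHOP.TO"
    assert normalize_ticker("RY.TO") == "RY.TO"
```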