API - Improve public get_task_instances endpoint #62027
Closed
Labels: area:API, area:performance, good first issue, kind:bug
Description
Similar to #62025.
On large installations, the "get_task_instances" endpoint, which the UI uses to list task instances, takes a long time to return a response.
The critical part of the code is the database query itself, most likely because the number of joins causes a row explosion. We need to optimize the query, most likely by double-checking the eager loading options and verifying that no row explosion is caused by a wrong option (joinedload vs. selectinload). We can also add load_only to limit the number of columns selected in each join.
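To make the row-explosion concern concrete, here is a minimal sketch using only the standard-library `sqlite3` module. It does not use Airflow's schema or SQLAlchemy: the tables `task`, `tag` and `attempt` and both helper functions are hypothetical. The single-query variant mimics what `joinedload` on two collections does (LEFT OUTER JOINs in one SELECT), and the second variant mimics `selectinload` (one extra `IN (...)` query per collection). Selecting only the needed columns, as both helpers do, is the raw-SQL counterpart of `load_only`.

```python
import sqlite3

# Hypothetical schema: one parent table with two independent one-to-many children.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, task_id INTEGER, label TEXT);
    CREATE TABLE attempt (id INTEGER PRIMARY KEY, task_id INTEGER, try_number INTEGER);
    """
)

N_TASKS, N_TAGS, N_ATTEMPTS = 10, 20, 20
for task_id in range(1, N_TASKS + 1):
    conn.execute("INSERT INTO task (id, name) VALUES (?, ?)", (task_id, f"task_{task_id}"))
    conn.executemany(
        "INSERT INTO tag (task_id, label) VALUES (?, ?)",
        [(task_id, f"tag_{i}") for i in range(N_TAGS)],
    )
    conn.executemany(
        "INSERT INTO attempt (task_id, try_number) VALUES (?, ?)",
        [(task_id, i) for i in range(N_ATTEMPTS)],
    )


def joined_row_count(conn: sqlite3.Connection) -> int:
    """One query joining both collections, similar in shape to joinedload."""
    rows = conn.execute(
        """
        SELECT task.id, tag.id, attempt.id
        FROM task
        LEFT OUTER JOIN tag ON tag.task_id = task.id
        LEFT OUTER JOIN attempt ON attempt.task_id = task.id
        """
    ).fetchall()
    # Each task is repeated once per (tag, attempt) pair.
    return len(rows)


def select_in_row_count(conn: sqlite3.Connection) -> int:
    """One query per collection, similar in shape to selectinload."""
    task_ids = [row[0] for row in conn.execute("SELECT id FROM task")]
    placeholders = ",".join("?" * len(task_ids))
    tags = conn.execute(
        f"SELECT id FROM tag WHERE task_id IN ({placeholders})", task_ids
    ).fetchall()
    attempts = conn.execute(
        f"SELECT id FROM attempt WHERE task_id IN ({placeholders})", task_ids
    ).fetchall()
    return len(task_ids) + len(tags) + len(attempts)


print("joined rows:", joined_row_count(conn))        # 10 * 20 * 20 = 4000
print("select-in rows:", select_in_row_count(conn))  # 10 + 200 + 200 = 410
```

The joined variant grows with the product of the collection sizes, while the select-in variant grows with their sum. Which loader option is right for each relationship in the real query still has to be checked against the actual models.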
airflow/airflow-core/src/airflow/api_fastapi/core_api/routes/public/task_instances.py
Lines 483 to 533 in 4e75919

    query = (
        select(TI).join(TI.dag_run).outerjoin(TI.dag_version).options(*eager_load_TI_and_TIH_for_validation())
    )
    if dag_run_id != "~":
        dag_run = session.scalar(select(DagRun).filter_by(run_id=dag_run_id))
        if not dag_run:
            raise HTTPException(
                status.HTTP_404_NOT_FOUND,
                f"DagRun with run_id: `{dag_run_id}` was not found",
            )
        query = query.where(TI.run_id == dag_run_id)
    if dag_id != "~":
        dag = get_dag_for_run_or_latest_version(dag_bag, dag_run, dag_id, session)
        query = query.where(TI.dag_id == dag_id)
        if dag:
            task_group_id.dag = dag

    task_instance_select, total_entries = paginated_select(
        statement=query,
        filters=[
            run_after_range,
            logical_date_range,
            start_date_range,
            end_date_range,
            update_at_range,
            duration_range,
            state,
            pool,
            pool_name_pattern,
            queue,
            queue_name_pattern,
            executor,
            task_id,
            task_display_name_pattern,
            task_group_id,
            dag_id_pattern,
            run_id_pattern,
            version_number,
            readable_ti_filter,
            try_number,
            operator,
            operator_name_pattern,
            map_index,
        ],
        order_by=order_by,
        offset=offset,
        limit=limit,
        session=session,
    )
    task_instances = session.scalars(task_instance_select)

We should also investigate whether an extra index could help.
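One way to approach the index question is to compare the query plan before and after adding a candidate index. The sketch below does this with the standard-library `sqlite3` module and `EXPLAIN QUERY PLAN`. The table `ti_demo`, the index `idx_ti_demo_run_id` and the `query_plan` helper are hypothetical stand-ins, not Airflow's schema. On a production database the equivalent check would use that engine's own `EXPLAIN` against the real query.

```python
import sqlite3

# Hypothetical, simplified table; not Airflow's actual task_instance schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ti_demo (id INTEGER PRIMARY KEY, dag_id TEXT, run_id TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO ti_demo (dag_id, run_id, state) VALUES (?, ?, ?)",
    [(f"dag_{i % 10}", f"run_{i % 100}", "success") for i in range(1000)],
)

QUERY = "SELECT id, state FROM ti_demo WHERE run_id = ?"


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> list[str]:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the fourth column of each plan row.
    return [row[3] for row in rows]


# Without an index on run_id, SQLite has to scan the whole table.
plan_before = query_plan(conn, QUERY, ("run_7",))

# Add the candidate index and look at the plan again.
conn.execute("CREATE INDEX idx_ti_demo_run_id ON ti_demo (run_id)")
plan_after = query_plan(conn, QUERY, ("run_7",))

print("before:", plan_before)
print("after: ", plan_after)
```

If the plan after the change names the new index, the planner considers it useful for that filter. Whether it actually improves the endpoint still needs timing on realistic data volumes.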
Screen.Recording.2026-02-16.at.16.28.21.mov
Committer
- I acknowledge that I am a maintainer/committer of the Apache Airflow project.