API - Improve public get_task_instances endpoint #62027

@pierrejeambrun

Description

Similarly to #62025

On large installations, the "get_task_instances" endpoint, which lists task instances in the UI, takes a long time to return a response.

The critical part of the code is the actual DB query, most likely because the number of joins causes a row explosion. We need to optimize the query, most likely by double-checking the eager loading options and verifying that no row explosion is caused by a wrong option (joinedload vs. selectinload). We can also add load_only to limit the number of columns selected in each join.

query = (
    select(TI)
    .join(TI.dag_run)
    .outerjoin(TI.dag_version)
    .options(*eager_load_TI_and_TIH_for_validation())
)
if dag_run_id != "~":
    dag_run = session.scalar(select(DagRun).filter_by(run_id=dag_run_id))
    if not dag_run:
        raise HTTPException(
            status.HTTP_404_NOT_FOUND,
            f"DagRun with run_id: `{dag_run_id}` was not found",
        )
    query = query.where(TI.run_id == dag_run_id)
if dag_id != "~":
    dag = get_dag_for_run_or_latest_version(dag_bag, dag_run, dag_id, session)
    query = query.where(TI.dag_id == dag_id)
    if dag:
        task_group_id.dag = dag
task_instance_select, total_entries = paginated_select(
    statement=query,
    filters=[
        run_after_range,
        logical_date_range,
        start_date_range,
        end_date_range,
        update_at_range,
        duration_range,
        state,
        pool,
        pool_name_pattern,
        queue,
        queue_name_pattern,
        executor,
        task_id,
        task_display_name_pattern,
        task_group_id,
        dag_id_pattern,
        run_id_pattern,
        version_number,
        readable_ti_filter,
        try_number,
        operator,
        operator_name_pattern,
        map_index,
    ],
    order_by=order_by,
    offset=offset,
    limit=limit,
    session=session,
)
task_instances = session.scalars(task_instance_select)
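
To make the suspected row explosion concrete, here is a minimal sketch using only the standard-library `sqlite3` module. The tables `ti`, `child_a` and `child_b` are hypothetical stand-ins, not Airflow's schema. The first query is roughly the shape of SQL that `joinedload` on two collection relationships produces (one statement, LEFT JOINs). The second group is roughly what `selectinload` produces (one statement per collection, filtered with `IN`), and its narrower column list stands in for `load_only`.

```python
import sqlite3

# Hypothetical schema: one parent table with two one-to-many children.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ti (id INTEGER PRIMARY KEY, task_id TEXT, payload TEXT);
    CREATE TABLE child_a (id INTEGER PRIMARY KEY, ti_id INTEGER, val TEXT);
    CREATE TABLE child_b (id INTEGER PRIMARY KEY, ti_id INTEGER, val TEXT);
    """
)

N_TI, FANOUT = 2, 10
for ti_id in range(1, N_TI + 1):
    conn.execute(
        "INSERT INTO ti VALUES (?, ?, ?)", (ti_id, f"task_{ti_id}", "x" * 100)
    )
    for n in range(FANOUT):
        conn.execute("INSERT INTO child_a (ti_id, val) VALUES (?, ?)", (ti_id, f"a{n}"))
        conn.execute("INSERT INTO child_b (ti_id, val) VALUES (?, ?)", (ti_id, f"b{n}"))

# Strategy 1 (joinedload-like): both collections joined in one statement.
# Each parent row is repeated FANOUT * FANOUT times, wide payload included.
joined_rows = conn.execute(
    """
    SELECT ti.id, ti.task_id, ti.payload, child_a.val, child_b.val
    FROM ti
    LEFT JOIN child_a ON child_a.ti_id = ti.id
    LEFT JOIN child_b ON child_b.ti_id = ti.id
    """
).fetchall()

# Strategy 2 (selectinload-like): parents first, then one IN query per
# collection. Only the needed parent columns are selected (load_only-like).
parents = conn.execute("SELECT id, task_id FROM ti").fetchall()
ids = [row[0] for row in parents]
placeholders = ",".join("?" * len(ids))
rows_a = conn.execute(
    f"SELECT ti_id, val FROM child_a WHERE ti_id IN ({placeholders})", ids
).fetchall()
rows_b = conn.execute(
    f"SELECT ti_id, val FROM child_b WHERE ti_id IN ({placeholders})", ids
).fetchall()
split_rows = len(parents) + len(rows_a) + len(rows_b)

print(len(joined_rows), split_rows)  # 200 42
```

The joined form grows with the product of the collection sizes, while the split form grows with their sum. Whether this is what actually happens in `eager_load_TI_and_TIH_for_validation()` still has to be confirmed by inspecting the SQL the endpoint emits.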

We should also investigate whether an extra index could help.
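
One way to approach the index investigation is to compare the query plan before and after adding a candidate index. The sketch below does this with the standard-library `sqlite3` module and `EXPLAIN QUERY PLAN`. The table `ti_demo`, the filtered column and the index name are all hypothetical, and nothing here is a claim about which indexes Airflow has or lacks. On PostgreSQL or MySQL the equivalent check would use that database's own `EXPLAIN` output against the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, loosely shaped like a task instance listing.
conn.execute(
    "CREATE TABLE ti_demo (id INTEGER PRIMARY KEY, dag_id TEXT, run_id TEXT, queue TEXT)"
)
conn.executemany(
    "INSERT INTO ti_demo (dag_id, run_id, queue) VALUES (?, ?, ?)",
    [(f"dag_{i % 50}", f"run_{i}", f"queue_{i % 5}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


QUERY = "SELECT id FROM ti_demo WHERE queue = ?"

before = plan(QUERY, ("queue_1",))  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_ti_demo_queue ON ti_demo (queue)")
after = plan(QUERY, ("queue_1",))   # planner can now search the new index

print(before)
print(after)
```

If the plan for the real endpoint query does not change after adding a candidate index, the index is unlikely to help and the cost is more likely in the joins themselves.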

Screen.Recording.2026-02-16.at.16.28.21.mov

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
