Commit 13af96b

perf: use load_only() in get_dag_runs eager loading to reduce data fetched per task instance (#62482)
* perf: use load_only() in eager_load_dag_run_for_validation to reduce data fetched

  The get_dag_runs API endpoint was slow on large deployments because
  eager_load_dag_run_for_validation() used selectinload on task_instances and
  task_instances_histories without restricting which columns were fetched.
  This caused SQLAlchemy to load all heavyweight columns (executor_config with
  pickled data, hostname, rendered fields, etc.) for every task instance across
  every DAG run in the result page, even though only dag_version_id is needed
  to traverse the association proxy to DagVersion.

  Add load_only(TaskInstance.dag_version_id) and
  load_only(TaskInstanceHistory.dag_version_id) to the selectinload chains so
  the SELECT for task instances fetches only the identity columns and the FK
  needed to resolve the dag_version relationship, significantly reducing the
  volume of data transferred from the database on busy deployments.

  Fixes #62025

* Fix static checks

---------

Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com>
1 parent f4fd68f commit 13af96b

1 file changed: +12 -1 lines changed
airflow-core/src/airflow/api_fastapi/common/db/dag_runs.py

Lines changed: 12 additions & 1 deletion
@@ -41,13 +41,24 @@
 
 
 def eager_load_dag_run_for_validation() -> tuple[LoaderOption, ...]:
-    """Construct the eager loading options necessary for a DagRunResponse object."""
+    """
+    Construct the eager loading options necessary for a DagRunResponse object.
+
+    For the list endpoint (get_dag_runs), loading all task instance columns is
+    wasteful because we only need the dag_version_id FK to traverse to DagVersion.
+    Using load_only() on TaskInstance and TaskInstanceHistory restricts the SELECT
+    to just the identity columns and dag_version_id, avoiding large intermediate
+    result sets caused by loading heavyweight columns (executor_config, etc.) for
+    every task instance across every DAG run returned by the query.
+    """
     return (
         joinedload(DagRun.dag_model),
         selectinload(DagRun.task_instances)
+        .load_only(TaskInstance.dag_version_id)
         .joinedload(TaskInstance.dag_version)
         .joinedload(DagVersion.bundle),
         selectinload(DagRun.task_instances_histories)
+        .load_only(TaskInstanceHistory.dag_version_id)
         .joinedload(TaskInstanceHistory.dag_version)
         .joinedload(DagVersion.bundle),
         joinedload(DagRun.dag_run_note),
