Commit 13af96b
perf: use load_only() in get_dag_runs eager loading to reduce data fetched per task instance (#62482)
* perf: use load_only() in eager_load_dag_run_for_validation to reduce data fetched
The get_dag_runs API endpoint was slow on large deployments because
eager_load_dag_run_for_validation() used selectinload on task_instances and
task_instances_histories without restricting which columns were fetched.
This caused SQLAlchemy to load all heavyweight columns (executor_config with
pickled data, hostname, rendered fields, etc.) for every task instance across
every DAG run in the result page — even though only dag_version_id is needed
to traverse the association proxy to DagVersion.
Add load_only(TaskInstance.dag_version_id) and
load_only(TaskInstanceHistory.dag_version_id) to the selectinload chains so
the SELECT for task instances fetches only the identity columns and the FK
needed to resolve the dag_version relationship, significantly reducing the
volume of data transferred from the database on busy deployments.
Fixes #62025
* Fix static checks
---------
Co-authored-by: pierrejeambrun <pierrejbrun@gmail.com>1 parent f4fd68f commit 13af96b
1 file changed
+12
-1
lines changedLines changed: 12 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
44 | | - | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
45 | 54 | | |
46 | 55 | | |
47 | 56 | | |
| 57 | + | |
48 | 58 | | |
49 | 59 | | |
50 | 60 | | |
| 61 | + | |
51 | 62 | | |
52 | 63 | | |
53 | 64 | | |
| |||
0 commit comments