fix: eliminate duplicate JOINs in get_task_instances endpoint by ShubhamGondane · Pull Request #62910 · apache/airflow

ShubhamGondane · 2026-03-05T05:09:38Z

Root cause

get_task_instances combined explicit join(TI.dag_run) + outerjoin(TI.dag_version) with joinedload options for the same tables inside eager_load_TI_and_TIH_for_validation(). SQLAlchemy emits a separate JOIN for each joinedload even when the table is already in the FROM clause, producing redundant duplicate JOINs in the SQL.

Before — 7 JOINs (dag_run and dag_version each joined twice):

FROM task_instance
  JOIN dag_run ON ...               -- explicit
  LEFT OUTER JOIN dag_version ON ... -- explicit
  LEFT OUTER JOIN dag_version AS dag_version_1 ON ...  -- joinedload duplicate
  LEFT OUTER JOIN dag_bundle ON ...
  JOIN dag_run AS dag_run_1 ON ...  -- joinedload duplicate
  LEFT OUTER JOIN dag AS dag_1 ON ...
  LEFT OUTER JOIN task_instance_note ON ...

After — 5 JOINs (each table once):

FROM task_instance
  JOIN dag_run ON ...               -- reused by contains_eager
  LEFT OUTER JOIN dag_version ON ... -- reused by contains_eager
  LEFT OUTER JOIN dag_bundle ON ...
  LEFT OUTER JOIN dag AS dag_1 ON ...
  LEFT OUTER JOIN task_instance_note ON ...

Note on performance: Because the query selects from task_instance and all the duplicated JOINs traverse many-to-one relationships (each TI has exactly one DagRun, one DagVersion), there is no row multiplication — the duplicate JOINs are redundant but not harmful at the row level. Benchmarks on PostgreSQL at scale (20,000 rows) confirmed no measurable timing difference. The value of this change is SQL correctness: contains_eager is the right SQLAlchemy idiom when the JOIN is already explicit, and it avoids emitting unnecessary SQL.

Fix

Replace joinedload options for already-joined tables with contains_eager, which tells SQLAlchemy to reuse the existing JOIN result for eager loading instead of adding a new one.

eager_load_TI_and_TIH_for_validation() is intentionally left unchanged — its other callers (get_mapped_task_instances, get_task_instances_history) have different join patterns and are not affected.

Testing

All 47 existing TestGetTaskInstances tests pass unchanged.
Replaced the query-count regression test with test_no_duplicate_joins_in_get_task_instances_query, which calls the real endpoint and captures the SQL it emits via a SQLAlchemy event listener, then asserts dag_run and dag_version each appear exactly once in the JOINs.

related: #62027

Was generative AI tooling used to co-author this PR?

Yes — Claude

Generated-by: Claude Sonnet 4.6 following the guidelines

pierrejeambrun

We need to adjust the test case which seems irrelevant.

From a performance standpoint it doesn't seem to make a difference. Can you provide a benchmark or more details as to the expected performance gains?

Before

After

airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_task_instances.py

ShubhamGondane · 2026-03-09T21:30:11Z

Thanks for the review @pierrejeambrun
You are right on both counts.

Regarding the performance benchmark - I ran a PostgreSQL benchmark at 20,000 rows and didn't see any measurable difference. Root cause: all duplicated JOINs are on many-to-one relationships (each TI has exactly one DagRun/DagVersion), so there's no row multiplication regardless. Updated the PR description to remove performance claims.

pierrejeambrun

CI need fixing.

potiuk · 2026-03-11T00:01:59Z

CI need fixing.

looks good :)

potiuk · 2026-03-11T12:33:21Z

@ShubhamGondane This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria.

Issues found:

⚠️ Unresolved review comments: This PR has 1 unresolved review thread from maintainers. Please review and resolve all inline review comments before requesting another review. You can resolve a conversation by clicking 'Resolve conversation' on each thread after addressing the feedback. See pull request guidelines.

Note: Your branch is 253 commits behind main. Some check failures may be caused by changes in the base branch rather than by your PR. Please rebase your branch and push again to get up-to-date CI results.

What to do next:

The comment informs you what you need to do.
Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
Maintainers will then proceed with a normal review.

Please address the issues above and push again. If you have questions, feel free to ask on the Airflow Slack.

airflow-core/src/airflow/api_fastapi/core_api/routes/public/task_instances.py

…#62027) Replaces joinedload with contains_eager for already-joined tables, reducing the main query from 7 JOINs to 5.

…apache#62027

potiuk · 2026-03-12T20:13:14Z

@ShubhamGondane This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria.

Issues found:

⚠️ Unresolved review comments: This PR has 1 unresolved review thread from maintainers: @pierrejeambrun (MEMBER): 1 unresolved thread. Please review and resolve all inline review comments before requesting another review. You can resolve a conversation by clicking 'Resolve conversation' on each thread after addressing the feedback. See pull request guidelines.

Note: Your branch is 76 commits behind main. Some check failures may be caused by changes in the base branch rather than by your PR. Please rebase your branch and push again to get up-to-date CI results.

What to do next:

The comment informs you what you need to do.
Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
Maintainers will then proceed with a normal review.

Please address the issues above and push again. If you have questions, feel free to ask on the Airflow Slack.

…idation for all task instance endpoints

ShubhamGondane · 2026-03-12T21:46:51Z

Thanks for the feedback @pierrejeambrun! I've reworked the approach:

Moved the explicit .join(dag_run) / .outerjoin(dag_version) into eager_load_TI_and_TIH_for_validation so all join + eager-loading logic is centralised in one function
The function now takes the query as input, adds the joins, applies contains_eager, and returns the modified query
Updated all 4: get_task_instances, get_mapped_task_instances, get_task_instances_batch, and get_task_instance_tries
Added SQL-structure regression tests for the mapped and batch endpoints as well

…nces

pierrejeambrun

LGTM thanks

…#62910) * fix: eliminate duplicate JOINs in get_task_instances endpoint (apache#62027) Replaces joinedload with contains_eager for already-joined tables, reducing the main query from 7 JOINs to 5. * tests: replace query-count test with SQL-structure regression test for apache#62027 * fix: mypy type annotation for dialect in benchmark script * fix: centralise join + eager-loading in eager_load_TI_and_TIH_for_validation for all task instance endpoints * ci: retrigger checks

ShubhamGondane requested review from bugraoz93, choo121600, ephraimbuddy, jason810496, pierrejeambrun, rawwar and shubhamraj-git as code owners March 5, 2026 05:09

boring-cyborg bot added the area:API Airflow's REST/HTTP API label Mar 5, 2026

kkumtree mentioned this pull request Mar 5, 2026

i18n(Miscellaneous): Replace DAG as Dag(en,hi) #62923

Merged

1 task

pierrejeambrun reviewed Mar 9, 2026

View reviewed changes

airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_task_instances.py Outdated Show resolved Hide resolved

ShubhamGondane requested review from amoghrajesh, ashb, gopidesupavan, jedcunningham, jscheffl and potiuk as code owners March 9, 2026 21:29

pierrejeambrun reviewed Mar 10, 2026

View reviewed changes

potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Mar 11, 2026

pierrejeambrun reviewed Mar 12, 2026

View reviewed changes

airflow-core/src/airflow/api_fastapi/core_api/routes/public/task_instances.py Outdated Show resolved Hide resolved

ShubhamGondane added 3 commits March 12, 2026 11:33

fix: eliminate duplicate JOINs in get_task_instances endpoint (apache…

fa0c31a

…#62027) Replaces joinedload with contains_eager for already-joined tables, reducing the main query from 7 JOINs to 5.

tests: replace query-count test with SQL-structure regression test for …

37784a0

…apache#62027

fix: mypy type annotation for dialect in benchmark script

5ad902b

potiuk removed the ready for maintainer review Set after triaging when all criteria pass. label Mar 12, 2026

fix: centralise join + eager-loading in eager_load_TI_and_TIH_for_val…

fc621a5

…idation for all task instance endpoints

ShubhamGondane force-pushed the fix/eliminate-duplicate-joins-get-task-instances branch from 90f3a49 to fc621a5 Compare March 12, 2026 21:44

ShubhamGondane added 2 commits March 12, 2026 16:03

ci: retrigger checks

fb17bb8

Merge branch 'main' into fix/eliminate-duplicate-joins-get-task-insta…

fc30a66

…nces

potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Mar 13, 2026

pierrejeambrun mentioned this pull request Mar 18, 2026

API - UI Optimization #62020

Open

pierrejeambrun approved these changes Mar 18, 2026

View reviewed changes

pierrejeambrun added this to the Airflow 3.2.0 milestone Mar 18, 2026

pierrejeambrun merged commit efce3b1 into apache:main Mar 18, 2026
132 checks passed

pierrejeambrun mentioned this pull request Mar 31, 2026

Improve get_task_instances performance by deferring joins and using approximate row count #62294

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: eliminate duplicate JOINs in get_task_instances endpoint#62910

fix: eliminate duplicate JOINs in get_task_instances endpoint#62910
pierrejeambrun merged 6 commits intoapache:mainfrom
ShubhamGondane:fix/eliminate-duplicate-joins-get-task-instances

ShubhamGondane commented Mar 5, 2026 •

edited by pierrejeambrun

Loading

Uh oh!

pierrejeambrun left a comment

Uh oh!

Uh oh!

ShubhamGondane commented Mar 9, 2026

Uh oh!

pierrejeambrun left a comment

Uh oh!

potiuk commented Mar 11, 2026

Uh oh!

potiuk commented Mar 11, 2026

Uh oh!

Uh oh!

potiuk commented Mar 12, 2026

Uh oh!

ShubhamGondane commented Mar 12, 2026

Uh oh!

pierrejeambrun left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ShubhamGondane commented Mar 5, 2026 • edited by pierrejeambrun Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Root cause

Fix

Testing

Was generative AI tooling used to co-author this PR?

Uh oh!

pierrejeambrun left a comment

Choose a reason for hiding this comment

Before

After

Uh oh!

Uh oh!

ShubhamGondane commented Mar 9, 2026

Uh oh!

pierrejeambrun left a comment

Choose a reason for hiding this comment

Uh oh!

potiuk commented Mar 11, 2026

Uh oh!

potiuk commented Mar 11, 2026

Uh oh!

Uh oh!

potiuk commented Mar 12, 2026

Uh oh!

ShubhamGondane commented Mar 12, 2026

Uh oh!

pierrejeambrun left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ShubhamGondane commented Mar 5, 2026 •

edited by pierrejeambrun

Loading