Handle triggerer cross-loop connection fallback (#64213) #1

Draft
deepujain wants to merge 595 commits into main from fix-64213-triggerer-aws-conn-loop

Conversation

@deepujain
Owner

Title: Handle triggerer cross-loop connection RuntimeError in Task SDK (apache#64213)

Summary

The Task SDK connection backend now treats the triggerer worker-thread cross-loop RuntimeError the same way it already treats the existing AsyncToSync event-loop error, so deferrable AWS hooks can keep resolving connections instead of silently falling back to empty credentials.

Changes

  • task-sdk/src/airflow/sdk/execution_time/secrets/execution_api.py -- broadened the greenback fallback gate to recognize the cross-loop "Future attached to a different loop" RuntimeError in addition to the existing AsyncToSync event-loop message.
  • task-sdk/tests/task_sdk/execution_time/test_secrets.py -- extended the regression test to cover both runtime-error variants and verify the greenback fallback still returns the expected connection.
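
A minimal sketch of the kind of gate described above, assuming the fallback is chosen by matching substrings of the RuntimeError message. The function name `should_use_greenback_fallback` and the exact marker strings are illustrative assumptions; the real check lives in execution_api.py and may differ.

```python
# Illustrative sketch only: names and marker strings are assumptions,
# not the code from execution_api.py.
_FALLBACK_MARKERS = (
    # Existing case: sync-from-async call made on the event-loop thread.
    "You cannot use AsyncToSync in the same thread as an async event loop",
    # New case: a Future created on one loop is awaited from another.
    "attached to a different loop",
)


def should_use_greenback_fallback(exc: BaseException) -> bool:
    """Return True when exc looks like one of the two event-loop errors."""
    if not isinstance(exc, RuntimeError):
        return False
    message = str(exc)
    return any(marker in message for marker in _FALLBACK_MARKERS)
```

Any other RuntimeError, or a different exception type carrying the same text, is left to propagate, so unrelated failures are not masked by the fallback.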

Test plan

  • uv run --project task-sdk ruff check task-sdk/src/airflow/sdk/execution_time/secrets/execution_api.py task-sdk/tests/task_sdk/execution_time/test_secrets.py
  • uv run --project task-sdk ruff format --check task-sdk/src/airflow/sdk/execution_time/secrets/execution_api.py task-sdk/tests/task_sdk/execution_time/test_secrets.py
  • uv run --project task-sdk pytest task-sdk/tests/task_sdk/execution_time/test_secrets.py -xvs

Fixes apache#64213

Pranaykarvi and others added 30 commits March 17, 2026 19:33
…he#63816)

Chakra 3.34.0 changed the Dialog recipe's z-index token from
zIndex.modal (1400) to zIndex.popover (1500). The dropdown fix
from apache#62747 was computing modal+1=1401, which is now below the
dialog's 1500, causing dropdowns to render behind the fullscreen
modal. Update the token lookup to match the current recipe.
…#63793)

* Skip provider tests when all test directories are excluded

When running Providers[google] or Providers[amazon] on Python 3.14,
generate_args_for_pytest removes the test folders for excluded
providers, but the skip check in _run_test only triggered when the
--ignore filter itself removed something. Since the folders were
already removed upstream, the guard condition was never met, leaving
pytest with only flags and no test directories — causing it to crash
on unrecognized custom arguments.

Remove the overly strict guard so the skip fires whenever no test
directories remain in the args.

* Fix PROD image docker tests for Python-version-excluded providers

The docker tests expected all providers from prod_image_installed_providers.txt
to be present, but providers like google and amazon declare
excluded-python-versions in their provider.yaml. On Python 3.14, these
providers are correctly excluded from the PROD image at build time, but
the tests didn't account for this.

Read provider.yaml exclusions and filter expected providers/imports based
on the Docker image's Python version.

* Skip Python-incompatible provider wheels during PROD image build

get_distribution_specs.py now reads Requires-Python metadata from each
wheel and skips wheels that are incompatible with the running
interpreter. This prevents excluded providers (e.g. amazon on 3.14)
from being passed to pip/uv and installed despite their exclusion.

Also fix the requires-python specifier generation in packages.py:
!=3.14 per PEP 440 only excludes 3.14.0, not 3.14.2. Use !=3.14.*
wildcard to exclude the entire minor version.
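
The difference between the two specifiers can be illustrated with a simplified, stdlib-only model of PEP 440 exclusion. This sketch handles only plain numeric release segments and is not the code in packages.py; the helper names are hypothetical, and real tooling should rely on a full PEP 440 implementation.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '3.14.2' into integers."""
    return tuple(int(part) for part in version.split("."))


def pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Zero-pad a release tuple, so (3, 14) compares equal to (3, 14, 0)."""
    return release + (0,) * (width - len(release))


def is_excluded(version: str, specifier: str) -> bool:
    """Return True if version is rejected by a '!=X.Y' or '!=X.Y.*' clause."""
    if not specifier.startswith("!="):
        raise ValueError("this sketch only models '!=' clauses")
    target = specifier[2:]
    release = parse_release(version)
    if target.endswith(".*"):
        # Prefix match: every release starting with X.Y is excluded.
        prefix = parse_release(target[:-2])
        return release[: len(prefix)] == prefix
    # Strict match: only the zero-padded equivalent of X.Y is excluded.
    wanted = parse_release(target)
    width = max(len(release), len(wanted))
    return pad(release, width) == pad(wanted, width)
```

Under this model `is_excluded("3.14.2", "!=3.14")` is false while `is_excluded("3.14.2", "!=3.14.*")` is true, which is the gap the wildcard closes.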
… ORM/migration files (apache#62234)

* Re-introducing `--use-migration-files` and fix inconsistencies between ORM/migration files

The `--use-migration-files` flag was removed in apache#41120 when we pruned the
migration history for Airflow 3. At the time, we couldn't create a database
from scratch using only migration files because the migration chain no
longer started from an empty database. The oldest migration assumed an
existing 2.6.2 schema.

Now that we have a squashed migration (0000_2_6_2) with
`down_revision = None` that creates the full baseline schema from scratch,
we can re-introduce this flag. This enables a critical development
workflow, which is: creating the database purely from migration files, then using
`alembic revision --autogenerate` to detect schema drift between the ORM
models and the migration-produced schema. Without this flag, `autogenerate`
compares against an ORM-created database (via `create_all`), which masks
any drift since both sides come from the same ORM definitions.

* fixup! Re-introducing `--use-migration-files` and fix inconsistencies between ORM/migration files

* Update edge3 pre-commit file

* Remove added fab migration and fix the missing naming convention

* Fix inconsistencies in providers(fab/edge3) and also fix the edge3 pre-commit

* fixup! Fix inconsistencies in providers(fab/edge3) and also fix the edge3 pre-commit

* Add fixes for mysql and fab including postgres

* Fix conflicts

* more inconsistencies fixes

* import TIMESTAMP at function level to make provider verification script happy

* Harden MySQL migration procedure reuse
…tory with 3 updates (apache#63743)

Bumps the fab-ui-package-updates group with 3 updates in the /providers/fab/src/airflow/providers/fab/www directory: [babel-loader](https://github.com/babel/babel-loader), [mini-css-extract-plugin](https://github.com/webpack/mini-css-extract-plugin) and [terser-webpack-plugin](https://github.com/webpack/terser-webpack-plugin).


Updates `babel-loader` from 10.1.0 to 10.1.1
- [Release notes](https://github.com/babel/babel-loader/releases)
- [Changelog](https://github.com/babel/babel-loader/blob/main/CHANGELOG.md)
- [Commits](babel/babel-loader@v10.1.0...v10.1.1)

Updates `mini-css-extract-plugin` from 2.10.0 to 2.10.1
- [Release notes](https://github.com/webpack/mini-css-extract-plugin/releases)
- [Changelog](https://github.com/webpack/mini-css-extract-plugin/blob/main/CHANGELOG.md)
- [Commits](webpack/mini-css-extract-plugin@v2.10.0...v2.10.1)

Updates `terser-webpack-plugin` from 5.3.17 to 5.4.0
- [Release notes](https://github.com/webpack/terser-webpack-plugin/releases)
- [Changelog](https://github.com/webpack/terser-webpack-plugin/blob/main/CHANGELOG.md)
- [Commits](webpack/terser-webpack-plugin@v5.3.17...v5.4.0)

---
updated-dependencies:
- dependency-name: babel-loader
  dependency-version: 10.1.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: fab-ui-package-updates
- dependency-name: mini-css-extract-plugin
  dependency-version: 2.10.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: fab-ui-package-updates
- dependency-name: terser-webpack-plugin
  dependency-version: 5.4.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: fab-ui-package-updates
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add missing deprecation warnings

* Add newsfragment
…apache#63500)

* refactor: Added page explaining when to use deferred and when to use async operators with examples and explaining what the difference is

* refactor: Already updated some parts regarding to remarks

* refactor: Improved documentation with recommendations from @kaxil

* refactor: Added section to explain when not to use async or deferrable as suggested by TP

* refactor: Added reference from deferring as Jens suggested

* refactor: Fixed doc reference

* refactor: Reformatted deferred_http_operator_dag example

* refactor: Added MS Graph async example

* refactor: Changed order of MS Graph Async Example

* refactor: Updated MS Graph example

* refactor: Fixed indentation of bullet lists

* refactor: Reformatted Async Multiplexing Example

* refactor: Fixed reference to deferred vs async in deferring document

* refactor: Put full url instead of doc reference to tsk-sdk

---------

Co-authored-by: David Blain <david.blain@b-holding.be>
…ms (apache#62962)

- Method _get_sagemaker_studio_config in hook did not specify domain, project, and DZ region identifiers. The hook was missing domain_identifier, project_identifier, and datazone_domain_region.
- Add template_fields to SageMakerNotebookOperator. Without template_fields, passing XCom references (e.g. task_instance.xcom_pull(...)) for domain_id, project_id, or domain_region caused a ParamValidationError because the hook was instantiated before Airflow resolved the XCom values to actual strings. Adding these to template_fields ensures they are rendered before execute() is called.
- Updated example_sagemaker_unified_studio to reflect changes.
apache#63777)

The /api/v2/version call in NavTabs used a hardcoded absolute path,
which 404s when Airflow is served under a subpath (e.g. /airflow/).
This prevented the nav tabs from rendering, forcing users to navigate
via inline table links that lacked relative=path, causing URLs to
append instead of replace (e.g. /worker/jobs/worker/...).
The `--provider` flag was only passed to `extract_metadata.py` but not
to `extract_parameters.py` or `extract_connections.py`. This caused
incremental builds to scan all 99 providers and 1625 modules instead
of just the requested one.

The registry workflow was building the CI image from scratch every run
(~24 min) because it lacked the BuildKit mount cache that
ci-image-build.yml provides. Inline `breeze ci-image build` with
registry cache doesn't help because Docker layer cache invalidates
on every commit when the build context changes.

Split into two jobs following the established pattern used by
ci-amd-arm.yml and update-constraints-on-push.yml:

- `build-ci-image`: calls ci-image-build.yml which handles mount cache
  restore, ghcr.io login, registry cache, and image stashing
- `build-and-publish-registry`: restores the stashed image via
  prepare_breeze_and_image action, then runs the rest unchanged

* Fix merge crash when incremental extract skips modules.json

extract_parameters.py with --provider intentionally skips writing
modules.json (only the targeted provider's parameters are extracted).
The merge script assumed modules.json always exists, causing a
FileNotFoundError during incremental builds.

Handle missing new_modules_path the same way missing
existing_modules_path is already handled: treat it as an empty list.
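
A minimal sketch of the "missing file means empty list" handling described above. The helper name `load_modules` is hypothetical and the merge script's real structure is not shown in the commit message; only the fallback behaviour is illustrated.

```python
import json
from pathlib import Path


def load_modules(path: Path) -> list:
    """Return the JSON list stored at path, or [] when the file is absent.

    An incremental extract may legitimately skip writing modules.json,
    so a missing file is treated as "no modules" rather than an error.
    """
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))
```

Applying the same helper to both the existing and the new modules path keeps the two cases symmetric.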

* Fix /mnt not writable when loading stashed CI image

The prepare_breeze_and_image action loads the CI image from /mnt, which
requires make_mnt_writeable.sh to run first. Each job gets a fresh
runner, so the writeable /mnt from the build job doesn't carry over.

* Regenerate pnpm lockfile for workspace mode

Adding `packages: ['.']` to pnpm-workspace.yaml changed how pnpm
processes overrides, causing ERR_PNPM_LOCKFILE_CONFIG_MISMATCH with
--frozen-lockfile. Regenerate the lockfile with pnpm 9 to match.

* Scope prebuild uv resolution to dev/registry project

The prebuild script ran `uv run` without --project, causing uv to
resolve the full workspace including samba → krb5 which needs
libkrb5-dev (not installed on the CI runner).

Eleventy pagination templates emit empty fallback JSON for every provider,
even when only one provider's data was extracted.  A plain `aws s3 sync`
uploads those stubs and overwrites real connection/parameter data.

Changes:
- Exclude per-provider connections.json and parameters.json from the main
  S3 sync during incremental builds, then selectively upload only the
  target provider's API files
- Filter connections early in extract_connections.py (before the loop)
  and support space-separated multi-provider IDs
- Suppress SCARF_ANALYTICS and DO_NOT_TRACK telemetry in CI
- Document the Eleventy pagination limitation in README and AGENTS.md

* Exclude all per-provider API files during incremental S3 sync

The previous exclude only covered connections.json and parameters.json,
but modules.json and versions.json for non-target providers also contain
incomplete data (no version info extracted) and would overwrite correct
data on S3.  Simplify to exclude the entire api/providers/* subtree and
selectively upload only the target provider's directory.

* Also exclude provider HTML pages during incremental S3 sync

Non-target provider pages are rebuilt without connection/parameter data
(the version-specific extraction files don't exist locally). Without
this exclude, the incremental build overwrites complete HTML pages on
S3 with versions missing the connection builder section.

The providers listing page uses merged data (all providers) and must
be updated during incremental builds — especially for new providers.
AWS CLI --include after --exclude re-includes the specific file.
The executor already treats both queues as joinable queues. It calls:
  - task_done()
  - join()
  - flush logic that assumes task accounting is tracked

A plain manager Queue() does not match that contract. On Python 3.14 this showed up in teardown/error paths as: `ValueError: task_done() called too many times`
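
The task-accounting contract behind that error can be sketched with the standard-library `queue.Queue`, used here purely as a stand-in; the executor's actual queue types are not shown in the commit message.

```python
import queue


def drain(q: queue.Queue) -> list:
    """Consume every queued item, acknowledging each one exactly once."""
    items = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        items.append(item)
        q.task_done()  # one acknowledgement per successful get()
    return items


work = queue.Queue()
for n in range(3):
    work.put(n)

drained = drain(work)
work.join()  # returns immediately: every put() was matched by task_done()

try:
    work.task_done()  # an extra acknowledgement breaks the accounting
except ValueError as err:
    extra_ack_error = str(err)
```

The final call raises because the queue counts unfinished tasks: `join()` only works when every `task_done()` corresponds to an item that was actually retrieved.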
…3840)

Bumps [pyasn1](https://github.com/pyasn1/pyasn1) from 0.6.2 to 0.6.3.
- [Release notes](https://github.com/pyasn1/pyasn1/releases)
- [Changelog](https://github.com/pyasn1/pyasn1/blob/main/CHANGES.rst)
- [Commits](pyasn1/pyasn1@v0.6.2...v0.6.3)

---
updated-dependencies:
- dependency-name: pyasn1
  dependency-version: 0.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The `queued_tasks` property in `LocalKubernetesExecutor` and `CeleryKubernetesExecutor` incorrectly merged base executor tasks. `dict.update()` modifies the dictionary in place, which could lead to race conditions during rapid dict updates. This commit replaces `dict.update()` with the Python dictionary union operator `|`, which builds a new dictionary and leaves both operands unchanged.

Signed-off-by: Ankit Kumar <ankitkumar17541@gmail.com>
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
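
A short sketch contrasting the two merge styles. `HybridExecutorSketch` and its attributes are hypothetical stand-ins rather than the real executor classes; the point is only that `update()` mutates one of the source dictionaries while `|` returns a fresh one.

```python
class HybridExecutorSketch:
    """Hypothetical stand-in for an executor that wraps two others."""

    def __init__(self, primary_tasks: dict, secondary_tasks: dict) -> None:
        self.primary_tasks = primary_tasks
        self.secondary_tasks = secondary_tasks

    @property
    def queued_tasks_in_place(self) -> dict:
        # Mutating merge: primary_tasks itself gains the secondary entries.
        merged = self.primary_tasks
        merged.update(self.secondary_tasks)
        return merged

    @property
    def queued_tasks(self) -> dict:
        # Non-mutating merge: a new dict is built on every access.
        return self.primary_tasks | self.secondary_tasks
```

Reading `queued_tasks` repeatedly leaves `primary_tasks` untouched, whereas a single read of `queued_tasks_in_place` permanently changes it.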
…apache#63832)

The xcom add test used a hardcoded key (`test_xcom_key`) which could collide with leftover state from a previous failed run where xcom delete never executed. Derive the key from the randomized `date_param` so each parametrize set gets a unique key.
* feat: added russian ui translation

* fix: missing "one" and "many" for "files"

* fix: added "_few"

* fix: removed unused "few"'s, added "asset_many"

* Update airflow-core/src/airflow/ui/public/i18n/locales/ru/common.json

Co-authored-by: renat-sagut <renat.sagut@gmail.com>

* Update airflow-core/src/airflow/ui/public/i18n/locales/ru/common.json

Co-authored-by: renat-sagut <renat.sagut@gmail.com>

* Apply suggestions from code review

Co-authored-by: renat-sagut <renat.sagut@gmail.com>

* CODEOWNERS updated to include locales/ru

* Update .github/CODEOWNERS

* Fix static checks

* CODEOWNERS fixed + replaced "Даг" with "DAG"

* Apply suggestions from code review

Co-authored-by: renat-sagut <renat.sagut@gmail.com>

* Apply suggestions from code review

Co-authored-by: renat-sagut <renat.sagut@gmail.com>

* misc translation fixes

* Fix ts-compile-lint-ui

---------

Co-authored-by: o.marchuk <o.marchuk@mkskom.ru>
Co-authored-by: renat-sagut <renat.sagut@gmail.com>
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
Co-authored-by: Jens Scheffler <jscheffl@apache.org>
* Update Helm Chart release notes for 1.20.0

* Review feedback Elad

* Review feedback on release notes

* Add 63659 to release notes

* Fix spellcheck + feedback Mirtpl
Do not backfill old DagRun.created_at
Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
…ts (apache#62369)

* Stream task instance summaries for multiple DAG runs over a single NDJSON connection to eliminate N+1 requests

* Stream task instance summaries for multiple DAG runs over a single NDJSON connection, replacing individual requests to improve performance and eliminate N+1 query issues.

* Fix capitalization of "Dag" in documentation and code comments for consistency

* Refactor GridTISummaries schema and update streaming endpoint to improve clarity and performance

* Fix formatting and linter issues

* Fix static check

* Fix static check

* Fix static check
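
The NDJSON framing mentioned above can be sketched as one JSON object per line. The function names and the shape of each summary dict are assumptions for illustration, not the endpoint's actual schema.

```python
import json
from collections.abc import Iterable, Iterator


def ndjson_lines(summaries: Iterable[dict]) -> Iterator[str]:
    """Yield one newline-terminated JSON document per DAG run summary."""
    for summary in summaries:
        yield json.dumps(summary, separators=(",", ":")) + "\n"


def parse_ndjson(payload: str) -> list[dict]:
    """Decode an NDJSON payload, ignoring blank lines."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]
```

Because each line is a complete document, a client can start rendering the first run's summary before the rest of the response has arrived.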
* Fix make_partial_model

* Revert "Bump pydantic min version to 2.12.3 (apache#63570)"

This reverts commit 9516a77.

* Fix CI
Dev-iL and others added 30 commits March 26, 2026 04:33
v0.12.6 is the first that has wheels for 3.13 and 3.14.

Docker image building should be slightly faster as a result.
* Use compat sdk conf import in Informatica provider

* Remove common-compat "use next version" comment
* Use compat sdk conf import in Google Gen AI operators

* Remove common-compat "use next version" comment
* Compat sdk conf follow-up for multiple providers

* Remove common-compat "use next version" comments
…pache#62083)

* Add initial Portuguese (pt) translation agent skill (apache#62001)

* Update pt.md and remove SKILL.md

* Fix markdownlint MD032: add blank lines around lists + confirm license header (apache#62001)

* Remove long ASF license header as per request

* Fix insert-license CI error

---------

Co-authored-by: Jason(Zhe-You) Liu <68415893+jason810496@users.noreply.github.com>
* simplify the bug report template

Changes involve:
- Merging "how to reproduce" and "what happened", as these two pieces of information are naturally intertwined and a good reproduction walkthrough usually tells the story of what went wrong along the way.
- The Apache Airflow version is usually handy information when filing a bug. Instead of offering dropdowns, letting the user state the Airflow version avoids the redundant ask of filling in the version if it belongs to the 3x versions.

* pre-commit fixes

updating the missing line in the bug template

fixing yaml linting issues

* simplify the bug report template

Changes involve:
- Merging "how to reproduce" and "what happened", as these two pieces of information are naturally intertwined and a good reproduction walkthrough usually tells the story of what went wrong along the way.
- The Apache Airflow version is usually handy information when filing a bug. Instead of offering dropdowns, letting the user state the Airflow version avoids the redundant ask of filling in the version if it belongs to the 3x versions.

* pre-commit fixes

updating the missing line in the bug template

fixing yaml linting issues

* reverting linting changes unrelated to the template edit

* add placeholder and update terminology

* retain the airflow version type and structure

Since the issue template is static in nature, hiding the other versions is not allowed.
Retaining the current element type till the alternate options are decided

* add placeholder text for airflow version input and update verbiage.

- updated the airflow version field to have a placeholder that hints the reporter should run the airflow version command and paste the output.
- minor verbiage update for issue description

* remove quotes for the placeholder
…ITLOperator (apache#64108)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ode (apache#64154)

* refactor: Fixed output encoding with WinRMTrigger
…e#64242)

* Pass parameters to k8s methods conditionally to fix mypy
…alls (apache#64216)

When `assume_role_method` is set to `assume_role_with_web_identity`, the
STS client used to fetch credentials was created without the connection's
botocore config. This meant proxy settings, timeouts, and other config
from `config_kwargs` in the connection extra were silently ignored.

The `assume_role` and `assume_role_with_saml` paths correctly pass
`self.config` to the STS client, but the web identity path passed a raw
`base_session.create_client` as `client_creator` to botocore's
`AssumeRoleWithWebIdentityCredentialFetcher`, which never received the
connection config.

This wraps `client_creator` to merge the connection's botocore config
into any config that botocore passes when creating the STS client,
ensuring proxy and other settings are respected.
…he#63979)

* Improve Playwright test patterns in VariablePage (apache#63965)

- Replace CSS :has-text() with locator.filter({ hasText }) in rowByKey
- Replace CSS attribute selector with getByRole('checkbox') in selectRow
- Replace page.waitForFunction() DOM queries with locator-based
  waiting (Promise.race of noData text vs first table row)
- Replace CSS input[type='checkbox'] with getByRole('checkbox')
  in selectAllCheckbox

Aligns with Playwright best practices per apache#63036.

* Revert checkbox selectors — Chakra hidden input incompatible with getByRole

getByRole('checkbox') resolves to Chakra UI's hidden <input> which
is not visible/stable, causing TimeoutError. Keep original CSS
selectors for checkbox interactions until Chakra components expose
proper accessible roles.

* Use expect().toBeVisible() with .or() combinator

Replace Promise.race + waitFor() with Playwright's built-in
.or() combinator for assertion-based waiting. Verified locally
with 5/5 pass.

---------

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Document that providers can be suspended from CI builds and releases
when their dependencies block upgrades, including per-Python-version
suspension. Stewards are responsible for resolving issues and ensuring
green CI before unsuspension.
…pache#64245)

The zip import error fix (apache#63617) changed the public signature of
`DagFileProcessorManager.deactivate_deleted_dags` from
`(bundle_name, present: set[DagFileInfo])` to
`(bundle_name, observed_filelocs: set[str])`, breaking subclass
overrides. Restore the original signature and compute observed
filelocs internally.

Also widen `DagModel.deactivate_deleted_dags` `rel_filelocs` type
from `set[str]` to `Collection[str]` to accept both list and set
callers.
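
A rough sketch of the compatibility pattern described above: keep accepting the original `set[DagFileInfo]` argument and derive the file locations inside the method. The `DagFileInfo` stand-in, its `rel_path` field, and both helper names are assumptions for illustration only.

```python
from collections.abc import Collection
from dataclasses import dataclass


@dataclass(frozen=True)
class DagFileInfo:
    """Hypothetical stand-in; the real class has more fields."""

    bundle_name: str
    rel_path: str


def observed_filelocs(present: set[DagFileInfo]) -> set[str]:
    """Derive file locations internally so callers keep the old signature."""
    return {info.rel_path for info in present}


def stale_filelocs(known: Collection[str], rel_filelocs: Collection[str]) -> set[str]:
    """Accept any sized collection (list or set) of observed locations."""
    return set(known) - set(rel_filelocs)
```

Typing the parameter as `Collection[str]` lets list-based and set-based callers share one function without conversions at the call site.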
…he#64077)

* Add LLMFileAnalysisOperator and @task.llm_file_analysis to the common-ai provider

* Fix mypy issues

* Update utils

* Update return model

* Fix spells

* fix up read

* document prefix lookup operation
* avoid passing parsed input back to component

* on change, update component and debounce utc parsing

* typo, linting fixes

* longer type delay, removed redundant isValid check
…ng (apache#64182)

The PoolBar component links to the task instances page using
SearchParamsKeys.STATE ('state') and SearchParamsKeys.POOL ('pool'),
but the TaskInstances page reads filters from SearchParamsKeys.TASK_STATE
('task_state') and SearchParamsKeys.POOL_NAME_PATTERN ('pool_name_pattern').

This mismatch causes clicking a pool slot segment (running, queued, etc.)
to navigate to the task instances page without any filters being applied,
showing all task instances instead of those filtered by the clicked state
and pool.

Fix by using the correct search parameter keys (TASK_STATE and
POOL_NAME_PATTERN) and slot.slotType instead of slot.color for the
state filter value.
…te (apache#64244)

* Fix LLMApprovalMixin to enforce allow_modifications in execute_complete
Co-authored-by: Oleg Kachur <kachur@google.com>
Catch TaskAlreadyRunningError from the supervisor and raise Celery
Ignore() to prevent the broker redelivery from being recorded as a
task failure.

related: apache#58441
* Improve Playwright test patterns in providers.spec.ts

* Tighten ProvidersPage load readiness check

---------

Co-authored-by: Shabbir Hussain <shabbir@Shabbirs-MacBook-Air.local>
Co-authored-by: Yeonguk Choo <choo121600@gmail.com>
Development

Successfully merging this pull request may close these issues.

amazon provider: deferrable AWS hook in triggerer can lose connection when TriggerCommsDecoder.send() hits cross-loop RuntimeError