Enable parallel backfill by eliminating shared state between providers #63288

Merged
kaxil merged 1 commit into apache:main from astronomer:registry-parallel-backfill
Mar 10, 2026

@kaxil kaxil commented Mar 10, 2026

Follow-up to #63269. The backfill command previously patched a shared providers.json in place, so two breeze registry backfill runs for different providers could not safely execute concurrently.
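
To illustrate why a save/patch/restore pattern on a shared file breaks under concurrency, here is a minimal sketch of one possible interleaving. The PatchedRun class, the dict standing in for the file on disk, and the content strings are all hypothetical; this is not the removed _patch_providers_json code.

```python
# Hypothetical stand-in for the shared providers.json on disk.
shared = {"providers.json": "original"}


class PatchedRun:
    """Illustrative save/patch/restore helper (not the real implementation)."""

    def __init__(self, store, content):
        self.store = store
        self.content = content
        self.saved = None

    def patch(self):
        # Remember whatever is currently in the file, then overwrite it.
        self.saved = self.store["providers.json"]
        self.store["providers.json"] = self.content

    def restore(self):
        # Put back what was saved, which may not be the true original.
        self.store["providers.json"] = self.saved


def interleaved_runs(store):
    run_a = PatchedRun(store, "amazon-9.15.0")
    run_b = PatchedRun(store, "google-14.0.0")
    run_a.patch()    # A saves "original" and writes its own content
    run_b.patch()    # B saves A's patched content, not the original
    seen_by_a = store["providers.json"]  # A now reads B's data
    run_a.restore()  # file is back to "original"
    run_b.restore()  # B "restores" A's patch, so the file ends up wrong
    return seen_by_a, store["providers.json"]


seen_by_a, final = interleaved_runs(shared)
print(seen_by_a, final)
```

In this interleaving, run A reads run B's data and the file is left holding A's patched content after both runs finish, which is the kind of clobbering that isolated per-run files avoid.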

This adds --provider and --providers-json flags to both extraction scripts (extract_parameters.py, extract_connections.py) so each backfill run uses an isolated temp providers.json and only scans the target provider. In --provider mode, modules.json is not written (it would be incomplete), so concurrent runs don't clobber each other.
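
As a rough sketch of how the two flags could behave inside an extraction script: the flag names come from this PR, but the helper names (build_parser, select_providers, should_write_aggregate_files) and the "id" key on each provider entry are assumptions made for illustration, not the actual script internals.

```python
import argparse
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative extraction script")
    # Flag names match the PR; everything else here is assumed.
    parser.add_argument("--provider", default=None,
                        help="Only process this provider id")
    parser.add_argument("--providers-json", type=Path, default=None,
                        help="Use this providers.json instead of the default search paths")
    return parser


def select_providers(providers, provider_id):
    """Return all providers, or only the requested one when a filter is given."""
    if provider_id is None:
        return list(providers)
    return [p for p in providers if p.get("id") == provider_id]


def should_write_aggregate_files(provider_id):
    """Aggregate files such as modules.json would be incomplete in single-provider mode."""
    return provider_id is None


# Demo with an explicit argv list and hypothetical provider entries.
args = build_parser().parse_args(["--provider", "amazon"])
demo_providers = [{"id": "amazon"}, {"id": "google"}]
print(select_providers(demo_providers, args.provider))
print(should_write_aggregate_files(args.provider))
```

Skipping the aggregate write in single-provider mode is what keeps concurrent runs from overwriting each other's output.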

What changed

  • extract_parameters.py: --provider flag filters to a single provider and skips modules.json/runtime_modules.json writes; --providers-json overrides the default search paths
  • extract_connections.py: Same two flags: filters output to a single provider and accepts a custom providers.json path
  • registry_commands.py: Backfill command now creates a temp providers.json per version in a TemporaryDirectory, passes --provider/--providers-json to the scripts, and removes the _patch_providers_json save/restore pattern
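
The per-version isolation described above can be sketched roughly as follows. This is a simplified illustration, not the code in registry_commands.py: plan_backfill, build_script_command, the directory layout, and the JSON payload written to each providers.json are all assumptions.

```python
import json
import sys
import tempfile
from pathlib import Path


def build_script_command(script, provider, providers_json):
    """Build the argv for one extraction script run (flag names are from the PR)."""
    return [sys.executable, str(script),
            "--provider", provider,
            "--providers-json", str(providers_json)]


def plan_backfill(provider, versions, tmp_root, scripts):
    """Write one isolated providers.json per version and return the commands to run."""
    commands = []
    for version in versions:
        version_dir = tmp_root / version
        version_dir.mkdir(parents=True)
        providers_json = version_dir / "providers.json"
        # Hypothetical payload; the real file schema is not shown in this PR.
        payload = {"providers": [{"id": provider, "version": version}]}
        providers_json.write_text(json.dumps(payload))
        for script in scripts:
            commands.append(build_script_command(script, provider, providers_json))
    return commands


with tempfile.TemporaryDirectory() as tmp:
    commands = plan_backfill(
        "amazon",
        ["9.15.0", "9.14.0"],
        Path(tmp),
        [Path("extract_parameters.py"), Path("extract_connections.py")],
    )
    # A real run would call subprocess.run(cmd, check=True) here,
    # while the temporary directory still exists.
    for cmd in commands:
        print(" ".join(cmd))
```

Because every run writes only inside its own temporary directory, nothing needs to be saved and restored afterwards, and the directory is cleaned up when the with block exits.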

Usage

Two terminal sessions can now safely backfill different providers simultaneously:

# Terminal 1
breeze registry backfill --provider amazon --version 9.15.0 --version 9.14.0

# Terminal 2
breeze registry backfill --provider google --version 14.0.0 --version 13.0.0

The registry-backfill.yml GitHub Actions workflow already uses a matrix strategy per provider, so this also makes individual CI jobs faster (no longer scanning 100+ providers per run).

Add --provider and --providers-json flags to extract_parameters.py and
extract_connections.py so each backfill run uses an isolated temp
providers.json and only scans the target provider. In --provider mode,
modules.json is not written (it would be incomplete), so concurrent
runs don't clobber each other.

The backfill command now creates a TemporaryDirectory with per-version
providers.json files instead of patching a shared file.
@kaxil kaxil force-pushed the registry-parallel-backfill branch from 02d1e52 to c55426b Compare March 10, 2026 18:37
@kaxil kaxil merged commit a9c0bf3 into apache:main Mar 10, 2026
129 checks passed
@kaxil kaxil deleted the registry-parallel-backfill branch March 10, 2026 20:38
@github-project-automation github-project-automation Bot moved this from In review to Done in Airflow Registry Mar 10, 2026
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026
Pyasma pushed a commit to Pyasma/airflow that referenced this pull request Mar 13, 2026