
Grouped Issue: Asset Scheduling behavior changes based on DAG performance settings (and it shouldn't) #56750

@SCrocky

Description


Apache Airflow version

3.1.0

If "Other Airflow 2/3 version" selected, which one?

No response

What happened?

Asset scheduling behaviors

Asset Event triggered DAGs behave one of 3 different ways:

1. A single Asset Event triggers a single DAG Run
2. Multiple Asset Events trigger a single DAG Run
3. Asset Events that did not trigger a DAG Run, but are older than the most recent run, are silently ignored

How to make Assets behave differently

To force behaviors 2 & 3, set max_active_runs=1; every time the DAG runs it will "consume" (via behavior 2 or 3) all available Asset Events.

To force behavior 1, one must set max_active_runs to a high value and hope that Asset Events are not generated faster than the scheduler can start runs (otherwise we fall back into behavior 2).

It is important to note that the catchup argument does not appear to affect this mechanism in any way.
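To make the three behaviors concrete, here is a toy, pure-Python model of the consumption rule described above. This is not Airflow's actual scheduler code; the function name `schedule`, the `running` parameter, and the event strings are all invented for illustration. The key assumption it encodes is that each free run slot consumes all currently queued Asset Events at once:

```python
# Toy model of the three Asset Event behaviors. NOT Airflow internals --
# just a sketch of the rule the issue describes: each free run slot
# consumes ALL Asset Events queued at that moment.

from collections import deque

def schedule(events, max_active_runs, running=0):
    """Greedily turn queued Asset Events into DAG Runs.

    `events`: events queued since the last scheduler pass.
    `running`: DAG Runs already active.
    Returns one list of consumed events per newly created run.
    """
    queue = deque(events)
    runs = []
    while queue and running < max_active_runs:
        runs.append(list(queue))  # one run consumes everything queued
        queue.clear()
        running += 1
    return runs

# Behavior 1: scheduler keeps up, one queued event -> one run per event
print(schedule(["e1"], max_active_runs=10))  # [['e1']]

# Behavior 2: events piled up before the scheduler passed -> one run for
# all of them, even with a high max_active_runs
print(schedule(["e1", "e2", "e3"], max_active_runs=10))  # [['e1', 'e2', 'e3']]

# With max_active_runs=1 and a run already active, nothing fires now; the
# queued events are later swallowed by a single run (behavior 2) or, if
# older than that run, dropped (behavior 3)
print(schedule(["e1", "e2"], max_active_runs=1, running=1))  # []
```

This makes the coupling visible: whether an event gets its own run depends only on how many events happen to be queued when a slot frees up, i.e. on scheduler timing and max_active_runs.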

The main Issue

The main issue here is:

Asset Event Scheduling behaves in very different ways, based on DAG parallelism & Airflow Scheduler performance

These things should be unrelated, and as far as I could tell, this behavior is undocumented.

Linked Issues

Other issues that would likely be solved by addressing this issue:

#56749 (UI changes)
#53896 (distinct DAG Run per Asset Event)
#50890 (want catchup on Assets)
#56691 (distinct DAG Run per Asset Event)
#56050 (Max active runs = 1 changes behavior)
#55956 (Force separate Events)
#47398

Unclear issues that may be related:

#56541 ? (unclear)
#42015 ? (unclear)

What you think should happen instead?

In my professional setting we use both behavior 1 (for event-based scheduling) and behaviors 2 & 3 (for table refreshes). Check out my talk from Airflow Summit 2025 for more details.

So I suggest we make the Asset Event DAG-triggering behavior configurable at the DAG level.

For example, by adding an asset_grouping argument:

  • if asset_grouping=True then we have behavior 2
  • if asset_grouping=False then we have behavior 1
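For illustration only, a DAG using the proposed argument might look like the sketch below. Note that asset_grouping does not exist in Airflow today — it is the argument this issue proposes — and the import style assumes the Airflow 3 `airflow.sdk` interface:

```python
from airflow.sdk import dag, Asset

# Hypothetical: `asset_grouping` is the argument proposed in this issue,
# not an existing Airflow parameter.
@dag(
    schedule=[Asset("s3://bucket/raw_orders")],  # made-up asset URI
    asset_grouping=False,  # behavior 1: one DAG Run per Asset Event
)
def per_event_pipeline():
    ...
```

With asset_grouping=True the same DAG would instead batch all pending Asset Events into a single run (behavior 2), regardless of max_active_runs.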

Behavior 3 is, in my opinion, a bug and should never happen.
I've put more info on Asset Event attribution in this issue.

I also suggest renaming catchup to time_interval_catchup or something similar, to make it clear that it does not apply to Asset Event based scheduling.

And we should document all this stuff.

How to reproduce

To reproduce, simply upload the following DAGs to a brand-new Airflow instance:

check_dataset_sync.py

Make sure to use a database other than SQLite so you can compare the difference between max_active_runs=1 and max_active_runs=10.
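The attached file is not reproduced here, but a reproduction of this shape would contain roughly the following: one generator DAG that emits several Asset Events per run, and two consumer DAGs identical except for max_active_runs. Everything below (DAG names, asset URI, task bodies) is made up for illustration, assuming the Airflow 3 `airflow.sdk` API:

```python
from airflow.sdk import dag, task, Asset

TEST_ASSET = Asset("test://sync_check")  # made-up asset URI

@dag(schedule=None)
def asset_generator():
    @task(outlets=[TEST_ASSET])
    def emit():
        pass

    # sequential emitting tasks -> several Asset Events per manual run
    emit.override(task_id="emit_a")() >> emit.override(task_id="emit_b")()

@dag(schedule=[TEST_ASSET], max_active_runs=1)
def consumer_serial():
    @task
    def consume():
        pass
    consume()

@dag(schedule=[TEST_ASSET], max_active_runs=10)
def consumer_parallel():
    @task
    def consume():
        pass
    consume()

asset_generator(); consumer_serial(); consumer_parallel()
```

After triggering asset_generator manually, consumer_serial should batch events into fewer runs than consumer_parallel, with the exact counts depending on scheduler timing.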

Then use the airflow standalone command.
Turn all the DAGs on.

You should obtain the following DAGs:

[Screenshot: resulting DAG list]

And manually trigger the asset generator DAG once.

[Screenshot: DAG Runs after triggering the asset generator]

You will then see that the non-parallel DAGs only trigger twice, and the parallel DAG triggers 4-5 times, depending on scheduler frequency.

You can check the logs to see how many Asset Events each DAG is consuming:

[Screenshot: task logs showing the Asset Events consumed per run]

You can also run similar tests for event-driven Asset Events:

event_scheduling_test.py

But be sure to add your DAGs repo to the PYTHONPATH: `export PYTHONPATH=$AIRFLOW_HOME/dags`

Operating System

Ubuntu 24

Versions of Apache Airflow Providers

apache-airflow-providers-common-compat   1.7.3
apache-airflow-providers-common-io       1.6.2
apache-airflow-providers-common-sql      1.27.5
apache-airflow-providers-postgres        6.2.3
apache-airflow-providers-smtp            2.2.0
apache-airflow-providers-standard        1.6.0

Deployment

Virtualenv installation

Deployment details

Using postgres for the Airflow DB

Anything else?

@cmarteepants I've finally gotten around to making this issue as previously discussed.

Let me know if everything is clear and understandable.

@uranusjr enjoy ;)

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
