Skip to content

Lock TriggerCommsDecoder sync req-res cycle#64882

Open
uranusjr wants to merge 1 commit intoapache:mainfrom
astronomer:trigger-comm-send-sync
Open

Lock TriggerCommsDecoder sync req-res cycle#64882
uranusjr wants to merge 1 commit intoapache:mainfrom
astronomer:trigger-comm-send-sync

Conversation

@uranusjr
Copy link
Copy Markdown
Member

@uranusjr uranusjr commented Apr 8, 2026

Synchronous methods in TriggerCommsDecoder need an additional threading.Lock to ensure communications happening in different threads don't interlace. While the async methods are guarded by an asyncio.Lock, this is not enough for sync methods.

This is also (mostly) what the sync-based CommsDecoder is doing, except since that one's sync-based, sync methods only have a thread lock, and the async ones use double-locking.

Fix #64620.

Synchronous methods in TriggerCommsDecoder need an additional
threading.Lock to ensure communications happening in different threads
don't interlace. While the async methods are guarded by an asyncio.Lock,
this is not enough for sync methods.

This is also (mostly) what the sync-based CommsDecoder is doing, except
since that one's sync-based, sync methods only have a thread lock, and
the async ones use double-locking.
Copy link
Copy Markdown
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh this is getting messy

@uranusjr
Copy link
Copy Markdown
Member Author

uranusjr commented Apr 8, 2026

Do you think it’d be viable to default to async in the non-trigger CommsDecoder as well? (And just wrap the sync version with async_to_sync)

@tirkarthi
Copy link
Copy Markdown
Contributor

In order to make sure that we only send+receive one message at a time I have
swapped from the threading.lock to aiologic.Lock which allows for better
locking/waiting behaviour between threads (which is what Triggers will use via
the sync_to_async) and the main TriggerRunner.

We were also affected by this in 3.1.5 and while exploring the previous code I came across da78b39 which seems related and had above comment. Is using aiologic based lock a better option here?

One of our major worries was also that triggerer kept heartbeating though it was processing no triggers as reported in the issue due to this issue. Hence the liveness probe was not restarting the triggerer process. Is there something else to be done to fix the issue? We had deadlock in our mysql cluster from time to time as we use end from trigger feature which lead to triggerer restarts on deadlock and sort of worked around this issue.

https://pypi.org/project/aiologic/

@eladkal eladkal added this to the Airflow 3.2.1 milestone Apr 8, 2026
@eladkal eladkal added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Triggerer backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

4 participants