Airflow 3.1.5 Mixed Executors - Task Instance Not Found & State Desync Issues #61468
-
For TaskFlow, this works: @task(queue="xxx")
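For example, a minimal sketch (the queue name and function body are placeholders; executor= is added here as well since the thread is about mixed executors, even though the tip above only mentions queue):

```python
from airflow.sdk import task  # Airflow 3.x import path; use airflow.decorators on 2.x


@task(queue="vc-sqs-celery-worker.fifo", executor="CeleryExecutor")
def run_dbt_model():
    # TaskFlow task explicitly pinned to the Celery queue and executor.
    ...
```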
-
I see you're using a config that is for the old (deprecated) Celery/Kubernetes hybrid executor, which is not meant to be used with the multiple executor configuration. I'm wondering if you have a setup that is half converted to the modern multiple executor config.

Multiple executor configuration was released in 2.10; where in the docs do you see it say this should work with 2.6? I'm wondering if you're mixing configuration between versions.

Lastly, I know that Celery can be quite unstable, with tasks being lost in some circumstances, or issues with the task SDK API (task tokens expiring by the time they reach the worker). That's worth looking into further, though it's unrelated to the multiple executor configuration.
-
For users who experience the same thing in the future: regardless of which queue you use, if you migrate the same queue from the old Airflow version to the new one, purge it first to prevent issues. Also, Airflow only officially supports…
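For example, one way to purge the SQS FIFO queue before cutting over is with boto3 (the queue URL is the redacted one from the config further down; this assumes AWS credentials with sqs:PurgeQueue permission, and celery purge is an alternative):

```python
import boto3

# Purge the pre-existing Celery queue before pointing the new Airflow version at it.
# Note: SQS allows only one PurgeQueue call per queue every 60 seconds.
sqs = boto3.client("sqs", region_name="eu-west-1")
sqs.purge_queue(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/xxx/vc-sqs-celery-worker.fifo"
)
```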
-
Hi folks 👋,

We're experiencing critical issues with Airflow 3.1.5 using mixed executors (KubernetesExecutor, CeleryExecutor) that prevent CeleryExecutor tasks from running successfully. This is blocking our migration from Airflow 2.x to 3.x.

Environment
- Airflow version: 3.1.5
- Executor config: AIRFLOW__CORE__EXECUTOR=KubernetesExecutor,CeleryExecutor
- Deployment: Kubernetes (EKS)
- Celery broker: AWS SQS (FIFO queues)
- Celery result backend: PostgreSQL
- Metadata database: PostgreSQL (RDS)
- CeleryExecutor: dbt jobs (high-concurrency, short-duration, SLA-critical)
- KubernetesExecutor: Resource-intensive jobs requiring custom pod specs

Celery config:

```json
{
  "accept_content": ["json"],
  "event_serializer": "json",
  "worker_prefetch_multiplier": 1,
  "task_acks_late": true,
  "task_default_queue": "default",
  "task_default_exchange": "default",
  "task_track_started": true,
  "broker_url": "sqs://",
  "broker_transport_options": {
    "visibility_timeout": 32000,
    "region_name": "eu-west-1",
    "region": "eu-west-1",
    "predefined_queues": {
      "vc-sqs-celery-worker.fifo": {
        "url": "https://sqs.eu-west-1.amazonaws.com/xxx/vc-sqs-celery-worker.fifo"
      }
    }
  },
  "broker_connection_retry_on_startup": true,
  "result_backend": "postgresql",
  "database_engine_options": {},
  "worker_concurrency": 16,
  "worker_enable_remote_control": true,
  "worker_redirect_stdouts": false,
  "worker_hijack_root_logger": false
}
```

Configuration
DAG-level executor specification:
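Roughly along these lines (DAG id and schedule are placeholders; imports assume Airflow 3.x):

```python
from datetime import datetime

from airflow.sdk import DAG  # Airflow 3.x import path

with DAG(
    dag_id="dbt_hourly",               # placeholder
    start_date=datetime(2025, 1, 1),
    schedule="@hourly",                # placeholder
    # Route every task in this DAG to the CeleryExecutor by default.
    default_args={"executor": "CeleryExecutor"},
):
    ...
```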
Celery queue configuration:
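In the DAG code this boils down to a single queue name that must match the predefined SQS queue in the Celery config above (the variable name is an assumption, chosen to match the queue=queue usage mentioned later):

```python
# Must match the predefined_queues entry in the Celery config above.
queue = "vc-sqs-celery-worker.fifo"
```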
Queue passed to tasks:
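Roughly like this, inside the DAG from the sketch above (task id and command are placeholders):

```python
from airflow.providers.standard.operators.bash import BashOperator

dbt_run = BashOperator(
    task_id="dbt_run",          # placeholder
    bash_command="dbt run",     # placeholder
    executor="CeleryExecutor",  # per-task executor selection
    queue=queue,                # the FIFO queue defined above
)
```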
Evidence of correct configuration (obtained by importing the DAG and inspecting it inside the pod):
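Along these lines, run in a Python shell inside the pod (dag_id is a placeholder; this assumes DagBag is still importable as in earlier versions):

```python
from airflow.models.dagbag import DagBag

dag = DagBag().get_dag("dbt_hourly")  # placeholder dag_id
for t in dag.tasks:
    # Each task prints executor='CeleryExecutor' and queue='vc-sqs-celery-worker.fifo'.
    print(t.task_id, t.executor, t.queue)
```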
Tasks are correctly configured with both executor and queue parameters.

The Issues
Issue 1: Task Instance Not Found (CRITICAL - Blocks all CeleryExecutor tasks)
Tasks with executor: CeleryExecutor consistently fail with "Task Instance not found" errors, preventing any task execution.
Celery Worker Logs:
Scheduler Logs:
Timeline:

- The worker tries to start the task seconds after the scheduler created it, but the API returns "Task Instance not found" (404) and the task process is SIGKILLed
- Task instance state: running, Executor state: failed → state desync

Issue 2: TaskFlow Tasks Default to queue='default'

TaskFlow API tasks (@task decorator) don't inherit the queue parameter from DAG code and default to queue='default'. Even though the DAG code sets the queue and passes it to explicit operators, tasks created with the @task decorator don't inherit this and fall back to queue='default' (a condensed sketch of the pattern follows below).
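To make the pattern concrete, here is a condensed sketch (DAG id, task ids, and the dbt command are placeholders, not our actual code): the explicit operator is routed to the FIFO queue, while the TaskFlow task ends up on the Celery default queue unless queue= is passed to @task directly, as in the first reply above.

```python
from datetime import datetime

from airflow.sdk import DAG, task
from airflow.providers.standard.operators.bash import BashOperator

queue = "vc-sqs-celery-worker.fifo"

with DAG(dag_id="dbt_hourly", start_date=datetime(2025, 1, 1), schedule="@hourly"):

    # Explicit operator: queue passed directly, routed to the FIFO queue as expected.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run",
        executor="CeleryExecutor",
        queue=queue,
    )

    # TaskFlow task: no queue= on the decorator, so it ends up on the
    # [celery] default queue ('default') instead of the FIFO queue.
    @task(executor="CeleryExecutor")
    def post_run_check():
        ...

    dbt_run >> post_run_check()
```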
What Works

- KubernetesExecutor tasks work perfectly (no issues)
- On Airflow 2.x, CeleryExecutor worked flawlessly
- Explicit operators with the queue=queue parameter work correctly
- Tasks show executor='CeleryExecutor' and queue='vc-sqs-celery-worker.fifo' when inspected

Questions for the Community
404"Task Instance not found" when the worker tries to start atask that the scheduler created seconds earlier? Is there a synchronization issue between
scheduler, API server, and executors?
executor=parameter fully supported in Airflow 3.x mixed executor setups? Thedocumentation suggests it is (since 2.6+), but we're experiencing critical failures.
@taskdecorator) with mixed executors:queuebe specified indefault_args?queueparameter set in DAG code propagate to TaskFlow tasks?identical symptoms with 3.1.5 mixed executors) but no GitHub issues.
reports
"failed"but the task instance remains in"running"state.configuration entirely until fixed?
What We've Tried
- Increased AIRFLOW__CELERY__OPERATION_TIMEOUT from 30s to 300s
- Verified the predefined_queues configuration
- Confirmed tasks carry the expected executor and queue attributes when inspected
- Set AIRFLOW__CELERY__DEFAULT_QUEUE - doesn't resolve TaskFlow task routing
- CeleryExecutor tasks still fail with "Task Instance not found"