
fix: gracefully handle 404 from worker log server for historical retry attempts #63002

Merged: eladkal merged 2 commits into apache:v2-11-test from kalluripradeep:backport-62475-v2-11-test on Mar 6, 2026
Conversation

@kalluripradeep
Contributor

Backport of #62475 to v2-11-test.

Original PR: #62475

What changed

Added 404 handling in `_read_from_logs_server` in `file_task_handler.py`, with a local filesystem fallback and a clear message when logs aren't accessible.

How I fixed it

Added an `elif response.status_code == 404` branch that:

  1. First tries reading from the local filesystem via `_read_from_local`, which covers cases where logs are on a shared drive.
  2. If that also finds nothing, shows the user a clear message instead of a raw 404: "Log file not found on worker '{hostname}'. This attempt may have run on a different worker whose logs are no longer accessible. Consider configuring remote logging (S3, GCS, etc.) for log persistence."
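For readers who don't have the diff open, the branch described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code merged in this PR: `FakeResponse`, `read_from_logs_server`, the `read_from_local` callable, and the `(messages, logs)` return shape are assumptions made for the example, and the non-404 branches are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Message text taken from the PR description.
NOT_FOUND_TEMPLATE = (
    "Log file not found on worker '{hostname}'. This attempt may have run on a "
    "different worker whose logs are no longer accessible. Consider configuring "
    "remote logging (S3, GCS, etc.) for log persistence."
)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the HTTP response from the worker log server."""

    status_code: int
    text: str = ""


def read_from_logs_server(
    response: FakeResponse,
    hostname: str,
    read_from_local: Callable[[], Tuple[List[str], List[str]]],
) -> Tuple[List[str], List[str]]:
    """Return (messages, logs), falling back to local files on a 404."""
    messages: List[str] = []
    logs: List[str] = []
    if response.status_code == 200:
        # Placeholder for the normal path: the worker served the log.
        logs.append(response.text)
    elif response.status_code == 404:
        # Step 1: try the local filesystem (e.g. logs on a shared drive).
        local_messages, local_logs = read_from_local()
        if local_logs:
            messages.extend(local_messages)
            logs.extend(local_logs)
        else:
            # Step 2: nothing local either, so explain instead of a raw 404.
            messages.append(NOT_FOUND_TEMPLATE.format(hostname=hostname))
    else:
        # Placeholder for other statuses; the real handler has its own handling.
        messages.append(f"Unexpected status {response.status_code} from '{hostname}'")
    return messages, logs


if __name__ == "__main__":
    # Simulate a historical attempt whose worker no longer has the file.
    msgs, found = read_from_logs_server(
        FakeResponse(404), "worker-1", lambda: ([], [])
    )
    print(msgs[0])
```

The fallback is passed in as a callable here only to keep the sketch self-contained; in the handler itself, `_read_from_local` is the method named in the description.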

Fixes #62372

@kalluripradeep
Contributor Author

@eladkal — backport PR for v2-11-test as requested.

eladkal added this to the Airflow 2.11.2 milestone on Mar 6, 2026
@eladkal
Contributor

eladkal commented Mar 6, 2026

@kalluripradeep can you verify whether the failures in test_trigger_dag_and_wait_for_result relate to the changes in the PR?

@kalluripradeep
Contributor Author

> @kalluripradeep can you verify whether the failures in test_trigger_dag_and_wait_for_result relate to the changes in the PR?

@eladkal — confirmed these failures are unrelated to this PR. The failing tests are test_integration_run_dag (timeout >300s) and test_integration_run_dag_with_scheduler_failure in TestCeleryAndLocalExecutor — both are DAG execution/scheduling tests.

This PR only modifies file_task_handler.py (log reading logic) and adds unit tests for it. No changes to DAG triggering, scheduling, or executor code.

eladkal merged commit 47a1ca7 into apache:v2-11-test on Mar 6, 2026
167 of 176 checks passed
@potiuk
Member

potiuk commented Mar 7, 2026

Yes. That was a problem with Celery Provider 3.17.0 (#63043).


3 participants