Skip to content

Add detailed task spans#63568

Open
dstandish wants to merge 16 commits intoapache:mainfrom
astronomer:add-detailed-task-spans
Open

Add detailed task spans#63568
dstandish wants to merge 16 commits intoapache:mainfrom
astronomer:add-detailed-task-spans

Conversation

@dstandish
Copy link
Copy Markdown
Contributor

@dstandish dstandish commented Mar 13, 2026

Builds on top of #63839

Add spans for detailed debugging info during the task execution process

These are only emitted if you add {"airflow/task_span_detail_level": 2} to dag run conf.

@dstandish dstandish force-pushed the add-detailed-task-spans branch 3 times, most recently from 2cbc72a to 4b37205 Compare March 20, 2026 16:36
@dstandish dstandish force-pushed the add-detailed-task-spans branch from c460e01 to 2b5a0ed Compare March 24, 2026 17:22
Copy link
Copy Markdown
Contributor

@nickstenning nickstenning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others are better place to say whether the specific detail spans you've chosen to emit are valuable, but the tooling and implementation LGTM! I left some very minor comments but nothing blocking.

dstandish and others added 4 commits March 24, 2026 15:48
@nickstenning
Copy link
Copy Markdown
Contributor

Just calling out this thread in case it's slipped through the cracks?

@dstandish dstandish added this to the Airflow 3.2.0 milestone Mar 26, 2026
@ashb ashb force-pushed the add-detailed-task-spans branch from f472203 to d83ac4f Compare April 9, 2026 14:53
@dstandish dstandish force-pushed the add-detailed-task-spans branch from d83ac4f to 6cef7fc Compare April 9, 2026 16:09
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds optional “detail spans” to OpenTelemetry traces to improve task execution debugging, gated by a DAG run conf setting (airflow/task_span_detail_level), and extends tests to validate propagation and span hierarchy.

Changes:

  • Introduces detail_span decorator/context manager in the Task SDK task runner and wraps key execution steps with nested spans when detail level > 1.
  • Propagates task_span_detail_level via W3C tracestate in the DAG run’s trace carrier, and preserves it when clearing task instances.
  • Adds unit/integration tests to verify detail-level parsing/propagation and expected span structure.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
task-sdk/src/airflow/sdk/execution_time/task_runner.py Adds detail_span and instruments task lifecycle with nested spans + exception/status recording
task-sdk/tests/task_sdk/execution_time/test_task_runner.py Adds tests for detail_span behavior at different detail levels
shared/observability/src/airflow_shared/observability/traces/init.py Encodes detail level into tracestate and adds API to read it back
shared/observability/tests/observability/test_traces.py Adds tests for tracestate entries, carrier creation, and level extraction
airflow-core/src/airflow/models/dagrun.py Seeds dagrun carrier with detail level; wraps _emit_dagrun_span call with error handling
airflow-core/src/airflow/models/taskinstance.py Preserves detail level when regenerating dagrun carrier during clear_task_instances
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py Makes task instance id attribute string + guards span emission
airflow-core/tests/unit/models/test_dagrun.py Tests that detail level in conf is embedded and readable
airflow-core/tests/unit/models/test_taskinstance.py Tests that clearing task instances preserves detail level
airflow-core/tests/unit/listeners/test_listeners.py Updates expected listener error wording
airflow-core/tests/integration/otel/test_otel.py Parameterizes integration test to validate default vs detail span hierarchies
scripts/ci/docker-compose/integration-otel.yml Adds deprecation note comment for env var
dev/* Adds internal review/demo/trace analysis dev artifacts

Comment on lines +156 to +165
def _make_ctx(self):
parent_span = trace.get_current_span()
config_level = get_task_span_detail_level(span=parent_span)
if config_level > 1:
return tracer.start_as_current_span(*self._args, **self._kwargs)
return trace.INVALID_SPAN

def __enter__(self):
self._ctx = self._make_ctx()
return self._ctx.__enter__()
Comment on lines +148 to +149
class detail_span:
"""Context manager and decorator that creates a child span when detail level > 1."""
Comment on lines 1949 to 1952
startup_details = get_startup_details()
span = _make_task_span(msg=startup_details)
stack.enter_context(span)
span_ctx_mgr = _make_task_span(msg=startup_details)
span = stack.enter_context(span_ctx_mgr)
ti, context, log = startup(msg=startup_details)
finalize(ti, state, context, log, error)
except KeyboardInterrupt:

with detail_span("run") as span:
Comment on lines +1981 to +1988
except KeyboardInterrupt as e:
log.exception("Ctrl-c hit")
span.record_exception(e)
span.set_status(Status(StatusCode.ERROR, description=f"Exception: {type(e).__name__}"))
sys.exit(2)
except Exception:
except Exception as e:
log.exception("Top level error")
span.record_exception(e)
Comment on lines +103 to +110
def get_task_span_detail_level(span: Span):
span_ctx = span.get_span_context()
trace_state = span_ctx.trace_state
try:
return int(trace_state.get(TASK_SPAN_DETAIL_LEVEL_KEY, default=DEFAULT_TASK_SPAN_DETAIL_LEVEL))
except Exception:
log.warning("%s config in dag run conf must be integer.", TASK_SPAN_DETAIL_LEVEL_KEY)
return DEFAULT_TASK_SPAN_DETAIL_LEVEL
Comment on lines +3602 to +3604
new_ctx = TraceContextTextMapPropagator().extract(dag_run.context_carrier)
span = otel_trace.get_current_span(new_ctx)
assert get_task_span_detail_level(span) == 2
start_time=int((self.queued_at or self.start_date or timezone.utcnow()).timestamp() * 1e9),
attributes=attributes,
context=context.Context(),
context=context.Context(), # maybe need to make optional!!!
Comment on lines +4733 to +4740
def _make_provider_with_detail_level(self, level: int):
"""Return (provider, tracer, carrier) where the carrier encodes the given detail level."""
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
t = provider.get_tracer("test")
carrier = new_dagrun_trace_carrier(task_span_detail_level=level)
return provider, t, exporter, carrier
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants