Motivation
The Backend.AI Manager currently has OpenTelemetry logging configured but lacks trace propagation support. Without an active TracerProvider and aiohttp instrumentation, incoming W3C Trace Context headers (traceparent/tracestate) from upstream services (e.g., the Kubernetes Bridge) are ignored, making cross-service trace correlation impossible.
Required Features
- Activate the global TracerProvider by calling trace.set_tracer_provider() in apply_otel_tracer(); the provider was previously created but never registered.
- Wire apply_otel_tracer() into the existing BraceStyleAdapter.apply_otel() call path so it runs during service discovery initialization.
- Instrument aiohttp server and client for automatic W3C Trace Context propagation on incoming and outgoing HTTP requests.
- Handle the aiohttp Application lifecycle ordering constraint: since web.Application() is instantiated before OTel config is available, manually inject the OTel server middleware into the existing app instance before it is frozen by runner.setup().
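The lifecycle constraint in the last item can be sketched roughly as follows. This is a minimal illustration, not the code in server.py: inject_server_middleware and placeholder_tracing_middleware are hypothetical names, and the placeholder stands in for whatever middleware the OTel aiohttp server instrumentation provides. It relies only on the fact that app.middlewares is a frozen-capable list that rejects modification once the application has been frozen.

```python
from aiohttp import web


@web.middleware
async def placeholder_tracing_middleware(request, handler):
    # Hypothetical stand-in for the OTel aiohttp server middleware.
    # A real tracing middleware would extract traceparent/tracestate here.
    return await handler(request)


def inject_server_middleware(app: web.Application, middleware) -> bool:
    """Prepend `middleware` to an already-created app.

    Returns False when the middleware list is already frozen, which is
    the state the app is in once the runner has set it up.
    """
    if app.middlewares.frozen:
        return False
    if middleware not in app.middlewares:
        # Index 0 makes the tracing middleware the outermost one.
        app.middlewares.insert(0, middleware)
    return True


if __name__ == "__main__":
    root_app = web.Application()  # created before OTel config is known
    print(inject_server_middleware(root_app, placeholder_tracing_middleware))
    root_app.freeze()  # afterwards the middleware list is read-only
    print(inject_server_middleware(root_app, placeholder_tracing_middleware))
```

Returning a boolean instead of raising keeps the call safe to place in a config-dependent startup path; a caller could log a warning when injection arrives too late.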
Impact
- ai.backend.logging: otel.py and utils.py modified to activate tracing.
- ai.backend.manager: server.py modified to instrument aiohttp and inject middleware.
- No changes to API behavior, request handling, or database interactions.
- Cross-service distributed tracing becomes functional when OTel is enabled.
Testing Scenarios
- Verify that when otel.enabled = true, the global TracerProvider is registered and the OTel server middleware is present in root_app.middlewares.
- Verify that when otel.enabled = false, no instrumentation or middleware injection occurs.
- Verify that incoming requests with traceparent headers are correlated with the correct trace in the configured OTel backend (e.g., Tempo).
- Verify that outgoing aiohttp client requests propagate traceparent headers.
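For the two traceparent scenarios, a test needs some way to compare the trace ID on an outgoing request with the one that arrived. The helper below is a standard-library sketch of that check, assuming the version 00 header layout from the W3C Trace Context specification (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags). The names TraceParent and parse_traceparent are hypothetical and not part of this PR.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>00)-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a version 00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    trace_id, parent_id = match["trace_id"], match["parent_id"]
    # All-zero IDs are defined as invalid by the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceParent(trace_id, parent_id, sampled)


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print(parse_traceparent(incoming))
```

In a propagation test, the trace_id parsed from the header captured on the outgoing client request would be expected to equal the trace_id of the incoming request, while the parent_id would normally differ because a new span is created in between.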
JIRA Issue: BA-4330