
Enable OpenTelemetry Distributed Tracing Infrastructure in Manager #8693

@hhoikoo

Description


Motivation

The Backend.AI Manager currently has OpenTelemetry logging configured but lacks trace propagation support. Without an active TracerProvider and aiohttp instrumentation, incoming W3C Trace Context headers (traceparent/tracestate) from upstream services (e.g., the Kubernetes Bridge) are ignored, making cross-service trace correlation impossible.

Required Features

  • Activate the global TracerProvider by calling trace.set_tracer_provider() in apply_otel_tracer(); the provider was previously created but never registered.
  • Wire apply_otel_tracer() into the existing BraceStyleAdapter.apply_otel() call path so it runs during service discovery initialization.
  • Instrument aiohttp server and client for automatic W3C Trace Context propagation on incoming and outgoing HTTP requests.
  • Handle the aiohttp Application lifecycle ordering constraint: web.Application() is instantiated before the OTel config is available, so server instrumentation enabled afterward does not cover that existing instance. Instead, manually inject the OTel server middleware into the existing app instance before runner.setup() freezes it.
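The ordering constraint in the last item can be modeled without aiohttp or OpenTelemetry installed. The sketch below is a toy, not Manager code: `FreezableList` imitates only the "read-only once frozen" behavior of an aiohttp application's middleware list, and `inject_otel_middleware` is a hypothetical helper. In the real server the list would be `root_app.middlewares` and the middleware would come from the OpenTelemetry aiohttp server instrumentation package.

```python
from typing import Any, Awaitable, Callable

# Shape of an aiohttp-style middleware: async (request, handler) -> response.
Middleware = Callable[[Any, Callable[[Any], Awaitable[Any]]], Awaitable[Any]]


class FreezableList(list):
    """Toy stand-in for an aiohttp Application's middleware list (hypothetical).

    Only insert() and append() are guarded, which is all this sketch uses.
    """

    def __init__(self, *args: Any) -> None:
        super().__init__(*args)
        self.frozen = False

    def freeze(self) -> None:
        # In aiohttp, freezing happens during runner.setup().
        self.frozen = True

    def insert(self, index: int, item: Any) -> None:
        if self.frozen:
            raise RuntimeError("Cannot modify frozen list.")
        super().insert(index, item)

    def append(self, item: Any) -> None:
        if self.frozen:
            raise RuntimeError("Cannot modify frozen list.")
        super().append(item)


def inject_otel_middleware(
    middlewares: FreezableList, otel_enabled: bool, otel_middleware: Middleware
) -> bool:
    """Hypothetical helper: call after OTel config is loaded, before runner.setup().

    Returns True if the tracing middleware is in place afterwards.
    """
    if not otel_enabled:
        return False  # otel.enabled = false: leave the app untouched
    if otel_middleware not in middlewares:
        # aiohttp treats the first middleware in the list as the outermost one,
        # so index 0 lets the tracing span cover every other middleware.
        middlewares.insert(0, otel_middleware)
    return True
```

Inserting at index 0 instead of appending is a design choice: the server span then starts before, and ends after, any other middleware such as authentication, so their latency and errors appear inside the trace. The membership check keeps the helper safe to call twice.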

Impact

  • ai.backend.logging: otel.py and utils.py modified to activate tracing.
  • ai.backend.manager: server.py modified to instrument aiohttp and inject middleware.
  • No changes to API behavior, request handling, or database interactions.
  • Cross-service distributed tracing becomes functional when OTel is enabled.

Testing Scenarios

  • Verify that when otel.enabled = true, the global TracerProvider is registered and the OTel server middleware is present in root_app.middlewares.
  • Verify that when otel.enabled = false, no instrumentation or middleware injection occurs.
  • Verify that incoming requests with traceparent headers are correlated with the correct trace in the configured OTel backend (e.g., Tempo).
  • Verify that outgoing aiohttp client requests propagate traceparent headers.
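The two propagation scenarios reduce to a header-level invariant: an outgoing traceparent keeps the incoming trace-id and carries a new parent-id. The stdlib-only sketch below spells that out for version 00 of the W3C traceparent format. It ignores tracestate and future header versions, and the names `TraceParent`, `parse_traceparent`, and `child_traceparent` are hypothetical. With instrumentation enabled, the OTel propagator produces these headers itself; a helper like this is only meant for asserting on captured headers in tests.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context "traceparent", version 00 only:
#   "00-" + 32-hex trace-id + "-" + 16-hex parent-id + "-" + 2-hex trace-flags
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    trace_flags: str

    def to_header(self) -> str:
        return f"00-{self.trace_id}-{self.parent_id}-{self.trace_flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse an incoming header; return None if it is malformed or invalid."""
    match = _TRACEPARENT_V00.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace-id and parent-id values are invalid per the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, flags)


def child_traceparent(incoming: TraceParent) -> TraceParent:
    """Header for an outgoing request made while serving `incoming`.

    Keeps the trace-id (so both requests land in one trace) and the flags,
    and uses a fresh random 8-byte span id as the new parent-id.
    """
    span_id = secrets.token_hex(8)
    while span_id == "0" * 16:  # all-zero ids are invalid; retry (practically never)
        span_id = secrets.token_hex(8)
    return TraceParent(incoming.trace_id, span_id, incoming.trace_flags)
```

For example, a header such as `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` parses to trace-id `4bf92f3577b34da6a3ce929d0e0e4736`, and any header derived from it with `child_traceparent` starts with `00-4bf92f3577b34da6a3ce929d0e0e4736-`.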

JIRA Issue: BA-4330
