@hhoikoo hhoikoo commented Feb 9, 2026

resolves #8693 (BA-4330)

Overview

Enables OpenTelemetry distributed tracing in the Manager by activating the global TracerProvider and instrumenting aiohttp server/client for W3C Trace Context propagation. The aiohttp instrumentation ordering issue is worked around with manual middleware injection (see TODO below). Also tunes the BatchSpanProcessor queue and batch sizes for production GraphQL workloads.

Problem Statement

  • The Manager had OTel logging configured but apply_otel_tracer() never called trace.set_tracer_provider(), so no TracerProvider was registered globally.
  • instrument_aiohttp_server() and instrument_aiohttp_client() were called inside service_discovery_ctx, which runs during runner.setup(), after the aiohttp Application is already instantiated and frozen. AioHttpServerInstrumentor works by replacing web.Application with an instrumented subclass via setattr, so it only affects instances created after the call, leaving the existing root app without OTel middleware.
  • As a result, incoming traceparent/tracestate headers from upstream services (e.g., the Kubernetes Bridge) were never extracted, and cross-service trace correlation was impossible.
  • The SDK-default BatchSpanProcessor queue size (2048) and export batch size (512) are insufficient for production GraphQL workloads, causing span drops during burst traffic.
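
The ordering problem in the second bullet can be reproduced without aiohttp or OpenTelemetry. The sketch below is only an illustration: `Application`, `InstrumentedApplication`, `web`, and `instrument()` are hypothetical stand-ins, not the real aiohttp or OTel objects. It shows that rebinding a module attribute with setattr leaves already-created instances untouched.

```python
from types import SimpleNamespace


class Application:
    """Stand-in for aiohttp's web.Application (not the real class)."""

    def __init__(self):
        self.middlewares = []


class InstrumentedApplication(Application):
    """Stand-in for the subclass an instrumentor would swap in."""

    def __init__(self):
        super().__init__()
        self.middlewares.insert(0, "otel_middleware")


# Hypothetical module object playing the role of `aiohttp.web`.
web = SimpleNamespace(Application=Application)


def instrument():
    # Rebinding the attribute changes what *future* lookups of
    # web.Application return; existing objects are not modified.
    setattr(web, "Application", InstrumentedApplication)


root_app = web.Application()  # created before instrumentation
instrument()
late_app = web.Application()  # created after instrumentation

print(root_app.middlewares)  # the early app has no OTel middleware
print(late_app.middlewares)  # only the late app received it
```

An app built before `instrument()` runs never gets the middleware, which is why the workaround in this PR has to add it by hand.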

Architecture

sequenceDiagram
    participant Bridge as Bridge Service
    participant AIOHTTP as aiohttp Server<br/>(OTel Middleware)
    participant GQL as GraphQL Layer
    participant Tempo as OTel Backend

    Bridge->>AIOHTTP: HTTP + traceparent header
    Note over AIOHTTP: OTel middleware extracts<br/>trace context
    AIOHTTP->>GQL: Request with trace context
    GQL-->>AIOHTTP: Response
    AIOHTTP-->>Bridge: HTTP Response
    AIOHTTP->>Tempo: Export spans
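
For reviewers unfamiliar with the header shown in the diagram, here is a minimal sketch of what "extracting trace context" from a traceparent value involves. It is a simplified, standard-library-only parser for the version 00 layout of W3C Trace Context; the names `ParentContext` and `parse_traceparent` are made up for this example, and the real extraction in this PR is done by the OTel middleware, not by code like this.

```python
import re
from typing import NamedTuple, Optional

# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex)
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class ParentContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: Optional[str]) -> Optional[ParentContext]:
    """Return the upstream context, or None if the header is absent or invalid."""
    if header is None:
        return None
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    if match["version"] == "ff":  # "ff" is reserved as an invalid version
        return None
    trace_id, parent_id = match["trace_id"], match["parent_id"]
    # All-zero identifiers are invalid per the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    sampled = bool(int(match["flags"], 16) & 0x01)  # lowest bit = sampled
    return ParentContext(trace_id, parent_id, sampled)


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

When the header is missing or malformed the function returns None, which corresponds to the server starting a new trace instead of joining the upstream one.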

BatchSpanProcessor Tuning

Added max-queue-size (default: 65536) and max-export-batch-size (default: 4096) to OTELConfig, following the same Annotated[..., Field(default=...), BackendAIConfigMeta(...)] pattern used by other configs (Redis, Pyroscope, etc.). Values flow through OTELConfig → OpenTelemetrySpec → BatchSpanProcessor.
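
A rough sketch of how the two settings might be validated and handed to the processor follows. It uses a plain dataclass instead of the project's Annotated/Field pattern so that it runs without dependencies; `BatchSpanProcessorSettings` and `processor_kwargs()` are hypothetical names, and only the two default values come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class BatchSpanProcessorSettings:
    """Hypothetical stand-in for the two new OTELConfig fields."""

    max_queue_size: int = 65536        # default from the PR description
    max_export_batch_size: int = 4096  # default from the PR description

    def __post_init__(self) -> None:
        if self.max_queue_size <= 0 or self.max_export_batch_size <= 0:
            raise ValueError("sizes must be positive")
        # A batch larger than the queue can never be filled, so reject it
        # at config-validation time rather than at processor construction.
        if self.max_export_batch_size > self.max_queue_size:
            raise ValueError("max_export_batch_size must not exceed max_queue_size")

    def processor_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments named after BatchSpanProcessor's parameters."""
        return {
            "max_queue_size": self.max_queue_size,
            "max_export_batch_size": self.max_export_batch_size,
        }


settings = BatchSpanProcessorSettings()
kwargs = settings.processor_kwargs()
```

With the OTel SDK installed, the result would be passed along the lines of `BatchSpanProcessor(exporter, **kwargs)`, where `exporter` is whatever span exporter the deployment uses.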

TODO: Instrumentation Ordering Workaround

The manager startup creates web.Application() before OTel config is available (config is loaded from etcd via config_provider_ctx). This PR works around the ordering constraint by:

  1. Calling instrument_aiohttp_server() / instrument_aiohttp_client() after config is loaded but before the app is frozen
  2. Manually inserting the OTel server middleware into root_app.middlewares

This is a workaround, not a proper fix. The proper fix would decouple the manager's setup procedure from the aiohttp Application lifecycle so that OTel instrumentation runs before web.Application() is created (e.g., by moving OTel config to BootstrapConfig or by restructuring startup to separate config loading from aiohttp cleanup contexts).
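
The manual injection step of the workaround can be sketched as below. This is not the code in the PR: the helper `inject_middleware_first` and both middleware functions are hypothetical, a plain list stands in for root_app.middlewares, and placing the OTel middleware first (so it wraps the others) is an assumption.

```python
from typing import Callable, List


def inject_middleware_first(middlewares: List[Callable], middleware: Callable) -> bool:
    """Insert `middleware` at the front unless it is already registered.

    Returns True when the list was modified, so repeated calls are harmless.
    """
    if middleware in middlewares:
        return False
    middlewares.insert(0, middleware)
    return True


# Hypothetical stand-ins for the real middleware callables.
def otel_server_middleware(request, handler): ...
def auth_middleware(request, handler): ...


# Plays the role of root_app.middlewares before the app is frozen.
root_middlewares = [auth_middleware]

changed = inject_middleware_first(root_middlewares, otel_server_middleware)
changed_again = inject_middleware_first(root_middlewares, otel_server_middleware)
```

The idempotence check matters only if startup code can run more than once against the same app; it is included here as a defensive choice, not because the PR is known to need it.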


Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

Copilot AI review requested due to automatic review settings February 9, 2026 05:57
@github-actions github-actions bot added size:XL 500~ LoC area:docs Documentations comp:manager Related to Manager component comp:client Related to Client component comp:common Related to Common component comp:cli Related to CLI component require:db-migration Automatically set when alembic migrations are added or updated labels Feb 9, 2026
@hhoikoo hhoikoo marked this pull request as draft February 9, 2026 05:58
@hhoikoo hhoikoo changed the base branch from main to feat/BA-4299 February 9, 2026 05:59
@hhoikoo hhoikoo removed area:docs Documentations comp:client Related to Client component comp:cli Related to CLI component comp:common Related to Common component labels Feb 9, 2026
@github-actions github-actions bot added size:L 100~500 LoC comp:cli Related to CLI component and removed size:XL 500~ LoC labels Feb 9, 2026
@hhoikoo hhoikoo removed the require:db-migration Automatically set when alembic migrations are added or updated label Feb 9, 2026

Copilot AI left a comment

Pull request overview

Enables OpenTelemetry distributed tracing in the Manager and expands the typed action/audit/service-discovery foundations to improve observability, RBAC classification, and query capabilities across services.

Changes:

  • Activate global OTel tracer provider and instrument aiohttp server/client; add GraphQL resolver spans.
  • Replace string-based entity_type/operation_type with typed enums across actions and RBAC validators.
  • Add service discovery foundations (config schema, events, DB models/migration) and extend several GraphQL filter/order features.
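
As an illustration of the second bullet, the following sketch shows the general shape of replacing string return values with enum members and mapping an action operation to a permission operation. EntityType.AGENT and the class name ActionOperationType appear in the file summary; every enum value, the members CREATE and SEARCH, `PermissionOperation`, `to_permission_operation`, and `SearchAgentsAction` are assumptions made for this example. A `str`-mixin Enum is used in place of StrEnum so the snippet also runs on older Python versions.

```python
from enum import Enum


class EntityType(str, Enum):
    AGENT = "agent"    # value is an assumption
    DOMAIN = "domain"  # value is an assumption


class ActionOperationType(str, Enum):
    CREATE = "create"  # hypothetical member
    SEARCH = "search"  # hypothetical member


class PermissionOperation(str, Enum):
    CREATE = "create"  # hypothetical member
    READ = "read"      # hypothetical member


# Explicit table, so an unmapped operation fails loudly with KeyError.
_TO_PERMISSION = {
    ActionOperationType.CREATE: PermissionOperation.CREATE,
    ActionOperationType.SEARCH: PermissionOperation.READ,
}


def to_permission_operation(op: ActionOperationType) -> PermissionOperation:
    return _TO_PERMISSION[op]


class SearchAgentsAction:
    @classmethod
    def entity_type(cls) -> EntityType:
        return EntityType.AGENT

    @classmethod
    def operation_type(cls) -> ActionOperationType:
        return ActionOperationType.SEARCH


permission = to_permission_operation(SearchAgentsAction.operation_type())
```

Compared with bare strings, a misspelled value such as `EntityType("agnet")` raises ValueError at the call site instead of silently propagating into audit logs or RBAC checks.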

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

File Description
.claude/skills/README.md Documents new /bep-guide skill and usage flow.
CLAUDE.md Updates contributor guidance to reference /bep-guide.
changes/8615.fix.md Changelog entry for role-change group deletion fix.
changes/8628.fix.md Changelog entry for scoped_query project override fix.
changes/8643.fix.md Changelog entry for GraphQL async resolver timing fix.
changes/8647.fix.md Changelog entry for deployment duplicate name returning 409.
changes/8648.feature.md Changelog entry for usage bucket CLI date filtering.
changes/8652.fix.md Changelog entry for collecting error_code in history records.
changes/8671.feature.md Changelog entry for nested filter/order helpers.
changes/8672.doc.md Changelog entry for BEP-1046 docs.
changes/8674.doc.md Changelog entry for BEP-1047 docs.
changes/8675.feature.md Changelog entry for DomainV2 nested filters/orders.
changes/8676.doc.md Changelog entry for BEP conventions.
changes/8677.feature.md Changelog entry for resource slot normalization foundations.
changes/8678.feature.md Changelog entry for ProjectV2 nested filters/orders.
changes/8679.enhance.md Changelog entry for StrEnum action type system changes.
changes/8680.enhance.md Changelog entry for separating child entity types.
changes/8681.feature.md Changelog entry for UserV2 nested filters.
changes/8682.feature.md Changelog entry for ResourceGroup filter/order + Agent status order.
changes/8683.feature.md Changelog entry for fair share GQL node references.
changes/8684.enhance.md Changelog entry for removing AuditLogEntityType etc.
changes/8685.feature.md Changelog entry for service discovery foundations.
docs/manager/rest-reference/openapi.json Adds OpenAPI schema for DateRangeFilter and usage-bucket date filtering.
proposals/BEP-1046/cli-migration.md Adds BEP-1046 CLI migration details.
proposals/BEP-1046/data-model.md Adds BEP-1046 service discovery DB model documentation.
proposals/BEP-1046/event-registration.md Adds BEP-1046 event registration/flow documentation.
proposals/BEP-1047/current-design.md Adds BEP-1047 current design detail.
proposals/BEP-1047/migration-compatibility.md Adds BEP-1047 migration/compat guidance.
proposals/BEP-1047/phase-2-scheduler-integration.md Adds BEP-1047 phase 2 scheduler integration detail.
proposals/BEP-1047/phase-3-usage-bucket-optimization.md Adds BEP-1047 phase 3 usage-bucket optimization detail.
src/ai/backend/common/configs/__init__.py Exposes ServiceEndpointConfig in configs package exports.
src/ai/backend/common/configs/service_discovery.py Adds endpoint/instance metadata config schema for service discovery.
src/ai/backend/common/dto/manager/fair_share/types.py Adds period_start date range filtering to usage bucket DTO filters.
src/ai/backend/common/dto/manager/query.py Introduces DateRangeFilter DTO.
src/ai/backend/common/events/event_types/service_discovery/anycast.py Adds service discovery anycast events and endpoint payload model.
src/ai/backend/common/events/types.py Adds SERVICE_DISCOVERY event domain.
src/ai/backend/common/exception.py Adds DeploymentNameAlreadyExists (HTTP 409) error type.
src/ai/backend/common/metrics/metric.py Types action-observation entity_type as EntityType.
src/ai/backend/common/types.py Adds ServiceCatalogStatus enum.
src/ai/backend/logging/otel.py Sets global tracer provider to activate tracing.
src/ai/backend/logging/utils.py Wires tracer initialization into BraceStyleAdapter.apply_otel().
src/ai/backend/manager/actions/action/base.py Changes action API to return typed EntityType and ActionOperationType.
src/ai/backend/manager/actions/action/batch.py Removes permission_operation_type() in favor of operation_type().
src/ai/backend/manager/actions/action/scope.py Types scope entity as ScopeType and removes permission_operation_type().
src/ai/backend/manager/actions/action/single_entity.py Removes permission_operation_type() in favor of operation_type().
src/ai/backend/manager/actions/types.py Adds ActionOperationType and updates ActionSpec types.
src/ai/backend/manager/actions/validators/rbac/scope.py Uses typed action/scope enums and maps operation to permission operation.
src/ai/backend/manager/actions/validators/rbac/single_entity.py Uses typed action enums and maps operation to permission operation.
src/ai/backend/manager/api/gql/agent/types.py Adds Agent ordering by STATUS.
src/ai/backend/manager/api/gql/domain_v2/types/__init__.py Exposes DomainV2 nested filters.
src/ai/backend/manager/api/gql/fair_share/types/__init__.py Exposes new fair-share nested filters/types.
src/ai/backend/manager/api/gql/fair_share/types/domain.py Replaces child navigation with domain node reference; adds nested filter/order.
src/ai/backend/manager/api/gql/project_v2/types/__init__.py Exposes ProjectV2 nested filters.
src/ai/backend/manager/api/gql/resource_group/types.py Extends ResourceGroup filter/order fields.
src/ai/backend/manager/api/gql/user_v2/fetcher/__init__.py Exposes new fetch_user API.
src/ai/backend/manager/api/gql/user_v2/fetcher/user.py Adds single-user fetch by UUID.
src/ai/backend/manager/api/gql/user_v2/types/__init__.py Exposes new UserV2 nested filters.
src/ai/backend/manager/api/gql_legacy/audit_log.py Switches audit log entity type variants to common EntityType.
src/ai/backend/manager/api/gql_legacy/base.py Prevents scoped_query from overriding project param with group_id.
src/ai/backend/manager/api/gql_legacy/schema.py Adds GraphQL resolver OTel spans and improved async timing/error handling.
src/ai/backend/manager/cli/__main__.py Ensures AgentRow mapper is registered for relationship resolution.
src/ai/backend/manager/models/alembic/versions/acd2e76c1e40_add_service_catalog_tables.py Adds service catalog tables migration.
src/ai/backend/manager/models/audit_log/__init__.py Removes AuditLogEntityType export.
src/ai/backend/manager/models/audit_log/row.py Removes AuditLogEntityType enum definition.
src/ai/backend/manager/models/resource_slot/__init__.py Adds resource slot model exports.
src/ai/backend/manager/models/service_catalog/__init__.py Adds service catalog model exports.
src/ai/backend/manager/models/service_catalog/row.py Adds service catalog ORM models.
src/ai/backend/manager/repositories/agent/options.py Adds ordering by agent status.
src/ai/backend/manager/repositories/deployment/db_source/db_source.py Adds endpoint-name uniqueness check with a 409 error on conflict.
src/ai/backend/manager/repositories/fair_share/db_source/db_source.py Adds joins to support nested filters/orders in fair share searches.
src/ai/backend/manager/repositories/user/db_source/db_source.py Avoids clearing user groups implicitly on role change.
src/ai/backend/manager/reporters/base.py Types action messages with EntityType and ActionOperationType.
src/ai/backend/manager/server.py Instruments aiohttp server/client when OTel enabled.
src/ai/backend/manager/services/agent/actions/base.py Switches entity_type() to EntityType.AGENT.
src/ai/backend/manager/services/agent/actions/get_total_resources.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/get_watcher_status.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/handle_heartbeat.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/load_container_counts.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/mark_agent_exit.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/mark_agent_running.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/recalculate_usage.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/remove_agent_from_images.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/remove_agent_from_images_by_canonicals.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/search_agents.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/sync_agent_registry.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/watcher_agent_restart.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/watcher_agent_start.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/agent/actions/watcher_agent_stop.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/app_config/actions/domain.py Switches action entity/operation typing to enums.
src/ai/backend/manager/services/app_config/actions/get_merged.py Switches action entity/operation typing to enums.
src/ai/backend/manager/services/app_config/actions/user.py Switches action entity/operation typing to enums.
src/ai/backend/manager/services/artifact/actions/base.py Switches entity_type() to EntityType.ARTIFACT.
src/ai/backend/manager/services/artifact/actions/delegate_scan.py Refines entity/operation typing; adds ARTIFACT_SCAN entity type.
src/ai/backend/manager/services/artifact/actions/delete_multi.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/get.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/get_revisions.py Refines entity/operation typing for artifact revisions.
src/ai/backend/manager/services/artifact/actions/list_with_revisions.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/restore_multi.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/retrieve_model.py Refines entity/operation typing for artifact model retrieval.
src/ai/backend/manager/services/artifact/actions/retrieve_model_multi.py Refines entity/operation typing for artifact model retrieval.
src/ai/backend/manager/services/artifact/actions/scan.py Refines entity/operation typing for artifact scan.
src/ai/backend/manager/services/artifact/actions/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/search_with_revisions.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/update.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact/actions/upsert_multi.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/base.py Switches entity_type() to EntityType.ARTIFACT_REGISTRY.
src/ai/backend/manager/services/artifact_registry/actions/common/get_meta.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/common/get_multi.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/common/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/create.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/delete.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/get.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/get_multi.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/list.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/huggingface/update.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/create.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/delete.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/get.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/get_multi.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/list.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_registry/actions/reservoir/update.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/approve.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/associate_with_storage.py Refines entity/operation typing; adds storage-link entity type.
src/ai/backend/manager/services/artifact_revision/actions/base.py Switches entity_type() to EntityType.ARTIFACT_REVISION.
src/ai/backend/manager/services/artifact_revision/actions/cancel_import.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/cleanup.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/delegate_import_revision_batch.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/disassociate_with_storage.py Refines entity/operation typing; adds storage-link entity type.
src/ai/backend/manager/services/artifact_revision/actions/get.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/get_download_progress.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/get_readme.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/get_verification_result.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/import_revision.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/import_revision_batch.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/reject.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/artifact_revision/actions/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/audit_log/actions/base.py Switches entity_type() to EntityType.AUDIT_LOG.
src/ai/backend/manager/services/audit_log/actions/create.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/audit_log/actions/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/authorize.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/base.py Switches entity_type() to EntityType.AUTH.
src/ai/backend/manager/services/auth/actions/generate_ssh_keypair.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/get_role.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/get_ssh_keypair.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/signout.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/signup.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/update_full_name.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/update_password.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/update_password_no_auth.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/auth/actions/upload_ssh_keypair.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/base.py Switches entity_type() to EntityType.CONTAINER_REGISTRY.
src/ai/backend/manager/services/container_registry/actions/clear_images.py Refines entity/operation typing for registry images.
src/ai/backend/manager/services/container_registry/actions/create_container_registry.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/delete_container_registry.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/get_container_registries.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/load_all_container_registries.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/load_container_registries.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/modify_container_registry.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/container_registry/actions/rescan_images.py Refines entity/operation typing for registry images.
src/ai/backend/manager/services/deployment/actions/access_token/base.py Switches entity_type() to EntityType.DEPLOYMENT_ACCESS_TOKEN.
src/ai/backend/manager/services/deployment/actions/access_token/create_access_token.py Switches base type and operation_type() to typed enums.
src/ai/backend/manager/services/deployment/actions/access_token/search_access_tokens.py Switches base type and operation_type() to typed enums.
src/ai/backend/manager/services/deployment/actions/auto_scaling_rule/base.py Switches entity_type() to deployment auto-scaling-rule entity.
src/ai/backend/manager/services/deployment/actions/auto_scaling_rule/create_auto_scaling_rule.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/auto_scaling_rule/delete_auto_scaling_rule.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/auto_scaling_rule/search_auto_scaling_rules.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/auto_scaling_rule/update_auto_scaling_rule.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/base.py Switches entity_type() to EntityType.DEPLOYMENT.
src/ai/backend/manager/services/deployment/actions/create_deployment.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/create_legacy_deployment.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/deployment_policy/base.py Switches entity_type() to EntityType.DEPLOYMENT_POLICY.
src/ai/backend/manager/services/deployment/actions/deployment_policy/get_deployment_policy.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/destroy_deployment.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/get_deployment_by_id.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/get_replica_by_id.py Switches base type and operation_type() to typed enums.
src/ai/backend/manager/services/deployment/actions/model_revision/add_model_revision.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/model_revision/base.py Switches entity_type() to model revision entity type.
src/ai/backend/manager/services/deployment/actions/model_revision/create_model_revision.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/model_revision/get_revision_by_id.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/model_revision/search_revisions.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/replica/base.py Introduces replica base action with replica entity type.
src/ai/backend/manager/services/deployment/actions/revision_operations/activate_revision.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/revision_operations/base.py Switches entity_type() to deployment revision entity type.
src/ai/backend/manager/services/deployment/actions/route/base.py Switches entity_type() to route entity type.
src/ai/backend/manager/services/deployment/actions/route/search_routes.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/route/update_route_traffic_status.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/search_deployments.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/deployment/actions/search_replicas.py Switches base type and operation_type() to typed enums.
src/ai/backend/manager/services/deployment/actions/sync_replicas.py Switches base type and operation_type() to typed enums.
src/ai/backend/manager/services/deployment/actions/update_deployment.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/base.py Switches entity_type() to EntityType.DOMAIN.
src/ai/backend/manager/services/domain/actions/create_domain.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/create_domain_node.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/delete_domain.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/get_domain.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/modify_domain.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/modify_domain_node.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/purge_domain.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/search_domains.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/domain/actions/search_rg_domains.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/error_log/actions/base.py Switches entity_type() to EntityType.ERROR_LOG.
src/ai/backend/manager/services/error_log/actions/create.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/error_log/actions/search.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/base.py Switches entity_type() to EntityType.EXPORT and typed operation.
src/ai/backend/manager/services/export/actions/export_audit_logs_csv.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/export_keypairs_csv.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/export_projects_csv.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/export_sessions_csv.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/export_users_csv.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/get_report.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/export/actions/list_reports.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/base.py Switches entity_type() to EntityType.GROUP.
src/ai/backend/manager/services/group/actions/create_group.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/delete_group.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/modify_group.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/purge_group.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/search_projects.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/usage_per_month.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/group/actions/usage_per_period.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/alias_base.py Introduces base action for image alias entity typing.
src/ai/backend/manager/services/image/actions/alias_image.py Switches to alias base action and typed operation.
src/ai/backend/manager/services/image/actions/base.py Switches entity_type() to EntityType.IMAGE.
src/ai/backend/manager/services/image/actions/clear_image_custom_resource_limit.py Switches to resource-limit base action and typed operation.
src/ai/backend/manager/services/image/actions/dealias_image.py Switches to alias base action and typed operation.
src/ai/backend/manager/services/image/actions/forget_image.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/get_all_images.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/get_image_installed_agents.py Refines entity/operation typing for image-agent queries.
src/ai/backend/manager/services/image/actions/get_images.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/modify_image.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/preload_image.py Refines entity/operation typing for image preload.
src/ai/backend/manager/services/image/actions/purge_images.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/resource_limit_base.py Introduces base action for image resource-limit entity typing.
src/ai/backend/manager/services/image/actions/scan_image.py Refines entity/operation typing for image scan.
src/ai/backend/manager/services/image/actions/search_aliases.py Switches to alias base action and typed operation.
src/ai/backend/manager/services/image/actions/search_images.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/image/actions/set_image_resource_limit.py Switches to resource-limit base action and typed operation.
src/ai/backend/manager/services/image/actions/unload_image.py Refines entity/operation typing for image preload/unload.
src/ai/backend/manager/services/image/actions/untag_image_from_registry.py Refines entity/operation typing for image tag deletion.
src/ai/backend/manager/services/keypair_resource_policy/actions/base.py Switches entity_type() to EntityType.KEYPAIR_RESOURCE_POLICY.
src/ai/backend/manager/services/keypair_resource_policy/actions/create_keypair_resource_policy.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/keypair_resource_policy/actions/delete_keypair_resource_policy.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/keypair_resource_policy/actions/modify_keypair_resource_policy.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/metric/actions/container.py Switches entity/operation typing for container metric actions.
src/ai/backend/manager/services/model_serving/actions/base.py Switches entity_type() to EntityType.MODEL_SERVICE.
src/ai/backend/manager/services/model_serving/actions/clear_error.py Refines entity/operation typing for deployment error clear.
src/ai/backend/manager/services/model_serving/actions/create_auto_scaling_rule.py Refines entity/operation typing for deployment auto-scaling rule creation.
src/ai/backend/manager/services/model_serving/actions/create_model_service.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/delete_auto_scaling_rule.py Refines entity/operation typing for deployment auto-scaling rule deletion.
src/ai/backend/manager/services/model_serving/actions/delete_model_service.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/delete_route.py Refines entity/operation typing for route deletion.
src/ai/backend/manager/services/model_serving/actions/dry_run_model_service.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/force_sync.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/generate_token.py Refines entity/operation typing for deployment token.
src/ai/backend/manager/services/model_serving/actions/get_model_service_info.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/list_errors.py Refines entity/operation typing for deployment errors listing.
src/ai/backend/manager/services/model_serving/actions/list_model_service.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/modify_auto_scaling_rule.py Refines entity/operation typing for auto scaling rule updates.
src/ai/backend/manager/services/model_serving/actions/modify_endpoint.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/model_serving/actions/scale_service_replicas.py Refines entity/operation typing for replica scaling.
src/ai/backend/manager/services/model_serving/actions/search_auto_scaling_rules.py Refines entity/operation typing for auto scaling rule searches.
src/ai/backend/manager/services/model_serving/actions/update_route.py Refines entity/operation typing for route updates.
src/ai/backend/manager/services/notification/actions/base.py Switches entity/operation typing for notification actions.
src/ai/backend/manager/services/notification/actions/create_channel.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/notification/actions/create_rule.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/notification/actions/delete_channel.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/notification/actions/delete_rule.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/notification/actions/get_channel.py Switches operation_type() to typed ActionOperationType.
src/ai/backend/manager/services/notification/actions/get_rule.py Switches operation_type() to typed ActionOperationType.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions github-actions bot added size:M 30~100 LoC and removed size:L 100~500 LoC labels Feb 11, 2026
@hhoikoo hhoikoo force-pushed the feat/BA-4299 branch 4 times, most recently from d49b7a4 to 88b0d9f Compare February 11, 2026 10:32
@hhoikoo hhoikoo changed the base branch from feat/BA-4299 to main February 11, 2026 10:37
@github-actions github-actions bot added size:S 10~30 LoC and removed size:M 30~100 LoC labels Feb 11, 2026
@github-actions github-actions bot added size:M 30~100 LoC comp:common Related to Common component and removed size:S 10~30 LoC labels Feb 11, 2026
@hhoikoo hhoikoo force-pushed the feat/BA-4330 branch 2 times, most recently from 094cfcf to b3c9705 Compare February 11, 2026 13:46
@github-actions github-actions bot added comp:agent Related to Agent component comp:webserver Related to Web Server component comp:storage-proxy Related to Storage proxy component comp:app-proxy Related to App Proxy component labels Feb 11, 2026
hhoikoo and others added 4 commits February 11, 2026 22:50
Activate the previously stubbed OTel tracing pipeline so Backend.AI
participates in distributed traces initiated by external clients.

- Set the global TracerProvider in apply_otel_tracer() so
  trace.get_tracer() returns a functioning tracer
- Call apply_otel_tracer() from BraceStyleAdapter.apply_otel()
- Instrument aiohttp server/client to propagate W3C Trace Context
- Add OTel spans to GQLMetricMiddleware.resolve() with attributes
  for operation name, field name, and parent type
- Register AgentRow ORM mapper in clear-history CLI to fix lazy
  relationship resolution

Co-Authored-By: Claude Opus 4.6 <[email protected]>
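
For readers unfamiliar with the header that the aiohttp instrumentation extracts, the following is a minimal sketch of parsing a W3C `traceparent` value using only the standard library. It is not the code path used by OpenTelemetry or by this PR; `parse_traceparent` and `TraceParent` are hypothetical names introduced for illustration, and only the four-field layout defined for version `00` is handled.

```python
import re
from typing import NamedTuple, Optional

# Four lowercase-hex fields: version-traceid-parentid-flags.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    """Hypothetical container for the parsed header fields."""

    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    # Version "ff" is forbidden and all-zero IDs are invalid.
    if match["version"] == "ff":
        return None
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceParent(match["trace_id"], match["parent_id"], sampled)
```

A server that skips this extraction step starts a fresh trace for every request, which is why spans from the upstream caller and from the Manager could not be correlated before this change.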
…pagation

Move instrument_aiohttp_server/client() from service_discovery_ctx to
server_main() before the app is frozen. The instrumentor patches the
Application class, but since root_app is already instantiated by that
point, we must manually inject the OTel server middleware into the
existing app's middleware list. This ensures incoming W3C traceparent
headers are extracted and cross-service traces are correlated.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
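
The ordering problem described in this commit can be reproduced with a toy model that uses no aiohttp or OpenTelemetry code. Everything below is a hypothetical stand-in: `web`, `Application`, `instrument`, and `otel_middleware` only mimic the behaviour the commit message describes (an instrumentor that replaces the `Application` class on a module), so treat it as a sketch of the mechanism rather than the real implementation.

```python
import types

# Hypothetical stand-in for a web framework module; not aiohttp itself.
web = types.SimpleNamespace()


class Application:
    def __init__(self, middlewares=()):
        self.middlewares = list(middlewares)


web.Application = Application


def otel_middleware(request):
    """Placeholder for a tracing middleware."""
    return request


def instrument(module):
    """Swap module.Application for a subclass that prepends the middleware."""
    original = module.Application

    class InstrumentedApplication(original):
        def __init__(self, middlewares=()):
            super().__init__([otel_middleware, *middlewares])

    setattr(module, "Application", InstrumentedApplication)


root_app = web.Application()  # created before instrumentation
instrument(web)
late_app = web.Application()  # created after instrumentation

# The instance that already existed is untouched by the class swap.
root_before = list(root_app.middlewares)

# Workaround: inject the middleware into the existing instance by hand.
if otel_middleware not in root_app.middlewares:
    root_app.middlewares.insert(0, otel_middleware)
```

In this model `late_app` receives the middleware automatically, while `root_app` only gets it through the explicit insert, which mirrors the manual injection this commit performs on the existing root app.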
Add max_queue_size and max_export_batch_size to OTELConfig and
OpenTelemetrySpec, defaulting to 65536 and 4096 respectively.
The SDK defaults (2048/512) are insufficient for production GraphQL
workloads and cause span drops during burst traffic.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
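
To make the effect of the queue size concrete, here is a rough worst-case model of a bounded span queue, written with the standard library only. It is not the SDK's BatchSpanProcessor: the real processor drains the queue from a background thread, so actual drop counts depend on export throughput. The function names and the burst size of 10000 spans are assumptions chosen for illustration.

```python
import math
from collections import deque


def simulate_burst(burst_size, max_queue_size):
    """Count spans dropped when a burst arrives and no export completes meanwhile."""
    queue = deque()
    dropped = 0
    for span_id in range(burst_size):
        if len(queue) >= max_queue_size:
            dropped += 1  # queue is full, so this span is discarded
        else:
            queue.append(span_id)
    return dropped


def exports_needed(queued, max_export_batch_size):
    """Number of export calls required to drain the queued spans."""
    return math.ceil(queued / max_export_batch_size)


# Hypothetical burst of 10000 spans under the old and new settings.
dropped_old = simulate_burst(10000, 2048)
dropped_new = simulate_burst(10000, 65536)
```

Under these assumptions the 2048-slot queue discards every span past the first 2048 in the burst, while the 65536-slot queue absorbs the whole burst and drains it in `exports_needed(10000, 4096)` export calls.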
@hhoikoo hhoikoo marked this pull request as ready for review February 12, 2026 01:27

Labels

comp:agent (Related to Agent component)
comp:app-proxy (Related to App Proxy component)
comp:cli (Related to CLI component)
comp:common (Related to Common component)
comp:manager (Related to Manager component)
comp:storage-proxy (Related to Storage proxy component)
comp:webserver (Related to Web Server component)
size:M (30~100 LoC)

Development

Successfully merging this pull request may close these issues.

Enable OpenTelemetry Distributed Tracing Infrastructure in Manager