Add structured logs to zenml-server using structlog#4676

Open
amitvikramraj wants to merge 14 commits into develop from feat/otel

amitvikramraj commented Apr 2, 2026

Changes I made

  • Replaced the custom ConsoleFormatter with structlog as a ProcessorFormatter over Python's standard logging.
  • Server-side modules (zen_server/, zen_stores/) now emit structured events with keyword arguments instead of f-string messages.
  • Request context (request_id, method, path, etc.) is automatically propagated via structlog contextvars — no manual interpolation needed.
  • Added OpenTelemetry exporter for logs, traces and metrics.
  • Also includes a few small improvements:
    • Error responses now include the X-Request-ID header, so the logs for a failed request can be filtered by its ID
    • Added rate limit warning messages in logs
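
The X-Request-ID behaviour listed above can be illustrated with a small pure-ASGI middleware. This is a hedged sketch using only the standard library, not the code in this PR: `RequestIDMiddleware`, `request_id_var`, `failing_app` and `call` are hypothetical names, and reusing an incoming `X-Request-ID` header (rather than always generating one) is an assumption.

```python
import asyncio
import contextvars
import uuid

# Hypothetical context variable holding the current request ID.
request_id_var = contextvars.ContextVar("request_id", default=None)


class RequestIDMiddleware:
    """Attach an X-Request-ID header to every HTTP response, errors included."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Assumption: reuse the caller's ID if present, else generate a short one.
        incoming = dict(scope.get("headers", [])).get(b"x-request-id")
        request_id = incoming.decode("latin-1") if incoming else uuid.uuid4().hex[:8]
        token = request_id_var.set(request_id)

        async def send_with_id(message):
            if message["type"] == "http.response.start":
                headers = [
                    (k, v)
                    for k, v in message.get("headers", [])
                    if k.lower() != b"x-request-id"
                ]
                headers.append((b"x-request-id", request_id.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_id)
        finally:
            request_id_var.reset(token)


async def failing_app(scope, receive, send):
    """Toy ASGI app that always answers with a 500 error."""
    await send({
        "type": "http.response.start",
        "status": 500,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": b'{"detail": "boom"}'})


async def call(app, headers=()):
    """Drive one fake HTTP request through `app` and collect the sent messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "method": "GET",
        "path": "/api/v1/pipelines",
        "headers": list(headers),
    }
    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    messages = asyncio.run(
        call(RequestIDMiddleware(failing_app), [(b"x-request-id", b"d5638eeb")])
    )
    print(messages[0]["headers"])
```

Because the header is added in the `send` wrapper, it is present on error responses as well as successful ones, which is what makes the ID usable as a log filter.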

Why

  • Server logs were unstructured f-strings with manually threaded request IDs across 11+ files.
  • This made them impossible to query with tools like Loki/Grafana and painful to maintain.
  • structlog provides parseable JSON output in the server and colored console output locally, with automatic context propagation.

Before / After

Before:

[d5638eeb] API STATS - GET /api/v1/pipelines AUTHORIZING  [ active_requests: 1 memory_used_mb: 343.77 thread_count: 6 open_fds: 28 fd_limit: 65535 ]
[d5638eeb] SQL STATS - GET /api/v1/pipelines 'SqlZenStore.list_runs' STARTED  [ active_connections: 0 idle_connections: 3 overflow_connections: -17 ]
[d5638eeb] SQL STATS - GET /api/v1/pipelines 'SqlZenStore.list_runs' COMPLETED in 12.34ms  [ active_connections: 0 idle_connections: 3 overflow_connections: -17 ]
[d5638eeb] API STATS - 200 GET /api/v1/pipelines took 38.42ms  [ active_requests: 1 memory_used_mb: 343.77 thread_count: 6 ]

After — console:

2026-04-02T09:21:52Z [debug    ] request.received               [zenml.zen_server.middleware] client_ip=172.217.26.123 method=GET path=/api/v1/pipelines request_id=d5638eeb
2026-04-02T09:21:52Z [debug    ] sql.session.started            [zenml.zen_stores.sql_zen_store] active_connections=0 caller=SqlZenStore.list_runs idle_connections=3 request_id=d5638eeb
2026-04-02T09:21:52Z [debug    ] sql.session.completed          [zenml.zen_stores.sql_zen_store] caller=SqlZenStore.list_runs duration_ms=12.34 error=false request_id=d5638eeb
2026-04-02T09:21:52Z [debug    ] request.completed              [zenml.zen_server.middleware] duration_ms=38.42ms method=GET path=/api/v1/pipelines request_id=d5638eeb status_code=200

After — JSON (server default, queryable in Loki/Grafana):

{"event":"request.received","level":"debug","logger":"zenml.zen_server.middleware","method":"GET","path":"/api/v1/pipelines","request_id":"d5638eeb","client_ip":"172.217.26.123","timestamp":"2026-04-02T09:21:52Z"}
{"event":"sql.session.started","level":"debug","logger":"zenml.zen_stores.sql_zen_store","caller":"SqlZenStore.list_runs","active_connections":0,"idle_connections":3,"request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"}
{"event":"sql.session.completed","level":"debug","logger":"zenml.zen_stores.sql_zen_store","caller":"SqlZenStore.list_runs","duration_ms":12.34,"error":false,"request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"}
{"event":"request.completed","level":"debug","logger":"zenml.zen_server.middleware","method":"GET","path":"/api/v1/pipelines","status_code":200,"duration_ms":"38.42ms","request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"}
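
To show what "queryable" means for the JSON lines above without a Loki setup, here is a small standard-library sketch that filters events by field. `filter_events` and `SAMPLE` are hypothetical names introduced for illustration; the two sample lines are copied from the output above.

```python
import json


def filter_events(lines, **criteria):
    """Yield parsed log events whose fields equal every given criterion."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if all(event.get(key) == value for key, value in criteria.items()):
            yield event


# Two of the JSON log lines shown above.
SAMPLE = [
    '{"event":"sql.session.started","level":"debug","logger":"zenml.zen_stores.sql_zen_store","caller":"SqlZenStore.list_runs","active_connections":0,"idle_connections":3,"request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"}',
    '{"event":"sql.session.completed","level":"debug","logger":"zenml.zen_stores.sql_zen_store","caller":"SqlZenStore.list_runs","duration_ms":12.34,"error":false,"request_id":"d5638eeb","timestamp":"2026-04-02T09:21:52Z"}',
]

if __name__ == "__main__":
    for hit in filter_events(
        SAMPLE, request_id="d5638eeb", event="sql.session.completed"
    ):
        print(hit["caller"], hit["duration_ms"])
```

The same selection against the old bracketed format would need a hand-written regular expression per message shape, which is the maintenance cost the structured events avoid.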

Steps to reproduce:

  1. Run docker compose up --build — ZenML UI starts on port 3001, Grafana UI runs on port 3002
  2. Open Grafana → Loki → Verify the logs

TODO: Remove the docker-compose setup later.

socket-security bot commented Apr 2, 2026

amitvikramraj linked an issue Apr 2, 2026 that may be closed by this pull request
1 task
&& uv pip install .[server,secrets-aws,secrets-gcp,secrets-azure,secrets-hashicorp,s3fs,gcsfs,adlfs,connectors-aws,connectors-gcp,connectors-azure,azureml,sagemaker,vertex] "alembic==1.15.2" \
&& uv pip uninstall zenml \
&& uv pip freeze > requirements.txt
&& uv pip install .[server,secrets-aws,secrets-gcp,secrets-azure,secrets-hashicorp,s3fs,gcsfs,adlfs,connectors-aws,connectors-gcp,connectors-azure,azureml,sagemaker,vertex,otel] "alembic==1.15.2" \

I am thinking otel might as well be part of the server dependencies.

amitvikramraj marked this pull request as ready for review April 7, 2026 09:37

Development

Successfully merging this pull request may close these issues.

Add structured log