ci: add unit tests for all missing instrumentation and util packages by adityamehra · Pull Request #233 · signalfx/splunk-otel-python-contrib

adityamehra · 2026-03-18T19:31:19Z

Summary

Add CI test coverage for the 11 packages that were missing from the test workflow, and fix several bugs discovered during CI integration.

Packages added to CI

Instrumentation: openai-v2, openai-agents-v2, llamaindex, fastmcp, crewai, aidefense, weaviate
Util: emitters-test, traceloop-translator, openlit-translator, langsmith-translator

Bug fixes discovered during CI integration

1. `openai-v2` tests failing in CI — Splunk emitter missing `capture_content` guard (`d021a72`)

RCA: The SplunkConversationEventsEmitter (from opentelemetry-util-genai-emitters-splunk) registers with mode="replace-category" for "content_events", replacing the built-in ContentEventsEmitter. The built-in emitter's on_end() method has a guard if not self._capture_content: return that skips log emission when content capture is disabled. The Splunk replacement was missing this guard.

In CI, where the Splunk emitters package is installed, the emitter would emit (redacted) log records even when OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=False, causing assert len(logs) == 0 assertions to fail. Locally (where the Splunk package is not installed), the built-in emitter with the guard was used, so tests passed.

Fix: Added the if not self._capture_content: return guard to SplunkConversationEventsEmitter.on_end(), matching the built-in emitter's behavior.

2. `crewai` test failing in CI — `agent_id` is `None` after PR #228 regression (`55e0ff5`)

RCA: PR #228 ("Remove run_id from GenAI Utils") removed the run_id field which was previously used to auto-generate agent_id. After removal, agent_id defaults to None on all AgentInvocation objects. The span emitter's _start_agent() computes a fallback agent_id from span_id (f"{agent.span_id:016x}") and sets it as a span attribute, but never writes it back to the agent.agent_id field on the object. When handler.start_agent() pushes (agent.name, agent.agent_id) onto the context stack, it pushes None. Downstream LLM invocations inherit this None, causing GEN_AI_AGENT_ID assertions to fail.

This was also flagged by @pradystar during the PR #228 review: "wouldn't the agent_id be None for outermost agent since no parent sets it?"

Fix: Added 2-line derivation in handler.start_agent() — after on_start() sets span_id, derive agent_id = f"{span_id:016x}" when not explicitly set by the framework.

Other CI fixes

Disabled deepeval evaluators globally in CI via env var, with override for the evals-specific test step (394cb50)
Added llama-index-embeddings-openai test dependency for llamaindex (3612b47)
Fixed crewai workflow input serialization and empty output assertions (5410ea7)
Fixed VCR/pytest-recording configuration for openai-v2 tests (a34fee2, 85e1cb2, 4785828)
Removed translator packages that fail due to .pth auto-import conflicts (02947ec)

Known remaining failures

openai-agents-v2: 14 test failures (pre-existing, not introduced by this PR)
llamaindex: 7 failures in test_workflow_agent_variants ("Event stream already consumed" errors) + embedding/RAG tests need VCR cassettes

Made-with: Cursor

Cover the 11 packages that were missing from CI: - Instrumentation: openai-v2, openai-agents-v2, llamaindex, fastmcp, crewai, aidefense, weaviate - Util: emitters-test, traceloop-translator, openlit-translator, langsmith-translator Made-with: Cursor

The traceloop, openlit, and langsmith translator packages use .pth files that auto-execute on install and fail with ModuleNotFoundError for processor modules. Remove them from CI for now. Made-with: Cursor

pytest-recording (from langchain's test deps) is incompatible with pytest-vcr. pytest-recording provides the same @pytest.mark.vcr() decorator, so pytest-vcr is not needed separately. Made-with: Cursor

pytest-recording's built-in vcr fixture is function-scoped, so our module-scoped fixture_vcr that depends on it causes a ScopeMismatch error. Downgrade to function scope to match. Made-with: Cursor

…recording Without vcr_cassette_dir, pytest-recording looks for cassettes in a module-name subdirectory that doesn't exist. Adding the serializer key and session-scoped cassette dir fixture aligns with the langchain test pattern and ensures cassettes are loaded correctly. Made-with: Cursor

In CI, deepeval is installed via dev-genai-requirements.txt and auto-discovers through entry points. This causes it to make real OpenAI API calls during tests, consuming VCR cassette responses and emitting unexpected log records. Set evaluators to "none" and reset the handler singleton per test, matching the langchain pattern. Made-with: Cursor

Add if: always() to each test step so a failure in one package doesn't skip the remaining test suites. The job still fails if any step fails, but now all results are visible in a single run. Made-with: Cursor

…ssertions Workflow inputs are now serialized as JSON (preserving key names) via a dedicated _make_workflow_input_message function. Agent inputs keep the existing field-level extraction for description/expected_output/inquiry. Also fixes empty-output test assertions that expected '""' instead of "", and adds a test verifying agent captures both description and expected_output as input_messages. Made-with: Cursor

…dency Tests import llama_index.embeddings.openai.OpenAIEmbedding but the package was not listed in [project.optional-dependencies.test]. Made-with: Cursor

Prevents deepeval from making real OpenAI API calls during CI, which causes tests to hang or fail with connection errors. Made-with: Cursor

Set OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS=none as a workflow-level env var to prevent deepeval from making real OpenAI API calls during non-eval test suites in CI (where no API key is available). This was causing extra log emissions, VCR cassette mismatches, and test failures in openai-v2 and crewai tests. The evals-deepeval test step overrides with an empty value to allow deepeval to function normally for its own tests. Made-with: Cursor

…abled SplunkConversationEventsEmitter was missing the capture_content guard that the built-in ContentEventsEmitter has. When the splunk emitter replaces the built-in (via mode=replace-category), it emitted redacted log records even with CAPTURE_MESSAGE_CONTENT=False. This caused openai-v2 tests asserting len(logs)==0 to fail in CI (where the splunk emitters package is installed) but pass locally (where it is not). Made-with: Cursor

PR #228 removed run_id which was previously used as agent_id. After that change, agent_id defaults to None on AgentInvocation objects. The span emitter computes a fallback from span_id and sets it on the span, but never writes it back to the object. This left agent_id as None on the context stack, so downstream LLM invocations inherited None instead of a valid identifier. Derive agent_id = f"{span_id:016x}" in start_agent after on_start when agent_id is still None, matching the span emitter's fallback logic. Made-with: Cursor

- openai-agents-v2/test_tracer.py: all 14 tests broken after PR #228 (run_id removal); needs rework of span processor stubs - llamaindex embedding/rag tests: require live OpenAI API key, need VCR cassettes added - llamaindex workflow agent tests: "Event stream already consumed" errors in workflow event instrumentation Made-with: Cursor

Replace git+https monorepo references with pinned PyPI versions for faster, more reliable CI installs. All versions match the v1.38.0 / v0.59b0 release. Made-with: Cursor

…ed test modules Made-with: Cursor

On Windows, connecting to localhost:4242 with timeout=0.1s raises APITimeoutError (TCP SYN retries) instead of APIConnectionError (immediate ECONNREFUSED on Unix). Both indicate unreachable endpoint. Made-with: Cursor

pradystar · 2026-03-20T21:43:22Z

util/opentelemetry-util-genai/src/opentelemetry/util/genai/handler.py

        self._emitter.on_start(agent)
+        # Derive agent_id from span_id when not explicitly set by the framework
+        if agent.agent_id is None and agent.span_id is not None:
+            agent.agent_id = f"{agent.span_id:016x}"


are we using hex value elsewhere too?

We removed this because the instrumentation should be assigning an agent id. see #228 (comment)

Is it the job of the genai utils to generate an id for the agent?

@keith-decker thanks! yeah this is specifically when framework does not set it (maybe crewai?). Maybe we want to add a note if we are leaving this on the framework.
@adityamehra do you agree? however, if we do want to set agent id here it should be string not hex value.

adityamehra requested review from a team as code owners March 18, 2026 19:31

adityamehra added 5 commits March 18, 2026 13:42

ci: remove translator packages that fail due to .pth auto-import

02947ec

The traceloop, openlit, and langsmith translator packages use .pth files that auto-execute on install and fail with ModuleNotFoundError for processor modules. Remove them from CI for now. Made-with: Cursor

ci: drop pytest-vcr to fix conflict with pytest-recording

4785828

pytest-recording (from langchain's test deps) is incompatible with pytest-vcr. pytest-recording provides the same @pytest.mark.vcr() decorator, so pytest-vcr is not needed separately. Made-with: Cursor

fix(openai-v2): fix vcr fixture scope mismatch with pytest-recording

85e1cb2

pytest-recording's built-in vcr fixture is function-scoped, so our module-scoped fixture_vcr that depends on it causes a ScopeMismatch error. Downgrade to function scope to match. Made-with: Cursor

adityamehra force-pushed the ci/add-missing-unit-tests branch from c7c865f to a34fee2 Compare March 18, 2026 20:42

adityamehra added 2 commits March 18, 2026 13:52

ci: run all test steps even when earlier ones fail

9bc80d7

Add if: always() to each test step so a failure in one package doesn't skip the remaining test suites. The job still fails if any step fails, but now all results are visible in a single run. Made-with: Cursor

adityamehra force-pushed the ci/add-missing-unit-tests branch from 2c147ff to 9bc80d7 Compare March 19, 2026 15:51

adityamehra added 3 commits March 19, 2026 11:46

fix(llamaindex): add missing llama-index-embeddings-openai test depen…

3612b47

…dency Tests import llama_index.embeddings.openai.OpenAIEmbedding but the package was not listed in [project.optional-dependencies.test]. Made-with: Cursor

fix(llamaindex): add conftest to disable deepeval evaluators in tests

fabe498

Prevents deepeval from making real OpenAI API calls during CI, which causes tests to hang or fail with connection errors. Made-with: Cursor

adityamehra force-pushed the ci/add-missing-unit-tests branch from 31923c3 to fabe498 Compare March 19, 2026 19:37

adityamehra added 7 commits March 19, 2026 13:07

ci: switch dev-genai-requirements to PyPI packages instead of git

a70490f

Replace git+https monorepo references with pinned PyPI versions for faster, more reliable CI installs. All versions match the v1.38.0 / v0.59b0 release. Made-with: Cursor

fix(lint): add E402 per-file-ignores and fix import sorting for skipp…

d5536eb

…ed test modules Made-with: Cursor

adityamehra force-pushed the ci/add-missing-unit-tests branch from 5238700 to 1ad9ea9 Compare March 20, 2026 18:02

pradystar reviewed Mar 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add unit tests for all missing instrumentation and util packages#233

ci: add unit tests for all missing instrumentation and util packages#233
adityamehra wants to merge 17 commits intomainfrom
ci/add-missing-unit-tests

adityamehra commented Mar 18, 2026 •

edited

Loading

Uh oh!

pradystar Mar 20, 2026

Uh oh!

keith-decker Mar 20, 2026

Uh oh!

pradystar Mar 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

adityamehra commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Packages added to CI

Bug fixes discovered during CI integration

1. openai-v2 tests failing in CI — Splunk emitter missing capture_content guard (d021a72)

2. crewai test failing in CI — agent_id is None after PR #228 regression (55e0ff5)

Other CI fixes

Known remaining failures

Uh oh!

pradystar Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

keith-decker Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

pradystar Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

adityamehra commented Mar 18, 2026 •

edited

Loading

1. `openai-v2` tests failing in CI — Splunk emitter missing `capture_content` guard (`d021a72`)

2. `crewai` test failing in CI — `agent_id` is `None` after PR #228 regression (`55e0ff5`)

pradystar Mar 20, 2026 •

edited

Loading