Skip to content

ci: add unit tests for all missing instrumentation and util packages#233

Open
adityamehra wants to merge 17 commits intomainfrom
ci/add-missing-unit-tests
Open

ci: add unit tests for all missing instrumentation and util packages#233
adityamehra wants to merge 17 commits intomainfrom
ci/add-missing-unit-tests

Conversation

@adityamehra
Copy link
Contributor

@adityamehra adityamehra commented Mar 18, 2026

Summary

Add CI test coverage for the 11 packages that were missing from the test workflow, and fix several bugs discovered during CI integration.

Packages added to CI

  • Instrumentation: openai-v2, openai-agents-v2, llamaindex, fastmcp, crewai, aidefense, weaviate
  • Util: emitters-test, traceloop-translator, openlit-translator, langsmith-translator

Bug fixes discovered during CI integration

1. openai-v2 tests failing in CI — Splunk emitter missing capture_content guard (d021a72)

RCA: The SplunkConversationEventsEmitter (from opentelemetry-util-genai-emitters-splunk) registers with mode="replace-category" for "content_events", replacing the built-in ContentEventsEmitter. The built-in emitter's on_end() method has a guard if not self._capture_content: return that skips log emission when content capture is disabled. The Splunk replacement was missing this guard.

In CI, where the Splunk emitters package is installed, the emitter would emit (redacted) log records even when OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=False, causing assert len(logs) == 0 assertions to fail. Locally (where the Splunk package is not installed), the built-in emitter with the guard was used, so tests passed.

Fix: Added the if not self._capture_content: return guard to SplunkConversationEventsEmitter.on_end(), matching the built-in emitter's behavior.

2. crewai test failing in CI — agent_id is None after PR #228 regression (55e0ff5)

RCA: PR #228 ("Remove run_id from GenAI Utils") removed the run_id field which was previously used to auto-generate agent_id. After removal, agent_id defaults to None on all AgentInvocation objects. The span emitter's _start_agent() computes a fallback agent_id from span_id (f"{agent.span_id:016x}") and sets it as a span attribute, but never writes it back to the agent.agent_id field on the object. When handler.start_agent() pushes (agent.name, agent.agent_id) onto the context stack, it pushes None. Downstream LLM invocations inherit this None, causing GEN_AI_AGENT_ID assertions to fail.

This was also flagged by @pradystar during the PR #228 review: "wouldn't the agent_id be None for outermost agent since no parent sets it?"

Fix: Added 2-line derivation in handler.start_agent() — after on_start() sets span_id, derive agent_id = f"{span_id:016x}" when not explicitly set by the framework.

Other CI fixes

  • Disabled deepeval evaluators globally in CI via env var, with override for the evals-specific test step (394cb50)
  • Added llama-index-embeddings-openai test dependency for llamaindex (3612b47)
  • Fixed crewai workflow input serialization and empty output assertions (5410ea7)
  • Fixed VCR/pytest-recording configuration for openai-v2 tests (a34fee2, 85e1cb2, 4785828)
  • Removed translator packages that fail due to .pth auto-import conflicts (02947ec)

Known remaining failures

  • openai-agents-v2: 14 test failures (pre-existing, not introduced by this PR)
  • llamaindex: 7 failures in test_workflow_agent_variants ("Event stream already consumed" errors) + embedding/RAG tests need VCR cassettes

Made-with: Cursor

@adityamehra adityamehra requested review from a team as code owners March 18, 2026 19:31
Cover the 11 packages that were missing from CI:
- Instrumentation: openai-v2, openai-agents-v2, llamaindex, fastmcp,
  crewai, aidefense, weaviate
- Util: emitters-test, traceloop-translator, openlit-translator,
  langsmith-translator

Made-with: Cursor
The traceloop, openlit, and langsmith translator packages use .pth
files that auto-execute on install and fail with ModuleNotFoundError
for processor modules. Remove them from CI for now.

Made-with: Cursor
pytest-recording (from langchain's test deps) is incompatible with
pytest-vcr. pytest-recording provides the same @pytest.mark.vcr()
decorator, so pytest-vcr is not needed separately.

Made-with: Cursor
pytest-recording's built-in vcr fixture is function-scoped, so our
module-scoped fixture_vcr that depends on it causes a ScopeMismatch
error. Downgrade to function scope to match.

Made-with: Cursor
…recording

Without vcr_cassette_dir, pytest-recording looks for cassettes in a
module-name subdirectory that doesn't exist. Adding the serializer key
and session-scoped cassette dir fixture aligns with the langchain test
pattern and ensures cassettes are loaded correctly.

Made-with: Cursor
@adityamehra adityamehra force-pushed the ci/add-missing-unit-tests branch from c7c865f to a34fee2 Compare March 18, 2026 20:42
In CI, deepeval is installed via dev-genai-requirements.txt and
auto-discovers through entry points. This causes it to make real
OpenAI API calls during tests, consuming VCR cassette responses
and emitting unexpected log records. Set evaluators to "none" and
reset the handler singleton per test, matching the langchain pattern.

Made-with: Cursor
Add if: always() to each test step so a failure in one package
doesn't skip the remaining test suites. The job still fails if
any step fails, but now all results are visible in a single run.

Made-with: Cursor
@adityamehra adityamehra force-pushed the ci/add-missing-unit-tests branch from 2c147ff to 9bc80d7 Compare March 19, 2026 15:51
…ssertions

Workflow inputs are now serialized as JSON (preserving key names) via a
dedicated _make_workflow_input_message function. Agent inputs keep the
existing field-level extraction for description/expected_output/inquiry.

Also fixes empty-output test assertions that expected '""' instead of "",
and adds a test verifying agent captures both description and
expected_output as input_messages.

Made-with: Cursor
…dency

Tests import llama_index.embeddings.openai.OpenAIEmbedding but
the package was not listed in [project.optional-dependencies.test].

Made-with: Cursor
Prevents deepeval from making real OpenAI API calls during CI,
which causes tests to hang or fail with connection errors.

Made-with: Cursor
@adityamehra adityamehra force-pushed the ci/add-missing-unit-tests branch from 31923c3 to fabe498 Compare March 19, 2026 19:37
Set OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS=none as a workflow-level
env var to prevent deepeval from making real OpenAI API calls during
non-eval test suites in CI (where no API key is available). This was
causing extra log emissions, VCR cassette mismatches, and test failures
in openai-v2 and crewai tests.

The evals-deepeval test step overrides with an empty value to allow
deepeval to function normally for its own tests.

Made-with: Cursor
…abled

SplunkConversationEventsEmitter was missing the capture_content guard
that the built-in ContentEventsEmitter has. When the splunk emitter
replaces the built-in (via mode=replace-category), it emitted redacted
log records even with CAPTURE_MESSAGE_CONTENT=False. This caused
openai-v2 tests asserting len(logs)==0 to fail in CI (where the splunk
emitters package is installed) but pass locally (where it is not).

Made-with: Cursor
PR #228 removed run_id which was previously used as agent_id. After that
change, agent_id defaults to None on AgentInvocation objects. The span
emitter computes a fallback from span_id and sets it on the span, but
never writes it back to the object. This left agent_id as None on the
context stack, so downstream LLM invocations inherited None instead of
a valid identifier.

Derive agent_id = f"{span_id:016x}" in start_agent after on_start when
agent_id is still None, matching the span emitter's fallback logic.

Made-with: Cursor
- openai-agents-v2/test_tracer.py: all 14 tests broken after PR #228
  (run_id removal); needs rework of span processor stubs
- llamaindex embedding/rag tests: require live OpenAI API key, need
  VCR cassettes added
- llamaindex workflow agent tests: "Event stream already consumed"
  errors in workflow event instrumentation

Made-with: Cursor
Replace git+https monorepo references with pinned PyPI versions for
faster, more reliable CI installs. All versions match the v1.38.0 /
v0.59b0 release.

Made-with: Cursor
On Windows, connecting to localhost:4242 with timeout=0.1s raises
APITimeoutError (TCP SYN retries) instead of APIConnectionError
(immediate ECONNREFUSED on Unix). Both indicate unreachable endpoint.

Made-with: Cursor
@adityamehra adityamehra force-pushed the ci/add-missing-unit-tests branch from 5238700 to 1ad9ea9 Compare March 20, 2026 18:02
self._emitter.on_start(agent)
# Derive agent_id from span_id when not explicitly set by the framework
if agent.agent_id is None and agent.span_id is not None:
agent.agent_id = f"{agent.span_id:016x}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we using hex value elsewhere too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We removed this because the instrumentation should be assigning an agent id. see #228 (comment)

Is it the job of the genai utils to generate an id for the agent?

Copy link
Contributor

@pradystar pradystar Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-decker thanks! yeah this is specifically when framework does not set it (maybe crewai?). Maybe we want to add a note if we are leaving this on the framework.
@adityamehra do you agree? however, if we do want to set agent id here it should be string not hex value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants