ci: add unit tests for all missing instrumentation and util packages#233
Open
adityamehra wants to merge 17 commits intomainfrom
Open
ci: add unit tests for all missing instrumentation and util packages#233adityamehra wants to merge 17 commits intomainfrom
adityamehra wants to merge 17 commits intomainfrom
Conversation
Cover the 11 packages that were missing from CI: - Instrumentation: openai-v2, openai-agents-v2, llamaindex, fastmcp, crewai, aidefense, weaviate - Util: emitters-test, traceloop-translator, openlit-translator, langsmith-translator Made-with: Cursor
The traceloop, openlit, and langsmith translator packages use .pth files that auto-execute on install and fail with ModuleNotFoundError for processor modules. Remove them from CI for now. Made-with: Cursor
pytest-recording (from langchain's test deps) is incompatible with pytest-vcr. pytest-recording provides the same @pytest.mark.vcr() decorator, so pytest-vcr is not needed separately. Made-with: Cursor
pytest-recording's built-in vcr fixture is function-scoped, so our module-scoped fixture_vcr that depends on it causes a ScopeMismatch error. Downgrade to function scope to match. Made-with: Cursor
…recording Without vcr_cassette_dir, pytest-recording looks for cassettes in a module-name subdirectory that doesn't exist. Adding the serializer key and session-scoped cassette dir fixture aligns with the langchain test pattern and ensures cassettes are loaded correctly. Made-with: Cursor
c7c865f to
a34fee2
Compare
In CI, deepeval is installed via dev-genai-requirements.txt and auto-discovers through entry points. This causes it to make real OpenAI API calls during tests, consuming VCR cassette responses and emitting unexpected log records. Set evaluators to "none" and reset the handler singleton per test, matching the langchain pattern. Made-with: Cursor
Add if: always() to each test step so a failure in one package doesn't skip the remaining test suites. The job still fails if any step fails, but now all results are visible in a single run. Made-with: Cursor
2c147ff to
9bc80d7
Compare
…ssertions Workflow inputs are now serialized as JSON (preserving key names) via a dedicated _make_workflow_input_message function. Agent inputs keep the existing field-level extraction for description/expected_output/inquiry. Also fixes empty-output test assertions that expected '""' instead of "", and adds a test verifying agent captures both description and expected_output as input_messages. Made-with: Cursor
…dency Tests import llama_index.embeddings.openai.OpenAIEmbedding but the package was not listed in [project.optional-dependencies.test]. Made-with: Cursor
Prevents deepeval from making real OpenAI API calls during CI, which causes tests to hang or fail with connection errors. Made-with: Cursor
31923c3 to
fabe498
Compare
Set OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS=none as a workflow-level env var to prevent deepeval from making real OpenAI API calls during non-eval test suites in CI (where no API key is available). This was causing extra log emissions, VCR cassette mismatches, and test failures in openai-v2 and crewai tests. The evals-deepeval test step overrides with an empty value to allow deepeval to function normally for its own tests. Made-with: Cursor
…abled SplunkConversationEventsEmitter was missing the capture_content guard that the built-in ContentEventsEmitter has. When the splunk emitter replaces the built-in (via mode=replace-category), it emitted redacted log records even with CAPTURE_MESSAGE_CONTENT=False. This caused openai-v2 tests asserting len(logs)==0 to fail in CI (where the splunk emitters package is installed) but pass locally (where it is not). Made-with: Cursor
PR #228 removed run_id which was previously used as agent_id. After that change, agent_id defaults to None on AgentInvocation objects. The span emitter computes a fallback from span_id and sets it on the span, but never writes it back to the object. This left agent_id as None on the context stack, so downstream LLM invocations inherited None instead of a valid identifier. Derive agent_id = f"{span_id:016x}" in start_agent after on_start when agent_id is still None, matching the span emitter's fallback logic. Made-with: Cursor
- openai-agents-v2/test_tracer.py: all 14 tests broken after PR #228 (run_id removal); needs rework of span processor stubs - llamaindex embedding/rag tests: require live OpenAI API key, need VCR cassettes added - llamaindex workflow agent tests: "Event stream already consumed" errors in workflow event instrumentation Made-with: Cursor
Replace git+https monorepo references with pinned PyPI versions for faster, more reliable CI installs. All versions match the v1.38.0 / v0.59b0 release. Made-with: Cursor
…ed test modules Made-with: Cursor
On Windows, connecting to localhost:4242 with timeout=0.1s raises APITimeoutError (TCP SYN retries) instead of APIConnectionError (immediate ECONNREFUSED on Unix). Both indicate unreachable endpoint. Made-with: Cursor
5238700 to
1ad9ea9
Compare
pradystar
reviewed
Mar 20, 2026
| self._emitter.on_start(agent) | ||
| # Derive agent_id from span_id when not explicitly set by the framework | ||
| if agent.agent_id is None and agent.span_id is not None: | ||
| agent.agent_id = f"{agent.span_id:016x}" |
Contributor
There was a problem hiding this comment.
are we using hex value elsewhere too?
Contributor
There was a problem hiding this comment.
We removed this because the instrumentation should be assigning an agent id. see #228 (comment)
Is it the job of the genai utils to generate an id for the agent?
Contributor
There was a problem hiding this comment.
@keith-decker thanks! yeah this is specifically when framework does not set it (maybe crewai?). Maybe we want to add a note if we are leaving this on the framework.
@adityamehra do you agree? however, if we do want to set agent id here it should be string not hex value.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add CI test coverage for the 11 packages that were missing from the test workflow, and fix several bugs discovered during CI integration.
Packages added to CI
Bug fixes discovered during CI integration
1.
openai-v2tests failing in CI — Splunk emitter missingcapture_contentguard (d021a72)RCA: The
SplunkConversationEventsEmitter(fromopentelemetry-util-genai-emitters-splunk) registers withmode="replace-category"for"content_events", replacing the built-inContentEventsEmitter. The built-in emitter'son_end()method has a guardif not self._capture_content: returnthat skips log emission when content capture is disabled. The Splunk replacement was missing this guard.In CI, where the Splunk emitters package is installed, the emitter would emit (redacted) log records even when
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=False, causingassert len(logs) == 0assertions to fail. Locally (where the Splunk package is not installed), the built-in emitter with the guard was used, so tests passed.Fix: Added the
if not self._capture_content: returnguard toSplunkConversationEventsEmitter.on_end(), matching the built-in emitter's behavior.2.
crewaitest failing in CI —agent_idisNoneafter PR #228 regression (55e0ff5)RCA: PR #228 ("Remove run_id from GenAI Utils") removed the
run_idfield which was previously used to auto-generateagent_id. After removal,agent_iddefaults toNoneon allAgentInvocationobjects. The span emitter's_start_agent()computes a fallbackagent_idfromspan_id(f"{agent.span_id:016x}") and sets it as a span attribute, but never writes it back to theagent.agent_idfield on the object. Whenhandler.start_agent()pushes(agent.name, agent.agent_id)onto the context stack, it pushesNone. Downstream LLM invocations inherit thisNone, causingGEN_AI_AGENT_IDassertions to fail.This was also flagged by @pradystar during the PR #228 review: "wouldn't the agent_id be None for outermost agent since no parent sets it?"
Fix: Added 2-line derivation in
handler.start_agent()— afteron_start()setsspan_id, deriveagent_id = f"{span_id:016x}"when not explicitly set by the framework.Other CI fixes
deepevalevaluators globally in CI via env var, with override for the evals-specific test step (394cb50)llama-index-embeddings-openaitest dependency for llamaindex (3612b47)5410ea7)a34fee2,85e1cb2,4785828).pthauto-import conflicts (02947ec)Known remaining failures
test_workflow_agent_variants("Event stream already consumed" errors) + embedding/RAG tests need VCR cassettesMade-with: Cursor