Hi Traceloop team,
While testing AI PatchLab (an open-source local-first security scanner) on a few mid-popularity Python AI projects, I scanned openllmetry at approximately 72fc45e and wanted to flag one best-practice improvement plus one minor SDK code-clarity note. Filing as a single courtesy issue.
Full curated write-up of the scan (with FP analysis, methodology, and the findings AI PatchLab got wrong): https://elfrost.github.io/ai-patchlab/scans/traceloop-openllmetry.html
1. Anonymize secrets in VCR cassettes before recording
Of 26 high-severity findings on the scan, 25 are Gitleaks matches in packages/**/tests/cassettes/**.yaml:
- 11× aws-access-token matches in opentelemetry-instrumentation-anthropic/tests/cassettes/test_bedrock_*/
- 8× jwt matches in opentelemetry-instrumentation-watsonx/tests/
- 6× generic-api-key matches (including PostHog phc_… public keys in haystack cassettes)
None of these are credential leaks today: the AWS findings are access key IDs without their corresponding secret keys (the SigV4 signature in the cassette is only valid for that one already-replayed request), the JWTs carry obviously placeholder claims (sub: noone@ibm.com, account.bss: abc123), and the PostHog phc_ keys are public write-only event-ingestion identifiers by design.
But this is still worth addressing because:
- Cassettes leak metadata: which AWS account, which Bedrock model, which day, which API surface. For an observability SDK that ships to enterprises, that's worth scrubbing.
- One bad re-record away from a real secret: if VCR isn't configured to anonymize, the next contributor recording a cassette with a real prod key against a different provider will accidentally land it. Unfiltered cassettes are a recurring source of real-world key leaks across Python OSS.
Recommended fix: configure VCR's filter_headers, filter_query_parameters, and before_record_response in the test base (probably in each package's conftest.py or a shared tests/common/):
    import vcr

    vcr_config = vcr.VCR(
        filter_headers=[
            ('authorization', 'REDACTED'),
            ('x-api-key', 'REDACTED'),
        ],
        filter_query_parameters=[
            ('api_key', 'REDACTED'),
        ],
        # Optional: response body scrub for tokens/JWTs returned from auth endpoints
        before_record_response=lambda response: response,  # add custom redaction if needed
    )
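Once the existing cassettes are re-recorded or scrubbed (VCR filters only apply at record time), this change should clear the 25 cassette findings on a re-scan, provided before_record_response actually redacts tokens that appear in response bodies. It also reduces the risk of a future re-record leaking a real key to near zero.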
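The before_record_response value in the snippet above is an identity function, so it would not touch JWTs that show up in recorded response bodies. A minimal sketch of a scrubber follows, assuming the usual VCR.py response dict shape (response["body"]["string"]) and uncompressed bodies. The names scrub_jwts and JWT_PATTERN are hypothetical, and the regex is a heuristic rather than a full JWT parser:

```python
import re

# Heuristic: three base64url segments separated by dots. JWT headers are
# base64url-encoded JSON objects, which in practice start with "eyJ".
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
JWT_PATTERN_BYTES = re.compile(JWT_PATTERN.pattern.encode("ascii"))


def scrub_jwts(response):
    """Replace JWT-looking substrings in a recorded response body."""
    body = (response.get("body") or {}).get("string")
    if isinstance(body, bytes):
        response["body"]["string"] = JWT_PATTERN_BYTES.sub(b"REDACTED", body)
    elif isinstance(body, str):
        response["body"]["string"] = JWT_PATTERN.sub("REDACTED", body)
    # Responses without a body are returned unchanged.
    return response
```

It would be passed as before_record_response=scrub_jwts in place of the lambda. If the recorded bodies are gzip-compressed, the pattern will not match the raw bytes, so they would need decoding first (VCR.py's decode_compressed_response option may help there, though I have not checked it against these cassettes).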
2. packages/traceloop-sdk/traceloop/sdk/prompts/client.py:44 — a comment on the jinja2.Environment() use
    obj._jinja_env = Environment()
A Semgrep rule (direct-use-of-jinja2) flags this because Environment() defaults to autoescape=False, which would be a real concern when rendering to HTML. Here the Environment is used to render LLM prompts, where autoescape=True would actively damage the output (escaping <, >, & etc. that may be intentional in the prompt).
So the current code is correct — just suggesting a one-line comment so future contributors and security scanners don't keep flagging this:
    # autoescape disabled: rendered output goes to an LLM as a prompt, not to HTML
    obj._jinja_env = Environment()
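To illustrate why autoescape=True would hurt here, a small sketch comparing the two settings. The template text and the variable name expr are made up for the example, not taken from the SDK:

```python
from jinja2 import Environment

# Hypothetical prompt template and input; not taken from traceloop-sdk.
TEMPLATE = "Explain why {{ expr }} holds."
VALUE = "a < b && b < c"

# Default Environment: autoescape is off, the value is inserted verbatim.
plain = Environment().from_string(TEMPLATE).render(expr=VALUE)

# autoescape=True: HTML-special characters are replaced with entities,
# which changes the text the model would see.
escaped = Environment(autoescape=True).from_string(TEMPLATE).render(expr=VALUE)

print(plain)    # Explain why a < b && b < c holds.
print(escaped)  # Explain why a &lt; b &amp;&amp; b &lt; c holds.
```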
Both items are low-priority. Happy to open separate PRs if useful. Thanks for openllmetry — the rest of the scan turned up only false positives or by-design patterns (token-count logger calls, plugin-discovery dynamic imports, sample-app calculator with whitelisted eval), which is a good sign about the codebase overall.