[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1326

2026-03-16T22:26:20Z

github-actions[bot]
bot Mar 16, 2026

📊 Current CI/CD Pipeline Status

This repository has a mature, multi-layered CI/CD pipeline with 52 total registered workflows across conventional YAML and compiled agentic (.md) workflows. The pipeline covers build verification, linting, security scanning, integration testing, and smoke testing end-to-end.

Pipeline health (recent PR runs): 18 distinct workflows run on pull requests. Failure rate is low, with the main notable failure being the agentic Build Test Suite (which runs cross-language builds via AI). Most core checks (Build Verification, Lint, TypeScript Type Check, CodeQL, Integration Tests, Test Coverage) are passing consistently.

✅ Existing Quality Gates

The following checks currently run on pull requests:

| Check | Workflow | Scope |
| --- | --- | --- |
| ESLint + TypeScript linting | `lint.yml` | All PRs |
| Markdown linting | `lint.yml` | All PRs |
| TypeScript type check | `test-integration.yml` | All PRs |
| Build verification (Node 20 & 22) | `build.yml` | All PRs |
| API proxy unit tests | `build.yml` | All PRs |
| Unit test coverage + regression check | `test-coverage.yml` | All PRs (non-md) |
| Integration tests (domain/network, protocol/security, container/ops, API proxy) | `test-integration-suite.yml` | All PRs |
| Chroot integration tests (multi-language) | `test-chroot.yml` | All PRs |
| Examples test (Docker builds) | `test-examples.yml` | All PRs (non-md) |
| Setup action test | `test-action.yml` | All PRs (non-md) |
| PR title format (Conventional Commits) | `pr-title.yml` | All PRs |
| CodeQL (JS/TS + Actions) | `codeql.yml` | All PRs |
| Container security scan (Trivy) | `container-scan.yml` | PRs touching `containers/**` |
| Dependency vulnerability audit (npm audit) | `dependency-audit.yml` | All PRs (non-md) |
| Documentation link check | `link-check.yml` | PRs touching `*.md` |
| Documentation preview build | `docs-preview.yml` | PRs touching docs |
| AI security code review | `security-guard.md` (Claude) | All PRs |
| AI build test suite (cross-language) | `build-test.md` (Copilot) | All PRs |
| Smoke tests (Claude, Codex, Copilot) | `smoke-*.md` | All PRs + schedule |

Scheduled-only checks: Secret diggers (hourly), performance benchmarks (weekly), dependency security monitor (daily), doc maintainer (daily).

🔍 Identified Gaps

🔴 High Priority

1. Integration Test Pattern Coverage Has Blind Spots

The `test-integration-suite.yml` workflow uses `--testPathPatterns` regexes to split 33 integration test files across 4 parallel jobs. Several test files appear not to be matched by any pattern and may therefore be silently skipped in CI:

- `api-target-allowlist.test.ts`: tests automatic domain allowlisting for API targets
- `gh-host-injection.test.ts`: security test for `GH_HOST` injection protection
- `ghes-auto-populate.test.ts`: tests the GHES domain auto-population feature
- `skip-pull.test.ts`: tests `--skip-pull` flag behavior
- `workdir-tmpfs-hiding.test.ts`: security test for workdir visibility hiding

This is a significant gap: security-critical tests (gh-host-injection, workdir-tmpfs-hiding) exist but may not run on PRs.

Recommendation: Either add the missing test names to the relevant job patterns, or replace the pattern-based split with explicit test file lists. Consider auditing periodically with a script that cross-checks test files against CI patterns.
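
The periodic audit suggested above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual tooling: the `findUnmatchedTests` helper, the job names, the pattern strings, the `tests/integration/` directory, and `domain-allowlist.test.ts` are all assumptions made for the example. The real patterns would need to be copied from `test-integration-suite.yml`.

```typescript
// Hypothetical audit helper: returns test files that no CI job pattern matches.
function findUnmatchedTests(
  testFiles: string[],
  jobPatterns: Record<string, string>
): string[] {
  // Compile each job's pattern string once.
  const regexes = Object.keys(jobPatterns).map(
    (job) => new RegExp(jobPatterns[job])
  );
  // Keep only the files that none of the job regexes match.
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Placeholder patterns, NOT the repository's real ones.
const jobPatterns: Record<string, string> = {
  "domain-network": "(domain|network)",
  "api-proxy": "api-proxy",
};

// Placeholder file list; a real audit would list the test directory instead.
const testFiles: string[] = [
  "tests/integration/domain-allowlist.test.ts",
  "tests/integration/gh-host-injection.test.ts",
  "tests/integration/skip-pull.test.ts",
];

const unmatched = findUnmatchedTests(testFiles, jobPatterns);
console.log("Not covered by any CI pattern:", unmatched);
```

Failing the CI job whenever the returned list is non-empty would turn a silent skip into a visible error.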

2. Very Low Unit Test Coverage With Permissive Thresholds

Current unit test coverage is 38% statements overall, with critical files having near-zero coverage:

- `cli.ts`: 0% coverage (0/69 statements)
- `docker-manager.ts`: 18% coverage (45/250 statements)

The coverage thresholds in the Jest config are set very low (≥38% statements, ≥30% branches), meaning PRs that further reduce coverage in these critical files can still pass. `cli.ts` and `docker-manager.ts` are the two largest, most complex files.

Recommendation: Incrementally raise coverage thresholds. Add per-file minimum thresholds for `cli.ts` and `docker-manager.ts`. The `test-coverage-improver` agentic workflow runs weekly, but coverage improvements should also be required when landing new features.
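
As a rough sketch of what per-file minimums might look like, Jest's `coverageThreshold` option accepts path keys alongside `global`. The global values below come from this assessment. The per-file values and the `./src/` paths are assumptions for illustration, and the repository's actual Jest config may be laid out differently.

```typescript
// Sketch of a jest.config.ts fragment with per-file coverage floors.
const config = {
  coverageThreshold: {
    // Current global floors reported in this assessment.
    global: { statements: 38, branches: 30 },
    // Assumed floor: pinned at the reported 18% so the file cannot slide further.
    "./src/docker-manager.ts": { statements: 18 },
    // Placeholder first target: cli.ts is at 0% today, so this floor would
    // fail until tests for it land.
    "./src/cli.ts": { statements: 10 },
  },
};

export default config;
```

One design point to check before adopting this: Jest applies path-level thresholds independently and removes the matching files from the `global` calculation, so the global percentage may shift once per-file entries are added.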

3. Agentic Build Test Suite Has Persistent Failures

The Build Test Suite agentic workflow has conclusion=failure in recent PR runs. This is an AI-driven workflow that runs multi-language build tests. A persistent failure here means an entire quality gate is effectively non-functional.

Recommendation: Investigate the root cause of the failure (likely a network or token issue), fix it, and add alerting via ci-doctor workflow for prolonged failures.

🟡 Medium Priority

4. Container Security Scan Not Triggered on Source Code Changes

container-scan.yml (Trivy) only runs when files under containers/** change. A change to src/docker-manager.ts that alters how containers are configured (capabilities, seccomp, network) would not trigger a container rescan.

Recommendation: Consider running container security scans on every PR (with caching to limit cost), or expand the path trigger to include src/** since source changes affect the runtime security posture.

5. Performance Benchmarks Never Run on PRs

performance-monitor.yml runs only on a weekly schedule. Startup time, container spin-up latency, and throughput regressions introduced in a PR would go undetected until the following weekly run — and the weekly run doesn't comment on the offending PR.

Recommendation: Add a lightweight performance check step to the build workflow (e.g., measure startup time on a single iteration) that can detect significant regressions (>50%) and post a PR comment with the delta.
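
The regression check described above might be computed along these lines. This is a sketch under stated assumptions: `compareStartup` and `formatComment` are hypothetical helpers, the baseline is assumed to come from the most recent weekly benchmark run, and posting the comment to the PR is left out.

```typescript
interface StartupComparison {
  baselineMs: number;
  currentMs: number;
  deltaPct: number; // positive means slower than baseline
  regressed: boolean;
}

// Compare a single measured startup time against a stored baseline.
// thresholdPct defaults to the 50% figure from the recommendation.
function compareStartup(
  baselineMs: number,
  currentMs: number,
  thresholdPct: number = 50
): StartupComparison {
  if (baselineMs <= 0) {
    throw new Error("baselineMs must be positive");
  }
  const deltaPct = ((currentMs - baselineMs) / baselineMs) * 100;
  return { baselineMs, currentMs, deltaPct, regressed: deltaPct > thresholdPct };
}

// Render the one-line summary a PR comment could carry.
function formatComment(c: StartupComparison): string {
  const sign = c.deltaPct >= 0 ? "+" : "";
  const verdict = c.regressed ? "regression" : "ok";
  return (
    `Startup time: ${c.currentMs} ms vs baseline ${c.baselineMs} ms ` +
    `(${sign}${c.deltaPct.toFixed(1)}%, ${verdict})`
  );
}

console.log(formatComment(compareStartup(1000, 1600)));
```

A single iteration is noisy on shared CI runners, which is why the threshold is kept wide. Tightening it would likely need several iterations and a median.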

6. Smoke Tests Are Effectively Optional for External Contributors

The `smoke-*.md` agentic smoke tests (Claude, Codex, Copilot) trigger on PRs but are gated by `roles: all` combined with reaction emoji requirements for non-team members. While this is intentional to prevent abuse, it means the most realistic end-to-end validation of the firewall (running a real AI agent through the AWF) does not run automatically for all PRs.

Recommendation: Consider making at least one smoke test required (or running a non-AI smoke test that exercises the same code paths) as a required status check for maintainer PRs.

7. No License Compliance Checking for Dependencies

There is no automated check to verify that newly added npm dependencies use acceptable licenses (MIT, Apache-2.0, ISC, etc.) and don't introduce copyleft licenses (GPL, AGPL) that could create legal complications for a commercial product.

Recommendation: Add license-checker or licensee to the dependency audit workflow to flag incompatible license additions.
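
A license gate of this kind could be approximated with a small filter over a license report. The sketch below assumes the report has the shape of `license-checker --json` output (a map from `package@version` to an object with a `licenses` field). The allowlist and the package names are illustrative, and `findDisallowed` is a hypothetical helper.

```typescript
// Assumed report shape: "name@version" -> { licenses: "MIT" } or an array of ids.
type LicenseReport = Record<string, { licenses: string | string[] }>;

// Illustrative allowlist based on the licenses named in this assessment.
const ALLOWED_LICENSES: string[] = ["MIT", "Apache-2.0", "ISC"];

// Return the packages whose license ids are not all on the allowlist.
// SPDX expressions such as "(MIT OR Apache-2.0)" are treated as unknown here
// and therefore flagged; a real check would need to parse them.
function findDisallowed(
  report: LicenseReport,
  allowed: string[] = ALLOWED_LICENSES
): string[] {
  return Object.keys(report)
    .filter((pkg) => {
      const raw = report[pkg].licenses;
      const ids = Array.isArray(raw) ? raw : [raw];
      return ids.some((id) => allowed.indexOf(id) === -1);
    })
    .sort();
}

// Hypothetical report entries for demonstration only.
const sampleReport: LicenseReport = {
  "left-lib@1.0.0": { licenses: "MIT" },
  "copyleft-lib@2.1.0": { licenses: "GPL-3.0" },
  "dual-lib@0.3.0": { licenses: ["ISC", "AGPL-3.0"] },
};

console.log(findDisallowed(sampleReport));
```

Flagging anything outside the allowlist, rather than blocking a fixed denylist, means unfamiliar licenses surface for review instead of passing by default.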

8. Secret Scanning Is Not a PR Gate

The hourly secret-digger-* agentic workflows scan for secrets but run on a schedule, not on PRs. A secret committed to a PR would not be blocked; it would only be detected after the fact (up to 1 hour later).

Recommendation: Consider adding GitHub's native secret scanning push protection (a repository setting) which blocks pushes containing recognized secrets at the git level, complementing the AI-based scanning.

🟢 Low Priority

9. CLI Flag Consistency Check Not on PRs

cli-flag-consistency-checker.md runs weekly and checks for inconsistencies between CLI flags and documentation. A PR that adds a flag without updating docs would pass all checks and only be caught at the next weekly run.

Recommendation: Run the CLI flag consistency check on PRs that touch src/cli.ts or README.md.
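
If running the agentic checker on every PR is too costly, a deterministic approximation is possible. The sketch below compares long-form flags found in the CLI source against those mentioned in the docs. `extractFlags` and `findUndocumentedFlags` are hypothetical helpers, `--new-flag` is an invented example, and the regex only recognizes lowercase `--kebab-case` flags.

```typescript
// Collect unique long-form flags (e.g. --skip-pull) appearing in a text.
function extractFlags(text: string): string[] {
  const found: string[] = text.match(/--[a-z][a-z0-9-]*/g) || [];
  return found.filter((flag, i) => found.indexOf(flag) === i);
}

// Flags present in the CLI source but never mentioned in the docs.
function findUndocumentedFlags(cliSource: string, docs: string): string[] {
  const documented = extractFlags(docs);
  return extractFlags(cliSource).filter(
    (flag) => documented.indexOf(flag) === -1
  );
}

// Hypothetical inputs; a real check would read src/cli.ts and README.md.
const cliSource = `
  .option("--skip-pull", "skip pulling images")
  .option("--new-flag <value>", "hypothetical flag")
`;
const readme = "Pass `--skip-pull` to reuse local images.";

console.log(findUndocumentedFlags(cliSource, readme));
```

This catches only missing mentions, not stale descriptions or removed flags, so it would complement the weekly agentic run rather than replace it.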

10. Documentation Preview Doesn't Fail the PR

docs-preview.yml builds the Astro/Starlight docs site with continue-on-error: true. A broken docs build silently passes; contributors only see an artifact upload failure if they dig into the logs.

Recommendation: Remove continue-on-error: true or post a PR comment when the docs build fails. Currently broken docs can be merged unnoticed.

11. No SBOM Generation

There is no Software Bill of Materials (SBOM) generation in the release or PR workflow. For a security-focused tool distributed as a GitHub Action/npm package, an SBOM aids downstream consumers in vulnerability tracking.

Recommendation: Add cyclonedx-npm or @cyclonedx/cdxgen to the release workflow to generate an SBOM artifact.

12. Missing `api-proxy` Container in Security Scans

container-scan.yml scans containers/agent/ and containers/squid/ but there is also a containers/api-proxy/ with its own Node.js dependencies (a separate package.json). The api-proxy container is not scanned by Trivy.

Recommendation: Add a third job to container-scan.yml to scan the api-proxy container image.

📋 Actionable Recommendations

| Priority | Recommendation | Complexity | Impact |
| --- | --- | --- | --- |
| 🔴 High | Fix integration test pattern gaps (add missing tests to CI patterns) | Low | High: security tests `gh-host-injection`, `workdir-tmpfs-hiding` may not be running |
| 🔴 High | Raise coverage thresholds incrementally; add per-file minimums for `cli.ts` and `docker-manager.ts` | Low | Medium: prevents backsliding on most critical files |
| 🔴 High | Investigate and fix agentic Build Test Suite failures | Medium | Medium: restores an existing quality gate |
| 🟡 Medium | Expand container scan to trigger on `src/**` changes | Low | High: source changes affect runtime security |
| 🟡 Medium | Add lightweight per-PR startup time benchmark (1-2 iterations) | Low | Medium: catches performance regressions early |
| 🟡 Medium | Add `license-checker` step to dependency audit workflow | Low | Medium: prevents license compliance issues |
| 🟡 Medium | Enable GitHub native secret scanning push protection (repo setting) | Low | High: blocks secrets at push time |
| 🟡 Medium | Make smoke test (or equivalent) a required check for maintainers | Medium | High: ensures real end-to-end validation on every change |
| 🟢 Low | Add CLI flag consistency check to PRs touching `src/cli.ts` | Low | Low: catches doc drift earlier |
| 🟢 Low | Remove `continue-on-error` from docs preview, add failure comment | Low | Low: visible feedback on broken docs |
| 🟢 Low | Add SBOM generation to release workflow | Low | Low-Medium: improves supply chain transparency |
| 🟢 Low | Add api-proxy container scan to `container-scan.yml` | Low | Low: closes a scan gap |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total registered GitHub Actions workflows | 52 |
| Workflows triggered on every PR | ~18 |
| Agentic workflows running on PRs | 5 (security-guard, build-test, smoke-claude, smoke-codex, smoke-copilot) |
| Integration test files total | 33 |
| Integration test files potentially missing from CI patterns | ~5 |
| Unit test statement coverage | 38.4% |
| Coverage threshold (statements) | 38% (very low margin) |
| `cli.ts` unit test coverage | 0% |
| `docker-manager.ts` unit test coverage | 18% |
| Recent PR check failure rate (non-agentic) | ~0% |
| Agentic Build Test Suite recent failure rate | ~100% (persistent failure) |
| Container scan frequency | On container file changes + weekly |
| Performance benchmark frequency | Weekly only |
| Secret scanning PR gate | ❌ None (hourly schedule only) |

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 23, 2026, 10:26 PM UTC
