📊 Current CI/CD Pipeline Status
The repository has a mature, well-layered CI/CD pipeline with 21 agentic workflow definitions (.md files compiled to .lock.yml) and 17 traditional GitHub Actions YAML workflows. The critical workflows (Build Verification: 1,216 total runs; Integration Tests: 324 total runs) each succeeded in all 10 of their most recent runs, indicating a stable baseline. The pipeline covers the full development lifecycle: build verification, unit/integration testing, security scanning, documentation, and release automation.
✅ Existing Quality Gates
The following checks run on every pull request targeting main:
Code Quality
Build Verification (build.yml) — TypeScript compilation + ESLint on Node 20 and 22 matrix
Lint (lint.yml) — ESLint on TypeScript sources + markdownlint on all .md files
TypeScript Type Check (test-integration.yml) — tsc --noEmit with strict config
PR Title Check (pr-title.yml) — Conventional Commits enforcement with allowed scopes
Testing
Unit Tests with Coverage (test-coverage.yml) — Jest coverage comparison vs. base branch; posts PR comment; fails on regression
Integration Tests (test-integration-suite.yml) — Four parallelized job groups: domain/network, protocol security, container operations, API proxy (33 integration test files)
test-chroot.yml — Multi-language chroot support (Python, Go, Java, .NET)
Examples Test (test-examples.yml) — End-to-end smoke tests of shell examples
Test Setup Action (test-action.yml) — Validates the action.yml setup action
API Proxy Unit Tests — Run as part of build.yml
Security
CodeQL (codeql.yml) — SAST for JavaScript/TypeScript and GitHub Actions workflows
Dependency Vulnerability Audit (dependency-audit.yml) — npm audit with SARIF upload, fails on high/critical
Container Security Scan (container-scan.yml) — Trivy scanning of agent and squid containers (triggered on container path changes)
AI Security Guard (security-guard.md) — Claude-based AI review of security-sensitive diffs on every PR
Documentation
Link Check (link-check.yml) — Lychee link validation on Markdown file changes
Release / Agentic Workflows on PRs
Smoke tests (smoke-claude.md, smoke-codex.md, smoke-copilot.md, smoke-chroot.md) — End-to-end firewall tests using each AI engine (opt-in via emoji reaction on PRs; also scheduled every 12h)
Build Test (build-test.md) — Agentic build verification on PRs
🔍 Identified Gaps
🔴 High Priority
H1 — Critically Low Coverage Thresholds
The coverage thresholds in jest.config.js are set very low: 38% statements, 30% branches, 35% functions, and 38% lines, each sitting just below current coverage (38.39%, 31.78%, 37.03%, and 38.31% respectively). The two most critical files, cli.ts (0% coverage) and docker-manager.ts (18% coverage), are the core orchestrators of the entire tool. Thresholds this low mean the coverage gate provides almost no protection against regressions in these files.
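To make the margin concrete, the headroom between measured coverage and each threshold can be computed from the figures in the Metrics Summary. This is a small illustrative sketch; the CoverageMetric type and the headroom helper are hypothetical names, not code from the repository.

```typescript
// Coverage figures and thresholds as reported in this report's Metrics Summary.
// CoverageMetric and headroom() are illustrative, not part of the repository.
interface CoverageMetric {
  metric: string;
  actual: number; // measured coverage, in percent
  threshold: number; // jest.config.js threshold, in percent
}

const metrics: CoverageMetric[] = [
  { metric: "statements", actual: 38.39, threshold: 38 },
  { metric: "branches", actual: 31.78, threshold: 30 },
  { metric: "functions", actual: 37.03, threshold: 35 },
  { metric: "lines", actual: 38.31, threshold: 38 },
];

// Headroom: how many percentage points coverage can drop before the gate fails.
function headroom(m: CoverageMetric): number {
  // Round to two decimals to hide floating-point noise (e.g. 0.39000000000000057).
  return Math.round((m.actual - m.threshold) * 100) / 100;
}

for (const m of metrics) {
  console.log(m.metric + ": " + headroom(m) + " points of headroom");
}
```

Run against the reported numbers, this gives 0.39 (statements), 1.78 (branches), 2.03 (functions), and 0.31 (lines) percentage points.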
H2 — Container Security Scan Not Triggered on Every PR
container-scan.yml uses a paths: filter limited to containers/**. PRs that modify src/docker-manager.ts (which controls container configuration, capabilities, and seccomp) bypass Trivy scanning entirely, even though such changes directly affect the security posture of the containers.
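The effect of the filter can be reproduced with a tiny path matcher. The sketch below is an approximation written for illustration: it supports only `*` and `**`, treats all other glob syntax literally, and is not GitHub's actual filter implementation. The helper names are hypothetical.

```typescript
// Minimal glob matcher for illustration: "**" matches across directories and
// "*" matches within a single path segment. Other glob syntax is treated literally,
// so this only approximates GitHub Actions filter-pattern semantics.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        re += ".*"; // "**": any characters, including "/"
        i++; // skip the second "*"
      } else {
        re += "[^/]*"; // "*": any characters except "/"
      }
    } else if ("\\^$.|?+()[]{}".includes(c)) {
      re += "\\" + c; // escape regex metacharacters
    } else {
      re += c;
    }
  }
  return new RegExp("^" + re + "$");
}

// True if any changed file matches any of the workflow's path filters.
function triggersWorkflow(changedFiles: string[], pathFilters: string[]): boolean {
  const patterns = pathFilters.map(globToRegExp);
  return changedFiles.some((file) => patterns.some((p) => p.test(file)));
}

const currentFilter = ["containers/**"];
const proposedFilter = ["containers/**", "src/**"];
const changedInPr = ["src/docker-manager.ts"];

console.log(triggersWorkflow(changedInPr, currentFilter)); // false: Trivy is skipped
console.log(triggersWorkflow(changedInPr, proposedFilter)); // true
```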
H3 — Smoke Tests Are Opt-In and Non-Blocking on PRs
Smoke tests (smoke-claude, smoke-codex, smoke-copilot) require an emoji reaction to trigger on PRs and are not required status checks. The full end-to-end firewall validation (actual network egress control with a real AI agent) is therefore never a blocking gate on merge. A PR that breaks the core proxy flow can be merged if no reaction is added.
H4 — Performance Benchmarks Run Weekly Only
performance-monitor.yml is scheduled weekly and never runs on PRs. Startup latency and container boot time are important UX properties of a firewall tool, and regressions can be introduced in docker-manager.ts without detection until the following Monday.
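A PR-time gate needs a simple pass/fail rule. One option, sketched below, compares the median of a few startup samples against the 30 s example budget from the recommendations table. How samples are collected (for example by scripts/ci/benchmark-performance.ts) is left out, and the function names and sample values are hypothetical.

```typescript
// Hypothetical PR gate for container startup time. The 30 s budget mirrors the
// example in the recommendations table; collecting the samples is out of scope here.
const STARTUP_BUDGET_MS = 30_000;

function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no startup samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even count: average the two middle values; odd count: take the middle one.
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

interface StartupCheck {
  medianMs: number;
  pass: boolean;
}

function checkStartup(samplesMs: number[], budgetMs: number = STARTUP_BUDGET_MS): StartupCheck {
  const medianMs = median(samplesMs);
  return { medianMs, pass: medianMs <= budgetMs };
}

// Example: one slow outlier does not fail the gate because the median is used.
const result = checkStartup([12_400, 13_100, 41_000]);
console.log(result.medianMs, result.pass); // 13100 true
```

Using the median rather than a single run or the mean keeps one unusually slow CI runner from failing an otherwise healthy PR.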
🟡 Medium Priority
M1 — Coverage Thresholds Are Not Ratcheted Up Over Time
While the test-coverage-improver.md agentic workflow opens PRs that improve coverage, the static thresholds in jest.config.js do not rise as coverage improves. Nothing prevents coverage from drifting back down to the old minimum after those gains have landed.
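A ratchet can be as small as a pure function that takes the existing thresholds and the latest measured coverage and only ever raises a threshold. This sketch assumes the measured values would be read from Jest's json-summary coverage report (the total.<metric>.pct fields); that wiring and writing the result back to jest.config.js are not shown, and the names are hypothetical.

```typescript
// Hypothetical coverage ratchet: thresholds may rise but never fall.
type Metric = "statements" | "branches" | "functions" | "lines";
type Thresholds = Record<Metric, number>;

const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

function ratchet(current: Thresholds, measured: Thresholds): Thresholds {
  const next: Thresholds = { ...current };
  for (const m of METRICS) {
    // Floor the measured value so a new threshold never exceeds achieved coverage.
    next[m] = Math.max(current[m], Math.floor(measured[m]));
  }
  return next;
}

// Figures from the Metrics Summary of this report.
const existing: Thresholds = { statements: 38, branches: 30, functions: 35, lines: 38 };
const measuredNow: Thresholds = { statements: 38.39, branches: 31.78, functions: 37.03, lines: 38.31 };

console.log(ratchet(existing, measuredNow));
// → statements 38, branches 31, functions 37, lines 38
```

Flooring trades a little tightness for stability: a threshold rounded up past the achieved value would fail the very next build.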
M2 — No License Compliance Checking
There is no license-compliance step (e.g., FOSSA or license-checker) to validate that dependencies comply with the project's license policy. For a security tool distributed as open source, unexpected copyleft or otherwise restrictive licenses in dependencies could create legal risk.
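The core of such a check is an allowlist over SPDX-style license strings. The sketch below is a standalone illustration, not a reimplementation of license-checker's parsing rules: simple "A OR B" and "A AND B" expressions are evaluated, and anything with parentheses or mixed operators is flagged for human review. The package names are hypothetical.

```typescript
// Standalone allowlist check over SPDX-style license strings (illustrative sketch).
const ALLOWED = new Set(["MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"]);

type Verdict = "allowed" | "denied" | "review";

function checkLicense(expression: string): Verdict {
  const expr = expression.trim();
  const hasAnd = expr.includes(" AND ");
  const hasOr = expr.includes(" OR ");
  // Parenthesised or mixed AND/OR expressions need a real SPDX parser: flag for a human.
  if (expr.includes("(") || (hasAnd && hasOr)) return "review";
  const parts = expr.split(hasAnd ? " AND " : " OR ").map((p) => p.trim());
  // "A AND B" requires every license to be allowed; "A OR B" needs just one.
  const ok = hasAnd ? parts.every((p) => ALLOWED.has(p)) : parts.some((p) => ALLOWED.has(p));
  return ok ? "allowed" : "denied";
}

// Returns the names of dependencies whose license is not clearly allowed.
function findViolations(deps: Record<string, string>): string[] {
  return Object.keys(deps).filter((pkg) => checkLicense(deps[pkg]) !== "allowed");
}

const example: Record<string, string> = {
  "left-pad-ish": "MIT",
  "some-gpl-lib": "GPL-3.0-only",
  "dual-licensed": "MIT OR Apache-2.0",
  "complex-expr": "(MIT AND ISC)",
};
console.log(findViolations(example)); // some-gpl-lib and complex-expr
```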
M3 — No Mutation Testing
The test suite validates that tests pass, but does not verify test effectiveness. Mutation testing (e.g., Stryker) would reveal tests that pass even when the source code is intentionally broken, which is particularly important for security-critical logic like domain pattern validation and iptables rule generation.
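The principle behind Stryker can be shown by hand: apply a small change (a "mutant") to the code, rerun the tests, and check whether any test fails. If none does, the mutant survived and the suite has a blind spot. This toy uses two hypothetical domain matchers; it is not Stryker's API and not the project's real matching code.

```typescript
// Toy illustration of mutation testing: run the same tests against a deliberately
// broken copy ("mutant") of the code and see whether any test notices.
// Both matchers are hypothetical, not the project's real domain logic.
type Matcher = (host: string, allowed: string) => boolean;

// Original: exact match, or a true subdomain of the allowed domain.
const original: Matcher = (host, allowed) =>
  host === allowed || host.endsWith("." + allowed);

// Mutant: the "." was dropped, so "evilgithub.com" would now match "github.com".
const mutant: Matcher = (host, allowed) =>
  host === allowed || host.endsWith(allowed);

interface TestCase {
  host: string;
  allowed: string;
  expected: boolean;
}

// A weak suite that only covers the happy path.
const weakSuite: TestCase[] = [
  { host: "github.com", allowed: "github.com", expected: true },
  { host: "api.github.com", allowed: "github.com", expected: true },
];

// A stronger suite adds a look-alike host that must be rejected.
const strongSuite: TestCase[] = weakSuite.concat([
  { host: "evilgithub.com", allowed: "github.com", expected: false },
]);

// A mutant is "killed" when at least one test case gets the wrong answer.
function kills(suite: TestCase[], impl: Matcher): boolean {
  return suite.some((t) => impl(t.host, t.allowed) !== t.expected);
}

console.log(kills(weakSuite, mutant)); // false: the mutant survives the weak suite
console.log(kills(strongSuite, mutant)); // true: the stronger suite kills it
```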
M4 — Docs Site Not Tested in PRs
docs-preview.yml exists but does not appear to run on all PRs. The Astro/Starlight documentation site (docs-site/) has no build validation on code PRs, meaning a documentation build break could go undetected until the deploy workflow runs post-merge.
M5 — No Structured Fuzz / Property-Based Testing
The domain parsing logic (src/domain-patterns.ts), Squid config generation (src/squid-config.ts), and iptables rule construction are security-critical surfaces. Property-based testing (e.g., fast-check) would provide stronger guarantees than example-based unit tests alone.
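fast-check generates random inputs, checks properties against them, and shrinks any failure to a minimal counterexample. As a dependency-free sketch of the same idea (without shrinking), the code below generates random domains from a seeded generator and checks three properties. matchesDomain is a hypothetical stand-in for the logic in src/domain-patterns.ts.

```typescript
// Dependency-free property check in the spirit of fast-check (no shrinking).
// matchesDomain is a hypothetical stand-in for the logic in src/domain-patterns.ts.
type DomainMatcher = (host: string, allowed: string) => boolean;

const matchesDomain: DomainMatcher = (host, allowed) =>
  host === allowed || host.endsWith("." + allowed);

// Seeded linear congruential generator so a failing run can be replayed exactly.
function makeRng(seed: number): () => number {
  let state = Math.floor(seed) % 4294967296;
  return () => {
    state = (1664525 * state + 1013904223) % 4294967296; // stays exact below 2^53
    return state / 4294967296; // value in [0, 1)
  };
}

const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

function randomLabel(rng: () => number): string {
  const len = 1 + Math.floor(rng() * 10); // 1 to 10 characters
  let label = "";
  for (let i = 0; i < len; i++) label += ALPHABET[Math.floor(rng() * ALPHABET.length)];
  return label;
}

function randomDomain(rng: () => number): string {
  const count = 2 + Math.floor(rng() * 3); // 2 to 4 labels
  const labels: string[] = [];
  for (let i = 0; i < count; i++) labels.push(randomLabel(rng));
  return labels.join(".");
}

// Returns a description of the first counterexample, or null if all properties hold.
function checkProperties(match: DomainMatcher, runs: number, seed: number): string | null {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const domain = randomDomain(rng);
    const label = randomLabel(rng);
    if (!match(domain, domain)) return "exact match rejected: " + domain;
    if (!match(label + "." + domain, domain)) return "subdomain rejected: " + label + "." + domain;
    // Prepending a label without a dot yields a look-alike host that must not match.
    if (match(label + domain, domain)) return "look-alike accepted: " + label + domain;
  }
  return null;
}

console.log(checkProperties(matchesDomain, 1000, 42)); // null
console.log(checkProperties((h, a) => h.endsWith(a), 1000, 42)); // a look-alike counterexample
```

The seed makes every run reproducible, which matters in CI: a reported counterexample can be replayed locally with the same seed.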
M6 — Container Scan Only Covers HIGH/CRITICAL — No MEDIUM Tracking
container-scan.yml is configured with severity: 'CRITICAL,HIGH', which is appropriate for blocking but provides no visibility into accumulating MEDIUM vulnerabilities that can become high-risk over time.
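A complementary way to get MEDIUM visibility without changing what blocks a merge is to triage one scan's findings into two buckets: blocking (CRITICAL, HIGH) and tracked (MEDIUM). The sketch below does that over a Trivy JSON report. The field names (Results, Vulnerabilities, VulnerabilityID, PkgName, Severity) are assumed to match Trivy's JSON output and should be checked against a real report; the sample data is invented.

```typescript
// Assumed shape of a Trivy JSON report (only the fields used here).
interface TrivyVuln {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string;
}
interface TrivyResult {
  Target: string;
  Vulnerabilities?: TrivyVuln[] | null; // may be absent or null when nothing is found
}
interface TrivyReport {
  Results?: TrivyResult[];
}

const BLOCKING = new Set(["CRITICAL", "HIGH"]);
const TRACKED = new Set(["MEDIUM"]);

// Split findings: only "blocking" should fail the check; "tracked" is reported only.
function triage(report: TrivyReport): { blocking: string[]; tracked: string[] } {
  const blocking: string[] = [];
  const tracked: string[] = [];
  for (const res of report.Results ?? []) {
    for (const v of res.Vulnerabilities ?? []) {
      const id = v.VulnerabilityID + " (" + v.PkgName + ")";
      if (BLOCKING.has(v.Severity)) blocking.push(id);
      else if (TRACKED.has(v.Severity)) tracked.push(id);
    }
  }
  return { blocking, tracked };
}

const sample: TrivyReport = {
  Results: [
    {
      Target: "agent image (hypothetical)",
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", PkgName: "pkg-a", Severity: "HIGH" },
        { VulnerabilityID: "CVE-0000-0002", PkgName: "pkg-b", Severity: "MEDIUM" },
        { VulnerabilityID: "CVE-0000-0003", PkgName: "pkg-c", Severity: "LOW" },
      ],
    },
    { Target: "squid image (hypothetical)", Vulnerabilities: null },
  ],
};

const triaged = triage(sample);
console.log(triaged.blocking); // one HIGH finding
console.log(triaged.tracked); // one MEDIUM finding
```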
🟢 Low Priority
L1 — No macOS / Windows Testing
All CI runs on ubuntu-latest. The tool uses Docker and iptables, which are Linux-specific, but the CLI itself could be installed on macOS. There is no validation that npm install or the action setup step works on macOS runners.
L2 — No Dependabot Auto-Merge for Minor/Patch Dependencies
Dependabot version updates are not configured: .github/dependabot.yml was not found in the directory listing. Dependency freshness relies on the dependency-security-monitor.md agentic workflow rather than automated update PRs, so patch updates may be delayed.
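An auto-merge policy hinges on classifying each update as a major, minor, or patch bump. In a real workflow the dependabot/fetch-metadata action exposes an update-type output carrying this classification; the sketch below just shows the decision logic in isolation, with hypothetical helper names.

```typescript
// Hypothetical auto-merge policy helper: classify a version bump by its semver part.
type Bump = "major" | "minor" | "patch" | "none" | "unknown";

// Parses "1.2.3" or "v1.2.3"; pre-release/build suffixes are ignored, and ranges
// such as "^1.2.3" are not understood (they yield null).
function parseVersion(v: string): [number, number, number] | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function classifyBump(from: string, to: string): Bump {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (!a || !b) return "unknown";
  if (a[0] !== b[0]) return "major";
  if (a[1] !== b[1]) return "minor";
  if (a[2] !== b[2]) return "patch";
  return "none";
}

// Policy from the recommendation: only patch updates merge without review.
function eligibleForAutoMerge(from: string, to: string): boolean {
  return classifyBump(from, to) === "patch";
}

console.log(eligibleForAutoMerge("4.17.20", "4.17.21")); // true
console.log(eligibleForAutoMerge("4.17.21", "5.0.0")); // false (major)
```

Under semver, minor bumps of 0.x packages may be breaking; this sketch does not special-case them, which is one reason to limit auto-merge to patch updates.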
L3 — Agentic Workflow Compilation Not Validated in PRs
Changes to .md workflow files require manual compilation (gh aw compile) to produce .lock.yml files. There is no CI check that validates the compiled .lock.yml matches the .md source, allowing drift between the two.
L4 — No SBOM (Software Bill of Materials) Generation
The release workflow does not produce an SBOM artifact (CycloneDX or SPDX format). For a security-focused tool, publishing an SBOM alongside each release would improve supply chain transparency.
L5 — Link Check Only Triggers on Markdown Changes
link-check.yml uses paths: ['**/*.md'], so it only runs when a Markdown file is modified. A broken external URL in existing docs can persist indefinitely if no PR touches Markdown. The weekly scheduled run provides a backstop, but it is non-blocking.
📋 Actionable Recommendations

| Gap | Recommendation | Complexity | Impact |
|---|---|---|---|
| H1 — Low coverage thresholds | Raise thresholds incrementally per quarter: target 60% statements, 50% branches by end of year. Prioritize cli.ts and docker-manager.ts with dedicated test suites. | Medium | 🔴 High |
| H2 — Container scan path filter | Add `src/**` to the container-scan.yml paths: trigger, OR remove the path restriction and run Trivy on every PR (use caching to keep it fast). | Low | 🔴 High |
| H3 — Smoke tests opt-in | Add at least one smoke test variant as a required status check (e.g., smoke-copilot runs automatically on all PRs without needing a reaction). Alternatively, run a lightweight `awf --allow-domains example.com curl (example.com/redacted)` integration check as a required gate. | Low | 🔴 High |
| H4 — No PR performance gate | Add a lightweight benchmark step to the build workflow that measures container startup time against a threshold (e.g., fail if > 30s). Reuse scripts/ci/benchmark-performance.ts. | Medium | 🟡 Medium |
| M1 — Static coverage thresholds | Implement a coverage ratchet: after each merge to main, update the jest.config.js thresholds to current coverage wherever it is higher than the existing minimums. | Medium | 🟡 Medium |
| M2 — License compliance | Add `npx license-checker --onlyAllow 'MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC'` to the build workflow. | Low | 🟡 Medium |
| M3 — No mutation testing | Integrate Stryker Mutator as a scheduled weekly job targeting src/domain-patterns.ts and src/squid-config.ts. | High | 🟡 Medium |
| M4 — Docs site build | Add `cd docs-site && npm ci && npm run build` as a job in build.yml triggered on `docs-site/**` path changes. | Low | 🟡 Medium |
| M5 — No fuzz/property tests | Add fast-check property-based tests for domain-patterns.ts and squid-config.ts. | Medium | 🟡 Medium |
| M6 — MEDIUM vuln visibility | Add a separate non-blocking Trivy scan step with severity: MEDIUM that posts results to the GitHub Security tab without failing the check. | Low | 🟢 Low |
| L1 — No macOS testing | Add a macOS job to test-action.yml to validate the setup action. | Low | 🟢 Low |
| L2 — No Dependabot | Add .github/dependabot.yml with a monthly npm update schedule and auto-merge for patch versions via a GitHub Actions auto-merge workflow. | Low | 🟢 Low |
| L3 — Lock file drift | Add a CI check: run `gh aw compile` on all .md files, then `git diff --exit-code` to detect uncommitted lock file changes. | Low | 🟢 Low |
| L4 — No SBOM | Add anchore/sbom-action to release.yml to attach a CycloneDX SBOM to each GitHub release. | Low | 🟢 Low |

📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflows (YAML) | 17 |
| Total agentic workflows (.md) | 21 |
| Workflows running on every PR | 12+ |
| Unit test files | 14 |
| Integration test files | 33 |
| Total unit tests | ~135 |
| Statement coverage | 38.39% (threshold: 38%) |
| Branch coverage | 31.78% (threshold: 30%) |
| Function coverage | 37.03% (threshold: 35%) |
| Line coverage | 38.31% (threshold: 38%) |
| cli.ts coverage | 0% 🔴 |
| docker-manager.ts coverage | 18% 🔴 |
| Build Verification success rate (recent) | 100% (10/10 recent runs) |
| Integration Tests success rate (recent) | 100% (10/10 recent runs) |
| Security workflows | CodeQL + Trivy + npm audit + AI Security Guard |

Key Observation
The pipeline infrastructure is comprehensive and well designed. The most impactful improvement area is test coverage depth: the thresholds were intentionally set just below the starting baseline so that the gate passes, and the two most important files (cli.ts at 0% and docker-manager.ts at 18%) remain almost entirely untested. Increasing coverage on these files would directly improve confidence in every PR and catch regressions in the core firewall orchestration logic.