EveryInc · tmchow · Mar 17, 2026 · Mar 15, 2026 · Mar 15, 2026 · Mar 15, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -11,7 +11,7 @@
   "plugins": [
     {
       "name": "compound-engineering",
-      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 29 specialized agents and 42 skills.",
+      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 29 specialized agents and 44 skills.",
       "version": "2.41.0",
       "author": {
         "name": "Kieran Klaassen",

diff --git a/docs/brainstorms/2026-03-14-ce-plan-rewrite-requirements.md b/docs/brainstorms/2026-03-14-ce-plan-rewrite-requirements.md
@@ -0,0 +1,85 @@
+---
+date: 2026-03-14
+topic: ce-plan-rewrite
+---
+
+# Rewrite `ce:plan` to Separate Planning from Implementation
+
+## Problem Frame
+
+`ce:plan` sits between `ce:brainstorm` and `ce:work`, but the current skill mixes issue authoring, technical planning, and pseudo-implementation. That makes plans brittle and pushes the planning phase to predict details that are often only discoverable during implementation. PR #246 intensifies this by asking plans to include complete code, exact commands, and micro-step TDD and commit choreography. The rewrite should keep planning strong enough for a capable agent or engineer to execute, while moving code-writing, test-running, and execution-time learning back into `ce:work`.
+
+## Requirements
+
+- R1. `ce:plan` must accept either a raw feature description or a requirements document produced by `ce:brainstorm` as primary input.
+- R2. `ce:plan` must preserve compound-engineering's planning strengths: repo pattern scan, institutional learnings, conditional external research, and requirements-gap checks when warranted.
+- R3. `ce:plan` must produce a durable implementation plan focused on decisions, sequencing, file paths, dependencies, risks, and test scenarios, not implementation code.
+- R4. `ce:plan` must not instruct the planner to run tests, generate exact implementation snippets, or learn from execution-time results. Those belong to `ce:work`.
+- R5. Plan tasks and subtasks must be right-sized for implementation handoff: logical units or atomic commits rather than 2-5 minute copy-paste steps.
+- R6. Plans must remain shareable and portable as documents or issues without tool-specific executor litter such as TodoWrite instructions, `/ce:work` choreography, or git command recipes in the artifact itself.
+- R7. `ce:plan` must carry forward product decisions, scope boundaries, success criteria, and deferred questions from `ce:brainstorm` without re-inventing them.
+- R8. `ce:plan` must explicitly distinguish what gets resolved during planning from what is intentionally deferred to implementation-time discovery.
+- R9. `ce:plan` must hand off cleanly to `ce:work`, giving enough information for task creation without pre-writing code.
+- R10. If detail levels remain, they must change the depth of analysis and documentation, not the planning philosophy. A small plan can be terse while still staying decision-first.
+- R11. If an upstream requirements document contains unresolved `Resolve Before Planning` items, `ce:plan` must classify whether they are true product blockers or misfiled technical questions before proceeding.
+- R12. `ce:plan` must not plan past unresolved product decisions that would change behavior, scope, or success criteria, but it may absorb technical or research questions by reclassifying them into planning-owned investigation.
+- R13. When true blockers remain, `ce:plan` must pause helpfully: surface the blockers, allow the user to convert them into explicit assumptions or decisions, or route them back to `ce:brainstorm`.
+
+## Success Criteria
+
+- A fresh implementer can start work from the plan without needing clarifying questions, but the plan does not contain implementation code.
+- `ce:work` can derive actionable tasks from the plan without relying on micro-step commands or embedded git/test instructions.
+- Plans stay accurate longer as repo context changes because they capture decisions and boundaries rather than speculative code.
+- A requirements document from `ce:brainstorm` flows into planning without losing decisions, scope boundaries, or success criteria.
+- Plans do not proceed past unresolved product blockers unless the user explicitly converts them into assumptions or decisions.
+- For the same feature, the rewritten `ce:plan` produces output that is materially shorter and less brittle than the current skill or PR #246's proposed format while remaining execution-ready.
+
+## Scope Boundaries
+
+- Do not redesign `ce:brainstorm`'s product-definition role.
+- Do not remove decomposition, file paths, verification, or risk analysis from `ce:plan`.
+- Do not move planning into a vague, under-specified artifact that leaves execution to guess.
+- Do not change `ce:work` in this phase beyond possible follow-up clarification of what plan structure it should prefer.
+- Do not require heavyweight PRD ceremony for small or straightforward work.
+
+## Key Decisions
+
+- Use a hybrid model: keep compound-engineering's research and handoff strengths, but adopt iterative-engineering's "decisions, not code" boundary.
+- Planning stops before execution: no running tests, no fail/pass learning, no exact implementation snippets, and no commit shell commands in the plan.
+- Use logical tasks and subtasks sized around atomic changes or commit units rather than 2-5 minute micro-steps.
+- Keep explicit verification and test scenarios, but express them as expected coverage and validation outcomes rather than commands with predicted output.
+- Preserve `ce:brainstorm` as the preferred upstream input when available, with clear handling for deferred technical questions.
+- Treat `Resolve Before Planning` as a classification gate: planning first distinguishes true product blockers from technical questions, then investigates only the latter.
+
+## High-Level Direction
+
+- Phase 0: Resume existing plan work when relevant, detect brainstorm input, and assess scope.
+- Phase 1: Gather context through repo research, institutional learnings, and conditional external research.
+- Phase 2: Resolve planning-time technical questions and capture implementation-time unknowns separately.
+- Phase 3: Structure the plan around components, dependencies, files, test targets, risks, and verification.
+- Phase 4: Write a right-sized plan artifact whose depth varies by scope, but whose boundary stays planning-only.
+- Phase 5: Review and hand off to refinement, deeper research, issue sharing, or `ce:work`.
+
+## Alternatives Considered
+
+- Keep the current `ce:plan` and only reject PR #246.
+  Rejected because the underlying issue remains: the current skill already drifts toward issue-template output plus pseudo-implementation.
+- Adopt Superpowers `writing-plans` nearly wholesale.
+  Rejected because it is intentionally execution-script-oriented and collapses planning into detailed code-writing and command choreography.
+- Adopt iterative-engineering `tech-planning` wholesale.
+  Rejected because it would lose useful compound-engineering behaviors such as brainstorm-origin integration, institutional learnings, and richer post-plan handoff options.
+
+## Dependencies / Assumptions
+
+- `ce:work` can continue creating its own actionable task list from a decision-first plan.
+- If `ce:work` later benefits from an explicit section such as `## Implementation Units` or `## Work Breakdown`, that should be a separate follow-up designed around execution needs rather than micro-step code generation.
+
+## Resolved During Planning
+
+- [Affects R10][Technical] Replaced `MINIMAL` / `MORE` / `A LOT` with `Lightweight` / `Standard` / `Deep` to align `ce:plan` with `ce:brainstorm`'s scope model.
+- [Affects R9][Technical] Updated `ce:work` to explicitly consume decision-first plan sections such as `Implementation Units`, `Requirements Trace`, `Files`, `Test Scenarios`, and `Verification`.
+- [Affects R2][Needs research] Kept SpecFlow as a conditional planning aid: use it for `Standard` or `Deep` plans when flow completeness is unclear rather than making it mandatory for every plan.
+
+## Next Steps
+
+-> Review, refine, and commit the `ce:plan` and `ce:work` rewrite
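
The classification gate described in R11 to R13 (and in the matching Key Decision) can be sketched as a small function. This is a hypothetical illustration, not code from `ce:plan`. It assumes each `Resolve Before Planning` item carries bracketed tags in the same style as the `Resolved During Planning` entries. It treats `[Technical]` and `[Needs research]` as planning-owned and conservatively treats everything else as a product blocker.

```python
import re
from dataclasses import dataclass, field

# Hypothetical: tags that mark a question as planning-owned rather than a product decision.
PLANNING_OWNED_TAGS = {"Technical", "Needs research"}

# Captures each bracketed tag, e.g. "[Affects R2]" -> "Affects R2".
TAG_RE = re.compile(r"\[([^\]]+)\]")


@dataclass
class GateResult:
    blockers: list[str] = field(default_factory=list)        # pause, convert, or route back
    planning_owned: list[str] = field(default_factory=list)  # ce:plan investigates these

    @property
    def should_pause(self) -> bool:
        # Any remaining product blocker stops planning (R12, R13).
        return bool(self.blockers)


def classify_resolve_before_planning(items: list[str]) -> GateResult:
    result = GateResult()
    for item in items:
        tags = set(TAG_RE.findall(item))
        if tags & PLANNING_OWNED_TAGS:
            result.planning_owned.append(item)
        else:
            # Unknown or product tags default to blockers so planning never
            # silently plans past an open product decision.
            result.blockers.append(item)
    return result
```

Defaulting unrecognized tags to blockers follows R12: planning may absorb technical questions, but it pauses rather than guessing at an unresolved product decision.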
diff --git a/docs/solutions/skill-design/beta-skills-framework.md b/docs/solutions/skill-design/beta-skills-framework.md
@@ -0,0 +1,96 @@
+---
+title: "Beta skills framework: parallel skills with -beta suffix for safe rollouts"
+category: skill-design
+date: 2026-03-17
+module: plugins/compound-engineering/skills
+component: SKILL.md
+tags:
+  - skill-design
+  - beta-testing
+  - skill-versioning
+  - rollout-safety
+severity: medium
+description: "Pattern for trialing new skill versions alongside stable ones using a -beta suffix. Covers naming, plan file naming, internal references, and promotion path."
+related:
+  - docs/solutions/skill-design/compound-refresh-skill-improvements.md
+---
+
+## Problem
+
+Core workflow skills like `ce:plan` and `deepen-plan` are deeply chained (`ce:brainstorm` → `ce:plan` → `deepen-plan` → `ce:work`) and orchestrated by `lfg` and `slfg`. Rewriting these skills risks breaking the entire workflow for all users simultaneously. There was no mechanism to let users trial new skill versions alongside stable ones.
+
+Alternatives considered and rejected:
+- **Beta gate in SKILL.md** with config-driven routing (`beta: true` in `compound-engineering.local.md`): relies on prompt-level conditional routing, which risks instruction blending, requires setup integration, and adds complexity to the skill files themselves.
+- **Pure router SKILL.md** with both versions in `references/`: adds a file-read penalty and refactors stable skills unnecessarily.
+- **Separate beta plugin**: heavy infrastructure for a temporary need.
+
+## Solution
+
+### Parallel skills with `-beta` suffix
+
+Create separate skill directories alongside the stable ones. Each beta skill is a fully independent copy with its own frontmatter, instructions, and internal references.
+
+```
+skills/
+├── ce-plan/SKILL.md           # Stable (unchanged)
+├── ce-plan-beta/SKILL.md      # New version
+├── deepen-plan/SKILL.md       # Stable (unchanged)
+└── deepen-plan-beta/SKILL.md  # New version
+```
+
+### Naming and frontmatter conventions
+
+- **Directory**: `<skill-name>-beta/`
+- **Frontmatter name**: `<name>-beta`, where `<name>` is the stable skill's frontmatter name (e.g., `ce:plan-beta`)
+- **Description**: Write the intended stable description, then prefix with `[BETA]`. This ensures promotion is a simple prefix removal rather than a rewrite.
+- **`disable-model-invocation: true`**: Prevents the model from auto-triggering the beta skill. Users invoke it manually with the slash command. Remove this field when promoting to stable.
+- **Plan files**: Use `-beta-plan.md` suffix (e.g., `2026-03-17-001-feat-auth-flow-beta-plan.md`) to avoid clobbering stable plan files
+
+### Internal references
+
+Beta skills must reference each other by their beta names:
+- `ce:plan-beta` references `/deepen-plan-beta` (not `/deepen-plan`)
+- `deepen-plan-beta` references `ce:plan-beta` (not `ce:plan`)
+
+### What doesn't change
+
+- Stable `ce:plan` and `deepen-plan` are completely untouched
+- `lfg`/`slfg` orchestration continues to use stable skills — no modification needed
+- `ce:brainstorm` still hands off to stable `ce:plan` — no modification needed
+- `ce:work` consumes plan files from either version (reads the file, doesn't care which skill wrote it)
+
+### Tradeoffs
+
+**Simplicity over seamless integration.** Beta skills exist as standalone, manually-invoked skills. They won't be auto-triggered by `ce:brainstorm` handoffs or `lfg`/`slfg` orchestration without further surgery to those skills, which isn't worth the complexity for a trial period.
+
+**Intended usage pattern:** A user can run `/ce:plan` for the stable output, then run `/ce:plan-beta` on the same input to compare the two plan documents side by side. The `-beta-plan.md` suffix ensures both outputs coexist in `docs/plans/` without collision.
+
+## Promotion path
+
+When the beta version is validated:
+
+1. Replace stable `SKILL.md` content with beta skill content
+2. Restore stable frontmatter: remove `[BETA]` prefix from description, restore stable `name:`
+3. Remove `disable-model-invocation: true` so the model can auto-trigger it
+4. Update all internal references back to stable names
+5. Restore stable plan file naming (remove `-beta` from the convention)
+6. Delete the beta skill directory
+7. Update README.md: remove from Beta Skills section, verify counts
+8. Verify `lfg`/`slfg` work with the promoted skill
+9. Verify `ce:work` consumes plans from the promoted skill
+
+## Validation
+
+After creating a beta skill, search its SKILL.md for references to the stable skill name it replaces. Any occurrence of the stable name without `-beta` is a missed rename — it would cause output collisions or route to the wrong skill.
+
+Check for:
+- **Output file paths** that use the stable naming convention instead of the `-beta` variant
+- **Cross-skill references** that point to stable skill names instead of beta counterparts
+- **User-facing text** (questions, confirmations) that mentions stable paths or names
+
+## Prevention
+
+- When adding a beta skill, always use the `-beta` suffix consistently in directory name, frontmatter name, description, plan file naming, and all internal skill-to-skill references
+- After creating a beta skill, run the validation checks above to catch missed renames in file paths, user-facing text, and cross-skill references
+- Always test that stable skills are completely unaffected by the beta skill's existence
+- Keep beta and stable plan file suffixes distinct so outputs can coexist for comparison
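
The Validation checks in the beta skills framework lend themselves to a quick automated scan. This is a minimal Python sketch. It assumes the only stable names to guard against are `ce:plan` and `deepen-plan`, and that plan file paths end in `-plan.md`. The `find_missed_renames` helper and `STABLE_NAMES` tuple are hypothetical illustrations, not existing tooling.

```python
import re

# Hypothetical: stable skill names a beta SKILL.md must not reference directly.
STABLE_NAMES = ("ce:plan", "deepen-plan")


def find_missed_renames(skill_md: str, stable_names=STABLE_NAMES) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for each likely missed rename."""
    patterns = [
        # A stable name not followed by "-beta" or by more word characters.
        re.compile(re.escape(name) + r"(?!-beta|\w)")
        for name in stable_names
    ]
    # A plan file name ending in "-plan.md" that is not the "-beta-plan.md" variant.
    patterns.append(re.compile(r"(?<!-beta)-plan\.md"))

    hits = []
    for lineno, line in enumerate(skill_md.splitlines(), start=1):
        for pattern in patterns:
            hits.extend((lineno, m.group(0)) for m in pattern.finditer(line))
    return hits
```

The scan reports line numbers so each hit can be reviewed by hand. A regex pass will not catch user-facing prose that paraphrases a stable name, so the manual check for questions and confirmations still applies.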
diff --git a/plugins/compound-engineering/.claude-plugin/plugin.json b/plugins/compound-engineering/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "compound-engineering",
   "version": "2.41.0",
-  "description": "AI-powered development tools. 29 agents, 42 skills, 1 MCP server for code review, research, design, and workflow automation.",
+  "description": "AI-powered development tools. 29 agents, 44 skills, 1 MCP server for code review, research, design, and workflow automation.",
   "author": {
     "name": "Kieran Klaassen",
     "email": "kieran@every.to",
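
The skill count now appears in several places (`marketplace.json`, `plugin.json`, and the README component table), and the promotion checklist asks to verify counts. The following is a hypothetical helper for that check. It assumes each skill is a directory containing a `SKILL.md` (the layout the `skills/*/SKILL.md` grep in `AGENTS.md` relies on) and that description strings state the count as `N skills`.

```python
import re
from pathlib import Path

# Matches the count phrasing used in the plugin descriptions, e.g. "44 skills".
SKILL_COUNT_RE = re.compile(r"\b(\d+)\s+skills\b")


def count_skills(skills_dir: Path) -> int:
    # Each skill, stable or beta, is a directory that contains a SKILL.md file.
    return sum(1 for _ in skills_dir.glob("*/SKILL.md"))


def stale_counts(descriptions: dict[str, str], actual: int) -> dict[str, int]:
    """Map each source whose stated skill count differs from `actual` to that count."""
    stale = {}
    for source, text in descriptions.items():
        match = SKILL_COUNT_RE.search(text)
        if match and int(match.group(1)) != actual:
            stale[source] = int(match.group(1))
    return stale
```

The README table row (`| Skills | 44 |`) uses a different format and would need its own pattern.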

diff --git a/plugins/compound-engineering/AGENTS.md b/plugins/compound-engineering/AGENTS.md
@@ -116,6 +116,43 @@ grep -E '`(references|assets|scripts)/[^`]+`' skills/*/SKILL.md
 grep -E '^description:' skills/*/SKILL.md
 ```
 
+## Beta Skills
+
+Beta skills are experimental versions of core workflow skills, published as separate skills with a `-beta` suffix (e.g., `ce-plan-beta`, `deepen-plan-beta`). They live alongside the stable versions and are invoked directly.
+
+See `docs/solutions/skill-design/beta-skills-framework.md` for the full pattern.
+
+### Beta Skill Rules
+
+- Beta skills use the `-beta` suffix in their directory name and skill name, and prefix their description with `[BETA]`
+- Beta skills set `disable-model-invocation: true` to prevent accidental auto-triggering — users invoke them manually
+- Beta skill descriptions should be the intended stable description prefixed with `[BETA]`, so promotion is a simple prefix removal
+- Beta skills must reference other beta skills by their beta names (e.g., `/deepen-plan-beta`, not `/deepen-plan`)
+- Beta plan output files use `-beta-plan.md` suffix to avoid clobbering stable plan files
+- Beta skills are not wired into `lfg`/`slfg` orchestration — invoke them directly
+
+### Beta Skill Validation
+
+After creating or modifying a beta skill, search its SKILL.md for any reference to the stable skill name it replaces. Occurrences of the stable name without `-beta` are missed renames that would cause output collisions or misrouting. Check for:
+
+- Output file paths using the stable naming convention instead of the `-beta` variant
+- Cross-skill references pointing to stable names instead of beta counterparts
+- User-facing text (questions, confirmations) mentioning stable paths or names
+
+### Promoting Beta to Stable
+
+When replacing a stable skill with its beta version:
+
+- [ ] Replace stable `SKILL.md` content with beta skill content
+- [ ] Restore stable frontmatter: remove `[BETA]` prefix from description, restore stable `name:` (e.g., `ce:plan` not `ce:plan-beta`)
+- [ ] Remove `disable-model-invocation: true` so the model can auto-trigger the skill
+- [ ] Update all internal references back to stable names (`/deepen-plan` not `/deepen-plan-beta`)
+- [ ] Restore stable plan file naming (remove `-beta` from `-beta-plan.md` convention)
+- [ ] Delete the beta skill directory
+- [ ] Update README.md: remove from Beta Skills section, verify counts
+- [ ] Verify `lfg`/`slfg` still work with the updated stable skill
+- [ ] Verify `ce:work` consumes plans from the promoted skill correctly
+
 ## Documentation
 
 See `docs/solutions/plugin-versioning-requirements.md` for detailed versioning workflow.
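
The mechanical parts of the Promoting Beta to Stable checklist (frontmatter cleanup, reference renames, and plan file naming) can be sketched as a text transform over a beta `SKILL.md`. This is a hypothetical helper. It assumes the only beta skills are `ce:plan-beta` and `deepen-plan-beta`. Deleting the beta directory, updating the README, and verifying `lfg`/`slfg` and `ce:work` remain manual steps.

```python
import re

# Hypothetical: stable names whose "-beta" variants should be mapped back.
STABLE_NAMES = ("ce:plan", "deepen-plan")


def promote_skill_md(text: str) -> str:
    """Apply the mechanical promotion steps to the contents of a beta SKILL.md."""
    # Drop the frontmatter line so the model can auto-trigger the skill again.
    text = re.sub(r"^disable-model-invocation:[ \t]*true[ \t]*\n?", "", text, flags=re.MULTILINE)
    # Remove the [BETA] prefix from the description, quoted or unquoted.
    text = re.sub(r'^(description:[ \t]*"?)\[BETA\][ \t]*', r"\1", text, flags=re.MULTILINE)
    # Restore stable plan file naming.
    text = text.replace("-beta-plan.md", "-plan.md")
    # Point name: and cross-skill references back at the stable names.
    stable = "|".join(re.escape(name) for name in STABLE_NAMES)
    text = re.sub(rf"\b({stable})-beta\b", r"\1", text)
    return text
```

A final search for any remaining `-beta` or `[BETA]` in the promoted file is a cheap sanity check before deleting the beta directory.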
diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md
@@ -7,7 +7,7 @@ AI-powered development tools that get smarter with every use. Make each unit of
 | Component | Count |
 |-----------|-------|
 | Agents | 29 |
-| Skills | 42 |
+| Skills | 44 |
 | MCP Servers | 1 |
 
 ## Agents
@@ -90,7 +90,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 |---------|-------------|
 | `/lfg` | Full autonomous engineering workflow |
 | `/slfg` | Full autonomous workflow with swarm mode for parallel execution |
-| `/deepen-plan` | Enhance plans with parallel research agents for each section |
+| `/deepen-plan` | Stress-test plans and deepen weak sections with targeted research |
 | `/changelog` | Create engaging changelogs for recent merges |
 | `/create-agent-skill` | Create or edit Claude Code skills |
 | `/generate_command` | Generate new slash commands |
@@ -156,6 +156,17 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 |-------|-------------|
 | `agent-browser` | CLI-based browser automation using Vercel's agent-browser |
 
+### Beta Skills
+
+Experimental versions of core workflow skills. These are being tested before replacing their stable counterparts. They work standalone but are not yet wired into the automated `lfg`/`slfg` orchestration.
+
+| Skill | Description | Replaces |
+|-------|-------------|----------|
+| `ce:plan-beta` | Decision-first planning focused on boundaries, sequencing, and verification | `ce:plan` |
+| `deepen-plan-beta` | Selective stress-test that targets weak sections with research | `deepen-plan` |
+
+To test: invoke `/ce:plan-beta` or `/deepen-plan-beta` directly. Plans produced by the beta skills are compatible with `/ce:work`.
+
 ### Image Generation
 
 | Skill | Description |