This project demonstrates a Minimum Viable Product (MVP) of an "Autonomous Software Factory" using PocketFlow for workflow orchestration, Streamlit for a web-based GUI, and OpenAI LLMs for various AI-driven tasks in a simplified software development lifecycle.
The application allows a user to describe a Python function. A series of AI agents then attempt to:
- Understand and plan the function.
- Generate Python code.
- Design and execute test cases against the generated code.
- Validate the code against basic rules.
- Allow the user to review, approve, or reject the code, providing feedback for refinement.
- Iteratively refine the code based on feedback or test/validation failures.
- Package the final approved code.
- Streamlit GUI: Interactive web interface for user input and feedback.
- PocketFlow Orchestration: Core SDLC logic (planning, coding, testing, critique, refinement) managed by PocketFlow nodes and flows.
- AI Agents (LLM-Powered):
- Architect/Planner Agent: Makes high-level decisions (Python/standard lib for MVP) and refines user requests into actionable plans or asks clarifying questions.
- Developer Agent: Generates and refines Python code based on plans and feedback.
- Test Case Designer Agent: Creates basic test cases for the planned function.
- QA Agent: Executes generated test cases against the code using a `code_tester_tool`.
- Validation Agent: Checks code against basic project standards.
- Critique Agent: Provides feedback to the Developer Agent if tests or validation fail, or if the user rejects the code.
- Security/Compliance Agent: Checks for basic security and compliance issues.
- Human-in-the-Loop (HITL):
- Initial requirement specification.
- Clarification responses if the AI planner is unsure.
- Review, approval, or rejection (with feedback) of generated code.
- Iterative Refinement: The system can loop through critique and code regeneration up to a configurable number of times.
- RAG Context: Simple file-based RAG for providing guidelines to agents (architectural, planning, coding, validation, debugging, security).
- SQLite Persistence: Task progress, generated code, test results, and feedback are stored in an SQLite database.
- Dockerized: The application is containerized for easy setup and consistent execution.
```mermaid
flowchart TD
    A([User Input]) --> B(Architect/Planner Node)
    B -->|Clear| C(Test Case Designer Node)
    B -->|Needs Clarification| D[User Clarification]
    D --> B
    C --> E(Developer Node)
    E --> F(QA Node)
    F -->|All Tests Pass| G(Validation Node)
    F -->|Test Fails| H(Critique Node)
    G -->|Validation Pass| I[User Review]
    G -->|Validation Fail| H
    I -->|Approve| J(Package Node)
    I -->|Reject| H
    H --> E
    J --> K([Done])
```
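The routing in the flowchart above can be sketched as a plain-Python loop. This is purely illustrative: the real control flow lives in PocketFlow nodes and flows, and the handler names below are hypothetical.

```python
def run_factory(handlers, max_refinements=3):
    """Illustrative routing loop mirroring the flowchart.

    `handlers` maps stage names to callables; boolean returns from
    plan/qa/validate/review pick the branch taken in the diagram.
    """
    state = "plan"
    while True:
        if state == "plan":
            # Architect/Planner: a clear plan, or clarification questions
            state = "design_tests" if handlers["plan"]() else "clarify"
        elif state == "clarify":
            handlers["clarify"]()          # User Clarification (HITL)
            state = "plan"
        elif state == "design_tests":
            handlers["design_tests"]()     # Test Case Designer
            state = "develop"
        elif state == "develop":
            handlers["develop"]()          # Developer generates/refines code
            state = "qa"
        elif state == "qa":
            state = "validate" if handlers["qa"]() else "critique"
        elif state == "validate":
            state = "review" if handlers["validate"]() else "critique"
        elif state == "review":
            state = "package" if handlers["review"]() else "critique"
        elif state == "critique":
            max_refinements -= 1
            if max_refinements < 0:
                return "failed"            # refinement budget exhausted
            handlers["critique"]()         # feedback for the Developer
            state = "develop"
        elif state == "package":
            handlers["package"]()
            return "done"
```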
```
pocketflow_sft_dev_app/
├── app.py                # Streamlit UI and main logic
├── nodes.py              # PocketFlow Node definitions
├── flow.py               # PocketFlow Flow definitions
├── utils/
│   ├── __init__.py
│   ├── call_llm.py
│   ├── tools.py
│   ├── prompts.py
│   └── database.py
├── rag_contexts/         # Text files for RAG
│   ├── architectural_principles.txt
│   # ... (other .txt files)
├── database/             # SQLite database will be created here
│   └── (sdlc_tasks.db)   # (created at runtime if not volume-mapped)
├── output_artifacts/     # (Optional) For saving final packaged code
├── Dockerfile
├── requirements.txt
└── README.md             # This file
```
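As a rough sketch, `utils/call_llm.py` might wrap the OpenAI chat API like this. The signature is an assumption (the real module may differ); the client is injectable so the sketch can be exercised without network access.

```python
import os

def call_llm(prompt, model=None, client=None, system=None):
    """Minimal LLM wrapper (sketch, not the project's actual code).

    `client` defaults to the official OpenAI SDK client; passing a
    stub object with the same shape keeps the function testable offline.
    """
    if client is None:
        from openai import OpenAI  # requires OPENAI_API_KEY in the env
        client = OpenAI()
    # Fall back to the Developer model default documented below
    model = model or os.getenv("DEVELOPER_LLM_MODEL", "gpt-3.5-turbo")
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```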
- Prerequisites:
  - Docker and Docker Compose installed and running.
  - An OpenAI API key.
- Clone/Download Files: Ensure all project files are in a directory (e.g., `autonomous-software-factory-design`).
- Configure Secrets: Create a `.env` file in the project root with your secrets (do not commit this file):

  ```
  OPENAI_API_KEY=sk-your_actual_openai_api_key
  # You can add other environment variables here if needed
  ```

- Build and Start the Application: From the project root, run `docker compose up --build`. This uses the provided `docker-compose.yml` and `Dockerfile` to build and run the app. Your code is mounted into the container for live reload during development.
- Persist the SQLite Database (Default): The database is stored in the `database/` directory and is persisted by default via the volume mapping in `docker-compose.yml`.
- Access the Application: Open your web browser and navigate to `http://localhost:8501`.
Environment variables can be set in your `.env` file or overridden in `docker-compose.yml`:

- `OPENAI_API_KEY` (Required): Your OpenAI API key.
- `ARCHITECT_LLM_MODEL` (Default: `gpt-4o`): Model for the Architect/Planner.
- `PLANNER_LLM_MODEL` (Default: `gpt-4o`): Model for the Planner.
- `DEVELOPER_LLM_MODEL` (Default: `gpt-3.5-turbo`): Model for the Developer.
- `TEST_DESIGNER_LLM_MODEL` (Default: `gpt-3.5-turbo`): Model for the Test Case Designer.
- `QA_LLM_MODEL` (Default: `gpt-4o`): Model for the QA Agent (tool use).
- `VALIDATION_LLM_MODEL` (Default: `gpt-3.5-turbo`): Model for the Validation Agent.
- `CRITIQUE_LLM_MODEL` (Default: `gpt-4o-mini`): Model for the Critique Agent.
- `MAX_PLANNER_ITERATIONS` (Default: `2`): Max times the planner will ask for clarification.
- `MAX_REFINEMENTS` (Default: `3`): Max times the developer will refine code after rejection/failures.
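These settings could be resolved with `os.getenv` against the documented defaults, e.g. (a sketch; the real code may centralize configuration differently):

```python
import os

# Documented defaults for each configurable setting (values as strings,
# matching what the environment would provide).
DEFAULTS = {
    "ARCHITECT_LLM_MODEL": "gpt-4o",
    "PLANNER_LLM_MODEL": "gpt-4o",
    "DEVELOPER_LLM_MODEL": "gpt-3.5-turbo",
    "TEST_DESIGNER_LLM_MODEL": "gpt-3.5-turbo",
    "QA_LLM_MODEL": "gpt-4o",
    "VALIDATION_LLM_MODEL": "gpt-3.5-turbo",
    "CRITIQUE_LLM_MODEL": "gpt-4o-mini",
    "MAX_PLANNER_ITERATIONS": "2",
    "MAX_REFINEMENTS": "3",
}

def get_setting(name):
    """Return the environment override if set, else the documented default."""
    return os.getenv(name, DEFAULTS[name])
```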
The application uses Streamlit to manage different UI stages. Each stage might trigger a PocketFlow Flow composed of several Nodes.
1. Input Requirements: User provides a description.
2. Planning: `ArchitectPlannerNode` processes the request. If clear, it creates a plan. If ambiguous, it generates clarification questions.
3. Clarification (HITL): If questions are generated, the UI prompts the user. The refined request is fed back to the `ArchitectPlannerNode`. This loop continues until the plan is clear or max iterations are hit.
4. Test Design & Code Generation: Once a plan is ready, `TestCaseDesignerNode` generates test cases. Then, `DeveloperNode` generates the initial Python code.
5. Automated Testing & Validation: `QANode` executes each test case using `code_tester_tool`. `ValidationNode` checks the code against predefined rules. `SecurityComplianceNode` checks for security/compliance issues.
6. Human Review (HITL): The generated code, test results, and validation feedback are presented to the user.
   - Approve: The task moves to completion.
   - Reject: The user provides feedback.
7. Critique & Refinement: If rejected (or if tests/validation failed), `CritiqueNode` analyzes the issues and user feedback. The `DeveloperNode` then attempts to refine the code. This loop (back to step 5) continues until approval or max refinements.
8. Packaging/Completion: If approved, `PackageNode` prepares the final output. If max refinements/planning iterations are hit, the process ends with a failure message.
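The `code_tester_tool` used during automated testing could work along these lines. This is a hypothetical sketch (the actual tool in `utils/tools.py` may differ), and running `exec` on LLM output like this is only tolerable inside a sandboxed container.

```python
def code_tester_tool(code, function_name, test_cases):
    """Run each test case against generated code (sketch).

    test_cases: list of {"args": [...], "expected": ...} dicts.
    Returns one result dict per case. UNSAFE outside a sandbox:
    `exec` runs arbitrary generated code.
    """
    namespace = {}
    try:
        exec(code, namespace)  # define the generated function
    except Exception as e:
        return [{"passed": False, "error": f"code failed to load: {e}"}]
    func = namespace.get(function_name)
    if not callable(func):
        return [{"passed": False, "error": f"{function_name} not defined"}]
    results = []
    for case in test_cases:
        try:
            actual = func(*case["args"])
            results.append({"passed": actual == case["expected"],
                            "actual": actual})
        except Exception as e:
            results.append({"passed": False, "error": str(e)})
    return results
```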
SQLite is used to store the state of each task, including all generated artifacts and feedback, allowing for persistence.
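A minimal version of that persistence layer might look like this; the real schema in `utils/database.py` is likely richer, and the table and column names here are assumptions.

```python
import sqlite3

def init_db(path=":memory:"):
    """Create a minimal task table (sketch; the real schema may differ)."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        requirement TEXT,
        status TEXT,
        code TEXT,
        test_results TEXT,
        feedback TEXT)""")
    return conn

def save_task(conn, requirement, status="planning", code=None):
    """Insert a new task row and return its id."""
    cur = conn.execute(
        "INSERT INTO tasks (requirement, status, code) VALUES (?, ?, ?)",
        (requirement, status, code))
    conn.commit()
    return cur.lastrowid
```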
If you see errors like `KeyError: 'utils'` or `KeyError: 'nodes'` in Docker logs, this is almost always a Python import/module path issue in Docker:

- Ensure all imports in your code are absolute (e.g., `from utils.prompts import ...`), not relative imports.
- Add `ENV PYTHONPATH=/app` to your Dockerfile (assuming your code is in `/app` in the container).
- Make sure you run Streamlit from the project root (`WORKDIR /app`).
- Rebuild your Docker image after making these changes.

Dockerfile best practice for import issues in Streamlit: add this to ensure absolute imports work in Docker/Streamlit:

```
ENV PYTHONPATH=/app
```

And make sure your Dockerfile has:

```
WORKDIR /app
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0", "--server.runOnSave=true"]
```
- More sophisticated RAG using LlamaIndex or similar.
- Support for more complex software (multiple files, classes, dependencies).
- Advanced security (SAST/DAST) and compliance checks.
- Visual graph of the PocketFlow execution in Streamlit.
- Ability to load and resume previous tasks.