
# InspectAI

AI-powered code review assistant — CodeT5 + CodeBERT + FAISS RAG + MCP Orchestration

Phase 1 (Refined) | April 2026 | Research Prototype → Structured Codebase


## What It Does

InspectAI analyzes pull request diffs and generates actionable code review comments:

```
PR Diff → [CodeT5 + RAG] → Review Comment + [CodeBERT] → Severity Label → GitHub Comment
                                                           └─ If critical/major → Jira Ticket
```
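The flow above can be sketched end to end. This is a minimal illustration with hypothetical stage functions standing in for the real CodeT5/CodeBERT components; it is not InspectAI's actual module API:

```python
# Sketch of the review pipeline above. The stage functions are placeholders
# for the real generator/classifier, not InspectAI's actual code.
from dataclasses import dataclass


@dataclass
class Review:
    comment: str
    severity: str        # one of: critical, major, minor, style, nit
    needs_jira: bool


def generate_comment(diff: str, use_rag: bool = True) -> str:
    # Placeholder for CodeT5 generation (optionally RAG-augmented).
    if "eval(" in diff:
        return "Avoid eval() on untrusted input; it allows arbitrary code execution."
    return "Consider adding a docstring and input validation."


def classify_severity(comment: str) -> str:
    # Placeholder for the CodeBERT 5-class severity classifier.
    return "critical" if "eval(" in comment else "minor"


def review_diff(diff: str) -> Review:
    comment = generate_comment(diff)
    severity = classify_severity(comment)
    # Per the diagram: only critical/major reviews also open a Jira ticket.
    return Review(comment, severity, needs_jira=severity in {"critical", "major"})
```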

## Quick Start

```bash
# 1. Setup
git clone <repo> && cd InspectAI
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,train]"
cp .env.example .env    # Fill in INSPECTAI_GITHUB_TOKEN

# 2. Collect & train (first time)
make collect         # Fetch PR reviews from GitHub
make preprocess      # Parse diffs
make split           # Train/valid split
make train-codet5    # Fine-tune CodeT5
make train-codebert  # Fine-tune CodeBERT classifier
make index           # Build FAISS index

# 3. Serve
make serve

# 4. Test
curl -X POST http://localhost:8000/review \
  -H "Content-Type: application/json" \
  -d '{"diff": "+def foo(x):\n+    eval(x)", "use_rag": true}'
```
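The same request can be made from Python with only the standard library. This is a hypothetical client sketch: the request fields (`diff`, `use_rag`) and URL come from this README, but nothing is assumed about the server's response schema beyond it being JSON:

```python
# Hypothetical stdlib-only client for the /review endpoint exercised by the
# curl command above.
import json
import urllib.request

API_URL = "http://localhost:8000/review"  # default `make serve` address


def build_payload(diff: str, use_rag: bool = True) -> bytes:
    # Same JSON body as the curl example in the Quick Start.
    return json.dumps({"diff": diff, "use_rag": use_rag}).encode("utf-8")


def request_review(diff: str, use_rag: bool = True) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=build_payload(diff, use_rag),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # needs `make serve` running
        return json.loads(resp.read())
```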

## Architecture

```
src/
├── core/          # Shared: config, embeddings, retrieval, models, logger
├── data/          # ETL: collect → preprocess → split
├── training/      # Model training: CodeT5, CodeBERT, FAISS index, evaluation
├── inference/     # Hot path: generator, classifier, pipeline
├── integrations/  # External APIs: GitHub, Jira, static analysis
├── mcp/           # Orchestration: MCP workflow
├── feedback/      # Feedback loop (Phase 2)
└── api/           # FastAPI: routes, schemas, middleware
```

Key design decisions are documented in `docs/architecture.md`.


## API

| Endpoint | Description |
|---|---|
| `GET /health` | Service status + model info |
| `POST /review` | Submit diff → get review + severity |
| `POST /feedback` | Accept/reject feedback (Phase 2) |

Full API reference: docs/api_reference.md


## Models

| Model | Role | Parameters |
|---|---|---|
| `Salesforce/codet5-small` | Review generation (seq2seq) | 60M |
| `microsoft/codebert-base` | Severity classification (5 classes) | 125M |
| `microsoft/codebert-base` | Text embedding for FAISS RAG | 125M |
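The embedding row above feeds the FAISS RAG index. A toy sketch of the retrieval step FAISS performs, written in pure Python (no `faiss` dependency) with made-up vectors standing in for CodeBERT embeddings:

```python
# Toy sketch of RAG retrieval: inner-product search over stored embeddings.
# With L2-normalized vectors, inner product equals cosine similarity, which
# is what a FAISS IndexFlatIP computes over normalized inputs.
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def top_k(query, corpus, k=2):
    # Score every stored embedding against the query, return best-k indices.
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(doc))), i)
        for i, doc in enumerate(corpus)
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```

In the real pipeline the retrieved neighbors (past review examples) would be prepended to the CodeT5 prompt as context.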

### Severity Classification

| Label | Triggers Jira | Meaning |
|---|---|---|
| `critical` | ✅ | Security issues, data loss, crashes |
| `major` | ✅ | Logic bugs, performance issues |
| `minor` | | Edge cases, missing validation |
| `style` | | PEP 8, naming, formatting |
| `nit` | | Optional suggestions |
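Decoding the classifier head and the Jira rule can be sketched as follows; the label ordering is an assumption for illustration, not the model's confirmed `id2label` mapping:

```python
# Sketch: map classifier output scores to the 5 severity labels above.
# The index order of LABELS is assumed, not taken from the trained model.
LABELS = ["critical", "major", "minor", "style", "nit"]


def severity_from_logits(logits):
    # argmax over the classifier's 5 output scores
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]


def triggers_jira(label: str) -> bool:
    # Per the table: only critical/major open a Jira ticket.
    return label in {"critical", "major"}
```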

## Documentation

| Document | Contents |
|---|---|
| `docs/MASTER_KNOWLEDGE.md` | Start here: complete project reference |
| `docs/architecture.md` | Component internals and data flow |
| `docs/api_reference.md` | Endpoint documentation with examples |
| `docs/research_notes.md` | Paper plan, experiments, venues |
| `docs/PHASE2_PLAN.md` | What to build next |

## Research

InspectAI is a research project targeting publication at MSR/SANER/ICSE workshops.

Core contribution: to our knowledge, the first unified system combining code review generation, severity classification, RAG-augmented context, static analysis signals, and a developer feedback loop.

Novel metric: Actionability Score — measures how actionable a review comment is (implemented in src/training/evaluate.py).

See docs/research_notes.md for paper plan, experiment designs, and target venues.


## Phase Status

| Phase | Status | Description |
|---|---|---|
| Phase 0 | ✅ Complete | Original flat prototype (archived) |
| Phase 1 | Current | Refactored, modular, documented codebase |
| Phase 2A | ⬜ Next | Code hygiene (delete dead code, ruff, real .gitignore) |
| Phase 2B | ⬜ | Evaluation framework (BLEU, ROUGE, Actionability) |
| Phase 2C | ⬜ | Model upgrades (codet5-base, real severity annotations) |
| Phase 2D | ⬜ | Real integrations (GitHub webhook, Jira, static analysis) |
| Phase 2E | ⬜ | Feedback loop (SQLite store, active learning retraining) |

## About

InspectAI is an experimental project aimed at building an AI-powered code review assistant. The goal is to integrate AI models + static analysis + Jira to help developers catch issues early, reduce CI/CD failures, and make reviews more efficient.
