🕷️ CrawlMind

AI-Powered Web Crawling & Research Platform

Paste a URL. Describe your research. Let AI do the rest.

CrawlMind combines Cloudflare's crawl infrastructure with AI-powered URL discovery and multi-hop research synthesis — turning any query into structured, crawled knowledge.

Getting Started · Features · Architecture · Deploy

✨ Features

Core Crawling

Smart Input — Auto-detects URLs vs. natural language; just paste or type
Cloudflare-Powered — Fast, reliable crawling via Cloudflare's Browser Rendering API
Multi-Format Output — Markdown, HTML, plaintext, or cleaned readable HTML
JS Rendering — Crawl JavaScript-heavy SPAs with headless rendering
Advanced Controls — Depth, page limits, subdomain inclusion, URL patterns, date filters

🧠 AI Discovery (New)

AI URL Discovery — Describe what you need; Groq finds the best sources to crawl
Depth Tiers — Quick (~30s), Deep Dive (~2min), or Multi-hop Research (~5min)
Multi-Hop Research — Crawl → analyze gaps → discover follow-up sources → repeat (up to 3 rounds)
AI Synthesis — NVIDIA NIM generates a comprehensive research report from all crawled data
Parent-Child Jobs — Research jobs manage multiple sub-crawls independently, no interference with normal crawls

Platform

AI Chat — Ask questions about crawl results with full context awareness
Soft-Delete Library — Archive, restore, and manage past crawls
Analytics Dashboard — Track crawl usage, search patterns, and AI queries
Plan-Based Limits — Tiered pricing with Stripe integration
Auth — GitHub, Google, and email sign-in via Better Auth

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        USER INPUT                               │
│         URL / Natural Language / AI Discovery Toggle             │
└─────────────┬──────────────────────────┬────────────────────────┘
              │                          │
        URL detected              AI Discovery ON
              │                          │
              ▼                          ▼
    ┌─────────────────┐     ┌──────────────────────────┐
    │  POST /api/crawl │     │   POST /api/research     │
    │  Normal Pipeline │     │   AI Research Pipeline   │
    └────────┬────────┘     └────────────┬─────────────┘
             │                           │
             ▼                           ▼
    ┌─────────────────┐     ┌──────────────────────────┐
    │ Cloudflare Crawl │     │ Groq: Discover URLs      │
    │ Single Job       │     │ (llama-3.3-70b-versatile)│
    └────────┬────────┘     └────────────┬─────────────┘
             │                           │
             │                           ▼
             │              ┌──────────────────────────┐
             │              │ Spawn Parallel Sub-Crawls │
             │              │ via Cloudflare Crawl API  │
             │              └────────────┬─────────────┘
             │                           │
             │              ┌────────────▼─────────────┐
             │              │ RESEARCH tier only:       │
             │              │ NIM Gap Analysis →        │
             │              │ Follow-up Crawls (×3)     │
             │              └────────────┬─────────────┘
             │                           │
             │                           ▼
             │              ┌──────────────────────────┐
             │              │ NIM: Synthesis Report     │
             │              │ (nemotron-super-49b)      │
             ▼              └────────────┬─────────────┘
    ┌─────────────────┐                  │
    │  Neon PostgreSQL │◄────────────────┘
    │  (Prisma ORM)    │
    └─────────────────┘

🛠️ Tech Stack

Layer	Technology	Purpose
Framework	Next.js 15 (App Router)	Full-stack React with server components
Database	Neon PostgreSQL + Prisma	Serverless Postgres with type-safe ORM
Auth	Better Auth	GitHub, Google, email authentication
Crawling	Cloudflare Crawl API	Browser rendering + web crawling at scale
AI — Fast	Groq (`llama-3.3-70b`)	URL discovery (~200ms responses)
AI — Deep	NVIDIA NIM (`nemotron-super-49b`)	Gap analysis + synthesis reports
AI Chat	Vercel AI SDK	Streaming chat over crawl results
Payments	Stripe	Subscription billing + webhooks
Styling	Tailwind CSS + shadcn/ui	Utility-first CSS + accessible components
Deployment	Vercel	Edge-optimized serverless hosting

🚀 Getting Started

Prerequisites

Bun v1.0+
Neon PostgreSQL database
Cloudflare account with Crawl API access
Groq API key (for AI URL discovery)
NVIDIA NIM API key (for synthesis)

Quick Start

# Clone
git clone https://github.com/pantha704/CrawlMind.git
cd CrawlMind

# Install
bun install

# Configure
cp .env.example .env.local
# Edit .env.local with your keys (see below)

# Database
bunx prisma db push
bunx prisma generate

# Run
bun run dev

Environment Variables

# Database (Neon)
DATABASE_URL=postgresql://...

# Auth
BETTER_AUTH_SECRET=your-secret
BETTER_AUTH_URL=http://localhost:3001
GITHUB_CLIENT_ID=...
GITHUB_CLIENT_SECRET=...
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...

# Cloudflare
CLOUDFLARE_API_TOKEN=...
CLOUDFLARE_ACCOUNT_ID=...

# AI
GROQ_API_KEY=...          # For URL discovery (Groq)
NVIDIA_NIM_API_KEY=...    # For synthesis (NVIDIA NIM)

# Stripe
STRIPE_SECRET_KEY=...
STRIPE_WEBHOOK_SECRET=...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=...

# App
NEXT_PUBLIC_APP_URL=http://localhost:3001

📁 Project Structure

src/
├── app/
│   ├── api/
│   │   ├── crawl/              # Crawl CRUD, results proxy, cancel
│   │   ├── research/           # AI Discovery — create, poll, active
│   │   ├── chat/               # AI chat endpoint
│   │   ├── stripe/             # Payment webhooks
│   │   └── user/               # Usage tracking & settings
│   ├── dashboard/
│   │   ├── page.tsx            # Main dashboard
│   │   ├── jobs/               # Crawl job list + detail
│   │   ├── research/           # AI research detail page
│   │   ├── chat/               # AI chat interface
│   │   ├── library/            # Archived results
│   │   └── analytics/          # Usage analytics
│   ├── pricing/                # Pricing page
│   └── (auth)/                 # Sign in / sign up
├── components/
│   ├── dashboard/              # Dashboard UI (crawl-input, active-jobs, etc.)
│   ├── landing/                # Landing page components
│   └── ui/                     # shadcn/ui primitives
└── lib/
    ├── auth.ts                 # Better Auth config
    ├── cloudflare.ts           # Cloudflare Crawl API client
    ├── research.ts             # AI Discovery — Groq + NIM integration
    ├── ai.ts                   # AI model configuration
    ├── prisma.ts               # Prisma client
    └── stripe.ts               # Stripe client

🧠 AI Discovery — How It Works

Tier	What Happens	Sources	Time
⚡ Quick	AI finds 3-5 relevant sources, crawls them	3-5	~30s
🔍 Deep Dive	AI discovers 10-15 categorized sources	10-15	~2min
🧠 Research	Multi-hop: crawl → gap analysis → follow-up crawls (×3 rounds) → synthesis	15-30+	~5min

Models used:

Groq (llama-3.3-70b-versatile) — Fast URL discovery (~200ms)
NVIDIA NIM (nemotron-super-49b-v1.5) — Deep analysis & comprehensive synthesis

💳 Pricing Tiers

Plan	Price	Crawls/day	Pages/crawl	AI Chat	JS Render
Spark	Free	2	30	3 queries	❌
Pro	$12/mo	25	500	Unlimited	✅
Pro+	$24/mo	75	1,000	Unlimited	✅
Scale	$39/mo	150	5,000	Unlimited	✅

🚢 Deploy

Vercel (Recommended)

Push to GitHub
Import in Vercel
Add all environment variables
Set NEXT_PUBLIC_APP_URL to your Vercel domain
Deploy

Note: Ensure NEXT_PUBLIC_APP_URL points to your deployed domain (not localhost) for webhooks and auth callbacks.

📄 License

MIT — see LICENSE for details.

Built with ☕ and curiosity

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
docs		docs
prisma		prisma
public		public
scripts		scripts
src		src
.antigravityignore		.antigravityignore
.gitignore		.gitignore
README.md		README.md
bun.lock		bun.lock
components.json		components.json
eslint.config.mjs		eslint.config.mjs
next.config.ts		next.config.ts
package.json		package.json
postcss.config.mjs		postcss.config.mjs
prisma.config.ts		prisma.config.ts
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🕷️ CrawlMind

AI-Powered Web Crawling & Research Platform

✨ Features

Core Crawling

🧠 AI Discovery (New)

Platform

🏗️ Architecture

🛠️ Tech Stack

🚀 Getting Started

Prerequisites

Quick Start

Environment Variables

📁 Project Structure

🧠 AI Discovery — How It Works

💳 Pricing Tiers

🚢 Deploy

Vercel (Recommended)

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🕷️ CrawlMind

AI-Powered Web Crawling & Research Platform

✨ Features

Core Crawling

🧠 AI Discovery (New)

Platform

🏗️ Architecture

🛠️ Tech Stack

🚀 Getting Started

Prerequisites

Quick Start

Environment Variables

📁 Project Structure

🧠 AI Discovery — How It Works

💳 Pricing Tiers

🚢 Deploy

Vercel (Recommended)

📄 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages