A high-performance, cross-platform market data collection system for real-time and historical market microstructure data.
Status: Development / Pilot Ready
Market Data Collector is a complete solution for building your own market data archive. It connects to financial data providers, captures market data in real-time, and stores everything locally so you have full ownership and offline access to your data.
| Capability | What It Does For You |
|---|---|
| 📡 Real-Time Streaming | Capture live trades, quotes, and order book depth as they happen from Interactive Brokers, Alpaca, NYSE, Polygon, or StockSharp |
| 📥 Historical Backfill | Download years of historical price data from 10+ providers (Yahoo Finance, Tiingo, Polygon, Alpaca, and more) with automatic failover |
| 💾 Local Data Storage | Own your data—everything is stored in structured JSONL or Parquet files on your machine, not locked in a vendor's cloud |
| 🔍 Data Quality Monitoring | Automatic validation catches missing data, sequence gaps, and anomalies before they corrupt your analysis |
| 📦 Data Packaging | Export and package your data for sharing, backup, or use in other tools |
| 📊 Live Dashboards | Monitor collection status, throughput, and data quality through a web dashboard or Windows desktop app |
| 🔬 Backtesting Integration | Feed your collected data directly into QuantConnect Lean for algorithmic strategy development |
- Quantitative researchers who need tick-level market microstructure data for analysis
- Algorithmic traders building strategies that require historical and real-time market data
- Data engineers who want to build a reliable market data pipeline
- Hobbyist traders who want to collect and own their own market data archive
- Students and academics studying market microstructure, price formation, or trading systems
Commercial market data is expensive, vendor APIs change without notice, and cloud-only solutions mean you never truly own your data. Market Data Collector gives you:
- Data independence — Switch providers without losing your archive or rewriting code
- Cost control — Use free-tier APIs strategically, pay only for premium data you actually need
- Reliability — Automatic reconnection, failover between providers, and data integrity checks
- Flexibility — Collect exactly the symbols and data types you need, store them how you want
Use the installation orchestrator script for all setups. It keeps Docker and native installs consistent across platforms.
# Interactive installer (Docker or Native)
./scripts/install/install.sh
# Or choose a mode explicitly
./scripts/install/install.sh --docker
./scripts/install/install.sh --nativeAccess the dashboard at http://localhost:8080
The PowerShell installer mirrors the same workflow on Windows.
# Interactive installation
.\scripts\install\install.ps1
# Or specify mode directly
.\scripts\install\install.ps1 -Mode Docker
.\scripts\install\install.ps1 -Mode NativeMake targets remain available as thin wrappers around the installer and runtime commands.
make help
make docker
make run-ui
make test
make doctorDesktop-focused helpers:
make desktop-dev-bootstrap # Validate desktop development environment (PowerShell)
make build-wpf # Build WPF desktop app (Windows only)
make test-desktop-services # Run desktop-focused tests (includes WPF service tests on Windows)WPF Desktop App (Recommended) - Modern WPF application with maximum Windows stability
- Works on Windows 7+
- Simple .exe deployment
- Direct assembly references (no WinRT limitations)
- See WPF README for details
Installation:
# Build from source
dotnet build src/MarketDataCollector.Wpf/MarketDataCollector.Wpf.csproj -c Release
# Run
dotnet run --project src/MarketDataCollector.Wpf/MarketDataCollector.Wpf.csprojFor contributors working on the WPF desktop application, use the desktop development bootstrap script to validate your environment:
make desktop-dev-bootstrapThis script validates:
- ✅ .NET 9 SDK installation
- ✅ Windows SDK presence (Windows only)
- ✅ Visual Studio Build Tools
- ✅ XAML tooling support
- ✅ Desktop project restore and smoke build
The script provides actionable fix messages for any missing components.
The repository includes comprehensive tests for desktop services:
# Run all desktop-focused tests (platform-aware)
make test-desktop-services
# On Windows, this runs:
# - WPF service tests (NavigationService, ConfigService, StatusService, ConnectionService)
# - Desktop UI service tests (ApiClientService, BackfillService, FixtureDataService, etc.)
# - Integration tests for desktop-specific functionality
# On non-Windows platforms, runs available integration tests onlyTest Projects:
tests/MarketDataCollector.Wpf.Tests- 58 tests for WPF singleton services- NavigationServiceTests (14 tests)
- ConfigServiceTests (13 tests)
- StatusServiceTests (13 tests)
- ConnectionServiceTests (18 tests)
tests/MarketDataCollector.Ui.Tests- 71 tests for desktop UI services- ApiClientServiceTests, BackfillServiceTests, FixtureDataServiceTests
- FormValidationServiceTests, SystemHealthServiceTests, WatchlistServiceTests
- BoundedObservableCollectionTests, CircularBufferTests
# Build WPF application (Windows only)
make build-wpfTroubleshooting install
- Missing signing certificate: Install the signing cert or use a release-signed package; unsigned MSIX packages cannot be installed.
- Blocked package: If Windows SmartScreen blocks the installer, choose More info → Run anyway, or unblock the downloaded file in Properties.
Market Data Collector is built on .NET 9.0 using C# 13 and F# 8.0 across 635 source files. It uses a modular, event-driven architecture with bounded channels for high-throughput data processing. The system supports deployment as a single self-contained executable, a Docker container, or a systemd service.
- Multi-provider ingest: Interactive Brokers (L2 depth, tick-by-tick), Alpaca (WebSocket streaming), NYSE (direct feed), Polygon (aggregates), StockSharp (multi-exchange)
- Historical backfill: 10+ providers (Yahoo Finance, Stooq, Tiingo, Alpha Vantage, Finnhub, Nasdaq Data Link, Polygon, IB) with automatic failover
- Provider-agnostic architecture: Swap feeds without code changes and preserve stream IDs for reconciliation
- Microstructure detail: Tick-by-tick trades, Level 2 order book, BBO quotes, and order-flow statistics
- High-performance pipeline: Bounded channel architecture (default 50,000 events) with configurable backpressure
- Integrity validation: Sequence checks and order book integrity enforcement with dedicated event emission
- Hot configuration reload: Apply subscription changes without restarting the collector
- Graceful shutdown: Flushes all events and metrics before exit
- Flexible JSONL storage: Default BySymbol naming convention
{root}/{symbol}/{type}/{date}.jsonlfor optimal organization (also supports ByDate, ByType, Flat) with optional gzip compression - Partitioning and retention: Daily/hourly/monthly/none plus retention by age or total capacity
- Data replay: Stream historical JSONL files for backtesting and research
- Web dashboard: Modern HTML dashboard for live monitoring, integrity event tracking, and backfill controls
- Native Windows app: WPF desktop application for Windows-only configuration and monitoring
- Metrics and status: Prometheus metrics at
/metrics, JSON status at/status, HTML dashboard at/ - Logging: Structured logging via Serilog with ready-to-use sinks
- Secure credential management: Windows CredentialPicker integration for API keys and secrets
- Credential protection:
.gitignoreexcludes sensitive configuration files from version control - Environment variable support: API credentials via environment variables for production deployments
- Monolithic core: Simple, maintainable architecture with optional UI components
- Provider abstraction: Clean interfaces for adding new data providers
- ADR compliance: Architecture Decision Records for consistent design patterns
For first-time users, the interactive wizard guides you through setup:
# Clone the repository
git clone https://github.com/rodoHasArrived/Market-Data-Collector.git
cd Market-Data-Collector
# Run the interactive configuration wizard (recommended for new users)
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- --wizardThe wizard will:
- Detect available data providers from your environment
- Guide you through provider selection and configuration
- Set up symbols, storage, and backfill options
- Generate your
appsettings.jsonautomatically
If you have environment variables set, use quick auto-configuration:
# Auto-detect providers from environment variables
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- --auto-config
# Check what providers are available
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- --detect-providers
# Validate your API credentials
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- --validate-credentials# Clone the repository
git clone https://github.com/rodoHasArrived/Market-Data-Collector.git
cd Market-Data-Collector
# Copy the sample settings and edit as needed
cp config/appsettings.sample.json config/appsettings.json
# Option 1: Launch the web dashboard (serves HTML + Prometheus + JSON status)
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- --ui --watch-config --http-port 8080
# Run smoke test (no provider connectivity required)
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj
# Run self-tests
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- --selftest
# Historical backfill with overrides
dotnet run --project src/MarketDataCollector/MarketDataCollector.csproj -- \
--backfill --backfill-provider stooq --backfill-symbols SPY,AAPL \
--backfill-from 2024-01-01 --backfill-to 2024-01-05Access the monitoring dashboard at http://localhost:8080, JSON status at http://localhost:8080/status, and Prometheus metrics at http://localhost:8080/metrics.
Comprehensive documentation is available in the docs/ directory:
- docs/README.md - Product overview, CLI/UI usage, and configuration highlights
- docs/HELP.md - Comprehensive user guide with troubleshooting and FAQ
- docs/getting-started/README.md - End-to-end setup for local development
- docs/HELP.md#configuration - Detailed explanation of every setting including backfill
- docs/architecture/overview.md - System architecture and design
- docs/operations/operator-runbook.md - Operations guide and production deployment
- docs/architecture/domains.md - Event contracts and domain models
- docs/architecture/c4-diagrams.md - System diagrams
- docs/integrations/lean-integration.md - QuantConnect Lean integration guide and examples
- docs/architecture/storage-design.md - Advanced storage organization and data management strategies
- docs/development/github-actions-summary.md - Build observability toolkit and CLI reference
- CLAUDE.md - Main AI assistant guide
- docs/ai/ - Specialized guides for subsystems (F#, providers, storage, testing)
- Interactive Brokers - L2 market depth, tick-by-tick trades, quotes via IB Gateway
- Alpaca - Real-time trades and quotes via WebSocket
- NYSE - Direct NYSE market data feed
- Polygon - Real-time trades, quotes, and aggregates
- StockSharp - Multi-exchange connectors
| Provider | Free Tier | Data Types | Rate Limits |
|---|---|---|---|
| Alpaca | Yes (with account) | Bars, trades, quotes | 200/min |
| Polygon | Limited | Bars, trades, quotes, aggregates | Varies |
| Tiingo | Yes | Daily bars | 500/hour |
| Yahoo Finance | Yes | Daily bars | Unofficial |
| Stooq | Yes | Daily bars | Low |
| StockSharp | Yes (with account) | Various | Varies |
| Finnhub | Yes | Daily bars | 60/min |
| Alpha Vantage | Yes | Daily bars | 5/min |
| Nasdaq Data Link | Limited | Various | Varies |
| Interactive Brokers | Yes (with account) | All types | IB pacing rules |
Configure fallback chains in appsettings.json under Backfill.ProviderPriority for automatic failover between providers.
- Alpaca - Symbol search with exchange info
- Finnhub - Global symbol search
- Polygon - Ticker search with market data
- OpenFIGI - FIGI-based instrument resolution
- StockSharp - Multi-exchange symbol search
Market Data Collector now integrates with QuantConnect's Lean Engine, enabling sophisticated algorithmic trading strategies:
- Custom Data Types: Trade and quote data exposed as Lean
BaseDatatypes - Backtesting Support: Use collected tick data for algorithm backtesting
- Data Provider: Custom
IDataProviderimplementation for JSONL files - Sample Algorithms: Ready-to-use examples for microstructure-aware trading
See src/MarketDataCollector/Integrations/Lean/README.md for integration details and examples.
Market data is stored as newline-delimited JSON (JSONL) files with:
- Configurable naming conventions (by symbol, date, or type)
- Optional gzip compression
- Automatic retention management
- Data integrity events alongside market data
The built-in HTTP server provides:
- Prometheus metrics at
/metrics - JSON status at
/status - Live HTML dashboard at
/ - Health checks at
/health,/ready,/live(Kubernetes-compatible) - Backfill status and controls at
/api/backfill/*
Monitor event throughput, drop rates, integrity events, and pipeline statistics in real-time. Initiate or review historical backfill jobs directly from the dashboard without restarting the collector.
# Production deployment with Docker Compose
make docker
# With monitoring stack (Prometheus + Grafana)
make docker-monitoring
# View logs
make docker-logs
# Health check
curl http://localhost:8080/healthThe application supports Kubernetes-style health probes:
- Liveness:
/liveor/livez - Readiness:
/readyor/readyz - Health:
/healthor/healthz(detailed JSON response)
# Copy service file
sudo cp deploy/systemd/marketdatacollector.service /etc/systemd/system/
# Enable and start
sudo systemctl enable marketdatacollector
sudo systemctl start marketdatacollectorAPI credentials can be set via environment variables:
export ALPACA__KEYID=your-key-id
export ALPACA__SECRETKEY=your-secret-keyThe repository includes 22 GitHub Actions workflows for automated testing, security, and deployment:
- Build & Release - Automated builds, cross-platform releases, and reusable build workflow
- Security - CodeQL analysis, dependency auditing, and vulnerability scanning
- Docker - Container image builds and publishing
- Performance Benchmarks - Track performance metrics over time
- Code Quality - Linting, formatting, and code analysis
- Test Matrix - Multi-platform testing (Windows, Linux, macOS)
- Desktop Builds - WPF desktop application builds
- PR Checks - Pull request validation
- Auto Labeling - Intelligent PR and issue labeling
- Stale Management - Automatic issue/PR lifecycle management
- Build Observability - Build metrics and diagnostic capture
- Documentation - Auto-update docs, AI instruction sync, TODO scanning
- Diagrams - Architecture and UML diagram generation
- Scheduled Maintenance - Automated maintenance tasks
- Nightly - Nightly builds and extended checks
- Workflow Validation - Self-validating workflow correctness
See .github/workflows/README.md for detailed documentation.
# Run full environment health check
make doctor
# Quick environment check
make doctor-quick
# Run build diagnostics
make diagnose
# Show build metrics
make metrics| Issue | Solution |
|---|---|
| Build fails with NETSDK1100 | Ensure EnableWindowsTargeting=true is set in Directory.Build.props |
| Provider connection errors | Run make doctor and verify API credentials with --validate-credentials |
| Missing configuration | Run make setup-config to create appsettings.json from template |
| High memory usage | Check channel capacity settings in EventPipeline configuration |
| Rate limit errors | Review ProviderRateLimitTracker logs and adjust request intervals |
# Collect debug bundle for issue reporting
make collect-debug
# Build with binary log for detailed analysis
make build-binlog
# Validate JSONL data integrity
make validate-dataFor more troubleshooting details, see docs/HELP.md.
Pre-built Docker images are available from GitHub Container Registry:
# Pull the latest image
docker pull ghcr.io/rodohasarrived/market-data-collector:latest
# Run the container
docker run -d -p 8080:8080 \
-v $(pwd)/data:/app/data \
-v $(pwd)/appsettings.json:/app/appsettings.json:ro \
ghcr.io/rodohasarrived/market-data-collector:latest635 source files | 623 C# | 12 F# | 163 test files | 130 documentation files
Market-Data-Collector/
├── .github/ # CI/CD workflows (22), AI prompts, Dependabot
├── docs/ # Documentation (130 files), ADRs, AI assistant guides
├── build/ # Build tooling (Python, Node.js, .NET generators, scripts)
├── deploy/ # Docker, systemd, and monitoring configs
├── config/ # Configuration files (appsettings.json)
├── src/
│ ├── MarketDataCollector/ # Entry point, integrations, web UI server
│ ├── MarketDataCollector.Application/ # Startup, config, services, pipeline, monitoring
│ ├── MarketDataCollector.Core/ # Shared config models, exceptions, logging, serialization
│ ├── MarketDataCollector.Domain/ # Business logic, collectors, events, models
│ ├── MarketDataCollector.Infrastructure/ # Provider implementations, resilience, HTTP
│ ├── MarketDataCollector.Storage/ # Data persistence, archival, export, packaging
│ ├── MarketDataCollector.Contracts/ # Shared DTOs and API contracts
│ ├── MarketDataCollector.ProviderSdk/ # Provider SDK interfaces and attributes
│ ├── MarketDataCollector.FSharp/ # F# domain models and validation (12 files)
│ ├── MarketDataCollector.Ui/ # Web dashboard
│ ├── MarketDataCollector.Ui.Shared/ # Shared UI endpoint handlers
│ ├── MarketDataCollector.Ui.Services/ # Shared UI service abstractions
│ └── MarketDataCollector.Wpf/ # WPF desktop app (Windows)
├── tests/ # C# and F# test projects (163 files)
│ ├── MarketDataCollector.Tests/ # Core unit and integration tests
│ ├── MarketDataCollector.FSharp.Tests/ # F# domain tests
│ ├── MarketDataCollector.Wpf.Tests/ # WPF service tests (Windows)
│ └── MarketDataCollector.Ui.Tests/ # Desktop UI service tests
├── benchmarks/ # Performance benchmarks (BenchmarkDotNet)
├── MarketDataCollector.sln
├── Makefile # Build automation (72 targets)
└── CLAUDE.md # AI assistant guide
See LICENSE file for details.
Contributions are welcome! Please see the CLAUDE.md for architecture details and coding guidelines before submitting pull requests.