Enterprise-grade data lakehouse sync with ML/AI capabilities
Faborite is a comprehensive data platform that combines lakehouse synchronization, intelligent data transformations, machine learning, and AI-powered analytics - all in a single, production-ready package.
- 30+ Connectors: SQL databases, NoSQL stores, cloud storage, streaming platforms, time series databases
- Smart Sampling: Random, stratified, time-based, and custom SQL sampling strategies
- Multi-Format Support: Parquet, Delta Lake, Iceberg, CSV, JSON, Avro
- Real-time Streaming: Kafka, RabbitMQ, Redis Streams, Event Hubs
- AutoML: Automated model selection for classification, regression, and multiclass problems
- 7 ML Algorithms: Classification, regression, anomaly detection, forecasting, clustering, recommendations
- MLflow-style Tracking: Experiment management with metrics, parameters, and artifacts
- Feature Engineering: 8 transformation techniques including normalization, encoding, and binning
- Intelligent Schema Inference: Auto-detect 9 data types with confidence scoring
- Query Optimization: Detect and fix 8 SQL anti-patterns automatically
- PII Detection: Identify and mask sensitive data (Email, SSN, Credit Card, etc.)
- Smart Mapping: Automatic schema mapping with Levenshtein distance matching
- 22 Statistical Functions: Mean, median, percentiles, variance, skewness, kurtosis, etc.
- 12 Window Functions: ROW_NUMBER, RANK, LAG/LEAD, cumulative sums, moving averages
- Pivot/Unpivot: Cross-tabulation, transpose, multi-value pivots
- SQL Engine: DuckDB-powered analytics with lakehouse integration
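The Smart Mapping bullet above names Levenshtein distance as its matching technique. As a rough illustration of that idea only, here is a minimal standalone sketch of name-based column mapping. It does not use Faborite's API: the `Levenshtein`, `Normalize`, and `MapColumns` helpers, the 0.7 similarity threshold, and the sample column names are all assumptions made for this example, and Faborite's actual implementation may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Classic two-row dynamic-programming Levenshtein distance
// (insert, delete, and substitute each cost 1).
static int Levenshtein(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the finished row becomes "prev"
    }
    return prev[b.Length];
}

// Hypothetical normalization: keep letters and digits, lower-case the rest,
// so "Customer_ID" and "customer_id" compare as equal.
static string Normalize(string name) =>
    new string(name.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToLowerInvariant();

// Map each source column to its most similar target column,
// or to null when no target reaches the (assumed) similarity threshold.
static Dictionary<string, string?> MapColumns(
    IEnumerable<string> sourceColumns,
    IReadOnlyList<string> targetColumns,
    double minSimilarity = 0.7)
{
    var result = new Dictionary<string, string?>();
    foreach (var s in sourceColumns)
    {
        string? best = null;
        double bestScore = 0;
        foreach (var t in targetColumns)
        {
            string ns = Normalize(s), nt = Normalize(t);
            int maxLen = Math.Max(ns.Length, nt.Length);
            // Similarity in [0, 1]: 1 means identical after normalization.
            double score = maxLen == 0 ? 1.0 : 1.0 - (double)Levenshtein(ns, nt) / maxLen;
            if (score > bestScore) { best = t; bestScore = score; }
        }
        result[s] = bestScore >= minSimilarity ? best : null;
    }
    return result;
}

var columnMap = MapColumns(
    new[] { "Customer_ID", "signup_dt", "phone_no" },
    new[] { "customer_id", "email", "signup_date", "country" });

foreach (var pair in columnMap)
    Console.WriteLine($"{pair.Key} -> {pair.Value ?? "(unmapped)"}");
```

Dividing the distance by the longer name's length keeps the threshold comparable across short and long column names. A real mapper would likely also compare data types and resolve cases where two source columns compete for the same target.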
# As .NET Global Tool
dotnet tool install -g Faborite
# Or download binary from releases
# https://github.com/mjtpena/faborite/releases

# Sync data from Microsoft Fabric
faborite sync --workspace "MyWorkspace" --lakehouse "MyLakehouse"
# With custom sampling
faborite sync --workspace "MyWorkspace" --lakehouse "MyLakehouse" --sample-size 10000
# Export to specific format
faborite export --format parquet --output "./data"

# Train classification model with AutoML
faborite ml train --data "./data.csv" --target "label" --type classification
# Detect PII in dataset
faborite ai detect-pii --data "./sensitive-data.csv" --mask partial
# Optimize SQL queries
faborite ai optimize-query --file "./query.sql"
# Infer schema from data
faborite ai infer-schema --data "./unknown-data.csv"

- Relational: SQL Server, PostgreSQL, MySQL, Oracle, SQLite
- NoSQL: MongoDB, Cassandra, Elasticsearch, Redis
- Graph: Neo4j
- Data Warehouses: Snowflake, Azure Synapse, Google BigQuery
- Cloud Storage: AWS S3, Google Cloud Storage, Azure Blob Storage, MinIO, Cloudflare R2, Backblaze B2, Wasabi
- Streaming: Apache Kafka, RabbitMQ, Redis Streams, Azure Event Hubs, AWS Kinesis
- Time Series: InfluxDB, TimescaleDB, Prometheus
┌───────────────────────────────────────────────────────────────┐
│                       Faborite Platform                       │
├──────────────────┬─────────────────────┬──────────────────────┤
│ Data Layer       │ Processing Layer    │ ML/AI Layer          │
├──────────────────┼─────────────────────┼──────────────────────┤
│ • 30+ Connectors │ • Transformations   │ • AutoML             │
│ • Multi-format   │ • Window Functions  │ • Classification     │
│ • Streaming      │ • Aggregations      │ • Regression         │
│ • Batch          │ • Pivots            │ • Anomaly Detection  │
│                  │ • SQL Engine        │ • Forecasting        │
│                  │                     │ • Clustering         │
│                  │                     │ • PII Detection      │
│                  │                     │ • Schema Inference   │
│                  │                     │ • Query Optimization │
└──────────────────┴─────────────────────┴──────────────────────┘
- 28,500+ Lines of Code
- 30+ Production-Ready Connectors
- 12 ML/AI Engines
- 100+ Statistical Functions
- Full Async/Await Support
- Comprehensive Logging
- Production-Grade Error Handling
// Window functions
var engine = new WindowFunctionEngine(logger);
var result = await engine.ApplyWindowFunctionAsync(
    data, "ROW_NUMBER", "date", "category");
// Custom aggregations
var aggEngine = new AggregationEngine(logger);
var stats = await aggEngine.CalculateStatisticsAsync(data, "value");

// AutoML classification
var automl = new AutoMLEngine(logger);
var result = await automl.AutoTrainClassificationAsync(
    trainingData, "label", maxExperimentTimeInSeconds: 60);
// Time series forecasting
var forecaster = new ForecastingEngine(logger);
var forecast = await forecaster.ForecastAsync(
    data, "value", horizon: 30, windowSize: 10);

// Schema inference
var inferrer = new SchemaInferenceEngine(logger);
var schema = await inferrer.InferSchemaAsync(data, sampleSize: 1000);
// Query optimization
var optimizer = new QueryOptimizationEngine(logger);
var analysis = await optimizer.AnalyzeQueryAsync(sqlQuery);

- Installation Guide
- Connector Setup
- ML/AI Usage
- API Reference
- Architecture Decisions
- GitHub Issues - Track bugs, features, and enhancements
- GitHub Projects - Sprint planning and roadmap
- Product Vision
- Feature Roadmap
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Clone repository
git clone https://github.com/mjtpena/faborite.git
# Restore dependencies
dotnet restore
# Build solution
dotnet build
# Run tests
dotnet test

MIT License - see LICENSE for details.
Built with:
- GitHub Issues: Report a bug
- Discussions: Ask questions
- Twitter: @mjtpena
See GitHub Issues for active development tracking.
- ONNX Runtime neural network integration
- Azure ML & AWS SageMaker integration
- Connection pooling & circuit breaker patterns
- Model monitoring & drift detection
- Integration tests with Testcontainers
Made with ❤️ by Michael John Peña