Sync Microsoft Fabric lakehouse data locally for faster development

mjtpena/faborite

Faborite 🎯

Enterprise-grade data lakehouse sync with ML/AI capabilities

Faborite is a comprehensive data platform that combines lakehouse synchronization, intelligent data transformations, machine learning, and AI-powered analytics - all in a single, production-ready package.

🎯 Core Capabilities

📊 Data Integration & Sync

  • 30+ Connectors: SQL databases, NoSQL stores, cloud storage, streaming platforms, time series databases
  • Smart Sampling: Random, stratified, time-based, and custom SQL sampling strategies
  • Multi-Format Support: Parquet, Delta Lake, Iceberg, CSV, JSON, Avro
  • Real-time Streaming: Kafka, RabbitMQ, Redis Streams, Event Hubs
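
To make the stratified strategy listed above concrete, here is a minimal standalone sketch. It is not Faborite's implementation or API: the `StratifiedSample` helper, the row shape, and the 10% fraction are all hypothetical, chosen only to show how sampling per group keeps small groups represented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: (Id, Region) pairs standing in for a lakehouse table.
var rows = Enumerable.Range(1, 100)
    .Select(i => (Id: i, Region: i <= 80 ? "EU" : "US"))
    .ToList();

// Draw about 10% from each region so the smaller "US" stratum is still represented.
var sample = StratifiedSample(rows, r => r.Region, fraction: 0.10, seed: 42);

foreach (var group in sample.GroupBy(r => r.Region))
    Console.WriteLine($"{group.Key}: {group.Count()} rows");

// Hypothetical helper: shuffles each stratum with a seeded RNG and keeps
// ceil(fraction * stratum size) rows from it.
static List<T> StratifiedSample<T, TKey>(
    IEnumerable<T> source, Func<T, TKey> stratum, double fraction, int seed)
{
    var rng = new Random(seed);
    var result = new List<T>();
    foreach (var group in source.GroupBy(stratum))
    {
        var items = group.ToList();
        int take = (int)Math.Ceiling(items.Count * fraction);
        // Partial Fisher-Yates shuffle: only the first `take` positions are randomized.
        for (int i = 0; i < take; i++)
        {
            int j = rng.Next(i, items.Count);
            (items[i], items[j]) = (items[j], items[i]);
        }
        result.AddRange(items.Take(take));
    }
    return result;
}
```

A plain random sample of the same size could miss a rare group entirely. Sampling within each group trades a little bookkeeping for guaranteed coverage.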

🤖 Machine Learning Suite

  • AutoML: Automated model selection for classification, regression, and multiclass problems
  • 7 ML Algorithms: Including classification, regression, anomaly detection, forecasting, clustering, and recommendations
  • MLflow-style Tracking: Experiment management with metrics, parameters, and artifacts
  • Feature Engineering: 8 transformation techniques including normalization, encoding, and binning

🧠 AI-Powered Features

  • Intelligent Schema Inference: Auto-detect 9 data types with confidence scoring
  • Query Optimization: Detect and fix 8 SQL anti-patterns automatically
  • PII Detection: Identify and mask sensitive data (Email, SSN, Credit Card, etc.)
  • Smart Mapping: Automatic schema mapping with Levenshtein distance matching
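
As an illustration of Levenshtein-based schema mapping, the sketch below pairs each source column with its closest target column by edit distance. It is a standalone example under stated assumptions, not Faborite's code: the `MapColumns` and `Levenshtein` helpers, the column names, and the `maxDistance` threshold of 4 are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source and target column names.
var sourceColumns = new[] { "cust_id", "frist_name", "emial" };
var targetColumns = new[] { "customer_id", "first_name", "email" };

var mapping = MapColumns(sourceColumns, targetColumns, maxDistance: 4);
foreach (var pair in mapping)
    Console.WriteLine($"{pair.Key} -> {pair.Value}");

// Maps each source column to the nearest target column (case-insensitive);
// sources with no target within maxDistance are left unmapped.
static Dictionary<string, string> MapColumns(
    IEnumerable<string> sources, IReadOnlyList<string> targets, int maxDistance)
{
    var mapping = new Dictionary<string, string>();
    if (targets.Count == 0) return mapping;
    foreach (var source in sources)
    {
        var best = targets
            .Select(t => (Name: t, Distance: Levenshtein(source.ToLowerInvariant(), t.ToLowerInvariant())))
            .OrderBy(c => c.Distance)
            .First();
        if (best.Distance <= maxDistance)
            mapping[source] = best.Name;
    }
    return mapping;
}

// Classic two-row dynamic-programming edit distance (insert, delete, substitute).
static int Levenshtein(string a, string b)
{
    var previous = new int[b.Length + 1];
    var current = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) previous[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int insertion = current[j - 1] + 1;
            int deletion = previous[j] + 1;
            current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
        }
        (previous, current) = (current, previous);
    }
    return previous[b.Length];
}
```

The threshold matters in practice: too low and abbreviations such as `cust_id` go unmapped, too high and unrelated columns get paired.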

🔄 Data Transformation

  • 22 Statistical Functions: Mean, median, percentiles, variance, skewness, kurtosis, etc.
  • 12 Window Functions: ROW_NUMBER, RANK, LAG/LEAD, cumulative sums, moving averages
  • Pivot/Unpivot: Cross-tabulation, transpose, multi-value pivots
  • SQL Engine: DuckDB-powered analytics with lakehouse integration
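
For readers less familiar with window functions, the following standalone sketch shows what ROW_NUMBER and a cumulative sum compute, using plain LINQ. It does not call Faborite's engine, and the sample rows and column names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sales rows: (Category, Date, Amount).
var sales = new List<(string Category, DateTime Date, decimal Amount)>
{
    ("books", new DateTime(2024, 1, 1), 10m),
    ("books", new DateTime(2024, 1, 2), 30m),
    ("toys",  new DateTime(2024, 1, 1), 5m),
    ("books", new DateTime(2024, 1, 3), 20m),
    ("toys",  new DateTime(2024, 1, 2), 15m),
};

// Equivalent in spirit to the SQL window expressions:
//   ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Date)
//   SUM(Amount)  OVER (PARTITION BY Category ORDER BY Date)
var windowed = sales
    .GroupBy(s => s.Category)
    .SelectMany(partition =>
    {
        decimal runningTotal = 0m;
        return partition
            .OrderBy(s => s.Date)
            .Select((s, index) =>
            {
                runningTotal += s.Amount;
                return (s.Category, s.Date, s.Amount, RowNumber: index + 1, RunningTotal: runningTotal);
            })
            .ToList(); // materialize per partition so the running total is computed once
    })
    .ToList();

foreach (var row in windowed)
    Console.WriteLine($"{row.Category} {row.Date:yyyy-MM-dd} #{row.RowNumber} total={row.RunningTotal}");
```

Each partition restarts both the row number and the running total, which is the behavior that distinguishes a window function from a plain aggregate.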

🚀 Quick Start

Installation

# As .NET Global Tool
dotnet tool install -g Faborite

# Or download binary from releases
# https://github.com/mjtpena/faborite/releases

Basic Usage

# Sync data from Microsoft Fabric
faborite sync --workspace "MyWorkspace" --lakehouse "MyLakehouse"

# With custom sampling
faborite sync --workspace "MyWorkspace" --lakehouse "MyLakehouse" --sample-size 10000

# Export to specific format
faborite export --format parquet --output "./data"

ML/AI Examples

# Train classification model with AutoML
faborite ml train --data "./data.csv" --target "label" --type classification

# Detect PII in dataset
faborite ai detect-pii --data "./sensitive-data.csv" --mask partial

# Optimize SQL queries
faborite ai optimize-query --file "./query.sql"

# Infer schema from data
faborite ai infer-schema --data "./unknown-data.csv"

📦 Supported Connectors

Databases

  • Relational: SQL Server, PostgreSQL, MySQL, Oracle, SQLite
  • NoSQL: MongoDB, Cassandra, Elasticsearch, Redis
  • Graph: Neo4j
  • Data Warehouses: Snowflake, Azure Synapse, Google BigQuery

Cloud Storage

  • AWS S3, Google Cloud Storage, Azure Blob Storage
  • MinIO, Cloudflare R2, Backblaze B2, Wasabi

Streaming

  • Apache Kafka, RabbitMQ, Redis Streams
  • Azure Event Hubs, AWS Kinesis

Time Series

  • InfluxDB, TimescaleDB, Prometheus

🛠️ Architecture

┌────────────────────────────────────────────────────────────────┐
│                       Faborite Platform                        │
├─────────────────┬───────────────────────┬──────────────────────┤
│   Data Layer    │   Processing Layer    │    ML/AI Layer       │
├─────────────────┼───────────────────────┼──────────────────────┤
│ • 30+ Connectors│ • Transformations     │ • AutoML             │
│ • Multi-format  │ • Window Functions    │ • Classification     │
│ • Streaming     │ • Aggregations        │ • Regression         │
│ • Batch         │ • Pivots              │ • Anomaly Detection  │
│                 │ • SQL Engine          │ • Forecasting        │
│                 │                       │ • Clustering         │
│                 │                       │ • PII Detection      │
│                 │                       │ • Schema Inference   │
│                 │                       │ • Query Optimization │
└─────────────────┴───────────────────────┴──────────────────────┘

📊 Statistics

  • 28,500+ Lines of Code
  • 30+ Production-Ready Connectors
  • 12 ML/AI Engines
  • 100+ Statistical Functions
  • Full Async/Await Support
  • Comprehensive Logging
  • Production-Grade Error Handling

🎓 Examples

Data Transformation

// Window functions (`logger` and `data` are assumed to be in scope)
var engine = new WindowFunctionEngine(logger);
var result = await engine.ApplyWindowFunctionAsync(
    data, "ROW_NUMBER", "date", "category");

// Custom aggregations
var aggEngine = new AggregationEngine(logger);
var stats = await aggEngine.CalculateStatisticsAsync(data, "value");

Machine Learning

// AutoML classification (`logger` and `trainingData` are assumed to be in scope)
var automl = new AutoMLEngine(logger);
var result = await automl.AutoTrainClassificationAsync(
    trainingData, "label", maxExperimentTimeInSeconds: 60);

// Time series forecasting
var forecaster = new ForecastingEngine(logger);
var forecast = await forecaster.ForecastAsync(
    data, "value", horizon: 30, windowSize: 10);

AI Features

// Schema inference (`logger`, `data`, and `sqlQuery` are assumed to be in scope)
var inferrer = new SchemaInferenceEngine(logger);
var schema = await inferrer.InferSchemaAsync(data, sampleSize: 1000);

// Query optimization
var optimizer = new QueryOptimizationEngine(logger);
var analysis = await optimizer.AnalyzeQueryAsync(sqlQuery);
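
For intuition about type inference with confidence scoring, here is a minimal standalone sketch. It does not use or reproduce `SchemaInferenceEngine`. The `InferColumnType` helper, the four candidate types, and the 0.7 threshold are hypothetical, and the confidence is simply the fraction of non-empty samples that parse as the chosen type.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical column samples; "n/a" is a dirty value.
var samples = new[] { "42", "7", "n/a", "1500" };
var (inferredType, score) = InferColumnType(samples);
Console.WriteLine($"{inferredType} ({score:P0})");

// Tries candidate types from most to least specific and returns the first one
// that parses at least `threshold` of the non-empty samples; otherwise "string".
static (string Type, double Confidence) InferColumnType(
    IEnumerable<string> values, double threshold = 0.7)
{
    var nonEmpty = values.Where(v => !string.IsNullOrWhiteSpace(v)).ToList();
    if (nonEmpty.Count == 0) return ("string", 0.0);

    var candidates = new (string Name, Func<string, bool> Parses)[]
    {
        ("boolean",  v => bool.TryParse(v, out _)),
        ("integer",  v => long.TryParse(v, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)),
        ("double",   v => double.TryParse(v, NumberStyles.Float, CultureInfo.InvariantCulture, out _)),
        ("datetime", v => DateTime.TryParse(v, CultureInfo.InvariantCulture, DateTimeStyles.None, out _)),
    };

    foreach (var (name, parses) in candidates)
    {
        double ratio = nonEmpty.Count(parses) / (double)nonEmpty.Count;
        if (ratio >= threshold) return (name, ratio);
    }
    return ("string", 1.0);
}
```

Ordering the candidates from narrow to wide means a column of whole numbers is reported as integer rather than double, and a tolerance below 1.0 lets a few dirty values pass without forcing the column to string.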

📚 Documentation

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository and enter it
git clone https://github.com/mjtpena/faborite.git
cd faborite

# Restore dependencies
dotnet restore

# Build solution
dotnet build

# Run tests
dotnet test

📝 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

Built with:

📞 Support

🎯 Roadmap

See GitHub Issues for active development tracking.

Upcoming Features


Made with ❤️ by Michael John Peña
