A real-time machine learning-based autoscaling system for microservices using Kafka message broker and ensemble voting.
This project enhances the Weaveworks Sock Shop microservices demo with intelligent autoscaling capabilities. Three trained ML models (XGBoost, Random Forest, Logistic Regression) continuously monitor service metrics and vote on scaling decisions through a Kafka-based architecture.
Sock Shop Services → Prometheus → Metrics Aggregator → Kafka → 3 ML Models → Kafka → Authoritative Scaler → Decisions
- Metrics Aggregator Service: Polls Prometheus every 30s and publishes feature vectors to Kafka
- ML Inference Services (3): XGBoost, Random Forest, Logistic Regression models predict SLA violations
- Authoritative Scaling Service: Aggregates votes and makes final scaling decisions
- Kafka Message Broker: Enables real-time streaming via the `metrics` and `model-votes` topics
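To make the first stage concrete, here is a minimal sketch of how the Metrics Aggregator might flatten Prometheus instant-query responses into a single feature row before publishing it to the `metrics` topic. The function name and feature keys are illustrative assumptions, not the service's actual interface; only the JSON shape follows Prometheus's `/api/v1/query` response format.

```python
import json

def to_feature_vector(service: str, prom_responses: dict) -> dict:
    """Flatten Prometheus instant-query results into one feature row.

    prom_responses maps a feature name to the JSON body returned by
    Prometheus's /api/v1/query endpoint for that metric.
    """
    features = {"service": service}
    for name, body in prom_responses.items():
        results = body.get("data", {}).get("result", [])
        # An instant query returns "value": [timestamp, "stringified number"]
        features[name] = float(results[0]["value"][1]) if results else 0.0
    return features

# Hardcoded sample shaped like Prometheus's API output (hypothetical metrics)
sample = {
    "cpu_usage": {"data": {"result": [{"value": [1712231402, "0.42"]}]}},
    "p99_latency_ms": {"data": {"result": []}},  # missing metric -> 0.0
}
row = to_feature_vector("orders", sample)
print(json.dumps(row))  # one row ready to publish to the `metrics` topic
```

Defaulting a missing metric to `0.0` keeps the feature vector a fixed width, which the downstream models require.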
- ✅ Real-time ML inference with sub-second latency
- ✅ Ensemble voting (3 models) reduces false positives
- ✅ Kafka-based scalable architecture
- ✅ Dual modes: Production (ML active) and Development (data collection)
- ✅ 99.87% accuracy with 100% recall for SLA violations
- ✅ All predictions logged for continuous improvement
- Docker & Docker Compose
- Python 3.11+ (for load testing)
- 8GB RAM minimum
- Windows (CMD batch files provided)
Quick-start scripts: `start-production.bat`, `start-development.bat`, and `stop-all.bat`; follow scaling decisions with `docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler`.

Production mode starts all services with ML inference active:
```
REM Start
start-production.bat

REM View decisions
docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler

REM Stop
stop-all.bat
```

Output example:
```
================================================================================
SCALING DECISION #1 @ 2026-04-04 12:30:02 UTC
================================================================================
Service: orders
------------------------------------------------------------
xgboost             -> SCALE UP  (confidence: 87.00%)
random_forest       -> SCALE UP  (confidence: 82.00%)
logistic_regression -> NO ACTION (confidence: 55.00%)
------------------------------------------------------------
DECISION: SCALE UP
Vote Count: 2 SCALE UP, 1 NO ACTION (3 total)
Average Confidence: 74.67%
================================================================================
```
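The aggregation behind a decision like the one above can be sketched in a few lines. This is an illustrative reimplementation of the three `VOTING_STRATEGY` options (majority, unanimous, weighted), not the Authoritative Scaling Service's actual code; the `decide` signature and vote-tuple shape are assumptions.

```python
def decide(votes, strategy="majority", weights=None):
    """Aggregate per-model votes into one scaling decision.

    votes: list of (model_name, action, confidence) tuples, where action
    is "SCALE UP" or "NO ACTION". Hypothetical interface for illustration.
    """
    up = [v for v in votes if v[1] == "SCALE UP"]
    if strategy == "unanimous":
        scale = len(up) == len(votes)          # all models must agree
    elif strategy == "weighted":
        w = weights or {name: 1.0 for name, _, _ in votes}
        up_weight = sum(w[n] for n, action, _ in votes if action == "SCALE UP")
        scale = up_weight > sum(w.values()) / 2  # weighted majority
    else:  # majority
        scale = len(up) > len(votes) / 2
    avg_conf = sum(c for _, _, c in votes) / len(votes)
    return ("SCALE UP" if scale else "NO ACTION", round(avg_conf, 2))

# The votes from the example above:
votes = [
    ("xgboost", "SCALE UP", 87.0),
    ("random_forest", "SCALE UP", 82.0),
    ("logistic_regression", "NO ACTION", 55.0),
]
print(decide(votes))               # -> ('SCALE UP', 74.67)
print(decide(votes, "unanimous"))  # -> ('NO ACTION', 74.67)
```

Under `majority`, two of three SCALE UP votes carry the decision; under `unanimous`, the dissenting logistic regression vote blocks it, trading responsiveness for fewer false positives.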
Development mode collects metrics to CSV for model retraining:
```
REM Start
start-development.bat

REM Check output
type output\sockshop_metrics.csv
```

Load testing with Locust:

```
# 30-minute ramp test
python metrics-collection/run_experiment.py --pattern ramp --duration 30

# 10-minute constant load
python metrics-collection/run_experiment.py --pattern constant --duration 10
```
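The difference between the two patterns is how the target user count evolves over the run. The sketch below is a hypothetical helper showing one way such schedules could be generated; the real `run_experiment.py` may shape load differently, and `peak_users` is an assumed parameter.

```python
def user_schedule(pattern: str, duration_min: int, peak_users: int = 100,
                  step_min: int = 1):
    """Return a per-step target user count for a load test.

    "ramp" climbs linearly from near zero to peak_users over the run;
    "constant" holds peak_users for the whole duration.
    """
    steps = duration_min // step_min
    if pattern == "ramp":
        return [round(peak_users * (i + 1) / steps) for i in range(steps)]
    return [peak_users] * steps

print(user_schedule("ramp", 30)[:5])     # gradually increasing user counts
print(user_schedule("constant", 10)[:3])  # flat load from the start
```

A ramp exposes the point at which SLA violations begin, which is exactly the labelled boundary the models need to learn; constant load validates steady-state behaviour.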
Project structure:

```
.
├── start-production.bat        # Start production mode
├── start-development.bat       # Start development mode
├── stop-all.bat                # Stop all services
├── docker-compose.ml.yml       # ML services configuration
│
├── services/                   # Microservices
│   ├── metrics-aggregator/     # Polls Prometheus, publishes to Kafka
│   ├── ml-inference/           # ML model inference service
│   └── authoritative-scaler/   # Vote aggregator & decision engine
│
├── ML-Models/                  # Trained models
│   └── models/
│       ├── xgboost/
│       ├── random_forest/
│       └── logistic_regression/
│
├── microservices-demo/         # Sock Shop demo
├── load-testing/               # Locust load tests
└── output/                     # Output files (CSV, predictions)
```
| Service | Port | Description |
|---|---|---|
| Sock Shop | 80 | Microservices demo application |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Monitoring dashboards (admin/foobar) |
| Kafka | 9092 | Message broker |
| Zookeeper | 2181 | Kafka coordination |
Edit `docker-compose.ml.yml` to configure:

```yaml
metrics-aggregator:
  environment:
    MODE: production            # or development
    POLL_INTERVAL_SEC: 30       # Polling interval
    PROMETHEUS_URL: http://prometheus:9090

authoritative-scaler:
  environment:
    VOTING_STRATEGY: majority   # majority, unanimous, weighted
    DECISION_WINDOW_SEC: 5      # Time to collect votes
```

Viewing logs:

```
# All services
docker-compose -f docker-compose.ml.yml logs -f

# Specific service
docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler
docker-compose -f docker-compose.ml.yml logs -f ml-xgboost
```

Inspecting Kafka topics:

```
# View metrics
docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic metrics

# View votes
docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic model-votes
```

Web interfaces:

- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin / foobar)
- Sock Shop: http://localhost:80
| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| XGBoost | 99.87% | 99.74% | 100.00% | 99.87% | 99.87% |
| Random Forest | 99.87% | 99.74% | 100.00% | 99.87% | 99.87% |
| Logistic Regression | 99.74% | 99.48% | 100.00% | 99.74% | 99.74% |
All models achieve 100% recall for SLA violations (no false negatives).
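The relationship between these metrics and the confusion matrix can be made explicit: 100% recall means zero false negatives, i.e. no SLA violation goes undetected. The sketch below derives the four reported metrics from confusion-matrix counts; the counts used in the example are invented for illustration, not the project's actual test-set numbers.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from a confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # 100% recall <=> fn == 0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts with zero false negatives: every violation is caught,
# at the cost of one false positive (an unnecessary scale-up).
m = classification_metrics(tp=384, fp=1, fn=0, tn=385)
print(m)
```

For autoscaling, this asymmetry is deliberate: a false positive costs one spare replica, while a false negative costs an SLA breach.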
If services fail to start:

```
REM Check Docker
docker ps

REM Check logs
docker-compose -f docker-compose.ml.yml logs

REM Restart
stop-all.bat
start-production.bat
```

If no metrics are flowing:

```
# Check Prometheus
curl http://localhost:9090/api/v1/query?query=up

# Check metrics aggregator
docker-compose -f docker-compose.ml.yml logs metrics-aggregator
```

If Kafka is unhealthy:

```
# Check Kafka
docker ps | grep kafka

# Restart Kafka
docker-compose -f docker-compose.ml.yml restart kafka
```

All output files are saved to `./output/`:
- `sockshop_metrics.csv` - Raw metrics (development mode)
- `predictions_xgboost.csv` - XGBoost predictions
- `predictions_random_forest.csv` - Random Forest predictions
- `predictions_logistic_regression.csv` - Logistic Regression predictions
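Since every prediction is logged for continuous improvement, the per-model CSVs can double as retraining data. Below is a hypothetical sketch of one row's shape; the actual column layout of the `predictions_*.csv` files may differ. It writes to an in-memory buffer rather than `./output/` so it runs anywhere.

```python
import csv
import io

def log_prediction(fh, model: str, service: str, features: dict,
                   action: str, confidence: float):
    """Append one prediction row (assumed columns, for illustration)."""
    writer = csv.writer(fh)
    writer.writerow([model, service, action, confidence]
                    + list(features.values()))

# In-memory demo instead of appending to ./output/predictions_xgboost.csv
buf = io.StringIO()
log_prediction(buf, "xgboost", "orders",
               {"cpu": 0.42, "p99_ms": 310.0}, "SCALE UP", 0.87)
print(buf.getvalue().strip())  # -> xgboost,orders,SCALE UP,0.87,0.42,310.0
```

Keeping the input features alongside each prediction is what makes the log directly reusable as labelled training data once the true outcome is known.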
- README.md - This file (main documentation)
- ARCHITECTURE.md - System architecture and design details
This implementation is designed for educational and research purposes. All components are original work based on:
- Weaveworks Sock Shop demo (open source)
- Scikit-learn ML models (trained on collected data)
- Apache Kafka (open source message broker)
- Custom-built services (metrics aggregator, inference, scaler)
This project extends the Weaveworks Sock Shop demo. See original license: https://github.com/microservices-demo/microservices-demo
- Sock Shop: https://github.com/microservices-demo/microservices-demo
- Kafka: https://kafka.apache.org/
- Scikit-learn: https://scikit-learn.org/
- XGBoost: https://xgboost.readthedocs.io/
For issues:

- Check logs: `docker-compose -f docker-compose.ml.yml logs`
- Verify all services are running: `docker ps`
- Review ARCHITECTURE.md for technical details