ML-Driven Proactive Microservices Autoscaling

A real-time, machine-learning-driven autoscaling system for microservices, built on a Kafka message broker and ensemble voting.

Overview

This project enhances the Weaveworks Sock Shop microservices demo with intelligent autoscaling capabilities. Three trained ML models (XGBoost, Random Forest, Logistic Regression) continuously monitor service metrics and vote on scaling decisions through a Kafka-based architecture.

Architecture

Sock Shop Services → Prometheus → Metrics Aggregator → Kafka → 3 ML Models → Kafka → Authoritative Scaler → Decisions

Components

  1. Metrics Aggregator Service: Polls Prometheus every 30s and publishes feature vectors to Kafka
  2. ML Inference Services (3): XGBoost, Random Forest, Logistic Regression models predict SLA violations
  3. Authoritative Scaling Service: Aggregates votes and makes final scaling decisions
  4. Kafka Message Broker: Enables real-time streaming via two topics, `metrics` and `model-votes`
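The Metrics Aggregator's core step can be sketched as follows: flatten raw Prometheus query results into a per-service feature record and serialize it as JSON for the `metrics` topic. This is a minimal sketch; the metric names and feature keys are illustrative assumptions, not the aggregator's actual schema.

```python
import json

def build_feature_vector(service: str, prom_results: dict) -> dict:
    """Flatten raw Prometheus query results into one feature record.

    Keys shown here (cpu_usage, request_rate, ...) are illustrative
    assumptions; the real service may use a different feature set.
    """
    return {
        "service": service,
        "cpu_usage": prom_results.get("cpu_usage", 0.0),
        "memory_usage": prom_results.get("memory_usage", 0.0),
        "request_rate": prom_results.get("request_rate", 0.0),
        "p99_latency_ms": prom_results.get("p99_latency_ms", 0.0),
    }

record = build_feature_vector("orders", {"cpu_usage": 0.72, "request_rate": 41.5})
payload = json.dumps(record)  # this JSON string would be published to Kafka
```

Missing metrics default to 0.0 so a single failed Prometheus query does not block the whole feature vector.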

Features

  • ✅ Real-time ML inference with sub-second latency
  • ✅ Ensemble voting (3 models) reduces false positives
  • ✅ Kafka-based scalable architecture
  • ✅ Dual modes: Production (ML active) and Development (data collection)
  • ✅ 99.87% accuracy with 100% recall for SLA violations
  • ✅ All predictions logged for continuous improvement

Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+ (for load testing)
  • 8GB RAM minimum
  • Windows (CMD batch files provided)

Start Production Mode (ML inference active)

start-production.bat

View Scaling Decisions

docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler

Start Development Mode (CSV collection)

start-development.bat

Stop All Services

stop-all.bat

Usage

Production Mode

Starts all services with ML inference active:

REM Start
start-production.bat

REM View decisions
docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler

REM Stop
stop-all.bat

Output Example:

================================================================================
SCALING DECISION #1 @ 2026-04-04 12:30:02 UTC
================================================================================

Service: orders
------------------------------------------------------------
  xgboost              -> SCALE UP   (confidence: 87.00%)
  random_forest        -> SCALE UP   (confidence: 82.00%)
  logistic_regression  -> NO ACTION  (confidence: 55.00%)
------------------------------------------------------------
  DECISION: SCALE UP
  Vote Count: 2 SCALE UP, 1 NO ACTION (3 total)
  Average Confidence: 74.67%
================================================================================

Development Mode

Collects metrics to CSV for model retraining:

REM Start
start-development.bat

REM Check output
type output\sockshop_metrics.csv

Running Load Tests

# 30-minute ramp test
python metrics-collection/run_experiment.py --pattern ramp --duration 30

# 10-minute constant load
python metrics-collection/run_experiment.py --pattern constant --duration 10
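The `--pattern ramp` option suggests a load profile that grows over the test duration. A hypothetical sketch of how such a ramp might schedule concurrent users (the actual `run_experiment.py` logic and its peak-user parameter are assumptions):

```python
def ramp_users(minute: int, duration_min: int, peak_users: int = 100) -> int:
    """Linear ramp: 0 users at t=0 up to peak_users at t=duration_min.

    peak_users=100 is an assumed default, not taken from the project.
    """
    minute = max(0, min(minute, duration_min))  # clamp to the test window
    return round(peak_users * minute / duration_min)

# Sample the 30-minute ramp every 10 minutes
schedule = [ramp_users(m, 30) for m in range(0, 31, 10)]
print(schedule)
```

A constant pattern would simply return the same user count for every minute of the run.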

Project Structure

.
├── start-production.bat            # Start production mode
├── start-development.bat           # Start development mode
├── stop-all.bat                    # Stop all services
├── docker-compose.ml.yml           # ML services configuration
│
├── services/                       # Microservices
│   ├── metrics-aggregator/         # Polls Prometheus, publishes to Kafka
│   ├── ml-inference/               # ML model inference service
│   └── authoritative-scaler/       # Vote aggregator & decision engine
│
├── ML-Models/                      # Trained models
│   └── models/
│       ├── xgboost/
│       ├── random_forest/
│       └── logistic_regression/
│
├── microservices-demo/             # Sock Shop demo
├── load-testing/                   # Locust load tests
└── output/                         # Output files (CSV, predictions)

Services

| Service | Port | Description |
|---------|------|-------------|
| Sock Shop | 80 | Microservices demo application |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Monitoring dashboards (admin/foobar) |
| Kafka | 9092 | Message broker |
| Zookeeper | 2181 | Kafka coordination |

Configuration

Environment Variables

Edit docker-compose.ml.yml to configure:

metrics-aggregator:
  environment:
    MODE: production              # or development
    POLL_INTERVAL_SEC: 30         # Polling interval
    PROMETHEUS_URL: http://prometheus:9090

authoritative-scaler:
  environment:
    VOTING_STRATEGY: majority     # majority, unanimous, weighted
    DECISION_WINDOW_SEC: 5        # Time to collect votes
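A minimal sketch of how the authoritative scaler might aggregate model votes under the majority and unanimous strategies configured above. The vote format and field names are assumptions for illustration, not the service's actual API; the weighted strategy is omitted.

```python
def aggregate(votes, strategy="majority"):
    """votes: list of (model, action, confidence) tuples.

    Returns (decision, average confidence across all votes).
    """
    scale_up = [v for v in votes if v[1] == "SCALE UP"]
    avg_conf = sum(v[2] for v in votes) / len(votes)
    if strategy == "unanimous":
        decision = "SCALE UP" if len(scale_up) == len(votes) else "NO ACTION"
    else:  # majority
        decision = "SCALE UP" if len(scale_up) > len(votes) / 2 else "NO ACTION"
    return decision, avg_conf

# The votes from the output example above
votes = [
    ("xgboost", "SCALE UP", 0.87),
    ("random_forest", "SCALE UP", 0.82),
    ("logistic_regression", "NO ACTION", 0.55),
]
print(aggregate(votes, "majority"))   # 2 of 3 models agree -> SCALE UP
print(aggregate(votes, "unanimous"))  # not unanimous -> NO ACTION
```

With majority voting, the two tree-based models outvote Logistic Regression, reproducing the decision and the 74.67% average confidence from the sample output.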

Monitoring

View Logs

# All services
docker-compose -f docker-compose.ml.yml logs -f

# Specific service
docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler
docker-compose -f docker-compose.ml.yml logs -f ml-xgboost

Kafka Topics

# View metrics
docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic metrics

# View votes
docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic model-votes

Access Dashboards

  • Sock Shop: http://localhost:80
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000 (login: admin / foobar)

Model Performance

| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|-------|----------|-----------|--------|----------|---------|
| XGBoost | 99.87% | 99.74% | 100.00% | 99.87% | 99.87% |
| Random Forest | 99.87% | 99.74% | 100.00% | 99.87% | 99.87% |
| Logistic Regression | 99.74% | 99.48% | 100.00% | 99.74% | 99.74% |

All models achieve 100% recall for SLA violations (no false negatives).
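To make the recall claim concrete: 100% recall means zero false negatives, independent of the other confusion-matrix counts. A quick sketch with made-up counts (not the project's evaluation data):

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Standard definitions: precision = tp/(tp+fp), recall = tp/(tp+fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# With fn=0, recall is exactly 1.0 no matter how many true positives
# there are; the counts below are illustrative only.
p, r = precision_recall(tp=383, fp=1, fn=0)
assert r == 1.0  # no SLA violation is ever missed
```

Under these illustrative counts, precision works out to 383/384 ≈ 99.74%, the same trade-off shape as the table: a model can miss nothing (perfect recall) while still admitting a small number of false alarms.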

Troubleshooting

Services Not Starting

REM Check Docker
docker ps

REM Check logs
docker-compose -f docker-compose.ml.yml logs

REM Restart
stop-all.bat
start-production.bat

No Metrics Collected

# Check Prometheus
curl http://localhost:9090/api/v1/query?query=up

# Check metrics aggregator
docker-compose -f docker-compose.ml.yml logs metrics-aggregator

Kafka Issues

# Check Kafka
docker ps | grep kafka

# Restart Kafka
docker-compose -f docker-compose.ml.yml restart kafka

Output Files

All files saved to ./output/:

  • sockshop_metrics.csv - Raw metrics (development mode)
  • predictions_xgboost.csv - XGBoost predictions
  • predictions_random_forest.csv - Random Forest predictions
  • predictions_logistic_regression.csv - Logistic Regression predictions

Documentation

  • README.md - This file (main documentation)
  • ARCHITECTURE.md - System architecture and design details

Academic Integrity

This implementation is designed for educational and research purposes. All components are original work based on:

  • Weaveworks Sock Shop demo (open source)
  • Scikit-learn ML models (trained on collected data)
  • Apache Kafka (open source message broker)
  • Custom-built services (metrics aggregator, inference, scaler)

License

This project extends the Weaveworks Sock Shop demo. See original license: https://github.com/microservices-demo/microservices-demo

Support

For issues:

  1. Check logs: docker-compose -f docker-compose.ml.yml logs
  2. Verify all services running: docker ps
  3. Review ARCHITECTURE.md for technical details