ML-Driven Proactive Microservices Autoscaling

A real-time, machine-learning-driven autoscaling system for microservices, built on a Kafka message broker and ensemble voting.

Overview

This project enhances the Weaveworks Sock Shop microservices demo with intelligent autoscaling capabilities. Three trained ML models (XGBoost, Random Forest, Logistic Regression) continuously monitor service metrics and vote on scaling decisions through a Kafka-based architecture.

Architecture

Sock Shop Services → Prometheus → Metrics Aggregator → Kafka → 3 ML Models → Kafka → Authoritative Scaler → Decisions

Components

  1. Metrics Aggregator Service: Polls Prometheus every 30s and publishes feature vectors to Kafka
  2. ML Inference Services (3): XGBoost, Random Forest, Logistic Regression models predict SLA violations
  3. Authoritative Scaling Service: Aggregates votes and makes final scaling decisions
  4. Kafka Message Broker: Enables real-time streaming via two topics, `metrics` and `model-votes`
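The Metrics Aggregator's core step can be sketched as follows: flatten raw Prometheus query results into a per-service feature record and serialize it as JSON for the `metrics` topic. This is a minimal sketch; the metric names and feature keys are illustrative assumptions, not the aggregator's actual schema.

```python
import json

def build_feature_vector(service: str, prom_results: dict) -> dict:
    """Flatten raw Prometheus query results into one feature record.

    Keys shown here (cpu_usage, request_rate, ...) are illustrative
    assumptions; the real service may use a different feature set.
    """
    return {
        "service": service,
        "cpu_usage": prom_results.get("cpu_usage", 0.0),
        "memory_usage": prom_results.get("memory_usage", 0.0),
        "request_rate": prom_results.get("request_rate", 0.0),
        "p99_latency_ms": prom_results.get("p99_latency_ms", 0.0),
    }

record = build_feature_vector("orders", {"cpu_usage": 0.72, "request_rate": 41.5})
payload = json.dumps(record)  # this JSON string would be published to Kafka
```

Missing metrics default to 0.0 so a single failed Prometheus query does not block the whole feature vector.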

Features

  • ✅ Real-time ML inference with sub-second latency
  • ✅ Ensemble voting (3 models) reduces false positives
  • ✅ Kafka-based scalable architecture
  • ✅ Dual modes: Production (ML active) and Development (data collection)
  • ✅ 99.87% accuracy with 100% recall for SLA violations
  • ✅ All predictions logged for continuous improvement

Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+ (for load testing)
  • 8GB RAM minimum
  • Windows (CMD batch files provided)

Start Production Mode (ML inference active)

start-production.bat

View Scaling Decisions

docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler

Start Development Mode (CSV collection)

start-development.bat

Stop All Services

stop-all.bat

Usage

Production Mode

Starts all services with ML inference active:

REM Start
start-production.bat

REM View decisions
docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler

REM Stop
stop-all.bat

Output Example:

================================================================================
SCALING DECISION #1 @ 2026-04-04 12:30:02 UTC
================================================================================

Service: orders
------------------------------------------------------------
  xgboost              -> SCALE UP   (confidence: 87.00%)
  random_forest        -> SCALE UP   (confidence: 82.00%)
  logistic_regression  -> NO ACTION  (confidence: 55.00%)
------------------------------------------------------------
  DECISION: SCALE UP
  Vote Count: 2 SCALE UP, 1 NO ACTION (3 total)
  Average Confidence: 74.67%
================================================================================

Development Mode

Collects metrics to CSV for model retraining:

REM Start
start-development.bat

REM Check output
type output\sockshop_metrics.csv

Running Load Tests

# 30-minute ramp test
python metrics-collection/run_experiment.py --pattern ramp --duration 30

# 10-minute constant load
python metrics-collection/run_experiment.py --pattern constant --duration 10
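The `--pattern ramp` option suggests a load profile that grows over the test duration. A hypothetical sketch of how such a ramp might schedule concurrent users (the actual `run_experiment.py` logic and its peak-user parameter are assumptions):

```python
def ramp_users(minute: int, duration_min: int, peak_users: int = 100) -> int:
    """Linear ramp: 0 users at t=0 up to peak_users at t=duration_min.

    peak_users=100 is an assumed default, not taken from the project.
    """
    minute = max(0, min(minute, duration_min))  # clamp to the test window
    return round(peak_users * minute / duration_min)

# Sample the 30-minute ramp every 10 minutes
schedule = [ramp_users(m, 30) for m in range(0, 31, 10)]
print(schedule)
```

A constant pattern would simply return the same user count for every minute of the run.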

Project Structure

.
├── start-production.bat            # Start production mode
├── start-development.bat           # Start development mode
├── stop-all.bat                    # Stop all services
├── docker-compose.ml.yml           # ML services configuration
│
├── services/                       # Microservices
│   ├── metrics-aggregator/         # Polls Prometheus, publishes to Kafka
│   ├── ml-inference/               # ML model inference service
│   └── authoritative-scaler/       # Vote aggregator & decision engine
│
├── ML-Models/                      # Trained models
│   └── models/
│       ├── xgboost/
│       ├── random_forest/
│       └── logistic_regression/
│
├── microservices-demo/             # Sock Shop demo
├── load-testing/                   # Locust load tests
└── output/                         # Output files (CSV, predictions)

Services

| Service | Port | Description |
|---------|------|-------------|
| Sock Shop | 80 | Microservices demo application |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Monitoring dashboards (admin/foobar) |
| Kafka | 9092 | Message broker |
| Zookeeper | 2181 | Kafka coordination |

Configuration

Environment Variables

Edit docker-compose.ml.yml to configure:

metrics-aggregator:
  environment:
    MODE: production              # or development
    POLL_INTERVAL_SEC: 30         # Polling interval
    PROMETHEUS_URL: http://prometheus:9090

authoritative-scaler:
  environment:
    VOTING_STRATEGY: majority     # majority, unanimous, weighted
    DECISION_WINDOW_SEC: 5        # Time to collect votes
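A minimal sketch of how the authoritative scaler might aggregate model votes under the majority and unanimous strategies configured above. The vote format and field names are assumptions for illustration, not the service's actual API; the weighted strategy is omitted.

```python
def aggregate(votes, strategy="majority"):
    """votes: list of (model, action, confidence) tuples.

    Returns (decision, average confidence across all votes).
    """
    scale_up = [v for v in votes if v[1] == "SCALE UP"]
    avg_conf = sum(v[2] for v in votes) / len(votes)
    if strategy == "unanimous":
        decision = "SCALE UP" if len(scale_up) == len(votes) else "NO ACTION"
    else:  # majority
        decision = "SCALE UP" if len(scale_up) > len(votes) / 2 else "NO ACTION"
    return decision, avg_conf

# The votes from the output example above
votes = [
    ("xgboost", "SCALE UP", 0.87),
    ("random_forest", "SCALE UP", 0.82),
    ("logistic_regression", "NO ACTION", 0.55),
]
print(aggregate(votes, "majority"))   # 2 of 3 models agree -> SCALE UP
print(aggregate(votes, "unanimous"))  # not unanimous -> NO ACTION
```

With majority voting, the two tree-based models outvote Logistic Regression, reproducing the decision and the 74.67% average confidence from the sample output.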

Monitoring

View Logs

# All services
docker-compose -f docker-compose.ml.yml logs -f

# Specific service
docker-compose -f docker-compose.ml.yml logs -f authoritative-scaler
docker-compose -f docker-compose.ml.yml logs -f ml-xgboost

Kafka Topics

# View metrics
docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic metrics

# View votes
docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic model-votes

Access Dashboards

  • Sock Shop: http://localhost:80
  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000 (login: admin / foobar)

Model Performance

| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|-------|----------|-----------|--------|----------|---------|
| XGBoost | 99.87% | 99.74% | 100.00% | 99.87% | 99.87% |
| Random Forest | 99.87% | 99.74% | 100.00% | 99.87% | 99.87% |
| Logistic Regression | 99.74% | 99.48% | 100.00% | 99.74% | 99.74% |

All models achieve 100% recall for SLA violations (no false negatives).
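To make the recall claim concrete: 100% recall means zero false negatives, independent of the other confusion-matrix counts. A quick sketch with made-up counts (not the project's evaluation data):

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Standard definitions: precision = tp/(tp+fp), recall = tp/(tp+fn)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# With fn=0, recall is exactly 1.0 no matter how many true positives
# there are; the counts below are illustrative only.
p, r = precision_recall(tp=383, fp=1, fn=0)
assert r == 1.0  # no SLA violation is ever missed
```

Under these illustrative counts, precision works out to 383/384 ≈ 99.74%, the same trade-off shape as the table: a model can miss nothing (perfect recall) while still admitting a small number of false alarms.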

Troubleshooting

Services Not Starting

REM Check Docker
docker ps

REM Check logs
docker-compose -f docker-compose.ml.yml logs

REM Restart
stop-all.bat
start-production.bat

No Metrics Collected

# Check Prometheus
curl http://localhost:9090/api/v1/query?query=up

# Check metrics aggregator
docker-compose -f docker-compose.ml.yml logs metrics-aggregator

Kafka Issues

# Check Kafka
docker ps | grep kafka

# Restart Kafka
docker-compose -f docker-compose.ml.yml restart kafka

Output Files

All files saved to ./output/:

  • sockshop_metrics.csv - Raw metrics (development mode)
  • predictions_xgboost.csv - XGBoost predictions
  • predictions_random_forest.csv - Random Forest predictions
  • predictions_logistic_regression.csv - Logistic Regression predictions

Documentation

  • README.md - This file (main documentation)
  • ARCHITECTURE.md - System architecture and design details

Academic Integrity

This implementation is designed for educational and research purposes. All components are original work based on:

  • Weaveworks Sock Shop demo (open source)
  • Scikit-learn ML models (trained on collected data)
  • Apache Kafka (open source message broker)
  • Custom-built services (metrics aggregator, inference, scaler)

License

This project extends the Weaveworks Sock Shop demo. See original license: https://github.com/microservices-demo/microservices-demo

Support

For issues:

  1. Check logs: docker-compose -f docker-compose.ml.yml logs
  2. Verify all services running: docker ps
  3. Review ARCHITECTURE.md for technical details