A hands-on workshop where you'll learn to protect your microservices from cascading failures using retry, circuit breaker, and distributed retry patterns.
```
┌─────────────────────┐         ┌─────────────────────┐
│  Service A          │  HTTP   │  Service B          │
│  Product Catalog    │────────>│  Pricing Service    │
│  (port 8080)        │         │  (port 8081)        │
│                     │         │                     │
│  GET /products      │         │  GET /pricing/{id}  │
│  GET /products/{id} │         │  POST /admin/...    │
└─────────────────────┘         └─────────────────────┘
```
Service A serves a product catalog. For each product, it calls Service B to get the current price. Service B has an admin API that lets you simulate different failure scenarios.
- Java 21
- Maven 3.9+
- Docker & Docker Compose
- A REST client (curl, Postman, or IntelliJ HTTP client)
```bash
# Build both services
mvn clean package -DskipTests

# Terminal 1: Start Service B (Pricing)
cd pricing-service
mvn spring-boot:run

# Terminal 2: Start Service A (Product Catalog)
cd product-catalog-service
mvn spring-boot:run
```

Or run everything with Docker:

```bash
# Build the jars first
mvn clean package -DskipTests

# Start both services
docker-compose up --build
```

Smoke-test the setup:

```bash
# Get all products with prices
curl http://localhost:8080/products

# Get a single product
curl http://localhost:8080/products/PROD-001

# Check Service B status
curl http://localhost:8081/admin/status
```

Use these endpoints to simulate different failure scenarios:
| Command | Effect |
|---|---|
| `curl -X POST http://localhost:8081/admin/healthy` | Normal operation |
| `curl -X POST http://localhost:8081/admin/slow` | 10-second delay on every request |
| `curl -X POST http://localhost:8081/admin/fail` | 500 error on every request |
| `curl -X POST "http://localhost:8081/admin/random?rate=40"` | 40% of requests fail randomly |
| `curl http://localhost:8081/admin/status` | Check current failure mode |
Goal: Understand what happens when a downstream service fails and there's no resilience in place.
You're on-call. The pricing service is having issues. Document what happens.
- Start both services and verify `GET http://localhost:8080/products` works
- Toggle Service B to slow mode:
  ```bash
  curl -X POST http://localhost:8081/admin/slow
  ```
- Hit `/products` from multiple browser tabs simultaneously. What happens? How long does each request take?
- Toggle Service B to error mode:
  ```bash
  curl -X POST http://localhost:8081/admin/fail
  ```
- Hit `/products` again. What do you see?
- Bonus: If using Docker, stop Service B entirely:
  ```bash
  docker stop pricing-service
  ```
  What happens now? Is this different from the slow failure?
Think about:
- How many concurrent requests does it take to make Service A unresponsive?
- Which is worse: a slow service or a dead service? Why?
- What happens to Service A's thread pool when Service B is slow?
Don't forget to reset Service B when you're done:
```bash
curl -X POST http://localhost:8081/admin/healthy
```

Goal: Handle transient failures by retrying failed requests. Then learn why naive retries can make things worse.
Mission: Service B fails randomly 40% of the time. Add retry logic so most user requests succeed.
```bash
# Set Service B to random failure mode (40%)
curl -X POST "http://localhost:8081/admin/random?rate=40"
```

Now hit `GET /products` several times. You'll see failures. Fix it with a retry.
Where to add it: `product-catalog-service/src/main/java/com/workshop/catalog/client/PricingClient.java`
What to do:
- Add the `@Retry` annotation to the `getPrice` method
- Configure the retry in `application.yml`
Hint 1: The annotation

```java
@Retry(name = "pricingService")
public Map<String, Object> getPrice(String productId) {
```

Hint 2: The YAML config
Add this to `product-catalog-service/src/main/resources/application.yml`:
```yaml
resilience4j:
  retry:
    instances:
      pricingService:
        maxAttempts: 3
        waitDuration: 500ms
```

Hint 3: Don't forget the import!

```java
import io.github.resilience4j.retry.annotation.Retry;
```

Test it: With a 40% failure rate and 3 attempts, what's the probability of all 3 attempts failing? (Answer: 0.4^3 = 6.4%, so ~94% of requests should succeed now!)
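The arithmetic can be sanity-checked in plain Java. This assumes each attempt fails independently with the same probability, which is exactly what the random failure mode simulates:

```java
public class RetryMath {
    // Probability that every one of n independent attempts fails,
    // given a per-attempt failure probability p.
    static double allAttemptsFail(double p, int n) {
        return Math.pow(p, n);
    }

    public static void main(String[] args) {
        double pFailAll = allAttemptsFail(0.4, 3);
        System.out.printf("P(all 3 fail) = %.4f%n", pFailAll);      // 0.0640
        System.out.printf("P(success)    = %.4f%n", 1 - pFailAll);  // 0.9360
    }
}
```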
Check your metrics:
```bash
curl http://localhost:8080/actuator/retries
curl http://localhost:8080/actuator/retryevents
```

Stuck? `git checkout step-1-retry`
Mission: Your retry works, but you're making things worse. Fix it.
- Toggle Service B to slow mode:
  ```bash
  curl -X POST http://localhost:8081/admin/slow
  ```
- Send 10 concurrent requests:
  ```bash
  for i in {1..10}; do curl -s http://localhost:8080/products/PROD-001 & done; wait
  ```
- Check Service B's logs. How many requests did it receive? (Hint: with `maxAttempts: 3`, each of the 10 requests can turn into 3 calls, so up to 30!)
Problem: Your retries are hammering an already struggling service. This is called a retry storm.
Fix it: Add exponential backoff and jitter.
Hint 1: What is exponential backoff?
Instead of retrying every 500ms, each retry waits longer:
- 1st retry: 500ms
- 2nd retry: 1000ms (500ms * 2)
- 3rd retry: 2000ms (500ms * 2 * 2)
This gives the failing service time to recover.
Hint 2: What is jitter?
If 100 clients all retry at exactly the same intervals, they'll all hit the server at the same time (thundering herd). Jitter adds randomness to the wait time so retries are spread out.
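To make the two ideas concrete, here is a small framework-free sketch of computing a backoff delay with jitter. It mirrors the concept rather than Resilience4j's internal formula; the 500ms base, multiplier 2, and jitter factor 0.5 match the workshop config:

```java
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {
    // Wait time before retry number `attempt` (1-based):
    // base * multiplier^(attempt-1), randomized by +/- jitterFactor.
    static long waitMillis(long baseMillis, double multiplier,
                           double jitterFactor, int attempt) {
        double exponential = baseMillis * Math.pow(multiplier, attempt - 1);
        // Jitter: pick uniformly in [exponential*(1-f), exponential*(1+f)]
        double delta = exponential * jitterFactor;
        double jittered = exponential - delta
                + ThreadLocalRandom.current().nextDouble() * 2 * delta;
        return (long) jittered;
    }

    public static void main(String[] args) {
        // With base 500ms, x2, jitter 0.5: roughly 250-750, 500-1500, 1000-3000 ms
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.printf("retry %d: ~%d ms%n",
                    attempt, waitMillis(500, 2.0, 0.5, attempt));
        }
    }
}
```

Because every client draws its own random delay, 100 clients that failed at the same instant no longer retry at the same instant.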
Hint 3: The YAML config
```yaml
resilience4j:
  retry:
    instances:
      pricingService:
        maxAttempts: 3
        waitDuration: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        enableRandomizedWait: true
        randomizedWaitFactor: 0.5
```

Bonus Challenge: Only retry on server errors (5xx), NOT on client errors (4xx). A 400 Bad Request will never succeed no matter how many times you retry.
Bonus Hint: Exception filtering
```yaml
resilience4j:
  retry:
    instances:
      pricingService:
        maxAttempts: 3
        waitDuration: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        enableRandomizedWait: true
        randomizedWaitFactor: 0.5
        retryExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.InternalServerError
          - org.springframework.web.reactive.function.client.WebClientResponseException.ServiceUnavailable
          - org.springframework.web.reactive.function.client.WebClientResponseException.BadGateway
          - java.io.IOException
          - java.util.concurrent.TimeoutException
        ignoreExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.BadRequest
          - org.springframework.web.reactive.function.client.WebClientResponseException.NotFound
```

Stuck? `git checkout step-1-retry`
Goal: When a service is consistently failing, stop calling it entirely. Fail fast and provide a fallback.
Mission: Service B is down. Instead of waiting and retrying (wasting time and resources), detect the failure pattern and stop calling it.
```bash
# Set Service B to fail mode
curl -X POST http://localhost:8081/admin/fail
```

Where to add it: Same file — `PricingClient.java`
What to do:
- Add the `@CircuitBreaker` annotation to the `getPrice` method
- Configure it in `application.yml`
- Send requests repeatedly and watch the circuit breaker open
Hint 1: The annotation
```java
@CircuitBreaker(name = "pricingService")
public Map<String, Object> getPrice(String productId) {
```

Hint 2: The YAML config
```yaml
resilience4j:
  circuitbreaker:
    instances:
      pricingService:
        registerHealthIndicator: true
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 10s
        permittedNumberOfCallsInHalfOpenState: 3
        slidingWindowType: COUNT_BASED
```

What this means:

- Look at the last 10 calls (`slidingWindowSize`)
- If 50% or more failed (`failureRateThreshold`), open the circuit
- Stay open for 10 seconds (`waitDurationInOpenState`)
- Then allow 3 test calls (`permittedNumberOfCallsInHalfOpenState`)
- If those succeed, close the circuit again
Hint 3: Import

```java
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
```

Watch the state transitions:

```bash
# Check circuit breaker state
curl http://localhost:8080/actuator/circuitbreakers

# Check circuit breaker events
curl http://localhost:8080/actuator/circuitbreakerevents
```

WARNING: The #1 Workshop Pitfall!
If your `@CircuitBreaker` annotation doesn't seem to work, check this: Spring AOP proxies do NOT intercept method calls within the same class. If `ProductService` calls a `@CircuitBreaker` method that's also in `ProductService`, the annotation is ignored!
The annotation must be on a method in a different Spring bean, called from outside that bean. That's why we put it on `PricingClient` (called by `ProductService`).
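You can see the underlying mechanism without Spring, using a plain JDK dynamic proxy as a deliberately simplified stand-in for what Spring AOP builds around your bean. The interceptor here plays the role of the `@CircuitBreaker` aspect:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class SelfInvocationDemo {
    interface Pricing {
        String getPrice(String id);
        String getPriceViaSelfCall(String id);
    }

    static class PricingImpl implements Pricing {
        public String getPrice(String id) { return "price:" + id; }
        // Internal call: goes through `this`, NOT through the proxy,
        // so the interceptor (our stand-in for @CircuitBreaker) is skipped.
        public String getPriceViaSelfCall(String id) { return getPrice(id); }
    }

    static int intercepted = 0; // counts getPrice calls the "aspect" actually saw

    public static void main(String[] args) {
        Pricing target = new PricingImpl();
        InvocationHandler aspect = (proxy, method, methodArgs) -> {
            if (method.getName().equals("getPrice")) intercepted++;
            return method.invoke(target, methodArgs);
        };
        Pricing proxied = (Pricing) Proxy.newProxyInstance(
                Pricing.class.getClassLoader(),
                new Class<?>[]{Pricing.class}, aspect);

        proxied.getPrice("PROD-001");            // external call: intercepted
        proxied.getPriceViaSelfCall("PROD-001"); // inner getPrice: NOT intercepted
        System.out.println("getPrice interceptions: " + intercepted); // 1, not 2
    }
}
```

Spring injects the proxied object into *other* beans, which is why calls from `ProductService` into `PricingClient` are intercepted while `PricingClient`'s own internal calls are not.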
Stuck? `git checkout step-2-circuitbreaker`
Mission: When the circuit is open, users see an ugly error. Give them something useful instead.
What to do:
- Add a `fallbackMethod` to your `@CircuitBreaker` annotation
- The fallback should return a default price with a `"priceStale": true` flag
- Test it: toggle Service B to fail, wait for the circuit to open, then hit `/products`
Hint 1: The annotation with fallback
```java
@CircuitBreaker(name = "pricingService", fallbackMethod = "getPriceFallback")
public Map<String, Object> getPrice(String productId) {
    // ... existing code
}
```

Hint 2: The fallback method signature
The fallback method must:
- Be in the same class
- Have the same parameters as the original method, plus a `Throwable` parameter
- Have the same return type
```java
private Map<String, Object> getPriceFallback(String productId, Throwable t) {
    log.warn("Fallback triggered for product {}: {}", productId, t.getMessage());
    return Map.of(
            "productId", productId,
            "price", 0,
            "currency", "EUR",
            "discount", 0,
            "finalPrice", 0,
            "priceStale", true,
            "error", "Price temporarily unavailable"
    );
}
```

Hint 3: Bonus — cache the last known good price
Add a simple in-memory cache to PricingClient:
```java
private final Map<String, Map<String, Object>> priceCache = new ConcurrentHashMap<>();

public Map<String, Object> getPrice(String productId) {
    // ... existing WebClient call
    Map<String, Object> response = // ... call Service B
    priceCache.put(productId, response); // Cache successful responses
    return response;
}

private Map<String, Object> getPriceFallback(String productId, Throwable t) {
    Map<String, Object> cached = priceCache.get(productId);
    if (cached != null) {
        Map<String, Object> result = new HashMap<>(cached);
        result.put("priceStale", true);
        return result;
    }
    // No cached price available
    return Map.of(
            "productId", productId,
            "price", 0,
            "currency", "EUR",
            "discount", 0,
            "finalPrice", 0,
            "priceStale", true,
            "error", "Price temporarily unavailable"
    );
}
```

Stuck? `git checkout step-2-circuitbreaker`
Mission: Use both patterns together. But the order matters!
The question: If you have both `@Retry` and `@CircuitBreaker` on the same method, which one executes first?
The wrong order: Retry wraps Circuit Breaker
- Circuit opens → Retry still tries 3 times → Each attempt is instantly rejected → Wasted effort
The right order: Circuit Breaker wraps Retry
- Circuit open? → Don't even bother retrying, go straight to fallback
- Circuit closed? → Try the call, retry on failure, count the final result
Hint 1: How to control the order
Resilience4j uses aspect ordering. Lower number = higher priority (executes first, wraps the others).
```yaml
resilience4j:
  circuitbreaker:
    circuitBreakerAspectOrder: 1
  retry:
    retryAspectOrder: 2
```

With this config: CircuitBreaker (order 1) wraps Retry (order 2), which wraps the actual call.
Hint 2: Both annotations together
```java
@CircuitBreaker(name = "pricingService", fallbackMethod = "getPriceFallback")
@Retry(name = "pricingService")
public Map<String, Object> getPrice(String productId) {
    // ... existing code
}
```

Hint 3: Full YAML config
```yaml
resilience4j:
  circuitbreaker:
    circuitBreakerAspectOrder: 1
    instances:
      pricingService:
        registerHealthIndicator: true
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 10s
        permittedNumberOfCallsInHalfOpenState: 3
        slidingWindowType: COUNT_BASED
  retry:
    retryAspectOrder: 2
    instances:
      pricingService:
        maxAttempts: 3
        waitDuration: 500ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        enableRandomizedWait: true
        randomizedWaitFactor: 0.5
        retryExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.InternalServerError
          - org.springframework.web.reactive.function.client.WebClientResponseException.ServiceUnavailable
          - java.io.IOException
          - java.util.concurrent.TimeoutException
```

Test the full flow:
- Service B healthy → all good
- `curl -X POST "http://localhost:8081/admin/random?rate=40"` → retry saves you
- `curl -X POST http://localhost:8081/admin/fail` → circuit opens → fallback kicks in
- `curl -X POST http://localhost:8081/admin/healthy` → circuit goes half-open → closes again
Stuck? `git checkout step-2-circuitbreaker`
The Problem: Everything we've built so far is in-memory. If Service A crashes, all pending retries are lost. In production with multiple instances, you need persistent retry.
The Pattern:
- When the circuit breaker fallback fires, save the failed request to a database
- A `@Scheduled` job polls the database and retries periodically
- Use ShedLock to ensure only one instance runs the scheduler
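Stripped of frameworks, the core loop of that pattern reduces to roughly the sketch below. The in-memory queue stands in for the database table, `call` stands in for the real HTTP call to Service B, and the `RetryJob` name and `MAX_ATTEMPTS` limit are illustrative only:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;

public class RetryJob {
    record PendingRequest(String productId, int attempts) {}

    // Stand-in for the persistent retry-queue table.
    final Queue<PendingRequest> queue = new ConcurrentLinkedQueue<>();
    static final int MAX_ATTEMPTS = 5;

    // What a @Scheduled method would do each tick. `call` returns true on success.
    void pollAndRetry(Predicate<String> call) {
        int n = queue.size();                    // only drain what's pending now
        for (int i = 0; i < n; i++) {
            PendingRequest req = queue.poll();
            if (req == null) break;
            if (call.test(req.productId())) continue;   // success: done
            if (req.attempts() + 1 < MAX_ATTEMPTS)      // failure: requeue
                queue.add(new PendingRequest(req.productId(), req.attempts() + 1));
            // else: give up (a real system would dead-letter the request)
        }
    }

    public static void main(String[] args) {
        RetryJob job = new RetryJob();
        job.queue.add(new PendingRequest("PROD-001", 0));
        job.pollAndRetry(id -> false); // Service B still down: stays queued
        System.out.println("pending after failed tick: " + job.queue.size()); // 1
        job.pollAndRetry(id -> true);  // Service B recovered: drained
        System.out.println("pending after success: " + job.queue.size());     // 0
    }
}
```

In the real version the queue survives a crash because it lives in the database, and ShedLock's lock around the scheduled method ensures only one Service A instance runs the tick.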
This is implemented in the solution branch. Check it out to see the full implementation:
```bash
git checkout solution
```

Key files:

- `RetryRequest.java` — JPA entity for the retry queue
- `RetryRequestRepository.java` — Spring Data repository
- `DistributedRetryService.java` — Scheduled job with ShedLock
- `application.yml` — H2 + ShedLock configuration
| Endpoint | Description |
|---|---|
| `GET http://localhost:8080/products` | All products with prices |
| `GET http://localhost:8080/products/PROD-001` | Single product |
| `GET http://localhost:8080/actuator/health` | Health check (includes CB state) |
| `GET http://localhost:8080/actuator/retries` | Retry instances and config |
| `GET http://localhost:8080/actuator/retryevents` | Retry event log |
| `GET http://localhost:8080/actuator/circuitbreakers` | Circuit breaker states |
| `GET http://localhost:8080/actuator/circuitbreakerevents` | Circuit breaker event log |
| `GET http://localhost:8081/admin/status` | Service B failure mode |
If you get stuck, check out the solution branch for that phase:
| Branch | Content |
|---|---|
| `main` | Starter code (no resilience) |
| `step-1-retry` | Retry with backoff, jitter, and exception filtering |
| `step-2-circuitbreaker` | Circuit breaker with fallback and in-memory price cache |
| `solution` | Everything + distributed retry with ShedLock |