- First step: curl http://localhost:8080/actuator/health -- check which health indicator is DOWN.
- maximum-pool-size must not exceed your database's max_connections divided by the number of app instances. [src3]
- RestClient timeouts are configured via ClientHttpRequestFactorySettings since Boot 3.4. [src7]
- Don't point startup, liveness, and readiness probes at /actuator/health for all three. [src4]
- Leak detection (leak-detection-threshold) must be enabled in production -- the default is disabled. [src3]

| # | 503 Root Cause | Likelihood | Symptom | Fix |
|---|---|---|---|---|
| 1 | Readiness probe failing | ~30% | K8s removes pod; actuator DOWN | Fix health indicator or increase probe timeout [src1, src4] |
| 2 | Connection pool exhausted | ~25% | "Connection is not available, request timed out after 30000ms" | Increase pool size, fix leaks, enable leak detection [src3] |
| 3 | Thread pool saturated | ~15% | All 200 Tomcat threads busy, RejectedExecutionException | Increase server.tomcat.threads.max, enable virtual threads on 3.2+ [src5] |
| 4 | Downstream timeout cascade | ~15% | Requests hang waiting for external API | Add timeouts + circuit breaker [src6, src7] |
| 5 | OutOfMemoryError | ~5% | GC thrashing | Increase heap or fix memory leak [src2] |
| 6 | Database unavailable | ~5% | DB health check DOWN | Fix DB connection or disable check [src1] |
| 7 | Disk space full | ~3% | DiskSpaceHealthIndicator DOWN | Free disk or adjust threshold [src1] |
| 8 | Graceful shutdown | ~2% | App shutting down, rejecting requests | Expected during rolling updates [src4] |
START
├── Is /actuator/health returning DOWN?
│ ├── YES → Which health indicator is DOWN?
│ │ ├── db → Database connection issue [src1]
│ │ │ └── FIX: Check DB connectivity, increase pool size
│ │ ├── diskSpace → Disk full [src1]
│ │ │ └── FIX: Free disk, adjust threshold
│ │ ├── custom → Application-specific health check [src1]
│ │ │ └── FIX: Debug the custom HealthIndicator
│ │ └── readinessState → App not ready [src4]
│ │ └── FIX: Check startup dependencies
│ └── NO → Health is UP but still getting 503?
│ ├── Check load balancer health check path
│ └── Check if 503 comes from reverse proxy (nginx/ALB)
├── Connection pool errors in logs?
│ ├── YES → HikariCP exhausted [src3]
│ │ └── FIX: spring.datasource.hikari.maximum-pool-size=30
│ └── NO ↓
├── Thread rejection in logs?
│ ├── YES → Tomcat threads saturated [src5]
│ │ └── FIX: server.tomcat.threads.max=400 or enable virtual threads (3.2+)
│ └── NO ↓
├── Downstream calls timing out?
│ ├── YES → Cascading failure [src6]
│ │ └── FIX: Add RestClient/WebClient timeouts + circuit breaker
│ └── NO → Check JVM metrics (GC pauses, heap usage)
└── DEFAULT → Enable actuator + micrometer metrics
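The tree above can be compressed into a quick triage helper. A dependency-free sketch; the boolean flags and the `Triage` class name are illustrative assumptions, not part of any Spring API:

```java
// Triage order mirrors the decision tree: health first, then pool, threads, downstream.
public class Triage {
    public static String triage(boolean healthDown, boolean poolErrors,
                                boolean threadRejections, boolean downstreamTimeouts) {
        if (healthDown)         return "inspect the failing HealthIndicator";
        if (poolErrors)         return "tune HikariCP and fix connection leaks";
        if (threadRejections)   return "raise server.tomcat.threads.max or enable virtual threads";
        if (downstreamTimeouts) return "add client timeouts and a circuit breaker";
        return "check JVM metrics (GC pauses, heap usage)";
    }
}
```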
Determine which component is reporting DOWN. This is always the first diagnostic step. [src1]
# Check overall health
curl -s http://localhost:8080/actuator/health | jq
# Enable detailed health (application.properties):
management.endpoint.health.show-details=always
management.health.readinessstate.enabled=true
management.health.livenessstate.enabled=true
Verify: curl -s http://localhost:8080/actuator/health | jq .status → expected: "UP" after fix
HikariCP default pool size of 10 is almost always too small for production. [src3]
# application.properties — HikariCP tuning [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.leak-detection-threshold=60000
Verify: curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq .measurements → active should stay below maximum-pool-size
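The per-instance sizing rule from [src3] is worth making concrete. A minimal sketch; `PoolSizing`, `poolSizePerInstance`, and the `headroom` parameter are illustrative names, not Spring or HikariCP APIs:

```java
// Split the database's connection budget evenly across app instances,
// keeping some headroom for migrations and admin sessions.
public class PoolSizing {
    public static int poolSizePerInstance(int dbMaxConnections, int instanceCount, int headroom) {
        return Math.max(1, (dbMaxConnections - headroom) / instanceCount);
    }

    public static void main(String[] args) {
        // e.g. PostgreSQL default max_connections=100, 3 app instances, 10 reserved
        System.out.println(poolSizePerInstance(100, 3, 10)); // prints 30
    }
}
```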
Default 200 threads can be exhausted under sustained load. On Spring Boot 3.2+, consider virtual threads instead. [src5]
# application.properties — Tomcat tuning [src5]
server.tomcat.threads.max=400
server.tomcat.threads.min-spare=20
server.tomcat.accept-count=100
server.tomcat.max-connections=10000
server.tomcat.connection-timeout=20000
# Spring Boot 3.2+ — virtual threads (eliminates thread pool exhaustion)
spring.threads.virtual.enabled=true
Every outbound HTTP call must have connect and read timeouts. [src6, src7]
// Spring Boot 3.2+ — RestClient with timeouts (preferred) [src7]
import java.time.Duration;

import org.springframework.boot.web.client.ClientHttpRequestFactories;
import org.springframework.boot.web.client.ClientHttpRequestFactorySettings;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestClient;

@Configuration
public class RestClientConfig {
    @Bean
    public RestClient restClient(RestClient.Builder builder) {
        return builder
            .requestFactory(ClientHttpRequestFactories.get(
                ClientHttpRequestFactorySettings.DEFAULTS
                    .withConnectTimeout(Duration.ofSeconds(5))
                    .withReadTimeout(Duration.ofSeconds(10))
            ))
            .build();
    }
}
// Spring Boot 2.x-3.1 — RestTemplate with timeouts
@Configuration
public class LegacyRestClientConfig {
    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(5_000);  // milliseconds; Duration overloads arrived in Framework 6.1
        factory.setReadTimeout(10_000);    // milliseconds
        return new RestTemplate(factory);
    }
}
Use three separate probes: startup, liveness, and readiness. [src4]
# Kubernetes deployment — proper probe configuration [src4]
containers:
- name: app
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 15
periodSeconds: 5
failureThreshold: 3
startupProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 30 # 30 × 5s = 150s max startup
Verify: kubectl describe pod <pod-name> → no probe failure events after deployment
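The startup budget in the manifest comment (30 × 5s = 150s) is just probe arithmetic, and it is worth checking whenever you tune probe values. A tiny sketch; `ProbeBudget` and the method name are illustrative:

```java
// Kubernetes tolerates failureThreshold consecutive probe failures,
// one per periodSeconds, before acting on the probe result.
public class ProbeBudget {
    public static int maxStartupSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }
}
```

If your app can take 3 minutes to warm up, `maxStartupSeconds` must exceed 180, so raise `failureThreshold` rather than `initialDelaySeconds`.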
Full script: java-circuit-breaker-for-downstream-services.java (30 lines)
// Input: Downstream API that may be slow/unavailable
// Output: Graceful degradation instead of 503 cascade
@Service
public class PaymentService {
    private final RestTemplate restTemplate;

    public PaymentService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Resilience4j circuit breaker prevents cascade [src6]
    @CircuitBreaker(name = "payment", fallbackMethod = "paymentFallback")
    @TimeLimiter(name = "payment")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest request) {
        return CompletableFuture.supplyAsync(() ->
            restTemplate.postForObject("/api/payments", request, PaymentResult.class)
        );
    }

    private CompletableFuture<PaymentResult> paymentFallback(PaymentRequest req, Throwable t) {
        return CompletableFuture.completedFuture(
            PaymentResult.pending("Payment queued — retry in progress")
        );
    }
}
# application.yml — circuit breaker config
resilience4j.circuitbreaker.instances.payment:
  sliding-window-size: 10
  failure-rate-threshold: 50
  wait-duration-in-open-state: 30s
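Behind that config is a three-state machine (CLOSED → OPEN → HALF_OPEN). A dependency-free sketch of the cycle; `SimpleBreaker` and its fields are illustrative only, since Resilience4j manages all of this for you from the config above:

```java
import java.time.Duration;
import java.time.Instant;

public class SimpleBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;   // roughly: failures tolerated in the sliding window
    private final Duration openDuration;  // roughly: wait-duration-in-open-state
    private Instant openedAt;

    public SimpleBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    public boolean allowRequest(Instant now) {
        if (state == State.OPEN && now.isAfter(openedAt.plus(openDuration))) {
            state = State.HALF_OPEN;      // let one trial request through
        }
        return state != State.OPEN;
    }

    public void recordFailure(Instant now) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;           // stop hammering the failing service
            openedAt = now;
            failures = 0;
        }
    }

    public void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public State state() { return state; }
}
```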
// Input: Non-critical external dependency (e.g., cache, search index)
// Output: Health check that degrades to WARNING without failing readiness
@Component
public class SearchHealthIndicator implements HealthIndicator {
    private final RestClient restClient;

    public SearchHealthIndicator(RestClient restClient) {
        this.restClient = restClient;
    }

    @Override
    public Health health() {
        try {
            restClient.get().uri("/ping")
                .retrieve().toBodilessEntity();
            return Health.up().build();
        } catch (Exception e) {
            // Report degraded but don't fail readiness [src1]
            return Health.up()
                .withDetail("search", "degraded: " + e.getMessage())
                .build();
        }
    }
}
// ❌ BAD — one slow downstream call can exhaust all threads [src6]
String result = restTemplate.getForObject("http://slow-service/api", String.class);
// Default: no timeout → thread hangs indefinitely
// ✅ GOOD — bounded timeout prevents thread exhaustion [src6, src7]
var factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(Duration.ofSeconds(3));  // Duration overloads: Framework 6.1+ (Boot 3.2+)
factory.setReadTimeout(Duration.ofSeconds(10));
var restTemplate = new RestTemplate(factory);
# ❌ BAD — HikariCP default is 10 connections
# Under load: "Connection is not available, request timed out after 30000ms"
# ✅ GOOD — scale pool to match expected concurrency [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.leak-detection-threshold=60000
# ❌ BAD — virtual threads handle more requests but each still needs a DB connection
# 1000 virtual threads + 10-connection pool = massive contention [src3, src5]
spring.threads.virtual.enabled=true
# spring.datasource.hikari.maximum-pool-size=10 (default)
# ✅ GOOD — scale connection pool alongside virtual threads [src3]
spring.threads.virtual.enabled=true
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.leak-detection-threshold=60000
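Virtual threads carry one more caveat: on JDK 21-23, a virtual thread that blocks inside a synchronized block pins its carrier thread, while ReentrantLock lets it unmount. A minimal sketch of the swap; `Inventory`, `stock`, and `reserve()` are made-up names, not a Spring API:

```java
import java.util.concurrent.locks.ReentrantLock;

public class Inventory {
    private final ReentrantLock lock = new ReentrantLock();
    private int stock = 10;

    // BAD for virtual threads on JDK 21-23:
    //   synchronized int reserve() { /* blocking I/O here pins the carrier */ }

    // GOOD: same mutual exclusion, no pinning
    public int reserve() {
        lock.lock();
        try {
            return --stock;   // stand-in for work that may block
        } finally {
            lock.unlock();
        }
    }
}
```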
# ❌ BAD — liveness and readiness have different semantics [src4]
livenessProbe:
httpGet:
path: /actuator/health # includes readiness checks!
readinessProbe:
httpGet:
path: /actuator/health # same endpoint
# ✅ GOOD — each probe checks what it should [src4]
livenessProbe:
httpGet:
path: /actuator/health/liveness # only: is the JVM alive?
readinessProbe:
httpGet:
path: /actuator/health/readiness # only: can it accept traffic?
startupProbe:
httpGet:
path: /actuator/health/liveness # give app time to start
- initialDelaySeconds too short → K8s fails the probe before Spring Boot starts. Fix: use startupProbe for slow-starting apps. [src4]
- Connections leaked (never returned via close()). Fix: enable leak-detection-threshold and use try-with-resources. [src3]
- In-flight requests rejected during deploys. Fix: server.shutdown=graceful + spring.lifecycle.timeout-per-shutdown-phase=30s. [src4]
- Tomcat thread pool saturated. Fix: raise server.tomcat.threads.max or enable virtual threads on 3.2+. [src5]
- synchronized blocks pin virtual threads to carrier threads; under high concurrency this re-introduces thread exhaustion. Fix: replace synchronized with ReentrantLock in hot paths. [src5]
- Pool sized per instance, not per database: maximum-pool-size=200 across 10 instances = 2000 connections, exceeding most DB defaults (100-200). Fix: calculate pool_size = max_connections / instance_count. [src3]
# Check health details
curl -s http://localhost:8080/actuator/health | jq
# Check individual probe endpoints (K8s)
curl -s http://localhost:8080/actuator/health/liveness | jq
curl -s http://localhost:8080/actuator/health/readiness | jq
# HikariCP metrics (requires micrometer)
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.pending | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.timeout | jq
# Tomcat threads
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.current | jq
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.busy | jq
# JVM memory
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
# Check if virtual threads are active (Spring Boot 3.2+)
curl -s http://localhost:8080/actuator/metrics/executor.active | jq
# Thread dump for stuck threads
curl -s http://localhost:8080/actuator/threaddump | jq '.threads[] | select(.state=="WAITING" or .state=="TIMED_WAITING") | .threadName'
# Kubernetes: check pod events for probe failures
kubectl describe pod <pod-name> | grep -A5 "Events:"
kubectl get events --field-selector involvedObject.name=<pod-name>
| Spring Boot | Status | Key Changes for 503 Debugging |
|---|---|---|
| 2.3+ | Maintenance | Readiness/liveness probes, graceful shutdown introduced [src4] |
| 2.4+ | Maintenance | Startup probe support, K8s probe grouping |
| 3.0+ | Maintenance | Jakarta EE 10 namespace (javax → jakarta), Tomcat 10.1 |
| 3.2+ | Current | Virtual threads GA, RestClient GA, improved observability [src5, src7] |
| 3.3+ | Current | Enhanced health group configuration, structured logging |
| 3.4+ | Current | RestClient timeout via ClientHttpRequestFactorySettings, SSL bundle auto-reload [src7] |
| Debug 503 When | Look Elsewhere When | Use Instead |
|---|---|---|
| Actuator health shows DOWN | 502 Bad Gateway (upstream proxy error) | Check nginx/ALB/Envoy logs |
| Connection pool logs show exhaustion | 504 Gateway Timeout (proxy timeout) | Increase proxy timeout settings |
| K8s pod keeps restarting | 500 Internal Server Error (unhandled exception) | Check application logs for stack trace |
| Load increases → intermittent 503 | Consistent 503 on every request from startup | Check application.properties misconfiguration |
| After deploying new version | 503 from CDN or static assets | Check CDN origin configuration |
- Virtual threads (spring.threads.virtual.enabled=true) can eliminate thread-pool exhaustion but don't fix connection pool or downstream timeout problems; you may hit connection pool limits faster because more concurrent requests proceed simultaneously.
- management.health.readinessstate.enabled defaults to true in K8s environments (auto-detected via the KUBERNETES_SERVICE_HOST env var).
- Prefer configuring timeouts via ClientHttpRequestFactorySettings rather than manual factory creation.
- spring.threads.virtual.enabled=true causes SimpleAsyncTaskExecutor to be used, which ignores spring.task.execution.pool.* properties.