How to Fix 503 Service Unavailable in Spring Boot
How do I fix 503 Service Unavailable in Spring Boot?
TL;DR
- Bottom line: 503 means server is up but can't handle requests -- top causes: (1) K8s readiness probe failing, (2) HikariCP pool exhausted, (3) Tomcat threads saturated, (4) downstream timeout cascade.
- Key tool/command: curl http://localhost:8080/actuator/health -- check which health indicator is DOWN.
- Watch out for: Default HikariCP pool is only 10 connections -- production needs 20-50+. Virtual threads do NOT fix this.
- Works with: Spring Boot 2.x / 3.x / 4.x (Framework 7), embedded Tomcat/Jetty/Undertow, Kubernetes/ECS/Cloud Run.
Constraints
- Never disable readiness probes to "fix" 503 -- the probe is reporting a real problem; fix the underlying health indicator. [src1]
- Virtual threads (Spring Boot 3.2+) eliminate thread-pool exhaustion but do NOT fix connection pool exhaustion or downstream timeouts. [src5]
- HikariCP maximum-pool-size must not exceed your database's max_connections divided by number of app instances. [src3]
- RestClient (Spring Boot 3.2+) replaces RestTemplate for new code -- configure timeouts via ClientHttpRequestFactorySettings since Boot 3.4. [src7]
- In Kubernetes, always use separate liveness, readiness, and startup probes -- never use a single /actuator/health for all three. [src4]
- Connection leak detection (leak-detection-threshold) must be enabled in production -- default is disabled. [src3]
Quick Reference
| # | 503 Root Cause | Likelihood | Symptom | Fix |
|---|---|---|---|---|
| 1 | Readiness probe failing | ~30% | K8s removes pod; actuator DOWN | Fix health indicator or increase probe timeout [src1, src4] |
| 2 | Connection pool exhausted | ~25% | Connection is not available, request timed out after 30000ms | Increase pool size, fix leaks, enable leak detection [src3] |
| 3 | Thread pool saturated | ~15% | All 200 Tomcat threads busy, RejectedExecutionException | Increase server.tomcat.threads.max, enable virtual threads on 3.2+ [src5] |
| 4 | Downstream timeout cascade | ~15% | Requests hang waiting for external API | Add timeouts + circuit breaker [src6, src7] |
| 5 | OutOfMemoryError | ~5% | GC thrashing | Increase heap or fix memory leak [src2] |
| 6 | Database unavailable | ~5% | DB health check DOWN | Fix DB connection or disable check [src1] |
| 7 | Disk space full | ~3% | DiskSpaceHealthIndicator DOWN | Free disk or adjust threshold [src1] |
| 8 | Graceful shutdown | ~2% | App shutting down, rejecting requests | Expected during rolling updates [src4] |
Decision Tree
START
├── Is /actuator/health returning DOWN?
│   ├── YES → Which health indicator is DOWN?
│   │   ├── db → Database connection issue [src1]
│   │   │   └── FIX: Check DB connectivity, increase pool size
│   │   ├── diskSpace → Disk full [src1]
│   │   │   └── FIX: Free disk, adjust threshold
│   │   ├── custom → Application-specific health check [src1]
│   │   │   └── FIX: Debug the custom HealthIndicator
│   │   └── readinessState → App not ready [src4]
│   │       └── FIX: Check startup dependencies
│   └── NO → Health is UP but still getting 503?
│       ├── Check load balancer health check path
│       └── Check if 503 comes from reverse proxy (nginx/ALB)
├── Connection pool errors in logs?
│   ├── YES → HikariCP exhausted [src3]
│   │   └── FIX: spring.datasource.hikari.maximum-pool-size=30
│   └── NO ↓
├── Thread rejection in logs?
│   ├── YES → Tomcat threads saturated [src5]
│   │   └── FIX: server.tomcat.threads.max=400 or enable virtual threads (3.2+)
│   └── NO ↓
├── Downstream calls timing out?
│   ├── YES → Cascading failure [src6]
│   │   └── FIX: Add RestClient/WebClient timeouts + circuit breaker
│   └── NO → Check JVM metrics (GC pauses, heap usage)
└── DEFAULT → Enable actuator + micrometer metrics
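The log-driven branches of the tree can be sketched as a small lookup, checked in the same order as the tree. This is only an illustration: the class and method names are hypothetical, and the SocketTimeoutException and OutOfMemoryError markers are assumptions about what typically shows up in logs, not strings taken from the sources cited here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps a log line to a likely 503 root cause, in decision-tree order. */
final class RootCauseClassifier {

    // Marker substring -> suggested cause. Insertion order mirrors the decision tree.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();

    static {
        MARKERS.put("Connection is not available", "HikariCP pool exhausted");
        MARKERS.put("RejectedExecutionException", "Tomcat threads saturated");
        MARKERS.put("SocketTimeoutException", "Downstream timeout cascade"); // assumed marker
        MARKERS.put("OutOfMemoryError", "JVM memory pressure");              // assumed marker
    }

    /** Returns the first matching cause, or a hint to fall back to actuator and JVM metrics. */
    static String classify(String logLine) {
        for (Map.Entry<String, String> entry : MARKERS.entrySet()) {
            if (logLine.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "Unknown: check actuator health and JVM metrics";
    }
}
```

A line that matches none of the markers falls through to the DEFAULT branch, which is where actuator and Micrometer metrics take over.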
Step-by-Step Guide
1. Check actuator health endpoint
Determine which component is reporting DOWN. This is always the first diagnostic step. [src1]
# Check overall health
curl -s http://localhost:8080/actuator/health | jq
# Enable detailed health (application.properties):
management.endpoint.health.show-details=always
management.health.readinessstate.enabled=true
management.health.livenessstate.enabled=true
Verify: curl -s http://localhost:8080/actuator/health | jq .status → expected: "UP" after fix
2. Check and tune connection pool
HikariCP's default maximum-pool-size of 10 is usually too small for production traffic; size it against expected concurrent queries and the database's connection limit. [src3]
# application.properties — HikariCP tuning [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.leak-detection-threshold=60000
Verify: curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq .measurements → active should be below maximum-pool-size
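The upper bound on maximum-pool-size comes from the database side: all instances together must stay under max_connections. A minimal sketch of that arithmetic follows, assuming a hypothetical reservedConnections headroom for admin tools and migrations (that parameter is an illustration, not something HikariCP or the sources define).

```java
/** Hypothetical helper for the "max_connections divided by instances" rule. */
final class PoolSizing {

    /**
     * Largest per-instance pool that keeps the whole fleet under the database limit.
     * reservedConnections is an assumed headroom for admin tools and migrations.
     */
    static int maxPoolSizePerInstance(int dbMaxConnections, int reservedConnections, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        int usable = dbMaxConnections - reservedConnections;
        if (usable < instances) {
            throw new IllegalArgumentException("not enough connections for one per instance");
        }
        return usable / instances; // integer division rounds down, so the fleet stays under the limit
    }
}
```

For example, a database with max_connections=200, 20 connections held back, and 6 app instances gives a ceiling of 30 per instance, which matches the value used in the properties above.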
3. Check and tune Tomcat thread pool
Default 200 threads can be exhausted under sustained load. On Spring Boot 3.2+, consider virtual threads instead. [src5]
# application.properties — Tomcat tuning [src5]
server.tomcat.threads.max=400
server.tomcat.threads.min-spare=20
server.tomcat.accept-count=100
server.tomcat.max-connections=10000
server.tomcat.connection-timeout=20000
# Spring Boot 3.2+ — virtual threads (eliminates thread pool exhaustion)
spring.threads.virtual.enabled=true
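Whether 200 threads are enough can be estimated with Little's law: average busy threads equal the arrival rate multiplied by the average time a request holds a thread. The sketch below is a back-of-the-envelope check under that assumption; the class name and the sample numbers are illustrative only.

```java
/** Hypothetical estimator based on Little's law (concurrency = rate x latency). */
final class ThreadDemand {

    /** Average number of request threads busy at a given load. */
    static long busyThreads(double requestsPerSecond, double avgLatencySeconds) {
        return (long) Math.ceil(requestsPerSecond * avgLatencySeconds);
    }

    /** True when the estimated demand exceeds the configured server.tomcat.threads.max. */
    static boolean saturates(double requestsPerSecond, double avgLatencySeconds, int maxThreads) {
        return busyThreads(requestsPerSecond, avgLatencySeconds) > maxThreads;
    }
}
```

At 500 requests per second and 0.5 s average latency, about 250 threads are busy on average, which already exceeds the default of 200. Note that reducing latency (for example with downstream timeouts) lowers demand just as effectively as raising the thread limit.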
4. Add timeouts to downstream calls
Every outbound HTTP call must have connect and read timeouts. [src6, src7]
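// Spring Boot 3.2+: RestClient with timeouts (preferred) [src7]
// Note: on Boot 3.4+ these two helper classes are deprecated; the replacements live in
// org.springframework.boot.http.client (ClientHttpRequestFactoryBuilder and
// ClientHttpRequestFactorySettings.defaults()).
import java.time.Duration;
import org.springframework.boot.web.client.ClientHttpRequestFactories;
import org.springframework.boot.web.client.ClientHttpRequestFactorySettings;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestClient;

@Configuration
public class RestClientConfig {
    @Bean
    public RestClient restClient(RestClient.Builder builder) {
        return builder
            .requestFactory(ClientHttpRequestFactories.get(
                ClientHttpRequestFactorySettings.DEFAULTS
                    .withConnectTimeout(Duration.ofSeconds(5))
                    .withReadTimeout(Duration.ofSeconds(10))
            ))
            .build();
    }
}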
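// Spring Boot 2.x-3.1: RestTemplate with timeouts
// The int (millisecond) setters are used here because the Duration overloads of
// SimpleClientHttpRequestFactory only exist from Spring Framework 6.1 (Boot 3.2).
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class LegacyRestClientConfig {
    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(5_000);  // milliseconds
        factory.setReadTimeout(10_000);    // milliseconds
        return new RestTemplate(factory);
    }
}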
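The same "every call is bounded" rule applies outside Spring's clients. As a dependency-free illustration, here is a sketch using the JDK's own java.net.http.HttpClient (Java 11+), which is a swapped-in client rather than anything the sources prescribe. The class name is hypothetical; the 5 s and 10 s values mirror the configuration above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical factory showing where the two timeouts live on the JDK HTTP client. */
final class BoundedHttp {

    /** Client-wide limit on establishing the TCP connection. */
    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    /** Per-request limit on waiting for the response; exceeding it raises HttpTimeoutException. */
    static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }
}
```

The request timeout here bounds the wait for a response rather than each individual socket read, so it is close to, but not identical with, a read timeout on RestTemplate.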
5. Configure Kubernetes probes correctly
Use three separate probes: startup, liveness, and readiness. [src4]
# Kubernetes deployment: proper probe configuration [src4]
containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 5
      failureThreshold: 3
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      failureThreshold: 30 # 30 × 5s = 150s max startup
Verify: kubectl describe pod <pod-name> → no probe failure events after deployment
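The probe numbers translate into a time budget that is worth computing before a rollout. The sketch below is an approximation that adds the initial delay to the period × threshold arithmetic from the comment above and ignores timeoutSeconds and kubelet jitter; the class name is hypothetical.

```java
/** Hypothetical helper: rough time budget of a Kubernetes probe. */
final class ProbeBudget {

    /** Approximate seconds before Kubernetes acts on a probe that never succeeds. */
    static int secondsUntilFailure(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
        return initialDelaySeconds + periodSeconds * failureThreshold;
    }
}
```

With the startup probe above (10, 5, 30) the application has roughly 160 seconds to start; with the readiness probe (15, 5, 3) a pod that never becomes ready is taken out of rotation after about 30 seconds.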
Code Examples
Java: Circuit breaker for downstream services
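// Input: Downstream API that may be slow/unavailable
// Output: Graceful degradation instead of 503 cascade
// PaymentRequest and PaymentResult are application types not shown here.
import java.util.concurrent.CompletableFuture;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.timelimiter.annotation.TimeLimiter;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class PaymentService {
    private final RestTemplate restTemplate;

    // Constructor injection; the RestTemplate bean should carry timeouts and a root URI
    public PaymentService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Resilience4j circuit breaker prevents cascade [src6]
    @CircuitBreaker(name = "payment", fallbackMethod = "paymentFallback")
    @TimeLimiter(name = "payment")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest request) {
        return CompletableFuture.supplyAsync(() ->
            restTemplate.postForObject("/api/payments", request, PaymentResult.class)
        );
    }

    // Fallback: same return type, original arguments plus the Throwable
    private CompletableFuture<PaymentResult> paymentFallback(PaymentRequest req, Throwable t) {
        return CompletableFuture.completedFuture(
            PaymentResult.pending("Payment queued - retry in progress")
        );
    }
}
// application.yml -- circuit breaker config
// resilience4j.circuitbreaker.instances.payment:
//   sliding-window-size: 10
//   failure-rate-threshold: 50
//   wait-duration-in-open-state: 30s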
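To make the three configuration values concrete, here is a minimal hand-rolled sketch of a count-based circuit breaker. It is not Resilience4j's implementation and omits much of what the library does (minimum call counts, limited half-open permits, metrics); the class name and the injectable clock are assumptions made so the state transitions can be followed and tested.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Minimal count-based circuit breaker sketch; illustrative only. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;               // like sliding-window-size
    private final double failureRateThreshold;  // 0.0-1.0, like failure-rate-threshold / 100
    private final long waitMillis;              // like wait-duration-in-open-state
    private final LongSupplier clock;           // injectable so tests control time
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold, long waitMillis, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    /** Current state; an OPEN breaker becomes HALF_OPEN once the wait has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the action unless the breaker is open; failures and open state yield the fallback. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();              // fail fast: no thread waits on the downstream
        }
        try {
            T result = action.get();            // the lock is not held during the downstream call
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private synchronized void record(boolean failed) {
        if (state == State.HALF_OPEN) {         // a trial call decides the next state
            if (failed) {
                open();
            } else {
                state = State.CLOSED;
                window.clear();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long failures = window.stream().filter(f -> f).count();
        if (window.size() == windowSize && (double) failures / windowSize >= failureRateThreshold) {
            open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

The important property for 503 prevention is the fail-fast path: while the breaker is open, callers get the fallback immediately instead of occupying a request thread until a timeout fires.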
Java: Custom health indicator with graceful degradation
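// Input: Non-critical external dependency (e.g., cache, search index)
// Output: Health check that stays UP and reports degradation as a detail, so readiness does not fail
// Imports below are for Spring Boot 3.2-3.x (RestClient needs 3.2+).
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestClient;

@Component
public class SearchHealthIndicator implements HealthIndicator {
    private final RestClient restClient;

    // Expects a RestClient bean with a base URL and timeouts already configured
    public SearchHealthIndicator(RestClient restClient) {
        this.restClient = restClient;
    }

    @Override
    public Health health() {
        try {
            restClient.get().uri("/ping")
                .retrieve().toBodilessEntity();
            return Health.up().build();
        } catch (Exception e) {
            // Report degraded but don't fail readiness [src1]
            return Health.up()
                .withDetail("search", "degraded: " + e.getMessage())
                .build();
        }
    }
}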
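The underlying rule is that only critical dependencies should be able to flip readiness. As a framework-free sketch of that rule (this is not how Actuator's own status aggregation is implemented, and the class name is hypothetical), the overall status can be derived from the component statuses plus an explicit set of critical names:

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical aggregator: only components named as critical can take readiness DOWN. */
final class ReadinessAggregator {

    /** Returns "UP" unless a component listed in critical reports "DOWN". */
    static String aggregate(Map<String, String> componentStatus, Set<String> critical) {
        for (String name : critical) {
            if ("DOWN".equals(componentStatus.get(name))) {
                return "DOWN";
            }
        }
        return "UP";
    }
}
```

With db marked critical and search left out, a search outage leaves the result at UP, while a database outage still takes the instance out of rotation.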
Anti-Patterns
Wrong: No timeouts on HTTP calls
// ❌ BAD — one slow downstream call can exhaust all threads [src6]
String result = restTemplate.getForObject("http://slow-service/api", String.class);
// Default: no timeout → thread hangs indefinitely
Correct: Always set timeouts + circuit breaker
// ✅ GOOD — bounded timeout prevents thread exhaustion [src6, src7]
var factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(Duration.ofSeconds(3));
factory.setReadTimeout(Duration.ofSeconds(10));
var restTemplate = new RestTemplate(factory);
Wrong: Default connection pool size for production
# ❌ BAD — HikariCP default is 10 connections
# Under load: "Connection is not available, request timed out after 30000ms"
Correct: Size pool for production workload
# ✅ GOOD — scale pool to match expected concurrency [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.leak-detection-threshold=60000
Wrong: Virtual threads without connection pool scaling
# ❌ BAD — virtual threads handle more requests but each still needs a DB connection
# 1000 virtual threads + 10-connection pool = massive contention [src3, src5]
spring.threads.virtual.enabled=true
# spring.datasource.hikari.maximum-pool-size=10 (default)
Correct: Virtual threads with appropriately sized pool
# ✅ GOOD — scale connection pool alongside virtual threads [src3]
spring.threads.virtual.enabled=true
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.leak-detection-threshold=60000
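The contention can be reproduced without a database. In the sketch below a Semaphore stands in for the connection pool and platform threads from a cached executor stand in for virtual threads (so it also runs on JDKs before 21); both substitutions and all names are assumptions for illustration. Every task makes one non-blocking attempt to take a permit, which models a request whose connection-timeout has already expired.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical simulation: many concurrent requests, few pooled connections. */
final class PoolContention {

    /** Returns how many of the concurrent requests could not obtain one of poolSize permits. */
    static int countRejected(int requests, int poolSize) {
        Semaphore pool = new Semaphore(poolSize);                  // stand-in for the connection pool
        CountDownLatch allAttempted = new CountDownLatch(requests);
        AtomicInteger rejected = new AtomicInteger();
        ExecutorService executor = Executors.newCachedThreadPool(); // stand-in for virtual threads
        try {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    boolean acquired = pool.tryAcquire();          // one immediate attempt, no waiting
                    if (!acquired) {
                        rejected.incrementAndGet();
                    }
                    allAttempted.countDown();
                    if (acquired) {
                        try {
                            allAttempted.await();                  // hold the permit until everyone has tried
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            pool.release();
                        }
                    }
                });
            }
            executor.shutdown();
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
        return rejected.get();
    }
}
```

With 100 concurrent requests and 10 permits, 90 requests go without a connection no matter how cheap the threads are, which is why the pool has to be scaled together with concurrency.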
Wrong: Single health endpoint for all K8s probes
# ❌ BAD — liveness and readiness have different semantics [src4]
livenessProbe:
  httpGet:
    path: /actuator/health # includes readiness checks!
readinessProbe:
  httpGet:
    path: /actuator/health # same endpoint
Correct: Separate probe endpoints
# ✅ GOOD — each probe checks what it should [src4]
livenessProbe:
  httpGet:
    path: /actuator/health/liveness # only: is the JVM alive?
readinessProbe:
  httpGet:
    path: /actuator/health/readiness # only: can it accept traffic?
startupProbe:
  httpGet:
    path: /actuator/health/liveness # give app time to start
Common Pitfalls
- Readiness probe too aggressive: initialDelaySeconds too short → K8s fails probe before Spring Boot starts. Fix: use startupProbe for slow-starting apps. [src4]
- Connection pool exhaustion from leaks: Connections not returned (exception before close()). Fix: enable leak-detection-threshold and use try-with-resources. [src3]
- Health check includes non-critical deps: Redis DOWN → entire app DOWN. Fix: group non-critical under readiness only, or use custom HealthIndicator that reports degraded. [src1]
- No graceful shutdown: SIGTERM but no in-flight draining. Fix: server.shutdown=graceful + spring.lifecycle.timeout-per-shutdown-phase=30s. [src4]
- Thread pool defaults too low: Tomcat default 200 threads → consumed under load. Fix: increase server.tomcat.threads.max or enable virtual threads on 3.2+. [src5]
- Virtual threads with pinned carriers: synchronized blocks pin virtual threads to carrier threads. Under high concurrency, this re-introduces thread exhaustion. Fix: replace synchronized with ReentrantLock in hot paths. [src5]
- Connection pool too large for DB: maximum-pool-size=200 across 10 instances = 2000 connections, exceeding most DB defaults (100-200). Fix: calculate pool_size = max_connections / instance_count. [src3]
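The leak pitfall is easiest to see side by side. The sketch below uses a toy counting pool instead of a real DataSource (all class and method names are hypothetical) and contrasts a manual close() that is skipped by an exception with try-with-resources, which always returns the lease.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of a connection leak versus try-with-resources. */
final class LeakDemo {

    /** Toy pool that only counts outstanding leases; a stand-in for a real DataSource. */
    static final class CountingPool {
        private final AtomicInteger inUse = new AtomicInteger();

        Lease borrow() {
            inUse.incrementAndGet();
            return new Lease(this);
        }

        int inUse() {
            return inUse.get();
        }
    }

    /** Stand-in for a pooled connection; closing it returns it to the pool. */
    static final class Lease implements AutoCloseable {
        private final CountingPool pool;

        Lease(CountingPool pool) {
            this.pool = pool;
        }

        @Override
        public void close() {
            pool.inUse.decrementAndGet();
        }
    }

    /** Leaks: the exception skips close(), so the lease is never returned. */
    static void leaky(CountingPool pool) {
        Lease lease = pool.borrow();
        failingQuery();
        lease.close();
    }

    /** Safe: try-with-resources calls close() even when the body throws. */
    static void safe(CountingPool pool) {
        try (Lease lease = pool.borrow()) {
            failingQuery();
        }
    }

    private static void failingQuery() {
        throw new IllegalStateException("query failed");
    }
}
```

After one failing call, the leaky variant leaves one lease outstanding and the safe variant leaves none. Repeat the leaky path as many times as the pool has connections and every later request waits for the connection timeout.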
Diagnostic Commands
# Check health details
curl -s http://localhost:8080/actuator/health | jq
# Check individual probe endpoints (K8s)
curl -s http://localhost:8080/actuator/health/liveness | jq
curl -s http://localhost:8080/actuator/health/readiness | jq
# HikariCP metrics (requires micrometer)
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.pending | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.timeout | jq
# Tomcat threads
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.current | jq
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.busy | jq
# JVM memory
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
# Check if virtual threads are active (Spring Boot 3.2+)
curl -s http://localhost:8080/actuator/metrics/executor.active | jq
# Thread dump for stuck threads
curl -s http://localhost:8080/actuator/threaddump | jq '.threads[] | select(.state=="WAITING" or .state=="TIMED_WAITING") | .threadName'
# Kubernetes: check pod events for probe failures
kubectl describe pod <pod-name> | grep -A5 "Events:"
kubectl get events --field-selector involvedObject.name=<pod-name>
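When jq is not available, for example inside a minimal container image, the DOWN components can be pulled out of the health JSON with a few lines of Java. This is a rough sketch that relies on the assumption that each component object starts with its "status" field, as in typical Actuator output; a real JSON parser is the more robust choice. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner for component names that report DOWN in /actuator/health output. */
final class HealthScan {

    // Matches  "name": { "status": "DOWN"  and captures the component name.
    private static final Pattern DOWN_COMPONENT =
            Pattern.compile("\"([A-Za-z0-9_]+)\"\\s*:\\s*\\{\\s*\"status\"\\s*:\\s*\"DOWN\"");

    /** Returns the names of components whose status is DOWN, in document order. */
    static List<String> downComponents(String healthJson) {
        List<String> names = new ArrayList<>();
        Matcher matcher = DOWN_COMPONENT.matcher(healthJson);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }
}
```

The top-level status is deliberately not reported, since it has no component name in front of it; only the individual indicators that explain the 503 are returned.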
Version History & Compatibility
| Spring Boot | Status | Key Changes for 503 Debugging |
|---|---|---|
| 2.3+ | EOL (commercial support only) | Readiness/liveness probes, graceful shutdown introduced [src4] |
| 2.4+ | EOL (commercial support only) | Startup probe support, K8s probe grouping |
| 3.0+ | EOL OSS | Jakarta EE 10 namespace (javax → jakarta), Tomcat 10.1 |
| 3.2+ | EOL OSS | Virtual threads GA, RestClient GA, improved observability [src5, src7] |
| 3.3+ | EOL OSS | Enhanced health group configuration, structured logging |
| 3.4+ | EOL OSS (no further patches) | RestClient timeout via ClientHttpRequestFactorySettings, SSL bundle auto-reload [src7] |
| 3.5+ | Maintenance (OSS through 2026-06-30) | Final 3.x minor. Actuator processInfo now exposes virtual-thread counts on JDK 24+, a direct signal for thread saturation [src1] |
| 4.0+ | Current (Framework 7) | HTTP client properties moved to spring.http.clients.* (old spring.http.client.* deprecated). Actuator endpoint nullability migrated to JSpecify org.jspecify.annotations.Nullable -- affects custom HealthIndicators [src8] |
When to Use / When Not to Use
| Debug 503 When | Look Elsewhere When | Use Instead |
|---|---|---|
| Actuator health shows DOWN | 502 Bad Gateway (upstream proxy error) | Check nginx/ALB/Envoy logs |
| Connection pool logs show exhaustion | 504 Gateway Timeout (proxy timeout) | Increase proxy timeout settings |
| K8s pod keeps restarting | 500 Internal Server Error (unhandled exception) | Check application logs for stack trace |
| Load increases → intermittent 503 | Consistent 503 on every request from startup | Check application.properties misconfiguration |
| After deploying new version | 503 from CDN or static assets | Check CDN origin configuration |
Important Caveats
- Virtual threads (Spring Boot 3.2+ with spring.threads.virtual.enabled=true) can eliminate thread-pool exhaustion but don't fix connection pool or downstream timeout problems. You may hit connection pool limits faster because more concurrent requests proceed simultaneously.
- Health endpoint 503 from K8s readiness is by design -- don't disable the probe; fix the underlying health check.
- In Spring Boot 3.x, management.health.readinessstate.enabled defaults to true in K8s environments (auto-detected via KUBERNETES_SERVICE_HOST env var).
- RestClient (Spring Boot 3.2+) is the recommended replacement for RestTemplate. Since Boot 3.4, timeouts can be configured via ClientHttpRequestFactorySettings rather than manual factory creation.
- spring.threads.virtual.enabled=true causes SimpleAsyncTaskExecutor to be used, which ignores spring.task.execution.pool.* properties.
- On Spring Boot 4.0+, the HTTP client timeout properties moved to a new namespace: use spring.http.clients.* instead of spring.http.client.* (the old prefix still works but emits deprecation warnings). [src8]
- On Spring Boot 3.5+ running on JDK 24+, virtual-thread pinning from synchronized blocks is largely fixed at the JVM level, but synchronized blocks holding I/O still serialize traffic. Custom HealthIndicators that synchronize over external calls can still spike latency under load.
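One possible way to keep a health check from serializing traffic, sketched under the assumption that a slightly stale result is acceptable: cache the last probe result and let only one caller refresh it, using ReentrantLock.tryLock() so other callers never wait on the external call. The class, its parameters, and the injectable clock are hypothetical and not part of Actuator.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical cached check: at most one caller runs the slow probe, the rest reuse the last result. */
final class CachedCheck {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private final Supplier<Boolean> probe;   // the slow external call
    private final long ttlMillis;            // how long a result stays fresh
    private final LongSupplier clock;        // injectable so tests control time
    private volatile boolean lastResult = true;
    private volatile long lastRefresh;
    private volatile boolean initialized;

    CachedCheck(Supplier<Boolean> probe, long ttlMillis, LongSupplier clock) {
        this.probe = probe;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached result, refreshing it first if it is stale and nobody else is refreshing. */
    boolean isHealthy() {
        long now = clock.getAsLong();
        boolean stale = !initialized || now - lastRefresh >= ttlMillis;
        if (stale && refreshLock.tryLock()) {    // non-blocking: losers fall through to the cached value
            try {
                lastResult = probe.get();
                lastRefresh = now;
                initialized = true;
            } finally {
                refreshLock.unlock();
            }
        }
        return lastResult;
    }
}
```

The trade-off is that callers arriving during a refresh see the previous result, which defaults to healthy before the first probe completes; whether that is acceptable depends on how the indicator is used.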