- First step: curl http://localhost:8080/actuator/health -- check which health indicator is DOWN.
- maximum-pool-size must not exceed your database's max_connections divided by the number of app instances. [src3]
- RestClient timeouts are configured via ClientHttpRequestFactorySettings since Boot 3.4. [src7]
- Don't point startup, liveness, and readiness probes at /actuator/health for all three. [src4]
- Leak detection (leak-detection-threshold) must be enabled in production -- the default is disabled. [src3]

| # | 503 Root Cause | Likelihood | Symptom | Fix |
|---|---|---|---|---|
| 1 | Readiness probe failing | ~30% | K8s removes pod; actuator DOWN | Fix health indicator or increase probe timeout [src1, src4] |
| 2 | Connection pool exhausted | ~25% | "Connection is not available, request timed out after 30000ms" | Increase pool size, fix leaks, enable leak detection [src3] |
| 3 | Thread pool saturated | ~15% | All 200 Tomcat threads busy, RejectedExecutionException | Increase server.tomcat.threads.max, enable virtual threads on 3.2+ [src5] |
| 4 | Downstream timeout cascade | ~15% | Requests hang waiting for external API | Add timeouts + circuit breaker [src6, src7] |
| 5 | OutOfMemoryError | ~5% | GC thrashing | Increase heap or fix memory leak [src2] |
| 6 | Database unavailable | ~5% | DB health check DOWN | Fix DB connection or disable check [src1] |
| 7 | Disk space full | ~3% | DiskSpaceHealthIndicator DOWN | Free disk or adjust threshold [src1] |
| 8 | Graceful shutdown | ~2% | App shutting down, rejecting requests | Expected during rolling updates [src4] |
START
├── Is /actuator/health returning DOWN?
│ ├── YES → Which health indicator is DOWN?
│ │ ├── db → Database connection issue [src1]
│ │ │ └── FIX: Check DB connectivity, increase pool size
│ │ ├── diskSpace → Disk full [src1]
│ │ │ └── FIX: Free disk, adjust threshold
│ │ ├── custom → Application-specific health check [src1]
│ │ │ └── FIX: Debug the custom HealthIndicator
│ │ └── readinessState → App not ready [src4]
│ │ └── FIX: Check startup dependencies
│ └── NO → Health is UP but still getting 503?
│ ├── Check load balancer health check path
│ └── Check if 503 comes from reverse proxy (nginx/ALB)
├── Connection pool errors in logs?
│ ├── YES → HikariCP exhausted [src3]
│ │ └── FIX: spring.datasource.hikari.maximum-pool-size=30
│ └── NO ↓
├── Thread rejection in logs?
│ ├── YES → Tomcat threads saturated [src5]
│ │ └── FIX: server.tomcat.threads.max=400 or enable virtual threads (3.2+)
│ └── NO ↓
├── Downstream calls timing out?
│ ├── YES → Cascading failure [src6]
│ │ └── FIX: Add RestClient/WebClient timeouts + circuit breaker
│ └── NO → Check JVM metrics (GC pauses, heap usage)
└── DEFAULT → Enable actuator + micrometer metrics
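The tree above can be compressed into a quick triage helper. A dependency-free sketch; the boolean flags and the `Triage` class name are illustrative assumptions, not part of any Spring API:

```java
// Triage order mirrors the decision tree: health first, then pool, threads, downstream.
public class Triage {
    public static String triage(boolean healthDown, boolean poolErrors,
                                boolean threadRejections, boolean downstreamTimeouts) {
        if (healthDown)         return "inspect the failing HealthIndicator";
        if (poolErrors)         return "tune HikariCP and fix connection leaks";
        if (threadRejections)   return "raise server.tomcat.threads.max or enable virtual threads";
        if (downstreamTimeouts) return "add client timeouts and a circuit breaker";
        return "check JVM metrics (GC pauses, heap usage)";
    }
}
```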
Determine which component is reporting DOWN. This is always the first diagnostic step. [src1]
# Check overall health
curl -s http://localhost:8080/actuator/health | jq
# Enable detailed health (application.properties):
management.endpoint.health.show-details=always
management.health.readinessstate.enabled=true
management.health.livenessstate.enabled=true
Verify: curl -s http://localhost:8080/actuator/health | jq .status → expected: "UP" after fix
HikariCP default pool size of 10 is almost always too small for production. [src3]
# application.properties — HikariCP tuning [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.leak-detection-threshold=60000
Verify: curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq .measurements → active should stay below maximum-pool-size
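The per-instance sizing rule from [src3] is worth making concrete. A minimal sketch; `PoolSizing`, `poolSizePerInstance`, and the `headroom` parameter are illustrative names, not Spring or HikariCP APIs:

```java
// Split the database's connection budget evenly across app instances,
// keeping some headroom for migrations and admin sessions.
public class PoolSizing {
    public static int poolSizePerInstance(int dbMaxConnections, int instanceCount, int headroom) {
        return Math.max(1, (dbMaxConnections - headroom) / instanceCount);
    }

    public static void main(String[] args) {
        // e.g. PostgreSQL default max_connections=100, 3 app instances, 10 reserved
        System.out.println(poolSizePerInstance(100, 3, 10)); // prints 30
    }
}
```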
Default 200 threads can be exhausted under sustained load. On Spring Boot 3.2+, consider virtual threads instead. [src5]
# application.properties — Tomcat tuning [src5]
server.tomcat.threads.max=400
server.tomcat.threads.min-spare=20
server.tomcat.accept-count=100
server.tomcat.max-connections=10000
server.tomcat.connection-timeout=20000
# Spring Boot 3.2+ — virtual threads (eliminates thread pool exhaustion)
spring.threads.virtual.enabled=true
Every outbound HTTP call must have connect and read timeouts. [src6, src7]
// Spring Boot 3.2+ — RestClient with timeouts (preferred) [src7]
import java.time.Duration;

import org.springframework.boot.web.client.ClientHttpRequestFactories;
import org.springframework.boot.web.client.ClientHttpRequestFactorySettings;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestClient;

@Configuration
public class RestClientConfig {
    @Bean
    public RestClient restClient(RestClient.Builder builder) {
        return builder
            .requestFactory(ClientHttpRequestFactories.get(
                ClientHttpRequestFactorySettings.DEFAULTS
                    .withConnectTimeout(Duration.ofSeconds(5))
                    .withReadTimeout(Duration.ofSeconds(10))
            ))
            .build();
    }
}
// Spring Boot 2.x-3.1 — RestTemplate with timeouts
@Configuration
public class LegacyRestClientConfig {
    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(5_000);  // milliseconds; Duration overloads arrived in Framework 6.1
        factory.setReadTimeout(10_000);    // milliseconds
        return new RestTemplate(factory);
    }
}
Use three separate probes: startup, liveness, and readiness. [src4]
# Kubernetes deployment — proper probe configuration [src4]
containers:
- name: app
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
failureThreshold: 3
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 15
periodSeconds: 5
failureThreshold: 3
startupProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 30 # 30 × 5s = 150s max startup
Verify: kubectl describe pod <pod-name> → no probe failure events after deployment
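The startup budget in the manifest comment (30 × 5s = 150s) is just probe arithmetic, and it is worth checking whenever you tune probe values. A tiny sketch; `ProbeBudget` and the method name are illustrative:

```java
// Kubernetes tolerates failureThreshold consecutive probe failures,
// one per periodSeconds, before acting on the probe result.
public class ProbeBudget {
    public static int maxStartupSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }
}
```

If your app can take 3 minutes to warm up, `maxStartupSeconds` must exceed 180, so raise `failureThreshold` rather than `initialDelaySeconds`.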
Full script: java-circuit-breaker-for-downstream-services.java (30 lines)
// Input: Downstream API that may be slow/unavailable
// Output: Graceful degradation instead of 503 cascade
@Service
public class PaymentService {
    private final RestTemplate restTemplate;

    public PaymentService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Resilience4j circuit breaker prevents cascade [src6]
    @CircuitBreaker(name = "payment", fallbackMethod = "paymentFallback")
    @TimeLimiter(name = "payment")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest request) {
        return CompletableFuture.supplyAsync(() ->
            restTemplate.postForObject("/api/payments", request, PaymentResult.class)
        );
    }

    private CompletableFuture<PaymentResult> paymentFallback(PaymentRequest req, Throwable t) {
        return CompletableFuture.completedFuture(
            PaymentResult.pending("Payment queued — retry in progress")
        );
    }
}
# application.yml — circuit breaker config
resilience4j.circuitbreaker.instances.payment:
  sliding-window-size: 10
  failure-rate-threshold: 50
  wait-duration-in-open-state: 30s
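Behind that config is a three-state machine (CLOSED → OPEN → HALF_OPEN). A dependency-free sketch of the cycle; `SimpleBreaker` and its fields are illustrative only, since Resilience4j manages all of this for you from the config above:

```java
import java.time.Duration;
import java.time.Instant;

public class SimpleBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;   // roughly: failures tolerated in the sliding window
    private final Duration openDuration;  // roughly: wait-duration-in-open-state
    private Instant openedAt;

    public SimpleBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    public boolean allowRequest(Instant now) {
        if (state == State.OPEN && now.isAfter(openedAt.plus(openDuration))) {
            state = State.HALF_OPEN;      // let one trial request through
        }
        return state != State.OPEN;
    }

    public void recordFailure(Instant now) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;           // stop hammering the failing service
            openedAt = now;
            failures = 0;
        }
    }

    public void recordSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    public State state() { return state; }
}
```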
// Input: Non-critical external dependency (e.g., cache, search index)
// Output: Health check that degrades to WARNING without failing readiness
@Component
public class SearchHealthIndicator implements HealthIndicator {
    private final RestClient restClient;

    public SearchHealthIndicator(RestClient restClient) {
        this.restClient = restClient;
    }

    @Override
    public Health health() {
        try {
            restClient.get().uri("/ping")
                .retrieve().toBodilessEntity();
            return Health.up().build();
        } catch (Exception e) {
            // Report degraded but don't fail readiness [src1]
            return Health.up()
                .withDetail("search", "degraded: " + e.getMessage())
                .build();
        }
    }
}
// ❌ BAD — one slow downstream call can exhaust all threads [src6]
String result = restTemplate.getForObject("http://slow-service/api", String.class);
// Default: no timeout → thread hangs indefinitely
// ✅ GOOD — bounded timeout prevents thread exhaustion [src6, src7]
var factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(Duration.ofSeconds(3));  // Duration overloads: Framework 6.1+ (Boot 3.2+)
factory.setReadTimeout(Duration.ofSeconds(10));
var restTemplate = new RestTemplate(factory);
# ❌ BAD — HikariCP default is 10 connections
# Under load: "Connection is not available, request timed out after 30000ms"
# ✅ GOOD — scale pool to match expected concurrency [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.leak-detection-threshold=60000
# ❌ BAD — virtual threads handle more requests but each still needs a DB connection
# 1000 virtual threads + 10-connection pool = massive contention [src3, src5]
spring.threads.virtual.enabled=true
# spring.datasource.hikari.maximum-pool-size=10 (default)
# ✅ GOOD — scale connection pool alongside virtual threads [src3]
spring.threads.virtual.enabled=true
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.leak-detection-threshold=60000
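Virtual threads carry one more caveat: on JDK 21-23, a virtual thread that blocks inside a synchronized block pins its carrier thread, while ReentrantLock lets it unmount. A minimal sketch of the swap; `Inventory`, `stock`, and `reserve()` are made-up names, not a Spring API:

```java
import java.util.concurrent.locks.ReentrantLock;

public class Inventory {
    private final ReentrantLock lock = new ReentrantLock();
    private int stock = 10;

    // BAD for virtual threads on JDK 21-23:
    //   synchronized int reserve() { /* blocking I/O here pins the carrier */ }

    // GOOD: same mutual exclusion, no pinning
    public int reserve() {
        lock.lock();
        try {
            return --stock;   // stand-in for work that may block
        } finally {
            lock.unlock();
        }
    }
}
```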
# ❌ BAD — liveness and readiness have different semantics [src4]
livenessProbe:
httpGet:
path: /actuator/health # includes readiness checks!
readinessProbe:
httpGet:
path: /actuator/health # same endpoint
# ✅ GOOD — each probe checks what it should [src4]
livenessProbe:
httpGet:
path: /actuator/health/liveness # only: is the JVM alive?
readinessProbe:
httpGet:
path: /actuator/health/readiness # only: can it accept traffic?
startupProbe:
httpGet:
path: /actuator/health/liveness # give app time to start
- initialDelaySeconds too short → K8s fails the probe before Spring Boot starts. Fix: use startupProbe for slow-starting apps. [src4]
- Connections leaked (never returned via close()). Fix: enable leak-detection-threshold and use try-with-resources. [src3]
- In-flight requests rejected during deploys. Fix: server.shutdown=graceful + spring.lifecycle.timeout-per-shutdown-phase=30s. [src4]
- Tomcat thread pool saturated. Fix: raise server.tomcat.threads.max or enable virtual threads on 3.2+. [src5]
- synchronized blocks pin virtual threads to carrier threads; under high concurrency this re-introduces thread exhaustion. Fix: replace synchronized with ReentrantLock in hot paths. [src5]
- Pool sized per instance, not per database: maximum-pool-size=200 across 10 instances = 2000 connections, exceeding most DB defaults (100-200). Fix: calculate pool_size = max_connections / instance_count. [src3]
# Check health details
curl -s http://localhost:8080/actuator/health | jq
# Check individual probe endpoints (K8s)
curl -s http://localhost:8080/actuator/health/liveness | jq
curl -s http://localhost:8080/actuator/health/readiness | jq
# HikariCP metrics (requires micrometer)
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.pending | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.timeout | jq
# Tomcat threads
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.current | jq
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.busy | jq
# JVM memory
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq
# Check if virtual threads are active (Spring Boot 3.2+)
curl -s http://localhost:8080/actuator/metrics/executor.active | jq
# Thread dump for stuck threads
curl -s http://localhost:8080/actuator/threaddump | jq '.threads[] | select(.state=="WAITING" or .state=="TIMED_WAITING") | .threadName'
# Kubernetes: check pod events for probe failures
kubectl describe pod <pod-name> | grep -A5 "Events:"
kubectl get events --field-selector involvedObject.name=<pod-name>
| Spring Boot | Status | Key Changes for 503 Debugging |
|---|---|---|
| 2.3+ | Maintenance | Readiness/liveness probes, graceful shutdown introduced [src4] |
| 2.4+ | Maintenance | Startup probe support, K8s probe grouping |
| 3.0+ | Maintenance | Jakarta EE 10 namespace (javax → jakarta), Tomcat 10.1 |
| 3.2+ | Current | Virtual threads GA, RestClient GA, improved observability [src5, src7] |
| 3.3+ | Current | Enhanced health group configuration, structured logging |
| 3.4+ | Current | RestClient timeout via ClientHttpRequestFactorySettings, SSL bundle auto-reload [src7] |
| Debug 503 When | Look Elsewhere When | Use Instead |
|---|---|---|
| Actuator health shows DOWN | 502 Bad Gateway (upstream proxy error) | Check nginx/ALB/Envoy logs |
| Connection pool logs show exhaustion | 504 Gateway Timeout (proxy timeout) | Increase proxy timeout settings |
| K8s pod keeps restarting | 500 Internal Server Error (unhandled exception) | Check application logs for stack trace |
| Load increases → intermittent 503 | Consistent 503 on every request from startup | Check application.properties misconfiguration |
| After deploying new version | 503 from CDN or static assets | Check CDN origin configuration |
- Virtual threads (spring.threads.virtual.enabled=true) can eliminate thread-pool exhaustion but don't fix connection pool or downstream timeout problems; you may hit connection pool limits faster because more concurrent requests proceed simultaneously.
- management.health.readinessstate.enabled defaults to true in K8s environments (auto-detected via the KUBERNETES_SERVICE_HOST env var).
- Prefer configuring timeouts via ClientHttpRequestFactorySettings rather than manual factory creation.
- spring.threads.virtual.enabled=true causes SimpleAsyncTaskExecutor to be used, which ignores spring.task.execution.pool.* properties.