How to Fix 503 Service Unavailable in Spring Boot

Type: Software Reference · Confidence: 0.92 · Sources: 7 · Verified: 2026-02-23 · Freshness: quarterly

TL;DR

A Spring Boot 503 almost always traces to one of eight causes: a failing readiness probe, an exhausted HikariCP connection pool, a saturated Tomcat thread pool, a downstream timeout cascade, memory pressure, an unavailable database, a full disk, or graceful shutdown during a rolling update. Start at /actuator/health, then work the decision tree below.
Quick Reference

# | Root Cause | Likelihood | Symptom | Fix
1 | Readiness probe failing | ~30% | K8s removes pod; actuator DOWN | Fix health indicator or increase probe timeout [src1, src4]
2 | Connection pool exhausted | ~25% | "Connection is not available, request timed out after 30000ms" | Increase pool size, fix leaks, enable leak detection [src3]
3 | Thread pool saturated | ~15% | All 200 Tomcat threads busy, RejectedExecutionException | Increase server.tomcat.threads.max, enable virtual threads on 3.2+ [src5]
4 | Downstream timeout cascade | ~15% | Requests hang waiting for external API | Add timeouts + circuit breaker [src6, src7]
5 | OutOfMemoryError | ~5% | GC thrashing | Increase heap or fix memory leak [src2]
6 | Database unavailable | ~5% | DB health check DOWN | Fix DB connection or disable check [src1]
7 | Disk space full | ~3% | DiskSpaceHealthIndicator DOWN | Free disk or adjust threshold [src1]
8 | Graceful shutdown | ~2% | App shutting down, rejecting requests | Expected during rolling updates [src4]

Decision Tree

START
├── Is /actuator/health returning DOWN?
│   ├── YES → Which health indicator is DOWN?
│   │   ├── db → Database connection issue [src1]
│   │   │   └── FIX: Check DB connectivity, increase pool size
│   │   ├── diskSpace → Disk full [src1]
│   │   │   └── FIX: Free disk, adjust threshold
│   │   ├── custom → Application-specific health check [src1]
│   │   │   └── FIX: Debug the custom HealthIndicator
│   │   └── readinessState → App not ready [src4]
│   │       └── FIX: Check startup dependencies
│   └── NO → Health is UP but still getting 503?
│       ├── Check load balancer health check path
│       └── Check if 503 comes from reverse proxy (nginx/ALB)
├── Connection pool errors in logs?
│   ├── YES → HikariCP exhausted [src3]
│   │   └── FIX: spring.datasource.hikari.maximum-pool-size=30
│   └── NO ↓
├── Thread rejection in logs?
│   ├── YES → Tomcat threads saturated [src5]
│   │   └── FIX: server.tomcat.threads.max=400 or enable virtual threads (3.2+)
│   └── NO ↓
├── Downstream calls timing out?
│   ├── YES → Cascading failure [src6]
│   │   └── FIX: Add RestClient/WebClient timeouts + circuit breaker
│   └── NO → Check JVM metrics (GC pauses, heap usage)
└── DEFAULT → Enable actuator + micrometer metrics

Step-by-Step Guide

1. Check actuator health endpoint

Determine which component is reporting DOWN. This is always the first diagnostic step. [src1]

# Check overall health
curl -s http://localhost:8080/actuator/health | jq

# Enable detailed health (application.properties):
management.endpoint.health.show-details=always
# Expose /actuator/health/liveness and /actuator/health/readiness outside K8s
management.endpoint.health.probes.enabled=true
management.health.readinessstate.enabled=true
management.health.livenessstate.enabled=true

Verify: curl -s http://localhost:8080/actuator/health | jq .status → expected: "UP" after fix
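
When the overall status is DOWN, jq can isolate the failing component directly. A minimal sketch — the echo supplies a sample payload; against a live app, replace it with `curl -s http://localhost:8080/actuator/health`:

```shell
# Filter the health payload down to only the components reporting DOWN.
echo '{"status":"DOWN","components":{"db":{"status":"DOWN"},"diskSpace":{"status":"UP"}}}' \
  | jq -r '.components | to_entries[] | select(.value.status == "DOWN") | .key'
# → db
```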

2. Check and tune connection pool

HikariCP default pool size of 10 is almost always too small for production. [src3]

# application.properties — HikariCP tuning [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.leak-detection-threshold=60000

Verify: curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq .measurements → active should be below maximum-pool-size
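
For a starting value rather than a guess, the HikariCP wiki's rough sizing rule — connections = (core_count × 2) + effective_spindle_count — can be sketched as a helper. Treat the result as a baseline to load-test, not a hard limit; the `recommendedPoolSize` name is ours:

```java
public class PoolSizing {
    // Rough starting point from the HikariCP "About Pool Sizing" wiki:
    // connections = (core_count * 2) + effective_spindle_count
    public static int recommendedPoolSize(int coreCount, int effectiveSpindleCount) {
        return coreCount * 2 + effectiveSpindleCount;
    }

    public static void main(String[] args) {
        // e.g. 8 cores + 1 SSD → 17 connections
        System.out.println(recommendedPoolSize(8, 1));
    }
}
```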

3. Check and tune Tomcat thread pool

Default 200 threads can be exhausted under sustained load. On Spring Boot 3.2+, consider virtual threads instead. [src5]

# application.properties — Tomcat tuning [src5]
server.tomcat.threads.max=400
server.tomcat.threads.min-spare=20
server.tomcat.accept-count=100
server.tomcat.max-connections=10000
server.tomcat.connection-timeout=20000

# Spring Boot 3.2+ — virtual threads (avoids platform thread pool exhaustion;
# remember to scale the connection pool too — see Anti-Patterns below)
spring.threads.virtual.enabled=true
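
To confirm the property actually took effect, log the current thread inside a request handler — virtual threads report isVirtual() == true. A standalone JDK 21 sketch of the same check (the `runsOnVirtualThread` helper name is ours):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class VirtualThreadCheck {
    // In a Spring MVC handler you would log Thread.currentThread() and
    // inspect isVirtual(); this plain-JDK version exercises the same API.
    public static boolean runsOnVirtualThread() throws InterruptedException {
        AtomicBoolean virtual = new AtomicBoolean(false);
        Thread t = Thread.ofVirtual().start(() ->
            virtual.set(Thread.currentThread().isVirtual()));
        t.join();
        return virtual.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runsOnVirtualThread());  // true on JDK 21+
    }
}
```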

4. Add timeouts to downstream calls

Every outbound HTTP call must have connect and read timeouts. [src6, src7]

// Spring Boot 3.2+ — RestClient with timeouts (preferred) [src7]
@Configuration
public class RestClientConfig {
    @Bean
    public RestClient restClient(RestClient.Builder builder) {
        return builder
            .requestFactory(ClientHttpRequestFactories.get(
                ClientHttpRequestFactorySettings.DEFAULTS
                    .withConnectTimeout(Duration.ofSeconds(5))
                    .withReadTimeout(Duration.ofSeconds(10))
            ))
            .build();
    }
}

// Spring Boot 2.x-3.1 — RestTemplate with timeouts
@Configuration
public class LegacyRestClientConfig {
    @Bean
    public RestTemplate restTemplate() {
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(5_000);   // millis — Duration overloads only exist on Spring Framework 6.1+
        factory.setReadTimeout(10_000);     // millis
        return new RestTemplate(factory);
    }
}
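
The same rule applies to calls made outside Spring's clients. With the JDK's built-in java.net.http.HttpClient there are two separate knobs — the builder's connect timeout and the per-request response timeout; a sketch (the URL is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class JdkClientTimeouts {
    public static HttpClient client() {
        return HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))    // bounds TCP connect
            .build();
    }

    public static HttpRequest request(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(10))          // bounds waiting for the response
            .GET()
            .build();
    }
}
```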

5. Configure Kubernetes probes correctly

Use three separate probes: startup, liveness, and readiness. [src4]

# Kubernetes deployment — proper probe configuration [src4]
containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 5
      failureThreshold: 3
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      failureThreshold: 30  # 30 × 5s = 150s max startup

Verify: kubectl describe pod <pod-name> → no probe failure events after deployment
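
Cause 8 in the quick reference (503s during rolling updates) is reduced by pairing the probes above with Spring Boot's graceful shutdown, so in-flight requests drain before the process exits. A sketch, assuming defaults elsewhere:

```properties
# application.properties — drain in-flight requests before shutdown (Boot 2.3+)
server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=30s
```

In Kubernetes, also set terminationGracePeriodSeconds above the drain timeout (e.g. 45) so the kubelet does not SIGKILL the pod mid-drain.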

Code Examples

Java: Circuit breaker for downstream services

Full script: java-circuit-breaker-for-downstream-services.java (30 lines)

// Input:  Downstream API that may be slow/unavailable
// Output: Graceful degradation instead of 503 cascade

@Service
public class PaymentService {
    private final RestTemplate restTemplate;

    public PaymentService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Resilience4j circuit breaker prevents cascade [src6]
    @CircuitBreaker(name = "payment", fallbackMethod = "paymentFallback")
    @TimeLimiter(name = "payment")
    public CompletableFuture<PaymentResult> processPayment(PaymentRequest request) {
        return CompletableFuture.supplyAsync(() ->
            // Assumes a root URI is configured on this RestTemplate
            restTemplate.postForObject("/api/payments", request, PaymentResult.class)
        );
    }

    private CompletableFuture<PaymentResult> paymentFallback(PaymentRequest req, Throwable t) {
        return CompletableFuture.completedFuture(
            PaymentResult.pending("Payment queued — retry in progress")
        );
    }
}

# application.yml — resilience4j config (note @TimeLimiter needs its own instance)
resilience4j.circuitbreaker.instances.payment:
  sliding-window-size: 10
  failure-rate-threshold: 50
  wait-duration-in-open-state: 30s
resilience4j.timelimiter.instances.payment:
  timeout-duration: 5s

Java: Custom health indicator with graceful degradation

// Input:  Non-critical external dependency (e.g., cache, search index)
// Output: Health check that reports the dependency as degraded without failing readiness

@Component
public class SearchHealthIndicator implements HealthIndicator {
    private final RestClient restClient;

    @Override
    public Health health() {
        try {
            restClient.get().uri("/ping")
                .retrieve().toBodilessEntity();
            return Health.up().build();
        } catch (Exception e) {
            // Report degraded but don't fail readiness [src1]
            return Health.up()
                .withDetail("search", "degraded: " + e.getMessage())
                .build();
        }
    }
}
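
An alternative to always reporting UP is to keep the indicator honest (return DOWN on failure) but exclude it from the readiness group, so Kubernetes never pulls the pod for a degraded cache or index. A sketch, assuming the indicator above registers under the name `search`:

```properties
# application.properties — readiness group limited to critical checks
management.endpoint.health.group.readiness.include=readinessState,db
# 'search' still appears under /actuator/health,
# but no longer affects /actuator/health/readiness
```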

Anti-Patterns

Wrong: No timeouts on HTTP calls

// ❌ BAD — one slow downstream call can exhaust all threads [src6]
String result = restTemplate.getForObject("http://slow-service/api", String.class);
// Default: no timeout → thread hangs indefinitely

Correct: Always set timeouts + circuit breaker

// ✅ GOOD — bounded timeout prevents thread exhaustion [src6, src7]
var factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(3_000);   // millis; Duration overloads need Spring Framework 6.1+
factory.setReadTimeout(10_000);
var restTemplate = new RestTemplate(factory);

Wrong: Default connection pool size for production

# ❌ BAD — HikariCP default is 10 connections
# Under load: "Connection is not available, request timed out after 30000ms"

Correct: Size pool for production workload

# ✅ GOOD — scale pool to match expected concurrency [src3]
spring.datasource.hikari.maximum-pool-size=30
spring.datasource.hikari.leak-detection-threshold=60000

Wrong: Virtual threads without connection pool scaling

# ❌ BAD — virtual threads handle more requests but each still needs a DB connection
# 1000 virtual threads + 10-connection pool = massive contention [src3, src5]
spring.threads.virtual.enabled=true
# spring.datasource.hikari.maximum-pool-size=10  (default)

Correct: Virtual threads with appropriately sized pool

# ✅ GOOD — scale connection pool alongside virtual threads [src3]
spring.threads.virtual.enabled=true
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.leak-detection-threshold=60000

Wrong: Single health endpoint for all K8s probes

# ❌ BAD — liveness and readiness have different semantics [src4]
livenessProbe:
  httpGet:
    path: /actuator/health  # includes readiness checks!
readinessProbe:
  httpGet:
    path: /actuator/health  # same endpoint

Correct: Separate probe endpoints

# ✅ GOOD — each probe checks what it should [src4]
livenessProbe:
  httpGet:
    path: /actuator/health/liveness   # only: is the JVM alive?
readinessProbe:
  httpGet:
    path: /actuator/health/readiness  # only: can it accept traffic?
startupProbe:
  httpGet:
    path: /actuator/health/liveness   # give app time to start

Diagnostic Commands

# Check health details
curl -s http://localhost:8080/actuator/health | jq

# Check individual probe endpoints (K8s)
curl -s http://localhost:8080/actuator/health/liveness | jq
curl -s http://localhost:8080/actuator/health/readiness | jq

# HikariCP metrics (requires micrometer)
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.active | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.pending | jq
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.timeout | jq

# Tomcat threads
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.current | jq
curl -s http://localhost:8080/actuator/metrics/tomcat.threads.busy | jq

# JVM memory
curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq

# Check if virtual threads are active (Spring Boot 3.2+)
curl -s http://localhost:8080/actuator/metrics/executor.active | jq

# Thread dump for stuck threads
curl -s http://localhost:8080/actuator/threaddump | jq '.threads[] | select(.state=="WAITING" or .state=="TIMED_WAITING") | .threadName'

# Kubernetes: check pod events for probe failures
kubectl describe pod <pod-name> | grep -A5 "Events:"
kubectl get events --field-selector involvedObject.name=<pod-name>

Version History & Compatibility

Spring Boot | Status | Key Changes for 503 Debugging
2.3+ | Maintenance | Readiness/liveness probes, graceful shutdown introduced [src4]
2.4+ | Maintenance | Startup probe support, K8s probe grouping
3.0+ | Maintenance | Jakarta EE 10 namespace (javax → jakarta), Tomcat 10.1
3.2+ | Current | Virtual threads GA, RestClient GA, improved observability [src5, src7]
3.3+ | Current | Enhanced health group configuration, structured logging
3.4+ | Current | RestClient timeout via ClientHttpRequestFactorySettings, SSL bundle auto-reload [src7]

When to Use / When Not to Use

Debug 503 When | Look Elsewhere When | Use Instead
Actuator health shows DOWN | 502 Bad Gateway (upstream proxy error) | Check nginx/ALB/Envoy logs
Connection pool logs show exhaustion | 504 Gateway Timeout (proxy timeout) | Increase proxy timeout settings
K8s pod keeps restarting | 500 Internal Server Error (unhandled exception) | Check application logs for stack trace
Load increases → intermittent 503 | Consistent 503 on every request from startup | Check application.properties misconfiguration
After deploying new version | 503 from CDN or static assets | Check CDN origin configuration
