max_connections (default: 100),
new attempts fail with FATAL: sorry, too many clients already. The fix: deploy
PgBouncer in transaction pooling mode, which multiplexes thousands of app connections
through a small, fixed pool of actual server connections.SELECT state, count(*) FROM pg_stat_activity GROUP BY state; -- shows active, idle, and
idle in transaction connections. High idle counts = connection leak; high
idle in transaction = transaction/lock leak.max_connections costs memory (~5-10MB per
connection) and degrades performance beyond ~300-500 connections due to OS scheduler and lock
contention. Use PgBouncer instead.transaction_timeout.max_connections above 300-500 without a connection pooler --
performance degrades super-linearly due to OS context switching (1 process per connection) and
PostgreSQL internal lock contention. [src1]PREPARE/EXECUTE), advisory locks (pg_advisory_lock),
LISTEN/NOTIFY, persistent SET statements, and temporary tables
are all incompatible with pool_mode = transaction. [src2]idle_session_timeout requires PostgreSQL 14+ -- on older versions, use
PgBouncer's client_idle_timeout (added in PgBouncer 1.24) or implement application-level
connection release. [src1]transaction_timeout requires PostgreSQL 17+ -- this parameter was added in
PG17 (2024-09). Do not set it on older versions or the server will reject the configuration. [src5]auth_type = md5 on older PgBouncer versions. [src2]max_connections requires a full server restart -- unlike most GUC
parameters, it cannot be changed with pg_reload_conf(). Plan maintenance windows. [src1]| # | Symptom / Goal | Command / Solution | Notes |
|---|---|---|---|
| 1 | Check current connection count | SELECT count(*) FROM pg_stat_activity; |
Compare to max_connections [src3] |
| 2 | Check connection limit | SHOW max_connections; |
Default: 100 [src1] |
| 3 | See connections by state | SELECT state, count(*) FROM pg_stat_activity GROUP BY state; |
active, idle, idle in transaction [src3] |
| 4 | Find which apps consume connections | SELECT usename, application_name, client_addr, count(*) FROM pg_stat_activity GROUP BY 1,2,3 ORDER BY 4 DESC;
|
Pinpoints the culprit [src3] |
| 5 | Kill idle connections immediately | SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < NOW() - INTERVAL '10 minutes';
|
Emergency only [src3] |
| 6 | Kill idle-in-transaction connections | SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction' AND query_start < NOW() - INTERVAL '5 minutes';
|
Frees locked resources [src3] |
| 7 | Check superuser reserved slots | SHOW superuser_reserved_connections; |
Default: 3 (reserved from max_connections) [src1] |
| 8 | Check PgBouncer pool status | SHOW POOLS; in PgBouncer admin |
cl_waiting > 0 = exhaustion [src2] |
| 9 | Check PgBouncer overall stats | SHOW STATS; in PgBouncer admin |
req/s, latency, errors [src2] |
| 10 | Set idle-in-transaction timeout | idle_in_transaction_session_timeout = 30000 |
Kills stuck transactions after 30s [src1] |
| 11 | Set statement timeout | statement_timeout = 30000 |
Kills queries running > 30s [src1] |
| 12 | Set transaction timeout (PG17+) | transaction_timeout = 60000 |
Limits total transaction duration to 60s [src5] |
| 13 | Reload config without restart | SELECT pg_reload_conf(); |
For GUC params that don't need restart [src1] |
| 14 | Increase max_connections (temporary) | Edit postgresql.conf, restart server |
Requires server restart; see caveats [src1] |
START -- "FATAL: sorry, too many clients already" or connection timeout
|
+-- Step 1: Diagnose
| +-- SHOW max_connections;
| +-- SELECT count(*) FROM pg_stat_activity;
| +-- SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
| +-- HIGH "active" -> real load -> need PgBouncer [src2]
| +-- HIGH "idle" -> connection leak -> fix app pool [src4]
| +-- HIGH "idle in transaction" -> transaction leak -> set idle_in_transaction_session_timeout [src1]
|
+-- Step 2: Emergency mitigation
| +-- Kill idle connections -> pg_terminate_backend() [src3]
| +-- Set idle_in_transaction_session_timeout -> prevents future stalls [src1]
| +-- Temporarily increase max_connections -> restart required, temporary [src1]
|
+-- Step 3: Permanent fix
| +-- No connection pooler?
| | +-- INSTALL PgBouncer in transaction pooling mode [src2, src6]
| | -> 10,000 app connections -> 20-100 PostgreSQL connections
| +-- PgBouncer installed but pool_size too small?
| | +-- Increase default_pool_size in pgbouncer.ini [src2]
| +-- Application connection leak?
| | +-- Add missing connection.close() / context managers
| | +-- Set server_idle_timeout in PgBouncer [src2]
| +-- Long-running transactions?
| | +-- idle_in_transaction_session_timeout + statement_timeout [src1]
| | +-- transaction_timeout (PostgreSQL 17+) [src5]
| +-- High-scale (>50 concurrent per core)?
| +-- Consider PgCat (multi-threaded, 59K tps) [src4, src7, src8]
| +-- Consider Odyssey (enterprise, multi-threaded) [src4]
|
+-- Step 4: Monitor
+-- Alert when pg_stat_activity count > 80% of max_connections
+-- SHOW POOLS; -- cl_waiting > 0 = problem
+-- Grafana + postgres_exporter for dashboards
Always start with pg_stat_activity before making changes. [src3]
-- How many connections vs the limit?
SHOW max_connections;
SELECT count(*) AS current FROM pg_stat_activity;
-- What are they doing?
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;
-- Who is consuming the most?
SELECT usename, application_name, client_addr, state, count(*)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
GROUP BY 1, 2, 3, 4
ORDER BY count(*) DESC LIMIT 20;
-- Long-running / stuck connections
SELECT pid, usename, state,
now() - backend_start AS conn_age,
now() - query_start AS query_age,
left(query, 100)
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_age DESC NULLS LAST;
For immediate relief when hitting the connection limit. [src3]
-- Kill idle connections older than 10 minutes
SELECT pg_terminate_backend(pid), usename, application_name, state
FROM pg_stat_activity
WHERE state = 'idle'
AND query_start < NOW() - INTERVAL '10 minutes'
AND pid <> pg_backend_pid();
-- Kill idle-in-transaction connections older than 5 minutes
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND query_start < NOW() - INTERVAL '5 minutes'
AND pid <> pg_backend_pid();
-- Check headroom
SELECT current_setting('max_connections')::int AS max_conn,
(SELECT count(*) FROM pg_stat_activity) AS used,
current_setting('max_connections')::int
- current_setting('superuser_reserved_connections')::int
- (SELECT count(*) FROM pg_stat_activity) AS slots_remaining;
Add to postgresql.conf and reload. [src1, src5]
# postgresql.conf
idle_in_transaction_session_timeout = 30000 # ms -- kills stuck transactions
statement_timeout = 30000 # ms -- kills slow statements
idle_session_timeout = 600000 # ms -- kills truly idle connections (PG14+)
transaction_timeout = 60000 # ms -- caps total transaction time (PG17+)
-- Reload without restart:
SELECT pg_reload_conf();
PgBouncer multiplexes many app connections through a small pool. [src2, src6]
# /etc/pgbouncer/pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction # CRITICAL: return connection after each transaction
default_pool_size = 25 # server connections per db/user
max_client_conn = 10000 # accept up to 10,000 app connections
reserve_pool_size = 5
reserve_pool_timeout = 3
max_prepared_statements = 200 # enabled by default in 1.24+
server_idle_timeout = 600
log_pooler_errors = 1
sudo systemctl enable pgbouncer && sudo systemctl start pgbouncer
# Monitor:
psql -h 127.0.0.1 -p 6432 -U pgbouncer_admin pgbouncer -c "SHOW POOLS;"
# cl_waiting > 0 = pool exhaustion; increase default_pool_size
Even with PgBouncer, tune your application's own connection pool. [src4, src7]
# SQLAlchemy (Python)
engine = create_engine(
"postgresql+psycopg2://user:pass@pgbouncer_host:6432/mydb",
pool_size=5,
max_overflow=10,
pool_timeout=30,
pool_recycle=1800,
pool_pre_ping=True,
)
// Node.js pg
const pool = new Pool({
host: 'pgbouncer_host', port: 6432,
max: 10,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
});
WITH limits AS (
SELECT current_setting('max_connections')::int AS max_conn,
current_setting('superuser_reserved_connections')::int AS reserved
),
totals AS (SELECT count(*) AS total FROM pg_stat_activity)
SELECT
l.max_conn,
l.max_conn - l.reserved AS usable,
t.total AS used,
round((t.total::numeric / (l.max_conn - l.reserved)) * 100, 1) AS pct_used,
CASE
WHEN (t.total::numeric / (l.max_conn - l.reserved)) > 0.90 THEN 'CRITICAL'
WHEN (t.total::numeric / (l.max_conn - l.reserved)) > 0.75 THEN 'WARNING'
ELSE 'OK'
END AS health,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS active_conns,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'idle') AS idle_conns,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'idle in transaction') AS idle_txn_conns
FROM limits l, totals t;
Full script: python-connection-pool-monitor-with-alerting.py (95 lines)
#!/usr/bin/env python3
"""
Input: PostgreSQL DSN + alert thresholds
Output: Connection health report + alerts
Requirements: pip install psycopg2-binary
"""
import psycopg2, sys
def check_health(dsn, warn_pct=75.0, crit_pct=90.0):
conn = psycopg2.connect(dsn)
cur = conn.cursor()
alerts = []
cur.execute("SHOW max_connections"); max_conn = int(cur.fetchone()[0])
cur.execute("SHOW superuser_reserved_connections"); reserved = int(cur.fetchone()[0])
usable = max_conn - reserved
cur.execute("""
SELECT state, count(*),
max(EXTRACT(EPOCH FROM (now()-query_start)))::int
FROM pg_stat_activity WHERE pid <> pg_backend_pid()
GROUP BY state
""")
by_state = {r[0] or 'unknown': {'count': r[1], 'max_age': r[2]} for r in cur.fetchall()}
total = sum(v['count'] for v in by_state.values())
pct = (total / usable) * 100
if pct >= crit_pct:
alerts.append(f'CRITICAL: {pct:.1f}% full ({total}/{usable}) -- imminent FATAL')
elif pct >= warn_pct:
alerts.append(f'WARNING: {pct:.1f}% full ({total}/{usable})')
idle_txn = by_state.get('idle in transaction', {})
if idle_txn.get('count', 0) > 5:
alerts.append(f"WARNING: {idle_txn['count']} idle-in-transaction (max_age={idle_txn.get('max_age',0)}s)")
idle = by_state.get('idle', {})
if idle.get('count', 0) > usable * 0.5:
alerts.append(f"WARNING: {idle['count']} idle connections -- possible leak")
cur.close(); conn.close()
return {'used': total, 'usable': usable, 'pct': pct, 'by_state': by_state, 'alerts': alerts}
if __name__ == '__main__':
dsn = sys.argv[1] if len(sys.argv) > 1 else 'postgresql://localhost/mydb'
r = check_health(dsn)
print(f"Connections: {r['used']}/{r['usable']} ({r['pct']:.1f}%)")
for state, info in r['by_state'].items():
print(f" {state}: {info['count']} (max_age={info.get('max_age',0)}s)")
for alert in r['alerts']: print(f"[ALERT] {alert}")
Full script: bash-pgbouncer-health-check-and-pool-report.sh (42 lines)
#!/bin/bash
PGB_HOST="${PGB_HOST:-127.0.0.1}"
PGB_PORT="${PGB_PORT:-6432}"
PGB_USER="${PGB_USER:-pgbouncer_admin}"
PSQL="psql -h $PGB_HOST -p $PGB_PORT -U $PGB_USER pgbouncer -t -A -F'|'"
echo "=== PgBouncer Health Report === $(date)"
echo ""
echo "--- Pools ---"
eval "$PSQL" -c "SHOW POOLS;" | while IFS='|' read -r db user cl_active cl_waiting sv_active sv_idle rest; do
[ "$db" = "database" ] || [ -z "$db" ] && continue
if [ "${cl_waiting:-0}" -gt "0" ] 2>/dev/null; then
echo " WARNING: $db/$user -- cl_waiting=$cl_waiting cl_active=$cl_active sv_active=$sv_active"
else
echo " OK: $db/$user -- cl_active=$cl_active sv_active=$sv_active sv_idle=$sv_idle"
fi
done
echo ""
echo "Actions if cl_waiting > 0:"
echo " 1. Increase pool_size in pgbouncer.ini -> RELOAD"
echo " 2. Check for long-running transactions in PostgreSQL"
echo " 3. Check for connection leaks in application"
# BAD -- max_connections = 1000 -> ~5-10 GB RAM; performance degrades above ~300 [src1, src4]
max_connections = 1000
# Context switching for 1000 OS processes slows everything down
# Still fails when microservices push past 1000
# GOOD -- PgBouncer handles app connections; PostgreSQL handles few [src2, src6]
# postgresql.conf
max_connections = 100 # PostgreSQL only sees pooled connections
# pgbouncer.ini
max_client_conn = 10000 # accept 10,000 app connections
default_pool_size = 25 # but only 25 actual PostgreSQL connections
pool_mode = transaction
# BAD -- connection never released on exception [src4]
def get_users():
conn = pool.getconn()
cur = conn.cursor()
cur.execute("SELECT * FROM users") # exception here...
rows = cur.fetchall()
pool.putconn(conn) # ...this is never reached
return rows
# GOOD -- connection always returned to pool [src4]
def get_users():
conn = pool.getconn()
try:
with conn.cursor() as cur:
cur.execute("SELECT * FROM users")
return cur.fetchall()
finally:
pool.putconn(conn) # always runs, even on exception
# BAD -- session pooling barely improves over no pooling [src2, src6]
pool_mode = session
# Each app "connection" holds a server connection for its entire lifetime
# 500 simultaneous users -> 500 server connections -> same problem
# GOOD -- maximum multiplexing [src2, src6]
pool_mode = transaction
# Server connection held only during a transaction
# 500 users doing 50ms transactions -> ~25 server connections
# Not compatible with transaction pooling:
# - Persistent SET statements
# - Server-side prepared statements (PREPARE/EXECUTE)
# - Advisory locks (pg_advisory_lock)
# - LISTEN/NOTIFY
# - Temporary tables (use unlogged tables instead)
# BAD -- no timeouts means one stuck transaction can exhaust all connections [src1, src5]
# postgresql.conf with NO timeout settings:
# idle_in_transaction_session_timeout = 0
# statement_timeout = 0
# idle_session_timeout = 0
# One stuck transaction holds locks and connections indefinitely
# GOOD -- layered timeouts catch different failure modes [src1, src5]
# postgresql.conf
idle_in_transaction_session_timeout = 30000 # kills stuck BEGIN with no COMMIT
statement_timeout = 30000 # kills runaway queries
idle_session_timeout = 600000 # kills abandoned connections (PG14+)
transaction_timeout = 60000 # caps total transaction time (PG17+)
max_connections includes superuser reserved slots: The effective limit for
regular users is max_connections - superuser_reserved_connections (default: 100 - 3 = 97).
Increase superuser_reserved_connections = 5 for emergency admin access. [src1]max_connections requires a restart: Unlike most PostgreSQL
config changes, max_connections requires a full server restart (not just
pg_reload_conf()). Plan maintenance windows accordingly. [src1]PREPARE/EXECUTE) are connection-scoped. In transaction mode,
PgBouncer may route EXECUTE to a different server connection than PREPARE.
PgBouncer 1.24+ has max_prepared_statements = 200 enabled by default, which helps but
doesn't fully solve cross-transaction usage. [src2]pool_size = num_cpu_cores x 2. For a 4-core DB server: default_pool_size = 8.
Going higher often hurts due to PostgreSQL's internal lock contention. [src2, src6]
idle in transaction holds locks: An idle in transaction
connection holds all row locks from its open transaction. Even one stuck transaction can block other
queries. Set idle_in_transaction_session_timeout to automatically kill these. [src1, src3]pool_size=10, you get 320 potential connections to PgBouncer. Size
max_client_conn in PgBouncer accordingly and use max_user_client_connections
(1.24+) to cap per-user usage. [src4, src7]VALID UNTIL
of user passwords when using auth_query. Upgrade to 1.24.1+ or 1.25.0+. [src2]-- Connection overview
SHOW max_connections;
SHOW superuser_reserved_connections;
SELECT count(*) FROM pg_stat_activity;
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;
-- Top consumers
SELECT usename, application_name, client_addr, state, count(*)
FROM pg_stat_activity GROUP BY 1,2,3,4 ORDER BY count(*) DESC LIMIT 20;
-- Long-running / stuck
SELECT pid, usename, state,
now()-backend_start AS conn_age,
now()-query_start AS query_age,
left(query, 80)
FROM pg_stat_activity WHERE state != 'idle' ORDER BY query_age DESC NULLS LAST LIMIT 20;
-- Emergency: kill idle connections
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE state = 'idle' AND query_start < NOW() - INTERVAL '10 minutes'
AND pid <> pg_backend_pid();
-- Emergency: kill stuck transactions
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE state = 'idle in transaction' AND query_start < NOW() - INTERVAL '5 minutes'
AND pid <> pg_backend_pid();
-- Headroom calculation
SELECT current_setting('max_connections')::int AS max_conn,
(SELECT count(*) FROM pg_stat_activity) AS used,
current_setting('max_connections')::int
- current_setting('superuser_reserved_connections')::int
- (SELECT count(*) FROM pg_stat_activity) AS slots_remaining;
# PgBouncer admin interface
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer
# Inside PgBouncer admin:
SHOW POOLS; # cl_waiting > 0 = exhaustion -- increase pool_size
SHOW STATS; # request rates, avg query time
SHOW SERVERS; # server-side connections
SHOW CLIENTS; # client-side connections (idle state new in 1.25)
SHOW CONFIG; # current config values
RELOAD; # reload pgbouncer.ini (no restart needed)
PAUSE mydb; # pause all queries to a db (for maintenance)
RESUME mydb; # resume after pause
| Feature | Available Since | Notes |
|---|---|---|
pg_stat_activity |
PostgreSQL 7.4 | Essential monitoring view [src3] |
pg_terminate_backend() |
PostgreSQL 8.4 | Terminate connections [src3] |
idle_in_transaction_session_timeout |
PostgreSQL 9.6 | Auto-kills stuck transactions [src1] |
statement_timeout |
PostgreSQL 7.3 | Kills slow statements [src1] |
idle_session_timeout |
PostgreSQL 14 | Kills completely idle connections [src1] |
transaction_timeout |
PostgreSQL 17 (2024-09) | Limits total transaction duration [src5] |
| PgBouncer transaction pooling | PgBouncer 1.0 (2007) | Core feature; stable [src2] |
| PgBouncer SCRAM-SHA-256 | PgBouncer 1.14 (2020) | Required for PostgreSQL 14+ default auth [src2] |
PgBouncer client_idle_timeout |
PgBouncer 1.24 (2025-01) | Per-user idle client timeout [src2] |
PgBouncer max_prepared_statements |
PgBouncer 1.24 (2025-01) | Default 200; was opt-in before [src2] |
PgBouncer max_user_client_connections |
PgBouncer 1.24 (2025-01) | Per-user connection limits [src2] |
| PgBouncer LDAP auth | PgBouncer 1.25 (2025-11) | LDAP via HBA or auth_ldap_options [src2] |
| PgBouncer client-side direct TLS | PgBouncer 1.25 (2025-11) | PG17 faster TLS support [src2] |
PgBouncer transaction_timeout |
PgBouncer 1.25 (2025-11) | Pooler-level transaction timeout [src2] |
PgBouncer query_wait_notify |
PgBouncer 1.25 (2025-11) | NOTICE to queued clients [src2] |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| FATAL: sorry, too many clients already | Slow query performance (not connection count) | PostgreSQL query optimization (EXPLAIN ANALYZE) |
| Web app with >50 concurrent DB connections | Single-user CLI or batch jobs | Direct PostgreSQL connection (no pooler needed) |
| Microservices each opening own connection pool | Need read replicas + load balancing + failover | Pgpool-II or PgCat (multi-feature proxy) |
| Connection count approaching max_connections | Already using managed pooling (Supabase, RDS Proxy) | Managed pooler settings (tune, don't replace) |
| High idle connection count from leaks | PgBouncer bottleneck at >50K tps | PgCat or Odyssey (multi-threaded) |
transaction_timeout closes a gap: before PG17, a transaction
with many short statements and short idle pauses could run indefinitely even with
statement_timeout and idle_in_transaction_session_timeout set.
transaction_timeout caps total wall-clock time. [src5]max_prepared_statements = 200). Older versions required explicit opt-in. Test
compatibility when upgrading. [src2]