aws lambda update-function-configuration --function-name my-func --memory-size 1769 --timeout 30
UNKNOWNHOSTEXCEPTION regardless of timeout. [src5]
| # | Cause | Likelihood | Signature | Fix |
|---|---|---|---|---|
| 1 | Insufficient memory/CPU | ~30% | Duration near timeout; Max Memory Used near Memory Size | Increase memory to 1769 MB (1 vCPU) [src1, src3] |
| 2 | Downstream service timeout | ~25% | Task timed out after X.XX seconds; no error log before timeout | Add explicit connection/read timeouts (5s/10s) to HTTP clients [src2] |
| 3 | VPC without NAT gateway | ~20% | ETIMEDOUT or Task timed out on external HTTP calls | Add NAT gateway to public subnet; route private subnet through it [src5] |
| 4 | Cold start during INIT | ~10% | Init Duration: NNNNms in REPORT log; first invocation slow | SnapStart, provisioned concurrency, or reduce package size [src3, src4] |
| 5 | Large deployment package | ~5% | High Init Duration (>1s); large ZIP artifact | Tree-shake dependencies; use Lambda Layers [src6, src7] |
| 6 | Recursive/infinite loop | ~5% | Function always times out at exact timeout limit | Check S3 trigger writing to same bucket, SQS re-queue loops [src2] |
| 7 | Payload too large | ~3% | Timeout with large event payloads | Batch smaller; use S3 pre-signed URLs for large data [src2] |
| 8 | Default 3s timeout | ~2% | Task timed out after 3.00 seconds | Set 30–60s for APIs, 300s for data processing [src1] |
| 9 | Network ACL blocking ephemeral ports | Rare | Intermittent ETIMEDOUT in VPC functions | Allow TCP/UDP ports 1024–65535 in subnet Network ACLs [src5] |
| 10 | DNS resolution limit exceeded | Rare | UNKNOWNHOSTEXCEPTION under high concurrency | Reduce concurrent DNS lookups; max 20 TCP DNS connections [src5] |
START — Lambda function timing out or slow
├── Check REPORT log in CloudWatch
│   ├── "Init Duration" present and > 1000ms?
│   │   ├── YES → COLD START ISSUE
│   │   │   ├── Runtime is Java/Python/.NET? → Enable SnapStart [src4]
│   │   │   ├── Package > 50 MB? → Tree-shake, use Layers [src6, src7]
│   │   │   ├── Need guaranteed <100ms start? → Provisioned Concurrency [src3]
│   │   │   └── Heavy SDK imports? → Lazy-load, import only needed clients [src8]
│   │   └── NO → RUNTIME TIMEOUT ISSUE ↓
│   │
│   ├── "Max Memory Used" close to "Memory Size"?
│   │   ├── YES → Increase memory (CPU scales up with it) [src1, src3]
│   │   └── NO ↓
│   │
│   ├── Duration close to timeout every time?
│   │   ├── YES → Likely infinite loop or recursive trigger [src2]
│   │   └── NO ↓
│   │
│   ├── Function in VPC?
│   │   ├── YES → Check NAT gateway exists for outbound internet [src5]
│   │   │   ├── No NAT → Add NAT gateway + route table
│   │   │   ├── AWS services only? → Use VPC endpoints [src5]
│   │   │   └── Intermittent? → Check Network ACL ephemeral ports [src5]
│   │   └── NO ↓
│   │
│   └── Timeout only on some invocations?
│       ├── YES → Downstream service latency → add client-side timeouts [src2]
│       └── NO → Increase function timeout; check payload size [src1]
│
└── No REPORT log at all?
    └── Check execution role has AWSLambdaBasicExecutionRole [src2]
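The decision tree above can be read as a first-match rule list. The following is a minimal sketch of that reading, not an official diagnostic tool: the `Report` dataclass, the `triage` function, and the 0.9 "close to" threshold are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

NEAR = 0.9  # assumed threshold for "close to"; not taken from the source


@dataclass
class Report:
    """Hypothetical container for the metrics found on one REPORT line."""
    duration_ms: float
    memory_size_mb: int
    max_memory_used_mb: int
    init_duration_ms: Optional[float] = None  # present only on cold starts


def triage(reports: List[Report], timeout_ms: int, in_vpc: bool = False) -> str:
    """Walk the tree top to bottom and return the first matching branch."""
    if not reports:
        return "no REPORT log: check execution role permissions"
    if any(r.init_duration_ms is not None and r.init_duration_ms > 1000 for r in reports):
        return "cold start: SnapStart, provisioned concurrency, or smaller package"
    if any(r.max_memory_used_mb >= NEAR * r.memory_size_mb for r in reports):
        return "memory pressure: increase memory"
    near_timeout = [r.duration_ms >= NEAR * timeout_ms for r in reports]
    if all(near_timeout):
        return "always at timeout: look for a loop or recursive trigger"
    if in_vpc:
        return "VPC: check NAT gateway, VPC endpoints, and Network ACLs"
    if any(near_timeout):
        return "intermittent: add client-side timeouts for downstream calls"
    return "increase function timeout; check payload size"
```

The order of the checks mirrors the tree, so a VPC function that always hits the timeout is reported as a possible loop first; treat the result as a starting hypothesis rather than a verdict.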
Every Lambda invocation produces a REPORT line with key metrics. This is your starting point. [src2, src3]
# CloudWatch Logs Insights — find recent timeouts
fields @timestamp, @message
| filter @message like /Task timed out/
| sort @timestamp desc
| limit 20
# Analyze cold start frequency and duration
fields @timestamp, @initDuration, @duration, @maxMemoryUsed, @memorySize
| filter ispresent(@initDuration)
| stats avg(@initDuration) as avgColdStart, max(@initDuration) as maxColdStart,
count(*) as coldStartCount
| sort coldStartCount desc
Verify: REPORT line shows Duration, Billed Duration, Memory Size, Max Memory Used, and Init Duration (if cold start).
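When REPORT lines are exported from CloudWatch as raw text, the metrics can be pulled out with a small parser. This is a sketch under the assumption that each metric appears as `Name: <number> ms` or `Name: <number> MB`; the `parse_report` helper and the sample line are illustrative, not part of any AWS SDK.

```python
import re
from typing import Dict

# Matches "<Metric Name>: <number> ms" or "<Metric Name>: <number> MB".
_METRIC = re.compile(r"([A-Z][A-Za-z ]*): ([\d.]+) (?:ms|MB)")


def parse_report(line: str) -> Dict[str, float]:
    """Return the numeric metrics of one REPORT line, keyed by metric name."""
    return {name: float(value) for name, value in _METRIC.findall(line)}


# Illustrative REPORT line (values invented for the example).
sample = (
    "REPORT RequestId: 3604209a-e9a3-11e6-939a-754dd98c7be3\t"
    "Duration: 2999.50 ms\tBilled Duration: 3000 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 120 MB\t"
    "Init Duration: 1250.40 ms"
)
metrics = parse_report(sample)
```

`Init Duration` is simply absent from the result for warm invocations, so `metrics.get("Init Duration")` doubles as a cold-start check.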
Lambda allocates CPU proportionally to memory. Below 1769 MB, you get fractional CPU. This is the single most impactful tuning knob. [src1, src3]
# Set memory to 1769 MB (1 full vCPU) and timeout to 30 seconds
aws lambda update-function-configuration \
--function-name my-function \
--memory-size 1769 \
--timeout 30
Verify:
aws lambda get-function-configuration --function-name my-function --query '{Memory: MemorySize, Timeout: Timeout}'
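To make the memory-to-CPU relationship concrete, the share of a vCPU can be estimated from the memory setting. This is a rough sketch that assumes strictly linear scaling around the 1769 MB figure quoted above; the `estimated_vcpu_share` helper is hypothetical, and actual allocation is decided by Lambda.

```python
FULL_VCPU_MB = 1769  # memory size at which a function gets one full vCPU [src1]


def estimated_vcpu_share(memory_mb: int) -> float:
    """Estimate the vCPU share for a memory setting, assuming linear scaling."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    return memory_mb / FULL_VCPU_MB


# A 128 MB function gets only a small fraction of a vCPU under this model.
share_128 = estimated_vcpu_share(128)
share_full = estimated_vcpu_share(1769)
```

Under this model a CPU-bound function at 128 MB runs on roughly 7% of a vCPU, which is why raising memory often shortens duration even when memory use is low.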
Never rely on the Lambda timeout as your only safety net. Set client-side timeouts on every external call. [src2]
# Python — explicit timeouts on AWS SDK and HTTP calls
import boto3
from botocore.config import Config
boto_config = Config(
    connect_timeout=5,   # 5s to establish connection
    read_timeout=10,     # 10s to read response
    retries={'max_attempts': 2}
)
dynamodb = boto3.client('dynamodb', config=boto_config)
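Fixed client timeouts can still exceed the time the invocation has left. One possible refinement, sketched here as an assumption rather than a documented pattern, is to derive each call's timeout from `context.get_remaining_time_in_millis()`. The `call_timeout_seconds` helper, the 1-second reserve, and the 10-second cap are illustrative choices.

```python
def call_timeout_seconds(remaining_ms: int, reserve_ms: int = 1000, cap_s: float = 10.0) -> float:
    """Return a per-call timeout that fits inside the remaining invocation time.

    reserve_ms is held back so the handler can still log and return an error
    response; cap_s keeps a single call from consuming the whole budget.
    """
    budget_s = max(remaining_ms - reserve_ms, 0) / 1000
    return min(budget_s, cap_s)
```

Inside a handler this would be called as `call_timeout_seconds(context.get_remaining_time_in_millis())`; a result of `0.0` signals that the call should be skipped and an error returned instead.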
VPC-attached functions have no internet access by default. All outbound traffic goes through the VPC. [src5]
# SAM template — Lambda in VPC with NAT gateway
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      MemorySize: 1769
      Timeout: 30
      VpcConfig:
        SecurityGroupIds:
          - !Ref LambdaSecurityGroup
        SubnetIds:
          - !Ref PrivateSubnet1
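Whether a subnet actually has an outbound path can be checked from the output of `describe-route-tables`. The sketch below assumes the response has already been loaded as a dict (for example from the CLI's JSON output); the `has_nat_default_route` helper and the sample IDs are hypothetical.

```python
from typing import Any, Dict


def has_nat_default_route(response: Dict[str, Any]) -> bool:
    """Return True if any route table sends 0.0.0.0/0 to a NAT gateway."""
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            if (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                    and route.get("NatGatewayId", "").startswith("nat-")):
                return True
    return False


# Illustrative responses (IDs invented for the example).
with_nat = {"RouteTables": [{"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
]}]}
without_nat = {"RouteTables": [{"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
]}]}
```

A `False` result for the function's subnets matches the "VPC without NAT gateway" signature: external calls hang until the timeout.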
SnapStart takes a microVM snapshot after INIT, reducing cold starts from seconds to sub-second. [src4]
# Enable SnapStart and publish a version
aws lambda update-function-configuration \
--function-name my-function \
--snap-start ApplyOn=PublishedVersions
aws lambda publish-version --function-name my-function
Verify: Check CloudWatch for Restore Duration instead of Init Duration; restore should be <200ms.
Provisioned concurrency keeps environments pre-warmed. Most reliable but most expensive approach. [src3, src6]
# Set 10 provisioned concurrent executions
aws lambda put-provisioned-concurrency-config \
--function-name my-function \
--qualifier live \
--provisioned-concurrent-executions 10
Verify: Status should be READY. REPORT log should show no Init Duration.
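The value 10 above is only an example. A starting point for sizing can be estimated from traffic, using the relationship that concurrency is roughly requests per second multiplied by average duration in seconds. The `provisioned_concurrency_estimate` helper and its `headroom` factor are assumptions for illustration, not an AWS-provided formula for provisioned concurrency.

```python
import math


def provisioned_concurrency_estimate(requests_per_second: float,
                                     avg_duration_ms: float,
                                     headroom: float = 1.0) -> int:
    """Estimate concurrent executions: rate x average duration, scaled by headroom."""
    if requests_per_second < 0 or avg_duration_ms < 0 or headroom <= 0:
        raise ValueError("inputs must be non-negative and headroom positive")
    concurrent = requests_per_second * avg_duration_ms / 1000
    return math.ceil(concurrent * headroom)
```

For example, 100 requests per second at an average of 50 ms works out to about 5 concurrent executions before any headroom; measured peak concurrency from CloudWatch should override this estimate when available.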
# Input: API Gateway event
# Output: JSON response with downstream data
import json
import os
import boto3
from botocore.config import Config
# INIT phase: runs once per cold start, persists across warm invocations
boto_config = Config(connect_timeout=5, read_timeout=10, retries={'max_attempts': 2})
dynamodb = boto3.resource('dynamodb', config=boto_config)
table = dynamodb.Table(os.environ['TABLE_NAME'])
def handler(event, context):
    # Bail out early if too little time remains for the downstream call
    remaining_ms = context.get_remaining_time_in_millis()
    if remaining_ms < 5000:
        return {'statusCode': 503, 'body': json.dumps({'error': 'Insufficient time'})}
    try:
        # pathParameters is None when the request has no path parameters
        item_id = (event.get('pathParameters') or {}).get('id', '')
        response = table.get_item(Key={'id': item_id})
        item = response.get('Item')
        if not item:
            return {'statusCode': 404, 'body': json.dumps({'error': 'Not found'})}
        # default=str serializes DynamoDB Decimal values
        return {'statusCode': 200, 'body': json.dumps(item, default=str)}
    except Exception as e:
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}
// Input: API Gateway event
// Output: JSON response
const { DynamoDBClient, GetItemCommand } = require('@aws-sdk/client-dynamodb');
const { unmarshall } = require('@aws-sdk/util-dynamodb');
const client = new DynamoDBClient({
  requestHandler: { connectionTimeout: 5000, socketTimeout: 10000 },
  maxAttempts: 2
});
exports.handler = async (event, context) => {
  if (context.getRemainingTimeInMillis() < 5000) {
    return { statusCode: 503, body: JSON.stringify({ error: 'Insufficient time' }) };
  }
  const id = event.pathParameters?.id;
  const { Item } = await client.send(new GetItemCommand({
    TableName: process.env.TABLE_NAME,
    Key: { id: { S: id } }
  }));
  if (!Item) return { statusCode: 404, body: JSON.stringify({ error: 'Not found' }) };
  return { statusCode: 200, body: JSON.stringify(unmarshall(Item)) };
};
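// Input: API Gateway proxy request
// Output: API Gateway proxy response
// Requires: Java 11+ runtime with SnapStart enabled
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.crac.Core;
import org.crac.Resource;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
public class Handler implements RequestHandler<APIGatewayProxyRequestEvent,
        APIGatewayProxyResponseEvent>, Resource {
    private final DynamoDbClient dynamodb = DynamoDbClient.create();
    public Handler() {
        Core.getGlobalContext().register(this); // Register CRaC hooks
    }
    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request, Context context) {
        // Placeholder body (not in the original snippet): replace with real request handling
        return new APIGatewayProxyResponseEvent().withStatusCode(200);
    }
    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> ctx) {
        dynamodb.describeEndpoints(); // Pre-warm connection before snapshot
    }
    @Override
    public void afterRestore(org.crac.Context<? extends Resource> ctx) {
        // Re-validate connections after restore
    }
}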
# ❌ BAD — 3-second default is almost never enough [src1]
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      # No Timeout: defaults to 3 seconds; cold start + any call > 3s = failure
# ✅ GOOD — explicit timeout with memory tuning [src1]
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs20.x
      MemorySize: 1769  # 1 full vCPU
      Timeout: 30       # 30s for API backends
// ❌ BAD — imports entire SDK (~60MB), dramatically increases cold start [src7, src8]
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();
// Init Duration: 800-1500ms due to massive import
// ✅ GOOD — modular imports, minimal cold start impact [src7, src8]
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
// Init Duration: 150-300ms — only loads what's needed
# ❌ BAD — new client every invocation, wasting warm-start reuse [src3, src8]
def handler(event, context):
    dynamodb = boto3.resource('dynamodb')  # NEW client every time
    table = dynamodb.Table('my-table')
    return table.get_item(Key={'id': event['id']})
# ✅ GOOD — client created once in INIT, reused across warm invocations [src3, src8]
import boto3
dynamodb = boto3.resource('dynamodb') # Created once during INIT
table = dynamodb.Table('my-table')
def handler(event, context):
    return table.get_item(Key={'id': event['id']})  # Reuses warm connection
# ❌ BAD — VPC function with no internet path silently times out [src5]
VpcConfig:
  SecurityGroupIds: [!Ref SG]
  SubnetIds: [!Ref PrivateSubnet]
# No NAT gateway — ALL external HTTP calls will ETIMEDOUT
# ✅ GOOD — private subnet routes to NAT for internet; VPC endpoints for AWS [src5]
NATGateway:
  Type: AWS::EC2::NatGateway
  Properties:
    SubnetId: !Ref PublicSubnet
    AllocationId: !GetAtt EIP.AllocationId
PrivateRoute:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable  # required; PrivateRouteTable is an assumed resource name
    DestinationCidrBlock: 0.0.0.0/0
    NatGatewayId: !Ref NATGateway
ETIMEDOUT, not a clear networking error. [src5]
# Check current function configuration
aws lambda get-function-configuration --function-name my-function \
--query '{Memory: MemorySize, Timeout: Timeout, Runtime: Runtime, VPC: VpcConfig.SubnetIds, SnapStart: SnapStart}'
# Find recent timeouts (CloudWatch Logs Insights)
fields @timestamp, @message
| filter @message like /Task timed out/
| sort @timestamp desc | limit 50
# Analyze cold start frequency
fields @timestamp, @initDuration, @duration, @maxMemoryUsed, @memorySize
| filter ispresent(@initDuration)
| stats count(*) as coldStarts, avg(@initDuration) as avgInitMs,
max(@initDuration) as maxInitMs, pct(@initDuration, 99) as p99InitMs
by bin(1h)
# Check memory utilization
fields @maxMemoryUsed, @memorySize, @duration
| stats avg(@maxMemoryUsed) as avgMemUsed, max(@maxMemoryUsed) as maxMemUsed
# Check provisioned concurrency status
aws lambda get-provisioned-concurrency-config \
--function-name my-function --qualifier live
# Check VPC route table for NAT gateway
aws ec2 describe-route-tables \
--filters "Name=association.subnet-id,Values=subnet-xxxxx" \
--query 'RouteTables[*].Routes[?DestinationCidrBlock==`0.0.0.0/0`]'
# Test invocation
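# AWS CLI v2 needs --cli-binary-format raw-in-base64-out to accept a raw JSON payload
aws lambda invoke --function-name my-function \
  --payload '{"test": true}' --cli-binary-format raw-in-base64-out \
  --cli-read-timeout 60 response.json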
| Feature | Available Since | Notes |
|---|---|---|
| Lambda timeout (max 15 min) | 2018 | Increased from 5 min; all runtimes [src1] |
| VPC Hyperplane ENI | 2019 | Eliminated ~10s VPC cold start penalty [src5] |
| Provisioned concurrency | Dec 2019 | All runtimes; eliminates cold starts [src3] |
| ARM64/Graviton support | 2021 | 20% faster cold starts vs x86; lower cost [src6] |
| SnapStart for Java | Nov 2022 | Java 11+ Corretto; free [src4] |
| INIT phase logging (INIT_REPORT) | Nov 2023 | Explicit Init/Restore phase error reporting [src3] |
| SnapStart for Python | Dec 2024 | Python 3.12+; caching/restore charges apply [src4] |
| SnapStart for .NET | Dec 2024 | .NET 8+; requires Annotations v1.6.0+ [src4] |
| Lambda Managed Instances | 2025 preview | Multi-concurrent execution on EC2-class instances |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Timeout <15 min and stateless workload | Processing >15 min | AWS Step Functions or ECS/Fargate |
| Cold start <2s is acceptable | Hard real-time <10ms requirement | EC2, ECS, or always-on containers |
| Traffic is spiky or unpredictable | Steady >1000 req/s sustained | ECS/Fargate with ALB (cheaper at scale) |
| SnapStart available for your runtime | Need zero cold starts guaranteed | Provisioned concurrency or containers |
| API Gateway integration <29s | Long-running API response >29s | Async invocation + polling, or WebSocket API |
Init Duration in CloudWatch, not just averages. [src3, src7]