Input Validation: Secure Data Validation Patterns

What are the best input validation patterns?

TL;DR

Bottom line: Always validate input server-side using allowlist (accept known good) validation with strict type coercion, length limits, and semantic checks -- blocklist approaches are fundamentally incomplete and bypassable.
Key tool/command: z.string().email().max(254) (Zod/TS), EmailStr + Field(max_length=254) (Pydantic/Python), @Email @Size(max=254) (Jakarta/Java), validate:"required,email,max=254" (Go validator).
Watch out for: Validating on the client side only -- any JavaScript validation is bypassed in seconds with browser DevTools or a proxy.
Works with: All server-side languages and frameworks. Libraries: Pydantic v2+ (Python), Zod 3.x (TypeScript), Joi 17.x (Node.js), Jakarta Validation 3.1 (Java), go-playground/validator v10 (Go).

Constraints

ALWAYS validate on the server side -- client-side validation is for UX only and is trivially bypassed
Use allowlist (accept known good) over blocklist (reject known bad) -- blocklists are incomplete by definition
Canonicalize and decode input BEFORE validation -- double-encoding bypasses validators
Validate BOTH syntactic correctness (format, type, length) AND semantic correctness (business rules, cross-field consistency)
NEVER use regex as the sole validation mechanism -- combine with type coercion, length limits, and range checks
Pin validation library versions -- major upgrades such as Pydantic v1 to v2 and Joi v16 to v17 introduced breaking API changes

Quick Reference

Validation Patterns

#	Validation Pattern	Description	When to Use	Risk if Skipped
1	Allowlist (accept known good)	Define exactly what is valid; reject everything else	All discrete inputs (enums, choices, known formats)	Injection attacks bypass incomplete checks
2	Type coercion	Convert input to expected type (int, float, bool, date)	Numeric fields, dates, booleans	Type confusion, integer overflow, NaN propagation
3	Length limits	Enforce min/max length on strings, arrays, file sizes	All string and collection inputs	Buffer overflow, DoS via oversized payloads, ReDoS
4	Range checks	Validate numeric values fall within expected bounds	Prices, quantities, ages, coordinates	Negative quantities, overflow, business logic abuse
5	Regex patterns	Match input against format patterns (anchored: `^...$`)	Emails, phone numbers, postal codes, IDs	Malformed data, injection via unexpected characters
6	Encoding/normalization	Canonicalize Unicode, decode URL-encoding before validation	All text inputs, especially multi-byte	Double-encoding bypasses, homoglyph attacks
7	Schema validation	Validate entire request structure (JSON Schema, Zod, Pydantic)	API payloads, complex nested objects	Missing fields, extra fields, type mismatches
8	Semantic validation	Cross-field consistency (start < end, total = sum of parts)	Business logic, date ranges, financial data	Logic bugs, data corruption, fraud
9	Sanitization	Strip or encode dangerous characters AFTER validation	Rich text, HTML inputs (as defense-in-depth)	XSS, injection if validation alone is insufficient
10	File validation	Check MIME type, magic bytes, size, extension, filename	File uploads	Arbitrary code execution, path traversal
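Rows 2 and 4 interact in a way that is easy to get wrong in Python. float() accepts "nan", "inf", and "1e309" (which overflows to infinity), and NaN compares false against every bound, so a check written as `if value < lo or value > hi: reject` lets NaN straight through. A minimal standard-library sketch; parse_quantity and its default bounds are illustrative, not part of any library:

```python
import math


def parse_quantity(raw: str, *, lo: float = 0.0, hi: float = 10_000.0) -> float:
    """Coerce a raw string to float, reject NaN/Infinity, then range-check."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {raw!r}") from None
    # float() parses "nan", "inf", "-inf" and silently turns "1e309" into inf.
    if not math.isfinite(value):
        raise ValueError("NaN and Infinity are not allowed")
    # Written as a positive "inside the range" test; NaN was already rejected above.
    if not (lo <= value <= hi):
        raise ValueError(f"must be between {lo} and {hi}")
    return value
```

The explicit math.isfinite() call keeps the rejection independent of how the range test happens to be phrased.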

Language/Framework Quick Map

Language	Primary Library	Schema Example	Key Feature
Python	Pydantic v2	`class User(BaseModel): email: EmailStr`	Type-safe, fast Rust core, auto-coercion
TypeScript	Zod 3.x	`z.object({ email: z.string().email() })`	Static type inference from schema
Node.js	Joi 17.x	`Joi.object({ email: Joi.string().email() })`	Fluent API, detailed error messages
Java	Jakarta Validation 3.1	`@Email @NotBlank String email;`	Annotation-driven, framework-integrated
Go	validator v10	Email string `validate:"required,email"`	Struct tag-based, Gin integration
C#	FluentValidation	`RuleFor(x => x.Email).NotEmpty().EmailAddress()`	LINQ-like, testable rules

Decision Tree

START: What kind of input are you validating?
├── Discrete/enumerated value (country, status, category)?
│   ├── YES → Allowlist: check against exact set of valid values
│   └── NO ↓
├── Structured data type (email, URL, phone, date, UUID)?
│   ├── YES → Use library-provided validators + length limits + semantic checks
│   └── NO ↓
├── Numeric value (price, quantity, age)?
│   ├── YES → Type coerce to number + range check (min/max) + reject NaN/Infinity
│   └── NO ↓
├── Free-form text (name, comment, description)?
│   ├── YES → Length limit + Unicode normalization + encoding on OUTPUT
│   └── NO ↓
├── File upload?
│   ├── YES → Validate extension + MIME type + magic bytes + size limit + rename
│   └── NO ↓
├── Complex nested object (API payload)?
│   ├── YES → Schema validation (Pydantic, Zod, JSON Schema) + semantic checks
│   └── NO ↓
└── DEFAULT → Type coerce + length limit + allowlist characters + server-side only
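The file-upload branch can be sketched with the Python standard library alone. Assumptions: the whole upload is already in memory as bytes, the 5 MiB limit and the three allowed types are placeholders, and the client-supplied Content-Type header is ignored because the client controls it, so the magic-byte comparison stands in for the MIME check. Matching leading bytes only shows that a file starts like the claimed type; it does not prove the rest of the file is harmless.

```python
import uuid
from pathlib import PurePath

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical 5 MiB limit

# Allowlist: lowercase extension -> required leading "magic" bytes
ALLOWED_TYPES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}


def validate_upload(filename: str, data: bytes) -> str:
    """Return a server-generated storage name, or raise ValueError."""
    if not data or len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("empty file or size limit exceeded")
    ext = PurePath(filename).suffix.lower()
    magic = ALLOWED_TYPES.get(ext)
    if magic is None:
        raise ValueError(f"extension not allowed: {ext!r}")
    if not data.startswith(magic):
        raise ValueError("file content does not match its extension")
    # Never reuse the client's filename on disk: a random name blocks
    # path traversal ("../../x.png") and overwriting existing files.
    return f"{uuid.uuid4().hex}{ext}"
```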

Step-by-Step Guide

1. Define your validation schema at the boundary

Define validation schemas at the point where data enters your application (API endpoints, form handlers, CLI parsers), so business logic receives already-validated, typed objects instead of re-checking raw input deep inside. [src1]

# Python: Pydantic v2 -- define schema at API boundary
from pydantic import BaseModel, Field, EmailStr  # EmailStr requires: pip install "pydantic[email]"

class CreateUserRequest(BaseModel):
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=13, le=150)
    role: str = Field(pattern=r'^(admin|user|viewer)$')

Verify: CreateUserRequest(email="bad", name="", age=5, role="hacker") raises ValidationError with specific field errors.

2. Implement allowlist validation for discrete values

For any input that should be one of a known set of values, validate against an explicit allowlist. Never use blocklists for enumerated data. [src1]

// TypeScript: Zod -- allowlist via enum
import { z } from 'zod';  // ^3.22.0

const RoleSchema = z.enum(['admin', 'user', 'viewer']);
const CreateUserSchema = z.object({
  email: z.string().email().max(254),
  name: z.string().trim().min(1).max(100),  // trim first: Zod runs string checks in chained order
  role: RoleSchema,
}).strict();  // Reject unknown keys

Verify: CreateUserSchema.safeParse({ role: 'superadmin' }) returns { success: false }.
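The same allowlist needs no third-party library. A minimal Python sketch using the standard enum module; Role and parse_role are illustrative names that mirror the Zod enum above:

```python
from enum import Enum


class Role(str, Enum):
    ADMIN = "admin"
    USER = "user"
    VIEWER = "viewer"


def parse_role(raw: object) -> Role:
    """Allowlist check: only exact, known values are accepted."""
    if not isinstance(raw, str):
        raise ValueError("role must be a string")
    try:
        return Role(raw)  # lookup by value; raises ValueError if not a member
    except ValueError:
        raise ValueError(f"unknown role: {raw!r}") from None
```

Lookup is exact, so "Admin" and "admin " are rejected. If case-insensitive input is acceptable, normalize (strip, lowercase) before the lookup rather than widening the allowlist.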

3. Add type coercion with strict error handling

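Convert inputs to expected types early and reject values that cannot be cleanly coerced. In Go, the JSON decoder performs the conversion: decoding "age": "abc" into an int field fails with a *json.UnmarshalTypeError before the validator runs, and the struct tags then enforce presence, length, range, and allowlist rules. [src2]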

// Go: validator v10 -- struct tag validation
type CreateUserRequest struct {
    Email string `json:"email" validate:"required,email,max=254"`
    Name  string `json:"name"  validate:"required,min=1,max=100"`
    Age   int    `json:"age"   validate:"required,gte=13,lte=150"`
    Role  string `json:"role"  validate:"required,oneof=admin user viewer"`
}

Verify: validate.Struct(req) with invalid fields returns validator.ValidationErrors.
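What counts as "cleanly coerced" is usually stricter than what a language's parser accepts. Python's int(), for example, also accepts " 42 ", "+42", "4_2", and non-ASCII digits such as "٤٢". A minimal sketch of a stricter coercion step, assuming the API only wants plain ASCII decimal integers; strict_int and the nine-digit cap are illustrative choices:

```python
import re

# ASCII digits only, optional leading minus, at most nine digits.
# (In Python 3, \d would also match non-ASCII digits, so [0-9] is spelled out.)
_INT_RE = re.compile(r"-?[0-9]{1,9}")


def strict_int(raw: object, *, lo: int, hi: int) -> int:
    if isinstance(raw, bool):  # bool is a subclass of int in Python
        raise ValueError("booleans are not integers")
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str) and _INT_RE.fullmatch(raw):
        value = int(raw)
    else:
        raise ValueError(f"not a clean integer: {raw!r}")
    if not lo <= value <= hi:
        raise ValueError(f"must be between {lo} and {hi}")
    return value
```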

4. Validate at the semantic level

After syntactic validation passes, check business rules: cross-field consistency, temporal logic, and domain constraints. [src7]

# Pydantic -- semantic (cross-field) validation
from datetime import date
from pydantic import BaseModel, model_validator

class BookingRequest(BaseModel):
    check_in: date
    check_out: date

    @model_validator(mode='after')
    def validate_dates(self):
        if self.check_out <= self.check_in:
            raise ValueError('check_out must be after check_in')
        return self

Verify: Reversed dates raise ValidationError.
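The quick-reference table also lists "total = sum of parts" as a semantic rule. A minimal standard-library sketch of that check, using dataclasses for structure and Decimal for money (binary floats would make the equality test unreliable); LineItem and Order are hypothetical types:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    unit_price: Decimal
    quantity: int


@dataclass(frozen=True)
class Order:
    items: tuple[LineItem, ...]
    total: Decimal

    def __post_init__(self) -> None:
        # Each field can be valid on its own; these rules only make sense together.
        if not self.items:
            raise ValueError("order must contain at least one item")
        if any(i.quantity < 1 or i.unit_price < 0 for i in self.items):
            raise ValueError("quantities must be >= 1 and prices >= 0")
        expected = sum((i.unit_price * i.quantity for i in self.items), Decimal("0"))
        if self.total != expected:
            raise ValueError(f"total {self.total} != sum of items {expected}")
```

Recomputing the total server-side and ignoring the client's value is often simpler; a check like this fits cases where a submitted total must be verified.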

5. Canonicalize input before validation

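Decode and normalize input before applying validation rules, so that checks written for the canonical form cannot be bypassed by an alternate encoding of the same characters. Double URL-encoding is the classic case: %252e%252e%252f decodes once to %2e%2e%2f and only on a second pass to ../. [src2]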

import unicodedata

def canonicalize(value: str) -> str:
    normalized = unicodedata.normalize('NFC', value)
    cleaned = ''.join(
        c for c in normalized
        if unicodedata.category(c) != 'Cc' or c in ('\n', '\t')
    )
    return cleaned.strip()

Verify: Control characters like \x00 are stripped; decomposed Unicode is composed.
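The canonicalize() function above handles Unicode normalization and control characters but does not undo percent-encoding. A minimal sketch of that decoding step for a value that must be a relative path: decode until the value stops changing, then validate the decoded form. The three-round cap and both function names are assumptions, and repeated decoding is only appropriate when no legitimate value contains a literal "%25" sequence.

```python
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 3  # hypothetical cap on nested encoding layers


def fully_decode(value: str) -> str:
    """Percent-decode until stable; reject values with too many layers."""
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("too many encoding layers")


def validate_relative_path(raw: str) -> str:
    path = fully_decode(raw)
    # Treat backslashes as separators too, then check the canonical form.
    normalized = path.replace("\\", "/")
    parts = normalized.split("/")
    if normalized.startswith("/") or ".." in parts or "\x00" in normalized:
        raise ValueError("path traversal, absolute path, or NUL byte rejected")
    return normalized
```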

Code Examples

Python (Pydantic v2): API Request Validation

# Input:  Raw JSON request body from HTTP POST
# Output: Validated, typed Python object or ValidationError

from pydantic import BaseModel, Field, EmailStr, field_validator
from pydantic import ConfigDict
from enum import Enum

class UserRole(str, Enum):
    admin = "admin"
    user = "user"
    viewer = "viewer"

class CreateUserRequest(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=13, le=150)
    role: UserRole = UserRole.user
    bio: str | None = Field(default=None, max_length=500)

TypeScript (Zod 3.x): Form and API Validation

// Input:  Unknown data from request body or form submission
// Output: Typed object or ZodError with field-level details

import { z } from 'zod';  // ^3.22.0

const CreateUserSchema = z.object({
  email: z.string().email().max(254).toLowerCase(),
  name: z.string().trim().min(1).max(100),  // trim first: Zod runs string checks in chained order
  age: z.coerce.number().int().min(13).max(150),
  role: z.enum(['admin', 'user', 'viewer']).default('user'),
}).strict();

type CreateUser = z.infer<typeof CreateUserSchema>;

Java (Jakarta Validation 3.1): Bean Validation

// Input:  Request DTO from Spring MVC / JAX-RS
// Output: Validated bean or ConstraintViolationException

import jakarta.validation.constraints.*;

public record CreateUserRequest(
    @NotBlank @Email @Size(max = 254)
    String email,

    @NotBlank @Size(min = 1, max = 100)
    @Pattern(regexp = "^[\\p{L} '-]+$")
    String name,

    @NotNull @Min(13) @Max(150)
    Integer age,

    @NotNull @Pattern(regexp = "^(admin|user|viewer)$")
    String role
) {}

Go (validator v10): Struct Tag Validation

// Input:  JSON-decoded struct from HTTP request
// Output: nil (valid) or validator.ValidationErrors

import (
    "strings"

    "github.com/go-playground/validator/v10"
)

type CreateUserRequest struct {
    Email string `json:"email" validate:"required,email,max=254"`
    Name  string `json:"name"  validate:"required,min=1,max=100"`
    Age   int    `json:"age"   validate:"required,gte=13,lte=150"`
    Role  string `json:"role"  validate:"required,oneof=admin user viewer"`
    Bio   string `json:"bio"   validate:"omitempty,max=500"`
}

// One shared instance: Validate caches struct metadata and is safe for concurrent use.
var validate = validator.New()

func ValidateUser(req *CreateUserRequest) error {
    // Normalize first so the length and format checks see the canonical value.
    req.Email = strings.TrimSpace(strings.ToLower(req.Email))
    req.Name = strings.TrimSpace(req.Name)
    return validate.Struct(req)
}

Anti-Patterns

Wrong: Blocklist-based validation

# BAD -- blocklist filtering is trivially bypassed
def validate_input(value):
    dangerous = ['<script>', 'DROP TABLE', 'eval(', '../']
    for d in dangerous:
        if d.lower() in value.lower():
            raise ValueError('Dangerous input detected')
    return value
# Bypassed by: <img src=x onerror=alert(1)>, DROP/**/TABLE, eval (, ..%2f

Correct: Allowlist with type coercion

# GOOD -- define what is valid, reject everything else
from pydantic import BaseModel, Field
from enum import Enum

class Status(str, Enum):
    active = "active"
    inactive = "inactive"

class UpdateRequest(BaseModel):
    status: Status
    count: int = Field(ge=0, le=1000)
    name: str = Field(max_length=100)

Wrong: Client-side validation only

// BAD -- client-side validation provides zero security
function submitForm() {
  const email = document.getElementById('email').value;
  if (!email.includes('@')) { alert('Invalid email'); return; }
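  // Attacker skips this check entirely:
  //   curl -X POST -H 'Content-Type: application/json' -d '{"email":"<script>alert(1)</script>"}' http://<host>/api/users
  fetch('/api/users', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ email }),
  });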
}

Correct: Server-side validation with client-side UX

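// GOOD -- server-side validation is authoritative
import express from 'express';
import { z } from 'zod';

const app = express();
app.use(express.json());  // parse JSON bodies into req.body

const EmailSchema = z.string().email().max(254);

app.post('/api/users', (req, res) => {
  const result = EmailSchema.safeParse(req.body?.email);
  if (!result.success) {
    return res.status(400).json({ error: result.error.flatten() });
  }
  // result.data is the validated email; persist it here
  return res.status(201).json({ email: result.data });
});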

Wrong: Regex-only validation without length limits

# BAD -- no length limit before the regex: the attacker controls matching cost (ReDoS risk)
import re
email_re = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def validate_email(email):
    if email_re.match(email):  # No length check!
        return True  # A multi-MB string is scanned in full on every call, and a
    return False     # pattern with nested quantifiers, e.g. (a+)+, backtracks exponentially

Correct: Length limit before regex

# GOOD -- check length BEFORE applying regex
import re

EMAIL_RE = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def validate_email(email: str) -> bool:
    if not email or len(email) > 254:  # RFC 5321 limit
        return False
    if len(email.split('@')[0]) > 64:  # Local part limit
        return False
    # fullmatch: in Python, '$' alone would also accept a trailing newline
    return bool(EMAIL_RE.fullmatch(email))

Wrong: Trusting deserialized data

# BAD -- deserializing untrusted input without validation
import json
data = json.loads(request.body)
user_id = data['user_id']   # No type check
quantity = data['quantity']  # No range check
db.execute(f"UPDATE orders SET qty={quantity} WHERE user={user_id}")

Correct: Schema validation then parameterized queries

# GOOD -- validate schema, then use parameterized queries
from pydantic import BaseModel, Field

class UpdateOrderRequest(BaseModel):
    user_id: int = Field(gt=0)
    quantity: int = Field(ge=1, le=9999)

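req = UpdateOrderRequest.model_validate_json(request.body)  # raises ValidationError on bad input
# Placeholder style depends on the driver: %s for psycopg and MySQL drivers, ? for sqlite3
db.execute("UPDATE orders SET qty=%s WHERE user=%s", (req.quantity, req.user_id))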
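For a version that runs without a database server or Pydantic, here is a stdlib-only sketch using sqlite3, whose placeholder is ? rather than %s. The orders table and its user_id column are hypothetical, and the hand-written type and range checks stand in for the Pydantic model:

```python
import sqlite3


def update_quantity(conn: sqlite3.Connection, user_id: object, quantity: object) -> int:
    # Validate first: exact int type (rejects bool and numeric strings) plus ranges.
    if type(user_id) is not int or user_id <= 0:
        raise ValueError("user_id must be a positive integer")
    if type(quantity) is not int or not 1 <= quantity <= 9999:
        raise ValueError("quantity must be an integer in [1, 9999]")
    # Then bind the values as parameters; they are never spliced into the SQL text.
    cur = conn.execute(
        "UPDATE orders SET qty = ? WHERE user_id = ?", (quantity, user_id)
    )
    conn.commit()
    return cur.rowcount  # number of rows updated
```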

Common Pitfalls

Validating input but not encoding output: Input validation and output encoding solve different problems. Validate input for correctness; encode output for context-specific safety. Neither replaces the other. [src1]
ReDoS via unbounded regex: Complex regex patterns on long input cause catastrophic backtracking. Fix: Always enforce length limits before regex; use RE2-compatible patterns; set regex timeouts. [src2]
Forgetting Unicode normalization: Equivalent Unicode representations bypass character allowlists. Fix: Apply unicodedata.normalize('NFC', input) before validation. [src1]
Validating only at the API gateway: Middleware validates structure but business logic receives raw data from internal services. Fix: Validate at every trust boundary. [src7]
Pydantic v1 to v2 migration breakage: @validator replaced by @field_validator; class Config replaced by model_config. Fix: Follow the Pydantic v2 migration guide; use bump-pydantic tool. [src3]
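Zod .parse() vs .safeParse(): .parse() throws a ZodError on invalid input; if nothing catches it, the request fails with a 500, or, in an async Express 4 handler, becomes an unhandled promise rejection that can terminate the process. Fix: Use .safeParse() in request handlers and check result.success. [src4]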
Silently coercing bad data: Automatic type coercion hides bugs. Fix: Use strict mode (Pydantic strict=True, z.number() without z.coerce). [src3]
Not validating array/collection sizes: Accepting unbounded arrays allows memory exhaustion DoS. Fix: Always set max_length on list/array fields. [src2]
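A minimal sketch of bounded collection validation in plain Python, assuming a hypothetical tags field with illustrative limits. The list length is checked before any per-item work. By the time the list exists in memory the JSON has already been parsed, so this complements, rather than replaces, a request-body size limit at the server.

```python
import re

MAX_TAGS = 20       # hypothetical limits
MAX_TAG_LEN = 32
_TAG_RE = re.compile(r"[a-z0-9-]+")  # allowlisted characters


def validate_tags(raw: object) -> list[str]:
    if not isinstance(raw, list):
        raise ValueError("tags must be a list")
    # Reject oversized collections before looking at individual elements.
    if len(raw) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags allowed")
    tags = []
    for item in raw:
        if not isinstance(item, str) or not 1 <= len(item) <= MAX_TAG_LEN:
            raise ValueError(f"invalid tag: {item!r}")
        if not _TAG_RE.fullmatch(item):
            raise ValueError(f"tag has disallowed characters: {item!r}")
        tags.append(item)
    return tags
```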

Diagnostic Commands

# Test Pydantic validation in Python REPL
python -c "
from pydantic import BaseModel, EmailStr, Field
class T(BaseModel):
    email: EmailStr
    age: int = Field(ge=0, le=150)
try: T(email='bad', age=-1)
except Exception as e: print(e)
"

# Test Zod validation in Node.js
node -e "
const {z} = require('zod');
const s = z.object({email: z.string().email(), age: z.number().int().min(0)});
console.log(s.safeParse({email:'bad', age:-1}));
"

# Find unvalidated request body usage (Node.js/Express)
grep -rn 'req\.body\.' --include="*.js" --include="*.ts" . | grep -v 'validate\|schema\|parse'

# Find raw SQL string interpolation (Python)
grep -rn 'f".*SELECT\|f".*INSERT\|f".*UPDATE\|f".*DELETE' --include="*.py" .

# Audit Java controllers for missing @Valid annotation
grep -rn '@RequestBody' --include="*.java" . | grep -v '@Valid'

Version History & Compatibility

Library	Version	Status	Key Change
Pydantic	v2.x	Current	Rust-powered core, @field_validator, 5-50x faster
Pydantic	v1.x	EOL (2024)	@validator, class Config -- use bump-pydantic to migrate
Zod	3.x	Current/Stable	z.coerce, .pipe(), .brand(), discriminated unions
Joi	17.x	Current	ESM support, TypeScript types
Joi	16.x	Deprecated	Package was @hapi/joi
Jakarta Validation	3.1	Current	jakarta.validation.* namespace
Jakarta Validation	2.0	Legacy	javax.validation.* -- requires namespace migration
go-playground/validator	v10	Current	Custom validators, struct-level validation, dive support
FluentValidation (.NET)	11.x	Current	.NET 8 support, async validators

When to Use / When Not to Use

Use When	Don't Use When	Use Instead
Any user-supplied data enters your application	Processing fully trusted internal system data	Basic type assertions may suffice
Building APIs that accept JSON/form payloads	Validating output for display	Output encoding (context-specific escaping)
File upload processing	Simple static config file parsing	Config libraries with built-in schema
CLI tools accepting user arguments	Data already validated by upstream service in same trust zone	Pass validated types between services
Preventing injection attacks at the boundary	Replacing parameterized queries for SQL	Parameterized queries + input validation together

Important Caveats

Input validation is necessary but NOT sufficient for security -- it must be combined with output encoding, parameterized queries, and defense-in-depth measures
Validation library APIs change between major versions (Pydantic v1 vs v2, Joi 16 vs 17) -- always check the docs for your specific version
Unicode and encoding edge cases (homoglyphs, bidirectional text, zero-width characters) require explicit handling beyond basic regex patterns
Regex-based validation must be tested for ReDoS with tools like rxxr2 or safe-regex -- catastrophic backtracking can cause denial of service
Client-side and server-side validation should use the same schema definition when possible (Zod enables this for full-stack TypeScript) to avoid drift
Performance impact: deep validation of large payloads adds latency -- validate early and fail fast, set depth limits
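For the zero-width and bidirectional-text case, one option is to reject Unicode format characters (general category Cf), which include U+200B ZERO WIDTH SPACE, U+200D ZERO WIDTH JOINER, and U+202E RIGHT-TO-LEFT OVERRIDE. A minimal sketch; rejecting all of Cf is aggressive (ZWNJ is ordinary in Persian text and ZWJ builds emoji sequences), so it suits identifiers such as usernames better than free text. Homoglyphs need a separate confusables check that this does not attempt.

```python
import unicodedata


def reject_invisible_format_chars(value: str) -> str:
    """Reject Unicode 'Cf' (format) characters: zero-width chars, bidi overrides, BOM."""
    for ch in value:
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            raise ValueError(f"invisible format character not allowed: {name}")
    return value
```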