Input Validation: Secure Data Validation Patterns
What are the best input validation patterns?
TL;DR
- Bottom line: Always validate input server-side using allowlist (accept known good) validation with strict type coercion, length limits, and semantic checks -- blocklist approaches are fundamentally incomplete and bypassable.
- Key tool/command: z.string().email().max(254) (Zod/TS), EmailStr + Field(max_length=254) (Pydantic/Python), @Email @Size(max=254) (Jakarta/Java), validate:"required,email,max=254" (Go validator).
- Watch out for: Validating on the client side only -- any JavaScript validation is bypassed in seconds with browser DevTools or a proxy.
- Works with: All server-side languages and frameworks. Libraries: Pydantic v2+ (Python), Zod 3.x (TypeScript), Joi 17.x (Node.js), Jakarta Validation 3.1 (Java), go-playground/validator v10 (Go).
Constraints
- ALWAYS validate on the server side -- client-side validation is for UX only and is trivially bypassed
- Use allowlist (accept known good) over blocklist (reject known bad) -- blocklists are incomplete by definition
- Canonicalize and decode input BEFORE validation -- double-encoding bypasses validators
- Validate BOTH syntactic correctness (format, type, length) AND semantic correctness (business rules, cross-field consistency)
- NEVER use regex as the sole validation mechanism -- combine with type coercion, length limits, and range checks
- Pin validation library versions -- major releases change APIs in breaking ways (Pydantic v1 to v2, Joi v16 to v17)
Quick Reference
Validation Patterns
| # | Validation Pattern | Description | When to Use | Risk if Skipped |
|---|---|---|---|---|
| 1 | Allowlist (accept known good) | Define exactly what is valid; reject everything else | All discrete inputs (enums, choices, known formats) | Injection attacks bypass incomplete checks |
| 2 | Type coercion | Convert input to expected type (int, float, bool, date) | Numeric fields, dates, booleans | Type confusion, integer overflow, NaN propagation |
| 3 | Length limits | Enforce min/max length on strings, arrays, file sizes | All string and collection inputs | Buffer overflow, DoS via oversized payloads, ReDoS |
| 4 | Range checks | Validate numeric values fall within expected bounds | Prices, quantities, ages, coordinates | Negative quantities, overflow, business logic abuse |
| 5 | Regex patterns | Match input against format patterns (anchored: ^...$) | Emails, phone numbers, postal codes, IDs | Malformed data, injection via unexpected characters |
| 6 | Encoding/normalization | Canonicalize Unicode, decode URL-encoding before validation | All text inputs, especially multi-byte | Double-encoding bypasses, homoglyph attacks |
| 7 | Schema validation | Validate entire request structure (JSON Schema, Zod, Pydantic) | API payloads, complex nested objects | Missing fields, extra fields, type mismatches |
| 8 | Semantic validation | Cross-field consistency (start < end, total = sum of parts) | Business logic, date ranges, financial data | Logic bugs, data corruption, fraud |
| 9 | Sanitization | Strip or encode dangerous characters AFTER validation | Rich text, HTML inputs (as defense-in-depth) | XSS, injection if validation alone is insufficient |
| 10 | File validation | Check MIME type, magic bytes, size, extension, filename | File uploads | Arbitrary code execution, path traversal |
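Row 10 combines several independent checks, and a short sketch may make the order clearer. The example below uses only the Python standard library; the allowed types, the 5 MiB limit, and the name validate_upload are illustrative assumptions, not part of any library mentioned above.

```python
import os
import uuid

# Hypothetical policy: only PNG, JPEG, and PDF uploads, at most 5 MiB.
MAX_BYTES = 5 * 1024 * 1024
MAGIC_BYTES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".pdf": (b"%PDF-",),
}


def validate_upload(filename: str, content: bytes) -> str:
    """Return a safe server-side filename, or raise ValueError."""
    # Drop client-supplied directory components (path traversal).
    base = os.path.basename(filename.replace("\\", "/"))
    ext = os.path.splitext(base)[1].lower()
    # Allowlist the extension.
    if ext not in MAGIC_BYTES:
        raise ValueError(f"extension not allowed: {ext!r}")
    # Enforce the size limit.
    if not content or len(content) > MAX_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    # Check that the leading bytes match the claimed type.
    if not content.startswith(MAGIC_BYTES[ext]):
        raise ValueError("file content does not match its extension")
    # Never reuse the client's name on disk; generate a fresh one.
    return f"{uuid.uuid4().hex}{ext}"
```

Magic bytes only show that a file starts like the claimed type, so a check of this kind is a first filter rather than a guarantee that the file is harmless.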
Language/Framework Quick Map
| Language | Primary Library | Schema Example | Key Feature |
|---|---|---|---|
| Python | Pydantic v2 | class User(BaseModel): email: EmailStr | Type-safe, fast Rust core, auto-coercion |
| TypeScript | Zod 3.x | z.object({ email: z.string().email() }) | Static type inference from schema |
| Node.js | Joi 17.x | Joi.object({ email: Joi.string().email() }) | Fluent API, detailed error messages |
| Java | Jakarta Validation 3.1 | @Email @NotBlank String email; | Annotation-driven, framework-integrated |
| Go | validator v10 | Email string `validate:"required,email"` | Struct tag-based, Gin integration |
| C# | FluentValidation | RuleFor(x => x.Email).NotEmpty().EmailAddress() | LINQ-like, testable rules |
Decision Tree
START: What kind of input are you validating?
├── Discrete/enumerated value (country, status, category)?
│ ├── YES → Allowlist: check against exact set of valid values
│ └── NO ↓
├── Structured data type (email, URL, phone, date, UUID)?
│ ├── YES → Use library-provided validators + length limits + semantic checks
│ └── NO ↓
├── Numeric value (price, quantity, age)?
│ ├── YES → Type coerce to number + range check (min/max) + reject NaN/Infinity
│ └── NO ↓
├── Free-form text (name, comment, description)?
│ ├── YES → Length limit + Unicode normalization + encoding on OUTPUT
│ └── NO ↓
├── File upload?
│ ├── YES → Validate extension + MIME type + magic bytes + size limit + rename
│ └── NO ↓
├── Complex nested object (API payload)?
│ ├── YES → Schema validation (Pydantic, Zod, JSON Schema) + semantic checks
│ └── NO ↓
└── DEFAULT → Type coerce + length limit + allowlist characters + server-side only
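The numeric branch of the tree (coerce, range check, reject NaN/Infinity) can be sketched with the standard library alone. The function names and the bounds below are assumptions chosen for illustration.

```python
import math


def parse_quantity(raw: str, minimum: int = 1, maximum: int = 9999) -> int:
    """Coerce a string to an int within [minimum, maximum] or raise ValueError."""
    text = raw.strip()
    # Allowlist: ASCII digits only (rejects signs, exponents, non-ASCII digits).
    if not (text.isascii() and text.isdigit()) or len(text) > 10:
        raise ValueError("quantity must be a whole number")
    value = int(text)
    if not minimum <= value <= maximum:
        raise ValueError(f"quantity must be between {minimum} and {maximum}")
    return value


def parse_latitude(raw: str) -> float:
    """Coerce a string to a finite float within [-90, 90] or raise ValueError."""
    try:
        value = float(raw.strip())
    except ValueError:
        raise ValueError("latitude must be a number") from None
    # float() happily parses 'nan', 'inf', and overflowing values like '1e999'.
    if not math.isfinite(value):
        raise ValueError("latitude must be finite")
    if not -90.0 <= value <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    return value
```

The explicit math.isfinite check matters because NaN compares false against every bound, so a range check written as a rejection (value < min or value > max) would let it through.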
Step-by-Step Guide
1. Define your validation schema at the boundary
Define validation schemas at the point where data enters your application (API endpoints, form handlers, CLI parsers). Never validate deep inside business logic. [src1]
# Python: Pydantic v2 -- define schema at API boundary
from pydantic import BaseModel, Field, EmailStr, field_validator
class CreateUserRequest(BaseModel):
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=13, le=150)
    role: str = Field(pattern=r'^(admin|user|viewer)$')
Verify: CreateUserRequest(email="bad", name="", age=5, role="hacker") raises ValidationError with specific field errors.
2. Implement allowlist validation for discrete values
For any input that should be one of a known set of values, validate against an explicit allowlist. Never use blocklists for enumerated data. [src1]
// TypeScript: Zod -- allowlist via enum
import { z } from 'zod'; // ^3.22.0
const RoleSchema = z.enum(['admin', 'user', 'viewer']);
const CreateUserSchema = z.object({
  email: z.string().email().max(254),
  name: z.string().trim().min(1).max(100), // trim first so whitespace-only names fail min(1)
  role: RoleSchema,
}).strict(); // Reject unknown keys
Verify: CreateUserSchema.safeParse({ role: 'superadmin' }) returns { success: false }.
3. Add type coercion with strict error handling
Convert inputs to expected types early. Reject values that cannot be cleanly coerced. [src2]
// Go: validator v10 -- struct tag validation
type CreateUserRequest struct {
    Email string `json:"email" validate:"required,email,max=254"`
    Name  string `json:"name" validate:"required,min=1,max=100"`
    Age   int    `json:"age" validate:"required,gte=13,lte=150"`
    Role  string `json:"role" validate:"required,oneof=admin user viewer"`
}
Verify: validate.Struct(req) with invalid fields returns validator.ValidationErrors.
4. Validate at the semantic level
After syntactic validation passes, check business rules: cross-field consistency, temporal logic, and domain constraints. [src7]
# Pydantic -- semantic (cross-field) validation
from datetime import date
from pydantic import BaseModel, model_validator
class BookingRequest(BaseModel):
    check_in: date
    check_out: date
    # Runs after field-level validation, so both dates are already parsed
    @model_validator(mode='after')
    def validate_dates(self):
        if self.check_out <= self.check_in:
            raise ValueError('check_out must be after check_in')
        return self
Verify: Reversed dates raise ValidationError.
5. Canonicalize input before validation
Decode and normalize input before applying validation rules to prevent double-encoding attacks. [src2]
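import unicodedata
def canonicalize(value: str) -> str:
    # Compose decomposed sequences (e.g. 'e' + combining accent) into one form
    normalized = unicodedata.normalize('NFC', value)
    # Drop control characters, keeping newline and tab
    cleaned = ''.join(
        c for c in normalized
        if unicodedata.category(c) != 'Cc' or c in ('\n', '\t')
    )
    return cleaned.strip()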
Verify: Control characters like \x00 are stripped; decomposed Unicode is composed.
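The function above handles Unicode normalization only. For the decoding half of this step, a minimal standard-library sketch follows. The names fully_decode and is_safe_relative_path and the round limit are assumptions, and some teams prefer to reject multiply-encoded input outright rather than decode it.

```python
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 3  # assumption: more layers than this is treated as hostile


def fully_decode(value: str) -> str:
    """Percent-decode until the value stops changing, or raise ValueError."""
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(value)
        if decoded == value:
            return decoded
        value = decoded
    raise ValueError("input is encoded too many times")


def is_safe_relative_path(raw: str) -> bool:
    """Validate the canonical form, not the raw bytes the client sent."""
    path = fully_decode(raw).replace("\\", "/")
    return not path.startswith("/") and ".." not in path.split("/")


raw = "..%252f..%252fetc/passwd"
print("../" in raw)                 # a check on the raw string sees nothing
print(is_safe_relative_path(raw))   # the decoded form is rejected
```

Whatever the validator accepts should be the decoded value that later code actually uses; validating one representation and then using another reopens the bypass.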
Code Examples
Python (Pydantic v2): API Request Validation
# Input: Raw JSON request body from HTTP POST
# Output: Validated, typed Python object or ValidationError
from enum import Enum
from pydantic import BaseModel, ConfigDict, EmailStr, Field
class UserRole(str, Enum):
    admin = "admin"
    user = "user"
    viewer = "viewer"
class CreateUserRequest(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=13, le=150)
    role: UserRole = UserRole.user
    bio: str | None = Field(default=None, max_length=500)
TypeScript (Zod 3.x): Form and API Validation
// Input: Unknown data from request body or form submission
// Output: Typed object or ZodError with field-level details
import { z } from 'zod'; // ^3.22.0
const CreateUserSchema = z.object({
  email: z.string().email().max(254).toLowerCase(),
  name: z.string().trim().min(1).max(100), // trim first so whitespace-only names fail min(1)
  age: z.coerce.number().int().min(13).max(150),
  role: z.enum(['admin', 'user', 'viewer']).default('user'),
}).strict();
type CreateUser = z.infer<typeof CreateUserSchema>;
Java (Jakarta Validation 3.1): Bean Validation
// Input: Request DTO from Spring MVC / JAX-RS
// Output: Validated bean or ConstraintViolationException
import jakarta.validation.constraints.*;
public record CreateUserRequest(
    @NotBlank @Email @Size(max = 254)
    String email,
    @NotBlank @Size(min = 1, max = 100)
    @Pattern(regexp = "^[\\p{L} '-]+$")
    String name,
    @NotNull @Min(13) @Max(150)
    Integer age,
    @NotNull @Pattern(regexp = "^(admin|user|viewer)$")
    String role
) {}
Go (validator v10): Struct Tag Validation
// Input: JSON-decoded struct from HTTP request
// Output: nil (valid) or validator.ValidationErrors
import (
    "strings"

    "github.com/go-playground/validator/v10"
)
type CreateUserRequest struct {
    Email string `json:"email" validate:"required,email,max=254"`
    Name  string `json:"name" validate:"required,min=1,max=100"`
    Age   int    `json:"age" validate:"required,gte=13,lte=150"`
    Role  string `json:"role" validate:"required,oneof=admin user viewer"`
    Bio   string `json:"bio" validate:"omitempty,max=500"`
}
var validate = validator.New()
// ValidateUser normalizes the string fields, then runs the struct tag rules.
func ValidateUser(req *CreateUserRequest) error {
    req.Email = strings.TrimSpace(strings.ToLower(req.Email))
    req.Name = strings.TrimSpace(req.Name)
    return validate.Struct(req)
}
Anti-Patterns
Wrong: Blocklist-based validation
# BAD -- blocklist filtering is trivially bypassed
def validate_input(value):
    dangerous = ['<script>', 'DROP TABLE', 'eval(', '../']
    for d in dangerous:
        if d.lower() in value.lower():
            raise ValueError('Dangerous input detected')
    return value
# Bypassed by: <script src=x>, DR/**/OP TABLE, e\x76al(, ..%2f
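To make the weakness concrete, here is a small standard-library check that restates the blocklist idea as a boolean function and runs a few payloads through it. The payload strings and the names passes_blocklist and passes_allowlist are illustrative assumptions.

```python
# Same substring blocklist as in the anti-pattern, expressed as a predicate.
DANGEROUS = ["<script>", "DROP TABLE", "eval(", "../"]


def passes_blocklist(value: str) -> bool:
    lowered = value.lower()
    return not any(token.lower() in lowered for token in DANGEROUS)


# An allowlist for the same field accepts only known-good values.
ALLOWED_STATUSES = frozenset({"active", "inactive"})


def passes_allowlist(value: str) -> bool:
    return value in ALLOWED_STATUSES


payloads = [
    "<script src=//evil.example/x.js></script>",  # attribute defeats '<script>'
    "DR/**/OP TABLE users",                       # SQL comment splits the keyword
    "<img src=x onerror=alert(1)>",               # no blocked token at all
    "..%2f..%2fetc/passwd",                       # encoded slash
]

for payload in payloads:
    print(passes_blocklist(payload), passes_allowlist(payload), payload)
```

Every payload slips past the blocklist while the allowlist rejects all of them, because the allowlist never has to anticipate what an attacker might send.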
Correct: Allowlist with type coercion
# GOOD -- define what is valid, reject everything else
from pydantic import BaseModel, Field
from enum import Enum
class Status(str, Enum):
    active = "active"
    inactive = "inactive"
class UpdateRequest(BaseModel):
    status: Status
    count: int = Field(ge=0, le=1000)
    name: str = Field(max_length=100)
Wrong: Client-side validation only
// BAD -- client-side validation provides zero security
function submitForm() {
  const email = document.getElementById('email').value;
  if (!email.includes('@')) { alert('Invalid email'); return; }
  // Attacker bypasses with: curl -X POST -d 'email=<script>alert(1)</script>'
  fetch('/api/users', { method: 'POST', body: JSON.stringify({ email }) });
}
Correct: Server-side validation with client-side UX
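// GOOD -- server-side validation is authoritative
// Assumes an Express app with express.json() body parsing enabled
import { z } from 'zod';
const EmailSchema = z.string().email().max(254);
app.post('/api/users', (req, res) => {
  const result = EmailSchema.safeParse(req.body.email);
  if (!result.success) {
    return res.status(400).json({ error: result.error.flatten() });
  }
  // result.data is the validated email; use it instead of req.body.email
  return res.status(201).json({ email: result.data });
});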
Wrong: Regex-only validation without length limits
# BAD -- regex without length limit invites ReDoS
import re
email_re = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
def validate_email(email):
    if email_re.match(email):  # No length check!
        return True
    # A 10MB string is still scanned in full; patterns with nested
    # quantifiers can backtrack catastrophically on such input
    return False
Correct: Length limit before regex
# GOOD -- check length BEFORE applying regex
import re
def validate_email(email: str) -> bool:
    if not email or len(email) > 254:  # RFC 5321 limit
        return False
    if len(email.split('@')[0]) > 64:  # Local part limit
        return False
    return bool(re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', email))
Wrong: Trusting deserialized data
# BAD -- deserializing untrusted input without validation
import json
data = json.loads(request.body)
user_id = data['user_id'] # No type check
quantity = data['quantity'] # No range check
db.execute(f"UPDATE orders SET qty={quantity} WHERE user={user_id}")
Correct: Schema validation then parameterized queries
# GOOD -- validate schema, then use parameterized queries
from pydantic import BaseModel, Field
class UpdateOrderRequest(BaseModel):
    user_id: int = Field(gt=0)
    quantity: int = Field(ge=1, le=9999)
req = UpdateOrderRequest.model_validate_json(request.body)
db.execute("UPDATE orders SET qty=%s WHERE user=%s", (req.quantity, req.user_id))
Common Pitfalls
- Validating input but not encoding output: Input validation and output encoding solve different problems. Validate input for correctness; encode output for context-specific safety. Neither replaces the other. [src1]
- ReDoS via unbounded regex: Complex regex patterns on long input cause catastrophic backtracking. Fix: Always enforce length limits before regex; use RE2-compatible patterns; set regex timeouts. [src2]
- Forgetting Unicode normalization: Equivalent Unicode representations bypass character allowlists. Fix: Apply unicodedata.normalize('NFC', input) before validation. [src1]
- Validating only at the API gateway: Middleware validates structure but business logic receives raw data from internal services. Fix: Validate at every trust boundary. [src7]
- Pydantic v1 to v2 migration breakage: @validator replaced by @field_validator; class Config replaced by model_config. Fix: Follow the Pydantic v2 migration guide; use the bump-pydantic tool. [src3]
- Zod .parse() vs .safeParse(): Using .parse() throws exceptions on invalid input, crashing Express if uncaught. Fix: Use .safeParse() in request handlers and check result.success. [src4]
- Silently coercing bad data: Automatic type coercion hides bugs. Fix: Use strict mode (Pydantic strict=True, z.number() without z.coerce). [src3]
- Not validating array/collection sizes: Accepting unbounded arrays allows memory exhaustion DoS. Fix: Always set max_length on list/array fields. [src2]
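The last pitfall applies even without a schema library. As a rough standard-library sketch, a collection check bounds both the number of items and the size of each item; the limits and the name validate_tags are assumptions for illustration.

```python
MAX_TAGS = 20        # hypothetical cap on the number of items
MAX_TAG_LENGTH = 32  # hypothetical cap on each item's length


def validate_tags(tags: object) -> list[str]:
    """Return cleaned tags, or raise ValueError if the collection is unsafe."""
    if not isinstance(tags, list):
        raise ValueError("tags must be a list")
    # Check the collection size first so oversized input fails fast.
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed")
    cleaned = []
    for tag in tags:
        if not isinstance(tag, str):
            raise ValueError("each tag must be a string")
        tag = tag.strip()
        if not 1 <= len(tag) <= MAX_TAG_LENGTH:
            raise ValueError(f"each tag must be 1-{MAX_TAG_LENGTH} characters")
        cleaned.append(tag)
    return cleaned
```

A cap on the raw request body size at the server or proxy is still needed, since this check only runs after the payload has already been parsed into memory.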
Diagnostic Commands
# Test Pydantic validation in Python REPL
python -c "
from pydantic import BaseModel, EmailStr, Field
class T(BaseModel):
    email: EmailStr
    age: int = Field(ge=0, le=150)
try: T(email='bad', age=-1)
except Exception as e: print(e)
"
# Test Zod validation in Node.js
node -e "
const {z} = require('zod');
const s = z.object({email: z.string().email(), age: z.number().int().min(0)});
console.log(s.safeParse({email:'bad', age:-1}));
"
# Find unvalidated request body usage (Node.js/Express)
grep -rn 'req\.body\.' --include="*.js" --include="*.ts" . | grep -v 'validate\|schema\|parse'
# Find raw SQL string interpolation (Python)
grep -rn 'f".*SELECT\|f".*INSERT\|f".*UPDATE\|f".*DELETE' --include="*.py" .
# Audit Java controllers for missing @Valid annotation
grep -rn '@RequestBody' --include="*.java" . | grep -v '@Valid'
Version History & Compatibility
| Library | Version | Status | Key Change |
|---|---|---|---|
| Pydantic | v2.x | Current | Rust-powered core, @field_validator, 5-50x faster |
| Pydantic | v1.x | EOL (2024) | @validator, class Config -- use bump-pydantic to migrate |
| Zod | 3.x | Current/Stable | z.coerce, .pipe(), .brand(), discriminated unions |
| Joi | 17.x | Current | ESM support, TypeScript types |
| Joi | 16.x | Deprecated | Package was @hapi/joi |
| Jakarta Validation | 3.1 | Current | jakarta.validation.* namespace |
| Jakarta Validation | 2.0 | Legacy | javax.validation.* -- requires namespace migration |
| go-playground/validator | v10 | Current | Custom validators, struct-level validation, dive support |
| FluentValidation (.NET) | 11.x | Current | .NET 8 support, async validators |
When to Use / When Not to Use
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Any user-supplied data enters your application | Processing fully trusted internal system data | Basic type assertions may suffice |
| Building APIs that accept JSON/form payloads | Validating output for display | Output encoding (context-specific escaping) |
| File upload processing | Simple static config file parsing | Config libraries with built-in schema |
| CLI tools accepting user arguments | Data already validated by upstream service in same trust zone | Pass validated types between services |
| Preventing injection attacks at the boundary | Replacing parameterized queries for SQL | Parameterized queries + input validation together |
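The second row draws a line between validating input and encoding output. A minimal sketch of the output side, assuming an HTML body context and a hypothetical render_comment helper, uses the standard library's html.escape:

```python
import html


def render_comment(comment: str) -> str:
    """Encode user text for an HTML body context at render time."""
    # quote=True also escapes single and double quotes.
    return f"<p>{html.escape(comment, quote=True)}</p>"


print(render_comment('<script>alert("x")</script>'))
print(render_comment("O'Brien & Sons"))
```

The stored comment stays unmodified; escaping happens at the point of output because the right encoding depends on the context (HTML body, attribute, URL, JavaScript), and html.escape covers only the HTML cases.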
Important Caveats
- Input validation is necessary but NOT sufficient for security -- it must be combined with output encoding, parameterized queries, and defense-in-depth measures
- Validation library APIs change between major versions (Pydantic v1 vs v2, Joi 16 vs 17) -- always check the docs for your specific version
- Unicode and encoding edge cases (homoglyphs, bidirectional text, zero-width characters) require explicit handling beyond basic regex patterns
- Regex-based validation must be tested for ReDoS with tools like rxxr2 or safe-regex -- catastrophic backtracking can cause denial of service
- Client-side and server-side validation should use the same schema definition when possible (Zod enables this for full-stack TypeScript) to avoid drift
- Performance impact: deep validation of large payloads adds latency -- validate early and fail fast, set depth limits
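The "fail fast, set depth limits" advice in the last caveat can be sketched with the standard library as follows. The size and depth limits and the names check_depth and load_payload are assumptions, not settings from any of the libraries above.

```python
import json

MAX_BODY_BYTES = 64 * 1024  # hypothetical request size cap
MAX_DEPTH = 5               # hypothetical nesting limit


def check_depth(value: object, max_depth: int = MAX_DEPTH) -> None:
    """Raise ValueError if dicts/lists are nested deeper than max_depth."""
    stack = [(value, 1)]  # explicit stack avoids recursion in the checker itself
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars do not add depth
        if depth > max_depth:
            raise ValueError("payload is nested too deeply")
        stack.extend((child, depth + 1) for child in children)


def load_payload(body: bytes) -> object:
    """Cheapest checks first: size, then parse, then structure."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("payload too large")
    try:
        data = json.loads(body)
    except RecursionError:
        # Extremely deep nesting can exhaust the parser's recursion limit.
        raise ValueError("payload is nested too deeply") from None
    check_depth(data)
    return data
```

Running the size check before parsing keeps the expensive work (parsing, then schema and semantic validation) off requests that were never going to be accepted.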