Input Validation: Secure Data Validation Patterns

Type: Software Reference | Confidence: 0.94 | Sources: 7 | Verified: 2026-02-27 | Freshness: 2026-02-27

TL;DR

Constraints

Quick Reference

Validation Patterns

| # | Validation Pattern | Description | When to Use | Risk if Skipped |
|---|---|---|---|---|
| 1 | Allowlist (accept known good) | Define exactly what is valid; reject everything else | All discrete inputs (enums, choices, known formats) | Injection attacks bypass incomplete checks |
| 2 | Type coercion | Convert input to expected type (int, float, bool, date) | Numeric fields, dates, booleans | Type confusion, integer overflow, NaN propagation |
| 3 | Length limits | Enforce min/max length on strings, arrays, file sizes | All string and collection inputs | Buffer overflow, DoS via oversized payloads, ReDoS |
| 4 | Range checks | Validate numeric values fall within expected bounds | Prices, quantities, ages, coordinates | Negative quantities, overflow, business logic abuse |
| 5 | Regex patterns | Match input against format patterns (anchored: ^...$) | Emails, phone numbers, postal codes, IDs | Malformed data, injection via unexpected characters |
| 6 | Encoding/normalization | Canonicalize Unicode, decode URL-encoding before validation | All text inputs, especially multi-byte | Double-encoding bypasses, homoglyph attacks |
| 7 | Schema validation | Validate entire request structure (JSON Schema, Zod, Pydantic) | API payloads, complex nested objects | Missing fields, extra fields, type mismatches |
| 8 | Semantic validation | Cross-field consistency (start < end, total = sum of parts) | Business logic, date ranges, financial data | Logic bugs, data corruption, fraud |
| 9 | Sanitization | Strip or encode dangerous characters AFTER validation | Rich text, HTML inputs (as defense-in-depth) | XSS, injection if validation alone is insufficient |
| 10 | File validation | Check MIME type, magic bytes, size, extension, filename | File uploads | Arbitrary code execution, path traversal |
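
Row 10 (file validation) can be illustrated with the standard library alone. The following is a minimal sketch, not a complete upload handler: the allowed types, the 5 MB cap, and the helper name `check_upload` are assumptions chosen for the example.

```python
import os
import uuid

# Assumed policy for this sketch: only PNG and JPEG, at most 5 MB.
MAX_BYTES = 5 * 1024 * 1024
MAGIC_BYTES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
}

def check_upload(filename: str, data: bytes) -> str:
    """Return a safe server-side filename or raise ValueError."""
    # Drop any client-supplied directory components (path traversal).
    base = os.path.basename(filename.replace("\\", "/"))
    ext = os.path.splitext(base)[1].lower()
    if ext not in MAGIC_BYTES:
        raise ValueError("extension not allowed")
    if not data or len(data) > MAX_BYTES:
        raise ValueError("file empty or too large")
    # The leading bytes must agree with the claimed extension.
    if not data.startswith(MAGIC_BYTES[ext]):
        raise ValueError("content does not match extension")
    # Never reuse the client's name on disk; generate a fresh one.
    return f"{uuid.uuid4().hex}{ext}"
```

A magic-byte check only confirms the file header; a production handler would typically also store uploads outside the web root and serve them with a fixed content type.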

Language/Framework Quick Map

| Language | Primary Library | Schema Example | Key Feature |
|---|---|---|---|
| Python | Pydantic v2 | class User(BaseModel): email: EmailStr | Type-safe, fast Rust core, auto-coercion |
| TypeScript | Zod 3.x | z.object({ email: z.string().email() }) | Static type inference from schema |
| Node.js | Joi 17.x | Joi.object({ email: Joi.string().email() }) | Fluent API, detailed error messages |
| Java | Jakarta Validation 3.1 | @Email @NotBlank String email; | Annotation-driven, framework-integrated |
| Go | validator v10 | Email string `validate:"required,email"` | Struct tag-based, Gin integration |
| C# | FluentValidation | RuleFor(x => x.Email).NotEmpty().EmailAddress() | LINQ-like, testable rules |

Decision Tree

START: What kind of input are you validating?
├── Discrete/enumerated value (country, status, category)?
│   ├── YES → Allowlist: check against exact set of valid values
│   └── NO ↓
├── Structured data type (email, URL, phone, date, UUID)?
│   ├── YES → Use library-provided validators + length limits + semantic checks
│   └── NO ↓
├── Numeric value (price, quantity, age)?
│   ├── YES → Type coerce to number + range check (min/max) + reject NaN/Infinity
│   └── NO ↓
├── Free-form text (name, comment, description)?
│   ├── YES → Length limit + Unicode normalization + encoding on OUTPUT
│   └── NO ↓
├── File upload?
│   ├── YES → Validate extension + MIME type + magic bytes + size limit + rename
│   └── NO ↓
├── Complex nested object (API payload)?
│   ├── YES → Schema validation (Pydantic, Zod, JSON Schema) + semantic checks
│   └── NO ↓
└── DEFAULT → Type coerce + length limit + allowlist characters + server-side only
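
The numeric branch of the tree (coerce, reject NaN/Infinity, range check) can be sketched in a few lines of standard-library Python. The function name `parse_quantity` and the default bounds are assumptions made for illustration.

```python
import math

def parse_quantity(raw: str, lo: float = 0.0, hi: float = 10_000.0) -> float:
    """Coerce text to a float, rejecting NaN/Infinity and out-of-range values."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError("not a number") from None
    # float() accepts "nan", "inf", and "1e999" (which overflows to inf),
    # so finiteness has to be checked explicitly.
    if not math.isfinite(value):
        raise ValueError("NaN and Infinity are not allowed")
    if not lo <= value <= hi:
        raise ValueError(f"must be between {lo} and {hi}")
    return value
```

The explicit `math.isfinite` check matters because NaN compares false against every bound, so relying on the range comparison alone would make the intent hard to read.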

Step-by-Step Guide

1. Define your validation schema at the boundary

Define validation schemas at the point where data enters your application (API endpoints, form handlers, CLI parsers). Never validate deep inside business logic. [src1]

# Python: Pydantic v2 -- define schema at API boundary
# EmailStr needs the optional dependency: pip install "pydantic[email]"
from pydantic import BaseModel, Field, EmailStr

class CreateUserRequest(BaseModel):
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=13, le=150)
    role: str = Field(pattern=r'^(admin|user|viewer)$')

Verify: CreateUserRequest(email="bad", name="", age=5, role="hacker") raises ValidationError with specific field errors.

2. Implement allowlist validation for discrete values

For any input that should be one of a known set of values, validate against an explicit allowlist. Never use blocklists for enumerated data. [src1]

// TypeScript: Zod -- allowlist via enum
import { z } from 'zod';  // ^3.22.0

const RoleSchema = z.enum(['admin', 'user', 'viewer']);
const CreateUserSchema = z.object({
  email: z.string().email().max(254),
  name: z.string().trim().min(1).max(100),  // trim first so whitespace-only names fail min(1)
  role: RoleSchema,
}).strict();  // Reject unknown keys

Verify: CreateUserSchema.safeParse({ role: 'superadmin' }) returns { success: false }.
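
For code paths that do not go through a schema library, the same allowlist idea can be sketched in plain Python. The set contents mirror the roles above; the helper name `validate_role` is hypothetical.

```python
ALLOWED_ROLES = frozenset({"admin", "user", "viewer"})

def validate_role(raw: object) -> str:
    """Return the role unchanged if it is on the allowlist, else raise."""
    # Require an exact string match: no case folding, trimming, or prefix
    # matching. The isinstance check also keeps unhashable values such as
    # lists from reaching the set lookup.
    if not isinstance(raw, str) or raw not in ALLOWED_ROLES:
        raise ValueError("role must be one of: " + ", ".join(sorted(ALLOWED_ROLES)))
    return raw
```

Keeping the comparison exact is a deliberate choice in this sketch: normalizing before the lookup (for example lowercasing) widens the accepted set and should only be done when the application really treats those variants as equivalent.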

3. Add type coercion with strict error handling

Convert inputs to expected types early. Reject values that cannot be cleanly coerced. [src2]

// Go: validator v10 -- struct tag validation
type CreateUserRequest struct {
    Email string `json:"email" validate:"required,email,max=254"`
    Name  string `json:"name"  validate:"required,min=1,max=100"`
    Age   int    `json:"age"   validate:"required,gte=13,lte=150"`
    Role  string `json:"role"  validate:"required,oneof=admin user viewer"`
}

Verify: validate.Struct(req) with invalid fields returns validator.ValidationErrors.
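
The struct tags above validate values that are already typed. The coercion step itself, turning a form or query-string value into an integer while rejecting anything that does not convert cleanly, might look like the following sketch. The helper name `coerce_int` and the 10-digit bound are assumptions.

```python
import re

# ASCII digits only, optional sign, bounded length.
_INT_RE = re.compile(r"[+-]?[0-9]{1,10}")

def coerce_int(raw: object) -> int:
    """Convert a form/query value to int, or raise ValueError."""
    # bool is a subclass of int in Python; treat True/False as invalid here.
    if isinstance(raw, bool):
        raise ValueError("boolean is not an integer")
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str) and _INT_RE.fullmatch(raw):
        return int(raw)
    # Everything else is rejected, including inputs that int() alone would
    # accept: " 12 ", "1_000", and non-ASCII digits.
    raise ValueError(f"cannot coerce {raw!r} to int")
```

Calling `int()` directly is more permissive than many APIs intend, which is why this sketch gates it behind an explicit pattern instead of relying on the built-in conversion to fail.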

4. Validate at the semantic level

After syntactic validation passes, check business rules: cross-field consistency, temporal logic, and domain constraints. [src7]

# Pydantic -- semantic (cross-field) validation
from datetime import date

from pydantic import BaseModel, model_validator

class BookingRequest(BaseModel):
    check_in: date
    check_out: date

    @model_validator(mode='after')
    def validate_dates(self):
        if self.check_out <= self.check_in:
            raise ValueError('check_out must be after check_in')
        return self

Verify: Reversed dates raise ValidationError.
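
Cross-field rules are not limited to dates. The "total = sum of parts" check from the Quick Reference table can be sketched with a standard-library dataclass; the `Invoice` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Invoice:
    line_items: tuple[Decimal, ...]
    total: Decimal

    def __post_init__(self) -> None:
        if not self.line_items:
            raise ValueError("invoice needs at least one line item")
        if any(amount <= 0 for amount in self.line_items):
            raise ValueError("line items must be positive")
        # Cross-field rule: the declared total must equal the sum of its parts.
        if sum(self.line_items, Decimal("0")) != self.total:
            raise ValueError("total does not match line items")
```

`Decimal` is used here so that amounts such as 1.10 + 2.20 compare exactly; with binary floats the equality check would need a tolerance.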

5. Canonicalize input before validation

Decode and normalize input before applying validation rules to prevent double-encoding attacks. [src2]

import unicodedata

def canonicalize(value: str) -> str:
    normalized = unicodedata.normalize('NFC', value)
    cleaned = ''.join(
        c for c in normalized
        if unicodedata.category(c) != 'Cc' or c in ('\n', '\t')
    )
    return cleaned.strip()

Verify: Control characters like \x00 are stripped; decomposed Unicode is composed.
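
The function above covers Unicode normalization but not the decoding half of this step. One way to handle double-encoded input, sketched here under the assumption that values are percent-encoded, is to decode until the value stops changing and give up after a fixed number of rounds. The name `fully_decode` and the round limit are illustrative.

```python
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 3  # assumed bound for this sketch

def fully_decode(value: str) -> str:
    """Percent-decode until the value stops changing, within a fixed budget."""
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    # Still changing after the budget: treat deeply nested encoding as hostile.
    if unquote(value) != value:
        raise ValueError("input is encoded too many times")
    return value
```

With this in place, `..%252f` reaches the validator as `../` and fails a path or character allowlist instead of slipping past it. A stricter alternative is to decode exactly once and reject any value that still contains percent escapes.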

Code Examples

Python (Pydantic v2): API Request Validation

# Input:  Raw JSON request body from HTTP POST
# Output: Validated, typed Python object or ValidationError

from pydantic import BaseModel, ConfigDict, EmailStr, Field
from enum import Enum

class UserRole(str, Enum):
    admin = "admin"
    user = "user"
    viewer = "viewer"

class CreateUserRequest(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)
    email: EmailStr
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=13, le=150)
    role: UserRole = UserRole.user
    bio: str | None = Field(default=None, max_length=500)

TypeScript (Zod 3.x): Form and API Validation

// Input:  Unknown data from request body or form submission
// Output: Typed object or ZodError with field-level details

import { z } from 'zod';  // ^3.22.0

const CreateUserSchema = z.object({
  email: z.string().email().max(254).toLowerCase(),
  name: z.string().trim().min(1).max(100),  // trim before the length checks
  age: z.coerce.number().int().min(13).max(150),
  role: z.enum(['admin', 'user', 'viewer']).default('user'),
}).strict();

type CreateUser = z.infer<typeof CreateUserSchema>;

Java (Jakarta Validation 3.1): Bean Validation

// Input:  Request DTO from Spring MVC / JAX-RS
// Output: Validated bean or ConstraintViolationException

import jakarta.validation.constraints.*;

public record CreateUserRequest(
    @NotBlank @Email @Size(max = 254)
    String email,

    @NotBlank @Size(min = 1, max = 100)
    @Pattern(regexp = "^[\\p{L} '-]+$")
    String name,

    @NotNull @Min(13) @Max(150)
    Integer age,

    @NotNull @Pattern(regexp = "^(admin|user|viewer)$")
    String role
) {}

Go (validator v10): Struct Tag Validation

// Input:  JSON-decoded struct from HTTP request
// Output: nil (valid) or validator.ValidationErrors

import (
    "strings"

    "github.com/go-playground/validator/v10"
)

type CreateUserRequest struct {
    Email string `json:"email" validate:"required,email,max=254"`
    Name  string `json:"name"  validate:"required,min=1,max=100"`
    Age   int    `json:"age"   validate:"required,gte=13,lte=150"`
    Role  string `json:"role"  validate:"required,oneof=admin user viewer"`
    Bio   string `json:"bio"   validate:"omitempty,max=500"`
}

var validate = validator.New()

func ValidateUser(req *CreateUserRequest) error {
    req.Email = strings.TrimSpace(strings.ToLower(req.Email))
    req.Name = strings.TrimSpace(req.Name)
    return validate.Struct(req)
}

Anti-Patterns

Wrong: Blocklist-based validation

# BAD -- blocklist filtering is trivially bypassed
def validate_input(value):
    dangerous = ['<script>', 'DROP TABLE', 'eval(', '../']
    for d in dangerous:
        if d.lower() in value.lower():
            raise ValueError('Dangerous input detected')
    return value
# Bypassed by: <script >, <img src=x onerror=alert(1)>, DROP/**/TABLE, \u0065val(, ..%2f

Correct: Allowlist with type coercion

# GOOD -- define what is valid, reject everything else
from pydantic import BaseModel, Field
from enum import Enum

class Status(str, Enum):
    active = "active"
    inactive = "inactive"

class UpdateRequest(BaseModel):
    status: Status
    count: int = Field(ge=0, le=1000)
    name: str = Field(max_length=100)

Wrong: Client-side validation only

// BAD -- client-side validation provides zero security
function submitForm() {
  const email = document.getElementById('email').value;
  if (!email.includes('@')) { alert('Invalid email'); return; }
  // Attacker bypasses with: curl -X POST -d 'email=<script>alert(1)</script>'
  fetch('/api/users', { method: 'POST', body: JSON.stringify({ email }) });
}

Correct: Server-side validation with client-side UX

// GOOD -- server-side validation is authoritative
import { z } from 'zod';

const EmailSchema = z.string().email().max(254);

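// Assumes an Express app with app.use(express.json()) so req.body is parsed
app.post('/api/users', (req, res) => {
  const result = EmailSchema.safeParse(req.body?.email);
  if (!result.success) {
    return res.status(400).json({ error: result.error.flatten() });
  }
  // Use result.data (the validated value), never req.body, from here on
  return res.status(201).json({ email: result.data });
});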

Wrong: Regex-only validation without length limits

# BAD -- regex applied to input of unbounded length
import re
email_re = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

# A 10MB string is scanned in full; patterns with nested quantifiers
# can additionally backtrack catastrophically (ReDoS)
def validate_email(email):
    if email_re.match(email):  # No length check!
        return True
    return False

Correct: Length limit before regex

# GOOD -- check length BEFORE applying regex
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def validate_email(email: str) -> bool:
    if not email or len(email) > 254:  # RFC 5321 limit
        return False
    if len(email.split('@')[0]) > 64:  # Local part limit
        return False
    # fullmatch anchors both ends; '$' with match() would accept a trailing newline
    return EMAIL_RE.fullmatch(email) is not None

Wrong: Trusting deserialized data

# BAD -- deserializing untrusted input without validation
import json
data = json.loads(request.body)
user_id = data['user_id']   # No type check
quantity = data['quantity']  # No range check
db.execute(f"UPDATE orders SET qty={quantity} WHERE user={user_id}")

Correct: Schema validation then parameterized queries

# GOOD -- validate schema, then use parameterized queries
# (%s is the placeholder style of drivers such as psycopg; sqlite3 uses ?)
from pydantic import BaseModel, Field

class UpdateOrderRequest(BaseModel):
    user_id: int = Field(gt=0)
    quantity: int = Field(ge=1, le=9999)

req = UpdateOrderRequest.model_validate_json(request.body)
db.execute("UPDATE orders SET qty=%s WHERE user=%s", (req.quantity, req.user_id))
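
The snippet above depends on a web framework and a database driver. A self-contained variant of the same pairing, validation followed by bound parameters, can be run against an in-memory SQLite database. The table layout and the function name `update_quantity` are assumptions for the sketch.

```python
import sqlite3

def update_quantity(conn: sqlite3.Connection, user_id: int, quantity: int) -> int:
    """Update a row using bound parameters; returns the number of rows changed."""
    # Validation narrows the values; binding keeps them out of the SQL text.
    if not (isinstance(user_id, int) and user_id > 0):
        raise ValueError("user_id must be a positive integer")
    if not (isinstance(quantity, int) and 1 <= quantity <= 9999):
        raise ValueError("quantity must be between 1 and 9999")
    cur = conn.execute(
        "UPDATE orders SET qty = ? WHERE user = ?",  # sqlite3 uses ? placeholders
        (quantity, user_id),
    )
    conn.commit()
    return cur.rowcount

# Throwaway in-memory database for trying the function out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO orders (user, qty) VALUES (1, 5)")
update_quantity(conn, 1, 3)
```

The two layers cover different failures: the range checks stop nonsensical values such as a quantity of zero, while the placeholders ensure that even an unexpected value is never interpreted as SQL.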

Common Pitfalls

Diagnostic Commands

# Test Pydantic validation from the shell
python -c "
from pydantic import BaseModel, EmailStr, Field
class T(BaseModel):
    email: EmailStr
    age: int = Field(ge=0, le=150)
try: T(email='bad', age=-1)
except Exception as e: print(e)
"

# Test Zod validation in Node.js
node -e "
const {z} = require('zod');
const s = z.object({email: z.string().email(), age: z.number().int().min(0)});
console.log(s.safeParse({email:'bad', age:-1}));
"

# Find unvalidated request body usage (Node.js/Express)
grep -rn 'req\.body\.' --include="*.js" --include="*.ts" . | grep -v 'validate\|schema\|parse'

# Find raw SQL string interpolation (Python)
grep -rn 'f".*SELECT\|f".*INSERT\|f".*UPDATE\|f".*DELETE' --include="*.py" .

# Audit Java controllers for missing @Valid annotation
grep -rn '@RequestBody' --include="*.java" . | grep -v '@Valid'

Version History & Compatibility

| Library | Version | Status | Key Change |
|---|---|---|---|
| Pydantic | v2.x | Current | Rust-powered core, @field_validator, 5-50x faster |
| Pydantic | v1.x | EOL (2024) | @validator, class Config -- use bump-pydantic to migrate |
| Zod | 3.x | Current/Stable | z.coerce, .pipe(), .brand(), discriminated unions |
| Joi | 17.x | Current | ESM support, TypeScript types |
| Joi | 16.x | Deprecated | Package was @hapi/joi |
| Jakarta Validation | 3.1 | Current | jakarta.validation.* namespace |
| Jakarta Validation | 2.0 | Legacy | javax.validation.* -- requires namespace migration |
| go-playground/validator | v10 | Current | Custom validators, struct-level validation, dive support |
| FluentValidation (.NET) | 11.x | Current | .NET 8 support, async validators |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Any user-supplied data enters your application | Processing fully trusted internal system data | Basic type assertions may suffice |
| Building APIs that accept JSON/form payloads | Validating output for display | Output encoding (context-specific escaping) |
| File upload processing | Simple static config file parsing | Config libraries with built-in schema |
| CLI tools accepting user arguments | Data already validated by upstream service in same trust zone | Pass validated types between services |
| Preventing injection attacks at the boundary | Replacing parameterized queries for SQL | Parameterized queries + input validation together |
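
The second row distinguishes input validation from output encoding. As a small illustration of the latter, the sketch below escapes a stored comment for an HTML element-content context using the standard library; the function name `render_comment` is hypothetical, and other contexts (attributes, JavaScript, URLs) need their own encoders.

```python
import html

def render_comment(comment: str) -> str:
    """Escape a stored comment for an HTML element-content context."""
    # html.escape converts &, <, > and (with quote=True) " and ' to entities.
    return f"<p>{html.escape(comment, quote=True)}</p>"
```

Here the comment is stored as the user typed it and only encoded at the point of output, so the same stored value can later be rendered safely into a different context.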

Important Caveats

Related Units