AI/ML Due Diligence Checklist
Type: Concept
Confidence: 0.86
Sources: 6
Verified: 2026-02-28
Definition
AI/ML due diligence is a specialized extension of technology due diligence that evaluates the intelligence layer of a target company — model architecture, training data provenance, third-party AI dependencies, MLOps maturity, inference economics, regulatory compliance, and key person risk. This checklist supplements the standard 8-workstream M&A due diligence framework. [src2] [src5]
Key Properties
- Seven Pillars: Model Architecture, Training Data, IP & Licensing, Performance, Infrastructure & MLOps, Regulatory, AI Talent
- Highest Risk: Training data provenance — most companies cannot prove full commercial rights
- EU AI Act: Prohibited systems enforcement Aug 2025; GPAI obligations Feb 2026
- Vendor Risk: Heavy API reliance on OpenAI/Anthropic/Google creates concentration risk
- Inference Costs: 30-60% of COGS for AI-native companies
- Key Person Risk: Loss of 1-2 founding ML engineers can render models unmaintainable
Constraints
- AI model evaluation requires hands-on technical access that sellers may resist [src1]
- Point-in-time metrics don't capture drift — request 6-12 months of production data [src3]
- Open-source model licenses vary widely and may restrict commercial use [src4]
- AI patent landscape is extremely active with frequent claim conflicts [src6]
- Transfer learning creates derivative data rights questions that existing IP frameworks don't address
Framework Selection Decision Tree
START — Evaluating target with AI/ML capabilities
├── How central is AI to value?
│ ├── AI IS the product → Full AI DD ← YOU ARE HERE
│ ├── AI enhances product → Moderate AI DD + standard tech DD
│ ├── AI internal only → Light assessment within standard tech DD
│ └── AI claimed but minimal → Validate claims
├── Builds own models?
│ ├── YES → Full model + training data + IP review
│ └── NO (API wrappers) → Focus on vendor dependency
├── Processes personal data in AI?
│ ├── YES → GDPR/EU AI Act review critical
│ └── NO → Standard IP/licensing review
└── AI talent primary driver?
├── YES → Key person assessment, retention packages
└── NO → Standard HR with AI overlay
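The decision tree above can be sketched as a small scoping function. The role labels and returned workstream names below are illustrative assumptions, not terminology from the source:

```python
# Sketch of the framework-selection decision tree as a scoping helper.
# Role labels and workstream strings are hypothetical, for illustration only.

def select_dd_scope(ai_role: str, builds_models: bool,
                    personal_data: bool, talent_driven: bool) -> list[str]:
    """Map the four decision-tree answers to a list of DD workstreams."""
    scopes = []

    # Branch 1: how central is AI to value?
    if ai_role == "core_product":
        scopes.append("full_ai_dd")
    elif ai_role == "enhancement":
        scopes.append("moderate_ai_dd_plus_standard_tech_dd")
    elif ai_role == "internal_only":
        scopes.append("light_ai_assessment_in_tech_dd")
    else:  # AI claimed but minimal
        scopes.append("validate_ai_claims")

    # Branch 2: builds own models vs. API wrapper
    scopes.append("model_training_data_ip_review" if builds_models
                  else "vendor_dependency_review")

    # Branch 3: personal data in AI pipelines
    scopes.append("gdpr_eu_ai_act_review" if personal_data
                  else "standard_ip_licensing_review")

    # Branch 4: AI talent as primary value driver
    scopes.append("key_person_retention_assessment" if talent_driven
                  else "standard_hr_with_ai_overlay")
    return scopes
```

For an AI-native target that builds its own models on personal data with founder-dependent talent, all four high-scrutiny branches fire at once.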
Application Checklist
Step 1: Map the AI stack
- Inputs needed: Architecture diagrams, model inventory, third-party contracts, open-source licenses
- Output: AI stack map (proprietary vs. licensed vs. third-party)
- Constraint: If >50% relies on third-party APIs, the "AI moat" is a wrapper — adjust valuation [src2]
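A minimal sketch of the >50% wrapper heuristic: weight each AI component of the stack map by the revenue it supports and sum the third-party share. The component names, kinds, and weights are hypothetical:

```python
# Illustrative AI stack inventory; components and revenue shares are assumptions.
stack = [
    {"component": "summarizer", "kind": "third_party_api", "revenue_share": 0.35},
    {"component": "ranker",     "kind": "proprietary",     "revenue_share": 0.40},
    {"component": "classifier", "kind": "licensed_oss",    "revenue_share": 0.25},
]

# Share of revenue-weighted AI capability resting on third-party APIs
third_party_share = sum(c["revenue_share"] for c in stack
                        if c["kind"] == "third_party_api")

# Apply the checklist's >50% wrapper threshold
is_wrapper_risk = third_party_share > 0.5
print(f"Third-party reliance: {third_party_share:.0%}, wrapper risk: {is_wrapper_risk}")
```

Revenue weighting is one reasonable choice; inference volume or feature usage could substitute depending on what the data room supports.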
Step 2: Assess training data provenance
- Inputs needed: Training data catalog, sourcing agreements, scraping logs, consent records
- Output: Training data rights assessment by risk tier
- Constraint: Highest-risk area — liabilities include copyright claims, GDPR violations, contractual breaches [src4]
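The tiering in Step 2 (and in the anti-patterns section) can be sketched as a classifier over the training data catalog. The field names and classification rules here are assumptions standing in for whatever the sourcing agreements actually document:

```python
# Sketch of tiering training data sources by rights risk.
# Tier labels follow the checklist; fields and rules are illustrative assumptions.

def classify_source(source: dict) -> str:
    """Assign a training data source to a provenance risk tier."""
    if source.get("license_for_commercial_use") and source.get("consent_documented", True):
        return "tier1_clear_rights"          # clear rights
    if source.get("scraped") or source.get("license_for_commercial_use") is None:
        return "tier3_high_risk"             # scraped or undocumented rights
    return "tier2_gray_area"                 # documented but restricted

sources = [
    {"name": "licensed_corpus", "license_for_commercial_use": True},
    {"name": "web_scrape_2023", "scraped": True},
    {"name": "partner_data",    "license_for_commercial_use": False},
]
tiers = {s["name"]: classify_source(s) for s in sources}
```

Exposure can then be quantified per tier, e.g. by summing the revenue attributable to models trained on each tier's data.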
Step 3: Evaluate model performance
- Inputs needed: 6-12 months production metrics, A/B tests, drift reports, failure documentation
- Output: Reliability assessment, performance trend analysis
- Constraint: Test set accuracy overstates real-world performance by 10-20% [src3]
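The benchmark-to-production gap and drift trend in Step 3 can be computed directly from the 6-12 months of production metrics requested above. The numbers here are made up to show the calculation:

```python
# Sketch: compare claimed benchmark accuracy against the production metric
# trail to surface the overstatement gap and drift. Values are illustrative.
benchmark_accuracy = 0.94
monthly_production_accuracy = [0.85, 0.84, 0.83, 0.81, 0.80, 0.78]  # 6 months

# Gap between the seller's benchmark claim and real-world performance
gap = benchmark_accuracy - monthly_production_accuracy[0]

# Degradation across the observation window (model/data drift)
drift = monthly_production_accuracy[0] - monthly_production_accuracy[-1]

print(f"Benchmark-to-production gap: {gap:.0%}")
print(f"Drift over window: {drift:.0%}")
```

A gap near the checklist's 10-20% range plus a steady downward trend suggests the headline accuracy should be discounted in the valuation model.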
Step 4: Assess infrastructure and inference economics
- Inputs needed: Cloud costs, GPU allocation, inference latency, MLOps pipeline
- Output: Inference unit economics, scalability assessment
- Constraint: Compute costs scale non-linearly — project 3-year costs under acquirer's growth [src5]
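A minimal sketch of the 3-year projection in Step 4, using a superlinear exponent to model the non-linear compute scaling the constraint warns about. The base cost, growth rate, and exponent are all hypothetical inputs:

```python
# Sketch: project inference costs under the acquirer's growth assumptions.
# cost_exponent > 1 models superlinear (non-linear) compute scaling.
# All inputs are illustrative assumptions, not figures from the source.

def project_costs(base_annual_cost: float, growth_rate: float,
                  years: int = 3, cost_exponent: float = 1.2) -> list[float]:
    """Annual cost = base * (cumulative volume growth) ** cost_exponent."""
    costs = []
    volume = 1.0
    for _ in range(years):
        volume *= (1 + growth_rate)
        costs.append(base_annual_cost * volume ** cost_exponent)
    return costs

# Hypothetical: EUR 2M current inference spend, 80% annual volume growth
projection = project_costs(base_annual_cost=2_000_000, growth_rate=0.8)
```

Running the projection under the acquirer's growth plan rather than the target's historicals is the point of the exercise: a cost line that looks manageable at the target's scale can dominate COGS at the acquirer's.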
Step 5: Regulatory and IP compliance
- Inputs needed: AI risk classification, patent portfolio, ethics policies
- Output: Regulatory gap analysis, FTO opinion, EU AI Act classification
- Constraint: EU AI Act penalties up to EUR 35M or 7% of global turnover [src3]
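The penalty ceiling cited above is "the higher of" the two figures, which matters for sizing exposure. A one-line sketch, with an illustrative turnover figure:

```python
# EU AI Act penalty ceiling per the checklist: the higher of EUR 35M
# or 7% of global annual turnover. Turnover inputs are illustrative.

def max_penalty(global_turnover_eur: float) -> float:
    return max(35_000_000, 0.07 * global_turnover_eur)

# For a EUR 1B-turnover group, the 7% prong (EUR 70M) exceeds the floor
exposure = max_penalty(1_000_000_000)
```

For targets below roughly EUR 500M turnover, the EUR 35M floor binds, so exposure does not shrink with company size.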
Anti-Patterns
Wrong: Accepting AI claims at face value
Many companies market themselves as "AI-powered" with minimal actual AI deployment. [src5]
Correct: Demand technical access and independent evaluation
Ask: "If we removed all third-party API calls, what AI capability would remain?" [src1]
Wrong: Ignoring training data rights
Dismissing provenance concerns because "everyone uses the same data" fails when a lawsuit targets the specific acquired company. [src4]
Correct: Categorize training data into risk tiers
Tier 1 (clear rights), Tier 2 (gray area), Tier 3 (high risk). Quantify exposure per tier. [src4]
Wrong: Valuing AI talent without retention analysis
Standard HR DD doesn't assess whether critical AI knowledge is transferable or locked in individuals. [src1]
Correct: Conduct knowledge concentration assessment
Map sole-knowledge holders, assess bus factor, design retention packages vesting over 2-4 years. [src5]
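The knowledge-concentration mapping above can be sketched as a simple ownership audit: list who can maintain each critical AI component and flag anything with a single owner. Component and engineer names are hypothetical:

```python
# Sketch of a knowledge-concentration (bus factor) check.
# Components and owners are illustrative assumptions.
ownership = {
    "training_pipeline": ["alice"],
    "feature_store":     ["alice", "bob"],
    "eval_harness":      ["carol"],
    "serving_stack":     ["bob", "carol"],
}

# Components only one person can maintain
sole_owned = [c for c, owners in ownership.items() if len(owners) == 1]

# Distinct people whose departure strands at least one component
at_risk_people = {owners[0] for c, owners in ownership.items() if len(owners) == 1}

print(f"Sole-knowledge components: {sole_owned}, people at risk: {len(at_risk_people)}")
```

Each name in the at-risk set is a candidate for a retention package in the 2-4 year vesting range the checklist suggests, plus a documented knowledge-transfer plan.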
Common Misconceptions
Misconception: Traditional tech DD covers AI assets.
Reality: AI DD adds training data provenance, model validation, inference economics, EU AI Act, and MLOps — none covered in standard tech DD. [src2]
Misconception: Open-source AI models mean no IP risk.
Reality: Licenses vary widely — some restrict commercial use, and models trained on copyrighted data may transfer liability. [src4]
Misconception: High benchmark accuracy means production-ready.
Reality: Benchmark accuracy often overstates real-world performance due to distribution mismatch and data leakage. [src3]
Comparison with Similar Concepts
| Concept | Key Difference | When to Use |
| --- | --- | --- |
| AI Due Diligence | Specialized AI/ML assessment | Target has material AI capabilities |
| Technology DD | Broader IT assessment | Every tech acquisition |
| Standard DD | Full 8-workstream | Every M&A transaction |
| AI Vendor Assessment | Evaluating a supplier | Procurement, not M&A |
When This Matters
Fetch this when a user asks about evaluating AI capabilities in an acquisition, assessing training data rights during M&A, or understanding EU AI Act implications for transactions.
Related Units