This recipe produces a validated pivot-or-persevere decision backed by 15-25 customer discovery interviews, a theme-coded synthesis database, and a customer language bank — within 4-8 weeks at $0-$2,000. It executes the full discovery pipeline: hypothesis formulation, subject recruitment, Mom Test interviews, thematic saturation monitoring, affinity-mapped synthesis, and an evidence-scored decision. The output feeds directly into idea validation, MVP scoping, and go-to-market planning. [src1]
Which path?
├── User has $0 budget AND easy customer access
│   └── PATH A: Free Manual — Calendly + Google Meet + Google Sheets + manual notes
├── User has $0 budget AND moderate/difficult access
│   └── PATH B: Free + AI Assist — Cal.com + Zoom + Otter.ai Free + Google Sheets
├── User has $100-$500 budget
│   └── PATH C: Lean — Calendly + Zoom + Otter.ai Pro + Notion/Airtable + gift cards
└── User has $500-$2,000 budget
    └── PATH D: Full Stack — Calendly + Zoom + Dovetail + Respondent.io + gift cards
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Free Manual | Calendly, Meet, Sheets, manual notes | $0 | 6-8 weeks | Good — disciplined manual coding |
| B: Free + AI Assist | Cal.com, Zoom, Otter.ai, Sheets | $0 | 5-7 weeks | Good — AI transcription improves accuracy |
| C: Lean | Calendly, Zoom, Otter.ai Pro, Notion | $100-$500 | 4-6 weeks | High — incentives speed recruitment |
| D: Full Stack | Calendly, Zoom, Dovetail, Respondent.io | $500-$2K | 4-5 weeks | Excellent — pro recruitment + analysis |
Step 1: Formulate problem hypotheses
Duration: 2-4 hours · Tool: Google Sheets or Notion
Write 3-5 core problem hypotheses using Steve Blank’s format: “We believe [customer segment] experiences [specific problem] when [trigger/context], which costs them [time/money/frustration].” Rank by risk — which ones, if wrong, kill the business model? Convert to testable hypotheses with explicit pass/fail criteria. Identify the “riskiest assumption” — test it first. [src8]
Hypothesis template:
"We believe [customer segment] experiences [problem] at [frequency].
TRUE when [X] of [Y] interviewees describe this pain unprompted.
FALSE when fewer than [Z] of [Y] mention it at all."
Validation thresholds:
- Validated: 60%+ unprompted mention with emotional intensity
- Partially validated: 30-60% mention
- Invalidated: Under 30% or directly contradicted
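A minimal sketch of how these thresholds could be scored mechanically, assuming unprompted-mention counts and a 1-10 intensity score are logged per hypothesis; the 7/10 intensity cutoff mirrors the scorecard in Step 6, and all names are illustrative:

```python
def score_hypothesis(unprompted: int, total: int, avg_intensity: float) -> str:
    """Classify a hypothesis against the validation thresholds above."""
    rate = unprompted / total
    if rate >= 0.60 and avg_intensity >= 7:  # 60%+ unprompted with emotional intensity
        return "validated"
    if rate >= 0.30:                         # 30-60% mention
        return "partially validated"
    return "invalidated"                     # under 30%

# Example: 11 of 16 interviewees described the pain unprompted, mean intensity 7.8
print(score_hypothesis(11, 16, 7.8))  # -> validated
```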
Verify: 3-5 hypotheses documented with pass/fail thresholds; riskiest assumption identified · If failed: Spend 2-4 hours on competitor research and community lurking first
Step 2: Write the interview guide
Duration: 2-3 hours · Tool: Google Docs or Notion
Write 8-12 open-ended questions following the Mom Test’s three core rules: (1) talk about their life instead of your idea, (2) ask about specifics in the past instead of hypotheticals, (3) talk less and listen more. Eliminate three types of bad data: compliments, hypothetical fluff, and wishlists. [src2]
Interview structure (30-40 minutes):
WARM-UP (2-3 min):
"I'm researching how [role] teams handle [domain]. No pitch — just learning."
CONTEXT (5 min):
1. "Walk me through your current workflow for [domain]."
2. "What does a typical week look like regarding [area]?"
PROBLEM EXPLORATION (15-20 min):
3. "What's the hardest part about [domain]?"
4. "Tell me about the last time you dealt with [problem area]."
5. "What have you tried to solve it? How did that work out?"
6. "Why was that hard?" (dig for root cause)
7. "How much time/money do you spend on this today?"
CURRENT SOLUTIONS (5-10 min):
8. "What tools or workarounds do you use right now?"
9. "If you could wave a magic wand, what would change?"
10. "What would happen if this stayed unsolved for another year?"
WRAP-UP (3 min):
11. "Anything about [domain] that frustrates you I haven't asked about?"
12. "Who else should I talk to about this?" (referral chain)
NEVER ASK: "Would you use...?", "How much would you pay...?",
"Is this a good idea?", "What features would you want?"
Verify: Guide reviewed by teammate; no leading or hypothetical questions · If failed: Replace any “Would you...” with “Tell me about the last time you...”
Step 3: Recruit interview subjects
Duration: 5-10 days · Tool: LinkedIn, communities, email, Calendly
Identify 100-150 people matching the target profile. End every email with a yes/no question — it lifts reply rates from 4.8% to 12.8%. Over-recruit by 30-40% to cover no-shows. Schedule at most 3-4 interviews per day. Send reminders 24h and 1h before each call. Incentives ($25-$100 gift cards for B2B) improve response rates from ~10% to ~25%. [src5] [src1]
Cold outreach (B2B, under 150 words):
Subject: Quick question about [domain]
Hi [Name], I'm researching how [role] teams handle [problem area].
Would you have 20 minutes this week for a quick call?
No pitch — just trying to understand workflows.
Would [Tuesday/Thursday] work?
Response rates: Cold LinkedIn 10-15%, Cold email 10-20%,
Warm intro 40-60%, Community post 3-5%
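A back-of-envelope sizing sketch combining the over-recruiting rule with these response rates; the 35% buffer and the example reply rate are assumptions to tune per channel:

```python
import math

def outreach_needed(target_interviews: int, response_rate: float,
                    over_recruit: float = 0.35) -> int:
    """People to contact: over-recruit 30-40% for no-shows, then divide by reply rate."""
    scheduled = math.ceil(target_interviews * (1 + over_recruit))
    return math.ceil(scheduled / response_rate)

# 20 completed interviews via cold email at a 15% reply rate:
print(outreach_needed(20, 0.15))  # schedule 27 -> contact 180 people
```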
Verify: 20+ interviews scheduled with screened participants within 10 days · If failed: (1) broaden profile, (2) try Respondent.io at $50-$150/participant, (3) post in 3+ communities, (4) ask every interviewee for 2 referrals [src5]
Step 4: Conduct Mom Test interviews
Duration: 2-4 weeks · Tool: Zoom/Meet (recording), Otter.ai (transcription), Sheets (tracking)
Follow Mom Test methodology: listen 80%, talk 20%. Watch for three types of bad data: compliments, hypothetical fluff, wishlists. Redirect with: “When did it last happen? Walk me through the experience.” 5-minute debrief after each interview. After every 5 interviews, review for emerging themes. [src2] [src1]
After each interview, capture:
1. Top 3 insights (what surprised you?)
2. Hypothesis evidence: which gained/lost support?
3. New themes: anything you hadn't heard before?
4. Customer language: exact words for the problem
5. Behavioral signals: workarounds, spending, time
Red flags (interview contaminated):
- You talked >30% of the time
- You described your solution before minute 25
- Only compliments, no specifics
- You asked "Would you use/buy...?"
Verify: After every 5 interviews: new themes still emerging? Pain score converging? 3+ describe same problem unprompted? · If failed: If no pain is described, pivot the hypothesis — ask “what IS painful?” and let them lead [src6]
Step 5: Monitor thematic saturation
Duration: Ongoing from interview 10 · Tool: Google Sheets (saturation tracker)
Track new themes per interview using Guest, Namey & Chen’s methodology. Saturation reached when 3 consecutive interviews produce 0-1 new themes. Do not stop before 12 even if patterns seem clear. Do not continue past saturation — additional interviews add cost without insight. [src4]
Saturation benchmarks:
- First 6 interviews: ~78-79% of all themes
- First 10-12 interviews: ~92% of themes
- Code saturation (theme range): 9-12 interviews
- Meaning saturation (nuanced): 16-24 interviews
B2B: saturation at 15-20 interviews
B2C: may require 20-30 (population diversity)
Multiple segments: 12-15 per segment minimum
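The run-length rule is simple enough to script against the saturation tracker. A sketch assuming one count of new themes per completed interview:

```python
def saturation_reached(new_themes_per_interview: list[int],
                       min_interviews: int = 12, run_length: int = 3,
                       max_new: int = 1) -> bool:
    """True once `run_length` consecutive interviews each add <= `max_new` themes,
    and never before `min_interviews` are complete."""
    if len(new_themes_per_interview) < min_interviews:
        return False
    tail = new_themes_per_interview[-run_length:]
    return all(n <= max_new for n in tail)

log = [6, 4, 3, 3, 2, 2, 1, 2, 1, 0, 1, 0]  # new themes per interview
print(saturation_reached(log))  # -> True: 12 done, last 3 are 0/1/0
```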
Verify: Saturation log shows ≤1 new theme in last 3; minimum 12 completed · If failed: If new themes still appear at interview 20, sub-segment and continue [src3]
Step 6: Synthesize findings
Duration: 2-4 days · Tool: Sheets (theme coding), Miro/FigJam (affinity map), or Dovetail
Transform raw data into structured insights using thematic coding and affinity mapping. Weight unprompted mentions 3x over prompted responses. Identify the “hair on fire” problem: highest frequency + highest intensity + existing workaround spending. Build customer language bank — “The precise words customers use should be in your marketing material.” [src6] [src5]
Synthesis process:
A. Theme coding: tag observations, group into 5-8 parent themes
B. Affinity mapping: cluster quotes, name clusters, rank by
frequency x intensity x willingness to spend
C. Customer language bank: extract exact words/phrases/metaphors
D. Hypothesis scorecard:
- Validated: 60%+ unprompted, intensity 7+/10
- Partial: 30-60% mention OR intensity 5-7/10
- Invalidated: Under 30% or intensity below 5/10
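A sketch of the ranking in step B with the 3x unprompted weighting applied; the input shape and the $50/mo spend normalization (borrowed from the Step 7 decision matrix) are assumptions:

```python
def theme_score(mentions: list[dict]) -> float:
    """mentions: [{'unprompted': bool, 'intensity': 1-10, 'spend_usd': float}, ...]"""
    weighted_freq = sum(3 if m["unprompted"] else 1 for m in mentions)  # 3x weight
    avg_intensity = sum(m["intensity"] for m in mentions) / len(mentions)
    spend_signal = 1 + sum(m["spend_usd"] for m in mentions) / (50 * len(mentions))
    return weighted_freq * avg_intensity * spend_signal

themes = {"late payments": [{"unprompted": True, "intensity": 8, "spend_usd": 40},
                            {"unprompted": True, "intensity": 9, "spend_usd": 100}],
          "reporting":     [{"unprompted": False, "intensity": 5, "spend_usd": 0}]}
for name, score in sorted(((n, theme_score(m)) for n, m in themes.items()),
                          key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")  # late payments: 122.4, reporting: 5.0
```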
Verify: Theme frequency table, affinity map, language bank, and hypothesis scorecard complete · If failed: Bring in second analyst for independent coding — compare results
Step 7: Make the pivot-or-persevere decision
Duration: 1-2 days · Tool: Google Sheets (decision matrix)
Apply evidence from all prior steps. Do not cherry-pick positive signals. Set hard deadline: decision within 2 weeks of last interview. Discovery without a forcing function becomes academic research, not startup strategy. [src1]
| Signal | Stop | Pivot | Go |
|---|---|---|---|
| Pain score (avg) | <5/10 | 5-7/10 | 7+/10 |
| Pattern clarity | No pattern | Emerging | 10+ describe same problem |
| Unprompted mention rate | <30% | 30-60% | >60% |
| Workaround spending | $0 | $1-$50/mo | >$50/mo |
| Solution satisfaction | High | Moderate | Low |
| Referral willingness | 0-1 | 2-4 | 5+ |
| “Hair on fire” problem | No | Maybe | Clear |
| Segments identified | 0 | 1 | 2+ |
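The matrix translates directly into code. Thresholds below mirror four of the table's rows; the majority-vote aggregation is an assumption, since the recipe does not prescribe how to combine mixed signals:

```python
def decide(signals: dict) -> str:
    """Vote each signal into stop/pivot/go using the matrix thresholds above."""
    votes = {"stop": 0, "pivot": 0, "go": 0}
    votes["go" if signals["pain_score"] >= 7 else
          "pivot" if signals["pain_score"] >= 5 else "stop"] += 1
    votes["go" if signals["unprompted_rate"] > 0.60 else
          "pivot" if signals["unprompted_rate"] >= 0.30 else "stop"] += 1
    votes["go" if signals["workaround_spend"] > 50 else
          "pivot" if signals["workaround_spend"] > 0 else "stop"] += 1
    votes["go" if signals["referrals"] >= 5 else
          "pivot" if signals["referrals"] >= 2 else "stop"] += 1
    return max(votes, key=votes.get)

print(decide({"pain_score": 7.4, "unprompted_rate": 0.65,
              "workaround_spend": 80, "referrals": 4}))  # -> go (3 go, 1 pivot)
```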
Output files: discovery_synthesis.csv, hypothesis_scorecard.md, customer_language_bank.csv, decision_document.md
Verify: Decision documented with evidence; validated hypothesis and segment identified · If failed: If still ambiguous after 20+ interviews, extend with 5-8 more in a tighter segment; if still ambiguous after 30, treat as “stop” [src1]
{
"output_type": "customer_discovery_package",
"format": "document collection",
"columns": [
{"name": "decision", "type": "string", "description": "Go, Pivot, or Stop with reasoning"},
{"name": "core_hypothesis_validated", "type": "boolean", "description": "Riskiest assumption confirmed"},
{"name": "hair_on_fire_problem", "type": "string", "description": "Highest-frequency, highest-intensity pain"},
{"name": "target_segment", "type": "string", "description": "Best-fit customer segment profile"},
{"name": "unprompted_mention_rate", "type": "number", "description": "% describing core problem unprompted"},
{"name": "average_pain_intensity", "type": "number", "description": "Mean pain score 1-10"},
{"name": "interviews_completed", "type": "number", "description": "Total interviews conducted"},
{"name": "saturation_reached", "type": "boolean", "description": "Thematic saturation achieved"},
{"name": "themes_identified", "type": "number", "description": "Total unique themes coded"},
{"name": "customer_language_entries", "type": "number", "description": "Verbatim phrases captured"},
{"name": "pivot_type", "type": "string", "description": "If pivoting: segment/problem/solution/channel"}
],
"expected_row_count": "1 (single discovery decision)",
"deduplication_key": "core_hypothesis + target_segment"
}
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Interviews per segment | 12 | 18-20 | 25+ |
| Thematic saturation | Approaching (1-2 new) | Reached (0-1 in last 3) | Confirmed (0 in last 5) |
| Unprompted mention rate | >30% | >50% | >70% |
| Average pain intensity | >5/10 | >7/10 | >8/10 |
| Hypothesis scorecard | All scored | All + evidence | All + quotes + data |
| Customer language bank | 20+ phrases | 50+ phrases | 100+ phrases |
| Synthesis timeline | Within 2 weeks | Within 1 week | Within 3 days |
If below minimum: Re-run Steps 3-4 with a broader segment. If interviews number fewer than 12 per segment, recruit more before synthesizing — premature pattern-matching is the #1 failure mode. [src4]
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Cannot recruit 15 in 10 days | Segment too narrow or wrong channels | Broaden profile; try Respondent.io ($50-$150); ask interviewees for 2 referrals; post in 3+ communities |
| Polite but no real pain described | Problem hypothesis is wrong | Ask “what IS painful?” and let them lead; update hypothesis |
| Pain described but different problem | Hypothesis targeted wrong pain | Document actual pain as new hypothesis — valuable pivot signal |
| High pain but $0 spent on solutions | Pain not worth paying to solve | Test vitamin vs painkiller; pivot to higher-stakes version |
| New themes still at interview 20 | Segment too heterogeneous | Sub-segment into 2-3 profiles; each needs 12+ interviews |
| Interviewer talked >30% | Insufficient Mom Test training | Discard contaminated interview; practice with teammate |
| Recording/transcription failed | Tool configuration issue | Take manual backup notes; switch from Otter.ai to tl;dv or manual transcription |
| Team disagrees on synthesis | Different analytical frames | Independent coding: each person codes separately, compare, resolve with evidence |
| Component | Free ($0) | Lean ($500) | Standard ($2K) |
|---|---|---|---|
| Interview incentives | $0 (goodwill) | $200 (20 x $10) | $750 (25 x $30) |
| Transcription tools | $0 (Otter.ai free) | $0 (Otter.ai free) | $60 (Otter.ai Pro, 2 mo) |
| Recruiting platforms | $0 (DIY) | $0 (DIY) | $500 (Respondent.io) |
| Analysis tools | $0 (Google Sheets) | $0 (Google Sheets) | $100 (Dovetail, 2 mo) |
| Scheduling | $0 (Calendly free) | $0 (Calendly free) | $0 (Calendly free) |
| Video conferencing | $0 (Zoom/Meet free) | $0 (Zoom free) | $30 (Zoom Pro, 2 mo) |
| Contingency | $0 | $300 | $560 |
| Total | $0 | $500 | $2,000 |
Constraint: Zero-budget discovery is possible but slower — recruiting without incentives reduces response rates by 50-70% and skews toward people who enjoy talking. [src1]
Future-prediction questions generate false positives: 80% of people who said “I would buy” did not buy. The Mom Test exists because your mom will never tell you your idea is bad. [src2]
“Tell me about the last time you dealt with [problem].” Past behavior predicts future behavior. If they spend $0 on workarounds, they won’t buy your solution. [src2]
Confirmation bias in small samples. 12+ interviews for code saturation, 16-24 for meaning saturation. [src4]
Base-size-6, run-length-3 methodology. Saturation when 3 consecutive interviews produce 0-1 new themes. [src4]
Discovery without a forcing function becomes academic research, not startup strategy. [src1]
Decide within 2 weeks of last interview. Perfect information does not exist — the goal is sufficient evidence to reduce risk. [src1]
When founders talk, they pitch. Pitching triggers polite mode: compliments, fluff, and wishlists. [src2]
Ask a question and wait. Best insights come from follow-up probes (“Why?”, “Tell me more”), not prepared questions. [src7]
Use when a founder or agent needs to execute a complete customer discovery cycle — recruit subjects, run Mom Test interviews, code themes, monitor saturation, synthesize findings, and make an evidence-backed pivot-or-persevere decision. Not a document about discovery methodology, but the actual execution steps with tools, templates, and quality gates. Requires a problem hypothesis as input; produces a validated decision, customer language bank, and synthesis database as output.