Customer Discovery Recipe: Interviews to Pivot-or-Persevere Decision

Type: Execution Recipe · Confidence: 0.92 · Sources: 8 · Verified: 2026-03-11

Purpose

This recipe produces a validated pivot-or-persevere decision backed by 15-25 customer discovery interviews, a theme-coded synthesis database, and a customer language bank — within 4-8 weeks at $0-$2,000. It executes the full discovery pipeline: hypothesis formulation, subject recruitment, Mom Test interviews, thematic saturation monitoring, affinity-mapped synthesis, and an evidence-scored decision. The output feeds directly into idea validation, MVP scoping, and go-to-market planning. [src1]

Prerequisites

Constraints

Tool Selection Decision

Which path?
├── User has $0 budget AND easy customer access
│   └── PATH A: Free Manual — Calendly + Google Meet + Google Sheets + manual notes
├── User has $0 budget AND moderate/difficult access
│   └── PATH B: Free + AI Assist — Cal.com + Zoom + Otter.ai Free + Google Sheets
├── User has $100-$500 budget
│   └── PATH C: Lean — Calendly + Zoom + Otter.ai Pro + Notion/Airtable + gift cards
└── User has $500-$2,000 budget
    └── PATH D: Full Stack — Calendly + Zoom + Dovetail + Respondent.io + gift cards
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Free Manual | Calendly, Meet, Sheets, manual notes | $0 | 6-8 weeks | Good — disciplined manual coding |
| B: Free + AI Assist | Cal.com, Zoom, Otter.ai, Sheets | $0 | 5-7 weeks | Good — AI transcription improves accuracy |
| C: Lean | Calendly, Zoom, Otter.ai Pro, Notion | $100-$500 | 4-6 weeks | High — incentives speed recruitment |
| D: Full Stack | Calendly, Zoom, Dovetail, Respondent.io | $500-$2K | 4-5 weeks | Excellent — pro recruitment + analysis |

Execution Flow

Step 1: Formulate Falsifiable Hypotheses

Duration: 2-4 hours · Tool: Google Sheets or Notion

Write 3-5 core problem hypotheses using Steve Blank’s format: “We believe [customer segment] experiences [specific problem] when [trigger/context], which costs them [time/money/frustration].” Rank by risk — which ones, if wrong, kill the business model? Convert to testable hypotheses with explicit pass/fail criteria. Identify the “riskiest assumption” — test it first. [src8]

Hypothesis template:
"We believe [customer segment] experiences [problem] at [frequency].
TRUE when [X] of [Y] interviewees describe this pain unprompted.
FALSE when fewer than [Z] of [Y] mention it at all."

Validation thresholds:
- Validated: 60%+ unprompted mention with emotional intensity
- Partially validated: 30-60% mention
- Invalidated: Under 30% or directly contradicted
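The thresholds above can be applied mechanically when scoring interviews. A minimal sketch in Python (the function and argument names are illustrative, not part of the recipe):

```python
def classify_hypothesis(unprompted_mentions: int, total_interviews: int,
                        high_intensity: bool) -> str:
    """Apply the recipe's validation thresholds to one hypothesis.

    60%+ unprompted mentions with emotional intensity -> validated;
    30-60% mention -> partially validated; under 30% -> invalidated.
    """
    rate = unprompted_mentions / total_interviews
    if rate >= 0.60 and high_intensity:
        return "validated"
    if rate >= 0.30:
        return "partially validated"
    return "invalidated"
```

For example, 13 of 20 interviewees (65%) describing the pain unprompted with strong emotion scores as validated.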

Verify: 3-5 hypotheses documented with pass/fail thresholds; riskiest assumption identified · If failed: Spend 2-4 hours on competitor research and community lurking first

Step 2: Design the Interview Guide (Mom Test Format)

Duration: 2-3 hours · Tool: Google Docs or Notion

Write 8-12 open-ended questions following the Mom Test’s three core rules: (1) talk about their life instead of your idea, (2) ask about specifics in the past instead of hypotheticals, (3) talk less and listen more. Eliminate three types of bad data: compliments, hypothetical fluff, and wishlists. [src2]

Interview structure (30-40 minutes):

WARM-UP (2-3 min):
"I'm researching how [role] teams handle [domain]. No pitch — just learning."

CONTEXT (5 min):
1. "Walk me through your current workflow for [domain]."
2. "What does a typical week look like regarding [area]?"

PROBLEM EXPLORATION (15-20 min):
3. "What's the hardest part about [domain]?"
4. "Tell me about the last time you dealt with [problem area]."
5. "What have you tried to solve it? How did that work out?"
6. "Why was that hard?" (dig for root cause)
7. "How much time/money do you spend on this today?"

CURRENT SOLUTIONS (5-10 min):
8. "What tools or workarounds do you use right now?"
9. "If you could wave a magic wand, what would change?"
10. "What would happen if this stayed unsolved for another year?"

WRAP-UP (3 min):
11. "Anything about [domain] that frustrates you I haven't asked about?"
12. "Who else should I talk to about this?" (referral chain)

NEVER ASK: "Would you use...?", "How much would you pay...?",
"Is this a good idea?", "What features would you want?"

Verify: Guide reviewed by teammate; no leading or hypothetical questions · If failed: Replace any “Would you...” with “Tell me about the last time you...”

Step 3: Recruit Interview Subjects (Target: 25-35 Scheduled)

Duration: 5-10 days · Tool: LinkedIn, communities, email, Calendly

Identify 100-150 people matching the target profile. End every email with a yes/no question; doing so raises reply rates from 4.8% to 12.8%. Over-recruit by 30-40% for no-shows. Schedule at most 3-4 interviews per day. Send reminders 24h and 1h before. Incentives ($25-$100 gift cards for B2B) improve response rates from ~10% to ~25%. [src5] [src1]

Cold outreach (B2B, under 150 words):
Subject: Quick question about [domain]

Hi [Name], I'm researching how [role] teams handle [problem area].
Would you have 20 minutes this week for a quick call?
No pitch — just trying to understand workflows.
Would [Tuesday/Thursday] work?

Response rates: Cold LinkedIn 10-15%, Cold email 10-20%,
Warm intro 40-60%, Community post 3-5%
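The over-recruiting math above folds into one helper. A sketch using integer percentages (the function name and parameters are illustrative):

```python
import math

def outreach_needed(target_interviews: int, response_rate_pct: int,
                    no_show_buffer_pct: int = 35) -> int:
    """People to contact to end up with `target_interviews` completed.

    Over-recruit for no-shows (recipe: 30-40%), then scale by the
    channel's expected response rate (e.g. 10-20% for cold email).
    """
    scheduled = math.ceil(target_interviews * (100 + no_show_buffer_pct) / 100)
    return math.ceil(scheduled * 100 / response_rate_pct)
```

Twenty completed interviews at cold-email rates (~15%) means roughly 180 contacts; warm intros (~50%) cut that to about 54, which is why the referral chain in the wrap-up question matters.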

Verify: 20+ interviews scheduled with screened participants within 10 days · If failed: (1) broaden profile, (2) try Respondent.io at $50-$150/participant, (3) post in 3+ communities, (4) ask every interviewee for 2 referrals [src5]

Step 4: Conduct Interviews and Track in Real-Time

Duration: 2-4 weeks · Tool: Zoom/Meet (recording), Otter.ai (transcription), Sheets (tracking)

Follow Mom Test methodology: listen 80%, talk 20%. Watch for three types of bad data: compliments, hypothetical fluff, wishlists. Redirect with: “When did it last happen? Walk me through the experience.” 5-minute debrief after each interview. After every 5 interviews, review for emerging themes. [src2] [src1]

After each interview, capture:
1. Top 3 insights (what surprised you?)
2. Hypothesis evidence: which gained/lost support?
3. New themes: anything you hadn't heard before?
4. Customer language: exact words for the problem
5. Behavioral signals: workarounds, spending, time

Red flags (interview contaminated):
- You talked >30% of the time
- You described your solution before minute 25
- Only compliments, no specifics
- You asked "Would you use/buy...?"
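The talk-time red flag can be checked from per-speaker durations (getting speaker-labelled durations from a transcription tool is an assumption; names here are illustrative):

```python
def interviewer_talked_too_much(interviewer_secs: float,
                                subject_secs: float,
                                max_share: float = 0.30) -> bool:
    """Red-flag check: True when the interviewer's share of total talk
    time exceeds the recipe's 30% ceiling (target: listen 80%, talk 20%)."""
    total = interviewer_secs + subject_secs
    return total > 0 and interviewer_secs / total > max_share
```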

Verify: After every 5 interviews: new themes still emerging? Pain score converging? 3+ describe same problem unprompted? · If failed: If no pain is described, pivot the hypothesis and ask “what IS painful?” [src6]

Step 5: Monitor Thematic Saturation

Duration: Ongoing from interview 10 · Tool: Google Sheets (saturation tracker)

Track new themes per interview using Guest, Namey & Chen’s methodology. Saturation reached when 3 consecutive interviews produce 0-1 new themes. Do not stop before 12 even if patterns seem clear. Do not continue past saturation — additional interviews add cost without insight. [src4]

Saturation benchmarks:
- First 6 interviews: ~78-79% of all themes
- First 10-12 interviews: ~92% of themes
- Code saturation (theme range): 9-12 interviews
- Meaning saturation (nuanced): 16-24 interviews

B2B: saturation at 15-20 interviews
B2C: may require 20-30 (population diversity)
Multiple segments: 12-15 per segment minimum
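The stopping rule (minimum 12 interviews, then a run of 3 with at most 1 new theme each) can be sketched as a running check:

```python
def saturation_reached(new_themes_per_interview: list[int],
                       run_length: int = 3, max_new: int = 1,
                       min_interviews: int = 12) -> bool:
    """Guest/Namey/Chen-style stopping rule as used in this recipe:
    saturation when each of the last `run_length` interviews surfaced
    at most `max_new` new themes, and the 12-interview floor is met."""
    if len(new_themes_per_interview) < min_interviews:
        return False
    return all(n <= max_new
               for n in new_themes_per_interview[-run_length:])
```

Append the new-theme count after every interview and stop recruiting once this returns True.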

Verify: Saturation log shows ≤1 new theme in last 3; minimum 12 completed · If failed: If new themes still emerge at interview 20, sub-segment and continue [src3]

Step 6: Synthesize Findings with Affinity Mapping

Duration: 2-4 days · Tool: Sheets (theme coding), Miro/FigJam (affinity map), or Dovetail

Transform raw data into structured insights using thematic coding and affinity mapping. Weight unprompted mentions 3x over prompted responses. Identify the “hair on fire” problem: highest frequency + highest intensity + existing workaround spending. Build customer language bank — “The precise words customers use should be in your marketing material.” [src6] [src5]

Synthesis process:
A. Theme coding: tag observations, group into 5-8 parent themes
B. Affinity mapping: cluster quotes, name clusters, rank by
   frequency x intensity x willingness to spend
C. Customer language bank: extract exact words/phrases/metaphors
D. Hypothesis scorecard:
   - Validated: 60%+ unprompted, intensity 7+/10
   - Partial: 30-60% mention OR intensity 5-7/10
   - Invalidated: Under 30% or intensity below 5/10
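Steps A and B can be approximated with a weighted score per theme. A sketch applying the 3x unprompted weighting; the observation record's field names are assumptions, not part of the recipe:

```python
from collections import defaultdict

def rank_themes(observations: list[dict]) -> list[tuple[str, float]]:
    """Score themes by weighted frequency x intensity.

    Each observation is a dict like
    {"theme": str, "unprompted": bool, "intensity": int}  # intensity 1-10
    (illustrative field names). Unprompted mentions weigh 3x prompted ones.
    """
    scores: dict[str, float] = defaultdict(float)
    for ob in observations:
        weight = 3.0 if ob["unprompted"] else 1.0
        scores[ob["theme"]] += weight * ob["intensity"]
    # Highest-scoring theme first; the top entry is the "hair on fire"
    # candidate, pending a check of workaround spending.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```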

Verify: Theme frequency table, affinity map, language bank, and hypothesis scorecard complete · If failed: Bring in second analyst for independent coding — compare results

Step 7: Make Pivot-or-Persevere Decision

Duration: 1-2 days · Tool: Google Sheets (decision matrix)

Apply evidence from all prior steps. Do not cherry-pick positive signals. Set hard deadline: decision within 2 weeks of last interview. Discovery without a forcing function becomes academic research, not startup strategy. [src1]

| Signal | Stop | Pivot | Go |
|---|---|---|---|
| Pain score (avg) | <5/10 | 5-7/10 | 7+/10 |
| Pattern clarity | No pattern | Emerging | 10+ same |
| Unprompted mention rate | <30% | 30-60% | >60% |
| Workaround spending | $0 | $1-$50/mo | >$50/mo |
| Solution satisfaction | High | Moderate | Low |
| Referral willingness | 0-1 | 2-4 | 5+ |
| “Hair on fire” problem | No | Maybe | Clear |
| Segments identified | 0 | 1 | 2+ |
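The numeric rows of the matrix translate directly into threshold checks. A partial sketch; how boundary values like exactly 7/10 or 30% are classified is an assumption, since the table leaves edges ambiguous:

```python
def verdict(signal: str, value: float) -> str:
    """Map one measured signal to 'stop' | 'pivot' | 'go' per the matrix.
    Only the numeric rows are covered here."""
    if signal == "pain_score":           # <5 stop, 5-7 pivot, 7+ go
        return "go" if value >= 7 else "pivot" if value >= 5 else "stop"
    if signal == "unprompted_rate":      # <30% stop, 30-60% pivot, >60% go
        return "go" if value > 0.60 else "pivot" if value >= 0.30 else "stop"
    if signal == "workaround_spend_mo":  # $0 stop, $1-$50 pivot, >$50 go
        return "go" if value > 50 else "pivot" if value >= 1 else "stop"
    if signal == "referrals":            # 0-1 stop, 2-4 pivot, 5+ go
        return "go" if value >= 5 else "pivot" if value >= 2 else "stop"
    raise ValueError(f"unhandled signal: {signal}")
```

Score every signal, then weigh the verdicts together in the decision document rather than letting any single row decide.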

Output files: discovery_synthesis.csv, hypothesis_scorecard.md, customer_language_bank.csv, decision_document.md

Verify: Decision documented with evidence; validated hypothesis and segment identified · If failed: If the decision is still ambiguous after 20+ interviews, run 5-8 more with a tighter segment; if still ambiguous after 30, treat as “stop” [src1]

Output Schema

{
  "output_type": "customer_discovery_package",
  "format": "document collection",
  "columns": [
    {"name": "decision", "type": "string", "description": "Go, Pivot, or Stop with reasoning"},
    {"name": "core_hypothesis_validated", "type": "boolean", "description": "Riskiest assumption confirmed"},
    {"name": "hair_on_fire_problem", "type": "string", "description": "Highest-frequency, highest-intensity pain"},
    {"name": "target_segment", "type": "string", "description": "Best-fit customer segment profile"},
    {"name": "unprompted_mention_rate", "type": "number", "description": "% describing core problem unprompted"},
    {"name": "average_pain_intensity", "type": "number", "description": "Mean pain score 1-10"},
    {"name": "interviews_completed", "type": "number", "description": "Total interviews conducted"},
    {"name": "saturation_reached", "type": "boolean", "description": "Thematic saturation achieved"},
    {"name": "themes_identified", "type": "number", "description": "Total unique themes coded"},
    {"name": "customer_language_entries", "type": "number", "description": "Verbatim phrases captured"},
    {"name": "pivot_type", "type": "string", "description": "If pivoting: segment/problem/solution/channel"}
  ],
  "expected_row_count": "1 (single discovery decision)",
  "deduplication_key": "core_hypothesis + target_segment"
}

Quality Benchmarks

| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Interviews per segment | 12 | 18-20 | 25+ |
| Thematic saturation | Approaching (1-2 new) | Reached (0-1 in last 3) | Confirmed (0 in last 5) |
| Unprompted mention rate | >30% | >50% | >70% |
| Average pain intensity | >5/10 | >7/10 | >8/10 |
| Hypothesis scorecard | All scored | All + evidence | All + quotes + data |
| Customer language bank | 20+ phrases | 50+ phrases | 100+ phrases |
| Synthesis timeline | Within 2 weeks | Within 1 week | Within 3 days |

If below minimum: Re-run Steps 3-4 with broader segment. If interviews <12 per segment, recruit more before synthesizing — premature pattern-matching is the #1 failure mode. [src4]

Error Handling

| Error | Likely Cause | Recovery Action |
|---|---|---|
| Cannot recruit 15 in 10 days | Segment too narrow or wrong channels | Broaden profile; try Respondent.io ($50-$150); ask interviewees for 2 referrals; post in 3+ communities |
| Polite but no real pain described | Problem hypothesis is wrong | Ask “what IS painful?” and let them lead; update hypothesis |
| Pain described but different problem | Hypothesis targeted wrong pain | Document actual pain as new hypothesis — valuable pivot signal |
| High pain but $0 spent on solutions | Pain not worth paying to solve | Test vitamin vs painkiller; pivot to higher-stakes version |
| New themes still at interview 20 | Segment too heterogeneous | Sub-segment into 2-3 profiles; each needs 12+ interviews |
| Interviewer talked >30% | Insufficient Mom Test training | Discard contaminated interview; practice with teammate |
| Recording/transcription failed | Tool configuration issue | Take manual backup notes; switch from Otter.ai to tl;dv or manual notes |
| Team disagrees on synthesis | Different analytical frames | Independent coding: each person codes separately, compare, resolve with evidence |

Cost Breakdown

| Component | Free ($0) | Lean ($500) | Standard ($2K) |
|---|---|---|---|
| Interview incentives | $0 (goodwill) | $200 (20 x $10) | $750 (25 x $30) |
| Transcription tools | $0 (Otter.ai free) | $0 (Otter.ai free) | $60 (Otter.ai Pro, 2 mo) |
| Recruiting platforms | $0 (DIY) | $0 (DIY) | $500 (Respondent.io) |
| Analysis tools | $0 (Google Sheets) | $0 (Google Sheets) | $100 (Dovetail, 2 mo) |
| Scheduling | $0 (Calendly free) | $0 (Calendly free) | $0 (Calendly free) |
| Video conferencing | $0 (Zoom/Meet free) | $0 (Zoom free) | $30 (Zoom Pro, 2 mo) |
| Contingency | $0 | $300 | $560 |
| Total | $0 | $500 | $2,000 |

Constraint: Zero-budget discovery is possible but slower — recruiting without incentives reduces response rates by 50-70% and skews toward people who enjoy talking. [src1]

Anti-Patterns

Wrong: Asking “Would you use a product that does X?”

Future-prediction questions generate false positives. 80% who said “I would buy” did not buy. The Mom Test exists because your mom will never tell you your idea is bad. [src2]

Correct: Ask about past behavior with specifics

“Tell me about the last time you dealt with [problem].” Past behavior predicts future behavior. If they spend $0 on workarounds, they won’t buy your solution. [src2]

Wrong: 5-8 interviews then declaring validation

Confirmation bias in small samples. 12+ interviews for code saturation, 16-24 for meaning saturation. [src4]

Correct: Track saturation explicitly with a running log

Base-size-6, run-length-3 methodology. Saturation when 3 consecutive interviews produce 0-1 new themes. [src4]

Wrong: Running discovery indefinitely without deciding

Discovery without a forcing function becomes academic research, not startup strategy. [src1]

Correct: Set a hard deadline for the decision

Decide within 2 weeks of last interview. Perfect information does not exist — the goal is sufficient evidence to reduce risk. [src1]

Wrong: Interviewer talks more than 30% of the time

When founders talk, they pitch. Pitching triggers polite mode: compliments, fluff, and wishlists. [src2]

Correct: Listen 80%, talk 20% — use silence as a tool

Ask a question and wait. Best insights come from follow-up probes (“Why?”, “Tell me more”), not prepared questions. [src7]

When This Matters

Use when a founder or agent needs to execute a complete customer discovery cycle — recruit subjects, run Mom Test interviews, code themes, monitor saturation, synthesize findings, and make an evidence-backed pivot-or-persevere decision. Not a document about discovery methodology, but the actual execution steps with tools, templates, and quality gates. Requires a problem hypothesis as input; produces a validated decision, customer language bank, and synthesis database as output.

Related Units