This recipe produces a validated pivot-or-persevere decision backed by 15-25 customer discovery interviews, a theme-coded synthesis database, and a customer language bank — within 4-8 weeks at $0-$2,000. It executes the full discovery pipeline: hypothesis formulation, subject recruitment, Mom Test interviews, thematic saturation monitoring, affinity-mapped synthesis, and an evidence-scored decision. The output feeds directly into idea validation, MVP scoping, and go-to-market planning. [src1]
Which path?
├── User has $0 budget AND easy customer access
│   └── PATH A: Free Manual — Calendly + Google Meet + Google Sheets + manual notes
├── User has $0 budget AND moderate/difficult access
│   └── PATH B: Free + AI Assist — Cal.com + Zoom + Otter.ai Free + Google Sheets
├── User has $100-$500 budget
│   └── PATH C: Lean — Calendly + Zoom + Otter.ai Pro + Notion/Airtable + gift cards
└── User has $500-$2,000 budget
    └── PATH D: Full Stack — Calendly + Zoom + Dovetail + Respondent.io + gift cards
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Free Manual | Calendly, Meet, Sheets, manual notes | $0 | 6-8 weeks | Good — disciplined manual coding |
| B: Free + AI Assist | Cal.com, Zoom, Otter.ai, Sheets | $0 | 5-7 weeks | Good — AI transcription improves accuracy |
| C: Lean | Calendly, Zoom, Otter.ai Pro, Notion | $100-$500 | 4-6 weeks | High — incentives speed recruitment |
| D: Full Stack | Calendly, Zoom, Dovetail, Respondent.io | $500-$2K | 4-5 weeks | Excellent — pro recruitment + analysis |
Step 1: Formulate problem hypotheses
Duration: 2-4 hours · Tool: Google Sheets or Notion
Write 3-5 core problem hypotheses using Steve Blank’s format: “We believe [customer segment] experiences [specific problem] when [trigger/context], which costs them [time/money/frustration].” Rank by risk — which ones, if wrong, kill the business model? Convert to testable hypotheses with explicit pass/fail criteria. Identify the “riskiest assumption” — test it first. [src8]
Hypothesis template:
"We believe [customer segment] experiences [problem] at [frequency].
TRUE when [X] of [Y] interviewees describe this pain unprompted.
FALSE when fewer than [Z] of [Y] mention it at all."
Validation thresholds:
- Validated: 60%+ unprompted mention with emotional intensity
- Partially validated: 30-60% mention
- Invalidated: Under 30% or directly contradicted
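A minimal sketch of how these thresholds could be scored mechanically, assuming unprompted-mention counts and a 1-10 intensity score are logged per hypothesis; the 7/10 intensity cutoff mirrors the scorecard in Step 6, and all names are illustrative:

```python
def score_hypothesis(unprompted: int, total: int, avg_intensity: float) -> str:
    """Classify a hypothesis against the validation thresholds above."""
    rate = unprompted / total
    if rate >= 0.60 and avg_intensity >= 7:  # 60%+ unprompted with emotional intensity
        return "validated"
    if rate >= 0.30:                         # 30-60% mention
        return "partially validated"
    return "invalidated"                     # under 30%

# Example: 11 of 16 interviewees described the pain unprompted, mean intensity 7.8
print(score_hypothesis(11, 16, 7.8))  # -> validated
```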
Verify: 3-5 hypotheses documented with pass/fail thresholds; riskiest assumption identified · If failed: Spend 2-4 hours on competitor research and community lurking first
Step 2: Write the interview guide
Duration: 2-3 hours · Tool: Google Docs or Notion
Write 8-12 open-ended questions following the Mom Test’s three core rules: (1) talk about their life instead of your idea, (2) ask about specifics in the past instead of hypotheticals, (3) talk less and listen more. Eliminate three types of bad data: compliments, hypothetical fluff, and wishlists. [src2]
Interview structure (30-40 minutes):
WARM-UP (2-3 min):
"I'm researching how [role] teams handle [domain]. No pitch — just learning."
CONTEXT (5 min):
1. "Walk me through your current workflow for [domain]."
2. "What does a typical week look like regarding [area]?"
PROBLEM EXPLORATION (15-20 min):
3. "What's the hardest part about [domain]?"
4. "Tell me about the last time you dealt with [problem area]."
5. "What have you tried to solve it? How did that work out?"
6. "Why was that hard?" (dig for root cause)
7. "How much time/money do you spend on this today?"
CURRENT SOLUTIONS (5-10 min):
8. "What tools or workarounds do you use right now?"
9. "If you could wave a magic wand, what would change?"
10. "What would happen if this stayed unsolved for another year?"
WRAP-UP (3 min):
11. "Anything about [domain] that frustrates you I haven't asked about?"
12. "Who else should I talk to about this?" (referral chain)
NEVER ASK: "Would you use...?", "How much would you pay...?",
"Is this a good idea?", "What features would you want?"
Verify: Guide reviewed by teammate; no leading or hypothetical questions · If failed: Replace any “Would you...” with “Tell me about the last time you...”
Step 3: Recruit interview subjects
Duration: 5-10 days · Tool: LinkedIn, communities, email, Calendly
Identify 100-150 people matching the target profile. End every email with a yes/no question — it lifts reply rates from 4.8% to 12.8%. Over-recruit by 30-40% to cover no-shows. Schedule at most 3-4 interviews per day. Send reminders 24h and 1h before each call. Incentives ($25-$100 gift cards for B2B) improve response rates from ~10% to ~25%. [src5] [src1]
Cold outreach (B2B, under 150 words):
Subject: Quick question about [domain]
Hi [Name], I'm researching how [role] teams handle [problem area].
Would you have 20 minutes this week for a quick call?
No pitch — just trying to understand workflows.
Would [Tuesday/Thursday] work?
Response rates: Cold LinkedIn 10-15%, Cold email 10-20%,
Warm intro 40-60%, Community post 3-5%
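A back-of-envelope sizing sketch combining the over-recruiting rule with these response rates; the 35% buffer and the example reply rate are assumptions to tune per channel:

```python
import math

def outreach_needed(target_interviews: int, response_rate: float,
                    over_recruit: float = 0.35) -> int:
    """People to contact: over-recruit 30-40% for no-shows, then divide by reply rate."""
    scheduled = math.ceil(target_interviews * (1 + over_recruit))
    return math.ceil(scheduled / response_rate)

# 20 completed interviews via cold email at a 15% reply rate:
print(outreach_needed(20, 0.15))  # schedule 27 -> contact 180 people
```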
Verify: 20+ interviews scheduled with screened participants within 10 days · If failed: (1) broaden profile, (2) try Respondent.io at $50-$150/participant, (3) post in 3+ communities, (4) ask every interviewee for 2 referrals [src5]
Step 4: Conduct Mom Test interviews
Duration: 2-4 weeks · Tool: Zoom/Meet (recording), Otter.ai (transcription), Sheets (tracking)
Follow Mom Test methodology: listen 80%, talk 20%. Watch for three types of bad data: compliments, hypothetical fluff, wishlists. Redirect with: “When did it last happen? Walk me through the experience.” 5-minute debrief after each interview. After every 5 interviews, review for emerging themes. [src2] [src1]
After each interview, capture:
1. Top 3 insights (what surprised you?)
2. Hypothesis evidence: which gained/lost support?
3. New themes: anything you hadn't heard before?
4. Customer language: exact words for the problem
5. Behavioral signals: workarounds, spending, time
Red flags (interview contaminated):
- You talked >30% of the time
- You described your solution before minute 25
- Only compliments, no specifics
- You asked "Would you use/buy...?"
Verify: After every 5 interviews: new themes still emerging? Pain score converging? 3+ describe same problem unprompted? · If failed: If no pain is described, pivot the hypothesis — ask “what IS painful?” and let them lead [src6]
Step 5: Monitor thematic saturation
Duration: Ongoing from interview 10 · Tool: Google Sheets (saturation tracker)
Track new themes per interview using Guest, Namey & Chen’s methodology. Saturation reached when 3 consecutive interviews produce 0-1 new themes. Do not stop before 12 even if patterns seem clear. Do not continue past saturation — additional interviews add cost without insight. [src4]
Saturation benchmarks:
- First 6 interviews: ~78-79% of all themes
- First 10-12 interviews: ~92% of themes
- Code saturation (theme range): 9-12 interviews
- Meaning saturation (nuanced): 16-24 interviews
B2B: saturation at 15-20 interviews
B2C: may require 20-30 (population diversity)
Multiple segments: 12-15 per segment minimum
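The run-length rule is simple enough to script against the saturation tracker. A sketch assuming one count of new themes per completed interview:

```python
def saturation_reached(new_themes_per_interview: list[int],
                       min_interviews: int = 12, run_length: int = 3,
                       max_new: int = 1) -> bool:
    """True once `run_length` consecutive interviews each add <= `max_new` themes,
    and never before `min_interviews` are complete."""
    if len(new_themes_per_interview) < min_interviews:
        return False
    tail = new_themes_per_interview[-run_length:]
    return all(n <= max_new for n in tail)

log = [6, 4, 3, 3, 2, 2, 1, 2, 1, 0, 1, 0]  # new themes per interview
print(saturation_reached(log))  # -> True: 12 done, last 3 are 0/1/0
```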
Verify: Saturation log shows ≤1 new theme in last 3; minimum 12 completed · If failed: If new themes still appear at interview 20, sub-segment and continue [src3]
Step 6: Synthesize findings
Duration: 2-4 days · Tool: Sheets (theme coding), Miro/FigJam (affinity map), or Dovetail
Transform raw data into structured insights using thematic coding and affinity mapping. Weight unprompted mentions 3x over prompted responses. Identify the “hair on fire” problem: highest frequency + highest intensity + existing workaround spending. Build customer language bank — “The precise words customers use should be in your marketing material.” [src6] [src5]
Synthesis process:
A. Theme coding: tag observations, group into 5-8 parent themes
B. Affinity mapping: cluster quotes, name clusters, rank by
frequency x intensity x willingness to spend
C. Customer language bank: extract exact words/phrases/metaphors
D. Hypothesis scorecard:
- Validated: 60%+ unprompted, intensity 7+/10
- Partial: 30-60% mention OR intensity 5-7/10
- Invalidated: Under 30% or intensity below 5/10
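A sketch of the ranking in step B with the 3x unprompted weighting applied; the input shape and the $50/mo spend normalization (borrowed from the Step 7 decision matrix) are assumptions:

```python
def theme_score(mentions: list[dict]) -> float:
    """mentions: [{'unprompted': bool, 'intensity': 1-10, 'spend_usd': float}, ...]"""
    weighted_freq = sum(3 if m["unprompted"] else 1 for m in mentions)  # 3x weight
    avg_intensity = sum(m["intensity"] for m in mentions) / len(mentions)
    spend_signal = 1 + sum(m["spend_usd"] for m in mentions) / (50 * len(mentions))
    return weighted_freq * avg_intensity * spend_signal

themes = {"late payments": [{"unprompted": True, "intensity": 8, "spend_usd": 40},
                            {"unprompted": True, "intensity": 9, "spend_usd": 100}],
          "reporting":     [{"unprompted": False, "intensity": 5, "spend_usd": 0}]}
for name, score in sorted(((n, theme_score(m)) for n, m in themes.items()),
                          key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")  # late payments: 122.4, reporting: 5.0
```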
Verify: Theme frequency table, affinity map, language bank, and hypothesis scorecard complete · If failed: Bring in second analyst for independent coding — compare results
Step 7: Make the pivot-or-persevere decision
Duration: 1-2 days · Tool: Google Sheets (decision matrix)
Apply evidence from all prior steps. Do not cherry-pick positive signals. Set hard deadline: decision within 2 weeks of last interview. Discovery without a forcing function becomes academic research, not startup strategy. [src1]
| Signal | Stop | Pivot | Go |
|---|---|---|---|
| Pain score (avg) | <5/10 | 5-7/10 | 7+/10 |
| Pattern clarity | No pattern | Emerging | 10+ describe same problem |
| Unprompted mention rate | <30% | 30-60% | >60% |
| Workaround spending | $0 | $1-$50/mo | >$50/mo |
| Solution satisfaction | High | Moderate | Low |
| Referral willingness | 0-1 | 2-4 | 5+ |
| “Hair on fire” problem | No | Maybe | Clear |
| Segments identified | 0 | 1 | 2+ |
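The matrix translates directly into code. Thresholds below mirror four of the table's rows; the majority-vote aggregation is an assumption, since the recipe does not prescribe how to combine mixed signals:

```python
def decide(signals: dict) -> str:
    """Vote each signal into stop/pivot/go using the matrix thresholds above."""
    votes = {"stop": 0, "pivot": 0, "go": 0}
    votes["go" if signals["pain_score"] >= 7 else
          "pivot" if signals["pain_score"] >= 5 else "stop"] += 1
    votes["go" if signals["unprompted_rate"] > 0.60 else
          "pivot" if signals["unprompted_rate"] >= 0.30 else "stop"] += 1
    votes["go" if signals["workaround_spend"] > 50 else
          "pivot" if signals["workaround_spend"] > 0 else "stop"] += 1
    votes["go" if signals["referrals"] >= 5 else
          "pivot" if signals["referrals"] >= 2 else "stop"] += 1
    return max(votes, key=votes.get)

print(decide({"pain_score": 7.4, "unprompted_rate": 0.65,
              "workaround_spend": 80, "referrals": 4}))  # -> go (3 go, 1 pivot)
```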
Output files: discovery_synthesis.csv, hypothesis_scorecard.md, customer_language_bank.csv, decision_document.md
Verify: Decision documented with evidence; validated hypothesis and segment identified · If failed: If still ambiguous after 20+ interviews, extend with 5-8 more in a tighter segment; if still ambiguous after 30, treat as “stop” [src1]
{
"output_type": "customer_discovery_package",
"format": "document collection",
"columns": [
{"name": "decision", "type": "string", "description": "Go, Pivot, or Stop with reasoning"},
{"name": "core_hypothesis_validated", "type": "boolean", "description": "Riskiest assumption confirmed"},
{"name": "hair_on_fire_problem", "type": "string", "description": "Highest-frequency, highest-intensity pain"},
{"name": "target_segment", "type": "string", "description": "Best-fit customer segment profile"},
{"name": "unprompted_mention_rate", "type": "number", "description": "% describing core problem unprompted"},
{"name": "average_pain_intensity", "type": "number", "description": "Mean pain score 1-10"},
{"name": "interviews_completed", "type": "number", "description": "Total interviews conducted"},
{"name": "saturation_reached", "type": "boolean", "description": "Thematic saturation achieved"},
{"name": "themes_identified", "type": "number", "description": "Total unique themes coded"},
{"name": "customer_language_entries", "type": "number", "description": "Verbatim phrases captured"},
{"name": "pivot_type", "type": "string", "description": "If pivoting: segment/problem/solution/channel"}
],
"expected_row_count": "1 (single discovery decision)",
"deduplication_key": "core_hypothesis + target_segment"
}
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Interviews per segment | 12 | 18-20 | 25+ |
| Thematic saturation | Approaching (1-2 new) | Reached (0-1 in last 3) | Confirmed (0 in last 5) |
| Unprompted mention rate | >30% | >50% | >70% |
| Average pain intensity | >5/10 | >7/10 | >8/10 |
| Hypothesis scorecard | All scored | All + evidence | All + quotes + data |
| Customer language bank | 20+ phrases | 50+ phrases | 100+ phrases |
| Synthesis timeline | Within 2 weeks | Within 1 week | Within 3 days |
If below minimum: Re-run Steps 3-4 with a broader segment. If interviews number fewer than 12 per segment, recruit more before synthesizing — premature pattern-matching is the #1 failure mode. [src4]
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Cannot recruit 15 in 10 days | Segment too narrow or wrong channels | Broaden profile; try Respondent.io ($50-$150); ask interviewees for 2 referrals; post in 3+ communities |
| Polite but no real pain described | Problem hypothesis is wrong | Ask “what IS painful?” and let them lead; update hypothesis |
| Pain described but different problem | Hypothesis targeted wrong pain | Document actual pain as new hypothesis — valuable pivot signal |
| High pain but $0 spent on solutions | Pain not worth paying to solve | Test vitamin vs painkiller; pivot to higher-stakes version |
| New themes still at interview 20 | Segment too heterogeneous | Sub-segment into 2-3 profiles; each needs 12+ interviews |
| Interviewer talked >30% | Insufficient Mom Test training | Discard contaminated interview; practice with teammate |
| Recording/transcription failed | Tool configuration issue | Take manual backup notes; switch from Otter.ai to tl;dv or manual transcription |
| Team disagrees on synthesis | Different analytical frames | Independent coding: each person codes separately, compare, resolve with evidence |
| Component | Free ($0) | Lean ($500) | Standard ($2K) |
|---|---|---|---|
| Interview incentives | $0 (goodwill) | $200 (20 x $10) | $750 (25 x $30) |
| Transcription tools | $0 (Otter.ai free) | $0 (Otter.ai free) | $60 (Otter.ai Pro, 2 mo) |
| Recruiting platforms | $0 (DIY) | $0 (DIY) | $500 (Respondent.io) |
| Analysis tools | $0 (Google Sheets) | $0 (Google Sheets) | $100 (Dovetail, 2 mo) |
| Scheduling | $0 (Calendly free) | $0 (Calendly free) | $0 (Calendly free) |
| Video conferencing | $0 (Zoom/Meet free) | $0 (Zoom free) | $30 (Zoom Pro, 2 mo) |
| Contingency | $0 | $300 | $560 |
| Total | $0 | $500 | $2,000 |
Constraint: Zero-budget discovery is possible but slower — recruiting without incentives reduces response rates by 50-70% and skews toward people who enjoy talking. [src1]
Future-prediction questions generate false positives: 80% of people who said “I would buy” did not buy. The Mom Test exists because your mom will never tell you your idea is bad. [src2]
“Tell me about the last time you dealt with [problem].” Past behavior predicts future behavior. If they spend $0 on workarounds, they won’t buy your solution. [src2]
Confirmation bias in small samples. 12+ interviews for code saturation, 16-24 for meaning saturation. [src4]
Base-size-6, run-length-3 methodology. Saturation when 3 consecutive interviews produce 0-1 new themes. [src4]
Discovery without a forcing function becomes academic research, not startup strategy. [src1]
Decide within 2 weeks of last interview. Perfect information does not exist — the goal is sufficient evidence to reduce risk. [src1]
When founders talk, they pitch. Pitching triggers polite mode: compliments, fluff, and wishlists. [src2]
Ask a question and wait. Best insights come from follow-up probes (“Why?”, “Tell me more”), not prepared questions. [src7]
Use when a founder or agent needs to execute a complete customer discovery cycle — recruit subjects, run Mom Test interviews, code themes, monitor saturation, synthesize findings, and make an evidence-backed pivot-or-persevere decision. Not a document about discovery methodology, but the actual execution steps with tools, templates, and quality gates. Requires a problem hypothesis as input; produces a validated decision, customer language bank, and synthesis database as output.