This recipe produces a validated problem hypothesis backed by 20-40 structured customer discovery interviews, a synthesis report with pattern analysis, and a go/no-go scorecard based on a clear threshold: at least 8 out of 10 interviewees must describe the target problem unprompted. The output replaces founder intuition with evidence-based problem validation that feeds directly into solution design and MVP scoping. [src1]
Requires an ICP definition (from business/customer-research) with firmographic and behavioral criteria specific enough to screen recruits.

Which path?
├── User wants free tools AND simple workflow
│ └── PATH A: Free Lean — Google Calendar + Google Meet + Google Sheets
├── User wants free tools AND better synthesis
│ └── PATH B: Free Pro — Cal.com + Zoom (free) + Notion
├── User wants paid tools AND automated workflow
│ └── PATH C: Paid Standard — Calendly + Zoom Pro + Dovetail
└── User wants maximum efficiency AND scale beyond 40 interviews
└── PATH D: Paid Scale — Calendly + Zoom + Dovetail + User Interviews
| Path | Tools | Cost | Speed (30 interviews) | Synthesis Quality |
|---|---|---|---|---|
| A: Free Lean | Google Calendar + Meet + Sheets | $0 | 4-6 weeks | Manual — adequate |
| B: Free Pro | Cal.com + Zoom free + Notion | $0 | 4-5 weeks | Structured — good |
| C: Paid Standard | Calendly + Zoom Pro + Dovetail | $45-75/mo | 3-4 weeks | AI-assisted — excellent |
| D: Paid Scale | Calendly + Zoom + Dovetail + User Interviews | $150-300/mo | 2-3 weeks | Automated — excellent |
Duration: 5-10 days · Tool: Scheduling tool + outreach channels
Recruit 35-50 candidates to yield 25-40 completed interviews (expect 30-40% no-show/decline rate). Recruit only from your ICP. [src1]
| Channel | Signal Quality | Response Rate |
|---|---|---|
| 1. Warm intro via mutual contact | Highest | 40-60% |
| 2. LinkedIn personalized outreach | High | 15-25% |
| 3. Industry community/Slack/Reddit | High | 10-20% |
| 4. Cold email to ICP list | Medium | 5-15% |
| 5. Social media post (Twitter/X) | Medium | 3-10% |
| 6. Paid recruiting panel | Variable | 80-95% |
Verify: 35+ candidates scheduled within 10 days · If failed: Add a second recruitment channel or offer $20-50 gift card incentive
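The funnel sizing above (35-50 recruits to yield 25-40 completed interviews at a 30-40% drop-off) can be sketched directly. The function name and interface are illustrative, not part of the recipe:

```python
# Funnel sizing sketch: candidates to recruit for a target number of
# completed interviews, given an expected no-show/decline rate.
import math

def recruits_needed(target_completed: int, dropout_rate: float) -> int:
    """Candidates to schedule so that, after dropouts, the target is met."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(target_completed / (1 - dropout_rate))

# At the 30-40% drop-off range the step above assumes:
print(recruits_needed(30, 0.30))  # 43
print(recruits_needed(30, 0.40))  # 50
```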
Duration: 1-2 hours · Tool: Document editor
Build a semi-structured script following Mom Test principles: ask about their life, not your idea. Focus on past behavior, not future intentions. [src1]
OPENING (2 min):
"Thanks for making time. I'm researching [problem domain] to understand
the real challenges. No right or wrong answers. Mind if I record?"
CONTEXT (3 min):
1. "Tell me about your role. What does a typical week look like?"
2. "Where does [problem domain] fit into your priorities?"
PROBLEM EXPLORATION (15 min):
3. "Walk me through the last time you dealt with [broad topic area]."
4. "What's the hardest part about [the process they described]?"
5. "How are you solving that today?"
6. "What have you tried that didn't work?"
7. "If you could wave a magic wand, what would change?"
COMMITMENT TESTING (3 min):
8. "Who else on your team deals with this?"
9. "Would you be open to a follow-up conversation?"
CLOSE (2 min):
"Is there anything I should have asked but didn't?"
Verify: Script reviewed by at least one other person. One pilot interview completed. · If failed: Cut questions if pilot runs over 30 minutes
Duration: 2-4 weeks · Tool: Video conferencing + transcription · Rate limit: Max 3 interviews per day
Run interviews following the script. Adapt based on emerging themes. Log every interview immediately after completion.
Per-interview checklist:
□ Recording consent given
□ Recording started (video + audio)
□ Script available but NOT read verbatim
□ Timer visible — stay within 25-30 minutes
□ Exact quotes captured (not paraphrases)
□ Commitment ask completed at end
□ 5-minute debrief immediately after
Verify: At least 80% of scheduled interviews completed. Tracking log fully populated. · If failed: If no-show rate exceeds 40%, send 24h and 1h reminders
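A minimal sketch of the "log every interview immediately" rule: append one row to the tracking log right after the 5-minute debrief. The column names here are my own illustration, not a prescribed schema:

```python
# Per-interview logging sketch. Adjust FIELDS to your own tracking log;
# these column names are assumptions, not mandated by the recipe.
import csv
import datetime
import pathlib

LOG = pathlib.Path("interview-tracking-log.csv")
FIELDS = ["date", "participant", "unprompted_mention", "severity_1to5",
          "current_solution", "commitment", "key_quote"]

def log_interview(row: dict) -> None:
    """Append one interview record, writing the header on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

log_interview({
    "date": datetime.date.today().isoformat(),
    "participant": "P-014",          # hypothetical participant ID
    "unprompted_mention": True,
    "severity_1to5": 4,
    "current_solution": "spreadsheet + email reminders",
    "commitment": "intro to teammate",
    "key_quote": "I lose half a day every week to this.",
})
```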
Duration: 4-8 hours (batch after every 8-10 interviews) · Tool: Dovetail, Notion, or Google Sheets
Tag and code transcripts, cluster into themes, build evidence table. Synthesize in batches of 8-10, not all at the end. [src2] [src7]
Per-batch synthesis:
1. Tag: problem statements, severity, frequency, current solutions, outcomes
2. Cluster: group by theme, name each cluster
3. Count: frequency per theme, average severity, prompted vs unprompted
4. Build evidence table: Problem Theme | Frequency | Severity | Quotes
Verify: Every transcript tagged. Theme clusters are mutually exclusive. · If failed: If themes overlap, split into sub-themes and re-tag
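The per-batch count step (frequency per theme, average severity, prompted vs unprompted) can be sketched as a small aggregation. The highlight fields (`theme`, `severity`, `unprompted`) are assumed names, not a fixed schema:

```python
# Batch-synthesis sketch: build the evidence table from tagged highlights.
from collections import defaultdict

tagged = [  # one dict per tagged highlight; sample data for illustration
    {"theme": "manual reporting", "severity": 4, "unprompted": True},
    {"theme": "manual reporting", "severity": 5, "unprompted": True},
    {"theme": "tool sprawl",      "severity": 3, "unprompted": False},
]

def evidence_table(highlights):
    """Group highlights by theme and compute the per-theme metrics."""
    themes = defaultdict(list)
    for h in highlights:
        themes[h["theme"]].append(h)
    rows = [{
        "theme": theme,
        "frequency": len(items),
        "avg_severity": round(sum(i["severity"] for i in items) / len(items), 1),
        "unprompted": sum(i["unprompted"] for i in items),
    } for theme, items in themes.items()]
    # Most frequent themes first, ties broken by severity
    return sorted(rows, key=lambda r: (-r["frequency"], -r["avg_severity"]))

for row in evidence_table(tagged):
    print(row)
```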
Duration: 1-2 hours · Tool: Spreadsheet or synthesis tool
Apply the validation scorecard with four metrics: unprompted mention rate (≥ 80%), severity (≥ 3.0/5), active solution seeking (≥ 30%), and commitment signal (≥ 25%).
DECISION MATRIX (evaluate top to bottom; first matching rule applies):
All 4 metrics at validated/strong level → GO
Metric 1 validated + 2 of 3 others strong → CONDITIONAL GO
Metric 1 not validated → NO GO (pivot or reframe)
Any single metric below minimum → INVESTIGATE further
Verify: Scorecard uses tallied data from tracking log, not impressions · If failed: If metric 1 is 60-79%, conduct 10 more interviews with reframed problem language
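The decision matrix can be expressed as a rule list evaluated in order. Mapping "validated" to the four thresholds above (80% / 3.0 / 30% / 25%) and collapsing the "strong" tier into it is an assumption of this sketch, not a canonical spec:

```python
# Go/no-go sketch for the validation scorecard. Metric names mirror the
# JSON output spec; thresholds are the validated levels from this step.
VALIDATED = {
    "unprompted_mention_rate": 0.80,
    "avg_severity": 3.0,
    "active_solution_rate": 0.30,
    "commitment_rate": 0.25,
}

def decide(scores: dict) -> str:
    """Apply the matrix rules top to bottom; first match wins."""
    ok = {m: scores[m] >= t for m, t in VALIDATED.items()}
    if all(ok.values()):
        return "GO"
    if not ok["unprompted_mention_rate"]:
        return "NO GO"                       # metric 1 failed: pivot or reframe
    others = [m for m in ok if m != "unprompted_mention_rate"]
    if sum(ok[m] for m in others) >= 2:
        return "CONDITIONAL GO"
    return "INVESTIGATE"

print(decide({"unprompted_mention_rate": 0.85, "avg_severity": 3.6,
              "active_solution_rate": 0.50, "commitment_rate": 0.30}))  # GO
```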
Duration: 2-4 hours
Generate the synthesis report, problem validation scorecard, raw data archive, and design partner list for downstream consumption.
Deliverables:
1. interview-synthesis-report.md — evidence-backed findings
2. problem-validation-scorecard.json — machine-readable scorecard
3. interview-tracking-log.csv — all interviews with coded data
4. design-partners.csv — qualified follow-up candidates
{
"output_type": "problem_validation_scorecard",
"format": "JSON",
"columns": [
{"name": "problem_hypothesis", "type": "string", "required": true},
{"name": "total_interviews", "type": "number", "required": true},
{"name": "unprompted_mention_rate", "type": "number", "required": true},
{"name": "avg_severity", "type": "number", "required": true},
{"name": "active_solution_rate", "type": "number", "required": true},
{"name": "commitment_rate", "type": "number", "required": true},
{"name": "decision", "type": "string", "required": true},
{"name": "top_problems", "type": "array", "required": true},
{"name": "design_partners", "type": "number", "required": false}
],
"expected_row_count": "1",
"deduplication_key": "problem_hypothesis"
}
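A sketch of emitting `problem-validation-scorecard.json` and checking it against the required fields of the spec above. All values shown are invented placeholders:

```python
# Scorecard emission sketch: required fields taken from the output spec.
import json

REQUIRED = ["problem_hypothesis", "total_interviews", "unprompted_mention_rate",
            "avg_severity", "active_solution_rate", "commitment_rate",
            "decision", "top_problems"]

scorecard = {  # placeholder values, for illustration only
    "problem_hypothesis": "Ops leads lose hours to manual reporting",
    "total_interviews": 30,
    "unprompted_mention_rate": 0.83,
    "avg_severity": 3.8,
    "active_solution_rate": 0.53,
    "commitment_rate": 0.30,
    "decision": "GO",
    "top_problems": ["manual reporting", "tool sprawl"],
    "design_partners": 7,  # optional field
}

missing = [f for f in REQUIRED if f not in scorecard]
assert not missing, f"scorecard missing required fields: {missing}"

with open("problem-validation-scorecard.json", "w") as f:
    json.dump(scorecard, f, indent=2)
```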
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Interviews completed | 20 | 30 | 40+ |
| Unprompted problem mention rate | ≥ 60% | ≥ 80% | ≥ 90% |
| Average severity score | ≥ 3.0/5.0 | ≥ 3.5/5.0 | ≥ 4.0/5.0 |
| Active solution seeking rate | ≥ 30% | ≥ 50% | ≥ 70% |
| Transcript tagging completeness | 80% | 95% | 100% |
| Synthesis lag | < 72 hours | < 48 hours | < 24 hours |
If below minimum: If fewer than 20 interviews completed, extend recruitment. If unprompted rate is below 60% after 20 interviews, reframe problem hypothesis and conduct 10 more.
| Error | Likely Cause | Recovery Action |
|---|---|---|
| High no-show rate (> 40%) | Weak outreach or wrong channel | Rewrite outreach emphasizing value exchange, add calendar reminders at 24h and 1h |
| Only compliments, no real problems | Questions too leading or solution mentioned | Reset script to pure Mom Test questions, remove solution hints [src1] |
| All interviewees describe different problems | ICP too broad or hypothesis too vague | Narrow ICP segment, pick one job-to-be-done, re-recruit [src4] |
| Transcripts missing or corrupt | Recording failed or consent not given | Check recording at minute 1. Use backup note-taker. Re-interview if critical |
| Synthesis themes overlapping | Tag taxonomy too broad or inconsistent | Rebuild taxonomy with mutual exclusivity, re-tag from highlights |
| Interviewer talks > 30% | Script read verbatim or pitching | Practice active listening, review own talk ratio after each interview [src1] |
| Component | Free Tier | Paid Tier | At Scale |
|---|---|---|---|
| Scheduling (Calendly/Cal.com) | $0 (1 event type) | $10/mo | $16/mo |
| Video conferencing (Zoom/Meet) | $0 (40-min limit) | $13/mo | $20/mo |
| Transcription (Otter.ai/built-in) | $0 (300 min/mo) | $10/mo | $24/mo |
| Synthesis (Sheets/Dovetail) | $0 (Google Sheets) | $29/mo | $49/mo |
| Participant incentives | $0 (value exchange) | $20-50/interview | $50-100/interview |
| Total for 30 interviews | $0 | $62-112/mo + incentives | $109-159/mo + incentives |
Hypothetical questions produce hypothetical answers. People are terrible at predicting their own future behavior, yielding compliments and false positives that lead to products nobody uses. [src1]
Ask "Tell me about the last time you dealt with [problem domain]" and listen for the problem to emerge naturally. Past behavior is the best predictor of future behavior.
Early pattern recognition is often confirmation bias. The first 10 interviews tend to confirm your hypothesis because you unconsciously seek confirmation. [src6]
The saturation rule: at least 20 interviews, continue until 3 in a row surface zero new problem themes. This typically happens between interview 25 and 35.
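The saturation rule can be checked mechanically. Representing each interview as a set of problem themes (in interview order) is an assumed encoding, not prescribed by the recipe:

```python
# Saturation-rule sketch: stop once >= 20 interviews are done AND the last
# 3 interviews in a row surfaced zero new problem themes.
def reached_saturation(theme_sets, min_interviews=20, streak_needed=3):
    """theme_sets: one set of problem themes per interview, in order."""
    seen, streak = set(), 0
    for i, themes in enumerate(theme_sets, start=1):
        streak = streak + 1 if themes <= seen else 0  # no new theme extends the streak
        seen |= themes
        if i >= min_interviews and streak >= streak_needed:
            return True
    return False
```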
Batch synthesis at the end produces shallow analysis and loses contextual memory of tone, hesitation, and emphasis. [src2]
Process each batch while conversations are fresh. Update theme clusters and tracking log. Adjust questions for the next batch based on emerging patterns.
Use this recipe when a founder or product team needs to validate that a real, severe, frequent problem exists before investing in building a solution. It produces structured evidence that replaces gut feeling with data. The output feeds directly into solution interview design, MVP scoping, and investor pitch evidence. Requires an ICP definition as input.