This scorecard separates real product-market fit signals from noise by scoring a startup across three independently weighted dimensions — retention (do users come back?), engagement (is usage deepening?), and willingness-to-pay (do they pay without discounts?) — each broken into three sub-signals with calibrated benchmarks. Use it when a founder or investor needs to answer "Is PMF real here, or are we fooling ourselves with vanity metrics?" The output is a composite 0-100 score with per-dimension diagnostics that route directly to the correct next card. [src1, src2]
PMF is not binary and not permanent — it is a progressive set of tests, each more definitive than the last. [src2] This scorecard exposes false positives (top-line growth from paid acquisition, inflated NPS without retention, free users with no pricing validation) and false negatives (slow-burn B2B products with strong retention but low signup volume). [src3, src6]
What this measures: Whether users who arrive stick around long enough to matter. Retention is the single most trustworthy PMF signal because it is behavioral, cohort-isolated, and harder to fake than top-line metrics. [src2, src8]
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | Retention decays toward zero; no plateau by Month 6 | Cohort curve keeps falling, <5% retained at M6 |
| 2 | Emerging | Decay slows but no clear plateau yet | Slope flattening; 10-20% retained at M6 |
| 3 | Defined | Clear plateau emerging in best segments | Plateau visible at 20-30% for at least one segment |
| 4 | Managed | Flat plateau across primary segments | 30-40% plateau sustained 6+ months |
| 5 | Optimized | Smiling retention (curve turns up over time) | Plateau rises — expansion and re-engagement dominate churn [src8] |
Red flags: Aggregate retention looks healthy but cohort view shows each new cohort performs worse than the last — classic sign of new-user acquisition masking deteriorating fit. [src3]
Quick diagnostic question: "Plot retention by weekly or monthly cohort for the last 6 months — does the curve flatten for any segment?"
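The cohort diagnostic above can be sketched in code. A minimal sketch, assuming event data of the form (user_id, signup_month, active_month); the plateau tolerance is an illustrative choice, not a benchmark from this card:

```python
from collections import defaultdict

def cohort_retention(events):
    """events: iterable of (user_id, signup_month, active_month), months as ints.
    Returns {signup_month: [fraction retained at offset 0, 1, 2, ...]}."""
    cohorts = defaultdict(set)   # signup_month -> users in that cohort
    active = defaultdict(set)    # (signup_month, month offset) -> users active then
    for user, signup, month in events:
        cohorts[signup].add(user)
        active[(signup, month - signup)].add(user)
    curves = {}
    for signup, users in cohorts.items():
        max_offset = max(o for (s, o) in active if s == signup)
        curves[signup] = [len(active[(signup, o)]) / len(users)
                          for o in range(max_offset + 1)]
    return curves

def is_plateauing(curve, tol=0.02):
    """Flat if the last two month-over-month drops are each within tol."""
    if len(curve) < 4:
        return False
    drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
    return all(d <= tol for d in drops[-2:])
```

Run per segment, not on the aggregate: a plateau for one segment with decay everywhere else is a score of 3, per the rubric above.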
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | B2C >10%/mo, B2B SMB >5%/mo, B2B ENT >2%/mo | Catastrophic churn |
| 2 | Emerging | B2C 7-10%/mo, B2B SMB 3-5%/mo, B2B ENT 1-2%/mo | Problematic |
| 3 | Defined | B2C 5-7%/mo, B2B SMB 2-3%/mo, B2B ENT 0.5-1%/mo | Industry median |
| 4 | Managed | B2C 3-5%/mo, B2B SMB 1-2%/mo, B2B ENT <0.5%/mo | Strong |
| 5 | Optimized | B2C <3%/mo, B2B SMB <1%/mo, B2B ENT near-zero | Best-in-class [src6] |
Red flags: Revenue churn lower than logo churn = small customers leaving while big ones stay (common, acceptable). Revenue churn higher than logo churn = bigger customers leaving = severe PMF problem. [src6]
Quick diagnostic question: "What is your monthly logo churn and monthly revenue churn, measured as a cohort for the last 3 months?"
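A sketch of the logo-vs-revenue comparison, assuming a start-of-month revenue book plus the set of accounts lost during the month:

```python
def churn_rates(start_accounts, churned_ids):
    """start_accounts: {customer_id: revenue at month start}.
    churned_ids: set of customer_ids lost during the month.
    Returns (logo_churn, revenue_churn) as fractions."""
    lost_revenue = sum(start_accounts[c] for c in churned_ids)
    return (len(churned_ids) / len(start_accounts),
            lost_revenue / sum(start_accounts.values()))

# One of three logos lost (33%) but only 10% of revenue: small accounts
# leaving while the large one stays, the common, acceptable pattern above.
logo, revenue = churn_rates({"a": 100, "b": 100, "c": 800}, {"a"})
```

If the inequality flips (revenue churn above logo churn), the large accounts are the ones leaving, which is the severe case flagged above.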
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | No segment shows differentiated retention | All cohorts decay at similar poor rates |
| 2 | Emerging | One segment retains slightly better | Top segment 1.5x better than average |
| 3 | Defined | Clear "best segment" identified with materially better curve | Top segment 2-3x better |
| 4 | Managed | Best segment retains at plateau; others don't | Clear high-expectation customer profile documented [src1] |
| 5 | Optimized | Best segment retains + dominates revenue + is expanding | Top segment >50% of revenue, growing share |
Red flags: Scoring retention across all users instead of segmenting — hides the real fit that exists for a subset. Balfour: PMF rarely exists across the whole market; it lives in segments. [src2]
Quick diagnostic question: "Can you point to one customer segment (by persona, use case, or size) that retains at >2x the rate of everyone else?"
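A sketch of that 2x check; the segment labels and counts are placeholder assumptions:

```python
def best_segment_ratio(retained_by_segment):
    """retained_by_segment: {segment: (retained_users, cohort_size)}.
    Returns (best_segment, ratio of its retention rate to everyone else's)."""
    rates = {s: r / n for s, (r, n) in retained_by_segment.items()}
    best = max(rates, key=rates.get)
    rest_retained = sum(r for s, (r, _) in retained_by_segment.items() if s != best)
    rest_size = sum(n for s, (_, n) in retained_by_segment.items() if s != best)
    return best, rates[best] / (rest_retained / rest_size)

# "agencies" retain at 40% vs 10% for everyone else: 4x, well past the 2x bar.
best, ratio = best_segment_ratio(
    {"agencies": (40, 100), "smb": (10, 100), "consumer": (10, 100)})
```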
What this measures: Whether active users develop habit and depth of use, not just logins. Engagement depth separates shallow usage (a retention risk) from deep integration (PMF). [src6]
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | No defined "core action" or most users never perform it | <20% of signups perform core action |
| 2 | Emerging | Core action defined; minority perform it | 20-40% perform core action at least once |
| 3 | Defined | Majority perform core action in first session | 40-60% perform it; some repeat |
| 4 | Managed | Habitual use — core action performed weekly | 60-80% return within 7 days to repeat [src7] |
| 5 | Optimized | Core action performed multiple times per week by most actives | >80% weekly active; DAU/MAU >40% for consumer |
Red flags: Users log in but never perform the action that creates value — common in signup-optimized funnels. [src6]
Quick diagnostic question: "Define the single action that delivers the core value. What % of signups perform it? How often do retained users repeat it?"
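The two percentages in that question can be sketched as (signup and event shapes assumed):

```python
def core_action_stats(signups, core_action_counts):
    """signups: set of user_ids. core_action_counts: {user_id: times performed}.
    Returns (fraction of signups who ever perform the core action,
             fraction of performers who repeat it)."""
    performers = {u for u in signups if core_action_counts.get(u, 0) >= 1}
    repeaters = {u for u in performers if core_action_counts[u] >= 2}
    ever = len(performers) / len(signups)
    repeat = len(repeaters) / len(performers) if performers else 0.0
    return ever, repeat
```

Defining the core action itself is the hard part; the computation only matters once that definition is agreed.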
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | Users touch only 1 feature | Shallow; high churn risk |
| 2 | Emerging | Most users touch 2 features | Limited depth |
| 3 | Defined | Retained users touch 3+ features | Multi-feature adoption by actives |
| 4 | Managed | Retained users create persistent artifacts (projects, integrations, data) | Lock-in via artifacts [src6] |
| 5 | Optimized | Users integrate product into daily workflow; switching cost is high | Product embedded in user routine |
Red flags: Signup spike from launch/press without feature depth = news-cycle noise, not PMF. [src3]
Quick diagnostic question: "Of users who are still active after 30 days, how many unique features/actions have they used? Do they create persistent artifacts (projects, integrations, teams)?"
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | <10% of new users arrive organically | Growth entirely paid-dependent |
| 2 | Emerging | 10-20% organic | Some spontaneous interest |
| 3 | Defined | 20-35% organic | Meaningful word-of-mouth emerging |
| 4 | Managed | 35-50% organic | Strong referral loop; viral coefficient approaching 1 |
| 5 | Optimized | >50% organic; users recruit other users without being asked | Network effects or genuine love [src4] |
Red flags: Paid acquisition can manufacture any growth curve — if organic share is <10% and retention is mediocre, you are buying a graph, not PMF. [src3]
Quick diagnostic question: "What % of new signups this month came from unpaid channels (direct, organic search, referral, word-of-mouth)?"
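A sketch of the organic-share calculation; which channels count as paid is an assumption to adjust to your own attribution setup:

```python
# Assumed paid-channel taxonomy -- adjust to your attribution data.
PAID_CHANNELS = {"paid_search", "paid_social", "display", "affiliate"}

def organic_share(signups_by_channel):
    """signups_by_channel: {channel: new signups this month}.
    Returns the fraction arriving via unpaid channels."""
    total = sum(signups_by_channel.values())
    organic = sum(n for ch, n in signups_by_channel.items()
                  if ch not in PAID_CHANNELS)
    return organic / total
```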
What this measures: Whether users commit economically — paying full price without discount, sustaining payments without churn, and expanding spend. Stated intent ("I would pay for this") is noise; actual payment is signal. [src2]
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | Free-only or trial → paid conversion <1% | No willingness signal |
| 2 | Emerging | Trial → paid 1-3% (B2C) or 5-10% (B2B freemium) | Weak signal |
| 3 | Defined | Trial → paid 3-5% (B2C) or 10-20% (B2B freemium) | Industry-typical |
| 4 | Managed | Trial → paid 5-10% (B2C) or 20-35% (B2B freemium) | Strong |
| 5 | Optimized | Trial → paid >10% (B2C) or >35% (B2B freemium) | Best-in-class [src6] |
Red flags: Users who convert only via discounts, coupons, or "free forever" offers — that is a price signal, not a fit signal. Strip discounts and re-measure.
Quick diagnostic question: "What % of users who start a free trial or free plan convert to a paid plan at full list price (no discount)?"
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | >50% of revenue from discounted deals | Sales is buying revenue; value unproven |
| 2 | Emerging | 30-50% discounted | Heavy discounting to close |
| 3 | Defined | 15-30% discounted | Normal negotiation |
| 4 | Managed | <15% discounted; full-price close rate growing | Customers accept value |
| 5 | Optimized | No discount needed; attempted price increases do not raise churn | Pricing power |
Red flags: Sales team resorts to discount/extended terms to close; churn spikes when discounts expire. [src6]
Quick diagnostic question: "What % of closed deals in the last 90 days involved a discount greater than 10% off list?"
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | NRR <80% | Business shrinking on existing base |
| 2 | Emerging | NRR 80-95% | Leakage; no expansion |
| 3 | Defined | NRR 95-105% | Break-even on expansion vs churn |
| 4 | Managed | NRR 105-120% | Healthy expansion; existing base drives growth |
| 5 | Optimized | NRR >120% | Best-in-class; customers pay more over time [src6] |
Red flags: NRR calculated with new-logo revenue included — that is total revenue growth, not retention. NRR must be same-cohort only.
Quick diagnostic question: "For the cohort of customers from 12 months ago, what is their revenue today as a percentage of their revenue 12 months ago (excluding any new customers acquired since)?"
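The same-cohort rule translates directly to code. A sketch, with the revenue snapshots assumed to be simple per-customer dicts:

```python
def net_revenue_retention(revenue_then, revenue_now):
    """revenue_then: {customer_id: revenue 12 months ago}.
    revenue_now: {customer_id: revenue today} -- may contain new logos.
    Churned customers simply have no entry in revenue_now."""
    cohort = set(revenue_then)  # only customers who existed 12 months ago
    current = sum(revenue_now.get(c, 0) for c in cohort)
    return current / sum(revenue_then.values())

# "a" expanded, "b" contracted, "c" churned; new logo "d" is excluded.
# 230/300: NRR under 80%, the bottom band of the rubric above.
nrr = net_revenue_retention({"a": 100, "b": 100, "c": 100},
                            {"a": 150, "b": 80, "d": 500})
```

Including "d" would report 730/300 and turn a shrinking base into an apparent best-in-class NRR, which is exactly the red flag above.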
Each sub-signal scores 1-5. Compute dimension scores as simple (unweighted) averages of their 3 sub-signals, then apply dimension weights:
Retention Dimension = avg(1A retention curve, 1B churn, 1C segment retention), weight 0.45
Engagement Dimension = avg(2A core action, 2B depth of use, 2C organic share), weight 0.30
Willingness-to-Pay Dimension = avg(3A trial conversion, 3B discount dependence, 3C NRR), weight 0.25
Composite Score (1-5) = 0.45*Retention + 0.30*Engagement + 0.25*WTP
Composite (0-100 scale) = (Composite - 1) / 4 * 100
Retention is weighted highest because it is the hardest signal to fake and the most predictive of long-term outcomes. [src2, src8] Willingness-to-pay is weighted lower only because it may not apply to pre-monetization products — when applicable, treat low WTP scores as a hard gate regardless of composite.
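The formula can be sketched directly; the example sub-scores are illustrative, and the WTP hard gate from the note above is left to the caller:

```python
WEIGHTS = {"retention": 0.45, "engagement": 0.30, "wtp": 0.25}

def composite_score(subscores):
    """subscores: {"retention": [1A, 1B, 1C], "engagement": [...], "wtp": [...]},
    each sub-signal scored 1-5. Returns the composite on the 0-100 scale."""
    dims = {d: sum(v) / len(v) for d, v in subscores.items()}
    composite = sum(WEIGHTS[d] * dims[d] for d in WEIGHTS)  # still on 1-5 scale
    return (composite - 1) / 4 * 100

score = composite_score({"retention": [4, 4, 3],
                         "engagement": [3, 3, 2],
                         "wtp": [3, 2, 3]})  # about 52.9: the 40-59 band
```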
| Overall Score (0-100) | Maturity Level | Interpretation | Recommended Next Step |
|---|---|---|---|
| 0 - 19 | Critical — signals are noise | Claimed PMF is not supported by data. Vanity metrics dominate. Do not scale. | Return to customer discovery; fetch MVP testing framework |
| 20 - 39 | Developing — weak signals | Early positive signals but heavy noise. Retention not yet proven. High risk of false positive. | Run full PMF engine — fetch PMF measurement |
| 40 - 59 | Competent — partial PMF | PMF exists in a segment but not broadly. Identify and double down on the high-expectation customer. | Formalize segment focus — fetch PMF measurement |
| 60 - 79 | Advanced — clear PMF | Multiple independent signals confirm PMF. Can begin scaling in strongest segment. | Full scaling gate check — fetch scaling readiness |
| 80 - 100 | Best-in-class — undeniable PMF | Andreessen's "you'll feel it" territory: retention plateau + deep engagement + unforced payment. | Build scaling engine — fetch growth model design [src4] |
| Weak Dimension (Score < 3) | Fetch This Card |
|---|---|
| Retention | Product-Market Fit Measurement — focus on cohort analysis and segment isolation |
| Engagement | Product-Market Fit Measurement — apply Superhuman PMF engine to high-expectation customer [src1] |
| Willingness-to-Pay | Value-Based Pricing SaaS — re-validate pricing with real behavior, not surveys |
Scores mean different things at different stages and for different business models. Applying one threshold across segments produces misleading diagnoses. [src7]
| Segment | Expected Average Score (0-100) | "Good" Threshold | "Alarm" Threshold |
|---|---|---|---|
| Pre-seed (MVP, <6 months) | 15-30 | >35 | <15 — too early to measure meaningfully |
| Seed (early revenue, 6-18 months) | 30-50 | >50 | <25 |
| Post-seed (growing revenue, 18-30 months) | 45-65 | >60 | <35 |
| Series A+ (scaling, 30+ months) | 55-75 | >70 | <50 — PMF should be solid by this stage [src7] |
| Consumer (B2C subscription) | 35-55 | >55 | <25 — B2C churn floors are higher |
| B2B SaaS SMB | 45-65 | >60 | <35 |
| B2B SaaS Enterprise | 55-75 | >65 | <45 — enterprise retention floors are much higher |
| Marketplace | 30-50 (two-sided is harder) | >55 | <25 — need to score both sides |
Use this scorecard when: (1) a founder or investor disagrees about whether PMF exists; (2) before any decision to increase burn rate, hire a sales team, or raise a Series A; (3) after a launch moment (press, Product Hunt) to distinguish sustained signal from news-cycle noise; (4) quarterly as part of board reporting to track PMF trajectory, not just magnitude. [src4, src7]