This scorecard separates real product-market fit signals from noise by scoring a startup across three independently weighted dimensions — retention (do users come back?), engagement (is usage deepening?), and willingness-to-pay (do they pay without discounts?) — each broken into three sub-signals with calibrated benchmarks. Use it when a founder or investor needs to answer "Is PMF real here, or are we fooling ourselves with vanity metrics?" The output is a composite 0-100 score with per-dimension diagnostics that route directly to the correct next card. [src1, src2]
PMF is not binary and not permanent — it is a progressive set of tests, each more definitive than the last. [src2] This scorecard exposes false positives (top-line growth from paid acquisition, inflated NPS without retention, free users with no pricing validation) and false negatives (slow-burn B2B products with strong retention but low signup volume). [src3, src6]
What this measures: Whether users who arrive stick around long enough to matter. Retention is the single most trustworthy PMF signal because it is behavioral, cohort-isolated, and harder to fake than top-line metrics. [src2, src8]
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | Retention decays toward zero; no plateau by Month 6 | Cohort curve keeps falling, <5% retained at M6 |
| 2 | Emerging | Decay slows but no clear plateau yet | Slope flattening; 10-20% retained at M6 |
| 3 | Defined | Clear plateau emerging in best segments | Plateau visible at 20-30% for at least one segment |
| 4 | Managed | Flat plateau across primary segments | 30-40% plateau sustained 6+ months |
| 5 | Optimized | Smiling retention (curve turns up over time) | Plateau rises — expansion and re-engagement dominate churn [src8] |
Red flags: Aggregate retention looks healthy but cohort view shows each new cohort performs worse than the last — classic sign of new-user acquisition masking deteriorating fit. [src3]
Quick diagnostic question: "Plot retention by weekly or monthly cohort for the last 6 months — does the curve flatten for any segment?"
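The cohort diagnostic above can be sketched in code. A minimal sketch, assuming event data of the form (user_id, signup_month, active_month); the plateau tolerance is an illustrative choice, not a benchmark from this card:

```python
from collections import defaultdict

def cohort_retention(events):
    """events: iterable of (user_id, signup_month, active_month), months as ints.
    Returns {signup_month: [fraction retained at offset 0, 1, 2, ...]}."""
    cohorts = defaultdict(set)   # signup_month -> users in that cohort
    active = defaultdict(set)    # (signup_month, month offset) -> users active then
    for user, signup, month in events:
        cohorts[signup].add(user)
        active[(signup, month - signup)].add(user)
    curves = {}
    for signup, users in cohorts.items():
        max_offset = max(o for (s, o) in active if s == signup)
        curves[signup] = [len(active[(signup, o)]) / len(users)
                          for o in range(max_offset + 1)]
    return curves

def is_plateauing(curve, tol=0.02):
    """Flat if the last two month-over-month drops are each within tol."""
    if len(curve) < 4:
        return False
    drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
    return all(d <= tol for d in drops[-2:])
```

Run per segment, not on the aggregate: a plateau for one segment with decay everywhere else is a score of 3, per the rubric above.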
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | B2C >10%/mo, B2B SMB >5%/mo, B2B ENT >2%/mo | Catastrophic churn |
| 2 | Emerging | B2C 7-10%/mo, B2B SMB 3-5%/mo, B2B ENT 1-2%/mo | Problematic |
| 3 | Defined | B2C 5-7%/mo, B2B SMB 2-3%/mo, B2B ENT 0.5-1%/mo | Industry median |
| 4 | Managed | B2C 3-5%/mo, B2B SMB 1-2%/mo, B2B ENT <0.5%/mo | Strong |
| 5 | Optimized | B2C <3%/mo, B2B SMB <1%/mo, B2B ENT near-zero | Best-in-class [src6] |
Red flags: Revenue churn lower than logo churn = small customers leaving while big ones stay (common, acceptable). Revenue churn higher than logo churn = bigger customers leaving = severe PMF problem. [src6]
Quick diagnostic question: "What is your monthly logo churn and monthly revenue churn, measured as a cohort for the last 3 months?"
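A sketch of the logo-vs-revenue comparison, assuming a start-of-month revenue book plus the set of accounts lost during the month:

```python
def churn_rates(start_accounts, churned_ids):
    """start_accounts: {customer_id: revenue at month start}.
    churned_ids: set of customer_ids lost during the month.
    Returns (logo_churn, revenue_churn) as fractions."""
    lost_revenue = sum(start_accounts[c] for c in churned_ids)
    return (len(churned_ids) / len(start_accounts),
            lost_revenue / sum(start_accounts.values()))

# One of three logos lost (33%) but only 10% of revenue: small accounts
# leaving while the large one stays, the common, acceptable pattern above.
logo, revenue = churn_rates({"a": 100, "b": 100, "c": 800}, {"a"})
```

If the inequality flips (revenue churn above logo churn), the large accounts are the ones leaving, which is the severe case flagged above.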
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | No segment shows differentiated retention | All cohorts decay at similar poor rates |
| 2 | Emerging | One segment retains slightly better | Top segment 1.5x better than average |
| 3 | Defined | Clear "best segment" identified with materially better curve | Top segment 2-3x better |
| 4 | Managed | Best segment retains at plateau; others don't | Clear high-expectation customer profile documented [src1] |
| 5 | Optimized | Best segment retains + dominates revenue + is expanding | Top segment >50% of revenue, growing share |
Red flags: Scoring retention across all users instead of segmenting — hides the real fit that exists for a subset. Balfour: PMF rarely exists across the whole market; it lives in segments. [src2]
Quick diagnostic question: "Can you point to one customer segment (by persona, use case, or size) that retains at >2x the rate of everyone else?"
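A sketch of that 2x check; the segment labels and counts are placeholder assumptions:

```python
def best_segment_ratio(retained_by_segment):
    """retained_by_segment: {segment: (retained_users, cohort_size)}.
    Returns (best_segment, ratio of its retention rate to everyone else's)."""
    rates = {s: r / n for s, (r, n) in retained_by_segment.items()}
    best = max(rates, key=rates.get)
    rest_retained = sum(r for s, (r, _) in retained_by_segment.items() if s != best)
    rest_size = sum(n for s, (_, n) in retained_by_segment.items() if s != best)
    return best, rates[best] / (rest_retained / rest_size)

# "agencies" retain at 40% vs 10% for everyone else: 4x, well past the 2x bar.
best, ratio = best_segment_ratio(
    {"agencies": (40, 100), "smb": (10, 100), "consumer": (10, 100)})
```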
What this measures: Whether active users develop habit and depth of use, not just logins. Engagement depth separates shallow usage (a retention risk) from deep integration (PMF). [src6]
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | No defined "core action" or most users never perform it | <20% of signups perform core action |
| 2 | Emerging | Core action defined; minority perform it | 20-40% perform core action at least once |
| 3 | Defined | Majority perform core action in first session | 40-60% perform it; some repeat |
| 4 | Managed | Habitual use — core action performed weekly | 60-80% return within 7 days to repeat [src7] |
| 5 | Optimized | Core action performed multiple times per week by most actives | >80% weekly active; DAU/MAU >40% for consumer |
Red flags: Users log in but never perform the action that creates value — common in signup-optimized funnels. [src6]
Quick diagnostic question: "Define the single action that delivers the core value. What % of signups perform it? How often do retained users repeat it?"
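The two percentages in that question can be sketched as (signup and event shapes assumed):

```python
def core_action_stats(signups, core_action_counts):
    """signups: set of user_ids. core_action_counts: {user_id: times performed}.
    Returns (fraction of signups who ever perform the core action,
             fraction of performers who repeat it)."""
    performers = {u for u in signups if core_action_counts.get(u, 0) >= 1}
    repeaters = {u for u in performers if core_action_counts[u] >= 2}
    ever = len(performers) / len(signups)
    repeat = len(repeaters) / len(performers) if performers else 0.0
    return ever, repeat
```

Defining the core action itself is the hard part; the computation only matters once that definition is agreed.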
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | Users touch only 1 feature | Shallow; high churn risk |
| 2 | Emerging | Most users touch 2 features | Limited depth |
| 3 | Defined | Retained users touch 3+ features | Multi-feature adoption by actives |
| 4 | Managed | Retained users create persistent artifacts (projects, integrations, data) | Lock-in via artifacts [src6] |
| 5 | Optimized | Users integrate product into daily workflow; switching cost is high | Product embedded in user routine |
Red flags: Signup spike from launch/press without feature depth = news-cycle noise, not PMF. [src3]
Quick diagnostic question: "Of users who are still active after 30 days, how many unique features/actions have they used? Do they create persistent artifacts (projects, integrations, teams)?"
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | <10% of new users arrive organically | Growth entirely paid-dependent |
| 2 | Emerging | 10-20% organic | Some spontaneous interest |
| 3 | Defined | 20-35% organic | Meaningful word-of-mouth emerging |
| 4 | Managed | 35-50% organic | Strong referral loop; viral coefficient approaching 1 |
| 5 | Optimized | >50% organic; users recruit other users without being asked | Network effects or genuine love [src4] |
Red flags: Paid acquisition can manufacture any growth curve — if organic share is <10% and retention is mediocre, you are buying a graph, not PMF. [src3]
Quick diagnostic question: "What % of new signups this month came from unpaid channels (direct, organic search, referral, word-of-mouth)?"
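A sketch of the organic-share calculation; which channels count as paid is an assumption to adjust to your own attribution setup:

```python
# Assumed paid-channel taxonomy -- adjust to your attribution data.
PAID_CHANNELS = {"paid_search", "paid_social", "display", "affiliate"}

def organic_share(signups_by_channel):
    """signups_by_channel: {channel: new signups this month}.
    Returns the fraction arriving via unpaid channels."""
    total = sum(signups_by_channel.values())
    organic = sum(n for ch, n in signups_by_channel.items()
                  if ch not in PAID_CHANNELS)
    return organic / total
```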
What this measures: Whether users commit economically — paying full price without discount, sustaining payments without churn, and expanding spend. Stated intent ("I would pay for this") is noise; actual payment is signal. [src2]
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | Free-only or trial → paid conversion <1% | No willingness signal |
| 2 | Emerging | Trial → paid 1-3% (B2C) or 5-10% (B2B freemium) | Weak signal |
| 3 | Defined | Trial → paid 3-5% (B2C) or 10-20% (B2B freemium) | Industry-typical |
| 4 | Managed | Trial → paid 5-10% (B2C) or 20-35% (B2B freemium) | Strong |
| 5 | Optimized | Trial → paid >10% (B2C) or >35% (B2B freemium) | Best-in-class [src6] |
Red flags: Users who convert only via discounts, coupons, or "free forever" offers — that is a price signal, not a fit signal. Strip discounts and re-measure.
Quick diagnostic question: "What % of users who start a free trial or free plan convert to a paid plan at full list price (no discount)?"
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | >50% of revenue from discounted deals | Sales is buying revenue; value unproven |
| 2 | Emerging | 30-50% discounted | Heavy discounting to close |
| 3 | Defined | 15-30% discounted | Normal negotiation |
| 4 | Managed | <15% discounted; full-price close rate growing | Customers accept value |
| 5 | Optimized | No discount needed; attempted price increases do not raise churn | Pricing power |
Red flags: Sales team resorts to discount/extended terms to close; churn spikes when discounts expire. [src6]
Quick diagnostic question: "What % of closed deals in the last 90 days involved a discount greater than 10% off list?"
| Score | Level | Description | Evidence |
|---|---|---|---|
| 1 | Ad hoc | NRR <80% | Business shrinking on existing base |
| 2 | Emerging | NRR 80-95% | Leakage; no expansion |
| 3 | Defined | NRR 95-105% | Break-even on expansion vs churn |
| 4 | Managed | NRR 105-120% | Healthy expansion; existing base drives growth |
| 5 | Optimized | NRR >120% | Best-in-class; customers pay more over time [src6] |
Red flags: NRR calculated with new-logo revenue included — that is total revenue growth, not retention. NRR must be same-cohort only.
Quick diagnostic question: "For the cohort of customers from 12 months ago, what is their revenue today as a percentage of their revenue 12 months ago (excluding any new customers acquired since)?"
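The same-cohort rule translates directly to code. A sketch, with the revenue snapshots assumed to be simple per-customer dicts:

```python
def net_revenue_retention(revenue_then, revenue_now):
    """revenue_then: {customer_id: revenue 12 months ago}.
    revenue_now: {customer_id: revenue today} -- may contain new logos.
    Churned customers simply have no entry in revenue_now."""
    cohort = set(revenue_then)  # only customers who existed 12 months ago
    current = sum(revenue_now.get(c, 0) for c in cohort)
    return current / sum(revenue_then.values())

# "a" expanded, "b" contracted, "c" churned; new logo "d" is excluded.
# 230/300: NRR under 80%, the bottom band of the rubric above.
nrr = net_revenue_retention({"a": 100, "b": 100, "c": 100},
                            {"a": 150, "b": 80, "d": 500})
```

Including "d" would report 730/300 and turn a shrinking base into an apparent best-in-class NRR, which is exactly the red flag above.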
Each sub-signal scores 1-5. Compute dimension scores as simple (unweighted) averages of their 3 sub-signals, then apply dimension weights:
Retention Dimension = avg(1A retention curve, 1B churn, 1C segment retention), weight 0.45
Engagement Dimension = avg(2A core action, 2B depth of use, 2C organic share), weight 0.30
Willingness-to-Pay Dimension = avg(3A trial conversion, 3B discount dependence, 3C NRR), weight 0.25
Composite Score (1-5) = 0.45*Retention + 0.30*Engagement + 0.25*WTP
Composite (0-100 scale) = (Composite - 1) / 4 * 100
Retention is weighted highest because it is the hardest signal to fake and the most predictive of long-term outcomes. [src2, src8] Willingness-to-pay is weighted lower only because it may not apply to pre-monetization products — when applicable, treat low WTP scores as a hard gate regardless of composite.
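The formula can be sketched directly; the example sub-scores are illustrative, and the WTP hard gate from the note above is left to the caller:

```python
WEIGHTS = {"retention": 0.45, "engagement": 0.30, "wtp": 0.25}

def composite_score(subscores):
    """subscores: {"retention": [1A, 1B, 1C], "engagement": [...], "wtp": [...]},
    each sub-signal scored 1-5. Returns the composite on the 0-100 scale."""
    dims = {d: sum(v) / len(v) for d, v in subscores.items()}
    composite = sum(WEIGHTS[d] * dims[d] for d in WEIGHTS)  # still on 1-5 scale
    return (composite - 1) / 4 * 100

score = composite_score({"retention": [4, 4, 3],
                         "engagement": [3, 3, 2],
                         "wtp": [3, 2, 3]})  # about 52.9: the 40-59 band
```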
| Overall Score (0-100) | Maturity Level | Interpretation | Recommended Next Step |
|---|---|---|---|
| 0 - 19 | Critical — signals are noise | Claimed PMF is not supported by data. Vanity metrics dominate. Do not scale. | Return to customer discovery; fetch MVP testing framework |
| 20 - 39 | Developing — weak signals | Early positive signals but heavy noise. Retention not yet proven. High risk of false positive. | Run full PMF engine — fetch PMF measurement |
| 40 - 59 | Competent — partial PMF | PMF exists in a segment but not broadly. Identify and double down on the high-expectation customer. | Formalize segment focus — fetch PMF measurement |
| 60 - 79 | Advanced — clear PMF | Multiple independent signals confirm PMF. Can begin scaling in strongest segment. | Full scaling gate check — fetch scaling readiness |
| 80 - 100 | Best-in-class — undeniable PMF | Andreessen's "you'll feel it" territory: retention plateau + deep engagement + unforced payment. | Build scaling engine — fetch growth model design [src4] |
| Weak Dimension (Score < 3) | Fetch This Card |
|---|---|
| Retention | Product-Market Fit Measurement — focus on cohort analysis and segment isolation |
| Engagement | Product-Market Fit Measurement — apply Superhuman PMF engine to high-expectation customer [src1] |
| Willingness-to-Pay | Value-Based Pricing SaaS — re-validate pricing with real behavior, not surveys |
Scores mean different things at different stages and for different business models. Applying one threshold across segments produces misleading diagnoses. [src7]
| Segment | Expected Average Score (0-100) | "Good" Threshold | "Alarm" Threshold |
|---|---|---|---|
| Pre-seed (MVP, <6 months) | 15-30 | >35 | <15 — too early to measure meaningfully |
| Seed (early revenue, 6-18 months) | 30-50 | >50 | <25 |
| Post-seed (growing revenue, 18-30 months) | 45-65 | >60 | <35 |
| Series A+ (scaling, 30+ months) | 55-75 | >70 | <50 — PMF should be solid by this stage [src7] |
| Consumer (B2C subscription) | 35-55 | >55 | <25 — B2C churn floors are higher |
| B2B SaaS SMB | 45-65 | >60 | <35 |
| B2B SaaS Enterprise | 55-75 | >65 | <45 — enterprise retention floors are much higher |
| Marketplace | 30-50 (two-sided is harder) | >55 | <25 — need to score both sides |
Use this scorecard when: (1) a founder or investor disagrees about whether PMF exists; (2) before any decision to increase burn rate, hire a sales team, or raise a Series A; (3) after a launch moment (press, Product Hunt) to distinguish sustained signal from news-cycle noise; (4) quarterly as part of board reporting to track PMF trajectory, not just magnitude. [src4, src7]