This recipe sets up a working A/B testing system on a landing page — from sample size calculation through test creation, variant deployment, and statistical analysis. The output is a running experiment that splits traffic between control and variant(s), tracks conversions per variant, and produces statistically valid results. [src1]
Which path?
├── Developer AND budget = free
│   └── PATH A: PostHog Experiments (free, feature flags + analytics)
├── Non-technical AND budget > $0
│   └── PATH B: VWO ($49/mo, visual drag-and-drop editor)
├── Developer AND wants zero dependencies
│   └── PATH C: Custom JavaScript ($0, lightweight split with GA4)
└── High traffic AND enterprise needs
    └── PATH D: Statsig/Optimizely ($0-50K/yr, advanced stats)
| Path | Tools | Cost | Setup Time | Best For |
|---|---|---|---|---|
| A: PostHog | PostHog Cloud | $0 | 30 min | Developers, product teams |
| B: VWO | VWO Starter | $49/mo | 20 min | Non-technical, visual tests |
| C: Custom JS | Vanilla JS + GA4 | $0 | 45 min | Minimalists, privacy-focused |
| D: Enterprise | Statsig/Optimizely | $0-50K+/yr | 60 min | High traffic, advanced stats |
Duration: 5 minutes · Tool: Online calculator
Use Evan Miller's calculator or Optimizely's calculator. At 5% baseline CVR and 20% relative MDE, a two-sided test needs roughly 8,200 visitors per variant. Test duration in months = (visitors per variant × number of variants) / monthly traffic. [src4] [src5]
Quick Reference (95% significance, 80% power, two-sided test; approximate, calculators differ somewhat depending on the variance approximation they use):
───────────────────────────────────────
Baseline 2%, MDE 50%: ~3,800 per variant
Baseline 5%, MDE 20%: ~8,200 per variant
Baseline 5%, MDE 50%: ~1,500 per variant
Baseline 10%, MDE 20%: ~3,800 per variant
Baseline 10%, MDE 50%: ~700 per variant
Verify: Test duration is under 8 weeks. · If failed: Increase the MDE or the traffic sent to the page; if the estimate still exceeds 3 months, use qualitative testing instead.
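The sample-size and duration arithmetic in this step can be sketched in Python. This is a minimal sketch using the textbook normal-approximation formula for a two-sided two-proportion test; it is not claimed to be what either calculator implements, and the function names and the 10,000 monthly visitors are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate if the variant hits the MDE
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def duration_months(per_variant, variants, monthly_traffic):
    """Duration formula from this step: total sample needed / monthly traffic."""
    return per_variant * variants / monthly_traffic


# Assumed inputs: 5% baseline CVR, 20% relative MDE, 10,000 visitors per month
n = sample_size_per_variant(baseline=0.05, relative_mde=0.20)
print(n, round(duration_months(n, variants=2, monthly_traffic=10_000), 1))
```

If a calculator reports a materially different number for the same inputs, check whether it assumes a one-sided test, a different power, or sequential statistics before committing to a duration.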
Duration: 10-20 minutes · Tool: PostHog, VWO, or code editor
Create the experiment with control and variant(s), set goal metric, configure traffic allocation (50/50). For PostHog, use feature flags with experiment code. For VWO, use the visual editor. For custom JS, use localStorage-based variant assignment with GA4 event tracking. [src1] [src7]
Verify: Open page in two incognito windows — one shows control, one shows variant. · If failed: Check feature flag status, clear localStorage between tests.
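Sticky variant assignment can also be done without client storage. The sketch below is a server-side analogue of the localStorage approach, written in Python: it hashes a visitor ID so the same visitor always lands in the same variant. It is an illustration under stated assumptions, not how PostHog or VWO assign traffic; `assign_variant`, the experiment key, and the visitor IDs are hypothetical.

```python
import hashlib


def assign_variant(visitor_id, experiment,
                   variants=("control", "variant"), weights=(0.5, 0.5)):
    """Deterministically map a visitor to a variant using a hash bucket."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex chars give an integer in [0, 2**32); scale it to [0, 1)
    bucket = int(digest[:8], 16) / 2**32
    cumulative = 0.0
    for name, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return name
    return variants[-1]  # fallback if weights sum to slightly under 1


# Hypothetical experiment key and visitor ID
print(assign_variant("visitor-123", "landing-hero-test"))
```

Including the experiment key in the hash keeps assignments independent across experiments, so a visitor in the control of one test is not systematically in the control of the next.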
Duration: 5 minutes · Tool: Code editor
Add CSS that hides the page until the variant is applied, with a 2-second timeout fallback that reveals the page if the variant has not been applied by then. For PostHog, use server-side bootstrapping. VWO has built-in anti-flicker.
Verify: On slow 3G throttling, no original content flashes. · If failed: Move anti-flicker to first element in <head>.
Duration: 5-10 minutes · Tool: PostHog, VWO, or GA4
Ensure conversion events include variant identifier. PostHog tracks automatically. VWO uses goal settings. Custom JS sends variant ID with GA4 events.
Verify: Test conversion in each variant, verify attribution in dashboard. · If failed: Check experiment impression fires before conversion event.
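The attribution rule in this step (a conversion only counts if the experiment impression fired first) can be sketched as a small Python aggregation over an exported event log. The event shape here (`visitor_id`, `name`, `variant`, `ts`) is a hypothetical schema for illustration, not the export format of PostHog, VWO, or GA4.

```python
from collections import defaultdict


def conversions_by_variant(events):
    """Count exposed visitors and attributed conversions per variant.

    A conversion is attributed only if the visitor had an earlier
    exposure event; each visitor converts at most once.
    """
    exposed = {}       # visitor_id -> variant seen at first exposure
    converted = set()  # visitor_ids already counted as converted
    stats = defaultdict(lambda: {"visitors": 0, "conversions": 0})
    for event in sorted(events, key=lambda e: e["ts"]):
        vid = event["visitor_id"]
        if event["name"] == "exposure" and vid not in exposed:
            exposed[vid] = event["variant"]
            stats[event["variant"]]["visitors"] += 1
        elif event["name"] == "conversion" and vid in exposed and vid not in converted:
            converted.add(vid)
            stats[exposed[vid]]["conversions"] += 1
    return dict(stats)


# Hypothetical log: visitor "b" converts before being exposed, so it is not attributed
log = [
    {"visitor_id": "a", "name": "exposure", "variant": "control", "ts": 1},
    {"visitor_id": "a", "name": "conversion", "variant": "control", "ts": 2},
    {"visitor_id": "b", "name": "conversion", "variant": "variant", "ts": 1},
    {"visitor_id": "b", "name": "exposure", "variant": "variant", "ts": 2},
]
print(conversions_by_variant(log))
```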
Duration: 2-8 weeks · Tool: PostHog, VWO, or calculator
Check results weekly but do not stop early. Verify traffic split, check for errors, confirm sample accumulation. Use VWO's significance calculator for custom JS tests. [src3]
Verify: At the calculated sample size, one variant reaches > 95% significance. · If failed: If neither variant wins, the true difference is likely smaller than the MDE the test was sized to detect; treat the result as inconclusive.
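For custom JS tests, the significance check can also be reproduced by hand. The sketch below is a standard pooled two-proportion z-test in Python; it is one common way to test a difference in conversion rates, not necessarily the method any particular vendor calculator uses, and the visitor and conversion counts are made-up inputs.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no conversions (or all converted): nothing to test
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: control 400/8,000 (5%), variant 480/8,000 (6%)
z, p = two_proportion_z_test(400, 8000, 480, 8000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

Run this once, at the pre-calculated sample size. Applying it at every weekly check and stopping on the first p < 0.05 reintroduces the early-stopping problem.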
Duration: 10 minutes · Tool: Code editor + testing tool
Make winning variant the permanent default. Remove testing code, anti-flicker snippet, and archive the experiment. Document results for future reference.
Verify: Winning variant is default. All test code removed. CVR stable for 1 week. · If failed: Verify permanent implementation matches test variant exactly.
{
"output_type": "ab_test_configuration",
"format": "running experiment + documentation",
"columns": [
{"name": "test_name", "type": "string", "description": "Unique test identifier", "required": true},
{"name": "tool", "type": "string", "description": "Testing platform used", "required": true},
{"name": "variants", "type": "array", "description": "Variant descriptions", "required": true},
{"name": "goal_metric", "type": "string", "description": "Primary conversion event", "required": true},
{"name": "sample_size_required", "type": "number", "description": "Minimum visitors per variant", "required": true},
{"name": "estimated_duration", "type": "string", "description": "Weeks to significance", "required": true},
{"name": "status", "type": "string", "description": "running/completed/stopped", "required": true}
],
"expected_row_count": "1",
"sort_order": "N/A",
"deduplication_key": "test_name"
}
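A record produced under this schema can be checked before it is stored. The following Python sketch validates the required fields and the three status values listed in the schema; the validator name and the example record are hypothetical.

```python
REQUIRED_FIELDS = {
    "test_name": str,
    "tool": str,
    "variants": list,
    "goal_metric": str,
    "sample_size_required": (int, float),
    "estimated_duration": str,
    "status": str,
}
ALLOWED_STATUS = {"running", "completed", "stopped"}


def validate_test_record(record):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    status = record.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUS:
        problems.append("status must be running, completed, or stopped")
    return problems


# Hypothetical record for illustration
example = {
    "test_name": "hero-headline-v2",
    "tool": "PostHog",
    "variants": ["control", "benefit-led headline"],
    "goal_metric": "signup_completed",
    "sample_size_required": 8200,
    "estimated_duration": "4 weeks",
    "status": "running",
}
print(validate_test_record(example))
```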
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Traffic split accuracy | Within ±10% | Within ±5% | Within ±2% |
| Sample size reached | > 80% of minimum | 100% of minimum | 120% of minimum |
| Test duration | < 12 weeks | < 6 weeks | < 3 weeks |
| Flicker-free experience | < 2s delay | < 500ms | Zero flicker (server-side) |
| Result documentation | Winner declared | Full stats documented | Learnings shared |
If below minimum: Check feature flag configuration for split accuracy. Increase MDE or traffic if sample size unreachable.
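The traffic-split row of the table can be checked from raw visitor counts. This Python sketch assumes the ± thresholds mean percentage points of deviation from the planned share (for example, 52/48 against a 50/50 plan is a 2-point deviation); that reading is an assumption, as are the function names and counts.

```python
def split_deviation_pp(visitors, expected_share):
    """Largest absolute deviation from the planned split, in percentage points."""
    total = sum(visitors.values())
    return max(
        abs(visitors[name] / total - expected_share[name]) * 100
        for name in expected_share
    )


def grade_split(deviation_pp):
    """Map a deviation to the quality tiers in the table above."""
    if deviation_pp <= 2:
        return "excellent"
    if deviation_pp <= 5:
        return "good"
    if deviation_pp <= 10:
        return "minimum acceptable"
    return "below minimum"


# Hypothetical counts against a planned 50/50 split
counts = {"control": 5100, "variant": 4900}
deviation = split_deviation_pp(counts, {"control": 0.5, "variant": 0.5})
print(round(deviation, 2), grade_split(deviation))
```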
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Both variants show same content | Feature flag not evaluating | Check dashboard for flag status; verify JS loads |
| Conversion rates at 0% | Events not firing or not attributed | Verify events fire with variant ID; check goal metric name |
| Page flicker visible | Anti-flicker snippet missing | Move anti-flicker CSS to first element in <head> |
| Traffic not splitting | CDN caching same variant | Check CDN does not cache variant-specific content |
| No data in PostHog | Script not loading or blocked | Check console for errors; test in incognito |
| Significance fluctuating | Normal before reaching sample size | Do not stop early; wait for full sample |
| Component | Free Tier | Paid Tier | At Scale |
|---|---|---|---|
| PostHog | $0 (1M events/mo) | Pay-per-use | $0.00005/event >1M |
| VWO | $0 (30-day trial) | $49/mo Starter | $972/mo Enterprise |
| Statsig | $0 (50M events/mo) | $150/mo Pro | Custom |
| Custom JS + GA4 | $0 | $0 | $0 |
| Total | $0 | $0-49/mo | $49-972/mo |
Early stopping inflates false positive rates from 5% to over 25%. A variant that looks better at day 3 may be identical at day 23. [src3]
Calculate required sample size before starting and commit to running until that number is reached.
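The effect of early stopping can be seen in a small A/A simulation. The Python sketch below compares the false positive rate of a single check at the end with the rate when the test is stopped at the first "significant" look. It models each look as adding one equal-sized batch of data with no true difference; the number of looks (20), the number of runs, and the seed are arbitrary assumptions.

```python
import math
import random


def false_positive_rates(runs=5000, looks=20, z_crit=1.96, seed=42):
    """Simulate A/A tests; return (fixed-horizon FPR, peeking FPR)."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(runs):
        cumulative = 0.0
        crossed = False
        z = 0.0
        for look in range(1, looks + 1):
            # Under the null, each batch contributes a standard normal
            # increment; the running z-score is the scaled cumulative sum.
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / math.sqrt(look)
            if abs(z) > z_crit:
                crossed = True  # a peeker would have stopped here
        peeking_hits += crossed
        fixed_hits += abs(z) > z_crit  # only the final look counts
    return fixed_hits / runs, peeking_hits / runs


fixed, peeking = false_positive_rates()
print(f"single final check: {fixed:.1%}, stop at first significant look: {peeking:.1%}")
```

The single final check should land near the nominal 5%, while the peeking rate comes out several times higher, which is the inflation described above.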
Testing button color on 500 monthly visitors requires 15+ months for significance. [src5]
Low-traffic: test big changes (headlines, layouts). High-traffic: can test subtle changes (copy, placement).
Use this recipe when the agent needs to set up a quantitative A/B test on a landing page. Requires existing analytics and conversion events. Handles experiment design, tool setup, and statistical analysis.