This recipe runs a Signal Stack pilot that delivers 10-20 qualified dossiers per week to 2-3 pilot customers. The pilot validates that signal-driven outreach converts measurably better than cold outreach — the core thesis that roughly 95% of any market is not buying at any given moment, while the ~5% in active crisis are detectable through observable corporate distress signals. Phase 1 exit criteria: 3 paying customers in vertical #1. [src1, src3]
Which path?
├── Technical team available (can write Python)
│ ├── Budget > $500/month
│ │ └── PATH A: Full API stack — LLM + Clearbit/Apollo + Resend
│ └── Budget ≤ $500/month
│ └── PATH B: LLM + free enrichment + manual delivery
├── No-code team
│ ├── Budget > $500/month
│ │ └── PATH C: Make/n8n + Clay + LLM API
│ └── Budget ≤ $500/month
│ └── PATH D: Manual pipeline — Google Alerts + ChatGPT + email
└── Hybrid (one developer + ops person)
└── PATH E: Python scrapers + LLM API + manual QA
| Path | Tools | Cost/month | Speed | Output Quality |
|---|---|---|---|---|
| A: Full API | Python + Claude/GPT-4 + Clearbit + Resend | $500-800 | 10-20/week | Excellent |
| B: Budget API | Python + LLM + Apollo free + manual | $200-400 | 10-15/week | Good |
| C: No-code | Make/n8n + Clay + LLM API | $400-700 | 8-15/week | Good |
| D: Manual | Google Alerts + ChatGPT + manual | $20-50 | 5-10/week | Adequate |
| E: Hybrid | Python + LLM API + manual QA | $300-500 | 10-20/week | Excellent |
Duration: 2-3 days · Tool: CRM export + spreadsheet analysis
Collect current outreach performance metrics from each pilot customer: total outreach volume, open rate, reply rate, meeting-booked rate, close rate, average deal size, and cost per meeting.
Verify: Baseline spreadsheet complete for all pilot customers with 3+ months of historical data. · If failed: Use industry benchmarks (15-25% open, 1-3% reply, 0.5-1% meeting rate).
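A minimal sketch of the baseline pass, assuming the CRM export lands as a CSV with one row per historical campaign. The column names (`sent`, `opened`, `replied`, `meetings`, `spend`) are placeholders — map them to whatever your CRM actually exports.

```python
import csv

def load_rows(path):
    """Load one-row-per-campaign records from a CRM CSV export."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def baseline_metrics(rows):
    """Aggregate a customer's campaigns into the baseline rates the pilot compares against."""
    sent = sum(int(r["sent"]) for r in rows)
    meetings = sum(int(r["meetings"]) for r in rows)
    spend = sum(float(r["spend"]) for r in rows)
    return {
        "open_rate": sum(int(r["opened"]) for r in rows) / sent,
        "reply_rate": sum(int(r["replied"]) for r in rows) / sent,
        "meeting_rate": meetings / sent,
        "cost_per_meeting": spend / meetings if meetings else None,
    }
```

If a customer cannot produce the export, substitute the industry-benchmark rates above as the baseline row and flag it as estimated.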
Duration: 3-5 days · Tool: Python scripts + LLM API + test data
Run the pipeline on a test batch of 50-100 signal events. Classify each as true positive, false positive, or ambiguous. Calculate initial precision rate and calibrate classification thresholds. [src1]
Verify: Precision rate > 60% on test batch. · If failed: Return to taxonomy workshop, add exclusion rules for top 3 false positive patterns.
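The calibration math can be sketched as follows. The label strings and the `pattern` field used to group false positives are assumptions, and ambiguous events are deliberately excluded from the precision denominator — only definitive classifications count.

```python
from collections import Counter

def precision_report(labeled_events):
    """Precision over definitive labels, plus the top false-positive patterns to write exclusion rules for."""
    counts = Counter(e["label"] for e in labeled_events)
    tp, fp = counts["true_positive"], counts["false_positive"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    top_fp = Counter(
        e["pattern"] for e in labeled_events if e["label"] == "false_positive"
    ).most_common(3)  # the top 3 patterns feed the exclusion rules
    return precision, top_fp
```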
Duration: 5 days (ongoing weekly) · Tool: Full pipeline + human review
Generate 10-20 dossiers containing signal evidence, company profile, decision-maker identification, tailored outreach copy, and proof pack. Human-in-the-loop review scores each dossier 1-5 on accuracy, completeness, relevance, and proof quality. [src2]
Verify: At least 10 dossiers pass quality review (score >= 3). · If failed: Pause delivery, tighten classification, generate new batch.
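The review gate can be sketched as a simple average over the four scoring dimensions named above; the data shape is an assumption.

```python
DIMENSIONS = ("accuracy", "completeness", "relevance", "proof_quality")

def review_batch(dossiers, pass_threshold=3.0):
    """Return the ids of dossiers whose average reviewer score clears the gate."""
    passed = []
    for d in dossiers:
        avg = sum(d["scores"][dim] for dim in DIMENSIONS) / len(DIMENSIONS)
        if avg >= pass_threshold:
            passed.append(d["id"])
    return passed
```

Dossiers below the threshold go back for regeneration, never out to customers.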
Duration: 2 weeks (parallel to delivery) · Tool: Email delivery with variant tracking
Split delivery into 2-3 format variants: full PDF dossier, executive summary email, data-only alert. Track open/reply/meeting rates per variant. [src3]
Verify: Statistical significance on at least one metric after 2 weeks. · If failed: Extend test or reduce to 2 variants.
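For the variant comparison, a two-proportion z-test is one reasonable significance check — a sketch, not the recipe's mandated method. At pilot volumes (tens of sends per variant) treat it as a rough read and prefer an exact test when counts are small.

```python
import math

def two_proportion_test(successes_a, n_a, successes_b, n_b):
    """z statistic and two-sided p-value for e.g. reply rate of variant A vs. B."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_value
```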
Duration: 2-4 hours/week (ongoing) · Tool: Spreadsheet analysis + taxonomy update
Review all signals weekly. Classify outcomes as true positive, unknown, or false positive. Update taxonomy rules based on false positive analysis. Target: < 30% false positive rate by week 4. [src1, src2]
Verify: False positive rate decreasing week-over-week. · If failed: Replace weakest signal source or add corroborating signal requirement.
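A sketch of the weekly trend check, assuming each week's review yields counts of definitive classifications:

```python
def fp_rate(week):
    """False-positive rate among definitive classifications for one week."""
    definitive = week["false_positive"] + week["true_positive"]
    return week["false_positive"] / definitive if definitive else 0.0

def trend_ok(weeks, target=0.30):
    """True when the rate falls week-over-week and the latest week beats the target."""
    rates = [fp_rate(w) for w in weeks]
    decreasing = all(b <= a for a, b in zip(rates, rates[1:]))
    return decreasing and rates[-1] < target
```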
Duration: Ongoing (weekly) · Tool: CRM integration or manual tracking
Track full funnel: dossier sent → opened → replied → meeting → proposal → close. Compare against baseline. Key metric: > 2x conversion vs. cold outreach baseline. [src3, src5]
Verify: Conversion data for 80%+ of delivered dossiers by week 4. · If failed: Add manual follow-up calls to capture outcomes.
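The funnel comparison can be sketched as below. Stage names are assumptions, and the recipe's 2x gate is applied to the meeting-booked rate:

```python
STAGES = ("sent", "opened", "replied", "meeting", "proposal", "closed")

def funnel_lift(counts, baseline_meeting_rate):
    """Per-stage conversion rates, lift over the cold-outreach baseline, and the 2x gate."""
    rates = {s: counts[s] / counts["sent"] for s in STAGES}
    lift = rates["meeting"] / baseline_meeting_rate
    return rates, lift, lift > 2.0
```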
Duration: 1-2 days · Tool: Analysis + presentation
Compile exit assessment: signal accuracy, conversion vs. baseline, customer satisfaction, unit economics, go/no-go for platform extraction. Phase 1 exit criteria: 3 paying customers in vertical #1. [src1, src2]
Verify: Exit assessment with data-backed recommendation complete. · If failed: Extend pilot by 4 weeks or reassess vertical selection.
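The go/no-go decision reduces to a checklist against this recipe's exit criteria; the metric field names here are assumptions.

```python
def exit_assessment(metrics):
    """Checks: 3 paying customers, > 2x conversion vs. baseline, precision above the minimum gate."""
    checks = {
        "paying_customers": metrics["paying_customers"] >= 3,
        "conversion_lift": metrics["conversion_lift"] > 2.0,
        "precision": metrics["precision"] > 0.60,
    }
    return all(checks.values()), checks
```

Returning the per-criterion breakdown alongside the verdict keeps the recommendation data-backed: a failed check points directly at the remediation (extend the pilot or reassess the vertical).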
{
"output_type": "pilot_performance_report",
"format": "spreadsheet + PDF summary",
"sections": [
{"name": "baseline_metrics", "type": "object", "description": "Pre-pilot outreach performance per customer"},
{"name": "weekly_dossier_batches", "type": "array", "description": "Dossier count, quality scores, delivery per week"},
{"name": "signal_accuracy", "type": "object", "description": "Precision rate, false positive rate, taxonomy iterations"},
{"name": "conversion_funnel", "type": "object", "description": "Open/reply/meeting/close rates vs baseline"},
{"name": "ab_test_results", "type": "object", "description": "Package format variant performance"},
{"name": "exit_assessment", "type": "object", "description": "Go/no-go for platform extraction"}
]
}
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Dossier volume (per week) | >= 10 | >= 15 | >= 20 |
| Signal precision rate | > 60% | > 75% | > 85% |
| Dossier quality score (avg) | > 3.0/5 | > 3.5/5 | > 4.0/5 |
| Conversion vs. baseline | > 1.5x | > 2x | > 3x |
| False positive trend | Flat | Decreasing | < 20% by week 4 |
| Pilot customer retention | 2/3 continue | 3/3 continue | 3/3 convert to paid |
If below minimum: Pause delivery, return to taxonomy calibration, extend pilot by 2 weeks.
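The three numeric rows of the gate table translate directly into a grading helper. Thresholds are copied from the table (precision as a fraction rather than a percent), and strict `>` matches its wording:

```python
GATES = {
    "precision": (0.60, 0.75, 0.85),
    "quality_score": (3.0, 3.5, 4.0),
    "conversion_lift": (1.5, 2.0, 3.0),
}

def grade(metric, value):
    """Band a metric per the quality-gate table: below_minimum / minimum / good / excellent."""
    minimum, good, excellent = GATES[metric]
    if value > excellent:
        return "excellent"
    if value > good:
        return "good"
    if value > minimum:
        return "minimum"
    return "below_minimum"
```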
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Precision < 60% after calibration | Taxonomy too broad or sources too noisy | Narrow to top 2-3 sources, add compound signal requirement |
| Pilot customer stops responding | Dossier quality too low | Reach out directly, request feedback, pivot to a different contact |
| Volume < 10/week | Low event frequency in vertical | Add signal sources or broaden geographic scope |
| A/B test inconclusive | Volume too low | Extend test period or reduce variants |
| Compliance flag | PECR/CAN-SPAM violation | Pause delivery, review compliance, switch to opt-in only |
| LLM classification degrades | Prompt drift or model update | Re-calibrate prompts, pin model version |
| Component | Budget ($5K) | Standard ($10K) | Premium ($15K) |
|---|---|---|---|
| Signal Architect labor | $2K | $4K | $6K |
| LLM API costs | $500 | $1K | $1.5K |
| Enrichment APIs | $300 | $600 | $1K |
| Delivery infrastructure | $200 | $400 | $500 |
| Domain advisor | $0 | $2K | $3K |
| QA and iteration | $1K | $2K | $3K |
| Total (4-week pilot) | $4K-$5K | $10K | $15K |
Pushing 20+ dossiers before signal accuracy is validated. Result: pilot customers receive irrelevant dossiers, lose trust, and the pilot fails. [src2]
Deliver 10 human-reviewed dossiers in week 1. Only increase volume after customer confirms relevance on >= 7/10.
Starting without documenting current outreach performance. Result: you cannot prove the 2x conversion claim. [src3]
Spend 2-3 days collecting 3+ months of historical outreach data before delivering a single dossier.
Treating the initial taxonomy as fixed. Result: false positive rate stays high, quality plateaus, customers churn. [src1]
Review every false positive weekly. Update classification rules and document changes. The taxonomy should visibly improve each week.
Use when an agent needs to execute or plan a Signal Stack pilot engagement. This is the hands-on delivery recipe — it takes a completed signal audit and taxonomy and turns them into measurable results. The pilot is the validation gate: if it does not produce 3 paying customers with > 2x conversion, do not proceed to platform extraction.