Pilot Execution Playbook — Signal Stack
How do you execute a Signal Stack pilot delivering 10-20 qualified dossiers per week?
Purpose
This recipe executes a Signal Stack pilot that delivers 10-20 qualified dossiers per week to 2-3 pilot customers. The pilot validates that signal-driven outreach produces measurably higher conversion than cold outreach. The core thesis: at any given moment, 95% of a market is not buying, but the 5% in active crisis are detectable through observable corporate distress signals. Phase 1 exit criteria: 3 paying customers in vertical #1. [src1, src3]
Prerequisites
- Signal audit complete — all signal sources in the target vertical mapped and scored
- Signal taxonomy defined — trigger events, strength scoring, false positive thresholds validated by domain advisor
- 2-3 pilot customers identified — each with measurable current outreach baselines
- MVP pipeline operational — data ingestion, classification, enrichment, and dossier generation working
- GDPR/PECR/CAN-SPAM compliance confirmed — outbound delivery legal in target jurisdictions
Constraints
- No platform work until 3 paying customers validate vertical #1. The pilot proves the vertical, not the platform. [src1]
- Human-in-the-loop review mandatory for first 100 packages per client. Quality failures in pilot phase kill the engagement. [src2]
- Pilot customers must have measurable current outreach baselines — without baselines, you cannot prove the 2x conversion claim.
- GDPR/PECR/CAN-SPAM compliance required for all outbound delivery. [src2]
- Minimum pilot duration: 4 weeks. Shorter pilots produce statistically unreliable conversion data. [src5]
Tool Selection Decision
Which path?
├── Technical team available (can write Python)
│ ├── Budget > $500/month
│ │ └── PATH A: Full API stack — LLM + Clearbit/Apollo + Resend
│ └── Budget ≤ $500/month
│ └── PATH B: LLM + free enrichment + manual delivery
├── No-code team
│ ├── Budget > $500/month
│ │ └── PATH C: Make/n8n + Clay + LLM API
│ └── Budget ≤ $500/month
│ └── PATH D: Manual pipeline — Google Alerts + ChatGPT + email
└── Hybrid (one developer + ops person)
└── PATH E: Python scrapers + LLM API + manual QA
| Path | Tools | Cost/month | Throughput | Output Quality |
|---|---|---|---|---|
| A: Full API | Python + Claude/GPT-4 + Clearbit + Resend | $500-800 | 10-20/week | Excellent |
| B: Budget API | Python + LLM + Apollo free + manual | $200-400 | 10-15/week | Good |
| C: No-code | Make/n8n + Clay + LLM API | $400-700 | 8-15/week | Good |
| D: Manual | Google Alerts + ChatGPT + manual | $20-50 | 5-10/week | Adequate |
| E: Hybrid | Python + LLM API + manual QA | $300-500 | 10-20/week | Excellent |
Execution Flow
Step 1: Baseline Measurement
Duration: 2-3 days · Tool: CRM export + spreadsheet analysis
Collect current outreach performance metrics from each pilot customer: total outreach volume, open rate, reply rate, meeting-booked rate, close rate, average deal size, and cost per meeting.
Verify: Baseline spreadsheet complete for all pilot customers with 3+ months of historical data. · If failed: Use industry benchmarks (15-25% open, 1-3% reply, 0.5-1% meeting rate).
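The baseline aggregation can be sketched in a few lines. This is a minimal, hypothetical example: the column names (`sent`, `opened`, `replied`, `meetings`, `closed`, `deal_value`, `spend`) are assumptions for illustration and should be mapped to whatever your CRM actually exports.

```python
# Hypothetical sketch: fold a CRM export (list of per-campaign rows) into
# the Step 1 baseline rates. Column names are assumed, not prescribed.

def baseline_metrics(rows):
    """Aggregate per-campaign outreach counts into one baseline per customer."""
    sent = sum(int(r["sent"]) for r in rows)
    opened = sum(int(r["opened"]) for r in rows)
    replied = sum(int(r["replied"]) for r in rows)
    meetings = sum(int(r["meetings"]) for r in rows)
    closed = sum(int(r["closed"]) for r in rows)
    deal_total = sum(float(r["deal_value"]) for r in rows)
    spend = sum(float(r["spend"]) for r in rows)
    return {
        "volume": sent,
        "open_rate": opened / sent,
        "reply_rate": replied / sent,
        "meeting_rate": meetings / sent,
        "close_rate": closed / max(meetings, 1),
        "avg_deal_size": deal_total / max(closed, 1),
        "cost_per_meeting": spend / max(meetings, 1),
    }
```

A customer whose baseline lands outside the industry benchmarks above (15-25% open, 1-3% reply) is worth a second look before the pilot starts, since the 2x claim will be measured against these numbers.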
Step 2: Pipeline Calibration
Duration: 3-5 days · Tool: Python scripts + LLM API + test data
Run the pipeline on a test batch of 50-100 signal events. Classify each as true positive, false positive, or ambiguous. Calculate initial precision rate and calibrate classification thresholds. [src1]
Verify: Precision rate > 60% on test batch. · If failed: Return to taxonomy workshop, add exclusion rules for top 3 false positive patterns.
Step 3: First Dossier Batch (Week 1)
Duration: 5 days (ongoing weekly) · Tool: Full pipeline + human review
Generate 10-20 dossiers containing signal evidence, company profile, decision-maker identification, tailored outreach copy, and proof pack. Human-in-the-loop review scores each dossier 1-5 on accuracy, completeness, relevance, and proof quality. [src2]
Verify: At least 10 dossiers pass quality review (score >= 3). · If failed: Pause delivery, tighten classification, generate new batch.
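The review gate can be expressed as a small check. One assumption to flag: the step's "score >= 3" is interpreted here as the mean of the four review axes, and the data shape (a dict of axis scores per dossier) is hypothetical.

```python
# Hypothetical sketch of the Step 3 quality gate: each dossier is scored
# 1-5 on four axes; the batch ships only if >= 10 dossiers average 3+.
# Axis names follow the step text; the input shape is assumed.

AXES = ("accuracy", "completeness", "relevance", "proof_quality")

def dossier_passes(scores, cutoff=3.0):
    """True if the mean of the four axis scores clears the cutoff."""
    return sum(scores[a] for a in AXES) / len(AXES) >= cutoff

def batch_ships(batch, min_passing=10):
    """Return (ship?, passing count) for a reviewed batch."""
    passing = [d for d in batch if dossier_passes(d)]
    return len(passing) >= min_passing, len(passing)
```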
Step 4: A/B Test Package Formats
Duration: 2 weeks (parallel to delivery) · Tool: Email delivery with variant tracking
Split delivery into 2-3 format variants: full PDF dossier, executive summary email, data-only alert. Track open/reply/meeting rates per variant. [src3]
Verify: Statistical significance on at least one metric after 2 weeks. · If failed: Extend test or reduce to 2 variants.
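One standard way to check significance on a rate metric is a two-proportion z-test, sketched below with only the stdlib. This is an assumption about method, not a prescription; at pilot volumes (10-20 dossiers/week per variant), expect to need the full 2 weeks, which is exactly why the fallback reduces the variant count.

```python
# Hypothetical sketch: two-proportion z-test for whether two package format
# variants differ on a rate metric (e.g. reply rate). Stdlib only.
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: the two variants convert at the same rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def significant(success_a, n_a, success_b, n_b, z_crit=1.96):
    """True if the difference clears p < 0.05, two-sided."""
    return abs(two_proportion_z(success_a, n_a, success_b, n_b)) >= z_crit
```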
Step 5: Weekly Taxonomy Iteration
Duration: 2-4 hours/week (ongoing) · Tool: Spreadsheet analysis + taxonomy update
Review all signals weekly. Classify outcomes as true positive, unknown, or false positive. Update taxonomy rules based on false positive analysis. Target: < 30% false positive rate by week 4. [src1, src2]
Verify: False positive rate decreasing week-over-week. · If failed: Replace weakest signal source or add corroborating signal requirement.
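The weekly trend check can be made mechanical. A sketch, assuming the three outcome labels from the step text; treating "unknown" as unresolved (excluded from the rate) is a judgment call worth making explicit in your own tracking sheet.

```python
# Hypothetical sketch for Step 5: weekly false positive rate from reviewed
# outcomes, plus the week-over-week trend check against the < 30% target.

def fp_rate(outcomes):
    """Share of resolved signals that were false positives; 'unknown' excluded."""
    resolved = [o for o in outcomes if o != "unknown"]
    if not resolved:
        return None
    return resolved.count("false_positive") / len(resolved)

def trend_ok(weekly_rates, week4_target=0.30):
    """Non-increasing week-over-week AND under target by the final week."""
    decreasing = all(a >= b for a, b in zip(weekly_rates, weekly_rates[1:]))
    return decreasing and weekly_rates[-1] < week4_target
```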
Step 6: Conversion Tracking
Duration: Ongoing (weekly) · Tool: CRM integration or manual tracking
Track full funnel: dossier sent → opened → replied → meeting → proposal → close. Compare against baseline. Key metric: > 2x conversion vs. cold outreach baseline. [src3, src5]
Verify: Conversion data for 80%+ of delivered dossiers by week 4. · If failed: Add manual follow-up calls to capture outcomes.
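The funnel roll-up is simple enough to sketch directly. Stage names mirror the step text; the per-dossier record shape (boolean flags per stage) is an assumption for illustration.

```python
# Hypothetical sketch for Step 6: count how many dossiers reached each
# funnel stage, and compute conversion lift against the Step 1 baseline.

STAGES = ("sent", "opened", "replied", "meeting", "proposal", "closed")

def funnel(dossiers):
    """Per-stage counts across a batch of dossier outcome records."""
    return {s: sum(1 for d in dossiers if d.get(s)) for s in STAGES}

def lift(pilot_rate, baseline_rate):
    """Conversion multiple vs. baseline; the pilot gate is > 2x."""
    return pilot_rate / baseline_rate
```

For example, a pilot meeting rate of 2% against a 0.8% cold-outreach baseline is a 2.5x lift, clearing the gate.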
Step 7: Phase 1 Exit Assessment
Duration: 1-2 days · Tool: Analysis + presentation
Compile exit assessment: signal accuracy, conversion vs. baseline, customer satisfaction, unit economics, go/no-go for platform extraction. Phase 1 exit criteria: 3 paying customers in vertical #1. [src1, src2]
Verify: Exit assessment with data-backed recommendation complete. · If failed: Extend pilot by 4 weeks or reassess vertical selection.
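The go/no-go decision can be reduced to explicit gates. A sketch, wiring together the minimum thresholds this playbook already states (3 paying customers, > 2x conversion, > 60% precision, > 3.0/5 quality); the function signature is an assumption, and a real assessment would add the qualitative inputs (customer satisfaction, unit economics) the step lists.

```python
# Hypothetical sketch of the Step 7 go/no-go check. Threshold values come
# from this playbook's exit criteria and Quality Benchmarks table.

def phase1_go(paying_customers, conversion_lift, precision, avg_quality):
    """Go only if every minimum gate is met; returns (decision, failed gates)."""
    gates = {
        "3 paying customers": paying_customers >= 3,
        "> 2x conversion vs baseline": conversion_lift > 2.0,
        "> 60% signal precision": precision > 0.60,
        "> 3.0/5 avg dossier quality": avg_quality > 3.0,
    }
    failures = [name for name, ok in gates.items() if not ok]
    return (len(failures) == 0, failures)
```

Listing the failed gates, rather than returning a bare boolean, makes the "extend pilot or reassess vertical" fallback a data-backed conversation.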
Output Schema
{
"output_type": "pilot_performance_report",
"format": "spreadsheet + PDF summary",
"sections": [
{"name": "baseline_metrics", "type": "object", "description": "Pre-pilot outreach performance per customer"},
{"name": "weekly_dossier_batches", "type": "array", "description": "Dossier count, quality scores, delivery per week"},
{"name": "signal_accuracy", "type": "object", "description": "Precision rate, false positive rate, taxonomy iterations"},
{"name": "conversion_funnel", "type": "object", "description": "Open/reply/meeting/close rates vs baseline"},
{"name": "ab_test_results", "type": "object", "description": "Package format variant performance"},
{"name": "exit_assessment", "type": "object", "description": "Go/no-go for platform extraction"}
]
}
Quality Benchmarks
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Dossier volume (per week) | >= 10 | >= 15 | >= 20 |
| Signal precision rate | > 60% | > 75% | > 85% |
| Dossier quality score (avg) | > 3.0/5 | > 3.5/5 | > 4.0/5 |
| Conversion vs. baseline | > 1.5x | > 2x | > 3x |
| False positive trend | Flat | Decreasing | < 20% by week 4 |
| Pilot customer retention | 2/3 continue | 3/3 continue | 3/3 convert to paid |
If below minimum: Pause delivery, return to taxonomy calibration, extend pilot by 2 weeks.
Error Handling
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Precision < 60% after calibration | Taxonomy too broad or sources too noisy | Narrow to top 2-3 sources, add compound signal requirement |
| Pilot customer stops responding | Dossier quality too low | Direct outreach, request feedback, pivot contact |
| Volume < 10/week | Low event frequency in vertical | Add signal sources or broaden geographic scope |
| A/B test inconclusive | Volume too low | Extend test period or reduce variants |
| Compliance flag | PECR/CAN-SPAM violation | Pause delivery, review compliance, switch to opt-in only |
| LLM classification degrades | Prompt drift or model update | Re-calibrate prompts, pin model version |
Cost Breakdown
| Component | Budget ($5K) | Standard ($10K) | Premium ($15K) |
|---|---|---|---|
| Signal Architect labor | $2K | $4K | $6K |
| LLM API costs | $500 | $1K | $1.5K |
| Enrichment APIs | $300 | $600 | $1K |
| Delivery infrastructure | $200 | $400 | $500 |
| Domain advisor | $0 | $2K | $3K |
| QA and iteration | $1K | $2K | $3K |
| Total (4-week pilot) | $4K-$5K | $10K | $15K |
Anti-Patterns
Wrong: Optimizing for volume over quality in week 1
Pushing 20+ dossiers before signal accuracy is validated. Result: pilot customers receive irrelevant dossiers, lose trust, and the pilot fails. [src2]
Correct: Cap at 10 dossiers week 1, scale after quality is validated
Deliver 10 human-reviewed dossiers in week 1. Only increase volume after customer confirms relevance on >= 7/10.
Wrong: Skipping baseline measurement
Starting without documenting current outreach performance. Result: you cannot prove the 2x conversion claim. [src3]
Correct: Measure before you move
Spend 2-3 days collecting 3+ months of historical outreach data before delivering a single dossier.
Wrong: Not iterating on the taxonomy weekly
Treating the initial taxonomy as fixed. Result: false positive rate stays high, quality plateaus, customers churn. [src1]
Correct: Weekly taxonomy iteration is non-negotiable
Review every false positive weekly. Update classification rules and document changes. The taxonomy should visibly improve each week.
When This Matters
Use when an agent needs to execute or plan a Signal Stack pilot engagement. This is the hands-on delivery recipe — it takes a completed signal audit and taxonomy and turns them into measurable results. The pilot is the validation gate: if it does not produce 3 paying customers with > 2x conversion, do not proceed to platform extraction.