Pilot Execution Playbook — Signal Stack
How do you execute a Signal Stack pilot delivering 10-20 qualified dossiers per week?
Purpose
This recipe executes a Signal Stack pilot that delivers 10-20 qualified dossiers per week to 2-3 pilot customers. The pilot validates that signal-driven outreach produces measurably higher conversion than cold outreach. The core thesis: at any given moment, 95% of a market is not buying, but the 5% in active crisis are detectable through observable corporate distress signals. Phase 1 exit criteria: 3 paying customers in vertical #1. [src1, src3]
Prerequisites
- Signal audit complete — all signal sources in the target vertical mapped and scored
- Signal taxonomy defined — trigger events, strength scoring, false positive thresholds validated by domain advisor
- 2-3 pilot customers identified — each with measurable current outreach baselines
- MVP pipeline operational — data ingestion, classification, enrichment, and dossier generation working
- GDPR/PECR/CAN-SPAM compliance confirmed — outbound delivery legal in target jurisdictions
Constraints
- No platform work until 3 paying customers validate vertical #1. The pilot proves the vertical, not the platform. [src1]
- Human-in-the-loop review mandatory for first 100 packages per client. Quality failures in pilot phase kill the engagement. [src2]
- Pilot customers must have measurable current outreach baselines — without baselines, you cannot prove the 2x conversion claim.
- GDPR/PECR/CAN-SPAM compliance required for all outbound delivery. [src2]
- Minimum pilot duration: 4 weeks. Shorter pilots produce statistically unreliable conversion data. [src5]
Tool Selection Decision
Which path?
├── Technical team available (can write Python)
│ ├── Budget > $500/month
│ │ └── PATH A: Full API stack — LLM + Clearbit/Apollo + Resend
│ └── Budget ≤ $500/month
│ └── PATH B: LLM + free enrichment + manual delivery
├── No-code team
│ ├── Budget > $500/month
│ │ └── PATH C: Make/n8n + Clay + LLM API
│ └── Budget ≤ $500/month
│ └── PATH D: Manual pipeline — Google Alerts + ChatGPT + email
└── Hybrid (one developer + ops person)
└── PATH E: Python scrapers + LLM API + manual QA
| Path | Tools | Cost/month | Throughput | Output Quality |
|---|---|---|---|---|
| A: Full API | Python + Claude/GPT-4 + Clearbit + Resend | $500-800 | 10-20/week | Excellent |
| B: Budget API | Python + LLM + Apollo free + manual | $200-400 | 10-15/week | Good |
| C: No-code | Make/n8n + Clay + LLM API | $400-700 | 8-15/week | Good |
| D: Manual | Google Alerts + ChatGPT + manual | $20-50 | 5-10/week | Adequate |
| E: Hybrid | Python + LLM API + manual QA | $300-500 | 10-20/week | Excellent |
Execution Flow
Step 1: Baseline Measurement
Duration: 2-3 days · Tool: CRM export + spreadsheet analysis
Collect current outreach performance metrics from each pilot customer: total outreach volume, open rate, reply rate, meeting-booked rate, close rate, average deal size, and cost per meeting.
Verify: Baseline spreadsheet complete for all pilot customers with 3+ months of historical data. · If failed: Use industry benchmarks (15-25% open, 1-3% reply, 0.5-1% meeting rate).
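The baseline aggregation can be sketched in a few lines. This is a minimal, hypothetical example: the column names (`sent`, `opened`, `replied`, `meetings`, `closed`, `deal_value`, `spend`) are assumptions for illustration and should be mapped to whatever your CRM actually exports.

```python
# Hypothetical sketch: fold a CRM export (list of per-campaign rows) into
# the Step 1 baseline rates. Column names are assumed, not prescribed.

def baseline_metrics(rows):
    """Aggregate per-campaign outreach counts into one baseline per customer."""
    sent = sum(int(r["sent"]) for r in rows)
    opened = sum(int(r["opened"]) for r in rows)
    replied = sum(int(r["replied"]) for r in rows)
    meetings = sum(int(r["meetings"]) for r in rows)
    closed = sum(int(r["closed"]) for r in rows)
    deal_total = sum(float(r["deal_value"]) for r in rows)
    spend = sum(float(r["spend"]) for r in rows)
    return {
        "volume": sent,
        "open_rate": opened / sent,
        "reply_rate": replied / sent,
        "meeting_rate": meetings / sent,
        "close_rate": closed / max(meetings, 1),
        "avg_deal_size": deal_total / max(closed, 1),
        "cost_per_meeting": spend / max(meetings, 1),
    }
```

A customer whose baseline lands outside the industry benchmarks above (15-25% open, 1-3% reply) is worth a second look before the pilot starts, since the 2x claim will be measured against these numbers.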
Step 2: Pipeline Calibration
Duration: 3-5 days · Tool: Python scripts + LLM API + test data
Run the pipeline on a test batch of 50-100 signal events. Classify each as true positive, false positive, or ambiguous. Calculate initial precision rate and calibrate classification thresholds. [src1]
Verify: Precision rate > 60% on test batch. · If failed: Return to taxonomy workshop, add exclusion rules for top 3 false positive patterns.
Step 3: First Dossier Batch (Week 1)
Duration: 5 days (ongoing weekly) · Tool: Full pipeline + human review
Generate 10-20 dossiers containing signal evidence, company profile, decision-maker identification, tailored outreach copy, and proof pack. Human-in-the-loop review scores each dossier 1-5 on accuracy, completeness, relevance, and proof quality. [src2]
Verify: At least 10 dossiers pass quality review (score >= 3). · If failed: Pause delivery, tighten classification, generate new batch.
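The review gate can be expressed as a small check. One assumption to flag: the step's "score >= 3" is interpreted here as the mean of the four review axes, and the data shape (a dict of axis scores per dossier) is hypothetical.

```python
# Hypothetical sketch of the Step 3 quality gate: each dossier is scored
# 1-5 on four axes; the batch ships only if >= 10 dossiers average 3+.
# Axis names follow the step text; the input shape is assumed.

AXES = ("accuracy", "completeness", "relevance", "proof_quality")

def dossier_passes(scores, cutoff=3.0):
    """True if the mean of the four axis scores clears the cutoff."""
    return sum(scores[a] for a in AXES) / len(AXES) >= cutoff

def batch_ships(batch, min_passing=10):
    """Return (ship?, passing count) for a reviewed batch."""
    passing = [d for d in batch if dossier_passes(d)]
    return len(passing) >= min_passing, len(passing)
```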
Step 4: A/B Test Package Formats
Duration: 2 weeks (parallel to delivery) · Tool: Email delivery with variant tracking
Split delivery into 2-3 format variants: full PDF dossier, executive summary email, data-only alert. Track open/reply/meeting rates per variant. [src3]
Verify: Statistical significance on at least one metric after 2 weeks. · If failed: Extend test or reduce to 2 variants.
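One standard way to check significance on a rate metric is a two-proportion z-test, sketched below with only the stdlib. This is an assumption about method, not a prescription; at pilot volumes (10-20 dossiers/week per variant), expect to need the full 2 weeks, which is exactly why the fallback reduces the variant count.

```python
# Hypothetical sketch: two-proportion z-test for whether two package format
# variants differ on a rate metric (e.g. reply rate). Stdlib only.
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: the two variants convert at the same rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def significant(success_a, n_a, success_b, n_b, z_crit=1.96):
    """True if the difference clears p < 0.05, two-sided."""
    return abs(two_proportion_z(success_a, n_a, success_b, n_b)) >= z_crit
```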
Step 5: Weekly Taxonomy Iteration
Duration: 2-4 hours/week (ongoing) · Tool: Spreadsheet analysis + taxonomy update
Review all signals weekly. Classify outcomes as true positive, unknown, or false positive. Update taxonomy rules based on false positive analysis. Target: < 30% false positive rate by week 4. [src1, src2]
Verify: False positive rate decreasing week-over-week. · If failed: Replace weakest signal source or add corroborating signal requirement.
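The weekly trend check can be made mechanical. A sketch, assuming the three outcome labels from the step text; treating "unknown" as unresolved (excluded from the rate) is a judgment call worth making explicit in your own tracking sheet.

```python
# Hypothetical sketch for Step 5: weekly false positive rate from reviewed
# outcomes, plus the week-over-week trend check against the < 30% target.

def fp_rate(outcomes):
    """Share of resolved signals that were false positives; 'unknown' excluded."""
    resolved = [o for o in outcomes if o != "unknown"]
    if not resolved:
        return None
    return resolved.count("false_positive") / len(resolved)

def trend_ok(weekly_rates, week4_target=0.30):
    """Non-increasing week-over-week AND under target by the final week."""
    decreasing = all(a >= b for a, b in zip(weekly_rates, weekly_rates[1:]))
    return decreasing and weekly_rates[-1] < week4_target
```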
Step 6: Conversion Tracking
Duration: Ongoing (weekly) · Tool: CRM integration or manual tracking
Track full funnel: dossier sent → opened → replied → meeting → proposal → close. Compare against baseline. Key metric: > 2x conversion vs. cold outreach baseline. [src3, src5]
Verify: Conversion data for 80%+ of delivered dossiers by week 4. · If failed: Add manual follow-up calls to capture outcomes.
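The funnel roll-up is simple enough to sketch directly. Stage names mirror the step text; the per-dossier record shape (boolean flags per stage) is an assumption for illustration.

```python
# Hypothetical sketch for Step 6: count how many dossiers reached each
# funnel stage, and compute conversion lift against the Step 1 baseline.

STAGES = ("sent", "opened", "replied", "meeting", "proposal", "closed")

def funnel(dossiers):
    """Per-stage counts across a batch of dossier outcome records."""
    return {s: sum(1 for d in dossiers if d.get(s)) for s in STAGES}

def lift(pilot_rate, baseline_rate):
    """Conversion multiple vs. baseline; the pilot gate is > 2x."""
    return pilot_rate / baseline_rate
```

For example, a pilot meeting rate of 2% against a 0.8% cold-outreach baseline is a 2.5x lift, clearing the gate.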
Step 7: Phase 1 Exit Assessment
Duration: 1-2 days · Tool: Analysis + presentation
Compile exit assessment: signal accuracy, conversion vs. baseline, customer satisfaction, unit economics, go/no-go for platform extraction. Phase 1 exit criteria: 3 paying customers in vertical #1. [src1, src2]
Verify: Exit assessment with data-backed recommendation complete. · If failed: Extend pilot by 4 weeks or reassess vertical selection.
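The go/no-go decision can be reduced to explicit gates. A sketch, wiring together the minimum thresholds this playbook already states (3 paying customers, > 2x conversion, > 60% precision, > 3.0/5 quality); the function signature is an assumption, and a real assessment would add the qualitative inputs (customer satisfaction, unit economics) the step lists.

```python
# Hypothetical sketch of the Step 7 go/no-go check. Threshold values come
# from this playbook's exit criteria and Quality Benchmarks table.

def phase1_go(paying_customers, conversion_lift, precision, avg_quality):
    """Go only if every minimum gate is met; returns (decision, failed gates)."""
    gates = {
        "3 paying customers": paying_customers >= 3,
        "> 2x conversion vs baseline": conversion_lift > 2.0,
        "> 60% signal precision": precision > 0.60,
        "> 3.0/5 avg dossier quality": avg_quality > 3.0,
    }
    failures = [name for name, ok in gates.items() if not ok]
    return (len(failures) == 0, failures)
```

Listing the failed gates, rather than returning a bare boolean, makes the "extend pilot or reassess vertical" fallback a data-backed conversation.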
Output Schema
{
"output_type": "pilot_performance_report",
"format": "spreadsheet + PDF summary",
"sections": [
{"name": "baseline_metrics", "type": "object", "description": "Pre-pilot outreach performance per customer"},
{"name": "weekly_dossier_batches", "type": "array", "description": "Dossier count, quality scores, delivery per week"},
{"name": "signal_accuracy", "type": "object", "description": "Precision rate, false positive rate, taxonomy iterations"},
{"name": "conversion_funnel", "type": "object", "description": "Open/reply/meeting/close rates vs baseline"},
{"name": "ab_test_results", "type": "object", "description": "Package format variant performance"},
{"name": "exit_assessment", "type": "object", "description": "Go/no-go for platform extraction"}
]
}
Quality Benchmarks
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Dossier volume (per week) | >= 10 | >= 15 | >= 20 |
| Signal precision rate | > 60% | > 75% | > 85% |
| Dossier quality score (avg) | > 3.0/5 | > 3.5/5 | > 4.0/5 |
| Conversion vs. baseline | > 1.5x | > 2x | > 3x |
| False positive trend | Flat | Decreasing | < 20% by week 4 |
| Pilot customer retention | 2/3 continue | 3/3 continue | 3/3 convert to paid |
If below minimum: Pause delivery, return to taxonomy calibration, extend pilot by 2 weeks.
Error Handling
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Precision < 60% after calibration | Taxonomy too broad or sources too noisy | Narrow to top 2-3 sources, add compound signal requirement |
| Pilot customer stops responding | Dossier quality too low | Direct outreach, request feedback, pivot contact |
| Volume < 10/week | Low event frequency in vertical | Add signal sources or broaden geographic scope |
| A/B test inconclusive | Volume too low | Extend test period or reduce variants |
| Compliance flag | PECR/CAN-SPAM violation | Pause delivery, review compliance, switch to opt-in only |
| LLM classification degrades | Prompt drift or model update | Re-calibrate prompts, pin model version |
Cost Breakdown
| Component | Budget ($5K) | Standard ($10K) | Premium ($15K) |
|---|---|---|---|
| Signal Architect labor | $2K | $4K | $6K |
| LLM API costs | $500 | $1K | $1.5K |
| Enrichment APIs | $300 | $600 | $1K |
| Delivery infrastructure | $200 | $400 | $500 |
| Domain advisor | $0 | $2K | $3K |
| QA and iteration | $1K | $2K | $3K |
| Total (4-week pilot) | $4K-$5K | $10K | $15K |
Anti-Patterns
Wrong: Optimizing for volume over quality in week 1
Pushing 20+ dossiers before signal accuracy is validated. Result: pilot customers receive irrelevant dossiers, lose trust, and the pilot fails. [src2]
Correct: Cap at 10 dossiers week 1, scale after quality is validated
Deliver 10 human-reviewed dossiers in week 1. Only increase volume after customer confirms relevance on >= 7/10.
Wrong: Skipping baseline measurement
Starting without documenting current outreach performance. Result: you cannot prove the 2x conversion claim. [src3]
Correct: Measure before you move
Spend 2-3 days collecting 3+ months of historical outreach data before delivering a single dossier.
Wrong: Not iterating on the taxonomy weekly
Treating the initial taxonomy as fixed. Result: false positive rate stays high, quality plateaus, customers churn. [src1]
Correct: Weekly taxonomy iteration is non-negotiable
Review every false positive weekly. Update classification rules and document changes. The taxonomy should visibly improve each week.
When This Matters
Use when an agent needs to execute or plan a Signal Stack pilot engagement. This is the hands-on delivery recipe — it takes a completed signal audit and taxonomy and turns them into measurable results. The pilot is the validation gate: if it does not produce 3 paying customers with > 2x conversion, do not proceed to platform extraction.