This recipe builds the minimum viable signal intelligence pipeline: cron-scheduled data ingestion, LLM classification using validated taxonomy, company enrichment, PDF dossier generation, and email delivery with tracking. No UI required. Target: functional in 2-4 weeks with >2x conversion improvement over cold outreach. [src1, src4]
Which stack?
├── Python + Claude API (recommended)
│   └── PATH A: Best classification accuracy
├── Python + GPT-4 API
│   └── PATH B: Good accuracy, wider ecosystem
├── Python + local LLM
│   └── PATH C: No API costs, lower accuracy
└── Node.js + any LLM API
    └── PATH D: Alternative runtime
| Path | Stack | Monthly Cost | Classification Quality | Setup Complexity |
|---|---|---|---|---|
| A: Python + Claude | Python, Claude API, Clearbit | $300-$800 | Excellent | Low |
| B: Python + GPT-4 | Python, GPT-4 API, Clearbit | $300-$800 | Good | Low |
| C: Python + Local | Python, Llama/Mixtral | $100-$500 | Adequate | High |
| D: Node.js | Node.js, any LLM | $300-$800 | Varies | Low |
Duration: 3-5 days · Tool: Python + requests/httpx + cron
Build API integrations for 3-5 data sources. Implement rate limiting, retry logic, date-filtered queries. Store raw data as JSON/SQLite. Schedule via cron. Add logging per source. [src2]
Verify: Each source pulls data for 3 consecutive runs. Error handling logs failures without crashing. · If failed: Check auth, rate limits, data format. Test with curl first.
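The retry logic is the piece most often skipped and the most common cause of a crashed nightly run. A minimal sketch of exponential backoff with jitter, assuming `fetch` is any callable that pulls one source (the function name and delay values are placeholders, not a specific source's API):

```python
import random
import time

def fetch_with_retry(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying on any exception with exponential backoff plus jitter.

    The delay doubles each attempt (1s, 2s, 4s, ...); the random jitter avoids
    hammering a rate-limited API in lockstep when several sources fail at once.
    Re-raises the last exception after max_retries so the caller can log it.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"[ingest] attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

Wrap each source's pull in this, e.g. `fetch_with_retry(lambda: httpx.get(url).json())`, so one flaky endpoint degrades to a logged failure instead of killing the cron run.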
Duration: 3-5 days · Tool: Python + LLM API + taxonomy JSON
Load taxonomy, construct classification prompts with taxonomy rules as system context and raw records as input. Parse structured JSON output. Apply composite scoring. Filter above threshold. Log all classifications. Include 2-3 few-shot examples. [src2]
Verify: >70% accuracy on 100 test records. Output parsing >95% success. · If failed: Refine prompt structure and few-shot examples.
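The fragile part is output parsing: models wrap JSON in prose or markdown fences. A defensive parser plus a composite scorer, as a sketch — the dimensions `fit`, `intent`, `timing` and their weights are illustrative placeholders, not the workshop taxonomy:

```python
import json

def parse_classification(raw_text):
    """Extract the first JSON object from model output.

    Tolerates prose or ```json fences around the object; returns None when no
    valid JSON is found, so the caller can count failures and retry with a
    stricter prompt instead of crashing mid-batch.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Illustrative weights -- replace with the validated taxonomy's scoring rules.
DEFAULT_WEIGHTS = {"fit": 0.5, "intent": 0.3, "timing": 0.2}

def composite_score(result, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-dimension scores (each 0-1); missing fields score 0."""
    return sum(w * float(result.get(k, 0)) for k, w in weights.items())
```

Records whose `composite_score` clears the threshold move on to enrichment; log the raw model text alongside the parsed result so misclassifications can be audited later.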
Duration: 2-3 days · Tool: Python + Clearbit/Apollo API
Query enrichment API for classified companies: firmographics, contacts, technology stack. Handle partial data gracefully. Cache results. Track coverage percentage. [src3]
Verify: Enrichment returns data for >60% of companies. Contacts for >50%. · If failed: Add secondary source or manual LinkedIn lookup.
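Caching and coverage tracking can be sketched as follows. The enrichment call itself is injected, since Clearbit and Apollo response shapes differ; the normalized field names here are illustrative:

```python
class EnrichmentCache:
    """In-memory cache around an enrichment call; swap for SQLite in production.

    fetch(domain) should return a dict (possibly partial) or None. Partial
    responses are normalized with .get() so downstream code never KeyErrors
    on a small private company with thin data.
    """

    def __init__(self, fetch):
        self.fetch = fetch
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def enrich(self, domain):
        if domain in self._cache:
            self.hits += 1
            return self._cache[domain]
        self.misses += 1
        raw = self.fetch(domain) or {}
        record = {
            "domain": domain,
            "name": raw.get("name"),
            "employees": raw.get("employees"),
            "contacts": raw.get("contacts", []),
        }
        self._cache[domain] = record
        return record

def coverage(records, field):
    """Fraction of records with a non-empty value for `field` -- the metric to track."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field)) / len(records)
```

Run `coverage(all_records, "contacts")` after each batch; when it dips below the 50% threshold, that is the signal to bolt on a secondary source.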
Duration: 2-3 days · Tool: Python + LLM API + PDF generation
Design 1-2 page PDF template. LLM generates narrative summary with outreach angle. Populate with structured data + narrative. Include signal evidence with source links. [src4]
Verify: Sample reviewed by sales team lead. Readable, clear evidence, actionable. · If failed: Iterate template with sales input.
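One workable route is rendering to HTML first and converting with WeasyPrint or wkhtmltopdf, since HTML templates are fast to iterate with the sales team. A sketch of the rendering side — the layout, CSS, and field names are placeholders to refine during review:

```python
from html import escape

# Doubled braces ({{ }}) keep the CSS literal when str.format fills the template.
PAGE = """<html><head><style>
body {{ font-family: sans-serif; max-width: 700px; margin: 2em auto; }}
.signal {{ margin-bottom: 0.5em; }}
</style></head><body>
<h1>{company}</h1>
<p><em>{angle}</em></p>
<p>{narrative}</p>
<h2>Signal evidence</h2>
<ul>{signals}</ul>
</body></html>"""

def render_dossier(company, angle, narrative, signals):
    """Fill the template; every signal keeps a link back to its source."""
    items = "".join(
        '<li class="signal">{} (<a href="{}">source</a>)</li>'.format(
            escape(s["summary"]), escape(s["url"], quote=True))
        for s in signals
    )
    return PAGE.format(company=escape(company), angle=escape(angle),
                       narrative=escape(narrative), signals=items)
```

The LLM supplies `angle` and `narrative`; everything else comes from the structured enrichment record, which keeps hallucinated facts out of the evidence list.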
Duration: 2-3 days · Tool: Python + SendGrid/Resend
Configure email templates, batch delivery scheduling, open/click tracking, delivery logging, unsubscribe mechanism. [src4]
Verify: Test emails delivered to 5 recipients. Tracking functional. PDFs render. · If failed: Check SPF/DKIM/DMARC for spam issues.
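Building the request body separately from the send call makes delivery testable without hitting the API. A sketch in the shape of SendGrid's v3 mail-send endpoint — verify the tracking settings against your account's docs, and treat all addresses here as placeholders:

```python
import base64

SENDGRID_URL = "https://api.sendgrid.com/v3/mail/send"

def build_payload(to_email, from_email, subject, body_text, pdf_bytes, pdf_name):
    """Build a SendGrid v3 mail-send body with a PDF attachment and tracking enabled."""
    return {
        "personalizations": [{"to": [{"email": to_email}]}],
        "from": {"email": from_email},
        "subject": subject,
        "content": [{"type": "text/plain", "value": body_text}],
        "attachments": [{
            # SendGrid expects attachment content as base64 text.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "filename": pdf_name,
            "type": "application/pdf",
            "disposition": "attachment",
        }],
        "tracking_settings": {
            "open_tracking": {"enable": True},
            "click_tracking": {"enable": True},
        },
    }
```

POST this with an `Authorization: Bearer <api_key>` header via requests/httpx; log the response status per recipient so the delivery-rate metric has data to draw on.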
Duration: 2-3 days · Tool: Cron + logging + alerting
Wire modules together: ingest → classify → enrich → generate → deliver. Add health monitoring, alerting on failures, tracking spreadsheet. [src5]
Verify: Full pipeline runs 3 consecutive days without intervention. Alerts fire on simulated failures. · If failed: Add retry logic, increase timeouts.
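The wiring itself can stay this simple: a per-module try/except that alerts and halts, since downstream stages depend on upstream output. The `alert` callback is a placeholder for whatever the team actually watches (Slack webhook, email):

```python
import traceback

def run_pipeline(stages, alert):
    """Run (name, fn) stages in order, piping each stage's output to the next.

    A stage failure alerts with the full traceback and halts the run --
    continuing past a failed ingest would silently deliver empty dossiers,
    which is exactly the failure mode this step exists to prevent.
    """
    data = None
    for name, stage in stages:
        try:
            data = stage(data)
            print(f"[pipeline] {name}: ok")
        except Exception:
            alert(f"{name} failed:\n{traceback.format_exc()}")
            return False
    return True
```

Call it from the cron entry point with `stages = [("ingest", ingest), ("classify", classify), ...]` and exit non-zero on `False` so external monitoring also sees the failure.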
{
"output_type": "mvp_pipeline",
"format": "deployed code + documentation",
"sections": [
{"name": "ingestion_module", "type": "object", "description": "Data source integrations with cron"},
{"name": "classification_module", "type": "object", "description": "LLM classification with taxonomy"},
{"name": "enrichment_module", "type": "object", "description": "Company/contact enrichment"},
{"name": "dossier_generator", "type": "object", "description": "PDF generation with narrative"},
{"name": "delivery_module", "type": "object", "description": "Email delivery with tracking"},
{"name": "monitoring", "type": "object", "description": "Health metrics and alerting"}
]
}
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Pipeline uptime (daily runs) | > 90% | > 95% | > 99% |
| Classification accuracy | > 70% | > 80% | > 90% |
| Enrichment coverage | > 60% | > 75% | > 90% |
| Dossier delivery rate | > 95% | > 98% | > 99% |
| Conversion vs cold outreach | > 1.5x | > 2x | > 3x |
If below minimum: Identify bottleneck module, increase logging detail, fix before scaling.
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Data source returns 429 | Rate limit exceeded | Exponential backoff; reduce frequency |
| LLM returns unparseable output | Prompt format issue | Output validation + retry with stricter format |
| Enrichment coverage < 50% | Small/private companies | Add secondary source or manual lookup |
| Emails in spam | Domain reputation | Verify SPF/DKIM/DMARC; warm up domain |
| Silent pipeline failure | Insufficient error handling | Try/catch per module with alerting |
| Component | Lean ($5K-$8K) | Standard ($8K-$12K) | Full ($12K-$18K) |
|---|---|---|---|
| Data ingestion | $1.5K-$2.5K | $2.5K-$4K | $4K-$6K |
| LLM classification | $1K-$1.5K | $1.5K-$2.5K | $2.5K-$4K |
| Enrichment | $500-$1K | $1K-$1.5K | $1.5K-$2.5K |
| Dossier generation | $800-$1.2K | $1.2K-$2K | $2K-$3K |
| Delivery + tracking | $500-$800 | $800-$1.2K | $1.2K-$2K |
| Monitoring | $500-$800 | $800-$1.2K | $1.2K-$1.5K |
| Total build | $5K-$8K | $8K-$12K | $12K-$18K |
| Monthly running | $300-$500 | $500-$1K | $1K-$2K |
Spending weeks on a dashboard before confirming dossier value. Result: pretty interface, worthless content. [src1]
Sales teams live in email. Add UI only after confirming >2x conversion improvement.
Writing ad hoc logic instead of using workshop output. Result: developer assumptions, not domain expertise. [src2]
Load the JSON schema from the workshop. Modifications only after revalidation with domain expert.
PostgreSQL, Redis, message queues for 50 records/day. Result: 3x build time, delayed validation. [src3]
Flat files or SQLite. Migrate to proper database only after volume justifies complexity.
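A flat SQLite store is a handful of lines and comfortably covers this volume for months (table and column names here are illustrative):

```python
import json
import sqlite3

def open_store(path="signals.db"):
    """One table, raw JSON per row -- queryable, durable, zero ops overhead."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS raw_signals (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        pulled_at TEXT DEFAULT (datetime('now')),
        payload TEXT NOT NULL)""")
    return conn

def store_raw(conn, source, record):
    """Append one raw record as JSON; parsing stays downstream, ingest stays dumb."""
    conn.execute("INSERT INTO raw_signals (source, payload) VALUES (?, ?)",
                 (source, json.dumps(record)))
    conn.commit()
```

Keeping the payload as opaque JSON means schema changes in a data source never break ingestion; only the classification step needs to care about fields.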
Use when building the technical pipeline that converts identified signals into delivered sales dossiers. This is Phase 3 of the Signal Stack engagement — it implements the validated taxonomy into a functional automated system.