This recipe executes a systematic audit of all available data sources that could provide intent signals for a target industry vertical. It produces a scored inventory covering regulatory databases, behavioral data sources, visual signals, and unstructured media — enabling a go/no-go decision on vertical viability. [src1, src4]
Which audit depth?
├── Quick assessment (3-5 days)
│   └── PATH A: Desktop research only
├── Standard audit (5-10 days)
│   └── PATH B: Desktop + API testing
├── Deep audit (10-15 days)
│   └── PATH C: Desktop + API testing + vendor interviews
└── Competitive audit
    └── PATH D: Standard + competitor signal analysis
| Path | Scope | Cost | Speed | Confidence |
|---|---|---|---|---|
| A: Quick | Surface-level identification | $2K-$3K | 3-5 days | Moderate |
| B: Standard | Identification + quality verification | $3K-$5K | 5-10 days | High |
| C: Deep | Full evaluation + vendor negotiation | $5K-$8K | 10-15 days | Very high |
| D: Competitive | Standard + competitor analysis | $4K-$7K | 7-12 days | High |
Duration: 1-2 days · Tool: Web research + government database directories
Identify all regulatory and government databases relevant to the target vertical: EPA, FDA, OSHA, SEC, state licensing boards, building permits, zoning databases. Document agency, URL, data format, update frequency, geographic coverage. Score each on accessibility (1-5), cost (1-5), refresh rate (1-5), signal-to-noise (1-5). [src1]
Verify: Minimum 5 regulatory sources identified and scored. · If failed: Vertical is lightly regulated — shift weight to behavioral sources.
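The four-dimension scoring and the minimum-count check can be sketched as a small record type. This is a minimal illustration, not part of the recipe itself: the names `SourceScore` and `regulatory_check` and the `category` labels are hypothetical, and the only rules taken from the text are the 1-5 range per dimension and the minimum of 5 regulatory sources.

```python
from dataclasses import dataclass

DIMENSIONS = ("accessibility", "cost", "refresh_rate", "signal_to_noise")


@dataclass(frozen=True)
class SourceScore:
    """One inventory row (hypothetical structure for illustration)."""
    name: str
    category: str  # assumed labels: "regulatory", "behavioral", "visual", "unstructured"
    accessibility: int
    cost: int
    refresh_rate: int
    signal_to_noise: int

    def __post_init__(self):
        # Every dimension is scored on the 1-5 scale described in the step.
        for dim in DIMENSIONS:
            value = getattr(self, dim)
            if not 1 <= value <= 5:
                raise ValueError(f"{dim} must be between 1 and 5, got {value}")


def regulatory_check(sources, minimum=5):
    """True when at least `minimum` regulatory sources have been scored."""
    count = sum(1 for s in sources if s.category == "regulatory")
    return count >= minimum
```

If `regulatory_check` returns `False`, the recipe's fallback applies: treat the vertical as lightly regulated and shift weight to behavioral sources.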
Duration: 1-2 days · Tool: Web research + API documentation review
Identify behavioral sources: DNS/WHOIS changes, job board postings, review site activity, app store data, patent filings, press releases, conference speaker lists. Assess accessibility, cost, refresh rate, signal-to-noise for each. [src2, src5]
Verify: Minimum 5 behavioral sources, at least 2 with API access confirmed. · If failed: Vertical may lack digital footprint for automation.
Duration: 0.5-1 day · Tool: Satellite/street imagery platform evaluation
Evaluate visual signals: satellite imagery, street-level imagery, aerial photography. Highly vertical-dependent — skip for purely digital verticals. Note: visual processing requires specialized ML models ($2K-$10K development). [src4]
Verify: Visual relevance determined or documented as “not applicable.” · If failed: Visual signals are optional — continue.
Duration: 1-2 days · Tool: Media monitoring platform evaluation
Identify text and media sources: industry publications, trade journals, conference proceedings, podcast transcripts, social media, forums. Assess volume, relevance density, extraction difficulty, timeliness. [src1, src2]
Verify: Minimum 5 unstructured sources, at least 2 text-based. · If failed: Budget additional transcription costs for audio/video sources.
Duration: 1 day · Tool: Spreadsheet + scoring framework
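Compile all sources into a single scored inventory. For each source, Composite = (Accessibility × 0.30) + (Cost × 0.20) + (Refresh Rate × 0.25) + (SNR × 0.25); because the weights sum to 1.0 and each dimension is scored 1-5, the composite also falls between 1 and 5. Plot the sources on a 2×2 priority matrix (quality vs. accessibility). Calculate the overall viability score (0.0-1.0). [src5]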
Verify: All sources scored. Priority matrix generated. Viability score calculated. · If failed: If viability < 0.60, recommend pivot.
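The composite formula is simple enough to check by hand, but a short sketch makes the weighting explicit. The weights come directly from the step above. The `viability` function is an assumption: the recipe does not say how the 0.0-1.0 viability score is derived, so this sketch averages the composites of the top-ranked sources and rescales from the 1-5 range to 0-1. Treat it as one plausible reading, not the recipe's definition.

```python
# Weights as given in the scoring step; they sum to 1.0.
WEIGHTS = {
    "accessibility": 0.30,
    "cost": 0.20,
    "refresh_rate": 0.25,
    "signal_to_noise": 0.25,
}


def composite_score(scores):
    """Weighted composite on the same 1-5 scale as the input dimensions."""
    return sum(weight * scores[dim] for dim, weight in WEIGHTS.items())


def viability(inventory, top_n=10):
    """ASSUMPTION: mean composite of the top_n sources, rescaled from 1-5 to 0-1."""
    ranked = sorted((composite_score(s) for s in inventory), reverse=True)[:top_n]
    if not ranked:
        return 0.0
    mean = sum(ranked) / len(ranked)
    return (mean - 1) / 4


example = {"accessibility": 4, "cost": 3, "refresh_rate": 5, "signal_to_noise": 4}
print(round(composite_score(example), 2))  # 4.05
```

Under this assumed rescaling, the 0.60 pivot threshold corresponds to a mean composite of 3.4 across the top sources.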
Duration: 0.5-1 day · Tool: Document generation
Produce report: executive summary, source inventory, priority matrix, cost projection, risk assessment, go/no-go recommendation.
Verify: Report reviewed, recommendation clearly stated. · If failed: Request domain expert input before finalizing.
{
  "output_type": "signal_source_audit",
  "format": "spreadsheet + document",
  "sections": [
    {"name": "source_inventory", "type": "array", "description": "All sources with 4-dimension scoring"},
    {"name": "priority_matrix", "type": "object", "description": "2x2 quality vs accessibility"},
    {"name": "viability_score", "type": "number", "description": "Overall vertical viability 0.0-1.0"},
    {"name": "cost_projection", "type": "object", "description": "Monthly cost for top 10 sources"},
    {"name": "risk_assessment", "type": "array", "description": "Legal, reliability, dependency risks"},
    {"name": "recommendation", "type": "string", "description": "Go/no-go with rationale"}
  ]
}
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Total sources identified | > 15 | > 25 | > 40 |
| Sources with API access | > 3 | > 8 | > 15 |
| Signal categories covered | 3 of 4 | 4 of 4 | 4 of 4 + niche |
| Cost accuracy (vs actual) | Within 50% | Within 25% | Within 10% |
| Refresh rate verified (3 cycles) | > 50% | > 75% | > 90% |
If below minimum: Extend the audit by 2-3 days, or conclude that the vertical lacks sufficient signal density.
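The count-based rows of the quality table can be graded mechanically. The sketch below covers only the three "greater than" metrics (total sources, API sources, refresh rate verified); the metric keys and the `grade` function are hypothetical names, and the cost-accuracy and category-coverage rows are left out because they use different comparison rules.

```python
# Thresholds copied from the quality table: (minimum acceptable, good, excellent).
THRESHOLDS = {
    "total_sources": (15, 25, 40),
    "api_sources": (3, 8, 15),
    "refresh_verified_pct": (50, 75, 90),
}
LABELS = ("below minimum", "minimum acceptable", "good", "excellent")


def grade(metric, value):
    """Return the quality tier for a metric; the table uses strict '>' comparisons."""
    tiers = THRESHOLDS[metric]
    # Count how many thresholds the value strictly exceeds (0 to 3).
    return LABELS[sum(value > t for t in tiers)]


print(grade("total_sources", 27))  # good
```

A "below minimum" result on any metric would trigger the fallback described above.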
| Error | Likely Cause | Recovery Action |
|---|---|---|
| No regulatory databases found | Lightly regulated vertical | Shift weight to behavioral/media sources |
| API access denied during testing | Rate limits or auth required | Contact vendor for eval access; estimate from docs |
| Inconsistent refresh rate | Irregular publication schedule | Use minimum observed frequency; flag reliability risk |
| Cost info unavailable | Enterprise pricing, requires sales call | Use comparable source pricing as estimate |
| Fewer than 15 sources total | Limited digital footprint | Recommend paid supplements or vertical pivot |
| Component | Quick ($2K-$3K) | Standard ($3K-$5K) | Deep ($5K-$8K) |
|---|---|---|---|
| Regulatory inventory | $500-$800 | $800-$1.2K | $1.2K-$2K |
| Behavioral mapping | $500-$800 | $800-$1.2K | $1.2K-$2K |
| Visual + unstructured | $300-$500 | $500-$800 | $800-$1.2K |
| Scoring + ranking | $300-$500 | $500-$800 | $800-$1.2K |
| Report | $400 | $400-$800 | $800-$1.5K |
| Total | $2K-$3K | $3K-$5K | $5K-$8K |
Listing 30 sources without quality assessment. Result: pipeline built on unreliable sources fails in month one. [src1]
Each source gets accessibility, cost, refresh rate, and signal-to-noise ratings. Composite scores drive prioritization.
Identifying sources requiring TOS-violating scraping. Result: cease-and-desist letters mid-engagement. [src3]
Confirm public access, official API, or commercial licensing for each source. Document access method and legal basis.
Checking a source once and assuming consistent updates. Result: pipeline depends on irregularly updated source. [src4]
Monitor top-priority sources across at least 3 update cycles before committing pipeline dependency.
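The three-cycle check can be sketched from a list of observed update dates. Several details here are assumptions rather than part of the recipe: a "cycle" is taken to mean the gap between two consecutive observed updates (so three cycles need four observations), the name `refresh_profile` is hypothetical, and the `irregular_ratio` cutoff of 2.0 is an arbitrary illustrative value. Reporting the longest gap follows the recovery advice to use the minimum observed frequency.

```python
from datetime import date


def refresh_profile(update_dates, min_cycles=3, irregular_ratio=2.0):
    """Summarize observed update gaps for one source.

    ASSUMPTIONS: a cycle is the gap between consecutive distinct update dates;
    irregular_ratio is an illustrative cutoff, not a value from the recipe.
    """
    ordered = sorted(set(update_dates))
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < min_cycles:
        # Not enough history to commit a pipeline dependency yet.
        return {"verified": False, "cycles": len(gaps)}
    longest, shortest = max(gaps), min(gaps)
    return {
        "verified": True,
        "cycles": len(gaps),
        # Longest gap = minimum observed update frequency (the conservative estimate).
        "worst_case_days": longest,
        "irregular": longest / shortest > irregular_ratio,
    }


weekly = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 22)]
print(refresh_profile(weekly))
```

A source flagged as irregular would carry a reliability risk in the report rather than being dropped outright.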
Use when evaluating whether a target vertical has sufficient signal density for automated intelligence. This is Phase 1 of the Signal Stack engagement — its output drives the go/no-go decision for taxonomy design and pipeline construction.