Signal Source Audit
How do you audit industry signal sources across regulatory, behavioral, visual, and unstructured media categories?
Purpose
This recipe executes a systematic audit of all available data sources that could provide intent signals for a target industry vertical. It produces a scored inventory covering regulatory databases, behavioral data sources, visual signals, and unstructured media — enabling a go/no-go decision on vertical viability. [src1, src4]
Prerequisites
- Target vertical defined with clear industry boundaries (SIC/NAICS codes or equivalent)
- Geographic scope determined — US, EU, global, or specific markets
- Budget parameters established for data source licensing
- Existing data sources inventoried to avoid duplicate evaluation
Constraints
- Only evaluate publicly accessible or commercially licensable sources. Grey-area scraping creates legal liability. [src3]
- Every source must be scored on all 4 dimensions (accessibility, cost, refresh rate, signal-to-noise). Partial scoring invalidates the viability calculation.
- Refresh rate assessment requires verifying at least 3 consecutive update cycles.
- Cost estimates must include access fees and processing/storage costs. [src4]
- Minimum 15 sources for vertical viability threshold.
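The two mechanical constraints above (complete 4-dimension scoring and the 15-source minimum) could be checked automatically before any scoring math runs. The following is a minimal sketch, not a prescribed implementation: the record layout (`name`, `scores`) and the function name `constraint_violations` are hypothetical, chosen only for illustration.

```python
# Hypothetical record layout: {"name": str, "scores": {dimension: int 1-5}}
DIMENSIONS = ("accessibility", "cost", "refresh_rate", "signal_to_noise")
MIN_SOURCES = 15  # vertical viability threshold from the Constraints list


def constraint_violations(sources: list[dict]) -> list[str]:
    """Return readable violations; an empty list means the inventory passes."""
    problems = []
    if len(sources) < MIN_SOURCES:
        problems.append(f"only {len(sources)} sources; need at least {MIN_SOURCES}")
    for src in sources:
        scores = src.get("scores", {})
        for dim in DIMENSIONS:
            value = scores.get(dim)
            # A missing or out-of-range score counts as partial scoring.
            if not isinstance(value, int) or not 1 <= value <= 5:
                problems.append(f"{src.get('name', '?')}: missing or invalid {dim} score")
    return problems
```

Running this check before Step 5 keeps partially scored sources out of the composite calculation.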
Tool Selection Decision
Which audit depth?
├── Quick assessment (3-5 days)
│   └── PATH A: Desktop research only
├── Standard audit (5-10 days)
│   └── PATH B: Desktop + API testing
├── Deep audit (10-15 days)
│   └── PATH C: Desktop + API testing + vendor interviews
└── Competitive audit (7-12 days)
    └── PATH D: Standard + competitor signal analysis
| Path | Scope | Cost | Speed | Confidence |
|---|---|---|---|---|
| A: Quick | Surface-level identification | $2K-$3K | 3-5 days | Moderate |
| B: Standard | Identification + quality verification | $3K-$5K | 5-10 days | High |
| C: Deep | Full evaluation + vendor negotiation | $5K-$8K | 10-15 days | Very high |
| D: Competitive | Standard + competitor analysis | $4K-$7K | 7-12 days | High |
Execution Flow
Step 1: Inventory Regulatory Databases
Duration: 1-2 days · Tool: Web research + government database directories
Identify all regulatory and government databases relevant to the target vertical: EPA, FDA, OSHA, SEC, state licensing boards, building permits, zoning databases. Document agency, URL, data format, update frequency, geographic coverage. Score each on accessibility (1-5), cost (1-5), refresh rate (1-5), signal-to-noise (1-5). [src1]
Verify: Minimum 5 regulatory sources identified and scored. · If failed: Vertical is lightly regulated — shift weight to behavioral sources.
Step 2: Map Behavioral Data Sources
Duration: 1-2 days · Tool: Web research + API documentation review
Identify behavioral sources: DNS/WHOIS changes, job board postings, review site activity, app store data, patent filings, press releases, conference speaker lists. Assess accessibility, cost, refresh rate, signal-to-noise for each. [src2, src5]
Verify: Minimum 5 behavioral sources, at least 2 with API access confirmed. · If failed: Vertical may lack digital footprint for automation.
Step 3: Assess Visual Signal Availability
Duration: 0.5-1 day · Tool: Satellite/street imagery platform evaluation
Evaluate visual signals: satellite imagery, street-level imagery, aerial photography. Highly vertical-dependent — skip for purely digital verticals. Note: visual processing requires specialized ML models ($2K-$10K development). [src4]
Verify: Visual relevance determined or documented as “not applicable.” · If failed: Visual signals are optional — continue.
Step 4: Identify Unstructured Media Sources
Duration: 1-2 days · Tool: Media monitoring platform evaluation
Identify text and media sources: industry publications, trade journals, conference proceedings, podcast transcripts, social media, forums. Assess volume, relevance density, extraction difficulty, timeliness. [src1, src2]
Verify: Minimum 5 unstructured sources, at least 2 text-based. · If failed: Budget additional transcription costs for audio/video sources.
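The Verify gates in Steps 1 through 4 reduce to a handful of counts, which can be sketched as below. This is an illustrative sketch under stated assumptions: the field names `category`, `api_confirmed`, and `text_based` are hypothetical, and visual sources are left ungated because Step 3 treats them as optional.

```python
from collections import Counter

# Minimum source counts per category, taken from the Verify lines of Steps 1, 2, and 4.
MIN_PER_CATEGORY = {"regulatory": 5, "behavioral": 5, "unstructured": 5}


def gate_failures(sources: list[dict]) -> list[str]:
    """Return the Verify gates that the current inventory does not meet."""
    counts = Counter(s["category"] for s in sources)
    failures = [
        f"{cat}: {counts[cat]} of {need} required sources"
        for cat, need in MIN_PER_CATEGORY.items()
        if counts[cat] < need
    ]
    # Step 2: at least 2 behavioral sources with API access confirmed.
    api_ok = sum(1 for s in sources
                 if s["category"] == "behavioral" and s.get("api_confirmed"))
    if api_ok < 2:
        failures.append(f"behavioral: {api_ok} of 2 required API-confirmed sources")
    # Step 4: at least 2 text-based unstructured sources.
    text_ok = sum(1 for s in sources
                  if s["category"] == "unstructured" and s.get("text_based"))
    if text_ok < 2:
        failures.append(f"unstructured: {text_ok} of 2 required text-based sources")
    return failures
```

A non-empty result maps directly onto the "If failed" guidance for the corresponding step.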
Step 5: Score and Rank All Sources
Duration: 1 day · Tool: Spreadsheet + scoring framework
Compile all sources into a single scored inventory. Composite = (Accessibility × 0.30) + (Cost × 0.20) + (Refresh Rate × 0.25) + (Signal-to-Noise × 0.25). Plot each source on a 2×2 priority matrix (quality vs accessibility). Calculate the overall viability score. [src5]
Verify: All sources scored. Priority matrix generated. Viability score calculated. · If failed: Fill scoring gaps before proceeding; if viability < 0.60, recommend a vertical pivot.
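The composite formula in Step 5 is simple enough to sketch directly. Only the four weights come from this recipe. Everything else here is an assumption made for illustration: the recipe does not define how the 0.0-1.0 viability score is derived, so the sketch rescales the mean composite from the 1-5 range; it also treats "quality" on the priority matrix as the mean of refresh rate and signal-to-noise, with a cutoff at the scale midpoint.

```python
# Weights come from the composite formula in Step 5.
WEIGHTS = {
    "accessibility": 0.30,
    "cost": 0.20,
    "refresh_rate": 0.25,
    "signal_to_noise": 0.25,
}


def composite(scores: dict) -> float:
    """Weighted composite on the same 1-5 scale as the inputs."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())


def quadrant(scores: dict, cutoff: float = 3.0) -> str:
    # ASSUMPTION: "quality" is the mean of refresh rate and signal-to-noise,
    # and 3.0 (the scale midpoint) splits high from low on both axes.
    quality = (scores["refresh_rate"] + scores["signal_to_noise"]) / 2
    q = "high" if quality >= cutoff else "low"
    a = "high" if scores["accessibility"] >= cutoff else "low"
    return f"{q} quality / {a} accessibility"


def viability(all_scores: list[dict]) -> float:
    # ASSUMPTION: viability is the mean composite rescaled from 1-5 onto 0.0-1.0.
    mean = sum(composite(s) for s in all_scores) / len(all_scores)
    return (mean - 1) / 4


example = {"accessibility": 5, "cost": 3, "refresh_rate": 4, "signal_to_noise": 4}
print(round(composite(example), 2))    # 4.1
print(quadrant(example))               # high quality / high accessibility
print(round(viability([example]), 3))  # 0.775
```

If the engagement uses a different viability definition (for example, one that also weights source count), only `viability` needs to change; the composite stays as specified.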
Step 6: Deliver Audit Report
Duration: 0.5-1 day · Tool: Document generation
Produce report: executive summary, source inventory, priority matrix, cost projection, risk assessment, go/no-go recommendation.
Verify: Report reviewed, recommendation clearly stated. · If failed: Request domain expert input before finalizing.
Output Schema
{
  "output_type": "signal_source_audit",
  "format": "spreadsheet + document",
  "sections": [
    {"name": "source_inventory", "type": "array", "description": "All sources with 4-dimension scoring"},
    {"name": "priority_matrix", "type": "object", "description": "2x2 quality vs accessibility"},
    {"name": "viability_score", "type": "number", "description": "Overall vertical viability 0.0-1.0"},
    {"name": "cost_projection", "type": "object", "description": "Monthly cost for top 10 sources"},
    {"name": "risk_assessment", "type": "array", "description": "Legal, reliability, dependency risks"},
    {"name": "recommendation", "type": "string", "description": "Go/no-go with rationale"}
  ]
}
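A deliverable can be checked against this schema before handoff. The sketch below assumes the report has been loaded as a Python dict keyed by section name, which is one plausible representation rather than a format the recipe specifies; `schema_errors` is a hypothetical helper name.

```python
# JSON types from the schema mapped to Python types: array -> list,
# object -> dict, number -> int/float, string -> str.
EXPECTED_SECTIONS = {
    "source_inventory": list,
    "priority_matrix": dict,
    "viability_score": (int, float),
    "cost_projection": dict,
    "risk_assessment": list,
    "recommendation": str,
}


def schema_errors(report: dict) -> list[str]:
    """Return missing sections, type mismatches, and out-of-range viability."""
    errors = []
    for name, expected in EXPECTED_SECTIONS.items():
        if name not in report:
            errors.append(f"missing section: {name}")
        elif not isinstance(report[name], expected):
            errors.append(f"wrong type for section: {name}")
    score = report.get("viability_score")
    if isinstance(score, (int, float)) and not 0.0 <= score <= 1.0:
        errors.append("viability_score outside 0.0-1.0")
    return errors
```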
Quality Benchmarks
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Total sources identified | ≥ 15 | > 25 | > 40 |
| Sources with API access | > 3 | > 8 | > 15 |
| Signal categories covered | 3 of 4 | 4 of 4 | 4 of 4 + niche |
| Cost accuracy (vs actual) | Within 50% | Within 25% | Within 10% |
| Refresh rate verified (3 cycles) | > 50% | > 75% | > 90% |
If below minimum: Extend the audit by 2-3 days, or conclude that the vertical lacks sufficient signal density.
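For the higher-is-better rows of the benchmark table, tier assignment is a threshold lookup. The sketch below covers only two rows (API access count and refresh rate verification percentage), using the strict "greater than" comparisons printed in the table; the metric key names are hypothetical.

```python
def tier(value: float, minimum: float, good: float, excellent: float) -> str:
    """Classify a higher-is-better metric using strict '>' thresholds."""
    if value > excellent:
        return "excellent"
    if value > good:
        return "good"
    if value > minimum:
        return "minimum acceptable"
    return "below minimum"


# (minimum, good, excellent) thresholds copied from the benchmark table.
THRESHOLDS = {
    "sources_with_api_access": (3, 8, 15),
    "refresh_rate_verified_pct": (50, 75, 90),
}


def grade(metrics: dict) -> dict:
    """Map each benchmarked metric to its tier label."""
    return {name: tier(metrics[name], *THRESHOLDS[name]) for name in THRESHOLDS}
```

Cost accuracy is a lower-is-better metric ("within X%") and would need its comparison inverted, so it is left out of this sketch.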
Error Handling
| Error | Likely Cause | Recovery Action |
|---|---|---|
| No regulatory databases found | Lightly regulated vertical | Shift weight to behavioral/media sources |
| API access denied during testing | Rate limits or auth required | Contact vendor for eval access; estimate from docs |
| Inconsistent refresh rate | Irregular publication schedule | Use minimum observed frequency; flag reliability risk |
| Cost info unavailable | Enterprise pricing, requires sales call | Use comparable source pricing as estimate |
| Fewer than 15 sources total | Limited digital footprint | Recommend paid supplements or vertical pivot |
Cost Breakdown
| Component | Quick ($2K-$3K) | Standard ($3K-$5K) | Deep ($5K-$8K) |
|---|---|---|---|
| Regulatory inventory | $500-$800 | $800-$1.2K | $1.2K-$2K |
| Behavioral mapping | $500-$800 | $800-$1.2K | $1.2K-$2K |
| Visual + unstructured | $300-$500 | $500-$800 | $800-$1.2K |
| Scoring + ranking | $300-$500 | $500-$800 | $800-$1.2K |
| Report | $400 | $400-$800 | $800-$1.5K |
| Total | $2K-$3K | $3K-$5K | $5K-$8K |
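As a quick arithmetic check, the component ranges can be summed per path. The figures below are copied from the table; the dictionary layout and function name are illustrative only. The sums come out at $2.0K-$3.0K, $3.0K-$4.8K, and $4.8K-$7.9K, so the headline totals for the Standard and Deep paths appear to be rounded to the nearest thousand.

```python
# (low, high) dollar ranges per component, copied from the Cost Breakdown table.
COMPONENTS = {
    "Regulatory inventory":  {"quick": (500, 800), "standard": (800, 1200), "deep": (1200, 2000)},
    "Behavioral mapping":    {"quick": (500, 800), "standard": (800, 1200), "deep": (1200, 2000)},
    "Visual + unstructured": {"quick": (300, 500), "standard": (500, 800),  "deep": (800, 1200)},
    "Scoring + ranking":     {"quick": (300, 500), "standard": (500, 800),  "deep": (800, 1200)},
    "Report":                {"quick": (400, 400), "standard": (400, 800),  "deep": (800, 1500)},
}


def total_range(path: str) -> tuple[int, int]:
    """Sum the low and high ends of every component for one audit path."""
    low = sum(ranges[path][0] for ranges in COMPONENTS.values())
    high = sum(ranges[path][1] for ranges in COMPONENTS.values())
    return low, high


for path in ("quick", "standard", "deep"):
    print(path, total_range(path))
```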
Anti-Patterns
Wrong: Counting sources without scoring them
Listing 30 sources without quality assessment. Result: pipeline built on unreliable sources fails in month one. [src1]
Correct: Score every source on all 4 dimensions
Each source gets accessibility, cost, refresh rate, and signal-to-noise ratings. Composite scores drive prioritization.
Wrong: Ignoring legal accessibility constraints
Identifying sources requiring TOS-violating scraping. Result: cease-and-desist letters mid-engagement. [src3]
Correct: Verify legal access for every source
Confirm public access, official API, or commercial licensing for each source. Document access method and legal basis.
Wrong: Single-snapshot refresh rate assessment
Checking a source once and assuming consistent updates. Result: pipeline depends on irregularly updated source. [src4]
Correct: Verify 3 consecutive update cycles
Monitor top-priority sources across at least 3 consecutive update cycles before committing to a pipeline dependency on them.
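The three-cycle check can be sketched from a list of observed update timestamps. This is a minimal sketch under assumptions: a "cycle" is taken to mean the interval between two consecutive updates (so 4 timestamps yield 3 cycles), and the 25% tolerance used to flag irregularity is an arbitrary illustrative value. Planning on the longest observed interval follows the Error Handling guidance to use the minimum observed frequency.

```python
from datetime import datetime, timedelta


def refresh_assessment(update_times: list[datetime], tolerance: float = 0.25) -> dict:
    """Assess refresh regularity from observed update timestamps."""
    times = sorted(update_times)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    # ASSUMPTION: one cycle = one interval between consecutive updates.
    if len(intervals) < 3:
        return {"verified": False, "reason": "fewer than 3 update cycles observed"}
    longest, shortest = max(intervals), min(intervals)
    return {
        "verified": True,
        # Longest interval = minimum observed frequency (conservative planning value).
        "planning_interval": longest,
        # ASSUMPTION: flag as irregular when intervals differ by more than the tolerance.
        "irregular": (longest - shortest) > tolerance * longest,
    }


start = datetime(2024, 1, 1)
weekly = [start + timedelta(days=7 * i) for i in range(4)]
print(refresh_assessment(weekly))
```

A source flagged `irregular` would carry the reliability risk into the report's risk assessment rather than being dropped outright.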
When This Matters
Use when evaluating whether a target vertical has sufficient signal density for automated intelligence. This is Phase 1 of the Signal Stack engagement — its output drives the go/no-go decision for taxonomy design and pipeline construction.