Signal Source Audit

How do you audit industry signal sources across regulatory, behavioral, visual, and unstructured categories?

Purpose

This recipe executes a systematic audit of all available data sources that could provide intent signals for a target industry vertical. It produces a scored inventory covering regulatory databases, behavioral data sources, visual signals, and unstructured media — enabling a go/no-go decision on vertical viability. [src1, src4]

Prerequisites

Target vertical defined with clear industry boundaries (SIC/NAICS codes or equivalent)
Geographic scope determined — US, EU, global, or specific markets
Budget parameters established for data source licensing
Existing data sources inventoried to avoid duplicate evaluation

Constraints

Only evaluate publicly accessible or commercially licensable sources. Grey-area scraping creates legal liability. [src3]
Every source must be scored on all 4 dimensions; partial scoring invalidates the viability calculation.
Refresh rate assessment requires verifying at least 3 consecutive update cycles.
Cost estimates must include access fees and processing/storage costs. [src4]
A vertical needs at least 15 scored sources to meet the viability threshold.
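
The scoring constraints above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical Source record whose field names are illustrative and not part of the recipe. It flags partially scored sources and inventories that fall short of the 15-source threshold.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

DIMENSIONS = ("accessibility", "cost", "refresh_rate", "signal_to_noise")
MIN_SOURCES = 15  # viability threshold from the constraints


@dataclass
class Source:
    """Hypothetical record for one audited source (field names are assumptions)."""
    name: str
    category: str  # "regulatory", "behavioral", "visual", or "unstructured"
    accessibility: Optional[int] = None
    cost: Optional[int] = None
    refresh_rate: Optional[int] = None
    signal_to_noise: Optional[int] = None


def missing_dimensions(source: Source) -> list[str]:
    """Return the dimensions that are unscored or outside the 1-5 scale."""
    missing = []
    for dim in DIMENSIONS:
        value = getattr(source, dim)
        if value is None or not 1 <= value <= 5:
            missing.append(dim)
    return missing


def check_constraints(sources: list[Source]) -> list[str]:
    """Collect human-readable constraint violations for an inventory."""
    problems = []
    for s in sources:
        gaps = missing_dimensions(s)
        if gaps:
            problems.append(f"{s.name}: unscored or invalid {', '.join(gaps)}")
    if len(sources) < MIN_SOURCES:
        problems.append(f"only {len(sources)} sources; minimum is {MIN_SOURCES}")
    return problems
```

An empty list from check_constraints means the inventory satisfies both the full-scoring and minimum-count constraints; the legal-access and cost constraints still need separate review.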

Tool Selection Decision

Which audit depth?
├── Quick assessment (3-5 days)
│   └── PATH A: Desktop research only
├── Standard audit (5-10 days)
│   └── PATH B: Desktop + API testing
├── Deep audit (10-15 days)
│   └── PATH C: Desktop + API testing + vendor interviews
└── Competitive audit (7-12 days)
    └── PATH D: Standard + competitor signal analysis

Path	Scope	Cost	Duration	Confidence
A: Quick	Surface-level identification	$2K-$3K	3-5 days	Moderate
B: Standard	Identification + quality verification	$3K-$5K	5-10 days	High
C: Deep	Full evaluation + vendor negotiation	$5K-$8K	10-15 days	Very high
D: Competitive	Standard + competitor analysis	$4K-$7K	7-12 days	High

Execution Flow

Step 1: Inventory Regulatory Databases

Duration: 1-2 days · Tool: Web research + government database directories

Identify all regulatory and government databases relevant to the target vertical: EPA, FDA, OSHA, SEC, state licensing boards, building permits, zoning databases. Document agency, URL, data format, update frequency, geographic coverage. Score each on accessibility (1-5), cost (1-5), refresh rate (1-5), signal-to-noise (1-5). [src1]

Verify: Minimum 5 regulatory sources identified and scored. · If failed: Vertical is lightly regulated — shift weight to behavioral sources.

Step 2: Map Behavioral Data Sources

Duration: 1-2 days · Tool: Web research + API documentation review

Identify behavioral sources: DNS/WHOIS changes, job board postings, review site activity, app store data, patent filings, press releases, conference speaker lists. Assess accessibility, cost, refresh rate, signal-to-noise for each. [src2, src5]

Verify: Minimum 5 behavioral sources, at least 2 with API access confirmed. · If failed: Vertical may lack digital footprint for automation.

Step 3: Assess Visual Signal Availability

Duration: 0.5-1 day · Tool: Satellite/street imagery platform evaluation

Evaluate visual signals: satellite imagery, street-level imagery, aerial photography. Highly vertical-dependent — skip for purely digital verticals. Note: visual processing requires specialized ML models ($2K-$10K development). [src4]

Verify: Visual relevance determined or documented as “not applicable.” · If failed: Visual signals are optional — continue.

Step 4: Identify Unstructured Media Sources

Duration: 1-2 days · Tool: Media monitoring platform evaluation

Identify text and media sources: industry publications, trade journals, conference proceedings, podcast transcripts, social media, forums. Assess volume, relevance density, extraction difficulty, timeliness. [src1, src2]

Verify: Minimum 5 unstructured sources, at least 2 text-based. · If failed: Budget additional transcription costs for audio/video sources.
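
The Verify lines of Steps 1, 2, and 4 amount to simple count checks, which could be sketched as follows. This is illustrative only: the dictionary keys "category", "has_api", and "text_based" are hypothetical names, not fields defined by the recipe. Step 3 is left out because visual signals are optional.

```python
from collections import Counter

# Minimum counts taken from the Verify lines of Steps 1, 2, and 4.
CATEGORY_MINIMUMS = {"regulatory": 5, "behavioral": 5, "unstructured": 5}


def check_step_gates(sources):
    """Return {gate_name: passed} for the inventory gates of Steps 1, 2, and 4.

    Each source is a dict with hypothetical keys: "category" (str),
    "has_api" (bool), and "text_based" (bool).
    """
    counts = Counter(s["category"] for s in sources)
    gates = {
        f"{cat}_count": counts[cat] >= minimum
        for cat, minimum in CATEGORY_MINIMUMS.items()
    }
    # Step 2: at least 2 behavioral sources with confirmed API access.
    gates["behavioral_api"] = sum(
        1 for s in sources if s["category"] == "behavioral" and s.get("has_api")
    ) >= 2
    # Step 4: at least 2 text-based unstructured sources.
    gates["unstructured_text"] = sum(
        1 for s in sources if s["category"] == "unstructured" and s.get("text_based")
    ) >= 2
    return gates
```

A False gate maps to the matching "If failed" action in the step it came from, for example shifting weight to behavioral sources when the regulatory count is short.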

Step 5: Score and Rank All Sources

Duration: 1 day · Tool: Spreadsheet + scoring framework

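Compile all sources into a single scored inventory. Composite = (Accessibility × 0.30) + (Cost × 0.20) + (Refresh Rate × 0.25) + (SNR × 0.25), where SNR is the signal-to-noise score. The weights sum to 1.0, so the composite stays on the same 1-5 scale as the inputs. Plot each source on a 2×2 priority matrix (quality vs. accessibility). Calculate the overall viability score (0.0-1.0). [src5]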

Verify: All sources scored, priority matrix generated, viability score calculated. · If failed: Recommend a pivot when viability is below 0.60.
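
The scoring in Step 5 could be sketched as below. Only the composite weights come from the recipe. The quality axis of the matrix, the quadrant split point, and the rescaling of composites into a 0.0-1.0 viability score are assumptions made for illustration, since the recipe does not define them.

```python
# Weights from the composite formula in Step 5.
WEIGHTS = {
    "accessibility": 0.30,
    "cost": 0.20,
    "refresh_rate": 0.25,
    "snr": 0.25,
}


def composite(scores):
    """Weighted composite on the same 1-5 scale as the inputs."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())


def quadrant(scores, midpoint=3.0):
    """Place a source on a 2x2 quality-vs-accessibility matrix.

    ASSUMPTION: quality is the mean of refresh rate and SNR, and the
    split point is the scale midpoint; the recipe defines neither.
    """
    quality = (scores["refresh_rate"] + scores["snr"]) / 2
    q = "high quality" if quality >= midpoint else "low quality"
    a = "high access" if scores["accessibility"] >= midpoint else "low access"
    return f"{q} / {a}"


def viability(all_scores):
    """ASSUMPTION: mean composite rescaled from the 1-5 range onto 0.0-1.0."""
    mean = sum(composite(s) for s in all_scores) / len(all_scores)
    return (mean - 1) / 4
```

For a source scored 5 on accessibility, 3 on cost, and 4 on both refresh rate and SNR, the composite is 5 × 0.30 + 3 × 0.20 + 4 × 0.25 + 4 × 0.25 = 4.1. Under the assumed rescaling, an inventory averaging 4.1 would have a viability of 0.775, above the 0.60 pivot threshold.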

Step 6: Deliver Audit Report

Duration: 0.5-1 day · Tool: Document generation

Produce report: executive summary, source inventory, priority matrix, cost projection, risk assessment, go/no-go recommendation.

Verify: Report reviewed, recommendation clearly stated. · If failed: Request domain expert input before finalizing.

Output Schema

{
  "output_type": "signal_source_audit",
  "format": "spreadsheet + document",
  "sections": [
    {"name": "source_inventory", "type": "array", "description": "All sources with 4-dimension scoring"},
    {"name": "priority_matrix", "type": "object", "description": "2x2 quality vs accessibility"},
    {"name": "viability_score", "type": "number", "description": "Overall vertical viability 0.0-1.0"},
    {"name": "cost_projection", "type": "object", "description": "Monthly cost for top 10 sources"},
    {"name": "risk_assessment", "type": "array", "description": "Legal, reliability, dependency risks"},
    {"name": "recommendation", "type": "string", "description": "Go/no-go with rationale"}
  ]
}
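
A finished audit can be checked against the schema above before delivery. The sketch below assumes the audit is held as a flat dictionary keyed by section name, which is one plausible reading of the schema rather than a format the recipe prescribes.

```python
# Section names and JSON types copied from the schema above,
# mapped to the corresponding Python types.
SCHEMA_SECTIONS = {
    "source_inventory": list,
    "priority_matrix": dict,
    "viability_score": (int, float),
    "cost_projection": dict,
    "risk_assessment": list,
    "recommendation": str,
}


def validate_audit(audit):
    """Return a list of schema problems; an empty list means the audit conforms."""
    problems = []
    for name, expected in SCHEMA_SECTIONS.items():
        if name not in audit:
            problems.append(f"missing section: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(audit[name], bool) or not isinstance(audit[name], expected):
            problems.append(f"wrong type for {name}")
    score = audit.get("viability_score")
    if isinstance(score, (int, float)) and not 0.0 <= score <= 1.0:
        problems.append("viability_score outside 0.0-1.0")
    return problems
```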

Quality Benchmarks

Quality Metric	Minimum Acceptable	Good	Excellent
Total sources identified	> 15	> 25	> 40
Sources with API access	> 3	> 8	> 15
Signal categories covered	3 of 4	4 of 4	4 of 4 + niche
Cost accuracy (vs actual)	Within 50%	Within 25%	Within 10%
Refresh rate verified (3 cycles)	> 50%	> 75%	> 90%

If below minimum: Extend the audit by 2-3 days, or conclude that the vertical may lack sufficient signal density.
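
The numeric rows of the benchmark table can be turned into a small grading helper. This is a sketch under assumptions: the metric keys are hypothetical names, the comparisons are strict where the table writes ">", and the "Signal categories covered" row is omitted because it is not purely numeric.

```python
# (direction, minimum, good, excellent) thresholds from the benchmark table.
# "higher" metrics must exceed the threshold; "lower" metrics must not exceed it.
BENCHMARKS = {
    "total_sources": ("higher", 15, 25, 40),
    "api_sources": ("higher", 3, 8, 15),
    "cost_error_pct": ("lower", 50, 25, 10),
    "refresh_verified_pct": ("higher", 50, 75, 90),
}
LEVELS = ("Minimum Acceptable", "Good", "Excellent")


def grade(metric, value):
    """Return the highest benchmark level that the value reaches."""
    direction, *thresholds = BENCHMARKS[metric]
    result = "Below minimum"
    for level, threshold in zip(LEVELS, thresholds):
        met = value > threshold if direction == "higher" else value <= threshold
        if met:
            result = level  # thresholds tighten left to right, so keep the last hit
    return result
```

For example, grade("total_sources", 30) returns "Good", and grade("cost_error_pct", 20) returns "Good" because a 20% estimate error is within 25% but not within 10%.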

Error Handling

Error	Likely Cause	Recovery Action
No regulatory databases found	Lightly regulated vertical	Shift weight to behavioral/media sources
API access denied during testing	Rate limits or auth required	Contact vendor for eval access; estimate from docs
Inconsistent refresh rate	Irregular publication schedule	Use minimum observed frequency; flag reliability risk
Cost info unavailable	Enterprise pricing, requires sales call	Use comparable source pricing as estimate
Fewer than 15 sources total	Limited digital footprint	Recommend paid supplements or vertical pivot

Cost Breakdown

Component	Quick ($2K-$3K)	Standard ($3K-$5K)	Deep ($5K-$8K)
Regulatory inventory	$500-$800	$800-$1.2K	$1.2K-$2K
Behavioral mapping	$500-$800	$800-$1.2K	$1.2K-$2K
Visual + unstructured	$300-$500	$500-$800	$800-$1.2K
Scoring + ranking	$300-$500	$500-$800	$800-$1.2K
Report	$400	$400-$800	$800-$1.5K
Total (rounded)	$2K-$3K	$3K-$5K	$5K-$8K
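
As a quick arithmetic check, the component ranges in the table can be summed per tier. The figures below are copied from the table; the helper name tier_total is illustrative.

```python
# Component cost ranges in USD, copied from the table above: (low, high).
# Order: regulatory, behavioral, visual + unstructured, scoring, report.
COMPONENTS = {
    "Quick": [(500, 800), (500, 800), (300, 500), (300, 500), (400, 400)],
    "Standard": [(800, 1200), (800, 1200), (500, 800), (500, 800), (400, 800)],
    "Deep": [(1200, 2000), (1200, 2000), (800, 1200), (800, 1200), (800, 1500)],
}


def tier_total(ranges):
    """Sum the low ends and the high ends of a tier's component ranges."""
    return sum(low for low, _ in ranges), sum(high for _, high in ranges)


for tier, ranges in COMPONENTS.items():
    low, high = tier_total(ranges)
    print(f"{tier}: ${low:,}-${high:,}")
```

The component sums come to $2,000-$3,000 for Quick, $3,000-$4,800 for Standard, and $4,800-$7,900 for Deep, so the Standard and Deep totals in the table are rounded figures.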

Anti-Patterns

Wrong: Counting sources without scoring them

Listing 30 sources without quality assessment. Result: pipeline built on unreliable sources fails in month one. [src1]

Correct: Score every source on all 4 dimensions

Each source gets accessibility, cost, refresh rate, and signal-to-noise ratings. Composite scores drive prioritization.

Wrong: Ignoring legal accessibility constraints

Identifying sources requiring TOS-violating scraping. Result: cease-and-desist letters mid-engagement. [src3]

Correct: Verify legal access for every source

Confirm public access, official API, or commercial licensing for each source. Document access method and legal basis.
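
One way to enforce this check is to refuse any source whose access method is not one of the three named above or whose legal basis is undocumented. The sketch below assumes hypothetical dictionary keys ("name", "access_method", "legal_basis") and is not a substitute for legal review.

```python
# Access methods the recipe treats as acceptable.
ALLOWED_ACCESS = {"public", "official_api", "commercial_license"}


def legal_access_issues(sources):
    """Return names of sources lacking an allowed access method or a documented basis.

    Each source is a dict with hypothetical keys "name", "access_method",
    and "legal_basis" (free text, such as a licence or terms-of-service reference).
    """
    issues = []
    for s in sources:
        if s.get("access_method") not in ALLOWED_ACCESS or not s.get("legal_basis"):
            issues.append(s["name"])
    return issues
```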

Wrong: Single-snapshot refresh rate assessment

Checking a source once and assuming consistent updates. Result: pipeline depends on irregularly updated source. [src4]

Correct: Verify 3 consecutive update cycles

Monitor top-priority sources across at least 3 update cycles before committing to a pipeline dependency.
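
The three-cycle check can be sketched from a list of observed update dates. Two assumptions are made here: a cycle is the interval between two consecutive updates, so three cycles need four observed updates, and the 25% tolerance used to flag an irregular schedule is an illustrative value, not one given by the recipe.

```python
from datetime import date


def refresh_assessment(update_dates, tolerance=0.25):
    """Assess refresh rate from observed update dates, listed oldest first."""
    intervals = [(b - a).days for a, b in zip(update_dates, update_dates[1:])]
    if len(intervals) < 3:
        # Fewer than 3 cycles observed: not yet safe to depend on this source.
        return {"verified": False, "cycles_observed": len(intervals)}
    longest, shortest = max(intervals), min(intervals)
    return {
        "verified": True,
        "cycles_observed": len(intervals),
        # Plan around the slowest observed cycle (minimum observed frequency).
        "planning_interval_days": longest,
        # Flag a reliability risk when cycle lengths vary beyond the tolerance.
        "reliability_risk": (longest - shortest) > tolerance * longest,
    }


observed = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 29)]
result = refresh_assessment(observed)
```

Here the observed intervals are 7, 7, and 14 days, so the source is verified over three cycles, planned at a 14-day refresh, and flagged as a reliability risk because the schedule is irregular.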

When This Matters

Use when evaluating whether a target vertical has sufficient signal density for automated intelligence. This is Phase 1 of the Signal Stack engagement — its output drives the go/no-go decision for taxonomy design and pipeline construction.