Platform Extraction — Signal Stack

How do you refactor Signal Stack vertical #1 into reusable engine plus config layer?

Purpose

This recipe refactors a working Signal Stack vertical #1 into a reusable generic engine plus a declarative configuration layer. The extraction separates the 5 generic components (ingestion framework, classification pipeline, enrichment engine, document generator, delivery/tracking) from vertical-specific config (sources, triggers, targets, templates, conversion definitions). Target: each subsequent vertical requires < 50% of the effort of vertical #1. [src1, src2]

Prerequisites

3 paying customers in vertical #1 — the hard prerequisite. No extraction without validated revenue.
Working pipeline — all 5 layers operational and producing results
Signal taxonomy documented — trigger events, scoring logic, and false positive rules written down
Vertical #2 identified — at least one additional vertical selected to guide extraction
Regression test suite — 20+ known-good signal-to-dossier examples

Constraints

No platform work until 3 paying customers in vertical #1. Premature abstraction destroys startups. [src1]
Each subsequent vertical must require < 50% effort of first. If not, the abstraction is wrong. [src1]
Generic components must contain zero vertical-specific assumptions. [src3]
Vertical config must be declarative. Custom code per vertical means the boundary is wrong. [src4]
Backward compatibility non-negotiable. Zero disruption to existing clients. [src2]

Tool Selection Decision

Which path?
├── Pipeline is a monolithic Python script
│   └── PATH A: Strangler Fig — wrap existing code, extract incrementally
├── Pipeline is partially modular
│   └── PATH B: Interface Extraction — define contracts, refactor in place
├── Pipeline is modular but tightly coupled
│   └── PATH C: Config Extraction — extract hard-coded values into config
└── Starting fresh (rare — unmaintainable code only)
    └── PATH D: Rewrite with config-first architecture

Path	Approach	Duration	Risk	Best For
A: Strangler Fig	Incremental wrapping	4-5 weeks	Low	Monolithic pipelines
B: Interface Extraction	Define contracts, refactor	3-4 weeks	Medium	Partially modular
C: Config Extraction	Pull values into config	2-3 weeks	Low	Already modular
D: Rewrite	Build new, migrate	5-8 weeks	High	Unmaintainable code only

Execution Flow

Step 1: Architecture Audit

Duration: 2-3 days · Tool: Code review + architecture diagramming

Map current pipeline to the 5-layer architecture. For each layer, identify generic vs. vertical-specific components. Produce architecture diagram with annotated vertical assumptions. [src1]

Verify: Architecture diagram reviewed, every hard-coded vertical assumption annotated. · If failed: Pair with pipeline engineer to trace data flow end-to-end.

Step 2: Define Configuration Schema

Duration: 3-5 days · Tool: JSON Schema or Pydantic model

Design the declarative config schema covering: sources, triggers, targets, templates, delivery, and conversion definitions. Validate by expressing vertical #1 entirely as config. [src3, src4]

Verify: Vertical #1 fully expressible as config + generic engine. · If failed: Extend schema or accept behavior as generic engine feature.

Step 3: Extract Generic Engine

Duration: 5-8 days · Tool: Git branching + incremental refactoring

Extract layer by layer: ingestion framework, classification pipeline, enrichment engine, document generator, delivery/tracking. Use Strangler Fig pattern — wrap existing code, replace internals gradually. [src1, src4]

Verify: Engine runs vertical #1 from config file, output matches pre-extraction exactly. · If failed: Diff outputs, fix discrepancies before proceeding.

Step 4: Regression Testing

Duration: 2-3 days · Tool: Automated test suite

Run 20+ known-good examples through extracted engine + config. Compare classification, enrichment, dossier structure, and delivery behavior.

Verify: 100% pass on classification/enrichment, > 95% structural match on dossiers. · If failed: Add missing config parameters for failing edge cases.

Step 5: Vertical #2 Dry Run

Duration: 3-5 days · Tool: New config file + test data

Create vertical #2 config, run against 20-50 test signals. Measure: config creation time (< 3 days), engine code changes (zero), config coverage (> 90%). This is the extraction validation gate. [src1, src2]

Verify: Vertical #2 operational with zero engine changes. · If failed: Refactor affected component, re-run dry run.

Step 6: Documentation and Handoff

Duration: 2-3 days · Tool: Documentation + walkthrough

Produce platform architecture document: component diagram, config schema reference, new-vertical guide, deployment topology, extension points. [src3]

Verify: Team member creates basic vertical config from docs alone. · If failed: Annotate documentation gaps and fill them.

Output Schema

{
  "output_type": "platform_extraction_package",
  "format": "code repository + documentation",
  "sections": [
    {"name": "generic_engine", "type": "object", "description": "5-layer reusable pipeline codebase"},
    {"name": "config_schema", "type": "object", "description": "JSON/YAML schema for vertical config"},
    {"name": "vertical_1_config", "type": "object", "description": "Existing vertical as config"},
    {"name": "vertical_2_config", "type": "object", "description": "New vertical dry-run config"},
    {"name": "regression_results", "type": "object", "description": "Test suite pass/fail report"},
    {"name": "architecture_doc", "type": "object", "description": "Platform architecture and config guide"}
  ]
}

Quality Benchmarks

Quality Metric	Minimum Acceptable	Good	Excellent
Regression pass rate	> 95%	> 98%	100%
Vertical #2 config time	< 5 days	< 3 days	< 2 days
Engine code changes for V2	< 5 changes	1-2 changes	Zero
Config coverage	> 85%	> 90%	> 95%
V1 client disruption	< 1 hour downtime	Zero downtime	Zero disruption
Doc completeness	With help	Independent + questions	Fully independent

If below minimum: Abstraction boundary is wrong. Pause vertical #2 launch, refactor engine, re-test.

Error Handling

Error	Likely Cause	Recovery Action
Regression failures	Config schema missing edge cases	Diff outputs, add missing config parameters
V2 requires engine changes	Abstraction boundary too narrow	Generalize component, add config parameter
Config schema too complex (> 200 fields)	Over-engineering	Simplify: 20% config covers 80% variation
V1 clients report issues	Backward compatibility regression	Immediate rollback, fix, re-deploy
Team cannot create config from docs	Documentation gaps	Pair-program, annotate stumbling points, fill gaps

Cost Breakdown

Component	Solo ($10K)	Small Team ($15K)	Dedicated ($25K)
Architecture audit	$1K	$2K	$3K
Config schema design	$2K	$3K	$4K
Engine extraction	$4K	$5K	$8K
Regression testing	$1K	$2K	$3K
Vertical #2 dry run	$1K	$2K	$4K
Documentation	$1K	$1K	$3K
Total	$10K	$15K	$25K

Anti-Patterns

Wrong: Extracting the platform before 3 paying customers

Building generic infrastructure before proving the vertical commercially. Result: you optimize for flexibility nobody needs while the product stagnates. [src1]

Correct: Prove revenue first, extract second

Wait for 3 paying customers. Their usage patterns reveal what actually needs to be generic vs. one-off.

Wrong: Making everything configurable

Creating a 200+ field config schema. Result: config becomes as complex as code, nobody can create a vertical without the architect. [src3]

Correct: Configure the 80%, code the 20%

Identify the 20% of variation that covers 80% of use cases. Accept that rare edge cases may require small code additions.

Wrong: Big-bang rewrite

Rewriting from scratch. Result: months with no features, diverging codebases, scope creep. [src4]

Correct: Strangler Fig pattern

Wrap existing code behind new interfaces. Extract one layer at a time. Zero downtime, zero risk.

When This Matters

Use when an agent needs to plan or execute the transition from a working single-vertical pipeline to a reusable platform. This is the critical inflection point: done right, it enables rapid vertical expansion at marginal cost. Done wrong, it kills momentum and wastes engineering time.