Platform Extraction — Signal Stack

Type: Execution Recipe · Confidence: 0.85 · Sources: 4 · Verified: 2026-03-29

Purpose

This recipe refactors the working Signal Stack pipeline for vertical #1 into a reusable generic engine plus a declarative configuration layer. The extraction separates the 5 generic components (ingestion framework, classification pipeline, enrichment engine, document generator, delivery/tracking) from vertical-specific config (sources, triggers, targets, templates, conversion definitions). Target: each subsequent vertical requires < 50% of the effort of vertical #1. [src1, src2]

Prerequisites

Constraints

Tool Selection Decision

Which path?
├── Pipeline is a monolithic Python script
│   └── PATH A: Strangler Fig — wrap existing code, extract incrementally
├── Pipeline is partially modular
│   └── PATH B: Interface Extraction — define contracts, refactor in place
├── Pipeline is modular but tightly coupled
│   └── PATH C: Config Extraction — extract hard-coded values into config
└── Starting fresh (rare — unmaintainable code only)
    └── PATH D: Rewrite with config-first architecture
Path                    | Approach                   | Duration  | Risk   | Best For
A: Strangler Fig        | Incremental wrapping       | 4-5 weeks | Low    | Monolithic pipelines
B: Interface Extraction | Define contracts, refactor | 3-4 weeks | Medium | Partially modular
C: Config Extraction    | Pull values into config    | 2-3 weeks | Low    | Already modular
D: Rewrite              | Build new, migrate         | 5-8 weeks | High   | Unmaintainable code only
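The path decision above can be expressed as a simple lookup. The following is a minimal sketch, not part of the recipe itself: the `PipelineState` enum and `choose_path` function are hypothetical names, and the durations and risk levels are copied from the table.

```python
from enum import Enum


class PipelineState(Enum):
    """Hypothetical labels for the four starting conditions in the decision tree."""
    MONOLITHIC = "monolithic"
    PARTIALLY_MODULAR = "partially_modular"
    MODULAR_TIGHTLY_COUPLED = "modular_tightly_coupled"
    UNMAINTAINABLE = "unmaintainable"


# (path letter, approach, duration, risk), copied from the path table.
PATHS = {
    PipelineState.MONOLITHIC: ("A", "Strangler Fig", "4-5 weeks", "Low"),
    PipelineState.PARTIALLY_MODULAR: ("B", "Interface Extraction", "3-4 weeks", "Medium"),
    PipelineState.MODULAR_TIGHTLY_COUPLED: ("C", "Config Extraction", "2-3 weeks", "Low"),
    PipelineState.UNMAINTAINABLE: ("D", "Rewrite", "5-8 weeks", "High"),
}


def choose_path(state: PipelineState) -> str:
    """Return a one-line recommendation for the given pipeline state."""
    letter, approach, duration, risk = PATHS[state]
    return f"PATH {letter}: {approach} ({duration}, {risk} risk)"


print(choose_path(PipelineState.MONOLITHIC))
```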

Execution Flow

Step 1: Architecture Audit

Duration: 2-3 days · Tool: Code review + architecture diagramming

Map current pipeline to the 5-layer architecture. For each layer, identify generic vs. vertical-specific components. Produce architecture diagram with annotated vertical assumptions. [src1]

Verify: Architecture diagram reviewed, every hard-coded vertical assumption annotated. · If failed: Pair with pipeline engineer to trace data flow end-to-end.

Step 2: Define Configuration Schema

Duration: 3-5 days · Tool: JSON Schema or Pydantic model

Design the declarative config schema covering: sources, triggers, targets, templates, delivery, and conversion definitions. Validate by expressing vertical #1 entirely as config. [src3, src4]

Verify: Vertical #1 fully expressible as config + generic engine. · If failed: Extend the schema, or accept the behavior as a built-in feature of the generic engine.
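A config schema covering the six areas listed in this step might look roughly like the sketch below. It uses standard-library dataclasses in place of Pydantic or JSON Schema so that it runs without dependencies. All field shapes, the `validate` method, and the example values are assumptions for illustration, not the actual Signal Stack schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VerticalConfig:
    """Hypothetical declarative config for one vertical."""
    name: str
    sources: list[str]            # where signals are ingested from
    triggers: list[str]           # keywords or rules that mark a signal as relevant
    targets: list[str]            # who the output is aimed at
    templates: dict[str, str]     # document section name -> template text
    delivery: dict[str, str]      # channel settings
    conversions: list[str] = field(default_factory=list)  # conversion definitions

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the config is usable."""
        problems = []
        for section in ("sources", "triggers", "targets", "templates", "delivery"):
            if not getattr(self, section):
                problems.append(f"{section} must not be empty")
        return problems


# Placeholder values only; a real vertical #1 config would be far richer.
example = VerticalConfig(
    name="example_vertical",
    sources=["rss:https://example.com/feed"],
    triggers=["funding"],
    targets=["head_of_sales"],
    templates={"summary": "Signal: {headline}"},
    delivery={"channel": "email"},
)
print(example.validate())
```

Expressing vertical #1 in a structure like this, and checking that nothing vertical-specific is left in the engine, is the validation this step asks for.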

Step 3: Extract Generic Engine

Duration: 5-8 days · Tool: Git branching + incremental refactoring

Extract layer by layer: ingestion framework, classification pipeline, enrichment engine, document generator, delivery/tracking. Use Strangler Fig pattern — wrap existing code, replace internals gradually. [src1, src4]

Verify: Engine runs vertical #1 from config file, output matches pre-extraction exactly. · If failed: Diff outputs, fix discrepancies before proceeding.
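The Strangler Fig approach described in this step can be sketched for a single layer as follows. This is an illustrative example under stated assumptions: `legacy_classify` stands in for the existing hard-coded logic, and the `Classifier` interface, class names, and keyword-based rule are all hypothetical.

```python
from __future__ import annotations

from typing import Protocol


class Classifier(Protocol):
    """New interface that both old and new implementations must satisfy."""
    def classify(self, signal: dict) -> str: ...


def legacy_classify(signal: dict) -> str:
    """Stand-in for the existing vertical #1 code with a hard-coded trigger."""
    return "trigger" if "funding" in signal["text"].lower() else "ignore"


class LegacyClassifier:
    """Wraps the old function behind the new interface without changing it."""
    def classify(self, signal: dict) -> str:
        return legacy_classify(signal)


class ConfigClassifier:
    """Generic replacement: trigger keywords come from config, not code."""
    def __init__(self, trigger_keywords: list[str]) -> None:
        self.trigger_keywords = [k.lower() for k in trigger_keywords]

    def classify(self, signal: dict) -> str:
        text = signal["text"].lower()
        return "trigger" if any(k in text for k in self.trigger_keywords) else "ignore"


class StranglerClassifier:
    """Routes calls to the legacy code until the new implementation is switched on."""
    def __init__(self, legacy: Classifier, new: Classifier, use_new: bool = False) -> None:
        self.legacy, self.new, self.use_new = legacy, new, use_new

    def classify(self, signal: dict) -> str:
        impl = self.new if self.use_new else self.legacy
        return impl.classify(signal)


signals = [{"text": "Acme raises Series B funding"}, {"text": "Acme opens a new office"}]
old = StranglerClassifier(LegacyClassifier(), ConfigClassifier(["funding"]), use_new=False)
new = StranglerClassifier(LegacyClassifier(), ConfigClassifier(["funding"]), use_new=True)
print([old.classify(s) == new.classify(s) for s in signals])
```

Callers only ever see the wrapper, so each layer can be switched to its generic implementation once its output matches the legacy output.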

Step 4: Regression Testing

Duration: 2-3 days · Tool: Automated test suite

Run 20+ known-good examples through extracted engine + config. Compare classification, enrichment, dossier structure, and delivery behavior.

Verify: 100% pass on classification/enrichment, > 95% structural match on dossiers. · If failed: Add missing config parameters for failing edge cases.
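The comparison in this step could be automated along the lines of the sketch below. The record layout (`classification`, `enrichment`, `dossier` keys) and the definition of "structural match" as identical ordered dossier section names are assumptions made for illustration.

```python
from __future__ import annotations


def regression_report(baseline: list[dict], candidate: list[dict]) -> dict:
    """Compare pre-extraction outputs with outputs from the extracted engine."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("need two non-empty runs of equal length")
    pairs = list(zip(baseline, candidate))
    # Classification and enrichment must match exactly.
    exact = sum(
        b["classification"] == c["classification"] and b["enrichment"] == c["enrichment"]
        for b, c in pairs
    )
    # Assumed meaning of "structural match": same dossier sections in the same order.
    structural = sum(list(b["dossier"]) == list(c["dossier"]) for b, c in pairs)
    exact_rate = exact / len(pairs)
    structural_rate = structural / len(pairs)
    return {
        "exact_rate": exact_rate,
        "structural_rate": structural_rate,
        "passed": exact_rate == 1.0 and structural_rate > 0.95,
    }


# Two toy records; the step itself calls for 20+ known-good examples.
baseline = [
    {"classification": "trigger", "enrichment": {"size": 50}, "dossier": {"summary": "a", "contacts": "b"}},
    {"classification": "ignore", "enrichment": {}, "dossier": {"summary": "c", "contacts": "d"}},
]
candidate = [
    {"classification": "trigger", "enrichment": {"size": 50}, "dossier": {"summary": "a", "contacts": "b"}},
    {"classification": "ignore", "enrichment": {}, "dossier": {"summary": "c"}},
]
print(regression_report(baseline, candidate))
```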

Step 5: Vertical #2 Dry Run

Duration: 3-5 days · Tool: New config file + test data

Create vertical #2 config, run against 20-50 test signals. Measure: config creation time (< 3 days), engine code changes (zero), config coverage (> 90%). This is the extraction validation gate. [src1, src2]

Verify: Vertical #2 operational with zero engine changes. · If failed: Refactor affected component, re-run dry run.
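The validation gate in this step reduces to three threshold checks. A minimal sketch follows, with a hypothetical `DryRunResult` record and `gate_failures` helper; the thresholds are the ones stated in the step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DryRunResult:
    """Hypothetical record of what was measured during the vertical #2 dry run."""
    config_days: float       # time spent creating the vertical #2 config
    engine_changes: int      # engine code changes needed to run vertical #2
    config_coverage: float   # share of vertical #2 behavior expressed in config, 0-1


def gate_failures(result: DryRunResult) -> list[str]:
    """Return the gate criteria that were missed; an empty list means the gate passes."""
    failures = []
    if not result.config_days < 3:
        failures.append("config creation took 3 days or more")
    if result.engine_changes != 0:
        failures.append("engine code had to change")
    if not result.config_coverage > 0.90:
        failures.append("config coverage is 90% or below")
    return failures


print(gate_failures(DryRunResult(config_days=2.5, engine_changes=0, config_coverage=0.93)))
print(gate_failures(DryRunResult(config_days=4, engine_changes=2, config_coverage=0.80)))
```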

Step 6: Documentation and Handoff

Duration: 2-3 days · Tool: Documentation + walkthrough

Produce platform architecture document: component diagram, config schema reference, new-vertical guide, deployment topology, extension points. [src3]

Verify: Team member creates basic vertical config from docs alone. · If failed: Annotate documentation gaps and fill them.

Output Schema

{
  "output_type": "platform_extraction_package",
  "format": "code repository + documentation",
  "sections": [
    {"name": "generic_engine", "type": "object", "description": "5-layer reusable pipeline codebase"},
    {"name": "config_schema", "type": "object", "description": "JSON/YAML schema for vertical config"},
    {"name": "vertical_1_config", "type": "object", "description": "Existing vertical as config"},
    {"name": "vertical_2_config", "type": "object", "description": "New vertical dry-run config"},
    {"name": "regression_results", "type": "object", "description": "Test suite pass/fail report"},
    {"name": "architecture_doc", "type": "object", "description": "Platform architecture and config guide"}
  ]
}

Quality Benchmarks

Quality Metric             | Minimum Acceptable | Good                    | Excellent
Regression pass rate       | > 95%              | > 98%                   | 100%
Vertical #2 config time    | < 5 days           | < 3 days                | < 2 days
Engine code changes for V2 | < 5 changes        | 1-2 changes             | Zero
Config coverage            | > 85%              | > 90%                   | > 95%
V1 client disruption       | < 1 hour downtime  | Zero downtime           | Zero disruption
Doc completeness           | With help          | Independent + questions | Fully independent

If below minimum: Abstraction boundary is wrong. Pause vertical #2 launch, refactor engine, re-test.
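The numeric rows of the benchmark table can be turned into a small grading helper. This is a sketch only: the metric keys and the `grade` function are hypothetical, percentages are passed as numbers from 0 to 100, and the two qualitative rows (client disruption, doc completeness) are left out.

```python
# Checks are ordered from the strictest tier to the most lenient.
TIERS = ["excellent", "good", "minimum"]

BENCHMARKS = {
    "regression_pass_rate": [lambda v: v == 100, lambda v: v > 98, lambda v: v > 95],
    "v2_config_days": [lambda v: v < 2, lambda v: v < 3, lambda v: v < 5],
    "engine_changes": [lambda v: v == 0, lambda v: v <= 2, lambda v: v < 5],
    "config_coverage": [lambda v: v > 95, lambda v: v > 90, lambda v: v > 85],
}


def grade(metric: str, value: float) -> str:
    """Return the best tier the value reaches, or 'below minimum'."""
    for tier, check in zip(TIERS, BENCHMARKS[metric]):
        if check(value):
            return tier
    return "below minimum"


for metric, value in [("regression_pass_rate", 99), ("v2_config_days", 4), ("config_coverage", 80)]:
    print(metric, value, grade(metric, value))
```

Any metric graded "below minimum" corresponds to the pause-and-refactor rule stated above.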

Error Handling

Error                                    | Likely Cause                      | Recovery Action
Regression failures                      | Config schema missing edge cases  | Diff outputs, add missing config parameters
V2 requires engine changes               | Abstraction boundary too narrow   | Generalize component, add config parameter
Config schema too complex (> 200 fields) | Over-engineering                  | Simplify: 20% config covers 80% variation
V1 clients report issues                 | Backward compatibility regression | Immediate rollback, fix, re-deploy
Team cannot create config from docs      | Documentation gaps                | Pair-program, annotate stumbling points, fill gaps

Cost Breakdown

Component            | Solo ($10K) | Small Team ($15K) | Dedicated ($25K)
Architecture audit   | $1K         | $2K               | $3K
Config schema design | $2K         | $3K               | $4K
Engine extraction    | $4K         | $5K               | $8K
Regression testing   | $1K         | $2K               | $3K
Vertical #2 dry run  | $1K         | $2K               | $4K
Documentation        | $1K         | $1K               | $3K
Total                | $10K        | $15K              | $25K

Anti-Patterns

Wrong: Extracting the platform before 3 paying customers

Building generic infrastructure before proving the vertical commercially. Result: you optimize for flexibility nobody needs while the product stagnates. [src1]

Correct: Prove revenue first, extract second

Wait for 3 paying customers. Their usage patterns reveal what actually needs to be generic vs. one-off.

Wrong: Making everything configurable

Creating a 200+ field config schema. Result: config becomes as complex as code, nobody can create a vertical without the architect. [src3]

Correct: Configure the 80%, code the 20%

Identify the 20% of variation that covers 80% of use cases. Accept that rare edge cases may require small code additions.

Wrong: Big-bang rewrite

Rewriting from scratch. Result: months with no features, diverging codebases, scope creep. [src4]

Correct: Strangler Fig pattern

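Wrap existing code behind new interfaces. Extract one layer at a time. Vertical #1 keeps running throughout, and risk stays low because each layer can be verified against the old output before the next one is touched.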

When This Matters

Use when an agent needs to plan or execute the transition from a working single-vertical pipeline to a reusable platform. This is the critical inflection point: done right, it enables rapid vertical expansion at marginal cost. Done wrong, it kills momentum and wastes engineering time.

Related Units