This recipe builds a unified data pipeline connecting CRM, payment processing, analytics, advertising, banking, and email platforms into a single data warehouse, with a transformation layer producing business-ready metrics. The output eliminates manual spreadsheet consolidation by automatically computing MRR, CAC, LTV, retention, and channel attribution from raw source data. [src1]
metrics framework — Startup KPI Framework by StageWhich path?
├── Non-technical AND budget = free
│ └── PATH A: Zapier + Google Sheets — basic integration, limited scale
├── Semi-technical AND budget = free
│ └── PATH B: Airbyte (self-hosted) + Supabase + Metabase — full pipeline, $0
├── Developer AND budget = $0-100/mo
│ └── PATH C: Airbyte Cloud + Supabase + dbt + Metabase — managed pipeline
└── Developer AND budget = $100-500/mo
└── PATH D: Fivetran + BigQuery/Snowflake + dbt Cloud — enterprise-grade
| Path | Tools | Cost | Setup Time | Scalability |
|---|---|---|---|---|
| A: No-Code | Zapier + Google Sheets | $0-20/mo | 2-4 hours | Low |
| B: Self-Hosted | Airbyte OSS + Supabase + Metabase | $0 | 8-12 hours | Medium |
| C: Managed Free | Airbyte Cloud + Supabase + dbt | $0-25/mo | 4-8 hours | Medium-High |
| D: Enterprise | Fivetran + BigQuery + dbt Cloud | $200-500/mo | 6-12 hours | Very High |
Duration: 30-60 minutes · Tool: Supabase or BigQuery
Create raw_data schema for ingestion and analytics schema for transformed business metrics. Set up dedicated Airbyte user with appropriate grants.
Verify: Connect from Airbyte using airbyte_user credentials · If failed: Check connection pooler settings, use port 5432 for direct connections
Duration: 2-4 hours · Tool: Airbyte or Fivetran
Connect sources in priority order: Stripe (revenue), CRM (pipeline), GA4 (traffic), Ad platforms (spend), Email (engagement), Banking (cash). Use incremental sync mode where available.
Verify: Manual Stripe sync completes with rows visible in raw_data schema · If failed: Check API key permissions and scopes
Duration: 4-8 hours · Tool: dbt or SQL views
Create business-ready views: daily_mrr, customer_ltv, channel_cac, cohort_retention. Map raw Stripe/CRM/ad data into standardized metrics tables.
Verify: MRR query matches Stripe dashboard within 2% · If failed: Check column name mappings and currency handling
Duration: 30-60 minutes · Tool: Airbyte/Fivetran scheduler
Stripe: every 6 hours. CRM: every 6 hours. GA4: daily. Ad platforms: daily. Banking: daily. Email: daily.
Verify: Sync history shows successful runs with row counts · If failed: Enable Slack alerts for failed syncs
Duration: 2-4 hours · Tool: Metabase or Looker Studio
Connect dashboard tool to analytics schema. Create initial panels: MRR trend, LTV distribution, CAC by channel, active subscriptions, pipeline value.
Verify: Dashboard loads in under 5 seconds with accurate data · If failed: Check database connection points to analytics schema
Duration: 1-2 hours · Tool: SQL alerts or Metabase alerts
Set up freshness checks (data staleness > 12 hours), completeness checks (MRR not zero), and consistency checks (no > 20% day-over-day MRR drop).
Verify: Quality checks return no alerts when data is syncing correctly · If failed: Investigate specific source connector sync status
{
"output_type": "data_pipeline",
"format": "configured pipeline + warehouse + transformation layer",
"components": [
{"name": "source_connections", "type": "Airbyte/Fivetran configs", "required": true},
{"name": "raw_data_warehouse", "type": "PostgreSQL/BigQuery schema", "required": true},
{"name": "transformation_layer", "type": "SQL views or dbt models", "required": true},
{"name": "data_quality_monitors", "type": "SQL alerts", "required": true},
{"name": "visualization_connection", "type": "Metabase/Looker config", "required": true}
],
"expected_table_count": "5-8 raw tables, 4-6 analytics views"
}
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Sync success rate | > 90% | > 95% | > 99% |
| Data freshness (Stripe) | < 24 hours | < 12 hours | < 6 hours |
| MRR accuracy vs Stripe | Within 5% | Within 2% | Within 0.5% |
| Source coverage | 3+ sources | 5+ sources | 7+ sources |
| Data quality alerts | Weekly | Monthly | Quarterly |
If below minimum: Check sync logs for failures, verify API credentials, confirm rate limits not exceeded.
| Error | Likely Cause | Recovery Action |
|---|---|---|
| 401 Unauthorized | API key expired or revoked | Regenerate key, update connector settings |
| 429 Rate Limited | Too many concurrent requests | Reduce sync frequency, add throttling |
| Schema drift: column missing | Source API updated schema | Update connector version; remap columns |
| MRR calculation mismatch | Currency/tax inclusion differences | Verify amounts in cents, exclude taxes |
| Warehouse storage full | Unchecked raw data growth | Implement retention policy; archive > 2 years |
| Slow dashboard queries | Missing indexes | Add indexes on date and customer_id columns |
| Component | Free Tier | Growth | Scale |
|---|---|---|---|
| Data pipeline (Airbyte) | $0 | $50-100/mo | $200-400/mo |
| Data warehouse | $0 | $25-50/mo | $50-200/mo |
| Transformation (dbt) | $0 | $0 | $100/mo |
| Visualization | $0 | $85/mo | $85-200/mo |
| Banking API (Plaid) | $0 | $10-30/mo | $30-100/mo |
| Total | $0 | $170-265/mo | $465-1,000/mo |
Writing custom Python scripts per API. These break on version changes and require 2-4 hours/month maintenance per source. [src3]
Use Airbyte/Fivetran for extraction (600+ pre-built connectors). Custom code only for the transformation layer.
Real-time syncs cost 3-5x more with no benefit if dashboards are checked daily. [src1]
Revenue: every 6 hours. Ad spend: daily. Analytics: daily. Only real-time for real-time decisions.
Use when a startup needs to combine data from 3+ sources into a single dashboard. Typically becomes necessary between seed and Series A, when investor reporting demands consistent, auditable metrics from consolidated data sources.