Data Integration Architecture for Startup Dashboards
Purpose
This recipe builds a unified data pipeline connecting CRM, payment processing, analytics, advertising, banking, and email platforms into a single data warehouse, with a transformation layer producing business-ready metrics. The output eliminates manual spreadsheet consolidation by automatically computing MRR, CAC, LTV, retention, and channel attribution from raw source data. [src1]
Prerequisites
- API credentials for each data source — Stripe API key, GA4 service account, HubSpot token, ad platform credentials
- KPI definitions from metrics framework — Startup KPI Framework by Stage
- CRM configured — CRM Setup Guide
- Data warehouse account — Supabase, BigQuery, or Snowflake
- Docker installed (if self-hosting Airbyte) — Docker Desktop
Constraints
- Airbyte Cloud free tier: 10 credits/month (~100K rows). Self-hosted is unlimited but needs Docker + 4GB RAM [src2]
- Fivetran costs ~$1/credit, scaling with volume — budget $100-500/mo at Series A scale
- Stripe API: 100 read requests/second — pace historical backfills [src4]
- Google Ads API needs developer token — 3-5 business days for approval
- Plaid: $0.30-3/connection/month in production — disconnect dormant accounts
- Batch syncs introduce 1-24 hour latency — real-time costs 3-5x more
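For the Stripe backfill constraint, a minimal client-side throttle can keep a historical sync under the limit. This is a sketch, not Stripe's SDK: the 80 req/s target is an arbitrary choice that leaves headroom under the 100 read/s cap, and `fetch_page` stands in for a real API call.

```python
import time

class RateLimiter:
    """Simple pacing helper: caps calls at max_per_second by sleeping
    between requests. Suitable for a single-threaded historical backfill."""

    def __init__(self, max_per_second: float):
        self.min_interval = 1.0 / max_per_second
        self.last_call = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

# Pace at 80 reads/sec to stay safely under Stripe's 100/sec read limit.
limiter = RateLimiter(max_per_second=80)

def backfill(pages):
    """Iterate paginated results, pacing each request."""
    for page in pages:
        limiter.wait()
        yield page  # in real use: fetch_page(page) against the Stripe API
```

The same pattern applies to any connector that hits 429s during a first full sync.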
Tool Selection Decision
Which path?
├── Non-technical AND budget = free
│ └── PATH A: Zapier + Google Sheets — basic integration, limited scale
├── Semi-technical AND budget = free
│ └── PATH B: Airbyte (self-hosted) + Supabase + Metabase — full pipeline, $0
├── Developer AND budget = $0-100/mo
│ └── PATH C: Airbyte Cloud + Supabase + dbt + Metabase — managed pipeline
└── Developer AND budget = $100-500/mo
└── PATH D: Fivetran + BigQuery/Snowflake + dbt Cloud — enterprise-grade
| Path | Tools | Cost | Setup Time | Scalability |
|---|---|---|---|---|
| A: No-Code | Zapier + Google Sheets | $0-20/mo | 2-4 hours | Low |
| B: Self-Hosted | Airbyte OSS + Supabase + Metabase | $0 | 8-12 hours | Medium |
| C: Managed Free | Airbyte Cloud + Supabase + dbt | $0-25/mo | 4-8 hours | Medium-High |
| D: Enterprise | Fivetran + BigQuery + dbt Cloud | $200-500/mo | 6-12 hours | Very High |
Execution Flow
Step 1: Set Up the Data Warehouse
Duration: 30-60 minutes · Tool: Supabase or BigQuery
Create a raw_data schema for ingestion and an analytics schema for transformed business metrics. Create a dedicated airbyte_user role and grant it write access to raw_data.
Verify: Connect from Airbyte using airbyte_user credentials · If failed: Check connection pooler settings, use port 5432 for direct connections
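The bootstrap DDL can be sketched as below (Postgres/Supabase dialect). Schema and role names follow this recipe (`raw_data`, `analytics`, `airbyte_user`); the password and the exact set of grants are placeholders to adapt to your warehouse.

```python
# Sketch: generate the warehouse bootstrap statements, then run them
# via psycopg2 or the Supabase SQL editor.
def bootstrap_statements(airbyte_password: str) -> list[str]:
    return [
        "CREATE SCHEMA IF NOT EXISTS raw_data;",
        "CREATE SCHEMA IF NOT EXISTS analytics;",
        f"CREATE USER airbyte_user WITH PASSWORD '{airbyte_password}';",
        # Airbyte needs to create and write tables in the raw schema only;
        # it should not touch the transformed analytics schema.
        "GRANT USAGE, CREATE ON SCHEMA raw_data TO airbyte_user;",
        "GRANT ALL ON ALL TABLES IN SCHEMA raw_data TO airbyte_user;",
    ]

for stmt in bootstrap_statements("replace-me"):
    print(stmt)
```

Keeping ingestion and analytics in separate schemas means the transformation layer can be rebuilt from scratch without risking raw data.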
Step 2: Configure Data Source Connections
Duration: 2-4 hours · Tool: Airbyte or Fivetran
Connect sources in priority order: Stripe (revenue), CRM (pipeline), GA4 (traffic), Ad platforms (spend), Email (engagement), Banking (cash). Use incremental sync mode where available.
Verify: Manual Stripe sync completes with rows visible in raw_data schema · If failed: Check API key permissions and scopes
Step 3: Build the Transformation Layer
Duration: 4-8 hours · Tool: dbt or SQL views
Create business-ready views: daily_mrr, customer_ltv, channel_cac, cohort_retention. Map raw Stripe/CRM/ad data into standardized metrics tables.
Verify: MRR query matches Stripe dashboard within 2% · If failed: Check column name mappings and currency handling
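A minimal sketch of the daily MRR logic, assuming raw subscription rows have already been normalized into a simple shape (the field names here are illustrative, not Stripe's actual schema). It keeps amounts in cents until the final display step and prorates annual plans to a monthly share — the two most common sources of the 2% mismatch.

```python
from collections import defaultdict

def daily_mrr(subscriptions):
    """Compute MRR per day from normalized subscription rows.

    Each row: {"date", "amount_cents", "interval", "status"}.
    Amounts are integer cents (Stripe convention) and must exclude tax;
    annual plans contribute one-twelfth of their amount per month.
    """
    mrr = defaultdict(int)
    for sub in subscriptions:
        if sub["status"] != "active":
            continue  # canceled/trialing rows don't count toward MRR
        divisor = 12 if sub["interval"] == "year" else 1
        mrr[sub["date"]] += sub["amount_cents"] // divisor
    # Convert to dollars only at the end, matching the Stripe dashboard.
    return {day: cents / 100 for day, cents in sorted(mrr.items())}
```

In the actual pipeline this logic would live in a dbt model or SQL view over the raw_data tables; the Python form just makes the cents/proration rules explicit.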
Step 4: Set Up Incremental Sync Schedules
Duration: 30-60 minutes · Tool: Airbyte/Fivetran scheduler
Stripe: every 6 hours. CRM: every 6 hours. GA4: daily. Ad platforms: daily. Banking: daily. Email: daily.
Verify: Sync history shows successful runs with row counts · If failed: Check connector logs for the failing source; enable Slack alerts so future failures surface immediately
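The cadences above can be captured as cron expressions (Airbyte accepts cron-style schedules). The specific times here are arbitrary — staggering the daily syncs avoids all connectors hitting the warehouse at once.

```python
# Sketch: sync cadences from Step 4 as cron schedules.
SYNC_SCHEDULES = {
    "stripe":       "0 */6 * * *",  # every 6 hours
    "crm":          "0 */6 * * *",  # every 6 hours
    "ga4":          "0 2 * * *",    # daily, staggered start times below
    "ad_platforms": "0 3 * * *",
    "banking":      "0 4 * * *",
    "email":        "0 5 * * *",
}

for source, cron in SYNC_SCHEDULES.items():
    print(f"{source}: {cron}")
```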
Step 5: Connect Visualization Layer
Duration: 2-4 hours · Tool: Metabase or Looker Studio
Connect dashboard tool to analytics schema. Create initial panels: MRR trend, LTV distribution, CAC by channel, active subscriptions, pipeline value.
Verify: Dashboard loads in under 5 seconds with accurate data · If failed: Confirm the database connection points to the analytics schema
Step 6: Data Quality Monitoring
Duration: 1-2 hours · Tool: SQL alerts or Metabase alerts
Set up freshness checks (flag data staler than 12 hours), completeness checks (flag MRR of zero), and consistency checks (flag a > 20% day-over-day MRR drop).
Verify: Quality checks return no alerts when data is syncing correctly · If failed: Investigate specific source connector sync status
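The three checks can be sketched as pure functions; in practice they would run as scheduled SQL or Metabase alerts against the analytics schema. Thresholds mirror the ones above and are tunable.

```python
from datetime import datetime, timedelta

def check_freshness(last_synced_at: datetime, now: datetime,
                    max_age_hours: int = 12) -> bool:
    """Freshness: pass only if the newest row is within the age threshold."""
    return now - last_synced_at <= timedelta(hours=max_age_hours)

def check_completeness(todays_mrr: float) -> bool:
    """Completeness: an MRR of zero almost always means a broken sync."""
    return todays_mrr > 0

def check_consistency(today_mrr: float, yesterday_mrr: float,
                      max_drop: float = 0.20) -> bool:
    """Consistency: flag a day-over-day MRR drop larger than max_drop."""
    if yesterday_mrr == 0:
        return check_completeness(today_mrr)
    return (yesterday_mrr - today_mrr) / yesterday_mrr <= max_drop
```

Each function returns True when the check passes, so a scheduler only has to alert on False.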
Output Schema
{
"output_type": "data_pipeline",
"format": "configured pipeline + warehouse + transformation layer",
"components": [
{"name": "source_connections", "type": "Airbyte/Fivetran configs", "required": true},
{"name": "raw_data_warehouse", "type": "PostgreSQL/BigQuery schema", "required": true},
{"name": "transformation_layer", "type": "SQL views or dbt models", "required": true},
{"name": "data_quality_monitors", "type": "SQL alerts", "required": true},
{"name": "visualization_connection", "type": "Metabase/Looker config", "required": true}
],
"expected_table_count": "5-8 raw tables, 4-6 analytics views"
}
Quality Benchmarks
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Sync success rate | > 90% | > 95% | > 99% |
| Data freshness (Stripe) | < 24 hours | < 12 hours | < 6 hours |
| MRR accuracy vs Stripe | Within 5% | Within 2% | Within 0.5% |
| Source coverage | 3+ sources | 5+ sources | 7+ sources |
| Data quality alerts | Weekly | Monthly | Quarterly |
If below minimum: Check sync logs for failures, verify API credentials, confirm rate limits not exceeded.
Error Handling
| Error | Likely Cause | Recovery Action |
|---|---|---|
| 401 Unauthorized | API key expired or revoked | Regenerate key, update connector settings |
| 429 Rate Limited | Too many concurrent requests | Reduce sync frequency, add throttling |
| Schema drift: column missing | Source API updated schema | Update connector version; remap columns |
| MRR calculation mismatch | Currency/tax inclusion differences | Verify amounts in cents, exclude taxes |
| Warehouse storage full | Unchecked raw data growth | Implement retention policy; archive data older than 2 years |
| Slow dashboard queries | Missing indexes | Add indexes on date and customer_id columns |
Cost Breakdown
| Component | Free Tier | Growth | Scale |
|---|---|---|---|
| Data pipeline (Airbyte) | $0 | $50-100/mo | $200-400/mo |
| Data warehouse | $0 | $25-50/mo | $50-200/mo |
| Transformation (dbt) | $0 | $0 | $100/mo |
| Visualization | $0 | $85/mo | $85-200/mo |
| Banking API (Plaid) | $0 | $10-30/mo | $30-100/mo |
| Total | $0 | $170-265/mo | $465-1,000/mo |
Anti-Patterns
Wrong: Building Custom ETL Scripts
Writing custom Python scripts for each API. These break when source APIs change versions and require 2-4 hours of maintenance per source per month. [src3]
Correct: Use Managed Connectors
Use Airbyte/Fivetran for extraction (600+ pre-built connectors). Custom code only for the transformation layer.
Wrong: Syncing Everything in Real-Time
Real-time syncs cost 3-5x more with no benefit if dashboards are checked daily. [src1]
Correct: Match Sync Frequency to Decision Cadence
Revenue: every 6 hours. Ad spend: daily. Analytics: daily. Only real-time for real-time decisions.
When This Matters
Use when a startup needs to combine data from 3+ sources into a single dashboard. Typically becomes necessary between seed and Series A, when investor reporting demands consistent, auditable metrics from consolidated data sources.