This card is a cross-platform architecture pattern comparison covering three fundamental approaches to accessing ERP data from external systems: data virtualization (query in place without moving data), data replication (copy data to a local store via ETL/ELT/CDC), and direct API calls (point-to-point reads against ERP APIs). The decision framework applies universally; the implementation specifics vary by ERP system and data platform.
Data virtualization platforms (Denodo, TIBCO/Spotfire SDV, D365 Virtual Entities, CData) provide a semantic layer that federates queries across multiple ERP systems and databases without physical data movement. Replication platforms (Fivetran, Debezium, Airbyte, Oracle GoldenGate, Qlik Replicate, Striim) physically copy ERP data to a target data store — either via batch ETL/ELT or near-real-time CDC. Direct API calls use ERP-native REST, OData, or SOAP endpoints for point-to-point data retrieval.
| Property | Value |
|---|---|
| Scope | Cross-platform architecture pattern comparison |
| Approaches | Data Virtualization, Data Replication (ETL/ELT/CDC), Direct API Calls |
| Virtualization Tools | Denodo 9.x, TIBCO/SDV, D365 Virtual Entities, CData Connect |
| Replication Tools | Fivetran, Debezium 3.x, Airbyte, GoldenGate 23ai, Qlik Replicate, Striim |
| ERP Systems | Salesforce, SAP S/4HANA, Oracle ERP Cloud, Dynamics 365, NetSuite, Workday |
| Deployment | Cloud, on-premise, hybrid (approach-dependent) |
| Status | GA — all three patterns are production-proven |
Each approach uses different technical surfaces to access ERP data. [src1, src2]
| Approach | Access Mechanism | Protocol | Query Language | Write Support | Real-Time? | Caching |
|---|---|---|---|---|---|---|
| Data Virtualization (Denodo) | JDBC/ODBC virtual views | SQL over HTTPS | SQL, VQL | Read-only (most configs) | Yes (live queries) | Configurable (TTL, partial, full) |
| Data Virtualization (D365) | Dataverse Virtual Tables | OData v4 | FetchXML, OData queries | Read-only | Yes | No native caching |
| Data Virtualization (TIBCO/SDV) | JDBC/ODBC, REST | SQL, REST | SQL | Read-only (most configs) | Yes (live queries) | Configurable |
| Replication (CDC) | Kafka Connect, log readers | CDC log parsing, REST | N/A (push-based) | N/A (replication) | Near-real-time (1-10s) | N/A (data is local) |
| Replication (ETL/ELT) | Batch connectors | REST, JDBC, file-based | N/A (scheduled extraction) | N/A (replication) | No (batch scheduled) | N/A (data is local) |
| Direct API Calls | ERP REST/OData endpoints | HTTPS/JSON, OData | SOQL, OData $filter, SuiteQL | Full CRUD | Yes | Client-side only |
The critical difference: data virtualization and direct API calls consume ERP API quota on every uncached read, whereas replication consumes it only during extraction; every subsequent query runs against the local copy. [src3]
| Approach | ERP API Calls Per Query | Daily Impact (1K queries/day) | Daily Impact (100K queries/day) | Rate Limit Risk |
|---|---|---|---|---|
| Data Virtualization (uncached) | 1-N per user query (fan-out to sources) | 1,000-5,000 API calls | 100K-500K API calls (likely exceeds limits) | HIGH |
| Data Virtualization (cached, 5min TTL) | 0-1 per unique query per TTL window | 50-500 API calls | 5K-50K API calls | MODERATE |
| Replication (CDC) | 0 per read query (continuous stream) | 0 read API calls | 0 read API calls | LOW |
| Replication (ETL batch) | 0 per read query (scheduled bulk pulls) | Bulk extraction only | Bulk extraction only (extraction cost does not scale with read volume) | LOW |
| Direct API Calls | 1 per query | 1,000 API calls | 100,000 API calls | HIGH |
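The daily-impact figures above can be sanity-checked with a back-of-envelope model. This is a sketch: the `fan_out`, `unique_queries`, and workday-length parameters are assumptions to replace with your own numbers.

```python
def daily_api_calls(approach, queries_per_day, fan_out=1, cache_ttl_s=300,
                    unique_queries=50, workday_hours=10):
    """Rough estimate of ERP API calls per day under each access pattern."""
    if approach == "virtualization_uncached":
        return queries_per_day * fan_out          # every query fans out to sources
    if approach == "virtualization_cached":
        # at most one source refresh per unique query per TTL window
        windows = workday_hours * 3600 // cache_ttl_s
        return min(queries_per_day, unique_queries * windows) * fan_out
    if approach in ("cdc_replication", "etl_replication"):
        return 0                                  # reads hit the local replica
    if approach == "direct_api":
        return queries_per_day                    # one API call per query
    raise ValueError(f"unknown approach: {approach!r}")

# 100K queries/day, two source systems per virtualized query:
print(daily_api_calls("virtualization_uncached", 100_000, fan_out=2))  # 200000
print(daily_api_calls("cdc_replication", 100_000))                     # 0
```

Even a crude model like this makes the rate-limit risk column concrete before you commit to an architecture.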
| ERP System | Virtualization-Friendly? | Key Limitation | Recommended Approach |
|---|---|---|---|
| Salesforce | Moderate | 100K API calls/24h (Enterprise), no direct DB access | Replicate via Bulk API; virtualize only for low-volume ad-hoc |
| SAP S/4HANA | Good | Performance depends on HANA sizing, RFC deprecated in cloud | Virtualize via CDS views; replicate via ODP/SLT for analytics |
| Oracle ERP Cloud | Moderate | No published hard rate limit but throttled per tenant | Replicate via BICC; virtualize for cross-system dashboards |
| Dynamics 365 | Good | Read-only, no joins across virtual/native, no audit trail | Use Virtual Entities for lookups; replicate for reporting |
| NetSuite | Poor | 10 concurrent requests, governance units per script execution | Replicate via Saved Searches or SuiteQL; avoid virtualization |
| Workday | Poor | Undisclosed throttle limits, RaaS is batch-only | Replicate via RaaS/EIB; use REST API for single-record lookups |
| Approach | Credential Management | Token Refresh | Multi-System Auth |
|---|---|---|---|
| Data Virtualization | Centralized in virtualization platform | Automatic (platform-managed) | Built-in — single credential store |
| Replication | In replication tool (Fivetran, Debezium) or iPaaS | Automatic (tool-managed) | Per-connector configuration |
| Direct API Calls | In calling application code | Application must handle refresh and expiry | Each integration manages its own credentials |
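For the direct-API row, "application must handle refresh and expiry" typically means wrapping token acquisition in an expiry-aware cache. A minimal sketch, assuming an injectable `fetch_token` callable and the standard OAuth 2.0 `access_token`/`expires_in` response fields (verify both against your ERP's auth documentation):

```python
import time

class TokenManager:
    """Expiry-aware cache around an OAuth token fetcher.

    'fetch_token' is any callable returning a dict like
    {"access_token": "...", "expires_in": 3600}, e.g. a POST to your
    ERP's token endpoint. Keeping it injectable makes this testable.
    """
    def __init__(self, fetch_token, skew_seconds=60):
        self._fetch_token = fetch_token
        self._skew = skew_seconds      # refresh early, not at the deadline
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        now = time.time()
        if self._token is None or now >= self._expires_at - self._skew:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + int(payload.get("expires_in", 3600))
        return self._token
```

The skew avoids a token expiring mid-request; every integration in the direct-API column ends up reimplementing something like this, which is exactly the per-integration overhead the table calls out.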
START — Need to access ERP data from external system
├── What's the primary use case?
│ ├── Ad-hoc BI queries / dashboards (<1K queries/day)
│ │ ├── Need cross-system joins?
│ │ │ ├── YES → DATA VIRTUALIZATION (Denodo/TIBCO/SDV)
│ │ │ │ └── Budget for $100K+/year?
│ │ │ │ ├── YES → Denodo/TIBCO with caching enabled
│ │ │ │ └── NO → CData Connect Cloud or replicate to warehouse
│ │ │ └── NO → Direct API calls with client-side caching
│ │ └── ERP has good virtualization support? (SAP CDS, D365 Virtual Entities)
│ │ ├── YES → ERP-native virtualization features
│ │ └── NO → Replicate to data warehouse, query locally
│ ├── Operational application reads (>1K queries/day, low latency)
│ │ ├── Data freshness requirement?
│ │ │ ├── Real-time (<1s stale) → Direct API calls (watch rate limits)
│ │ │ ├── Near-real-time (1-60s stale) → REPLICATION via CDC
│ │ │ └── Periodic (minutes-hours stale) → REPLICATION via ETL/ELT
│ │ └── Query volume >10K/day?
│ │ ├── YES → REPLICATION — virtualization will exhaust API limits
│ │ └── NO → Direct API calls with application-level caching
│ ├── Analytics / data warehouse / ML training
│ │ └── Always → REPLICATION (ETL/ELT or CDC to data warehouse)
│ └── Data migration / initial load
│ └── Always → REPLICATION (bulk extraction)
├── What's the data volume?
│ ├── < 10K records total → Any approach works
│ ├── 10K-1M records → Virtualization with caching OR replication
│ └── > 1M records → REPLICATION required
└── What's the budget?
├── Enterprise ($100K+/year) → Denodo + CDC (best of both worlds)
├── Mid-market ($10K-50K/year) → CData Connect or Fivetran/Airbyte
└── Minimal → Open-source CDC (Debezium) + direct API calls
| Dimension | Data Virtualization | Replication (CDC) | Replication (ETL/ELT) | Direct API Calls |
|---|---|---|---|---|
| Data Freshness | Real-time (live query) | Near-real-time (1-10s lag) | Stale (15min-24h lag) | Real-time (per request) |
| Query Latency | 200ms-5s (source + network) | <10ms (local data) | <10ms (local data) | 200ms-2s (API round-trip) |
| Data Volume Limit | Low-moderate (<1M records scanned) | Unlimited (incremental) | Unlimited (batch) | Per-request limits (2K-10K records) |
| ERP API Impact | High (every uncached query) | Low (CDC stream) | Low (scheduled pulls) | High (every request) |
| Storage Cost | None (no data copied) | Moderate-high (full replica) | Moderate-high (full replica) | None (no data stored) |
| Infrastructure | Virtualization platform | Kafka + CDC tool + target DB | ETL tool + target DB | Application code only |
| Setup Complexity | Moderate | High | Moderate | Low |
| Write Support | Read-only (typically) | N/A (one-directional) | N/A (one-directional) | Full CRUD |
| Cross-System Joins | Native (federation) | Requires shared target DB | Requires shared target DB | Manual in application code |
| Annual Cost | $100K+ (Denodo), $6K+ (CData Cloud) | Free (Debezium) to $200K+ (GoldenGate) | $10K-100K (Fivetran, Airbyte) | Free (API calls only) |
| Best For | Ad-hoc cross-system BI, <1K queries/day | High-volume operational reads, event sync | Analytics, data warehouse, historical reporting | Transactional CRUD, low-volume lookups |
Map each data flow in your ERP integration to one of the three patterns by answering: (a) query volume, (b) freshness requirement, (c) data volume, (d) write needs. Most projects use a hybrid of 2-3 approaches. [src3]
# Assessment template — fill for each data flow
Data Flow: [e.g., "Sales order dashboard"]
Source ERP: [e.g., "Salesforce"]
Query Volume: [e.g., "500 queries/day from 20 analysts"]
Freshness Need: [e.g., "< 1 hour stale is fine"]
Data Volume: [e.g., "200K orders, growing 5K/month"]
Write Required: [e.g., "No — read-only reporting"]
→ Recommended: [e.g., "Replication via ETL to data warehouse"]
Verify: Each data flow should map clearly to one approach. If you're mixing virtualization with >10K queries/day, reconsider.
Create a virtual view that federates data from SAP and Salesforce without replication. [src2]
-- Denodo VQL: Create a virtual view joining SAP and Salesforce data
CREATE DATASOURCE CUSTOM sap_s4hana
CLASSNAME = 'com.denodo.connect.odata.ODataWrapper'
PROPERTIES (
'SERVICE_URL' = 'https://your-sap.example.com/sap/opu/odata/sap/API_BUSINESS_PARTNER',
'AUTH_TYPE' = 'BASIC'
);
CREATE VIEW unified_customer_360 AS
SELECT
sf.Id AS salesforce_id, sf.Name AS customer_name,
sap.BusinessPartner AS sap_bp_number, sap.CreditStandingStatus
FROM salesforce_accounts sf
INNER JOIN sap_business_partners sap
ON sf.SAPBusinessPartnerID__c = sap.BusinessPartner
CONTEXT ('cache_status' = 'partial', 'cache_expiration' = '300');
Verify: SELECT COUNT(*) FROM unified_customer_360 → returns record count without error. Query execution time <5s.
Configure CDC-based replication from an on-premise Oracle ERP database. The user, password, and Kafka settings below are placeholders; note that Debezium 2.0+ requires topic.prefix and the schema.history.internal.* properties, and dropped the old log.mining.continuous.mine option. [src5]
{
  "name": "erp-oracle-cdc",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "database.hostname": "erp-oracle.internal",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "<from-your-secret-store>",
    "database.dbname": "ORCLCDB",
    "topic.prefix": "erp",
    "schema.include.list": "AP,AR,GL",
    "table.include.list": "AP.AP_INVOICES_ALL,AR.RA_CUSTOMER_TRX_ALL",
    "snapshot.mode": "initial",
    "log.mining.strategy": "online_catalog",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.erp"
  }
}
Verify: curl localhost:8083/connectors/erp-oracle-cdc/status → "state": "RUNNING"
For low-volume operational reads where replication is overkill. [src6]
import time

import requests

class ERPApiCache:
    """Simple in-process TTL cache for ERP API reads. Use for <1,000 queries/day.

    The cache is unbounded and per-process; add eviction before using it
    in a long-lived service.
    """
    def __init__(self, base_url, access_token, cache_ttl_seconds=300):
        self.base_url = base_url
        self.access_token = access_token
        self.cache_ttl = cache_ttl_seconds
        self._cache = {}

    def query(self, soql):
        cache_key = soql.strip().lower()
        now = time.time()
        if cache_key in self._cache:
            cached_time, cached_data = self._cache[cache_key]
            if now - cached_time < self.cache_ttl:
                return cached_data
        response = requests.get(
            f"{self.base_url}/services/data/v62.0/query",
            params={"q": soql},
            headers={"Authorization": f"Bearer {self.access_token}"},
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()
        self._cache[cache_key] = (now, data)
        return data
Verify: Run the same query twice within 5 minutes — second call should return instantly (cache hit).
# Input: Query metadata (volume, freshness, data size)
# Output: Recommended approach string
def recommend_data_access_approach(
        queries_per_day, max_staleness_seconds, total_records,
        needs_cross_system_joins, needs_write_access, budget_tier):
    if needs_write_access:
        return "direct_api"
    if queries_per_day > 10_000:
        return "cdc_replication" if max_staleness_seconds < 60 else "etl_replication"
    if total_records > 1_000_000:
        return "cdc_replication" if max_staleness_seconds < 60 else "etl_replication"
    if needs_cross_system_joins and budget_tier == "enterprise":
        return "virtualization"
    if queries_per_day < 100 and max_staleness_seconds < 1:
        return "direct_api"
    if queries_per_day < 1_000:
        return "virtualization" if budget_tier == "enterprise" else "direct_api"
    return "cdc_replication" if max_staleness_seconds < 60 else "etl_replication"
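Two worked calls through the decision function, restated compactly here so the snippet runs on its own (the thresholds mirror the decision tree above; the input values are illustrative):

```python
def recommend_data_access_approach(
        queries_per_day, max_staleness_seconds, total_records,
        needs_cross_system_joins, needs_write_access, budget_tier):
    # Same logic as the card's function: writes and high volume dominate.
    if needs_write_access:
        return "direct_api"
    if queries_per_day > 10_000 or total_records > 1_000_000:
        return "cdc_replication" if max_staleness_seconds < 60 else "etl_replication"
    if needs_cross_system_joins and budget_tier == "enterprise":
        return "virtualization"
    if queries_per_day < 100 and max_staleness_seconds < 1:
        return "direct_api"
    if queries_per_day < 1_000:
        return "virtualization" if budget_tier == "enterprise" else "direct_api"
    return "cdc_replication" if max_staleness_seconds < 60 else "etl_replication"

# 50K operational reads/day, 10s staleness tolerance -> replication via CDC
print(recommend_data_access_approach(50_000, 10, 500_000, False, False, "enterprise"))
# cdc_replication

# 500 ad-hoc queries/day with cross-system joins and enterprise budget
print(recommend_data_access_approach(500, 300, 200_000, True, False, "enterprise"))
# virtualization
```

Note how write access short-circuits everything: replication and most virtualization configs are read-only, so any CRUD requirement forces the direct-API path regardless of volume.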
-- Fast-changing data: short cache (Salesforce opportunities)
ALTER VIEW salesforce_opportunities CACHE PARTIAL EXPIRATION 60;
-- Slow-changing data: long cache (SAP material master)
ALTER VIEW sap_material_master CACHE FULL EXPIRATION 86400 PRELOAD ON STARTUP;
-- Cross-system join uses cached data from both sources
CREATE VIEW cross_erp_order_view AS
SELECT sf.OpportunityId, sf.Amount, sap.MaterialNumber, d365.SalesOrderNumber
FROM salesforce_opportunities sf
JOIN sap_material_master sap ON sf.ProductCode__c = sap.MaterialNumber
JOIN d365_sales_orders d365 ON sf.D365OrderId__c = d365.SalesOrderNumber;
# Query a Virtual Entity (Finance & Operations data exposed to Dataverse)
time curl -s -w "\nHTTP Status: %{http_code}\nTotal Time: %{time_total}s\n" \
-H "Authorization: Bearer $D365_TOKEN" \
-H "OData-MaxVersion: 4.0" \
"https://your-org.crm.dynamics.com/api/data/v9.2/mserp_vendors?\$select=mserp_vendoraccountnumber,mserp_name&\$top=10"
# Expected: Total Time should be 1-5s for Virtual Entities
# If >5s consistently, consider replicating this data instead
| Aspect | Virtualization | Replication | Direct API |
|---|---|---|---|
| Schema | Virtual views match source schema (or transformed) | Target schema — may differ from source | API response schema (JSON/XML) |
| Data types | Translated by virtualization engine | Mapped during ETL/CDC pipeline | Raw API types (string dates, etc.) |
| Null handling | Source-dependent (passed through) | Can be transformed during load | API-dependent (null vs absent field) |
| Currency | Source format (may differ per ERP) | Standardized during transformation | Source format |
| Datetime | Source timezone (UTC in cloud ERPs) | Converted during transformation | UTC or source timezone |
| IDs/Keys | Source IDs (no surrogate keys) | Surrogate keys possible in target | Source IDs only |
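The null-handling and datetime rows are where direct-API consumers most often get bitten. A sketch of a normalization shim (the field names `CreditLimit`, `LastModifiedDate`, and `Id` are illustrative, not any specific ERP's schema):

```python
from datetime import datetime, timezone

_MISSING = object()   # sentinel: distinguishes "field absent" from "field null"

def normalize_record(raw: dict) -> dict:
    """Normalize one raw API record: string dates -> UTC datetimes,
    string amounts -> floats, absent vs. null tracked separately."""
    out = {"id": raw["Id"]}   # keep the source ID; no surrogate keys here
    credit = raw.get("CreditLimit", _MISSING)
    out["credit_limit"] = None if credit in (_MISSING, None) else float(credit)
    out["credit_limit_was_sent"] = credit is not _MISSING
    modified = raw.get("LastModifiedDate")
    out["last_modified"] = (
        datetime.fromisoformat(modified.replace("Z", "+00:00")).astimezone(timezone.utc)
        if modified else None)
    return out

rec = normalize_record({"Id": "001", "CreditLimit": "5000.00",
                        "LastModifiedDate": "2025-06-01T12:30:00Z"})
# rec["credit_limit"] == 5000.0; rec["last_modified"] is timezone-aware UTC
```

With replication, this mapping happens once in the pipeline; with direct API calls, every consumer must do it themselves, which is one more argument for centralizing high-volume reads.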
| Approach | Error | Cause | Resolution |
|---|---|---|---|
| Virtualization | Query timeout (30-120s) | Source ERP slow or unavailable | Configure timeout fallback to cached data; add circuit breaker |
| Virtualization | Source connection pool exhausted | Too many concurrent queries | Increase pool size or enable full caching for hot views |
| Virtualization | Rate limit exceeded on source ERP | Uncached queries exceeding ERP API quota | Enable aggressive caching; switch to replication for high-volume views |
| CDC Replication | Replication slot lag growing | Consumer too slow or backpressure | Scale Kafka consumers; check for slow sink connectors |
| CDC Replication | Schema change breaks pipeline | ALTER TABLE on source without notification | Use Debezium schema registry with compatibility mode |
| ETL Replication | Bulk API timeout on large extract | Extracting too many records in single job | Chunk into smaller date-range batches |
| ETL Replication | Stale data due to failed pipeline run | ETL job failed silently overnight | Implement pipeline observability with freshness SLAs |
| Direct API | 429 Too Many Requests | Rate limit exceeded | Exponential backoff (2^n seconds, max 5 retries, jitter) |
| Direct API | 504 Gateway Timeout | ERP under load, complex query | Simplify query, add selective filters, reduce payload |
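The 429 row's backoff policy can be isolated into a small, client-agnostic helper. This is a sketch: the `(status, headers, body)` tuple shape of `do_request` is an assumption, so adapt it to whatever HTTP client you use.

```python
import random
import time

def backoff_delay(attempt, retry_after=None):
    """Delay before retry n: a server-supplied Retry-After header wins;
    otherwise full jitter over a 2^n-second exponential base."""
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0, 2 ** attempt)

def call_with_backoff(do_request, max_retries=5):
    """Retry a callable returning (status_code, headers, body) on 429/504.

    Keeping the HTTP client injectable makes the retry policy
    testable without a live ERP endpoint.
    """
    for attempt in range(max_retries + 1):
        status, headers, body = do_request()
        if status not in (429, 504):
            return status, body
        if attempt == max_retries:
            raise RuntimeError(f"gave up after {max_retries} retries (HTTP {status})")
        time.sleep(backoff_delay(attempt, headers.get("Retry-After")))
```

Honoring `Retry-After` when the ERP sends it matters: blind exponential backoff can retry sooner than the server asked, which some platforms penalize.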
Configure per-source timeouts and graceful degradation in Denodo (CONTEXT clause with TIMEOUT and DEFAULT_VALUE). [src2]
Use Debezium snapshot.mode=when_needed to automatically re-snapshot when log position is lost. [src5]
Implement data freshness checks — compare max(last_modified) in target vs source; alert when delta exceeds SLA. [src1]
Enable Denodo query governance — set per-user and per-view query quotas; mandate caching. [src3]
The D365 restriction on joining virtual and native entities cannot be fixed — it is a platform limitation. Replicate the data to Dataverse instead. [src4]
# BAD — Using data virtualization for an application making 50K reads/day
# Every call hits the ERP API through the virtualization layer
def get_product_price(product_id):
    result = denodo_client.query(
        f"SELECT price FROM erp_product_master WHERE id = '{product_id}'")
    return result[0]["price"]
# Called 50,000 times/day — ERP rate limit exceeded by noon

# GOOD — Replicate product master to local DB via CDC, query locally
def get_product_price(product_id):
    result = local_db.execute(
        "SELECT price FROM product_master_replica WHERE id = %s", (product_id,))
    return result[0]["price"]
# CDC keeps local replica 1-10s behind source. 50K reads cost 0 ERP API calls
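The freshness of the GOOD variant is only as good as your monitoring of the replica. A minimal lag check, assuming you can cheaply read `max(last_modified)` from both source and replica (the function and argument names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def check_replica_freshness(source_max_modified, replica_max_modified,
                            sla=timedelta(seconds=60)):
    """Compare max(last_modified) at source vs. replica; flag SLA breaches."""
    lag = source_max_modified - replica_max_modified
    return {"lag_seconds": lag.total_seconds(), "within_sla": lag <= sla}

src = datetime(2025, 6, 1, 12, 0, 30, tzinfo=timezone.utc)  # newest change at source
rep = datetime(2025, 6, 1, 12, 0, 25, tzinfo=timezone.utc)  # newest row in replica
status = check_replica_freshness(src, rep)   # 5s lag, inside a 60s SLA
```

The source-side read costs one cheap API call per check interval, which is a rounding error next to the 50K reads it protects.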
# BAD — Replicating all 500 Salesforce objects to Snowflake
replication_config = {"source": "salesforce", "target": "snowflake", "objects": "ALL"}
# Result: $5,000/month Snowflake storage, $2,000/month Fivetran — only 12 objects queried
# GOOD — Replicate 12 high-volume objects; virtualize low-volume on-demand
replication_config = {
    "source": "salesforce", "target": "snowflake",
    "objects": ["Account", "Contact", "Opportunity", "Lead", "Campaign",
                "CampaignMember", "Task", "Event", "Case", "Product2",
                "PricebookEntry", "OpportunityLineItem"],
}
# Low-volume objects (ContentDocument, Entitlement) accessed via Denodo on-demand
# BAD — 6 API calls per dashboard load, 50 users, refreshed every 5 min
# = 86,400 API calls/day just for one dashboard
def get_revenue_by_region_dashboard():
    regions = sf_api.query("SELECT Region__c FROM Account GROUP BY Region__c")
    for region in regions["records"]:
        revenue = sf_api.query(f"SELECT SUM(Amount) FROM Opportunity WHERE ...")

# GOOD — Single query against local warehouse replica, 0 ERP API calls
def get_revenue_by_region_dashboard():
    return warehouse.query("""
        SELECT a.region, SUM(o.amount) AS total_revenue
        FROM opportunity_replica o JOIN account_replica a ON o.account_id = a.id
        WHERE o.stage_name = 'Closed Won' AND o.close_date >= DATEADD(month, -12, CURRENT_DATE)
        GROUP BY a.region ORDER BY total_revenue DESC""")
Enable query cost tracking in Denodo; set per-user daily query budgets; mandate caching for all production views. [src2]
Use Denodo cache invalidation listeners tied to ERP change events. [src2]
Use vendor-native CDC (Salesforce CDC, SAP ODP) or API-based incremental extraction with timestamp filters. [src5]
Deploy Denodo in HA cluster (active-active); maintain a parallel replicated data path for critical applications. [src3]
Designate one canonical path per data object — either virtualized or replicated, never both for the same consumer use case. [src1]
# === Data Virtualization (Denodo) ===
# Check cache status and hit rate for a virtual view
# Denodo VQL: SELECT * FROM GET_CACHE_STATUS() WHERE viewname = 'unified_customer_360';
# Check active connections to ERP sources
# Denodo VQL: SELECT * FROM GET_CONNECTIONS() WHERE datasource_type = 'CUSTOM';
# === CDC Replication (Debezium) ===
# Check connector status and lag
curl -s localhost:8083/connectors/erp-oracle-cdc/status | python3 -m json.tool
# Check consumer group lag (Kafka)
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --group erp-cdc-sink --describe
# === Direct API ===
# Check Salesforce API usage remaining
curl -s -H "Authorization: Bearer $SF_TOKEN" \
"https://your-instance.salesforce.com/services/data/v62.0/limits" \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(f'DailyApiRequests: {d[\"DailyApiRequests\"][\"Remaining\"]}/{d[\"DailyApiRequests\"][\"Max\"]}')"
| Tool/Platform | Version | Release | Status | Key Changes |
|---|---|---|---|---|
| Denodo Platform | 9.1 | 2025-Q4 | Current | AI-enhanced query optimizer, improved cloud connectors |
| Denodo Platform | 9.0 | 2025-Q1 | Supported | Data catalog integration, GraphQL support |
| Denodo Platform | 8.0 | 2023-Q1 | Supported (extended) | Major architecture refresh |
| TIBCO Data Virtualization | SDV 2025 | 2025-Q3 | Current | Rebranded to Spotfire Data Virtualization |
| TIBCO Data Virtualization | TDV 8.8 | 2024-Q4 | Maintenance | Last release under TIBCO name |
| D365 Virtual Entities | 2025 Wave 2 | 2025-10 | Current | Improved OData performance, more F&O entities |
| Debezium | 3.4.x | 2025-12 | Current | Java 17+ required, Kafka 3.x baseline |
| Debezium | 2.7.x | 2024-06 | EOL | Last Java 11 compatible version |
| CData Connect Cloud | 2025.2 | 2025-Q4 | Current | Added SAP HANA and Workday connectors |
| Use When | Don't Use When | Use Instead |
|---|---|---|
| Cross-system ad-hoc BI queries, <1K queries/day, budget for enterprise tooling | High-volume application reads (>10K queries/day) | CDC replication (change-data-capture-erp) |
| Need real-time data freshness for low-volume dashboards | Analytics workloads scanning millions of records | ETL/ELT replication to data warehouse |
| Quick POC without building ETL pipelines | Cloud SaaS ERPs with strict rate limits (Salesforce, NetSuite) | API-based incremental extraction with local caching |
| SAP S/4HANA CDS view consumption (virtualization-friendly) | Need write access back to ERP | Direct API calls (system-specific API cards) |
| Centralized data governance across multiple ERPs | Budget <$50K/year for data platform tooling | Open-source replication (Debezium + Airbyte) |
| D365 read-only lookups from Power Platform apps | Complex joins between virtual and native D365 data | Replicate F&O data to Dataverse (Dual Write) |
| Capability | Data Virtualization | CDC Replication | ETL/ELT Replication | Direct API Calls |
|---|---|---|---|---|
| Implementation Time | 2-4 weeks | 4-8 weeks | 2-6 weeks | Days-weeks |
| Ongoing Maintenance | Low | Moderate | Moderate | Low |
| Data Freshness | Real-time (live query) | Near-real-time (1-10s) | Periodic (15min-24h) | Real-time (per request) |
| Query Performance | 200ms-5s | <10ms | <10ms | 200ms-2s |
| Scalability Ceiling | Low-moderate | High (Kafka scales) | High (warehouse scales) | Low (API rate limits) |
| Failure Impact | Source outage = query failure | Source outage = stale but queryable | Source outage = stale data | Source outage = total failure |
| Cost at Scale | High (licensing + API cost) | Moderate (infra + storage) | Moderate (tool + storage) | Low (API calls only) |
| Data Governance | Centralized | Distributed | Distributed | None |
| Multi-Source Joins | Native (federation) | Requires shared target DB | Requires shared target DB | Manual |
| Historical Data | No (current state only) | Partial (event sourcing) | Yes (SCD, snapshots) | No (current state) |
| Vendor Lock-in | High (proprietary) | Low-moderate | Low-moderate | None |