What are the best chunking and parallel processing strategies for bulk ERP integration?
TL;DR
Bottom line: Break datasets into 1,000-10,000 record chunks (system-dependent), process with rate-limit-aware parallelism, and implement per-chunk error handling — never send everything in one giant batch.
Key limit: Each ERP has hard ceiling limits — Salesforce 150 MB/file and 100M records/24h; NetSuite 15-55 concurrent threads; D365 30-min sync timeout; Oracle FBDI 250 MB/ZIP and 100K records/import.
Watch out for: Partial success handling — Salesforce Bulk API reports per-record success/failure, but NetSuite CSV Import and Oracle FBDI fail entire batches on any validation error.
Best for: Any integration moving > 2,000 records per operation — data migration, nightly syncs, master data distribution, large transaction loads.
Authentication: Use service-to-service OAuth flows (JWT for Salesforce, TBA for NetSuite, Entra ID for D365, OAuth 2.0 for Oracle).
System Profile
This is a cross-system architecture pattern card covering bulk data processing strategies across the five most widely integrated ERP platforms. It compares native bulk APIs, chunking approaches, concurrency models, and error handling at the bulk operation level.
| System | Role | Bulk API Surface | Direction Covered |
| --- | --- | --- | --- |
| Salesforce | CRM / Platform | Bulk API 2.0 (CSV over REST) | Inbound + Outbound |
| Oracle NetSuite | ERP / Financials | SuiteTalk REST, CSV Import, SuiteQL | Inbound + Outbound |
| SAP S/4HANA | ERP / Manufacturing | OData $batch, BAPI, LSMW, LTMC | Inbound + Outbound |
| Microsoft D365 F&O | ERP / Finance | DMF Recurring Integration, OData batch | Inbound + Outbound |
| Oracle ERP Cloud | ERP / Financials | FBDI + ESS, REST | Inbound + Outbound |
API Surfaces & Capabilities
| ERP System | Bulk API | Protocol | Max Records/Request | Max File Size | Concurrency Limit | Async? | Partial Success? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Salesforce | Bulk API 2.0 | REST/CSV | 150M records/job | 150 MB/file | 25 concurrent jobs | Yes | Yes (per-record) |
| NetSuite | CSV Import | UI/Scheduled | 100K+ per file | No hard limit | 5 queues (multi-thread) | Yes | No (entire batch) |
| NetSuite | SuiteTalk REST | REST/JSON | 1 record/request | 104 MB response | 15-55 concurrent | Optional | N/A |
| SAP S/4HANA | OData $batch | REST/JSON | 1,000 changesets | 50 MB | Shared dialog processes | Yes (async mode) | Yes (per-changeset) |
| SAP S/4HANA | LSMW/LTMC | Batch input | No hard limit | File-dependent | 1 session/project | Yes | Partial (error log) |
| D365 F&O | DMF Recurring | REST/Package | Package-dependent | No documented limit | Configurable (Batch Concurrency Control) | Yes | Yes (per-record in log) |
| Oracle ERP Cloud | FBDI | REST/ZIP(CSV) | 100,000 per import | 250 MB/ZIP | 5 concurrent imports | Yes (ESS jobs) | No (entire import) |
Rate Limits & Quotas
Per-Request Limits
| ERP System | Limit Type | Value | Notes |
| --- | --- | --- | --- |
| Salesforce | Max file size per upload | 150 MB | Split larger datasets into multiple files |
| Salesforce | Internal batch size | 10,000 records | Auto-created; each takes up to 10 min |
| Salesforce | PK Chunking default | 250,000 records | Configurable via Sforce-PK-Chunking header |
| NetSuite | Records per REST request | 1 | Use CSV Import for true bulk ops |
| NetSuite | Max response payload | 104 MB | Paginate queries exceeding this |
| SAP | Changesets per $batch | 1,000 | Split across multiple $batch calls |
| D365 | Sync mode timeout | 30 minutes | Use batch mode for longer operations |
| Oracle FBDI | Records per import | 100,000 | Split into multiple FBDI submissions |
| Oracle FBDI | ZIP file size | 250 MB | Individual CSV within ZIP can be up to 1 GB |
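The per-request caps above can be enforced mechanically before upload. The following is a minimal sketch, assuming rows are already serialized as CSV lines; `split_rows` and its parameters are hypothetical names, and the byte cap is applied to uncompressed text, which is conservative for ZIP-based uploads such as FBDI.

```python
from typing import Iterable, Iterator, List


def split_rows(rows: Iterable[str], max_records: int, max_bytes: int) -> Iterator[List[str]]:
    """Yield lists of CSV lines that stay under both a record cap and a byte cap."""
    batch: List[str] = []
    batch_bytes = 0
    for row in rows:
        row_bytes = len(row.encode("utf-8")) + 1  # +1 for the newline written after the row
        # Close the current batch if adding this row would break either cap.
        if batch and (len(batch) >= max_records or batch_bytes + row_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch


# Example: the Oracle FBDI caps from the table (100,000 records, 250 MB).
rows = [f"{i},ACME,{i * 10}" for i in range(250_000)]
parts = list(split_rows(rows, max_records=100_000, max_bytes=250 * 1024 * 1024))
print([len(p) for p in parts])  # [100000, 100000, 50000]
```

The header row is not counted here; each output file would need the header prepended before submission.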
Rolling / Daily Limits
| ERP System | Limit Type | Value | Window | Edition Differences |
| --- | --- | --- | --- | --- |
| Salesforce | Total records via Bulk API | 100,000,000 | 24h rolling | Same across editions |
| Salesforce | Bulk API batches | 15,000 | 24h rolling | Shared across editions |
| NetSuite | Concurrent threads | 15-55 | Per-moment | SuiteCloud Plus adds 10/license |
| SAP | OData requests | Fair-use throttling | Depends on sizing | On-prem: hardware; Cloud: quota |
| D365 | API calls | Priority-based throttling | Per-minute | Based on license count |
| Oracle FBDI | Concurrent imports | 5 | Per-moment | Same across editions |
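Rolling limits such as the 24-hour record cap are easiest to respect when the client keeps its own running estimate. Below is a rough sketch of a sliding-window budget; `RollingQuota` is a hypothetical helper, and the ERP's own accounting remains authoritative, so treat the local count as an early warning rather than a guarantee.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RollingQuota:
    """Track usage inside a sliding window and refuse work that would exceed the cap."""

    def __init__(self, limit: int, window_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, units used)
        self.used = 0

    def _expire(self) -> None:
        # Drop usage records that have aged out of the window.
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, units = self.events.popleft()
            self.used -= units

    def try_consume(self, units: int) -> bool:
        """Record `units` if they fit in the remaining budget; return False otherwise."""
        self._expire()
        if self.used + units > self.limit:
            return False
        self.events.append((self.clock(), units))
        self.used += units
        return True

    def remaining(self) -> int:
        self._expire()
        return self.limit - self.used
```

A caller might create `RollingQuota(limit=100_000_000)` for the Salesforce record cap listed above and call `try_consume(len(chunk))` before each upload, deferring the chunk when it returns `False`.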
Authentication
| ERP System | Recommended Flow | Token Lifetime | Notes |
| --- | --- | --- | --- |
| Salesforce | OAuth 2.0 JWT Bearer | 2h (session timeout) | New JWT per bulk job; never use username-password |
| NetSuite | Token-Based Auth (TBA) | Until revoked | Create dedicated integration user |
| SAP S/4HANA | OAuth 2.0 Client Credentials | Configurable (12h default) | Basic auth deprecated |
| D365 F&O | Microsoft Entra ID (OAuth 2.0) | 1h access token | Client credentials flow |
| Oracle ERP Cloud | OAuth 2.0 | Session-based | Via Oracle IDCS |
Authentication Gotchas
Salesforce bulk jobs inherit the API user's governor limits — a dedicated integration user prevents limit sharing with interactive users [src1]
NetSuite TBA tokens are tied to specific integration records — rotating tokens requires updating all connected middleware [src3]
D365 recurring integration jobs require the Entra ID app ID to be registered in the data project — mismatched app IDs silently reject submissions [src4]
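Because a long bulk run can outlive a 1-hour or 2-hour access token, it helps to refresh the token slightly before it expires rather than waiting for a 401 mid-job. The sketch below assumes a hypothetical `fetch_token` callable that performs the actual OAuth exchange and returns the token with its lifetime in seconds; no vendor SDK call is implied.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], Tuple[str, float]],
                 skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_token   # hypothetical: returns (token, lifetime in seconds)
        self._skew = skew_seconds   # refresh this long before the stated expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Each chunk upload would call `cache.get()` for its Authorization header, so a refresh happens between chunks instead of in the middle of one.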
Constraints
Salesforce Bulk API 2.0 auto-creates internal 10K-record batches from your uploaded file — you cannot control internal batch boundaries, only the file you upload
NetSuite has no native bulk upsert API — CSV Import or individual REST calls with external IDs are the only options for idempotent bulk writes
SAP OData $batch is transactional per changeset — a single failing record in a changeset rolls back the entire changeset
D365 DMF parallel package import must be explicitly enabled (Data Management > Framework Parameters > Enhanced Parallel Package Import)
Oracle FBDI rejects the entire import on any data validation error — there is no partial success mode
Cross-system: running bulk jobs during business hours competes with interactive users for API quota
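The SAP constraint has a practical consequence: since one bad record rolls back its whole changeset, smaller changesets limit the blast radius, while the 1,000-changeset cap limits how many fit in one $batch call. The following sketch only plans the grouping; `plan_batches` is a hypothetical helper and does not build the multipart request body.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_batches(operations: Sequence[T], ops_per_changeset: int = 1,
                 changesets_per_batch: int = 1000) -> List[List[List[T]]]:
    """Group operations into changesets, then changesets into $batch requests."""
    if ops_per_changeset < 1 or changesets_per_batch < 1:
        raise ValueError("sizes must be positive")
    # Each changeset is an atomic unit: everything in it commits or rolls back together.
    changesets = [list(operations[i:i + ops_per_changeset])
                  for i in range(0, len(operations), ops_per_changeset)]
    # Pack changesets into requests without exceeding the per-request cap.
    return [changesets[i:i + changesets_per_batch]
            for i in range(0, len(changesets), changesets_per_batch)]


plan = plan_batches(list(range(2500)))
print([len(batch) for batch in plan])  # [1000, 1000, 500]
```

With `ops_per_changeset=1`, a failure affects only that record; raise it only for records that genuinely must commit together, such as a header and its lines.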
Integration Pattern Decision Tree
START: User needs to bulk-load data into an ERP system
├── How many records per batch cycle?
│   ├── < 2,000 records
│   │   └── Use standard REST API with composite/batch requests
│   ├── 2,000 – 100,000 records
│   │   ├── Salesforce → Bulk API 2.0 single job
│   │   ├── NetSuite → CSV Import (multi-thread with SuiteCloud Plus)
│   │   ├── SAP → OData $batch (1,000 changesets/request)
│   │   ├── D365 → DMF Recurring Integration (single package)
│   │   └── Oracle → FBDI single ZIP + ESS job
│   ├── 100,000 – 1,000,000 records
│   │   ├── Salesforce → Bulk API 2.0 with file chunking (10-15 MB files)
│   │   ├── NetSuite → CSV Import with queue distribution (queues 1-5)
│   │   ├── SAP → LTMC / BAPI batch processing
│   │   ├── D365 → DMF + Enhanced Parallel Package Import
│   │   └── Oracle → Multiple FBDI submissions (≤100K per import, ≤5 concurrent)
│   └── > 1,000,000 records
│       ├── Salesforce → Bulk API 2.0 with job queuing (max 25 concurrent)
│       ├── NetSuite → Staged CSV Import + message queue orchestration
│       ├── SAP → LSMW / LTMC with parallel sessions
│       ├── D365 → Multiple DMF packages with batch concurrency control
│       └── Oracle → Sequential FBDI batches (respect 5-concurrent limit)
├── Chunking strategy?
│   ├── Fixed-size chunks → Simplest; use when records are independent
│   ├── Adaptive chunks → Adjust based on response times and error rates
│   └── Dependency-aware chunks → Group parent/child records together
├── Error tolerance?
│   ├── Zero-loss → Per-chunk retry + dead letter queue + idempotency keys
│   └── Best-effort → Log failures, skip, continue
└── Performance priority?
    ├── Throughput → Maximize parallelism up to concurrency limit
    └── Reliability → Reduce parallelism, increase retry budget
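The volume branch of the tree can be expressed as a small lookup, which is handy when an orchestrator has to pick a path per run. This is an illustrative sketch: the tier names and the `recommend` function are hypothetical, and because the tree's ranges share their endpoints, the sketch assumes each upper bound is inclusive (so exactly 100,000 records still counts as a single job).

```python
def volume_tier(record_count: int) -> str:
    """Map a record count to one of the four volume tiers in the decision tree."""
    if record_count < 2_000:
        return "small"
    if record_count <= 100_000:      # assumption: upper bounds are inclusive
        return "medium"
    if record_count <= 1_000_000:
        return "large"
    return "very_large"


STRATEGIES = {
    "medium": {
        "salesforce": "Bulk API 2.0 single job",
        "netsuite": "CSV Import (multi-thread with SuiteCloud Plus)",
        "sap": "OData $batch (1,000 changesets/request)",
        "d365": "DMF Recurring Integration (single package)",
        "oracle": "FBDI single ZIP + ESS job",
    },
    "large": {
        "salesforce": "Bulk API 2.0 with file chunking (10-15 MB files)",
        "netsuite": "CSV Import with queue distribution (queues 1-5)",
        "sap": "LTMC / BAPI batch processing",
        "d365": "DMF + Enhanced Parallel Package Import",
        "oracle": "Multiple FBDI submissions (<=100K per import, <=5 concurrent)",
    },
    "very_large": {
        "salesforce": "Bulk API 2.0 with job queuing (max 25 concurrent)",
        "netsuite": "Staged CSV Import + message queue orchestration",
        "sap": "LSMW / LTMC with parallel sessions",
        "d365": "Multiple DMF packages with batch concurrency control",
        "oracle": "Sequential FBDI batches (respect 5-concurrent limit)",
    },
}


def recommend(erp: str, record_count: int) -> str:
    """Return the tree's suggested approach for an ERP and a record volume."""
    tier = volume_tier(record_count)
    if tier == "small":
        return "Standard REST API with composite/batch requests"
    return STRATEGIES[tier][erp]


print(recommend("oracle", 250_000))
```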
2. Implement chunk generation with dependency awareness
Split source data into chunks that respect parent-child relationships. [src7]
def chunk_with_dependencies(records, chunk_size=10000):
    """Split records into chunks preserving parent-child groups."""
    # Group each child under its parent; a parent record leads its own group.
    groups = {}
    for record in records:
        parent_id = record.get('parent_id')
        if parent_id:
            groups.setdefault(parent_id, []).append(record)
        else:
            groups.setdefault(record['id'], []).insert(0, record)

    # Pack whole groups into chunks. A group is never split, so a group
    # larger than chunk_size ends up alone in an oversized chunk.
    chunks, current_chunk, current_size = [], [], 0
    for group_records in groups.values():
        group_size = len(group_records)
        if current_size + group_size > chunk_size and current_chunk:
            chunks.append(current_chunk)
            current_chunk, current_size = [], 0
        current_chunk.extend(group_records)
        current_size += group_size
    if current_chunk:
        chunks.append(current_chunk)
    return chunks
3. Implement rate-limit-aware parallel processing
Process chunks in parallel while respecting concurrency limits. [src1, src3]
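import asyncio
import random


class RateLimitError(Exception):
    """Placeholder for the rate-limit exception raised by your ERP client."""
    def __init__(self, retry_after=None):
        super().__init__('rate limited')
        self.retry_after = retry_after


# Conservative defaults, kept below each platform's documented ceiling.
ERP_CONCURRENCY = {
    'salesforce': 10, 'netsuite': 10, 'sap': 5, 'd365': 8, 'oracle_fbdi': 3,
}


async def process_chunks_parallel(chunks, erp_type, process_fn, max_retries=3):
    semaphore = asyncio.Semaphore(ERP_CONCURRENCY.get(erp_type, 5))

    async def process_with_limit(idx, chunk):
        async with semaphore:
            for attempt in range(max_retries + 1):
                try:
                    result = await process_fn(chunk)
                    return {'chunk': idx, 'status': 'success', 'result': result}
                except RateLimitError as e:
                    if attempt == max_retries:
                        return {'chunk': idx, 'status': 'failed', 'error': 'rate limit retries exhausted'}
                    # Back off by attempt number (not chunk index), with jitter.
                    await asyncio.sleep(e.retry_after or 2 ** attempt + random.random())
                except Exception as e:
                    return {'chunk': idx, 'status': 'failed', 'error': str(e)}

    return await asyncio.gather(*[process_with_limit(i, c) for i, c in enumerate(chunks)])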
4. Handle partial failures per ERP
Each ERP handles partial success differently. [src1, src4, src6]
def handle_partial_failures(results, erp_type):
    """Return the records (or changesets) that need to be retried."""
    if erp_type == 'salesforce':
        # Per-record success/failure: retry only the failed records
        return [r for r in results if r.get('sf__Error')]
    elif erp_type == 'netsuite_csv':
        # Entire batch fails: retry all records after fixing the data
        return results['all_records'] if results.get('status') == 'FAILED' else []
    elif erp_type == 'sap_odata':
        # Per-changeset: retry failed changesets only
        return [cs for cs in results['changesets'] if cs['status'] >= 400]
    elif erp_type == 'd365':
        # Check execution errors via API
        return results.get('execution_errors', [])
    elif erp_type == 'oracle_fbdi':
        # Entire import fails: fix the data and resubmit
        return results['all_records'] if results.get('status') == 'ERROR' else []
    raise ValueError(f'unsupported erp_type: {erp_type}')
Salesforce Bulk API 2.0 CSV requires all lookup fields as 18-char IDs — 15-char IDs cause MALFORMED_ID errors [src1]
NetSuite CSV Import distinguishes an empty value from an omitted one: an empty string clears the field, while omitting the value preserves what is already stored [src3]
SAP OData $batch with JSON requires Edm.Decimal as strings — numeric precision loss causes rounding errors
D365 DMF date fields require ISO 8601 — locale-specific formats are rejected in REST package submission [src4]
Oracle FBDI date columns require YYYY/MM/DD HH:mm:ss — ISO 8601 with 'T' separator is rejected [src6]
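Several of these format rules can be handled by small normalizers run before serialization. The sketch below covers three of them; the function names are hypothetical, and the 15-to-18 character conversion uses the widely documented case-checksum suffix for Salesforce IDs, which is worth verifying against a handful of real IDs from your org.

```python
from datetime import datetime
from decimal import Decimal

_SUFFIX_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"


def salesforce_id_18(record_id: str) -> str:
    """Append the 3-character case checksum to a 15-character Salesforce ID."""
    if len(record_id) == 18:
        return record_id
    if len(record_id) != 15:
        raise ValueError(f"expected 15 or 18 characters, got {len(record_id)}")
    suffix = ""
    for start in (0, 5, 10):
        bits = 0
        # Bit n is set when the n-th character of this 5-character block is uppercase.
        for offset, char in enumerate(record_id[start:start + 5]):
            if "A" <= char <= "Z":
                bits |= 1 << offset
        suffix += _SUFFIX_ALPHABET[bits]
    return record_id + suffix


def fbdi_datetime(value: datetime) -> str:
    """Format a datetime as YYYY/MM/DD HH:mm:ss, the form FBDI date columns expect."""
    return value.strftime("%Y/%m/%d %H:%M:%S")


def odata_decimal(value: Decimal) -> str:
    """Serialize a Decimal as a plain string so JSON never routes it through a float."""
    return format(value, "f")


print(salesforce_id_18("001D000000IqhSL"))            # 001D000000IqhSLIAZ
print(fbdi_datetime(datetime(2024, 3, 5, 14, 7, 9)))  # 2024/03/05 14:07:09
print(odata_decimal(Decimal("1234.50")))              # 1234.50
```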
Error Handling & Failure Points
Common Error Codes
| ERP | Code | Meaning | Cause | Resolution |
| --- | --- | --- | --- | --- |
| Salesforce | UNABLE_TO_LOCK_ROW | Record locked | Concurrent update | Reduce parallelism; add jitter |
| Salesforce | REQUEST_LIMIT_EXCEEDED | Daily limit hit | Too many bulk jobs/24h | Consolidate into fewer, larger jobs |
| NetSuite | 429 | Rate limit exceeded | Burst or daily limit | Exponential backoff; honor Retry-After |
| NetSuite | CONCURRENT_REQUEST_LIMIT | Thread limit hit | Too many parallel requests | Reduce semaphore count |
| SAP | 501 | $batch not supported | Endpoint limitation | Use individual requests or BAPI batch |
| D365 | ProcessedWithErrors | Partial failure | Validation errors | Call GetExecutionErrors API |
| Oracle | ESS FAILED | Import validation error | Data/constraint violation | Download error report; fix and resubmit |
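Two of the resolutions above ("add jitter" and "exponential backoff; honor Retry-After") come down to one delay calculation. A minimal sketch follows, assuming the caller has already parsed any Retry-After header into seconds; `backoff_delay` and its defaults are illustrative, not vendor-recommended values.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return how long to wait before retry number `attempt` (0-based)."""
    # A server-provided Retry-After always wins over the local estimate.
    if retry_after is not None:
        return max(0.0, retry_after)
    rng = rng or random.Random()
    # "Full jitter": pick uniformly between 0 and the capped exponential ceiling,
    # so parallel workers do not all retry at the same instant.
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Spreading retries randomly matters most for lock errors such as UNABLE_TO_LOCK_ROW, where workers that retry in lockstep tend to collide again.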
Failure Points in Production
Salesforce Bulk API silent timeout: Internal 10K-record batches can time out; Salesforce retries up to 10 times, then fails the job. Fix: Keep records per file under 100K. [src1]
NetSuite CSV Import order dependency: Children may process before parents in multi-threaded mode. Fix: Upload parents first, then children — or disable multi-threading. [src3]
D365 DMF parallel execution collision: Files arriving faster than processing causes lock contention. Fix: Enable sequential processing of messages in recurring job config. [src4]
Oracle FBDI BOM character: UTF-8 BOM causes FBDI to misread first column header. Fix: Save CSV as UTF-8 without BOM; validate with hex editor. [src6]
Cross-system timezone overlap: Bulk jobs at "midnight" may hit maintenance windows. Fix: Use UTC timestamps; check ERP maintenance calendar.
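The BOM problem in particular is cheap to guard against in code rather than by manual inspection. A small sketch, assuming the CSV is available as bytes before it is zipped; `strip_utf8_bom` is a hypothetical helper name.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + b"INVOICE_ID,AMOUNT\n1001,250.00\n"
clean = strip_utf8_bom(raw)
print(clean.split(b"\n")[0])  # b'INVOICE_ID,AMOUNT'
```

When reading text instead of bytes, opening the file with `encoding="utf-8-sig"` has the same effect, since that codec drops the BOM on decode.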
Anti-Patterns
Wrong: Processing all records in a single giant batch
# BAD: single-threaded, no chunking, all-or-nothing failure
def bulk_import_bad(records, api_client):
    result = api_client.bulk_upload(records)  # Times out at 150K+
    if result.failed:
        bulk_import_bad(records, api_client)  # Infinite retry loop
Correct: Chunked processing with per-chunk error isolation
# GOOD: chunked, with per-chunk error isolation (sequential here; add
# semaphore-controlled parallelism on top once this works)
# log_success and retry_failed_chunks are placeholder helpers.
def bulk_import_good(records, api_client, chunk_size=10000):
    chunks = [records[i:i+chunk_size] for i in range(0, len(records), chunk_size)]
    failed_chunks = []
    for i, chunk in enumerate(chunks):
        try:
            result = api_client.bulk_upload(chunk)
            log_success(i, len(chunk), result.processed)
        except Exception as e:
            # One bad chunk is recorded and the loop moves on to the next
            failed_chunks.append((i, chunk, str(e)))
    retry_failed_chunks(failed_chunks, api_client)
Wrong: Ignoring partial success results
# BAD: assumes the entire job succeeded or failed
job = api_client.get_job_status(job_id)
if job.state == 'JobComplete':
    print("All done!")  # WRONG: the job can still contain failed records
Correct: Checking per-record success/failure
# GOOD: inspect individual record results
job = api_client.get_job_status(job_id)
if job.state == 'JobComplete' and job.number_records_failed > 0:
    failures = api_client.get_failed_results(job_id)
    for record in failures:
        dead_letter_queue.add(record)  # Route to DLQ
Wrong: Maximum parallelism without rate-limit awareness
# BAD — fire all chunks simultaneously
tasks = [api_client.upload(chunk) for chunk in chunks]
await asyncio.gather(*tasks) # 500 concurrent → instant 429
Correct: Semaphore-controlled parallelism with backoff
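# GOOD: respect concurrency limits
# (api_client and RateLimitError come from your ERP client library)
sem = asyncio.Semaphore(10)

async def upload_with_limit(chunk, max_attempts=3):
    async with sem:
        for attempt in range(max_attempts):
            try:
                return await api_client.upload(chunk)
            except RateLimitError:
                if attempt == max_attempts - 1:
                    raise  # Surface the failure instead of silently returning None
                await asyncio.sleep(2 ** attempt)

await asyncio.gather(*[upload_with_limit(c) for c in chunks])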
Common Pitfalls
Chunks too small (< 100 records): API overhead per request dominates — same HTTP overhead for 100 records as 10,000. Fix: Start at ERP's recommended minimum and increase until hitting timeout or memory limits. [src7]
Chunks too large: Risk timeout, memory exhaustion, all-or-nothing rollback. Fix: Stay at 50-70% of documented maximum. [src1]
Not pre-validating data: Oracle FBDI and NetSuite CSV Import fail entire batches on one bad record. Fix: Validate all records against required fields, types, and picklists before submission. [src6]
Running bulk jobs during business hours: Competes with interactive users for API quota. Fix: Schedule during off-peak hours.
No progress tracking: When a 50-chunk job fails at chunk 37, you must know what succeeded. Fix: Log chunk boundaries in persistent store; implement checkpoint/restart. [src7]
Single-threaded bulk processing: 1M records single-threaded takes hours. Fix: Use concurrent processing up to ERP's concurrency limit (typical 3-8x speedup). [src3]
Mixing create and update: Some ERPs require distinct modes. Fix: Use upsert where available; otherwise separate creates and updates into distinct jobs. [src1]
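The progress-tracking pitfall is usually solved with a persistent checkpoint of completed chunk indexes. The sketch below uses a local JSON file for simplicity; `run_with_checkpoint`, `load_done`, and `save_done` are hypothetical names, `upload` stands in for whatever submits one chunk, and the approach assumes chunk boundaries are deterministic between runs so that index 37 always means the same records.

```python
import json
import os
import tempfile
from typing import Callable, List, Sequence, Set


def load_done(path: str) -> Set[int]:
    """Read the set of chunk indexes that already completed."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_done(path: str, done: Set[int]) -> None:
    """Persist the checkpoint atomically: write a temp file, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def run_with_checkpoint(chunks: Sequence, upload: Callable, path: str) -> List[int]:
    """Upload chunks not yet marked done; return the indexes that failed this run."""
    done = load_done(path)
    failed: List[int] = []
    for idx, chunk in enumerate(chunks):
        if idx in done:
            continue  # finished in an earlier run
        try:
            upload(chunk)
        except Exception:
            failed.append(idx)
            continue
        done.add(idx)
        save_done(path, done)  # checkpoint after every successful chunk
    return failed
```

Rerunning the same job with the same checkpoint path skips completed chunks and retries only the ones that failed. Because a crash between the upload and the checkpoint write can still cause one chunk to be resent, this works best alongside idempotent writes such as upserts keyed on an external ID.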
Rate limits and chunk size recommendations are subject to change with each vendor release — always verify against current documentation
Throughput benchmarks vary dramatically based on record complexity, custom logic (triggers, workflows), and time of day — figures here are baseline estimates for simple records
Edition-specific differences exist for all platforms — Salesforce Developer edition has drastically lower limits (15K API calls/24h); NetSuite tiers determine concurrent threads
Sandbox environments have different performance characteristics than production — always load-test with production-representative data volumes
Oracle FBDI limits may vary by functional module — procurement, financials, and supply chain may have different per-import record caps