OIA Data Collection

Type: Execution Recipe · Confidence: 0.85 · Sources: 4 · Verified: 2026-03-29

Purpose

This recipe produces an anonymized, multi-platform communication metadata dataset ready for organizational network analysis. It extracts interaction patterns from Slack, email, Jira, Git, calendar, and HRIS systems, capturing who communicates with whom, how often, and through which channels, without accessing message content. [src3, src4]

Prerequisites

Constraints

Tool Selection Decision

Which platform combination?
├── Slack + Google Workspace
│   └── PATH A: Slack API + Google Admin SDK
├── Slack + Microsoft 365
│   └── PATH B: Slack API + Microsoft Graph API
├── Microsoft Teams + Microsoft 365
│   └── PATH C: Microsoft Graph API (unified)
└── Mixed or restricted access
    └── PATH D: Manual exports + survey supplement
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Slack + Google | slack_sdk, Google Admin SDK, Python | $0 | 2-3 days | Excellent |
| B: Slack + Microsoft | slack_sdk, Microsoft Graph, Python | $0 | 2-3 days | Excellent |
| C: Teams + Microsoft | Microsoft Graph API, Python | $0 | 1-2 days | Excellent |
| D: Manual + Survey | Admin exports, survey tool | $0-$200 | 3-5 days | Adequate |
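The decision tree above can be expressed as a small lookup. This is a minimal sketch: the function name `choose_path`, the platform strings, and the `api_access` flag are illustrative assumptions, not part of the recipe.

```python
# Hypothetical helper mirroring the path-selection tree; names are illustrative.
PATHS = {
    ("slack", "google workspace"): "A",
    ("slack", "microsoft 365"): "B",
    ("microsoft teams", "microsoft 365"): "C",
}

def choose_path(chat_platform: str, office_suite: str, api_access: bool = True) -> str:
    """Return the recipe path (A-D) for a platform combination."""
    if not api_access:
        # Restricted access always falls back to manual exports plus survey.
        return "D"
    key = (chat_platform.strip().lower(), office_suite.strip().lower())
    # Any combination not listed in the tree is treated as "mixed" (path D).
    return PATHS.get(key, "D")

selected = choose_path("Slack", "Microsoft 365")
```

Unknown or mixed combinations fall through to path D, which matches the last branch of the tree.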

Execution Flow

Step 1: Inventory Available Systems

Duration: 2-4 hours · Tool: Client IT coordination

Map all communication and collaboration platforms in use. Document API availability, admin access status, data retention period, and user count per system.

Verify: Complete system list with API availability confirmed. · If failed: Document gaps, proceed with available systems.

Step 2: Legal & Privacy Framework

Duration: 1-2 days · Tool: Legal coordination

Draft data processing agreement, employee transparency notice, anonymization protocol (SHA-256 hash of employee ID + engagement salt), and data retention schedule. [src3]

Verify: Legal sign-off received, transparency notice sent. · If failed: Escalate to executive sponsor.
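The anonymization protocol named in this step (SHA-256 of employee ID plus engagement salt) might look like the following sketch. The function name, the `salt:id` concatenation order, the lowercase normalization, and the demo salt value are all assumptions made for illustration; the recipe itself only specifies the hash and its two inputs.

```python
import hashlib

def pseudonymize(employee_id: str, engagement_salt: str) -> str:
    """Return the SHA-256 hex digest of the engagement salt plus the employee ID."""
    # Normalize so " e1001 " and "E1001" map to the same pseudonym (assumed convention).
    normalized = employee_id.strip().lower()
    payload = f"{engagement_salt}:{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Hypothetical salt; in practice it would be generated once per engagement
# and stored separately from the dataset.
SALT = "engagement-2026-demo-salt"
token = pseudonymize("E1001", SALT)
```

Because the salt is per engagement, the same employee receives different pseudonyms in different engagements, which limits cross-dataset linkage.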

Step 3: Slack Data Export

Duration: 4-8 hours · Tool: Python slack_sdk or Slack Admin Panel

Extract communication metadata: message timestamps, channel names, reaction counts, thread depths, sender IDs (anonymized). NOT message content. [src1]

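# Extract metadata per channel (no message content)
from slack_sdk import WebClient

client = WebClient(token="xoxb-YOUR-BOT-TOKEN")  # placeholder; load the real token from a secret store
# conversations.history → timestamps, thread_ts, reply_count, reactions
resp = client.conversations_history(channel="C0123456789", limit=200)  # hypothetical channel ID
# Keep metadata fields only; the "text" field is never read or stored
rows = [{"ts": m["ts"], "thread_ts": m.get("thread_ts"), "reply_count": m.get("reply_count", 0)} for m in resp["messages"]]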

Verify: Export covers all target departments, row count > 1000 for 90-day window. · If failed: Reduce batch size, add sleep intervals.
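One way to keep content out of the export is to whitelist fields per message before anything is written to disk. The sketch below assumes message dicts shaped like Slack's conversations.history results (`ts`, `user`, `thread_ts`, `reply_count`, `reactions`); the function name, output column names, and the stand-in pseudonymizer are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

def message_to_metadata(
    channel_name: str,
    msg: Dict[str, Any],
    pseudonymize: Callable[[str], str],
) -> Dict[str, Optional[Any]]:
    """Reduce one Slack message dict to metadata; the 'text' field is never copied."""
    reactions = msg.get("reactions", [])
    return {
        "channel": channel_name,
        "ts": msg["ts"],
        "thread_ts": msg.get("thread_ts"),
        "reply_count": msg.get("reply_count", 0),
        # Total reactions across all emoji on the message.
        "reaction_count": sum(r.get("count", 0) for r in reactions),
        # Sender is hashed immediately; bot or system messages may lack "user".
        "sender": pseudonymize(msg["user"]) if "user" in msg else None,
    }

sample = {
    "ts": "1711700000.000100",
    "user": "U123",
    "text": "this content must not be exported",
    "thread_ts": "1711700000.000100",
    "reply_count": 3,
    "reactions": [
        {"name": "thumbsup", "count": 2, "users": ["U1", "U2"]},
        {"name": "eyes", "count": 1, "users": ["U3"]},
    ],
}
# Stand-in pseudonymizer for the demo; a salted hash would be used in practice.
row = message_to_metadata("eng-platform", sample, lambda uid: "anon-" + uid)
```

A whitelist is safer than deleting `text` from a copied dict, since any new content-bearing field Slack adds later is excluded by default.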

Step 4: Email Metadata Collection

Duration: 4-8 hours · Tool: Microsoft Graph API or Google Workspace Admin SDK

Extract email metadata: sender/receiver pairs, timestamps, volume per relationship. NOT body content. [src2]

Verify: Metadata covers 90+ days, cross-reference user count with org chart. · If failed: Implement exponential backoff for throttling.
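Whichever API supplies the records, "volume per relationship" reduces to counting directed sender/receiver pairs. The following is a minimal sketch that assumes each record has already been reduced to pseudonymized `sender`, `recipients`, and `sent_at` fields; those field names are illustrative and not taken from Microsoft Graph or the Google Admin SDK.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def pair_volumes(messages: Iterable[Dict]) -> Counter:
    """Count emails per directed (sender, recipient) pair."""
    volumes: Counter = Counter()
    for m in messages:
        for recipient in m["recipients"]:
            # Skip self-addressed copies; they carry no relationship signal.
            if recipient != m["sender"]:
                pair: Tuple[str, str] = (m["sender"], recipient)
                volumes[pair] += 1
    return volumes

# Hypothetical, already-pseudonymized records.
emails = [
    {"sender": "a1", "recipients": ["b2", "c3"], "sent_at": "2026-01-05T09:00:00+00:00"},
    {"sender": "a1", "recipients": ["b2"], "sent_at": "2026-01-06T10:30:00+00:00"},
    {"sender": "b2", "recipients": ["a1", "b2"], "sent_at": "2026-01-06T11:00:00+00:00"},
]
volumes = pair_volumes(emails)
```

Keeping the pairs directed preserves asymmetry (who initiates versus who receives), which can be collapsed later if the analysis only needs undirected ties.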

Step 5: Project Tool Data

Duration: 2-4 hours · Tool: Jira REST API, GitHub API, calendar API

Extract ticket velocity, assignment patterns, PR review turnaround, meeting density.

Verify: Data covers 90+ days, ticket count > 100. · If failed: Use CSV export from admin panel.
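As an illustration of one of these metrics, PR review turnaround can be summarized as the median time from opening to first review. This sketch assumes records with hypothetical `opened_at` and `first_review_at` ISO 8601 fields already pulled from the Git host; it does not call any API.

```python
from datetime import datetime
from statistics import median
from typing import Dict, Iterable, Optional

def review_turnaround_hours(prs: Iterable[Dict]) -> Optional[float]:
    """Median hours between PR opening and first review; None if nothing was reviewed."""
    hours = []
    for pr in prs:
        if pr.get("first_review_at") is None:
            continue  # PRs that were never reviewed are excluded from the median
        opened = datetime.fromisoformat(pr["opened_at"])
        reviewed = datetime.fromisoformat(pr["first_review_at"])
        hours.append((reviewed - opened).total_seconds() / 3600)
    return median(hours) if hours else None

# Hypothetical records for demonstration.
prs = [
    {"opened_at": "2026-02-02T09:00:00+00:00", "first_review_at": "2026-02-02T13:00:00+00:00"},
    {"opened_at": "2026-02-03T10:00:00+00:00", "first_review_at": "2026-02-04T10:00:00+00:00"},
    {"opened_at": "2026-02-05T08:00:00+00:00", "first_review_at": None},
]
turnaround = review_turnaround_hours(prs)
```

The median is used here rather than the mean so that a few long-stalled PRs do not dominate the figure; that choice is an assumption, not something the recipe prescribes.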

Step 6: HRIS Data Integration

Duration: 2-4 hours · Tool: HRIS export (BambooHR, Workday, or manual)

Extract org structure: reporting lines, department assignments, tenure, role titles. Cross-reference with communication data.

Verify: > 90% of communication senders matched to HRIS records. · If failed: Build the org chart manually from client documents.
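The match-rate check for this step can be sketched as a set comparison between pseudonymized sender IDs and pseudonymized HRIS IDs. Function and variable names are illustrative, and both inputs are assumed to have been hashed with the same engagement salt.

```python
from typing import Iterable, List, Tuple

def hris_match_rate(sender_ids: Iterable[str], hris_ids: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (share of distinct senders found in HRIS, sorted unmatched senders)."""
    senders = set(sender_ids)
    known = set(hris_ids)
    if not senders:
        return 0.0, []
    unmatched = sorted(senders - known)
    return len(senders & known) / len(senders), unmatched

# Hypothetical pseudonymized IDs; "h1" appears twice but is counted once.
rate_value, unmatched = hris_match_rate(
    ["h1", "h2", "h3", "h4", "h1"],
    ["h1", "h2", "h3", "h9"],
)
```

The unmatched list is the natural starting point for a manual mapping table when the rate falls below the threshold.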

Output Schema

{
  "output_type": "oia_communication_dataset",
  "format": "CSV + JSON",
  "files": [
    {"name": "slack_metadata.csv", "description": "Slack channel activity metadata"},
    {"name": "email_metadata.csv", "description": "Email sender/receiver pair metadata"},
    {"name": "project_metadata.csv", "description": "Jira/GitHub project management metadata"},
    {"name": "org_structure.json", "description": "Anonymized org chart"},
    {"name": "combined_oia_dataset.csv", "description": "Merged dataset for network analysis"}
  ]
}

Quality Benchmarks

| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Employee coverage (% matched to HRIS) | > 70% | > 85% | > 95% |
| Temporal coverage (days of data) | > 90 days | > 180 days | > 365 days |
| Platform coverage | > 50% | > 75% | > 90% |
| Anonymization completeness | 100% | 100% | 100% |
| Data freshness (days since last record) | < 30 days | < 14 days | < 7 days |

If below minimum: Extend data collection period, add missing platform integrations, or supplement with survey data.
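The benchmark thresholds can be checked mechanically. The sketch below encodes four of the rows from the table (anonymization completeness is omitted because it is a pass/fail 100% requirement); the metric keys and the `rate` function are hypothetical names.

```python
# (minimum, good, excellent, higher_is_better) per metric, taken from the benchmark table.
BENCHMARKS = {
    "employee_coverage_pct": (70, 85, 95, True),
    "temporal_coverage_days": (90, 180, 365, True),
    "platform_coverage_pct": (50, 75, 90, True),
    "freshness_days": (30, 14, 7, False),
}

def rate(metric: str, value: float) -> str:
    """Classify a measured value against the benchmark thresholds."""
    minimum, good, excellent, higher_is_better = BENCHMARKS[metric]

    def passes(threshold: float) -> bool:
        # Thresholds are strict, matching the ">" and "<" signs in the table.
        return value > threshold if higher_is_better else value < threshold

    if passes(excellent):
        return "excellent"
    if passes(good):
        return "good"
    if passes(minimum):
        return "minimum acceptable"
    return "below minimum"

coverage_rating = rate("employee_coverage_pct", 88)
```

Note that exactly 90 days of data rates as "below minimum" under this reading, because the table specifies strictly more than 90 days.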

Error Handling

| Error | Likely Cause | Recovery Action |
|---|---|---|
| Slack API rate limit (429) | Too many requests per minute | Add 2s sleep, reduce batch size to 50 |
| Microsoft Graph auth failure (401) | Token expired or insufficient permissions | Refresh OAuth token, verify scopes |
| HRIS export incomplete | Terminated employees excluded | Request full export including terminated employees |
| Low employee match rate (< 70%) | Email/Slack ID mismatch | Build manual mapping table |
| Data gap in timeframe | Platform retention policy too short | Document gap, supplement with survey data |
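The rate-limit recovery above can be generalized into a retry wrapper with exponential backoff. This is a minimal sketch: `TransientError` and `call_with_backoff` are hypothetical names, and real code would translate the client library's own rate-limit exception into the retry condition. The 2-second base delay follows the table; the doubling schedule is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Stand-in for a throttling response such as HTTP 429."""

def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting base_delay * 2**attempt seconds after each transient failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")

# Demo: a call that is throttled twice, then succeeds. Delays are recorded, not slept.
attempts = {"n": 0}

def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("429 Too Many Requests")
    return "ok"

delays: list = []
result = call_with_backoff(flaky, sleep=delays.append)
```

Injecting `sleep` keeps the wrapper testable without real waiting. When a throttled response carries a Retry-After value, honoring it is generally preferable to a fixed schedule.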

Cost Breakdown

| Component | Free Tier | Paid Tier | At Scale |
|---|---|---|---|
| Slack API access | $0 | $0 | $0 |
| Microsoft Graph API | $0 | $0 | $0 |
| Jira API | $0 | $0 | $0 |
| Survey tool (if needed) | $0 (Google Forms) | $50 (Typeform) | $200 (SurveyMonkey) |
| Total | $0 | $0-$50 | $0-$200 |

Anti-Patterns

Wrong: Accessing message content without consent

Reading Slack messages or email bodies for richer data. Result: GDPR violation, client trust destroyed. [src3]

Correct: Metadata only, always

Extract timestamps, sender/receiver pairs, channel names, and reaction counts only. Metadata collected under a documented legitimate-interest basis is far easier to defend legally than content access.

Wrong: Skipping anonymization during collection

Processing data with real names and anonymizing later. Result: PII on consultant's machine creates breach risk. [src3]

Correct: Anonymize at point of extraction

Apply hash immediately when extracting from each API. Raw employee IDs should never persist in any file.

Wrong: Using only one communication channel

Extracting Slack alone and declaring complete. Result: misses email-heavy relationships common in senior leadership. [src4]

Correct: Multi-channel triangulation

Extract from all available platforms and merge for a complete picture.
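Merging can be as simple as combining per-platform edge counts keyed by the same pseudonymized pair. The sketch below assumes each platform's extract has already been reduced to a mapping of `(sender, receiver)` to interaction count; the function name and the `total` key are illustrative.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]

def merge_edges(per_platform: Dict[str, Dict[Pair, int]]) -> Dict[Pair, Dict[str, int]]:
    """Combine per-platform edge counts into one record per pair, with a total."""
    merged: Dict[Pair, Dict[str, int]] = defaultdict(dict)
    for platform, edges in per_platform.items():
        for pair, count in edges.items():
            merged[pair][platform] = merged[pair].get(platform, 0) + count
    # Add a cross-platform total while keeping the per-platform breakdown.
    return {
        pair: {**by_platform, "total": sum(by_platform.values())}
        for pair, by_platform in merged.items()
    }

# Hypothetical counts: the ("c3", "d4") relationship exists only in email.
combined = merge_edges({
    "slack": {("a1", "b2"): 10},
    "email": {("a1", "b2"): 3, ("c3", "d4"): 7},
})
```

In this toy data the `("c3", "d4")` tie would be invisible in a Slack-only extract, which is the gap the anti-pattern describes.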

When This Matters

Use when an agent needs to collect organizational communication metadata for network analysis. This recipe covers Steps 2-3 of the OIA engagement lifecycle. It requires a signed data processing agreement and admin-level API access. The output feeds directly into the network analysis execution recipe.

Related Units