This recipe produces an anonymized, multi-platform communication metadata dataset ready for organizational network analysis. It extracts interaction patterns from Slack, email, Jira, Git, calendar, and HRIS — capturing who communicates with whom, how often, and through which channels — without accessing message content. [src3, src4]
Which platform combination?
├── Slack + Google Workspace
│ └── PATH A: Slack API + Google Admin SDK
├── Slack + Microsoft 365
│ └── PATH B: Slack API + Microsoft Graph API
├── Microsoft Teams + Microsoft 365
│ └── PATH C: Microsoft Graph API (unified)
└── Mixed or restricted access
└── PATH D: Manual exports + survey supplement
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Slack + Google | slack_sdk, Google Admin SDK, Python | $0 | 2-3 days | Excellent |
| B: Slack + Microsoft | slack_sdk, Microsoft Graph, Python | $0 | 2-3 days | Excellent |
| C: Teams + Microsoft | Microsoft Graph API, Python | $0 | 1-2 days | Excellent |
| D: Manual + Survey | Admin exports, survey tool | $0-$200 | 3-5 days | Adequate |
Duration: 2-4 hours · Tool: Client IT coordination
Map all communication and collaboration platforms in use. Document API availability, admin access status, data retention period, and user count per system.
Verify: Complete system list with API availability confirmed. · If failed: Document gaps, proceed with available systems.
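The inventory is easier to act on if it is kept as structured records and run through the decision tree. The sketch below is a minimal version under stated assumptions. It uses the platform names from the tree and treats 90 days as the minimum window, taken from the Verify steps. `PlatformRecord`, `usable`, `choose_path`, and `coverage_gaps` are hypothetical helpers, not part of any tool.

```python
from dataclasses import dataclass

@dataclass
class PlatformRecord:
    # One inventory row; field names are illustrative, not a required schema.
    name: str             # e.g. "Slack", "Microsoft 365", "Jira"
    api_available: bool
    admin_access: bool
    retention_days: int   # how far back the platform still holds data
    user_count: int

def usable(inventory):
    """Names of platforms we can actually extract from (API plus admin access)."""
    return {p.name for p in inventory if p.api_available and p.admin_access}

def choose_path(inventory):
    """Apply the A-D decision tree to the usable platforms, in the tree's order."""
    names = usable(inventory)
    if {"Slack", "Google Workspace"} <= names:
        return "A"
    if {"Slack", "Microsoft 365"} <= names:
        return "B"
    if {"Microsoft Teams", "Microsoft 365"} <= names:
        return "C"
    return "D"  # mixed or restricted access

def coverage_gaps(inventory, min_days=90):
    """Platforms to document as gaps: no usable access, or retention shorter than the window."""
    ok = usable(inventory)
    return [p.name for p in inventory if p.name not in ok or p.retention_days < min_days]
```

A platform without admin access drops out of path selection. That means a client with Slack but no Slack admin token falls through to Path D, which matches the "restricted access" branch of the tree.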
Duration: 1-2 days · Tool: Legal coordination
Draft data processing agreement, employee transparency notice, anonymization protocol (SHA-256 hash of employee ID + engagement salt), and data retention schedule. [src3]
Verify: Legal sign-off received, transparency notice sent. · If failed: Escalate to executive sponsor.
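The anonymization protocol (SHA-256 of employee ID + engagement salt) fits in a few lines of standard-library Python. This is a sketch, not a mandated implementation. It assumes every platform's identifier is first resolved to one common key, such as work email, so that hashes join across systems. Trimming and lowercasing the key before hashing is also an assumption. `new_engagement_salt` and `anonymize_id` are illustrative names. Anyone who holds the salt can re-identify people by hashing candidate IDs, so the salt belongs under the data processing agreement and never in the dataset.

```python
import hashlib
import secrets

def new_engagement_salt() -> str:
    # 32 random bytes as hex; generate once per engagement and keep it out of the dataset.
    return secrets.token_hex(32)

def anonymize_id(employee_id: str, salt: str) -> str:
    # Normalizing first (an assumption) makes "Alice@Corp.com" and "alice@corp.com" hash identically.
    normalized = employee_id.strip().lower()
    # SHA-256 over employee ID + engagement salt, as in the protocol.
    return hashlib.sha256((normalized + salt).encode("utf-8")).hexdigest()
```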
Duration: 4-8 hours · Tool: Python slack_sdk or Slack Admin Panel
Extract communication metadata: message timestamps, channel names, reaction counts, thread depths, sender IDs (anonymized). NOT message content. [src1]
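```python
# Extract metadata per channel (no message content)
from slack_sdk import WebClient
client = WebClient(token="xoxb-YOUR-BOT-TOKEN")
# conversations.history → timestamps, thread_ts, reply_count, reactions; message text is never kept
resp = client.conversations_history(channel="C0123456789", limit=200)  # placeholder channel ID
rows = [{"ts": m["ts"], "thread_ts": m.get("thread_ts"), "reply_count": m.get("reply_count", 0),
         "reaction_count": sum(r["count"] for r in m.get("reactions", []))} for m in resp["messages"]]
```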
Verify: Export covers all target departments, row count > 1000 for 90-day window. · If failed: Reduce batch size, add sleep intervals.
Duration: 4-8 hours · Tool: Microsoft Graph API or Google Workspace Admin SDK
Extract email metadata: sender/receiver pairs, timestamps, volume per relationship. NOT body content. [src2]
Verify: Metadata covers 90+ days, cross-reference user count with org chart. · If failed: Implement exponential backoff for throttling.
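For the Microsoft Graph path, the request can be limited to metadata fields with `$select`, and sender/receiver volume can then be counted per pair. The sketch below is a minimal version under several assumptions:

- Messages are already-fetched Graph `message` objects from `GET /users/{id}/messages`.
- `anonymize` is any callable implementing the engagement's salted hash.
- Duplicates are removed by `internetMessageId`, because one email appears in both the sender's and the recipients' mailboxes.

Paging through `@odata.nextLink` and the HTTP calls themselves are left out. The `Mail.ReadBasic.All` application permission is worth checking with the client's admins, since it excludes message bodies by design.

```python
from collections import Counter

# Metadata-only fields; body, subject and bodyPreview are never requested.
SELECT_FIELDS = "internetMessageId,from,toRecipients,ccRecipients,sentDateTime"

def messages_url(user_id: str, page_size: int = 100) -> str:
    """First page of a user's messages; follow '@odata.nextLink' in each response for more."""
    return (f"https://graph.microsoft.com/v1.0/users/{user_id}/messages"
            f"?$select={SELECT_FIELDS}&$top={page_size}")

def pair_counts(messages, anonymize):
    """Count messages per (sender, recipient) pair, hashing addresses as they are read."""
    counts, seen = Counter(), set()
    for msg in messages:
        # The same message is returned from several mailboxes; count it once.
        mid = msg.get("internetMessageId")
        if mid:
            if mid in seen:
                continue
            seen.add(mid)
        sender = ((msg.get("from") or {}).get("emailAddress") or {}).get("address")
        if not sender:  # drafts and some system items have no sender
            continue
        for r in msg.get("toRecipients", []) + msg.get("ccRecipients", []):
            addr = (r.get("emailAddress") or {}).get("address")
            if addr and addr.lower() != sender.lower():
                counts[(anonymize(sender), anonymize(addr))] += 1
    return counts
```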
Duration: 2-4 hours · Tool: Jira REST API, GitHub API, calendar API
Extract ticket velocity, assignment patterns, PR review turnaround, meeting density.
Verify: Data covers 90+ days, ticket count > 100. · If failed: Use CSV export from admin panel.
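Two of these signals reduce to simple timestamp arithmetic: PR review turnaround and meeting density. The sketch below is one possible calculation under stated assumptions, not a fixed definition.

- Review turnaround is taken as the median hours from PR creation to its first submitted review. Each PR dict is assumed to combine `created_at` from GitHub's `GET /repos/{owner}/{repo}/pulls` with a `reviews` list from `GET /repos/{owner}/{repo}/pulls/{pull_number}/reviews`. Pending reviews have no `submitted_at` and are ignored.
- Meeting density is taken as meeting hours per week for one anonymized attendee, from calendar events given as `(start, end)` datetimes.

```python
from datetime import datetime
from statistics import median

def _parse(ts: str) -> datetime:
    # GitHub returns ISO 8601 UTC timestamps such as "2024-05-01T12:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def review_turnaround_hours(prs):
    """Median hours from PR creation to its first submitted review (None if no PR was reviewed)."""
    waits = []
    for pr in prs:
        submitted = [_parse(r["submitted_at"]) for r in pr["reviews"] if r.get("submitted_at")]
        if submitted:
            waits.append((min(submitted) - _parse(pr["created_at"])).total_seconds() / 3600)
    return median(waits) if waits else None

def meeting_hours_per_week(events, weeks):
    """Total meeting hours for one attendee divided by the number of weeks covered."""
    total = sum((end - start).total_seconds() for start, end in events) / 3600
    return total / weeks
```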
Duration: 2-4 hours · Tool: HRIS export (BambooHR, Workday, or manual)
Extract org structure: reporting lines, department assignments, tenure, role titles. Cross-reference with communication data.
Verify: > 90% of communication senders matched to HRIS records. · If failed: Build the org chart manually from client-provided documents.
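The match-rate check is a set intersection once both sides are hashed with the same salted scheme. In this sketch, `hris_match_rate` and `unmatched` are hypothetical helpers. The list of unmatched hashes is what would seed the manual mapping table mentioned in the error table.

```python
def hris_match_rate(comm_senders, hris_ids):
    """Share of distinct (hashed) communication senders that appear in the (hashed) HRIS export."""
    senders = set(comm_senders)
    if not senders:
        return 0.0
    return len(senders & set(hris_ids)) / len(senders)

def unmatched(comm_senders, hris_ids):
    """Hashed senders with no HRIS record, sorted for a stable review list."""
    return sorted(set(comm_senders) - set(hris_ids))
```

A result of, say, 0.8 passes the quality table's 70% minimum but misses this step's 90% target.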
```json
{
"output_type": "oia_communication_dataset",
"format": "CSV + JSON",
"files": [
{"name": "slack_metadata.csv", "description": "Slack channel activity metadata"},
{"name": "email_metadata.csv", "description": "Email sender/receiver pair metadata"},
{"name": "project_metadata.csv", "description": "Jira/GitHub project management metadata"},
{"name": "org_structure.json", "description": "Anonymized org chart"},
{"name": "combined_oia_dataset.csv", "description": "Merged dataset for network analysis"}
]
}
```
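The spec does not fix a schema for `combined_oia_dataset.csv`. One plausible shape is a weighted edge list with one row per (source, target, platform). The sketch below assumes each platform's rows have already been reduced to hashed `(sender, receiver)` pairs; for Slack, one option is thread reply author → thread starter. The column names `source`, `target`, `platform`, and `weight` are assumptions.

```python
import csv
from collections import defaultdict

def combine_edges(platform_edges):
    """platform_edges: {"slack": [(src, dst), ...], "email": [...]} with already-hashed IDs."""
    weights = defaultdict(int)
    for platform, edges in platform_edges.items():
        for src, dst in edges:
            weights[(src, dst, platform)] += 1
    # One row per directed pair per platform, sorted for reproducible output.
    return [{"source": s, "target": t, "platform": p, "weight": w}
            for (s, t, p), w in sorted(weights.items())]

def write_combined(rows, path="combined_oia_dataset.csv"):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["source", "target", "platform", "weight"])
        writer.writeheader()
        writer.writerows(rows)
```

Keeping `platform` as a column, instead of summing across channels, leaves the network analysis free to weight email and Slack ties differently.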
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Employee coverage (% matched to HRIS) | > 70% | > 85% | > 95% |
| Temporal coverage (days of data) | > 90 days | > 180 days | > 365 days |
| Platform coverage (% of in-use platforms extracted) | > 50% | > 75% | > 90% |
| Anonymization completeness | 100% | 100% | 100% |
| Data freshness (days since last record) | < 30 days | < 14 days | < 7 days |
If below minimum: Extend data collection period, add missing platform integrations, or supplement with survey data.
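The quality table can be applied mechanically. The sketch below copies its thresholds and uses strict comparisons, as the table does. Under that reading, exactly 90 days of data does not clear the "> 90 days" minimum. The metric keys are hypothetical names. Anonymization is treated as pass/fail because every tier requires 100%.

```python
# Thresholds from the quality table: (direction, [minimum, good, excellent]).
THRESHOLDS = {
    "employee_coverage_pct": ("higher", [70, 85, 95]),
    "temporal_coverage_days": ("higher", [90, 180, 365]),
    "platform_coverage_pct": ("higher", [50, 75, 90]),
    "freshness_days": ("lower", [30, 14, 7]),
}
LEVELS = ["Minimum Acceptable", "Good", "Excellent"]

def grade(metric, value):
    """Highest tier whose threshold the value clears, checked in order from minimum upward."""
    direction, cutoffs = THRESHOLDS[metric]
    level = "Below minimum"
    for name, cutoff in zip(LEVELS, cutoffs):
        ok = value > cutoff if direction == "higher" else value < cutoff
        if not ok:
            break
        level = name
    return level

def grade_dataset(metrics, anonymization_pct):
    report = {m: grade(m, v) for m, v in metrics.items()}
    # Anything under 100% anonymization blocks delivery regardless of other scores.
    report["anonymization_completeness"] = "Pass" if anonymization_pct == 100 else "Fail"
    return report
```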
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Slack API rate limit (429) | Too many requests per minute | Wait for the Retry-After header value; otherwise add 2s sleep and reduce batch size to 50 |
| Microsoft Graph auth failure (401/403) | Token expired (401) or missing permission scopes (403) | Refresh OAuth token, verify granted scopes and admin consent |
| HRIS export incomplete | Terminated employees excluded | Request a full export that includes employees terminated during the analysis window |
| Low employee match rate (< 70%) | Email/Slack ID mismatch | Build manual mapping table |
| Data gap in timeframe | Platform retention policy too short | Document gap, supplement with survey data |
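Slack and Microsoft Graph both signal throttling with HTTP 429 and a `Retry-After` header. A generic retry wrapper can therefore serve every extractor. This is a sketch under stated assumptions. `fetch` is any zero-argument callable that raises the hypothetical `RateLimited` exception on a 429, carrying `Retry-After` in seconds when present. Without that header, it falls back to exponential backoff from the table's 2-second base, with jitter. The `sleep` parameter exists so the wrapper can be tested without waiting.

```python
import random
import time

class RateLimited(Exception):
    """Raised by a fetch callable on HTTP 429; carries Retry-After (seconds) if the server sent it."""
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after

def with_backoff(fetch, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(), retrying on RateLimited; re-raise after max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited as exc:
            if attempt == max_retries:
                raise
            # Prefer the server's Retry-After; otherwise 2s, 4s, 8s, ... plus up to 1s of jitter.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt + random.uniform(0, 1)
            sleep(delay)
```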
| Component | Free Tier | Paid Tier | At Scale |
|---|---|---|---|
| Slack API access | $0 | $0 | $0 |
| Microsoft Graph API | $0 | $0 | $0 |
| Jira API | $0 | $0 | $0 |
| Survey tool (if needed) | $0 (Google Forms) | $50 (Typeform) | $200 (SurveyMonkey) |
| Total | $0 | $0-$50 | $0-$200 |
Reading Slack messages or email bodies for richer data. Result: GDPR violation, client trust destroyed. [src3]
Extract only timestamps, sender/receiver pairs, channel names, and reaction counts. Metadata is still personal data under GDPR, but processing it under a documented legitimate-interest basis is generally defensible.
Processing data with real names and anonymizing later. Result: PII on consultant's machine creates breach risk. [src3]
Apply hash immediately when extracting from each API. Raw employee IDs should never persist in any file.
Extracting Slack alone and declaring complete. Result: misses email-heavy relationships common in senior leadership. [src4]
Extract from all available platforms and merge for a complete picture.
Use when an agent needs to collect organizational communication metadata for network analysis. This covers Steps 2-3 of the OIA engagement lifecycle. Requires a signed data processing agreement and admin-level API access. Output feeds directly into the network analysis execution recipe.