OIA Stress Test Execution
How do you execute organizational stress tests with scenario design and recovery scoring?
Purpose
This recipe executes organizational stress tests that measure how quickly and effectively the organization detects, responds to, coordinates around, and recovers from disruption scenarios. It produces a composite resilience score benchmarked against High-Reliability Organization standards, identifying the top 3 resilience gaps with specific remediation recommendations. [src1, src3]
Prerequisites
- Network analysis findings with SPOF inventory from OIA Network Analysis
- Autoimmune scan findings from OIA Autoimmune Scan
- Executive approval for stress test scope — signed confirmation
- Leadership team availability — 2-hour block for tabletop exercise
- IT admin cooperation for silent stress test (if approved)
Constraints
- Silent stress tests require explicit written executive approval. [src1]
- Controlled disruptions must be low-risk and fully reversible — no production impact.
- Tabletop exercises require 2-hour uninterrupted block with leadership.
- All scoring uses standardized 5-dimension rubric. [src3]
- Findings shared with leadership before broader distribution.
Tool Selection Decision
Which approach?
├── Full leadership + silent test approved
│ └── PATH A: Tabletop + Silent Test
├── Full leadership, no silent test
│ └── PATH B: Tabletop Only
├── Partial leadership availability
│ └── PATH C: Async Scenarios + Mini Tabletop
└── No leadership availability
└── PATH D: Async-Only
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Full | Tabletop + silent test + scoring | $0-$2K | 1-2 weeks | Excellent |
| B: Tabletop | Facilitated workshop + scoring | $0-$1K | 1 week | Good |
| C: Async + Mini | Written scenarios + 1-hour session | $0-$500 | 1 week | Adequate |
| D: Async-Only | Written questionnaire | $0-$200 | 3-5 days | Basic |
Execution Flow
Step 1: Scenario Design
Duration: 4-8 hours · Tool: Scenario template based on SPOF inventory
Design 3-5 stress test scenarios based on the SPOF inventory: (A) key person departure, (B) critical system outage lasting 48 hours, (C) regulatory audit with a 2-week deadline, (D) 20% budget cut within 30 days, (E) major client threatening to leave. [src2]
Verify: Each scenario has trigger, affected departments, expected impact, and success criteria. · If failed: Use generic industry scenarios, refine after tabletop.
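The scenario template can be sketched as a simple record that enforces the verification criteria above (trigger, affected departments, expected impact, success criteria). This is a minimal sketch in Python; the class and field names are illustrative, not part of the recipe, and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StressScenario:
    """One stress test scenario drawn from the SPOF inventory."""
    scenario_id: str                 # e.g. "A" for key person departure
    trigger: str                     # concrete event that starts the scenario
    affected_departments: list[str]  # who feels the impact first
    expected_impact: str             # what breaks if nothing is done
    success_criteria: list[str]      # what a resilient response looks like

# Hypothetical example: Scenario B, grounded in a named system
scenario_b = StressScenario(
    scenario_id="B",
    trigger="Billing platform unavailable for 48 hours",
    affected_departments=["Finance", "Support", "Engineering"],
    expected_impact="Invoicing halted; customer escalations spike",
    success_criteria=[
        "Outage detected within 1 hour",
        "Workaround communicated within 4 hours",
    ],
)
```

Grounding each field in actual SPOF data (anonymized names, real systems) is what makes the later scoring meaningful.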
Step 2: Tabletop Exercise
Duration: 2 hours · Tool: Facilitated workshop with leadership
Present each scenario and observe: time to identify responsible parties, response plan quality, cross-departmental coordination, and communication clarity. [src2, src4]
Verify: All scenarios exercised, observations documented. · If failed: Prioritize Scenarios A and B (personnel and technology).
Step 3: Silent Stress Test
Duration: 1 week · Tool: IT admin configuration + observation
With executive approval, introduce one low-risk controlled disruption (approval rerouting, tool restriction, or information delay) and observe the organization's natural response. [src1]
Verify: Disruption contained, observations captured. · If failed: Execute rollback immediately, document as a resilience finding.
Step 4: Response Measurement
Duration: 4-8 hours · Tool: Standardized scoring rubric
Score each scenario across 5 dimensions (1-5 scale): Detection Speed, Response Quality, Coordination, Recovery Time, Learning. [src3]
Verify: All 5 dimensions scored per scenario with evidence. · If failed: Re-review observation notes with second analyst.
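A small validation helper can enforce the rubric's shape, confirming every scenario carries all 5 dimensions on the 1-5 scale before scores enter the composite calculation. A minimal sketch; dimension keys and the example scores are assumptions for illustration.

```python
DIMENSIONS = ["detection_speed", "response_quality", "coordination",
              "recovery_time", "learning"]

def validate_scores(scores: dict[str, int]) -> dict[str, int]:
    """Check one scenario's rubric scores: all 5 dimensions present, each 1-5."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"Missing dimensions: {sorted(missing)}")
    for dim, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} score {value} outside the 1-5 scale")
    return scores

# Hypothetical scores for Scenario A (key person departure)
scenario_a_scores = validate_scores({
    "detection_speed": 2, "response_quality": 3, "coordination": 2,
    "recovery_time": 3, "learning": 4,
})
```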
Step 5: Resilience Score Calculation
Duration: 4-8 hours · Tool: Scoring framework + HRO benchmarks
Produce composite resilience score (0-100) per department and overall. Compare against HRO benchmarks (Weick & Sutcliffe). Identify top 3 resilience gaps. [src3]
Verify: Remediation recommendations include specific actions, owners, and timelines. · If failed: All scores above 4 mean the scenarios were too easy; design harder ones.
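The composite calculation can be sketched as follows, assuming a linear mapping of the 1-5 rubric onto 0-100 and equal dimension weights; the recipe's actual scoring framework may weight dimensions differently. The top 3 gaps are simply the lowest-scoring dimensions.

```python
from statistics import mean

DIMENSIONS = ["detection_speed", "response_quality", "coordination",
              "recovery_time", "learning"]

def resilience_score(all_scores: list[dict[str, int]]) -> tuple[float, list[str]]:
    """Composite 0-100 score plus the top 3 resilience gaps.

    all_scores: one dict of 1-5 dimension scores per scenario.
    """
    # Average each dimension across all scenarios
    dim_avgs = {d: mean(s[d] for s in all_scores) for d in DIMENSIONS}
    # Map the 1-5 mean linearly onto 0-100 (assumption: equal weights)
    composite = (mean(dim_avgs.values()) - 1) / 4 * 100
    # Top 3 gaps: the three lowest-scoring dimensions
    gaps = sorted(dim_avgs, key=dim_avgs.get)[:3]
    return round(composite, 1), gaps
```

Running this per department as well as overall yields the per-department scores the step calls for.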
Output Schema
```json
{
  "output_type": "resilience_assessment_report",
  "format": "PDF + XLSX + JSON",
  "key_metrics": [
    {"name": "overall_resilience_score", "description": "0-100 composite resilience score"},
    {"name": "detection_avg", "description": "Average detection speed (1-5)"},
    {"name": "coordination_avg", "description": "Average cross-team coordination (1-5)"},
    {"name": "recovery_avg", "description": "Average recovery time score (1-5)"},
    {"name": "gap_count", "description": "Dimensions scoring below 3"}
  ]
}
```
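Filling that schema's metric names with computed values can be sketched like this; the function name and dimension keys are assumptions, and `gap_count` follows the schema's definition (dimensions averaging below 3).

```python
import json

def build_report(dim_avgs: dict[str, float], overall: float) -> str:
    """Assemble the JSON metrics payload for the assessment report.

    dim_avgs: per-dimension averages on the 1-5 scale.
    overall:  composite 0-100 resilience score.
    """
    metrics = {
        "overall_resilience_score": overall,
        "detection_avg": dim_avgs["detection_speed"],
        "coordination_avg": dim_avgs["coordination"],
        "recovery_avg": dim_avgs["recovery_time"],
        # gap_count per the schema: dimensions scoring below 3
        "gap_count": sum(1 for v in dim_avgs.values() if v < 3),
    }
    return json.dumps({"output_type": "resilience_assessment_report",
                       "key_metrics": metrics}, indent=2)
```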
Quality Benchmarks
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Scenarios tested | 3 | 4 | 5+ |
| Leadership participation | > 60% | > 80% | > 95% |
| Scoring consistency | > 70% agreement | > 85% | > 95% |
| Remediation specificity | General recommendations | Specific actions + owners | Actions + timelines + criteria |
| Silent test executed | No (tabletop only) | Yes (1 test) | Yes (2+ tests) |
If below minimum: Extend the exercise or run async scenarios for absent members.
Error Handling
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Leadership disengaged | Scenarios not relevant | Pause, ask what keeps them up at night, redesign |
| Silent test unexpected impact | Risk assessment underestimated | Execute rollback, document as finding |
| Scoring disagreement | Ambiguous observations | Re-review notes together, evidence-based calibration |
| All scores above 4 | Scenarios too easy | Design harder scenarios from actual SPOF data |
| No time for tabletop | Leadership scheduling conflict | Async responses + 30-min follow-up |
Cost Breakdown
| Component | Tabletop Only | Tabletop + Silent | Full Assessment |
|---|---|---|---|
| Scenario design | $0 | $0 | $0 |
| Facilitation | $0-$500 | $0-$500 | $0-$1K |
| Silent test execution | N/A | $0-$500 | $0-$1K |
| Analysis and scoring | $0 | $0 | $0 |
| Total | $0-$500 | $0-$1K | $0-$2K |
Anti-Patterns
Wrong: Running tabletop as a presentation
Reading scenarios aloud and telling leadership what would happen. Result: no resilience data collected. [src3]
Correct: Facilitate, don't lecture
Present the trigger and observe. Silence after a trigger is data — it means detection is slow.
Wrong: Designing abstract scenarios
Using "a major disruption occurs" without specificity. Result: generic answers, meaningless scores. [src2]
Correct: Ground scenarios in SPOF data
Use actual bottleneck names (anonymized), real system names. Specificity forces specific responses.
Wrong: Skipping silent stress test
Relying entirely on tabletop responses. Result: overestimating resilience — people describe ideal, not actual behavior. [src1]
Correct: Validate with at least one silent test
Even a small controlled disruption reveals the gap between described and actual organizational behavior.
When This Matters
Use when an agent needs to measure organizational resilience through scenario-based stress testing. This is Step 5 of the OIA engagement lifecycle. Requires SPOF inventory and autoimmune findings as inputs. Output feeds into the final OIA health score report.