OIA Stress Test Execution
How do you execute organizational stress tests with scenario design and recovery scoring?
Purpose
This recipe executes organizational stress tests that measure how quickly and effectively the organization detects, responds to, coordinates around, and recovers from disruption scenarios. It produces a composite resilience score benchmarked against High-Reliability Organization standards, identifying the top 3 resilience gaps with specific remediation recommendations. [src1, src3]
Prerequisites
- Network analysis findings with SPOF inventory from OIA Network Analysis
- Autoimmune scan findings from OIA Autoimmune Scan
- Executive approval for stress test scope — signed confirmation
- Leadership team availability — 2-hour block for tabletop exercise
- IT admin cooperation for silent stress test (if approved)
Constraints
- Silent stress tests require explicit written executive approval. [src1]
- Controlled disruptions must be low-risk and fully reversible — no production impact.
- Tabletop exercises require 2-hour uninterrupted block with leadership.
- All scoring uses standardized 5-dimension rubric. [src3]
- Findings shared with leadership before broader distribution.
Tool Selection Decision
Which approach?
├── Full leadership + silent test approved
│ └── PATH A: Tabletop + Silent Test
├── Full leadership, no silent test
│ └── PATH B: Tabletop Only
├── Partial leadership availability
│ └── PATH C: Async Scenarios + Mini Tabletop
└── No leadership availability
└── PATH D: Async-Only
| Path | Tools | Cost | Speed | Output Quality |
|---|---|---|---|---|
| A: Full | Tabletop + silent test + scoring | $0-$2K | 1-2 weeks | Excellent |
| B: Tabletop | Facilitated workshop + scoring | $0-$1K | 1 week | Good |
| C: Async + Mini | Written scenarios + 1-hour session | $0-$500 | 1 week | Adequate |
| D: Async-Only | Written questionnaire | $0-$200 | 3-5 days | Basic |
Execution Flow
Step 1: Scenario Design
Duration: 4-8 hours · Tool: Scenario template based on SPOF inventory
Design 3-5 stress test scenarios based on the SPOF inventory: (A) key person departure, (B) critical system outage lasting 48 hours, (C) regulatory audit with a 2-week deadline, (D) 20% budget cut within 30 days, (E) major client threatening to leave. [src2]
Verify: Each scenario has trigger, affected departments, expected impact, and success criteria. · If failed: Use generic industry scenarios, refine after tabletop.
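The scenario template can be sketched as a simple record that enforces the verification criteria above (trigger, affected departments, expected impact, success criteria). This is a minimal sketch in Python; the class and field names are illustrative, not part of the recipe, and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StressScenario:
    """One stress test scenario drawn from the SPOF inventory."""
    scenario_id: str                 # e.g. "A" for key person departure
    trigger: str                     # concrete event that starts the scenario
    affected_departments: list[str]  # who feels the impact first
    expected_impact: str             # what breaks if nothing is done
    success_criteria: list[str]      # what a resilient response looks like

# Hypothetical example: Scenario B, grounded in a named system
scenario_b = StressScenario(
    scenario_id="B",
    trigger="Billing platform unavailable for 48 hours",
    affected_departments=["Finance", "Support", "Engineering"],
    expected_impact="Invoicing halted; customer escalations spike",
    success_criteria=[
        "Outage detected within 1 hour",
        "Workaround communicated within 4 hours",
    ],
)
```

Grounding each field in actual SPOF data (anonymized names, real systems) is what makes the later scoring meaningful.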
Step 2: Tabletop Exercise
Duration: 2 hours · Tool: Facilitated workshop with leadership
Present each scenario and observe: time to identify responsible parties, response plan quality, cross-departmental coordination, and communication clarity. [src2, src4]
Verify: All scenarios exercised, observations documented. · If failed: Prioritize Scenarios A and B (personnel and technology).
Step 3: Silent Stress Test
Duration: 1 week · Tool: IT admin configuration + observation
With executive approval, introduce one low-risk controlled disruption (approval rerouting, tool restriction, or information delay) and observe the organization's natural response. [src1]
Verify: Disruption contained, observations captured. · If failed: Execute rollback immediately, document as a resilience finding.
Step 4: Response Measurement
Duration: 4-8 hours · Tool: Standardized scoring rubric
Score each scenario across 5 dimensions (1-5 scale): Detection Speed, Response Quality, Coordination, Recovery Time, Learning. [src3]
Verify: All 5 dimensions scored per scenario with evidence. · If failed: Re-review observation notes with second analyst.
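A small validation helper can enforce the rubric's shape, confirming every scenario carries all 5 dimensions on the 1-5 scale before scores enter the composite calculation. A minimal sketch; dimension keys and the example scores are assumptions for illustration.

```python
DIMENSIONS = ["detection_speed", "response_quality", "coordination",
              "recovery_time", "learning"]

def validate_scores(scores: dict[str, int]) -> dict[str, int]:
    """Check one scenario's rubric scores: all 5 dimensions present, each 1-5."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"Missing dimensions: {sorted(missing)}")
    for dim, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} score {value} outside the 1-5 scale")
    return scores

# Hypothetical scores for Scenario A (key person departure)
scenario_a_scores = validate_scores({
    "detection_speed": 2, "response_quality": 3, "coordination": 2,
    "recovery_time": 3, "learning": 4,
})
```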
Step 5: Resilience Score Calculation
Duration: 4-8 hours · Tool: Scoring framework + HRO benchmarks
Produce composite resilience score (0-100) per department and overall. Compare against HRO benchmarks (Weick & Sutcliffe). Identify top 3 resilience gaps. [src3]
Verify: Remediation recommendations include specific actions, owners, and timelines. · If failed: All scores above 4 mean the scenarios were too easy; design harder ones.
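The composite calculation can be sketched as follows, assuming a linear mapping of the 1-5 rubric onto 0-100 and equal dimension weights; the recipe's actual scoring framework may weight dimensions differently. The top 3 gaps are simply the lowest-scoring dimensions.

```python
from statistics import mean

DIMENSIONS = ["detection_speed", "response_quality", "coordination",
              "recovery_time", "learning"]

def resilience_score(all_scores: list[dict[str, int]]) -> tuple[float, list[str]]:
    """Composite 0-100 score plus the top 3 resilience gaps.

    all_scores: one dict of 1-5 dimension scores per scenario.
    """
    # Average each dimension across all scenarios
    dim_avgs = {d: mean(s[d] for s in all_scores) for d in DIMENSIONS}
    # Map the 1-5 mean linearly onto 0-100 (assumption: equal weights)
    composite = (mean(dim_avgs.values()) - 1) / 4 * 100
    # Top 3 gaps: the three lowest-scoring dimensions
    gaps = sorted(dim_avgs, key=dim_avgs.get)[:3]
    return round(composite, 1), gaps
```

Running this per department as well as overall yields the per-department scores the step calls for.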
Output Schema
```json
{
  "output_type": "resilience_assessment_report",
  "format": "PDF + XLSX + JSON",
  "key_metrics": [
    {"name": "overall_resilience_score", "description": "0-100 composite resilience score"},
    {"name": "detection_avg", "description": "Average detection speed (1-5)"},
    {"name": "coordination_avg", "description": "Average cross-team coordination (1-5)"},
    {"name": "recovery_avg", "description": "Average recovery time score (1-5)"},
    {"name": "gap_count", "description": "Dimensions scoring below 3"}
  ]
}
```
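Filling that schema's metric names with computed values can be sketched like this; the function name and dimension keys are assumptions, and `gap_count` follows the schema's definition (dimensions averaging below 3).

```python
import json

def build_report(dim_avgs: dict[str, float], overall: float) -> str:
    """Assemble the JSON metrics payload for the assessment report.

    dim_avgs: per-dimension averages on the 1-5 scale.
    overall:  composite 0-100 resilience score.
    """
    metrics = {
        "overall_resilience_score": overall,
        "detection_avg": dim_avgs["detection_speed"],
        "coordination_avg": dim_avgs["coordination"],
        "recovery_avg": dim_avgs["recovery_time"],
        # gap_count per the schema: dimensions scoring below 3
        "gap_count": sum(1 for v in dim_avgs.values() if v < 3),
    }
    return json.dumps({"output_type": "resilience_assessment_report",
                       "key_metrics": metrics}, indent=2)
```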
Quality Benchmarks
| Quality Metric | Minimum Acceptable | Good | Excellent |
|---|---|---|---|
| Scenarios tested | 3 | 4 | 5+ |
| Leadership participation | > 60% | > 80% | > 95% |
| Scoring consistency | > 70% agreement | > 85% | > 95% |
| Remediation specificity | General recommendations | Specific actions + owners | Actions + timelines + criteria |
| Silent test executed | No (tabletop only) | Yes (1 test) | Yes (2+ tests) |
If below minimum: Extend the exercise or run async scenarios for absent members.
Error Handling
| Error | Likely Cause | Recovery Action |
|---|---|---|
| Leadership disengaged | Scenarios not relevant | Pause, ask what keeps them up at night, redesign |
| Silent test unexpected impact | Risk assessment underestimated | Execute rollback, document as finding |
| Scoring disagreement | Ambiguous observations | Re-review notes together, evidence-based calibration |
| All scores above 4 | Scenarios too easy | Design harder scenarios from actual SPOF data |
| No time for tabletop | Leadership scheduling conflict | Async responses + 30-min follow-up |
Cost Breakdown
| Component | Tabletop Only | Tabletop + Silent | Full Assessment |
|---|---|---|---|
| Scenario design | $0 | $0 | $0 |
| Facilitation | $0-$500 | $0-$500 | $0-$1K |
| Silent test execution | N/A | $0-$500 | $0-$1K |
| Analysis and scoring | $0 | $0 | $0 |
| Total | $0-$500 | $0-$1K | $0-$2K |
Anti-Patterns
Wrong: Running tabletop as a presentation
Reading scenarios aloud and telling leadership what would happen. Result: no resilience data collected. [src3]
Correct: Facilitate, don't lecture
Present the trigger and observe. Silence after a trigger is data — it means detection is slow.
Wrong: Designing abstract scenarios
Using "a major disruption occurs" without specificity. Result: generic answers, meaningless scores. [src2]
Correct: Ground scenarios in SPOF data
Use actual bottleneck names (anonymized), real system names. Specificity forces specific responses.
Wrong: Skipping silent stress test
Relying entirely on tabletop responses. Result: overestimating resilience — people describe ideal, not actual behavior. [src1]
Correct: Validate with at least one silent test
Even a small controlled disruption reveals the gap between described and actual organizational behavior.
When This Matters
Use when an agent needs to measure organizational resilience through scenario-based stress testing. This is Step 5 of the OIA engagement lifecycle. Requires SPOF inventory and autoimmune findings as inputs. Output feeds into the final OIA health score report.