
Claude Chaos Resilience Drills

Advanced drill system for Claude teams to rehearse failure scenarios and prove operational resilience before real incidents.

advanced · operations · reliability · incident-response

Official References: Best Practices · Hooks · Security · GitHub Actions

Why drills are mandatory at advanced maturity

A runbook that is never exercised is only documentation.

Resilience drills convert written policy into verified team behavior.

Drill tier model

| Tier | Scope | Frequency | Success signal |
| --- | --- | --- | --- |
| Tabletop | decision flow + ownership | weekly | all owners make correct decision handoffs |
| Partial simulation | one lane or one subsystem | biweekly | target SLO restored within agreed window |
| Full simulation | cross-lane end-to-end recovery | monthly | recovery + communication + follow-up all completed |

Pre-drill design packet

Prepare before every drill:

  • scenario statement
  • expected failure signature
  • blast-radius boundary
  • stop condition
  • observers and scoring owner

Without a design packet, results cannot be compared over time.

Execution lane split

  • Failure injection lane: trigger scenario safely
  • Response lane: execute containment and mitigation
  • Verification lane: run checks and evidence capture
  • Comms lane: timeline updates and escalation notices

Assign one owner per lane and one global drill commander.

Resilience scorecard

Score each drill (0/1 per row):

  • detection happened in target window
  • ownership handoff stayed explicit
  • mitigation decision was reversible
  • verification evidence was fresh
  • post-drill follow-up owner assigned

Score below 4/5 means retry with narrowed scope.
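The five-row scorecard can be computed mechanically. A minimal sketch, where the dimension keys are illustrative assumptions rather than an official schema:

```python
# Hypothetical dimension keys mirroring the five scorecard rows above.
SCORECARD_DIMENSIONS = [
    "detection_in_window",
    "ownership_handoff_explicit",
    "mitigation_reversible",
    "verification_evidence_fresh",
    "followup_owner_assigned",
]

def score_drill(results: dict) -> tuple:
    """Score a drill 0/1 per dimension; below 4/5 means retry with narrowed scope."""
    score = sum(1 for d in SCORECARD_DIMENSIONS if results.get(d, False))
    return score, score >= 4

score, passed = score_drill({
    "detection_in_window": True,
    "ownership_handoff_explicit": True,
    "mitigation_reversible": False,  # the one dimension that failed this drill
    "verification_evidence_fresh": True,
    "followup_owner_assigned": True,
})
print(score, passed)  # 4 True
```

Missing dimensions default to 0, so an incomplete scorecard cannot pass by omission.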

Decision checkpoint protocol

At each checkpoint, force a clear choice:

  1. continue mitigation
  2. rollback to stable state
  3. escalate and pause rollout

Unstated decisions are hidden risks.

Failure-handling rule

If drill execution diverges from scenario assumptions:

  • stop simulation
  • record divergence
  • convert divergence into next drill input

Never smooth over unexpected behavior.

Quarterly resilience program

Every quarter:

  • run at least one full-scope drill
  • rotate drill commander role
  • review recurring low-score dimensions
  • retire controls that never move reliability

Resilience improves through iteration quality, not drill count.

Advanced anti-patterns

Drill becomes performance theater

If people optimize for optics, reliability signal collapses.

Same scenario repeated with no mutation

Teams memorize one path instead of building adaptive capability.

Post-drill actions tracked without owners

Unowned actions become reliability debt.

Quick checklist

Before closing a drill cycle:

  • scorecard captured
  • decision checkpoints logged
  • follow-up owners assigned
  • next scenario drafted

Claude can help teams respond faster. Drills prove teams can respond correctly.

Drill scenario catalog (starter set)

Rotate scenarios to avoid memorized responses.

Reliability scenario set

  1. Dependency timeout storm — primary API latency spikes beyond SLO.
  2. Config drift release — one environment receives stale flag values.
  3. Queue backlog saturation — processing lag creates cascading failures.
  4. Observability blackout — one critical dashboard panel fails during incident.
  5. Owner unavailable — primary on-call unavailable at first checkpoint.

Each cycle: pick one technical failure + one coordination failure.
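Scenario rotation can be kept honest with a trivial picker. A sketch, assuming a technical/coordination split of the starter catalog (the split itself is an illustrative assumption):

```python
import random

# Assumed split of the starter catalog: scenarios 1-4 are technical,
# scenario 5 is a coordination failure.
TECHNICAL = [
    "dependency timeout storm",
    "config drift release",
    "queue backlog saturation",
    "observability blackout",
]
COORDINATION = [
    "owner unavailable",
]

def pick_cycle_scenarios(seed=None):
    """Each cycle pairs one technical failure with one coordination failure."""
    rng = random.Random(seed)
    return rng.choice(TECHNICAL), rng.choice(COORDINATION)
```

Seeding the picker lets a cycle's scenario selection be reproduced in the drill record.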

Observer scoring pack

Observers should score behavior, not personality.

| Dimension | What to observe |
| --- | --- |
| Detection quality | Was the first signal recognized and triaged correctly? |
| Decision quality | Was a reversible decision made quickly? |
| Ownership clarity | Did every checkpoint name a next owner? |
| Evidence quality | Were commands/logs captured at each checkpoint? |
| Communication cadence | Were updates sent on promised cadence? |

Add evidence links for every score.

Drill timeline template (45 minutes)

  • 00:00–05:00 scenario brief + success criteria
  • 05:00–15:00 first signal + triage decision
  • 15:00–30:00 mitigation path execution
  • 30:00–40:00 verification and stability checks
  • 40:00–45:00 debrief capture + follow-up assignment

If timeline overruns, log reason as process debt.
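Overruns against the 45-minute template can be logged automatically. A minimal sketch, using shorthand phase names that are assumptions:

```python
# Budgeted minutes per phase, mirroring the 45-minute timeline above.
TIMELINE_BUDGET = {
    "brief": 5,
    "triage": 10,
    "mitigation": 15,
    "verification": 10,
    "debrief": 5,
}

def overrun_log(actual: dict) -> list:
    """Return (phase, minutes over budget) entries to record as process debt."""
    return [
        (phase, actual[phase] - budget)
        for phase, budget in TIMELINE_BUDGET.items()
        if actual.get(phase, 0) > budget
    ]
```

Anything the function returns goes into the debrief as process debt with a reason attached.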

Communication scripts for pressure moments

First 5-minute update

Incident drill started at <time>
Observed signal: <summary>
Current branch: triage/mitigate/rollback
Next checkpoint: <time>
Commander: <name>

Escalation checkpoint update

Escalation reason: <threshold breach>
Decision: continue | rollback | pause
Immediate owner: <name>
Verification owner: <name>
Next update at: <time>

Debrief decision matrix

After drill, classify every finding:

  • Fix now (high-risk + low effort)
  • Schedule next cycle (high-risk + medium effort)
  • Observe (unclear impact; collect more evidence)
  • Drop (no measurable reliability value)

Never leave findings uncategorized.
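The four-way classification can be encoded so no finding escapes a category. A sketch under the assumption that risk and effort are rated as simple strings, with `None` risk meaning impact is unclear:

```python
def classify_finding(risk, effort, measurable_value=True):
    """Map a debrief finding into exactly one of the four categories above.

    risk/effort: 'high' | 'medium' | 'low'; risk=None means impact unclear.
    """
    if not measurable_value:
        return "drop"                   # no measurable reliability value
    if risk is None:
        return "observe"                # unclear impact; collect more evidence
    if risk == "high" and effort == "low":
        return "fix now"
    if risk == "high" and effort == "medium":
        return "schedule next cycle"
    return "observe"
```

Because every branch returns a category, the function enforces "never leave findings uncategorized" by construction.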

Mutation planning for next cycle

Design next drill by mutating one factor intentionally:

  • failure starts 10 minutes earlier/later
  • key dependency fails differently
  • comms channel is delayed
  • backup owner must lead

Write mutation rationale so score changes are interpretable.

Drill completion gate

A drill cycle is complete only when:

  • scorecard + evidence links are archived
  • at least one follow-up action is owner-assigned
  • next mutation scenario is drafted
  • commander signs off on decision quality notes

Without this gate, drills become one-off events.

Drill scoring normalization

To compare drills across weeks, normalize scores:

  • weight detection and decision quality higher for SEV-1 style scenarios
  • weight comms cadence higher for multi-stakeholder scenarios
  • always publish both raw score and weighted score

Example weighting

  • detection quality: 30%
  • decision quality: 25%
  • ownership clarity: 20%
  • evidence quality: 15%
  • communication cadence: 10%

Adjust weights by scenario class, but document changes.
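Under the example weights, publishing both raw and weighted totals might look like this sketch (dimension keys abbreviated as assumptions):

```python
# Example weights from above; must sum to 1 so weighted scores stay comparable.
WEIGHTS = {
    "detection": 0.30,
    "decision": 0.25,
    "ownership": 0.20,
    "evidence": 0.15,
    "comms": 0.10,
}

def normalize_scores(raw: dict, weights: dict = WEIGHTS) -> dict:
    """Return both the raw 0/1 total and the weighted total for publication."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    raw_total = sum(raw.values())
    weighted = sum(raw[d] * w for d, w in weights.items())
    return {"raw": raw_total, "weighted": round(weighted, 3)}
```

Swapping in a scenario-class weight table is a one-argument change, which keeps documented weight adjustments cheap.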

Commander playbook for stalled drills

If the drill stalls for more than 5 minutes without a decision:

  1. freeze additional discussion
  2. force explicit branch selection (continue/rollback/escalate)
  3. assign execution owner immediately
  4. schedule next checkpoint in 5 minutes

This prevents analysis paralysis during rehearsal.

Debrief conversion rule

Every debrief output must become one of:

  • merged control
  • scheduled control with owner/date
  • documented rejection with reason

No orphan findings.

Quarterly drill campaign structure

Run a campaign, not isolated events.

  • Month 1: response-speed emphasis (detection + decision latency)
  • Month 2: coordination emphasis (handoff and communication integrity)
  • Month 3: recovery-quality emphasis (verification depth + follow-up closure)

Campaign design makes score trends interpretable.

Stress modifiers for realism

Add one stress modifier to each drill:

  • delayed signal visibility
  • partial owner availability
  • conflicting stakeholder requests
  • degraded observability channel

Stress modifiers reveal brittle processes hidden by “clean” simulations.

Drill evidence minimum

Each drill output must include:

  • timeline with decision timestamps
  • command-level verification snippets
  • owner handoff chain
  • comms updates sent vs promised
  • follow-up action mapping

Without this, scorecards are storytelling, not evidence.

Calibration review after every 3 drills

After every third drill, run calibration:

  1. compare weighted score trends
  2. identify over-weighted dimensions
  3. adjust scoring weights with rationale
  4. publish changed rubric before next cycle

Transparent calibration prevents metric gaming.

Multi-team drill federation model

For larger orgs, run drills with federation:

  • platform team owns shared infrastructure scenarios
  • product team owns customer-path scenarios
  • security team injects trust-boundary failures

Federation exposes cross-team coupling early.

Drill quality KPIs

Measure program quality, not just single drill scores:

  • % drills with complete evidence bundle
  • % follow-up actions closed by due date
  • median time to first explicit checkpoint decision
  • recurrence rate of identical failure mode findings

If KPI trend worsens, simplify scenario scope and restore rigor.
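The four program KPIs reduce to simple aggregates over drill records. A sketch in which the record field names are assumptions:

```python
from statistics import median

def drill_program_kpis(drills: list) -> dict:
    """Compute the four program-quality KPIs listed above.

    Each drill record is a dict; field names are illustrative assumptions.
    """
    n = len(drills)
    return {
        "pct_complete_evidence": sum(d["evidence_complete"] for d in drills) / n,
        "pct_followups_closed_on_time": (
            sum(d["followups_closed"] for d in drills)
            / sum(d["followups_total"] for d in drills)
        ),
        "median_minutes_to_first_decision": median(
            d["first_decision_minutes"] for d in drills
        ),
        "repeat_finding_rate": sum(d["repeat_finding"] for d in drills) / n,
    }
```

Trending these per quarter (rather than per drill) is what signals whether to simplify scenario scope.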

Observer bias controls

Reduce scoring bias:

  • rotate observers each cycle
  • require evidence links for low/high scores
  • blind one observer to team names when feasible

Better scoring quality improves downstream hardening decisions.
