Why Gemini teams need resilience drills
Hardening plans decay quickly without rehearsal.
Drills verify whether teams can execute under uncertainty, not just describe process.
Drill tiers
| Tier | Scope | Cadence | Exit signal |
|---|---|---|---|
| Decision tabletop | ownership and branching decisions | weekly | no ambiguous owner handoffs |
| Service slice simulation | one workflow/surface | biweekly | recovery target met with evidence |
| Full operational simulation | multi-lane coordinated recovery | monthly | complete response + comms + follow-up |
Drill setup packet
Define before start:
- scenario and trigger
- expected detection signal
- isolation boundary
- abort condition
- score owner
If setup quality is weak, drill metrics become noisy.
Lane roles
- Injection lane: simulate fault safely
- Response lane: choose mitigation and apply
- Verification lane: prove state recovery
- Communication lane: maintain timeline updates
One commander should enforce checkpoint discipline.
Reliability scorecard (0/1)
- detection latency within target
- ownership remained explicit
- mitigation remained reversible
- verification evidence is fresh
- follow-up owner assigned
A score under 4/5 requires a rerun.
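The 0/1 scorecard and its rerun threshold can be checked mechanically. A minimal sketch, assuming the row names below (they are illustrative, not part of any Gemini CLI tooling):

```python
# The five 0/1 scorecard rows from the section above (names illustrative).
SCORECARD_ROWS = [
    "detection_latency_within_target",
    "ownership_remained_explicit",
    "mitigation_remained_reversible",
    "verification_evidence_fresh",
    "follow_up_owner_assigned",
]

def needs_rerun(scores: dict, threshold: int = 4) -> bool:
    """Return True when the drill scored under the threshold and must be rerun."""
    total = sum(scores.get(row, 0) for row in SCORECARD_ROWS)
    return total < threshold

passing = {row: 1 for row in SCORECARD_ROWS}
assert not needs_rerun(passing)  # 5/5 meets the threshold
```

A missing row defaults to 0, so an incomplete scorecard counts against the drill rather than for it.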
Checkpoint decision protocol
At each major checkpoint choose explicitly:
- continue mitigation
- rollback
- escalate and pause
Delayed decisions are hidden instability.
Scenario mutation rule
Never run identical drills repeatedly.
Mutate one variable every cycle:
- timing
- dependency failure type
- ownership availability
- communication channel constraints
Mutation prevents false confidence.
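One way to enforce the one-variable-per-cycle rule is to pick the next mutation axis programmatically, avoiding the axis used last cycle. A sketch, assuming the axis names from the list above:

```python
import random

# The four mutation axes from the section above.
MUTATION_AXES = [
    "timing",
    "dependency_failure_type",
    "ownership_availability",
    "communication_channel_constraints",
]

def next_mutation(history: list, rng=None) -> str:
    """Pick one axis to mutate, never repeating the previous cycle's axis."""
    rng = rng or random.Random()
    candidates = [a for a in MUTATION_AXES if not history or a != history[-1]]
    return rng.choice(candidates)
```

Passing a seeded `random.Random` makes the rotation reproducible for planning documents.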
Quarterly reliability review
- run at least one full simulation
- rotate commander and observers
- identify persistent low-score rows
- remove controls that produce no reliability gain
Advanced anti-patterns
Scoring without evidence links
Numbers without evidence cannot drive action.
Commander also owns all lanes
This destroys independent signal and overloads decisions.
Follow-up tracked with no due date
Undated actions are deferred risk.
Quick checklist
Before closing a drill cycle:
- scorecard stored
- checkpoint decisions logged
- scenario mutation recorded
- follow-up owners assigned
Gemini CLI increases execution speed. Drills ensure reliability keeps up.
Drill scenario catalog (starter set)
Rotate scenarios to avoid memorized responses.
Reliability scenario set
- Dependency timeout storm — primary API latency spikes beyond SLO.
- Config drift release — one environment receives stale flag values.
- Queue backlog saturation — processing lag creates cascading failures.
- Observability blackout — one critical dashboard panel fails during incident.
- Owner unavailable — primary on-call unavailable at first checkpoint.
Each cycle: pick one technical failure + one coordination failure.
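The pairing rule can be scripted; the split of the starter catalog into technical versus coordination failures below is an illustrative assumption:

```python
import random

# Illustrative classification of the starter catalog above.
TECHNICAL = [
    "dependency timeout storm",
    "config drift release",
    "queue backlog saturation",
    "observability blackout",
]
COORDINATION = ["owner unavailable"]

def pick_cycle_scenarios(rng=None):
    """Pick one technical + one coordination failure for the next cycle."""
    rng = rng or random.Random()
    return rng.choice(TECHNICAL), rng.choice(COORDINATION)
```

Teams extending the catalog would add entries to whichever class the new scenario stresses.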
Observer scoring pack
Observers should score behavior, not personality.
| Dimension | What to observe |
|---|---|
| Detection quality | Was the first signal recognized and triaged correctly? |
| Decision quality | Was a reversible decision made quickly? |
| Ownership clarity | Did every checkpoint name a next owner? |
| Evidence quality | Were commands/logs captured at each checkpoint? |
| Communication cadence | Were updates sent on promised cadence? |
Add evidence links for every score.
Drill timeline template (45 minutes)
- 00:00–05:00 scenario brief + success criteria
- 05:00–15:00 first signal + triage decision
- 15:00–30:00 mitigation path execution
- 30:00–40:00 verification and stability checks
- 40:00–45:00 debrief capture + follow-up assignment
If the timeline overruns, log the reason as process debt.
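The phase budgets above can be checked against actual elapsed times to flag process debt automatically. A minimal sketch with illustrative phase names:

```python
# Phase budgets in minutes, matching the 45-minute template above.
PHASES = [
    ("brief", 5),
    ("triage", 10),
    ("mitigation", 15),
    ("verification", 10),
    ("debrief", 5),
]

def overrun_debt(actual_minutes: dict) -> list:
    """Return phases that ran past budget; each is logged as process debt."""
    return [name for name, budget in PHASES
            if actual_minutes.get(name, 0) > budget]
```

A phase missing from `actual_minutes` is treated as zero, i.e. never flagged; a stricter version could flag missing phases too.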
Communication scripts for pressure moments
First 5-minute update
Incident drill started at <time>
Observed signal: <summary>
Current branch: triage/mitigate/rollback
Next checkpoint: <time>
Commander: <name>
Escalation checkpoint update
Escalation reason: <threshold breach>
Decision: continue | rollback | pause
Immediate owner: <name>
Verification owner: <name>
Next update at: <time>
Debrief decision matrix
After drill, classify every finding:
- Fix now (high-risk + low effort)
- Schedule next cycle (high-risk + medium effort)
- Observe (unclear impact; collect more evidence)
- Drop (no measurable reliability value)
Never leave findings uncategorized.
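The four-bucket matrix is a simple function of risk and effort. A sketch, assuming string labels for both inputs:

```python
def classify_finding(risk: str, effort: str) -> str:
    """Map a finding's (risk, effort) pair onto the four debrief buckets."""
    if risk == "none":
        return "drop"                    # no measurable reliability value
    if risk == "high" and effort == "low":
        return "fix_now"
    if risk == "high" and effort == "medium":
        return "schedule_next_cycle"
    return "observe"                     # unclear impact: collect more evidence
```

Defaulting ambiguous combinations to "observe" keeps findings categorized without forcing a premature fix/drop call.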
Mutation planning for next cycle
Design next drill by mutating one factor intentionally:
- failure starts 10 minutes earlier/later
- key dependency fails differently
- comms channel is delayed
- backup owner must lead
Write mutation rationale so score changes are interpretable.
Drill completion gate
A drill cycle is complete only when:
- scorecard + evidence links are archived
- at least one follow-up action is owner-assigned
- next mutation scenario is drafted
- commander signs off on decision quality notes
Without this gate, drills become one-off events.
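The completion gate is an all-or-nothing check, which makes it easy to automate. A minimal sketch with illustrative field names:

```python
# Gate conditions from the section above (field names illustrative).
GATE_CHECKS = [
    "scorecard_and_evidence_archived",
    "follow_up_owner_assigned",
    "next_mutation_drafted",
    "commander_signoff",
]

def cycle_complete(cycle: dict) -> bool:
    """A drill cycle closes only when every gate condition holds."""
    return all(cycle.get(check, False) for check in GATE_CHECKS)
```

Because a missing field defaults to `False`, an unrecorded condition blocks closure rather than silently passing.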
Drill scoring normalization
To compare drills across weeks, normalize scores:
- weight detection and decision quality higher for SEV-1 style scenarios
- weight comms cadence higher for multi-stakeholder scenarios
- always publish both raw score and weighted score
Example weighting
- detection quality: 30%
- decision quality: 25%
- ownership clarity: 20%
- evidence quality: 15%
- communication cadence: 10%
Adjust weights by scenario class, but document changes.
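The example weighting above reduces to a dot product over the five observer dimensions. A sketch that publishes the weighted score while leaving raw scores untouched:

```python
# Example weights from the section above; adjust per scenario class, with rationale.
WEIGHTS = {
    "detection_quality": 0.30,
    "decision_quality": 0.25,
    "ownership_clarity": 0.20,
    "evidence_quality": 0.15,
    "communication_cadence": 0.10,
}

def weighted_score(raw: dict) -> float:
    """Weighted score over 0-1 dimension scores; publish alongside the raw average."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(raw.get(dim, 0.0) * w for dim, w in WEIGHTS.items())
```

For example, a drill scoring 1.0 on everything except communication cadence (0.0) yields a weighted score of 0.90 under these weights.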
Commander playbook for stalled drills
If the drill stalls for more than 5 minutes without a decision:
- freeze additional discussion
- force explicit branch selection (continue/rollback/escalate)
- assign execution owner immediately
- schedule next checkpoint in 5 minutes
This prevents analysis paralysis during rehearsal.
Debrief conversion rule
Every debrief output must become one of:
- merged control
- scheduled control with owner/date
- documented rejection with reason
No orphan findings.
Quarterly drill campaign structure
Run a campaign, not isolated events.
- Month 1: response-speed emphasis (detection + decision latency)
- Month 2: coordination emphasis (handoff and communication integrity)
- Month 3: recovery-quality emphasis (verification depth + follow-up closure)
Campaign design makes score trends interpretable.
Stress modifiers for realism
Add one stress modifier to each drill:
- delayed signal visibility
- partial owner availability
- conflicting stakeholder requests
- degraded observability channel
Stress modifiers reveal brittle processes hidden by “clean” simulations.
Drill evidence minimum
Each drill output must include:
- timeline with decision timestamps
- command-level verification snippets
- owner handoff chain
- comms updates sent vs promised
- follow-up action mapping
Without this, scorecards are storytelling, not evidence.
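The evidence minimum is a fixed checklist, so a bundle can be validated before the scorecard is accepted. A sketch with illustrative key names:

```python
# Required evidence items from the section above (key names illustrative).
REQUIRED_EVIDENCE = [
    "timeline_with_decision_timestamps",
    "verification_snippets",
    "owner_handoff_chain",
    "comms_sent_vs_promised",
    "follow_up_action_mapping",
]

def missing_evidence(bundle: dict) -> list:
    """Names of required evidence items that are absent or empty in the bundle."""
    return [item for item in REQUIRED_EVIDENCE if not bundle.get(item)]
```

A non-empty result means the scorecard is storytelling, not evidence, and the drill output should be rejected until the gaps are filled.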
Calibration review after every 3 drills
After every third drill, run calibration:
- compare weighted score trends
- identify over-weighted dimensions
- adjust scoring weights with rationale
- publish changed rubric before next cycle
Transparent calibration prevents metric gaming.
Multi-team drill federation model
For larger orgs, run drills with federation:
- platform team owns shared infrastructure scenarios
- product team owns customer-path scenarios
- security team injects trust-boundary failures
Federation exposes cross-team coupling early.
Drill quality KPIs
Measure program quality, not just single drill scores:
- % drills with complete evidence bundle
- % follow-up actions closed by due date
- median time to first explicit checkpoint decision
- recurrence rate of identical failure mode findings
If KPI trend worsens, simplify scenario scope and restore rigor.
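The four program KPIs can be computed from per-drill records. A sketch, assuming the record field names shown (they are not from any standard tooling):

```python
import statistics

def drill_kpis(drills: list) -> dict:
    """Program-level KPIs over a list of drill records (field names illustrative)."""
    evidence_pct = sum(d["evidence_complete"] for d in drills) / len(drills)
    total_followups = sum(d["followups_total"] for d in drills)
    closed_pct = (sum(d["followups_closed_on_time"] for d in drills)
                  / total_followups) if total_followups else 1.0
    return {
        "evidence_complete_pct": evidence_pct,
        "followup_closure_pct": closed_pct,
        "median_decision_minutes": statistics.median(
            d["minutes_to_first_decision"] for d in drills),
    }
```

Recurrence rate of identical failure-mode findings is omitted here because it needs finding-level identity matching across drills, which depends on how findings are tagged.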
Observer bias controls
Reduce scoring bias:
- rotate observers each cycle
- require evidence links for low/high scores
- blind one observer to team names when feasible
Better scoring quality improves downstream hardening decisions.