Rollback

Production changes can introduce unexpected failures. A rollback plan lets you restore a service, configuration, or data to its last known-good state before users are impacted.

Every production change must have a rollback plan before it goes live. The rollback scope must match the scope of the original change — a partial rollback that leaves components in mismatched states can introduce new failures.

When a change introduces risks or triggers anomalies that cannot be diagnosed quickly, restore to the pre-change state immediately rather than attempting live diagnosis under production load.
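
The rule above can be sketched as a small decision function. This is a minimal illustration, not a prescribed implementation: the `AnomalyStatus` type, the `should_roll_back` function, and the 5-minute diagnosis budget are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class AnomalyStatus:
    """Hypothetical snapshot of an anomaly observed after a change."""
    detected: bool           # an anomaly or new risk was observed
    root_cause_known: bool   # diagnosis has already succeeded
    minutes_elapsed: float   # time since the anomaly was first seen


def should_roll_back(status: AnomalyStatus,
                     diagnosis_budget_minutes: float = 5.0) -> bool:
    """Return True when the change should be reverted right away.

    The 5-minute default is an illustrative assumption, not a value
    taken from the text.
    """
    if not status.detected:
        return False
    if status.root_cause_known:
        # Diagnosed in time: the team can weigh a targeted fix instead.
        return False
    # Undiagnosed past the budget: stop investigating live and restore.
    return status.minutes_elapsed >= diagnosis_budget_minutes
```

In practice the budget would likely depend on the severity of the anomaly; a stricter policy could set it to zero so that any undiagnosed anomaly triggers a rollback.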

Rollback modes

When a change must be reversed, choose a mode based on whether the change object supports returning to its original state:

  1. Backward mode (rollback): Revert the change object directly to its pre-change state. If the service was in state A before the change and moved to state B after, backward mode restores it to A. Use this mode when the previous artifact or configuration is still available and the change is stateless. State transition: A→B→A.

  2. Forward mode (roll-forward): Apply a new change that produces a state equivalent to the pre-change state, rather than reverting in place. If the service was in state A before the change and moved to state B after, forward mode advances it to A', which is functionally equivalent to A. Use this mode when backward reversion is not safe, for example when a database migration or protocol change has already been applied to live data and cannot be cleanly undone. State transition: A→B→A'.
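
The choice between the two modes can be sketched as follows. The selection criteria mirror the conditions stated above (previous artifact available, change is stateless); the names `RollbackMode`, `choose_rollback_mode`, and `state_transition` are hypothetical and exist only for this example.

```python
from enum import Enum
from typing import List


class RollbackMode(Enum):
    BACKWARD = "backward"  # A -> B -> A
    FORWARD = "forward"    # A -> B -> A'


def choose_rollback_mode(previous_artifact_available: bool,
                         change_is_stateless: bool) -> RollbackMode:
    """Pick backward mode only when reverting in place is safe."""
    if previous_artifact_available and change_is_stateless:
        return RollbackMode.BACKWARD
    # Otherwise (for example, a migration already applied to live data),
    # advance to a new state that is equivalent to the pre-change state.
    return RollbackMode.FORWARD


def state_transition(mode: RollbackMode) -> List[str]:
    """Return the sequence of states the change object passes through."""
    final_state = "A" if mode is RollbackMode.BACKWARD else "A'"
    return ["A", "B", final_state]
```

For instance, `choose_rollback_mode(True, False)` selects forward mode, because a stateful change cannot be assumed to revert cleanly even when the old artifact still exists.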

Five elements of a rollback plan

Define the following five elements before executing any change:

  • Change object: The atomic resource being changed — for example, an application package or a configuration item. Scope the rollback plan to this unit.

  • Rollback mode: Whether to restore the change object using backward mode (revert to original) or forward mode (advance to equivalent state). Decide this before the change executes, not after a failure occurs.

  • Effective scope: The portion of the system where the change has taken effect at any given point during a phased or canary rollout. Rollback must cover the full effective scope, not just the target of the final deployment step.

  • Pre-change state: A stable, atomic description of the change object before the change. It is the target state for a backward-mode rollback and the reference against which a forward-mode result is judged equivalent. Capture this before the change begins.

  • Post-change state: A stable, atomic description of the change object after the change. This state confirms whether the change applied correctly and determines what must be undone or superseded during a rollback.
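
One way to make the five elements concrete is to record them in a single structure that is checked before the change starts. The sketch below is an assumption-laden illustration: the `RollbackPlan` class, its field names, and the sample values in the usage note are hypothetical and do not come from any specific tool.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set

VALID_MODES = {"backward", "forward"}


@dataclass
class RollbackPlan:
    """Hypothetical record of the five elements of a rollback plan."""
    change_object: str        # atomic resource being changed
    rollback_mode: str        # "backward" or "forward"
    pre_change_state: str     # stable description before the change
    post_change_state: str    # stable description after the change
    effective_scope: Set[str] = field(default_factory=set)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the plan is complete."""
        problems = []
        if not self.change_object:
            problems.append("change object is not defined")
        if self.rollback_mode not in VALID_MODES:
            problems.append("rollback mode must be 'backward' or 'forward'")
        if not self.pre_change_state:
            problems.append("pre-change state was not captured")
        if not self.post_change_state:
            problems.append("post-change state is not defined")
        return problems

    def record_rollout(self, targets: Iterable[str]) -> None:
        """Grow the effective scope as each phased or canary step completes."""
        self.effective_scope.update(targets)

    def uncovered_targets(self, rollback_targets: Iterable[str]) -> Set[str]:
        """Targets where the change took effect but the rollback would not reach."""
        return self.effective_scope - set(rollback_targets)
```

A short usage example: after `plan = RollbackPlan("app-package", "backward", "v1.4.2", "v1.5.0")`, calling `plan.record_rollout({"canary-1"})` and then `plan.record_rollout({"zone-a"})` tracks both rollout steps, so `plan.uncovered_targets({"zone-a"})` reports `canary-1` as a target the rollback would miss. In this sketch the effective scope starts empty and grows during the rollout, which is why `validate` does not check it up front.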