You Can't Have a Rollback Button
The old version does not exist
The fundamental problem with rolling back to an old version is that web applications are not self-contained, and therefore they do not have versions. They have a current state. That state consists of the application code and everything it interacts with: databases, caches, browsers, and concurrently running copies of itself.
You can roll back the SHA the webservers are running, but you can’t roll back what they’ve inflicted on everything else in the system.
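To make this concrete, here is a minimal sketch of one way it can go wrong. Everything in it is hypothetical: the `database` dict stands in for any shared store, and the `v1`/`v2` functions stand in for two deployed versions of the same code. The newer version writes records in a shape the older version never anticipated, and those records are still there after the code is reverted.

```python
import json

# Hypothetical stand-in for a shared database or cache.
# It outlives whichever version of the code happens to be deployed.
database = {}


def save_user_v1(user_id, name):
    # Version 1 stores the name as a plain string.
    database[user_id] = json.dumps({"name": name})


def load_user_v1(user_id):
    # Version 1 assumes the stored name is always a string.
    return json.loads(database[user_id])["name"].upper()


def save_user_v2(user_id, first, last):
    # Version 2 changes the shape of the stored record.
    database[user_id] = json.dumps({"name": {"first": first, "last": last}})


# v1 runs for a while, then v2 is deployed and serves some writes.
save_user_v1(1, "Ada Lovelace")
save_user_v2(2, "Grace", "Hopper")


def rolled_back_read(user_id):
    # After the "rollback", only v1 code is running again,
    # but the record written by v2 is still in the store.
    try:
        return load_user_v1(user_id)
    except AttributeError as exc:
        return f"v1 cannot read this record: {exc}"
```

Reverting the code restores `load_user_v1`, but it does nothing about record 2. The same applies to cached values, JavaScript already loaded in browsers, and anything else the newer code touched.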
A sharp knife, whose handle is also a knife
Adding a rollback button is not a neutral design choice. It affects the code that gets pushed. If developers incorrectly believe that their mistakes can be quickly reversed, they will tend to take more foolish risks.
Practice small corrections
Pushbutton rollback is a bad idea. The only sensible thing to do is change the way we organize our code for deployment.
- Push “dark” code. You should be deploying code behind a disabled feature flag that will not be invoked. It’s relatively easy to visually inspect an if statement for correctness and check that a flag is disabled.
- Ramp up invocations of new code. Breaking requests without a quick rollback path is bad. But it’s much worse to break 100% of requests than it is to break 1% of requests. If we ramp up new code gradually, we can often contain the scope of the damage.
- Maintain off switches. In the event that a complicated remediation is required, we’re in a stronger position if we can disable broken features while we work on them in relative calm.
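The three practices above can be sketched with a single flag check. This is an illustration under stated assumptions, not a description of any particular flag system: the `FLAGS` dict, the `new_checkout` flag name, and the helper functions are all invented for the example, and a real deployment would likely read flags from configuration that can be changed without a code push.

```python
import hashlib

# Hypothetical in-memory flag store, used here only for illustration.
FLAGS = {
    # Dark code: shipped disabled, so the new path is never invoked.
    "new_checkout": {"enabled": False, "percent": 0},
}


def bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name, user_id):
    flag = FLAGS.get(flag_name)
    # Unknown or disabled flags are off. Setting "enabled" to False
    # is also the off switch once the feature is live.
    if flag is None or not flag["enabled"]:
        return False
    # Ramp-up: only users whose bucket falls below the percentage
    # take the new path, and the same users stay in as it grows.
    return bucket(flag_name, user_id) < flag["percent"]


def old_checkout(user_id):
    return "old"


def new_checkout(user_id):
    return "new"


def checkout(user_id):
    # The single if statement that needs inspecting before a dark push.
    if is_enabled("new_checkout", user_id):
        return new_checkout(user_id)
    return old_checkout(user_id)
```

Ramping up is then a matter of setting `enabled` to `True` and raising `percent` in steps, for example 1, 10, 50, 100. Hashing the user ID rather than picking randomly per request keeps each user on one side of the flag, so a failure at 1% affects a fixed set of users instead of a random slice of everyone's requests.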