Recently, an old friend asked me to assess a company for a potential investment on their part. The details of the (potential) investment and the actual company aren’t crucial for this post, so I’ll refrain from disclosing further information.
What caught my attention during my “research” was the company’s apparent aversion to deploying on Fridays. They claimed they made this decision for the sake of their employees’ peace of mind. At first glance, this seems like a commendable choice. However, I’m here to share why, in this author’s humble opinion, they may be misguided, albeit with good intentions.
I have no doubt everyone at that company is acting in good faith. But it seems that they might not have thought this through all the way to the end. While reducing Friday deployments might alleviate short-term stress, it also perpetuates the myth that deployments, in general, are ominous and warrant respect or fear. This, in turn, may elevate stress levels during deployments, even if they happen on other days of the week.
Now, here’s where the purists might chime in and argue:
You should never be afraid to deploy!
If you are afraid to deploy, then your deployment process is flawed!
I strongly believe that this is a false dichotomy. It is not a question of if your deployment will ever fail (and potentially in a disastrous way), but rather of when that will happen. No amount of processes, automation, or quality assurance can ever reasonably provide 100% security. It is always a trade-off, and one with ever-diminishing returns.
While I’m a proponent of robust CI/CD pipelines, I implore you not to fall down the rabbit hole of endlessly adding layers in an attempt to buy more security.
Instead, consider a simpler way to reduce deployment stress: easy rollbacks! Every action and decision becomes easier as the cost of reversal declines. (This is also a great way to measure “good” software architecture, but that’s another story.) If rolling back an erroneous deployment requires just one click, fear diminishes. No more panic scrolling through the code to find the cause of a bug discovered on the production system. Instead, you simply roll back the deployment, take a step back, and calmly analyze the code.
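To make the "one click" idea a bit more concrete, here is a minimal sketch in Python. It is not taken from any particular deployment tool: the `ReleaseHistory` class, its method names, and the version strings are all hypothetical, and a real setup would switch traffic or artifacts instead of just updating a list. The point is only that the previous release stays around, so going back is a single cheap call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReleaseHistory:
    """Hypothetical record of deployed versions; the last entry is live."""

    versions: List[str] = field(default_factory=list)

    @property
    def live(self) -> Optional[str]:
        # The most recently deployed version is the one serving traffic.
        return self.versions[-1] if self.versions else None

    def deploy(self, version: str) -> str:
        # Keep earlier releases so they remain available for a rollback.
        self.versions.append(version)
        return version

    def rollback(self) -> str:
        # The "one click": drop the current release and return to the previous one.
        if len(self.versions) < 2:
            raise RuntimeError("no previous release to roll back to")
        self.versions.pop()
        return self.versions[-1]


history = ReleaseHistory()
history.deploy("v1.4.0")
history.deploy("v1.5.0")  # this one turns out to be broken
history.rollback()        # v1.4.0 is live again; debugging can wait
```

Nothing about the bug has to be understood before calling `rollback()`, which is exactly what takes the pressure off.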
(Of course, there are even more things to be done here: partial rollouts, solid telemetry, robust backups, and so on. But eventually, you will run into the same diminishing returns as with deployment automation. So be aware of the balance you need to strike here.)
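For the curious, a partial rollout can be sketched in a few lines. This is one common approach and an assumption on my part, not a description of any specific tool: the function name `in_rollout` and the idea of bucketing by a user ID are purely illustrative. Each user is hashed into one of 100 stable buckets, and the new version is shown only to the buckets below the chosen percentage.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user falls into the first `percent` of 100 buckets."""
    # A stable hash keeps each user in the same bucket across requests.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


# Hypothetical usage: expose the new release to roughly 10% of users first.
serve_new_version = in_rollout("user-42", 10)
```

Because the bucket depends only on the user ID, raising the percentage never removes anyone who already had the new version, and setting it to 0 acts as a rollback for everyone.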
Our world is complex, and we seldom deal in absolutes. Reality is never just black or white and always consists of a hundred shades of grey.
So, please keep in mind that there may still be reasons to be cautious during deployments (huge database migration anyone? re-platforming recently?).
What I want you to take away from this is that if you think you have a problem with your deployment strategy, the solution probably isn’t to just add more layers. Instead, we should always be looking for solutions in other areas as well.