Operations magic cure: nightly server restarts
I hate to admit it, but it’s a well-known fact that some people arrive at high availability by frequently rebooting their servers. As a developer I always abhorred this idea. Good software should be able to stay up for a long time.
At some point early in my two-year tenure as CTO at Angel.com I could no longer fight the obvious: trying to keep the system up for long periods of time simply made us less reliable.
It was at lunch with a friend, the CIO of a local SaaS company, that he shared his dirty little secret: “We restart our servers every night. That’s why we get a lot fewer alerts than you seem to be getting.”
If you think about it, though, this practice is harder than it seems. You need:
* your restarts to be mostly transparent to your users. This probably implies stateless services and horizontal partitioning, so that one server can go down while the others keep serving.
* an automated restart procedure. This probably implies a certain degree of script-based automation.
* a person in charge of the restarts. This implies a staffed 24/7 rotation.
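
To make the list above a bit more concrete, here is a minimal sketch of what a scripted rolling restart could look like in Python. It is an illustration under stated assumptions, not what we actually ran: the four hooks (`drain`, `restart`, `is_healthy`, `undrain`) are hypothetical callbacks that you would wire to your own load balancer and init system, and the timeout values are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    servers: Iterable[str],
    drain: Callable[[str], None],        # hypothetical: take server out of rotation
    restart: Callable[[str], None],      # hypothetical: restart the service or host
    is_healthy: Callable[[str], bool],   # hypothetical: health check after restart
    undrain: Callable[[str], None],      # hypothetical: put server back in rotation
    health_timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Restart servers one at a time so the rest keep serving traffic.

    Returns the servers that were restarted and put back in rotation.
    Raises RuntimeError, leaving the remaining servers untouched, if a
    server does not become healthy within health_timeout_s.
    """
    restarted: List[str] = []
    for server in servers:
        drain(server)
        restart(server)

        # Poll the health check until it passes or the deadline expires.
        deadline = clock() + health_timeout_s
        while not is_healthy(server):
            if clock() >= deadline:
                raise RuntimeError(f"{server} did not become healthy; aborting")
            sleep(poll_interval_s)

        undrain(server)
        restarted.append(server)
    return restarted
```

Restarting one server at a time only stays transparent if the remaining servers can absorb the load, which is where the stateless, horizontally partitioned design comes in. Aborting on the first failed health check is a deliberate choice in this sketch: it stops the script from taking down more capacity, and leaves the decision to whoever is on rotation.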
So all in all, for my money, this is not a bad approach after all if your goal is to improve uptime: you get procedural improvements along the way, and peacefully sleeping admins as a bonus.
