Archive for the ‘Operations’ Category

Operations magic cure: nightly server restarts

November 24, 2009 12 comments

I hate to admit it, but it’s a well-known fact that some people arrive at high availability by frequently rebooting their servers. As a developer I always abhorred this idea. Good software should be able to stay up for a long time.

At some point early in my two-year tenure as CTO at Angel.com I could no longer fight the obvious: trying to keep the system up for long periods of time simply made us less reliable.

It was at lunch with a friend, the CIO of a local SaaS company, that he shared his dirty little secret: “We restart our servers every night. That’s why we get a lot less alerts than you seem to be getting.”

If you think about it, though, this practice is harder than it seems. You need:

* your restarts to be mostly transparent to your users. This probably implies stateless servers and horizontal partitioning.
* an automated restart procedure. This probably implies a certain degree of script-based automation.
* a person in charge of the restarts. This implies a staffed 24/7 rotation.
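
To make the list above a bit more concrete, here is a minimal sketch of what such a restart script might look like. It is not what we ran at Angel.com; the function name `rolling_restart` and the four callbacks (`drain`, `restart`, `is_healthy`, `enable`) are hypothetical stand-ins for whatever your load balancer and init system actually provide. The idea is simply: take one server out of rotation, restart it, check it, put it back, and stop for a human if anything looks wrong.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rolling_restart(
    servers: Iterable[str],
    drain: Callable[[str], object],
    restart: Callable[[str], object],
    is_healthy: Callable[[str], bool],
    enable: Callable[[str], object],
) -> Tuple[List[str], Optional[str]]:
    """Restart servers one at a time.

    Returns (restarted, failed): the servers that are back in rotation,
    and the first server that failed its health check (None if all passed).
    All four callbacks are assumptions supplied by the caller.
    """
    restarted: List[str] = []
    for server in servers:
        drain(server)    # out of rotation first, so users do not notice
        restart(server)
        if not is_healthy(server):
            # Leave the bad server out of rotation and stop here,
            # so the person on call can look at it.
            return restarted, server
        enable(server)   # healthy again: back into rotation
        restarted.append(server)
    return restarted, None


# Toy usage with an in-memory "load balancer" (a set of server names).
in_rotation = {"web1", "web2", "web3"}
restart_log: List[str] = []
restarted, failed = rolling_restart(
    ["web1", "web2", "web3"],
    drain=in_rotation.discard,
    restart=restart_log.append,   # stand-in for a real restart command
    is_healthy=lambda s: True,    # stand-in for a real health check
    enable=in_rotation.add,
)
print(restarted, failed)
```

Restarting one server at a time only stays transparent if the remaining servers can carry the load, which is where the stateless, horizontally partitioned design comes in. Halting on the first failed health check is a deliberate choice: it hands the problem to the person in charge instead of marching through the rest of the fleet.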

So all in all, for my money, it’s not a bad attack vector if your goal is to improve uptime: you get procedural improvements along the way, and peacefully sleeping admins as a bonus.

Categories: Operations

The data center when you’re 20, 30 and 40

November 20, 2008 1 comment

I find it fascinating to contrast the attitudes of engineers and software execs when it comes to building data centers to run critical software. My non-scientific classification:

When you’re in your twenties your mindset is: I’ll build some cool software, compile it and then deploy it. Where? Free hosting, or just slap on a machine for each kind of server that I need (hmmm… what if I just put every one of them on ONE machine…)

When you’re in your thirties you’ve seen your share of fires and had to deal with long outages, and your mindset is: two of each. I’ll have no single point of failure. I’ll build in redundancy. But you’re still running a ‘Mickey Mouse’ operation (Ahmed’s term).

When you’re in your forties you’ve got battle scars and you’ve been burnt before, and your mindset is: EMC, RAC, Clusterware, active-active replication… you have overprovisioned both your hardware and your team and practiced, practiced, practiced.

I’m 34 and I’m trying to act like I’m 44 without waiting ten years. But oh, the scars… how much they hurt.

Categories: Operations