<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Operations magic cure: nightly server restarts</title>
	<atom:link href="http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/</link>
	<description>Confessions of a SAAS entrepreneur</description>
	<lastBuildDate>Mon, 12 Dec 2011 22:04:22 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: mike may</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-23</link>
		<dc:creator>mike may</dc:creator>
		<pubDate>Tue, 01 Dec 2009 11:50:06 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-23</guid>
		<description>As application service providers, we have less control over OS-related software bugs that cause systems to &quot;erode&quot;, leading to eventual failure. I used to depend on large, very expensive SMP servers to run apps, and lived through a massive crash: a routine that measures fan speed caused the disk subsystem to go down, resulting in a CPU panic after the cache was allowed to go stale. I called this the space shuttle syndrome: very expensive systems that can be brought down by a chunk of ice falling on a piece of foam, creating a 1-inch crack that brings the whole thing down, with deadly results.

Now all systems are designed to be as stateless as possible and partitioned to run on small servers, so that we can rotate our bounces when the warning indicators tell us that conservative thresholds have been met. Failure never arrives without some warning, but our admins need to know what to look for.</description>
		<content:encoded><![CDATA[<p>As application service providers, we have less control over OS-related software bugs that cause systems to &#8220;erode&#8221;, leading to eventual failure. I used to depend on large, very expensive SMP servers to run apps, and lived through a massive crash: a routine that measures fan speed caused the disk subsystem to go down, resulting in a CPU panic after the cache was allowed to go stale. I called this the space shuttle syndrome: very expensive systems that can be brought down by a chunk of ice falling on a piece of foam, creating a 1-inch crack that brings the whole thing down, with deadly results.</p>
<p>Now all systems are designed to be as stateless as possible and partitioned to run on small servers, so that we can rotate our bounces when the warning indicators tell us that conservative thresholds have been met. Failure never arrives without some warning, but our admins need to know what to look for.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Kleinschmidt</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-22</link>
		<dc:creator>Nick Kleinschmidt</dc:creator>
		<pubDate>Thu, 26 Nov 2009 01:32:08 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-22</guid>
		<description>I think it&#039;s one of those things where the journey is the destination. Once you do the infrastructure work you outlined above, so that restarts are transparent to users, you can keep the whole system up for orders of magnitude longer.

We restart our servers only a few times a year and haven&#039;t had a minute of system downtime this year, but it all depends on what components you&#039;re using. Our app servers go up and down all the time (mostly intentionally ;), but everything&#039;s stateless and load-balanced, so users don&#039;t know the difference.</description>
		<content:encoded><![CDATA[<p>I think it&#8217;s one of those things where the journey is the destination. Once you do the infrastructure work you outlined above, so that restarts are transparent to users, you can keep the whole system up for orders of magnitude longer.</p>
<p>We restart our servers only a few times a year and haven&#8217;t had a minute of system downtime this year, but it all depends on what components you&#8217;re using. Our app servers go up and down all the time (mostly intentionally ;), but everything&#8217;s stateless and load-balanced, so users don&#8217;t know the difference.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: GreenDowntime</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-21</link>
		<dc:creator>GreenDowntime</dc:creator>
		<pubDate>Wed, 25 Nov 2009 22:01:44 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-21</guid>
		<description>Better to find out in a controlled manner that the last udev update broke something than to find out when you _have to_ restart the server fast. So yes: restart even a Linux server now and then. But I agree: it might be bad for your uptime-penis script.

.02</description>
		<content:encoded><![CDATA[<p>Better to find out in a controlled manner that the last udev update broke something than to find out when you _have to_ restart the server fast. So yes: restart even a Linux server now and then. But I agree: it might be bad for your uptime-penis script.</p>
<p>.02</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: KCP</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-18</link>
		<dc:creator>KCP</dc:creator>
		<pubDate>Wed, 25 Nov 2009 15:44:44 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-18</guid>
		<description>I remember back in my days as a CNE, other CNEs were taking snapshots of their servers&#039; uptime and wearing them as badges of honor.  My servers never had an uptime greater than a week, because I bounced them weekly.  I never had the problems that the other engineers complained about, either.

Today, running a company&#039;s IT efforts with a mix of Windows and Linux servers, I still mandate that my system admins try to bounce these boxes at least monthly.  I can&#039;t think of a good argument NOT to do it.  My uptime is very high, and I am convinced that regular restarts are a contributing factor.</description>
		<content:encoded><![CDATA[<p>I remember back in my days as a CNE, other CNEs were taking snapshots of their servers&#8217; uptime and wearing them as badges of honor.  My servers never had an uptime greater than a week, because I bounced them weekly.  I never had the problems that the other engineers complained about, either.</p>
<p>Today, running a company&#8217;s IT efforts with a mix of Windows and Linux servers, I still mandate that my system admins try to bounce these boxes at least monthly.  I can&#8217;t think of a good argument NOT to do it.  My uptime is very high, and I am convinced that regular restarts are a contributing factor.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-17</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Wed, 25 Nov 2009 13:39:34 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-17</guid>
		<description>I used to think along your lines for a looong time. I simply refused to do it on the basis that restarts == we are bad developers. It took bad uptime numbers to make me realize, though, that I was better off being less dogmatic: focusing my energy on the root causes and living with the restarts in the meantime.

For me, restarts are like pain medication: they make you feel better but they don&#039;t make you healthier. Just because they only treat the symptom doesn&#039;t mean they&#039;re not useful.</description>
		<content:encoded><![CDATA[<p>I used to think along your lines for a looong time. I simply refused to do it on the basis that restarts == we are bad developers. It took bad uptime numbers to make me realize, though, that I was better off being less dogmatic: focusing my energy on the root causes and living with the restarts in the meantime.</p>
<p>For me, restarts are like pain medication: they make you feel better but they don&#8217;t make you healthier. Just because they only treat the symptom doesn&#8217;t mean they&#8217;re not useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-16</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Wed, 25 Nov 2009 13:34:26 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-16</guid>
		<description>When I learnt this, the system we were managing had several hundred servers, with a mix of Windows and Linux. In my experience, you may need the restart because of the OS, but the need can also come from leaky server software, leaky DB drivers, etc.</description>
		<content:encoded><![CDATA[<p>When I learnt this, the system we were managing had several hundred servers, with a mix of Windows and Linux. In my experience, you may need the restart because of the OS, but the need can also come from leaky server software, leaky DB drivers, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pointernil</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-15</link>
		<dc:creator>pointernil</dc:creator>
		<pubDate>Wed, 25 Nov 2009 10:24:14 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-15</guid>
		<description>One of the major failures of the IT industry, I&#039;d say.

It&#039;s a downward spiral: you simply restart &quot;to cure&quot; a system, so the bugs/issues are not problematic, that&#039;s why they are not handled, that&#039;s why there is no learning, and that&#039;s why there are more of them over time.

Operations people don&#039;t have to deal with the system too much, as all they need to know is how to restart it, or how to code the scripts that automate the restarts.

Java and .NET stacks both even provide automatic scheduled &quot;recycling&quot; (isn&#039;t that a nice word for an app *restart from scratch*) features.

But hey! Overall, that little bit of extra silicon you need, plus restarting, is cheaper than paying those brains to do it right. Right?</description>
		<content:encoded><![CDATA[<p>One of the major failures of the IT industry, I&#8217;d say.</p>
<p>It&#8217;s a downward spiral: you simply restart &#8220;to cure&#8221; a system, so the bugs/issues are not problematic, that&#8217;s why they are not handled, that&#8217;s why there is no learning, and that&#8217;s why there are more of them over time.</p>
<p>Operations people don&#8217;t have to deal with the system too much, as all they need to know is how to restart it, or how to code the scripts that automate the restarts.</p>
<p>Java and .NET stacks both even provide automatic scheduled &#8220;recycling&#8221; (isn&#8217;t that a nice word for an app *restart from scratch*) features.</p>
<p>But hey! Overall, that little bit of extra silicon you need, plus restarting, is cheaper than paying those brains to do it right. Right?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sylvain</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-14</link>
		<dc:creator>Sylvain</dc:creator>
		<pubDate>Wed, 25 Nov 2009 09:58:55 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-14</guid>
		<description>Let me guess: you are still running Windows, are you not?</description>
		<content:encoded><![CDATA[<p>Let me guess: you are still running Windows, are you not?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sam</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-13</link>
		<dc:creator>Sam</dc:creator>
		<pubDate>Wed, 25 Nov 2009 06:40:03 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-13</guid>
		<description>Hey DJ, thanks for the comment. I guess I meant that a failed restart, for whatever reason, needs an emergency response, as forcing another restart will clearly not solve the issue.

So it&#039;s not so much that somebody must perform the restart, as somebody must be ready to respond to a restart exception.</description>
		<content:encoded><![CDATA[<p>Hey DJ, thanks for the comment. I guess I meant that a failed restart, for whatever reason, needs an emergency response, as forcing another restart will clearly not solve the issue.</p>
<p>So it&#8217;s not so much that somebody must perform the restart, as somebody must be ready to respond to a restart exception.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: D.J. Capelis</title>
		<link>http://blog.aparicio.org/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-12</link>
		<dc:creator>D.J. Capelis</dc:creator>
		<pubDate>Wed, 25 Nov 2009 06:24:26 +0000</pubDate>
		<guid isPermaLink="false">http://samaparicio.wordpress.com/2009/11/24/operations-magic-cure-nightly-server-restarts/#comment-12</guid>
		<description>Seriously?  Staffed restarts?

Your servers can&#039;t restart themselves?  That is definitely a bug.</description>
		<content:encoded><![CDATA[<p>Seriously?  Staffed restarts?</p>
<p>Your servers can&#8217;t restart themselves?  That is definitely a bug.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

