NearlyFreeSpeech = Badass Webhosting
Jul. 17th, 2007 02:35 pm
[Edit: It has since been pointed out to me that I made some incorrect assumptions about the situation. As such, I have edited this post to reflect the facts of what really went on. I apologize for not getting the full story the first time around.]
The Anthrocon website is back up, with no data lost. Here's the post-mortem of what happened:
At about 7 AM this morning, the company owning the data center, Limelight Networks, performed some scheduled maintenance on their power grid. Unfortunately, Murphy's law intervened: their main breaker malfunctioned and cut power to many machines. Included in the outage were Limelight's own email servers and phone system. Even more unfortunately, while Limelight did the right thing by notifying their customers about the scheduled outage beforehand, some of the emails to their customers (including NearlyFreeSpeech) simply never reached their destination. It's unclear what exactly happened to the emails, other than that they failed to make it into the recipients' inboxes OR spam filters.
Our webhost noticed that all their machines stopped responding at the same time, and promptly tried to get in touch with Limelight. This was complicated somewhat since Limelight's phone service was still out. Once they did get in touch with Limelight, they were informed that a second shutdown would be necessary in about one hour to replace the main breaker. At this point, NFSN proceeded to bring things up in a limited capacity, since the power would be going out again.
After about 1.5 hours, it turned out that the second shutdown would not be necessary after all. So NFSN proceeded to finish up their file system checks and start bringing all of the servers back up. The webservers were brought back online fairly quickly, but the real bottleneck was bringing all of their MySQL servers back online; it took about 40 minutes to start them all.
One additional problem remained: routing. Since routers were also affected in the power outage, some of the database servers (three /24 subnets, for the curious) were unreachable by the webhosts. This routing problem was responsible for about half of the total downtime.
So what did I learn today? NFSN is pretty awesome when it comes to dealing with service issues and staying in touch with their customers. They were active in their forums and gave us updates about every half hour or so. They even went so far as to look into renting a truck so as to move their servers to a new data center if the outage proved to be extended.
(no subject)
Date: 2007-07-17 07:48 pm (UTC)
I hate it when I call a US company for help with something and I end up talking to an Indian who can hardly speak any English and knows very little about the service I need help with other than what's in their script. I end up spending four times longer on the phone with him than I need to, and generally end up getting transferred to a US tech who can usually fix the problem immediately.
Don't companies get it? Having your call time be five times as long probably gets rid of any savings they get by outsourcing, and builds customer resentment.
Nothing against foreigners, of course, but cultural and linguistic barriers are VERY BAD when providing tech support. You're already dealing with an irate customer; making it more difficult by adding unneeded barriers isn't going to save anything in the long run.
Even Apple is outsourcing to India now. I actually felt sorry for the Indian guy who was helping me when I called about my video card; he seemed to genuinely want to help but didn't understand simple terms like OpenGL and kernel panics. The product specialist he transferred me to understood my problem in one minute and had my replacement card on the way the next.
At least some companies (like your webhost) get it. I hope this trend starts reversing soon.
(no subject)
Date: 2007-07-17 07:50 pm (UTC)
You know there were no phone calls involved, right? :-)
NFSN does everything over email. And it works amazingly well.
(no subject)
Date: 2007-07-17 11:08 pm (UTC)
(no subject)
Date: 2007-07-17 11:34 pm (UTC)
Yeah, you were the one who originally mentioned them to me $SMALLINT years ago.
(no subject)
Date: 2007-07-17 11:46 pm (UTC)
And here I thought that particular host center was invincible.
(no subject)
Date: 2007-07-18 04:02 am (UTC)
Unlike every other service provider out there. You get an outage, you still get a monthly bill. They have no incentive to get things fixed quickly.
I've seriously been nothing but happy with NFSN. Everything is easy to control the way I want, and everything works fast. It's perfect for what I need. And after about 2 months of service I've paid 31 cents, not including the $7.50 I happily paid to transfer my domain name to them from the jackholes at web.com.