
So What Happened This Week

You may have noticed that Spamtacular was down for a few days this week. If you did, you may have wondered what happened.

Did Spamtacular close? Obviously not. But since you’re curious enough to have read this far, I’ll lay it all out for you.

Even though Spamtacular is run by me personally, the company of which I’m President & CEO graciously allows me to host it on the company’s server. It is, after all, nice to be the king. That server has been hosted by a provider in Longview, Texas, owned by the father-in-law of a friend of mine. The server is old. When it was bought, it came with RedHat Linux 10 installed on it. Yeah, it’s that old.

On Sunday evening, the server essentially froze. I could ping the IP address and I could traceroute to it, but I couldn’t get a response from it. It was restarted on Monday morning but froze again a few hours later. It was restarted once more and froze yet again at 4:26 a.m. Tuesday morning, this time showing that the swap partition was completely full.

Now, as it happens, I have another server here at the house. So I slapped a brand-new install of CentOS 5.3 on it, copied over the relevant files and databases from the nightly backups that we keep here, and drove the server to Longview on Tuesday afternoon.

After I installed the server and got back home, it began to suffer from I/O errors on its disk drives and it froze as well. Total time up: four hours.

Tuesday evening was spent examining the original server to find the source of the failure. I looked at what I could and came up pretty empty. Eventually, with the help of Steve Atkins, we concluded that the likely cause was the death of either the RAID controller or one of the hard drives. The suggested solution: ditch RAID, copy the filesystem to an IDE drive (remember, the server is old enough that it doesn’t have SATA ports), and just use that.

On Wednesday, my wife took the server back to Longview, to a computer store. They copied the data from the drives onto a new 500 GB EIDE drive. In theory, she was going to take the server straight from the shop to the facility and we’d be done with it. But when they did a test boot, they got a screen that said, and I quote: “GRUB”. I was told that the server booted “to grub but no further.” Since the repair folks were Windows weenies, they were at a loss. I wasn’t willing to pay them to research the problem, so I did some research myself and found a potential solution while my wife brought the server home.

Wednesday night, I got my hands on the server at 9:30. That’s when I discovered that it only got as far as the word “GRUB”, not to the GRUB splash/menu screen. My researched solution (basically: rebuild the initrd) went out the window, but hey, if I found that solution, I could find another one! So, back to research.

About an hour later, I discovered the answer: reinstall GRUB. I did that, and the machine booted. To bed!

Thursday morning, I got up and took the machine back to the facility where it lives (after getting the appropriate approvals from my boss at my day job). After dealing with a couple of configuration issues, I left and came home.

Thursday afternoon, the server load climbed to 25 and swap space was getting really low for some reason, but we managed to get a reboot command in before it froze completely. Things have been stable since.

As I write this, the server has been up for 1 day, 2 hours, 25 minutes. It only needs to stay up for about 6 more weeks. Then we’ll be bringing it home, since we’ll have our own T1 line into the home office.


Mickey

A recognized leader in the fight against online abuse, specializing in email anti-abuse, compliance, deliverability, privacy, and data protection. With over 20 years of experience tackling messaging abuse, I help organizations clean up their networks and maintain a safe, secure environment.