Thursday, March 11, 2010

Server Trials in Down Time

Some of you may have noticed that over the last couple of days the server was, uhh, not quite working right.  Find out why after the jump.

The server is running a Linux distro called Ubuntu Server put out by a company called Canonical Ltd.

Ubuntu is Debian-based and as such supports apt-get and package repositories.  Every 6 months (April and October) Canonical releases a new version of Ubuntu and gives it a major version, which is the last digits of the release year, and a minor version, which is the release month.  So you'll see 8.04, 8.10, 9.04, 9.10, etc.  The next release this April will be 10.04.
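For anyone who wants the numbering scheme spelled out, here is a rough Python sketch.  The function names are made up for illustration, and it assumes the year part is counted from 2000 and that releases land exactly every 6 months, which is the usual schedule but not a guarantee.

```python
from datetime import date


def parse_ubuntu_version(version: str) -> date:
    """Turn a version like "9.10" into the first day of its release month.

    Assumes the part before the dot is the year counted from 2000
    and the part after the dot is the month.
    """
    year_part, month_part = version.split(".")
    return date(2000 + int(year_part), int(month_part), 1)


def next_release(version: str) -> str:
    """Guess the following version, assuming a strict 6-month cadence."""
    released = parse_ubuntu_version(version)
    month = released.month + 6
    # Roll over into the next year when we go past December.
    year = released.year + (month - 1) // 12
    month = (month - 1) % 12 + 1
    return f"{year - 2000}.{month:02d}"


if __name__ == "__main__":
    print(parse_ubuntu_version("8.04"))
    print(next_release("9.10"))
```

Running it prints the release month for 8.04 and the version expected after 9.10.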

As Ubuntu releases new versions they phase out support for older versions, with the exception of LTS (Long Term Support) releases, of which 8.04 was the last and 10.04 will be the next.

As some may know, the server also runs a Mumble server.  Mumble drastically changed their system between 1.1 and 1.2, to the point where a 1.2 client cannot connect to a 1.1 server.  Mumble addressed this by giving people the option to install the "legacy" client: when you try to connect to a 1.1 server it prompts you, then opens the 1.1 legacy client and continues connecting through that.

The first version of Ubuntu to have the 1.2 server in its repositories was Karmic Koala, or 9.10.  Upgrading Ubuntu is supposed to be as simple as running do-release-upgrade: a bunch of crap happens, then it prompts you saying it needs to restart.

I talked with Luthien about it and my plans to do an update on the server.  We figured that with the recent low player count now was the time to do it, and that I'd do it on a weekday to give enough time to fix the system if something went horribly wrong.  With Linux I've come to expect that; I just expected more from Ubuntu.

So Tuesday rolls around and I get to work, ssh into the server and start the update.  A quick "sudo do-release-upgrade" command and it's off scanning the system, finding packages to update, getting new repositories and away it goes.  The update seemed to be going relatively smoothly, but when it came time to restart everything went horribly wrong.  Apparently the 9.10 kernel has severe issues and during load has kernel panics about not having enough memory.  Our server has 4 gigs; 9.10 recommends at least 256 megs.  We have more than enough memory.  As such the system couldn't boot and poof, mud down.

This is where the fun starts.  I call Deru Communications (the company that owns the rack our server is on) and they had noticed the server was offline as well from their automated checks and were about to contact me too.  They hook me up with a KVM (Keyboard Video Mouse) switch login so I can see what's going on remotely.  Bad things... Very bad things.

Before I started the update I grabbed the most recent backup off the server, and since I work at a web development company we do have some Linux servers.  I tried to get it running on one of them, working with our net admin, but we discovered that those servers are missing utilities that are required to run the mud.  They could run a PHP-based website though, so I took what I could get at the time, threw the forums up on there and pointed to it.

A couple of status posts on there and then it's back to work for now.

So Tuesday night I'm driving down to downtown Phoenix to rescue Chuck (aka tsosmud server) from the rack to take home.  I get home late, due to the awesome rush hour traffic in Phoenix on the 10.

Once I finally get home I load up the Ubuntu 9.10 install CD and start a LiveCD session.  Essentially it boots to a Linux stored completely in memory so I can mount drives, back things up and try to repair my existing install, but for this I just backed up the database and home directory.  Then I shut down and unplugged my backup drive to ensure I wouldn't accidentally do anything that could cause me to lose the data on that drive.

Turn the system back on and boot into the installation of 9.10 Karmic Koala.  Remember, this is the same CD that let me run a LiveCD session from it.

I start the install, everything is going smooth, repartitioning the main drive, formatting it with the ext4 file system, etc.  Everything finishes installing, time to reboot, and bad juju.  Boot seems to be going fine until it loads the mysql daemon, then suddenly kernel panics everywhere claiming to be out of memory.  Again, 4 gigs for an OS that can run on 64 megs is an indication I'm not out of memory.

So I assume maybe I have a RAM chip going bad, so out comes MEMTEST86, and after a couple hours of no faults found in the RAM at all I have to assume my chips are good.  Maybe it was the 64-bit install it didn't like.  I can't remember if I set the server up originally as 64-bit or 32-bit so I figure what the hell and start the 32-bit install.

Everything going well, install was successful, server reboots, apps loading... and BAM, kernel panics.  Now both of these ISOs, for the 64-bit and 32-bit versions, I downloaded off the US server, so I figured maybe the ISOs were bad and downloaded a new 64-bit ISO off the Canada server.

Download finishes, burned to a disc and away we went.  Until kernel panics.  Now I'm getting mad, it's 11PM Tuesday night and I have a server that I'm thinking has a bad hard drive.  I mean it's not like Ubuntu would put out a SERVER version that doesn't work, right?  Server versions are built to be stable and not be problematic.

So I'm thinking maybe the Seagate hard drive is going bad and that I'll have to pick one up from Fry's on Wednesday.  But just to make sure I downloaded 8.04 LTS, and discovered 8.04 can't install off a SATA CD-ROM drive, so then I downloaded 8.10 (what I installed on the server originally) and put that in there.

Install went smooth, everything should be ok, restarted and poof! It worked!  After a few seconds I'm staring at the login prompt and I'm hating Kursed Koala.  Now it's after midnight and I figure what the hell, let's see how far I can go.  From the fresh install I do an upgrade to 9.04 Jaunty Jackalope.  Update went smooth, no issues, booted right into 9.04 no problem.  Ok, final chance, I start the update to 9.10, update went fine, no issues, server reboots and BAM, kernel panics.  So now I know for a fact the problem lies with Kursed Koala.

By now it's just after 1AM Wednesday morning and I have work in a few hours.  But I know the issue so I go to bed.

During all of this time Luthien was trying to get a copy of the mud working on his laptop.  He used to have one that worked, but that was before he updated OS X, and once he got it working he couldn't get outside connections.

Dom managed to get it working on a computer that had 1 of 3 hard drives dead and so I pointed to his computer to let people log in there temporarily.

Wednesday I get home, download 9.04 and do a fresh install, works beautifully.  Then I get a few things we use installed and reboot and everything is still working well, do updates and reboot, everything fine.  Hook the backup drive back in, get it mounted, restore the home directory and database, configure apache2, and by 7PM Wednesday I have the server showing me our home page and running the mud.

I clean a few things up and turned it off to bring back down to central Phoenix to put back on the rack Thursday.  And now I finally have some time to really get into FF13.

1 comment:

  1. Tsos galumphed again and server poofed on people. -.-