Not in the Mood
May. 17th, 2009 07:34 am
My mail and webserver just crashed. At first it looked like the usual problem: one of the 3 drives in the main RAID10 array had gotten 'stuck', which, so far as I can tell, is caused by a bug in the RAID software itself.
A quick reboot and the system marked the 'stuck' drive as dirty and started a rebuild. So far, so good.
3 hours later I noticed the system was unresponsive and, upon closer inspection, was spitting out warning messages to the console so fast that I couldn't read anything but the word 'rebuilding'.
After trying everything I could to make the system calm down, I had to power it off. I restarted it again (in single user mode this time) and it seemed fine, and I told it to go back to rebuilding the array, which it did.
3 hours later it announced that a completely different drive had failed and, as it now had 2 dirty drives in one 3-drive array, it was stopping right there. And that's the current state. I have NO IDEA how to tell it to assume that a 'spare' drive is actually not a spare. I also don't know whether its complaint of a failing drive is correct. I can't even check the logs, since the logs are stored on the array that was rebuilding...
So, right now I don't have my main mail server, and I have no real idea how to fix the problem. For the moment I'm going to stop fiddling with it and see if some extended thought might solve things.