bad crash. any idea ?
fabrice.lorrain at univ-mlv.fr
Thu Aug 8 12:29:01 CDT 2002
Two days ago our main PE4400 + PERC 3Di crashed. We just finished putting a
spare server online, but we are missing some data. Any help appreciated.
Here is the story :
- Tuesday the server froze and I could not connect to it (ssh),
- I found the console spitting
"failed to exec /sbin/modprobe -s -k binfmt-b1d7, errno=8"
like mad. I couldn't log in, and a three-finger reboot (Ctrl-Alt-Del) didn't work.
At the same time, the 5 disks of our RAID5 pool were lit up like a Christmas
tree (blinking orange; the lone volume disk seemed to be OK).
-> AC power off / AC power on
During the POST :
container #0 RAID 5 critical (known problem)
container #1 unknown --> the real problem
container #2 Volume ok
Bye-bye our 60 GB /home on sdb1 (container #1); sda2 (/ on ext3) seems to
be beyond salvation too. sda[5-7] are OK.
Right now, I am booting the server into an nfsroot environment that includes afacli.
What I would like to know is:
- where the binfmt error message comes from,
- whether there is any chance we can get container #1 back online,
- some explanation of how this mess could happen (i.e. how can we lose a
whole container after an AC power cut).
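As a side note on the first question, the errno=8 part of the console message can at least be decoded. A minimal sketch, assuming a Linux host (where errno 8 is ENOEXEC); it only translates the number and says nothing about why modprobe itself could not be executed:

```python
import errno
import os

# Value taken from the console message "... binfmt-b1d7, errno=8".
code = 8

# errno.errorcode maps numeric codes to their symbolic names on this platform.
name = errno.errorcode.get(code, "unknown")

# os.strerror gives the platform's human-readable description.
print(name, "-", os.strerror(code))
```

On Linux this reports ENOEXEC, i.e. the kernel did not recognise the file it was asked to execute as a valid executable format.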
technical info :
I can provide more if needed.
- hardware: PowerEdge 4400 + PERC 3Di, dual Xeon 933 MHz, 1 GB RAM, Alteon
copper Gigabit NIC (+ onboard Intel)
- BIOS A06, ESM 5.22, array monitor v2.1-3
- distribution: Debian potato
- kernel: vanilla 2.2.19 + SMP + aacraid patch from Matt's page + ext3 patch
The server is our main file server (samba) + DHCP server + DNS slave.
It has been running like a charm for more than a year with almost no load.
In June, we had a problem with drive 4. I changed the disk, but the
automatic rebuild didn't do what I expected (cf. the following "container
list"). I left the first container in critical state because we were
supposed to replace the server soon...
AFA0> container list
Executing: container list
Num          Total  Oth Chunk          Scsi   Partition
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
 0    RAID-5 8.00GB     32KB   Valid   0:00:0 64.0KB:2.00GB
 /dev/sda             system           0:01:0 64.0KB:2.00GB
                                       --- Missing ---
 1    RAID-5 59.7GB     32KB   Valid   0:00:0 2.00GB!14.9GB
 /dev/sdb             donnees          0:01:0 2.00GB!14.9GB
 2    Volume 16.9GB            Open    0:09:0 64.0KB:16.9GB
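A quick sanity check on the sizes in that listing, as a rough sketch: with RAID-5 over n equally sized partitions, the usable capacity is (n - 1) times the partition size, since one partition's worth of space goes to parity. The check assumes each RAID-5 container spans all 5 disks of the pool (the listing only shows two partitions per container, so this is an assumption), and the function name is made up for illustration.

```python
def raid5_usable_gb(n_disks: int, partition_gb: float) -> float:
    """Usable RAID-5 capacity: one partition's worth of space holds parity."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (n_disks - 1) * partition_gb


# Container 0: 2.00GB partitions, listed total 8.00GB
print(raid5_usable_gb(5, 2.00))

# Container 1: 14.9GB partitions, listed total 59.7GB
print(raid5_usable_gb(5, 14.9))
```

Under that assumption the results land on 8 GB and roughly 59.6 GB, which agrees with the listed totals up to the rounding of the displayed sizes.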
Thanks for any insight.
systems and network administrator
universite de Marne-la-Vallee