DRAC4 - kernel crash
Miroslav Jany
mj at svsbb.sk
Sat Jun 24 18:47:34 CDT 2006
Hello all!
Yesterday we encountered a problem on our PowerEdge 2850 server.
We think it might be related to the DRAC card. Looking at the
kernel logfile from the time the crash began, we found this
message:
Jun 24 10:44:15 xxxx kernel: usb 2-1: USB disconnect, address 2
The logfile then fills with about 218,417 lines of these
repeating messages:
Jun 24 10:44:15 xxxx kernel: hdf: status error: status=0x7f { DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index Error }
Jun 24 10:44:15 xxxx kernel: hdf: status error: error=0x7f { IllegalLengthIndication EndOfMedia AbortedCommand MediaChangeRequested LastFailedSense=0x07 }
Jun 24 10:44:15 xxxx kernel: ide: failed opcode was: unknown
Jun 24 10:44:15 xxxx kernel: hdf: drive not ready for command
Jun 24 10:44:15 xxxx kernel: hdf: ATAPI reset complete
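With this many repeated lines, it helps to collapse the log into
distinct messages with a count for each. The following is only a
minimal sketch, assuming the syslog prefix format shown in the
excerpts above ("Mon DD HH:MM:SS host kernel:"). The function name
summarize and the sample lines are illustrative, not part of any
existing tool.

```python
import re
from collections import Counter

# Assumed prefix layout, taken from the excerpts in this post:
# "Jun 24 10:44:15 xxxx kernel: <message>"
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s?")


def summarize(lines):
    """Count identical kernel messages, ignoring timestamp and hostname.

    Lines without the expected prefix (for example wrapped continuation
    lines) are counted under their own text. Empty messages are skipped.
    """
    counts = Counter()
    for line in lines:
        msg = PREFIX.sub("", line.rstrip("\n"))
        if msg:
            counts[msg] += 1
    return counts


# Illustrative input only: three lines copied from the log above.
sample = [
    "Jun 24 10:44:15 xxxx kernel: hdf: drive not ready for command",
    "Jun 24 10:44:15 xxxx kernel: hdf: ATAPI reset complete",
    "Jun 24 10:44:27 xxxx kernel: hdf: drive not ready for command",
]

for message, count in summarize(sample).most_common():
    print(count, message)
```

Run against the real logfile (by passing the open file object to
summarize), this would show which few messages account for the
bulk of the lines.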
After a while the machine was dead. These are the last log
entries recorded before we had to reboot the server because the
root filesystem had been remounted read-only:
Jun 24 10:44:27 xxxx kernel: hdf: drive not ready for command
Jun 24 10:44:29 xxxx kernel: irq 18: nobody cared (try booting with the "irqpoll" option)
Jun 24 10:44:29 xxxx kernel:
Jun 24 10:44:29 xxxx kernel: Call Trace: <IRQ> <ffffffff80152ee8>{__report_bad_irq+48} <ffffffff80153128>{note_interrupt+499}
Jun 24 10:44:29 xxxx kernel: <ffffffff80152a07>{__do_IRQ+237} <ffffffff8011033e>{do_IRQ+69}
Jun 24 10:44:29 xxxx kernel: <ffffffff8010def4>{ret_from_intr+0} <EOI>
Jun 24 10:44:29 xxxx kernel: handlers:
Jun 24 10:44:29 xxxx kernel: [<ffffffff80243d7e>] (ide_intr+0x0/0x1f8)
Jun 24 10:44:29 xxxx kernel: [<ffffffff8802f14c>] (usb_hcd_irq+0x0/0x58 [usbcore])
Jun 24 10:44:29 xxxx kernel: Disabling IRQ #18
Jun 24 10:44:41 xxxx kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Jun 24 10:44:42 xxxx kernel: usb 2-1: new full speed USB device using uhci_hcd and address 3
Jun 24 10:44:42 xxxx kernel: input: USB HID v1.10 Keyboard [Dell DRAC4] on usb-0000:00:1d.0-1
Jun 24 10:44:42 xxxx kernel: input: USB HID v1.10 Mouse [Dell DRAC4] on usb-0000:00:1d.0-1
Jun 24 10:44:44 xxxx kernel: drbd0: Secondary/Secondary --> Secondary/Primary
We couldn't work out what is going on here, in particular what
could have caused the /dev/hdf device to be "reset", which
probably led to the server crash.
The server is running CentOS 4 with the 2.6.13.4 kernel. The DRBD
version is 0.7.4. The /dev/hdf device is a VIRTUALCDROM device, and
the firmware version of the DRAC controller is 1.35 (Build 09.27).
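The handler list in the trace above shows ide_intr and usb_hcd_irq
both registered on IRQ 18, so the IDE and USB devices appear to
share that interrupt line. A quick way to confirm which devices
share an IRQ is to read /proc/interrupts. The following is a rough
sketch, assuming the 2.6-era layout of that file (IRQ number,
per-CPU counters, one controller-type field, then a comma-separated
device list). The function name devices_on_irq and the sample text
are hypothetical and were not taken from this server.

```python
def devices_on_irq(interrupts_text, irq):
    """Return the device names listed for one IRQ in /proc/interrupts text.

    Assumes the 2.6-era layout: "IRQ: count [count ...] TYPE dev1, dev2".
    Newer kernels print extra fields, so this would need adjusting there.
    """
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU interrupt counters (leading all-digit fields).
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # fields[i] is the controller type; everything after it is devices.
        names = " ".join(fields[i + 1:])
        return [name.strip() for name in names.split(",") if name.strip()]
    return []


# Hypothetical sample text, for illustration only.
sample = """\
           CPU0       CPU1
 14:      40213          0    IO-APIC-edge  ide0
 18:     987654          0   IO-APIC-level  ide2, uhci_hcd:usb2
"""

print(devices_on_irq(sample, 18))
```

On the live system the text would come from reading
/proc/interrupts instead of the sample string.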
Thanks for any help in advance!
-Mirek