PowerEdge 2800 lockup running RHEL 4 U2

Victor Orgos v_orgos at yahoo.com.au
Thu Apr 27 20:48:11 CDT 2006


Thanks to all that have taken the time to reply. 

A couple have responded that this is problem "sometimes happens" for these servers but for our use this is simply unacceptable. We cannot have the server hang with our only option being a cold reboot. 

While waiting for Dell support to come back to me with a response I have been reading other threads in the forum and I begining to feel that the DRAC is source of alot of these problems, or at least the software installed on RHEL related to the DRAC. 

With the exception of remote console access, would we loose anything by removing any modules and software associated with the DRAC on RHEL? Features like remote BIOS access and the ability to power off/on the server would still be there.

Victor.

Victor Orgos <v_orgos at yahoo.com.au> wrote: Hi,

Our Oracle 10g Application Server was misbehaving this morning and no matter what I did could not get it to restart or even get details as to the cause. By the little I could find out it appears that it was the dell module related to the virtual cdrom from the DRAC. 

The system though not hang was VERY slow. A developer logged on but his profile was all screwed up, like it didnt run the scripts and an input/output error was reported. When I remotely logged on as root, it took several minutes to get to the prompt but no error. System load reported by uptime was steady at about 14. LS and PS tools responded immediately but top vmstat seem to hang. I tried to kill some processes but as far as I could tell they never got the signal. Only one defunct process reported by ps command, [40-hal-hotplug].

I tried to stop the oracle application server using our start/stop scripts but got input/output error after a long time. After trying the reboot and init 0  commands and not getting anywhere, we powered off the machine. After the reboot everything is ok.

The system is a Poweredge 2800 with 2 xeon cpus and 4g of ram. We are running the smp kernel that comes with RHEL and the DELL management/system software provided. We've never had any issues before and the system has been running for several months ok. Below are log excerpts that maybe related. 

I would appreciate any assistance. The system is due to go into production in a few months and we need to be sure that its stable.


Victor

------------------------------------------------

Apr 20 20:40:01 appsrv crond(pam_unix)[17467]: session closed for user root
Apr 20 20:50:01 appsrv crond(pam_unix)[18151]: session opened for user root by (uid=0)
Apr 20 20:50:01 appsrv crond(pam_unix)[18151]: session closed for user root
Apr 20 20:58:17 appsrv kernel: drivers/usb/input/hid-core.c: input irq status -84 received
Apr 20 20:58:17 appsrv  last message repeated 62 times
Apr 20 20:58:17 appsrv kernel: usb 2-1: USB disconnect, address 2
Apr 20 20:58:17 appsrv hal.hotplug[18784]: DEVPATH is not set
Apr 20 20:58:17 appsrv hal.hotplug[18806]: DEVPATH is not set
Apr 20 20:58:17 appsrv kernel: hdf: status error: status=0x7f { DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index
 Error }
Apr 20 20:58:17 appsrv kernel: hdf: status error: error=0x7fIllegalLengthIndication EndOfMedia Aborted Command MediaChangeRequested 
LastFailedSense 0x07 
Apr 20 20:58:17 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:17 appsrv kernel: hdf: ATAPI reset complete
Apr 20 20:58:17 appsrv kernel: hdf: status error: status=0x7f { DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index
 Error }

Last few lines repeat for over 10000 times. Just before the switch off,


Apr 20 20:58:31 appsrv kernel: hdf: status error:  error=0x7fIllegalLengthIndication EndOfMedia Aborted Command MediaChangeRequested 
LastFailedSense 0x07 
Apr 20 20:58:31 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:31 appsrv kernel: hdf: status error: status=0x7f { DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index
 Error }
Apr 20 20:58:31 appsrv kernel: hdf: status error: error=0x7fIllegalLengthIndication EndOfMedia Aborted Command MediaChangeRequested 
LastFailedSense 0x07 
Apr 20 20:58:31 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:31 appsrv kernel: hdf: status error: status=0x80 { Busy }
Apr 20 20:58:31 appsrv kernel: hdf: status error: error=0x80LastFailedSense 0x08 
Apr 20 20:58:31 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:32 appsrv kernel: irq 193: nobody cared! (screaming interrupt?)
Apr 20 20:58:32 appsrv kernel: irq 193: Please try booting with acpi=off and report a bug
Apr 20 20:58:32 appsrv  kernel:  [<c01074c2>] __report_bad_irq+0x3a/0x77
Apr 20 20:58:32 appsrv kernel:  [<c0107739>] note_interrupt+0xea/0x115
Apr 20 20:58:32 appsrv kernel:  [<c01079e5>] do_IRQ+0x143/0x1ae
Apr 20 20:58:32 appsrv kernel:  [<c02d1a8c>] common_interrupt+0x18/0x20
Apr 20 20:58:32 appsrv kernel:  [<c01040e5>] mwait_idle+0x33/0x42
Apr 20 20:58:32 appsrv kernel:  [<c010409d>] cpu_idle+0x26/0x3b
Apr 20 20:58:32 appsrv kernel: handlers:
Apr 20 20:58:32 appsrv kernel: [<c023f519>] (ide_intr+0x0/0x11e)
Apr 20 20:58:32 appsrv kernel: [<c0257f38>] (usb_hcd_irq+0x0/0x4b)
Apr 20 20:58:32 appsrv kernel: Disabling IRQ #193
Apr 20 20:58:43 appsrv kernel: usb 2-1: new full speed USB device using address 3
Apr 20 20:58:43 appsrv kernel: usb 2-1: device not accepting address 3, error -71
Apr 20 20:58:44 appsrv kernel: usb 2-1: new full speed USB device using address 4
Apr 20 20:58:44  appsrv hal.hotplug[18938]: DEVPATH is not set
Apr 20 20:58:44 appsrv kernel: input: USB HID v1.10 Keyboard [Dell DRAC4] on usb-0000:00:1d.0-1
Apr 20 20:58:44 appsrv hal.hotplug[18997]: DEVPATH is not set
Apr 20 20:58:45 appsrv kernel: input: USB HID v1.10 Mouse [Dell DRAC4] on usb-0000:00:1d.0-1
Apr 21 11:51:12 appsrv syslogd 1.4.1: restart.
Apr 21 11:51:12 appsrv syslog: syslogd startup succeeded
Apr 21 11:51:12 appsrv kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 21 11:51:12 appsrv kernel: Linux version 2.6.9-22.ELsmp (bhcompile at porky.build.redhat.com) (gcc version 3.4.4 20050721 (Red Hat 
3.4.4-2)) #1 SMP Mon Sep 19 18:32:14 EDT 2005
Apr 21 11:51:12 appsrv kernel: BIOS-provided physical RAM map:


Send instant messages to your online friends http://au.messenger.yahoo.com 


		
---------------------------------
On Yahoo!7 
  360°:  Your own space to share what you want with who you want!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.us.dell.com/pipermail/linux-poweredge/attachments/20060428/94860fca/attachment.htm 


More information about the Linux-PowerEdge mailing list