PowerEdge 2800 lockup running RHEL 4 U2

Tonn, Michael mtonn at us.nomura.com
Mon Apr 24 06:57:10 CDT 2006


 

We have experienced the same lockup on several servers that we recently
built with RHEL4 U2.  Our servers are predominantly PE2850s.  We have not
seen this on any of our RHEL3 servers.

Dell support keeps stating that this is a hardware issue, which I do not
agree with.  I am more inclined to believe that it is OS and driver
related, since it has only occurred on RHEL4 servers.

 

 

Mike

 

 

  _____  

From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of Victor Orgos
Sent: Thursday, April 20, 2006 10:55 PM
To: linux-poweredge at dell.com
Subject: PowerEdge 2800 lockup running RHEL 4 U2

 

Hi,

Our Oracle 10g Application Server was misbehaving this morning, and no
matter what I did I could not get it to restart or even get details as to
the cause. From the little I could find out, the cause appears to be the
Dell module related to the virtual CD-ROM from the DRAC.

The system, though not hung, was VERY slow. A developer logged on, but his
profile was all screwed up, as if the login scripts had not run, and an
input/output error was reported. When I logged on remotely as root, it
took several minutes to get to the prompt, but there was no error. System
load reported by uptime was steady at about 14. The ls and ps tools
responded immediately, but top and vmstat seemed to hang. I tried to kill
some processes, but as far as I could tell they never got the signal. The
ps command reported only one defunct process, [40-hal-hotplug].
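
A steady load of about 14 together with processes that ignore signals is
what one would expect if tasks are stuck in uninterruptible sleep (state
"D"), since Linux counts those tasks in the load average and they cannot
act on signals until their I/O completes. That is an assumption about this
incident, not something the report confirms. A minimal sketch for checking
it, reading /proc directly because top and vmstat were hanging (the
function names are hypothetical):

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state

def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state 'D')."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked  # not a Linux system, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

If most of the listed processes are waiting on the same device, that would
point at the device or its driver rather than at the application.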

I tried to stop the Oracle application server using our start/stop
scripts but got an input/output error after a long time. After trying the
reboot and init 0 commands and not getting anywhere, we powered off the
machine. After the reboot everything is OK.

The system is a PowerEdge 2800 with 2 Xeon CPUs and 4 GB of RAM. We are
running the SMP kernel that comes with RHEL and the Dell
management/system software provided. We've never had any issues before,
and the system has been running fine for several months. Below are log
excerpts that may be related.

I would appreciate any assistance. The system is due to go into
production in a few months and we need to be sure that it is stable.


Victor

------------------------------------------------

Apr 20 20:40:01 appsrv crond(pam_unix)[17467]: session closed for user
root
Apr 20 20:50:01 appsrv crond(pam_unix)[18151]: session opened for user
root by (uid=0)
Apr 20 20:50:01 appsrv crond(pam_unix)[18151]: session closed for user
root
Apr 20 20:58:17 appsrv kernel: drivers/usb/input/hid-core.c: input irq
status -84 received
Apr 20 20:58:17 appsrv last message repeated 62 times
Apr 20 20:58:17 appsrv kernel: usb 2-1: USB disconnect, address 2
Apr 20 20:58:17 appsrv hal.hotplug[18784]: DEVPATH is not set
Apr 20 20:58:17 appsrv hal.hotplug[18806]: DEVPATH is not set
Apr 20 20:58:17 appsrv kernel: hdf: status error: status=0x7f {
DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index
 Error }
Apr 20 20:58:17 appsrv kernel: hdf: status error:
error=0x7fIllegalLengthIndication EndOfMedia Aborted Command
MediaChangeRequested 
LastFailedSense 0x07 
Apr 20 20:58:17 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:17 appsrv kernel: hdf: ATAPI reset complete
Apr 20 20:58:17 appsrv kernel: hdf: status error: status=0x7f {
DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index
 Error }

The last few lines repeat more than 10,000 times. Just before we switched
the machine off, the log showed:


Apr 20 20:58:31 appsrv kernel: hdf: status error:
error=0x7fIllegalLengthIndication EndOfMedia Aborted Command
MediaChangeRequested 
LastFailedSense 0x07 
Apr 20 20:58:31 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:31 appsrv kernel: hdf: status error: status=0x7f {
DriveReady DeviceFault SeekComplete DataRequest CorrectedError Index
 Error }
Apr 20 20:58:31 appsrv kernel: hdf: status error:
error=0x7fIllegalLengthIndication EndOfMedia Aborted Command
MediaChangeRequested 
LastFailedSense 0x07 
Apr 20 20:58:31 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:31 appsrv kernel: hdf: status error: status=0x80 { Busy }
Apr 20 20:58:31 appsrv kernel: hdf: status error:
error=0x80LastFailedSense 0x08 
Apr 20 20:58:31 appsrv kernel: hdf: drive not ready for command
Apr 20 20:58:32 appsrv kernel: irq 193: nobody cared! (screaming
interrupt?)
Apr 20 20:58:32 appsrv kernel: irq 193: Please try booting with acpi=off
and report a bug
Apr 20 20:58:32 appsrv kernel:  [<c01074c2>] __report_bad_irq+0x3a/0x77
Apr 20 20:58:32 appsrv kernel:  [<c0107739>] note_interrupt+0xea/0x115
Apr 20 20:58:32 appsrv kernel:  [<c01079e5>] do_IRQ+0x143/0x1ae
Apr 20 20:58:32 appsrv kernel:  [<c02d1a8c>] common_interrupt+0x18/0x20
Apr 20 20:58:32 appsrv kernel:  [<c01040e5>] mwait_idle+0x33/0x42
Apr 20 20:58:32 appsrv kernel:  [<c010409d>] cpu_idle+0x26/0x3b
Apr 20 20:58:32 appsrv kernel: handlers:
Apr 20 20:58:32 appsrv kernel: [<c023f519>] (ide_intr+0x0/0x11e)
Apr 20 20:58:32 appsrv kernel: [<c0257f38>] (usb_hcd_irq+0x0/0x4b)
Apr 20 20:58:32 appsrv kernel: Disabling IRQ #193
Apr 20 20:58:43 appsrv kernel: usb 2-1: new full speed USB device using
address 3
Apr 20 20:58:43 appsrv kernel: usb 2-1: device not accepting address 3,
error -71
Apr 20 20:58:44 appsrv kernel: usb 2-1: new full speed USB device using
address 4
Apr 20 20:58:44 appsrv hal.hotplug[18938]: DEVPATH is not set
Apr 20 20:58:44 appsrv kernel: input: USB HID v1.10 Keyboard [Dell
DRAC4] on usb-0000:00:1d.0-1
Apr 20 20:58:44 appsrv hal.hotplug[18997]: DEVPATH is not set
Apr 20 20:58:45 appsrv kernel: input: USB HID v1.10 Mouse [Dell DRAC4]
on usb-0000:00:1d.0-1
Apr 21 11:51:12 appsrv syslogd 1.4.1: restart.
Apr 21 11:51:12 appsrv syslog: syslogd startup succeeded
Apr 21 11:51:12 appsrv kernel: klogd 1.4.1, log source = /proc/kmsg
started.
Apr 21 11:51:12 appsrv kernel: Linux version 2.6.9-22.ELsmp
(bhcompile at porky.build.redhat.com) (gcc version 3.4.4 20050721 (Red Hat 
3.4.4-2)) #1 SMP Mon Sep 19 18:32:14 EDT 2005
Apr 21 11:51:12 appsrv kernel: BIOS-provided physical RAM map:
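
For readers comparing their own logs, the flag names the kernel prints
after "status=" in the excerpts above correspond to individual bits of the
IDE status byte. The sketch below reproduces that decoding for the two
values seen here, 0x7f and 0x80. The bit-to-name table follows the names
shown in the log; the function name is hypothetical, and this is an
illustration rather than the kernel's own code.

```python
# Bit masks for the IDE status byte, paired with the names used in the log.
BUSY = 0x80
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(value):
    """Return the flag names set in an IDE status byte."""
    # While the busy bit is set, the other bits are not meaningful,
    # so only "Busy" is reported (as in the status=0x80 log line).
    if value & BUSY:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if value & mask]

if __name__ == "__main__":
    for raw in (0x7F, 0x80):
        print("status=0x%02x { %s }" % (raw, " ".join(decode_status(raw))))
```

A status of 0x7f has every bit except busy set at once, which is not a
combination a healthy drive would normally report. That may suggest the
reads were not coming from a responding device, though the log alone does
not establish this.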


