Possible hardware problems with a PowerEdge PE1650

Travis B. Hartwell nafai at twistedmatrix.com
Mon Aug 26 18:01:01 CDT 2002


Hi all,

I recently inherited the sysadmin duties for a Dell PowerEdge PE1650.
We have not yet put this machine into production, but we are preparing
to do so.  This past weekend, under a very light load, the system
started having problems.  I didn't notice until this afternoon,
because no one had tried to access the machine.  After rebooting it, I
found these entries in the logs:

Aug 23 15:00:15 sv1 ucd-snmp[1138]: Got trap from peer on fd 7 
Aug 23 15:00:15 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Critical (Failed)  Temperature sensor value (in Degrees Celsius): 54.000
Aug 23 15:02:20 sv1 ucd-snmp[1138]: Got trap from peer on fd 7 
Aug 23 15:02:20 sv1 Server Administrator: EventID: 1052  Temperature sensor returned to a normal value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 50.000
Aug 23 15:02:40 sv1 kernel: aacraid:Battery charge is now OK
Aug 23 15:02:40 sv1 kernel: .
Aug 23 16:13:10 sv1 ucd-snmp[1138]: Got trap from peer on fd 7 
Aug 23 16:13:10 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: OK (Normal)  Temperature sensor value (in Degrees Celsius): 51.000
Aug 23 16:21:42 sv1 kernel: aacraid:Battery is Charging
Aug 23 16:21:42 sv1 kernel: .
Aug 23 16:37:07 sv1 ucd-snmp[1138]: Got trap from peer on fd 7 
Aug 23 16:37:07 sv1 Server Administrator: EventID: 1054  Temperature sensor detected a failure value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 56.000
Aug 23 18:43:04 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 23 18:43:04 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Critical (Failed)  Temperature sensor value (in Degrees Celsius): 55.000
Aug 23 18:53:43 sv1 kernel: aacraid:Battery charge is now OK
Aug 23 18:53:43 sv1 kernel: .
Aug 23 18:55:33 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 23 18:55:33 sv1 Server Administrator: EventID: 1052  Temperature sensor returned to a normal value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 50.000
Aug 23 19:24:43 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 23 19:24:43 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: OK (Normal)  Temperature sensor value (in Degrees Celsius): 51.000
Aug 23 19:32:14 sv1 kernel: aacraid:Battery is Charging
Aug 23 19:32:14 sv1 kernel: .
Aug 23 19:50:44 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 23 19:50:44 sv1 Server Administrator: EventID: 1054  Temperature sensor detected a failure value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 56.000
Aug 24 04:02:01 sv1 syslogd 1.4.1: restart.
Aug 24 16:32:25 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 16:32:25 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Critical (Failed)  Temperature sensor value (in Degrees Celsius): 54.000
Aug 24 16:35:33 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 16:35:33 sv1 Server Administrator: EventID: 1052  Temperature sensor returned to a normal value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 49.000
Aug 24 16:35:35 sv1 kernel: aacraid:Battery charge is now OK
Aug 24 16:35:35 sv1 kernel: .
Aug 24 17:42:11 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 17:42:11 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: OK (Normal)  Temperature sensor value (in Degrees Celsius): 51.000
Aug 24 17:46:32 sv1 kernel: aacraid:Battery is Charging
Aug 24 17:46:32 sv1 kernel: .
Aug 24 17:54:41 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 17:54:41 sv1 Server Administrator: EventID: 1054  Temperature sensor detected a failure value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 56.000
Aug 24 18:36:45 sv1 kernel: aacraid:Enclosure 0 - Temperature 123, over threshold 120
Aug 24 18:36:45 sv1 kernel: .
Aug 24 19:02:20 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 19:02:20 sv1 Server Administrator: EventID: 1154  Voltage sensor detected a failure value  Sensor location: +5  Chassis location: Main System Chassis  Previous state was: OK (Normal)  Voltage sensor value (in Volts): 5.387
Aug 24 19:03:23 sv1 Server Administrator: EventID: 1153  Voltage sensor detected a warning value  Sensor location: BP 5V  Chassis location: Main System Chassis  Previous state was: OK (Normal)  Voltage sensor value (in Volts): 5.485
Aug 24 19:03:23 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 19:04:25 sv1 ucd-snmp[1138]: Got trap from peer on fd 7
Aug 24 19:04:25 sv1 Server Administrator: EventID: 1154  Voltage sensor detected a failure value  Sensor location: BP 5V  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Voltage sensor value (in Volts): 5.589
Aug 24 19:07:23 sv1 kernel: aacraid:SCSI bus reset issued on channel 0

I have not yet configured the OMSA tools (or even learned how to use
them), but given this problem I clearly need to.  Can anyone tell from
these logs what may have been going on?  What else can I check to see
what happened?  I looked at various other logs but saw nothing out of
the ordinary.
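
To make the timeline easier to read, here is a minimal Python sketch that
pulls the Server Administrator sensor events out of syslog lines and reports
the peak reading per sensor.  It assumes each entry sits on a single line in
the real log file (the wrapping above comes from the mail client), and the
regular expression is inferred only from the entries quoted in this message,
not from any OMSA documentation.  The names EVENT_RE, parse_events and
peak_readings are made up for this example.

```python
import re

# Pattern inferred from the log entries quoted above (an assumption, not an
# official OMSA format description).
EVENT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ Server Administrator: "
    r"EventID: (?P<event_id>\d+)\s+"
    r"(?P<kind>Temperature|Voltage) sensor (?P<what>.*?)\s+"
    r"Sensor location: (?P<location>.*?)\s+Chassis location: .*?"
    r"Previous state was: (?P<previous>.*?)\s+"
    r"(?:Temperature|Voltage) sensor value \(in [^)]*\): (?P<value>[\d.]+)"
)


def parse_events(lines):
    """Return one dict per sensor event; lines that do not match are skipped."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            event = match.groupdict()
            event["event_id"] = int(event["event_id"])
            event["value"] = float(event["value"])
            events.append(event)
    return events


def peak_readings(events):
    """Map (kind, location) to the highest value reported for that sensor."""
    peaks = {}
    for event in events:
        key = (event["kind"], event["location"])
        peaks[key] = max(peaks.get(key, event["value"]), event["value"])
    return peaks


# Sample lines copied from the log above, unwrapped to one line each.
SAMPLE = [
    "Aug 23 15:00:15 sv1 Server Administrator: EventID: 1053  Temperature sensor detected a warning value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Critical (Failed)  Temperature sensor value (in Degrees Celsius): 54.000",
    "Aug 23 15:02:40 sv1 kernel: aacraid:Battery charge is now OK",
    "Aug 23 16:37:07 sv1 Server Administrator: EventID: 1054  Temperature sensor detected a failure value  Sensor location: Planar  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Temperature sensor value (in Degrees Celsius): 56.000",
    "Aug 24 19:04:25 sv1 Server Administrator: EventID: 1154  Voltage sensor detected a failure value  Sensor location: BP 5V  Chassis location: Main System Chassis  Previous state was: Non-Critical (Warning)  Voltage sensor value (in Volts): 5.589",
]

events = parse_events(SAMPLE)
for event in events:
    print(event["stamp"], event["event_id"], event["kind"],
          event["location"], event["value"], "(was:", event["previous"] + ")")
print(peak_readings(events))
```

To run it against the real log, pass an open file instead of SAMPLE, for
example parse_events(open("/var/log/messages")); that path is the usual Red
Hat default and is an assumption about this machine.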

We are running Red Hat Linux 7.2 on this machine.

Travis Hartwell




More information about the Linux-PowerEdge mailing list