Dell Poweredge 1950 freezing
Chmouel Boudjnah
cboudjnah at squiz.net
Wed Aug 29 18:16:22 CDT 2007
Hi,
We have one Dell PowerEdge 1950 that has lately been crashing a lot, at
random and for no apparent reason. The problem seems to occur only when
we put the server in production.
We have run all the stress-testing software from Dell and all the
Linux stress-testing tools we have, and nothing seems to crash the
machine.
The only error messages we can see (just after the reboot) come from
the Dell Server Administrator tools:
,----
| Aug 30 08:08:57 ying Server Administrator: Storage Service EventID: 2164 See readme.txt for a list of validated controller driver versions.
| Aug 30 08:08:57 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Battery relearn will start in 4 days: Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:57 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Time established as 08/27/07 9:40:51; (603277 seconds since power on): Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:58 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Time established as 08/28/07 3:40:41; (667903 seconds since power on): Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:58 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Battery relearn will start in 2 day: Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:58 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Time established as 08/28/07 21:40:31; (732530 seconds since power on): Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:58 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Firmware initialization started (PCI ID 0015/1028/1f03/1028): Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:59 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Battery Present: Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:59 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Package version 5.1.1-0040: Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:59 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Enclosure (SES) discovered on PD 08(e1/s255): Controller 0 (PERC 5/i Integrated)
| Aug 30 08:08:59 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Unexpected sense: PD 08(e1/s255), CDB: 1c 01 01 08 00 00, Sense: 70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00: Controller 0 (PERC 5/i Integrated)
| Aug 30 08:09:00 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 08(e1/s255): Controller 0 (PERC 5/i Integrated)
| Aug 30 08:09:00 ying Server Administrator: Storage Service EventID: 2334 Controller event log: Inserted: PD 00(e1/s0): Controller 0 (PERC 5/i Integrated)
`----
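As a side note on the "Unexpected sense" entry above: the sense bytes
can be decoded by hand. The following is a minimal Python sketch,
assuming the bytes are fixed-format SCSI sense data (response code 70h)
as laid out in the SPC standard. The function name decode_fixed_sense
and the small lookup tables are illustrative only and are not part of
any Dell tool; only the one ASC/ASCQ pair that appears in this log is
listed.

```python
# Sense-key names as defined by the SCSI Primary Commands (SPC) standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

# Deliberately tiny table: only the ASC/ASCQ pair seen in the log above.
ASC_ASCQ = {
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
}


def decode_fixed_sense(hex_bytes: str) -> dict:
    """Decode fixed-format sense data given as space-separated hex bytes."""
    data = bytes.fromhex(hex_bytes)
    if len(data) < 14:
        raise ValueError("need at least 14 bytes to read ASC/ASCQ")
    response_code = data[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F          # sense key: low nibble of byte 2
    asc, ascq = data[12], data[13]  # additional sense code and qualifier
    return {
        "sense_key": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "not in this small table"),
    }


if __name__ == "__main__":
    # Bytes copied from the "Unexpected sense" log line.
    print(decode_fixed_sense(
        "70 00 06 00 00 00 00 0a 00 00 00 00 29 00 00 00 00 00"))
```

For the bytes logged here this gives sense key UNIT ATTENTION with
ASC/ASCQ 29h/00h (power on, reset, or bus device reset occurred),
returned for CDB opcode 1Ch (RECEIVE DIAGNOSTIC RESULTS). That is what
a device normally reports on the first command after a reset, so this
entry may simply reflect the reboot rather than its cause.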
Has anyone encountered this before? We have a lot of other
PowerEdge 1950s and have never seen such crashes on them. The system is
running Debian Etch amd64 (kernel 2.6.18-4-amd64), with, I believe, most
of the firmware updated.
The server looks fine:
,----
| root@ying:~> omreport chassis
| Health
|
| Main System Chassis
|
| SEVERITY : COMPONENT
| Ok : Fans
| Ok : Intrusion
| Ok : Memory
| Ok : Power Supplies
| Ok : Processors
| Ok : Temperatures
| Ok : Voltages
| Ok : Hardware Log
| Ok : Batteries
|
| For further help, type the command followed by -?
`----
The controller looks fine:
,----
| root@ying:~> omreport storage controller
| Controller PERC 5/i Integrated (Embedded)
|
| Controllers
| ID : 0
| Status : Ok
| Name : PERC 5/i Integrated
| Slot ID : Embedded
| State : Ready
| Firmware Version : 5.1.1-0040
| Minimum Required Firmware Version : Not Applicable
| Driver Version : 00.00.03.01
| Minimum Required Driver Version : Not Applicable
| Number of Connectors : 2
| Rebuild Rate : 30%
| BGI Rate : 30%
| Check Consistency Rate : 30%
| Reconstruct Rate : 30%
| Alarm State : Not Applicable
| Cluster Mode : Not Applicable
| SCSI Initiator ID : Not Applicable
| Cache Memory Size : 256 MB
| Patrol Read Mode : Auto
| Patrol Read State : Stopped
| Patrol Read Rate : 30%
| Patrol Read Iterations : 11
`----
The battery looks fine:
,----
| root@ying:~> omreport storage battery controller=0
| Battery 0 on Controller PERC 5/i Integrated (Embedded)
|
| Controller PERC 5/i Integrated (Slot Embedded)
| ID : 0
| Status : Ok
| Name : Battery 0
| State : Ready
| Recharge Count : Not Applicable
| Max Recharge Count : Not Applicable
| Predicted Capacity Status : Ready
| Learn State : Idle
| Next Learn Time : 1 day 8 hours
| Maximum Learn Delay : 7 days 0 hours
`----
And the system info:
,----
| System Summary
|
| ------------------
| Software Profile
| ------------------
| Systems Management
| Name : Information not available.
| Version : 3.2.0
| Description : Systems Management Software
|
| Operating System
| Name : Linux
| Version : Kernel 2.6.18-4-amd64 (x86_64)
| System Time : Thu Aug 30 09:14:21 2007
| System Bootup Time : Thu Aug 30 08:08:33 2007
|
| --------
| System
| --------
| System
| Host Name : ying
| System Location : Please set the value
|
| ---------------------
| Main System Chassis
| ---------------------
| Chassis Information
| Chassis Model : PowerEdge 1950
| Chassis Service Tag : HSP1Q1S
| Chassis Lock : Present
| Chassis Asset Tag :
|
| Processor 1
| Processor Brand : Intel(R) Xeon(R) CPU 5140 @ 2.33GHz
| Processor Version : Model 15 Stepping 6
| Voltage : 1400 mV
|
| Processor 2
| Processor Brand : Intel(R) Xeon(R) CPU 5140 @ 2.33GHz
| Processor Version : Model 15 Stepping 6
| Voltage : 1400 mV
|
| Memory
| Total Installed Capacity : 4096 MB
| Memory Available to the OS : 3955 MB
| Total Maximum Capacity : 32768 MB
| Slots Available : 8
| Slots Used : 2
| ECC Type : Multibit ECC
|
| Slot PCI1
| Adapter : [Not Occupied]
| Type : PCI E
| Data Bus Width : 8x or x8
| Speed : [Not Obtained, see card documentation]
| Slot Length : Long
| Voltage Supply : 3.3 Volts
|
| Slot PCI2
| Adapter : [Not Occupied]
| Type : PCI E
| Data Bus Width : 8x or x8
| Speed : [Not Obtained, see card documentation]
| Slot Length : Long
| Voltage Supply : 3.3 Volts
|
| BIOS Information
| Manufacturer : Dell Inc.
| Version : 1.3.7
| Release Date : 03/26/2007
|
| Firmware Information
| Name : Baseboard Management Controller
| Version : 1.33
`----
Any help will be appreciated.
--
http://www.squiz.net