Dell 6650 megaraid hang

Mikey Sklar sklarm at electric-clothing.com
Thu Oct 7 17:16:00 CDT 2004


We've updated a few hundred of our Dell 6650s to the latest megaraid
firmware (along with the BIOS, ESM, and RAC levels). Today we saw the first
repeat hang on a server running AS2.1 2.4.9-e.34enterprise.

While diagnosing the hang we gathered sysrq-t data, and I see that the
scsi_unjam_host call is present in the scsi_eh_0 trace. This is one of the
signatures we have associated with megaraid hangs.

scsi_eh_0     D 00000001  5188   109      1           110    23 (L-TLB)
Call Trace: [<f884c6f0>] mega_internal_command [megaraid2] 0x140 (0xf5c4de98)
[<c011cea8>] printk [kernel] 0xd8 (0xf5c4dec4)
[<f8849431>] megaraid_reset [megaraid2] 0x61 (0xf5c4dedc)
[<c0114318>] smp_apic_timer_interrupt [kernel] 0xb8 (0xf5c4df0c)
[<f884dce0>] .LC63 [megaraid2] 0x0 (0xf5c4df10)
[<f88059ae>] scsi_try_bus_device_reset [scsi_mod] 0x3e (0xf5c4df5c)
[<f8806236>] scsi_unjam_host [scsi_mod] 0x426 (0xf5c4df70)
[<f8850940>] driver_template [megaraid2] 0x0 (0xf5c4df9c)
[<f88069b8>] scsi_error_handler [scsi_mod] 0x198 (0xf5c4dfa0)
[<c0107396>] ret_from_fork [kernel] 0x6 (0xf5c4dfbc)
[<c0105856>] arch_kernel_thread [kernel] 0x26 (0xf5c4dff0)
[<f8806820>] scsi_error_handler [scsi_mod] 0x0 (0xf5c4dff8)

Is there a known resolution to this hang?

Controller, disk, and firmware revisions are below.

$ bsm -A show -T container,controller,disk
--------------------------------------------------------------------------
Logical drive information
--------------------------------------------------------------------------
Logical drive number 0
   Status of logical drive                 : optimal
   Current operation                       : none
   RAID level                              : 1
   Stripe size                             : 64
   Row size                                : 2
Logical drive number 1
   Status of logical drive                 : optimal
   Current operation                       : none
   RAID level                              : 1
   Stripe size                             : 64
   Row size                                : 2
--------------------------------------------------------------------------
Controller information
--------------------------------------------------------------------------
   Controller type                         : PERC 3/DC
   Firmware version                        : 197O
   Logical devices                         : 2
   Channels                                : 2
   Maximum logical devices                 : 40
   Concurrent commands supported           : 63
--------------------------------------------------------------------------
Physical drive information
--------------------------------------------------------------------------
Channel #0
   Target on SCSI ID 0
     SCSI ID                               : 0
     State                                 : online
     Device ID                             : SEAGATE ST336753LC
   Target on SCSI ID 1
     SCSI ID                               : 1
     State                                 : online
     Device ID                             : SEAGATE ST336753LC

$ bsm -A check -T firmware
Detected system manufacturer dell
Detected system model poweredge 6650

Checking BIOS version...
Supported BIOS version(s): A15 A16
Production BIOS version: A16
Detected BIOS version: A16
BIOS version check                                                     [ PASS ]

Checking ESM version...
Supported ESM version(s): 1.64 1.78
Production ESM version: 1.78
Detected ESM version: 1.78
ESM version check                                                      [ PASS ]

Checking RAC version...
Supported RAC version(s): 3.12 3.14
Production RAC version: 3.14
Detected RAC version: 3.14
RAC version check                                                      [ PASS ]

Checking RAID version...
Supported RAID version(s): 197O
Production RAID version: 197O
Detected RAID version: 197O
RAID version check                                                     [ PASS ]




More information about the Linux-PowerEdge mailing list