Hard lockup w/CentOS 4.3, PE1950 w/Fusion-MPT SAS

Matt Garman matthew.garman at gmail.com
Mon Dec 11 08:29:53 CST 2006


Over the weekend, one of our servers locked up hard (i.e. physical
reboot required).  We thought we were going to lose some data, but
e2fsck saved our filesystem.

I believe the last messages in the system log give some insight into
the problem:

Dec  9 20:17:38 lnxsvr3 kernel: mptscsi: ioc0: attempting task abort! (sc=000001001d1cd1c0)
Dec  9 20:17:38 lnxsvr3 kernel: scsi0 : destination target 0, lun 0
Dec  9 20:17:38 lnxsvr3 kernel:         command = Log Sense 00 6f 00 00 00 00 00 04 00
Dec  9 20:17:38 lnxsvr3 kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Dec  9 20:17:38 lnxsvr3 kernel: mptscsi: ioc0: task abort: SUCCESS (sc=000001001d1cd1c0)
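
For anyone reading along, the two hex values in that excerpt can be picked
apart with a short script. This is only a sketch under assumptions that are
not in the log itself: the LogInfo bit layout (bus type in bits 31-28,
originator in 27-24, code in 23-16, subcode in 15-0) is what I understand the
mptbase driver to use for SAS, and the CDB offsets follow the standard SCSI
LOG SENSE layout, with the kernel printing the bytes after the opcode. The
function names are made up for illustration.

```python
# Sketch: decode the LogInfo word and the Log Sense CDB bytes from the log.
# Field layouts are assumptions described above, not taken from the log.

# Originator names as the driver appears to label them (assumption).
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_loginfo(value):
    """Split a 32-bit Fusion-MPT SAS LogInfo word into its fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def decode_log_sense(cdb_tail):
    """Decode CDB bytes 1..9 of a LOG SENSE command.

    cdb_tail is the hex string the kernel printed after "Log Sense";
    byte 0 (the opcode) is not part of it.
    """
    b = [int(x, 16) for x in cdb_tail.split()]
    return {
        "page_control": b[1] >> 6,        # top two bits of CDB byte 2
        "page_code": b[1] & 0x3F,         # low six bits of CDB byte 2
        "allocation_length": (b[6] << 8) | b[7],  # CDB bytes 7-8, big-endian
    }


if __name__ == "__main__":
    print(decode_loginfo(0x31140000))
    print(decode_log_sense("00 6f 00 00 00 00 00 04 00"))
```

If those layouts are right, the aborted command was a 4-byte read of log page
0x2f, which in the SCSI spec is the Informational Exceptions page, i.e. the
sort of request a SMART/health monitoring tool would send.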

I did some searching on this, and found a few interesting links:

http://lkml.org/lkml/2006/2/8/132
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200787
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=208033

However, I can't find anything conclusive, such as a confirmed driver
bug.

Has anyone else seen this firsthand, or does anyone have better links?

In our case the machine had been running for at least a couple of weeks;
this is the first time we've experienced this problem.  What follows
is the lspci output for the SAS controller.

Thank you,
Matt

02:08.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
        Subsystem: Dell: Unknown device 1f06
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 72 (16000ns min, 2500ns max), Cache Line Size 10
        Interrupt: pin A routed to IRQ 193
        Region 0: I/O ports at ec00 [disabled] [size=256]
        Region 1: Memory at fc6fc000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at fc6e0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fc700000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [98] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [68] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=6
                Status: Bus=2 Dev=8 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=2, DMOST=6, DMCRS=4, RSCEM-
        Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1
                Vector table: BAR=1 offset=00002000
                PBA: BAR=1 offset=00003000


