C6100 + LSI SAS2008 + SAS HD + Centos 6 failed

Bachir Amghar bar.at.criteo at gmail.com
Sat Mar 31 11:02:22 CDT 2012


Hi

I have been trying desperately to install CentOS 6 (6.1 or 6.2) on our C6100 chassis (the 2-node version, with 6 SAS hard drives per node).

# lspci -k
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
         Subsystem: Inventec Corporation Device 6019
         Kernel driver in use: mpt2sas
         Kernel modules: mpt2sas

# modinfo mpt2sas
filename:       /lib/modules/2.6.32-131.0.15.el6.x86_64/extra/mpt2sas/mpt2sas.ko
version:        12.00.00.00
license:        GPL
description:    LSI MPT Fusion SAS 2.0 Device Driver
author:         LSI Corporation<DL-MPTFusionLinux at lsi.com>
srcversion:     24EF3B083F425C2BE15188F
alias:          pci:v00001000d0000007Esv*sd*bc*sc*i*
alias:          pci:v00001000d0000006Esv*sd*bc*sc*i*
alias:          pci:v00001000d00000087sv*sd*bc*sc*i*
alias:          pci:v00001000d00000086sv*sd*bc*sc*i*
alias:          pci:v00001000d00000085sv*sd*bc*sc*i*
alias:          pci:v00001000d00000084sv*sd*bc*sc*i*
alias:          pci:v00001000d00000083sv*sd*bc*sc*i*
alias:          pci:v00001000d00000082sv*sd*bc*sc*i*
alias:          pci:v00001000d00000081sv*sd*bc*sc*i*
alias:          pci:v00001000d00000080sv*sd*bc*sc*i*
alias:          pci:v00001000d00000065sv*sd*bc*sc*i*
alias:          pci:v00001000d00000064sv*sd*bc*sc*i*
alias:          pci:v00001000d00000077sv*sd*bc*sc*i*
alias:          pci:v00001000d00000076sv*sd*bc*sc*i*
alias:          pci:v00001000d00000074sv*sd*bc*sc*i*
alias:          pci:v00001000d00000072sv*sd*bc*sc*i*
alias:          pci:v00001000d00000070sv*sd*bc*sc*i*
depends:        scsi_transport_sas,raid_class
vermagic:       2.6.32-131.0.15.el6.x86_64 SMP mod_unload modversions
parm:           logging_level: bits for enabling additional logging info (default=0)
parm:           sdev_queue_depth: globally setting SAS device queue depth
parm:           max_sectors:max sectors, range 64 to 32767  default=32767 (ushort)
parm:           command_retry_count: Device discovery TUR command retry count: (default=144) (int)
parm:           max_lun: max lun, default=16895  (int)
parm:           mpt2sas_multipath: enabling mulipath support for target resets (default=0) (int)
parm:           sriov_enabled: sriov support enabled: (default=0) (uint)
parm:           max_vfs: max virtual functions allocated per physical function (default=4) (uint)
parm:           diag_buffer_enable: post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0) (int)
parm:           max_queue_depth: max controller queue depth  (int)
parm:           max_sgl_entries: max sg entries  (int)
parm:           msix_disable: disable msix routed interrupts (default=0) (int)
parm:           missing_delay: device missing delay , io missing delay (array of int)
parm:           mpt2sas_fwfault_debug: enable detection of firmware fault and halt firmware - (default=0)
parm:           disable_discovery: disable discovery  (int)

# grep mpt2sas /var/log/dmesg
mpt2sas version 12.00.00.00 loaded
mpt2sas 0000:02:00.0: PCI INT A ->  GSI 24 (level, low) ->  IRQ 24
mpt2sas 0000:02:00.0: setting latency timer to 64
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (49420620 kB)
mpt2sas 0000:02:00.0: irq 48 for MSI/MSI-X
mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 48
mpt2sas0: iomem(0x00000000fbd3c000), mapped(0xffffc900170d8000), size(16384)
mpt2sas0: ioport(0x000000000000d000), size(256)
mpt2sas0: sending diag reset !!
mpt2sas0: diag reset: SUCCESS
mpt2sas0: Allocated physical memory: size(5629 kB)
mpt2sas0: Current Controller Queue Depth(2506), Max Controller Queue Depth(2607)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2008: FWVersion(06.00.00.00), ChipRevision(0x03), BiosVersion(07.07.00.00)
mpt2sas0: Protocol=(Initiator,Target), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x500a0d10005155a0), phys(8)
mpt2sas0: port enable: SUCCESS

scsi: waiting for bus probes to complete ...
scsi 0:0:0:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
scsi 0:0:0:0: SSP: handle(0x0009), sas_addr(0x50014ee3aabbe1c2), phy(0), device_name(0x0000000000000000)
scsi 0:0:0:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(0)
scsi 0:0:0:0: serial_number(        WMAWP0289588)
scsi 0:0:0:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
scsi 0:0:1:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
scsi 0:0:1:0: SSP: handle(0x000a), sas_addr(0x50014ee3556665be), phy(1), device_name(0x0000000000000000)
scsi 0:0:1:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(1)
scsi 0:0:1:0: serial_number(        WMAWP0273105)
scsi 0:0:1:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
scsi 0:0:2:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
scsi 0:0:2:0: SSP: handle(0x000b), sas_addr(0x50014ee35566897e), phy(2), device_name(0x0000000000000000)
scsi 0:0:2:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(2)
scsi 0:0:2:0: serial_number(        WMAWP0291265)
scsi 0:0:2:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
scsi 0:0:3:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
scsi 0:0:3:0: SSP: handle(0x000c), sas_addr(0x50014ee3556680fa), phy(3), device_name(0x0000000000000000)
scsi 0:0:3:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(3)
scsi 0:0:3:0: serial_number(        WMAWP0287128)
scsi 0:0:3:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
scsi 0:0:4:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
scsi 0:0:4:0: SSP: handle(0x000d), sas_addr(0x50014ee300113a42), phy(4), device_name(0x0000000000000000)
scsi 0:0:4:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(4)
scsi 0:0:4:0: serial_number(        WMAWP0290404)
scsi 0:0:4:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
scsi 0:0:5:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
scsi 0:0:5:0: SSP: handle(0x000e), sas_addr(0x50014ee3001130be), phy(5), device_name(0x0000000000000000)
scsi 0:0:5:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(5)
scsi 0:0:5:0: serial_number(        WMAWP0288481)
scsi 0:0:5:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)

Unfortunately, once the installation is done, the server becomes unstable as soon as we start to stress the SAS disks.

We see a large number of the following message in the log file (messages):

  kernel: mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
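
For reference, the fields the driver prints can be recovered from the raw log_info value. This is a minimal sketch, assuming the bit layout implied by the line above (top nibble bus type, next nibble originator, then an 8-bit code and a 16-bit sub-code). The function name and the originator table are illustrative: only originator 1 = PL is confirmed by the log line, and the IOP and IR entries are assumptions.

```python
def decode_log_info(value: int) -> dict:
    """Split a 32-bit mpt2sas log_info value into its fields.

    Layout assumed from the log line itself:
    bits 31-28 bus type, 27-24 originator, 23-16 code, 15-0 sub-code.
    """
    # Only 1 -> "PL" is confirmed by the log above; the rest is an assumption.
    originators = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": originators.get(originator, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


fields = decode_log_info(0x3112011A)
print(fields["originator"], hex(fields["code"]), hex(fields["sub_code"]))
```

The sketch only splits the value into fields. What code 0x12 with sub-code 0x011a means has to be looked up in LSI's documentation.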

kernel: md: delaying recovery of md1 until md2 has finished (they share one or more physical units)
kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
kernel: sd 0:0:1:0: [sdb] CDB: Write(10): 2a 00 05 4f 66 80 00 04 00 00
kernel: end_request: I/O error, dev sdb, sector 89089664
kernel: md/raid1:md2: Disk failure on sdb1, disabling device.
kernel: md/raid1:md2: Operation continuing on 1 devices.
kernel: md: md2: resync done.
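
The CDB bytes in the Write(10) line can be checked against the sector reported by end_request. This is a minimal sketch that follows the standard SCSI WRITE(10) layout (opcode 0x2A, 4-byte big-endian LBA in bytes 2 to 5, 2-byte transfer length in bytes 7 and 8). The function name is only illustrative.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode the LBA and transfer length of a SCSI WRITE(10) CDB.

    cdb_hex is the space-separated hex dump as printed by the kernel.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # number of blocks to write
    }


info = decode_write10("2a 00 05 4f 66 80 00 04 00 00")
print(info)  # {'lba': 89089664, 'blocks': 1024}
```

The decoded LBA matches the sector in the "I/O error, dev sdb, sector 89089664" line, so the failed request was a 1024-block write starting at that sector.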

Then, after a moment, a new HDD appears:

kernel: scsi 0:0:6:0: Direct-Access     WD       WD2000FYYG       D1B3 PQ: 0 ANSI: 6
kernel: scsi 0:0:6:0: SSP: handle(0x000a), sas_addr(0x50014ee300113f82), phy(1), device_name(0x0000000000000000)
kernel: scsi 0:0:6:0: SSP: enclosure_logical_id(0x500a0d1000515350), slot(1)
kernel: scsi 0:0:6:0: qdepth(254), tagged(1), simple(1), ordered(0), scsi_level(7), cmd_que(1)
kernel: sd 0:0:6:0: [sdg] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
kernel: sd 0:0:6:0: [sdg] Write Protect is off
kernel: sd 0:0:6:0: [sdg] Write cache: disabled, read cache: enabled, supports DPO and FUA
kernel: sdg: sdg1 sdg2 sdg3
kernel: sd 0:0:6:0: [sdg] Attached SCSI disk
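
One detail that is easy to miss in these logs is that the new disk reports a different enclosure_logical_id than the six disks seen at boot. Below is a small sketch for grouping the SSP lines by enclosure ID. The regular expression and the function name are my own, and the sample lines are copied from the logs above. It only makes the difference visible and does not explain it.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   scsi 0:0:1:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(1)
ENCLOSURE_RE = re.compile(
    r"scsi (\d+:\d+:\d+:\d+): SSP: "
    r"enclosure_logical_id\((0x[0-9a-fA-F]+)\), slot\((\d+)\)"
)


def group_by_enclosure(lines):
    """Return {enclosure_id: [(scsi_address, slot), ...]} for matching lines."""
    groups = defaultdict(list)
    for line in lines:
        match = ENCLOSURE_RE.search(line)
        if match:
            scsi_address, enclosure, slot = match.groups()
            groups[enclosure].append((scsi_address, int(slot)))
    return dict(groups)


sample = [
    "scsi 0:0:1:0: SSP: enclosure_logical_id(0x500a0d10005155a0), slot(1)",
    "kernel: scsi 0:0:6:0: SSP: enclosure_logical_id(0x500a0d1000515350), slot(1)",
]
for enclosure, disks in group_by_enclosure(sample).items():
    print(enclosure, disks)
```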

I tried both the standard mpt2sas kernel module (8.0) and the latest one available on the LSI site (12.00),
without resolving the issue.

Do you have any idea what could cause this problem?

FYI: I have only just discovered this chassis and I don't know the architecture of this hardware (it seems more complex than just 2 blades in a blade server, no?)


Thanks for your answers and for any information (documentation, settings, etc.).


Bar





