[Linux-PowerEdge] Missing Disk on SAS 6

Kai Schaetzl maillists at conactive.com
Thu Nov 22 12:24:14 CST 2012


I have a Dell 1950 with an SAS 6iR (aka LSI1068) card. It reports a "missing" disk. What
does this actually mean? It looks like this is not a failed disk but rather some kind of
"broken" RAID, although the card still reports a RAID volume.

mpt-status:
ioc0 vol_id 4 type IM, 2 phy, 464 GB, state DEGRADED, flags ENABLED
ioc0 phy 0 scsi_id 5 ATA      WDC WD5002ABYS-0 3B02, 465 GB, state ONLINE, flags NONE
ioc0 phy 1 scsi_id 8 ATA      WDC WD5002ABYS-0 3B02, 465 GB, state MISSING, flags OUT_OF_SYNC
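
For monitoring this remotely, the mpt-status lines above can be checked by a small script. The following is only a minimal sketch: it parses the text exactly as pasted here, and it assumes that a healthy volume reports state OPTIMAL and a healthy member disk reports state ONLINE. The names `SAMPLE`, `LINE_RE`, `HEALTHY` and `find_problems` are made up for this illustration.

```python
import re

# Sample text copied from the mpt-status output in this post.
SAMPLE = """\
ioc0 vol_id 4 type IM, 2 phy, 464 GB, state DEGRADED, flags ENABLED
ioc0 phy 0 scsi_id 5 ATA      WDC WD5002ABYS-0 3B02, 465 GB, state ONLINE, flags NONE
ioc0 phy 1 scsi_id 8 ATA      WDC WD5002ABYS-0 3B02, 465 GB, state MISSING, flags OUT_OF_SYNC
"""

# One line per volume ("vol_id") or member disk ("phy").
LINE_RE = re.compile(
    r"^(?P<ioc>ioc\d+) (?P<kind>vol_id|phy) (?P<num>\d+) .*"
    r"state (?P<state>\w+), flags (?P<flags>\S+)$"
)

# Assumption: these are the states reported when everything is fine.
HEALTHY = {"vol_id": "OPTIMAL", "phy": "ONLINE"}


def find_problems(text):
    """Return (kind, number, state, flags) for every entry that is not healthy."""
    problems = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m["state"] != HEALTHY[m["kind"]]:
            problems.append((m["kind"], int(m["num"]), m["state"], m["flags"]))
    return problems


if __name__ == "__main__":
    for kind, num, state, flags in find_problems(SAMPLE):
        print(f"{kind} {num}: state={state} flags={flags}")
```

With the sample above this flags the degraded volume 4 and the missing phy 1, and leaves phy 0 alone.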

sg_inq:
[0:0:1:0]    disk    ATA      WDC WD5002ABYS-0 3B02  -         /dev/sg1 WD-WCASY2302383
[0:0:4:0]    disk    ATA      WDC WD5002ABYS-0 3B02  /dev/sdb  /dev/sg0 WD-WMASY4642710
[0:1:4:0]    disk    Dell     VIRTUAL DISK     1028  /dev/sda  /dev/sg2

On a system with a failed disk it should show this instead (i.e. no second disk at all):
[0:0:1:0]    disk    ATA      WDC WD5002ABYS-0 3B02  -  /dev/sg1 WD-WCASY2302383
[0:1:4:0]    disk    Dell     VIRTUAL DISK     1028  /dev/sda  /dev/sg2
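
The difference between the two listings can be spotted mechanically: a physical disk that also has its own /dev/sdX node is being presented to the OS outside the RAID volume. The sketch below illustrates that check on the text pasted above. It assumes the column layout shown here and that the RAID volume is the line containing "VIRTUAL DISK"; `ROW_RE` and `exposed_members` are hypothetical names.

```python
import re

# Listing copied from this post (the "missing" case).
SAMPLE = """\
[0:0:1:0]    disk    ATA      WDC WD5002ABYS-0 3B02  -         /dev/sg1 WD-WCASY2302383
[0:0:4:0]    disk    ATA      WDC WD5002ABYS-0 3B02  /dev/sdb  /dev/sg0 WD-WMASY4642710
[0:1:4:0]    disk    Dell     VIRTUAL DISK     1028  /dev/sda  /dev/sg2
"""

# Address, vendor, then the block device column ("-" if none) and the sg node.
ROW_RE = re.compile(
    r"^\[(?P<hctl>[\d:]+)\]\s+disk\s+(?P<vendor>\S+)\s+.*?\s"
    r"(?P<block>/dev/sd\w+|-)\s+(?P<sg>/dev/sg\d+)"
)


def exposed_members(text):
    """Return (address, block device) for physical disks that have their own /dev/sdX."""
    found = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue
        # Assumption: the controller's RAID volume is labelled "VIRTUAL DISK".
        is_volume = "VIRTUAL DISK" in line
        if not is_volume and m["block"] != "-":
            found.append((m["hctl"], m["block"]))
    return found


if __name__ == "__main__":
    for hctl, block in exposed_members(SAMPLE):
        print(f"{hctl} is visible as {block} outside the RAID volume")
```

For the listing above this reports only 0:0:4:0 (/dev/sdb); the disk at 0:0:1:0 has no block device and the virtual disk is skipped.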

The output of lshw indeed shows correct data and volumes for the second disk, as if it
were attached outside the RAID ("unraided").

SMART health check is OK.

How should I proceed to get the disk back into the RAID array, or what else should I do?
The machine is in a datacenter some kilometers away, so I first want to try what I can
remotely without taking the machine offline.
I have lsiutil installed, but I am not sure whether it would be helpful. The most helpful
information so far came from other commands (like those above), not from lsiutil.
Could this indicate a broken card?

It seems this has already happened at least once before during the last four weeks.
According to the logs, though, it lasted only 2 seconds and then the disk went from
missing back to online. This time it came back as a plain "Direct-Access" device instead.
Here is what is in the log:

Nov 21 20:27:31 c4 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 21 20:27:31 c4 kernel: mptbase: ioc0:   PhysDisk is now missing
Nov 21 20:27:31 c4 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 21 20:27:31 c4 kernel: mptbase: ioc0:   PhysDisk is now missing, out of sync
Nov 21 20:27:31 c4 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 4
Nov 21 20:27:31 c4 kernel: mptbase: ioc0:   volume is now degraded, enabled
Nov 21 20:27:32 c4 kernel:  target0:0:3: mptsas: ioc0: delete device: fw_channel 0, fw_id 8, phy 4, sas_addr 0x1221000004000000
Nov 21 20:27:34 c4 kernel: mptbase: ioc0: LogInfo(0x31111000): Originator={PL}, Code={Reset}, SubCode(0x1000) cb_idx mptbase_reply
Nov 21 20:27:35 c4 kernel: mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 8, phy 4, sas_addr 0x1221000004000000
Nov 21 20:27:35 c4 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8800e9cc5840)
Nov 21 20:27:35 c4 kernel: scsi 0:0:4:0:
Nov 21 20:27:35 c4 kernel:         command: Inquiry: 12 00 00 00 24 00
Nov 21 20:27:35 c4 kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8800e9cc5840)
Nov 21 20:27:37 c4 kernel:   Vendor: ATA       Model: WDC WD5002ABYS-0  Rev: 3B02
Nov 21 20:27:37 c4 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov 21 20:27:37 c4 kernel: SCSI device sdb: 976771055 512-byte hdwr sectors (500107 MB)
Nov 21 20:27:37 c4 kernel: sdb: Write Protect is off
Nov 21 20:27:37 c4 kernel: SCSI device sdb: drive cache: write back
Nov 21 20:27:37 c4 kernel: SCSI device sdb: 976771055 512-byte hdwr sectors (500107 MB)
Nov 21 20:27:37 c4 kernel: sdb: Write Protect is off
Nov 21 20:27:37 c4 kernel: SCSI device sdb: drive cache: write back
Nov 21 20:27:37 c4 kernel:  sdb: sdb1 sdb2 sdb3
Nov 21 20:27:37 c4 kernel: sd 0:0:4:0: Attached scsi disk sdb
Nov 21 20:27:37 c4 kernel: sd 0:0:4:0: Attached scsi generic sg0 type 0
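
To see how often the disk or volume changed state over the last weeks, the "is now ..." lines from mptbase can be pulled out of the syslog. This is a rough sketch that matches only the line format shown in the excerpt above; `STATE_RE` and `state_changes` are names invented for the example, and the location of the log file on a given system is not assumed here.

```python
import re

# A few lines copied from the log excerpt in this post.
SAMPLE = """\
Nov 21 20:27:31 c4 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 21 20:27:31 c4 kernel: mptbase: ioc0:   PhysDisk is now missing
Nov 21 20:27:31 c4 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 21 20:27:31 c4 kernel: mptbase: ioc0:   PhysDisk is now missing, out of sync
Nov 21 20:27:31 c4 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 4
Nov 21 20:27:31 c4 kernel: mptbase: ioc0:   volume is now degraded, enabled
"""

# Matches the indented follow-up line that carries the new state.
STATE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ [\d:]{8}) \S+ kernel: mptbase: (?P<ioc>ioc\d+):\s+"
    r"(?P<what>PhysDisk|volume) is now (?P<state>.+)$"
)


def state_changes(lines):
    """Return (timestamp, 'PhysDisk' or 'volume', new state) for each state line."""
    events = []
    for line in lines:
        m = STATE_RE.match(line.rstrip("\n"))
        if m:
            events.append((m["stamp"], m["what"], m["state"]))
    return events


if __name__ == "__main__":
    for stamp, what, state in state_changes(SAMPLE.splitlines()):
        print(f"{stamp}  {what:8s} -> {state}")
```

The same function accepts an open log file instead of `SAMPLE.splitlines()`, since it only iterates over lines.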

Thanks,

Kai



