[Linux-PowerEdge] MegaRAID megasasctl stops working during a backup

Raphaël Melior raphael.melior at obs-besancon.fr
Wed Dec 16 06:52:04 CST 2015


Hi

I am having trouble with an LSI Logic / Symbios Logic MegaRAID SAS 2008 controller.

I am currently running a backup to an LTO drive connected to an LSI Logic /
Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 controller:
$ lspci | grep 2008
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

During the backup, the disk controller often does not respond to megasasctl:
megasasctl -vvv
0 adapters, driver version 00000000

It is also impossible to read the SMART status of the disks. The disks
and volumes nevertheless seem to work correctly, and the backup continues.

The controller log (when it responds) shows nothing special:

12/07/15 12:06:17: MfiCmdInitQueue[0]: FW now OPERATIONAL
12/07/15 12:06:17:     q.flags.mfaIs64Bits=0, q.flags.contextIs64Bits=0
12/07/15 12:06:17:     q.responseQueueEntries=1f, responseQueueStatr=cd05d000
12/07/15 12:06:17:     q.producerIndexPtr=cd047000, q.consumerIndexPtr=cd046000
12/07/15 12:06:17:     producerIndex=0
12/07/15 12:06:18: EVT#04149-12/07/15 12:06:18: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 1c 01 07 00 20 00, Sense: 5/24/00
12/07/15 12:06:18: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 12:06:18: EVT#04150-12/07/15 12:06:18: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 1c 01 07 00 20 00, Sense: 5/24/00
12/07/15 12:06:18: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 12:06:18: EVT#04151-12/07/15 12:06:18: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 1c 01 07 00 20 00, Sense: 5/24/00
12/07/15 12:06:18: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 12:06:18: EVT#04152-12/07/15 12:06:18: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 1c 01 07 00 20 00, Sense: 5/24/00
12/07/15 12:06:18: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 12:06:18: EVT#04153-12/07/15 12:06:18: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 1c 01 07 00 20 00, Sense: 5/24/00
12/07/15 12:06:18: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 12:07:21: EVT#04154-12/07/15 12:07:21: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 12 01 80 00 fe 00, Sense: 5/24/00
12/07/15 12:07:21: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 12:07:21: EVT#04155-12/07/15 12:07:21: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 12 01 80 00 fe 00, Sense: 5/24/00
12/07/15 12:07:21: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 13:33:21: EVT#04156-12/07/15 13:33:21: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 12 01 80 00 ff 00, Sense: 5/24/00
12/07/15 13:33:21: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/07/15 13:54:54: EVT#04157-12/07/15 13:54:54: 113=Unexpected sense: Encl PD 20 Path 5d81f060d4822a00, CDB: 12 01 80 00 ff 00, Sense: 5/24/00
12/07/15 13:54:54: Raw Sense for PD 20: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
12/12/15  5:00:01: prDiskStart: starting Patrol Read on PD=00
12/12/15  5:00:01: prDiskStart: starting Patrol Read on PD=01
12/12/15  5:00:01: prDiskStart: starting Patrol Read on PD=02
12/12/15  5:00:01: prDiskStart: starting Patrol Read on PD=03
12/12/15  5:00:01: prDiskStart: starting Patrol Read on PD=04
12/12/15  5:00:01: EVT#04158-12/12/15  5:00:01:  39=Patrol Read started
12/12/15  8:09:00: prCallback: PR completed for pd=04
12/12/15  8:11:33: prCallback: PR completed for pd=03
12/12/15  8:15:22: prCallback: PR completed for pd=00
12/14/15  0:06:14: prCallback: PR completed for pd=01
12/14/15  0:11:39: prCallback: PR completed for pd=02
12/14/15  0:11:39: PR cycle complete
12/14/15  0:11:39: EVT#04159-12/14/15  0:11:39:  35=Patrol Read complete
12/14/15  0:11:39: Next PR scheduled to start at 12/19/15  5:00:01
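The repeated "Unexpected sense" entries carry standard SCSI fixed-format sense data. Sense key 5 with ASC/ASCQ 24/00 is ILLEGAL REQUEST, INVALID FIELD IN CDB, meaning the enclosure (PD 20) rejected a field in the command rather than reporting a media or hardware error. The CDBs are RECEIVE DIAGNOSTIC RESULTS (opcode 0x1c) and INQUIRY (opcode 0x12) with the EVPD bit set for page 0x80. A minimal Python sketch for decoding one CDB/raw-sense pair from this log follows; the helper names are hypothetical (not part of any LSI tool) and the lookup tables only name the codes relevant here.

```python
from __future__ import annotations

# Hypothetical helpers for reading the MegaRAID "Unexpected sense" log lines.
# Only a few well-known SCSI codes are named; anything else prints as hex.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x24, 0x00): "INVALID FIELD IN CDB"}
OPCODES = {0x12: "INQUIRY", 0x1C: "RECEIVE DIAGNOSTIC RESULTS"}


def parse_hex(text: str) -> bytes:
    """Turn a string like '70 00 05 ...' into bytes."""
    return bytes(int(tok, 16) for tok in text.split())


def decode_sense(raw: bytes) -> tuple[int, int, int]:
    """Return (sense key, ASC, ASCQ) from fixed-format sense data."""
    # Byte 0 bits 0-6 hold the response code: 0x70 current, 0x71 deferred.
    if (raw[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {raw[0]:#04x}")
    # Sense key is the low nibble of byte 2; ASC/ASCQ are bytes 12 and 13.
    return raw[2] & 0x0F, raw[12], raw[13]


def describe(cdb_hex: str, sense_hex: str) -> str:
    """Human-readable summary of one CDB / raw-sense pair."""
    cdb = parse_hex(cdb_hex)
    key, asc, ascq = decode_sense(parse_hex(sense_hex))
    op = OPCODES.get(cdb[0], f"opcode {cdb[0]:#04x}")
    # For both commands, bit 0 of byte 1 selects a page given in byte 2.
    if cdb[0] == 0x12 and cdb[1] & 0x01:
        op += f" (EVPD page {cdb[2]:#04x})"
    elif cdb[0] == 0x1C and cdb[1] & 0x01:
        op += f" (diagnostic page {cdb[2]:#04x})"
    key_name = SENSE_KEYS.get(key, f"key {key:#x}")
    asc_name = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:02x}/{ascq:02x}")
    return f"{op}: {key_name}, {asc_name}"


if __name__ == "__main__":
    raw = "70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00"
    print(describe("1c 01 07 00 20 00", raw))
    print(describe("12 01 80 00 fe 00", raw))
```

Run against the two CDB variants in the log, this prints "RECEIVE DIAGNOSTIC RESULTS (diagnostic page 0x07): ILLEGAL REQUEST, INVALID FIELD IN CDB" and "INQUIRY (EVPD page 0x80): ILLEGAL REQUEST, INVALID FIELD IN CDB".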


In /var/log/kern.log I get:
Dec 16 10:53:51 theta1 kernel: [769640.032639] sdc: detected capacity change from 500103667712 to 0
Dec 16 11:18:17 theta1 kernel: [771106.162471] scsi 1:0:2:0: Sequential-Access HP       Ultrium 5-SCSI   Z5AD PQ: 0 ANSI: 6
Dec 16 11:18:17 theta1 kernel: [771106.162483] scsi 1:0:2:0: SSP: handle(0x0009), sas_addr(0x500110a0014c8dd0), phy(3), device_name(0xa0100150d28d4c01)
Dec 16 11:18:17 theta1 kernel: [771106.162488] scsi 1:0:2:0: SSP: enclosure_logical_id(0x5141877047b73100), slot(4)
Dec 16 11:18:17 theta1 kernel: [771106.162494] scsi 1:0:2:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
Dec 16 11:18:17 theta1 kernel: [771106.164855] scsi 1:0:2:0: TLR Enabled
Dec 16 11:18:17 theta1 kernel: [771106.167348] st 1:0:2:0: Attached scsi tape st0
Dec 16 11:18:17 theta1 kernel: [771106.167354] st 1:0:2:0: st0: try direct i/o: yes (alignment 4 B)
Dec 16 11:18:17 theta1 kernel: [771106.167585] st 1:0:2:0: Attached scsi generic sg5 type 1
Dec 16 11:29:28 theta1 kernel: [771777.767773] st0: Block limits 1 - 16777215 bytes.
Dec 16 11:29:52 theta1 kernel: [771801.073028] scsi 0:0:0:0: Direct-Access     ATA      ST91000640NS     AA0D PQ: 0 ANSI: 5
Dec 16 11:29:52 theta1 kernel: [771801.075372] scsi 0:0:1:0: Direct-Access     ATA      ST91000640NS     AA09 PQ: 0 ANSI: 5
Dec 16 11:29:52 theta1 kernel: [771801.077619] scsi 0:0:2:0: Direct-Access     ATA      ST91000640NS     AA0D PQ: 0 ANSI: 5
Dec 16 11:29:52 theta1 kernel: [771801.079936] scsi 0:0:3:0: Direct-Access     ATA      ST91000640NS     AA63 PQ: 0 ANSI: 5
Dec 16 11:29:52 theta1 kernel: [771801.082074] scsi 0:0:4:0: Direct-Access     ATA      ST91000640NS     AA63 PQ: 0 ANSI: 5
Dec 16 11:48:24 theta1 kernel: [772913.557061] failure at /build/linux-Tvajqd/linux-3.2.68/drivers/scsi/mpt2sas/mpt2sas_ctl.c:770/_ctl_do_mpt_command()!
Dec 16 11:48:24 theta1 kernel: [772913.580024] failure at /build/linux-Tvajqd/linux-3.2.68/drivers/scsi/mpt2sas/mpt2sas_ctl.c:770/_ctl_do_mpt_command()!
Dec 16 11:48:24 theta1 kernel: [772913.601978] failure at /build/linux-Tvajqd/linux-3.2.68/drivers/scsi/mpt2sas/mpt2sas_ctl.c:770/_ctl_do_mpt_command()!
Dec 16 11:48:24 theta1 kernel: [772913.620034] failure at /build/linux-Tvajqd/linux-3.2.68/drivers/scsi/mpt2sas/mpt2sas_ctl.c:770/_ctl_do_mpt_command()!

The ST91000640NS drives are the RAID members (plus a spare). These lines
appeared when I ran 'echo "- - -" > /sys/class/scsi_host/host0/scan'.
The mpt2sas failures appeared after the problem began. mpt2sas is the
driver for the SAS controller the LTO drive is attached to, not for the disks.
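Which driver owns which SCSI host can be checked in sysfs before rescanning: each /sys/class/scsi_host/hostN directory has a proc_name attribute naming the driver that registered it. The Python sketch below collects that mapping; scsi_host_drivers is a hypothetical helper, not an existing tool. Judging from the kern.log above, host0 would be expected to report megaraid_sas (the RAID disks) and host1 mpt2sas (the tape), but that mapping is an assumption and should be confirmed on the machine itself.

```python
from __future__ import annotations

import os
import re


def scsi_host_drivers(sysfs_root: str = "/sys/class/scsi_host") -> dict[int, str]:
    """Map each SCSI host number to the driver name in hostN/proc_name.

    Hypothetical helper; assumes the usual sysfs layout where every
    hostN entry exposes a proc_name attribute.
    """
    drivers: dict[int, str] = {}
    for entry in os.listdir(sysfs_root):
        m = re.fullmatch(r"host(\d+)", entry)
        if not m:
            continue  # ignore anything that is not a hostN entry
        path = os.path.join(sysfs_root, entry, "proc_name")
        try:
            with open(path) as f:
                drivers[int(m.group(1))] = f.read().strip()
        except OSError:
            continue  # attribute missing or unreadable: skip this host
    return drivers


if __name__ == "__main__":
    # Print e.g. "host0: <driver>" for every SCSI host on this machine.
    for host, driver in sorted(scsi_host_drivers().items()):
        print(f"host{host}: {driver}")
```

Reading sysfs this way does not touch the controllers, so it is safe to run while the backup is in progress, unlike a rescan.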

Thanks,