[Linux-PowerEdge] PERC H730 issues (extended LD support, span element misdetection)

Stephen Dowdy sdowdy at ucar.edu
Fri Sep 4 17:52:31 CDT 2015


I have a PowerEdge T630 with a PERC H730 Adapter running Debian Jessie (8.1).

    # lspci -mmnn | grep PERC
    03:00.0 "RAID bus controller [0104]" "LSI Logic / Symbios Logic [1000]" "MegaRAID SAS-3 3108 [Invader] [005d]" -r02 "Dell [1028]" "PERC H730 Adapter [1f43]"

It has 18 4TB SATA disks.


Using the stock kernel (3.16.0-4-amd64) or the backports kernel
(4.1.0-0.bpo.1-amd64/megaraid_sas=06.806.08.00-rc1) i see multiple 'dmesg'
messages of the form:

    [    2.798743] megasas:span 0 rowDataSize 10

RHEL has a paid/subscriber page on this that is not very helpful.
Basically, the claim is that this message indicates that the kernel and the
PERC firmware disagree on the number of elements in a span.  The numbers are
printed as %x, so the kernel thinks span 0 has 0x10 = 16 elements in it,
whereas the system has 18 4TB SATA drives in a single RAID6 disk group with
4 Logical Volumes of various sizes.  It is unclear to me whether this
discrepancy could have any catastrophic (data-destructive) result or any
performance impact, or whether it's just an informational message.
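
To make the hex reading concrete, here is a minimal Python sketch that pulls the span number and rowDataSize out of a dmesg line of this form and converts both from hexadecimal. The regular expression and the function name are illustrative only, written against the single line quoted above, and treating the fields as hex follows the %x claim rather than a verified reading of the driver source.

```python
import re

# Pattern written against the single dmesg line quoted in this post;
# other driver versions may format the message differently (assumption).
SPAN_RE = re.compile(r"megasas:\s*span\s+(\w+)\s+rowDataSize\s+(\w+)")

def parse_span_line(line):
    """Return (span, row_data_size), reading both fields as hex, or None."""
    m = SPAN_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)

line = "[    2.798743] megasas:span 0 rowDataSize 10"
span, row_data_size = parse_span_line(line)

drives = 18        # physical disks in the RAID6 disk group
raid6_parity = 2   # RAID6 stores two parity blocks per stripe

print(f"span {span}: rowDataSize = {row_data_size}")
print("equals drive count:", row_data_size == drives)
print("equals drives minus parity:", row_data_size == drives - raid6_parity)
```

The value 0x10 is 16, which also happens to equal the 18 drives minus the two RAID6 parity members. Whether that is what the driver means by rowDataSize is not confirmed here; the sketch only shows the arithmetic.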

But that's not all ...

    perccli /c0 show events type=sincereboot filter=warning
    ...
    Event Description: Host driver needs to be upgraded to enable extended LD support
    ...

(This message appears with the 3.16 kernel, which is why i tried out the 4.1
kernel, where the message does not appear.  I can't find any documentation
that defines exactly what "Extended LD support" for a MegaRAID card actually
entails, but moving to 4.1 doesn't fix the span element mis-detection
issue.)

FWIW, i used 'megacli 8.07.10' to create the DG and LDs while booted into a
network Debian FAI install.  (if anyone knows for sure that 'perccli
1.11.03' does something differently and would be required, let me know -- i
can rework my RAID6-ALL creation script to use perccli if needed)


I will be breaking up the 24TB logical volumes into pieces < 16TB to see if
these messages may be coming from a sizing limit for LDs, but if anyone has
any clarity on this, i'd greatly appreciate it.  (i really don't want to get
a month in and have users writing their 17th terabyte find all their data
corrupted, etc.)
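
For planning that split, a small Python sketch of the arithmetic: given the LD sizes from the overview further down and a 16 TB ceiling, it reports which volumes exceed the ceiling and how many equal pieces each would need to stay strictly below it. The 16 TB figure is only the working hypothesis from the paragraph above, not a documented controller or driver limit, and the function name is made up for illustration.

```python
import math

LIMIT_TB = 16.0  # hypothesised ceiling from the text, not a documented limit

# LD names and sizes in TB as reported by the overview in this post
# (OpSys is listed as 250.0 GB; 0.25 TB is an approximation)
lds = {"OpSys": 0.25, "Data1": 24.448, "Data2": 24.448, "Data3": 9.069}

def pieces_needed(size_tb, limit_tb=LIMIT_TB):
    """Smallest number of equal pieces keeping each piece below limit_tb."""
    n = max(1, math.ceil(size_tb / limit_tb))
    # ceil() would allow a piece exactly equal to the limit; add one if so
    if size_tb / n >= limit_tb:
        n += 1
    return n

for name, size in lds.items():
    n = pieces_needed(size)
    if n == 1:
        print(f"{name}: {size:.3f} TB, already under {LIMIT_TB:.0f} TB")
    else:
        print(f"{name}: {size:.3f} TB -> {n} pieces of {size / n:.3f} TB")
```

With these inputs, only the two 24.448 TB volumes need splitting, into two pieces of 12.224 TB each.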

    # perccli /c0 show health
    Controller = 0
--> Status = Failure
    Description = None

    Controller Health Info :
    ======================
    TemperatureROC = 76
    TemperatureCtrl = 76
    Warranty Remaining = 100
    Overall Health = GOOD
    Reason Code = 0



    # megacli-overview
    [Adapter 0]
    ADP[0]=( PERC H730 Adapter),FWPkg( 25.3.0.0016),FWVer( 4.250.00-4402)
    BBU[0]=(Learning?=No, charge=98 %, status=Complete, isSOHGood=Yes)
    PD[32:0]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:1]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:2]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:3]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:4]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:5]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:6]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:7]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:8]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:9]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:10]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:11]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:12]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:13]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:14]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:15]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:16]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    PD[32:17]=(TOSHIBA/MG03ACA400/FL1H/SerialNum) ME=0,OE=0,PF=0,FW=Online, Spun Up,F=None
    VD[0:0]=("OpSys",R(6,0,3),SZ= 250.0 GB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 18,Optimal)
    VD[0:1]=("Data1",R(6,0,3),SZ= 24.448 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 18,Optimal)
    VD[0:2]=("Data2",R(6,0,3),SZ= 24.448 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 18,Optimal)
    VD[0:3]=("Data3",R(6,0,3),SZ= 9.069 TB,SS=128 KB,CP=(W=WB,R=ReadADP,IO=D),# 18,Optimal)
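
As a cross-check on the layout above, a short Python sketch comparing the summed VD sizes with what a single RAID6 group of 18 4TB drives would be expected to offer. It assumes that the tool's "TB" and "GB" are binary units (TiB/GiB) and that each drive holds exactly 4e12 bytes; both are assumptions, so only rough agreement should be expected.

```python
TIB = 2 ** 40

drives = 18
parity = 2                  # RAID6: two drives' worth of parity per stripe
drive_bytes = 4 * 10**12    # nominal 4 TB per drive, assumed exact

# VD sizes from the overview, assumed to be in binary units (TiB)
vd_sizes_tib = [250.0 / 1024, 24.448, 24.448, 9.069]

expected_tib = (drives - parity) * drive_bytes / TIB
reported_tib = sum(vd_sizes_tib)

print(f"expected usable: {expected_tib:.3f} TiB")
print(f"sum of VDs:      {reported_tib:.3f} TiB")
print(f"difference:      {reported_tib - expected_tib:+.3f} TiB")
```

Under these assumptions the two figures agree to within a few GiB, which is consistent with 16 data members per stripe. It does not by itself explain the dmesg message.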



thanks,
--stephen

-- 
Stephen Dowdy  -  Systems Administrator  -  NCAR/RAL
303.497.2869   -  sdowdy at ucar.edu        -  http://www.ral.ucar.edu/~sdowdy/