Disk fault with LSI MPT-SAS & MegaRAID

Patrick_Fischer at Dell.com
Wed Sep 30 01:20:43 CDT 2009



Sorry, MegaCli is not what I meant. MegaLogR is the tool we use, without OpenManage, to gather the controller log. It is not a test, though: it is only a log that shows the events that occurred and why the disk failed. The log has to be checked manually, and it is the best way to see why a disk has failed.


Unpack the file I sent to your e-mail address:

tar -xvzf MegaLogR_Lin_B.01.02.tar.gz


Command to gather the log:

MegaLogR -FwTermLog -Dsply -aALL > "/tmp/raidlog.txt" 


With MegaCli it should be similar, something like this, although I have never viewed these logs myself:

MegaCli -FwTermLog -Dsply -aALL > log.txt
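
Since the log has to be read by hand, a small shell helper can at least narrow down where to look. The following is only a sketch: `show_failures` is a hypothetical name, the keyword "fail" is taken from the advice quoted further down in this thread, and the default of 10 context lines is an arbitrary assumption, not a documented value.

```shell
# Hypothetical helper (not part of MegaLogR or MegaCli): print every log
# line that mentions "fail", together with the lines logged just before it.
#   $1 = log file, e.g. /tmp/raidlog.txt
#   $2 = number of preceding context lines (assumed default: 10)
show_failures() {
    log_file="$1"
    context="${2:-10}"
    # -i: case-insensitive, -n: show line numbers, -B: lines of leading context
    grep -i -n -B "$context" 'fail' "$log_file"
}
```

For example, `show_failures /tmp/raidlog.txt 20` would print each match with the 20 lines that precede it. What those preceding lines mean (bad LBA, timeout, sense key) still has to be judged manually.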


MegaRAID SAS 1068E => PERC (all PERCs have a log)

MPT Fusion SAS 1068E => SAS HBA (has no log)


Debian is not supported, that is correct, but below is the announcement from the company (many thanks to them) that provides OMSA for deb-based systems:

Dear All,


SARA is proud to announce that they have recreated the Dell OpenManage Server Administrator package (version 5.5) for deb-based systems.


More information can be found at https://subtrac.sara.nl/oss/omsa_2_deb



Kind regards,





      - added a patch to dsm_om_connsvc for lenny and ubuntu that
        enables the webserver functionality again.
        Reported by: Lars Uffmann and christian at jetment dot com
        Applied by: Bas van der Vlies

      - added a path for dataeng for proper creation of dataeng lockfile
        Author: christian at jetment dot com
        Applied by: Bas van der Vlies



From: Henry-Nicolas Tourneur [mailto:hntourneur at mactelecom.com] 
Sent: 29 September 2009 16:06
To: Fischer, Patrick
Cc: linux-poweredge-Lists
Subject: RE: Disk fault with LSI MPT-SAS & MegaRAID



First, thank you for your reply.
With the R710, we are using another driver, the mpt_sas one.

You told me that megacli will declare a failure in case of a bad LBA, a timeout or a sense key problem.
Is that also true with the SAS1068E?

I'm not using OpenManage, and I don't think that Debian Lenny is one of your officially supported OSes.

The only thing I want is to be sure that either the megacli or the mpt-status tool will declare a failure
in realistic cases (as you said: bad LBA, timeout, sense key...) and not only in "theoretical" cases such as hard drive removal.

Thank you for your help,


On Tuesday 29 September 2009 at 14:50 +0100, Patrick_Fischer at Dell.com wrote:

If you know some SCSI events, the best approach is to check the
controller log. You can get the log with our OpenManage or with MegaCli
from LSI. In the log, search for "fail" to find the failed disk, then
check why it failed (in the lines just before), e.g. bad LBA, timeout
(non-critical), sense key...
If it is a timeout, or the disk is in state "removed", you can reseat
it. If a bad LBA or a sense key unknown to you is the root cause, mail
the log to your local Dell support for analysis.

For OpenManage it is:
omreport storage controller => to get the controller ID
omconfig storage controller action=exportlog controller="id" => to
gather the log
After 10 disks it is easy to read :-)

Another way is our diagnostic tools:
Offline => 32 Bit diag (for every Dell server)
Online => online diagnostics for PE servers (only with a supported OS)
If you need the links, please tell me.
Both tools test the disks with some algorithms and show green for OK
and red for "needs to be replaced" (in most cases).

If you always keep the RAID controller driver, its firmware and the
disk revision up to date, you can be nearly sure that a disk failure is
a hardware defect.
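
The two OpenManage commands above can be chained so that the log of every controller is exported in one go. This is a rough sketch, not a Dell-provided script: the function names `controller_ids` and `export_all_logs` are hypothetical, and it assumes that `omreport storage controller` prints each controller ID on a line of the form `ID : <number>`, which should be checked against the output of the installed OMSA version.

```shell
# Read "omreport storage controller" output on stdin and print one
# controller ID per line.
# ASSUMPTION: each controller appears on a line starting with "ID" followed
# by a colon and the numeric ID; verify this against your OMSA version.
controller_ids() {
    awk -F':' '/^ID[ \t]*:/ { gsub(/[ \t\r]/, "", $2); print $2 }'
}

# Export the controller log for every ID found, using the omconfig
# command quoted in the message above.
export_all_logs() {
    omreport storage controller | controller_ids | while read -r id; do
        omconfig storage controller action=exportlog controller="$id"
    done
}
```

Calling `export_all_logs` on a system with OpenManage installed should then run the export once per controller. Without OpenManage (for example with only MegaLogR or MegaCli available), this sketch does not apply.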
-----Original Message-----
From: linux-poweredge-bounces at lists.us.dell.com
[mailto:linux-poweredge-bounces at lists.us.dell.com] On Behalf Of
Henry-Nicolas Tourneur
Sent: 29 September 2009 14:52
To: linux-poweredge-Lists
Subject: Disk fault with LSI MPT-SAS & MegaRAID

I have a question about two RAID controllers: the MegaRAID in my
PowerEdge 2900 (Symbios Logic MegaRAID SAS 1078) and the Fusion
MPT-SAS in my R710 (Symbios Logic SAS1068E PCI-Express).

The question is quite simple: what rule is used to decide that a drive
is faulty? My test case is also simple: I "hot remove" the disk, but
this is not realistic. In real life, I would like to know what
conditions are required to declare a drive faulty (e.g. more than x % of
bad sectors... I don't know).

Does anybody have an idea on this?

Henry-Nicolas Tourneur

