[Linux-PowerEdge] OMSA says failed drive but LED is solid green and LCD displays no error

cupertino cupertino at gmx.net
Thu Sep 4 12:22:22 CDT 2014


OK, it's an H200 or SAS6/iR then. You won't have any luck with MegaCLI, but
you could try sas2ircu:
http://www.lsi.com/downloads/Public/Host%20Bus%20Adapters/Host%20Bus%20Adapters%20Common%20Files/SAS_SATA_6G_P12/SAS2IRCU_User_Guide.pdf
smartmontools should at least be able to confirm the predictive error.
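
As a rough sketch, those two checks could look like the following. It is
untested on this system; the function name check_disks, the controller
index 0 and the device node /dev/sg0 are assumptions for illustration, so
take the real values from what "sas2ircu LIST" and "smartctl --scan" report.

```shell
#!/bin/sh
# Sketch only: query an LSI SAS2 controller (e.g. H200) and SMART health.
# Assumed defaults: controller index 0, device node /dev/sg0.

check_disks() {
    ctrl="${1:-0}"          # assumed controller index, see `sas2ircu LIST`
    dev="${2:-/dev/sg0}"    # assumed device node, see `smartctl --scan`

    if command -v sas2ircu >/dev/null 2>&1; then
        sas2ircu LIST               # enumerate the controllers sas2ircu supports
        sas2ircu "$ctrl" DISPLAY    # volumes and physical disk states
    else
        echo "sas2ircu not found in PATH"
    fi

    if command -v smartctl >/dev/null 2>&1; then
        smartctl --scan             # list the devices smartctl can see
        smartctl -H "$dev"          # overall health self-assessment
        smartctl -a "$dev"          # full report including error counters
    else
        echo "smartctl not found in PATH"
    fi
}
```

Run as root, e.g. "check_disks 0 /dev/sg1". Drives that are members of a
RAID volume may only be reachable through their /dev/sg* nodes.
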
If all that fails I would trust the LED state over OMSA, because the LED is
controlled by the RAID controller. But you mentioned the LED of the
predictive-failure drive is blinking amber (it should cycle amber, green, off).
I also remember a thread about OMSA, CentOS and the H200 where the OMSA
storage service crashed and no longer provided any useful info. Unfortunately
I can't remember the details or the solution, but restarting the OMSA
services may help to get a current disk state.
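
A minimal sketch of that restart, assuming OMSA is installed under the
default /opt/dell/srvadmin prefix and the controller ID is 0. The function
name refresh_omsa and the SRVADMIN_SERVICES override are hypothetical and
only there for illustration.

```shell
#!/bin/sh
# Sketch only: restart the OMSA services, then re-read the physical disk state.
# Assumed defaults: install prefix /opt/dell/srvadmin, controller ID 0.

refresh_omsa() {
    ctrl="${1:-0}"
    svc="${SRVADMIN_SERVICES:-/opt/dell/srvadmin/sbin/srvadmin-services.sh}"

    if [ ! -x "$svc" ]; then
        echo "srvadmin-services.sh not found at $svc"
        return 1
    fi

    "$svc" restart || return 1                   # restart all OMSA services
    omreport storage pdisk controller="$ctrl"    # query the disk states again
}
```

Run as root, e.g. "refresh_omsa 0", and compare the reported state of 0:0:0
and 0:0:1 with what OMSA showed before the restart.
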

On Thu, 2014-09-04 at 13:01 -0400, Jeff White wrote:
> Doesn't look like MegaCLI will be useful here:
> 
> 
> [root@compute48-0-1 ~]# lspci | grep LSI
> 05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic 
> SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
> 
> [root@compute48-0-1 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
> Exit Code: 0x00
> 
> Jeff White - GNU+Linux Systems Administrator
> University of Pittsburgh - CSSD
> 
> On 09/04/2014 11:43 AM, cupertino wrote:
> > If any LSI MegaRAID controller is involved you could pull a
> > controller log using the MegaCLI command, which is usually the most reliable
> > source for analyzing drive errors.
> > I wouldn't worry about the LCD. On R815 (and similar) systems it is not
> > very reliable when it comes to drive errors. Since the controller has no
> > direct channel to talk to the BMC (as in newer systems), the LCD should only
> > report if the backplane is involved. Predictive errors shouldn't be reported
> > there at all.
> >
> >
> > On Thu, 2014-09-04 at 09:26 -0400, Jeff White wrote:
> >> I have a PE R815 that has this issue:
> >>
> >> * OMSA says drive 0:0:0 has failed but the drive physically labelled as
> >> 0 has a green LED
> >>
> >> * OMSA says drive 0:0:1 has a "failure predicted" status and has a
> >> blinking amber LED
> >>
> >> * The LCD on the front of the system shows no errors
> >>
> >> Therefore, I am confused.  When I called Dell (they at least used to
> >> help a little with OMSA even with an out-of-warranty system) I got "this
> >> office is currently closed" despite the Web site saying it is 24/7.
> >> Additionally when I try to get the latest version of OMSA I just get a
> >> 404
> >> (https://linux.dell.com/repo/hardware/latest/platform_independent/rh60_64)
> >> ... thanks Dell, very helpful.
> >>
> >> So:
> >>
> >> 1. Can anyone provide the correct link to the latest OMSA repo or is it
> >> down?
> >>
> >> 2. Has anyone seen the LCD not display an error yet the LED of a drive
> >> and OMSA both show something is wrong?
> >>
> >> 3. I have seen the LCD and OMSA disagree about PSU problems on R815s
> >> several times in the past but never with disk drives.  With the PSUs
> >> Dell pretty much said "OMSA is right the PSU is fine, the LCD is
> >> controlled by the BMC which is confused, swap your PSUs to fix it".  In
> >> reality another so-called "fix" seems to be to lazily ignore it for a
> >> few months and the LCD magically decides everything is fine and goes
> >> back to being blue with no error.
> >>
> >> Once I can get to the repo I will update OMSA and as always update
> >> firmware but I would like to know if others have seen this behaviour.
> >>
> >
> >



