[Linux-PowerEdge] OMSA says failed drive but LED is solid green and LCD displays no error

cupertino cupertino at gmx.net
Fri Sep 5 13:11:16 CDT 2014


It seems you are affected by the issue I mentioned yesterday: OMSS
keeps crashing after a while. Unfortunately I wasn't able to find the
thread; maybe someone else on this list has details.
As a workaround, you could restart OMSA regularly from a cron job.
I also talked to a friend about it, and he thinks the issue was linked to
a failed OMSA installation/update. The solution was to uninstall OMSA
using srvadmin-uninstall.sh and then do a fresh install (not an update)
from the repo or the tar package on Dell's website. No guarantee, but it
could be worth a try.
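For the cron workaround, a minimal sketch, assuming a stock OMSA 7.x
install that ships its control script at
/opt/dell/srvadmin/sbin/srvadmin-services.sh (check the path on your
system). The file name /etc/cron.d/omsa-restart and the nightly 04:00
schedule are hypothetical choices for illustration only:

```
# /etc/cron.d/omsa-restart  (hypothetical file name)
# Restart all OMSA services once a night so that a crashed storage
# service (OMSS) comes back up. Script path assumes a default OMSA install.
SHELL=/bin/sh
PATH=/usr/sbin:/usr/bin:/sbin:/bin
0 4 * * * root /opt/dell/srvadmin/sbin/srvadmin-services.sh restart >/dev/null 2>&1
```

After a restart, "omreport storage pdisk controller=0" should show the
refreshed physical disk states.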

On Fri, 2014-09-05 at 13:56 -0400, Jeff White wrote:
> As I thought, drive 1 did go back to "failure predicted" after a few
> minutes. I replaced the drive, but I had to restart OMSA again to see
> the rebuild, as it kept reporting that the old drive was still in the array.
> 
> Jeff White - GNU+Linux Systems Administrator
> University of Pittsburgh - CSSD
> 
> On 09/05/2014 01:27 PM, cupertino wrote:
> > No, it's fine this way, because neither drive has failed (or not yet,
> > in the case of drive 1). In OMSA there is a 'failure predicted' column
> > (you may have to switch to 'full view'); it should say 'yes' for drive 1.
> > Drive 0 is fine, and drive 1 should be replaced soon because it's going
> > to fail.
> >
> >
> > On Fri, 2014-09-05 at 13:21 -0400, Jeff White wrote:
> >> Both drives are listed as OK according to that tool:
> >>
> >> [root@compute48-0-1 sas2ircu_linux_x86_rel]# ./sas2ircu 0 DISPLAY | grep State
> >>     State                                   : Optimal (OPT)
> >>     State                                   : Optimal (OPT)
> >>     State                                   : Standby (SBY)
> >>
> >>   > (should be amber green off)
> >>
> >> Drive 1 is green-amber-off-repeat, drive 0 is green.
> >>
> >> I restarted OMSA and now it reports both drives as OK. Drive 1 is still
> >> blinking green-amber-off, and I suspect OMSA just needs to see more
> >> errors before it calls it "failure predicted" again.
> >>
> >> Jeff White - GNU+Linux Systems Administrator
> >> University of Pittsburgh - CSSD
> >>
> >> On 09/04/2014 01:22 PM, cupertino wrote:
> >>> OK, it's an H200 or SAS6/iR then. You won't have any luck with MegaCLI,
> >>> but you could try sas2ircu:
> >>> http://www.lsi.com/downloads/Public/Host%20Bus%20Adapters/Host%20Bus%20Adapters%20Common%20Files/SAS_SATA_6G_P12/SAS2IRCU_User_Guide.pdf
> >>> smartmontools should at least be able to confirm the predictive error.
> >>> If all that fails, I would trust the LED state over OMSA, because the
> >>> LED is controlled by the RAID controller. But you mentioned the
> >>> predictive drive is amber (it should be amber-green-off).
> >>> I also remember a thread about OMSA, CentOS and the H200 where the OMSA
> >>> storage service crashed and stopped providing useful information.
> >>> Unfortunately I can't remember the details or the solution, but maybe
> >>> restarting the OMSA service will help to get a current disk state.
> >>>
> >>> On Thu, 2014-09-04 at 13:01 -0400, Jeff White wrote:
> >>>> Doesn't look like MegaCLI will be useful here:
> >>>>
> >>>>
> >>>> [root@compute48-0-1 ~]# lspci | grep LSI
> >>>> 05:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
> >>>>
> >>>> [root@compute48-0-1 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
> >>>> Exit Code: 0x00
> >>>>
> >>>> Jeff White - GNU+Linux Systems Administrator
> >>>> University of Pittsburgh - CSSD
> >>>>
> >>>> On 09/04/2014 11:43 AM, cupertino wrote:
> >>>>> If an LSI MegaRAID controller is involved, you could pull a
> >>>>> controller log using MegaCLI, which is usually the most reliable
> >>>>> source for analyzing drive errors.
> >>>>> I wouldn't worry about the LCD. On the R815 (and similar) systems it
> >>>>> is not very reliable when it comes to drive errors. Since the
> >>>>> controller has no direct channel to the BMC (unlike newer systems),
> >>>>> the LCD should only report errors that involve the backplane.
> >>>>> Predictive errors shouldn't be reported there at all.
> >>>>>
> >>>>>
> >>>>> On Thu, 2014-09-04 at 09:26 -0400, Jeff White wrote:
> >>>>>> I have a PE R815 that has this issue:
> >>>>>>
> >>>>>> * OMSA says drive 0:0:0 has failed but the drive physically labelled as
> >>>>>> 0 has a green LED
> >>>>>>
> >>>>>> * OMSA says drive 0:0:1 has a "failure predicted" status and has a
> >>>>>> blinking amber LED
> >>>>>>
> >>>>>> * The LCD on the front of the system shows no errors
> >>>>>>
> >>>>>> Therefore, I am confused. When I called Dell (they at least used to
> >>>>>> help a little with OMSA, even for out-of-warranty systems), I got "this
> >>>>>> office is currently closed" despite the Web site saying it is 24/7.
> >>>>>> Additionally, when I try to get the latest version of OMSA I just get a
> >>>>>> 404
> >>>>>> (https://linux.dell.com/repo/hardware/latest/platform_independent/rh60_64)
> >>>>>> ... thanks Dell, very helpful.
> >>>>>>
> >>>>>> So:
> >>>>>>
> >>>>>> 1. Can anyone provide the correct link to the latest OMSA repo, or is
> >>>>>> it down?
> >>>>>>
> >>>>>> 2. Has anyone seen the LCD not display an error, yet the drive LED
> >>>>>> and OMSA both show that something is wrong?
> >>>>>>
> >>>>>> 3. I have seen the LCD and OMSA disagree about PSU problems on R815s
> >>>>>> several times in the past, but never about disk drives. With the PSUs,
> >>>>>> Dell pretty much said "OMSA is right, the PSU is fine; the LCD is
> >>>>>> controlled by the BMC, which is confused; swap your PSUs to fix it". In
> >>>>>> reality, another so-called "fix" seems to be to lazily ignore it for a
> >>>>>> few months until the LCD magically decides everything is fine and goes
> >>>>>> back to being blue with no error.
> >>>>>>
> >>>>>> Once I can get to the repo I will update OMSA and, as always, update
> >>>>>> the firmware, but I would like to know whether others have seen this behaviour.
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >
> >



