Info on M910 and H200 controller status misleading

Trond Hasle Amundsen t.h.amundsen at usit.uio.no
Mon Jul 26 12:44:49 CDT 2010


Gianluca Cecchi <gianluca.cecchi at gmail.com> writes:

> hello,
> a blade M910 with RH EL 5.5 x86_64 and 6.2 agents.
> Installed these:
> libsmbios-2.2.19-10.1.el5.i386.rpm
> libsmbios-2.2.19-10.1.el5.x86_64.rpm
> RPM-GPG-KEY-dell
> RPM-GPG-KEY-libsmbios
> smbios-utils-bin-2.2.19-10.1.el5.x86_64.rpm
> srvadmin-cm-6.2.0-677.i386.rpm
> srvadmin-deng-6.2.0-1.6.el5.i386.rpm
> srvadmin-fsa-6.2.0-1.6.el3.i386.rpm
> srvadmin-hapi-6.2.0-1.17.el5.i386.rpm
> srvadmin-isvc-6.2.0-1.16.el5.i386.rpm
> srvadmin-megalib-6.2.0-1.6.el3.i386.rpm
> srvadmin-omacore-6.2.0-1.18.el5.i386.rpm
> srvadmin-omcommon-6.2.0-1.19.el5.i386.rpm
> srvadmin-omilcore-6.2.0-1.9.el5.noarch.rpm
> srvadmin-smcommon-6.2.0-1.29.el5.i386.rpm
> srvadmin-storage-6.2.0-1.29.el5.i386.rpm
> srvadmin-storage-populator-6.2.0-1.25.el3.i386.rpm
> srvadmin-storelib-6.2.0-1.11.el3.i386.rpm
> srvadmin-storelib-libpci-6.2.0-1.1.el5.i386.rpm
> srvadmin-storelib-sysfs-6.2.0-1.1.el5.i386.rpm
> srvadmin-sysfsutils-6.2.0-2.1.el5.i386.rpm
> srvadmin-xmlsup-6.2.0-1.17.el5.i386.rpm
>
> I have this strange output when querying the controller status; state is degraded, but OK...
>
> # omreport storage controller
>  Controller  PERC H200 Integrated Modular (Embedded)
>
> Controllers
> ID                                            : 0
> Status                                        : Non-Critical
> Name                                          : PERC H200 Integrated Modular
> Slot ID                                       : Embedded
> State                                         : Degraded
> Firmware Version                              : 07.01.33.00
> Minimum Required Firmware Version             : Not Applicable
> Driver Version                                : 01.101.06.00
> Minimum Required Driver Version               : 02.00.00.00
> Storport Driver Version                       : Not Applicable
> Minimum Required Storport Driver Version      : Not Applicable
> Number of Connectors                          : 1
> Rebuild Rate                                  : 50%
> BGI Rate                                      : 50%
> Check Consistency Rate                        : 50%
> Reconstruct Rate                              : Not Applicable
> Alarm State                                   : Not Applicable
> Cluster Mode                                  : Not Applicable
> SCSI Initiator ID                             : Not Applicable
> Cache Memory Size                             : Not Applicable
> Patrol Read Mode                              : Not Applicable
> Patrol Read State                             : Not Applicable
> Patrol Read Rate                              : Not Applicable
> Patrol Read Iterations                        : Not Applicable
> Abort check consistency on error              : Not Applicable
> Allow Revertible Hot Spare and Replace Member : Not Applicable
> Auto replace member on predictive failure     : Not Applicable
> Load balance                                  : Not Applicable
> Security Capable                              : Not Applicable
> Security Key Present                          : Not Applicable
> Redundant Path view                           : Not Applicable
>
> using check_openmanage from
> http://folk.uio.no/trondham/software/check_openmanage.html
> with these options
>
> /usr/lib64/nagios/plugins/check_openmanage -o 0 --blacklist ctrl_driver=0/ctrl_stdr=0 -d
>
>    System:      PowerEdge M910
>    ServiceTag:                                  OMSA version:    6.2.0
>    BIOS/date:   1.1.7 05/25/2010         Plugin version:  3.5.10
> -----------------------------------------------------------------------------
>    Storage Components                                                       
> =============================================================================
>   STATE  |    ID    |  MESSAGE TEXT                                         
> ---------+----------+--------------------------------------------------------
>       OK |        0 | Controller 0 [PERC H200 Integrated Modular] is Degraded
>       OK |  0:0:0:0 | Physical Disk 0:0:0 [SAS-HDD 146GB] on ctrl 0 is Online
>       OK |  0:0:0:1 | Physical Disk 0:0:1 [SAS-HDD 146GB] on ctrl 0 is Online
>       OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 136.13 GB] is Ready
>       OK |      0:0 | Connector 0 [SAS Port RAID Mode] on controller 0 is Ready
>       OK |    0:0:0 | Enclosure 0:0:0 [Backplane] on controller 0 is Ready
> -----------------------------------------------------------------------------
>
> ....
>
> In general the standard check (without -d but with the blacklist option, as the driver release
> in RH EL 5 is a little behind the recommended..) returns ok.
>
> So the question is
> is it ok or not?

Both... see below.

> Entering directly into BIOS for the controller (Ctrl-C) gives Optimal as state....

In the BIOS, there is no driver, so the reason behind the degraded
state in the OS is not present in the BIOS.

> Is this a bug related with the text output....?

The controller is degraded because the driver is too old.
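
You can see this in your own omreport output by comparing "Driver
Version" with "Minimum Required Driver Version". Here is a minimal
Python sketch of that comparison. It is only an illustration, not how
OMSA or the plugin actually does it: the function names are made up,
and I'm assuming the versions are dot-separated integers compared
field by field.

```python
# Hypothetical sketch, not OMSA's or check_openmanage's actual code.
# Assumption: versions are dot-separated integers, compared field by field.

def parse_version(text):
    """Turn '01.101.06.00' into (1, 101, 6, 0)."""
    return tuple(int(part) for part in text.strip().split("."))

def parse_omreport(output):
    """Collect 'Key : Value' lines from omreport-style output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def driver_is_outdated(fields):
    """True if the driver is older than the minimum required version."""
    minimum = fields.get("Minimum Required Driver Version", "Not Applicable")
    if minimum == "Not Applicable":
        return False
    return parse_version(fields["Driver Version"]) < parse_version(minimum)

# Values copied from the omreport output quoted in this thread.
SAMPLE = """\
Firmware Version                              : 07.01.33.00
Minimum Required Firmware Version             : Not Applicable
Driver Version                                : 01.101.06.00
Minimum Required Driver Version               : 02.00.00.00
"""

fields = parse_omreport(SAMPLE)
print(driver_is_outdated(fields))  # True: 01.101.06.00 is older than 02.00.00.00
```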

In the debug output, you'll see that the controller is "Degraded", and
that this is OK. This is because check_openmanage prefers to report the
reason behind the degraded state rather than the degraded state itself.
I can see how this can be confusing, but the plugin does this to avoid
spamming the user. There can be different reasons behind the degraded
state, and more than one at the same time. For example, it can be any
or all of:

  - out of date driver
  - out of date firmware
  - out of date storport driver

The confusion is also a consequence of using blacklisting with the
Nagios plugin. In your case:

  - the driver is out of date
  - you have blacklisted this feature in the plugin

If the 'ctrl_driver' blacklisting keyword is used, and the out-of-date
driver is the _only_ thing that is "wrong" with the controller (i.e.
the only reason it is degraded), the plugin will return OK for the
controller and the out-of-date driver alert is suppressed. You are also
using the 'ctrl_stdr' keyword, so both the driver and the storport
driver can be out of date without the plugin giving an alert. The
storport driver is a Windows-only thing.
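
To make the logic above a bit more concrete, here is a rough Python
sketch. It is not the plugin's code (the plugin is written in Perl and
its internals differ). The function names, the reason labels and the
"ALERT" result are made up for illustration, and the comma-separated
id list is an assumption about the blacklist syntax. Only the two
keywords used in this thread are modelled.

```python
# Hypothetical sketch of the blacklisting logic described above.
# Names, reason labels and return values are illustrative assumptions.

# Blacklist keywords from this thread and the reason each one suppresses.
SUPPRESSED_BY = {
    "ctrl_driver": "driver",
    "ctrl_stdr": "storport driver",
}

def parse_blacklist(arg):
    """Parse 'ctrl_driver=0/ctrl_stdr=0' into {'ctrl_driver': {'0'}, ...}."""
    result = {}
    for item in arg.split("/"):
        keyword, _, ids = item.partition("=")
        # Assumption: several ids for one keyword are comma-separated.
        result.setdefault(keyword, set()).update(ids.split(","))
    return result

def controller_state(ctrl_id, reasons, blacklist):
    """Return 'OK' if every reason for the degraded state is blacklisted."""
    suppressed = {
        reason
        for keyword, reason in SUPPRESSED_BY.items()
        if ctrl_id in blacklist.get(keyword, set())
    }
    remaining = set(reasons) - suppressed
    return "ALERT" if remaining else "OK"

blacklist = parse_blacklist("ctrl_driver=0/ctrl_stdr=0")
print(controller_state("0", ["driver"], blacklist))              # OK
print(controller_state("0", ["driver"], {}))                     # ALERT
print(controller_state("0", ["driver", "firmware"], blacklist))  # ALERT
```

The first call corresponds to your situation: the old driver is the
only reason, and it is blacklisted, so the controller shows as OK even
though its state is still "Degraded".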

Try using the plugin without blacklisting, and you'll see that it
reports the out-of-date driver.

As I understand it, OMSA will put the controller in a degraded state if
it knows that there is a newer driver/firmware version available. It
does not mean that there is something wrong with the controller or even
that your current driver contains dangerous bugs. It only means that
there is a newer version available, and this is Dell's way of
telling you that you should upgrade.

The blacklisting feature is there if upgrading is not an option.

Hope this helps :)

PS. The argument to the '-o' option can be any integer, but only 1
(default), 2 and 3 have any effect. The '-o 0' option in your example
has no effect.

Cheers,
-- 
Trond H. Amundsen <t.h.amundsen at usit.uio.no>
Center for Information Technology Services, University of Oslo



