Inconsistent OMSA vs Raidmon reporting of RAID status

Ben bda20 at cam.ac.uk
Mon Jan 8 06:49:17 CST 2007


Wotcha All,

I have a large number of PERC 4-based RAID systems which are all happily 
working away perfectly well at the moment.  They're all running Dell's OMSA 
4.5 as well as Dell's megamon package (installed from 
Megamon-4.0-2.i386.rpm, which contains MegaServ- and raidmon-named 
components).

I have the mega/raidmon thing set to email me whenever the machine is 
rebooted, a check consistency event happens or the battery goes into 
recharging.  You know, things like that.  (It does so provided you restart 
it occasionally; otherwise it silently stops sending mail after a few 
months.)

Usually (when working), once a week on a Monday at 06:00, Raidmon emails me 
from each of the machines to say that machine's RAID system is in a check 
consistency state.  It mails me a few times giving percentages of 
completeness, etc.  This is fine and dandy.

This morning, though, none of them mailed me, because I hadn't restarted 
mega/raidmon on those machines in a while.  Instead, my Big Brother/Hobbit 
RAID monitoring script screamed that all of the PERC 4-based machines were 
resynching their RAID arrays.  (The script uses the OMSA storage components 
to give a ton of useful information; I modified it just last week to give 
more than it used to.  Let me know if you'd like a copy.)

For a few minutes we were 'mildly curious' as to what might be happening.

It was only when I recalled that I'd not seen the emails for a few weeks and 
that all the machines were reporting this state as of 06:00 this morning 
(the time that the consistency check starts) that we felt a bit calmer. 
Call me strange but I don't call a consistency check the same as a RAID 
array "Resynching".  For me "Resynching" implies some kind of 
data-reconstruction.  I'm happy to be disabused of this assumption, though. 
Is this an artefact of the version of OMSA we're running on them or the 
PERCs' firmware level (521S) such that there's no entry for "Consistency 
Check"?

...

Actually now I take the time to look at 
/opt/dell/srvadmin/sm/mibs/dcstorag.mib (for OMSA 4.5 and 5.1) myself I see 
the following:

-- 1.3.6.1.4.1.674.10893.1.20.130.4.1.4
   arrayDiskState OBJECT-TYPE
     SYNTAX INTEGER
       {
[...]
        resynching(15),
[...]
     DESCRIPTION
     "The current condition of the array disk.
     Possible states:
[...]
     15: Resynching - Indicates one of the following types of disk operations: Transform Type, Reconfiguration, and Check Consistency.
[...]

And

-- 1.3.6.1.4.1.674.10893.1.20.140.1.1.4
   virtualDiskState OBJECT-TYPE
     SYNTAX INTEGER
       {
[...]
        resynching(15),
[...]
      DESCRIPTION
      "The current condition of this virtual disk (which includes any member array disks.)
      Possible states:
[...]
      15: Resynching
[...]

Which seems to indicate that even though the virtual disk was marked as 
"Resynching", it was in fact the array disks within that virtual disk which 
were in the state the MIB describes (a Check Consistency).  Which is even 
more mildly confusing, given that my monitoring script _didn't_ have the 
array disks marked as "Resynching" themselves, only the virtual disk; just 
not as worryingly so.
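
For anyone who wants to check their own copy of the MIB for a dedicated 
"Check Consistency" value, here is a minimal sketch that pulls the 
`label(number)` enumeration lines out of MIB text.  The sample text is just 
the fragment quoted above (elided lines stay elided), and the names 
MIB_SNIPPET, ENUM_RE and enum_labels are my own inventions for illustration, 
not anything OMSA ships.

```python
import re

# Fragment copied from the excerpt above; the lines elided there are not
# reconstructed here, so only the one quoted enumeration value appears.
MIB_SNIPPET = """
   virtualDiskState OBJECT-TYPE
     SYNTAX INTEGER
       {
        resynching(15),
"""

# Matches a line consisting solely of `label(number)` with an optional comma.
ENUM_RE = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z0-9-]*)\((\d+)\)[ \t]*,?[ \t]*$",
    re.MULTILINE,
)


def enum_labels(mib_text):
    """Return {number: label} for every enumeration line in mib_text."""
    return {int(num): label for label, num in ENUM_RE.findall(mib_text)}


if __name__ == "__main__":
    print(enum_labels(MIB_SNIPPET))
```

Run against the full dcstorag.mib this would list every numbered label in 
the file, not just those of one object, so treat the result as a starting 
point for a manual look rather than a definitive answer.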

Either way, now that we know about this particular foible (we'd never 
noticed it before owing to getting the emails saying soothing things like 
"Check Consistency in progress" and "Check Consistency completed" and not 
seeing the OMSA details due to my script not being verbose enough) we can 
ignore the fact that every Monday our screens are going to go red for an 
hour or so.  Or I could modify my checking script to denote "Resynching" as 
a yellow warning, not a red panic.
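
The second option might look something like the sketch below.  This is not 
my actual script; the function name and the healthy_states parameter are 
made up for illustration, and the only state value taken from the MIB is 
resynching(15).  Which values count as healthy is left to the caller, since 
I haven't quoted them here.

```python
# Value taken from dcstorag.mib as quoted above: resynching(15).
RESYNCHING = 15


def colour_for_virtual_disk(state, healthy_states):
    """Map a virtualDiskState value to a Big Brother/Hobbit colour.

    healthy_states is a caller-supplied set of state values that should
    show as green (an assumption: look them up in your own copy of the MIB).
    """
    if state == RESYNCHING:
        # Covers a consistency check as well, so warn rather than panic.
        return "yellow"
    if state in healthy_states:
        return "green"
    # Anything not recognised is treated as a problem.
    return "red"
```

Defaulting unknown states to red is deliberate: a state the script has never 
heard of is more likely to deserve attention than not.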

So, yeah, just thought I'd mention this a) for those people who're already 
using my script and who (if they've downloaded the newer version and use 
Raidmon to initiate consistency checking) may be in for a surprise, and b) 
for the opportunity to have someone at Dell tell me more about this 
particular mode of behaviour/reporting and whether there are plans to modify 
the firmware to add in a value for "Checking Consistency" or similar.

And also to advertise my shonky attempt at scripting for Big Brother/Hobbit 
(-:

Ben
-- 
Unix Support, MISD, University of Cambridge, England
Plugger of wire, typer of keyboard, imparter of Clue
         Life Is Short.          It's All Good.


