Inconsistent OMSA vs Raidmon reporting of RAID status
Ben
bda20 at cam.ac.uk
Mon Jan 8 06:49:17 CST 2007
Wotcha All,
I have a large number of PERC 4-based RAID systems which are all happily
working away perfectly well at the moment. They're all running Dell's OMSA
4.5 as well as Dell's megamon (installed from Megamon-4.0-2.i386.rpm,
containing MegaServ- and raidmon-named components) stuff.
I have the mega/raidmon thing set to email me (which it does, if you restart
it occasionally to stop it failing to do so silently after a few months)
whenever the machine is rebooted, a check consistency event happens or the
battery goes into recharging. You know, things like that.
Usually (when working), once a week on a Monday at 06:00, Raidmon emails me
from each of the machines to say that machine's RAID system is in a check
consistency state. It mails me a few times giving percentages of
completeness, etc. This is fine and dandy.
Because mega/raidmon hadn't been restarted on those machines in a while, none
of them mailed me this morning. So the first I heard of it was when my Big
Brother/Hobbit RAID monitoring script screamed that all of the PERC 4-based
machines were resynching their RAID arrays. (The script uses the OMSA storage
components to give a ton of useful information, and I'd modified it just last
week to give more than it used to; let me know if you'd like a copy.)
For a few minutes we were 'mildly curious' as to what might be happening.
It was only when I recalled that I'd not seen the emails for a few weeks and
that all the machines were reporting this state as of 06:00 this morning
(the time that the consistency check starts) that we felt a bit calmer.
Call me strange but I don't call a consistency check the same as a RAID
array "Resynching". For me "Resynching" implies some kind of
data-reconstruction. I'm happy to be disabused of this assumption, though.
Is this an artefact of the version of OMSA we're running on them or the
PERCs' firmware level (521S) such that there's no entry for "Consistency
Check"?
...
Actually now I take the time to look at
/opt/dell/srvadmin/sm/mibs/dcstorag.mib (for OMSA 4.5 and 5.1) myself I see
the following:
-- 1.3.6.1.4.1.674.10893.1.20.130.4.1.4
arrayDiskState OBJECT-TYPE
SYNTAX INTEGER
{
[...]
resynching(15),
[...]
DESCRIPTION
"The current condition of the array disk.
Possible states:
[...]
15: Resynching - Indicates one of the following types of disk operations: Transform Type, Reconfiguration, and Check Consistency.
[...]
And
-- 1.3.6.1.4.1.674.10893.1.20.140.1.1.4
virtualDiskState OBJECT-TYPE
SYNTAX INTEGER
{
[...]
resynching(15),
[...]
DESCRIPTION
"The current condition of this virtual disk (which includes any member array disks.)
Possible states:
[...]
15: Resynching
[...]
Which seems to indicate that "Resynching" is a catch-all state which covers a
consistency check as well as the operations that really do shuffle data
about. What's still mildly confusing is that the MIB only spells this out for
the array disks, yet my monitoring script _didn't_ have the array disks marked
as "Resynching" themselves, only the virtual disk. Just not as worryingly so.
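For anyone decoding these values in their own script, here is a minimal
sketch in Python of turning the numeric state into something less alarming.
It is not taken from my script; the function name describe_state is made up
for illustration, and only value 15 comes from the MIB excerpt above (the
rest of the enumeration is elided there, so it is left out here too).

```python
# Only value 15 is taken from the dcstorag.mib excerpt; the other
# enumeration values are elided in the post and are not guessed here.
RESYNCHING = 15

# Operations the arrayDiskState DESCRIPTION lists under "Resynching".
RESYNCHING_MEANINGS = ("Transform Type", "Reconfiguration", "Check Consistency")


def describe_state(code):
    """Return a human-readable label for a dcstorag.mib state value.

    Hypothetical helper: it only knows about state 15 and points the
    reader at the MIB for everything else.
    """
    if code == RESYNCHING:
        return "Resynching (one of: " + ", ".join(RESYNCHING_MEANINGS) + ")"
    return "state %d (look it up in dcstorag.mib)" % code
```

Printing describe_state(15) next to the raw value makes it obvious that a
consistency check is one of the candidates, which is all we needed to know
this morning.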
Either way, now that we know about this particular foible (we'd never
noticed it before owing to getting the emails saying soothing things like
"Check Consistency in progress" and "Check Consistency completed" and not
seeing the OMSA details due to my script not being verbose enough) we can
ignore the fact that every Monday our screens are going to go red for an
hour or so. Or I could modify my checking script to denote "Resynching" as
a yellow warning, not a red panic.
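A rough sketch of what that change might look like, in Python rather than
whatever your copy of the script is written in. The names colour_for,
in_check_window and healthy_states are hypothetical, and the two-hour window
is an assumption (ours takes "an hour or so"). Treating "Resynching" as
yellow only inside the scheduled window is one possible refinement, so that
a resync at any other time still goes red.

```python
from datetime import datetime, timedelta

RESYNCHING = 15                     # from dcstorag.mib
CHECK_WEEKDAY = 0                   # Monday (datetime.weekday() numbering)
CHECK_START_HOUR = 6                # consistency check starts at 06:00
CHECK_WINDOW = timedelta(hours=2)   # assumption: generous upper bound


def in_check_window(now):
    """True if 'now' falls inside the weekly consistency-check window."""
    start = now.replace(hour=CHECK_START_HOUR, minute=0, second=0, microsecond=0)
    return now.weekday() == CHECK_WEEKDAY and start <= now < start + CHECK_WINDOW


def colour_for(state, now, healthy_states):
    """Map a virtual disk state to a Big Brother/Hobbit colour.

    healthy_states is supplied by the caller, since the values for the
    healthy states are not shown in the MIB excerpt in this post.
    """
    if state in healthy_states:
        return "green"
    if state == RESYNCHING and in_check_window(now):
        return "yellow"   # expected weekly check: warn, don't panic
    return "red"
```

If you'd rather have "Resynching" yellow at all times, drop the
in_check_window() test from the second condition.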
So, yeah, just thought I'd mention this a) for those people who're already
using my script and who, having downloaded the newer version while using
Raidmon to initiate consistency checking, may be in for a surprise, and b) as
an opportunity for someone at Dell to tell me more about this particular
mode of behaviour/reporting and whether there are plans to modify the firmware
to add in a value for "Checking Consistency" or similar.
And also to advertise my shonky attempt at scripting for Big Brother/Hobbit
(-:
Ben
--
Unix Support, MISD, University of Cambridge, England
Plugger of wire, typer of keyboard, imparter of Clue
Life Is Short. It's All Good.