RAID controller errors

Michael_Christensen2 at Dell.com Michael_Christensen2 at Dell.com
Fri Mar 7 01:45:22 CST 2008


> Your system is fine, the degraded state is caused by the driver being
> outdated:
> 
> Driver Version                    : 00.00.03.01
> Minimum Required Driver Version   : 00.00.03.13

Ok, cool.  That's exactly what Edmond said.  However, I'm still not sure
what I should do to fix this.  Is it okay to just leave it like this, or
should I downgrade either the firmware or OMSA so that it returns to a
good state?

I don't like the idea of running this in a perpetual "Degraded" state,
even if there's no actual problem at the time, because if there IS a
problem at some point in the future then I'll be less likely to notice.
It's like always driving around with your Check Engine light on.  :-)


Well, it's up to you, but the "error" message isn't going anywhere
unless you update the driver.
Quite often you see these messages for a reason - the driver might not
be 100% compatible with the current controller firmware and you might be
missing out on some performance, a new feature or a fix, if you leave it
as is.
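
As a rough illustration only (this is not OMSA's actual logic, and the
function names are made up for the example), the check behind that message
amounts to comparing two dotted version strings field by field, using the
driver versions quoted above:

```python
def parse_version(text):
    """Split a dotted version such as '00.00.03.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_outdated(installed, minimum):
    """Return True when the installed driver is older than the required minimum."""
    # Tuples compare element by element, so (0, 0, 3, 1) < (0, 0, 3, 13).
    return parse_version(installed) < parse_version(minimum)


installed_version = "00.00.03.01"  # "Driver Version" from the report
minimum_version = "00.00.03.13"    # "Minimum Required Driver Version"

print(driver_is_outdated(installed_version, minimum_version))
```

Comparing the fields as integers matters here: compared as plain strings
the result happens to be the same for these two values, but it would go
wrong for something like "3.9" versus "3.13".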


>> "The Status, State, and Rate lines are what worry me.  They just stay
>> at this level.  The rebuild does not ever complete."
> 
> There is no rebuild going on. The rate entries refer to the amount of
> controller CPU power allocated to the specific tasks.

This also matches up to what Edmond described, but I still don't 
really get how this works.  You say that these entries refer to the 
amount of controller CPU power allocated to the specific tasks, but I 
have two problems with that statement:

1. This didn't show up (as far as I'm aware) until AFTER it started
reporting a Degraded state.  Prior to that, I'm pretty sure it reported
everything at 0%.  In fact, I'm very confident of this, because if I saw
that a rebuild or something was happening (or even wrongly assumed it
was), I never would've performed the PERC firmware update to begin with.
So, this is definitely new behavior.


This is the default setting for the controller, so I guess you must have
missed it.
The PERC 5/6 controllers actually allow you to update the firmware even
when the controller is in a degraded state (PERC 2/3/4 did not).
This can actually be required when troubleshooting such things as
foreign logical drives or failed logical drives.


2. If 30% of available controller CPU is allocated to each task, and
there are five tasks, then this comes up to 150% allocation.  Either my
math is way off, or I'm just not understanding what you mean by
allocated in this context (I'm leaning toward the latter...).


The rates are set for different "jobs" that will never be performed
simultaneously.
The controller won't run a background initialization (BGI) when doing a
rebuild, so the controller will never be taxed more than 30% total.
You will always have 70% controller CPU for "normal" I/O (unless you
change the setting, that is).
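
To put numbers on it, here is a small sketch of the arithmetic. The task
names are only placeholders (this thread names just rebuild and BGI), and
it assumes, as described above, that only one of these jobs runs at a time:

```python
# Per-task rate settings in percent of controller CPU.
# Names other than "rebuild" and "bgi" are assumed for illustration.
rates = {
    "rebuild": 30,
    "bgi": 30,
    "check_consistency": 30,
    "reconstruct": 30,
    "patrol_read": 30,
}

# Adding the rates up assumes all five jobs run at once, which they don't.
naive_total = sum(rates.values())

# Since only one job runs at a time, the worst case is the largest rate.
peak_background = max(rates.values())

# Whatever is left over stays available for normal I/O.
normal_io_floor = 100 - peak_background

print(naive_total, peak_background, normal_io_floor)
```

So the 150% figure only comes from summing caps that never apply at the
same time; the number that matters is the single largest rate.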


I do very much appreciate the feedback on this.  In case it's not
obvious, this is the first true server-class system that I'm responsible
for managing all the way down to the hardware level, and I'm learning a
lot of this stuff as I go, so I just want to make sure that I have a
clear picture of what's going on.


Trust me, you are waaaay ahead of the curve, since you are looking into
these issues before something goes wrong, which is sadly not the norm.


//Michael


