PERC3/Di failure workaround hypothesis

Sean Bruno - TELECOM sean.bruno at metro1.com
Sat May 22 16:12:01 CDT 2004


I am disabling the read and write caches on one machine right now.  I should
have the system back online in a few hours.  The failures seem to occur after
about 3-4 days of use under RH AS 3.  I will report back on Wednesday with
results.

BTW, what will the next step be if disabling the caches does indeed work
around this failure?  Will there be a new firmware version for the PERC3/Di?


On Sat, 2004-05-22 at 13:02, Matt Domsch wrote:
> On Sat, May 22, 2004 at 12:31:13PM -0700, Sean Bruno - TELECOM wrote:
> > OK.  I have two PE2650s right now that are exhibiting this issue.
> > Basically, they run for a few days and then "poof", they hard lock (no
> > direct console, no logging).
> > 
> > They are still pingable, but inaccessible.  I can execute your test
> > procedures, but what kind of feedback are you looking for?
> 
> With the RAID read and write caches disabled via afacli as in my note
> Thursday, does the system still hard lock as you describe?  If not,
> great, let us know that after a few days where you might have expected
> it to fail.  If so, can you attach a serial console as in Friday
> night's note and send the output from that, as well as what time you
> think the system crashed, and what you may have been running at the
> time, including cron jobs?
>  
> > BTW, I am running both machines under RH AS 3; the two drives are in a
> > standard RAID 1 configuration.
> 
> OK, RAID 1 seems to be the most likely to fail, so if the above keeps
> it from failing, that would be good to know.  Basically, we're
> trying to make sure that the workaround (disabling the caches) does in
> fact solve everyone's failure case, and that there isn't another
> failure mode we haven't reproduced and root-caused.
> 
> Thanks,
> Matt
-- 
Sean Bruno - TELECOM <sean.bruno at metro1.com>




More information about the Linux-PowerEdge mailing list