PERC3/Di failure workaround hypothesis
Sean Bruno - TELECOM
sean.bruno at metro1.com
Sat May 22 16:12:01 CDT 2004
I am disabling the READ/WRITE caches on a machine right now. I should
have the system online in a few hours. The failures seem to happen after
about 3-4 days of use with AS 3. I will report back on Wednesday with
BTW, what will be the action here if this does indeed "work-around" this
failure? Will there be a new version of the firmware for the 3Di?
On Sat, 2004-05-22 at 13:02, Matt Domsch wrote:
> On Sat, May 22, 2004 at 12:31:13PM -0700, Sean Bruno - TELECOM wrote:
> > O.k. I have two PE2650's right now that are exhibiting this issue.
> > Basically they run for a few days and then "poof" they hard lock (no
> > direct console, no logging).
> > They are still pingable, but inaccessible. I can execute your test
> > procedures, but what types of feedback are you looking for?
> With the RAID read and write caches disabled via afacli as in my note
> Thursday, does the system still hard lock as you describe? If not,
> great, let us know that after a few days where you might have expected
> it to fail. If so, can you attach a serial console as in Friday
> night's note and send the output from that, as well as what time you
> think the system crashed, and what you may have been running at the
> time, including cron jobs.
> > BTW, I am running both machines under RH AS 3; the two drives are in a
> > standard RAID 1 configuration.
> OK, RAID1 seems to be the most likely to fail, so if the above causes
> it not to fail, then that would be good to know. Basically, we're
> trying to make sure that the workaround (disabling the caches) does in
> fact solve everyone's failure case, and that there isn't another
> failure mode we haven't reproduced and root caused.
Sean Bruno - TELECOM <sean.bruno at metro1.com>