PERC3/Di failure workaround hypothesis

Matt Domsch Matt_Domsch at dell.com
Sat May 22 15:03:00 CDT 2004


On Sat, May 22, 2004 at 12:31:13PM -0700, Sean Bruno - TELECOM wrote:
> OK.  I have two PE2650's right now that are exhibiting this issue.
> Basically they run for a few days and then "poof" they hard lock (no
> direct console, no logging).
> 
> They are still pingable, but inaccessible.  I can execute your test
> procedures, but what types of feedback are you looking for?

With the RAID read and write caches disabled via afacli, as described in
my note on Thursday, does the system still hard lock as you describe?
If not, great: please let us know once it has run for a few days past
the point where you would have expected it to fail.  If so, can you
attach a serial console, as described in Friday night's note, and send
its output, along with the time you think the system crashed and what
was running at that point, including cron jobs?
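
If it helps to pin down the crash time, here is a minimal sketch of one
way to estimate it.  It assumes you have heartbeat timestamps recorded
on another machine (for example, a once-a-minute cron job logging to a
remote syslog host); that setup and the function name
estimate_crash_window are hypothetical and not part of the earlier
notes.  Ping replies are not a usable heartbeat here, since the locked
systems still answer pings.

```python
from datetime import datetime, timedelta


def estimate_crash_window(heartbeats, max_gap=timedelta(minutes=2)):
    """Return (last_seen, next_seen) around the first gap longer than max_gap.

    heartbeats: iterable of datetime objects collected off-box (hypothetical).
    If no such gap exists, next_seen is None and last_seen is the newest
    heartbeat; with no heartbeats at all, both values are None.
    """
    stamps = sorted(heartbeats)
    # Compare each heartbeat with its successor and stop at the first long gap.
    for earlier, later in zip(stamps, stamps[1:]):
        if later - earlier > max_gap:
            return earlier, later
    return (stamps[-1], None) if stamps else (None, None)
```

The lockup would then fall somewhere after last_seen, which narrows
down which cron jobs could have been running at the time.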
 
> BTW, I am running both machines under RH AS 3, the two drives are in a
> standard RAID 1 configuration.

OK, RAID1 seems to be the configuration most likely to fail, so if
disabling the caches stops the lockups, that would be good to know.
Basically, we're trying to make sure that the workaround (disabling the
caches) does in fact solve everyone's failure case, and that there isn't
another failure mode we haven't reproduced and root-caused.

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com