2650 + new BIOS + 2.6.10-ac11 and it *still* crashes

Eberhard Moenkeberg emoenke at gwdg.de
Wed Mar 16 16:46:08 CST 2005


Hi,

On Wed, 16 Mar 2005, Mark Plaksin wrote:
> Craig Kelley <ckelley at ibnads.com> writes:
>> Mark Plaksin wrote:
>>     Craig Kelley <ckelley at ibnads.com> writes:
>>         Mark Plaksin wrote:

>>             Do people have problems with 2650s and 2.4 kernels?  With 2650s and RHEL?
>>
>>         We have dozens of 2650s with RHEL3's 2.4 kernel running just fine.
>>
>>     Do these machines see a lot of IO?
>>
>> Yes, they run backup processes that transfer gigabytes of information nightly over
>> the network.  Additionally, the compression takes place on the RAID-5 array, which
>> is a lot of simultaneous read/write commands all grouped together.  All of this
>> while running a full-scale RDBMS commercial package.  They vary from SMP to UP
>> systems.  Hyperthreading is unilaterally turned off.
>
> Fascinating!  You seem to be the only one with rock-solid systems.  Are you
> running the latest RHEL3 kernel?  Which version is it?  Besides turning HT
> off, have you turned off read and write caching as Matt recommended a while
> back?
>
> Thanks for the feedback!

In my experience, only lots of simultaneous small writes crash the 
aacraid driver in a PE2650.

BTW: surprisingly, it seems that I can now survive these situations 
using the latest SUSE SLES-9 kernel with all controller caches ENABLED.
Only with them ENABLED.

My guess is that the Perc3/Di hardware only seems to be crap because the 
driver is.

But the case relevant for Dell support is this: if the aacraid driver has 
panicked, the PE2650 will not boot until you clear the "hidden" Perc3/Di 
NVRAM by going into the controller BIOS and doing an "initialize" (with any 
other single disk). The onboard "NVRAM clear" jumper does not help.

So software can make hardware unusable, and if your hardware is still 
unusable after a PowerOff/On cycle, you have the right to request a Dell 
technician. Even Bronze support will give you a "reaction" the next working 
day (which should be at least a technician with a new mainboard).

Do it again and again. I needed those brave guys 7 times before Dell 
support swapped the whole machine (the first of about 5 suspicious ones; 
we have about another 120 which have never shown these symptoms).

Cheers -e
-- 
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)