2650 + new BIOS + 2.6.10-ac11 and it *still* crashes

Steve_Boley@Dell.com Steve_Boley at Dell.com
Wed Mar 16 19:26:31 CST 2005

One thing you can try is to pull all drives from the backplane, go into
system setup, change from RAID to SCSI, reboot, and go through the
loss of data questions.

Then reboot and change it back to RAID, again answering the data loss
questions. At the end you will get the F1/F2 "no boot device" prompt. Shove
the drives back in, press Ctrl-Alt-Del, and see if all the problems are gone.

This procedure should completely dump and clear the nvram of the 3di and
be much easier than initializing drives which can be quite deadly to
your array if not performed properly.

The loss of data questions are there because when the 3di does an
initial boot after being enabled, it will find and perform an
initialization on every hard drive in the system, which would result in
totally whacking and removing your containers.  Doing this without
drives performs the same nvram reset.
Steve Boley
SCSI Advanced Solutions Team
Dell Incorporated 

-----Original Message-----
From: linux-poweredge-bounces-Lists On Behalf Of Eberhard Moenkeberg
Sent: Wednesday, March 16, 2005 4:46 PM
To: linux-poweredge-Lists
Subject: Re: 2650 + new BIOS + 2.6.10-ac11 and it *still* crashes


On Wed, 16 Mar 2005, Mark Plaksin wrote:
> Craig Kelley <ckelley at ibnads.com> writes:
>> Mark Plaksin wrote:
>>     Craig Kelley <ckelley at ibnads.com> writes:
>>         Mark Plaksin wrote:

>>             Do people have problems with 2650s and 2.4 kernels?  With
>>             2650s and RHEL?
>>         We have dozens of 2650's with RHEL3's 2.4 kernel running just
>>     Do these machines see a lot of IO?
>> Yes, they run backup processes that transfer gigabytes of information
>> nightly over the network.  Additionally, the compression takes place
>> on the RAID-5 array, which is a lot of simultaneous read/write
>> commands all grouped together.  All of this while running a
>> full-scale RDBMS commercial package.  They vary from SMP to UP
>> systems.  Hyperthreading is unilaterally turned off.
> Fascinating!  You seem to be the only one with rock-solid systems.  
> Are you running the latest RHEL3 kernel?  Which version is it?  
> Besides turning HT off, have you turned off read and write caching as 
> Matt recommended a while back?
> Thanks for the feedback!

In my experience, only lots of simultaneous small writes will crash
the aacraid driver in a PE2650.

BTW: it seems that I can survive these situations now, surprisingly,
using the latest SUSE SLES-9 kernel with all controller caches ENABLED.
Only with ENABLED.

My guess is that the Perc3/Di hardware only seems to be crap because
the driver is.

But the case relevant to Dell support is: if the aacraid driver has
panicked, the PE2650 will not boot until you clear the "hidden" Perc3/Di
NVRAM by going into the controller BIOS and doing an "initialize" (with any
other single disk). The onboard "NVRAM clear" jumper does not help.

So software can make hardware unusable, and if your hardware is unusable
even after PowerOff/On, you have the right to request a Dell technician.
Even Bronze support will give you a "reaction" by the next working day
(should be at least a technician with a new mainboard).

Do it again and again. I needed those brave guys 7 times before Dell
support swapped the whole machine (the first of about 5 suspect machines;
we have about another 120 which never showed these symptoms).

Cheers -e
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)

Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
Please read the FAQ at http://lists.us.dell.com/faq