PE2650 / Perc 3Di crash

James Bourne jbourne at mtroyal.ab.ca
Wed Aug 6 16:25:15 CDT 2003


On Wed, 6 Aug 2003, Salyzyn, Mark wrote:

> :-( Sorry for the false hope.
> 
> I've tried several experiments on this list thus far and I must thank you
> for permitting me this indulgence. It is a symptom of my failure to
> reproduce this issue.
> 
> It is rather hard to encapsulate any science when blind alleys go
> unreported, so bear with me on my understanding and please correct me if I
> am mistaken. Your movement to 2.4.22-pre10 does absorb some of the latest
> driver changes too, so I am left wondering what is next.

FYI, it's been almost 24 hours and I haven't seen even a single timeout.  This
is using the 2.4.22-pre10 aacraid driver with AAC_NUM_IO_FIB set to 64.  I think
Andy was using the 2.4.21 driver, which does have some differences...  I am
also currently running with the write cache on.

Then again, I'm still waiting to see whether it gets any timeouts.  Once
timeouts start, the server will eventually crash.

> It appears that turning Adapter Caching off (write through mode) resolves
> the crashes, perhaps an acceptable fix but trades performance. About a year
> ago it was discovered that a high level of outstanding commands (this
> AAC_NUM_IO_FIB issue) would also exasperate the controller, this also trades
> performance. There was mention of turning ACPI off also, I am unclear if
> that provided any joy to some of the members. Another thread based on
> Drives, and their Firmware revisions is starting as well and is not
> necessarily a blind alley. I know there were a few bugs in the driver
> discovered over the past 4 months that could also be the root of these
> problems when it comes to various corners of error recovery.
> 
> It is my opinion that there is no pristine bug, requiring some science to
> get to the root.
> 
> I look forward to any new clues to this class of issues.

Regards
James Bourne

> Sincerely -- Mark Salyzyn
> 
> -----Original Message-----
> From: Andy De Petter [mailto:adepette at skybel.net] 
> Sent: Wednesday, August 06, 2003 5:07 AM
> To: James Bourne
> Cc: Salyzyn, Mark; 'Matthias Pigulla'; linux-poweredge at dell.com;
> linux-aacraid-devel at dell.com; matt_domsch at dell.com
> Subject: Re: PE2650 / Perc 3Di crash
> 
> James Bourne wrote:
> 
> >On Tue, 5 Aug 2003, Salyzyn, Mark wrote:
> >  
> >Hi,
> >I've updated my 2.4.20 to aacraid driver from 2.4.22-pre10 and changed
> >AAC_NUM_IO_FIB to 64 in this build.  I'll also turn write caching back on
> >in the controller as we know that is required to cause a crash (or
> >at least I've not been able to crash the controller/driver without it on).
> >
> >Regards
> >James Bourne
> >  
> >
> 
> Running 2.4.21, with AAC_NUM_IO_FIB set to 64, and write cache disabled,
> didn't help here - crash within 2 hours ;-/
> 
> -a
> 
> 

-- 
James Bourne, Supervisor Data Centre Operations
Mount Royal College, Calgary, AB, CA
www.mtroyal.ab.ca

"There are only 10 types of people in this world: those who
understand binary and those who don't."

