kernel: aacraid: Host adapter reset request. SCSI hang ?

Stefano Turolla sturolla at eso.org
Wed Aug 27 05:28:00 CDT 2003


Hello,
We have the same problem with PowerEdge 1650 and 2650 servers.
We tried different kernel versions and Red Hat releases (7.3 and 9).
Kernels tried:
2.4.18-18.7.x
2.4.18-24.7.x
2.4.18-26.7.x
2.4.20-13.7
2.4.20-19.7
2.4.21.ac2-rc2
As a workaround I removed the RAID controller from some 1650s and reinstalled
the machines with only the SCSI interface connected. We haven't had a single
crash in the last month!

Besides, for most of our machines (1650) disabling hyperthreading makes no sense,
as they have one CPU (Pentium III from 1.4 to 1.7 GHz) with no hyperthreading,
of course.
On the other hand, we have other 2650s that have been running for two or three
months without problems, some of them with hyperthreading disabled.
A couple of other 2650s had several crashes when they were used as FTP servers.
I don't know what it really means, but it seems to be something not really
related to hyperthreading, but only to high sustained I/O.
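
For anyone who wants to check whether a machine shows the same signature, here
is a minimal sketch that counts adapter resets and I/O errors in a syslog file.
It assumes the messages look like the ones in Javier's log quoted below, with
each entry on a single line (the mail wrapped them). The function name
summarize_aacraid_log is only illustrative, not part of any tool.

```python
import re
from collections import Counter

# Patterns modelled on the log excerpt quoted in this thread (assumption:
# the real syslog has each message on one line, not wrapped as in the mail).
RESET_RE = re.compile(r"kernel: aacraid: Host adapter reset request")
IO_ERROR_RE = re.compile(
    r"kernel:\s+I/O error: dev ([0-9A-Fa-f]+:[0-9A-Fa-f]+), sector (\d+)"
)


def summarize_aacraid_log(lines):
    """Return (reset_count, {device: io_error_count}) for an iterable of lines."""
    resets = 0
    io_errors = Counter()
    for line in lines:
        if RESET_RE.search(line):
            resets += 1
            continue
        match = IO_ERROR_RE.search(line)
        if match:
            # Group 1 is the major:minor device pair, e.g. "08:03".
            io_errors[match.group(1)] += 1
    return resets, dict(io_errors)
```

It can be called with an open file object, for example on /var/log/messages
(the usual location on Red Hat, but adjust the path for your system). A
non-zero reset count followed by many I/O errors on the same device would
match the failure described in this thread.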

On Fri, 2003-08-22 at 13:07, Javier Rodriguez wrote:
> Hello,
>  
> Does anyone developing the aacraid driver have an update regarding the
> problem below? Disabling HyperThreading (Logical Processor) within the
> Dell 2650 BIOS has without a doubt circumvented the problem for us (as
> well as a few others), but it would be nice to reenable the feature.
>  
> For reference, with HyperThreading disabled, we've been able to
> successfully execute Red Hat's distribution of Linux kernel-2.4.20-9,
> kernel-smp-2.4.20-9, kernel-2.4.20-13.9, kernel-smp-2.4.20-13.9,
> kernel-2.4.20-18.9 and kernel-smp-2.4.20-18.9. We currently have two
> Dell 2650s executing kernel-smp-2.4.20-18.9 for 70 days without
> incident. Prior to disabling HyperThreading, our systems would
> normally crash within 24 hours (no longer than 48 hours) with both the
> smp and non-smp version of the kernel.
>  
> Thanks,
> Javier
>         -----Original Message-----
>         From: linux-aacraid-devel-admin at dell.com
>         [mailto:linux-aacraid-devel-admin at dell.com] On Behalf Of
>         Javier Rodriguez
>         Sent: Saturday, May 31, 2003 7:43 PM
>         To: linux-aacraid-devel at dell.com
>         Subject: kernel: aacraid: Host adapter reset request. SCSI
>         hang ?
>         
>         
>         Hello,
>          
>         We recently purchased two Dell PowerEdge 2650 servers with
>         PERC3/Di controllers. Both servers are executing RedHat Linux
>         9.0. On both servers we are encountering the following error:
>          
>         <<< Portion of server message log >>>
>         May 31 16:14:07 server1 kernel: aacraid: Host adapter reset
>         request. SCSI hang ?
>         May 31 16:14:17 server1 kernel: scsi: device set offline -
>         command error recover failed: host 0 channel 0 id 0 lun 0
>         May 31 16:14:17 server1 kernel: SCSI disk error : host 0
>         channel 0 id 0 lun 0 return code = 6000000
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         83200
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         13568
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         13616
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         83200
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         22030904
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         88348712
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         72976
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         13624
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         13752
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         13768
>         May 31 16:14:17 server1 kernel:  I/O error: dev 08:03, sector
>         72976
>         <<< I/O error messages continue until the server is rebooted
>         >>>
>          
>          
>         Here are a few notes regarding the error and operating
>         environment:
>          
>         - The error occurs with RedHat's kernel RPMs
>         kernel-smp-2.4.20-9 and kernel-smp-2.4.20-13.9. As of today,
>         we are testing kernel-2.4.20-9 to determine if the problem
>         occurs under a non-smp environment.
>         - The time between failures varies from several hours to
>         several days.
>         - The failures occur both during light and heavy system loads.
>         - PowerEdge 2650 BIOS is at 1.10 A10
>         - Backplane firmware is at 1.01
>         - PERC3/Di BIOS is at V2.7-1 (build 3170)
>         - A full system diagnostics has been successfully executed on
>         both servers.
>         - The RAID media has been successfully 'verified' on both
>         servers.
>          
>         Thank you in advance for your assistance in helping to get
>         this problem resolved.
>          
>         Javier
>          
>          
>         JLR Consulting, PO Box 638, Bernville, PA 19506-0638
>         mailto:jlr at jlrconsulting.com
>          
-- 
+------+---------+--------+--------+--------+---------+--------+-------+
| Stefano Turolla                             Phone : +49 89 32006537  |
| UNIX System Manager                         Fax   : +49 89 32006380  |
| European Southern Observatory (ESO):        E-Mail: sturolla at eso.org |
| Karl-Schwarzschild-strasse 2 D-85748 Garching bei Muenchen           |
+------+---------+--------+--------+--------+---------+--------+-------+
Computers are like airconditioners ,
they stop working properly if you open WINDOWS




