Fixing SCSI aacraid driver related FS crashes on Dell PowerEdge 2650 with Perc 3/Di running kernel 2.6.8 in Debian stable ?
Olivier Berger
olivier.berger at int-evry.fr
Tue Jun 13 07:41:19 CDT 2006
Hi.
I've again run into problems with the Perc 3/Di SCSI/RAID controller
on a PE 2650 running stock Debian stable's 2.6.8 (i686-smp) kernel...
I'd be happy to hear from others who can compare their experiences with
this setup, because I'm not at all sure I have fixed all the problems :(
I thought the machine was safe after I had upgraded to sarge and 2.6,
since I hadn't had any more crashes (I had also turned off the cache and
upgraded the firmware before that), until recently, when I had another
FS crash. This time, at least, I had a copy of the kernel messages saved
on another host on the network, which helped me find the patch mentioned
below via Google. :(
The last step I took was to patch the aacraid driver in Debian's 2.6.8
with aac-remove-handle-aif.patch... and it seems to have worked so far...
Thanks in advance for your comments and suggestions.
Best regards,
------------------
Here's a copy of my post on my blog
(http://www-inf.int-evry.fr/~olberger/weblog/2006/06/13/fixing-scsi-aacraid-driver-related-fs-crashes-on-dell-poweredge-2650-with-perc-3di-runing-kernel-268-in-debian-stable):
I’ve experienced random file-system crashes on a Dell server,
model PowerEdge 2650, with a Perc 3/Di SCSI controller, running a
Debian testing system with the standard 2.6.8 Debian kernel (i686+smp),
mainly during disk-intensive operations (for instance, I suspect such a
crash happened when amanda backup tasks were launched on the machine).
There have been numerous discussions on the linux-poweredge mailing-list
and many proposals for fixing this issue (see details on google).
The symptoms look like this :
Jun 9 20:52:58 myhost kernel: aacraid: Host adapter reset request. SCSI hang ?
Jun 9 20:52:58 myhost kernel: aacraid: Host adapter reset request. SCSI hang ?
Jun 9 20:52:58 myhost kernel: aacraid: SCSI bus appears hung
Jun 9 20:52:58 myhost kernel: aacraid: SCSI bus appears hung
Jun 9 20:52:58 myhost syslogd: /var/log/messages: Read-only file system
Jun 9 20:52:58 myhost kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Jun 9 20:52:58 myhost kernel: SCSI error : <0 0 0 0> return code = 0x6000000
Jun 9 20:52:58 myhost kernel: end_request: I/O error, dev sda, sector 401836233
Jun 9 20:52:58 myhost kernel: scsi0 (0:0): rejecting I/O to offline device
Jun 9 20:52:58 myhost kernel: scsi0 (0:0): rejecting I/O to offline device
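To check whether a saved log shows the same symptoms, a small shell function can count the aacraid hang markers. This is only a rough sketch: the helper name count_aacraid_hangs and the default log path are my own choices for illustration, and the patterns are simply the two message strings from the excerpt above.

```shell
#!/bin/sh
# Count the aacraid hang markers in a syslog-style file.
# count_aacraid_hangs is a hypothetical helper name, not part of any tool;
# the two patterns are copied from the log excerpt above.
count_aacraid_hangs() {
    logfile="${1:-/var/log/messages}"   # default path is an assumption
    # -c prints the number of matching lines (0 if there are none)
    grep -c -E 'aacraid: (Host adapter reset request|SCSI bus appears hung)' "$logfile"
}
```

For example, count_aacraid_hangs /var/log/messages prints the number of matching lines. Since the file system goes read-only during the crash, it is probably more useful to run it against a copy of the log kept on another host.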
I think I have come closer than ever to a solution, applying the
following steps :
1. upgrading the firmware of the Perc 3/Di controller : look at the
Dell site for the right version…
2. disabling the cache with afacli :
# afacli
open AFA0
AFA0 container set cache /read_cache_enable=FALSE /write_cache_enable=FALSE 0
AFA0 container show cache 0
Executing: container show cache 0
Global Container Read Cache Size : 0
Global Container Write Cache Size : 118259712
Read Cache Setting : DISABLE
Write Cache Setting : DISABLE
Write Cache Status : Inactive, cache disabled
3. patching the 2.6.8 aacraid driver’s code with the following
patch : aac-remove-handle-aif.patch, to avoid taking the
controller offline in some circumstances (see the explanation in
this post :
http://marc.theaimsgroup.com/?l=linux-scsi&m=110252243627410&w=2).
1. get the kernel-source-2.6.8 package from stable
2. unpack it and apply patch
3. get the running (uname -r) kernel’s .config from /boot
and copy it to /usr/src/kernel-source-2.6.8/
4. make-kpkg clean
5. make oldconfig
6. make-kpkg --append_to_version=patchaacremovehandleaif --initrd kernel_image
7. install resulting kernel, and reboot
4. pray ;)
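The rebuild steps of item 3 could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe: the patch location (PATCH), the -p1 strip level, the tarball path under /usr/src and the name of the resulting .deb are my guesses, and the helper names run and rebuild_patched_kernel are made up for illustration. By default it only prints the commands; set DRY_RUN=0 to really execute them (as root).

```shell
#!/bin/sh
# Sketch of the kernel rebuild steps listed above.
# Assumptions (not stated in the steps): PATCH location, -p1 strip level,
# tarball path, and the glob used for the resulting kernel-image package.
PATCH="${PATCH:-$HOME/aac-remove-handle-aif.patch}"
SRC=/usr/src/kernel-source-2.6.8
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_patched_kernel() {
    # kernel-package provides make-kpkg
    run apt-get install kernel-source-2.6.8 kernel-package &&
    run tar xjf "$SRC.tar.bz2" -C /usr/src &&
    run cd "$SRC" &&
    run patch -p1 -i "$PATCH" &&
    # reuse the configuration of the currently running kernel
    run cp "/boot/config-$(uname -r)" .config &&
    run make-kpkg clean &&
    run make oldconfig &&
    run make-kpkg --append_to_version=patchaacremovehandleaif --initrd kernel_image &&
    run dpkg -i ../kernel-image-2.6.8patchaacremovehandleaif*.deb
}
```

Calling rebuild_patched_kernel with the default DRY_RUN=1 just lists the nine commands, which makes it easy to compare them with the steps above before running anything. The reboot is deliberately left out so that it stays a manual step.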
The machine had worked almost OK once it was running Debian’s 2.6.8 kernel
with the cache disabled and the firmware upgraded, but it finally crashed again…
I hope that the patch against the aacraid driver will solve the issue.
--
Olivier BERGER <olivier.berger at int-evry.fr>
Ingénieur Recherche - Dept INF
INT Evry (http://www.int-evry.fr)
OpenPGP-Id: 1024D/6B829EEC