Stability problems with PERC 3/Di on PE 1650 & kernel 2.6.12.6
Robert McQueen
robert.mcqueen at bluelinux.co.uk
Tue Aug 8 13:40:55 CDT 2006
I've got an SMP PowerEdge 1650 server running with a 2.6.12.6-xen
kernel. I have a PERC 3/Di RAID controller with three 36GB disks
attached to it, configured as a RAID 5 array. The controller BIOS is
flashed with the latest firmware (build 6098) that I can find for the PE
1650 on support.dell.com.
The controller prints the following at bootup:
Dell PowerEdge Expandable RAID Controller 3/Di, BIOS V2.8-1 [Build 6098]
(c) 1998-2003 Adaptec, Inc. All Rights Reserved.
Controller monitor V2.8-1[6098], Controller kernel V2.8-1[6098]
And the driver prints the following:
Red Hat/Adaptec aacraid driver (1.1.2-lk2 Mar 7 2006)
AAC0: kernel 2.8-1[6098]
AAC0: monitor 2.8-1[6098]
AAC0: bios 2.8-1[6098]
AAC0: serial 74d010d3
Intermittently (every week or two, possibly under high load, though I
can't be sure because I only get to look at the system after the fact),
the following messages are printed:
aacraid: Host adapter reset request. SCSI hang ?
aacraid: SCSI bus appears hung
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 7816370
After this, all further I/O requests fail with the following message
until I reboot the machine:
scsi0 (0:0): rejecting I/O to offline device
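To pin down when the hang starts (and so whether it coincides with high
load), the messages above could be matched against saved kernel log
output. What follows is only a rough Python sketch: the patterns are
copied from the messages quoted in this post, while the function name
and the signature labels are hypothetical and chosen for illustration.

```python
import re

# Message fragments copied from the log excerpts quoted in this post.
# The label attached to each pattern is an assumption for illustration.
SIGNATURES = {
    "adapter_reset": re.compile(r"aacraid: Host adapter reset request"),
    "bus_hung": re.compile(r"aacraid: SCSI bus appears hung"),
    "device_offlined": re.compile(r"scsi: Device offlined"),
    "io_rejected": re.compile(r"rejecting I/O to offline device"),
}


def find_hang_events(lines):
    """Return (line_number, label, text) for each line matching a signature."""
    events = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                events.append((number, label, line.rstrip("\n")))
                break  # one label per line is enough
    return events
```

Passing it the lines of a saved log (for example an open file object)
gives the line numbers where the failure sequence begins, which can then
be compared against whatever load data is available for the same period.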
Googling turns up suggestions involving write caches, firmware updates
and driver patches, but these all seem to be very old posts and I don't
know how relevant they still are. Does anyone have current advice for
stabilising this configuration, short of backing up all the data,
removing the RAID controller entirely and using software RAID?
Thanks,
Rob
--
Robert McQueen
Bluelinux Internet Services Ltd.