PE 2650 Perc3/DI aacraid Kernel 2.6

Dominik L. Borkowski dom at vbi.vt.edu
Thu Jul 7 14:30:09 CDT 2005


On Thursday 07 July 2005 11:11 am, Andy Loftus wrote:
> The only thing I've found that works is to run the UP kernel.  Seems
> only SMP kernel is affected.
>
> There was a note on this list to turn off hyper-threading in the BIOS;
> I have not tried this.
>
> Another suggestion was to contact your Dell support representative for
> a replacement Perc4 card.
>
> Running Fedora Core 3 (kernel: 2.6.11-1.35_FC3) on PE2650 with
> Perc3/Di.

The more I read, the more fingers point at Dell's firmware. The gist of it 
seems to be:

1) The more I/O, the more likely this bug is to occur.
2) The faster the machine, the more likely it is as well. This would probably 
also account for the UP vs. SMP difference.
3) I found these, which describe in more detail what happens:

http://marc.theaimsgroup.com/?l=linux-scsi&m=110124885402226&w=2
and:
http://bugme.osdl.org/show_bug.cgi?id=3651

The suggested solution, extracting the Adaptec driver and using it to replace 
the aacraid source in the kernel tree, doesn't work for the 2.6.12 kernel 
[undefined references, etc.].

I somehow doubt that moving from a 2.4.x kernel to a 2.6.x one would give 
such a large performance boost that this bug would be hit more often, so 
point #2 probably doesn't apply here.

I also can't find any changelogs describing exactly what was 'fixed' in the 
aacraid driver, although the code definitely changed between 2.6.11.12 and 
2.6.12. In my case, 2.6.11.12 didn't last even a few hours, while 2.6.12 
lasts about a week or two before the container goes off-line.

To make it more frustrating, it seems that:

a) Adaptec points at Dell's firmware
b) Dell's latest firmware doesn't fix the issue
c) Dell supports only Red Hat, and supposedly RHES4 works just fine [I checked 
the kernel: RHES4 uses a heavily patched 2.6.9 and carries two patches for 
aacraid, neither of which seems to be the one addressing this issue]

Let's sum up all the possible fixes [just in case somebody else finds it 
useful]:

1) Run a 2.4.x kernel. In almost every case I've seen, including all of my own 
servers, 2.4.x kernels work just fine.
2) Run a UP kernel. I haven't tested it, but personally I'd rather run a 2.4.x 
kernel than a UP 2.6.x one.
3) Try the Adaptec drivers, which didn't work with the 2.6.12 kernel [undefined 
references].
4) Run RHES4. Not an option for some of us.
5) Exchange the controller. I'm going to see how well this is received by 
our Dell rep, since I have at least a dozen of those 2650s...


Thanks to everybody for suggestions. 

sincerely,
-- 
Dominik L. Borkowski - Systems Administrator
Virginia Bioinformatics Institute - www.vbi.vt.edu



More information about the Linux-PowerEdge mailing list