My experiences with six new Dell 2650's and RedHat Linux 7.2

Russo, Ben Ben.Russo at tnsi.com
Tue Aug 6 15:10:00 CDT 2002


I got six Dell 2650s a few weeks ago and have been setting them up and
playing with them before they become production servers.

I used the ROMB to set up a single RAID group and hot spare on each
server, then installed RedHat 7.2 on all of them using a ks.cfg on a
floppy disk for an unattended install.

Just don't do the X configuration during the install. It will hang the
display and you will have to restart the install. After you install 7.2
you can run Xconfigurator and it will work just fine, but don't do it
during the install process!

The only special thing I had to do was the
"noprobe aacraid_pciid=0x1028......."  thing from the Dell release
notes, to get the install procedure to recognize the RAID container
as /dev/sda.

However, I ran up2date and there are newer RedHat kernels available
that supposedly fix some rare problems.  I would like to avoid ever
encountering a problem, so I thought I would upgrade to 2.4.9-34.
But the 2.4.9-34 kernel RPM from RedHat updates has an aacraid module
that does not like parm_aacraid_pciid.  I also found that there is no
bcm5700 module in the modules dir for the standard RedHat kernel.

Is there a way to take the standard RedHat kernel source and apply
the patches just for aacraid and bcm5700?  If so, where would I get
those patches?  Or better yet, does Dell have the newer stable kernels
in binary RPM form somewhere?

Also, just like everybody else, I had problems with the Broadcom
5701 NICs on my RedHat 7.2 boxes.  They would occasionally report
that the link was down and would then require a reboot to get
working again.  I tried changing the auto-negotiate, speed and
duplex settings on the switch and in /etc/modules.conf, to no avail.
I found that I could reproduce the problem on all 6 of the servers
by doing "ifconfig eth0 down" followed by "ifconfig eth0 up"
a few times.
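
For anyone who wants to script that down/up test rather than type it by
hand, here is a rough Python sketch. It is only an illustration: the
function name, the cycle count and the pause are assumptions, not part
of the original test, and it needs root to touch a real interface. It
does not detect the link-down state itself, so watch the console or
syslog while it runs.

```python
import subprocess
import time


def cycle_interface(iface="eth0", cycles=5, pause=2.0, run=subprocess.call):
    """Bring iface down and up `cycles` times; return the commands issued.

    `run` is injectable so the loop can be tried out without touching a
    real interface. The default runner calls ifconfig and needs root.
    """
    issued = []
    for _ in range(cycles):
        for state in ("down", "up"):
            cmd = ["ifconfig", iface, state]
            run(cmd)            # same command as typed by hand
            issued.append(cmd)
            time.sleep(pause)   # give the driver a moment between changes
    return issued
```

Calling cycle_interface("eth0") as root repeats the sequence five times
with a two second pause between commands.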

I ended up disabling the Broadcom adapters in the BIOS and putting
in a PCI 3Com card that works with no problems.  But it is a shame to
waste those two onboard NICs and have to use a PCI slot for no
good reason.  Then I had a problem with the PCI NIC, but only on the
2650s with dual processors.  I checked /proc/interrupts and found that
aacraid and eth0 were both using the same interrupt.  So I made sure
that kudzu was set to on with chkconfig, went into the BIOS and
disabled serial port 2, USB and the Broadcom NICs, and rebooted.
When it came up, kudzu reassigned the plug-n-pray IRQs for the
PCI NICs.
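
The shared-interrupt check on /proc/interrupts can also be automated.
The Python sketch below is an assumption about how one might do it, not
something from the Dell or RedHat tools: it treats the first line as the
CPU header, skips the per-CPU counters and the controller-type column,
and reports any IRQ whose device list has more than one comma-separated
name. The function name is made up for illustration.

```python
def shared_irqs(text):
    """Return {irq: [devices]} for IRQs used by more than one device.

    `text` is the content of /proc/interrupts. The layout assumed here is
    a header of CPU names, then rows of the form
    "IRQ: count... controller-type dev1, dev2".
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row looks like "CPU0 CPU1 ..."
    shared = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue
        # ncpu counters, one controller-type field, then the device list
        fields = rest.split(None, ncpu + 1)
        if len(fields) < ncpu + 2:
            continue  # summary rows such as NMI or ERR have no device list
        devices = [d.strip() for d in fields[-1].split(",") if d.strip()]
        if len(devices) > 1:
            shared[irq.strip()] = devices
    return shared
```

On a live box it would be called as
shared_irqs(open("/proc/interrupts").read()); an empty result means no
interrupt is listed with more than one device.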

Now all is well, but I have really nice servers where I have to
disable many features and use other NICs....  sigh.

I got the RAC working just fine through a web browser, but only with
MS-IE v6 using the MS-VM (with the Sun JRE it was very slow and crashed
the browser every time).  It would be nice if Dell put SSH2 and
TightVNC on them and forgot about the web interface.

The serial console redirection worked OK with TeraTerm, especially if
I told TeraTerm to use precisely 80x24, VT100, 9600 8n1, no flow
control, with full color fonts.

I even edited /boot/grub/grub.conf to get rid of the splash image
and told the kernel to use a serial console.  Then it worked all the
way through boot, except that curses apps don't work well, and when the
boot process gets to the section where it shows all the SysV init
scripts and their status, the terminal would stop working until the
login prompt came up.  But that isn't that bad.

The only inconsistent problem I found was when playing with afacli
while the OS was running.  I slapped a new hard disk into one of the
boxes and issued a "controller rescan" command.  The box then locked
up, the disks had flashing yellow lights, and the LCD panel said that
the ROMB was having an error.  However, I removed power from the box
for a few minutes, then plugged it back in, and everything was OK
(hurray for journalling file systems).

I would say, though, that you should always configure RAID boxes with
hot spares that have auto-rebuild configured, and whenever possible
take an outage window to swap disks while the server is down.  I have
seen EMC, NetApp, HP, DG, and other RAID systems that have had similar
problems with hot swapping (always very, very rare problems, but still
problems).

-Ben.