Re: 2650 + new BIOS + 2.6.10-ac11 and it *still* crashes

Robert Goley ragoley at rdasys.com
Tue Mar 15 17:50:03 CST 2005


From the reports I have heard on this list about this problem, the cause
was a particular I/O pattern produced by certain types of writes that occur
mainly with the ext3 filesystem (this may be old information). Has anyone
had better success using ReiserFS, JFS, or XFS? I know none of these are
available to Red Hat users by default, but what about other distros? I am
looking at upgrading a Red Hat 7.3 PE 2650 and have been dreading it
because of the problems mentioned on this list. It will be running Debian
(Sarge) with a 2.6.x kernel.

Robert

On Tue, 2005-03-15 at 18:15, Matthias Pigulla wrote:
> Hey all,
>  
> > Yes, we have been having problems with 2650's and RHEL.  
> Good to know it's RHEL :)
> 
> > The server remains alive to ICMP pings.  If you portscan it, 
> > it shows that ports are open.  But if you attempt to connect 
> > to any of the services running on the machine, you get 
> > nothing. 
> 
> For us, the server becomes totally unresponsive, no pongs.
> 
> > You can't get a console prompt, either through a monitor, or 
> > through a serial connection.
> 
> ... and not through the ERA remote console either; then again, that is
> basically a monitor :)
> 
> > The kicker has been that absolutely nothing is ever logged.  
> > No oops, panic, or warning.  When you reboot the server 
> > (thank goodness for RAC cards), and go back and comb through 
> > the logs, there's absolutely nothing logged to indicate a 
> > problem. 
> 
> Same for us, most of the time. Only in rare cases do the well-known "scsi
> ... timeout ... hang..." messages make it from the box to a remote
> loghost (via the network!).
> 
> > Nothing in ESM either.
> 
> We find "Event: Drive [0, 2 or 3] drive slot sensor drive fault
> detected" in the ERA log; ERA also sends these events by e-mail to the
> admin address.
> 
> > We are running RHEL 3, Update 4. 
> 
> Debian woody here.
> 
> > Several things are common to these servers: most are attached to a
> > Dell/EMC CX300 SAN with single QLogic 2340 HBAs, they run iptables,
> > and we use the bonding driver with the tg3 NIC driver in active-backup
> > mode. The kernel is either 2.4.21-27.0.1smp or 2.4.21-27.0.2smp, and
> > OpenManage is installed. BIOS and firmware are current.
> 
> Nothing special here; BIOS and firmware are up to date. The kernel is a
> standard 2.4.27 with aacraid 1.1.5 (from Adaptec).
> 
> This is a single-CPU box; disabling hyperthreading did not help. Crashes
> almost always occur (there is only one exception I can remember) while
> Legato NetWorker is making backups. Mounting filesystems with noatime
> improved the situation a little (crashes are less frequent); I presume
> that's simply because it reduces the I/O load.
> 
> Mark Salyzyn from Adaptec mentioned that the disk make/firmware might
> play a role; we have 4 x QUANTUM ATLAS10K3_18_SCA Rev 120G here. Maxtor
> support (Maxtor took over Quantum's disk business) says there is no more
> recent firmware.
> 
> Replacing four disks costs about as much as replacing the controller (in
> terms of $$$, time, and the work needed to rebuild the array), but the
> latter probably has a greater effect... You are not alone :)
> 
> Best regards,
> Matthias
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
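
The two symptoms described in this thread differ in an important way: the
original poster quoted by Matthias sees a box that still answers pings and
shows open ports in a port scan, while Matthias sees a total hang with no
pongs. On Linux the kernel completes the TCP handshake and queues the
connection in the listen backlog even if the userspace process never calls
accept(), so "port open" does not prove a service is alive. A minimal
Python sketch that separates the handshake from the application response
follows; it is not from this thread, the function name probe() and its
result labels are hypothetical, it assumes a server-speaks-first protocol
such as SSH or SMTP, and ICMP reachability still has to be checked
separately (e.g. with ping).

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP service by how far a connection gets (hypothetical helper).

    Returns one of:
      "refused"     - the host sent a RST: it is up, but nothing listens on the port
      "unreachable" - no handshake completed (timeout, routing or name-resolution error)
      "silent"      - handshake completed, but no data arrived within `timeout`
      "closed"      - handshake completed, then the peer closed or reset without data
      "banner"      - the service sent at least one byte

    Assumes a server-speaks-first protocol (e.g. an SSH or SMTP greeting);
    for client-speaks-first protocols such as HTTP, "silent" is normal.
    """
    try:
        # create_connection() also applies `timeout` to later recv() calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"
    with sock:
        try:
            data = sock.recv(1)
        except socket.timeout:
            # The connection was accepted at the TCP level (typically by the
            # kernel's listen backlog), but the application never answered.
            return "silent"
        except OSError:
            # e.g. connection reset by the peer
            return "closed"
    return "banner" if data else "closed"
```

For example, probe("server.example.com", 22) (a hypothetical host name)
returning "silent" while the machine still answers pings matches the first
symptom; "unreachable" on every port, together with failed pings, matches
the total hang Matthias describes.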
