Returning Network stability problems on R710 servers and BCM5709
Carlson, Timothy S
Timothy.Carlson at pnl.gov
Wed Feb 3 08:30:34 CST 2010
I've moved away from the RHEL/CentOS driver and have gone directly to the bnx2 driver from Broadcom.
dmesg | grep bnx
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.9.20b (July 9, 2009)
bnx2: eth0: using MSI
That driver seems stable for me. I was seeing things similar to your problem, and this driver fixed them right up.
You'll need to download that driver and rebuild it from the SRPM. You'll also need to rebuild the driver after each kernel update, which is a pain.
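Since a kernel update may bring the distribution's bundled driver back until the rebuild is repeated, it can help to check the driver banner after each reboot. The following is only a minimal sketch, assuming the banner format shown in the dmesg output above; the function names are hypothetical and not part of any Broadcom or Red Hat tooling.

```python
import re

# Assumed banner format, taken from the dmesg line quoted above:
#   "Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.9.20b (July 9, 2009)"
BANNER = re.compile(r"Driver bnx2 v(\S+)")


def banner_version(dmesg_text):
    """Return the bnx2 version from the most recent driver banner, or None."""
    versions = BANNER.findall(dmesg_text)
    return versions[-1] if versions else None


def is_expected_driver(dmesg_text, expected):
    """True when the loaded bnx2 driver reports exactly the expected version."""
    return banner_version(dmesg_text) == expected
```

To use it, pass the text printed by `dmesg` as the first argument and the version you rebuilt (for example "1.9.20b") as the second.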
From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-bounces at dell.com] On Behalf Of James Sparenberg
Sent: Wednesday, February 03, 2010 1:08 AM
To: linux-poweredge at dell.com
Subject: Returning Network stability problems on R710 servers and BCM5709
I'm referencing an earlier thread from last Sept.
In it there was a discussion of stability problems with the Broadcom BCM5709 on a Dell r610, where new connections would lose connectivity while existing connections, or all connections using a different protocol, still passed.
For example, just now I lost the ability to ping eth0 or get NIS authentication on that IP, and I also lost the ability to get TFTP connections via the eth1 address. However, at the same time DHCP is running against eth1, and SNMP, NTP, and HTTP over port 10000 (Webmin) were merrily working quite well on eth0.
OS CentOS 5.4
kernel 2.6.18-164.10.1.el5 SMP x86_64
Kernel module bnx2
description: Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author: Michael Chan <mchan at broadcom.com>
vermagic: 2.6.18-164.10.1.el5.centos.plus SMP mod_unload gcc-4.1
parm: disable_msi:Disable Message Signaled Interrupt (MSI) (int)
parm: enable_entropy:Allow bnx2 to populate the /dev/random entropy pool (int)
So you can see that the version I have exceeds the version said to be stable in the prior thread. BTW, this chassis is about one month old, so it should (unverified) have the latest BIOS.
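Comparing driver versions by eye is easy to get wrong, because a plain string comparison ranks "1.9.20b" below "1.9.3". A minimal sketch of a numeric comparison follows, assuming bnx2 version strings are dot-separated numbers with an optional trailing letter (as in the "1.9.20b" quoted in the reply). The helper names are hypothetical, and "1.9.3" is used purely as an illustrative value, not as the version named in the earlier thread.

```python
import re


def version_key(version):
    """Turn '1.9.20b' into ((1, 9, 20), 'b') so versions sort numerically."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)([a-z]*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    # Numbers and letter suffix stay in separate slots so that versions
    # with different numbers of components can still be compared.
    return (numbers, match.group(2))


def exceeds(installed, reference):
    """True when the installed version is strictly newer than the reference."""
    return version_key(installed) > version_key(reference)
```

With this, `exceeds("1.9.20b", "1.9.3")` is true even though the strings themselves sort the other way.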
Ironic part: the same model, running the same version and kernel of CentOS (kickstart install, so all my boxes are the same), is running load tests in our lab, pushing millions of sessions and billions of packets (soaking four 1G NICs) without a hitch while testing out equipment. Yet this box, which has relatively low throughput, is the one that locks up.
Any thoughts or suggestions would be appreciated. So far there is nothing in the normal logs, so I'm going to turn on some additional logging.
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
Please read the FAQ at http://lists.us.dell.com/faq