Returning Network stability problems on R710 servers and BCM5709

James Sparenberg james at stoke.com
Wed Feb 3 03:07:37 CST 2010


All,

   I'm referencing an earlier thread from last Sept.

http://lists.us.dell.com/pipermail/linux-poweredge/2009-September/040252.html

   In it there was a discussion of stability problems with the Broadcom BCM5709 on a Dell R610, in which new connections would fail while existing connections, or all connections over a different protocol, kept passing traffic.

For example, just now I lost the ability to ping eth0 or get NIS authentication on that IP, and I also lost the ability to get TFTP connections via the eth1 address.  However, at the same time DHCP is running against eth1, and SNMP, NTP, and HTTP over port 10000 (webmin) were merrily working quite well on eth0.

OS CentOS 5.4 

kernel   2.6.18-164.10.1.el5 SMP x86_64 
Kernel module bnx2

modinfo output 

filename:       /lib/modules/2.6.18-164.10.1.el5.centos.plus/kernel/drivers/net/bnx2.ko
version:        1.9.3
license:        GPL
description:    Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author:         Michael Chan <mchan at broadcom.com>
srcversion:     1040A42F87B8BE8A019736C
alias:          pci:v000014E4d0000163Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Bsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Asv*sd*bc*sc*i*
alias:          pci:v000014E4d00001639sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ACsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv0000103Csd00003102bc*sc*i*
alias:          pci:v000014E4d0000164Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003106bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003101bc*sc*i*
depends:
vermagic:       2.6.18-164.10.1.el5.centos.plus SMP mod_unload gcc-4.1
parm:           disable_msi:Disable Message Signaled Interrupt (MSI) (int)
parm:           enable_entropy:Allow bnx2 to populate the /dev/random entropy pool (int)
module_sig:     883f3504b47af9bd3b84a368dd51f2112b6b90a0ed1bac15e1b94720602336594dc65775db83c460991575cc8694cf9c03aca6e623e0950281e5094

So you can see that the version I have exceeds the version said to be stable in the prior thread.  BTW, this chassis is about 1 month old, so it should have the latest BIOS, though I have not verified that.
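
For anyone who wants to run the same version check across several boxes, here is a minimal sketch in Python. It assumes the `modinfo bnx2` output has the "key: value" layout pasted above and that the driver version is purely numeric and dotted (as 1.9.3 is). The function names are my own, and the minimum version is whatever the prior thread named as stable; I have not hard-coded one.

```python
def parse_modinfo(text):
    """Parse `modinfo` output into a dict mapping each key to a list of values.

    Keys such as 'alias' and 'parm' repeat, so every key gets a list.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not in "key: value" form
        info.setdefault(key.strip(), []).append(value.strip())
    return info


def version_tuple(version):
    """Turn a dotted numeric version such as '1.9.3' into (1, 9, 3).

    Assumption: no letter suffixes; a version like '1.9.3b' would raise ValueError.
    """
    return tuple(int(part) for part in version.split("."))


def driver_at_least(modinfo_text, minimum):
    """Return True if the 'version' field is >= the given minimum version."""
    info = parse_modinfo(modinfo_text)
    return version_tuple(info["version"][0]) >= version_tuple(minimum)
```

Feeding it the captured output of `modinfo bnx2` from each host, together with the version from the earlier thread, gives a quick yes/no per box. Comparing as tuples of integers matters because a plain string comparison would rank 1.10.0 below 1.9.3.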

The ironic part: the same model, running the same version and kernel of CentOS (kickstart install, so all my boxes are the same), is doing load testing in our lab, pushing millions of sessions and billions of packets (soaking 4 1G NICs) without a hitch while testing out equipment.  Yet this box, which has relatively low throughput, is the one that locks up.

Any thoughts or suggestions would be appreciated.  So far there is nothing in the normal logs, so I'm going to turn some additional logging on.
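
Since the failure only hits new connections, one way to pin down when it starts is to attempt a fresh TCP connection to each service at intervals and log the result with a timestamp. The following is a rough sketch, assuming Python 3 on a separate monitoring host. The function names and the target names are hypothetical, and it covers TCP services only (such as webmin on port 10000), not the ICMP or UDP cases (ping, TFTP).

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Return True if a brand-new TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def probe_all(targets, timeout=2.0):
    """targets maps a label to a (host, port) pair; returns label -> bool."""
    return {name: probe_tcp(host, port, timeout)
            for name, (host, port) in targets.items()}


def format_line(results, now=None):
    """Build one timestamped log line, e.g. '2010-02-03 03:07:37 webmin=up'."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    status = " ".join("%s=%s" % (name, "up" if ok else "DOWN")
                      for name, ok in sorted(results.items()))
    return "%s %s" % (stamp, status)
```

Run from cron or a simple loop with a dict such as `{"webmin": ("<eth0 address>", 10000)}` and appended to a file, the lines can then be matched against whatever the additional kernel logging shows at the time of a lockup.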

James Sparenberg


