Returning Network stability problems on R710 servers and BCM5709

Narendra_K at Dell.com Narendra_K at Dell.com
Thu Feb 4 03:25:18 CST 2010


Hello James,

It would be great if you could provide a few details about this box,
which loses connectivity under relatively low traffic. What network
applications is this server running? Any details about the network
activity on this server would help. Is this machine part of a
cluster?

Also, it would help if you could share the output of "ethtool -i eth0"
from both the failing and the working boxes.
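
To make the comparison between the two boxes easier, the "ethtool -i"
output can be parsed and diffed field by field. The following is only a
minimal sketch in Python; the function names are made up for illustration,
and it assumes the usual "key: value" layout that "ethtool -i" prints.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI
        # bus address (which contains colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_info(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def ethtool_info(iface="eth0"):
    """Run `ethtool -i <iface>` on the local machine and parse it."""
    result = subprocess.run(
        ["ethtool", "-i", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_info(result.stdout)
```

Saving the output from each box to a file and passing both through
parse_ethtool_info() and then diff_info() would show at a glance whether
the driver or firmware versions differ between the failing and working
machines.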


With regards,
Narendra K



-----Original Message-----
From: linux-poweredge-bounces at dell.com
[mailto:linux-poweredge-bounces at dell.com] On Behalf Of James Sparenberg
Sent: Wednesday, February 03, 2010 1:08 AM
To: linux-poweredge at dell.com
Subject: Returning Network stability problems on R710 servers and
BCM5709

All,

   I'm referencing an earlier thread from last Sept.

http://lists.us.dell.com/pipermail/linux-poweredge/2009-September/040252.html

   In it there was a discussion of stability problems with the
Broadcom BCM5709 on a Dell R610, where new connections would lose
connectivity while existing connections, or all connections over a
different protocol, still passed.

For example, just now I lost the ability to ping eth0 or get NIS
authentication on that IP, and I also lost the ability to get TFTP
connections via the eth1 address. However, at the same time DHCP was
running against eth1, and SNMP, NTP, and HTTP over port 10000 (webmin)
were merrily working quite well on eth0.
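
Since the failure only affects some services, a small probe that keeps
opening fresh connections and logs the result with a timestamp can help
pin down when each one drops. The following is a rough sketch in Python,
not something from the earlier thread. It covers TCP services only (such
as webmin on port 10000); ping, TFTP, DHCP, NTP, and SNMP use ICMP or UDP
and would need separate checks. The labels and addresses passed in are
placeholders.

```python
import socket
import time


def tcp_probe(host, port, timeout=3.0):
    """Return True if a brand-new TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe_all(targets, timeout=3.0):
    """Probe each (label, host, port) tuple and return {label: bool}."""
    return {label: tcp_probe(host, port, timeout)
            for label, host, port in targets}


def log_line(results):
    """Format one timestamped status line, e.g. for appending to a file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    status = " ".join(
        f"{label}={'up' if ok else 'DOWN'}"
        for label, ok in sorted(results.items())
    )
    return f"{stamp} {status}"
```

Running probe_all() from another host once a minute and appending
log_line() to a file would give a timeline to line up against the
server's own logs.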

OS CentOS 5.4 

kernel   2.6.18-164.10.1.el5 SMP x86_64 
Kernel module bnx2

modinfo output 

filename:       /lib/modules/2.6.18-164.10.1.el5.centos.plus/kernel/drivers/net/bnx2.ko
version:        1.9.3
license:        GPL
description:    Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author:         Michael Chan <mchan at broadcom.com>
srcversion:     1040A42F87B8BE8A019736C
alias:          pci:v000014E4d0000163Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Bsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Asv*sd*bc*sc*i*
alias:          pci:v000014E4d00001639sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ACsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv0000103Csd00003102bc*sc*i*
alias:          pci:v000014E4d0000164Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003106bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003101bc*sc*i*
depends:
vermagic:       2.6.18-164.10.1.el5.centos.plus SMP mod_unload gcc-4.1
parm:           disable_msi:Disable Message Signaled Interrupt (MSI) (int)
parm:           enable_entropy:Allow bnx2 to populate the /dev/random entropy pool (int)
module_sig:     883f3504b47af9bd3b84a368dd51f2112b6b90a0ed1bac15e1b94720602336594dc65775db83c460991575cc8694cf9c03aca6e623e0950281e5094

So you can see that the version I have is newer than the version said to
be stable in the prior thread. BTW, this chassis is about a month old, so
it should have the latest BIOS, though I have not verified that.
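
For checking the driver version across several boxes, the comparison can
be scripted. This is just a sketch in Python with made-up function names.
It reads the "version:" field from saved modinfo output and compares it
numerically. The minimum version passed in is whatever the prior thread
named as stable; the value in the usage note below is only a placeholder.

```python
import re


def modinfo_field(text, field):
    """Return the first value of `field` in modinfo output, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so "srcversion" is not mistaken for "version".
        if sep and key.strip() == field:
            return value.strip()
    return None


def version_tuple(version):
    """'1.9.3' -> (1, 9, 3), so 1.10 sorts after 1.9.

    Only the digit groups are used; any letter suffix is ignored.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def driver_at_least(text, minimum):
    """True if the modinfo 'version' field is >= the given minimum."""
    found = modinfo_field(text, "version")
    return found is not None and version_tuple(found) >= version_tuple(minimum)
```

For example, driver_at_least(saved_output, "1.9.0") would report whether
a box meets a 1.9.0 minimum, where "1.9.0" is a placeholder and not the
actual version from the earlier thread.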

The ironic part: the same model, running the same version and kernel of
CentOS (kickstart install, so all my boxes are the same), is doing load
testing in our lab, pushing millions of sessions and billions of packets
(soaking four 1G NICs) without a hitch while testing out equipment. Yet
this box, which has relatively low throughput, is the one that locks up.

Any thoughts or suggestions would be appreciated. So far there is nothing
in the normal logs, so I'm going to turn on some additional logging.

James Sparenberg

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq