Broadcom NetXtremeII Packet Loss/PE1950

Matt Saladna msaladna at apisnetworks.com
Fri Jan 4 11:07:41 CST 2008


Brian,
	I have already turned off RX/TX checksumming, scatter-gather, and TSO
support with no change in the reliability of the NIC.  Can anyone else run
the script on a BCM5708 with a plain vanilla distribution of
RHEL/CentOS?  The servers are running the latest Broadcom firmware
(3.5.12).  It may be time to bite the bullet and escalate the issue to
Dell's tech support or replace all of the NICs with e1000s.
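
For anyone repeating the test, the offload changes described above can be
sketched roughly as follows.  This is a minimal sketch, not the exact
commands run on these servers: the interface name eth0, the function names,
and the RUN switch are illustrative assumptions.  By default it only prints
the ethtool command so that it can be reviewed before anything is changed.

```shell
# Sketch: build the ethtool command that turns off RX/TX checksumming,
# scatter-gather, and TSO. The function names and the RUN switch are
# hypothetical helpers, not part of ethtool.

offload_off_cmd() {
    local iface="$1"
    echo "ethtool -K ${iface} rx off tx off sg off tso off"
}

disable_offloads() {
    local iface="${1:-eth0}"   # eth0 is an assumed default
    local cmd
    cmd="$(offload_off_cmd "$iface")"
    if [ "${RUN:-0}" = "1" ]; then
        # Apply the change (needs root), then show the resulting settings.
        $cmd && ethtool -k "$iface"
    else
        # Dry run: print the command only.
        echo "$cmd"
    fi
}
```

Calling disable_offloads eth0 prints the command; RUN=1 disable_offloads eth0
applies it and then queries the settings with ethtool -k.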

- Matt

Brian A. Seklecki wrote:
> Check ethtool(8) man page:
> 
>        ethtool -K|--offload ethX [rx on|off] [tx on|off] [sg on|off] [tso on|off] [ufo on|off] [gso on|off]
> 
>       -k --show-offload
>               queries the specified ethernet device for offload
> information.
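
To confirm which offloads are still active after a change, the -k output can
be filtered.  A minimal sketch, assuming the output consists of lines of the
form "name: on" or "name: off" below a header line; offloads_on is a
hypothetical helper name.

```shell
# Sketch: list the offload features that "ethtool -k" reports as enabled.
# Reads ethtool -k output on stdin; assumes "name: on" / "name: off" lines.
offloads_on() {
    awk -F': ' '
        NF >= 2 && $2 ~ /^on/ {
            sub(/^[ \t]+/, "", $1)   # drop any leading indentation
            print $1
        }'
}
```

Usage would be along the lines of: ethtool -k eth0 | offloads_on.  An empty
result means every listed offload is reported as off.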
> 
> ~BAS
> 
> 
> On Fri, 2008-01-04 at 11:30 -0500, Matt Saladna wrote:
>> Patrick,
>> 	I removed the TOE key from one of the problematic servers last week,
>> but there was no change.  That should automatically disable TOE support
>> from within the BIOS.  It is a production server, so I would like to
>> avoid taking it down again to check in the BIOS to verify that TOE has
>> been disabled.  Is there another way to check or can someone confirm
>> whether just removing the TOE key disables TOE on the NIC?
>>
>> - Matt
>>
>> Patrick Schreurs wrote:
>>> Hi Matt,
>>>
>>> Is TOE enabled in the BIOS? I've seen improvements by removing the TOE key.
>>>
>>> Good luck,
>>>
>>> Patrick Schreurs
>>>
>>> Matt Saladna wrote:
>>>> Hi,
>>>>     Has anyone noticed sporadic packet loss with the BCM5708 NICs on
>>>> Linux?  I have been attempting to track down the root cause, but
>>>> everything is turning up empty.
>>>>
>>>> The symptom is that once every few hundred HTTP requests or so, a
>>>> request times out; the same goes for DNS resolution.  It looks like any
>>>> TCP or UDP request has a 1 in 30 or so chance of abruptly timing out on
>>>> this card.  I have been testing by running a lookup query against
>>>> 4.2.2.1 and averaging the response times with the following script:
>>>>
>>>> ( set -o pipefail ; let COUNT=0 ; RET=0
>>>>   until [ $RET -ne 0 -o $COUNT -gt 1000 ] ; do
>>>>     dig +short +trace @4.2.2.1 google.com | grep 'from server' | awk '{print $7}'
>>>>     RET=$? ; ((COUNT++))
>>>>   done ) | awk '{ SUM += $1; COUNT += 1; } END { print SUM/COUNT, COUNT }' ; clock
>>>>
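
The same measurement can be split into small functions, which makes the
parsing and averaging steps easier to check on their own.  This is a sketch
rather than a replacement for the one-liner above.  It assumes that
dig +short +trace prints lines such as
"NS a.root-servers.net. from server 4.2.2.1 in 23 ms.", which is what the
grep/awk stage above implies; the function names and default arguments are
illustrative.

```shell
# Sketch of the lookup test as functions. Field 7 of each "from server"
# line is assumed to be the response time in milliseconds.

extract_ms() {
    awk '/from server/ { print $7 }'
}

avg_ms() {
    # Print "<average> <count>", or "no samples" when stdin is empty.
    awk '{ sum += $1; n += 1 }
         END { if (n > 0) print sum / n, n; else print "no samples" }'
}

run_lookups() {
    local server="${1:-4.2.2.1}" name="${2:-google.com}" max="${3:-1000}"
    local i=0 out ms
    while [ "$i" -lt "$max" ]; do
        # Stop at the first failed trace, as the one-liner does.
        out="$(dig +short +trace "@${server}" "$name")" || break
        ms="$(printf '%s\n' "$out" | extract_ms)"
        [ -n "$ms" ] || break
        printf '%s\n' "$ms"
        i=$((i + 1))
    done
}
```

Running run_lookups | avg_ms prints the average time and the number of
samples collected before the first failure.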
>>>> Three of the servers running CentOS 4 with the BCM5708 chipset die
>>>> within the first 30 queries:
>>>>
>>>> dig: couldn't get address for 'H.ROOT-SERVERS.NET': not found
>>>> 722.948 77
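
Because the test above stops at the first failure, it shows how soon a
lookup fails but not how often.  A rough way to estimate the failure rate is
to keep going and count the failed runs.  A minimal sketch; count_failures is
a hypothetical helper, and it assumes the command being run exits non-zero
when a request times out.

```shell
# Sketch: run a command a fixed number of times and report how many runs
# failed. The first argument is the number of tries, the rest is the command.
count_failures() {
    local tries="$1"; shift
    local i=0 failed=0
    while [ "$i" -lt "$tries" ]; do
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$tries failed"
}
```

For example: count_failures 1000 dig +short +time=2 +tries=1 @4.2.2.1 google.com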
>>>>
>>>> Two of the three servers are running the official 2.6.22.1 kernel, while
>>>> the third is running 2.6.24-rc6.  There is a fourth server, a PE SC1435
>>>> with the Broadcom BCM5721 chipset, which performs the lookups
>>>> flawlessly.  All four servers are on the same switch in the facility,
>>>> effectively ruling out network issues on the uplink.  I removed the TOE
>>>> key from one of the troubled PE1950s last weekend, but that hasn't
>>>> affected packet loss.  Packet loss occurs independent of any offload
>>>> settings made via ethtool.  I have turned all of the parameters off and
>>>> on with no success.
>>>>
>>>> Finally, just for kicks, I decided to run the DNS lookup test on a
>>>> development machine with a mirror image of the filesystem.  I'm running
>>>> 2.6.22.2 with Via's VT6102 (Rhine-II) embedded NIC.  This works without
>>>> a problem as well.
>>>>
>>>> Thanks!
>>>>    Matt Saladna
>>>>    Apis Networks
>>>>
>>>> _______________________________________________
>>>> Linux-PowerEdge mailing list
>>>> Linux-PowerEdge at dell.com
>>>> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>> Please read the FAQ at http://lists.us.dell.com/faq


