Dell 1855 Blade Chassis - 5316M Switch flaw?

Mann, Andrew amann at ea.com
Sun Dec 17 16:45:40 CST 2006


Recently we rolled out a chassis full of (10x) 1955
blades in the Dell Modular Blade Chassis.  The chassis has two of the
5316M switching modules, though we're only using one at the moment.
This system was deployed into an existing network layout which uses
100mbit connectivity.  While the gigabit networking helps the
intra-connectivity between blades, and the applications running on the
systems benefit from the increased bandwidth, the 'uplink' of the
combined set of blades to the rest of the world achieves a peak traffic
rate of only about 20mbit/sec (physical layer), so the 100mbit uplink
isn't a bottleneck generally speaking.

None of the ports on the 5316M have flow control, QoS, or back
pressure enabled, and none are operating at half duplex, so my
understanding is that Head of Line (HOL) blocking prevention should be
active.  HOL blocking is described in the Dell 5316M user's manual as:

Head of Line (HOL) blocking results in traffic delays and frame loss
caused by traffic competing for the same egress port resources. HOL
blocking queues packets, and the packets at the head of the queue are
forwarded before packets at the end of the queue. By default HOL
blocking is active at all times except when QoS, Flow Control, or Back
Pressure is active on a port, the HOL blocking prevention mechanism is
disabled on the whole system.

When our traffic reached 15mbit/sec and ~17,000 packets
per sec, we noticed that frames coming from the internal blades and
destined for the 100mbit uplink were being dropped - about 5-10% of
frames.  The statistics on the 5316M, the blade units themselves, and
the Cisco 4006 switch which the 5316M uplinks to show no errors on any
ports/lines.  The 4006 shows no packets dropped from that port for any
reason.  Unfortunately the 5316M doesn't seem to report dropped frames
anywhere in the command line interface nor through the provided SNMP
data, so I can't see if the switch acknowledges that it's dropping
frames.  I can't find any documentation or reported information from the
switch itself about how much memory is available for HOL packet
buffering.
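
For reference, the traffic figures above imply fairly small frames and
a lightly loaded uplink. A quick sketch of the arithmetic (the function
names are just for illustration, and it assumes the 15mbit/sec and
17,000 packets/sec figures were measured over the same interval):

```python
def average_frame_bytes(bits_per_sec: float, packets_per_sec: float) -> float:
    """Average frame size implied by a bit rate and a packet rate."""
    return bits_per_sec / 8 / packets_per_sec


def utilization(bits_per_sec: float, link_bits_per_sec: float) -> float:
    """Fraction of the link's capacity in use, averaged over time."""
    return bits_per_sec / link_bits_per_sec


# Figures reported above: 15mbit/sec at ~17,000 packets/sec on a 100mbit uplink.
avg = average_frame_bytes(15e6, 17_000)
util = utilization(15e6, 100e6)

print(f"average frame size: {avg:.0f} bytes")
print(f"uplink utilization: {util:.0%}")
```

So on average the uplink is nowhere near saturated, which is why the
5-10% loss is surprising.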

As a temporary corrective measure, we connected the
5316M switch to another external gigabit switch (Cisco 4500), and then
connected that switch across to the 100mbit switch.  This has eliminated
the packet loss. 
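
One possible explanation (a guess, not something confirmed by Dell or
by the switch statistics) is that short bursts arriving at gigabit speed
overflow whatever buffer sits in front of the 100mbit port, even though
the average rate is low. The toy model below sketches that idea. The
buffer size is a made-up parameter, since the 5316M's real buffer size
isn't documented anywhere I can find, and real switches share and
partition buffers in ways this ignores:

```python
def frames_dropped(burst_frames: int, frame_bytes: int, buffer_bytes: int,
                   in_bps: float = 1e9, out_bps: float = 100e6) -> int:
    """Count tail drops for one back-to-back burst.

    Frames arrive at in_bps and leave at out_bps through a single FIFO
    of buffer_bytes. All parameters are hypothetical model inputs.
    """
    frame_bits = frame_bytes * 8
    arrival_gap = frame_bits / in_bps      # seconds between arrivals
    queued_bits = 0.0
    dropped = 0
    for _ in range(burst_frames):
        # Drain the queue for the time elapsed since the previous arrival.
        queued_bits = max(0.0, queued_bits - out_bps * arrival_gap)
        if queued_bits + frame_bits > buffer_bytes * 8:
            dropped += 1                   # no room: tail drop
        else:
            queued_bits += frame_bits
    return dropped


# Hypothetical 16 KiB egress buffer and 110-byte frames.
small_burst = frames_dropped(10, 110, 16 * 1024)
large_burst = frames_dropped(1000, 110, 16 * 1024)
print("drops in a 10-frame burst:", small_burst)
print("drops in a 1000-frame burst:", large_burst)
```

In this model a short burst fits in the buffer, while a long enough
burst drops frames regardless of the average rate. If something like
this is happening, an external switch with deeper buffers doing the
gigabit-to-100mbit step would hide the problem, which matches what we
saw.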

 

At this point we consider the 5316M switch unusable with
any 100mbit port configured.  This isn't a major showstopper for us
since we can work around it. If anyone is planning on rolling blades out
into a network that still has some 100mbit connectivity, they may want
to be cautious and perform some testing to see if the same issue will
affect them.  If Dell isn't aware of this issue, they may want to run
some tests and figure out what's going on as well :)


Andrew

----------------------
Andrew Mann
Lead Engineer
Operations
EA Mythic
