bcm5700+basp = taint+load=1?

Matt Cowan cowan at bnl.gov
Wed Jan 22 03:43:01 CST 2003


After reading the docs and hunting around a bit, I've gotten load
balancing + failover + multiple vlans working on a pe2650 (dual 2.8GHz,
hyperthreading enabled, 4GB ram) running redhat 7.3 with kernel
2.4.18-19.7.xsmp, using bcm5700+basp rebuilt from the src package on
the dell support site
(ftp://ftp.us.dell.com/network/Bcom_LAN_60_RHSrc_02.tar.gz).  The
recipe: read and follow /usr/share/doc/basplnx-3.0.9/release.txt, copy
and modify one of the team examples in /etc/basp/samples, then
configure the virtual interfaces using your favorite tool.  Tada, it
works, no problemo, well... uno problemo.  I haven't had any stability
problems, although I haven't hammered the system with a really heavy
load yet.

It seems to be working fine except for one small oddity.  When I start
basp (/etc/init.d/basp start) and it brings the interfaces up, the
load average climbs up and sticks at 1 within a minute or so with
absolutely nothing going on.  The system remains perfectly responsive,
but something is driving the load up (artificially?).  If I stop basp
(/etc/init.d/basp stop), the load drops back down to ~0 within another
minute.

After initial bootup, starting/stopping doesn't unload/reload the
basp module; it just brings the "team" up and down using baspif and
baspteam.  So the load isn't caused by the basp kernel module merely
being loaded (I do get taint warnings when it loads, and lsmod shows a
tainted state; should basp rebuilt from src taint the kernel?), but by
the module when it's actually doing something, even if that something
is next to nothing.  When basp is running the network there is a
"basp_worker" kernel process running; when the basp module is loaded,
but not used, there is no basp_worker process.

If I configure and 'ifup' the nics directly (eth{0,1} instead of the
virtual interfaces; still bcm5700, with the basp module loaded but
unused, and no advanced features), everything works as expected, and
the load is normal (~0).

Having read
http://lists.us.dell.com/pipermail/linux-poweredge/2003-January/011353.html
it sounds like tg3 may eventually be the way to go, but all of the
features I mentioned and stability are critical for the system in
question, and my problem seems to be with basp, not the bcm5700
module.  Although I guess it could be the bcm5700 when basp enables
the advanced features?  Are the "heavy-handed" workarounds mentioned
in the above post so bad that they would do this?

Anyone else run into a weird load like this?  Any ideas on tracking
down exactly what is inflating the load? (top and sar/isag show
nothing non-negligible going on; the only remotely significant change
being the load average when switching to and from a basped network
setup).  Should the basp module be tainting the kernel?!  (recompiled
from src rpm from
ftp://ftp.us.dell.com/network/Bcom_LAN_60_RHSrc_02.tar.gz)
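
One way to narrow this down: the Linux load average counts tasks in
uninterruptible sleep (state D) as well as runnable ones (state R), so
a kernel thread that sits in D state adds 1 to the load without using
any cpu, which would match top and sar showing nothing.  Whether
basp_worker actually does this is only a guess.  A minimal sketch that
lists the tasks currently in R or D state by reading /proc/<pid>/stat
(the function names are made up for illustration, and it is written
for a current Python; the interpreter on a 2.4-era box would need the
"with" statement rewritten):

```python
import os


def parse_stat_line(line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    start = line.index("(")
    end = line.rindex(")")
    pid = int(line[:start])
    comm = line[start + 1:end]
    state = line[end + 1:].split()[0]
    return pid, comm, state


def tasks_counted_in_load(proc_root="/proc"):
    """List tasks currently in state R (runnable) or D (uninterruptible)."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                task = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # task exited or was unreadable while scanning
        if task[2] in ("R", "D"):
            found.append(task)
    return found


if __name__ == "__main__":
    for pid, comm, state in sorted(tasks_counted_in_load()):
        print("%6d %s %s" % (pid, state, comm))
```

The script itself shows up as R while it runs.  If basp_worker is
listed in state D on every run while the team is up, that would explain
a load that sticks at 1 on an idle box.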

I haven't yet had a chance to really investigate how much of the basp
functionality I could duplicate using bonding and the kernel vlan
system.  From
/usr/src/linux-2.4.18-19.7.x/Documentation/networking/bonding.txt it
looks like I would have to give something up either way: with
bonding's round robin mode I would lose the switch independence of
basp's SLB mode (I assume the xor mode is similarly switch
dependent?), and with bonding's active+backup mode I would be reduced
to active+backup rather than active+active.
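
To make the tradeoff concrete, here is a simplified model of how the
three bonding modes pick the outgoing nic for a frame.  This is a
sketch of the behaviour as bonding.txt describes it, not the driver
code: the function names are invented, MAC addresses are plain
integers, and active+backup is reduced to "first slave with link".

```python
def round_robin(slaves, packet_index):
    """Round robin: successive frames alternate over all slaves."""
    return slaves[packet_index % len(slaves)]


def xor_hash(slaves, src_mac, dst_mac):
    """XOR: (source MAC XOR destination MAC) modulo slave count."""
    return slaves[(src_mac ^ dst_mac) % len(slaves)]


def active_backup(slaves, link_up):
    """Active+backup: a single slave carries all traffic at any time."""
    for slave in slaves:
        if link_up[slave]:
            return slave
    return None  # no slave has link


if __name__ == "__main__":
    nics = ["eth0", "eth1"]
    # Round robin spreads one conversation across both ports.
    print([round_robin(nics, i) for i in range(4)])
    # XOR pins each peer to one port, so a single peer never uses both.
    print(xor_hash(nics, 0x10, 0x11), xor_hash(nics, 0x10, 0x12))
    # Active+backup only moves traffic when the active link drops.
    print(active_backup(nics, {"eth0": False, "eth1": True}))
```

In round robin the same source MAC shows up on both switch ports, which
is why the switch has to treat the ports as one trunk.  As far as I
know the xor mode generally wants the ports grouped on the switch as
well, but that is worth checking against bonding.txt for this kernel.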

any help is appreciated.
sorry for the longwindedness
-matt



