Strange networking issue

Brian O'Mahony brian.omahony at curamsoftware.com
Fri Jun 25 05:47:19 CDT 2010


I have a PE2850 running RHEL 5.4 with a two-port Intel card in it. One port from the onboard NIC and one from the PCI NIC connect to the network in an active-backup bond. Yesterday at about 2pm we had a broadcast storm on one of the switches, and eth0 failed over to eth3. Everything was running fine (I didn't even notice the failover, nor did the users). At 6pm we had another strange networking issue (not sure what yet, but one of our VMware HA boxes stopped being reachable by heartbeat). The eth3 interface logged this:

Jun 24 18:07:43 ccvobdubpr kernel: e1000: eth3: e1000_clean_tx_irq: Detected Tx Unit Hang
Jun 24 18:07:43 ccvobdubpr kernel:   Tx Queue             <0>
Jun 24 18:07:43 ccvobdubpr kernel:   TDH                  <a9>
Jun 24 18:07:43 ccvobdubpr kernel:   TDT                  <93>
Jun 24 18:07:43 ccvobdubpr kernel:   next_to_use          <93>
Jun 24 18:07:43 ccvobdubpr kernel:   next_to_clean        <a9>
Jun 24 18:07:43 ccvobdubpr kernel: buffer_info[next_to_clean]
Jun 24 18:07:43 ccvobdubpr kernel:   time_stamp           <4257186b>
Jun 24 18:07:43 ccvobdubpr kernel:   next_to_watch        <a9>
Jun 24 18:07:43 ccvobdubpr kernel:   jiffies              <42573ee7>
Jun 24 18:07:43 ccvobdubpr kernel:   next_to_watch.status <0>
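
As a rough aid to reading the dump above, the hex fields can be decoded and the gap between time_stamp and jiffies converted to seconds. This is only a sketch: the tick rate of 1000 Hz is an assumption about this kernel, not something taken from the log, and the function name is made up for illustration.

```python
# Decode the hex fields from the e1000 "Detected Tx Unit Hang" dump above.
ASSUMED_HZ = 1000  # assumption: kernel tick rate, not confirmed for this box


def decode_tx_hang(fields, hz=ASSUMED_HZ):
    """Convert the hex strings from the dump to integers and compute how long
    the oldest pending Tx buffer had been waiting when the message was logged."""
    vals = {name: int(text, 16) for name, text in fields.items()}
    # Mask to 32 bits so the subtraction survives a counter wraparound;
    # the values shown in the log fit in 32 bits.
    stalled = (vals["jiffies"] - vals["time_stamp"]) & 0xFFFFFFFF
    return {
        "tdh": vals["TDH"],  # descriptor head, decimal
        "tdt": vals["TDT"],  # descriptor tail, decimal
        "stalled_jiffies": stalled,
        "stalled_seconds": stalled / hz,
    }


# Values copied from the log lines above.
dump = {"TDH": "a9", "TDT": "93", "time_stamp": "4257186b", "jiffies": "42573ee7"}
result = decode_tx_hang(dump)
print(result)
```

If the tick rate on this kernel is different, pass it as the hz argument; the jiffies difference itself does not depend on that assumption.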

There was no failover message. People started complaining this morning about the speed. A colleague ran ifdown/ifup on the bond and everything seemed OK, until I checked the log:
Jun 25 08:38:03 ccvobdubpr kernel: e1000: eth3: e1000_watchdog_task: NIC Link is Up 10 Mbps Half Duplex, Flow Control: None
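
A link message like the one above is easy to miss in a busy log. The following is a minimal sketch of a check that pulls the negotiated speed and duplex out of such a line and flags anything below the expected rate. The helper names are hypothetical, and the expected rate of 1000 Mbps full duplex is an assumption based on the card being a gigabit controller; the switch port may be configured differently.

```python
import re

# Matches the speed/duplex part of the e1000 "NIC Link is Up" message.
LINK_RE = re.compile(r"NIC Link is Up (\d+) Mbps (Half|Full) Duplex")


def parse_link_up(line):
    """Return (speed_mbps, duplex) from a link-up log line, or None."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()


def is_degraded(line, expected_mbps=1000):
    """True if the line reports a link below the expected speed or not full duplex.

    expected_mbps is an assumption, not a value taken from the log.
    """
    parsed = parse_link_up(line)
    if parsed is None:
        return False
    speed, duplex = parsed
    return speed < expected_mbps or duplex != "full"


# The log line quoted above.
line = ("Jun 25 08:38:03 ccvobdubpr kernel: e1000: eth3: e1000_watchdog_task: "
        "NIC Link is Up 10 Mbps Half Duplex, Flow Control: None")
print(parse_link_up(line), is_degraded(line))
```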

Currently we are running on eth0.

[root@ccvobdubpr log]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Intel Corporation 82541GI Gigabit Ethernet Controller
DEVICE=eth0
BOOTPROTO=dhcp
#HWADDR=00:13:72:5F:10:6B
ONBOOT=yes
MASTER=bond0
SLAVE=yes
[root@ccvobdubpr log]# cat /etc/sysconfig/network-scripts/ifcfg-eth3
# Intel Corporation 82546GB Gigabit Ethernet Controller
DEVICE=eth3
BOOTPROTO=dhcp
HWADDR=00:1B:21:53:62:51
ONBOOT=no
MASTER=bond0
SLAVE=yes
[root@ccvobdubpr log]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 10000
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:72:5f:10:6b

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:53:62:51
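
The bonding status above can also be read by a script, for example to watch the active slave and the per-slave failure counts over time. This is a minimal sketch, assuming the "Key: value" layout shown above; the function name is hypothetical, and the sample text is simply the output quoted in this message. On the live box the text would come from /proc/net/bonding/bond0 instead.

```python
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 10000
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:13:72:5f:10:6b

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1b:21:53:62:51
"""


def parse_bonding(text):
    """Split bonding status text into bond-level fields and a list of slaves."""
    bond = {"slaves": []}
    current = bond  # fields before the first "Slave Interface" belong to the bond
    for raw in text.splitlines():
        if ":" not in raw:
            continue  # skip blank lines
        # Split on the first colon only, so MAC addresses stay intact.
        key, value = (part.strip() for part in raw.split(":", 1))
        if key == "Slave Interface":
            current = {"name": value}
            bond["slaves"].append(current)
        else:
            current[key] = value
    return bond


status = parse_bonding(SAMPLE)
print("active:", status["Currently Active Slave"])
for slave in status["slaves"]:
    print(slave["name"], slave["MII Status"], slave["Link Failure Count"])
```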

I have a number of questions here.
#1 Why didn't it fail over? I take it the interface hung and then slowed right down.
#2 Why did it come back up at 10 Mbps? (It was definitely running fine between 2pm and 6pm yesterday.)
#3 Can I fix eth3 without a reboot?

Thanks

B

