linux + Xen + bnx2 + bonding

G.Bakalarski at icm.edu.pl
Tue Mar 27 13:15:00 CDT 2012


Dear All

At the moment we are convinced, to some degree, that the origin
of the problem was the use of hardware virtualisation in our R815 servers, i.e.
IOMMU (DMA Virtualization ON) ...
We turned it OFF on Friday and up to now not a single network error ...

We had turned it on because disk IO with IOMMU is significantly faster
(up to 80% on sequential writes).

We did not notice the problem during testing because the testbed server
used a plain network arrangement and did not die (network usage there was
minimal anyway, and the server was restarted frequently).

So at the moment we recommend NOT switching on DMA Virtualisation for the R815
and R715 ...
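
For reference: as far as we know the IOMMU can also be masked at boot time
instead of in the BIOS - a sketch, untested on our side:

#> xl dmesg | grep -i -e iommu -e amd-vi    # check whether Xen enabled the IOMMU
# /etc/default/grub (Debian, grub2 with the Xen hooks):
GRUB_CMDLINE_XEN_DEFAULT="iommu=no"         # keep the hypervisor IOMMU off
#> update-grub
# (bare-metal Linux equivalent: amd_iommu=off on the kernel command line)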

BTW: Does anybody successfully use hardware virtualisation on DELL machines
equipped with AMD processors (61xx, 62xx series) in a similar environment
(Xen 4.1, Linux with a recent 3.x kernel)?

What about recent Dell machines with Intel CPUs and Intel's hardware
virtualisation (e.g. the R910 with 10-core E7 CPUs)?

All the best ...


GB

PS. Dear Mr Chan - we could not wait any longer for your patch - sorry ...

> I need to add additional printks during tx_timeout to further understand
> this.  Will you be able to re-test if I send you a patch?
>
> Thanks.
>
> On Tue, 2012-03-20 at 16:22 +0100, G.Bakalarski at icm.edu.pl wrote:
>> Dear ALL.
>>
>> We have strange problems using the bonding module on our DELL R815 farm.
>>
>> Hardware:
>>
>> 2 stacked Juniper EX4200 switches (from DELL ;) )
>> A bunch of R815s - 2 x 1 Gbit ports connected (1 to each physical switch
>> enclosure)
>> The R815s have Broadcom 5709 NICs.
>>
>>
>> Software:
>>
>> Xen 4.1.2
>> Linux with kernel 3.2.0 (Debian's 2.6.32-5-xen-amd64 also tested)
>> In dom0 we have bridges on VLANs on top of a bonded interface,
>> with virtual interfaces for the domUs.
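>>
>> (In /etc/network/interfaces terms the stack corresponds roughly to the
>> sketch below - reconstructed for illustration, names as in the ip listing
>> that follows; assumes the ifenslave and vlan packages:)
>>
>> auto bond0
>> iface bond0 inet manual
>>     bond-slaves eth0 eth1
>>     bond-mode 802.3ad
>>
>> auto eth-pub
>> iface eth-pub inet6 static
>>     address 2001:6a0:0:21::d0:20
>>     netmask 64
>>     bridge_ports bond0.21
>>     bridge_stp off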
>>
>> Example:
>> #> ip a
>>
>>  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 16: vif-pub-dom1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc
>> pfifo_fast master eth-pub state UP qlen 32
>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::fcff:ffff:feff:ffff/64 scope link
>>        valid_lft forever preferred_lft forever
>> 17: vif-mon-dom1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc
>> pfifo_fast master eth-mon state UP qlen 32
>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::fcff:ffff:feff:ffff/64 scope link
>>        valid_lft forever preferred_lft forever
>> 42: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
>> state UP
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>        valid_lft forever preferred_lft forever
>> 44: eth-pub: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>> UP
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>     inet6 2001:6a0:0:21::d0:20/64 scope global
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>        valid_lft forever preferred_lft forever
>> 45: bond0.21 at bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>> master eth-pub state UP
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>        valid_lft forever preferred_lft forever
>> 46: eth-mon: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>> UP
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>     inet6 2001:6a0:1021::2:2000/112 scope global
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>        valid_lft forever preferred_lft forever
>> 47: bond0.402 at bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>> noqueue
>> master eth-mon state UP
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>        valid_lft forever preferred_lft forever
>> 48: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master
>> bond0 state UP qlen 1000
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>> 49: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master
>> bond0 state UP qlen 1000
>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>> 50: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>     link/ether 14:fe:b5:ca:4e:d9 brd ff:ff:ff:ff:ff:ff
>> 51: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>     link/ether 14:fe:b5:ca:4e:db brd ff:ff:ff:ff:ff:ff
>>
>>
>> --------------------------------------------------------
>>
>> bridges:
>> #> brctl show
>> bridge name	bridge id		STP enabled	interfaces
>> eth-mon		8000.14feb5ca4ed5	no		bond0.402
>> 							vif-mon-dom1
>> eth-pub		8000.14feb5ca4ed5	no		bond0.21
>> 							vif-pub-dom1
>>
>> ----------------------------------------------------------
>>
>> bonding details:
>>
>>
>> #> cat /proc/net/bonding/bond0
>> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>>
>> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
>> Transmit Hash Policy: layer3+4 (1)
>> MII Status: up
>> MII Polling Interval (ms): 100
>> Up Delay (ms): 0
>> Down Delay (ms): 0
>>
>> 802.3ad info
>> LACP rate: fast
>> Min links: 0
>> Aggregator selection policy (ad_select): stable
>> Active Aggregator Info:
>> 	Aggregator ID: 3
>> 	Number of ports: 2
>> 	Actor Key: 17
>> 	Partner Key: 21
>> 	Partner Mac Address: 2c:21:72:9e:b0:80
>>
>> Slave Interface: eth0
>> MII Status: up
>> Speed: 1000 Mbps
>> Duplex: full
>> Link Failure Count: 0
>> Permanent HW addr: 14:fe:b5:ca:4e:d5
>> Aggregator ID: 3
>> Slave queue ID: 0
>>
>> Slave Interface: eth1
>> MII Status: up
>> Speed: 1000 Mbps
>> Duplex: full
>> Link Failure Count: 0
>> Permanent HW addr: 14:fe:b5:ca:4e:d7
>> Aggregator ID: 3
>> Slave queue ID: 0
>> --------------------------------
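>>
>> (i.e. the bond is configured with module options equivalent to the line
>> below - a reconstruction from the /proc output above:)
>>
>> #> modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4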
>>
>> The switches are configured with link aggregation + LACP
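>>
>> (on the Juniper side it is the usual aggregated-ethernet setup - a sketch,
>> port numbers and VLAN names made up for illustration:)
>>
>> set chassis aggregated-devices ethernet device-count 1
>> set interfaces ge-0/0/10 ether-options 802.3ad ae0
>> set interfaces ge-1/0/10 ether-options 802.3ad ae0
>> set interfaces ae0 aggregated-ether-options lacp active
>> set interfaces ae0 aggregated-ether-options lacp periodic fast
>> set interfaces ae0 unit 0 family ethernet-switching port-mode trunk
>> set interfaces ae0 unit 0 family ethernet-switching vlan members [ vlan21 vlan402 ]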
>>
>> ---------------------------------------------------
>> in dmesg we can see:
>>
>>
>> [617897.820090] ------------[ cut here ]------------
>> [617897.820106] WARNING: at
>> /mnt/linux-2.6-3.2.6/debian/build/source_amd64_none/net/sched/sch_generic.c:255
>> dev_watchdog+0xe9/0x148()
>> [617897.820111] Hardware name: PowerEdge R815
>> [617897.820115] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 0 timed out
>> [617897.820119] Modules linked in: bonding bnx2 xt_physdev xen_netback
>> xen_blkback ebt_ip ebt_ip6 ebtable_filter ebtables bridge xen_evtchn xenfs
>> dm_round_robin dm_multipath scsi_dh ipmi_si ipmi_devintf ipmi_msghandler
>> 8021q
>> garp stp snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse
>> serio_raw evdev joydev sp5100_tco tpm_tis tpm dcdbas tpm_bios amd64_edac_mod
>> edac_core edac_mce_amd k10temp acpi_power_meter button processor thermal_sys
>> xfs dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc xt_tcpudp xt_state
>> ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
>> ip6t_REJECT nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ip6table_filter
>> ip6_tables x_tables usbhid hid sg sr_mod cdrom sd_mod ses crc_t10dif
>> enclosure
>> ahci libahci libata lpfc scsi_transport_fc scsi_tgt ohci_hcd ehci_hcd
>> megaraid_sas usbcore usb_common scsi_mod [last unloaded: bnx2]
>> [617897.820257] Pid: 0, comm: swapper/0 Tainted: G        W    3.2.0-1-amd64
>> #1
>> [617897.820261] Call Trace:
>> [617897.820264]  <IRQ>  [<ffffffff81046465>] ?
>> warn_slowpath_common+0x78/0x8c
>> [617897.820283]  [<ffffffff81046511>] ? warn_slowpath_fmt+0x45/0x4a
>> [617897.820290]  [<ffffffff81291f15>] ? netif_tx_lock+0x40/0x72
>> [617897.820297]  [<ffffffff81292076>] ? dev_watchdog+0xe9/0x148
>> [617897.820305]  [<ffffffff81051af4>] ? run_timer_softirq+0x19a/0x261
>> [617897.820311]  [<ffffffff81291f8d>] ? netif_tx_unlock+0x46/0x46
>> [617897.820318]  [<ffffffff8104ba54>] ? __do_softirq+0xb9/0x177
>> [617897.820326]  [<ffffffff8120a0ab>] ? __xen_evtchn_do_upcall+0x1b5/0x1f2
>> [617897.820334]  [<ffffffff8133e8ec>] ? call_softirq+0x1c/0x30
>> [617897.820342]  [<ffffffff8100f875>] ? do_softirq+0x3c/0x7b
>> [617897.820348]  [<ffffffff8104bcbc>] ? irq_exit+0x3c/0x9a
>> [617897.820354]  [<ffffffff8120b675>] ? xen_evtchn_do_upcall+0x27/0x32
>> [617897.820360]  [<ffffffff8133e93e>] ? xen_do_hypervisor_callback+0x1e/0x30
>> [617897.820363]  <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
>> [617897.820374]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
>> [617897.820382]  [<ffffffff8100663a>] ? xen_safe_halt+0xc/0x13
>> [617897.820389]  [<ffffffff81014448>] ? default_idle+0x47/0x7f
>> [617897.820395]  [<ffffffff8100d25f>] ? cpu_idle+0xaf/0xf2
>> [617897.820402]  [<ffffffff81687b38>] ? start_kernel+0x3b8/0x3c3
>> [617897.820408]  [<ffffffff8168963b>] ? xen_start_kernel+0x586/0x58c
>> [617897.820412] ---[ end trace a7919e7f17c0a757 ]---
>>
>>
>> and
>>
>>
>> [617897.820427] bnx2 0000:01:00.1: eth1: DEBUG: intr_sem[0]
>> PCI_CMD[00180006]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: PCI_PM[19002008]
>> PCI_MISC_CFG[92000088]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: EMAC_TX_STATUS[00000008]
>> EMAC_RX_STATUS[00000000]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG:
>> HC_STATS_INTERRUPT_STATUS[01fe0001]
>> [617897.824071] bnx2 0000:01:00.1: eth1: <--- start MCP states dump --->
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: MCP_STATE_P0[0003610e]
>> MCP_STATE_P1[0003610e]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: MCP mode[0000b880]
>> state[80000000] evt_mask[00000500]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: pc[0800c6c8] pc[0800d7d4]
>> instr[ac620038]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: shmem states:
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: drv_mb[0103000f]
>> fw_mb[0000000f] link_status[0000006f] drv_pulse_mb[00001d40]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: dev_info_signature[44564903]
>> reset_type[01005254] condition[0003610e]
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 000003cc: 44444444 44444444
>> 44444444 00000a28
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 000003dc: 000cffff 00000000
>> ffff0000 00000000
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 000003ec: 00000000 00000000
>> 00000000 00000000
>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 0x3fc[0000ffff]
>> [617897.824071] bnx2 0000:01:00.1: eth1: <--- end MCP states dump --->
>> [617898.236955] bnx2 0000:01:00.1: eth1: NIC Copper Link is Down
>> [617898.268155] bonding: bond0: link status definitely down for interface
>> eth1, disabling it
>>
>>
>> The network is very unstable: from timeouts, or IPv6 neighbour discovery
>> traffic somehow being filtered/dropped (some hosts not responding to
>> neighbour solicitations), through one VLAN or another not responding, up to
>> both physical interfaces going totally down ....
>>
>> From an immediate "no connection", through problems after 1.5 hours, up to
>> strange behaviour after 5 days ....
>>
>> Without bonding the network was much more stable (i.e. with only eth0 UP)
>> - 1-2 months without problems - however the dmesg messages were also present
>> even without bonding ...
>>
>> bnx2 version:
>> 2.1.11
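>>
>> (for reference, the driver version can be read with:)
>> #> ethtool -i eth1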
>>
>> The servers are not heavily loaded and do not have high network throughput ....
>>
>> ANY HELP HIGHLY APPRECIATED !
>>
>> GB
>>
>>



