linux + Xen + bnx2 + bonding

Joseph Glanville joseph.glanville at orionvm.com.au
Thu Mar 29 15:37:05 CDT 2012


I think you would be more likely to find assistance with Xen/IOMMU
related issues on the xen-devel list; specifically, the AMD and IOMMU
guys might be able to help you out.
I have cc'd xen-devel for you.

Joseph.

On 28 March 2012 05:15,  <G.Bakalarski at icm.edu.pl> wrote:
> Dear All
>
> At the moment we are convinced, to some degree, that the origin
> of the problem was the use of hardware virtualisation in our R815 servers, i.e.
> the IOMMU (DMA Virtualization ON) ...
> We turned it OFF on Friday and up to now there has not been a single network error ...
>
> We had turned it on because disk IO with the IOMMU is significantly faster
> (up to 80% on sequential writes).
>
> We did not notice the problem during testing because the testbed server
> used a plain network arrangement and it did not die (in any case, network
> usage was minimal and the server was restarted frequently).
>
> So at the moment we recommend NOT switching on DMA Virtualisation for the R815
> and R715 ...
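>
> For reference, a minimal sketch of how we toggle it from the software side on a
> Debian dom0 (treat the exact option names as an assumption and verify them
> against your Xen and kernel versions):
>
>   # /etc/default/grub: pass iommu=off on the hypervisor command line
>   GRUB_CMDLINE_XEN_DEFAULT="iommu=off"
>   #> update-grub && reboot
>
>   # afterwards, check what the hypervisor actually did with the IOMMU
>   #> xl dmesg | grep -i -e iommu -e amd-vi
>
> The equivalent BIOS switch on these boxes is the "DMA Virtualization" option
> mentioned above.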
>
> BTW: Does anybody use hardware virtualisation successfully on DELL machines
> equipped with AMD processors (61xx, 62xx series) in a similar environment
> (Xen 4.1, Linux with a recent 3.x kernel)?
>
> What about recent Dell machines with Intel CPUs and Intel's hardware
> virtualisation (e.g. an R910 with 10-core E7 CPUs)?
>
> All the best ...
>
>
> GB
>
> PS. Dear Mr. Chan - we could not wait any longer for your patch - sorry ...
>
>> I need to add additional printks during tx_timeout to further understand
>> this.  Will you be able to re-test if I send you a patch?
>>
>> Thanks.
>>
>> On Tue, 2012-03-20 at 16:22 +0100, G.Bakalarski at icm.edu.pl wrote:
>>> Dear ALL.
>>>
>>> We have strange problems using bonding module on our DELL R815s farm.
>>>
>>> Hardware:
>>>
>>> 2 stacked Juniper EX4200 switches (from DELL ;) )
>>> A bunch of R815s, each with 2 x 1 Gbit ports connected (1 to each physical
>>> switch enclosure)
>>> The R815s have Broadcom 5709 NICs.
>>>
>>>
>>> Software:
>>>
>>> Xen 4.1.2
>>> Linux with kernel 3.2.0 (2.6.32-5-xen-amd64 from Debian also tested)
>>> In dom0 we have bridges on VLAN interfaces on top of the bonded interface,
>>> with virtual interfaces for the domUs.
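>>>
>>> For illustration, the relevant part of the dom0 network configuration looks
>>> roughly like the sketch below (trimmed, addresses omitted; the option names
>>> are those of Debian's ifenslave, vlan and bridge-utils helpers, so take it as
>>> an approximation rather than a verbatim copy of our config):
>>>
>>> # /etc/network/interfaces (sketch)
>>> auto bond0
>>> iface bond0 inet manual
>>>     bond-slaves eth0 eth1
>>>     bond-mode 802.3ad
>>>     bond-miimon 100
>>>     bond-lacp-rate fast
>>>     bond-xmit-hash-policy layer3+4
>>>
>>> auto bond0.21
>>> iface bond0.21 inet manual
>>>     vlan-raw-device bond0
>>>
>>> auto eth-pub
>>> iface eth-pub inet manual
>>>     bridge_ports bond0.21
>>>     bridge_stp off
>>>
>>> # bond0.402 / eth-mon are defined the same way; the vif-* interfaces are
>>> # added to the bridges by the Xen toolstack when a domU starts.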
>>>
>>> Example:
>>> #> ip a
>>>
>>>  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>     inet 127.0.0.1/8 scope host lo
>>>     inet6 ::1/128 scope host
>>>        valid_lft forever preferred_lft forever
>>> 16: vif-pub-dom1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc
>>> pfifo_fast master eth-pub state UP qlen 32
>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::fcff:ffff:feff:ffff/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 17: vif-mon-dom1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc
>>> pfifo_fast master eth-mon state UP qlen 32
>>>     link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::fcff:ffff:feff:ffff/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 42: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>> state UP
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 44: eth-pub: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>>> UP
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>>     inet6 2001:6a0:0:21::d0:20/64 scope global
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 45: bond0.21@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>>> master eth-pub state UP
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 46: eth-mon: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>>> UP
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>>     inet6 2001:6a0:1021::2:2000/112 scope global
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 47: bond0.402@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>>> noqueue
>>> master eth-mon state UP
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>>     inet6 fe80::16fe:b5ff:feca:4ed5/64 scope link
>>>        valid_lft forever preferred_lft forever
>>> 48: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master
>>> bond0 state UP qlen 1000
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>> 49: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master
>>> bond0 state UP qlen 1000
>>>     link/ether 14:fe:b5:ca:4e:d5 brd ff:ff:ff:ff:ff:ff
>>> 50: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>>     link/ether 14:fe:b5:ca:4e:d9 brd ff:ff:ff:ff:ff:ff
>>> 51: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
>>>     link/ether 14:fe:b5:ca:4e:db brd ff:ff:ff:ff:ff:ff
>>>
>>>
>>> --------------------------------------------------------
>>>
>>> bridges:
>>> #> brctl show
>>> bridge name  bridge id               STP enabled     interfaces
>>> eth-mon              8000.14feb5ca4ed5       no              bond0.402
>>>                                                      vif-mon-dom1
>>> eth-pub              8000.14feb5ca4ed5       no              bond0.21
>>>                                                      vif-pub-dom1
>>>
>>> ----------------------------------------------------------
>>>
>>> bonding details:
>>>
>>>
>>> #> cat /proc/net/bonding/bond0
>>> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>>>
>>> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
>>> Transmit Hash Policy: layer3+4 (1)
>>> MII Status: up
>>> MII Polling Interval (ms): 100
>>> Up Delay (ms): 0
>>> Down Delay (ms): 0
>>>
>>> 802.3ad info
>>> LACP rate: fast
>>> Min links: 0
>>> Aggregator selection policy (ad_select): stable
>>> Active Aggregator Info:
>>>      Aggregator ID: 3
>>>      Number of ports: 2
>>>      Actor Key: 17
>>>      Partner Key: 21
>>>      Partner Mac Address: 2c:21:72:9e:b0:80
>>>
>>> Slave Interface: eth0
>>> MII Status: up
>>> Speed: 1000 Mbps
>>> Duplex: full
>>> Link Failure Count: 0
>>> Permanent HW addr: 14:fe:b5:ca:4e:d5
>>> Aggregator ID: 3
>>> Slave queue ID: 0
>>>
>>> Slave Interface: eth1
>>> MII Status: up
>>> Speed: 1000 Mbps
>>> Duplex: full
>>> Link Failure Count: 0
>>> Permanent HW addr: 14:fe:b5:ca:4e:d7
>>> Aggregator ID: 3
>>> Slave queue ID: 0
>>> --------------------------------
>>>
>>> the switch is configured with link aggregation + LACP
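>>>
>>> For completeness, the switch side is the usual Junos aggregated-Ethernet
>>> setup, roughly along these lines (member port numbers and VLAN names below
>>> are made up; only the ae/LACP structure is meant to be illustrative):
>>>
>>> set chassis aggregated-devices ethernet device-count 4
>>> set interfaces ge-0/0/10 ether-options 802.3ad ae0
>>> set interfaces ge-1/0/10 ether-options 802.3ad ae0
>>> set interfaces ae0 aggregated-ether-options lacp active
>>> set interfaces ae0 aggregated-ether-options lacp periodic fast
>>> set interfaces ae0 unit 0 family ethernet-switching port-mode trunk
>>> set interfaces ae0 unit 0 family ethernet-switching vlan members [ vlan21 vlan402 ]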
>>>
>>> ---------------------------------------------------
>>> in dmesg we can see:
>>>
>>>
>>> [617897.820090] ------------[ cut here ]------------
>>> [617897.820106] WARNING: at
>>> /mnt/linux-2.6-3.2.6/debian/build/source_amd64_none/net/sched/sch_generic.c:255
>>> dev_watchdog+0xe9/0x148()
>>> [617897.820111] Hardware name: PowerEdge R815
>>> [617897.820115] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 0 timed out
>>> [617897.820119] Modules linked in: bonding bnx2 xt_physdev xen_netback
>>> xen_blkback ebt_ip ebt_ip6 ebtable_filter ebtables bridge xen_evtchn xenfs
>>> dm_round_robin dm_multipath scsi_dh ipmi_si ipmi_devintf ipmi_msghandler
>>> 8021q
>>> garp stp snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse
>>> serio_raw evdev joydev sp5100_tco tpm_tis tpm dcdbas tpm_bios amd64_edac_mod
>>> edac_core edac_mce_amd k10temp acpi_power_meter button processor thermal_sys
>>> xfs dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc xt_tcpudp xt_state
>>> ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
>>> ip6t_REJECT nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ip6table_filter
>>> ip6_tables x_tables usbhid hid sg sr_mod cdrom sd_mod ses crc_t10dif
>>> enclosure
>>> ahci libahci libata lpfc scsi_transport_fc scsi_tgt ohci_hcd ehci_hcd
>>> megaraid_sas usbcore usb_common scsi_mod [last unloaded: bnx2]
>>> [617897.820257] Pid: 0, comm: swapper/0 Tainted: G        W    3.2.0-1-amd64
>>> #1
>>> [617897.820261] Call Trace:
>>> [617897.820264]  <IRQ>  [<ffffffff81046465>] ?
>>> warn_slowpath_common+0x78/0x8c
>>> [617897.820283]  [<ffffffff81046511>] ? warn_slowpath_fmt+0x45/0x4a
>>> [617897.820290]  [<ffffffff81291f15>] ? netif_tx_lock+0x40/0x72
>>> [617897.820297]  [<ffffffff81292076>] ? dev_watchdog+0xe9/0x148
>>> [617897.820305]  [<ffffffff81051af4>] ? run_timer_softirq+0x19a/0x261
>>> [617897.820311]  [<ffffffff81291f8d>] ? netif_tx_unlock+0x46/0x46
>>> [617897.820318]  [<ffffffff8104ba54>] ? __do_softirq+0xb9/0x177
>>> [617897.820326]  [<ffffffff8120a0ab>] ? __xen_evtchn_do_upcall+0x1b5/0x1f2
>>> [617897.820334]  [<ffffffff8133e8ec>] ? call_softirq+0x1c/0x30
>>> [617897.820342]  [<ffffffff8100f875>] ? do_softirq+0x3c/0x7b
>>> [617897.820348]  [<ffffffff8104bcbc>] ? irq_exit+0x3c/0x9a
>>> [617897.820354]  [<ffffffff8120b675>] ? xen_evtchn_do_upcall+0x27/0x32
>>> [617897.820360]  [<ffffffff8133e93e>] ? xen_do_hypervisor_callback+0x1e/0x30
>>> [617897.820363]  <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
>>> [617897.820374]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
>>> [617897.820382]  [<ffffffff8100663a>] ? xen_safe_halt+0xc/0x13
>>> [617897.820389]  [<ffffffff81014448>] ? default_idle+0x47/0x7f
>>> [617897.820395]  [<ffffffff8100d25f>] ? cpu_idle+0xaf/0xf2
>>> [617897.820402]  [<ffffffff81687b38>] ? start_kernel+0x3b8/0x3c3
>>> [617897.820408]  [<ffffffff8168963b>] ? xen_start_kernel+0x586/0x58c
>>> [617897.820412] ---[ end trace a7919e7f17c0a757 ]---
>>>
>>>
>>> and
>>>
>>>
>>> [617897.820427] bnx2 0000:01:00.1: eth1: DEBUG: intr_sem[0]
>>> PCI_CMD[00180006]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: PCI_PM[19002008]
>>> PCI_MISC_CFG[92000088]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: EMAC_TX_STATUS[00000008]
>>> EMAC_RX_STATUS[00000000]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG:
>>> HC_STATS_INTERRUPT_STATUS[01fe0001]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: <--- start MCP states dump --->
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: MCP_STATE_P0[0003610e]
>>> MCP_STATE_P1[0003610e]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: MCP mode[0000b880]
>>> state[80000000] evt_mask[00000500]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: pc[0800c6c8] pc[0800d7d4]
>>> instr[ac620038]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: shmem states:
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: drv_mb[0103000f]
>>> fw_mb[0000000f] link_status[0000006f] drv_pulse_mb[00001d40]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: dev_info_signature[44564903]
>>> reset_type[01005254] condition[0003610e]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 000003cc: 44444444 44444444
>>> 44444444 00000a28
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 000003dc: 000cffff 00000000
>>> ffff0000 00000000
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 000003ec: 00000000 00000000
>>> 00000000 00000000
>>> [617897.824071] bnx2 0000:01:00.1: eth1: DEBUG: 0x3fc[0000ffff]
>>> [617897.824071] bnx2 0000:01:00.1: eth1: <--- end MCP states dump --->
>>> [617898.236955] bnx2 0000:01:00.1: eth1: NIC Copper Link is Down
>>> [617898.268155] bonding: bond0: link status definitely down for interface
>>> eth1, disabling it
>>>
>>>
>>> The network is very unstable: from timeouts, or IPv6 neighbour discovery
>>> somehow being filtered/dropped (some hosts not responding to neighbour
>>> solicitations), through one VLAN or the other not responding, up to both
>>> physical interfaces going totally down
>>> ....
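>>>
>>> When it happens, the missing neighbour discovery traffic can be confirmed
>>> with something along these lines (interface name to taste):
>>>
>>> #> tcpdump -e -ni bond0.21 icmp6
>>>
>>> i.e. neighbour solicitations going out with no advertisements coming back.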
>>>
>>> From an immediate "no connection", through problems after 1.5 hours, up to
>>> strange behaviour after 5 days ....
>>>
>>> Without bonding the network was much more stable (i.e. with only eth0 UP)
>>> - 1-2 months without problems; however, the messages in dmesg were also
>>> present without bonding ...
>>>
>>> bnx2 version:
>>> 2.1.11
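>>>
>>> (version as reported by the driver; for cross-checking, the driver and NIC
>>> firmware versions can also be read straight from the interface, e.g.:
>>>
>>> #> ethtool -i eth1
>>> driver: bnx2
>>> version: 2.1.11
>>> firmware-version: <whatever is flashed on the NIC>
>>>
>>> the firmware revision may well matter here too.)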
>>>
>>> The servers are not heavily loaded and do not have high network throughput ....
>>>
>>> ANY HELP HIGHLY APPRECIATED!
>>>
>>> GB
>>>
>>>
>>
>>
>>
>>
>
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge



-- 
Founder | Director | VP Research
Orion Virtualisation Solutions | www.orionvm.com.au | Phone: 1300 56
99 52 | Mobile: 0428 754 846


