c2100 CentOS 6.2 kernel hang.

Oleksiy Kovyrin alexey at kovyrin.net
Mon Mar 26 11:23:37 CDT 2012


Oh, I spent all of last weekend on this issue. The fix is to disable PCIe ASPM (Active State Power Management) on the kernel command line:

grubby --grub --args="pcie_aspm=off" --update-kernel=ALL
Then reboot so the new kernel argument takes effect.
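
After the reboot, it may be worth confirming that the option actually reached the running kernel. The following is a minimal sketch, not a definitive check: it assumes the running kernel's command line is readable at /proc/cmdline, and the helper name has_aspm_off is hypothetical, introduced only for illustration.

```shell
# has_aspm_off is a hypothetical helper name, not part of any package.
# It reads a kernel command line file (default: /proc/cmdline) and
# succeeds only if pcie_aspm=off appears as a separate argument.
has_aspm_off() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each argument on its own line, then require an exact line match.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'pcie_aspm=off'
}

if has_aspm_off; then
    echo "pcie_aspm=off is active"
else
    echo "pcie_aspm=off is NOT on the running kernel command line"
fi
```

If the second message appears, "grubby --info=ALL" lists the args line of each boot entry, which should show whether the update was written to the grub configuration.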

For more information on the issue, here are some resources:
- Bug in RH bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=704758
- Similar issues with another Intel NIC:
http://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start
- Some info about the option itself:
http://serverfault.com/questions/226319/what-does-pcie-aspm-do
- Thread in e1000-devel discussing a similar case:
http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg04715.html

On Mon, Mar 26, 2012 at 5:00 AM, Scott Clark <scott.clark at webfusion.com> wrote:
> I've just received a batch of c2100s, installed CentOS 6.2 on them,
> after a few minutes of running, I get the following on the console:
>
> Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue.
>
> And the 4 port gigabit ethernet adaptor goes offline:
> igb 0000:06:00.0: eth0 reset adapter
> igb 0000:07:00.1: eth3 reset adapter
> igb 0000:06:00.1: eth1 reset adapter
> igb 0000:07:00.0: eth2 reset adapter
>
> Output from dmesg:
> Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not
> tainted)
> Hardware name: PowerEdge C2100
> NETDEV WATCHDOG: eth3 (igb): transmit queue 0 timed out
> Modules linked in: ipmi_si mpt2sas scsi_transport_sas raid_class mptctl
> mptbase ipmi_devintf ipmi_msghandler dell_rbu 8021q garp stp llc bonding
> ipv6 dm_mod ses enclosure sg igb dca dcdbas serio_raw i2c_i801 i2c_core
> iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext3 jbd
> mbcache sd_mod crc_t10dif megaraid_sas pata_acpi ata_generic ata_piix
> [last unloaded: ipmi_si]
> Pid: 0, comm: swapper Not tainted 2.6.32-220.7.1.el6.x86_64 0000001
> Call Trace:
>  <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
>  [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
>  [<ffffffff8144a60d>] ? dev_watchdog+0x26d/0x280
>  [<ffffffff8107cff4>] ? mod_timer+0x144/0x220
>  [<ffffffff8144a3a0>] ? dev_watchdog+0x0/0x280
>  [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
>  [<ffffffff810a0b20>] ? tick_sched_timer+0x0/0xc0
>  [<ffffffff8102af2d>] ? lapic_next_event+0x1d/0x30
>  [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
>  [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
>  [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>  [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>  [<ffffffff81071de5>] ? irq_exit+0x85/0x90
>  [<ffffffff814f4eb0>] ? smp_apic_timer_interrupt+0x70/0x9b
>  [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
>  <EOI> [<ffffffff812c4b0e>] ? intel_idle+0xde/0x170
>  [<ffffffff812c4af1>] ? intel_idle+0xc1/0x170
>  [<ffffffff813fa027>] ? cpuidle_idle_call+0xa7/0x140
>  [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
>  [<ffffffff814d420a>] ? rest_init+0x7a/0x80
>  [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
>  [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
>  [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
> ---[ end trace 120c4b9c89ff5465 ]---
> igb 0000:07:00.1: eth3: Reset adapter
> bonding: bond0: link status definitely down for interface eth3, disabling it
> igb 0000:06:00.0: eth0: Reset adapter
> bonding: bond0: link status definitely down for interface eth0, disabling it
> igb 0000:06:00.1: eth1: Reset adapter
> bonding: bond0: link status definitely down for interface eth1, disabling it
> igb 0000:07:00.0: eth2: Reset adapter
> bonding: bond0: link status definitely down for interface eth2, disabling it
>
> I've got CentOS 5.7 installed on c2100s as well which don't experience
> this issue.
>
> Any ideas what's causing this?
>
> --
> Regards,
>
> Scott Clark
> Unix System Administrator
> Webfusion
> Web: http://www.webfusion.com/
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge



-- 
Oleksiy Kovyrin
http://kovyrin.net/
