C2100 CentOS 6.2 kernel hang.

Sabuj Pattanayek sabujp at gmail.com
Mon Mar 26 08:37:03 CDT 2012


I've had some serious issues with a C2100 as well for the past few
months, with CPU 11 PROC IERR errors while running RHEL5.8, where the
system just hangs without any kernel messages. I also had CentOS 6
(I don't recall which minor version) installed on it for a month or so
beginning in early Feb., and it did not show the PROC IERR errors or
NMI errors. We swapped out the motherboard, moved the CPUs around, and
turned off C-states, and it still keeps reporting a PROC IERR on
CPU 11. Last Friday I downgraded to the certified RHEL5.4/CentOS 5.4
kernel, and the system is still up. It usually crashes within two
days, but it's still going now (I'll let you know if it crashes within
the next few days). It's interesting that you've had no issues with it
on 5.7. Perhaps a change from 5.7 to 5.8 is causing the PROC IERR
errors. Have you tried older CentOS 6 kernels? A kernel change from
earlier versions could also be causing your problem.
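
For reference, the "2d" in the "NMI received for unknown reason 2d"
message quoted below is the NMI status byte the kernel reads from the
legacy x86 system control port. As far as I know, the 2.6.32-era x86
NMI handler only attributes an NMI to a known source when bit 7 (0x80,
SERR/memory parity) or bit 6 (0x40, I/O channel check) is set; 0x2d has
neither bit set, which is why the kernel falls through to the "unknown
reason" message. A minimal Python sketch of that check, assuming those
two mask values (later kernels name them NMI_REASON_SERR and
NMI_REASON_IOCHK):

```python
from typing import List

# Assumed mask values for the x86 NMI status byte (later kernels call
# these NMI_REASON_SERR and NMI_REASON_IOCHK).
NMI_REASON_SERR = 0x80   # PCI SERR# / memory parity error
NMI_REASON_IOCHK = 0x40  # I/O channel check (IOCHK#)


def classify_nmi_reason(reason: int) -> List[str]:
    """Return the known NMI sources flagged in a reason byte.

    An empty list means neither SERR nor IOCHK is set, i.e. the case
    the kernel logs as "NMI received for unknown reason".
    """
    sources = []
    if reason & NMI_REASON_SERR:
        sources.append("SERR")
    if reason & NMI_REASON_IOCHK:
        sources.append("IOCHK")
    return sources


if __name__ == "__main__":
    reason = int("2d", 16)  # value taken from the console message
    flagged = classify_nmi_reason(reason)
    print(f"0x{reason:02x} = {reason:08b} ->", flagged or "unknown")
```

Running it prints "0x2d = 00101101 -> unknown": the low bits that are
set carry no SERR/IOCHK information, so the byte alone doesn't say what
raised the NMI.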

On Mon, Mar 26, 2012 at 4:00 AM, Scott Clark <scott.clark at webfusion.com> wrote:
> I've just received a batch of C2100s and installed CentOS 6.2 on them.
> After a few minutes of running, I get the following on the console:
>
> Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue.
>
> And the 4-port Gigabit Ethernet adapter goes offline:
> igb 0000:06:00.0: eth0 reset adapter
> igb 0000:07:00.1: eth3 reset adapter
> igb 0000:06:00.1: eth1 reset adapter
> igb 0000:07:00.0: eth2 reset adapter
>
> Output from dmesg:
> Uhhuh. NMI received for unknown reason 2d on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
> Hardware name: PowerEdge C2100
> NETDEV WATCHDOG: eth3 (igb): transmit queue 0 timed out
> Modules linked in: ipmi_si mpt2sas scsi_transport_sas raid_class mptctl
> mptbase ipmi_devintf ipmi_msghandler dell_rbu 8021q garp stp llc bonding
> ipv6 dm_mod ses enclosure sg igb dca dcdbas serio_raw i2c_i801 i2c_core
> iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext3 jbd
> mbcache sd_mod crc_t10dif megaraid_sas pata_acpi ata_generic ata_piix
> [last unloaded: ipmi_si]
> Pid: 0, comm: swapper Not tainted 2.6.32-220.7.1.el6.x86_64 0000001
> Call Trace:
>  <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
>  [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
>  [<ffffffff8144a60d>] ? dev_watchdog+0x26d/0x280
>  [<ffffffff8107cff4>] ? mod_timer+0x144/0x220
>  [<ffffffff8144a3a0>] ? dev_watchdog+0x0/0x280
>  [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
>  [<ffffffff810a0b20>] ? tick_sched_timer+0x0/0xc0
>  [<ffffffff8102af2d>] ? lapic_next_event+0x1d/0x30
>  [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
>  [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
>  [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>  [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>  [<ffffffff81071de5>] ? irq_exit+0x85/0x90
>  [<ffffffff814f4eb0>] ? smp_apic_timer_interrupt+0x70/0x9b
>  [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
>  <EOI> [<ffffffff812c4b0e>] ? intel_idle+0xde/0x170
>  [<ffffffff812c4af1>] ? intel_idle+0xc1/0x170
>  [<ffffffff813fa027>] ? cpuidle_idle_call+0xa7/0x140
>  [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
>  [<ffffffff814d420a>] ? rest_init+0x7a/0x80
>  [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
>  [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
>  [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
> ---[ end trace 120c4b9c89ff5465 ]---
> igb 0000:07:00.1: eth3: Reset adapter
> bonding: bond0: link status definitely down for interface eth3, disabling it
> igb 0000:06:00.0: eth0: Reset adapter
> bonding: bond0: link status definitely down for interface eth0, disabling it
> igb 0000:06:00.1: eth1: Reset adapter
> bonding: bond0: link status definitely down for interface eth1, disabling it
> igb 0000:07:00.0: eth2: Reset adapter
> bonding: bond0: link status definitely down for interface eth2, disabling it
>
> I've also got CentOS 5.7 installed on other C2100s, and those don't
> experience this issue.
>
> Any ideas what's causing this?
>
> --
> Regards,
>
> Scott Clark
> Unix System Administrator
> Webfusion
> Web: http://www.webfusion.com/
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
