[Linux-PowerEdge] Fwd: syslog gets hosed with CPU power

Tim Connors tim.w.connors at gmail.com
Thu Sep 5 20:42:56 CDT 2013


You saw only one person complain about the performance dropping (and I
remember seeing that post), but that's because the rest of us who suffered
from the same problem saw that post and didn't add to it, since we had no
extra information to contribute.

There is a real bug in the BIOS or kernel (RHEL6 2.6.32-... in our case)
somewhere.  Under some very limited circumstances which I can't tell you
how to reproduce (because I don't know how to), the machine really does
slow down to the speed of an aging single core x86 486.  The problem is,
this has only happened to us once in the more than 12 months we've had
some 12th gen machines.

In the one case we've encountered this, the machine did not go back to
normal speed once the load was removed.  It took about a minute to log in,
and about another 10 minutes to successfully fail the production HA
package over to the other failover pair (yikes - no automatic failover
when the machine is just limping along, and certainly no quick
emergency failover given that we have no STONITH!).  The load on the
machine was only rectified by rebooting it.

I've tried kicking up the load to over 64 on it since (making sure the
power supply went from about 200W consumption to over 600W), and with
various BIOS settings, but have not yet managed to reproduce the problem.
We have of course upgraded the kernel and firmware since then, but there
was nothing in the changelogs of either to suggest that anyone had found
and fixed a speed/temperature/throttling related problem.

On Thu, 5 Sep 2013, Anthony Ciani wrote:

> Hi Fred,
>
> The clearcpuid thing may not be present in your kernel.  It just masks certain cpu features so that the kernel doesn’t use them.  It is also possible that the feature bits in the CPUID are different on your CPUs than the ones used by the person who suggested it.
>
> These log messages appear on a lot more than just PowerEdge systems.  Everything with the Intel Sandy Bridge and later architectures with frequency stepping enabled will generate them.
>
> As mentioned, these are nothing but harmless notifications about the load on the system.  The “bug” was solved by disabling these messages by default, but allowing the user to turn them on by passing “int_pln_enable” as a kernel parameter.
>
> I only saw one post about performance degradation.  Perhaps that person had their cpufreq governor set at some very high usage limit, like 95% (so the CPU frequency would never increase).
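A quick way to check that guess on a given box is to look at what the governor is actually set to. The sketch below assumes the usual cpufreq sysfs layout. The function name show_cpufreq and its optional directory argument are made up for illustration, and the location of ondemand's up_threshold has moved between kernel versions, so treat both paths tried here as assumptions to verify on your own kernel.

```shell
#!/bin/sh
# Print the cpufreq governor for each CPU and, where it can be found,
# the ondemand up_threshold (the load percentage at which it steps up).
# $1 optionally overrides the sysfs CPU directory (hypothetical, for testing).
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$d/scaling_governor" ] || continue
        cpu=${d%/cpufreq}
        cpu=${cpu##*/}
        line="$cpu governor=$(cat "$d/scaling_governor")"
        # Try the per-CPU tunable first, then the global one.
        for t in "$d/ondemand/up_threshold" \
                 "$root/cpufreq/ondemand/up_threshold"; do
            if [ -r "$t" ]; then
                line="$line up_threshold=$(cat "$t")"
                break
            fi
        done
        echo "$line"
    done
}
```

Running show_cpufreq with no arguments reads the live sysfs tree. If no up_threshold file is readable (for example under the performance governor), only the governor name is printed.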
>
> Until Red Hat applies the kernel patch to deactivate the messages, you’ll just have to live with them.  I can’t imagine they make the log all that large, but if they are really annoying, you could:
>
> 1) Write a cron job to run a sed script to delete the lines from the log every hour or day.
>
> sed -i '/Core power/d;/Package power/d' /var/log/messages
>
> OR
>
> 2) Try modifying the load limits on your cpufreq governor.  Perhaps stepping up at a lower load will prevent reaching the power limits.  Just a guess though.
>
> OR
>
> 3) Turn off frequency stepping in the BIOS and/or run the system at full speed/power using the performance governor.
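One caveat on option 1: GNU sed's in-place mode writes a new file and renames it over the original, so a syslog daemon that still has the old file open keeps writing to the old (now unlinked) copy until it reopens its logs. A gentler variant is to leave the live log alone and produce a filtered copy for reading. The sketch below does that. The function name filter_power_msgs is made up for illustration, and the patterns assume the message wording quoted in this thread.

```shell
#!/bin/sh
# Write a log to stdout with the CPU power-limit messages removed.
# Reads the files given as arguments, or stdin if there are none.
# The patterns match the wording seen in this thread, e.g.
#   "kernel: CPU1: Core power limit normal"
#   "kernel: CPU3: Package power limit notification (total events = ...)"
filter_power_msgs() {
    sed -e '/kernel: CPU[0-9]*: Core power limit/d' \
        -e '/kernel: CPU[0-9]*: Package power limit/d' "$@"
}
```

For example, `filter_power_msgs /var/log/messages | less` pages through the log without the noise and leaves the file itself untouched.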
>
> From: Fred van Zwieten [mailto:fvzwieten at gmail.com]
> Sent: Thursday, September 05, 2013 2:37 AM
> To: Anthony Ciani
> Cc: linux-poweredge at dell.com
> Subject: Re: [Linux-PowerEdge] Fwd: syslog gets hosed with CPU power
>
> Well, I tried clearcpuid=299, but that did not help. When googling around I see a lot of people complaining about the same thing, and it's all on PowerEdge hardware. Red Hat mostly says it's harmless, but it is also investigating it together with Dell. There are also reports from people who experience slowness when these messages occur.
>
> Any other ways to switch this spam off?
>
> Regards,
>
> Fred
>
> Science flies us to the moon. Religion flies us into buildings (Victor Stenger)
>
> On Tue, Sep 3, 2013 at 9:47 PM, Anthony Ciani <aciani at sivananthanlabs.us> wrote:
>
> Fred,
>
> Those are just notifications that the load on the CPU reached some limit and
> then went back down to normal.  From the times you gave, it looks like a
> high load program runs on the machine for about an hour, stops, and then
> about an hour later the machine is used again.
>
> The messages are just informative, and really do nothing other than tell you
> that the CPU is being used at 80+%.  For an HPC cluster, those messages may
> as well not even exist.
>
> They can supposedly be disabled by passing clearcpuid=299 as a kernel
> parameter.
>
> You could also edit your syslog.conf to reduce the verbosity of the messages
> or direct them to another log location.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=36182
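As a rough illustration of the syslog.conf suggestion: the fragment below assumes the box runs rsyslog (the stock syslog daemon on RHEL 6) and uses its legacy property-based filter syntax. The target file name /var/log/cpu-power.log is made up, and the match string "power limit" is taken from the messages quoted in this thread, so it may catch more or less than you want on other kernels. The lines have to appear before the rule that writes to /var/log/messages.

```
# Send kernel power-limit chatter to its own file ...
:msg, contains, "power limit"    /var/log/cpu-power.log
# ... and stop processing those messages so they never reach /var/log/messages.
& ~
```

To drop the messages entirely instead of redirecting them, the two rules can be replaced by a single `:msg, contains, "power limit" ~` line. Newer rsyslog versions prefer `stop` over `~`, so check what your version accepts.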
>
> Date: Mon, 2 Sep 2013 20:57:38 +0200
> From: Fred van Zwieten <fvzwieten at vxcompany.com>
>
> Hi,
>
> We have a bunch of R620s with up-to-date RHEL 6.4 on them. BIOS is 1.6.0
> and system firmware 1.37.35 (Build 02). AFAIK everything is at the current level.
>
> I get this in my /var/log/messages file:
>
> Sep  2 10:28:14 svg008 kernel: CPU1: Core power limit normal
> Sep  2 10:28:14 svg008 kernel: CPU3: Core power limit normal
> Sep  2 10:28:14 svg008 kernel: CPU5: Core power limit normal
> Sep  2 10:28:14 svg008 kernel: CPU7: Core power limit normal
> Sep  2 10:28:14 svg008 kernel: CPU3: Package power limit normal
> Sep  2 10:28:14 svg008 kernel: CPU5: Package power limit normal
> Sep  2 11:36:22 svg008 kernel: CPU1: Core power limit notification (total events = 93314)
> Sep  2 11:36:22 svg008 kernel: CPU3: Core power limit notification (total events = 93290)
> Sep  2 11:36:22 svg008 kernel: CPU5: Core power limit notification (total events = 93304)
> Sep  2 11:36:22 svg008 kernel: CPU7: Core power limit notification (total events = 92976)
> Sep  2 11:36:22 svg008 kernel: CPU3: Package power limit notification (total events = 93802)
> Sep  2 11:36:22 svg008 kernel: CPU5: Package power limit notification (total events = 93911)
> Sep  2 11:36:22 svg008 kernel: CPU7: Package power limit notification (total events = 93754)
> Sep  2 11:36:22 svg008 kernel: CPU1: Package power limit notification (total events = 93562)
> Sep  2 11:36:22 svg008 kernel: CPU1: Core power limit normal
> Sep  2 11:36:22 svg008 kernel: CPU3: Core power limit normal
> Sep  2 11:36:22 svg008 kernel: CPU5: Core power limit normal
> Sep  2 11:36:22 svg008 kernel: CPU7: Core power limit normal
> <snip>
> Temperatures on all machines are well below the warning level according to iDRAC.
> <snip>
> Any thoughts?
>
> Regards,
>
> Fred
>
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge

-- 
Tim Connors


