R910 and MCE CPU overheat

Irwan Hadi ihblist18 at gmail.com
Sun Sep 19 21:05:15 CDT 2010


We have two brand-new Dell PowerEdge R910 servers, each with four E7540
processors, running Red Hat Enterprise Linux 5.4 and Oracle, that keep
logging CPU overheat errors in their MCE log.
There is no temperature warning in the DRAC log, and according to the
DRAC the CPU temperatures are fine.
Both R910s are currently running BIOS version 1.0.1.

An example of the MCE log is as follows:

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 21 THERMAL EVENT TSC 1d6f5a78c4a1eb [at 1995 Mhz 45 days 1:34:53 uptime (unreliable)]
Processor 21 heated above trip temperature. Throttling enabled.
Please check your system cooling. Performance will be impacted
STATUS 880003cb MCGSTATUS 0
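
For reference, here is a rough sketch of how the STATUS value above might be decoded. It assumes that for thermal events the STATUS field carries the raw IA32_THERM_STATUS register, and that the bit layout matches the one described in the Intel SDM; both are assumptions on my part and should be checked against the documentation for this CPU. The function name and the dictionary keys are made up for illustration.

```python
# Assumed bit layout of IA32_THERM_STATUS (per the Intel SDM description).
# Not verified against the E7540 specifically.
FLAGS = {
    0: "thermal status (currently throttling)",
    1: "thermal status log",
    2: "PROCHOT#/FORCEPR# event",
    3: "PROCHOT#/FORCEPR# log",
    4: "critical temperature status",
    5: "critical temperature log",
    6: "thermal threshold #1 status",
    7: "thermal threshold #1 log",
    8: "thermal threshold #2 status",
    9: "thermal threshold #2 log",
}


def decode_therm_status(value):
    """Split a raw status word into its assumed fields (hypothetical helper)."""
    return {
        # Names of all flag bits that are set, lowest bit first.
        "flags": [name for bit, name in sorted(FLAGS.items()) if (value >> bit) & 1],
        # Bits 22:16: digital readout, in degrees C below TjMax.
        "degrees_below_tjmax": (value >> 16) & 0x7F,
        # Bits 30:27: sensor resolution in degrees C.
        "resolution_c": (value >> 27) & 0xF,
        # Bit 31: whether the digital readout is valid.
        "reading_valid": bool((value >> 31) & 1),
    }


decoded = decode_therm_status(0x880003CB)
for key, val in decoded.items():
    print(key, val)
```

If those assumptions hold, the readout field says the core's own sensor was at its trip point when the event was logged, which would not necessarily match the temperature the DRAC reports.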


Has anyone seen a similar issue with their R910? I'm wondering whether
we have defective CPUs with bad thermal sensors, or whether something
else is wrong with these R910s.


Thanks


