Problems with R410

Dennis Jacobfeuerborn dennisml at conversis.de
Mon Jan 9 07:37:05 CST 2012


On 01/09/2012 09:38 AM, Jens Dueholm Christensen (JEDC) wrote:
> Dennis Jacobfeuerborn wrote:
>
>> Is there any official response from Dell regarding this issue? Given that a
>> lot of new servers are going to be installed with RHEL/CentOS 6 that sounds
>> like something owners of the hardware should be informed of.
>
> It's really not a HW-issue - the same hardware has no problems under other OSes (RHEL < v6, *BSD, Windows etc etc)

According to Intel it is a hardware issue, and the reason other OSes work is 
that they do not support the newer C-states of these processors and as a 
result simply do not hit this particular issue. From the Intel spec update:
"Rapid Core C3/C6 Transition May Cause Unpredictable System Behavior

Under a complex set of internal conditions, cores rapidly performing C3/C6 
transitions in a system with Intel® Hyper-Threading Technology enabled may 
cause a machine check error (IA32_MCi_STATUS.MCACOD = 0x0106), system hang 
or unpredictable system behavior.

This erratum may cause a machine check error, system hang or unpredictable 
system behavior."
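For anyone trying to match their own machine check logs against this erratum, here is a minimal sketch, assuming you already have the raw 64-bit IA32_MCi_STATUS value from a log (reading the MSR itself is not shown). It only extracts the MCA error code field (bits 15:0) and compares it with the 0x0106 quoted above. The function names are hypothetical, and a matching MCACOD alone does not prove you are hitting this erratum.

```python
# Hypothetical helper: compare the MCACOD field of a raw IA32_MCi_STATUS
# value against the code cited in the Intel erratum (0x0106).
ERRATUM_MCACOD = 0x0106

def mcacod(status: int) -> int:
    """Return the MCA error code field (bits 15:0) of IA32_MCi_STATUS."""
    return status & 0xFFFF

def matches_c3c6_erratum(status: int) -> bool:
    """True if the logged status carries the MCACOD named in the erratum."""
    return mcacod(status) == ERRATUM_MCACOD

# Example with a made-up status value whose low 16 bits are 0x0106.
print(hex(mcacod(0xB200000000000106)))
```

Only the low 16 bits are compared; the other status fields (valid, uncorrected, etc.) are ignored in this sketch.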

And for the 56xx series:
"Package C3/C6 Transitions When Memory 2x Refresh is Enabled May Result in 
a System Hang

If ASR_PRESENT (MC_CHANNEL_{0,1,2}_REFRESH_THROTTLE_SUPPORT CSR function 
0, offset 68H, bit [0], Auto Self Refresh Present) is clear which indicates 
that high temperature operation is not supported on the DRAM, the memory 
controller will not enter self-refresh if software has REF_2X_NOW (bit 4 of 
the MC_CLOSED_LOOP CSR, function 3, offset 84H) set. This scenario may 
cause the system to hang during C3/C6 entry.

Failure to enter self-refresh can delay C3/C6 power state transitions to 
the point that a system hang may result with CATERR being asserted. 
REF_2X_NOW is used to double the refresh rate when the DRAM is operating in 
extended temperature range. The ASR_PRESENT was intended to allow low power 
self refresh with DRAM that does not support automatic self refresh."
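The 56xx condition boils down to two register bits. A minimal sketch, assuming you already have the 32-bit values of one channel's MC_CHANNEL_x_REFRESH_THROTTLE_SUPPORT register (function 0, offset 68H) and of MC_CLOSED_LOOP (function 3, offset 84H); how you read PCI config space is left out. The function names are hypothetical, and the check mirrors only the wording of the erratum quoted above.

```python
# Bit positions as given in the Intel spec update text.
ASR_PRESENT_BIT = 0  # MC_CHANNEL_{0,1,2}_REFRESH_THROTTLE_SUPPORT, function 0, offset 68H
REF_2X_NOW_BIT = 4   # MC_CLOSED_LOOP, function 3, offset 84H

def bit_set(value: int, bit: int) -> bool:
    """Return True if the given bit of a register value is 1."""
    return (value >> bit) & 1 == 1

def hang_condition(refresh_throttle_support: int, mc_closed_loop: int) -> bool:
    """True when ASR_PRESENT is clear (no auto self refresh on this channel)
    and REF_2X_NOW is set, i.e. the scenario the erratum says may hang on C3/C6 entry."""
    return (not bit_set(refresh_throttle_support, ASR_PRESENT_BIT)
            and bit_set(mc_closed_loop, REF_2X_NOW_BIT))
```

Since ASR_PRESENT lives in a per-channel register, you would evaluate this once per memory channel.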

> Redhat is working on something, but the bugzilla issue (#710265 - https://bugzilla.redhat.com/show_bug.cgi?id=710265) has been marked non-public for quite some time now, so it's anyone's guess on the linux-poweredge list what's going on there.
>
> I wonder if Shyam Iyer from Dell (see http://lists.us.dell.com/pipermail/linux-poweredge/2011-December/045732.html) has found out anything yet.

Some update from official sources would be nice. So far most of the 
information out there seems to be gathered by users and I'm pretty sure 
somebody at Dell must be looking into this and have some information about 
progress and when we can expect a proper resolution of this issue.

Regards,
   Dennis


