Problems with R410

Tehmasp Chaudhri tchaudhri at rallydev.com
Mon Jan 9 10:47:26 CST 2012


On our R810 running CentOS 5.7, /proc shows the cores going to C3 when
C/C1E is ENABLED in the BIOS.
True, once it is disabled in the BIOS I have not seen any C-state
transitions, but the issue has been elusive, and on another thread one
remedy was to force the max C-state value in the Linux OS regardless.
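
For anyone who wants to check whether cores are entering deep C-states,
here is a minimal sketch in Python. It assumes a kernel that exposes the
cpuidle tree under /sys/devices/system/cpu/cpuN/cpuidle/stateM/ (files
"name" and "usage"); the stock CentOS 5 kernel may not have it, in which
case /proc has to be read instead. The function names and the "C3"/"C6"
substring match are illustrative assumptions, since state names differ
between idle drivers.

```python
import glob
import os


def read_cstate_usage(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu: {state_name: usage_count}} from the cpuidle sysfs tree."""
    result = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpuidle", "state[0-9]*")
    for state_dir in sorted(glob.glob(pattern)):
        # .../cpuN/cpuidle/stateM -> "cpuN"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(state_dir)))
        try:
            with open(os.path.join(state_dir, "name")) as f:
                name = f.read().strip()
            with open(os.path.join(state_dir, "usage")) as f:
                usage = int(f.read().strip())
        except (IOError, OSError, ValueError):
            continue  # unreadable or malformed entry; skip it
        result.setdefault(cpu, {})[name] = usage
    return result


def deep_states_entered(usage_by_cpu, markers=("C3", "C6")):
    """List (cpu, state, usage) for states that look like C3/C6 and were used.

    Matching on a substring of the state name is a heuristic (assumption).
    """
    hits = []
    for cpu, states in sorted(usage_by_cpu.items()):
        for name, usage in sorted(states.items()):
            if usage > 0 and any(m in name for m in markers):
                hits.append((cpu, name, usage))
    return hits


if __name__ == "__main__":
    for cpu, name, usage in deep_states_entered(read_cstate_usage()):
        print("%s entered %s %d times" % (cpu, name, usage))
```

If this prints nothing after the box has been idle for a while, no core
has recorded a C3/C6 entry according to the cpuidle counters.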

Tehmasp

On Mon, Jan 9, 2012 at 6:37 AM, Dennis Jacobfeuerborn <dennisml at conversis.de> wrote:

> On 01/09/2012 09:38 AM, Jens Dueholm Christensen (JEDC) wrote:
> > Dennis Jacobfeuerborn wrote:
> >
> >> Is there any official response from Dell regarding this issue? Given
> >> that a lot of new servers are going to be installed with RHEL/CentOS 6
> >> that sounds like something owners of the hardware should be informed of.
> >
> > It's really not a HW-issue - the same hardware has no problems under
> > other OSes (RHEL < v6, *BSD, Windows etc etc)
>
> According to Intel it is a hardware issue and the reason other OSes work is
> because they do not support the new c-states of these processors and as a
> result simply do not hit this particular issue. From the Intel spec update:
> "Rapid Core C3/C6 Transition May Cause Unpredictable System Behavior
>
> Under a complex set of internal conditions, cores rapidly performing C3/C6
> transitions in a system with Intel® Hyper-Threading Technology enabled may
> cause a machine check error (IA32_MCi_STATUS.MCACOD = 0x0106), system hang
> or unpredictable system behavior.
>
> This erratum may cause a machine check error, system hang or unpredictable
> system behavior."
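
To compare a logged machine check against the MCACOD value quoted above,
a small sketch may help. It assumes you already have the raw 64-bit
IA32_MCi_STATUS value from your machine check log; MCACOD is bits 15:0
of that register and bit 63 is the valid (VAL) flag. The helper names
are made up for illustration.

```python
MCACOD_MASK = 0xFFFF        # bits 15:0 of IA32_MCi_STATUS hold the MCA error code
VALID_BIT = 1 << 63         # VAL: the register holds a valid error
ERRATUM_MCACOD = 0x0106     # value quoted in the spec update text


def mcacod(status):
    """Extract the MCA error code field from a raw status value."""
    return status & MCACOD_MASK


def matches_erratum(status):
    """True if the status is valid and carries the quoted MCACOD."""
    return bool(status & VALID_BIT) and mcacod(status) == ERRATUM_MCACOD
```

A match only says the error code is the same as the one in the erratum
text; it does not prove the erratum was the cause.
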
>
> and for 56xx:
> "Package C3/C6 Transitions When Memory 2x Refresh is Enabled May Result in
> a System Hang
>
> If ASR_PRESENT (MC_CHANNEL_{0,1,2}_REFRESH_THROTTLE_SUPPORT CSR function
> 0, offset 68H, bit [0], Auto Self Refresh Present) is clear which indicates
> that high temperature operation is not supported on the DRAM, the memory
> controller will not enter self-refresh if software has REF_2X_NOW (bit 4 of
> the MC_CLOSED_LOOP CSR, function 3, offset 84H) set. This scenario may
> cause the system to hang during C3/C6 entry.
>
> Failure to enter self-refresh can delay C3/C6 power state transitions to
> the point that a system hang may result with CATERR being asserted.
> REF_2X_NOW is used to double the refresh rate when the DRAM is operating in
> extended temperature range. The ASR_PRESENT was intended to allow low power
> self refresh with DRAM that does not support automatic self refresh."
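
The condition in that erratum boils down to two bits, which can be
sketched as a predicate. This does not read any hardware: it assumes you
have obtained the two raw CSR values some other way, and the function
and constant names are hypothetical. Bit positions are taken from the
quoted text (offset 68H bit 0, offset 84H bit 4).

```python
ASR_PRESENT_BIT = 1 << 0    # REFRESH_THROTTLE_SUPPORT CSR, offset 68H, bit 0
REF_2X_NOW_BIT = 1 << 4     # MC_CLOSED_LOOP CSR, offset 84H, bit 4


def hang_condition(refresh_throttle_support, mc_closed_loop):
    """True when ASR_PRESENT is clear and REF_2X_NOW is set.

    Per the quoted erratum, that combination keeps the memory controller
    out of self-refresh and may hang the system during C3/C6 entry.
    """
    asr_present = bool(refresh_throttle_support & ASR_PRESENT_BIT)
    ref_2x_now = bool(mc_closed_loop & REF_2X_NOW_BIT)
    return (not asr_present) and ref_2x_now
```

For example, hang_condition(0x0, 0x10) is True, while setting bit 0 of
the first value or clearing bit 4 of the second makes it False.
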
>
> > Red Hat is working on something, but the bugzilla issue (#710265 -
> > https://bugzilla.redhat.com/show_bug.cgi?id=710265) has been marked
> > non-public for quite some time now, so it's anyone's guess on the
> > linux-poweredge list what's going on there.
> >
> > I wonder if Shyam Iyer from Dell (see
> > http://lists.us.dell.com/pipermail/linux-poweredge/2011-December/045732.html)
> > has found out anything yet..
>
> Some update from official sources would be nice. So far most of the
> information out there seems to be gathered by users and I'm pretty sure
> somebody at Dell must be looking into this and have some information about
> progress and when we can expect a proper resolution of this issue.
>
> Regards,
>   Dennis
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>



-- 
Tehmasp