[Linux-PowerEdge] Clearing memory errors

Karsten Suehring suehringlists at gmail.com
Wed Aug 27 05:55:04 CDT 2014


Hi,

these errors should disappear as soon as the broken DIMM has been replaced
(at least they did when I replaced memory modules). Only a critical entry in
the ESM log should remain, and it disappears once the log is cleared. I have
not yet had to replace a DIMM on a 12G server, though.
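
For reference, clearing the ESM log from the OS could look roughly like the
sketch below. It assumes the OMSA tools omreport and omconfig are on the PATH
and that you run it as root; the function name save_and_clear_esmlog and the
backup file argument are only illustrative. It keeps a copy of the log before
clearing, in case support wants to see the entries later.

```shell
#!/bin/sh
# Sketch only: save the current ESM log to a file, then clear it.
# Assumes OMSA's omreport and omconfig commands are installed.
# save_and_clear_esmlog is a made-up helper name, not part of OMSA.
save_and_clear_esmlog() {
    backup="$1"
    # Keep a copy of the log entries before they are wiped.
    omreport system esmlog > "$backup" || return 1
    # Clear the ESM (hardware) log.
    omconfig system esmlog action=clear
}
```

You would call it as, for example,
save_and_clear_esmlog /root/esmlog-before-clear.txt
and then check "omreport chassis memory" again.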

The usual replacement procedure includes moving the memory module to a
different slot, to check whether the error follows the module or stays with
the memory channel. Maybe the problem was not the DIMM, or maybe you
accidentally replaced the wrong one?
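
To compare the state before and after moving a module, something like the
following could list the slots that OMSA reports as not Ok. This is a sketch
that assumes the "Status" / "Connector Name" layout shown in your
"omreport chassis memory" output; the function name non_ok_dimms is my own.

```shell
#!/bin/sh
# Sketch only: print "<connector>: <status>" for every DIMM whose status
# is not "Ok". Reads `omreport chassis memory` output on stdin and assumes
# each "Status" line is followed by the matching "Connector Name" line.
non_ok_dimms() {
    awk -F' *: *' '
        { sub(/[ \t\r]+$/, "") }               # drop trailing whitespace
        $1 == "Status" { status = $2 }         # remember the latest status
        $1 == "Connector Name" && status != "" {
            if (status != "Ok") print $2 ": " status
            status = ""
        }
    '
}
```

Run it as "omreport chassis memory | non_ok_dimms" before and after the move.
If the same module is flagged in its new slot, the module is suspect; if the
original slot is flagged again with a different module in it, the slot or
channel is.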

Anyway, I would suggest contacting Dell support again to clarify the
issue.

BR,
Karsten



On Tue, Aug 26, 2014 at 3:24 PM, Ben <bda20 at cam.ac.uk> wrote:

> We recently had an R720 report excessive memory errors for a DIMM (16GB).
> We got a replacement and swapped it in.  During the swap the power cables
> were removed and the power button held down for 20+ seconds.
>
> However, on reboot, although the DRAC GUI says all is well, we see this
> with OMSA:
>
> # omreport chassis
> Health
>
> Main System Chassis
>
> SEVERITY : COMPONENT
> Ok       : Fans
> Ok       : Intrusion
> Critical : Memory
> [...]
>
> # omreport chassis memory
> Memory Information
>
> Health : Critical
>
> Attributes of Memory Array(s)
> Location           : System Board or Motherboard
> Use                : System Memory
> Installed Capacity : 131072  MB
> Maximum Capacity   : 1572864  MB
> Slots Available    : 24
> Slots Used         : 8
> Error Correction   : Multibit ECC
>
> Total of Memory Array(s)
> Total Installed Capacity                     : 131072  MB
> Total Installed Capacity Available to the OS : 2974  MB
> Total Maximum Capacity                       : 1572864  MB
>
> Details of Memory Array 1
> Index          : 0
> Status         : Critical
> Connector Name : DIMM_A1
> Type           : DDR3 - Synchronous Registered (Buffered)
> Size           : 16384  MB
> [...]
>
> # omreport chassis memory index=0
> Memory Device Information
>
> Health : Critical
>
> Status      : Critical
> Device Name : DIMM_A1
> Size        : 16384 MB
> Type        : DDR3 Synchronous Registered (Buffered)
> Speed       : 0.54 ns
> Rank        : Dual
> Failures    : Single-bit failure error rate exceeded.
>
>
> Any ideas how to clear this, please?
>
> Ben
> --
> Unix Support, UIS, University of Cambridge, England