[Linux-PowerEdge] C6145 ECC Error, how to find bad DIMM?

John Hanks griznog at gmail.com
Mon Jan 28 15:15:21 CST 2013


Hi,

I have a C6145 server that has all 32 DIMM slots filled and is
spontaneously rebooting several times per week. Each reboot shows up in the
SEL like this (as viewed with ipmitool):

76 | 01/28/2013 | 13:03:35 | Unknown #0x81 |
77 | 01/28/2013 | 13:03:37 | Memory #0x60 | Uncorrectable ECC | Asserted
78 | 01/28/2013 | 13:04:55 | System Firmware Error #0x06 | Unknown Error | Asserted
79 | 01/28/2013 | 13:05:06 | System Event #0x85 | OEM System boot event | Asserted
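
A minimal sketch, assuming the pipe-delimited ipmitool output format shown
above, of pulling the memory events out of a saved SEL dump and tallying
uncorrectable ECC errors per sensor number. The names parse_sel and
ecc_counts are hypothetical helpers, and the sample text is just the excerpt
above with wrapped lines rejoined. It only counts events per sensor number;
it does not by itself identify a physical DIMM slot.

```python
import re
from collections import Counter

# Sample lines copied from the SEL excerpt above (wrapped lines rejoined).
SEL_TEXT = """\
76 | 01/28/2013 | 13:03:35 | Unknown #0x81 |
77 | 01/28/2013 | 13:03:37 | Memory #0x60 | Uncorrectable ECC | Asserted
78 | 01/28/2013 | 13:04:55 | System Firmware Error #0x06 | Unknown Error | Asserted
79 | 01/28/2013 | 13:05:06 | System Event #0x85 | OEM System boot event | Asserted
"""


def parse_sel(text):
    """Split pipe-delimited SEL lines into dicts; skip lines that do not fit."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue
        # Fourth column looks like "<sensor type> #0xNN".
        m = re.match(r"(.+?)\s+#(0x[0-9a-fA-F]+)$", fields[3])
        if not m:
            continue
        events.append({
            "id": fields[0],
            "date": fields[1],
            "time": fields[2],
            "sensor_type": m.group(1),
            "sensor_num": int(m.group(2), 16),
            "description": fields[4] if len(fields) > 4 else "",
        })
    return events


def ecc_counts(events):
    """Count uncorrectable ECC events per memory sensor number."""
    return Counter(
        e["sensor_num"] for e in events
        if e["sensor_type"] == "Memory"
        and "Uncorrectable ECC" in e["description"]
    )


events = parse_sel(SEL_TEXT)
for num, n in ecc_counts(events).items():
    print(f"Memory sensor #0x{num:02x}: {n} uncorrectable ECC event(s)")
```

Run against a full dump, this would at least show whether every error
reports the same sensor number or whether several are involved.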

Does anyone know how I can map #0x60 back to a specific DIMM slot, or even
to a specific bank/CPU? I'm really not looking forward to searching through
32 DIMMs, swapping them one at a time and waiting to see if I get another
ECC error.

Thanks,

jbh