"fixing" memory error

Tino Schwarze linux-poweredge.lists at tisc.de
Thu Sep 20 06:20:19 CDT 2007


Hi Fred,

On Wed, Sep 19, 2007 at 01:12:57PM -0400, Fred Skrotzki wrote:
> You didn't say which system it is, but if it has a BMC and you have
> OpenIPMI installed, the following should get you all the information
> you want. You can also run it over the LAN from another system that
> has OpenIPMI installed (see the manpage for the options).

Sorry, it's a PE650, so no BMC.

> ipmitool sel list
> This will dump the BMC log and give you date and time stamps for when things occurred.

ipmitool doesn't work; the kernel driver cannot find any system interface:

Sep 20 13:10:26 xxxx kernel: ipmi message handler version 39.0
Sep 20 13:10:26 xxxx kernel: IPMI System Interface driver.
Sep 20 13:10:27 xxxx kernel: ipmi_si: Unable to find any System Interface(s)
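
For reference, whether a local IPMI device node exists at all can be checked directly. The following is only a minimal sketch in Python, assuming device index 0 and the three paths that ipmitool's local "open" interface looks for; the helper name find_ipmi_device is made up for illustration.

```python
import os

# Device nodes that ipmitool's local "open" interface tries for device 0.
CANDIDATE_NODES = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def find_ipmi_device(candidates=CANDIDATE_NODES, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do.

    `exists` is injectable so the check can be exercised without real devices.
    """
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    node = find_ipmi_device()
    print(node if node else "no IPMI device node found")
```

If this reports no device node, the kernel message above is the likely reason, and ipmitool has nothing local to talk to.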

> 
> To get more information on a specific event, run
> ipmitool sel list -v
> and look for the event number.  With this output you can decode the bank and slot numbers from the event data.

I know the slot from OMSA (actually, there's only one DIMM).

> If the log shows a line like this at the end of several repeating errors:
> "Event Logging Disabled #0x06 | Correctable memory error logging disabled | Asserted"
> then you won't see any more messages, even if the error is still happening.  To clear the flags, run
> ipmitool bmc reset cold
> This resets only the BMC and will clear the blinking orange light until the error occurs again.
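
On a machine that does have a BMC, spotting that marker can be scripted. This is only a minimal sketch in Python, assuming the output of "ipmitool sel list" is available as text and that fields are separated by "|" as in the line quoted above; the function name find_logging_disabled is made up for illustration.

```python
import sys

# Description text taken from the SEL line quoted above.
MARKER = "Correctable memory error logging disabled"


def find_logging_disabled(sel_text):
    """Return the SEL lines that assert 'logging disabled' for correctable errors."""
    hits = []
    for line in sel_text.splitlines():
        # Split on the "|" separators and trim padding around each field.
        fields = [field.strip() for field in line.split("|")]
        if MARKER in fields and fields[-1] == "Asserted":
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # Reads the SEL dump from standard input and prints any matching lines.
    for hit in find_logging_disabled(sys.stdin.read()):
        print(hit)
```

The script reads from standard input, so the SEL dump can be piped into it or fed from a saved file.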

Is there a way to do that from OMSA?

Thanks,

Tino.

> -----Original Message-----
> From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Tino Schwarze
> Sent: Sunday, September 16, 2007 11:22 AM
> To: linux-poweredge at dell.com
> Subject: "fixing" memory error
> 
> Hi there,
> 
> we've got a machine here which hasn't seen OMSA for a long time. Three years ago, a multibit ECC fault was detected (we had a broken DIMM then). We replaced the DIMM and haven't had any problem since then.
> Recently, the machine got a new OS and we installed OMSA on it. Now the error shows up:
> 
> xxxxxx:~ # omreport chassis memory
> Memory Information
> 
> Health : Critical
> [...]
> Details of Memory Array 1
> Index          : 0
> Status         : Critical
> Connector Name : DIMM A
> Type           : DDR-SYNCHRONOUS
> Size           : 1024  MB
> 
> How do we clear the error condition? Do we need to power down, swap the DIMM, power on for the BIOS to notice, power off, swap back? Or is there an easier way (couldn't find one with omconfig)?
> 
> Thanks,
> 
> Tino.
> 
> --
> www.spiritualdesign-chemnitz.de
> www.craniosacralzentrum.de
> 
> Tino Schwarze * Parkstraße 17h * 09120 Chemnitz

-- 
www.spiritualdesign-chemnitz.de

Tino Schwarze * Parkstraße 17h * 09120 Chemnitz


