Need explanation for these error messages from OMSA
Damon L. Chesser
damon at damtek.com
Thu Oct 9 08:57:44 CDT 2008
On Thu, 2008-10-09 at 11:20 +0000, Arnar Þórarinsson wrote:
>
> Hello,
>
> Could somebody please explain these error messages to me. I've been
> trying to find some info on this but have found nothing.
>
> Severity : Critical
> ID : 1404
> Date and Time : Fri Oct 3 19:57:10 2008
> Category : Instrumentation Service
> Description : Memory device status is critical Memory device
> location: DIMM2_B Possible memory module event cause:Single bit
> warning error rate exceeded,Single bit error logging disabled
>
> Severity : Non-Critical
> ID : 1403
> Date and Time : Fri Oct 3 18:01:02 2008
> Category : Instrumentation Service
> Description : Memory device status is non-critical Memory device
> location: DIMM2_B Possible memory module event cause:Single bit
> warning error rate exceeded
>
>
> /Arnar Thorarinsson
Single-bit warning errors by themselves mean very little other than that
the memory found an error and corrected it. However, if you see many of
these errors, there is a more serious issue: it indicates that you have
a bad DIMM or a bad DIMM card. To test, swap DIMM2_B with another DIMM
and see whether the error follows the DIMM or stays with the slot. If it
stays with the slot, you need a new DIMM card/MB; if it follows the
DIMM, you need a new DIMM.
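The swap test boils down to a simple decision, sketched below in Python. This is only an illustration of the reasoning: the function name, its parameters, and the second slot label (DIMM1_B) are made up for the example and are not part of OMSA.

```python
def diagnose_swap(suspect_slot, new_slot_of_suspect_dimm, slot_reporting_errors):
    """Interpret the result of a DIMM swap test.

    suspect_slot:             slot that originally reported errors (e.g. "DIMM2_B")
    new_slot_of_suspect_dimm: slot the suspect DIMM was moved into
    slot_reporting_errors:    slot that reports errors after the swap

    All names here are hypothetical; this only encodes the reasoning.
    """
    if suspect_slot == new_slot_of_suspect_dimm:
        raise ValueError("the suspect DIMM was not moved, so nothing was tested")
    if slot_reporting_errors == new_slot_of_suspect_dimm:
        # The error followed the module to its new slot.
        return "replace DIMM"
    if slot_reporting_errors == suspect_slot:
        # The error stayed with the slot even with a different module in it.
        return "replace DIMM card/MB"
    return "inconclusive"


# Example: the DIMM from DIMM2_B was moved to DIMM1_B (assumed label),
# and the errors now show up on DIMM1_B.
print(diagnose_swap("DIMM2_B", "DIMM1_B", "DIMM1_B"))
```

If errors show up on neither slot after the swap, keep watching the log for a while before drawing a conclusion.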
Again, a few of these warnings mean nothing other than that the ECC for
your memory is working as designed. Many of these warnings mean you have
bad memory or a bad memory riser/MB.
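To tell "a few" from "many", you can tally the events per DIMM location. The Python sketch below is one way to do that, assuming you have the alert log saved as plain text in the same layout as the messages quoted above. The function names are made up for the example, and the threshold of 5 is an arbitrary placeholder, not a Dell-documented value.

```python
import re
from collections import Counter

# Matches "Memory device location: DIMM2_B", even when the log text is
# wrapped between "device" and "location:" as in the quoted messages.
LOCATION_RE = re.compile(r"Memory device\s+location:\s*(\S+)")


def count_events_per_dimm(log_text):
    """Return a Counter mapping DIMM location to number of memory events."""
    return Counter(LOCATION_RE.findall(log_text))


def suspect_dimms(log_text, threshold=5):
    """List locations with at least `threshold` events.

    The default threshold is an arbitrary assumption; pick a value that
    makes sense for how long the log covers.
    """
    counts = count_events_per_dimm(log_text)
    return sorted(loc for loc, n in counts.items() if n >= threshold)


sample = """Description : Memory device status is critical Memory device
location: DIMM2_B Possible memory module event cause:Single bit
warning error rate exceeded,Single bit error logging disabled
Description : Memory device status is non-critical Memory device
location: DIMM2_B Possible memory module event cause:Single bit
warning error rate exceeded
"""
print(count_events_per_dimm(sample))
```

A location that keeps climbing in this tally is the one to run the swap test on.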
--
Damon L. Chesser <damon at damtek.com>