scanning for bad ram
Nick_Parrott at Dell.com
Thu Nov 8 04:55:01 CST 2007
Again, with reference to Kuba's response:
" To do proper RAM testing in an on-line system requires kernel support
(kernel pages need to be moved around in physical memory for the
duration of the test). I don't know that such a thing exists, or if it's
used by OMSA.
The only sane way to test RAM is by booting with memtest86. There are
really no alternatives."
I have to agree entirely: omdiag system memory will not be able to test
the DIMMs in full, and I don't believe it runs as thorough a test as
MPmemory. If you want an accurate result, you have to do the test
offline.
NB: I've NEVER relied on the results of omdiag system memory for
troubleshooting. I've always used the offline test, as it's the only way
to know for sure.
-----Original Message-----
From: Kewley, David
Sent: 07 November 2007 20:54
To: Parrott, Nick
Subject: RE: scanning for bad ram
Nick,
OMSA 5.0.0 has 'omdiag system memory'. Would you not recommend that to
customers? I don't remember using it; I only know that it exists.
David
> -----Original Message-----
> From: linux-poweredge-bounces at dell.com
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of
> Nick_Parrott at dell.com
> Sent: Wednesday, November 07, 2007 7:55 AM
> To: razor at meganet.net; linux-poweredge-Lists
> Subject: RE: scanning for bad ram
>
> Hi Paul,
>
> There are no diags within OMSA; you need to boot from diagnostics
> media and run the MPmemory diagnostic.
>
> Clear the ESM/SEL log first, then run the test to see if anything
> fails. If it does, you need to pop the lid and swap the faulty DIMM
> with a known-good DIMM, clear the log again, rerun the diagnostic and
> confirm whether the fault follows the DIMM or the slot. Then call
> support to get parts replaced if the system is under warranty.
>
> MPmemory shows the SEL events, so clearing the log first is a good
> idea to avoid confusion.
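The swap-and-retest procedure above reduces to a small decision rule. The Python sketch below is illustrative only: the function `locate_fault` and the slot labels ("A1", "B1") are hypothetical, and treating a clean rerun as "inconclusive" (for example, a reseated module or an intermittent fault) is an assumption rather than something stated in this thread.

```python
from typing import Optional


def locate_fault(suspect_slot: str, swap_slot: str,
                 failing_slot_after_swap: Optional[str]) -> str:
    """Decide whether a memory fault followed the DIMM or stayed with the slot.

    suspect_slot: slot that logged errors before the swap.
    swap_slot: slot the known-good DIMM came from; the suspect DIMM now sits here.
    failing_slot_after_swap: slot reported by the rerun diagnostic, or None.
    Slot names are hypothetical labels, not a Dell naming scheme.
    """
    if failing_slot_after_swap is None:
        # Assumption: no errors after the swap proves nothing either way.
        return "inconclusive: no errors after the swap (reseated or intermittent?)"
    if failing_slot_after_swap == swap_slot:
        return "DIMM: the fault moved with the suspect module"
    if failing_slot_after_swap == suspect_slot:
        return "slot/board: the fault stayed in the original slot"
    return "unexpected: errors reported in a third slot; investigate further"
```

For example, `locate_fault("A1", "B1", "B1")` points at the DIMM, since the errors moved to the slot where the suspect module now sits.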
>
> Regards,
>
> Nick
>
> -----Original Message-----
> From: Paul A [mailto:razor at meganet.net]
> Sent: 06 November 2007 19:43
> To: Parrott, Nick; linux-poweredge-Lists
> Subject: RE: scanning for bad ram
>
> Nick, thanks for the information.
>
> The reason I'm asking is that we have three 1900s bought refurbished,
> and one application is exiting with status 11 (SIGSEGV) on two of the
> servers. The provider of the software tells me it's probably due to a
> hardware or RAM failure.
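One caution when reading "status 11": a process killed by signal 11 (SIGSEGV) is not the same as one that calls exit(11). A minimal Python sketch, assuming a POSIX system: `subprocess` reports death by signal N as a negative return code (-11 for SIGSEGV), while a POSIX shell's `$?` reports 128 + N (139). The helper name `signal_name` is hypothetical.

```python
import signal
from typing import Optional


def signal_name(returncode: int, from_shell: bool = False) -> Optional[str]:
    """Return the name of the signal that killed a child process, or None.

    subprocess.Popen.returncode is -N when the child died from signal N.
    A POSIX shell's $? is 128 + N instead, which is ambiguous with a program
    that really calls exit(128 + N), so that rule is applied only when
    from_shell is True.
    """
    if returncode < 0:
        signum = -returncode
    elif from_shell and returncode > 128:
        signum = returncode - 128
    else:
        return None  # an ordinary exit status, not a crash
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None  # not a signal number this platform knows
```

Here `signal_name(-11)` and `signal_name(139, from_shell=True)` both give "SIGSEGV", whereas a plain exit status of 11 gives None.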
>
> Can I run OMSA and test the RAM hardware while the server is up, and
> will it affect data stored in memory? I can always take one of the
> servers I'm testing offline if it does.
>
> paul
> ________________________________________
> From: Nick_Parrott at Dell.com [mailto:Nick_Parrott at Dell.com]
> Sent: Tuesday, November 06, 2007 1:19 PM
> To: razor at meganet.net; linux-poweredge at lists.us.dell.com
> Subject: RE: scanning for bad ram
>
> It will indeed report both single-bit and multi-bit errors.
>
> Single-bit errors are ECC-recoverable, so you won't see any effect on
> applications; multi-bit errors you'll probably know about.
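To show why a single flipped bit can be corrected while a double flip can only be detected, here is a toy single-error-correct, double-error-detect (SEC-DED) Hamming code in Python. It protects only 4 data bits with 4 check bits. ECC DIMMs commonly apply the same idea to 64 data bits with 8 check bits, and some chipsets use stronger schemes, so treat this as a sketch of the principle rather than what any particular PowerEdge implements.

```python
from typing import List, Tuple


def encode(data: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit SEC-DED codeword (list of bits).

    Index 0 holds overall parity; indices 1..7 use the classic Hamming(7,4)
    layout with parity bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(data >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2  # make the whole codeword even parity
    return code


def decode(received: List[int]) -> Tuple[str, int]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set positions is 0 for a valid codeword
    overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Assume exactly one flip; the syndrome names it (0 = overall parity bit).
        # Three or more flips would be miscorrected, as with real SEC-DED.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", -1
    return status, code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
```

Every single-bit flip decodes back to the original value, while every two-bit flip is reported as uncorrectable. This mirrors the single-bit versus multi-bit distinction above.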
>
> Any fault that the BMC (Baseboard Management Controller) logs will be
> displayed in OMSA under "Logs" > "BMC/SEL"
>
> Use Dell MPmemory to test DIMMs offline; it's on the 32-bit
> Diagnostics CDs. It's a modified memtest86: stock memtest86 will fail
> immediately, as the system BIOS reserves a very small portion of
> memory space upon boot.
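The kind of check an offline tester performs can be sketched with a simplified moving-inversions pass, one of the test families memtest86 is known for. The pass fills memory with a pattern, sweeps upward verifying the pattern and writing its complement, then sweeps downward verifying the complement. This Python version runs over an ordinary buffer, so it only demonstrates the algorithm. It does not touch physical DIMMs and is not MPmemory's actual test sequence. `StuckBitMemory` is a hypothetical fault model used to show a failure.

```python
from typing import List


def moving_inversions(mem, pattern: int = 0x55) -> List[int]:
    """Return sorted addresses that failed a simplified moving-inversions pass.

    mem is any indexable byte store supporting len(), [] and []=.
    """
    bad = set()
    inverse = pattern ^ 0xFF
    n = len(mem)
    for addr in range(n):  # fill with the pattern
        mem[addr] = pattern
    for addr in range(n):  # ascending: verify pattern, write complement
        if mem[addr] != pattern:
            bad.add(addr)
        mem[addr] = inverse
    for addr in reversed(range(n)):  # descending: verify complement, restore
        if mem[addr] != inverse:
            bad.add(addr)
        mem[addr] = pattern
    return sorted(bad)


class StuckBitMemory:
    """Hypothetical faulty buffer: one bit at one address is stuck at 1."""

    def __init__(self, size: int, bad_addr: int, bit: int):
        self._buf = bytearray(size)
        self._bad_addr = bad_addr
        self._mask = 1 << bit

    def __len__(self) -> int:
        return len(self._buf)

    def __getitem__(self, addr: int) -> int:
        return self._buf[addr]

    def __setitem__(self, addr: int, value: int) -> None:
        if addr == self._bad_addr:
            value |= self._mask  # the stuck bit always reads back as 1
        self._buf[addr] = value
```

Checking both the pattern and its complement matters. A bit stuck at 1 where the pattern already has a 1 (bit 0 with 0x55) only shows up on the complement sweep, so a pattern-only test would miss it.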
>
> Nick
>
> From: linux-poweredge-bounces at dell.com
> [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Paul A
> Sent: 06 November 2007 17:49
> To: linux-poweredge-Lists
> Subject: scanning for bad ram
>
> I have yet to install OMSA on my 1950s. I know it will monitor disks,
> but will it monitor and report problems with RAM?
> Paul
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
>