scanning for bad ram

Paul A razor at meganet.net
Thu Nov 8 09:49:31 CST 2007


Nick,

That makes sense, and I will be doing the test offline soon. I did set up OMSA
on the server and can reach it over HTTPS. The log/alert files don't show
anything out of the ordinary with the RAM.

I do have one more off-topic question. I set up IT Assistant on my desktop,
and I can see certain servers that have the latest version of OMSA, but not
all of them, even though I can reach all of them over HTTPS. These servers
are on the same VLAN and behind the same ACLs/firewall, yet IT Assistant
sees only 3 of the 6 servers on which I have OMSA installed. I installed
OMSA from the Dell repo.

Any thoughts?

Thanks,

paul

P.A > -----Original Message-----
P.A > From: linux-poweredge-bounces at dell.com
P.A > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Nick_Parrott at dell.com
P.A > Sent: Thursday, November 08, 2007 5:55 AM
P.A > To: linux-poweredge at lists.us.dell.com
P.A > Subject: RE: scanning for bad ram
P.A > 
P.A > Again, with reference to Kuba's response:
P.A > 
P.A > "To do proper RAM testing in an on-line system requires kernel support
P.A > (kernel pages need to be moved around in physical memory for the
P.A > duration of the test). I don't know that such a thing exists, or if
P.A > it's used by OMSA.
P.A > 
P.A > The only sane way to test RAM is by booting with memtest86. There are
P.A > really no alternatives."
P.A > 
P.A > I have to agree entirely: omdiag system memory will not be able to
P.A > test the DIMMs in full, and I don't believe it does as extreme a test
P.A > as MPmemory. If you want an accurate result, you have to do the test
P.A > offline.
P.A > 
P.A > NB: I've NEVER used the results of omdiag system memory for
P.A > troubleshooting; I've always used the offline test. It's the only way
P.A > to know for sure.
P.A > 
P.A > 
P.A > -----Original Message-----
P.A > From: Kewley, David
P.A > Sent: 07 November 2007 20:54
P.A > To: Parrott, Nick
P.A > Subject: RE: scanning for bad ram
P.A > 
P.A > Nick,
P.A > 
P.A > OMSA 5.0.0 has 'omdiag system memory'. Would you not recommend that to
P.A > customers? I don't remember using it; I only know about its existence.
P.A > 
P.A > David
P.A > 
P.A > > -----Original Message-----
P.A > > From: linux-poweredge-bounces at dell.com
P.A > > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of
P.A > > Nick_Parrott at dell.com
P.A > > Sent: Wednesday, November 07, 2007 7:55 AM
P.A > > To: razor at meganet.net; linux-poweredge-Lists
P.A > > Subject: RE: scanning for bad ram
P.A > >
P.A > > Hi Paul,
P.A > >
P.A > > There are no diags within OMSA; you need to boot from diagnostics
P.A > > media and run the MPmemory diagnostic.
P.A > >
P.A > > Clear the ESM/SEL log first, then run the test to see if anything
P.A > > fails. If it does, you need to pop the lid and swap the faulty DIMM
P.A > > with a good DIMM, clear the log again, run the diagnostic, and confirm
P.A > > whether the fault follows the DIMM or the slot. Then call support to
P.A > > get parts replaced if the system is under warranty.
P.A > >
P.A > > MPmemory shows the SEL events, so clearing the log first is a good
P.A > > idea to avoid confusion.
P.A > >
P.A > > Regards,
P.A > >
P.A > > Nick
P.A > >
P.A > > -----Original Message-----
P.A > > From: Paul A [mailto:razor at meganet.net]
P.A > > Sent: 06 November 2007 19:43
P.A > > To: Parrott, Nick; linux-poweredge-Lists
P.A > > Subject: RE: scanning for bad ram
P.A > >
P.A > > Nick, thanks for the information.
P.A > >
P.A > > The reason I'm asking is that we have three 1900s bought refurbished,
P.A > > and one application is exiting with status 11 (SIGSEGV) on two of the
P.A > > servers. The provider of the software tells me it's probably due to a
P.A > > hardware or RAM failure.
P.A > >
P.A > > Can I run OMSA and test the RAM hardware while the server is up? Will
P.A > > it affect data stored in memory? I can always take one of the servers
P.A > > I'm testing offline if it does.
P.A > >
P.A > > paul
P.A > > ________________________________________
P.A > > From: Nick_Parrott at Dell.com [mailto:Nick_Parrott at Dell.com]
P.A > > Sent: Tuesday, November 06, 2007 1:19 PM
P.A > > To: razor at meganet.net; linux-poweredge at lists.us.dell.com
P.A > > Subject: RE: scanning for bad ram
P.A > >
P.A > > It will indeed, both single-bit errors and multi-bit errors.
P.A > >
P.A > > Single-bit errors are ECC-recoverable, so you won't see an effect on
P.A > > applications; multi-bit errors you'll probably know about.
P.A > >
P.A > > Any fault that the BMC (Baseboard Management Controller) logs will be
P.A > > displayed in OMSA under "Logs" > "BMC/SEL".
P.A > >
P.A > > Use Dell MPmemory to test DIMMs offline; it's on the 32-bit Diagnostics
P.A > > CDs. It's a modified memtest86. Plain memtest86, however, will fail
P.A > > immediately, as the system BIOS reserves a very small portion of memory
P.A > > space upon boot.
P.A > >
P.A > > Nick
P.A > >
P.A > > From: linux-poweredge-bounces at dell.com
P.A > > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Paul A
P.A > > Sent: 06 November 2007 17:49
P.A > > To: linux-poweredge-Lists
P.A > > Subject: scanning for bad ram
P.A > >
P.A > > I have yet to install OMSA on my 1950s. I know it will monitor disks,
P.A > > but will it monitor and report problems with RAM?
P.A > > Paul
P.A > >
P.A > > _______________________________________________
P.A > > Linux-PowerEdge mailing list
P.A > > Linux-PowerEdge at dell.com
P.A > > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
P.A > > Please read the FAQ at http://lists.us.dell.com/faq
P.A > >
P.A > 


