scanning for bad ram

Nick_Parrott at Dell.com
Fri Nov 9 04:57:44 CST 2007


Check OMSA on each machine: configure the SNMP settings (in OMSA)
identically on every box, and make sure the SNMP community string is
set correctly (it is case sensitive).

If your ITA box runs Windows, do the same in its SNMP settings. UDP
ports 161 and 162 also need to be open on all machines.
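
As a rough way to check the two points above from the ITA box, the sketch below sends a hand-built SNMPv1 GetRequest for sysDescr.0 to UDP 161 and reports whether anything comes back. This is not part of OMSA or IT Assistant; the function names are made up for illustration, and it assumes the agents accept SNMPv1. An SNMPv1 agent typically drops requests with the wrong community silently, so a timeout can mean either a blocked port or a mismatched (remember, case-sensitive) community string.

```python
import socket

# sysDescr.0 (1.3.6.1.2.1.1.1.0), BER-encoded; a standard MIB-II object.
SYS_DESCR_OID = bytes([0x2B, 0x06, 0x01, 0x02, 0x01, 0x01, 0x01, 0x00])


def tlv(tag, value):
    """Encode one BER tag-length-value triple (short-form lengths only)."""
    if len(value) > 127:
        raise ValueError("value too long for this sketch")
    return bytes([tag, len(value)]) + value


def build_get_request(community, request_id=1):
    """Build an SNMPv1 GetRequest packet asking for sysDescr.0."""
    if not 0 <= request_id <= 127:
        raise ValueError("request_id must fit in one positive byte")
    # One variable binding: the OID paired with a NULL placeholder value.
    varbind = tlv(0x30, tlv(0x06, SYS_DESCR_OID) + tlv(0x05, b""))
    pdu = tlv(
        0xA0,                               # GetRequest-PDU
        tlv(0x02, bytes([request_id]))      # request-id
        + tlv(0x02, b"\x00")                # error-status = 0
        + tlv(0x02, b"\x00")                # error-index = 0
        + tlv(0x30, varbind),               # variable-bindings list
    )
    return tlv(
        0x30,
        tlv(0x02, b"\x00")                        # version = 0 (SNMPv1)
        + tlv(0x04, community.encode("ascii"))    # community, sent as typed
        + pdu,
    )


def snmp_responds(host, community, timeout=2.0):
    """Return True if any reply arrives from UDP 161 before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_get_request(community), (host, 161))
            sock.recvfrom(4096)
            return True
        except OSError:
            # Covers timeouts, unreachable hosts and name-resolution errors.
            return False

# Example (hypothetical host name and community string):
#   print(snmp_responds("server1.example.com", "public"))
```

Running snmp_responds against each of the six servers with the community configured in OMSA should separate the boxes that answer from those that do not. It only covers the polling side (UDP 161); traps on UDP 162 travel in the other direction and are not exercised here.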

-----Original Message-----
From: Paul A [mailto:razor at meganet.net] 
Sent: 08 November 2007 15:50
To: Parrott, Nick; linux-poweredge-Lists
Subject: RE: scanning for bad ram 

Nick,

That makes sense, and I will be doing the test offline soon. I did set up
OMSA on the server and can https to it. The log/alert files don't show
anything out of the ordinary with the RAM.

I do have one more off-topic question. I set up IT Assistant on my
desktop and I can see certain servers that have the latest version of
OMSA, but not all of them, even though I can https to them. These
servers are on the same VLAN using the same ACLs/firewall, yet through
IT Assistant I can see only 3 of the 6 servers that have the OMSA
software installed. I installed OMSA from the Dell repo.

Any thoughts?

Thanks,

paul

P.A > -----Original Message-----
P.A > From: linux-poweredge-bounces at dell.com
P.A > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of
P.A > Nick_Parrott at dell.com
P.A > Sent: Thursday, November 08, 2007 5:55 AM
P.A > To: linux-poweredge at lists.us.dell.com
P.A > Subject: RE: scanning for bad ram
P.A > 
P.A > Again, with reference to Kuba's response;
P.A > 
P.A > "To do proper RAM testing in an on-line system requires kernel
P.A > support (kernel pages need to be moved around in physical memory
P.A > for the duration of the test). I don't know that such a thing
P.A > exists, or if it's used by OMSA.
P.A > 
P.A > The only sane way to test RAM is by booting with memtest86. There
P.A > are really no alternatives."
P.A > 
P.A > I have to agree entirely: omdiag system memory will not be able
P.A > to test the DIMMs in full, and I don't believe it does as extreme
P.A > a test as MPmemory. If you want an accurate result, you have to do
P.A > the test offline.
P.A > 
P.A > NB: I've NEVER used the results of omdiag system memory for
P.A > troubleshooting. I've always used the offline test; it's the only
P.A > way to know for sure.
P.A > 
P.A > 
P.A > -----Original Message-----
P.A > From: Kewley, David
P.A > Sent: 07 November 2007 20:54
P.A > To: Parrott, Nick
P.A > Subject: RE: scanning for bad ram
P.A > 
P.A > Nick,
P.A > 
P.A > OMSA 5.0.0 has 'omdiag system memory'.  Would you not recommend
P.A > that to customers?  I don't remember using it, only know about its
P.A > existence.
P.A > 
P.A > David
P.A > 
P.A > > -----Original Message-----
P.A > > From: linux-poweredge-bounces at dell.com
P.A > > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of
P.A > > Nick_Parrott at dell.com
P.A > > Sent: Wednesday, November 07, 2007 7:55 AM
P.A > > To: razor at meganet.net; linux-poweredge-Lists
P.A > > Subject: RE: scanning for bad ram
P.A > >
P.A > > Hi Paul,
P.A > >
P.A > > There are no diags within OMSA; you need to boot off some
P.A > > diagnostics media and run the MPmemory diagnostic.
P.A > >
P.A > > Clear the ESM/SEL log first, then run the test to see if
P.A > > anything fails. If it does, pop the lid and swap the faulty
P.A > > DIMM with a good DIMM, clear the log again, and rerun the
P.A > > diagnostic to confirm whether the fault follows the DIMM or
P.A > > the slot. Then call support to get parts replaced if under
P.A > > warranty.
P.A > >
P.A > > MPmemory shows the SEL events, so clearing the log first is a
P.A > > good idea to avoid confusion.
P.A > >
P.A > > Regards,
P.A > >
P.A > > Nick
P.A > >
P.A > > -----Original Message-----
P.A > > From: Paul A [mailto:razor at meganet.net]
P.A > > Sent: 06 November 2007 19:43
P.A > > To: Parrott, Nick; linux-poweredge-Lists
P.A > > Subject: RE: scanning for bad ram
P.A > >
P.A > > Nick, thanks for the information.
P.A > >
P.A > > The reason I'm asking is that we have three 1900s, bought
P.A > > refurbished, and one application is exiting with status 11
P.A > > (SIGSEGV) on two of the servers. The provider of the software
P.A > > tells me it's probably due to hardware or RAM failure.
P.A > >
P.A > > Can I run OMSA and test the RAM hardware while the server is up,
P.A > > or will it affect data stored in memory? I can always take one of
P.A > > the servers I'm testing offline if it does.
P.A > >
P.A > > paul
P.A > > ________________________________________
P.A > > From: Nick_Parrott at Dell.com [mailto:Nick_Parrott at Dell.com]
P.A > > Sent: Tuesday, November 06, 2007 1:19 PM
P.A > > To: razor at meganet.net; linux-poweredge at lists.us.dell.com
P.A > > Subject: RE: scanning for bad ram
P.A > >
P.A > > It will indeed, both single-bit errors and multi-bit errors.
P.A > >
P.A > > Single-bit errors are ECC recoverable, so you won't see any
P.A > > effect on applications; multi-bit errors you'll probably know
P.A > > about.
P.A > >
P.A > > Any fault that the BMC (Baseboard Management Controller) logs
P.A > > will be displayed in OMSA under "Logs" > "BMC/SEL".
P.A > >
P.A > > Use Dell MPmemory to test DIMMs offline; it's on the 32-bit
P.A > > Diagnostics CDs. It's a modified memtest86. Stock memtest86,
P.A > > however, will fail immediately, as the system BIOS reserves a
P.A > > very small portion of memory space upon boot.
P.A > >
P.A > > Nick
P.A > >
P.A > > From: linux-poweredge-bounces at dell.com
P.A > > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Paul A
P.A > > Sent: 06 November 2007 17:49
P.A > > To: linux-poweredge-Lists
P.A > > Subject: scanning for bad ram
P.A > >
P.A > > I have yet to install OMSA on my 1950s. I know it will
P.A > > monitor disks, but will it also monitor and report problems
P.A > > with RAM?
P.A > > Paul
P.A > >
P.A > > _______________________________________________
P.A > > Linux-PowerEdge mailing list
P.A > > Linux-PowerEdge at dell.com
P.A > > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
P.A > > Please read the FAQ at http://lists.us.dell.com/faq
P.A > >
P.A > 


