Clearing ECC error without rebooting?

Trond Hasle Amundsen t.h.amundsen at usit.uio.no
Mon Dec 15 06:20:03 CST 2008


Dirk Taggesell <dirk.taggesell at proximic.com> writes:

> on one of our DELL servers here (a 1950) an ECC mem error occurred, it
> was non-critical and I want to delete the event so that the NAGIOS check
> (check_dell_sensors) doesn't complain anymore (until another ECC error
> occurs).
>
> Of course I could reboot the machine, but that doesn't make sense when
> the RAM error was only a single event and does not occur frequently.
>
> Yet I couldn't figure a way to "reset" the internal error log. I already
> deleted the event log - to no avail.
>
> How do I clear the error log without having to reboot the machine?
>
> The DELL software (5.2) is installed (omreport, omconfig, omexec,
> omupdate), the machine is running OpenSuSE 10.2 64-bit.

There are two logs, and you may have to clear both:

  omconfig system alertlog action=clear
  omconfig system esmlog action=clear

Then restart the OM services:

  srvadmin-services.sh restart

That should be enough to remove the error.
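
If you end up doing this more than once, the three commands above could
be wrapped in a small script. This is only a rough sketch: it assumes
omconfig and srvadmin-services.sh are on root's PATH, and the names
run, clear_om_logs and DRY_RUN are made up for illustration (they are
not part of OMSA). It defaults to a dry run that only prints what it
would do.

```shell
#!/bin/sh
# Sketch: clear both OMSA logs, then restart the OM services.
# DRY_RUN (hypothetical variable) defaults to 1, which only prints the
# commands. Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop at the first failing step so the services are not restarted
# after a failed clear.
clear_om_logs() {
    run omconfig system alertlog action=clear &&
    run omconfig system esmlog action=clear &&
    run srvadmin-services.sh restart
}

clear_om_logs
```

Once the dry-run output looks right, run it with DRY_RUN=0 in the
environment to clear the logs for real.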

On the other hand, you should probably replace the failed memory
module instead of pretending that it never happened... ;)

Cheers,
-- 
Trond H. Amundsen <t.h.amundsen at usit.uio.no>
Center for Information Technology Services, University of Oslo


