What to do when OM stops working?
Flaherty, Patrick
pflaherty at wsi.com
Tue Oct 21 16:43:57 CDT 2008
> On occasion OpenManage stops working, at seemingly random times and on
> random servers. Omreport will show output like this:
>
> # omreport chassis memory
> Memory Information
>
> Error : Memory object not found
>
> Similar errors for all other components. Sometimes it helps to restart
> the services ('srvadmin-services restart'), but most often it
> does not.
> The only thing that seems to help is to power off the server. The servers
> are running OM 5.4.0 on RHEL4 and RHEL5. The problem applies to
> different PowerEdge models.
>
> Have any of you experienced the same, and if so, do you have a better
> solution than powering off the server?
I think it might be an IPMI bug/incompatibility/gremlin/evil spirit.
I've seen a similar bug with `omreport chassis` on a bunch of different
models and patch levels.
Try:
# This command stops OMSA, starts IPMI, and then starts OMSA again
srvadmin-services.sh stop && service ipmi start && srvadmin-services.sh start
On a side note, most of the monitoring scripts I've seen that run
omreport directly don't catch this condition. I modified mine to error
out if too few lines come back from omreport. You could also make a sudo
rule to allow your monitoring user to run `srvadmin-services.sh status`,
but that seemed like more work.
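For anyone who wants to do the same, here is a rough sketch of that kind of check. It is not my actual script: the function name `check_omreport_output`, the default threshold of 5 lines, and the exit code 2 (borrowed from the Nagios convention for CRITICAL) are all assumptions for illustration, so adjust them to whatever your monitoring system expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of OMSA): reads omreport output on stdin
# and fails if it looks like the broken state described above.
# $1 = minimum number of non-empty lines expected (assumed default: 5).
check_omreport_output() {
    min_lines="${1:-5}"
    output="$(cat)"

    # Count non-empty lines; "|| true" because grep exits 1 when the count is 0.
    line_count="$(printf '%s\n' "$output" | grep -c . || true)"

    # A line starting with "Error" (e.g. "Error : Memory object not found")
    # is treated as a failure regardless of how many lines came back.
    if printf '%s\n' "$output" | grep -q '^Error'; then
        echo "CRITICAL: omreport returned an error"
        return 2
    fi

    if [ "$line_count" -lt "$min_lines" ]; then
        echo "CRITICAL: omreport returned only $line_count non-empty lines (expected at least $min_lines)"
        return 2
    fi

    echo "OK: omreport returned $line_count non-empty lines"
    return 0
}
```

You would call it as something like `omreport chassis memory 2>&1 | check_omreport_output 5`, with the threshold picked per subcommand, since a healthy report for one component can be much shorter than for another.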
Patrick