What to do when OM stops working?

Flaherty, Patrick pflaherty at wsi.com
Tue Oct 21 16:43:57 CDT 2008


> On occasion OpenManage stops working, at seemingly random times and on
> random servers. omreport then shows output like this:
> 
>   # omreport chassis memory
>   Memory Information
>   
>   Error : Memory object not found
> 
> There are similar errors for all other components. Restarting the
> services ('srvadmin-services restart') sometimes helps, but most often
> it does not. The only thing that seems to help is to power off the
> server. The servers are running OM 5.4.0 on RHEL4 and RHEL5, and the
> problem affects different PowerEdge models.
> 
> Have any of you experienced the same, and if so, do you have a better
> solution than powering off the server?


I think it might be an IPMI bug/incompatibility/gremlin/evil spirit.
I've seen a similar bug with `omreport chassis` on a bunch of different
models and patch levels.

Try:
 # this command stops OMSA, starts IPMI, and starts OMSA again
 srvadmin-services.sh stop && service ipmi start && srvadmin-services.sh start

On a side note, most of the monitoring scripts I've seen that run
omreport directly don't catch this condition. I modified mine to error
out if too few lines come back from omreport. You could also make a sudo
rule to allow your monitoring user to run `srvadmin-services.sh status`,
but that seemed like more work.
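A minimal sketch of the "too few lines" check, written as a POSIX shell function. The function name `check_omreport_output`, the threshold `MIN_LINES` (5 here) and the Nagios-style exit codes are assumptions for illustration, not Patrick's actual script; pick the threshold by looking at what a healthy server prints for the report you monitor.

```shell
#!/bin/sh
# Sketch: flag omreport output that is suspiciously short.
# MIN_LINES is a hypothetical threshold; set it from what a healthy
# server prints for the report you are monitoring.
MIN_LINES=${MIN_LINES:-5}

# Hypothetical helper: reads omreport output on stdin, counts lines that
# contain at least one non-whitespace character, prints a status line and
# returns a Nagios-style code (0 = OK, 2 = CRITICAL).
check_omreport_output() {
    lines=$(grep -c '[^[:space:]]')
    if [ "$lines" -lt "$MIN_LINES" ]; then
        echo "CRITICAL: omreport returned only $lines non-blank lines"
        return 2
    fi
    echo "OK: omreport returned $lines non-blank lines"
    return 0
}

# Example (requires OMSA; commented out so the file can be sourced anywhere):
# omreport chassis memory | check_omreport_output; exit $?
```

Reading from stdin keeps the check testable without OMSA installed. On the failing output quoted above, only two lines are non-blank, so the check reports CRITICAL.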

Patrick 



More information about the Linux-PowerEdge mailing list