ServerAssistant daemons hanging

Les Niles lniles at Narus.com
Wed Mar 13 19:18:00 CST 2002


We're building a product based on the PowerEdge 6450, 
running Red Hat Linux 7.2.  A process runs every minute to 
gather health and statistics info on each box, to 
report to our system controller.  It invokes 
omreport several times, to collect temperatures, 
voltages, etc.
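
A minimal sketch of such a collector, assuming Python 3 is available on 
the box.  This is not the actual collection code; the omreport 
subcommands listed are assumptions and depend on the installed 
OpenManage version, and run_with_timeout and collect are hypothetical 
helper names.  Each call is given a timeout so that one stuck command 
cannot stall the whole once-a-minute run.

```python
import subprocess

# Assumed report commands; adjust to whatever the installed omreport accepts.
REPORTS = [
    ["omreport", "chassis", "temps"],
    ["omreport", "chassis", "volts"],
]


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd; return its stdout as text, or None on failure or timeout."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    except OSError:
        # Command missing or not executable.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")


def collect(reports=REPORTS, timeout_s=10.0):
    """Return a dict mapping each command line to its output (or None)."""
    return {" ".join(cmd): run_with_timeout(cmd, timeout_s) for cmd in reports}
```

A None value for a report means the command failed or timed out, which 
the caller can pass on to the controller as "health data unavailable".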

When the box is under heavy load, one of the dcstor32d 
processes hangs.  To be specific, it goes into an 
uninterruptible sleep (state "D" in ps output).  Once this 
happens, omreport also hangs but can be killed.  The 
hung dcstor32d cannot be killed.  All the other 
dellomsa daemons can be restarted, but that doesn't 
really solve the problem: omreport then reports "No 
temperature probes [or whatever] found on this system."
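
The "D" state described above can also be checked from a script by 
reading /proc/<pid>/stat, where the state letter is the first field 
after the parenthesized command name.  The following is a minimal sketch 
under that assumption; parse_proc_state and is_uninterruptible are 
hypothetical helper names, and the PID of the daemon has to be found 
separately.

```python
def parse_proc_state(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # Layout is "pid (comm) state ...".  comm may itself contain spaces or
    # parentheses, so split after the LAST closing parenthesis.
    close = stat_text.rfind(")")
    if close == -1:
        raise ValueError("unexpected /proc stat format")
    fields = stat_text[close + 1:].split()
    if not fields:
        raise ValueError("state field missing")
    return fields[0]


def is_uninterruptible(pid):
    """True if the process exists and is in uninterruptible sleep ("D")."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            return parse_proc_state(f.read()) == "D"
    except (OSError, ValueError):
        # Process is gone, unreadable, or the file had an unexpected layout.
        return False
```

A collector could call is_uninterruptible on the daemon's PID before 
invoking omreport and skip the run while the daemon is stuck.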

It seems to take a few minutes of heavy load before 
the dcstor32d hangs.  By "heavy load" I mean roughly 
that one CPU in the dual-CPU box is fully occupied by a 
process; the other CPU is not heavily loaded and the 
box remains generally responsive.  Within a few minutes 
after the heavy load disappears, the hung dcstor32d 
clears itself, at least some of the time.

Has anyone seen this kind of problem, and more usefully, 
found a solution?  Is there a way to collect the hardware 
health information -- voltages, fan speeds, temperatures, 
and power supplies' health -- without going through 
the daemon?  For example, is there a published or 
reverse-engineered API to the esm driver?

Les Niles
Narus, Inc.
lniles at narus.com



