Michael E Brown michael_e_brown at
Wed Nov 14 17:09:00 CST 2001

Dirk Wetter wrote:

>One of our cluster machines (a PE1550) dies once a week or so
>without any warning or hint in a system log. What I suspect is
>that the watchdog, perhaps triggered by exceeding a temperature
>threshold, could be the culprit. Would there be a hint somewhere?
>(I am running SuSE on this machine.)
>Cerberus has been running without errors for some hours... I was also
>wondering whether there's a program from Dell, since I expect
>that Dell's service department would give me a hard time if I
>said I had a hardware problem detected by VA's Cerberus....

If you load the Dell OpenManage Server Agent (OMSA) on your box, you can 
use IT Assistant to check whether your hardware is set for thermal 
shutdown, and turn this feature off if it is.

Since OMSA depends on the Red Hat kernels, it may be worth building a 
custom rescue SCSI disk that you can boot to check these settings 
(unless you want to go through the effort of getting OMSA compiled 
for SuSE).

Michael E. Brown, RHCE, MCSE+I, CNA
Dell Linux Solutions

  If each of us has one object, and we exchange them,
     then each of us still has one object.
  If each of us has one idea,   and we exchange them,
     then each of us now has two ideas.

More information about the Linux-PowerEdge mailing list