Michael E Brown
michael_e_brown at dell.com
Wed Nov 14 17:09:00 CST 2001
Dirk Wetter wrote:
>One of our cluster machines (PE1550) dies once a week or so
>without any warning or hint in a system log. I suspect that
>the watchdog, triggered by exceeding a temperature threshold,
>could be the culprit. Would there be a hint somewhere (I am
>running SuSE on this box)?
>Cerberus has been running without errors for some hours... I was also
>wondering whether there's a program from Dell, since I expect
>that Dell's service department would give me a hard time if I
>said I have a hardware problem detected by VA's Cerberus....
If you load the Dell OpenManage Server Agent (OMSA) on your box, you can
use IT Assistant to check whether your hardware is set for thermal
shutdown, and to turn this feature off.
Since OMSA depends on the Red Hat kernels, it may be worth it for you to
build a custom rescue SCSI disk that you can boot to check these
settings. (Unless you want to go through the effort of getting
OMSA compiled for SuSE.)
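In the meantime, it may be worth double-checking that the logs really contain no hint. The following is only a rough sketch, not a Dell tool: the function name and the keyword list are made up for illustration, and which words your kernel or hardware actually logs is an assumption you would need to verify.

```python
import re

# Hypothetical keyword list; adjust it to whatever your hardware and
# kernel actually write to the log.
THERMAL_PATTERN = re.compile(r"thermal|temperature|overheat|watchdog", re.IGNORECASE)


def find_thermal_hints(lines):
    """Return (line_number, text) pairs for lines that mention a thermal keyword."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if THERMAL_PATTERN.search(text):
            hits.append((number, text.rstrip("\n")))
    return hits
```

You could feed it an open log file, for example `find_thermal_hints(open("/var/log/messages", errors="replace"))`, where the path is again an assumption about your setup. An empty result does not rule out a thermal shutdown, since the machine may die before anything is written.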
Michael E. Brown, RHCE, MCSE+I, CNA
Dell Linux Solutions
If each of us has one object, and we exchange them,
then each of us still has one object.
If each of us has one idea, and we exchange them,
then each of us now has two ideas.