watchdog

Dirk Wetter dirkw at rentec.com
Fri Nov 16 10:53:01 CST 2001


Hey Michael,

thx for your answer!

On Wed, 14 Nov 2001, Michael E Brown wrote:

> Dirk Wetter wrote:
>
> >Hi,
> >
> >one of our cluster machines (PE1550) dies once a week or so
> >without any warning or hint in a system log. what i suppose
> >is that the watchdog could be the culprit, maybe because of
> >exceeding a temperature threshold. would there be a hint somewhere
> >(i am running SuSE on this one)?
> >
> >Cerberus is running without errors for some hours... I was also
> >wondering whether there's a program from DELL, since I expect
> >that DELL's service department would give me a hard time if I
> >said I have a hardware problem detected by VA's cerberus....
> >
> If you load the Dell OpenManage Server Agent on your box, you can use IT
> Assistant to check whether your hardware is set for thermal shutdown or
> not, as well as turn this feature off.
>
> Since OMSA depends on the RH kernels, it may be worth it for you to
> build a custom-built rescue scsi disk that you can boot and check these
> settings.

that's a very good hint actually!

> (Unless you want to try to go through the effort of getting
> OMSA compiled for SuSE)

i am in the process of doing that. but it gives me an unnecessarily
hard time, which disappoints me. the whole thing is designed more for
dummies than for people who most of the time know what they are
doing.
i could live with the fact that it's not documented where the rpms and
install scripts are actually "hidden" on the CD, i would also survive the
fact that the dependencies of the rpms are not ok on my SuSE systems, i
could also live with the fact that the startup scripts are RH dependent,
but... it assumes fixed and RedHat-specific mount points for the CD, it
assumes that the CD is mounted on the same machine where I want to
install it (i have the CD popped in my SUN or maybe my 2000 or
whatever desktop and the data center is somewhere else), it assumes
everything below /boot follows RH standards, it assumes, it assumes .....
finally, as you indicated.... it's some effort! well, sooner or later i'll
figure that out, since there are only shell scripts and Makefiles for that
part.

my point: why couldn't this stuff be designed in a more compatible way?
it IS possible to compile a kernel, 2nd party drivers like qlogic, oss or
e1000, independent of the brand of Linux installed.

my suggestion would also be: if DELL would really opensource the
OpenManage tools, the combination of efforts would result in a
better product, which is not only a benefit for the customers but also
for DELL, since it saves service time. currently if i have a service
case on our 128CPU+ cluster, a lot of time is being spent on both sides
tracking down what the actual problem is.


thanks,
	~dirkw

-----------
Dirk Wetter
Renaissance Technologies/NY




