very bizarre 1650 and 7.3 problems

Bryce C bryce8080 at yahoo.com
Thu Aug 8 08:27:02 CDT 2002


Hi,

I could really use some advice with some very strange
Dell 1650 and RedHat 7.3 problems that we are having.

I've set up a number of identical configurations that
have worked fine in the past, however a recent
shipment of three 1650's all seem to have the
following problems:

+ anything involving the second CPU causes a crash
(sooner or later). For now I will skip the reasons why
I believe it to be a 2nd CPU problem (it took a long
time to diagnose), but I do know that if I remove the
second CPU or boot a non-SMP Linux kernel, all works
perfectly.

+ The same RedHat 7.3 install that I have made to
other machines in the past (from the same install CD,
"server" install with the classic X Windows option
deselected) now adds extra files. I've not had time to
check it out fully, but I now get entries for CUPS
(printing) and vbox (voice-mail???) in /var/log/.

I have NO idea on the software problem. I've used this
same CD for many installs, same options, but suddenly
all three of these new 1650's get those files
installed. Bizarre.
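
One way to narrow down where the extra files come from
might be to compare the installed package lists of a
known-good machine and one of the new ones. Below is a
minimal sketch, assuming the output of "rpm -qa" from
each machine has been saved to a text file, one package
per line. The function name and the file names in the
usage comment are hypothetical, and this only lists
differences; it does not explain why they occur.

```python
import sys


def extra_packages(known_good_lines, suspect_lines):
    """Return packages listed for the suspect machine but not the known-good one.

    Both arguments are iterables of lines, e.g. open files holding the
    saved output of `rpm -qa` (an assumption about how the lists were made).
    """
    good = {line.strip() for line in known_good_lines if line.strip()}
    suspect = {line.strip() for line in suspect_lines if line.strip()}
    return sorted(suspect - good)


# Hypothetical usage: python compare_pkgs.py good-box.txt new-1650.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as good_file, open(sys.argv[2]) as suspect_file:
        for name in extra_packages(good_file, suspect_file):
            print(name)
```

Any package that shows up only on the new machines
would be a candidate for whatever is creating the CUPS
and vbox entries in /var/log/.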

On the second CPU issue, I have tried everything -
swap CPU's with known working machines, remove any
extra RAM, remove the RAID card, rebuild, rebuild the
RAID containers. Dell has even come out and replaced
the entire motherboard in one of the systems!

Each system has different symptoms. The most obvious
is the system that will not get past the "Checking for
new hardware" step in the boot up. It has gotten past
it a few times, but if I try to up2date the system to
the latest SMP kernel it dies as soon as it starts to
install the files. If I remove the second CPU, it
boots fine. Like I said, I've also totally replaced
both CPU's with known working ones and the problem
still occurs!?

Are there any settings that I'm missing?

The other two systems appeared to work fine. However,
now that one is in production, it hung twice as soon
as my users started to do serious work on it (running
a large mysqldump process). I took the 2nd CPU out
(out of desperation!) and now it works fine.

The third system also appears fine but gets strange
errors anytime SSL functions are called. For example,
it will not register with /usr/sbin/rhn_register; it
gets SSL errors (the DNS is set up correctly, and the
machine can look up Red Hat's IPs fine). It is not in
production yet so has not really seen any use.

I can only guess that the 2nd and 3rd machines'
problems are related to the second CPU as well.

The machines came from Dell as:

single CPU, 1.13 GHz
256 MB RAM
2 x 73 GB drives (no RAID)

We added:

second CPU (1.13 GHz, 512k cache, the server version of
the chip - same as we always add - retail boxed
version)

1 GB RAM (registered, etc. Same specs as we always
use. I still get the problems even if I remove the RAM
though.)

PERC 3/Di card (same cards we've used in other 1650's -
straight from Dell).

I hope someone has some magic configuration setting
that I need to change! :)

Also, the BIOS sees the two CPU's just fine.

I am at a loss! Any help is appreciated. Does anyone
have any idea what could be wrong? Like I've said,
Dell has even replaced the motherboard on the first
system. I'm starting to think it has something to do
with the strange RedHat installs (the extra CUPS and
vbox entries in /var/log/). However, the install CD is
the same one I've always used, and only I have access
to it (it is a CD-R, so in my most paranoid thinking it
could have been tampered with and replaced, but no one
else has access to it).

I have no clue...

Thanks for any, any, ANY suggestions!







More information about the Linux-PowerEdge mailing list