PE 4600 Death Spiral

Sean Bruno sean.bruno at metro1.com
Sat Oct 18 19:42:01 CDT 2003


See below:


On Sat, 2003-10-18 at 15:09, jason andrade wrote:
> On Sat, 18 Oct 2003, Sean Bruno wrote:
> 
> > We are running Solaris 8 X86 on the DELL PE 4600.  Thus far, we have
> > been successful and have about 10 units in production without a hitch.
> >
> > We recently received two new units running the higher end P4 Xeons and
> > ran into problems.  Specifically the machine goes into a "death spiral"
> > after approximately 4 days.
> 
> can you please provide specifications for your original PE4600s and
> the new servers ?  both of them use the P4 Xeon - your new 4600s may be
> using a faster spec.  in any case it's difficult to comment on what
> may be causing this if you don't specify the differences.
> 
> [...]
> 
Output of "psrinfo -v" for "old" machine that is currently in
production:

Status of processor 0 as of: 10/18/03 19:37:56
  Processor has been on-line since 01/28/03 03:49:36.
  The i386 processor operates at 1995 MHz,
        and has an i387 compatible floating point processor.
Status of processor 1 as of: 10/18/03 19:37:56
  Processor has been on-line since 01/28/03 03:49:38.
  The i386 processor operates at 1995 MHz,
        and has an i387 compatible floating point processor.

Output of "psrinfo -v" for "new" machine:
Status of processor 0 as of: 10/18/03 18:39:04
  Processor has been on-line since 10/15/03 22:50:59.
  The i386 processor operates at 2498 MHz,
        and has an i387 compatible floating point processor.
Status of processor 1 as of: 10/18/03 18:39:04
  Processor has been on-line since 10/15/03 22:51:01.
  The i386 processor operates at 2498 MHz,
        and has an i387 compatible floating point processor.
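
For anyone comparing several boxes, here is a minimal Python sketch (not part of the original exchange) that pulls the clock speed and time since boot out of "psrinfo -v" text like the above. The function name parse_psrinfo and the regular expressions are hypothetical and assume the exact output format shown here, with two-digit years and US-style dates; other Solaris releases may print something different.

```python
import re
from datetime import datetime

# Patterns matched against the "psrinfo -v" layout quoted in this message.
# They are assumptions based on that sample only.
STATUS_RE = re.compile(
    r"Status of processor (\d+) as of: (\d\d/\d\d/\d\d \d\d:\d\d:\d\d)")
ONLINE_RE = re.compile(r"on-line since (\d\d/\d\d/\d\d \d\d:\d\d:\d\d)")
MHZ_RE = re.compile(r"operates at (\d+) MHz")
STAMP_FMT = "%m/%d/%y %H:%M:%S"

# Output from the "new" machine, copied from the message.
NEW_MACHINE = """\
Status of processor 0 as of: 10/18/03 18:39:04
  Processor has been on-line since 10/15/03 22:50:59.
  The i386 processor operates at 2498 MHz,
        and has an i387 compatible floating point processor.
Status of processor 1 as of: 10/18/03 18:39:04
  Processor has been on-line since 10/15/03 22:51:01.
  The i386 processor operates at 2498 MHz,
        and has an i387 compatible floating point processor.
"""


def parse_psrinfo(text):
    """Return one dict per processor: id, as_of, online_since, uptime, mhz."""
    cpus = []
    for line in text.splitlines():
        m = STATUS_RE.search(line)
        if m:
            # A "Status of processor" line starts a new record.
            cpus.append({
                "id": int(m.group(1)),
                "as_of": datetime.strptime(m.group(2), STAMP_FMT),
            })
            continue
        if not cpus:
            continue  # ignore anything before the first status line
        m = ONLINE_RE.search(line)
        if m:
            cpu = cpus[-1]
            cpu["online_since"] = datetime.strptime(m.group(1), STAMP_FMT)
            cpu["uptime"] = cpu["as_of"] - cpu["online_since"]
            continue
        m = MHZ_RE.search(line)
        if m:
            cpus[-1]["mhz"] = int(m.group(1))
    return cpus


if __name__ == "__main__":
    for cpu in parse_psrinfo(NEW_MACHINE):
        print(cpu["id"], cpu["mhz"], "MHz, up", cpu["uptime"])
```

Run against the "new" machine's output, this reports 2498 MHz on both processors and an uptime of a little under three days at the time the command was run, i.e. still short of the roughly 4 day mark.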

> > What is the significance of the 4 day run time before death?  What could
> 
> how long have you been running this config for to be sure it is always 4
> days ?
> 
I have been running this config for quite some time.  Even with nothing
happening on the box, i.e. no user applications, the box locks up after
approximately 4 days.  Sometimes I can get a shell, but things like
"top" and "ps" hang.  Running truss on them shows them locking up on
"/dev/mem" or "/dev/kmem".

> > have changed between the old and new 4600's that would cause this?  The
> > uptime on our old machines is months, not days/hours.
> 
> initial suggestions - is hyper threading enabled on new servers but not
> the old ones because the new ones were shipped with faster cpus ?
> 
I thought of this one myself.  Hyper-threading (logical processor) is
enabled on the machines that are working.  I checked the BIOS on the new
machines and it was enabled there as well.  I disabled this feature in
the BIOS and it had no effect.

> regards,
> 
> -jason
-- 
Sean Bruno
Telecommunications Engineer
Metro One Telecommunications
Desk (503)524-1632
Cell (503)358-6832



