Crashes with 2.4.18-19 still

John P john at pmbbs.demon.co.uk
Fri Jan 10 09:05:00 CST 2003


I'm not sure now. The box didn't "crash"; it ran itself out of memory and
started to kill new connections, so connections would start but be killed
straight away. I couldn't even log on locally.

All I've got are kernel out-of-memory errors starting at 0052, with no idea
what caused them. Nothing started running at that time, HTTP load wasn't
higher than normal and no other daemons were overloaded. There is nothing in
any of the logs except for the out of memory: process killed (http) messages.
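
One way to pin down when the out-of-memory kills began, and which processes
were hit, would be to summarise the log. What follows is only a minimal sketch
in Python: the function name summarize_oom is made up for illustration, and
the matching is an assumption (any line containing "out of memory", with the
process name taken from the first parenthesised word after it), since the
exact message format depends on the kernel.

```python
import re
from collections import Counter

# Assumption: OOM lines contain the words "out of memory" in some letter case.
OOM_RE = re.compile(r"out of memory", re.IGNORECASE)
# Assumption: the killed process name appears in parentheses after that text.
NAME_RE = re.compile(r"\(([^)]+)\)")


def summarize_oom(lines):
    """Return (first matching line, Counter of process names) for log lines."""
    first = None
    victims = Counter()
    for line in lines:
        match = OOM_RE.search(line)
        if not match:
            continue
        if first is None:
            first = line.rstrip("\n")
        # Look for "(name)" only in the text after the "out of memory" match.
        name = NAME_RE.search(line, match.end())
        victims[name.group(1) if name else "unknown"] += 1
    return first, victims

# Example use (the log path is an assumption, not taken from this thread):
#   with open("/var/log/messages", errors="replace") as fh:
#       first, victims = summarize_oom(fh)
#   print(first, victims.most_common())
```

The first matching line gives the timestamp at which the kills started, which
can then be compared against cron jobs, log rotation and the web server logs
for the same minute.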

I am running 1.25GB RAM and have 2GB swap.
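
To see whether RAM and swap drain slowly or disappear all at once before the
next failure, free memory could be recorded at regular intervals. This is a
rough sketch under stated assumptions: the helper names parse_meminfo and
format_sample are invented here, and it only reads the "Key: N kB" lines of
/proc/meminfo (MemFree and SwapFree), skipping any other layout in that file.

```python
import time


def parse_meminfo(text):
    """Parse 'Key:   12345 kB' lines into a dict of kilobyte counts."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "<number> kB" lines; header or summary rows are skipped.
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def format_sample(info, now=None):
    """Build one timestamped log line; a missing field is shown as None."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return (f"{stamp} MemFree={info.get('MemFree')}kB "
            f"SwapFree={info.get('SwapFree')}kB")

# Example use, e.g. from a once-a-minute cron job appending to a file:
#   with open("/proc/meminfo") as fh:
#       print(format_sample(parse_meminfo(fh.read())))
```

A steady fall over hours would point towards a leak in a long-running process,
while a sudden drop would suggest a burst of new processes or connections.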

Could this be an Apache problem? Or OS? Or hardware? (So I can take it to
the right place!)

John


----- Original Message -----
From: "Rechenberg, Andrew" <ARechenberg at shermanfinancialgroup.com>
To: "John P" <john at pmbbs.demon.co.uk>; <linux-poweredge at dell.com>
Sent: Friday, January 10, 2003 1:09 PM
Subject: RE: Crashes with 2.4.18-19 still



If the box is crashing even with the bcm module then it's probably not the
NIC.  Is the status LED on the box showing that all the hardware is OK?  We
had a 6600 on which we had to replace all the CPUs, the VRMs, and the system
board because the box would just die, but there were kernel panic messages on
the console when this happened.

I would probably get Dell on the phone for this one.

My 2¢
Andy.

-----Original Message-----
From: John P [mailto:john at pmbbs.demon.co.uk]
Sent: Thursday, January 09, 2003 8:43 PM
To: linux-poweredge at dell.com
Subject: Crashes with 2.4.18-19 still


Hi all

Today one of our 2650 servers has stopped responding to connections
properly. On eth0/eth1, any services (ssh/http/ftp - run from xinetd or
standalone) stop responding after making the initial connection before
authentication. eth0 cannot be pinged but eth1 can. This is from a variety
of machines in the same datacentre and outside.

I am running RedHat 8.0 with kernel 2.4.18-19.8.0smp and the bcm5700
ethernet modules instead of tg3.

I haven't had it rebooted at the datacentre yet, so I can't check the logs.
However, this is the first time it has crashed since moving from tg3 to
bcm5700 on the 1st. What can I do to debug this? The machines are colocated,
so I really need them stable. I have two machines, next to each other, with
the same spec and software, so could I link up a serial cable between them to
get oops data etc.? Or should I be getting Dell to help me sort this out?


Regards
John

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq or search the list
archives at http://lists.us.dell.com/htdig/
