Steve_Boley at Dell.com
Tue Mar 5 16:23:00 CST 2002
Compare the sizes of the databases on the two systems and see whether
there is a difference. Below I'm pasting the text of a document we use for
servers that lock up. Follow these guidelines and see if the lockups stop.
Problem: PowerEdge servers locking up and screen freezing
Detail: The system hard-locks, and you have to power-cycle it to recover.
The screen is usually frozen, and under NT the only entry in Event Viewer is
the "previous shutdown was unexpected" error. The issue is more prevalent on
PEx3xx and higher models, and is more common on systems with RAID and
multiple PCI cards. In the majority of cases it is caused by resource
conflicts on the server's PCI bus. Most commonly, the lockups come from
RAID/SCSI controllers sharing resources with network adapters or with other
RAID/SCSI controllers. Heavy simultaneous load on the network and SCSI
subsystems triggers the lockups: for example, a network backup that hits
both the NIC and the SCSI controller the tape drive is on, or a local backup
that hits both the tape drive's SCSI controller and the RAID controller the
hard drives run off, while those devices are sharing resources.
The default BIOS settings on PowerEdge servers enable all options, and on
later models only one IRQ is left available before anything additional is
added to the system; some models have none.
1. Go into system setup and disable any unused or unnecessary devices. On
x3xx and early x4xx systems these are on the second page of the BIOS setup:
press F2 when the initial POST screen prompts you to enter setup, then press
Alt-P to reach the second page, where the devices are listed. The devices
most commonly disabled are the parallel port, one or both serial ports, USB,
the embedded NIC (if unused), and the embedded SCSI controller if an add-in
RAID card is present (unless the tape drive is on it). On later x4xx, x5xx,
and x6xx systems the devices are listed under 'Integrated Devices'.
2. While still in system setup, after disabling devices, turn on the Caps
Lock, Num Lock, and Scroll Lock lights on your keyboard, then press Alt-E;
the system should beep. This forces a rescan of the PCI bus on the next
reboot, which reassigns any IRQs and memory addresses that were being shared
to the IRQs and memory addresses freed by disabling devices.
3. On older systems you can boot to the corresponding Resource Configuration
Utility (RCU), choose 'Step 3: View or edit details', and then press F7 for
the 'Advanced Options Menu'. Choose 'View additional system information',
then 'Used resources'; press Escape and then choose 'Available resources'.
With RAID controllers you will see a bridge and the RAID controller sharing
an IRQ, and dual-port NICs likewise show a bridge and the NIC sharing one,
but everything else should have its own IRQ. You cannot disable devices or
free resources here; that has to be done in setup, before either booting the
RCU from floppy or running it from the F10 utility partition if that
partition still exists on your hard drive or RAID array. To assign IRQs
manually, highlight the resource after entering 'Step 3: View or edit
details' and press F6. For example, 'PCI Function 1 PCI 5' would be the
entry for whatever PCI card is in slot 5.
This resolves a large number of cases of systems freezing and locking up.
The problem is most common on later models that have IDE CD-ROM drives,
embedded RAID, USB controllers, and extra cards in PCI slots, but it has
also occurred on older, early "black" box PowerEdge x3xx servers.
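Step 3 uses Dell's RCU to inspect IRQ assignments. On a Linux box such as the
2550s in the message below, a rough cross-check from the running OS is to
read /proc/interrupts, which lists each IRQ together with the drivers that
have registered a handler on it. The following is a minimal Python sketch,
not a Dell tool: `shared_irqs` is a hypothetical helper, and it assumes the
2.4-era layout (a CPU header row, one counter per CPU, one controller-type
column such as IO-APIC-level, then comma-separated device names). Newer
kernels print extra columns, so the slicing would need adjusting there.

```python
def shared_irqs(text):
    """Return {irq: [device, ...]} for IRQ lines that list more than one device.

    Assumes the 2.4-era /proc/interrupts layout: a header row of CPU names,
    then rows of the form 'IRQ: <one count per CPU> <controller> <dev>, <dev>'.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row, e.g. "CPU0  CPU1"
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC and ERR
        fields = rest.split()
        # Drop the per-CPU counters and the single controller-type column.
        devices = " ".join(fields[ncpu + 1:])
        names = [name.strip() for name in devices.split(",") if name.strip()]
        if len(names) > 1:
            shared[int(label)] = names
    return shared
```

On the server itself this would be called as
`shared_irqs(open("/proc/interrupts").read())`. Some sharing, such as the
RAID controller case mentioned in step 3, is expected; the entries worth a
closer look are RAID/SCSI controllers sharing with NICs or with each other.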
From: Soukup, Kevin (IL50) [mailto:kevin.soukup at honeywell.com]
Sent: Tuesday, March 05, 2002 4:23 PM
To: 'linux-poweredge at dell.com'
Subject: (no subject)
I have two 2550s that shipped from Dell within about a month of each other.
They're dual-processor, PERC-3Di-enabled with a RAID 5 array and 2GB RAM.
The two servers were wiped clean and freshly installed from a Red Hat 7.2 CD
that was built from an ISO. Both are currently running the
2.4.7-10enterprise kernel. I did not do the fresh installs myself; I let our
sysadmin do them. I believe he tried to apply exactly the same configuration
to both servers, but again, they were installed at different times.
I'm using both servers for Oracle, one for production and the other for
development and testing. The production server has been ROCK-SOLID, and I
have no complaints about its performance or stability. It really is working
well. I've been quite satisfied except for one odd situation: the
development server has frozen solid, locked so tight I couldn't do a thing,
on two different occasions. There is no way to shut it down; it just freezes
and I have to power it off. Luckily, I am not seeing this symptom on my
production box, but it does have me worried.
The only observable difference between the two servers is their use of
swap. The production server has 2GB of memory and 2000MB of swap. It _never_
uses swap, and only uses about 700MB of RAM to begin with. I wouldn't expect
it to swap out, since it has plenty of RAM in reserve.
The development server also has 2GB of memory, but a 2047MB swap file. It's
using only 500MB of RAM, but for some crazy reason it keeps increasing the
amount of swap in use and never appears to free it. I don't understand why
it would use any swap at all when it has an abundance of free RAM at its
fingertips. I'm also wondering whether this might be the cause of the
lockups.
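One way to confirm whether swap use on the development box really keeps
growing is to log it over time. The following is a minimal Python sketch,
assuming the SwapTotal and SwapFree fields of /proc/meminfo (values in kB);
`swap_usage_kb` is a hypothetical helper, not part of any existing tool.

```python
def swap_usage_kb(text):
    """Return (swap_total_kb, swap_used_kb) from /proc/meminfo-style text.

    Only 'Key: <number> ...' lines are read; the 2.4 kernel's tabular
    'total: used: free: ...' header row is skipped because its first field
    is not a number.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    total = values["SwapTotal"]
    return total, total - values["SwapFree"]
```

Calling `swap_usage_kb(open("/proc/meminfo").read())` periodically (for
example from cron) and appending the used figure with a timestamp to a log
would show whether the number only ever rises, as described above.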
I have not been able to retrieve any information from the message log, and
I'm worried about this condition.
Any suggestions would be greatly appreciated.
E-mail: kevin.soukup at honeywell.com
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
Please read the FAQ at http://lists.us.dell.com/faq or search the list
archives at http://lists.us.dell.com/htdig/