stability problem with PE6850 on PERC4e/Di (CentOS 4.1/i386 + Sybase ASE 12.5)

Jerry Yu jjj863 at gmail.com
Wed Nov 1 12:27:42 CST 2006


Recently, we started to have lockups on a Dell PE6850. The server has been
up since last July and has been picking up more load as the database grows
in size and more web requests/queries run against it.  It is a dedicated
database server running Sybase ASE 12.5.  Details below. Any ideas?

   - 4x Xeon CPU and 16G DDR2 RAM (HT enabled in BIOS and in the OS, i.e.
   8 logical CPUs); all disks are Seagate
   - CentOS 4.1/i386 (kernel-hugemem-2.6.9-11.EL with the default cfq I/O
   scheduler)
   - an embedded PERC 4e/Di (was at 521A before 10/17's lockup and 522A
   after)
   - two lockups with PERC firmware at 521A (09/17/2006 2am and
   10/17/2006 2am) "reject i/o to offlined disk" without kernel panic or
   corruption
   - one brief disk activity suspension today with PERC firmware at 522A
   A13

Today at 11:00am, just as the server started to ramp up to its daily load
peak, some processes failed to write to the disk and 'date > junk' from the
command line just hung. I canceled that 'date > junk'. Everything was fine
again after less than 4 minutes. Nothing interesting (warn/error/abort) in
the system log, exportlog from PERC, or database log.
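
One way to catch the next stall with timestamps is a small write probe,
roughly an automated 'date > junk'. The sketch below is only an
illustration: the names timed_write and probe_once, the 1-second threshold,
and the scratch-file prefix are all assumptions, not part of any Dell or
Sybase tool. If the write hangs, the probe blocks as well and reports the
stall length only once the write returns.

```python
import os
import tempfile
import time


def timed_write(directory, payload=b"probe\n"):
    """Write a small scratch file, force it to disk, return elapsed seconds."""
    start = time.monotonic()
    # Create the scratch file on the filesystem under test.
    fd, path = tempfile.mkstemp(prefix="ioprobe-", dir=directory)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # block until the kernel reports the data as written
    finally:
        os.close(fd)
        os.unlink(path)
    return time.monotonic() - start


def probe_once(directory, threshold=1.0):
    """Return one log line; tag it SLOW if the write exceeded the threshold."""
    elapsed = timed_write(directory)
    tag = "SLOW" if elapsed > threshold else "ok"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s %s write+fsync took %.3fs" % (stamp, tag, elapsed)
```

Calling probe_once() on the affected filesystem every few seconds, with the
output sent to a log on a different disk or host, would show when a stall
started and how long it lasted.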

Older postings on similar topics on this list suggested that PR (Patrol
Read) could be the culprit if BIOS/firmware is up to date. On the system, I
get the following output from 'megapr -dispPR -a0' today. Is #Iterations the
current count of the total PRs that have run, or a threshold of some sort?
If the former, how do I clear it? If the latter, how do I increase it?
Basically I am looking into why it locked up exactly 30 days apart (could be
coincidence too, and we are now using newer BIOS and firmware). Dell diag
from OMSA 4.4 on 10/17/2006 suggests nothing wrong with the controller,
memory, or underlying disks. (omreport on the controller is appended below
too.)
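
For reference, the intervals between the three events can be checked with a
few lines of date arithmetic. This is only a sketch; it reads the first
lockup date as 09/17/2006 and uses the date of this message for today's
stall. The variable and function names are made up for the example.

```python
from datetime import date

# Dates from the message; the first lockup is read as 09/17/2006.
lockup_1 = date(2006, 9, 17)
lockup_2 = date(2006, 10, 17)
stall = date(2006, 11, 1)


def days_between(earlier, later):
    """Whole days from one date to the other."""
    return (later - earlier).days


print(days_between(lockup_1, lockup_2))  # 30
print(days_between(lockup_2, stall))     # 15
```

So the two lockups were 30 days apart, while today's stall came 15 days
after the second one (with the firmware changed in between).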

********PR INFO********

        Mode       :AUTO
        #Iterations:2200
        Status     :PR In Progress
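
To find out whether #Iterations is a running count or a fixed setting, one
option is to capture this output periodically and compare the values over
time. The parser below is a minimal sketch that assumes the output always
looks like the sample pasted above; parse_pr_info is a made-up name, not
part of megapr.

```python
def parse_pr_info(text):
    """Parse 'key:value' lines from the PR INFO output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '********PR INFO********' banner.
        if not line or line.startswith("*") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Drop the leading '#' so '#Iterations' becomes 'Iterations'.
        info[key.strip().lstrip("#")] = value.strip()
    return info


sample = """
********PR INFO********

        Mode       :AUTO
        #Iterations:2200
        Status     :PR In Progress
"""

info = parse_pr_info(sample)
print(info["Iterations"], info["Status"])  # 2200 PR In Progress
```

If the Iterations value rises between captures, it is a counter; if it never
changes, it is more likely a setting.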

# omreport storage controller
 Controller  PERC 4e/Di (Embedded)

Controllers
ID                                : 0
Status                            : Ok
Name                              : PERC 4e/Di
Slot ID                           : Embedded
State                             : Ready
Firmware Version                  : 522A
Driver Version                    : Not Applicable
Minimum Required Firmware Version : Not Applicable
Minimum Required Driver Version   : Not Applicable
Number of Channels                : 2
Rebuild Rate                      : 30%
Alarm State                       : Not Applicable
Cluster Mode                      : Not Applicable
SCSI Initiator ID                 : 7