stability problem with PE6850 on PERC4e/Di (CentOS 4.1/i386 + Sybase ASE 12.5)

Jerry Yu jjj863 at gmail.com
Tue Nov 7 09:18:27 CST 2006


Anyone?
The box is at 677MHz, so A03 doesn't apply to this system either.
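
On the 30-day question in my original message quoted below, here is a quick sketch of the gaps between the three incidents. It takes the first lockup as 09/17/2006 (my reading of the date in the quoted list) and "today" as 11/1/2006, the date that message was sent. The function name is just for illustration.

```python
from datetime import date

# Incident dates from the quoted message. Assumptions: the first lockup is
# read as 09/17/2006, and "today" is 11/01/2006 (the date the message was sent).
incidents = [date(2006, 9, 17), date(2006, 10, 17), date(2006, 11, 1)]


def gaps_in_days(dates):
    """Return the number of days between consecutive incidents."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


print(gaps_in_days(incidents))
```

Three data points are too few to say whether the spacing is anything more than coincidence.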

On 11/1/06, Jerry Yu <jjj863 at gmail.com> wrote:
>
> The BIOS version is at A01 while the latest is A04. On a second read of A03's
> release notes, I noticed the following two fixes that could be relevant to
> our system. Where can I find more detailed notes than
> PE6850-BIOSA03.TXT?
> * Added support for Virtualization Technology in the processor.
>  Should I assume this refers not to HT but to hardware-assisted server
> virtualization such as Intel's VT (?) technology or the like?
> * Added support for 800MHz system configurations.
> Does this mean BIOS versions prior to A03 don't support 800MHz system
> configurations?
>
>
> On 11/1/06, Jerry Yu <jjj863 at gmail.com> wrote:
> >
> > Recently, we started to have lockups on a Dell PE6850. The server has
> > been up since last July and has been picking up more load as the database
> > grows in size and more web requests/queries run against it. It is a
> > dedicated database server running Sybase ASE 12.5. Details below. Any
> > ideas?
> >
> >    - 4x Xeon CPU and 16GB DDR2 RAM (HT enabled in BIOS and in the system,
> >    i.e. 8 logical CPUs; all Seagate disks)
> >    - CentOS 4.1 /i386 (kernel-hugemem-2.6.9-11.EL with the default cfq I/O
> >    scheduler)
> >    - an embedded PERC 4e/Di (firmware was at 521A before the 10/17 lockup
> >    and 522A after)
> >    - two lockups with PERC firmware at 521A (09/17/2006 2am and
> >    10/17/2006 2am) with "reject i/o to offlined disk", without kernel panic
> >    or corruption
> >    - one brief disk activity suspension today with PERC firmware at
> >    522A A13
> >
> > Today at 11:00am, just when the server started to ramp up to its daily
> > load peak, some processes failed to write to the disk and 'date > junk'
> > from the command line just hung there. I canceled that 'date > junk'. All
> > was good again after less than 4 minutes. Nothing interesting
> > (warn/error/abort) in the system log, the export log from the PERC, or the
> > database log.
> >
> > Older postings on a similar topic on this list suggested that PR could be
> > the culprit if BIOS/firmware is up to date. On the system, I get the
> > following output from "megapr -dispPR -a0" today. Is #Iterations a running
> > count of the total PRs that have run, or a threshold of some sort? If the
> > former, how do I clear it? If the latter, how do I increase it? Basically I
> > am looking into why it locked up exactly 30 days apart (it could be a
> > coincidence too, and we are now using newer BIOS and firmware). Dell diag
> > from OMSA 4.4 on 10/17/2006 suggests nothing wrong with the controller,
> > memory, or underlying disks. (omreport on the controller is appended below
> > too.)
> >
> > ********PR INFO********
> >
> >         Mode       :AUTO
> >         #Iterations:2200
> >         Status     :PR In Progress
> >
> > # omreport storage controller
> >  Controller  PERC 4e/Di (Embedded)
> >
> > Controllers
> > ID                                : 0
> > Status                            : Ok
> > Name                              : PERC 4e/Di
> > Slot ID                           : Embedded
> > State                             : Ready
> > Firmware Version                  : 522A
> > Driver Version                    : Not Applicable
> > Minimum Required Firmware Version : Not Applicable
> > Minimum Required Driver Version   : Not Applicable
> > Number of Channels                : 2
> > Rebuild Rate                      : 30%
> > Alarm State                       : Not Applicable
> > Cluster Mode                      : Not Applicable
> > SCSI Initiator ID                 : 7
> >
> >
>
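
To keep an eye on the PR counter and the firmware level over time, the "Key : Value" output quoted above can be pulled into a dictionary and logged from cron. This is only a sketch: parse_key_values is a made-up helper, it assumes the output keeps the colon-separated layout shown above, and it says nothing about what #Iterations actually means.

```python
def parse_key_values(text):
    """Parse 'Key : Value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        # Drop mail quote markers and surrounding whitespace, if any.
        line = line.lstrip("> ").strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the megapr output quoted above.
sample = """\
********PR INFO********
        Mode       :AUTO
        #Iterations:2200
        Status     :PR In Progress
"""

pr = parse_key_values(sample)
print(pr["#Iterations"], pr["Status"])
```

Saving the parsed values with a timestamp each day would show whether #Iterations keeps growing or resets, which would help answer the count-or-threshold question.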

More information about the Linux-PowerEdge mailing list