Stability problem with PE6850 on PERC 4e/Di (CentOS 4.1/i386 + Sybase ASE 12.5)
Jerry Yu
jjj863 at gmail.com
Wed Nov 1 19:16:33 CST 2006
The BIOS version is A01, while the latest is A04. On a second reading of the A03
release notes, I noticed the following two fixes that could be relevant to
our system. Where can I find more detailed notes than
PE6850-BIOSA03.TXT?
* Added support for Virtualization Technology in the processor.
Should I assume this refers not to HT but to hardware-assisted server
virtualization, such as Intel's VT (?) technology or similar?
* Added support for 800MHz system configurations.
Does this mean that BIOS versions prior to A03 do not support 800MHz system
configurations?
On 11/1/06, Jerry Yu <jjj863 at gmail.com> wrote:
>
> Recently we have started to see lockups on a Dell PE6850. The server has been
> up since last July and has been taking on more load as the database grows
> and more web requests/queries run against it. It is a dedicated
> database server running Sybase ASE 12.5. Details below. Any ideas?
>
> - 4x Xeon CPUs and 16 GB DDR2 RAM (HT enabled in the BIOS and in the OS,
> i.e., 8 logical CPUs); all Seagate disks
> - CentOS 4.1/i386 (kernel-hugemem-2.6.9-11.EL with the default CFQ I/O
> scheduler)
> - an embedded PERC 4e/Di (firmware 521A before the 10/17 lockup, 522A
> after)
> - two lockups with PERC firmware 521A (09/17/2006 2am and
> 10/17/2006 2am) with "reject i/o to offlined disk" errors, but no kernel
> panic or corruption
> - one brief suspension of disk activity today with PERC firmware
> 522A A13
>
> Today at 11:00am, just as the server started ramping up to its daily load
> peak, some processes failed to write to disk, and 'date > junk' from the
> command line just hung. I canceled that 'date > junk'. Everything returned to
> normal in less than 4 minutes. There was nothing interesting (warnings/errors/
> aborts) in the system log, the PERC export log, or the database log.
>
> Older postings on this list on a similar topic suggested that PR (Patrol
> Read) could be the culprit if the BIOS/firmware is up to date. Today I got
> the following output from "megapr -dispPR -a0". Is #Iterations the
> cumulative count of PR runs so far, or a threshold of some sort? If the
> former, how do I clear it? If the latter, how do I increase it? Basically I
> am trying to understand why the lockups came exactly 30 days apart (this
> could be a coincidence, and we are now on newer BIOS and firmware). Dell
> diagnostics from OMSA 4.4 on 10/17/2006 found nothing wrong with the
> controller, memory, or underlying disks. (omreport output for the
> controller is appended below.)
>
> ********PR INFO********
>
> Mode :AUTO
> #Iterations:2200
> Status :PR In Progress
>
> # omreport storage controller
> Controller PERC 4e/Di (Embedded)
>
> Controllers
> ID : 0
> Status : Ok
> Name : PERC 4e/Di
> Slot ID : Embedded
> State : Ready
> Firmware Version : 522A
> Driver Version : Not Applicable
> Minimum Required Firmware Version : Not Applicable
> Minimum Required Driver Version : Not Applicable
> Number of Channels : 2
> Rebuild Rate : 30%
> Alarm State : Not Applicable
> Cluster Mode : Not Applicable
> SCSI Initiator ID : 7
>
>
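One way to answer the #Iterations question empirically is to sample the PR INFO block periodically (for example once a day from cron) and log the value: if it goes up by one after each completed Patrol Read, it is a running counter rather than a threshold. The sketch below is a hypothetical helper, not a Dell tool. It assumes megapr is on the PATH and that its output keeps the "label :value" layout of the single sample quoted above; other megapr versions may print it differently, and the real meaning of #Iterations is not confirmed here.

```python
"""Hypothetical sketch: parse the PR INFO block from `megapr -dispPR -a0`.

Assumption: output lines look like the sample in this thread, e.g.
"Mode :AUTO", "#Iterations:2200", "Status :PR In Progress".
"""
import re
import subprocess
from typing import Dict, Optional

# "<label><optional spaces>:<value>"; banner lines such as
# "********PR INFO********" do not start with a letter or '#' and are skipped.
_FIELD = re.compile(r"^\s*(#?[A-Za-z][\w ]*?)\s*:\s*(.*?)\s*$")


def parse_pr_info(text: str) -> Dict[str, str]:
    """Return {label: value} for every "label : value" line in the text."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def read_iterations(fields: Dict[str, str]) -> Optional[int]:
    """Return #Iterations as an int, or None if it is missing or non-numeric."""
    value = fields.get("#Iterations")
    return int(value) if value is not None and value.isdigit() else None


def sample_pr_info(adapter: int = 0) -> Dict[str, str]:
    """Run megapr (assumed to be on PATH) for one adapter and parse its output."""
    result = subprocess.run(
        ["megapr", "-dispPR", f"-a{adapter}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pr_info(result.stdout)
```

Logging `read_iterations(sample_pr_info(0))` with a timestamp next to the PERC export log would also show whether a Patrol Read was in progress when a stall like today's occurred.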
More information about the Linux-PowerEdge mailing list