PE1650 system crashes

H. Vernon Jones hvj at fluent.com
Wed Aug 27 16:02:01 CDT 2003


Hello Everyone,

I have a PE1650 that crashes regularly _after_ completing the network backup of 
a small remote office to an external SDLT jukebox. The system runs a stock RH7.3 
install (kernel 2.4.18-3) and uses the PERC3/Di to hold two simple volumes, with 
no hardware RAID; these volumes are mirrored under Linux software RAID.

I have moved logging to the local /var/log/messages instead of a network loghost, 
but I have seen little apart from the errors below, which appear just before 
logging stops. Since both disks are affected, I suspect the controller.

---------------------------
Aug 26 04:04:15 nsr kernel: scsi : aborting command due to timeout : pid 
1462228, scsi2, channel 0, id 1, lun 0 Write (10) 00 00 20 db 53 00 00 10 00
Aug 26 04:04:15 nsr kernel: scsi : aborting command due to timeout : pid 
1462228, scsi2, channel 0, id 1, lun 0 Write (10) 00 00 20 db 53 00 00 10 00
Aug 26 04:04:15 nsr kernel: scsi : aborting command due to timeout : pid 
1462229, scsi2, channel 0, id 0, lun 0 Write (10) 00 00 20 db 53 00 00 10 00
Aug 26 04:04:15 nsr kernel: scsi : aborting command due to timeout : pid 
1462229, scsi2, channel 0, id 0, lun 0 Write (10) 00 00 20 db 53 00 00 10 00
--------------------------------------
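For anyone reading the timeout lines above, here is a minimal Python sketch that decodes the command bytes. It assumes the 2.4 kernel prints the nine CDB bytes that follow the opcode after "Write (10)", that the wrapped log line has been rejoined into one string, and that the disks use 512-byte blocks. The names ABORT_RE and decode_write10 are made up for illustration.

```python
import re

# Matches the "aborting command due to timeout" message once the wrapped
# log line has been rejoined. The nine hex bytes are assumed to be CDB
# bytes 1..9, i.e. everything after the WRITE(10) opcode.
ABORT_RE = re.compile(
    r"pid\s+(?P<pid>\d+),\s*scsi(?P<host>\d+),\s*channel\s+(?P<channel>\d+),\s*"
    r"id\s+(?P<id>\d+),\s*lun\s+(?P<lun>\d+)\s+Write \(10\)\s+"
    r"(?P<cdb>(?:[0-9a-f]{2}\s*){9})",
    re.IGNORECASE,
)

BLOCK_SIZE = 512  # assumption: 512-byte logical blocks


def decode_write10(line):
    """Return target, LBA and length for one timeout line, or None."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    b = [int(x, 16) for x in m.group("cdb").split()]
    # WRITE(10) layout after the opcode: flags, 4-byte LBA (big-endian),
    # group/reserved byte, 2-byte transfer length, control byte.
    lba = int.from_bytes(bytes(b[1:5]), "big")
    blocks = int.from_bytes(bytes(b[6:8]), "big")
    return {
        "pid": int(m.group("pid")),
        "host": int(m.group("host")),
        "channel": int(m.group("channel")),
        "id": int(m.group("id")),
        "lun": int(m.group("lun")),
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * BLOCK_SIZE,
    }


sample = (
    "Aug 26 04:04:15 nsr kernel: scsi : aborting command due to timeout : pid "
    "1462228, scsi2, channel 0, id 1, lun 0 Write (10) 00 00 20 db 53 00 00 10 00"
)
info = decode_write10(sample)
print(info)
```

Under those assumptions, both disks were being asked to write the same 16 blocks at the same LBA, which is what a mirrored write from software RAID would look like.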

I have just installed afaapps-2.6-1 in hopes of diagnosing this issue, but I 
cannot increase the S.M.A.R.T. MRIE error level, which is currently at "6". 
Setting SMART one disk at a time by specifying its bus,channel,id returns the 
same error. "container list /full" does not show either disk as 'locked'. Any 
pointers would be much appreciated. I would like to call the 800# with all my 
ducks in a row.

Here are the PERC details. I have downloaded the latest firmware, which I plan 
to apply tomorrow.

---------------------------------------------
AFA0> disk set smart /all /enable_exceptions=TRUE /logerr=TRUE /mrie=3
Executing: disk set smart /all=TRUE /enable_exceptions=TRUE /logerr=TRUE /mrie=3
One or more SMART fields on disk (0,00,0) were not set.
One or more SMART fields on disk (0,01,0) were not set.
Disk (0,06,0) does not support SMART error detection.

AFA0> disk show smart
Executing: disk show smart

         Smart    Method of         Enable
         Capable  Informational     Exception  Performance  Error
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:06:0     N

----------------------------------------------
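As a reading aid for the table above, here is a rough Python sketch that pulls the per-disk rows out of the "disk show smart" output and labels the MRIE value. The meanings are the ones I understand the SCSI Informational Exceptions Control mode page to define; the column positions are assumed from the output shown here, and the function and variable names are hypothetical.

```python
# MRIE (Method of Reporting Informational Exceptions) values as I understand
# them from the SCSI Informational Exceptions Control mode page.
MRIE_MEANINGS = {
    0: "no reporting of informational exceptions",
    1: "asynchronous event reporting",
    2: "generate unit attention",
    3: "conditionally generate recovered error",
    4: "unconditionally generate recovered error",
    5: "generate no sense",
    6: "only report informational exception condition on request",
}


def parse_smart_rows(text):
    """Parse the data rows of 'disk show smart' output into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric bus:id:lun triple such as 0:00:0;
        # this skips the B:ID:L header and the dashed separator.
        if not fields or fields[0].count(":") != 2:
            continue
        if not fields[0].replace(":", "").isdigit():
            continue
        capable = len(fields) > 1 and fields[1] == "Y"
        # Disks that are not SMART capable have no MRIE column at all.
        mrie = int(fields[2]) if capable and len(fields) > 2 else None
        rows.append({
            "disk": fields[0],
            "capable": capable,
            "mrie": mrie,
            "meaning": MRIE_MEANINGS.get(mrie, "n/a"),
        })
    return rows


sample = """\
B:ID:L  Device   Exceptions(MRIE)  Control    Enabled      Count
------  -------  ----------------  ---------  -----------  ------
0:00:0     Y            6             Y           N             0
0:01:0     Y            6             Y           N             0
0:06:0     N
"""
rows = parse_smart_rows(sample)
for row in rows:
    print(row["disk"], row["mrie"], row["meaning"])
```

If that reading is right, an MRIE of 6 means the disks only report an exception when polled, which would explain why nothing SMART-related shows up in the logs on its own.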

AFA0> controller details
Executing: controller details
Controller Information
----------------------
          Remote Computer: .
              Device Name: AFA0
          Controller Type: PERC 3/Di
              Access Mode: READ-WRITE
 Controller Serial Number: Last Six Digits = 9041D3
          Number of Buses: 2
          Devices per Bus: 15
           Controller CPU: i960 R series
     Controller CPU Speed: 100 Mhz
        Controller Memory: 128 Mbytes
            Battery State: Ok

Component Revisions
-------------------
                 CLI: 2.7-0 (Build #4935)
                 API: 2.7-0 (Build #4935)
     Miniport Driver: 3.0-0 (Build #20773)
 Controller Software: 2.7-1 (Build #3170)
     Controller BIOS: 2.7-1 (Build #3170)
 Controller Firmware: (Build #3170)
---------------------------
TIA,

Vernon
Systems Administrator
Fluent, Inc.




More information about the Linux-PowerEdge mailing list