EXT3 errors on PE 1600SC

Karl Zander kzander at commpartners.com
Mon Jan 8 09:43:26 CST 2007


It's RAID-5.  Yes, the server is crashing.

--Karl

On Mon, 08 Jan 2007 10:34:08 -0500
  Joe Malicki <jmalicki at metacarta.com> wrote:
> It sounds like you have a bad disk (media errors).  Is this
> RAID-0?  A single bad disk shouldn't cause the array to
> go down.
> 
> Also, note that Dell has firmware updates for Maxtor
> Atlas 10K disks that fall off the SCSI bus, but I don't
> recall them ever giving media errors, just timeouts.
> 
> -joe
> 
> Karl Zander wrote:
>>   
>> We have a PE 1600SC with MegaRAID controller.
>> 
>> 01:02.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 01)
>> 01:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>> 
>> 
>> We are getting the error
>> 
>> EXT3-fs error (device sda6) in start_transaction: Journal has aborted
>> 
>> and the server crashes.  A reboot gets it back up, but it
>> will go down in 24 hours or so.
>> 
>> To run e2fsck -c /dev/sda6, must I unmount the filesystem
>> first?
>> 
>> The list archives show others have also had this problem.
>> One suggestion was to get linttylog to read the RAID
>> controller logs.  Here is that log.  I am not sure what
>> it's showing me.
>> 
>> 
>> TTY History for HA(0) -Bus 0x01 Device 0x02
>> B @T0
>> (C) LSI Logic 2002 @T0
>> Megaraid Series 0 firmware version 3.28 @T0
>> Build date: Jun 30 2003 at 10:14:11 @T0
>> Board type: 1000/1960/1028/0520 @T0
>> DRAM_ALT sig invalid from previous boot @T0
>> FLUSH_ON_SYSTEM_RESET=0 @T0
>> WAIT FOR BIOS.... @T0
>> bus 1 dev 5 function 0 @T0
>> reg 0 value ffffff01 @T0
>> reg 1 value fbff0004 @T0
>> reg 2 value 0 @T0
>> reg 3 value fbfe0004 @T0
>> reg 4 value 0 @T0
>> reg 5 value 0 @T0
>> SCSI chip is on the secondary bus @T0
>> Found MPT LVD 30 at fbff0004 @T0
>>   isp 0 membaseaddr 8bff0000 iobaseaddr 9001ff00  @T0
>> BIOS UP! @T0
>> Can_flush=0 DRAM SIZE=64 MB @T0
>> * RST * @T0
>> Enabling data cache @T0
>> pciDebug = d0008a9c @T0
>> calling init_scsi @T0
>> DISK_CACHE_ADDR=d0d2bc00  @T0
>> MEM_END_ADDR=d3fffff0  @T0
>> Total LSI MPT Chips found 1  @T0
>> LSI_InitMPT : start_index 0 totalLSIMPTChips 1 @T0
>>          Verifying Image Signatures...VERIFIED @T0
>>          Verifying image check sum... VERIFIED @T0
>> BaseAddr 9001ff00 chip 0 @T0
>> Checking Diagnostic Register write access...    Enabled. @T0
>> reset adapter bit cleared @T0
>> Complete. @T0
>> Checking Diagnostic Register write access...    Enabled. @T0
>> The FW version being loaded is MPTFW-01.03.06.00-IT @T0
>> NextImageHeaderOffset 9784 @T0
>> ExtImage Size 818 @T0
>> Diag, Register disabled DIAGNOSTIC_REG 131 @T0
>> FW download complete... Expecting LSI FW to start excute and come to ready state
>>   @T0
>>   For this sys doorbell reg bit 28 should be set  @T0
>>   MISM CHN_STATE_MPT_GET_FW_FEAT chip 0  @T2
>>   Check IOC FACTS chip 0  @T2
>>   MISM CHN_STATE_MPT_OPERATIONAL chip 0  @T2
>>   MISM: Reply frame size 60 start addr d05389ec  @T2
>> ff reply free frames posted @T2
>>   MISM CHN_STATE_MPT_INIT_BUS_RST chip 0  @T2
>> MPT_Poll: chip 0 CHN_STATE_MPT_INIT_BUS_RST  @T2
>> cmdBufferAddr = fc538770, ioIdx = 0 @T5
>> cmdBufferAddr = fc538808, ioIdx = 1 @T5
>> CommandBufferPost Post: Request = d00201c0 @T5
>> DISM: Queued! @T5
>>   MPT_ProcessIo Reply Fr 2 EVENT_NOTIFICATION @T5
>> MPI_EVENT_EVENT_CHANGE @T5
>> DISM: CR8 SAFTE at chan 0 id 6 @T6
>> DISM_ProcessPprState: DomainVal done on all disks @T8
>> DISM: Complete!!! @T28
>> bbuDebugFlags = d000d77c @T28
>> battery init: battery backup circuit is not mounted @T28
>> TBBU: No TBBU h/w @T28
>> Veirfyin config struct at Addr e0001400  @T28
>> NVRAM checksum OK - reading configuration @T28
>> DISK_CACHE_ADDR=d0d2bc00  @T28
>> MEM_END_ADDR=d3fffff0  @T28
>> Memory End d3fffff0 @T28
>> Total Number of Cache Lines 810 @T28
>> L 5   SS 128   Size cb31000   N 810   Status 2   DT  251   BT 512 @T28
>> can_flush = 0 @T28
>> No Reconst:Checking drive info @T28
>>   @T28
>> REF drive found at ch 0 tgt 0  @T28
>> Attempting to perform drive roaming @T28
>> NOT Flushing Cache @T28
>> Battery Bad: Changing to WRTHRU @T28
>> BIOS CALL FOR DRV ROAMING : 55 @1/8 12:57:26
>> drive roaming not done @1/8 12:57:26
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:19
>> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:19
>> Retrying cmdId 2e @1/8 12:58:19
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:20
>> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:20
>> DIO with no cache command. returning FAILURE with CRB set to -3 @1/8 12:58:20
>>   @1/8 12:58:20
>> Retrying cmdId 2e @1/8 12:58:20
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:21
>> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:21
>> <0,1> scsiErr=f1 rwCmdInx=2e cmdType=2 @1/8 12:58:21
>>   Reassign d00a0840 94 40 @1/8 12:58:21
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:24
>> REC: MedErr on LD[1] BadLba=8669d6 @1/8 12:58:24
>> DIO with no cache command. returning FAILURE with CRB set to -3 @1/8 12:58:24
>>   @1/8 12:58:24
>> Retrying cmdId 29 @1/8 12:58:24
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:32
>> REC: MedErr on LD[1] BadLba=c216f6 @1/8 12:58:32
>> <0,1> scsiErr=f1 rwCmdInx=10 cmdType=2 @1/8 12:58:32
>>   Reassign d00988c0 170 76 @1/8 12:58:32
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:36
>> REC: MedErr on LD[1] BadLba=c217d8 @1/8 12:58:36
>> <0,1> scsiErr=f1 rwCmdInx=1d cmdType=2 @1/8 12:58:36
>>   Reassign d00a2940 179 58 @1/8 12:58:36
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 13:12:18
>> REC: MedErr on LD[1] BadLba=86668f @1/8 13:12:18
>> <0,1> scsiErr=f1 rwCmdInx=34 cmdType=2 @1/8 13:12:18
>>   Reassign d0091f00 110 f @1/8 13:12:18
>> 
>> --Karl
>> 
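The "REC: MedErr on LD[...] BadLba=..." lines in the log above can be
tallied to see how many distinct bad blocks the controller reported and
how often each one recurred.  What follows is only a rough sketch: it
assumes that BadLba is a hexadecimal block address and that LD[n] names a
logical drive, neither of which is documented in this thread.  The
function name summarize_media_errors and the SAMPLE_LOG excerpt are
illustrative, not part of linttylog.

```python
import re
from collections import defaultdict

# Matches lines such as (mail quoting in front is ignored by search()):
#   REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:19
# Assumption: BadLba is hexadecimal and LD[n] is a logical drive number.
MEDERR_RE = re.compile(
    r"REC: MedErr on LD\[(\d+)\] BadLba=([0-9a-fA-F]+) @(\S+ \S+)"
)

# A few lines copied from the log in this thread, used as sample input.
SAMPLE_LOG = """\
>> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:19
>> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:19
>> Retrying cmdId 2e @1/8 12:58:19
>> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:20
>> REC: MedErr on LD[1] BadLba=8669d6 @1/8 12:58:24
"""


def summarize_media_errors(log_text):
    """Return {drive_number: {bad_lba_as_int: [timestamps]}}."""
    summary = defaultdict(lambda: defaultdict(list))
    for line in log_text.splitlines():
        match = MEDERR_RE.search(line)
        if match:
            drive, lba_hex, stamp = match.groups()
            summary[int(drive)][int(lba_hex, 16)].append(stamp)
    # Convert to plain dicts so callers do not create keys by accident.
    return {drive: dict(lbas) for drive, lbas in summary.items()}


if __name__ == "__main__":
    for drive, lbas in summarize_media_errors(SAMPLE_LOG).items():
        for lba, stamps in sorted(lbas.items()):
            print(f"LD[{drive}] LBA 0x{lba:x}: {len(stamps)} report(s)")
```

Several different BadLba values on the same drive, as in the full log,
would be consistent with the "bad disk" reading given in the reply, but
the script only counts lines; it does not diagnose anything.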
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> Linux-PowerEdge at dell.com
>> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
>> Please read the FAQ at http://lists.us.dell.com/faq
>> 
> 
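On the question of unmounting before "e2fsck -c /dev/sda6": the e2fsck
man page warns that it is generally not safe to run e2fsck on a mounted
filesystem, so it is worth confirming that the partition is not mounted
before starting.  A minimal sketch of such a check follows.  It assumes
the device appears in /proc/mounts under exactly the name given (it may
not, for example if it is listed as /dev/root or by label), and both the
helper name mount_point_of and the SAMPLE_MOUNTS text are hypothetical.

```python
# Hypothetical /proc/mounts-style text; the mount points are made up.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/sda6 /var ext3 rw 0 0
proc /proc proc rw 0 0
"""


def mount_point_of(device, mounts_text):
    """Return where `device` is mounted according to mounts_text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 0 is the device, field 1 is the mount point.
        if len(fields) >= 2 and fields[0] == device:
            return fields[1]
    return None


if __name__ == "__main__":
    # On a live Linux system, read the real table instead of the sample.
    with open("/proc/mounts") as handle:
        where = mount_point_of("/dev/sda6", handle.read())
    if where is None:
        print("/dev/sda6 does not appear to be mounted")
    else:
        print(f"/dev/sda6 is mounted on {where}; unmount it before e2fsck")
```

If the partition cannot be unmounted while the system is running, the
usual alternative is to run the check from single-user mode or rescue
media.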