EXT3 errors on PE 1600SC

Joe Malicki jmalicki at metacarta.com
Mon Jan 8 09:34:08 CST 2007


It sounds like you have a bad disk: the controller log shows
media errors.  Is this RAID-0?  In a redundant array, a single
bad disk shouldn't cause the array to go down.
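
If it helps to see how widespread the damage is, here is a rough
sketch in Python that tallies the media errors in linttylog output.
It assumes the "REC: MedErr on LD[n] BadLba=..." line format seen in
the log quoted below, that LD means logical drive, and that BadLba is
a hexadecimal block address. The function name and the sample text
are only for illustration; this is not an LSI or Dell tool.

```python
import re
from collections import Counter

# Matches controller log lines such as:
#   REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:19
# Assumption: BadLba is hexadecimal (values like c216f6 suggest so).
MEDERR_RE = re.compile(r"REC: MedErr on LD\[(\d+)\] BadLba=([0-9a-fA-F]+)")


def summarize_media_errors(log_text):
    """Return {logical_drive: Counter({bad_lba: hit_count})}."""
    summary = {}
    for line in log_text.splitlines():
        match = MEDERR_RE.search(line)
        if match:
            ld = int(match.group(1))
            lba = int(match.group(2), 16)
            summary.setdefault(ld, Counter())[lba] += 1
    return summary


# A short excerpt copied from the quoted log; in practice, read the
# whole linttylog output from a file instead.
sample = """\
> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:19
> Retrying cmdId 2e @1/8 12:58:19
> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:20
> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:21
> REC: MedErr on LD[1] BadLba=8669d6 @1/8 12:58:24
"""

result = summarize_media_errors(sample)
for ld, lbas in sorted(result.items()):
    for lba, hits in sorted(lbas.items()):
        print(f"LD {ld}: LBA 0x{lba:x} reported {hits} time(s)")
```

Several distinct bad LBAs on one drive, as opposed to one block
failing repeatedly, would point more strongly at a failing disk.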

Also, note that Dell has firmware updates for Maxtor
Atlas 10K disks that fall off the SCSI bus, but I don't
recall those disks ever giving media errors, just timeouts.

-joe

Karl Zander wrote:
>   
> We have a PE 1600SC with MegaRAID controller.
> 
> 01:02.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 01)
> 01:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 
> 
> We are getting the error
> 
> EXT3-fs error (device sda6) in start_transaction: Journal has aborted
> 
> and the server crashes.  A reboot gets it back up, but it 
> will go down in 24 hours or so.
> 
> To run e2fsck -c /dev/sda6 must I unmount the filesystem 
> first?
> 
> The list archives show others have also had this problem. 
> One suggestion was to use linttylog to read the RAID 
> controller logs.  Here is that log.  I am not sure what 
> it's showing me.
> 
> 
> TTY History for HA(0) -Bus 0x01 Device 0x02
> B @T0
> (C) LSI Logic 2002 @T0
> Megaraid Series 0 firmware version 3.28 @T0
> Build date: Jun 30 2003 at 10:14:11 @T0
> Board type: 1000/1960/1028/0520 @T0
> DRAM_ALT sig invalid from previous boot @T0
> FLUSH_ON_SYSTEM_RESET=0 @T0
> WAIT FOR BIOS.... @T0
> bus 1 dev 5 function 0 @T0
> reg 0 value ffffff01 @T0
> reg 1 value fbff0004 @T0
> reg 2 value 0 @T0
> reg 3 value fbfe0004 @T0
> reg 4 value 0 @T0
> reg 5 value 0 @T0
> SCSI chip is on the secondary bus @T0
> Found MPT LVD 30 at fbff0004 @T0
>   isp 0 membaseaddr 8bff0000 iobaseaddr 9001ff00  @T0
> BIOS UP! @T0
> Can_flush=0 DRAM SIZE=64 MB @T0
> * RST * @T0
> Enabling data cache @T0
> pciDebug = d0008a9c @T0
> calling init_scsi @T0
> DISK_CACHE_ADDR=d0d2bc00  @T0
> MEM_END_ADDR=d3fffff0  @T0
> Total LSI MPT Chips found 1  @T0
> LSI_InitMPT : start_index 0 totalLSIMPTChips 1 @T0
>          Verifying Image Signatures...VERIFIED @T0
>          Verifying image check sum... VERIFIED @T0
> BaseAddr 9001ff00 chip 0 @T0
> Checking Diagnostic Register write access...    Enabled. @T0
> reset adapter bit cleared @T0
> Complete. @T0
> Checking Diagnostic Register write access...    Enabled. @T0
> The FW version being loaded is MPTFW-01.03.06.00-IT @T0
> NextImageHeaderOffset 9784 @T0
> ExtImage Size 818 @T0
> Diag, Register disabled DIAGNOSTIC_REG 131 @T0
> FW download complete... Expecting LSI FW to start excute and come to ready state
>   @T0
>   For this sys doorbell reg bit 28 should be set  @T0
>   MISM CHN_STATE_MPT_GET_FW_FEAT chip 0  @T2
>   Check IOC FACTS chip 0  @T2
>   MISM CHN_STATE_MPT_OPERATIONAL chip 0  @T2
>   MISM: Reply frame size 60 start addr d05389ec  @T2
> ff reply free frames posted @T2
>   MISM CHN_STATE_MPT_INIT_BUS_RST chip 0  @T2
> MPT_Poll: chip 0 CHN_STATE_MPT_INIT_BUS_RST  @T2
> cmdBufferAddr = fc538770, ioIdx = 0 @T5
> cmdBufferAddr = fc538808, ioIdx = 1 @T5
> CommandBufferPost Post: Request = d00201c0 @T5
> DISM: Queued! @T5
>   MPT_ProcessIo Reply Fr 2 EVENT_NOTIFICATION @T5
> MPI_EVENT_EVENT_CHANGE @T5
> DISM: CR8 SAFTE at chan 0 id 6 @T6
> DISM_ProcessPprState: DomainVal done on all disks @T8
> DISM: Complete!!! @T28
> bbuDebugFlags = d000d77c @T28
> battery init: battery backup circuit is not mounted @T28
> TBBU: No TBBU h/w @T28
> Veirfyin config struct at Addr e0001400  @T28
> NVRAM checksum OK - reading configuration @T28
> DISK_CACHE_ADDR=d0d2bc00  @T28
> MEM_END_ADDR=d3fffff0  @T28
> Memory End d3fffff0 @T28
> Total Number of Cache Lines 810 @T28
> L 5   SS 128   Size cb31000   N 810   Status 2   DT  251   BT 512 @T28
> can_flush = 0 @T28
> No Reconst:Checking drive info @T28
>   @T28
> REF drive found at ch 0 tgt 0  @T28
> Attempting to perform drive roaming @T28
> NOT Flushing Cache @T28
> Battery Bad: Changing to WRTHRU @T28
> BIOS CALL FOR DRV ROAMING : 55 @1/8 12:57:26
> drive roaming not done @1/8 12:57:26
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:19
> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:19
> Retrying cmdId 2e @1/8 12:58:19
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:20
> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:20
> DIO with no cache command. returning FAILURE with CRB set to -3 @1/8 12:58:20
>   @1/8 12:58:20
> Retrying cmdId 2e @1/8 12:58:20
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:21
> REC: MedErr on LD[1] BadLba=8659c0 @1/8 12:58:21
> <0,1> scsiErr=f1 rwCmdInx=2e cmdType=2 @1/8 12:58:21
>   Reassign d00a0840 94 40 @1/8 12:58:21
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:24
> REC: MedErr on LD[1] BadLba=8669d6 @1/8 12:58:24
> DIO with no cache command. returning FAILURE with CRB set to -3 @1/8 12:58:24
>   @1/8 12:58:24
> Retrying cmdId 29 @1/8 12:58:24
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:32
> REC: MedErr on LD[1] BadLba=c216f6 @1/8 12:58:32
> <0,1> scsiErr=f1 rwCmdInx=10 cmdType=2 @1/8 12:58:32
>   Reassign d00988c0 170 76 @1/8 12:58:32
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 12:58:36
> REC: MedErr on LD[1] BadLba=c217d8 @1/8 12:58:36
> <0,1> scsiErr=f1 rwCmdInx=1d cmdType=2 @1/8 12:58:36
>   Reassign d00a2940 179 58 @1/8 12:58:36
> REC:log MedErr on pid[1] FcRty=0 ScsiRty=0 @1/8 13:12:18
> REC: MedErr on LD[1] BadLba=86668f @1/8 13:12:18
> <0,1> scsiErr=f1 rwCmdInx=34 cmdType=2 @1/8 13:12:18
>   Reassign d0091f00 110 f @1/8 13:12:18
> 
> --Karl
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq
> 


