2650 & RHEL Freezes (Randomly & Without Logging)

Stefano Turolla sturolla at eso.org
Wed Jan 28 12:30:01 CST 2004


Unfortunately, this is a known problem that
many of us have been experiencing for months.
It happens at random when there is a
lot of disk I/O.
The cause seems to be the aacraid driver,
which is used to manage the PERC 3Di RAID controller
from Dell.
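
A quick way to check whether a given box loads this driver is to look for its banner in the boot messages. Below is a minimal Python sketch of that check. The banner format is taken from the boot log quoted further down in this thread; other driver builds may word it differently, and the function name is only for illustration.

```python
import re

# Banner format copied from the boot log quoted in this thread
# (assumption: other aacraid builds print a similar line).
AACRAID_BANNER = re.compile(r"aacraid driver \((\S+) ([^)]*)\)")


def find_aacraid(boot_log):
    """Return (version, build_date) from the aacraid banner, or None if absent."""
    for line in boot_log.splitlines():
        match = AACRAID_BANNER.search(line)
        if match:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    sample = "Red Hat/Adaptec aacraid driver (1.1.2 Oct 23 2003 01:34:04)"
    print(find_aacraid(sample))
```

In practice the saved output of dmesg would be passed in instead of the sample string. Finding the banner only shows that the driver is in use; it does not prove that the driver is behind a particular freeze.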
There are a lot of open calls about it, but it is not solved yet.
To keep in touch with the people working on it, I suggest
having a look at Bugzilla:
 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=92129


Our workaround was to disable the RAID controller and
install Linux on two disks mirrored in software.
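
With a software mirror nothing on the controller reports a failed disk any more, so it is worth checking the state of the md arrays from time to time. The following is a rough Python sketch that reads text in the usual /proc/mdstat layout and lists arrays with a missing member. The sample text and device names are made up for illustration and are not taken from our machines.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose member-status field shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status looks like "[2/2] [UU]"; an underscore marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    # Hypothetical sample: md0 is healthy, md1 has lost one half of the mirror.
    sample = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sda2[0]
      71577536 blocks [2/1] [U_]
unused devices: <none>
"""
    print(degraded_arrays(sample))  # ['md1']
```

On a live system the text would come from reading /proc/mdstat; an empty list means every mirror still has all of its members.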

ciao
stefano
Kevin Stussman wrote:
> We purchased a Dell 2650 a few months ago. Things were fine up
> until a few weeks ago (what changed!?!)...unfortunately now we are
> having random "freezes" of the OS. The only breadcrumbs are messages on
> the console like the following (this is actually taken from another
> email message on this list, but is essentially the same...we have yet to
> capture the actual message on our system):
> 
> EXT3-fs error (device sd(8,1))
> ext3_reserve_inode_write: IO failure
> ext3_reserve_inode_write: IO failure
> ext3_get_inode_loc
> 
> Only a hard reboot fixes it; needless to say this does not breed confidence.
> 
> I read a bunch of emails from people with similar problems, but there is
> no consensus on a solution (examples include turning off cached writing,
> bad CPU or controller, BIOS or firmware rev, tg3 problems, etc.; we have not
> started down this path yet). 
> 
> It _seems_ like it is stemming from the Perc3 (raid that holds our OS),
> but it's just a guess right now. We tried reproducing it using
> (bonnie++) to no avail. Our best guess is that it is not a stress/load
> issue, rather it seems to be more like "idle for a while, then a burst
> of activity, then freeze".
> 
> It looks like I will need to take the "blanket approach" to fixing this,
> but wanted to get feedback from anyone who has already had this problem
> and perhaps fixed it.
> 
> Thanks for any feedback,
> 
> Kevin
> 
> Specs
> ------------
> - Dell 2650
> 
> - 2 x Intel(R) Xeon(TM) CPU 3.20GHz
> 
> - Red Hat Enterprise Linux 2.4.21-4.0.1.ELsmp #1 SMP Thu Oct 23 01:27:36
> EDT 2003 i686 i686 i386 GNU/Linux
> 
> - SCSI Info 
> (PERC 3 (on-board RAID 1 for OS) & PERC 4 (connected to external RAID
> 1+0 PowerVault 2205 unit running dual channel))
> 
> SCSI subsystem driver Revision: 1.00
> megaraid: v1.18j (Release Date: Mon Jul  7 14:39:55 EDT 2003)
> megaraid: found 0x1000:0x1960:idx 0:bus 1:slot 6:func 0
> scsi0 : Found a MegaRAID controller at 0xf883f000, IRQ: 16
> scsi0 : Enabling 64 bit support
> megaraid: [3.28:1.05] detected 1 logical drives
> megaraid: supports extended CDBs.
> megaraid: channel[1] is raid.
> megaraid: channel[2] is raid.
> scsi0 : LSI Logic MegaRAID 3.28 254 commands 15 targs 5 chans 7 luns
> Starting timer : 0 0
> blk: queue c84d6218, I/O limit 4095Mb (mask 0xffffffff)
> scsi0: scanning virtual channel 0 for logical drives.
>   Vendor: MegaRAID  Model: LD0 RAID1 41906R  Rev: 3.28
>   Type:   Direct-Access                      ANSI SCSI revision: 02
> Starting timer : 0 0
> blk: queue c84d6018, I/O limit 4095Mb (mask 0xffffffff)
> scsi0: scanning virtual channel 1 for logical drives.
> scsi0: scanning virtual channel 2 for logical drives.
> scsi0: scanning physical channel 0 for devices.
>   Vendor: DELL      Model: PV22XS            Rev: E.14
>   Type:   Processor                          ANSI SCSI revision: 03
> Starting timer : 0 0
> blk: queue f6a27e18, I/O limit 4095Mb (mask 0xffffffff)
> scsi0: scanning physical channel 1 for devices.
>   Vendor: DELL      Model: PV22XS            Rev: E.14
>   Type:   Processor                          ANSI SCSI revision: 03
> Starting timer : 0 0
> blk: queue f6a27c18, I/O limit 4095Mb (mask 0xffffffff)
> Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
> SCSI device sda: 495423488 512-byte hdwr sectors (253657 MB)
> Partition check:
>  sda: sda1
> Red Hat/Adaptec aacraid driver (1.1.2 Oct 23 2003 01:34:04)
> AAC0: kernel 2.7.4 build 3170
> AAC0: monitor 2.7.4 build 3170
> AAC0: bios 2.7.0 build 3170
> AAC0: serial 216461d3fafaf001
> scsi1 : percraid
> Starting timer : 0 0
> blk: queue f69dbe18, I/O limit 4095Mb (mask 0xffffffff)
>   Vendor: DELL      Model: PERCRAID Mirror   Rev: V1.0
>   Type:   Direct-Access                      ANSI SCSI revision: 02
> Starting timer : 0 0
> blk: queue f69dbc18, I/O limit 4095Mb (mask 0xffffffff)
> Attached scsi removable disk sdb at scsi1, channel 0, id 0, lun 0
> SCSI device sdb: 143357184 512-byte hdwr sectors (73399 MB)
> sdb: Write Protect is off
>  sdb: sdb1 sdb2 sdb3
> 
> 
> - eth0 & 1: Tigon3 [partno(BCM95703A30) rev 1002 PHY(5703)]
> (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet
> 
> 
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq or search the list archives at http://lists.us.dell.com/htdig/



