Fatal I/O Failures on 2650

Johnathan Conley jdc at nextjet.com
Wed Aug 20 11:45:00 CDT 2003


For reference, we are an enterprise-level software development shop, and
this is our build server.
We have approximately 4 builds running, and the disk I/O is heavy and
continuous.

We have about 12 of these 2650 servers; however, the I/O usage on this
box is considerably higher than on any other. We have already swapped
the hard drives to another chassis, and also imaged the disks (2x146GB)
and restored the image onto 3x32b disks to rule out problems with the
actual disks. We have the latest BIOS, backplane firmware and PERC3/Di
firmware installed.

We are running Red Hat Linux 7.3 with the latest kernel. The file
systems are buffered, and the RAID cache is enabled for both reads and
writes.

We have also run 2 sets of Dell Diagnostics provided by tech support
with no errors.


At some random point while the builds are running, the system console
starts flooding with messages like the ones below. These are not written
to the system logs, so they were captured by hand and reproduced as
best as possible here. It also appears that no actual disk I/O occurs
(no lights flashing) once the system gets into this state. Since nothing
is logged, it's impossible to tell whether some more important message
precedes all of these.

Most of the time after rebooting, the file system is corrupt and has to
be fixed. (We are running both ext3 and ext2.)

Any help would be appreciated. This box is completely unreliable: it
can crash several times a day, and crash frequency seems tied to the
amount of random I/O we throw at it.


EXT3-fs error (device sd(8,1))
ext3_reserve_inode_write: IO failure
ext3_reserve_inode_write: IO failure
ext3_get_inode_loc

EXT2-fs error (device sd(8,17))
ext2_write_inode_loc:
	unable to write inode
ext2_write_inode

I/O Error: dev 08:11, sector 82313288	(tons of sectors following)

Inode=####(some #), block=####(some #)

dev 08:01
dev 08:11
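
A note on reading the device numbers above: the "dev MM:mm" form is
hexadecimal major:minor, while "sd(8,17)" gives the same pair in
decimal, so 08:11 and sd(8,17) refer to the same device. The sketch
below decodes them to device names, assuming the conventional SCSI disk
numbering (major 8, 16 minors per disk). The function names are made up
for illustration, and the mapping only holds if the logical drives show
up as ordinary sd devices.

```python
# Illustrative sketch: decode the device identifiers seen on the console.
# Assumption: conventional Linux SCSI disk numbering, i.e. major 8 with
# 16 minor numbers per disk (minor 0 = whole disk, 1-15 = partitions).

SCSI_DISK_MAJOR = 8
MINORS_PER_DISK = 16


def decode_hex_dev(dev):
    """Parse 'MM:mm' (hex major:minor, as in 'dev 08:11')."""
    major_hex, minor_hex = dev.split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def scsi_disk_name(major, minor):
    """Map a (major, minor) pair on SCSI major 8 to a /dev/sdXN name."""
    if major != SCSI_DISK_MAJOR:
        raise ValueError("not SCSI disk major 8: %d" % major)
    if not 0 <= minor < 256:
        raise ValueError("minor out of range for major 8: %d" % minor)
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + chr(ord("a") + disk)
    # Partition 0 means the whole disk, so no suffix is appended.
    return name + (str(part) if part else "")


if __name__ == "__main__":
    for dev in ("08:01", "08:11"):
        major, minor = decode_hex_dev(dev)
        print(dev, "->", "sd(%d,%d)" % (major, minor),
              "->", scsi_disk_name(major, minor))
```

Under that assumption the errors are spread across the first partition
of two different disks (sda1 and sdb1) rather than confined to one.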



