IO error issues with MD3000i on Linux
Shyam_Iyer at Dell.com
Fri Jun 19 05:31:09 CDT 2009
> -----Original Message-----
> From: Scott Ehrlich [mailto:srehrlich at gmail.com]
> Sent: Friday, June 19, 2009 3:35 PM
> To: Iyer, Shyam
> Cc: linux-poweredge-Lists
> Subject: Re: IO error issues with MD3000i on Linux
>
> On Fri, Jun 19, 2009 at 4:10 AM, <Shyam_Iyer at dell.com> wrote:
> >> -----Original Message-----
> >> From: linux-poweredge-bounces at lists.us.dell.com [mailto:linux-
> >> poweredge-bounces at lists.us.dell.com] On Behalf Of Scott Ehrlich
> >> Sent: Friday, June 19, 2009 6:55 AM
> >> To: linux-poweredge-Lists
> >> Subject: Re: IO error issues with MD3000i on Linux
> >>
> >> Here is a basic question -
> >>
> >> dmesg shows many errors from the md3000i. Newer kernels treat the
> >> "errors" more appropriately.
> >>
> >
> > True. Newer kernels like 2.6.27~ and above have a scsi device handler
> > module (scsi_dh_rdac) which can handle active/active, active/passive
> > paths effectively.
> >
> >
> >> My question is - are the errors really bad errors? Is the hardware,
> >> or is the data, really in trouble? Or, are the logged errors simply
> >> benign software reports that the kernel and drivers don't know any
> >> better how to deal with the responses from the md3000i, but the data
> >> and hardware are all fine?
> >>
> >
> > Currently, stock rhel-5.3 has the module backported but without the
> > support for the MD3000i device. Patches + kernel fix available in
> > bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=487293 for quick
> > peek.
>
> I'm currently running 64-bit CentOS 5.2 on an isolated LAN.
> Fundamentally, again, is there any actual data loss/corruption - is
> there a real data integrity problem - do I have anything to worry
> about, or is the software as a whole (kernel + drivers) producing
> actual data read/write errors, thus an upgrade would be required?
The I/O errors actually come from the passive path. Because all the /dev/sdX devices are visible to applications, I/O can be retried through the passive paths, and that produces the errors.
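One way to check this on a given host is to tally the logged I/O errors per device and see whether they cluster on the /dev/sdX nodes that sit behind the passive controller. The following is only a rough sketch: it assumes the 2.6-era kernel message form "end_request: I/O error, dev sdX, sector N" (other kernels word this differently), the function name count_io_errors is made up for illustration, and it needs a recent Python 3, so it is probably easiest to run on a copy of the dmesg output taken to another machine.

```python
import re
from collections import Counter

# Assumed message form (2.6-era kernels):
#   end_request: I/O error, dev sdc, sector 0
# Adjust the pattern if your kernel words the message differently.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector \d+")


def count_io_errors(log_text):
    """Return a Counter mapping device name to the number of I/O error lines."""
    return Counter(match.group(1) for match in IO_ERROR.finditer(log_text))
```

Pass it the saved dmesg text and compare the device names it reports against the path states that multipath -ll shows for the same LUN. Errors that fall only on the passive-path devices fit the explanation above.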
You could possibly reduce the number of I/O errors by blacklisting or filtering the /dev/sdX devices so that applications such as lvm, hal, and fdisk do not use them. Even so, RHEL-5.2 has an inefficient architecture for DM-Multipath with the MD3000i.
An upgrade is a good option if you can do it, because constant I/O errors can cause some performance loss. Also, if you change the active/passive path configuration using MDSM while I/Os are in flight, data reordering might happen, leading to corruption.
The MPP driver is supported with RHEL-5.2 today.