IO error issues with MD3000i on Linux
Shyam_Iyer at Dell.com
Fri Jun 19 06:22:38 CDT 2009
> -----Original Message-----
> From: Scott Ehrlich [mailto:srehrlich at gmail.com]
> Sent: Friday, June 19, 2009 4:43 PM
> To: Iyer, Shyam
> Cc: linux-poweredge-Lists
> Subject: Re: IO error issues with MD3000i on Linux
>
> On Fri, Jun 19, 2009 at 6:31 AM, <Shyam_Iyer at dell.com> wrote:
> >> -----Original Message-----
> >> From: Scott Ehrlich [mailto:srehrlich at gmail.com]
> >> Sent: Friday, June 19, 2009 3:35 PM
> >> To: Iyer, Shyam
> >> Cc: linux-poweredge-Lists
> >> Subject: Re: IO error issues with MD3000i on Linux
> >>
> >> On Fri, Jun 19, 2009 at 4:10 AM, <Shyam_Iyer at dell.com> wrote:
> >> >> -----Original Message-----
> >> >> From: linux-poweredge-bounces at lists.us.dell.com [mailto:linux-
> >> >> poweredge-bounces at lists.us.dell.com] On Behalf Of Scott Ehrlich
> >> >> Sent: Friday, June 19, 2009 6:55 AM
> >> >> To: linux-poweredge-Lists
> >> >> Subject: Re: IO error issues with MD3000i on Linux
> >> >>
> >> >> Here is a basic question -
> >> >>
> >> >> dmesg shows many errors from the md3000i. Newer kernels treat the
> >> >> "errors" more appropriately.
> >> >>
> >> >
> >> > True. Newer kernels, 2.6.27 and above, have a SCSI device handler
> >> > module (scsi_dh_rdac) that can handle active/active and
> >> > active/passive paths effectively.
> >> >
> >> >
> >> >> My question is - are the errors really bad errors? Is the hardware,
> >> >> or is the data, really in trouble? Or, are the logged errors simply
> >> >> benign software reports that the kernel and drivers don't know any
> >> >> better how to deal with the responses from the md3000i, but the data
> >> >> and hardware are all fine?
> >> >>
> >> >
> >> > Currently, stock rhel-5.3 has the module backported but without the
> >> > support for the MD3000i device. Patches + kernel fix are available in
> >> > bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=487293 for a
> >> > quick peek.
> >>
> >> I'm currently running 64-bit CentOS 5.2 on an isolated LAN.
> >> Fundamentally, again, is there any actual data loss/corruption - is
> >> there a real data integrity problem - do I have anything to worry
> >> about, or is the software as a whole (kernel + drivers) producing
> >> actual data read/write errors, thus an upgrade would be required?
> >
> > The I/O errors are actually from the passive path. As all /dev/sdXs
> > are visible to applications, I/O could be retried through the passive
> > paths, leading to errors.
> > You could possibly reduce the number of I/O errors by
> > blacklisting/filtering the /dev/sdX devices from use by applications
> > like lvm, hal, fdisk etc. Even so, rhel-5.2 has an inefficient
> > architecture for DM-Multipath with the MD3000i.
> >
> > An upgrade is a good option if you could do that because constant I/O
> > errors can cause some performance loss. Also if you decide to change
> > active/passive path configuration using MDSM in the middle of I/Os that
> > are inflight then data reordering might happen leading to corruption.
> >
> > Mpp driver is supported with RHEL-5.2 today.
> >
>
> How clean have you seen an upgrade, versus a fresh install, from
> CentOS 5.2 to CentOS 5.3, with the partitioning information remaining
> intact? We are talking out-of-box 64-bit CentOS 5.2 to out-of-box
> 64-bit CentOS 5.3.
>
I can speak for the RHEL-5.2 to RHEL-5.3 upgrade, which is a supported config.
> Bottom line, if we remain with the md3000i connected to a CentOS 5.2
> out-of-box installation, unpatched, on an NIS/Samba network, no LVM
> (logical volume management), will we see data integrity? To prevent
> data corruption, what safe methods can I use if we opt to keep the
> current 5.2 install? Nobody has complained of data issues, yet,
> under 5.2. I'm aiming to be proactive here.
>
> Next to 5.2, any known issues for an upgrade, vs fresh install, from
> 5.2 to 5.3?
>
Have you checked the Red Hat Knowledgebase articles?
> Finally, for mdsm, during my testing period when I learned how the box
> worked with test data, any time I reinstalled the OS on the system the
> 3000i was connected to, the RAID info disappeared. As such, I'd be
> _very_ worried that any upgrade would cause the data to be lost again,
> meaning nobody could get to their stuff. How do I preserve the RAID
> config?
>
Are the MD3000i RAID LUNs part of the OS disk selection during installation? If so, you might be overwriting the RAID metadata on the LUNs presented by the MD3000i while formatting for installation.
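
One way to check which /dev/sdX entries belong to the array before an install or upgrade is to read the vendor and model strings that the kernel exposes under /sys/block. What follows is only a minimal sketch, not a supported tool. It assumes the array LUNs report a model string containing "MD3000i"; that hint and the helper names (list_scsi_disks, looks_like_md3000i) are illustrative assumptions, so please verify the strings on your own system. Note that active and passive paths to the same LUN both show up as separate sd devices.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or '' if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def list_scsi_disks(sys_block="/sys/block"):
    """List sd* block devices with the vendor/model strings sysfs reports."""
    root = Path(sys_block)
    disks = []
    if not root.is_dir():
        return disks
    for dev in sorted(root.glob("sd*")):
        disks.append({
            "name": dev.name,
            "vendor": read_attr(dev / "device" / "vendor"),
            "model": read_attr(dev / "device" / "model"),
        })
    return disks


def looks_like_md3000i(disk, model_hint="MD3000i"):
    """Heuristic match on the model string (model_hint is an assumption)."""
    return model_hint.lower() in disk["model"].lower()


if __name__ == "__main__":
    for disk in list_scsi_disks():
        note = "possible array LUN, do not format" if looks_like_md3000i(disk) else ""
        print(f"/dev/{disk['name']:<6} {disk['vendor']:<10} {disk['model']:<18} {note}")
```

Any device flagged this way should be left unselected in the installer's partitioning step, or the iSCSI connection to the array can simply be left down until the OS install has finished.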