IO error issues with MD3000i on Linux
Scott Ehrlich
srehrlich at gmail.com
Fri Jun 19 06:13:21 CDT 2009
On Fri, Jun 19, 2009 at 6:31 AM, <Shyam_Iyer at dell.com> wrote:
>> -----Original Message-----
>> From: Scott Ehrlich [mailto:srehrlich at gmail.com]
>> Sent: Friday, June 19, 2009 3:35 PM
>> To: Iyer, Shyam
>> Cc: linux-poweredge-Lists
>> Subject: Re: IO error issues with MD3000i on Linux
>>
>> On Fri, Jun 19, 2009 at 4:10 AM, <Shyam_Iyer at dell.com> wrote:
>> >> -----Original Message-----
>> >> From: linux-poweredge-bounces at lists.us.dell.com [mailto:linux-
>> >> poweredge-bounces at lists.us.dell.com] On Behalf Of Scott Ehrlich
>> >> Sent: Friday, June 19, 2009 6:55 AM
>> >> To: linux-poweredge-Lists
>> >> Subject: Re: IO error issues with MD3000i on Linux
>> >>
>> >> Here is a basic question -
>> >>
>> >> dmesg shows many errors from the md3000i. Newer kernels treat the
>> >> "errors" more appropriately.
>> >>
>> >
>> > True. Newer kernels like 2.6.27~ and above have a scsi device handler
>> > module (scsi_dh_rdac) which can handle active/active, active/passive
>> > paths effectively.
>> >
>> >
>> >> My question is - are the errors really bad errors? Is the hardware,
>> >> or is the data, really in trouble? Or, are the logged errors simply
>> >> benign software reports that the kernel and drivers don't know any
>> >> better how to deal with the responses from the md3000i, but the data
>> >> and hardware are all fine?
>> >>
>> >
>> > Currently, stock rhel-5.3 has the module backported but without the
>> > support for the MD3000i device. Patches + kernel fix are available in
>> > bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=487293 for a quick
>> > peek.
>>
>> I'm currently running 64-bit CentOS 5.2 on an isolated LAN.
>> Fundamentally, again, is there any actual data loss/corruption - is
>> there a real data integrity problem - do I have anything to worry
>> about, or is the software as a whole (kernel + drivers) producing
>> actual data read/write errors, so that an upgrade would be required?
>
> The I/O errors are actually from the passive path. As all /dev/sdXs are visible to applications, I/O could be retried through the passive paths, leading to errors.
> You could possibly reduce the number of I/O errors by blacklisting/filtering the /dev/sdX devices from use by applications like lvm, hal, fdisk etc., but even so, rhel-5.2 has an inefficient architecture for DM-Multipath with the MD3000i.
>
> An upgrade is a good option if you can do it, because constant I/O errors can cause some performance loss. Also, if you change the active/passive path configuration using MDSM while I/Os are in flight, data reordering might happen, leading to corruption.
>
> The MPP driver is supported with RHEL-5.2 today.
>
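One way to check the point above, that the errors come only from the passive paths, is to tally the I/O error lines in dmesg per device. Here is a minimal sketch in Python 3, assuming the kernel logs these errors in the form "end_request: I/O error, dev sdX, sector N". That format and the function name count_io_errors are assumptions for illustration, not something confirmed in this thread.

```python
import re
import sys
from collections import Counter

# Assumed log line format (adjust the pattern if your kernel logs differ):
#   end_request: I/O error, dev sdc, sector 0
IO_ERROR = re.compile(r"end_request: I/O error, dev (sd[a-z]+), sector (\d+)")


def count_io_errors(log_text):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Reads saved dmesg output from standard input and prints a tally,
    # most error-prone device first.
    for dev, n in count_io_errors(sys.stdin.read()).most_common():
        print(f"{dev}: {n} I/O errors")
```

Feeding it saved dmesg output on standard input gives one line per device. If every device in the tally corresponds to a path the array reports as passive, that would be consistent with the explanation above. Errors on an active path would be worth a closer look.
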
How clean have you found an in-place upgrade, versus a fresh install,
from CentOS 5.2 to CentOS 5.3, with the partitioning information
remaining intact? We are talking out-of-box 64-bit CentOS 5.2 to
out-of-box 64-bit CentOS 5.3.

Bottom line, if we remain with the md3000i connected to a CentOS 5.2
out-of-box installation, unpatched, on an NIS/Samba network, no LVM
(logical volume management), will our data stay intact? To prevent
data corruption, what safe methods can I use if we opt to keep the
current 5.2 install? Nobody has complained of data issues yet
under 5.2. I'm aiming to be proactive here.

Apart from staying on 5.2, are there any known issues with an upgrade,
versus a fresh install, from 5.2 to 5.3?

Finally, for MDSM, during my testing period when I learned how the box
worked with test data, any time I reinstalled the OS on the system the
MD3000i was connected to, the RAID info disappeared. As such, I'd be
_very_ worried that any upgrade would cause the data to be lost again,
meaning nobody could get to their stuff. How do I preserve the RAID
config?

Thanks.