performance bottleneck in Linux MD RAID-1

Bond Masuda bond.masuda at jlbond.com
Thu Jul 15 10:52:18 CDT 2010


Thanks for the suggestions. Yes, we are aware of those other
parameters, but we now know the bottleneck is in the MD RAID-1 layer.
This is RHEL 5.5 with the latest updated kernel (I don't have the exact
version with me right now).

We've tried all of the I/O schedulers, a variety of read-ahead settings,
etc. The only thing that has allowed us to break the 200MB/s sequential
write limit is getting rid of the MD RAID-1 layer.
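
As a rough sketch, the read-ahead tuning we've been trying looks like this
(device names and values here are just examples, not our exact settings):

# show the current read-ahead, in 512-byte sectors
blockdev --getra /dev/md0

# try a larger read-ahead on the MD device (4096 sectors = 2MB)
blockdev --setra 4096 /dev/md0

# the same knob is also exposed in sysfs, in KB
cat /sys/block/md0/queue/read_ahead_kb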

Even if we take the file system (XFS in this case) out of the picture: if
we build the MD RAID-1 with a missing half and then add the second half so
it re-syncs, the fastest the re-sync will go (with everything else pretty
much idle) is about 200MB/s. That is the MD RAID-1 layer doing its own
block copying, with no LVM2, XFS, or anything else involved.
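
For anyone who wants to reproduce the re-sync test, it goes roughly like
this (device names are placeholders; the md re-sync throttle sysctls are
also worth ruling out, since the default ceiling is around 200000 KB/s on
stock kernels):

# build the mirror with one half missing, then add the second half to trigger a re-sync
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb missing
mdadm /dev/md0 --add /dev/sdc

# watch the re-sync rate
cat /proc/mdstat

# md throttles re-sync speed; check the limits before trusting the number
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 500000 > /proc/sys/dev/raid/speed_limit_max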

-Bond

On Thu, 2010-07-15 at 10:34 -0500, Paul M. Dyer wrote:
> Hi,
> 
> Which I/O elevator are you using? Are you using RHEL4 or RHEL5?
> 
> In RHEL5, you could try the deadline or noop elevator to see if that works better. To do so, follow this example for sda, changing it for your particular device:
> 
> cat /sys/block/sda/queue/scheduler
> 
> echo "deadline" > /sys/block/sda/queue/scheduler
> 
> or use noop:
> echo "noop" > /sys/block/sda/queue/scheduler
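> 
> As a side note, if deadline does help, the usual way to make it persistent on RHEL5 (sketch only; adjust the kernel line to match your own grub.conf) is the elevator= boot parameter:
> 
> # /boot/grub/grub.conf -- append elevator= to the existing kernel line (example only)
> kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/VolGroup00/LogVol00 elevator=deadline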
> 
> Here is a link from RHEL4 days about the schedulers.
> http://www.redhat.com/magazine/008jun05/features/schedulers/
> 
> Paul
> 
> 
> ----- Original Message -----
> From: "Bond Masuda" <bond.masuda at jlbond.com>
> To: "linux-poweredge" <linux-poweredge at dell.com>
> Sent: Wednesday, July 14, 2010 10:32:57 PM
> Subject: performance bottleneck in Linux MD RAID-1
> 
> Hi Everyone,
> 
> I'm wondering if some of the gurus around here might be able to help me
> out. We have a PE2970 with two PERC 6/E controllers, each connected via a
> single SAS cable to an MD1000 with 15x 1TB Hitachi 7.2K SATA drives. Each
> MD1000 is set up as RAID-10 with 14 drives and 1 hot spare. Within Linux,
> we mirror the two MD1000s with Linux MD RAID-1 as /dev/md0. On top of
> /dev/md0 we have LVM2, and then XFS on the LV. The reason for the LVM2 is
> to take snapshots (we reserve about 10% of the space in the VG for them).
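> 
> For completeness, the stack is built roughly like this (the device, VG,
> and LV names below are illustrative rather than our exact ones):
> 
> # mirror the two PERC 6/E vdisks with MD RAID-1
> mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
> 
> # LVM2 on top of the mirror, leaving ~10% of the VG free for snapshots
> pvcreate /dev/md0
> vgcreate datavg /dev/md0
> lvcreate -l 90%VG -n datalv datavg
> 
> # XFS on the LV
> mkfs.xfs /dev/datavg/datalv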
> 
> We're seeing a performance bottleneck of about 200MBytes/sec for
> sequential writes when testing with iozone. With 7 effective spindles in
> the RAID-10, we were expecting roughly 350MBytes/sec sustained sequential
> writes.
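> 
> (The sequential write numbers come from iozone runs along these lines;
> the file size, record size, and path are illustrative:)
> 
> # sequential write test: -i 0 = write/rewrite, -e includes fsync in the timing
> iozone -i 0 -s 16g -r 1m -e -f /mnt/data/iozone.tmp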
> 
> After trying several combinations of things, we found that if we remove
> the Linux MD software RAID layer and put LVM2 directly on top of /dev/sdc
> (the vdisk as presented by the PERC 6/E RAID-10), we get about
> 340MBytes/sec sequential writes. If we put XFS directly on top of
> /dev/sdc1, we get about the same 340MBytes/sec. So we can reach our
> anticipated performance of about 350MB/s only when we don't use the MD
> RAID-1.
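> 
> (Those comparison runs were set up roughly like this; names are
> illustrative:)
> 
> # LVM2 directly on the PERC vdisk, no MD layer
> pvcreate /dev/sdc
> vgcreate testvg /dev/sdc
> lvcreate -l 90%VG -n testlv testvg
> mkfs.xfs /dev/testvg/testlv
> 
> # or XFS straight onto a partition on the vdisk
> mkfs.xfs /dev/sdc1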
> 
> Since the two MD1000s are on separate PERC 6/E controllers, we didn't
> expect the MD RAID-1 to cause a >40% performance loss...
> 
> We even tried degrading the MD RAID-1 to see if writing to only one of
> the mirrors would improve performance. It did NOT... still 200MB/s. It
> almost seems like the Linux MD layer has a performance cap at around
> 200MB/s.
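> 
> (The degraded test was simply failing and removing one half of the
> mirror, then re-running the same write test; device names are
> illustrative:)
> 
> mdadm /dev/md0 --fail /dev/sdc
> mdadm /dev/md0 --remove /dev/sdc
> cat /proc/mdstat    # confirm the array is running degraded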
> 
> Has anyone encountered this and have suggestions to remove this
> bottleneck? Any advice would be appreciated.
> 
> Thanks,
> -Bond
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq



