Some MD1200 tuning info

Bond Masuda bond.masuda at jlbond.com
Wed Mar 31 11:25:56 CDT 2010


Hi David,

Thanks for sharing that information. We do a lot of tuning of Linux and the
MD1000, but we haven't played with the MD1200 yet, so the suggestions below
are based only on our experience with MD1000s; you should verify the
results with your own benchmarks/applications.

1. We've found that turning OFF read ahead in the controller and letting
the OS do its own block device read ahead gave better performance, especially
for large sequential access. Also, the read ahead you set with /sbin/blockdev
is within the range we consider optimal... when we tune that parameter on
RAID storage, we *ALWAYS* start at 8MB and go up to 32MB in 4MB
increments. In our experience, that is the range that gives us the best
results.
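
For reference, here's a rough sketch of how we sweep that parameter (the
device name below is just a placeholder, and note that --setra takes a
count of 512-byte sectors, so 16384 sectors = 8MB):

# check the current read ahead (value is in 512-byte sectors)
/sbin/blockdev --getra /dev/sdb

# sweep read ahead from 8MB to 32MB in 4MB steps
# (16384 = 8MB, 24576 = 12MB, ... 65536 = 32MB)
for ra in 16384 24576 32768 40960 49152 57344 65536; do
    /sbin/blockdev --setra $ra /dev/sdb
    # re-run your benchmark here and record the result for this setting
done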

2. For really large sequential writes, we've found that write-through was
better than write-back, but the opposite was true below a certain threshold.
We had a customer that did scheduled massive uploads to their file server
farm, and using write-through got the job completed sooner.
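
If you want to flip that setting for testing, something along these lines
should work with LSI's MegaCli, since the PERC cards are LSI-based (we
haven't tried it on an H800, so treat the adapter/LD numbers below as
placeholders; Dell's omconfig should be able to do the same thing):

# show the current cache/write policy of all logical drives
MegaCli -LDGetProp -Cache -LALL -aALL

# switch logical drive 0 on adapter 0 to write-through...
MegaCli -LDSetProp WT -L0 -a0

# ...and back to write-back when the bulk load is done
MegaCli -LDSetProp WB -L0 -a0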

3. The Linux I/O scheduler can also work against the RAID controller. The
default 'cfq' doesn't work well with any RAID controller we've tested
(mostly HP, Dell, 3ware). We got the best performance setting it to 'noop'
on Dell equipment and 'deadline' on HP controllers. We don't know
anything about the new H800 controllers yet, but give 'noop' a try.
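
Switching is easy to test at runtime (again, the device name is just an
example):

# see which schedulers are available; the active one is in brackets
cat /sys/block/sdb/queue/scheduler

# switch that device to noop on the fly
echo noop > /sys/block/sdb/queue/scheduler

# to make it the default for all devices, append elevator=noop to the
# kernel line in /boot/grub/grub.conf and reboot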

4. In case you want to tweak XFS some more, these are the typical options
we use:

mkfs.xfs -l version=2 /dev/<your device>
mount -o rw,noatime,logbufs=8,logbsize=256k /dev/<your device> <mount point>

You can also look into tweaking the "-d sunit=XXX,swidth=XXX" options in
mkfs.xfs to match your RAID geometry. The effects of these settings become
less noticeable as your data sets get larger.
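
As a rough example of the geometry math, assuming your RAID 50 set is two
5-disk spans (8 data-bearing disks total) and the 128k figure is the
per-disk stripe element; sunit/swidth are given in 512-byte sectors, and
the device name is a placeholder:

# sunit  = 128KB stripe element / 512 bytes = 256 sectors
# swidth = sunit * 8 data-bearing disks     = 2048 sectors
mkfs.xfs -l version=2 -d sunit=256,swidth=2048 /dev/sdb1

# equivalent shorthand using bytes and a disk count:
# mkfs.xfs -l version=2 -d su=128k,sw=8 /dev/sdb1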

5. To speed up your benchmarking, you might consider booting Linux with
limited memory (mem=256M or something appropriate). This lets your benchmark
data sets quickly grow past what the page cache can absorb, so you spend less
time measuring cache effects and more time measuring the disks.
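
On CentOS 5 that just means appending the parameter to the kernel line in
/boot/grub/grub.conf; for example (the kernel version and root device below
are placeholders for whatever your grub.conf already has):

title CentOS 256MB test boot (2.6.18-164.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-164.el5 ro root=/dev/VolGroup00/LogVol00 mem=256M
        initrd /initrd-2.6.18-164.el5.img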

Hope that helps.... Thanks again for sharing information about the
MD1200+H800! We're hoping to get our hands on one, one of these days...

-Bond Masuda
Principal Consultant
-----------------------------------
JL Bond Consulting / www.JLBond.com

> -----Original Message-----
> From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-
> bounces at dell.com] On Behalf Of David Hubbard
> Sent: Wednesday, March 31, 2010 8:57 AM
> To: linux-poweredge at dell.com
> Subject: Some MD1200 tuning info
> 
> Just wanted to share some info I accumulated on
> the MD1200 and H800 controller while testing and
> configuring a disk deduplication media server for
> a NetBackup installation.  The performance of the
> H800 was atrocious while the background
> initialization was running, so don't put an array
> into production while it's still doing that if you
> require good performance.  In fact, if the
> performance is similar when it is rebuilding,
> that may be an issue for some people too because
> it was literally a factor of eight slower than
> after initialization finally completed.
> 
> I initialized an array of (10) 2 TB 7200 rpm SAS
> drives with two hot spares in an MD1200 connected
> to an H800 controller via dual-paths on a T710
> server and it took about three days total to
> finish initialization.  The array is configured as
> RAID 50 across the ten drives with what ended up
> being a 128k stripe size.
> 
> To test, I used the Bonnie++ disk benchmarking tool
> because it pretty closely simulates the type of
> load NetBackup puts on a server when doing disk-based
> backup with deduplication.  The external array is
> about 16 TB usable after formatting and is partitioned
> with parted.  I tested on CentOS 5.4 with the latest
> kernel, using both XFS and EXT3 and both 64k and 128k
> hardware stripe sizes, and ended up with 128k as it
> was faster for this testing.  I used the bonnie
> defaults, so on this 40 GB server it ended up testing
> with an 80 GB data set.
> 
> The results:
> 
> 1) XFS with hardware read ahead: 455 MB/sec write,
> 675 MB/sec read, 97 MB/sec random rewrite, 397 random
> seeks/sec.
> 
> 2) XFS with hardware adaptive read ahead: 218 MB/sec
> write, 290 MB/sec read, 40 MB/sec random rewrite, 431
> random seeks/sec.
> 
> 3) EXT3 with hardware read ahead: 510 MB/sec write,
> 633 MB/sec read, 187 MB/sec random rewrite, 796 random
> seeks/sec.
> 
> 4) EXT3 with hardware adaptive read ahead: 507 MB/sec
> write, 632 MB/sec read, 205 MB/sec random rewrite, 887
> random seeks/sec.
> 
> I was kind of surprised at that; I had expected XFS to
> be a lot better.  Perhaps there are mkfs or mount
> options I need to play with, but I didn't do anything
> special to EXT3 either.  I have not disabled atime in
> the mount.
> 
> So then I come across this article:
> 
> http://thias.marmotte.net/archives/2008/01/05/Dell-PERC5E-and-MD1000-performance-tweaks.html
> 
> and it advises of the blockdev command and adjusting
> the read ahead value.  I tried a few options and
> setting it to 8192 achieved the best result, which
> changed my EXT3 with adaptive read ahead to 516 MB/sec
> write, 959 MB/sec read (!!), 292 MB/sec random rewrite,
> 806 random seek/sec.  I did try the starting sector
> alignment stuff too, serious PITA when using parted,
> but it didn't make a significant difference.
> 
> Should be noted that while XFS was a lot slower for my
> particular configuration, the CPU usage under writing
> was about half what it was with EXT3, so that may be
> a factor for some.  I'd also expect less dramatic
> figures on servers handling lots of small files, maybe
> that is where XFS shines too; for a backup de-dupe
> server it is a lot of large files.
> 
> Dave
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq


