[Linux-PowerEdge] Extremely poor performance with LVM vs. RAW disc

Linda A. Walsh dell at tlinx.org
Mon Nov 5 16:13:44 CST 2012


Gregory Gulik wrote:
>   Thanks for this great info.
> How would I find out if the stripes line up and if they don't how does 
> one go about fixing them?
> I'm looking at the mkfs.ext4 man page and is it the "-b block_size" 
> option or something else?
> 
> 
> BTW, in experimenting further over the weekend I found the PERC 6/E 
> firmware was out of date so I updated it and now performance is 
> significantly better.  That said I want to make sure all settings are as 
> optimal as they could be before we put real data on the server.
---
Look at the mkfs.xfs man page.

Under the -d (data section) options, you see:

                    sunit=value
                           This is used to specify the stripe unit for a RAID
                           device or a logical volume. The value has to be
                           specified in 512-byte block units. Use the su
                           suboption to specify the stripe unit size in bytes.
                           This suboption ensures that data allocations will
                           be stripe unit aligned when the current end of file
                           is being extended and the file size is larger than
                           512KiB. Also inode allocations and the internal log
                           will be stripe unit aligned.

                    su=value
                           This is an alternative to using sunit. The su
                           suboption is used to specify the stripe unit for a
                           RAID device or a striped logical volume. The value
                           has to be specified in bytes, (usually using the m
                           or g suffixes). This value must be a multiple of
                           the filesystem block size.

                    swidth=value
                           This is used to specify the stripe width for a RAID
                           device or a striped logical volume. The value has
                           to be specified in 512-byte block units. Use the sw
                           suboption to specify the stripe width size in
                           bytes. This suboption is required if -d sunit has
                           been specified and it has to be a multiple of the
                           -d sunit suboption.

                    sw=value
                           This suboption is an alternative to using swidth.
                           The sw suboption is used to specify the stripe
                           width for a RAID device or striped logical volume.
                           The value is expressed as a multiplier of the
                           stripe unit, usually the same as the number of
                           stripe members in the logical volume configuration,
                           or data disks in a RAID device.

                           When a filesystem is created on a logical volume
                           device, mkfs.xfs will automatically query the
                           logical volume for appropriate sunit and swidth
                           values.


--- You use sunit+swidth or su+sw to specify the RAID layout you chose when
creating the array (its stripe unit -- the per-disk strip size -- and its
width, the number of data disks).
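
To answer your first question -- whether the stripes line up on an existing
XFS filesystem -- xfs_info reports the sunit/swidth it was created with, in
filesystem blocks (/data below is just a placeholder mount point; output
trimmed):

    $ xfs_info /data
    ...
    data     =      bsize=4096   ...
             =      sunit=16     swidth=64 blks

With 4K blocks, sunit=16 is a 64K stripe unit, and swidth=64 blocks (256K) is
four of those, i.e. 4 data disks.  If that doesn't match what the controller
was configured with, the fix is to re-make the filesystem with matching su/sw.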

A good stripe width (su * #data disks) is one that will easily fit in the
controller's cache.  Most modern cache cards have 256-1024MB (I __THINK__...
been a while since I read up on these things).  I've seen literature from LSI
suggesting 64K as a good unit size, with 128K being the suggested maximum.  In
a 4-data-disk RAID, that would give you a width of 256K.  Unless you are a
bank or financial institution -- if you use enterprise-class disks, I would
suggest RAID5 out of your 6 disks: 4 DATA, 1 PARITY, and 1 for a hot-spare.
You'll get high protection, AND a spare, AND your write speeds will be about
3.2x that of a single disk vs. 2.66x (reads should be about the same), but you
know your risk tolerance level better than I.
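
For the 64K-strip, 4-data-disk layout above, the mkfs.xfs call would look
something like this (/dev/sdb1 being a placeholder for whatever device the
PERC exports):

    # 64K stripe unit x 4 data disks = 256K stripe width
    mkfs.xfs -d su=64k,sw=4 /dev/sdb1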

Oh, one thing I keep forgetting to ask -- why use 8k reads/writes in your
tests?  That's all but guaranteed NOT to be the size of one stripe unit, let
alone a width.
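
An easy way to see the difference is dd with direct I/O -- just a sketch, and
note that /dev/VG/test is a hypothetical scratch LV whose contents the writes
destroy:

    # 1GiB of 8K writes vs. 1GiB of full-stripe 256K writes
    dd if=/dev/zero of=/dev/VG/test bs=8k   count=131072 oflag=direct
    dd if=/dev/zero of=/dev/VG/test bs=256k count=4096   oflag=direct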

I found my fastest R/W speeds when writing 2-4GB files, though there seems to
be a bug in more recent kernels when writing over 2G.  Still, even at 1G I got
1GB/s reads and writes using direct I/O.  Going through the system file buffer
takes that down to 600-700MB/s.  I'm guessing that having my volume groups
misaligned contributes to my not getting closer to theoretical -- with 12 data
disks, I should get 1.2GB/s...  I have seen as high as 1.1, so it's not a huge
hit...
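
If you want to check your own alignment, LVM will tell you where the first
physical extent starts -- it should be a multiple of the full stripe width
(256K is assumed below, and /dev/sdb is a placeholder):

    # show where LVM data starts on the PV
    pvs -o +pe_start /dev/sdb

    # when (re)creating the PV, force data onto a stripe boundary
    pvcreate --dataalignment 256k /dev/sdb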

I can't recommend strongly enough using XFS if you want performance on a large
file system.  Both its allocation mechanism and its RAID-aware params were
designed for corporate I/O servers from its inception.  ext4 has grown out of
a PC-desktop file system; it has added on more XFS features -- journalling,
ACLs, extended attributes (some of those have to be turned on separately) --
that were all designed into XFS from the beginning.  XFS also has the option
of real-time data partitions, which don't hold a file system but are just
repositories for real-time data streaming.
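
Setting up a realtime section looks roughly like this (a sketch only;
/dev/sdb1 is a placeholder for the main device and /dev/sdc1 for the realtime
device):

    # put the realtime section on its own device at mkfs time...
    mkfs.xfs -r rtdev=/dev/sdc1 /dev/sdb1

    # ...and name it again when mounting
    mount -o rtdev=/dev/sdc1 /dev/sdb1 /data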

You also have no fsck and almost no mkfs time when making an XFS partition
(about all fsck does is test whether the disk is mounted -- that's how good
the normal journalling is).

NOTE: one thing XFS really wants in order to be happy (and keep its good rep)
is a UPS -- unplanned shutdowns are bad for high-performance data servers that
delay writes to disk to improve performance.

You can tighten up the time allowed for writing dirty data to disk, at some
penalty in speed; the tunable params live under /proc/sys/vm.
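
For example (illustrative values only; defaults and trade-offs vary by kernel
and workload):

    # let dirty pages sit at most ~5s before forced writeback (default 30s)
    echo 500 > /proc/sys/vm/dirty_expire_centisecs

    # wake the flusher threads every ~1s (default 5s)
    echo 100 > /proc/sys/vm/dirty_writeback_centisecs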


