Slow RAID IO on PE2970

John LLOYD jal at mdacorporation.com
Fri Mar 13 11:02:35 CDT 2009


> Date: Fri, 13 Mar 2009 13:13:52 +0000
> From: Jason Ede <J.Ede at birchenallhowden.co.uk>
> Subject: Slow RAID IO on PE2970
> To: "linux-poweredge at dell.com" <linux-poweredge at dell.com>
> Message-ID:
> 	<1213490F1F316842A544A850422BFA961C09CF3B at BHLSBS.bhl.local>
> Content-Type: text/plain; charset="us-ascii"
> 
> We've got a PE2970 running openfiler 2.3 with 750GB SATA
> drives, and the disk access is temperamental to say the least.
> 
> (Incidentally managed to get MegaCli on it for RAID monitoring)
> 
> We've also got an old PE830 running the same version of
> openfiler that seems much more responsive. I think there is
> something funny with the RAID, but I'm not sure what. It seems
> that although the drives aren't under heavy disk IO load, the
> utilisation goes through the roof and performance becomes
> very, very sluggish.
> 
> The drives are in a RAID5 array and iostat reports:
> 
> [root@BHLSAN MegaCli]# iostat -d -k -x 5 3
> Linux 2.6.26.8-1.0.11.smp.gcc3.4.x86_64 (BHLSAN)        03/13/2009
> 
> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda          0.01   0.42  0.08  0.21    1.74    5.00    0.87    2.50    22.95     0.01   37.01  10.71   0.31
> sda1         0.00   0.00  0.00  0.00    0.00    0.00    0.00    0.00     8.32     0.00   14.68   8.76   0.00
> sda2         0.01   0.42  0.08  0.21    1.73    5.00    0.87    2.50    22.96     0.01   37.03  10.72   0.31
> sdb          0.75   0.61  7.67 16.36  141.02 1520.80   70.51  760.40    69.14     0.30   12.41   3.91   9.39
> sdb1         0.75   0.61  7.67 16.36  141.02 1520.80   70.51  760.40    69.14     0.30   12.41   3.91   9.39
> dm-0         0.00   0.00  7.68 16.97  140.28 1520.80   70.14  760.40    67.39     0.31   12.64   3.81   9.39
> 

Of course, ignore the first report, as it shows averages since boot
time.


> Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda          0.00   0.00  0.00  0.00    0.00    0.00    0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sda1         0.00   0.00  0.00  0.00    0.00    0.00    0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sda2         0.00   0.00  0.00  0.00    0.00    0.00    0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sdb          0.00   1.80 27.40 65.20  318.40 6392.40  159.20  3196.20    72.47     6.02   65.56  10.49  97.12
> sdb1         0.00   1.80 27.40 65.20  318.40 6392.40  159.20  3196.20    72.47     6.02   65.56  10.49  97.12
> dm-0         0.00   0.00 27.20 66.40  313.60 6387.40  156.80  3193.70    71.59     6.60   71.12  10.38  97.12

The second (and subsequent) reports show the real news: 97% busy.  The
IO rate is r/s plus w/s, roughly 93 operations per second.  SATA can do
this, but not much better.  The average read transfer size is
159 kB/s / 27 r/s = about 6 kbyte, and the average write transfer size
is 3196 kB/s / 65 w/s = about 49 kbyte.
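
For what it's worth, those averages just come from dividing rkB/s by
r/s (and wkB/s by w/s) in the last sample.  A quick way to pull them
out of a live run, assuming sdb is still the RAID5 device and the
column layout matches the header above (other sysstat versions arrange
the -x output differently, so check first):

  # take two 5-second samples; awk keeps only the last sdb line, so the
  # since-boot averages in the first report are ignored
  iostat -d -k -x 5 2 | awk '/^sdb /{r=$4; w=$5; rkB=$8; wkB=$9}
      END{if (r > 0) printf "avg read  %.1f kB\n", rkB/r;
          if (w > 0) printf "avg write %.1f kB\n", wkB/w}'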

If your RAID5 stripe size is much larger than 49 kbyte, then a "write"
from the operating system or application becomes an "update" as far as
the RAID5 layer is concerned, since 49 kbyte will typically land in the
middle of a, say, 128 kbyte stripe.  Performing the write then requires
reading 128 kbyte from each disk (3 * 128 kbyte of reads), updating one
or two of the per-disk data stripes (depending on whether the 49 kbyte
crosses a stripe boundary), recomputing parity, and writing the whole
3 * 128 kbyte back to the three disks.  Your 49 kbyte write requires
768 kbyte of input plus output.  So the disks are a little busy.
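
To put numbers on that for a few per-disk stripe sizes, here is a
quick loop using the same simplifying model as above (every disk's
stripe read once and rewritten once, 3 disks):

  # rough disk IO per ~49 kbyte application write under the
  # read-everything / rewrite-everything model above
  for stripe_kb in 16 64 128; do
      total_kb=$((3 * 2 * stripe_kb))   # 3 disks, each read and rewritten
      echo "per-disk stripe ${stripe_kb} kB -> ${total_kb} kB of disk IO"
  done

A real controller may get away with a smaller read-modify-write of just
the affected data and parity stripes, so treat these as the pessimistic
end of the range.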

(By "stripe" I mean the size of the per-disk data stripe.  RAID5 is
sometimes called RAID0 plus parity, where the parity stripe is the same
size as the data stripe.  Of course the arithmetic above changes if your
stripe size is 16 kbyte.  But the result is the same -- lots more disk
IO than the first appears to be needed.)
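
If you're not sure what stripe size the controller is actually using,
MegaCli should report it; something along these lines (the binary may
be installed as MegaCli or MegaCli64, and the exact label varies a
little between versions):

  MegaCli -LDInfo -Lall -aALL | grep -i 'strip size'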

I would recommend you reconsider using RAID5 on 3 disks to obtain the
capacity of 2 disks -- RAID1 is a better choice here, and it leaves you
one disk spare.

--John


