Slow RAID IO on PE2970

Jason Ede J.Ede at birchenallhowden.co.uk
Fri Mar 13 11:20:15 CDT 2009


> -----Original Message-----
> From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-
> bounces at dell.com] On Behalf Of John LLOYD
> Sent: 13 March 2009 16:03
> To: linux-poweredge at dell.com
> Subject: RE: Slow RAID IO on PE2970
> 
> > Date: Fri, 13 Mar 2009 13:13:52 +0000
> > From: Jason Ede <J.Ede at birchenallhowden.co.uk>
> > Subject: Slow RAID IO on PE2970
> > To: "linux-poweredge at dell.com" <linux-poweredge at dell.com>
> > Message-ID:
> > 	<1213490F1F316842A544A850422BFA961C09CF3B at BHLSBS.bhl.local>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > We've got a PE2970 running Openfiler 2.3 with 750GB SATA drives,
> > and the disk access is temperamental to say the least.
> >
> > (Incidentally managed to get MegaCli on it for RAID monitoring)
> >
> > We've also got an old PE830 running the same version of Openfiler
> > that seems much more responsive. I think there is something funny
> > with the RAID, but I'm not sure what. It seems that although the
> > drives aren't under heavy disk IO load, the utilisation goes
> > through the roof and performance becomes very, very sluggish.
> >
> > The drives are in a RAID5 array and iostat reports:
> >
> > [root@BHLSAN MegaCli]# iostat -d -k -x 5 3
> > Linux 2.6.26.8-1.0.11.smp.gcc3.4.x86_64 (BHLSAN)        03/13/2009
> >
> > Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> > sda          0.01   0.42  0.08  0.21    1.74    5.00     0.87     2.50    22.95     0.01   37.01  10.71   0.31
> > sda1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     8.32     0.00   14.68   8.76   0.00
> > sda2         0.01   0.42  0.08  0.21    1.73    5.00     0.87     2.50    22.96     0.01   37.03  10.72   0.31
> > sdb          0.75   0.61  7.67 16.36  141.02 1520.80    70.51   760.40    69.14     0.30   12.41   3.91   9.39
> > sdb1         0.75   0.61  7.67 16.36  141.02 1520.80    70.51   760.40    69.14     0.30   12.41   3.91   9.39
> > dm-0         0.00   0.00  7.68 16.97  140.28 1520.80    70.14   760.40    67.39     0.31   12.64   3.81   9.39
> >
> 
> Of course, ignore the first report, as it shows averages since boot
> time.
> 
> 
> > Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> > sda          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> > sda1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> > sda2         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> > sdb          0.00   1.80 27.40 65.20  318.40 6392.40   159.20  3196.20    72.47     6.02   65.56  10.49  97.12
> > sdb1         0.00   1.80 27.40 65.20  318.40 6392.40   159.20  3196.20    72.47     6.02   65.56  10.49  97.12
> > dm-0         0.00   0.00 27.20 66.40  313.60 6387.40   156.80  3193.70    71.59     6.60   71.12  10.38  97.12
> 
> The second (and subsequent) reports show the news --- 97% busy.  The IO
> rate is r/s plus w/s, or roughly 100 operations per second.  A SATA
> drive can do this, but not much better.  The average read transfer
> size is 159/27, about 6 kB, and the average write size is 3196/65,
> about 49 kB.
> 
> If your RAID5 stripe size is much larger than 49 kB, then a "write"
> from the operating system/application becomes an "update" as far as
> the RAID5 logic is concerned, since the 49 kB will typically land in
> the middle of a, say, 128 kB stripe.  Performing the write then
> requires reading 128 kB from each disk (3 * 128 kB reads), updating
> the one or two data-disk stripe elements touched (depending on
> whether the 49 kB crosses a stripe boundary), recomputing parity, and
> writing the whole 3 * 128 kB back to the three disks.  Your 49 kB
> write requires 768 kB of input plus output.  So the disks are a
> little busy.
> 
> (By "stripe" I mean the size of the per-disk data stripe.  RAID5 is
> sometimes called RAID0 plus parity, where the parity stripe is the
> same size as the data stripe.  Of course the arithmetic above changes
> if your stripe size is 16 kB, but the result is the same -- lots more
> disk IO than at first appears to be needed.)
> 
> I would recommend you reconsider using RAID5 on 3 disks to obtain the
> capacity of 2 -- RAID1 is a better choice here, and it leaves you one
> disk spare.
Many thanks for that. It makes a good deal of sense.
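
As a quick sanity check I put the arithmetic above into a few lines of Python (the 128 kB stripe size is just an assumption on my part; I haven't yet checked what the controller is actually set to):

# Back-of-envelope numbers, taken from the second iostat report for sdb.
reads_per_s, writes_per_s = 27.4, 65.2
read_kb_per_s, write_kb_per_s = 159.2, 3196.2

print("avg read size:  %.1f kB" % (read_kb_per_s / reads_per_s))    # ~5.8 kB
print("avg write size: %.1f kB" % (write_kb_per_s / writes_per_s))  # ~49.0 kB

# Write amplification under the full-stripe read-modify-write model described
# above.  ASSUMPTION: 128 kB per-disk stripe element on a 3-disk RAID5
# (2 data + 1 parity), not verified against our controller's settings.
stripe_kb, disks, write_kb = 128, 3, 49
total_io_kb = 2 * disks * stripe_kb   # read the whole stripe, then write it back
print("%d kB write -> %d kB of disk IO (~%.0fx amplification)"
      % (write_kb, total_io_kb, float(total_io_kb) / write_kb))     # 768 kB, ~16x

If the controller really is turning every ~49 kB write into a full-stripe rewrite like that, it would go a long way towards explaining 97% utilisation at only ~90 IOPS.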

We have a 4th drive waiting to go into that box, so would RAID10 be a better answer in terms of performance? And are we close to the limit of SATA drives irrespective of the array type we use?

How much performance would we gain by moving to SAS drives, and could the Perc5i have 2 arrays, 1 SATA and 1 SAS (all internal drives)? We're unsure if it can have mixed array types.

Jason


