Copying a large file system (again)
Brice Figureau
brice+dell at daysofwonder.com
Wed Dec 5 13:31:55 CST 2007
On Wed, December 5, 2007 17:31, Tino Schwarze wrote:
> On Mon, Dec 03, 2007 at 08:23:44PM +0100, Tino Schwarze wrote:
>
>> > > I need to copy the whole /backup/backuppc directory structure in one
>> > > pass so that hardlinks are preserved. Or I'd need some tool which could
>> > > preserve the hardlinks another way.
>> >
>> > http://backuppc.sourceforge.net/faq/BackupPC.html#other_installation_topics
>> >
>> > The section on Copying the Pool may help, specifically using
>> > BackupPC_tarPCCopy. If that does not work, you may be able to copy the
>> > pool over first, then do the hosts one at a time and try to manually
>> > run BackupPC_link.
>>
>> Oh well, that should do the trick. I'll try copying the pool first, then
>> use BackupPC_tarPCCopy to get a tar with almost only hardlinks in it,
>> pointing to the pool I've already copied.
>
> I successfully copied the pool! :-) It took almost two days. Now I'm
> running BackupPC_tarPCCopy and watching the machines... it seems there's
> still some glitch, this time with the destination machine (PE1800 with
> 3x750 GB SATA PERC RAID). I observe the following:
>
> Everything runs fine, both machines have I/O (I watch using vmstat), the
> receiving tar tells me about the files it unpacks (mostly hardlinks to
> the pool). After some time, the destination machine slows down
> considerably, doing almost no I/O any more with 25%-50% I/O wait (it's
> got 4 cores, source has only one and a lot of I/O wait). Using netstat I
> see that the receive buffer is quite full (about a meg), the source
> machine stops sending data and is 100% idle.
>
> Here is some "vmstat 1" output of the destination machine when it's
> getting slow:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
>  0  0   3044  17200   2276 1205492    0    0     2    79    2    2  0  0  99  0
>  1  0   3044  17076   2276 1205456    0    0    20  3392  695 1666  0  0  99  0
>  0  0   3044  16084   2276 1205456    0    0     8     0  779 1063  0  0 100  0
>  0  2   3044  15960   2280 1205460    0    0     0  4956  613 1476  0  0  71 28
>  0  2   3044  15952   2280 1205464    0    0     8     0  275 1070  0  0  50 50
>  0  2   3044  15952   2280 1205472    0    0     0     9  277 1073  0  0  50 50
>  0  2   3044  15952   2280 1205472    0    0     0     0  259 1069  0  0  50 50
>  1  2   3044  15952   2280 1205472    0    0     0     0  271 1067  0  0  50 50
>  1  0   3044  16084   2284 1205472    0    0    84  1020  440 1433  0  1  87 12
>  0  2   3044  15836   2288 1205732    0    0   140  8166  812 2166  0  1  75 24
>  0  2   3044  15828   2288 1205736    0    0     0     0  281 1115  0  0  50 50
>  1  2   3044  15828   2288 1205736    0    0     0    88  281 1080  0  0  50 50
>  1  2   3044  15828   2288 1205736    0    0     0     0  257 1059  0  0  50 50
>  1  2   3044  15828   2288 1205736    0    0     0     0  272 1067  0  0  50 50
>  1  2   3044  15852   2288 1205736    0    0     0     0  251 1044  0  0  50 50
>
> -> almost no I/O done, something is blocked there. The whole machine is
> slow (e.g. calling a man page, pressing tab in the shell).
>
> I see one of the [pdflush] kernel threads in "D" state when this
> happens, so I expect this to be a buffering/flushing issue. Any hints
> where to look next? Any kernel parameters to tune?
I think your system is starting to write its cached dirty pages to disk
(note the 1.2 GB of cache in your vmstat output).
Is the destination an ext3 filesystem?
To check whether write-back is the cause, cat /proc/meminfo and see
whether the Writeback value is > 0.
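A minimal sketch of that /proc/meminfo check, assuming a Linux destination.
The helper name meminfo_kb and its optional file argument are only
illustrative, not an existing tool:

```shell
#!/bin/sh
# Print the value (in kB) of one /proc/meminfo field, e.g. Dirty or Writeback.
# The optional second argument is a hypothetical override for the file to
# read; it defaults to /proc/meminfo.
meminfo_kb() {
    # Match the field name exactly so "Writeback:" does not also match
    # "WritebackTmp:" on kernels that have it.
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Only run the check where /proc/meminfo exists (Linux).
if [ -r /proc/meminfo ]; then
    echo "Dirty:     $(meminfo_kb Dirty) kB"
    echo "Writeback: $(meminfo_kb Writeback) kB"
fi
```

Running it a few times while the copy is stalled should show whether
Writeback stays above zero.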
The following might increase your throughput:
- echo "1" > /proc/sys/vm/dirty_background_ratio
- echo "2" > /proc/sys/vm/dirty_ratio
This forces writeback of dirty pages to start earlier and run more
continuously.
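If you want to be able to go back to the old values, something like the
following might be handy. It is only a sketch: set_vm_tunable and the VM_DIR
override are names I made up for illustration, and writing the real files
under /proc/sys/vm requires root:

```shell
#!/bin/sh
# Set one vm tunable and print the previous value so it can be restored.
# VM_DIR is a hypothetical override for dry runs against a scratch
# directory; the real files live in /proc/sys/vm.
set_vm_tunable() {
    name=$1
    new=$2
    dir=${VM_DIR:-/proc/sys/vm}
    old=$(cat "$dir/$name") || return 1      # remember the current value
    echo "$new" > "$dir/$name" || return 1   # apply the new value
    echo "$name: $old -> $new"
}

# Example (as root):
#   set_vm_tunable dirty_background_ratio 1
#   set_vm_tunable dirty_ratio 2
```

The values written this way are lost on reboot, so they are safe to
experiment with.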
You could maybe play with /proc/sys/vm/swappiness, too.
Other than that, if the destination filesystem is ext3, make sure it is
mounted noatime (atime updates kill performance by adding more write
pressure on the disks). Mounting it with data=writeback should help
too.
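To see which options the destination is currently mounted with, a rough
sketch along these lines could be used. The function name has_mount_opt is
made up here, /backup is only a placeholder for your real mount point, and
whether the data= mode shows up in /proc/mounts depends on the kernel:

```shell
#!/bin/sh
# Succeed if the filesystem mounted at $1 lists mount option $2.
# The optional third argument is a hypothetical override for the mounts
# file; it defaults to /proc/mounts.
has_mount_opt() {
    awk -v mp="$1" -v opt="$2" '
        # Field 2 is the mount point, field 4 the comma-separated options.
        # If the mount point is listed more than once, the last entry wins.
        $2 == mp {
            found = 0
            n = split($4, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == opt) found = 1
        }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}

# /backup is a placeholder mount point, not taken from the thread.
if [ -r /proc/mounts ]; then
    for opt in noatime data=writeback; do
        if has_mount_opt /backup "$opt"; then
            echo "$opt: set"
        else
            echo "$opt: not set (or not reported)"
        fi
    done
fi
```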
You might also be experiencing a kernel issue I reported a while ago on the
kernel bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=7372
Unfortunately it hasn't been solved yet.
Which kernel are you running on the destination side?
Hope that helps,
--
Brice Figureau
Days of Wonder
More information about the Linux-PowerEdge
mailing list