The system takes over

Norman Gaywood norm at turing.une.edu.au
Wed Nov 27 17:22:00 CST 2002


On Wed, Nov 27, 2002 at 09:03:25AM -0500, Rechenberg, Andrew wrote:
> What does your disk subsystem consist of?  You state that you are using
> software RAID1 and 5, but you don't state what type of disk
> hardware/controllers you have, or in what configuration.

OK, I have no hardware RAID controller.  I've attached the syslog from
startup to this message, which I think should have all the numbers you
need. Here is /etc/fstab:

/dev/md2                /                       ext3    defaults        1 1
/dev/md0                /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
/dev/md6                /home                   ext3    defaults        1 2
/dev/md1                /opt                    ext3    defaults        1 2
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/md4                /tmp                    ext3    defaults        1 2
/dev/md3                /var                    ext3    defaults        1 2
/dev/md5                /var/spool/mail         ext3    defaults        1 2
/dev/sdb2               swap                    swap    defaults        0 0
/dev/sda2               swap                    swap    defaults        0 0
/dev/sdh1		/.automount/alan/disks/alan/h1 ext3 defaults 1 3
/dev/sdi1		/.automount/alan/disks/alan/i1 ext3 defaults 1 3
/dev/sdj1		/.automount/alan/disks/alan/j1 ext3 defaults 1 3
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,kudzu,ro 0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0
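
In case the syslog attachment doesn't make it through the list, the
controller and disk details should also be visible with something like
this (2.4 kernel, /proc mounted; not run against this exact output):

cat /proc/scsi/scsi
cat /proc/partitions
dmesg | grep -i scsi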

Here is /proc/mdstat not long after a reboot:

Personalities : [raid1] [raid5] 
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      
md1 : active raid1 sdb3[1] sda3[0]
      16779776 blocks [2/2] [UU]
      
md3 : active raid1 sdb5[1] sda5[0]
      16779776 blocks [2/2] [UU]
      
md2 : active raid1 sdb6[1] sda6[0]
      8385792 blocks [2/2] [UU]
      
md4 : active raid1 sdb7[1] sda7[0]
      4192832 blocks [2/2] [UU]
      
md5 : active raid1 sdb8[1] sda8[0]
      8658880 blocks [2/2] [UU]
      
md6 : active raid5 sdg1[4] sdf1[3] sde1[2] sdd1[1] sdc1[0]
      215045760 blocks level 5, 64k chunk, algorithm 0 [4/4] [UUUU]
      [========>............]  resync = 44.4% (31848552/71681920) finish=64.3min speed=10313K/sec
unused devices: <none>
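
Note that the RAID5 resync was still running when I grabbed that, so it
will be competing for I/O with anything else touching those disks. If I
want to take it out of the picture for a test, I believe the 2.4 md
driver lets you cap the rebuild rate through these sysctl files (values
in KB/sec; the 1000 below is just a guess I haven't tried yet):

cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 1000 > /proc/sys/dev/raid/speed_limit_max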

And /etc/raidtab:

raiddev		    /dev/md2
raid-level		    1
nr-raid-disks		    2
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    0
    device	    /dev/sda6
    raid-disk     0
    device	    /dev/sdb6
    raid-disk     1
raiddev		    /dev/md0
raid-level		    1
nr-raid-disks		    2
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    0
    device	    /dev/sda1
    raid-disk     0
    device	    /dev/sdb1
    raid-disk     1
raiddev		    /dev/md6
raid-level		    5
nr-raid-disks		    4
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    1
    device	    /dev/sdc1
    raid-disk     0
    device	    /dev/sdd1
    raid-disk     1
    device	    /dev/sde1
    raid-disk     2
    device	    /dev/sdf1
    raid-disk     3
    device	    /dev/sdg1
    spare-disk     0
raiddev		    /dev/md1
raid-level		    1
nr-raid-disks		    2
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    0
    device	    /dev/sda3
    raid-disk     0
    device	    /dev/sdb3
    raid-disk     1
raiddev		    /dev/md4
raid-level		    1
nr-raid-disks		    2
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    0
    device	    /dev/sda7
    raid-disk     0
    device	    /dev/sdb7
    raid-disk     1
raiddev		    /dev/md3
raid-level		    1
nr-raid-disks		    2
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    0
    device	    /dev/sda5
    raid-disk     0
    device	    /dev/sdb5
    raid-disk     1
raiddev		    /dev/md5
raid-level		    1
nr-raid-disks		    2
chunk-size		    64k
persistent-superblock	    1
nr-spare-disks		    0
    device	    /dev/sda8
    raid-disk     0
    device	    /dev/sdb8
    raid-disk     1


> I would venture to guess that your system is waiting for disk I/O and
> that is why you're seeing such high loads and slow rsync performance.

I like the idea that it might be stuck doing disk I/O. That would explain
the high percentage of system time. Note that rsync does not start out
slow.  Early on the speed seems quite impressive, the load on the system
is less than 1, and very little time seems to be spent in "system time".
It's after about 2 hours that the system becomes noticeably slower
interactively at the shell, with the load reported by top at about 2-3.
As time goes on it gets much slower, the load grows, and more and more
time seems to be spent in "system time". I left an rsync going overnight,
this time writing to a non-RAID partition (/dev/sdh1). About 13G was
transferred, and this morning the system was still ping-able but not
responsive. This is the last top:

  3:12am  up 12:26,  2 users,  load average: 65.41, 64.54, 63.08
174 processes: 115 sleeping, 59 running, 0 zombie, 0 stopped
CPU0 states:  0.2% user, 99.1933% system,  0.0% nice,  0.767% idle
CPU1 states:  0.2% user, 99.1945% system,  0.0% nice,  0.755% idle
CPU2 states:  0.2% user, 99.1609% system,  0.0% nice,  0.1091% idle
CPU3 states:  0.2% user, 99.1901% system,  0.0% nice,  0.799% idle
CPU4 states:  0.3% user, 99.1724% system,  0.0% nice,  0.975% idle
CPU5 states:  0.2% user, 99.1859% system,  0.0% nice,  0.841% idle
CPU6 states:  0.2% user, 99.1780% system,  0.0% nice,  0.920% idle
CPU7 states:  0.1% user, 99.1830% system,  0.0% nice,  0.871% idle
Mem:  16280784K av, 13560736K used, 2720048K free,       0K shrd, 796K buff
Swap: 33559768K av,       0K used, 33559768K free 12733000K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
   10 root      25   0     0    0     0 RW   16.4  0.0 169:19 keventd
   19 root      25   0     0    0     0 SW   16.2  0.0 275:48 kswapd
 1430 root      22   0   632  632   600 R    14.1  0.0 119:28 crond
 1426 root      20   0   632  632   600 R    14.0  0.0 126:57 crond
 1480 root      20   0   632  632   600 R    14.0  0.0  53:33 crond
 1509 root      24   0   632  632   600 R    14.0  0.0  32:59 crond
 1439 root      15   0   632  632   600 R    13.9  0.0 101:12 crond
 1483 root      15   0   632  632   600 R    13.9  0.0  50:13 crond
 1447 root      20   0   632  632   600 R    13.8  0.0  90:31 crond
 1450 root      20   0   632  632   600 R    13.8  0.0  85:35 crond
 1460 root      19   0   632  632   600 R    13.8  0.0  72:57 crond
 1498 root      16   0   632  632   600 R    13.8  0.0  40:21 crond
 1415 root      21   0   632  632   600 R    13.7  0.0 147:48 crond
 1433 root      17   0   632  632   600 R    13.6  0.0 112:52 crond
 1436 root      16   0   632  632   600 R    13.6  0.0 107:30 crond
 1451 root      23   0   632  632   600 R    13.6  0.0  84:54 crond
 1470 root      16   0   632  632   600 R    13.6  0.0  62:26 crond
 1488 root      23   0   632  632   600 R    13.6  0.0  47:21 crond
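
Next run I'll try logging snapshots as it goes, so I can see roughly when
it starts to fall over rather than just finding the wreckage in the
morning. Something simple like the following should do (untested; the log
path is just a guess at somewhere that stays writable):

( while true; do date; vmstat 1 5; cat /proc/meminfo; sleep 300; done ) > /var/tmp/rsync-watch.log 2>&1 &

vmstat's "b" (blocked processes) column and the swap/io columns should
help show whether the CPUs are genuinely burning kernel time or mostly
queued up behind disk I/O.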


> Also remember that due to hardware constraints, the rsync process will
> actually only use about 3GB of the 16GB of memory in your system.  The
> rest is just used as cache.

Yes, that's OK.

> If you don't mind, or it's not confidential, could you mail me your disk
> subsystem configuration?  I might be able to tell you wherein lies your
> problem.

Hope you can see something.

I'm about to try a non-rsync copy. Something like:

(rsh host tar cf - .) | tar xf -

and see how that goes.
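
Fleshed out it will probably look something like this (host and directory
names are just placeholders), with -p on the extract so permissions
survive:

rsh otherhost 'cd /export/home && tar cf - .' | (cd /home && tar xpf -)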

Cheers.

> Good luck,
> Andy.
> 
> Andrew Rechenberg
> Infrastructure Team, Sherman Financial Group
> arechenberg at shermanfinancialgroup.com
> Phone: 513.707.3809
> Fax:   513.707.3838

-- 
Norman Gaywood -- School of Mathematical and Computer Sciences
University of New England, Armidale, NSW 2351, Australia
norm at turing.une.edu.au     http://turing.une.edu.au/~norm
Phone: +61 2 6773 2412     Fax: +61 2 6773 3312
