Magic SysRq (was RE: The system takes over)

Rechenberg, Andrew arechenberg at shermfin.com
Tue Dec 3 09:28:01 CST 2002


Speaking of Magic SysRq key sequences, does anyone know how to use them
with servers connected to a Dell KVM switch?

According to the KVM's manual, to send a Print Screen through to the
server you press the Print Screen key three or four times, and that
works. However, I think that pressing the Alt key after the third Print
Screen press resets the KVM's key buffer, so the SysRq sequence never
reaches the server.

If anyone could provide information on how to pull this sequence off, it
would be much appreciated.
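
In the meantime, a keyboard-independent fallback (assuming these
kernels were built with CONFIG_MAGIC_SYSRQ, which I have not verified
on every box) is to echo the SysRq command letter into
/proc/sysrq-trigger from a shell, which sidesteps the KVM entirely:

  # enable the Magic SysRq handler if it is currently off
  echo 1 > /proc/sys/kernel/sysrq
  # 't' gives the same task/thread dump as Alt-SysRq-T on the console
  echo t > /proc/sysrq-trigger

The output lands in the kernel ring buffer, so dmesg (or
/var/log/messages via klogd) should have it. That still doesn't answer
how to get the key sequence through the KVM itself, though.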

Thanks,
Andy.

Andrew Rechenberg
Infrastructure Team, Sherman Financial Group
arechenberg at shermanfinancialgroup.com
Phone: 513.707.3809
Fax:   513.707.3838


-----Original Message-----
From: Swinefurth, Chris [mailto:CSwinefurth at nisys.com] 
Sent: Monday, December 02, 2002 9:18 PM
To: 'Norman Gaywood'; 'linux-poweredge at dell.com'
Subject: RE: The system takes over


Norman,
	Have you done a kernel thread dump during an instance of the high
load?  I have a similar problem here with large file copies; I traced it
down to a bug in truncate_list_pages.  Try doing an Alt-SysRq-T on the
console.  I still do not have a resolution to this problem, but it
happens to us only rarely.  If you come across anything, please share.
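	If the combination seems to do nothing at all, also check that the
handler is actually enabled; these are the standard 2.4 sysctl paths
(whether your kernel image turns it on by default is an assumption I
can't make):

	# 0 means the Magic SysRq handler is disabled
	cat /proc/sys/kernel/sysrq
	# turn it on, then press Alt-SysRq-T on the console
	echo 1 > /proc/sys/kernel/sysrq
	# the per-task stack traces land in the kernel ring buffer / syslog
	dmesg | less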
	Good luck,
	--Chris

-----Original Message-----
From: Norman Gaywood [mailto:norm at turing.une.edu.au] 
Sent: Wednesday, November 27, 2002 10:50 PM
To: linux-poweredge at dell.com
Subject: Re: The system takes over


I hope I am not annoying people with all these posts, but I believe I
have a real problem, and I'm hoping someone will spot what might be
wrong.

On Thu, Nov 28, 2002 at 12:28:37PM +1100, Norman Gaywood wrote:
> I just tried a copy like this:
> 
> (rsh otherhost tar cf - .) | tar xf -
> 
> and managed to transfer 13G in about 25 minutes. Great! Just what you
> would expect.  The system showed hardly any load at all. Problem is,
> I introduced another variable by doing the copy to a disk partition on
> a separate controller from the one where the SW RAID sets are located.
> 
> I'm just doing some more copy tests and may have some more clues in a
> few hours.

OK, so I tried an rsync copy to the same non-SW-RAID partition that took
the quick tar copy. The system gradually slowed down as before, so I
think I can conclude that it's not the SW RAID, the disk controller, or
a particular disk that is causing the problem.
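
For the record, the rsync test was along these lines; the host and
paths here are placeholders, so treat it as a sketch of the test rather
than the literal command I ran:

  # pull from the same source host as the tar test, onto the non-RAID partition
  rsync -a -e rsh otherhost:/some/dir/ /non-raid-partition/dir/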

This time I did not let the rsync run until the system was unusable; I
killed the rsync processes and then left the system idle for two hours
to see if it recovered. It didn't: the system is still sluggish two
hours after the rsync was killed. The top output follows. Note kswapd.

  1:58pm  up  4:46,  3 users,  load average: 2.12, 2.17, 2.29
122 processes: 121 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states:  0.0% user,  0.2% system,  0.0% nice, 99.23% idle
CPU1 states:  0.0% user,  1.9% system,  0.0% nice, 98.16% idle
CPU2 states:  0.0% user,  0.1% system,  0.0% nice, 99.24% idle
CPU3 states:  0.0% user,  0.14% system,  0.0% nice, 99.11% idle
CPU4 states:  0.0% user,  0.3% system,  0.0% nice, 99.22% idle
CPU5 states:  0.0% user,  0.2% system,  0.0% nice, 99.23% idle
CPU6 states:  0.0% user, 83.6% system,  0.0% nice, 16.19% idle
CPU7 states:  0.0% user,  0.4% system,  0.0% nice, 99.21% idle
Mem:  16280784K av, 16186012K used,   94772K free,       0K shrd,     784K buff
Swap: 33559768K av,       0K used, 33559768K free 15461796K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
   19 root      25   0     0    0     0 SW   83.4  0.0  90:05 kswapd
  915 root      16   0   640  640   560 S     0.6  0.0   0:16 crond
   21 root      15   0     0    0     0 SW    0.5  0.0   0:29 kupdated
 1252 root      15   0  1212 1212   960 R     0.2  0.0   2:51 top
    1 root      15   0   480  480   428 S     0.0  0.0   0:04 init
    2 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU0
    3 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU1
    4 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU2
    5 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU3
    6 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU4
    7 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU5
    8 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU6
    9 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU7
   10 root      15   0     0    0     0 SW    0.0  0.0   3:59 keventd
   11 root      34  19     0    0     0 SWN   0.0  0.0   0:03 ksoftirqd_CPU0
   12 root      34  19     0    0     0 SWN   0.0  0.0   0:02 ksoftirqd_CPU1
   13 root      34  19     0    0     0 SWN   0.0  0.0   0:03 ksoftirqd_CPU2
   14 root      34  19     0    0     0 SWN   0.0  0.0   0:02 ksoftirqd_CPU3
   15 root      34  19     0    0     0 SWN   0.0  0.0   0:03 ksoftirqd_CPU4
   16 root      34  19     0    0     0 SWN   0.0  0.0   0:02 ksoftirqd_CPU5
   17 root      34  19     0    0     0 SWN   0.0  0.0   0:03 ksoftirqd_CPU6
   18 root      34  19     0    0     0 SWN   0.0  0.0   0:02 ksoftirqd_CPU7
   20 root      15   0     0    0     0 SW    0.0  0.0   0:01 bdflush
   22 root      25   0     0    0     0 SW    0.0  0.0   0:00 mdrecoveryd
   28 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_0
   29 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_1
   30 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_2
   31 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_3
   32 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_4
   33 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_5
   34 root      25   0     0    0     0 SW    0.0  0.0   0:00 scsi_eh_6
   40 root      15   0     0    0     0 SW    0.0  0.0  10:09 raid5d
   41 root      15   0     0    0     0 SW    0.0  0.0   2:40 raid5syncd
   42 root      16   0     0    0     0 SW    0.0  0.0   0:00 raid1d
   43 root      15   0     0    0     0 SW    0.0  0.0   0:02 raid1d
   44 root      15   0     0    0     0 SW    0.0  0.0   0:01 raid1syncd
   45 root      15   0     0    0     0 SW    0.0  0.0   0:05 raid1d
   46 root      15   0     0    0     0 SW    0.0  0.0   0:03 raid1syncd
   47 root      15   0     0    0     0 SW    0.0  0.0   0:11 raid1d
   48 root      15   0     0    0     0 SW    0.0  0.0   0:06 raid1syncd
   49 root      15   0     0    0     0 SW    0.0  0.0   0:11 raid1d
   50 root      15   0     0    0     0 SW    0.0  0.0   0:06 raid1syncd
   51 root      16   0     0    0     0 SW    0.0  0.0   0:00 raid1d
   52 root      15   0     0    0     0 SW    0.0  0.0   0:01 kjournald
  108 root      16   0     0    0     0 SW    0.0  0.0   0:00 khubd
  211 root      15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  224 root      15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  227 root      15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  228 root      15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  229 root      15   0     0    0     0 SW    0.0  0.0   0:10 kjournald
  230 root      16   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  231 root      15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  232 root      16   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  233 root      15   0     0    0     0 SW    0.0  0.0   0:27 kjournald
  515 root      15   0   540  540   460 S     0.0  0.0   0:00 syslogd
  519 root      15   0   428  428   376 S     0.0  0.0   0:00 klogd
  536 rpc       16   0   576  576   496 S     0.0  0.0   0:00 portmap
  555 rpcuser   16   0   760  760   660 S     0.0  0.0   0:00 rpc.statd
  626 root      16   0  1000 1000   816 S     0.0  0.0   0:01 ypbind
  672 root      15   0  4908 4904  2060 S     0.0  0.0   0:00 snmpd
  682 root      16   0  1472 1472  1328 S     0.0  0.0   0:26 sshd
  696 root      15   0   944  944   792 S     0.0  0.0   0:00 xinetd
  710 ntp       15   0  1908 1908  1720 S     0.0  0.0   0:00 ntpd
  729 root      17   0   552  552   476 S     0.0  0.0   0:00 rpc.rquotad
  734 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  735 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  736 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  737 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  738 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  739 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  740 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  741 root      15   0     0    0     0 SW    0.0  0.0   0:00 nfsd
  742 root      17   0     0    0     0 SW    0.0  0.0   0:00 lockd
  743 root      15   0     0    0     0 SW    0.0  0.0   0:00 rpciod
  749 root      18   0   816  816   680 S     0.0  0.0   0:01 rpc.mountd
  758 root      15   0  1524 1524  1204 S     0.0  0.0   0:00 amd
  772 root      16   0  1228 1228  1052 S     0.0  0.0   0:00 safe_mysqld
  804 mysql     15   0  4728 4728  2092 S     0.0  0.0   0:00 mysqld
  818 root      15   0  2312 2312  1684 S     0.0  0.0   0:00 sendmail
  828 smmsp     15   0  2084 2080  1580 S     0.0  0.0   0:00 sendmail
  839 root      17   0   432  432   380 S     0.0  0.0   0:00 gpm
  850 root      15   0  8064 8064  7908 S     0.0  0.0   0:00 httpd
  893 postgres  15   0  1784 1784  1700 S     0.0  0.0   0:00 postmaster
  903 postgres  16   0  1776 1776  1712 S     0.0  0.0   0:00 postmaster
  904 postgres  16   0  1808 1808  1716 S     0.0  0.0   0:00 postmaster
  956 xfs       15   0  3200 3200   904 S     0.0  0.0   0:00 xfs
  965 daemon    15   0   524  524   464 S     0.0  0.0   0:00 atd
  974 root      16   0   408  408   356 S     0.0  0.0   0:00 mingetty
  975 root      16   0   408  408   356 S     0.0  0.0   0:00 mingetty
  976 root      16   0   408  408   356 S     0.0  0.0   0:00 mingetty
  977 root      16   0   408  408   356 S     0.0  0.0   0:00 mingetty
  978 root      16   0   408  408   356 S     0.0  0.0   0:00 mingetty
  979 root      16   0   408  408   356 S     0.0  0.0   0:00 mingetty
  991 norm      15   0  1704 1704  1096 S     0.0  0.0   0:00 tcsh
 1016 root      16   0  1080 1080   892 S     0.0  0.0   0:00 su
 1020 root      15   0  1512 1512  1200 S     0.0  0.0   0:00 bash
 1188 norm      15   0  1764 1764  1104 S     0.0  0.0   0:00 tcsh
 1214 root      15   0  1080 1080   892 S     0.0  0.0   0:00 su
 1218 root      15   0  1484 1484  1184 S     0.0  0.0   0:00 bash
 1260 norm      15   0  1760 1760  1096 S     0.0  0.0   0:00 tcsh
 1324 root      16   0  1080 1080   892 S     0.0  0.0   0:00 su
 1328 root      17   0  1488 1488  1188 S     0.0  0.0   0:00 bash

I have two 16G swap partitions:

/dev/sdb2       swap                    swap    defaults 0 0
/dev/sda2       swap                    swap    defaults 0 0

That seems like a lot, but I was going on the "swap should be twice the
amount of RAM" rule of thumb. The system should not be using swap at all
anyway, so why is kswapd so busy?
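
In case it helps anyone diagnose this, these are the kinds of snapshots
I can grab while the machine is in this state (all standard /proc and
procps interfaces, nothing specific to this box):

  # si/so show whether pages are actually moving to and from swap;
  # the cache figure should account for most of the 15G shown as "used"
  vmstat 5
  # breakdown of Cached vs Buffers vs MemFree
  cat /proc/meminfo
  free -m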

> On Thu, Nov 28, 2002 at 09:48:50AM +1000, jason andrade wrote:
> > What is the behaviour of the system like without running the rsync
> > process over a period of time?  What does /proc/interrupts show you -
> > are they balanced across cpus?

/proc/interrupts looks good (I think) with numbers balanced over CPUs.
I'll spare you the output unless you want to see it.
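(I'm just eyeballing it with something like:

  # refresh the per-CPU interrupt counts every few seconds
  watch -n 5 cat /proc/interrupts

so I can see whether any one CPU's counters climb much faster than the
others.)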

My current experiment will be to do nothing! I'll reboot the system and
see if it slows down overnight while nothing serious is happening.

Sigh.

-- 
Norman Gaywood -- School of Mathematical and Computer Sciences
University of New England, Armidale, NSW 2351, Australia
norm at turing.une.edu.au     http://turing.une.edu.au/~norm
Phone: +61 2 6773 2412     Fax: +61 2 6773 3312

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
http://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq or search the list
archives at http://lists.us.dell.com/htdig/
