About KIPMI0 process

Michael E Brown Michael_E_Brown at dell.com
Tue May 29 11:49:03 CDT 2007


On Fri, May 25, 2007 at 05:35:20PM -0700, Chance Reschke wrote:
> Hi,
> 
> I don't follow this list closely and I'm just jumping into this
> thread in the middle, so I apologize if I'm bringing up something
> that's already been covered.
> 
> Anyway, the load average doesn't necessarily have anything to do with
> how busy or not busy any of your CPUs are.  Rather, it's a reflection
> of the depth of the run queue.  The run queue can back up because
> there aren't enough cycles to service the load being offered by the
> computing you want to do, but it can just as easily be the product of
> processes waiting around for access to a device - usually a disk.
> Look for processes in state 'D' - device-wait (uninterruptible
> sleep) - and you might find the culprit(s).  Even with very fast
> storage, this can easily happen when two or more processes attempt
> to write to a single file simultaneously.  Fast CPUs, fast disk,
> almost nothing even trying to run, but high load average anyway.

+1. I was just about to say the exact same thing.
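
For anyone who wants to check this on their own box, here is a minimal
sketch of one way to do it. It is not something from this thread: it
assumes Linux with /proc mounted and a Python 3 interpreter, and the
function names (parse_stat_line, tasks_in_states) are made up for
illustration. It prints the load average and then every thread that is
currently runnable ('R') or in uninterruptible sleep ('D'), which are
the two states the Linux load average counts.

```python
import glob
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of a /proc/.../stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def tasks_in_states(states=("R", "D"), proc_root="/proc"):
    """List (tid, comm, state) for every thread whose state is in `states`."""
    found = []
    pattern = os.path.join(proc_root, "[0-9]*", "task", "[0-9]*", "stat")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                tid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited (or the file was unreadable) mid-scan
        if state in states:
            found.append((tid, comm, state))
    return found


if __name__ == "__main__":
    print("load average:", os.getloadavg())
    for tid, comm, state in tasks_in_states():
        print(f"{tid:>7} {state} {comm}")
```

A single snapshot can easily miss a short-lived 'D' state, so it is
probably most useful run in a loop while the load is spiking. The
script itself will normally show up in the list as 'R'.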
--
Michael


> 
> On May 25, 2007, at 5:20 PM, Chris - PowerEdge Linux List wrote:
> 
> > ----- Original Message -----
> > From: "Michael E Brown" <Michael_E_Brown at dell.com>
> > Sent: Wednesday, May 23, 2007 3:56 PM
> > Subject: Re: About KIPMI0 process
> >
> >>> What do we have to do to turn all this stuff completely off to bring
> >>> the CPU load down to 0.00 when it's not running anything at all?  I'm
> >>> open to any and all suggestions at this point.  We've resisted putting
> >>> this server into production.  I know this is considered "harmless"
> >>> load by Dell, but it really messes up our monitoring systems and
> >>> alters the true CPU load that we monitor for best application
> >>> processing.  There's no reason we should be seeing anything but 0.00
> >>> on a system that has nothing installed and nothing running on it.
> >>
> >> Are you sure it isn't some random system daemon? You haven't provided
> >> any data to show what is causing the CPU load.
> >
> > That's the problem.  I disabled virtually all of the daemons that get
> > installed with the 'RHEL4 minimal install', and I've been watching
> > 'top'.  I even have a script running that constantly checks the load
> > and, if it exceeds 0.40, logs the top 20 processes once a second, and
> > NOTHING is showing...  I just spent the last 15 minutes staring
> > non-stop at 'top', and here's what the results look like when the load
> > suddenly spikes to 0.60:
> >
> > ----------------------------------------------------------------------------------
> > Thu May 24 17:04:02 PDT 2007
> > top - 17:04:03 up 1 day, 22:08,  2 users,  load average: 0.60, 0.23, 0.08
> > Tasks:  59 total,   1 running,  58 sleeping,   0 stopped,   0 zombie
> > Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> > Cpu1  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> > Cpu2  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> > Cpu3  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> > Mem:   4149240k total,   336916k used,  3812324k free,    46620k buffers
> > Swap:  4192956k total,        0k used,  4192956k free,   231060k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >     1 root      15   0  3536  548  472 S    0  0.0   0:00.63 init
> >     2 root      RT   0     0    0    0 S    0  0.0   0:00.02 migration/0
> >     3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
> >     4 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/1
> >     5 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
> >     6 root      RT   0     0    0    0 S    0  0.0   0:00.02 migration/2
> >     7 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/2
> >     8 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/3
> >     9 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/3
> >    10 root       5 -10     0    0    0 S    0  0.0   0:00.00 events/0
> > ----------------------------------------------------------------------------------
> >
> > The 2nd user is me - one SSH session running 'top', second session me
> > grabbing data from the log.  Nothing else is running.  'init' seems to
> > stay at the top of the 'top' list, and somehow, for no reason
> > whatsoever, the load goes from 0.00 to around 0.5 to 0.6 for about
> > 20-40 seconds, then drops back down to 0.00.  I don't see any other
> > processes running when the load spikes to ~0.60, and as you can see
> > from the 'top' list above, there's nothing in the %CPU column either,
> > which is the part that is driving me nuts.  SOMETHING is causing the
> > CPU load to spike, but not a single process is showing in 'top' as
> > using anything but 0% CPU.
> >
> > Any ideas what else I can try or how else to troubleshoot this to
> > figure out what on earth might be causing this?  I really don't see
> > any "processes" using CPU resources when this load issue occurs.
> >
> > Chris
> >
> > _______________________________________________
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge at dell.com
> > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > Please read the FAQ at http://lists.us.dell.com/faq
> 


