About KIPMI0 process

Chance Reschke reschke at bakerlab.org
Fri May 25 19:35:20 CDT 2007


Hi,

I don't follow this list closely and I'm jumping into this thread in
the middle, so I apologize if I'm bringing up something that's already
been covered.

Anyway, the load average doesn't necessarily have anything to do with
how busy your CPUs are.  Rather, it reflects the depth of the run
queue, and on Linux it also counts processes in uninterruptible sleep.
The run queue can back up because there aren't enough cycles to
service the computing you want to do, but a high load average can just
as easily come from processes blocked waiting for access to a device,
usually a disk.  Look for processes in state 'D' (device wait, i.e.
uninterruptible sleep) in the 'S' column of 'top', and you might find
the culprit(s).  Even with very fast storage, this can easily happen
when two or more processes attempt to write to a single file
simultaneously.  The result: fast CPUs, fast disk, almost nothing even
trying to run, but a high load average anyway.
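
To make the check above concrete, here is a minimal Python sketch,
assuming a Linux /proc filesystem.  It reads the load average and lists
the processes currently in state 'R' (runnable) or 'D' (uninterruptible
sleep).  The function names are hypothetical, chosen only for
illustration, and the scan covers /proc/<pid> entries only, so
individual threads of a multi-threaded process are not examined.

```python
import os


def parse_loadavg(text):
    """Parse /proc/loadavg contents: '0.60 0.23 0.08 1/59 12345'.

    Returns (1min, 5min, 15min, runnable_now, total_tasks).
    """
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return one, five, fifteen, runnable, total


def parse_stat(stat_line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the first '(' and the last ')'.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_states(states=("R", "D"), proc_root="/proc"):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unreadable
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)


def report(proc_root="/proc"):
    """Print the current load average and any R/D-state processes."""
    with open(os.path.join(proc_root, "loadavg")) as fh:
        print("load:", parse_loadavg(fh.read()))
    for pid, comm, state in tasks_in_states(proc_root=proc_root):
        print(state, pid, comm)
```

Calling report() once a second while the load is elevated, rather than
sorting by %CPU, should show whether something is sitting in 'D' even
though it uses no CPU time.  The script itself will normally appear in
state 'R' while it runs.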

Good luck!

  - Chance

--
Chance Reschke
Department of Biochemistry
University of Washington



On May 25, 2007, at 5:20 PM, Chris - PowerEdge Linux List wrote:

> ----- Original Message -----
> From: "Michael E Brown" <Michael_E_Brown at dell.com>
> Sent: Wednesday, May 23, 2007 3:56 PM
> Subject: Re: About KIPMI0 process
>
>>> What do we have to do to turn all this stuff completely off to bring
>>> the CPU load down to 0.00 when it's not running anything at all?  I'm
>>> open to any and all suggestions at this point.  We've resisted
>>> putting this server into production.  I know this is considered
>>> "harmless" load by Dell, but it really messes up our monitoring
>>> systems and alters the true CPU load that we monitor for best
>>> application processing.  There's no reason we should be seeing
>>> anything but 0.00 on a system that has nothing installed and nothing
>>> running on it.
>>
>> You sure it isn't some random system daemon? You haven't provided any
>> data to show what is causing the CPU load.
>
> That's the problem.  I disabled virtually all daemons that get
> installed with 'RHEL4 minimal install' and I've been watching 'top'.
> I even have a script running that constantly checks the load and, if
> it exceeds 0.40, logs the top 20 processes once a second, and NOTHING
> is showing...  I just spent the last 15 minutes staring non-stop at
> 'top' and here's what the results look like when the load suddenly
> spikes to 0.60:
>
> ----------------------------------------------------------------------
> Thu May 24 17:04:02 PDT 2007
> top - 17:04:03 up 1 day, 22:08,  2 users,  load average: 0.60, 0.23, 0.08
> Tasks:  59 total,   1 running,  58 sleeping,   0 stopped,   0 zombie
> Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu1  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu2  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu3  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Mem:   4149240k total,   336916k used,  3812324k free,    46620k buffers
> Swap:  4192956k total,        0k used,  4192956k free,   231060k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>     1 root      15   0  3536  548  472 S    0  0.0   0:00.63 init
>     2 root      RT   0     0    0    0 S    0  0.0   0:00.02 migration/0
>     3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
>     4 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/1
>     5 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
>     6 root      RT   0     0    0    0 S    0  0.0   0:00.02 migration/2
>     7 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/2
>     8 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/3
>     9 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/3
>    10 root       5 -10     0    0    0 S    0  0.0   0:00.00 events/0
> ----------------------------------------------------------------------
>
> The 2nd user is me: one SSH session running 'top', and a second
> session where I'm grabbing data from the log.  Nothing else is
> running.  'init' seems to stay at the top of the 'top' list, and
> somehow, for no reason whatsoever, the load goes from 0.00 to around
> 0.5 to 0.6 for about 20-40 seconds, then drops back down to 0.00.  I
> don't see any other processes running when the load spikes to ~0.60,
> and as you can see from the 'top' list above, there's nothing in the
> %CPU column either, which is the part that is driving me nuts.
> SOMETHING is causing the CPU load to spike, but not a single process
> is showing in 'top' as using anything but 0% CPU.
>
> Any ideas what else I can try or how else to troubleshoot this to
> figure out what on earth might be causing this?  I really don't see
> any "processes" using CPU resources when this load issue occurs.
>
> Chris
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq


