About KIPMI0 process

Michael E Brown Michael_E_Brown at dell.com
Tue May 29 16:43:53 CDT 2007


On Tue, May 29, 2007 at 11:37:48AM -0700, Chris - PowerEdge Linux List wrote:
> ----- Original Message ----- 
> From: "Michael E Brown" <Michael_E_Brown at dell.com>
> Sent: Tuesday, May 29, 2007 9:49 AM
> Subject: Re: About KIPMI0 process
> 
> 
> > On Fri, May 25, 2007 at 05:35:20PM -0700, Chance Reschke wrote:
> >>
> >> I don't follow this list closely and I'm just jumping into this
> >> thread in the middle and apologize if I'm bringing something up
> >> that's already been covered.
> >>
> >> Anyway, the load average doesn't necessarily have anything to do with
> >> how busy or not busy any of your CPUs are.  Rather, it's a reflection
> >> of the depth of the run queue.  The run queue can back up because
> >> there aren't enough cycles to service the load being offered by the
> >> computing you want to do, but it can just as easily be the product of
> >> the CPUs waiting around for access to a device - usually a disk.
> >> Look for processes in state 'D' - device-wait - and you might find
> >> the culprit(s).  Even with very fast storage, this can easily happen
> >> when two or more processes attempt to write to a single file
> >> simultaneously.  Fast CPUs, fast disk, almost nothing even trying to
> >> run, but high load average anyway.
> >
> > +1. I was just about to say the exact same thing.
> 
> Guys - really appreciate these suggestions, and while what you suggest does 
> make logical sense - this server was loaded with RHEL 4.4 "minimal install" 
> and as of yet has *nothing* running on it, no apps waiting for disk or 
> anything else.  I just spent another half hour staring non-stop at 'top' 
> watching the load go from 0.00 to 0.60 every few minutes and did not see any 
> processes in the 'D' state - just the usual 'S', while 'top' itself shows up 
> as state 'R' which is normal - and there's nothing else running - all other 
> active processes are permanently in the 'S' state and nothing changes when 
> the load hits 0.60.  Nothing.  The top 25 processes remain the same, all in 
> 'S' state, all showing 0.00% CPU usage, yet load goes from 0.00 to 0.60 
> within a 10-20 second span every few minutes.
> 
> Are you suggesting that perhaps this system somehow spins down disks every 
> few minutes (5 or so?) and then has to spin them up again, which is what 
> might be causing this?  That makes sense, but out of 150+ identical hardware 
> boxes we have, this one is the only one that is experiencing this problem so 
> I find it odd that the SCSI or RAID BIOS would have settings set to spin 
> drives down like this - again, with identical hardware and software loaded. 
> I don't think we would have ever set it up like this, given the choice.  If 
> this is the case, how do we turn off disk spin-down?
> 
> For reference, I checked all drives and they all pass with flying colors, so 
> there are no issues with the drives going bad or RAID controller reporting any 
> errors either.

Are you getting an unusually high number of interrupts during the busy
periods? Try "watch cat /proc/interrupts".
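
If the counters scroll by too fast to compare by eye, the same check can
be scripted. What follows is only a minimal sketch, assuming a Python 3
interpreter is available (a stock RHEL 4 install will not have one). It
takes two snapshots of /proc/interrupts and prints the IRQ lines whose
counts grew the most in between. The function names and the 2-second
interval are illustrative choices, not part of any existing tool.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (count summed over all CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Whatever follows the per-CPU counts names the controller/device.
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def interrupt_deltas(before, after):
    """Return [(delta, label, desc)], largest increase first."""
    deltas = []
    for label, (count, desc) in after.items():
        delta = count - before.get(label, (0, ""))[0]
        if delta > 0:
            deltas.append((delta, label, desc))
    return sorted(deltas, reverse=True)


def read_interrupts(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


def main(interval=2):
    # interval is in seconds; 2 is an arbitrary choice for illustration.
    try:
        first = read_interrupts()
        time.sleep(interval)
        second = read_interrupts()
    except OSError as exc:
        print("cannot read /proc/interrupts:", exc)
        return
    for delta, label, desc in interrupt_deltas(first, second)[:10]:
        print(f"{label:>5} +{delta:<8} {desc}")


if __name__ == "__main__":
    main()
```

Running it once while the load is at 0.00 and again while it is climbing
gives two lists to compare. An IRQ that only shows up in the second list
is worth a closer look.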

What about vmstat?  "vmstat 1"
or iostat?  "iostat 1"

There are tools other than 'top' to see what is going on in your system.
If it comes down to it and you *really* want to see what is going on,
systemtap and (iirc) frysk let you probe the kernel.
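
Short of probing the kernel, a cheaper step is to log the load average
next to the tasks that are actually in state R or D, since those are the
states the Linux load average counts and a task that is runnable for only
a moment is easy to miss in 'top'. The following is a rough sketch under
the same assumption of a Python 3 interpreter. The function names, sample
count, and interval are hypothetical choices, and it only looks at the
entries directly under /proc, not at per-thread entries in
/proc/<pid>/task.

```python
import os
import time


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def busy_tasks(proc="/proc"):
    """List (pid, comm, state) for tasks currently in state R or D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited between listdir() and open()
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found


def main(samples=5, interval=1.0):
    # Defaults are deliberately small; raise samples to cover several
    # minutes when chasing a spike that only shows up now and then.
    me = os.getpid()
    for _ in range(samples):
        try:
            with open("/proc/loadavg") as f:
                load1 = f.read().split()[0]
            # Skip this script itself, which is always in state R here.
            tasks = [t for t in busy_tasks() if t[0] != me]
        except OSError as exc:
            print("cannot read /proc:", exc)
            return
        print(time.strftime("%H:%M:%S"), load1, tasks)
        time.sleep(interval)


if __name__ == "__main__":
    main()
```

A task name that appears in the log each time the 1-minute figure starts
to rise is a reasonable first suspect. Once-a-second sampling can still
miss very short bursts, so an empty log does not rule anything out.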
--
Michael


