About KIPMI0 process

Chris - PowerEdge Linux List linux-poweredge at dotcomdesigners.com
Tue May 29 13:37:48 CDT 2007


----- Original Message ----- 
From: "Michael E Brown" <Michael_E_Brown at dell.com>
Sent: Tuesday, May 29, 2007 9:49 AM
Subject: Re: About KIPMI0 process


> On Fri, May 25, 2007 at 05:35:20PM -0700, Chance Reschke wrote:
>>
>> I don't follow this list closely and I'm just jumping into this
>> thread in the middle and apologize if I'm bringing something up
>> that's already been covered.
>>
>> Anyway, the load average doesn't necessarily have anything to do with
>> how busy or not busy any of your CPUs are.  Rather, it's a reflection
>> of the depth of the run queue.  The run queue can back up because
>> there aren't enough cycles to service the load being offered by the
>> computing you want to do, but it can just as easily be the product of
>> the CPUs waiting around for access to a device - usually a disk.
>> Look for processes in state 'D' - device-wait - and you might find
>> the culprit(s).  Even with very fast storage, this can easily happen
>> when two or more processes attempt to write to a single file
>> simultaneously.  Fast CPUs, fast disk, almost nothing even trying to
>> run, but high load average anyway.
>
> +1. I was just about to say the exact same thing.

Guys - I really appreciate the suggestions, and what you describe makes
sense, but this server was loaded with a RHEL 4.4 "minimal install" and as
yet has *nothing* running on it: no apps waiting for disk or anything else.
I just spent another half hour staring non-stop at 'top', watching the load
go from 0.00 to 0.60 every few minutes, and did not see any processes in
the 'D' state.  Everything sits in the usual 'S' state except 'top' itself,
which shows up as 'R', and that is normal.  Nothing changes when the load
hits 0.60: the top 25 processes remain the same, all in 'S' state, all
showing 0.00% CPU usage, yet the load climbs from 0.00 to 0.60 within a
10-20 second span every few minutes.
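
Since 'top' only refreshes every few seconds, something that runs briefly
could slip between its samples.  Below is a rough sketch of a logger that
could be left running instead.  It assumes a Python 3 interpreter is
available on the box, and the function names are just ones I made up for
illustration.  It reads /proc/loadavg and each /proc/<pid>/stat and prints
any process found in state 'R' or 'D', which are the states Linux counts
toward the load average.  It only looks at one entry per PID, so threads
inside a multi-threaded process would not be listed separately.

```python
"""Log the 1-minute load average alongside any tasks in state R or D."""
import os
import time


def parse_loadavg(text):
    """Return the 1-minute load and the 'running/total' field of /proc/loadavg."""
    fields = text.split()
    return float(fields[0]), fields[3]


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def busy_tasks(proc="/proc"):
    """List (pid, comm, state) for every process currently in state R or D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited between listdir() and open()
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found


def watch(interval=1.0):
    """Print one line per sample until interrupted with Ctrl-C."""
    while True:
        with open("/proc/loadavg") as f:
            load1, runnable = parse_loadavg(f.read())
        print(time.strftime("%H:%M:%S"), load1, runnable, busy_tasks(), flush=True)
        time.sleep(interval)
```

To use it, add a call to watch() at the bottom of the file, redirect the
output to a log, and look at the lines around the time the load starts to
climb.  The logger itself will always appear in the list as 'R'.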

Are you suggesting that this system somehow spins its disks down every few
minutes (5 or so?) and then has to spin them up again, and that this is
what causes the load?  That would make sense, but out of the 150+ boxes we
have with identical hardware and software, this is the only one showing
the problem, so I find it odd that its SCSI or RAID BIOS would be set to
spin drives down like this.  I don't think we would ever have set it up
that way, given the choice.  If this is the case, how do we turn off disk
spin-down?

For reference, I checked all the drives and they all pass with flying
colors, so there are no issues with drives going bad, and the RAID
controller is not reporting any errors either.

Chris 


