About KIPMI0 process

Chance Reschke reschke at bakerlab.org
Tue May 29 17:07:28 CDT 2007


Hi,

A couple of things:

1) ditto to ME_Brown at dell on the disk spin-down, etc.

2) You say you have 150 identical boxes and that only one exhibits  
this behavior in the top(1) output.  Really?  You've set up all 150  
with minimal installs, killed every daemon, fired up top and stared  
at the screen for hours?  Sounds hellish! ;)

My guess is that there's nothing wrong and that some normal low-level  
process is occasionally having to wait a bit for access to a device.   
There's lots of nonsense in even a minimal install these days -  
selinux, cpuspeed management, irq balancing, and on and on.  Any of  
this stuff could be the culprit.
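
If you'd rather not stare at top, something along these lines could do the
watching for you.  This is only a rough sketch, assuming a Linux /proc
filesystem and Python 3; the function names (parse_stat_line, non_sleeping,
watch) are made up for illustration.  On Linux the load average counts tasks
that are runnable ('R') or in uninterruptible sleep ('D'), so the sketch logs
the 1-minute load along with every process caught in either state, looking at
all processes rather than only the ones that fit on the top screen.

```python
import os
import time


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm sits in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def non_sleeping(stat_lines, states=("R", "D")):
    """Pick out the processes whose state is one of `states`."""
    found = []
    for line in stat_lines:
        try:
            pid, comm, state = parse_stat_line(line)
        except (ValueError, IndexError):
            continue  # skip anything that does not look like a stat line
        if state in states:
            found.append((pid, comm, state))
    return found


def read_all_stat_lines(proc="/proc"):
    """Read /proc/<pid>/stat for every process currently listed."""
    lines = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
    return lines


def watch(samples=60, interval=1.0):
    """Print the 1-minute load and any R/D processes once per interval."""
    me = os.getpid()
    for _ in range(samples):
        load1 = os.getloadavg()[0]
        # Leave out this script itself, which is always running when it looks.
        hits = [h for h in non_sleeping(read_all_stat_lines()) if h[0] != me]
        print(time.strftime("%H:%M:%S"), "load1=%.2f" % load1, hits)
        time.sleep(interval)
```

Calling watch(samples=600, interval=0.5) would sample for about five minutes.
Sampling can still miss a process that is only in 'D' for a few milliseconds,
so a shorter interval gives a better chance of catching it.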

Good luck!

  - Chance


--
Chance Reschke
Biochemistry Department
University of Washington



On May 29, 2007, at 2:38 PM, Michael E Brown wrote:

> On Tue, May 29, 2007 at 11:37:48AM -0700, Chris - PowerEdge Linux List wrote:
>> ----- Original Message -----
>> From: "Michael E Brown" <Michael_E_Brown at dell.com>
>> Sent: Tuesday, May 29, 2007 9:49 AM
>> Subject: Re: About KIPMI0 process
>>
>>
>>> On Fri, May 25, 2007 at 05:35:20PM -0700, Chance Reschke wrote:
>>>>
>>>> I don't follow this list closely and I'm just jumping into this
>>>> thread in the middle, so I apologize if I'm bringing something up
>>>> that's already been covered.
>>>>
>>>> Anyway, the load average doesn't necessarily have anything to do with
>>>> how busy or not busy any of your CPUs are.  Rather, it's a reflection
>>>> of the depth of the run queue.  The run queue can back up because
>>>> there aren't enough cycles to service the load being offered by the
>>>> computing you want to do, but it can just as easily be the product of
>>>> the CPUs waiting around for access to a device - usually a disk.
>>>> Look for processes in state 'D' - device-wait - and you might find
>>>> the culprit(s).  Even with very fast storage, this can easily happen
>>>> when two or more processes attempt to write to a single file
>>>> simultaneously.  Fast CPUs, fast disk, almost nothing even trying to
>>>> run, but high load average anyway.
>>>
>>> +1. I was just about to say the exact same thing.
>>
>> Guys - really appreciate these suggestions, and while what you suggest does
>> make logical sense - this server was loaded with RHEL 4.4 "minimal install"
>> and as of yet has *nothing* running on it, no apps waiting for disk or
>> anything else.  I just spent another half hour staring non-stop at 'top'
>> watching the load go from 0.00 to 0.60 every few minutes and did not see any
>> processes in the 'D' state - just the usual 'S', while 'top' itself shows up
>> as state 'R' which is normal - and there's nothing else running - all other
>> active processes are permanently in the 'S' state and nothing changes when
>> the load hits 0.60.  Nothing.  The top 25 processes remain the same, all in
>> 'S' state, all showing 0.00% CPU usage, yet load goes from 0.00 to 0.60
>> within a 10-20 second span every few minutes.
>
> What about the rest of the processes? If one of the bottom 25 processes
> went zombie, it would bring the load average up without showing up in
> top.
>
>>
>> Are you suggesting that perhaps this system somehow spins down disks every
>
> That isn't what I was suggesting. Every process in an uninterruptible
> wait will drive up the load average. For example, zombie processes do
> this. If, for example, you have a process that forks off another and
> then doesn't clean it up for a few seconds when it quits, you will see
> the load average spike. Another example would be, as the original poster
> mentioned, processes waiting on disk I/O.
>
> The point, really, is that load average isn't always an accurate measure
> of CPU 'busyness'.
>
> Also, your earlier suggestion that ipmi or dell_rbu might have something
> to do with it seems rather unlikely to me. If you don't have the ipmi
> modules loaded, I don't see that it could cause your system to be busy.
> And the dell_rbu driver doesn't do *anything* unless you specifically are
> doing a BIOS update at that exact second.
>
>> few minutes (5 or so?) and then has to spin them up again, which is what
>> might be causing this?  That makes sense, but out of 150+ identical hardware
>> boxes we have, this one is the only one that is experiencing this problem so
>> I find it odd that the SCSI or RAID bios would have settings set to spin
>> drives down like this - again, with identical hardware and software loaded.
>> I don't think we would have ever set it up like this, given the choice.  If
>> this is the case, how do we turn off disk spin-down?
>>
>> For reference, I checked all drives and they all pass with flying colors, so
>> there are no issues with the drives going bad or RAID controller reporting any
>> errors either.
>
> Shooting in the dark: have you checked all of the RAID card settings to
> ensure they are identical between systems? BIOS Settings? Things like
> 'patrol read' on some raid cards *might* do something similar.
> --
> Michael
>
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq


