About KIPMI0 process

RB aoz.syn at gmail.com
Tue May 29 21:57:34 CDT 2007


Too many daemons/options in this little head of mine, too little
sleep; strike the earlier "audit" statement and replace it with
"ACCT":

[bob@tst ~] sudo zgrep ACCT /proc/config.gz
Password:
CONFIG_BSD_PROCESS_ACCT=y

http://www.gnu.org/directory/acct.html
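
For anyone who would rather script the check than zgrep by hand, here
is a minimal sketch. It assumes the kernel exposes its build config at
/proc/config.gz (that needs CONFIG_IKCONFIG_PROC, which not every
distro kernel enables). The function names are my own, not from any
tool mentioned in this thread.

```python
import gzip
import re


def parse_kernel_config(text):
    """Map option name -> value; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if match:
            options[match.group(1)] = "n"
    return options


def load_running_config(path="/proc/config.gz"):
    """Read and parse the running kernel's config (path is an assumption)."""
    with gzip.open(path, "rt") as fh:
        return parse_kernel_config(fh.read())


if __name__ == "__main__":
    try:
        config = load_running_config()
    except OSError as exc:
        # Missing file, no permission, or not gzip data.
        print(f"cannot read kernel config: {exc}")
    else:
        for name in ("CONFIG_BSD_PROCESS_ACCT", "CONFIG_AUDIT"):
            print(name, "=", config.get(name, "(not present)"))
```

An option that is absent from the file altogether is reported as
"(not present)", which is different from one that is explicitly
"not set".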


RB

On 5/29/07, RB <aoz.syn at gmail.com> wrote:
> http://people.redhat.com/sgrubb/audit/
>
> [bob@tst ~] sudo zgrep AUDIT /proc/config.gz
> Password:
> # CONFIG_AUDIT is not set
>
> My system's not set up to do it, but if you're looking to find out
> *every* userspace process consuming resources, CONFIG_AUDIT and its
> accompanying tools are where you'll need to go.
>
>
> RB
>
>
> On 5/29/07, Chance Reschke <reschke at bakerlab.org> wrote:
> > Hi,
> >
> > A couple of things:
> >
> > 1) ditto to ME_Brown at dell on the disk spin-down, etc.
> >
> > 2) You say you have 150 identical boxes and that only one exhibits
> > this behavior in the top(1) output.  Really?  You've set up all 150
> > with minimal installs, killed every daemon, fired up top and stared
> > at the screen for hours?  Sounds hellish! ;)
> >
> > My guess is that there's nothing wrong and that some normal low-level
> > process is occasionally having to wait a bit for access to a device.
> > There's lots of nonsense in even a minimal install these days -
> > selinux, cpuspeed management, irq balancing, and on and on.  Any of
> > this stuff could be the culprit.
> >
> > Good luck!
> >
> >   - Chance
> >
> >
> > --
> > Chance Reschke
> > Biochemistry Department
> > University of Washington
> >
> >
> >
> > On May 29, 2007, at 2:38 PM, Michael E Brown wrote:
> >
> > > On Tue, May 29, 2007 at 11:37:48AM -0700, Chris - PowerEdge Linux
> > > List wrote:
> > >> ----- Original Message -----
> > >> From: "Michael E Brown" <Michael_E_Brown at dell.com>
> > >> Sent: Tuesday, May 29, 2007 9:49 AM
> > >> Subject: Re: About KIPMI0 process
> > >>
> > >>
> > >>> On Fri, May 25, 2007 at 05:35:20PM -0700, Chance Reschke wrote:
> > >>>>
> > > >>>> I don't follow this list closely and I'm just jumping into this
> > >>>> thread in the middle and apologize if I'm bringing something up
> > >>>> that's already been covered.
> > >>>>
> > > >>>> Anyway, the load average doesn't necessarily have anything to do
> > > >>>> with how busy or not busy any of your CPUs are.  Rather, it's a
> > > >>>> reflection of the depth of the run queue.  The run queue can back
> > > >>>> up because there aren't enough cycles to service the load being
> > > >>>> offered by the computing you want to do, but it can just as easily
> > > >>>> be the product of the CPUs waiting around for access to a device -
> > > >>>> usually a disk.  Look for processes in state 'D' - device-wait -
> > > >>>> and you might find the culprit(s).  Even with very fast storage,
> > > >>>> this can easily happen when two or more processes attempt to write
> > > >>>> to a single file simultaneously.  Fast CPUs, fast disk, almost
> > > >>>> nothing even trying to run, but high load average anyway.
> > >>>
> > >>> +1. I was just about to say the exact same thing.
> > >>
> > > >> Guys - really appreciate these suggestions, and while what you
> > > >> suggest does make logical sense - this server was loaded with RHEL
> > > >> 4.4 "minimal install" and as of yet has *nothing* running on it, no
> > > >> apps waiting for disk or anything else.  I just spent another half
> > > >> hour staring non-stop at 'top' watching the load go from 0.00 to
> > > >> 0.60 every few minutes and did not see any processes in the 'D'
> > > >> state - just the usual 'S', while 'top' itself shows up as state
> > > >> 'R' which is normal - and there's nothing else running - all other
> > > >> active processes are permanently in the 'S' state and nothing
> > > >> changes when the load hits 0.60.  Nothing.  The top 25 processes
> > > >> remain the same, all in 'S' state, all showing 0.00% CPU usage, yet
> > > >> load goes from 0.00 to 0.60 within a 10-20 second span every few
> > > >> minutes.
> > >
> > > > What about the rest of the processes? If one of the bottom 25
> > > > processes went zombie, it would bring the load average up without
> > > > showing up in top.
> > >
> > >>
> > >> Are you suggesting that perhaps this system somehow spins down
> > >> disks every
> > >
> > > > That isn't what I was suggesting. Every process in an uninterruptible
> > > > wait will drive up the load average. For example, zombie processes do
> > > > this. If, for example, you have a process that forks off another and
> > > > then doesn't clean it up for a few seconds when it quits, you will see
> > > > the load average spike. Another example would be, as the original
> > > > poster mentioned, processes waiting on disk I/O.
> > > >
> > > > The point, really, is that load average isn't always an accurate
> > > > measure of CPU 'busyness'.
> > > >
> > > > Also, your earlier suggestion that ipmi or dell_rbu might have
> > > > something to do with it seems rather unlikely to me. If you don't have
> > > > the ipmi modules loaded, I don't see that it could cause your system
> > > > to be busy. And the dell_rbu driver doesn't do *anything* unless you
> > > > specifically are doing a BIOS update at that exact second.
> > >
> > > >> few minutes (5 or so?) and then has to spin them up again, which is
> > > >> what might be causing this?  That makes sense, but out of 150+
> > > >> identical hardware boxes we have, this one is the only one that is
> > > >> experiencing this problem so I find it odd that the SCSI or RAID
> > > >> bios would have settings set to spin drives down like this - again,
> > > >> with identical hardware and software loaded.  I don't think we
> > > >> would have ever set it up like this, given the choice.  If this is
> > > >> the case, how do we turn off disk spin-down?
> > >>
> > > >> For reference, I checked all drives and they all pass with flying
> > > >> colors, so there are no issues with the drives going bad or RAID
> > > >> controller reporting any errors either.
> > >
> > > > Shooting in the dark: have you checked all of the RAID card settings
> > > > to ensure they are identical between systems? BIOS Settings? Things
> > > > like 'patrol read' on some raid cards *might* do something similar.
> > > --
> > > Michael
> > >
> > > _______________________________________________
> > > Linux-PowerEdge mailing list
> > > Linux-PowerEdge at dell.com
> > > http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > > Please read the FAQ at http://lists.us.dell.com/faq
> >
>
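
On the load-average discussion quoted above: rather than staring at
the top 25 rows of top(1), it may be easier to count the state of
*every* process straight from /proc. Below is a rough sketch that
assumes a Linux /proc layout; the function names are mine. As far as I
know, Linux computes load average from runnable (R) and
uninterruptible (D) tasks, so the sketch lists Z separately to let the
zombie theory be checked directly. It only looks at each process's
main thread, so a D-state worker thread could still be missed.

```python
import os
from collections import Counter


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def snapshot(proc_root="/proc"):
    """Collect (pid, comm, state) for every numeric entry under proc_root."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                procs.append(parse_stat_line(fh.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable; skip it
    return procs


if __name__ == "__main__":
    try:
        procs = snapshot()
        print("loadavg:", os.getloadavg())
    except OSError as exc:
        print(f"cannot inspect processes: {exc}")
    else:
        print("states:", dict(Counter(state for _, _, state in procs)))
        for pid, comm, state in procs:
            if state in ("D", "Z"):
                print(f"  {state} {pid} {comm}")
```

Run it once a second while the load climbs (for example under
watch(1)) and see whether anything ever shows up in D or Z.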


