DOMSA 4.80 -- dcstor32d process 'D' state and workaround.

Andrew Rechenberg arechenberg at shermanfinancialgroup.com
Mon Jun 30 07:12:01 CDT 2003


I'm not sure about the Dell OMSA side, but you should definitely update
your kernel if you have a multi-processor machine.

There was a serious ext3 bug in 2.4.18-3 that can cause SMP systems to
panic, so I would update to a newer kernel if possible.  When you
update, you'll have to recompile the OMSA kernel modules as well, and
that may help with your issue.

Good luck,
Andy.


On Fri, 2003-06-27 at 16:36, Adrian Chung wrote:
> Hi there, we've been battling and trying to come up with a solution to
> a certain niggling problem for about a year now.
> 
> We SNMP poll our Linux servers for various system states to make sure
> they're healthy.  We do this every few minutes.  We've found that
> occasionally, anywhere from a couple of times a day to once or twice
> every few weeks, some of the servers stop responding to polls.  Upon
> investigation, a process listing shows one of the dcstor32d processes
> as stuck in the 'D' state.  Originally, we'd have to schedule a
> maintenance window to reboot, but we've since found a workaround,
> which allows us to restart the dellomsa components.  A process in 'D'
> can't be killed, and attempts to restart the dell components would
> fail miserably.  Our workaround allows us to successfully poll the box
> again for some time before it happens again.
> 
> A typical process list just showing dcstor32d:
> 
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> root     13439  0.0  1.2 17316 3296 ?        D    Dec03   0:20 /usr/sbin/dcstor32d
> [...]
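
A check like the process listing above could be automated by reading
/proc directly. The following is only a minimal sketch, assuming a Linux
/proc layout where /proc/<pid>/stat starts with "pid (comm) state"; the
function names parse_stat and find_blocked are made up for illustration
and are not part of OMSA.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def find_blocked(name="dcstor32d", proc_root="/proc"):
    """Return sorted PIDs of processes called `name` that are in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
        if comm == name and state == "D":
            blocked.append(pid)
    return sorted(blocked)


if __name__ == "__main__":
    for pid in find_blocked():
        print(f"dcstor32d pid {pid} is in uninterruptible sleep (D)")
```

A process can sit in 'D' briefly during normal disk or driver waits, so
a monitor built on this would probably want to see the same PID blocked
on several consecutive checks before raising an alarm.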
> 
> The workaround is as follows:
> 
>    - In /etc/delloma.d/omsa, we rename the file .omsaipc (actually in
>      /usr/lib/dell/openmanage/omsa/.omsaipc) to anything else.
>    - Then we run /usr/sbin/dcstor32d, which somehow makes the old
>      dcstor32d that is stuck in 'D' die.
>    - Then we do an /etc/init.d/dellomsa stop and start, and
>      everything's happy again.
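
The three steps above can be written down as a small script. This is a
rough sketch rather than a tested procedure: the paths are the ones
quoted in the post, while the ".stale" suffix, the helper names
build_plan and apply_plan, and the assumption that /usr/sbin/dcstor32d
returns after starting are all guesses made for illustration. It
defaults to a dry run that only prints what it would do.

```python
import os
import subprocess

IPC_FILE = "/usr/lib/dell/openmanage/omsa/.omsaipc"  # path quoted in the post
DCSTOR = "/usr/sbin/dcstor32d"
INIT_SCRIPT = "/etc/init.d/dellomsa"


def build_plan(ipc_file=IPC_FILE, suffix=".stale"):
    """List the workaround steps in order; `suffix` is an arbitrary choice."""
    return [
        ("rename", ipc_file, ipc_file + suffix),  # move .omsaipc out of the way
        ("run", [DCSTOR]),                        # start a fresh dcstor32d
        ("run", [INIT_SCRIPT, "stop"]),           # then restart the OMSA services
        ("run", [INIT_SCRIPT, "start"]),
    ]


def apply_plan(plan, dry_run=True):
    """Carry out the plan, or only print it when dry_run is true."""
    for step in plan:
        if dry_run:
            print("would", step[0], *step[1:])
        elif step[0] == "rename":
            os.rename(step[1], step[2])
        else:
            # Exit status is ignored: the post does not say what these
            # commands return when a process is stuck in 'D'.
            subprocess.call(step[1])


if __name__ == "__main__":
    apply_plan(build_plan())  # dry run; pass dry_run=False as root to act
```

Keeping the plan as data makes it easy to review the exact commands
before running anything as root on a production box.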
> 
> We've been trying to get to the bottom of why and how this manifests
> itself, but as far as we can tell it has something to do with
> functionality inside the esm.o module that we can't get at.
> 
> Does anyone else experience this?  Can anyone from Dell shed any light
> on this?
> 
> It is worthwhile to note that the servers in question are running:
>     - RedHat 7.3
>     - kernel-2.4.18-3
>     - dellomsa-drivers-4.80-3736
>     - dellomsa-4.80-3736
>     - ucd-snmp-utils-4.2.5-7.73.0
>     - ucd-snmp-4.2.5-7.73.0
>     - ucd-snmp-devel-4.2.5-7.73.0
> 
> The problem seems to happen more often during periods of higher
> network I/O, though boxes which are pretty much idle also exhibit the
> same problem.  All the boxes that this happens on are running
> iptables/Netfilter modules (1.2.6a).
> 
> --
> Adrian Chung (adrian at enfusion-group dot com)
> http://www.enfusion-group.com/~adrian/
> GPG Fingerprint: C620 C8EA 86BA 79CC 384C E7BE A10C 353B 919D 1A17
> [gambit.enfusion-group.com] 4:20pm up 34 days, 17:57, 10 users
> 
> _______________________________________________
> Linux-PowerEdge mailing list
> Linux-PowerEdge at dell.com
> http://lists.us.dell.com/mailman/listinfo/linux-poweredge
> Please read the FAQ at http://lists.us.dell.com/faq or search the list
> archives at http://lists.us.dell.com/htdig/
> 
> 



