DOMSA 4.80 -- dcstor32d process 'D' state and workaround.
arechenberg at shermanfinancialgroup.com
Mon Jun 30 07:12:01 CDT 2003
I'm not sure about the Dell OMSA, but you should definitely update your
kernel if you have a multi-processor machine.
There was a serious ext3 bug in 2.4.18-3 that can cause SMP systems to
panic, so I would update to a newer kernel if possible. After you
update, you'll have to recompile the OMSA kernel modules as well, and
that may help with your issue.
On Fri, 2003-06-27 at 16:36, Adrian Chung wrote:
> Hi there, we've been battling and trying to come up with a solution to
> a certain niggling problem for about a year now.
> We SNMP poll our Linux servers for various system states to make sure
> they're healthy. We do this every few minutes. We've found that
> occasionally, anywhere from a couple of times a day to once or twice
> every few weeks, some of the servers stop responding to polls. Upon
> investigation, a process listing shows one of the dcstor32d processes
> as stuck in the 'D' state. Originally, we'd have to schedule a
> maintenance window to reboot, but we've since found a workaround,
> which allows us to restart the dellomsa components. A process in the
> 'D' state can't be killed, and attempts to restart the Dell components
> would fail. Our workaround allows us to successfully poll the box
> again for some time before it happens again.
> A typical process list just showing dcstor32d:
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> root     13439  0.0  1.2 17316 3296 ?        D    Dec03   0:20
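
A check like the one described above could be automated. The following is only a minimal sketch, assuming `ps aux`-style output with the same column header as the listing above; the function names `find_uninterruptible` and `scan_live` are made up for illustration and are not part of OMSA or any Dell tool.

```python
import subprocess


def find_uninterruptible(ps_output, name=None):
    """Return (pid, stat, command) for each process whose STAT starts with 'D'.

    ps_output is text in `ps aux` format, header line included.
    If name is given, only processes whose command contains it are returned.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = header.index("COMMAND")
    stuck = []
    for line in lines[1:]:
        # Split at most cmd_idx times so the command keeps its arguments.
        fields = line.split(None, cmd_idx)
        if len(fields) <= stat_idx:
            continue  # malformed or truncated line
        command = fields[cmd_idx] if len(fields) > cmd_idx else ""
        if not fields[stat_idx].startswith("D"):
            continue
        if name is not None and name not in command:
            continue
        stuck.append((int(fields[pid_idx]), fields[stat_idx], command))
    return stuck


def scan_live(name="dcstor32d"):
    """Run `ps aux` on this host and report matching 'D' state processes."""
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    return find_uninterruptible(out, name=name)
```

Calling `scan_live()` from cron every few minutes and alerting on a non-empty result would flag the hang before the SNMP polls start failing. A process that is only briefly in 'D' during normal disk I/O would also match, so it is probably worth requiring two consecutive hits for the same PID.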
> The workaround is as follows:
> - In /etc/delloma.d/omsa, we rename the file .omsaipc (actually in
> /usr/lib/dell/openmanage/omsa/.omsaipc) to anything else.
> - Then we run /usr/sbin/dcstor32d, which makes the old dcstor32d
> which is stuck in 'D' die somehow.
> - Then we do an /etc/init.d/dellomsa stop and start, and
> everything's happy again.
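
The three steps above could be scripted roughly as follows. This is a sketch only, not a tested recovery tool: the paths are the ones given in the post, while the `.stale` suffix for the renamed file and the function names are assumptions made for illustration. It defaults to a dry run that only reports what it would do.

```python
import os
import subprocess

# Paths as given in the post.
IPC_FILE = "/usr/lib/dell/openmanage/omsa/.omsaipc"
DCSTOR = "/usr/sbin/dcstor32d"
INIT_SCRIPT = "/etc/init.d/dellomsa"


def workaround_steps(ipc_file=IPC_FILE, suffix=".stale"):
    """Return the workaround as an ordered list of steps; nothing is run here.

    The suffix is an arbitrary choice: the post only says the file is
    renamed "to anything else".
    """
    return [
        ("rename", ipc_file, ipc_file + suffix),
        ("run", [DCSTOR]),
        ("run", [INIT_SCRIPT, "stop"]),
        ("run", [INIT_SCRIPT, "start"]),
    ]


def apply_steps(steps, dry_run=True):
    """Log each step as a shell-like line; execute it only if dry_run is False."""
    log = []
    for step in steps:
        if step[0] == "rename":
            log.append("mv %s %s" % (step[1], step[2]))
            if not dry_run:
                os.rename(step[1], step[2])
        else:
            log.append(" ".join(step[1]))
            if not dry_run:
                # Exit codes are ignored in this sketch.
                subprocess.call(step[1])
    return log
```

Running `print("\n".join(apply_steps(workaround_steps())))` shows the planned commands without touching the system. With `dry_run=False` it has to run as root, and it assumes that `/usr/sbin/dcstor32d` returns control to the caller after starting; if it stays in the foreground, the script will block at that step.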
> We've been trying to get to the bottom of why and how this manifests
> itself, but as far as we can tell it has something to do with
> functionality inside the esm.o module that we can't get at.
> Does anyone else experience this? Can anyone from Dell shed any light
> on this?
> It is worthwhile to note that the servers in question are running:
> - RedHat 7.3
> - kernel-2.4.18-3
> - dellomsa-drivers-4.80-3736
> - dellomsa-4.80-3736
> - ucd-snmp-utils-4.2.5-7.73.0
> - ucd-snmp-4.2.5-7.73.0
> - ucd-snmp-devel-4.2.5-7.73.0
> The problem seems to happen more often during periods of higher
> network I/O, though boxes which are pretty much idle also exhibit the
> same problem. All the boxes that this happens on are running
> iptables/Netfilter modules (1.2.6a).
> Adrian Chung (adrian at enfusion-group dot com)
> GPG Fingerprint: C620 C8EA 86BA 79CC 384C E7BE A10C 353B 919D 1A17
> [gambit.enfusion-group.com] 4:20pm up 34 days, 17:57, 10 users