omreport segfault and OMSA with nagios many semaphore

Chandrasekhar_R at Dell.com Chandrasekhar_R at Dell.com
Fri Sep 23 11:07:17 CDT 2011


Hi Shawn,

Can you please make sure your storage-related drivers and firmware components are up to date?

Can you tell us the OMSA commands you are using? What are the storage controllers you have?

Thanks,
Chandrasekhar R
Dell | OpenManage
office +91 80 41178649


Message: 5
Date: Thu, 22 Sep 2011 01:03:57 -0700
From: shawn at systemtemplar.org
Subject: omreport segfault and OMSA with nagios many semaphore
To: linux-poweredge at dell.com
Message-ID: <4E7AEBED.4030308 at systemtemplar.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

>  Hmm...
>
>  So I guess this is only happening to me?
>
>  Strange. I have tried a firmware update in different versions of OMSA
> and CentOS, but all failed.

Hello, I am having the exact same issue on a pair of R710s running Scientific Linux 6.

I'm using the Nagios plugin check_openmanage, which is nothing more than a Perl script that scrapes 'omreport' and puts the results into a Nagios-readable format. With the number of checks I'm doing, this amounts to running omreport 20 times every 5 minutes.

Each server has the exact same packages (managed with puppet) and I was not having the problem until an update from last week (details incoming). Since then I've been running out of semaphores, to the point where I had to set up the following cron.hourly script:

#!/bin/bash
## NOTE: I don't use semaphores for anything so I can happily clear them all. ymmv.
# Remove every semaphore array with a 0x0... key and perms 600, then restart OMSA.
for i in $(ipcs -s | grep "0x0" | grep 600 | awk '{print $2}'); do ipcrm -s "$i"; done > /dev/null 2>&1
/opt/dell/srvadmin/sbin/srvadmin-services.sh restart > /dev/null 2>&1

I was doing a cron.daily script, but I started getting too many false alerts, so I switched to hourly. I'm ready to turn off OMSA and switch to ipmitool.
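
Before clearing anything, it can help to see how close the box actually is to the kernel limit. The following is only a rough sketch, assuming a Linux host where `ipcs -s` prints one line per semaphore array beginning with a `0x` key, and where /proc/sys/kernel/sem holds SEMMSL, SEMMNS, SEMOPM and SEMMNI in that order. The function names are made up for illustration and are not part of OMSA.

```shell
#!/bin/bash
# Count semaphore arrays from `ipcs -s` style output on stdin.
# Data lines start with a hex key such as 0x00000000; headers do not.
count_sem_arrays() {
    awk '$1 ~ /^0x/ { n++ } END { print n + 0 }'
}

# Print the 4th field of /proc/sys/kernel/sem (SEMMNI, the max number of arrays).
# The optional path argument exists only so the function can be tried on a sample file.
max_sem_arrays() {
    awk '{ print $4 }' "${1:-/proc/sys/kernel/sem}"
}

# Report usage on the local machine, if the tools are present.
if command -v ipcs > /dev/null 2>&1 && [ -r /proc/sys/kernel/sem ]; then
    echo "semaphore arrays in use: $(ipcs -s | count_sem_arrays) of $(max_sem_arrays)"
fi
```

Logging that one line from cron every few minutes would show whether the count climbs steadily with each omreport run or jumps when dsm_sa_datamgrd crashes.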

/var/log/messages has a boatload of these:
Sep 21 17:01:01 ndb02 kernel: dsm_sa_datamgrd[25547]: segfault at 8 ip 00007fe5af91ceaa sp 00007fe5a6bdbde8 error 4 in libdsm_sm_queue.so[7fe5af917000+a000]
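
For anyone who wants to pass this on to Dell: the kernel line gives the instruction pointer (ip) and the base address where the library is mapped (the number before the `+` in the brackets), so subtracting the two gives the offset of the crash inside libdsm_sm_queue.so. The fault address of 8 also looks like a near-NULL pointer dereference, though that is a guess from the log alone. A small sketch of the arithmetic, with a function name invented for the example:

```shell
#!/bin/bash
# Given a kernel "segfault at ..." line on stdin, print the faulting offset
# inside the mapped library: instruction pointer minus mapping base address.
segfault_offset() {
    local line ip base
    read -r line
    # "ip 00007fe5af91ceaa" -> instruction pointer (hex, no 0x prefix)
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    # "libfoo.so[7fe5af917000+a000]" -> base address of the mapping
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\]$/\1/p')
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

echo 'Sep 21 17:01:01 ndb02 kernel: dsm_sa_datamgrd[25547]: segfault at 8 ip 00007fe5af91ceaa sp 00007fe5a6bdbde8 error 4 in libdsm_sm_queue.so[7fe5af917000+a000]' | segfault_offset
# prints 0x5eaa
```

If the offset is the same in every log entry, that suggests one consistent crash site rather than random memory corruption.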

So, what's this about an update, and it was working before? Well, sir, this server was working fine for the past 3 months with no changes. We got to an upgrade cycle, upgraded several packages, and now it's problematic.

Here's what we updated:

Packages Installed:
     kernel-devel-2.6.32-131.12.1.el6.x86_64
     kernel-2.6.32-131.12.1.el6.x86_64

  Packages Updated:
     curl-7.19.7-26.el6_1.1.x86_64
     libcurl-7.19.7-26.el6_1.1.x86_64
     selinux-policy-targeted-3.7.19-93.el6_1.7.noarch
     nss-softokn-3.12.9-3.el6.x86_64
     perl-Compress-Zlib-2.020-119.el6.x86_64
     subversion-1.6.11-2.el6_1.4.x86_64
     1:dbus-libs-1.2.24-5.el6_1.x86_64
     nss-3.12.9-12.el6_1.x86_64
     4:perl-5.10.1-119.el6.x86_64
     32:bind-utils-9.7.3-2.el6_1.P3.2.x86_64
     32:bind-libs-9.7.3-2.el6_1.P3.2.x86_64
     1:perl-Pod-Simple-3.13-119.el6.x86_64
     pixman-0.18.4-1.el6_0.1.x86_64
     ca-certificates-2010.63-3.el6_1.5.noarch
     apr-1.3.9-3.el6_1.2.x86_64
     facter-1.6.0-2.el6.noarch
     perl-Compress-Raw-Zlib-2.023-119.el6.x86_64
     perl-Proc-ProcessTable-0.45-1.el6.rf.x86_64
     3:perl-version-0.77-119.el6.x86_64
     rsyslog-4.6.2-3.el6_1.2.x86_64
     perl-IO-Compress-Base-2.020-119.el6.x86_64
     elfutils-libelf-0.152-1.el6.x86_64
     ruby-1.8.7.299-7.el6_1.1.x86_64
     ruby-libs-1.8.7.299-7.el6_1.1.x86_64
     nss-sysinit-3.12.9-12.el6_1.x86_64
     nss-softokn-freebl-3.12.9-3.el6.x86_64
     openssl-1.0.0-10.el6.x86_64
     selinux-policy-3.7.19-93.el6_1.7.noarch
     nspr-4.8.7-1.el6.x86_64
     krb5-libs-1.9-9.el6_1.1.x86_64
     kernel-headers-2.6.32-131.12.1.el6.x86_64
     1:dbus-1.2.24-5.el6_1.x86_64
     nss-util-3.12.9-1.el6.x86_64
     python-libs-2.6.6-20.el6.x86_64
     perl-IO-Compress-Zlib-2.020-119.el6.x86_64
     freetype-2.3.11-6.el6_1.6.x86_64
     1:perl-Pod-Escapes-1.04-119.el6.x86_64
     kernel-firmware-2.6.32-131.12.1.el6.noarch
     sudo-1.7.4p5-5.el6.x86_64
     4:perl-libs-5.10.1-119.el6.x86_64
     2:postfix-2.6.6-2.2.el6_1.x86_64
     python-2.6.6-20.el6.x86_64
     tzdata-2011h-3.el6.noarch
     4:perl-Time-HiRes-1.9721-119.el6.x86_64
     2:libpng-1.2.46-1.el6_1.x86_64
     system-config-firewall-base-1.2.27-3.el6_1.3.noarch
     12:dhclient-4.1.1-19.P1.el6_1.1.x86_64
     nss-softokn-freebl-3.12.9-3.el6.i686
     1:perl-Module-Pluggable-3.90-119.el6.x86_64

Which OMSA are we running? Good question:

Packages Installed:
     srvadmin-base-6.5.0-1.1.1.el6.x86_64
     srvadmin-xmlsup-6.5.0-1.141.2.el6.x86_64
     srvadmin-omacore-6.5.0-1.143.2.el6.x86_64
     srvadmin-deng-6.5.0-1.31.1.el6.x86_64
     srvadmin-isvc-6.5.0-1.52.2.el6.x86_64
     sysfsutils-2.1.0-6.1.el6.x86_64
     srvadmin-storelib-sysfs-6.5.0-1.1.1.el6.x86_64
     srvadmin-smcommon-6.5.0-1.201.1.el6.x86_64
     ipmitool-1.8.11-99.dell.1.117.1.el6.x86_64
     srvadmin-sysfsutils-6.5.0-1.1.el6.x86_64
     srvadmin-storage-6.5.0-1.201.1.el6.x86_64
     srvadmin-omcommon-6.5.0-1.142.2.el6.x86_64
     srvadmin-hapi-6.5.0-1.33.2.el6.x86_64
     libsysfs-2.1.0-6.1.el6.x86_64
     srvadmin-omilcore-6.5.0-1.396.1.el6.noarch
     srvadmin-storageservices-6.5.0-1.1.1.el6.x86_64
     libsmbios-2.2.26-6.1.el6.x86_64
     smbios-utils-bin-2.2.26-6.1.el6.x86_64
     srvadmin-storelib-6.5.0-1.326.1.el6.x86_64

What else have we done? We uninstalled OMSA by doing a yum remove on the OMSA packages listed above, then reinstalled. Same problem.

BIOS=2.3.12 01/24/2011
iDRAC6=1.54

I am aware of the 3.0.0 BIOS being out, but policy prevents me from upgrading it for several weeks. I don't think this is a BIOS-related bug, as things were working great 2 weeks ago. One of the packages that we upgraded clearly doesn't get along with the current version of OMSA. The question is, which?
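
One way to narrow that down is to diff the package list of a failing host against a list from a known-good one. This is a sketch only, assuming each file holds one package name per line as produced by `rpm -qa`; the file names and the function name are hypothetical.

```shell
#!/bin/bash
# Print packages that appear in the second list but not in the first.
# Both files are expected to hold one package per line, e.g. from `rpm -qa`.
new_packages() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    sort "$1" > "$a"
    sort "$2" > "$b"
    # comm -13 hides lines unique to the first file and lines common to both.
    comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Hypothetical usage: before.txt from a known-good host, after.txt from a failing one.
#   rpm -qa > after.txt
#   new_packages before.txt after.txt
```

Downgrading the reported packages a few at a time (for example with `yum history undo`, if your yum release supports it) and re-running omreport between steps should eventually point at the culprit.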



------------------------------

Message: 6
Date: Thu, 22 Sep 2011 05:08:20 -0700 (PDT)
From: karasline at yahoo.com
Subject: Re: iDRAC6: undocumented settings for console com2
To: Arthur Prokosch <arthurp at csail.mit.edu>,
	"linux-poweredge at lists.us.dell.com"
	<linux-poweredge at lists.us.dell.com>
Message-ID:
	<1316693300.33894.YahooMailNeo at web160710.mail.bf1.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

Sorry, I haven't gotten back to you. A hernia operation has left me flat on my back (literally).

The topology can be much simpler and still fail. For example:



A=sysadm desktop <-----> B=iDRAC6


If A runs a vncviewer (either locally or on B's machine) and brings up a browser in the vncviewer, points the browser to B's iDRAC6, and launches the console, the keys will be mapped incorrectly. Chuck Anderson (http://lists.us.dell.com/pipermail/linux-poweredge/2011-September/045264.html) wrote of one possible solution, but it only seems to address the arrow and SysRq keys. With respect to the iDRAC6, all the keys appear to be mapped incorrectly when typing in any text window on the console within a VNC session. Since the same situation works fine with the iDRAC5, I would think that this is an issue Dell should correct.


Note that if one brings up the browser from A and points the browser to B's iDRAC6 (without using VNC), it works okay. However, my actual situation is much more complicated: network security on the gateway machine prohibits port forwarding.

In addition to working on the iDRAC5, the VNC interface also works when using HP's iLO3. So again, I think that this is a bug in Dell's iDRAC6 firmware. Does anyone know the method for reporting this bug?



________________________________
From: Arthur Prokosch <arthurp at csail.mit.edu>
To: linux-poweredge at lists.us.dell.com
Sent: Tuesday, September 6, 2011 5:12 PM
Subject: Re: iDRAC6: undocumented settings for console com2

On Mon, Aug 29, 2011 at 06:04:27AM -0700, karasline at yahoo.com wrote:
> Did you by any chance see my post on the iDRAC6 and VNC 
> (http://lists.us.dell.com/pipermail/linux-poweredge/2011-July/045028.html)?
> I wasn't sure if your iDRAC6 issue was related to mine, but 
> regardless, I still can't get the iDRAC6 console to work correctly 
> within a VNC window. Any thoughts?

I did. Our desire to use a serial console stems in part from how cumbersome it can be to access DRACs located on a secure network -- which I expect has something to do with why you're having to use VNC.
So I guess in that sense our posts are related.

So, topology:
  A=Sysadmin's desktop  <--->  B=gateway  <--->  C=DRAC

You're using VNC for the link between "A" and "B", then running the browser/java on "B". So https and proprietary-applet-VNC are connecting "B" and "C". I don't know what OS your intermediate host is running. If *nix, you might check X11 keymaps and such. Have you tried switching the flavor of VNC software running on A and/or B?

We initiate an ssh connection to the intermediate host with port forwarding, so that ports 443, 5900, and 5901 pass from A, through B to C. That is, if both A and B are *nix/Mac OS:
  ssh gatewayhost -L 443:drac-host:443 -L 5900:drac-host:5900 -L 5901:drac-host:5901 -N
and then on my desktop, I point my browser to https://localhost/

Would our approach be of use to you?

-arthur.

_______________________________________________
Linux-PowerEdge mailing list
Linux-PowerEdge at dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge

------------------------------


End of Linux-PowerEdge Digest, Vol 88, Issue 27
***********************************************


