Starting dsm_sa_datamgr32d FAILED

Patrick_Boyd at Dell.com Patrick_Boyd at Dell.com
Mon Apr 30 09:38:31 CDT 2007


Ok the storage component is being told to exit in your case. This is not
related to the other crash that was under discussion on this list. I'm
really not sure why your process is not starting.

 

From: De Wetenschapper [mailto:wetenschapper at gmail.com] 
Sent: Monday, April 30, 2007 1:47 AM
To: Boyd, Patrick
Cc: Gary.Mansell at ricardo.com; Leccardi, Diego; linux-poweredge-Lists
Subject: Re: Starting dsm_sa_datamgr32d FAILED

 

I tried commenting out lsivil, but dsm_sa_datamgr32d STILL won't
start....

I also turned debugging up to 3 (with lsivil NOT commented out). You can
view the log file here:
http://www.jhdechoke.be/temp/dcomsm.log




On 4/27/07, Patrick_Boyd at dell.com wrote:

Ok your problem appears to be in the lsivil. You can safely comment out
that line since you aren't using any LSI SCSI RAID controllers in your
system. That should restore management of your SAS based RAID
controllers and stop the crash. 
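
For anyone who would rather script the edit than do it by hand, here is a minimal sketch of commenting out one line of an INI-style file with a leading semicolon. The helper name comment_out_line is hypothetical, and the example uses the vil3=nrsvil line quoted elsewhere in this thread; the exact key of the lsivil line is not shown here, so pass whatever your stsvc.ini actually contains. Back up the file before changing it.

```python
def comment_out_line(ini_text: str, target: str) -> str:
    """Return ini_text with every active line equal to `target` prefixed by ';'.

    Lines that are already commented out do not match and are left alone.
    """
    out = []
    for line in ini_text.splitlines():
        if line.strip() == target:
            out.append(";" + line)  # a leading semicolon disables the line
        else:
            out.append(line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if ini_text.endswith("\n") else "")


if __name__ == "__main__":
    sample = "a=1\nvil3=nrsvil\n"  # made-up file content for illustration
    print(comment_out_line(sample, "vil3=nrsvil"), end="")
```

The function works on text only, so reading and writing the real file (and restarting the services afterwards) is left to the caller.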

-----Original Message-----
From: Mansell, Gary [mailto:Gary.Mansell at ricardo.com]
Sent: Friday, April 27, 2007 3:38 AM 
To: Boyd, Patrick; Leccardi, Diego
Cc: linux-poweredge-Lists
Subject: RE: Starting dsm_sa_datamgr32d FAILED

I have commented out the vil3=nrsvil line in the file
/opt/dell/srvadmin/sm/stsvc.ini, as suggested, but OMSA 5.2 still does
not work:

[root at dfgsrv2] omreport system summary

System Summary

------------------
Software Profile
------------------
Systems Management
Name            : Server Administrator Install Core (subscription) 
Version         : 5.2.0
Description     : Systems Management Software
Contains        : Data Engine 5.7.0
                : Hardware Application Programming Interface 5.7.0
                : Instrumentation Service 5.7.0
                : Instrumentation Service Integration Layer 3.2.0
                : Meta package for installing correct IPMI dependencies depending on OS 5.0.0
                : OS Specific Omauth Packages 3.2.0
                : OpenManage Inventory Collector 2.6.0
                : RAC Command Interface 5.1.0
                : RAC5 Integration Layer 5.1.0
                : Remote Access Card Data Populator RAC5, 5.1.0
                : Secure Port Server 3.2.0
                : Server Administrator Framework 3.2.0
                : Storage Management 2.2.0
                : Sun Java Runtime Environment 1.5.0

-------- 
System
--------
System
Host Name       : dfgsrv2.unix.stc.ricplc.com
System Location :


When I perform a status command, all seems OK (strangely): 

[root at dfgsrv2 ~]# srvadmin-services.sh status
dell_rbu (module) is running
ipmi driver is running
dsm_sa_datamgr32d (pid 9962) is running
dsm_sa_eventmgr32d (pid 9924) is running
dsm_sa_snmp32d (pid 9935) is running
dsm_om_shrsvc32d (pid 6750) is running
dsm_om_connsvc32d (pid 7483 7482) is running

When I look at the messages file I can see the segfault:

Apr 27 09:09:35 dfgsrv2 kernel: /var/lib/dkms/mptlinux/3.02.63/build/mptctl.c at 2126::mptctl_do_mpt_command - Target ID out of bounds.
Apr 27 09:09:35 dfgsrv2 last message repeated 3 times
Apr 27 09:09:35 dfgsrv2 dataeng: dsm_sa_datamgr32d startup succeeded
Apr 27 09:09:35 dfgsrv2 kernel: /var/lib/dkms/mptlinux/3.02.63/build/mptctl.c at 2126::mptctl_do_mpt_command - Target ID out of bounds.
Apr 27 09:09:35 dfgsrv2 kernel: /var/lib/dkms/mptlinux/3.02.63/build/mptctl.c at 2126::mptctl_do_mpt_command - Target ID out of bounds.
Apr 27 09:09:35 dfgsrv2 Server Administrator: Instrumentation Service EventID: 1000  Server Administrator starting
Apr 27 09:09:35 dfgsrv2 Server Administrator: Instrumentation Service EventID: 1012  IPMI status  Interface: OS
Apr 27 09:09:35 dfgsrv2 dataeng: dsm_sa_eventmgr32d startup succeeded
Apr 27 09:09:35 dfgsrv2 Server Administrator: Instrumentation Service EventID: 1001  Server Administrator startup complete
Apr 27 09:09:36 dfgsrv2 dataeng: dsm_sa_snmp32d startup succeeded
Apr 27 09:09:36 dfgsrv2 kernel: dsm_sa_datamgr3[9961]: segfault at 000000000000056d rip 00000000f7fcc129 rsp 00000000ec1f8108 error 4
Apr 27 09:09:38 dfgsrv2 snmpd[7148]: [smux_accept] accepted fd 13 from 127.0.0.1:58138
Apr 27 09:09:38 dfgsrv2 snmpd[7148]: accepted smux peer: oid SNMPv2-SMI::enterprises.674.10892.1, password , descr Systems Management SNMP MIB Plug-in Manager
Apr 27 09:09:43 dfgsrv2 snmpd[7148]: Got trap from peer on fd 13
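
Since the status command reports the service as running even though the kernel logged a segfault, it can help to scan the messages file for segfault entries directly. Below is a rough sketch, assuming the kernel line format shown in the log above ("segfault at ... rip ... rsp ... error ..."); other kernels word this line differently and would need a different pattern. The function name find_segfaults is hypothetical.

```python
import re

# Matches only the kernel segfault format quoted in this thread.
# \s+ between tokens tolerates lines that a mail client has re-wrapped.
SEGFAULT_RE = re.compile(
    r"kernel:\s+(?P<proc>\S+)\[(?P<pid>\d+)\]:\s+segfault\s+at\s+(?P<addr>[0-9a-f]+)"
    r"\s+rip\s+(?P<rip>[0-9a-f]+)\s+rsp\s+(?P<rsp>[0-9a-f]+)\s+error\s+(?P<error>\d+)"
)


def find_segfaults(log_text):
    """Return one dict per segfault entry found in log_text."""
    return [m.groupdict() for m in SEGFAULT_RE.finditer(log_text)]


if __name__ == "__main__":
    line = (
        "Apr 27 09:09:36 dfgsrv2 kernel: dsm_sa_datamgr3[9961]: segfault at "
        "000000000000056d rip 00000000f7fcc129 rsp 00000000ec1f8108 error 4"
    )
    for hit in find_segfaults(line):
        print(hit["proc"], hit["pid"], hit["addr"])
```

To use it on a live system you would pass in the contents of the messages file, which usually requires root to read.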


I have turned on debugging in the config file, as requested, restarted
the service, and have attached the logfile (I did not comment out the
vil3=nrsvil line).


; Enable/disable dcomsm.log at startup of the Systems Management Data Manager.
Debug=On
; Debug Levels in a comma separated list in this order Queue, Ral, Val, AFAVIL, LSIVIL, NRSVIL, EVIL, SASVIL, SASEVIL, HEL
; DebugLevels will cause all debug data tagged at that level or lower (a level of 3 or higher is the most verbose)
; Debug Level   0   =   CRITICAL debug data only
; Debug Level   1   =   CRITICAL_INFO
; Debug Level   2   =   DEBUG
; Debug Level   3   =   INFO
;DebugLevels=0,0,0,0,0,0,0,0,0,0
DebugLevels=3,3,3,3,3,3,3,3,3,3
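
Because the ten positions in DebugLevels are easy to miscount, here is a small sketch that builds the line from component names. The component order is taken from the comment in the config above; the helper name debug_levels_line is hypothetical, and setting levels per component (rather than all 3s, as was requested in this thread) is an assumption based only on that comment.

```python
# Order as listed in the stsvc.ini comment quoted above.
COMPONENTS = ["Queue", "Ral", "Val", "AFAVIL", "LSIVIL",
              "NRSVIL", "EVIL", "SASVIL", "SASEVIL", "HEL"]


def debug_levels_line(levels=None, default=0):
    """Build a DebugLevels= line; `levels` maps component name to level."""
    levels = levels or {}
    unknown = set(levels) - set(COMPONENTS)
    if unknown:
        raise ValueError("unknown component(s): " + ", ".join(sorted(unknown)))
    values = [str(levels.get(name, default)) for name in COMPONENTS]
    return "DebugLevels=" + ",".join(values)


if __name__ == "__main__":
    # Everything at the most verbose level, as used in this thread.
    print(debug_levels_line(default=3))
    # Hypothetical: only the two components discussed here at level 3.
    print(debug_levels_line({"LSIVIL": 3, "NRSVIL": 3}))
```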


The OS that I am running is RHEL4, fully updated, with kernel
2.6.9-42.0.10.ELsmp.

I have attached the lspci -v output.


Hope that helps....


Regards






On Thu, 2007-04-26 at 10:23 -0500, Patrick_Boyd at Dell.com wrote:
> I will look into this. For now in the /opt/dell/srvadmin/sm/stsvc.ini
> file you can disable our enumeration of Non-RAID Controllers by 
> commenting out the following line vil3=nrsvil with a semicolon. This
> should allow you to add the ral line back in so that you can still
> manage and monitor your PERC controller and MD1000.
>
> If you are feeling generous the following information will help me to
> debug this:
> 1. A debug log from the storage component. You can enable this by
> setting the debug flag to on and the debugmask to 3's in the stsvc.ini
> file mentioned above.
> 2. What OS are you running?
> 3. Complete lspci -v output from your LSI SCSI card.
>
> Thanks,
> Patrick Boyd
>
>
>
> -----Original Message----- 
> From: Mansell, Gary [mailto:Gary.Mansell at ricardo.com]
> Sent: Thursday, April 26, 2007 10:09 AM
> To: Boyd, Patrick
> Cc: linux-poweredge-Lists
> Subject: RE: Starting dsm_sa_datamgr32d FAILED 
>
> Hi,
>
> Dell UK supplied me these to get around the problem with the Adaptec
> cards that they originally supplied:
>
> 12:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 
> PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
> 12:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030
> PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
>
>
>
>
> 
> On Thu, 2007-04-26 at 10:04 -0500, Patrick_Boyd at Dell.com wrote:
> > Out of curiosity which LSI SCSI HBA are you using? We've tested with
> > all of them that Dell ships without any issues.
> >
> > -----Original Message-----
> > From: linux-poweredge-bounces at dell.com
> > [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Mansell, Gary
> > Sent: Thursday, April 26, 2007 9:56 AM
> > To: linux-poweredge-Lists
> > Subject: Re: Starting dsm_sa_datamgr32d FAILED
> > 
> > Hi,
> >
> > I have experienced the same problem with OMSA 5.2
> >
> > It has occurred on my machine since I applied the latest firmware
> > updates for my PE2950 and MD1000. In my case it seems to be related
> > to a conflict between my LSI HBA SCSI Card and the Firmware upgrade.
> > If I remove the LSI card from the system OMSA 5.2 works fine.
> >
> > Unfortunately removing the LSI SCSI HBA means that I cannot perform
> > backups !!! The Overland Neo2000 tape library will not work with
> > Adaptec SCSI HBAs !!!
> >
> > Another solution is to comment out the following line in one of the
> > OMSA config files - this stops OMSA 5.2 running the Storage component
> > and hence you get no reporting.
> >
> > Edit /opt/dell/srvadmin/dataeng/ini/dcdmdy32.ini and comment out the
> > line:
> >
> > popalias.0x0F=ral32
> >
> > by putting a semicolon in front of it.
> >
> > Then restart OMSA with srvadmin-services.sh restart
> >
> >
> > Hope that helps
> >
> > Gary
> >
> > _______________________________________________ 
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge at dell.com
> > http://lists.us.dell.com/mailman/listinfo/linux-poweredge 
> > Please read the FAQ at http://lists.us.dell.com/faq



 
