[Linux-PowerEdge] dsm_sa_datamgrd segfault that crashes whole system (lejeczek)

Deepesh_C_P at Dell.com
Mon Feb 13 00:41:44 CST 2017

Can you please share the following details and logs so that we can investigate this issue?

1) Operating system (version)
2) OMSA version

Also, please collect the logs.

Steps to collect the dcomsm.log file:
1) Stop the Data Manager service (./srvadmin-services.sh stop).
2) Go to /opt/dell/srvadmin/etc/srvadmin-storage and open the stsvc.ini file.
3) Change "Debug=Off" to "Debug=On".
4) Change all the debug levels from "DebugLevels=0,0,0,0,0,0,0,0,0,0,0" to "DebugLevels=3,3,3,3,3,3,3,3,3,3,3".
5) Start the Data Manager service (./srvadmin-services.sh start).
6) The dcomsm.log file will be generated in /opt/dell/srvadmin/var/log/openmanage.
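
For anyone who would rather script the edits in steps 3 and 4, here is a minimal sketch in Python. It assumes stsvc.ini contains the "Debug=Off" and "DebugLevels=..." lines exactly as quoted in the steps, with Unix line endings. The function name enable_debug is made up for illustration and is not part of OMSA. It only transforms text; stopping and starting the service and writing the file back are still done by hand.

```python
import re


def enable_debug(ini_text, level=3):
    """Return ini_text with Debug switched on and every debug level set to `level`.

    Hypothetical helper: assumes the key names quoted in the steps above.
    """
    def bump_levels(match):
        # Keep the same number of comma-separated entries as the original line.
        count = len(match.group(2).split(","))
        return match.group(1) + ",".join([str(level)] * count)

    # Flip "Debug=Off" to "Debug=On" (whole-line match, so DebugLevels is untouched).
    out = re.sub(r"(?m)^Debug=Off[ \t]*$", "Debug=On", ini_text)
    # Rewrite "DebugLevels=0,0,..." so that every entry becomes `level`.
    out = re.sub(r"(?m)^(DebugLevels=)([0-9,]+)[ \t]*$", bump_levels, out)
    return out


if __name__ == "__main__":
    sample = "Debug=Off\nDebugLevels=0,0,0,0,0,0,0,0,0,0,0\n"
    print(enable_debug(sample), end="")
```

To go back to normal logging afterwards, the same edits would be reversed by hand ("Debug=Off" and all levels back to 0).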

Deepesh CP

-----Original Message-----
From: linux-poweredge-bounces-Lists On Behalf Of linux-poweredge-request-Lists
Sent: Friday, February 10, 2017 11:30 PM
To: linux-poweredge-Lists <linux-poweredge at lists.us.dell.com>
Subject: Linux-PowerEdge Digest, Vol 153, Issue 4

Today's Topics:

   1.  dsm_sa_datamgrd segfault that crashes whole system (lejeczek)


Message: 1
Date: Fri, 10 Feb 2017 13:55:03 +0000
From: lejeczek <peljasz at yahoo.co.uk>
Subject: [Linux-PowerEdge] dsm_sa_datamgrd segfault that crashes whole system
To: linux-poweredge at dell.com
Message-ID: <49ab5b46-0747-2398-9a80-6b6f81cde453 at yahoo.co.uk>
Content-Type: text/plain; charset="utf-8"

Hi everybody, and hopefully Dell tech too, as I believe this is directly hardware related.

I managed to segfault OMSA (which then crashes the whole system):

[ 1117.103438] dsm_sa_datamgrd[28952]: segfault at 0 ip 00007f2e1ab57b46 sp 00007f2e1197c020 error 4 in libdsm_sm_sasvil.so[7f2e1aae8000+bb000]
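
A note on reading the kernel line above: the sketch below, in Python, pulls the fields out of it and computes where inside the library the faulting instruction sits (instruction pointer minus the mapping base). It assumes the usual x86-64 Linux "segfault at ... ip ... sp ... error ... in lib[base+size]" layout and the x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). The function name decode_segfault is made up for illustration.

```python
import re

# The kernel message from the report, without the timestamp.
LINE = ("dsm_sa_datamgrd[28952]: segfault at 0 ip 00007f2e1ab57b46 "
        "sp 00007f2e1197c020 error 4 in libdsm_sm_sasvil.so[7f2e1aae8000+bb000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (all numbers are hex)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m.group("err"), 16)
    return {
        "library": m.group("lib"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped library.
        "offset_in_library": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),   # 0 here: the page was not mapped
        "write_access": bool(err & 2),   # 0 here: the access was a read
        "user_mode": bool(err & 4),      # 1 here: fault happened in user space
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["library"], hex(info["offset_in_library"]))
```

Read this way, "segfault at 0" with "error 4" is a user-mode read of address 0, which is consistent with a NULL pointer dereference inside libdsm_sm_sasvil.so at offset 0x6fb46 from the start of the mapping.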

This happens simply by having one H700 in one specific PCI slot in my R815 servers.
The server setup is somewhat unusual; I stumbled upon this segfault purely by chance.

I have "embedded" H200 in "integrated storage controller card slot"
I have a Dell Broadcom 4port NIC in "expansion-card slot 2"
I have a H700 in "expansion-card riser 1"
Lastly I have a H800 in "expansion-card slot 5"

The H700 was installed because we are going to move the HDD array from the
H200 to the H700 (but not just yet, so for now we have only put in the H700 card).

When the H700 is in "expansion-card slot 2" everything works perfectly fine, no segfaults.
But "expansion-card slot 1" is PCIe x8, which matches the H700, and "expansion-card slot 2" is only PCIe x4.

2) Now: the segfault seems to occur only when I run omxxx storage commands against that specific H700 controller.

$ omreport storage vdisk vdisk=0 controller=1 - H200, no segfaults
$ omreport storage vdisk vdisk=0 controller=0 - H700, segfault!

omreport system summary - does not cause it.
About 20 seconds after the segfault, the system goes through a hard/cold power cycle.

So this seems critical, and I would expect the tech team to investigate it.
Separately, I am going to file a tech support request, but I felt I should share this with other R815 users.


