[Linux-PowerEdge] dsm_sa_datamgrd segfault that crahses whole system

lejeczek peljasz at yahoo.co.uk
Fri Feb 10 07:55:03 CST 2017


hi everybody
also hopefully Dell tech as this should be directly hardware 
related I believe.

I manage to segfault omsa(which then crashes the whole system):

[ 1117.103438] dsm_sa_datamgrd[28952]: segfault at 0 ip 
00007f2e1ab57b46 sp 00007f2e1197c020 error 4 in 
libdsm_sm_sasvil.so[7f2e1aae8000+bb000]

Simply by having one H700 in one specific PCI slot in my 
R815 servers.
Server(s) setup in somewhat not-usual, I've stumbled upon 
this segfault purely by a chance.

I have "embedded" H200 in "integrated storage controller 
card slot"
I have a Dell Broadcom 4port NIC in "expansion-card slot 2"
I have a H700 in "expansion-card riser 1"
Lastly I have a H800 in "expansion-card slot 5"

H700 was installed for we are going to move hdd array from 
H200 to H700(but not just yet so we put the H700 card only).

1)Now:
when H700 is in "expansion-card slot 2" everything is 
working perfectly fine, no segfaults.
But "expansion-card slot 1" is pcieX8 which matches H700, 
and "expansion-card slot 2" is only pcieX4.

2)Now: segfault seems to occur only when I run omxxx storage 
on that specific H700 controller.

$ omreport storage vdisk vdisk=0 controller=1 - H200, no 
segfaults
$ omreport storage vdisk vdisk=0 controller=0 - H700, segfaul!

omreport system summary - does not cause it.
After ~20sec after segfault the system suffers from 
hard/cold power cycle.

So this seems critical. It would be expected of tech-team to 
investigate it.
Separately I'm going to report "tech support request" but 
felt sharing with other R815's I should do.

b.w.&r.
L


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.us.dell.com/pipermail/linux-poweredge/attachments/20170210/77419bf9/attachment.html 


More information about the Linux-PowerEdge mailing list