omsa error - "general protection rip" and "dsm_sa_datamgr3 tried to read /dev/mem..."

J Potter jpotter-dell at codepuppy.com
Tue Mar 3 21:11:14 CST 2009


Hi List --

omsa is failing for us on one server that's part of a cluster. The
other servers work fine and have an identical configuration.

Has anyone seen errors like the following before? Any clues as to what  
the root cause might be?

> Mar  3 22:03:53 someHostname kernel: Program dsm_sa_datamgr3 tried to read /dev/mem between fffff000->100010000.
> Mar  3 22:04:17 someHostname Server Administrator: Instrumentation Service EventID: 1000  Server Administrator starting
> Mar  3 22:04:17 someHostname Server Administrator: Instrumentation Service EventID: 1012  IPMI status  Interface: OS
> Mar  3 22:04:18 someHostname Server Administrator: Instrumentation Service EventID: 1001  Server Administrator startup complete
> Mar  3 22:04:18 someHostname kernel: dsm_sa_datamgr3[6853] general protection rip:f7f1ff04 rsp:f58ca30c error:0
> Mar  3 22:04:24 someHostname Server Administrator: Instrumentation Service EventID: 1009  Systems Management Data Manager Stopped
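
For anyone wanting to see what that /dev/mem range covers, here is a minimal Python sketch that decodes the two addresses from the kernel message. The function name describe_dev_mem_read and the SAMPLE string are made up for illustration; the addresses are the ones from the log above. It only does arithmetic on the logged values and says nothing about why the read was attempted.

```python
import re

# Matches the kernel warning about a /dev/mem read, as it appears in the log above.
DEV_MEM = re.compile(
    r"Program\s+(\S+)\s+tried\s+to\s+read\s+/dev/mem\s+"
    r"between\s+([0-9a-f]+)->([0-9a-f]+)"
)

FOUR_GIB = 1 << 32


def describe_dev_mem_read(text):
    """Return the program name and address range from a /dev/mem warning.

    Hypothetical helper: strips mail quote markers ("> ") and re-joins
    wrapped lines before matching. Returns None if no warning is found.
    """
    flat = " ".join(part.lstrip("> ").strip() for part in text.splitlines())
    match = DEV_MEM.search(flat)
    if match is None:
        return None
    start = int(match.group(2), 16)
    end = int(match.group(3), 16)
    return {
        "prog": match.group(1),
        "start": start,
        "end": end,
        "length": end - start,
        # True when the range straddles the 4 GiB physical address boundary.
        "crosses_4gib": start < FOUR_GIB < end,
    }


SAMPLE = """\
> Mar  3 22:03:53 someHostname kernel: Program dsm_sa_datamgr3 tried
> to read /dev/mem between fffff000->100010000.
"""

if __name__ == "__main__":
    info = describe_dev_mem_read(SAMPLE)
    print(info["prog"], hex(info["start"]), hex(info["end"]),
          hex(info["length"]), info["crosses_4gib"])
```

With the logged values, the range is 0x11000 bytes long and starts one page below the 4 GiB mark, so it straddles that boundary.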

This is on CentOS 5 (kernel 2.6.18-92.1.22.el5) on a PE 2950 with 16 GB of RAM.
The box had been running omsa 5.4 when it started giving these errors; I
upgraded it to 5.5 and the errors continue.

I've run memtester against 15 GB of the box's RAM, and it checks out OK.
(Admittedly there could be a problem in the part of RAM that wasn't tested.)

Thanks,
Jeff


P.S. Here is what is written to /var/log/messages when I run
'srvadmin-services.sh restart':
Mar  3 22:08:43 someHostname dataeng: dsm_sa_eventmgr32d shutdown succeeded
Mar  3 22:08:43 someHostname instsvcdrv: dell_rbu device driver unloaded
Mar  3 22:08:50 someHostname instsvcdrv: dell_rbu device driver loaded
Mar  3 22:08:51 someHostname kernel: Program dsm_sa_datamgr3 tried to read /dev/mem between fffff000->100010000.
Mar  3 22:09:15 someHostname Server Administrator: Instrumentation Service EventID: 1000  Server Administrator starting
Mar  3 22:09:15 someHostname Server Administrator: Instrumentation Service EventID: 1012  IPMI status  Interface: OS
Mar  3 22:09:15 someHostname Server Administrator: Instrumentation Service EventID: 1001  Server Administrator startup complete
Mar  3 22:09:17 someHostname kernel: racsvm[9125] general protection rip:f77b582e rsp:ff8a4d30 error:0
Mar  3 22:09:18 someHostname kernel: dsm_sa_datamgr3[8773] general protection rip:f7f9ef04 rsp:f594930c error:0
Mar  3 22:09:21 someHostname kernel: dsm_sa_eventmgr[8794] general protection rip:f7f894a0 rsp:f6e7d090 error:0
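
In case it helps with comparing the faults, here is a rough Python sketch that pulls the program name, PID, and rip value out of the "general protection" lines and computes the offset of rip within its 4 KiB page. The names parse_gp_faults and SAMPLE are hypothetical, and the regular expression is only fitted to the lines in this message. Both dsm_sa_datamgr3 faults (f7f1ff04 and f7f9ef04) end in f04, which might mean the same instruction in a library mapped at a different base address each time, but that is a guess rather than a diagnosis.

```python
import re

# Fitted to the kernel "general protection" lines shown in this message.
GP_FAULT = re.compile(
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]\s+general\s+protection\s+"
    r"rip:(?P<rip>[0-9a-f]+)\s+rsp:(?P<rsp>[0-9a-f]+)\s+error:(?P<err>\d+)"
)


def parse_gp_faults(text):
    """Return one dict per general protection fault found in text.

    Hypothetical helper: strips mail quote markers ("> ") and re-joins
    wrapped lines so faults split across two lines still match.
    """
    flat = " ".join(part.lstrip("> ").strip() for part in text.splitlines())
    faults = []
    for match in GP_FAULT.finditer(flat):
        rip = int(match.group("rip"), 16)
        faults.append({
            "prog": match.group("prog"),
            "pid": int(match.group("pid")),
            "rip": rip,
            # Low 12 bits: position of rip inside its 4 KiB page.
            "page_offset": rip & 0xFFF,
        })
    return faults


SAMPLE = """\
Mar  3 22:09:17 someHostname kernel: racsvm[9125] general protection
rip:f77b582e rsp:ff8a4d30 error:0
Mar  3 22:09:18 someHostname kernel: dsm_sa_datamgr3[8773] general
protection rip:f7f9ef04 rsp:f594930c error:0
Mar  3 22:09:21 someHostname kernel: dsm_sa_eventmgr[8794] general
protection rip:f7f894a0 rsp:f6e7d090 error:0
"""

if __name__ == "__main__":
    for fault in parse_gp_faults(SAMPLE):
        print(fault["prog"], fault["pid"],
              hex(fault["rip"]), hex(fault["page_offset"]))
```

Feeding it /var/log/messages from several restarts would show whether each program keeps faulting at the same page offset.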



More information about the Linux-PowerEdge mailing list