Dell IPMI OEM SEL event descriptions

Sven Ulland sveniu at opera.com
Wed Jul 4 10:07:38 CDT 2012


Are Dell's OEM SEL event descriptions freely available anywhere?

I'd like to use IPMI to retrieve the system event log (SEL) and then
extract useful information from it. Primarily, I'm looking for memory
and disk errors that can be used to trigger testing and possible
replacements, in as programmatic a way as possible. I'm not interested
in using OMSA.

Dell uses OEM-specific SEL event descriptions, and I'd like to get my
hands on them instead of deducing them manually. An earlier post by
Fred Skrotzki explains how to deduce the memory location from the IPMI
SEL event data [1]. I assume he worked this out manually by correlating
the IPMI SEL with the iDRAC/DRAC system event log, which includes the
OEM data. That approach breaks whenever the encoding changes between
hardware generations or iDRAC/DRAC firmware releases.

Dell's Wayne Weilnau says: "If you are using a Dell version of IPMI
Tool (its available on the Systems Management DVD), there should be an
option to use Dell extensions that will make the output a little more
user friendly." [2]. However, I can't find an ipmitool binary in any
of the OMSA distributions. If it exists, it might ship an OEM file that
could be fed directly to 'ipmitool -O <file>', which would be brilliant.

Here's how a memory fault is reported in the SEL:

SEL Record ID          : 0002
  Record Type           : 02
  Timestamp             : 05/30/2012 01:11:59
  Generator ID          : 00b1
  EvM Revision          : 04
  Sensor Type           : Memory
  Sensor Number         : 02
  Event Type            : Sensor-specific Discrete
  Event Direction       : Assertion Event
  Event Data            : a19001
  Description           : Uncorrectable ECC

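This can be (easily) mapped to DIMM_A1. From there, I can look up
DMI data to find additional details like serial and asset numbers.
(Btw, the event data field is defined in the IPMI v2 spec, section
29.7. In the first byte, 0xa1, the upper nibble 0xa = 1010b means
'OEM code in byte 2' (bits 7:6 = 10b) and 'OEM code in byte 3'
(bits 5:4 = 10b); the lower nibble, 1, is the event offset, which
for the Memory sensor type is Uncorrectable ECC.)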
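
The standard part of the event data (everything except the OEM bytes)
can be decoded without any Dell-specific knowledge. Below is a minimal
Python sketch, assuming the record's Event Type is sensor-specific
discrete (6Fh), as in the record above. The function name and the
abbreviated, partial Memory offset table are my own, and bytes 2 and 3
are returned raw because their Dell encoding is exactly the
undocumented part:

```python
# Decode the 3-byte Event Data field of a sensor-specific discrete (6Fh)
# SEL record, following the IPMI v2 event data layout. Sketch only: OEM
# bytes are returned raw, since their Dell-specific meaning is unknown here.

# Event Data 1, bits [7:6]: what Event Data 2 contains
BYTE2_USAGE = {
    0b00: "unspecified",
    0b01: "previous state and/or severity",
    0b10: "OEM code",
    0b11: "sensor-specific extension",
}
# Event Data 1, bits [5:4]: what Event Data 3 contains
BYTE3_USAGE = {
    0b00: "unspecified",
    0b01: "reserved",
    0b10: "OEM code",
    0b11: "sensor-specific extension",
}
# Partial, abbreviated offset table for sensor type 0Ch (Memory)
MEMORY_OFFSETS = {
    0x00: "Correctable ECC",
    0x01: "Uncorrectable ECC",
    0x02: "Parity",
    0x03: "Memory Scrub Failed",
    0x04: "Memory Device Disabled",
    0x05: "Correctable ECC logging limit reached",
}


def decode_event_data(raw_hex: str) -> dict:
    """Split e.g. 'a19001' into offset, byte-usage flags and raw bytes 2/3."""
    data = bytes.fromhex(raw_hex)
    if len(data) != 3:
        raise ValueError(f"expected 3 bytes of event data, got {len(data)}")
    b1, b2, b3 = data
    return {
        "offset": b1 & 0x0F,  # bits [3:0]: offset into the sensor type's table
        "byte2_usage": BYTE2_USAGE[(b1 >> 6) & 0b11],
        "byte3_usage": BYTE3_USAGE[(b1 >> 4) & 0b11],
        "byte2": b2,
        "byte3": b3,
    }


if __name__ == "__main__":
    decoded = decode_event_data("a19001")
    print(MEMORY_OFFSETS.get(decoded["offset"], "unknown offset"), decoded)
```

For 'a19001' this gives offset 1 (Uncorrectable ECC) with OEM codes
0x90 and 0x01 in bytes 2 and 3. For the OEM diagnostic event data
'210400', only byte 3 is flagged as an OEM code.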

While manual mapping is certainly possible, I would like to avoid
such a brittle, hardcoded approach. Also, the full OEM descriptions
would cover other events too, like disk and CPU errors, and even the
events that just say "An OEM diagnostic event has occurred" (event
data 0x210400, for example).

On a side note, some events can be looked up in more detail, for
example CPU errors. Just use 'sel get <id>'; where it's supported,
FRU details are shown:

"""
$ ipmitool ... sel get 0x10
SEL Record ID          : 0010
  Record Type           : 02
  Timestamp             : 06/02/2012 22:27:21
  Generator ID          : 0020
  EvM Revision          : 04
  Sensor Type           : Processor
  Sensor Number         : 60
  Event Type            : Sensor-specific Discrete
  Event Direction       : Assertion Event
  Event Data (RAW)      : 00ffff
  Event Interpretation Missing
  Description           : IERR

Sensor ID              : Status (0x60)
  Entity ID             : 3.1
  Sensor Type (Discrete): Processor
  States Asserted       : Processor
                          [Presence detected]

FRU Device Description : CPU1 (ID 176)
  Device not present (Requested sensor, data, or record not found)
"""

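To get at these fields programmatically, the 'sel get' output can be
split into its sections (SEL Record ID, Sensor ID, FRU Device
Description). Below is a rough Python sketch, assuming the layout shown
above: non-indented lines open a section, and indented 'Key : Value'
lines belong to it. The helper names are hypothetical, and the layout
may differ between ipmitool versions:

```python
import subprocess
from typing import Sequence


def fetch_sel_entry(record_id: str, ipmitool_args: Sequence[str] = ()) -> str:
    """Run 'ipmitool [args] sel get <id>' and return its stdout.

    ipmitool_args carries connection options, e.g. ["-I", "lanplus", "-H", host].
    """
    cmd = ["ipmitool", *ipmitool_args, "sel", "get", record_id]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_sel_get(text: str) -> list[dict]:
    """Split 'sel get' output into sections of fields and free-form notes."""
    sections: list[dict] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only: values such as timestamps contain colons.
        key, sep, value = line.partition(":")
        if not line[0].isspace():
            # A non-indented line opens a new section, e.g. "Sensor ID : Status (0x60)".
            sections.append({"title": key.strip(), "value": value.strip(),
                             "fields": {}, "notes": []})
        elif not sections:
            continue  # indented line before any section header: ignore it
        elif sep:
            sections[-1]["fields"][key.strip()] = value.strip()
        else:
            # Lines without a colon, e.g. "[Presence detected]" or
            # "Event Interpretation Missing", are kept verbatim as notes.
            sections[-1]["notes"].append(line.strip())
    return sections


def is_memory_event(sections: list[dict]) -> bool:
    """True if the first section is a SEL record whose sensor type is Memory."""
    return bool(sections) and sections[0]["fields"].get("Sensor Type") == "Memory"
```

For example, parse_sel_get(fetch_sel_entry("0x10")) on the CPU record
above should yield three sections, the first with
fields["Description"] == "IERR". Filtering with is_memory_event() then
keeps only memory records like the DIMM_A1 one.
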
Let me know if anyone is already doing something like this with
a minimum of fuss (in other words, as standards-compliant [IPMI] as
possible) and a minimum of megabytes (which again rules out OMSA).

Sven

[1]: Linux-PowerEdge: memory test on Linux
Message-ID: <6134F668515B04409948FBEB644A3E0DD27D6D at msmail.textwise.com>
<URL:http://lists.us.dell.com/pipermail/linux-poweredge/2006-October/027701.html>

[2]: Linux-PowerEdge: how to enable BMC sensors
Message-ID: <07E32F241046DA418A0381C1225BFC0119C49AE950 at AUSX7MCPS301.AMER.DELL.COM>
<URL:http://lists.us.dell.com/pipermail/linux-poweredge/2010-May/042261.html>


