PE 1850 IPMI Problem

Charles Leu CLeu at castironsys.com
Wed Mar 9 21:24:12 CST 2005


A Problem with IPMI on the Dell 1850 (specifically with the kcs driver):
 
We've observed a problem with IPMI that occurs only on our Dell 1850s, and only with the KCS driver.  The BMC asserts SMS_ATTN to signal that an event needs handling, and the subsequent get flags request returns flags indicating an OEM 0 interrupt/event.  This condition then persists.
 
Apparently, the generic Linux IPMI kcs driver (OpenIPMI v33 from SourceForge) doesn't handle/clear the OEM 0 event/flag.  Although the driver continues to run, the uncleared event/flag effectively prevents it from handling requests from clients (e.g. ipmitool).  Also note that the driver can't be unloaded while in this state.
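
For reference, here is a minimal sketch of how the flags byte returned by the IPMI "Get Message Flags" command (NetFn App 0x06, Cmd 0x31) breaks down, going by my reading of the IPMI 1.5 bit layout.  The function name and the short flag labels are made up for illustration; they are not taken from the OpenIPMI driver.

```shell
#!/bin/bash
# Decode an IPMI message-flags byte into symbolic names.
# Bit layout as I understand it from the IPMI 1.5 spec:
#   bit 0: receive message queue has data
#   bit 1: event message buffer full
#   bit 3: watchdog pre-timeout interrupt
#   bit 5: OEM 0, bit 6: OEM 1, bit 7: OEM 2
# The label strings below are hypothetical, chosen for readability.
decode_msg_flags() {
    local flags=$(( $1 ))
    local out=""
    (( flags & 0x01 )) && out="$out RCV_MSG_QUEUE"
    (( flags & 0x02 )) && out="$out EVT_BUF_FULL"
    (( flags & 0x08 )) && out="$out WDT_PRE_TIMEOUT"
    (( flags & 0x20 )) && out="$out OEM0"
    (( flags & 0x40 )) && out="$out OEM1"
    (( flags & 0x80 )) && out="$out OEM2"
    # Strip the leading space before printing.
    echo "${out# }"
}
```

For example, "decode_msg_flags 0x20" prints OEM0, which is the flag we see stuck.  The spec's companion command, Clear Message Flags (Cmd 0x30), takes a mask with the same bit positions; whether the PE 1850 BMC honors a clear request for the OEM 0 bit is exactly what we don't know.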
 
Thus, the questions:
 
Q1: Does the BMC for the PE 1850 generate interrupts/events that result in the OEM flags being set?
Q2: If the answer to Q1 is yes, then how should such interrupts/flags be cleared?
Q3: If the answer to Q2 is no, then what course of action should the driver take to ensure
      further get flags requests won't yield SMS_ATTN with OEM flag(s) set? (e.g. reset the
      BMC; although if the underlying problem isn't addressed, I consider this a workaround,
      and not a true fix).
 
Some related info:
  DELL 1850 running RedHat Linux 2.4.21
  BMC Revision 1.23
  Backplane Firmware 1.00
  OpenIPMI Driver v33
  IPMI 1.5, KCS interface
  The version of the BIOS doesn't seem to matter; the hang occurs with both A00 and A03
  The version of the SMBIOS doesn't seem to matter either.
  Built-in FRU device:
    Board Mfg:       DELL
    Board Product: FRU16K,DELL P/N
    Board Serial:    CN1374048L00WD
    Board Part:      0F1667A02
 
To reproduce the problem:
  Run ipmitool in a loop to continually poll machine state (see shell script below).
  After a random amount of time (typically between 4 and 12 hours), ipmitool will
  'hang'.  The stack of the hung ipmitool process is the same in each hang;
  the kernel debugger (kdb) btp command shows:
 
0xf22fc000    25495     6829  0    1   S  0xf22fc580  ipmitool
ESP        EIP        Function (args)
0xf22fde74 0xc0124106 context_switch+0xa2 (0xc0435c80, 0xf22fc000, 0xc4cdc000, 0xf, 0x1)
                               kernel .text 0xc0100000 0xc0124064 0xc01241bf
0xf22fde90 0xc0121aab schedule+0x34f
                               kernel .text 0xc0100000 0xc012175c 0xc0121cee
0xf22fdee4 0xc01346af schedule_timeout+0xb5 (0xf4941b00, 0xf22fdf3c, 0xf22fc000, 0x0, 0xf22fc000)
                               kernel .text 0xc0100000 0xc01345fa 0xc01346b1
0xf22fdf1c 0xc017bc1c do_select+0x131
                               kernel .text 0xc0100000 0xc017baeb 0xc017bd35
0xf22fdf60 0xc017c0a4 sys_select+0x33c
                               kernel .text 0xc0100000 0xc017bd68 0xc017c232
0xf22fdfc4 0xc03f206f no_timing+0x7
                               kernel .entry.text 0xc03f2000 0xc03f2068 0xc03f2074
[1]kdb> 
 
 
#!/bin/bash
#  Repeatedly call ipmitool until it hangs.
for i in `seq 1 100000`;
do
    echo ""
    echo "============================================"
    echo "IPMITOOL iteration $i:"
    echo ""
    ipmitool bmc
    echo ""
    ipmitool sdr
    echo ""
    ipmitool fru
    echo ""
    ipmitool sel
    echo ""
    echo "============================================"
done
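
A possible variant for anyone trying to reproduce this unattended: wrap each ipmitool call in a time limit so the loop reports the hang instead of blocking forever.  This is only a sketch.  It assumes the GNU coreutils "timeout" command is available on the test machine, and the helper name run_with_limit is made up.  Since the hung process sits in an interruptible sleep in select (state S in the trace above), the signal sent by timeout should be able to terminate it, but I haven't verified that on an affected box.

```shell
#!/bin/bash
# Run a command under a time limit and report when the limit is exceeded.
# Assumption: GNU coreutils `timeout` is installed; it exits with status
# 124 when the command had to be killed for running too long.
# run_with_limit is a hypothetical helper name.
run_with_limit() {
    local limit=$1
    shift
    local rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "HANG: '$*' exceeded ${limit}s at $(date)" >&2
    fi
    return "$rc"
}
```

In the loop above, a call such as "ipmitool sdr" would become "run_with_limit 60 ipmitool sdr || break", so the script stops at the first iteration that hangs or fails and the timestamp gives the time to failure.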

Any/all help is greatly appreciated.
 
Best regards,
Charles Leu
 
 