PE 1850 IPMI Problem
CLeu at castironsys.com
Wed Mar 9 21:24:12 CST 2005
A Problem with IPMI on the Dell 1850 (specifically with the kcs driver):
We've observed a problem with IPMI that occurs only on our Dell 1850s, and only with the KCS driver. The BMC asserts SMS_ATTN to signal that an event needs handling, and the subsequent Get Message Flags request returns flags indicating an OEM 0 interrupt/event. This condition persists.
Apparently, the generic Linux IPMI KCS driver (OpenIPMI v33 from SourceForge) doesn't handle or clear the OEM 0 flag. The driver continues to run, but because the flag is never cleared, it can no longer service requests from clients (e.g. ipmitool). The driver also can't be unloaded while in this state.
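To make the flag in question concrete, here is a minimal sketch that decodes the flags byte returned by the IPMI Get Message Flags command (NetFn App 0x06, command 0x31). The bit positions follow the IPMI specification; the function name decode_msg_flags and the printed labels are illustrative assumptions, not names taken from the original post.

```shell
#!/bin/bash
# Decode an IPMI "Get Message Flags" response byte into readable labels.
# decode_msg_flags and the label strings are hypothetical/illustrative.
decode_msg_flags() {
    local flags=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    local out=""
    (( flags & 0x01 )) && out="$out RECEIVE_MSG_AVAIL"      # bit 0: receive message queue
    (( flags & 0x02 )) && out="$out EVENT_MSG_BUFFER_FULL"  # bit 1: event message buffer
    (( flags & 0x08 )) && out="$out WDT_PRE_TIMEOUT_INT"    # bit 3: watchdog pre-timeout
    (( flags & 0x20 )) && out="$out OEM0_DATA_AVAIL"        # bit 5: OEM 0
    (( flags & 0x40 )) && out="$out OEM1_DATA_AVAIL"        # bit 6: OEM 1
    (( flags & 0x80 )) && out="$out OEM2_DATA_AVAIL"        # bit 7: OEM 2
    if [ -z "$out" ]; then
        echo "(none)"
    else
        echo "${out# }"     # strip the leading space
    fi
}
```

A flags byte of 0x20 corresponds to the OEM 0 condition described above: SMS_ATTN stays asserted for as long as that bit remains set.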
Thus, the questions:
Q1: Does the BMC for the PE 1850 generate interrupts/events that result in the OEM flags being set?
Q2: If the answer to Q1 is yes, then how should such interrupts/flags be cleared?
Q3: If the answer to Q1 is no, then what course of action should the driver take to ensure
further get flags requests won't yield SMS_ATTN with OEM flag(s) set? (e.g. reset the
BMC; although if the underlying problem isn't addressed, I consider this a workaround,
and not a true fix).
Some related info:
DELL 1850 running RedHat Linux 2.4.21
BMC Revision 1.23
Backplane Firmware 1.00
OpenIPMI Driver v33
IPMI 1.5, KCS interface
The version of the BIOS doesn't seem to matter; the hang occurs with both A00 and A03.
The version of the SMBIOS doesn't seem to matter either.
Built-in FRU device:
Board Mfg: DELL
Board Product: FRU16K,DELL P/N
Board Serial: CN1374048L00WD
Board Part: 0F1667A02
To yield the problem:
Run ipmitool in a loop to continually poll machine state (see shell script below).
After a random amount of time (typically between 4 and 12 hours), the IPMI tool will
'hang'. The state of the stack for the IPMI tool is consistent in each hang;
the kernel debugger btp shows:
0xf22fc000 25495 6829 0 1 S 0xf22fc580 ipmitool
ESP EIP Function (args)
0xf22fde74 0xc0124106 context_switch+0xa2 (0xc0435c80, 0xf22fc000, 0xc4cdc000, 0xf, 0x1)
kernel .text 0xc0100000 0xc0124064 0xc01241bf
0xf22fde90 0xc0121aab schedule+0x34f
kernel .text 0xc0100000 0xc012175c 0xc0121cee
0xf22fdee4 0xc01346af schedule_timeout+0xb5 (0xf4941b00, 0xf22fdf3c, 0xf22fc000, 0x0, 0xf22fc000)
kernel .text 0xc0100000 0xc01345fa 0xc01346b1
0xf22fdf1c 0xc017bc1c do_select+0x131
kernel .text 0xc0100000 0xc017baeb 0xc017bd35
0xf22fdf60 0xc017c0a4 sys_select+0x33c
kernel .text 0xc0100000 0xc017bd68 0xc017c232
0xf22fdfc4 0xc03f206f no_timing+0x7
kernel .entry.text 0xc03f2000 0xc03f2068 0xc03f2074
# Repeatedly call ipmitool until it hangs.
for i in `seq 1 100000`;
do
    echo "IPMITOOL iteration $i:"
    # (the ipmitool invocation is missing from the archived message)
    echo "============================================"
done
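For anyone trying to reproduce this, a variant of the loop that reports the hang instead of blocking forever may be easier to leave unattended. The sketch below assumes the GNU coreutils timeout command is available; the function name poll_until_hang and its arguments are hypothetical, and the polling command is passed in by the caller because the original invocation is not shown here.

```shell
#!/bin/bash
# Run a command repeatedly; stop and report when one run exceeds a time limit.
# usage: poll_until_hang <seconds> <max_iterations> <command> [args...]
# poll_until_hang is a hypothetical helper; it relies on GNU coreutils `timeout`.
poll_until_hang() {
    local limit=$1 max_iter=$2
    shift 2
    local i rc
    for (( i = 1; i <= max_iter; i++ )); do
        timeout "$limit" "$@" > /dev/null 2>&1
        rc=$?
        # timeout exits with status 124 when the time limit was reached
        if [ "$rc" -eq 124 ]; then
            echo "hang detected at iteration $i"
            return 1
        fi
    done
    echo "completed $max_iter iterations"
    return 0
}
```

For example, `poll_until_hang 60 100000 ipmitool sdr` would poll the sensor data records and stop at the first run that takes longer than a minute. The choice of `ipmitool sdr` and the 60-second limit are assumptions for illustration only.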
Any/all help is greatly appreciated.