WARNING: Conflict with old BCM5708 firmware and OMSA 6.5.0

Austin Murphy austin.murphy at gmail.com
Fri Nov 11 15:45:42 CST 2011


Hi Linux PowerEdge List,

I have 2x PE 1950 II servers that had a strange problem, which has
been solved.  I hope this problem report is useful for someone.


I ran into this problem while doing a clean install of RHEL 5.7 (x86_64).
The servers previously ran an earlier version of RHEL 5 without
problems, so I was not expecting any trouble.

Each server has 2x GigE ports onboard using the BCM5708 chip.
  Chip:   BCM5708 (rev12)
  Firmware:  2.9.1

  lspci -vnn (snippet):
  05:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet [14e4:164c] (rev 12)
        Subsystem: Dell Device [1028:01b3]
        ...
        Kernel driver in use: bnx2
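
For anyone who wants to check their own NIC firmware version before
installing OMSA, here is a minimal sketch. It assumes ethtool is
installed and that the driver reports a "firmware-version:" line in
"ethtool -i" output; the helper name fw_version_from_ethtool is made up
for this example, and it is not necessarily how the version above was
obtained.

```shell
# Hypothetical helper: read "ethtool -i" output on stdin and print
# only the value of the firmware-version field.
fw_version_from_ethtool() {
    sed -n 's/^firmware-version:[[:space:]]*//p'
}

# Usage on a live system (eth1 is the interface from this report):
#   ethtool -i eth1 | fw_version_from_ethtool
```

Some driver versions print extra tokens in that field, so treat the
result as a string to eyeball rather than a clean version number.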

As part of my standard procedure, I installed the latest OMSA using
Dell's yum repo.  In this case, OMSA 6.5 (64-bit) was installed.  I
also installed dell_ft_install from the yum repo.

Up to this point, everything went smoothly and seemed fine.


My trouble started when I tried to configure the second on-board GigE
port, eth1.

On both systems, I got failure messages like this:
  # ifconfig eth1 up
  SIOCSIFFLAGS: Device or resource busy

and a sad message in dmesg and on the system console:

  bnx2: fw sync timeout, reset code = 1030003

Repeating the command did not help; it just logged more messages
like the first:

bnx2: fw sync timeout, reset code = 1030006
bnx2: fw sync timeout, reset code = 1030009
bnx2: fw sync timeout, reset code = 103000c
...
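
A quick way to see whether a box is hitting this is to count these
messages in the kernel log. This is only a sketch: the function name
count_fw_sync_timeouts is made up for the example, and it matches the
message text exactly as it appears above.

```shell
# Hypothetical helper: count bnx2 firmware sync timeouts in kernel
# log text read from stdin. grep -c prints 0 and exits non-zero when
# nothing matches, so the exit status is deliberately ignored.
count_fw_sync_timeouts() {
    grep -c 'bnx2: fw sync timeout' || true
}

# Usage on a live system:
#   dmesg | count_fw_sync_timeouts
```

Any result other than 0 means the driver has already lost sync with
the NIC firmware at least once since boot.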

Knowing that these were old servers, I tried to update the firmware.
The firmware install was successful for all the items EXCEPT the NICs
... and on rebooting, the system failed to come up properly.  My trek
to the server room found that it had hung towards the end of the
Red Hat init.d sequence, so I felt confident that at least the firmware
update had been applied.  After a cold reboot, the server came back up
properly and indeed the updates had applied.

BUT, I still had the problem with the NIC.  In fact, it seemed that
any command that touches the chip, including:
  ifconfig eth1 up
  inventory_firmware
  update_firmware
failed and resulted in one more message like these on the system console:

bnx2: fw sync timeout, reset code = 103000f
bnx2: fw sync timeout, reset code = 1030012
bnx2: fw sync timeout, reset code = 1030015
bnx2: fw sync timeout, reset code = 1030018
bnx2: fw sync timeout, reset code = 103001b
bnx2: fw sync timeout, reset code = 103001e

I also found out the hard way that once any message like these shows
up, the next reboot will hang.  By some stroke of luck, I found
that the hangs only happened if the boot kernel was the latest
(2.6.18-274.7.1.el5).  If I used the version distributed with RHEL 5.7
(2.6.18-274.el5), then I didn't have the reboot problem.  That was
nice for the short term, but I worried about what to do the next time
a kernel was released by Red Hat.

I tried to do the NIC firmware update again.  I saw that "yum install
$(bootstrap_firmware)" did pick up a firmware package for this NIC:
  BCM5708_Copper_LOM_ven_0x14e4_dev_0x164c-a04-1.noarch

but I wondered why "update_firmware --yes" did not see it or install
it.  I also tried the manual firmware install packages from the Dell
drivers download site, and they all said that the hardware was
incompatible.


After forgetting about this for a few days and coming back to it, now
with a remote serial console, I tried again (and again).
The breakthrough was when I noticed that if I logged in ASAP after a
reboot and ran inventory_firmware immediately, it saw the NIC and
didn't cause the error message on the console.  After another 15
seconds or so, it would stop working and give the console message.
AHA!
What finishes loading last? OMSA!

PROBLEM: Once OMSA loads, the management tools lose communication with
the NIC. (IPMI seems to be involved)

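SOLUTION:
srvadmin-services.sh disable    # keep OMSA from starting at boot
shutdown -r now
update_firmware --yes           # run after the reboot, with OMSA not loaded
srvadmin-services.sh enable     # let OMSA start at boot again
shutdown -r now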


Now that the NICs use firmware 6.2.14, I can use the latest OMSA and
RHEL kernel with no problems.

It might be useful for OMSA to check NIC firmware versions and prompt
the user with instructions on how to update.
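
As a rough illustration of such a check, here is a sketch in plain sh.
Everything in it is an assumption on my part: the names version_lt,
nic_fw_advice and MIN_FW are made up, version components are assumed to
be purely numeric, and 6.2.14 is simply the version that worked here.
The real cutoff between 2.9.1 and 6.2.14 is unknown.

```shell
# Assumption: 6.2.14 is the firmware that worked in this report; the
# actual minimum version that avoids the conflict is not known.
MIN_FW="6.2.14"

# Return 0 (true) if dotted version $1 is older than dotted version $2.
# Components must be numeric; missing components count as 0.
version_lt() {
    left=$1
    right=$2
    while [ -n "$left" ] || [ -n "$right" ]; do
        l=${left%%.*}
        r=${right%%.*}
        l=${l:-0}
        r=${r:-0}
        if [ "$l" -lt "$r" ]; then return 0; fi
        if [ "$l" -gt "$r" ]; then return 1; fi
        # Drop the component that was just compared.
        case $left in *.*) left=${left#*.} ;; *) left= ;; esac
        case $right in *.*) right=${right#*.} ;; *) right= ;; esac
    done
    return 1
}

# Print a hint for the firmware version given as $1.
nic_fw_advice() {
    if version_lt "$1" "$MIN_FW"; then
        echo "NIC firmware $1 is older than $MIN_FW: update it with OMSA disabled"
    else
        echo "NIC firmware $1 is at least $MIN_FW"
    fi
}
```

For example, "nic_fw_advice 2.9.1" prints the update hint, while
"nic_fw_advice 6.2.14" reports that the firmware is new enough.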


Austin


