hanging DRAC5 cards

Pawel Kudzia kudzia at gmail.com
Mon Jun 7 01:58:48 CDT 2010


for nearly 2 years i've been experiencing stability problems with DRAC5
management cards in all the PE1950/PE2950 servers i take care of. the machines
run mostly debian linux but also vmware esxi and windows 2008.
hangs occur randomly across the whole family of ~ 35 machines and have
the following symptoms:
 * the hung drac card responds to pings
 * the hung drac card listens on the usual tcp ports [ eg 22, 443 ] but
   does not provide any answers on them - eg no ssh banner
 * the hung drac card no longer responds to IPMI over LAN requests
 * after a few weeks in the hung state the card no longer responds to pings
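
for anyone who wants to check for the same symptoms, here is a minimal
python sketch of one way to tell "port open but no ssh banner" apart from
a healthy or unreachable card. it is not what i run in production; the
names probe_ssh and classify, the 5 second timeout and the state labels
are assumptions made up for this example, and the ping result is passed
in by the caller because icmp usually needs extra privileges.

```python
import socket


def probe_ssh(host, port=22, timeout=5.0):
    """Return (connected, banner) for a TCP service that should greet us.

    connected is True if the TCP handshake completed.
    banner is the first bytes received, or None if nothing arrived
    within `timeout` seconds (the symptom described above).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # refused, unreachable, or connect timed out
        return False, None
    with sock:
        try:
            data = sock.recv(256)
        except OSError:
            # read timed out or connection was reset: treat as silence
            data = b""
    return True, data.strip() or None


def classify(pings, connected, banner):
    """Map the three observations to a rough state label (hypothetical labels)."""
    if banner:
        return "ok"
    if connected:
        return "hung: port open, no banner"
    if pings:
        return "suspect: pings only"
    return "unreachable"
```

usage would be something like classify(ping_result, *probe_ssh("drac-hostname")),
where ping_result comes from whatever ping check is already in place and
the hostname is a placeholder.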

so far i have had to take fully functional servers down to reset
the DRAC more often than i have used the drac to troubleshoot a problematic machine.

i have been in contact with Dell's support numerous times. i provided
them with dumps of network traffic and countless logs, but the best i got was an early
beta of the 1.51 firmware which was released in December 2009. indeed this
firmware made things better but still - around once per month i get a

what do i do with the dracs? i ping them; i query them via ipmi over lan
every ~ 1h; i reboot them twice a day - i started doing all this after
they started to hang [ i had stability problems before any of this
monitoring was introduced ]. i also tried not monitoring the dracs at all - the
results were the same... after a while they hung.

some of the dracs use a dedicated nic, others share a LOM - both hang.

what are your experiences with those cards - did you have similar
issues? do you - by any chance - have any suggestions?
just in case: i have all firmware/bioses upgraded to the most recent versions.

thanks a lot! regards

Pawel Kudzia / .PaKud

