PERC4/DI megaraid2 lockup with Debian Woody on 1750?

Marcel Baur baurma at student.ethz.ch
Thu Aug 7 15:12:01 CDT 2003


Greetings everyone,

We're having a bit of trouble with the PERC4/DI RAID controller of a 
brand-new Dell 1750 server here.

We want to run Debian GNU/Linux 3.0r1 (woody) and I was able to 
successfully install it using the alternate boot floppies from
Ronald Sprouse [1]. Apparently, they have megaraid 2.00.2 compiled in.

Unfortunately, I had to build my own kernel because support for the
onboard Broadcom NetXtreme 5700 was missing. So I downloaded a
vanilla Linux 2.4.20 kernel and patched it twice. First, I added
megaraid 2.00.7 from [2] (as it seemed to be the newest version),
and then I added support for the Broadcom NIC using the source files
provided on the CD-ROM that came with the Dell. I had to edit some
Makefiles, but it went surprisingly smoothly.

I rebooted and the server ran very well for about a day (I was
already installing software remotely for our customer) when suddenly
the system dropped off the network. When I went down into the basement,
the screen was full of megaraid error messages.

Apparently, it was looping endlessly in megaraid2.c:2670 [2], printing

...
megarid: Waiting for 4 commands to flush, iter n-1
megarid: Waiting for 4 commands to flush, iter n
megarid: Waiting for 4 commands to flush, iter n+1
...
megarid: critical hardware error!
...
megarid: Waiting for 4 commands to flush, iter n-1
megarid: Waiting for 4 commands to flush, iter n
megarid: Waiting for 4 commands to flush, iter n+1
...
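
For anyone who wants to check a saved console or kernel log for the same
pattern, here is a rough Python sketch. The function name and the regular
expressions are assumptions derived only from the messages quoted above
(including the "megarid" spelling); they are not taken from the driver source.

```python
import re

# Patterns assumed from the console messages quoted above; the driver
# prints "megarid:" (sic), so that spelling is matched on purpose.
FLUSH_RE = re.compile(r"megarid: Waiting for (\d+) commands to flush, iter (\S+)")
ERROR_RE = re.compile(r"megarid: critical hardware error")


def summarize_megaraid_log(lines):
    """Count flush-wait and critical-error messages in an iterable of log lines."""
    flush_waits = 0
    critical_errors = 0
    pending_counts = set()
    for line in lines:
        match = FLUSH_RE.search(line)
        if match:
            flush_waits += 1
            # Number of commands the driver reports it is still waiting for.
            pending_counts.add(int(match.group(1)))
        elif ERROR_RE.search(line):
            critical_errors += 1
    return {
        "flush_waits": flush_waits,
        "critical_errors": critical_errors,
        "pending_counts": sorted(pending_counts),
    }
```

Feeding it the lines of a captured log (for example an open file object)
gives a quick count of how often the flush loop and the hardware error
showed up.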

As the keyboard no longer worked, I had to hold down the front power
button for a couple of seconds until the machine powered off.

After the reboot, the system went into fsck, and after maybe five
or six seconds, the above loop started again.

I ran a BIOS RAID consistency check, and the hardware seems to be fine.
Among other things, I tried booting the old kernel from
Ronald Sprouse's boot floppy disk. Surprisingly, the fsck
completed and, as of now, the system is still up and running
(but without network!).

So my lame questions are:

- Why does megaraid2 say "megarid:"? Is it simply a typo, and
   should the message read "megaraid:"? The same spelling appears in
   other places in the Linux kernel source. Am I missing something?

- Apparently, megaraid2 2.00.2 is more stable in our setup. I want
   to try a new kernel with megaraid2 2.00.2 (instead of 2.00.7)
   that has Broadcom NetXtreme support compiled in. However, I wasn't
   able to get the megaraid 2.00.2 files from [3]. Does anyone know
   how/where I can obtain these files? What other megaraid2 version do
   you suggest?

- Has anyone encountered the "critical hardware error" death loop
   with a megaraid2 PERC4/DI controller on a Dell, and if so, what
   did you do to work around it?

- Does anyone successfully run Woody on a similar hardware setup?
   I understand that our hardware is fairly new, but if anyone got
   it running, I would appreciate any hints. Stability is crucial
   for our application, which was one of the main reasons we bought
   another Dell for our racks.
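
On the first question, one way to see where the "megarid" spelling occurs
is to walk an unpacked kernel source tree and list every line that contains
it. A minimal Python sketch follows; the function name, the default file
suffixes, and the example path in the note below are assumptions made for
illustration only.

```python
import os


def find_string_in_tree(root, needle="megarid", suffixes=(".c", ".h")):
    """Return (path, line number, line text) for each line containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Older kernel sources are not all valid UTF-8, so undecodable
            # bytes are replaced instead of raising an error.
            with open(path, "r", errors="replace") as handle:
                for lineno, text in enumerate(handle, start=1):
                    if needle in text:
                        hits.append((path, lineno, text.rstrip("\n")))
    return hits
```

Called as find_string_in_tree("/usr/src/linux/drivers/scsi") (a hypothetical
location for the unpacked tree), it lists each occurrence. The correctly
spelled "megaraid" does not contain "megarid" as a substring, so only the
suspect spelling is reported.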

I would appreciate any help on this matter. I understand that Debian
is unsupported; nevertheless, we really want to try running it.
Thanks a lot in advance for your help.

Cheers,
	
	Marcel

[1] http://www.domsch.com/linux/debian/bf_mega.bin
[2] ftp://ftp.lsil.com/pub/linux-megaraid/drivers/version-2.00.7/
[3] ftp://ftp.lsil.com/pub/linux-megaraid/drivers/




