PE 1850 - LSI Logic MegaRAID (PERC 4e/Si) hw problems

Kuba Ober kuba at mareimbrium.org
Fri Apr 13 16:01:10 CDT 2007


> line 104: T12: rebuildResume checksum is bad - initializing NVRAM structure
>
> line 106: T12: RMW: NVRAM structure invalid - initializing
>
> line 147+: ECC Error: Multi-Bit Read error from ATU, addr=c6dfcde0,
> syndrome=66 [bit=255]
> ECC Error: Single-Bit Read error from ATU, addr=c6dfcdf0, syndrome=c4
> [bit=2]
> Multi-bit or overflow encountered (mcisr=3)...shutting down
> Total ecc errors encountered this boot=3
...

> Or is it just plain faulty hardware, that's acting up after 1 year in
> production with no problems whatsoever, until this week.

It's worth pointing out that hardware is usually OK when it leaves the factory 
(they test, or so I hope) and starts acting flaky some random period of time 
afterwards. Can you tell me of any other way? Because I can't see any :)

The log tells you more or less in plain English ;) that your hardware is 
ailing. NVRAM is the nonvolatile RAM that the RAID controller uses to store (at 
least) state such as the rebuild location, the patrol read location, and 
whatnot. It may also be the cache RAM, although I don't know Dell's 
terminology here. I don't know what ATU is, but read errors from any 
component indicate that the hardware is dying. The multi-bit error is the 
serious one: ECC can normally correct a single flipped bit but can only detect 
several, which is why the firmware shuts down.
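
If you want to tally these errors across a longer firmware log, here is a 
rough Python sketch. The regular expression is modeled only on the lines 
quoted above, so the exact log format is an assumption, and summarize_ecc and 
SAMPLE are names made up for illustration, not part of any Dell or LSI tool.

```python
import re

# Pattern modeled on the quoted firmware lines only; the real log format
# may differ (assumption).
ECC_RE = re.compile(
    r"ECC Error: (?P<kind>Single|Multi)-Bit Read error from (?P<unit>\w+), "
    r"addr=(?P<addr>[0-9a-fA-F]+),\s+syndrome=(?P<syndrome>[0-9a-fA-F]+)"
)

# The two ECC entries from the quoted log, including the mail quote markers
# and the line wrapping.
SAMPLE = """\
> line 147+: ECC Error: Multi-Bit Read error from ATU, addr=c6dfcde0,
> syndrome=66 [bit=255]
> ECC Error: Single-Bit Read error from ATU, addr=c6dfcdf0, syndrome=c4
> [bit=2]
"""


def summarize_ecc(log_text):
    """Count single-bit and multi-bit ECC errors and collect their addresses."""
    # Drop leading "> " quote markers and join wrapped lines so that an
    # entry split across two lines still matches the pattern.
    lines = [line.lstrip("> ").strip() for line in log_text.splitlines()]
    flat = " ".join(lines)

    counts = {"Single": 0, "Multi": 0}
    addrs = []
    for match in ECC_RE.finditer(flat):
        counts[match.group("kind")] += 1
        addrs.append(int(match.group("addr"), 16))
    return counts, addrs


if __name__ == "__main__":
    counts, addrs = summarize_ecc(SAMPLE)
    print(counts)
    print([hex(a) for a in addrs])
```

On the two quoted entries this finds one single-bit and one multi-bit error, 
at addresses 16 bytes apart. Errors clustered that closely would be consistent 
with one bad spot in a memory module, though two samples prove nothing.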

You could try simply reseating the cache RAM stick on the PERC board. That 
fixed a similar problem for me on a PE2650 motherboard. If there is a separate 
NVRAM chip, reseat it too.

Cheers, Kuba


