Dell PE1850/2850 RAID array issue with RHEL4
Hansjörg Maurer
hansjoerg.maurer at dlr.de
Fri Aug 4 08:16:29 CDT 2006
Hi,
we had stability issues with a PERC4/Di too.
One week of Dell hardware support (firmware, OMSA logs, etc.) produced no results
(including Tier 3 logs and OMSA logs).
The next week, we got in contact with another Dell technical support engineer, who
gave us
a linttylog program (according to Google, it is available from here):
http://www.firehat.org/repodata/repoview/linttylog-0-1.00-0.html
It detects the following errors, which occur at the same time Linux dies:

  CC Error: Multi-Bit Read error from Secondary ATU, addr=d786cb00, syndrome=57 [bit=-1]
  Multi-bit or overflow encountered (mcisr=1)...shutting down
  Total ecc errors encountered this boot=2

Dell then replaced the mainboard and the controller RAM, and we are
hoping
that this solves the problem.
The server had been running for 3 years, but the error first occurred
after we migrated thousands of RRD files, which are updated
frequently, to the RAID1 disk, so there is now much more I/O.
I would try using this tool to check for hardware errors on the controller
side.
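If you save the linttylog output to a text file, a small script can flag the multi-bit ECC entries automatically. The following is only a hypothetical Python sketch: its regular expressions are modelled solely on the few lines quoted above, the script name and the idea of saving the output to a file are assumptions, and the real linttylog output may contain other message formats that it will not recognise.

```python
import re
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical patterns, modelled only on the sample lines quoted in this
# post; real linttylog output may use other formats.
MULTIBIT_RE = re.compile(
    r"Multi-Bit (?P<op>\w+) error from (?P<source>[^,\n]+), "
    r"addr=(?P<addr>[0-9a-fA-F]+),\s*syndrome=(?P<syndrome>\d+)"
)
TOTAL_RE = re.compile(r"Total ecc errors encountered this boot=(\d+)", re.IGNORECASE)


@dataclass
class EccError:
    op: str        # e.g. "Read"
    source: str    # e.g. "Secondary ATU"
    addr: int      # faulting address, parsed from hex
    syndrome: int  # ECC syndrome value as printed


def scan_log(text: str) -> Tuple[List[EccError], Optional[int], bool]:
    """Return (multi-bit errors, last reported ECC total, shutdown seen)."""
    # The log uses bare carriage returns (shown as ^M); treat them as newlines.
    text = text.replace("\r", "\n")
    errors = [
        EccError(m["op"], m["source"], int(m["addr"], 16), int(m["syndrome"]))
        for m in MULTIBIT_RE.finditer(text)
    ]
    totals = [int(m.group(1)) for m in TOTAL_RE.finditer(text)]
    shutdown = "shutting down" in text
    return errors, (totals[-1] if totals else None), shutdown


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python scan_ecc.py saved_linttylog_output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        errs, total, shutdown = scan_log(f.read())
    for e in errs:
        print(f"{e.op} error from {e.source} at 0x{e.addr:08x}, syndrome={e.syndrome}")
    print(f"total ECC errors reported: {total}; controller shutdown seen: {shutdown}")
```

The `\s*` after `addr=...,` lets an entry match even when the log wraps it across two lines, as in the quoted output above.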
Greetings
Hansjörg
Nicky Peeters wrote:
>Well, just a small update:
>
>It was a PE 1850 using two disks in RAID1, running RHEL4 and the
>aforementioned kernel.
>After a remote reboot (thank god for IPMI), the machine booted without
>problems, and all seems fine.
>
>Still investigating, though.
>
>OMSA reports no issues.
>
>I haven't run any fsck/consistency check for the time being.
>
>On 04 Aug 2006, at 11:47, Nicky Peeters wrote:
>
>
>
>>Well no,
>>
>>For now I can only access what dmesg provides me with, but I saw none of
>>the hdf errors (or the APIC errors) that you had in your log.
>>
>>Hopefully my datacenter trip will reveal more...
>>
>>On 04 Aug 2006, at 11:31, wolf2k5 wrote:
>>
>>
>>
>>>On 8/4/06, Nicky Peeters <nicky at beta9.be> wrote:
>>>
>>>
>>>>It's the only server where the kernel was upgraded (22 days ago) to
>>>>2.6.9-34.0.1.ELsmp.
>>>>
>>>>Did you by any chance also upgrade to that kernel?
>>>>
>>>>
>>>Yes, but other 1850/2850 servers hit the same issue with earlier
>>>kernels too.
>>>
>>>Did you also get any "hdf" errors, like I did, when your server hit
>>>the issue?
>>>
>>>Please let me know if you find out anything.
>>>
>>>Thanks.
>>>
>>>_______________________________________________
>>>Linux-PowerEdge mailing list
>>>Linux-PowerEdge at dell.com
>>>http://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>Please read the FAQ at http://lists.us.dell.com/faq
>>>
>>>
>>>
>