Error messages: "Correctable Non-Mirrored Demand Data ECC"

Peter Matulis peter.matulis at canonical.com
Tue Dec 21 08:11:00 CST 2010


On 12/21/2010 07:14 AM, vincent at cojot.name wrote:
> 
> Hi Peter,
> 
> On ECC memory, memtest will not usually find errors (unless the
> developers added code to deal with ECC RAM); after all, that's the
> purpose of ECC. The only way to figure that out is to ask the system's
> memory controllers to tell you if any errors were seen. On my RHEL5.5 I
> do it like this:
> # edac-util -v
> mc0: 0 Uncorrected Errors with no DIMM info
> mc0: 0 Corrected Errors with no DIMM info
> mc0: csrow0: ch0|ch1: 0 Uncorrected Errors
> mc0: csrow0: ch0: 0 Corrected Errors
> mc0: csrow0: ch1: 0 Corrected Errors
> mc0: csrow1: ch0|ch1: 0 Uncorrected Errors
> mc0: csrow1: ch0: 0 Corrected Errors
> mc0: csrow1: ch1: 0 Corrected Errors
> mc0: csrow2: ch0|ch1: 0 Uncorrected Errors
> mc0: csrow2: ch0: 0 Corrected Errors
> mc0: csrow2: ch1: 0 Corrected Errors
> mc0: csrow3: ch0|ch1: 0 Uncorrected Errors
> mc0: csrow3: ch0: 0 Corrected Errors
> mc0: csrow3: ch1: 0 Corrected Errors
> 
> If your counts are above zero, chances are you'd want to order
> replacement modules.
> 
> Just my 2c,
> 
> Vincent
> 
> 
> On Fri, 17 Dec 2010, Peter Matulis wrote:
> 
>> Hi gang.  I have a Precision WorkStation 690 running Ubuntu 10.04.1 and
>> I am getting a lot of the following messages in my logs:
>>
>> Dec 17 21:45:31 server-2 kernel: [4362679.480049] EDAC MC0: CE row 2,
>> channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=8952 CAS=768,
>> CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
>> Dec 17 21:45:38 server-2 kernel: [4362686.480028] EDAC MC0: CE row 2,
>> channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=9627 CAS=768,
>> CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
>> Dec 17 21:45:40 server-2 kernel: [4362688.480024] EDAC MC0: CE row 2,
>> channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=9540 CAS=768,
>> CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
>> Dec 17 21:45:43 server-2 kernel: [4362691.482844] EDAC MC0: CE row 2,
>> channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=9462 CAS=768,
>> CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
>> Dec 17 21:45:49 server-2 kernel: [4362697.480445] EDAC MC0: CE row 2,
>> channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=9180 CAS=768,
>> CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
>> Dec 17 21:45:58 server-2 kernel: [4362706.480027] EDAC MC0: CE row 2,
>> channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=9314 CAS=768,
>> CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
>>
>> I have performed a memtest (GRUB menu) and the results are good.  Should
>> I order a replacement memory module?

Thanks a lot for that.  This is my output:

mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: ch0: 0 Corrected Errors
mc0: csrow0: ch1: 0 Corrected Errors
mc0: csrow0: ch2: 0 Corrected Errors
mc0: csrow0: ch3: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: ch0: 0 Corrected Errors
mc0: csrow1: ch1: 0 Corrected Errors
mc0: csrow1: ch2: 0 Corrected Errors
mc0: csrow1: ch3: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: ch0: 0 Corrected Errors
mc0: csrow2: ch1: 0 Corrected Errors
mc0: csrow2: ch2: 1 Corrected Errors
mc0: csrow2: ch3: 2005594 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: ch0: 0 Corrected Errors
mc0: csrow3: ch1: 0 Corrected Errors
mc0: csrow3: ch2: 1 Corrected Errors
mc0: csrow3: ch3: 0 Corrected Errors
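With this many counters, the non-zero ones are easy to miss. Below is a minimal Python sketch that filters a saved report down to the channels with corrected errors, worst first. It assumes the per-channel line format shown above; the function name `nonzero_corrected` and the regular expression are illustrative and not part of edac-utils.

```python
import re

# Matches per-channel lines such as "mc0: csrow2: ch3: 2005594 Corrected Errors".
# The format is assumed from the output pasted above.
LINE_RE = re.compile(r"^(mc\d+): (csrow\d+): (ch\d+): (\d+) Corrected Errors$")

def nonzero_corrected(report):
    """Return (mc, csrow, channel, count) tuples with count > 0, worst first."""
    hits = []
    for line in report.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            count = int(m.group(4))
            if count > 0:
                hits.append((m.group(1), m.group(2), m.group(3), count))
    # Highest count first; ties keep their original order.
    return sorted(hits, key=lambda h: h[3], reverse=True)

# A few lines copied from the report above.
sample = """\
mc0: csrow2: ch2: 1 Corrected Errors
mc0: csrow2: ch3: 2005594 Corrected Errors
mc0: csrow3: ch2: 1 Corrected Errors
mc0: csrow3: ch3: 0 Corrected Errors
"""

for mc, csrow, ch, count in nonzero_corrected(sample):
    print(mc, csrow, ch, count)
```

On the full report this leaves three entries, with csrow2/ch3 far ahead of the two single-error channels.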

I have 8x2GB modules.  Would you know which ones need replacing?
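As a cross-check, the kernel messages quoted earlier can be tallied by row and channel to see whether they point at the same place as the counters. This is only a sketch: it assumes unwrapped log lines (one message per line, as they would appear in the log file rather than in this mail), and `tally_ce` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the start of an EDAC corrected-error (CE) kernel message.
CE_RE = re.compile(r"EDAC MC(\d+): CE row (\d+), channel (\d+)")

def tally_ce(log_text):
    """Count CE messages per (controller, row, channel)."""
    return Counter(
        (int(mc), int(row), int(ch)) for mc, row, ch in CE_RE.findall(log_text)
    )

# Two of the messages from the original post, unwrapped.
log = '''\
Dec 17 21:45:31 server-2 kernel: [4362679.480049] EDAC MC0: CE row 2, channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=8952 CAS=768, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
Dec 17 21:45:38 server-2 kernel: [4362686.480028] EDAC MC0: CE row 2, channel 3, label "": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=9627 CAS=768, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
'''

for (mc, row, ch), n in tally_ce(log).most_common():
    print(f"mc{mc} row {row} channel {ch}: {n} message(s)")
```

Every message in the original post names row 2, channel 3, which matches the csrow2/ch3 counter. The `label ""` field in those messages is empty, so the logs alone do not say which physical slot that is.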

-- 
Peter


