E1422 CPU2 machine check error. Power cycle AC.

Gonçal Badenes goncal.badenes at icfo.es
Mon Mar 21 10:32:55 CDT 2011


Hi!

We have had exactly the same problem with an R815 with 4 x 12-core AMD
CPUs, but ours has Opteron 6168 processors. In our case, after opening a
case with Dell support, we had the motherboard and CPU2 replaced (for us,
too, CPU2 was the one with problems!). Since then, the error has not
reappeared, but we do see infrequent kernel panics under CPU-intensive
load (simulations using all cores). I suspect the panics might be related
to the original problem, but to be honest I'm not sure.

By the way, the CPU2 machine check error appeared with both BIOS 1.3.1
and 1.4.1. My impression is that it was less frequent with BIOS 1.4.1...

Cheers,

     Gonçal Badenes
     ICFO-The Institute of Photonic Sciences
     Barcelona, Spain
     www.icfo.es

On 21/03/2011 15:56, Hanne Munkholm wrote:
> Hi.
>
> I have some new R815 servers with 4 x 12-core AMD Opteron 6174
> and 256 / 128 GB RAM.
>
> When I enable the "DMA Virtualization" in BIOS (under Processor
> Settings), I get this error message in the display:
>
> "E1422 CPU2 machine check error. Power cycle AC."
>
> This happens with both BIOS 1.3.1 and 1.4.1.
>
> I first saw it Friday and I went straight up and opened a
> support case with Dell. The servers are not in production yet
> and I want it solved before they are.
>
> The Dell engineer was very helpful and suggested some things. I
> had just upgraded the BIOS to 1.4.1 and thought that this might
> have caused it, so he checked that all my other firmware was up
> to date, which it was. He also suggested reseating all the
> components, clearing the log, and, if the error did not go away,
> downgrading the BIOS.
>
> I adopted another strategy since I had 3 more servers to play
> with. I upgraded BIOS on one more - no error. Then I enabled DMA
> Virtualization on one more, with "old" BIOS 1.3.1 and the error
> was there right away.
>
> So I disabled it again and cleared the log and it did not come
> back. (I have not cleared the log of the first server since I
> might not be done with it).
>
> I still have the case open with Dell. The engineer promised to
> ask their second-level support, so I guess it is not a normal
> everyday issue.
>
> My questions to this list are:
>
> 1) What is DMA virtualization in BIOS exactly supposed to do? It
> did not make the
> "[    0.000000] Your BIOS doesn't leave a aperture memory hole
> [    0.000000] Please enable the IOMMU option in the BIOS setup
> [    0.000000] This costs you 64 MB of RAM"
>
> message in my dmesg log go away, which is why I enabled it in
> the first place.
>
> 2) Has anyone seen this before?
>
> 3) Does anyone have a clue why this happens? What does "E1422
> CPU machine Chk" mean anyway?
>
> Thanks in advance for any clue, or pointers to documentation that
> would make this clearer.
>
> Med venlig hilsen / Best regards
> --
> Hanne Munkholm                      Email: hanne at binf.ku.dk
> Systemadministrator                 Tlf: +45 35 32 13 49
>
> Bioinformatik-centret
> Københavns Biocenter, Biologisk Institut
> Ole Maaløes Vej 5, 2200 København N


