Re: Machine check exception (Debian Etch)
Morten P.D. Stevens
mstevens at win-professional.com
Tue Feb 3 09:43:15 CST 2009
Hi Dave,
I think it's a CPU-related problem with the CPU data cache.
If you check cat /proc/cpuinfo, you will see that CPU 3 refers to core 4 of the first physical CPU.
Best regards,
Morten Stevens
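Morten's suggestion of checking /proc/cpuinfo can be done mechanically. The sketch below assumes an x86 Linux /proc/cpuinfo that lists one blank-line-separated stanza per logical CPU with "processor", "physical id" and "core id" fields, and that the CPU number mcelog prints is the kernel's logical CPU index. The function name parse_cpuinfo is hypothetical. Kernels can enumerate logical CPUs differently, so reading the table on the affected machine is safer than assuming a layout.

```python
import os


def parse_cpuinfo(text):
    """Map each logical 'processor' number to (physical id, core id).

    Assumes x86-style /proc/cpuinfo: one blank-line-separated stanza per
    logical CPU. 'physical id' is the socket, 'core id' the core within it.
    """
    topology = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            cpu = int(fields["processor"])
            # Hypothetical fallback: treat a missing field as 0.
            socket = int(fields.get("physical id", 0))
            core = int(fields.get("core id", 0))
            topology[cpu] = (socket, core)
    return topology


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        for cpu, (socket, core) in sorted(parse_cpuinfo(f.read()).items()):
            print(f"CPU {cpu}: physical id {socket}, core id {core}")
```

If the line for CPU 3 shows physical id 0 and core id 3, that matches Morten's reading; physical id 3 would instead mean the fourth socket.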
-----Original Message-----
From: linux-poweredge-bounces at dell.com [mailto:linux-poweredge-bounces at dell.com] On Behalf Of Dave Ewart
Sent: Tuesday, 3 February 2009 11:43
To: linux-poweredge at dell.com
Subject: Machine check exception (Debian Etch)
We have an R905 with four quad-core CPUs and 128GB RAM (32 4GB sticks).
It runs Debian Etch. A machine check exception was reported and decoded as follows:
$ sudo mcelog --ascii --k8
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 4 TSC 32559b687fb05
MISC e00c0ffe01000000 ADDR 1019aa6cc4
STATUS 9c034480011c017b MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 0 data cache TSC 32559b687fb05
MISC e00c0ffe01000000
Data cache ECC error (syndrome 6)
bit39 = res7
bit42 = res10
bit46 = corrected ecc error
bit59 = misc error valid
memory/cache error 'evict mem transaction, generic transaction, level generic'
STATUS 9c034480011c017b MCGSTATUS 0
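To see where mcelog's decoding comes from, the STATUS value can be split by hand. This sketch uses the architectural machine-check layout (flag bits 63 to 57, MCA error code in bits 15:0, model-specific code in bits 31:16, cache-hierarchy codes of the form 0000 0001 RRRR TTLL). Reading bit 46 as "corrected ECC" and bits 54:47 as the ECC syndrome is an assumption about the AMD K8 layout, chosen because it reproduces mcelog's "bit46" and "syndrome 6" lines. The name decode_status is hypothetical.

```python
# Architectural MCi_STATUS flag bits.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}
# Cache-hierarchy sub-fields of the MCA error code (0000 0001 RRRR TTLL).
REQUEST = ["generic", "read", "write", "data read", "data write",
           "instruction fetch", "prefetch", "evict", "snoop"]
TRANSACTION = ["instruction", "data", "generic"]  # TT = 11 is reserved
LEVEL = ["L0", "L1", "L2", "generic"]


def decode_status(status):
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {"flags": flags, "mca_code": mca_code,
              "model_specific": (status >> 16) & 0xFFFF}
    # Cache-hierarchy error: bits 15..8 are 0000 0001 (bit 12, the
    # Intel filter flag, is masked out).
    if (mca_code & 0xEF00) == 0x0100:
        rrrr = (mca_code >> 4) & 0xF
        tt = (mca_code >> 2) & 0x3
        ll = mca_code & 0x3
        result["request"] = REQUEST[rrrr] if rrrr < len(REQUEST) else "reserved"
        result["transaction"] = TRANSACTION[tt] if tt < len(TRANSACTION) else "reserved"
        result["level"] = LEVEL[ll]
    # Assumed AMD K8 layout: bit 46 = corrected ECC, bits 54:47 = syndrome.
    result["k8_corrected_ecc"] = bool((status >> 46) & 1)
    result["k8_syndrome"] = (status >> 47) & 0xFF
    return result


if __name__ == "__main__":
    print(decode_status(0x9C034480011C017B))
```

For the STATUS above, this reports VAL, EN, MISCV and ADDRV set with UC and PCC clear, an evict request with generic transaction type and level, and syndrome 6, which agrees with mcelog's output.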
We have the Debian-packaged OMSA installed; it reported nothing for this incident, which has happened only once.
What's going on here? Is this an ECC error in RAM which was corrected, or something else?
If it's RAM-related, identifying the errant RAM stick will be easier if I know what 'CPU 3' means. Is it:
(a) the fourth core on the first CPU (counting each of the 16 cores from 0 to 15), or
(b) the fourth physical CPU (counting only physical CPUs from 0 to 3)?
Any comments appreciated.
Thanks,
Dave.
--
Dave Ewart
davee at ceu.ox.ac.uk
Computing Manager, Cancer Epidemiology Unit University of Oxford / Cancer Research UK
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370 Get key from http://www.ceu.ox.ac.uk/~davee/davee-ceu-ox-ac-uk.asc
N 51.7518, W 1.2016