Machine check exception (Debian Etch)
Dave Ewart
davee at ceu.ox.ac.uk
Tue Feb 3 04:42:58 CST 2009
We have an R905 with four quad-core CPUs and 128GB RAM (32 x 4GB sticks).
It runs Debian Etch. A machine check exception was reported and decoded
as follows:
$ sudo mcelog --ascii --k8
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 4 TSC 32559b687fb05
MISC e00c0ffe01000000 ADDR 1019aa6cc4
STATUS 9c034480011c017b MCGSTATUS 0
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 0 data cache TSC 32559b687fb05
MISC e00c0ffe01000000
Data cache ECC error (syndrome 6)
bit39 = res7
bit42 = res10
bit46 = corrected ecc error
bit59 = misc error valid
memory/cache error 'evict mem transaction, generic transaction, level generic'
STATUS 9c034480011c017b MCGSTATUS 0
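As a cross-check on the decode above, the STATUS value can be unpacked by hand. The following is only a minimal sketch: the flag positions for bits 63 to 57 are the standard x86 machine-check layout, while reading bits 54:47 as the ECC syndrome is an assumption about K8-style banks, and the helper names (decode, set_bits) are made up for illustration.

```python
# Sketch: unpack a few fields of a machine-check STATUS value by hand.
STATUS = 0x9C034480011C017B  # copied from the mcelog output above

# Architectural flag bits (standard x86 machine-check layout).
FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def set_bits(value, lo, hi):
    """Return the positions of all set bits in value[lo..hi], inclusive."""
    return [b for b in range(lo, hi + 1) if (value >> b) & 1]


def decode(status):
    """Split STATUS into flags, remaining set bits, error code and syndrome."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "other_bits": set_bits(status, 32, 56),
        "error_code": status & 0xFFFF,
        # Assumption: bits 54:47 carry the ECC syndrome (K8-style layout).
        "syndrome": (status >> 47) & 0xFF,
    }


if __name__ == "__main__":
    for key, value in decode(STATUS).items():
        print(key, value)
```

For this value the UC flag is clear while bit 46 is set, which agrees with the "corrected ecc error" line printed by mcelog, and the syndrome field works out to 6, as in the decoded output.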
We have the Debian-packaged OMSA installed - it reported nothing for
this incident, which happened just once.
What's going on here? Is this an ECC error in RAM which was corrected,
or something else?
If it's RAM-related, identifying the errant RAM stick will be easier if
I know what 'CPU 3' means. Is it:
(a) the fourth core on the first CPU (counting each of the
16 cores from 0 to 15), or
(b) the fourth CPU (counting only physical CPUs from 0 to 3)?
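One possible way to tell (a) from (b), assuming that the 'CPU 3' printed by mcelog is the kernel's logical CPU number (an assumption, not something confirmed here), is to look up which physical package that logical CPU belongs to in /proc/cpuinfo. A rough sketch, with cpu_topology being a hypothetical helper name:

```python
def cpu_topology(cpuinfo_text):
    """Map logical CPU number -> (physical id, core id).

    Expects the text of /proc/cpuinfo, where each logical CPU is a
    blank-line-separated block of "key : value" lines.
    """
    topo = {}
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            # "physical id" and "core id" may be absent on some kernels;
            # in that case the corresponding entry is None.
            topo[int(fields["processor"])] = (
                fields.get("physical id"),
                fields.get("core id"),
            )
    return topo


# Usage on a live system (not run here):
#   with open("/proc/cpuinfo") as f:
#       print(cpu_topology(f.read()).get(3))
```

If logical CPU 3 reports physical id 0, that would point towards reading (a); a physical id of 3 would point towards (b). How the kernel numbers the cores varies between systems, so this needs checking on the machine itself.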
Any comments appreciated.
Thanks,
Dave.
--
Dave Ewart
davee at ceu.ox.ac.uk
Computing Manager, Cancer Epidemiology Unit
University of Oxford / Cancer Research UK
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370
Get key from http://www.ceu.ox.ac.uk/~davee/davee-ceu-ox-ac-uk.asc
N 51.7518, W 1.2016