Memtest errors on DELL 2650 Resolved
simonw at zynet.net
Wed Feb 25 05:26:43 CST 2009
On Tuesday 24 February 2009 18:11:47 Stuart_Hayes at dell.com wrote:
> I seem to recall that the 2650 BIOS (via system management mode) uses
> some memory when it is doing legacy emulation of USB devices (i.e.,
> making USB keyboards & mice look like PS/2 keyboards and mice). Once a
> real OS (i.e., not DOS) comes up, it should take over control of the USB
> controllers, and the BIOS will quit doing that. BIOS should probably
> tell the OS that it uses that memory, too, but I don't know if it does
> or if memtest86+ looks at where BIOS says it is using memory.
Reports on the net say this affects only some PowerEdge 2650s, and that
replacing the motherboard resolves it.
In my case, disabling USB in the BIOS did allow memtest86+ to run for 15 hours
(on a 3GB machine), as opposed to failing after a few seconds. For us, and
given the machine's age, this is a fine workaround for now, since we weren't
[knowingly] using the USB port at all on this device (it still has a floppy
drive; who needs USB flash drives?!).
As regards memtest86+, the documentation says this on sizing memory:
"Starting in version 2.9 alternative methods are available for determining the
memory size. By default the test attempts to get the memory size from the
BIOS using the "e820" method. With "e820" the BIOS provides a table of memory
segments and identifies what they will be used for. By default Memtest86
will test all of the ram marked as available and also the area reserved for
the ACPI tables. This is safe since the test does not use the ACPI tables
and the "e820" specifications state that this memory may be reused after the
tables have been copied. Although this is a safe default some memory will
not be tested."
(Source of quote: the Debian package, which is based on memtest86 3.)
I ran with default settings, so it does look like memtest should be avoiding
that memory if the BIOS were correctly declaring its use of it.
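One way to check this would be to compare a failing address reported by
memtest86+ against the e820 map that the Linux kernel prints at boot. What
follows is only a minimal Python sketch under stated assumptions: the function
names are hypothetical, the sample map is made up (it is not from the 2650),
and it reads "the area reserved for the ACPI tables" in the quoted
documentation as the region type Linux prints as "ACPI data".

```python
import re

# Newer kernels print "BIOS-e820: [mem 0xSTART-0xEND] type", END inclusive.
NEW_FMT = re.compile(r"BIOS-e820: \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)\] (.+)$")
# Older kernels print "BIOS-e820: START - END (type)", END exclusive.
OLD_FMT = re.compile(r"BIOS-e820: ([0-9a-f]+) - ([0-9a-f]+) \((.+)\)$")


def parse_e820(log_text):
    """Return a list of (start, end_exclusive, type) from kernel log text."""
    regions = []
    for line in log_text.splitlines():
        line = line.strip()
        m = NEW_FMT.search(line)
        if m:
            # Convert the inclusive end address to an exclusive one.
            regions.append((int(m.group(1), 16), int(m.group(2), 16) + 1, m.group(3)))
            continue
        m = OLD_FMT.search(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def region_type(regions, addr):
    """Return the e820 type covering addr, or None if no entry covers it."""
    for start, end, kind in regions:
        if start <= addr < end:
            return kind
    return None


def would_be_tested(regions, addr):
    """Assumption: default mode tests 'usable' RAM plus the 'ACPI data' area."""
    return region_type(regions, addr) in ("usable", "ACPI data")


# Made-up sample map in the older log format, for illustration only.
SAMPLE = """\
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 0000000000100000 - 00000000bffc0000 (usable)
 BIOS-e820: 00000000bffc0000 - 00000000bffcfc00 (ACPI data)
 BIOS-e820: 00000000bffcfc00 - 00000000c0000000 (reserved)
"""

if __name__ == "__main__":
    regions = parse_e820(SAMPLE)
    for addr in (0x00100000, 0xBFFC0000, 0xBFFCFC00):
        print(hex(addr), region_type(regions, addr), would_be_tested(regions, addr))
```

If a failing address turned out to sit in a range marked "usable", that would
be consistent with the BIOS using memory it has not declared. If it sat in a
"reserved" range, the question would instead be why it was being tested.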
Of course, I have no way of knowing whether this is responsible for the
problem that caused us to take the machine out of service. I suspect that was
a RAID driver issue, so having the latest firmware and a newer version of the
driver in the kernel may resolve that. Then I'll leave it stress testing, just
in case the computer room isn't warm enough already.