attempt to access beyond end of device
David Titzer
dtitzer at woti.com
Tue May 1 15:45:24 CDT 2007
Thanks for the advice. I've learned that we've got a firmware issue with our Coraid shelf. We are running raid5 underneath.
The latest firmware addresses an issue where a drive failure can cause corruption on the raid5 volume (where have we heard that before?). It may be possible to do the firmware update _before_ replacing the failed drive. Proceeding in that order may lead to a cleaner recovery, so I've been told. I'm confirming that with Coraid just to be sure.
-dat
_____
From: Michael E Brown [mailto:Michael_E_Brown at dell.com]
To: David Titzer [mailto:dtitzer at woti.com]
Cc: linux-poweredge at dell.com
Sent: Tue, 01 May 2007 15:25:15 -0400
Subject: Re: attempt to access beyond end of device
On Tue, May 01, 2007 at 12:08:18PM -0400, David Titzer wrote:
> Seeing the following paired kernel error messages:
>
> "attempt to access beyond end of device"
> "dm-1: rw=0, want=26553873480, limit=3786932224"
Wow, that is a huge discrepancy. You are definitely in for some
problems.
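To put the two numbers in perspective, here is a rough conversion. It assumes that want and limit are counted in 512-byte sectors, which is the usual convention for this kernel message but is not stated in the log itself. The helper name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: block-layer sectors are 512 bytes


def sectors_to_tib(sectors):
    """Convert a count of 512-byte sectors to TiB."""
    return sectors * SECTOR_BYTES / 2**40


want = 26553873480   # "want" from the kernel message
limit = 3786932224   # "limit" from the kernel message

print("want  = %.2f TiB" % sectors_to_tib(want))
print("limit = %.2f TiB" % sectors_to_tib(limit))
print("want is about %.1fx the device size" % (want / limit))
```

Under that assumption the request lands around 12 TiB into a device of under 2 TiB, about seven times past the end, so this is not an off-by-a-few-sectors rounding problem.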
>
> The server is a PE2950 running RHEL ES 4.4. The system has one filesystem under LVM, physically living on an AoE storage server. AoE drivers are up-to-date. Server exports the root of that logical filesystem to other servers. The server's kernel is 2.6.9-42-ELsmp.
>
> We are seeing odd small-file corruption. Sometimes, this corruption occurs to static data files after being accessed for read.
The error would definitely explain corruption.
>
> I've not seen info related to this problem on recent Red Hat releases, or recent kernels for that matter.
>
> I could use a good starting point for troubleshooting this! This isn't
> the only AoE installation I'm working with, but it is the only one
> using PE2950s, and the only one with such errors. Thanks.
You need to double-check the sizes of everything. Things to check
for:
-- partition sizes match the actual size of the disk
-- lvm pv sizes match the size of the device each pv is on
-- lvm vg size matches the combined size of all pvs in the vg
-- check that you don't have any devices that may have gone offline
   and shrunk your vg size unexpectedly. Something like a raid0
   with an offline device would be *bad*.
-- check that the size of the fs on the device matches the size of
   the device.
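The checklist above amounts to walking the storage stack from the bottom up and making sure no layer claims more space than the layer beneath it. A minimal sketch of that comparison follows. It assumes a simple linear stack (one device, one pv, one lv, one fs), which ignores vgs spanning several pvs, and the function names and example sizes are invented for illustration. The sysfs path /sys/block/<dev>/size reports sizes in 512-byte sectors on Linux; sizes for the lvm and fs layers would have to come from the usual lvm and filesystem tools.

```python
def read_sectors(dev):
    """Size of a block device in 512-byte sectors, read from sysfs (Linux only)."""
    with open("/sys/block/%s/size" % dev) as f:
        return int(f.read())


def find_overruns(layers):
    """Return (upper, lower) pairs where an upper layer is larger than the one below.

    layers: list of (name, size_in_sectors), bottom of the stack first.
    """
    problems = []
    for (lower, lower_sz), (upper, upper_sz) in zip(layers, layers[1:]):
        if upper_sz > lower_sz:
            problems.append((upper, lower))
    return problems


# Made-up sizes for illustration only: a 50 GiB device under a 100 GiB pv.
example_layers = [
    ("AoE device", 104857600),
    ("lvm pv", 209715200),
    ("lv", 209715200),
    ("filesystem", 209715200),
]

for upper, lower in find_overruns(example_layers):
    print("%s is larger than %s" % (upper, lower))
```

Only adjacent layers are compared, so a single shrunken device is reported once, at the boundary where the sizes first disagree.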
Things that could have gone wrong:
-- somebody resized a device underneath you. For example, say you
   had a raw AoE device (not sure what kind you have; do they do raid
   underneath?) that was, e.g., 100GB, and you did a pvcreate,
   vgextend, etc. If somebody then resized the device to 50GB, you
   might not see problems until it started filling up.
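The resize scenario can be modeled in a few lines to show why the trouble stays hidden at first. This is a sketch, not the kernel's code: it assumes a request fails when its end sector exceeds the current device size, and it uses GiB as a stand-in for the 100GB and 50GB figures above. All names are hypothetical.

```python
SECTORS_PER_GIB = 2**30 // 512


def beyond_end(start_sector, nr_sectors, limit):
    """True if the request would run past the end of a device of `limit` sectors."""
    return start_sector + nr_sectors > limit


shrunk = 50 * SECTORS_PER_GIB  # device now 50 GiB; fs was laid out for 100 GiB

# Data stored in the first half of the old layout is still reachable.
print(beyond_end(10 * SECTORS_PER_GIB, 8, shrunk))  # False

# Once allocations pass the 50 GiB mark, every access to them fails.
print(beyond_end(60 * SECTORS_PER_GIB, 8, shrunk))  # True
```

Nothing goes wrong until the filesystem touches blocks past the new end of the device, which is why a shrink can go unnoticed while the volume is mostly empty.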
--
Michael