I thought you'd all appreciate an update on this one.  It turns out that the users were mistaken: they were _not_ seeing checksum errors on the local system -- they were seeing them across an NFS mount from an older (and non-Intel) system running kernel 2.4.2.

We've disabled NFS attribute caching (with the noac mount option) on the client, as they can't upgrade its kernel for a while, so we hope that works around the problem.
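
For anyone who wants to check whether the workaround holds, here is a rough sketch of the copy-and-compare test we have in mind.  It is not what the users ran: the function names, the round count, and the choice of SHA-256 are my own assumptions, and it needs a reasonably modern Python 3, which the old client may not have.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst_dir, rounds=100):
    """Copy src into dst_dir `rounds` times (hypothetical helper).

    Returns the list of round numbers whose copy did not match the
    checksum of the source.  Mismatched copies are left in place for
    inspection; good copies are deleted.
    """
    src = Path(src)
    dst_dir = Path(dst_dir)
    expected = sha256_of(src)
    mismatches = []
    for i in range(rounds):
        dst = dst_dir / f"{src.name}.copy{i}"
        shutil.copyfile(src, dst)
        if sha256_of(dst) != expected:
            mismatches.append(i)
        else:
            dst.unlink()
    return mismatches
```

Run it on the NFS client with the source on one mounted filesystem and dst_dir on another.  One caveat: a checksum taken immediately after the copy may be served from the client's own page cache, so a clean run here does not prove that the data on the server is good; re-checking any kept copies from the server side would be a stronger test.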


>We've got Red Hat 7.1 with kernel 2.4.18-18.7.xsmp running on a PowerEdge 6300; the system has four PIII Xeon cpus and 2 GB RAM.  It's got four filesystems on three logical disks, all on its original PERC.  sda is a RAID 5 array of four disks, sdb is a RAID 1 of two disks, and sdc is a RAID 5 array of eight disks in a PowerVault 220.
>All hardware upgrades are Dell OEM.
>Three users have told me that today and yesterday, when they copied a file from one filesystem to another, the file got corrupted -- the checksum of the copy didn't match that of the original.  They were all logged into the system locally; the copy was not made over any network protocol.  Both the source and destination filesystems are shared via nfs.
>I'm trying to duplicate the error in hope of finding a pattern, but it is so rare that I haven't yet been able to duplicate it once.
>I couldn't find any problem like this in Google groups for this kernel.  I hesitate to blame the hardware, as I think it's checksumming in so many places.
>Does anyone have any ideas as to where the source of the problem may be?
