alazarev at itg.uiuc.edu
Wed Aug 27 13:44:00 CDT 2003
Yeah, it is strange. As far as the failure goes, here's what happened:
The tower (8 X 36GB SCSI IBM Ultra 160 drives in RAID 5 config on PERC3
external single channel) was working just great for 9-10 hours. We plugged
it in Saturday morning, set up the RAID 5 array in the PERC3 BIOS setup,
started the init, then booted to Linux and created the filesystem on the
new RAID 5 array. During this, the RAID init was proceeding as planned -
no problems. We started copying data from another filesystem to this new
filesystem. It worked perfectly. After 9 hours, the copy was done. But the
initialization was not done. We booted to single user mode, in order to
rsync the filesystems and expand some other stuff. As soon as it got to
single user mode, the PE4600 started beeping error codes, indicating a
RAID failure. We immediately went to the PERC3 BIOS and saw that
the new RAID 5 array, which had been working for 9 hours, now had the
first two drives listed as "FAILED". The array was lost. All errors were
indicated by the PERC3 controller itself.
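For anyone wondering why two drives marked "FAILED" takes out the whole array: here is a rough sketch of single-parity (XOR) redundancy, which is the idea behind RAID 5. It is only an illustration, not how the PERC3 firmware works internally. The 4-byte block size, the `xor_blocks` helper, and the fixed parity position are assumptions made for the example (real RAID 5 rotates parity across the drives).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 8-drive set: 7 data blocks plus 1 parity block.
# (Assumption: parity sits on the last drive; real RAID 5 rotates it.)
data = [bytes([i] * 4) for i in range(1, 8)]
parity = xor_blocks(data)
stripe = data + [parity]

# One drive marked failed: XOR of the 7 survivors rebuilds the missing block.
lost = 0
survivors = [block for i, block in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)

# Two drives marked failed: the 6 survivors only yield the XOR of the two
# missing blocks, which is not enough to recover either one on its own.
xor_of_missing = xor_blocks(stripe[2:])
```

With one member gone, `rebuilt` matches the lost block. With two gone, all the survivors can give you is the two missing blocks XORed together, so the controller has nothing to rebuild from, even if the drives themselves are physically fine.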
Since then, we've moved the tower to another Linux box, attached via a
non-RAID Adaptec 29160. We've been banging on the drives for the last 2
days with bonnie++ 1.03a, and they work. Nothing is wrong with the
drives/tower that we can tell. The drives that were indicated as failed
are working perfectly.
So I'm really beginning to wonder if we have to let the init finish
before we start cp/rsync. But I'm also wondering if I should recommend
trashing the tower. If there is some incompatibility with the PERC3, then
this may happen again.
On Wed, 27 Aug 2003, jason andrade wrote:
> On Tue, 26 Aug 2003, Alexander Lazarevich wrote:
> > I ask because the array worked for 9+ hours while we were copying data to
> > it. When the copy was done, we booted to single user mode to rsync, and
> > the array failed - two drives went bad. The drives are fine, but the
> > entire array was lost. We didn't lose data because we still have the
> > original array. But we need to determine if the tower of drives are okay,
> > which we are almost certain they are. They've been running fine on AIX
> > 4.3.3 system for 2+ years.
> > Can the array init complete in single user mode? PERC3/QC manual doesn't
> > say jack about it.
> that is strange. it is definitely recommended to let the array complete
> initialization before using it but it _should_ be possible to use it
> while it is initializing.. i've done that with a perc3/qc (connected
> to a PV220S). of course it is also faster to let the array complete
> first and then use it for data copying..
> when you say the array failed as two drives 'went bad' what sort of
> symptoms did you see ? e.g errors on the scsi bus ? any errors from
> the controller itself ?