dset for CentOS 3.9? / trouble shooting hardware issues on an old server?

Jeff Boyce jboyce at meridianenv.com
Thu Oct 13 11:09:27 CDT 2011


Replying to the digest version again, see below.

>
> ------------------------------
>
> Message: 2
> Date: Wed, 12 Oct 2011 15:35:57 -0400
> From: "J. Epperson" <Dell at epperson.homelinux.net>
> Subject: Re: dset for CentOS 3.9? / trouble shooting hardware issues
> on an old server? (PJF)
> To: linux-poweredge at dell.com
> Message-ID:
> <2aa35f70b5df61248bff0ee8bff0aa47.squirrel at epperson.homelinux.net>
> Content-Type: text/plain;charset=iso-8859-1
>
> On Wed, October 12, 2011 14:04, Jeff Boyce wrote:
>>
>> Also something to check (if you are still diagnosing your problem) is
>> that you have the latest firmware on all your HDs.  Your error messages
>> are similar to what I had on the one and only problem I have had on my
>> (still running) PE2600.  I was running a Raid5 for several years
>> without any problem when all of a sudden in Feb. 2009 I dropped two
>> drives out of my array simultaneously.  In my case, at least I saw amber
>> LEDs on the drives that dropped out.  The system would reboot and come
>> back up for a while, anywhere from half a day to a couple of weeks.  Each
>> time it crashed it might be a different combination of two drives.  It
>> was my only machine and I had to keep it up as much as possible while
>> diagnosing and repairing the problem.  With some assistance from this
>> mailing list and some Dell techs it was determined that my problem was a
>> firmware bug.  I updated the firmware on my drives and it has been
>> running problem-free ever since.  Also throughout the whole problem over
>> the course of a month, I never lost any data on the drives, and I did not
>> have to restore anything from backup.  I will finally be replacing my
>> PE2600 in the next few months and it will move into a backup role.
>>
>
> That's kind of odd.  Any time you lose two disks simultaneously in a
> RAID5, it should be game over for the existing container.  Or do you mean
> you replaced them, built a new container, and restored from backup and
> then had these problems?
>
>
My original problem occurred suddenly and unexpectedly.  There had been no
changes in the hardware (no disk additions, etc.) for over a year prior to 
the crash, and the system had been up continuously for over a year.  After 
updating the firmware, all the disks came back on-line, and all the data was 
there up to the point of the crash.  A Dell tech walked me through some
steps that I think included something like re-initializing or rebuilding
the array parity.  (I don't recall exactly what it was and don't have my
notes handy right now.  I am enough of a Linux novice that I can't explain
what happened or how it was fixed in any more detail.)  Files
that were open at the time of the crash were saved to local disks then moved 
back to the fileserver later.  We did not lose a single file and I did not
have to restore from tape.  After the system had been running for a few 
months without showing any sign of the problem recurring, I added two
disks to my Raid5 array and it has been up continuously ever since.

Jeff Boyce
Meridian Environmental


