Raid failure PE2500

jason andrade jason at rtfmconsult.com
Tue Aug 26 16:25:01 CDT 2003


On Tue, 26 Aug 2003, David Guerrero wrote:

> Over the past few days I have been seeing errors in /var/log/messages about
> several failures on one of our disks. The machine is a PE 2500 with a
> PERC/3Di and 6 disks configured as RAID-10 (a stripe of 3 mirrors). We
> run Oracle on this system.

interesting - i've set up a system that's almost exactly the same, except
we don't use RAID10 for the oracle setup there. the dba wanted it set up a
different way, so there are 6 disks: raid1 for the OS with a hot spare,
and 3 separate disks for the oracle databases (no raid) plus a recovery
procedure. (obviously there's no requirement for business continuity here.)

[...]

> Finally Dell support replaced the disk, and everything seems to have run
> smoothly since then, but the guy who replaced the disk didn't know a thing
> about Linux, so I'm still in doubt about this question:
>
>   * the failure of a disk should not affect the stability of the system,
> should it?
>

this is theoretically correct.  unfortunately, in real life a failing
disk can sometimes cause problems on the scsi bus that affect the OS: the
bus may 'freeze' for longer than the OS tolerates, writes don't get
flushed out, and the OS panics.  the panic is meant to minimize the
chance of further data being written out, which could corrupt the
filesystem (or make minor corruption unrecoverable).

i have seen this happen occasionally.. it isn't common by any means.
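
to illustrate the "theoretically correct" part: a rough sketch (not
anything the PERC firmware actually runs) that counts which disk-failure
combinations a stripe of 3 mirrors survives at the redundancy level.
the disk numbering and the pairing into mirrors are assumptions made for
the example, and it ignores the scsi bus problem described above.

```python
from itertools import combinations

# Hypothetical layout: 6 disks paired into 3 mirrors, striped together.
# The numbering and pairing are assumptions for illustration only.
MIRRORS = [(0, 1), (2, 3), (4, 5)]


def array_survives(failed, mirrors=MIRRORS):
    """Return True if every mirror still has at least one working disk."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in mirrors)


def count_survivable(n_failed, mirrors=MIRRORS):
    """Return (survivable, total) over all ways n_failed disks can fail."""
    disks = [d for pair in mirrors for d in pair]
    combos = list(combinations(disks, n_failed))
    ok = sum(1 for c in combos if array_survives(c, mirrors))
    return ok, len(combos)


if __name__ == "__main__":
    for n in (1, 2, 3):
        ok, total = count_survivable(n)
        print(f"{n} failed disk(s): survives {ok} of {total} combinations")
```

any single failure is survivable on paper; a second failure is fatal only
if it hits the partner of the disk that already failed.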

regards,

-jason




More information about the Linux-PowerEdge mailing list