Raid failure PE2500

David Guerrero david at boe.es
Tue Aug 26 09:56:00 CDT 2003


Hi!

Over the past few days I have been seeing errors in /var/log/messages
reporting several failures on one of our disks. The machine is a PE 2500
with a PERC/3Di and 6 disks configured as RAID-10 (a stripe of 3
mirrors). We run Oracle on this system.

The front LED of the system was green, and afacli showed all 6 disks as
OK. The only problem I could find was that "disk show defects 4"
reported 4 "grown errors", while the rest of my disks have 0. I had to
convince Dell tech support that the disk had a problem.

My question is this: in a RAID system, when the problem disk (disk 4)
is part of a mirror container, the operating system should be isolated
from the problem. In reality, the system crashed very badly a number of
times, with the errors pasted below in the logs and ext3 errors about
failed writes on the console.

Dell support finally replaced the disk, and everything seems to have
run smoothly since then, but the technician who replaced the disk didn't
know a word about Linux, so I'm still in doubt about this question:

  * the failure of a single disk in a mirror should not affect the
stability of the system, should it?
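
To make the expectation behind this question concrete, here is a toy
model of a two-disk mirror in Python. It is only a sketch of the
behaviour I would expect in principle, not a description of what the
PERC/3Di firmware actually does; the names ToyDisk, DiskReadError and
mirror_read are made up for illustration.

```python
class DiskReadError(Exception):
    """Raised by a toy disk when a block cannot be read."""


class ToyDisk:
    """Hypothetical in-memory disk; 'bad' holds block numbers that fail on read."""

    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)  # block number -> data
        self.bad = set(bad)

    def read(self, n):
        if n in self.bad:
            raise DiskReadError(f"medium error at block {n}")
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data
        self.bad.discard(n)  # assumption: rewriting a block remaps it


def mirror_read(primary, secondary, n):
    """Read block n from a two-disk mirror, falling back to the other member."""
    try:
        return primary.read(n)
    except DiskReadError:
        data = secondary.read(n)  # take the healthy copy
        primary.write(n, data)    # repair the failing member from it
        return data


good = ToyDisk({0: b"oracle"})
failing = ToyDisk({0: b"oracle"}, bad={0})
result = mirror_read(failing, good, 0)
```

In this model the caller never sees the read error on the failing
member, which is the isolation I expected from the mirror container.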

This system is going into production very soon, and I would like
to fix any mistakes now...

The log:

Aug 18 22:50:54 dragon kernel: aacraid:ID(0:04:0); Error Event [command:0x28]
Aug 18 22:50:54 dragon kernel: aacraid:ID(0:04:0); Medium Error, Block Range 205568 : 205695
Aug 18 22:50:54 dragon kernel: aacraid:ID(0:04:0); Unrecovered Read Error
Aug 18 22:51:31 dragon kernel: aacraid:ID(0:04:0); Error Event [command:0x28]
Aug 18 22:51:31 dragon kernel: aacraid:ID(0:04:0); Medium Error, Block Range 147200 : 147327
Aug 18 22:51:31 dragon kernel: aacraid:ID(0:04:0); Unrecovered Read Error
Aug 18 22:52:04 dragon kernel: aacraid:ID(0:04:0) Medium Error, LBN Range 158592:158719
Aug 18 22:52:05 dragon kernel: aacraid:ID(0:04:0) Starting BBR sequence
Aug 18 22:52:19 dragon kernel: aacraid:ID(0:04:0) Medium Error, LBN Range 158336:158463
Aug 18 22:52:20 dragon kernel: aacraid:ID(0:04:0) Starting BBR sequence
Aug 19 00:13:35 dragon kernel: aacraid:ID(0:04:0); Error Event [command:0x28]
Aug 19 00:13:35 dragon kernel: aacraid:ID(0:04:0); Medium Error, Block Range 71096225 : 71096319
Aug 19 00:13:35 dragon kernel: aacraid:ID(0:04:0); Unrecovered Read Error
Aug 19 00:13:37 dragon kernel: aacraid:ID(0:04:0) Medium Error, LBN Range 71096225:71096319
Aug 19 00:13:38 dragon kernel: aacraid:ID(0:04:0) Starting BBR sequence
Aug 19 00:13:57 dragon kernel: aacraid:Container 62 started REBUILD task on drive 0:4:0
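
Since neither the front LED nor afacli flagged the disk, messages like
these were the main warning sign. Below is a minimal Python sketch for
counting the "Medium Error" lines per device ID. It assumes each message
is on a single line, as it is in /var/log/messages (the mail client
wraps them), and that the format matches the excerpt above; the function
name count_medium_errors is made up for illustration.

```python
import re
from collections import Counter

# Captures the three numbers in "aacraid:ID(a:b:c)" and the message text.
# The trailing ';' is optional because both forms appear in the log.
LINE_RE = re.compile(r"aacraid:ID\((\d+):(\d+):(\d+)\);?\s*(.*)")


def count_medium_errors(lines):
    """Return a Counter mapping each device ID tuple to its Medium Error count."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(4).startswith("Medium Error"):
            disk = tuple(int(g) for g in m.group(1, 2, 3))
            counts[disk] += 1
    return counts


sample = [
    "Aug 18 22:50:54 dragon kernel: aacraid:ID(0:04:0); Error Event [command:0x28]",
    "Aug 18 22:50:54 dragon kernel: aacraid:ID(0:04:0); Medium Error, Block Range 205568 : 205695",
    "Aug 18 22:52:04 dragon kernel: aacraid:ID(0:04:0) Medium Error, LBN Range 158592:158719",
    "Aug 19 00:13:57 dragon kernel: aacraid:Container 62 started REBUILD task on drive 0:4:0",
]
counts = count_medium_errors(sample)
```

To run it against the real log, pass the open file instead of the
sample list, e.g. count_medium_errors(open("/var/log/messages")).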

Thanks for any help.

-- 
David Guerrero                                      E-mail: david at boe.es
Dpto Tecnologias de la Informacion                    Telf: 91 384 16 13
B.O.E. (Boletin Oficial del Estado)         Ministerio de la Presidencia





More information about the Linux-PowerEdge mailing list