afacli says Number of PRIMARY defects on drive: 5903

Arne Kepp trashbin at smallworld.no
Wed Aug 28 09:32:00 CDT 2002


Hi,
Specs: PowerEdge 4400 with 3/Di with 6 drives, running Red Hat 7.3 on
the standard kernel.
Earlier this morning we had a kernel panic. Unfortunately the user who
rebooted the machine did not write down any of the info on the terminal,
and the kernel message logs of course don't mention it. It *could* have
been one like:
scsi : aborting command due to timeout : pid 778229, scsi 3, channel 0,
id 0, lun 0, write (10) 00 00 03 7a 85 00 00 10 00
SCSI host 3 abort (pid 778229 timed out - resetting
SCSI bus is being reset for host 3 channel 0
Kernel panic : scsi_free : Bad offset
In interrupt handler - not syncing
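
As a side note, the command bytes in a timeout message like the one above
can be decoded by hand. Below is a minimal sketch in Python, assuming the
nine hex bytes printed after "write (10)" are bytes 1-9 of a standard SCSI
WRITE(10) command block (with the opcode itself printed as the name). The
function name decode_write10 is made up for illustration.

```python
def decode_write10(hex_bytes: str) -> dict:
    """Decode bytes 1..9 of a WRITE(10) CDB given as space-separated hex.

    Assumption: the opcode byte is not part of the input, because the
    kernel message prints it as the text "write (10)".
    """
    b = [int(x, 16) for x in hex_bytes.split()]
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    return {
        "flags": b[0],                                    # byte 1
        "lba": int.from_bytes(bytes(b[1:5]), "big"),      # bytes 2-5
        "group": b[5],                                    # byte 6
        "blocks": int.from_bytes(bytes(b[6:8]), "big"),   # bytes 7-8
        "control": b[8],                                  # byte 9
    }


cdb = decode_write10("00 00 03 7a 85 00 00 10 00")
print(cdb["lba"], cdb["blocks"])  # 227973 16
```

Under that assumption the timed-out command was a 16-block write starting
at logical block 227973, which may help to tell whether repeated panics
hit the same region of the disk.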

I used to get these (with the same id, lun etc.) almost weekly for a while
without being able to fix them, but they haven't shown up for a while since
I removed a SCSI controller and created a new container on disk0.

Then, later today, after the machine had been rebooted and ext3 had cleaned
itself up, I looked at the console and it was listing the following about
once every second (notice it's a different disk than the one that used to
be mentioned in the kernel panics):
I/O Error dev 08:41, sector 46668232
read callback: read failed, status = 5
SCSI disk error: host 2 channel 0 id 4 lun return code = 1
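
To work out which block device "dev 08:41" refers to, the pair can be
decoded by hand. Below is a rough sketch in Python, assuming the kernel
prints major:minor in hexadecimal and that SCSI disks use major 8 with 16
minors per disk (the usual sd numbering). The function name decode_kdev is
hypothetical, and the 512-byte sector size is also an assumption.

```python
def decode_kdev(dev: str) -> str:
    """Map a hex 'major:minor' pair to an sd device name.

    Assumptions: both fields are hexadecimal, and only SCSI disk
    major 8 (16 minors per disk) is handled in this sketch.
    """
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    # Partition 0 means the whole disk, so no suffix is added for it.
    return name + (str(partition) if partition else "")


print(decode_kdev("08:41"))  # sde1

# Byte offset of the failing sector, assuming 512-byte sectors.
offset_bytes = 46668232 * 512
```

Under these assumptions the errors point at the first partition of the
fifth SCSI disk, roughly 22 GiB into the device.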

I did a fairly clean shutdown, rebooted the machine without mounting
this volume, and started running diagnostics. Here I am wondering if some
of the pros can give me their opinion on this:
I ran afacli (open afa0) and "disk show defects 5" on a drive that is
currently not mounted. The result is:
Number of PRIMARY defects: 5903
Number of GROWN defects on drive: 0

Is this like in the old days, where you expected to discover some bad
blocks on the first format, or is this serious? I ran fsck and checked
for bad blocks (a read-only test, as far as I could tell) without getting
any errors on the drive.

I have done a dump from the afacli diagnostics and put the files at
http://chaos.smallworld.no/pelinux , feel free to have a look if they're
of any interest. I avoided attaching them in case mailman expands the
text files. Unfortunately I do not have redundancy for this machine, so
proper CD-ROM-based diagnostics will have to wait until after business hours.

Thank you for reading, any opinions are appreciated.

Arne Kepp
Smallworld Systems AS





More information about the Linux-PowerEdge mailing list