NVRAM and MegaRAID.

Pierre POMES ppomes at reservit.com
Wed Jun 5 08:25:00 CDT 2002


Hi all,

For two months I have been running a Kimberlite cluster on two PE 2550
nodes sharing a PowerVault 210S, which is attached through a PERC3
controller in each node.

The setup had been working properly for one month, but last night the
local container (/dev/sdb, RAID 0+1) crashed.

/var/log/messages contained many SCSI warnings:

Jun  5 03:41:37 lascours kernel: scsi : aborting command due to timeout : pid 107312, scsi2, channel 0, id 0, lun 0 Write (10) 00 00 00 bc 54 00 00 01 00
Jun  5 03:41:42 lascours kernel: scsi : aborting command due to timeout : pid 107315, scsi2, channel 5, id 15, lun 0 Receive Diagnostic 01 80 00 04 00
Jun  5 03:42:09 lascours kernel: scsi : aborting command due to timeout : pid 107352, scsi2, channel 0, id 0, lun 0 Write (10) 00 03 1c 8a a6 00 00 08 00
Jun  5 03:42:09 lascours kernel: scsi : aborting command due to timeout : pid 107353, scsi2, channel 0, id 0, lun 0 Write (10) 00 02 74 8a ce 00 00 08 00
....
....
Jun  5 04:01:21 lascours kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 25040001
Jun  5 04:01:21 lascours kernel:  I/O error: dev 08:12, sector 17

Jun  5 04:01:25 lascours kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 25040001
Jun  5 04:01:25 lascours kernel:  I/O error: dev 08:17, sector 21510560
Jun  5 04:01:25 lascours kernel: SCSI disk error : host 2 channel 0 id 0 lun 0 return code = 25040001
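For anyone decoding the log above, here is a minimal Python sketch (not part of the original report). It assumes the Linux 2.4 SCSI conventions: the "return code" is the 32-bit result word (driver byte, host byte, message byte, status byte, most to least significant), the DID_*/DRIVER_*/SUGGEST_* values are those of the 2.4 SCSI headers, "dev MM:mm" is major:minor printed in hex with 16 minors per sd disk, and the hex bytes after "Write (10)" are CDB bytes 1 through 9. The function names are hypothetical.

```python
"""Decode fields from the Linux 2.4 SCSI error lines quoted above.

Assumptions (not from the original post): result-word layout and
DID_*/DRIVER_*/SUGGEST_* values follow the 2.4 kernel SCSI headers;
"dev MM:mm" is hex; bytes after "Write (10)" are CDB bytes 1-9.
"""

HOST_BYTES = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}
DRIVER_STATUS = {
    0x0: "DRIVER_OK", 0x1: "DRIVER_BUSY", 0x2: "DRIVER_SOFT",
    0x3: "DRIVER_MEDIA", 0x4: "DRIVER_ERROR", 0x5: "DRIVER_INVALID",
    0x6: "DRIVER_TIMEOUT", 0x7: "DRIVER_HARD", 0x8: "DRIVER_SENSE",
}
DRIVER_SUGGEST = {
    0x00: None, 0x10: "SUGGEST_RETRY", 0x20: "SUGGEST_ABORT",
    0x30: "SUGGEST_REMAP", 0x40: "SUGGEST_DIE", 0x80: "SUGGEST_SENSE",
}


def decode_return_code(text: str) -> dict:
    """Split a 'return code = XXXXXXXX' hex value into its four bytes."""
    result = int(text, 16)
    driver = (result >> 24) & 0xFF
    host = (result >> 16) & 0xFF
    return {
        # low nibble of the driver byte is the status, high nibble a suggestion
        "driver_status": DRIVER_STATUS.get(driver & 0x0F, hex(driver & 0x0F)),
        "driver_suggest": DRIVER_SUGGEST.get(driver & 0xF0, hex(driver & 0xF0)),
        "host": HOST_BYTES.get(host, hex(host)),
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,  # raw low byte, left uninterpreted
    }


def decode_sd_dev(text: str) -> str:
    """Map a hex 'MM:mm' SCSI disk device number to a name such as 'sdb2'."""
    major, minor = (int(part, 16) for part in text.split(":"))
    if major != 8:
        raise ValueError("only the first sd major (8) is handled here")
    # 16 minors per disk: minor 0 of each block is the whole disk
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")


def decode_write10(cdb_tail: str) -> tuple:
    """Return (LBA, block count) from the 9 hex bytes logged after 'Write (10)'."""
    b = bytes.fromhex(cdb_tail)  # b[0] is CDB byte 1
    if len(b) != 9:
        raise ValueError("expected CDB bytes 1-9")
    lba = int.from_bytes(b[1:5], "big")     # CDB bytes 2-5
    blocks = int.from_bytes(b[6:8], "big")  # CDB bytes 7-8
    return lba, blocks


if __name__ == "__main__":
    print(decode_return_code("25040001"))
    print(decode_sd_dev("08:12"), decode_sd_dev("08:17"))
    print(decode_write10("00 00 00 bc 54 00 00 01 00"))
    print(decode_write10("00 03 1c 8a a6 00 00 08 00"))
```

Under these assumptions, return code 25040001 splits into host byte DID_BAD_TARGET and driver byte DRIVER_INVALID | SUGGEST_ABORT, and dev 08:12 and 08:17 map to sdb2 and sdb7, which is consistent with the failed container being /dev/sdb.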

This morning I rebooted both nodes, and the PERC BIOS reported the
same warning on each:

Configuration of NVRAM and drives mismatch (normal mismatch).

I ran the BIOS utility to compare the two configurations:
1) In the NVRAM configuration, some hard disks were in a failed state.
2) In the configuration stored on the disks, everything was fine.

So I chose to save the disk configuration, and my container became
available again.

My question: how can the NVRAM configuration be modified? By the
megaraid driver in the kernel? (I am using Red Hat 7.3 with the latest
kernel, 2.4.18-4.) Two weeks ago I changed the read-ahead policy from
adaptive (the default) to read-ahead, for better performance. Could
that be the problem?

I know that some people on this list run this configuration in
production. Some partitions are accessed by both nodes at the same
time, but these devices contain no filesystem (they are raw devices,
used only by Kimberlite). This is not recommended, but it has been
working for a while without any trouble...

Thanks for your help,
Pierre

-- 
Pierre POMES                        mailto:ppomes at reservit.com
Interface Technologies




More information about the Linux-PowerEdge mailing list