Megaraid SAS RESET Problems with PE2950 attached to an MD1000 via Perc5e

Mansell, Gary Gary.Mansell at ricardo.com
Wed Feb 28 05:33:11 CST 2007


Dear all,

I am close to sending back for a refund two identical PE2950 systems
that I bought from Dell in November last year to serve as redundant
fileservers at my company.

I am looking to hear from anyone who can shed some light on the problems
that I am seeing with the systems.

The machines are PE2950s running RHEL 4 (fully up2dated), and each is
attached to an MD1000 array of 15 SAS drives in a RAID 5 configuration via
an internal Perc5e card. One of the machines is attached to an Overland
Neo200 tape library with 2 Ultrium 3 tape drives via an Adaptec 39320
SCSI card. The intention was to house the two machines in separate
buildings on our site and rsync the filesystems between them hourly to
provide a rudimentary cold-swap failover solution.


The first problem, which I have been trying to resolve with the help of
Dell Gold Support, is SAS resets that occur when one of the machines is
subjected to heavy IO. Strangely (for identical machines), only one has
suffered this problem to date, but Dell think that the problem lies with
the megaraid_sas driver. This is supported by other reports that I have
found on the Internet.

This week Dell supplied me with a beta driver to fix the problem,
megaraid_sas-v00.00.03.09, but this still exhibits the same issue.

The errors that I get in the messages file look like this:
Feb 28 01:35:22 dfgsrv1 kernel: megasas: RESET -25566187 cmd=2a <c=2 t=0 l=0> retries=0
Feb 28 01:35:22 dfgsrv1 kernel: megasas: reset successful
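
For anyone who wants to check their own logs for the same pattern, here is
a minimal Python sketch that picks out the RESET lines and counts them per
host and hour. It assumes the messages file holds each entry on a single
line in exactly the format shown above; the function names parse_resets and
resets_per_hour are made up for illustration, and the regular expression is
based only on the one sample line, so it may need adjusting for other
driver versions.

```python
import re
from collections import Counter

# Pattern derived from the single sample line in this post (an assumption:
# other megaraid_sas versions may print the message differently).
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"megasas:\s+RESET\s+(?P<serial>-?\d+)\s+cmd=(?P<cmd>[0-9a-fA-F]+)\s+"
    r"<c=(?P<channel>\d+)\s+t=(?P<target>\d+)\s+l=(?P<lun>\d+)>\s+"
    r"retries=(?P<retries>\d+)"
)


def parse_resets(lines):
    """Return one dict of captured fields per megasas RESET line."""
    return [m.groupdict() for m in map(RESET_RE.match, lines) if m]


def resets_per_hour(resets):
    """Count resets per (host, 'Mon DD HH') bucket.

    The first nine characters of a syslog timestamp cover month, day and hour.
    """
    return Counter((r["host"], r["stamp"][:9]) for r in resets)
```

Passing the open messages file to parse_resets (for example
parse_resets(open("/var/log/messages")), if that is where your syslog
writes) and the result to resets_per_hour should show whether the resets
cluster around periods of heavy IO such as the hourly rsync.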

On the surface, the errors don't seem to have corrupted the filesystem,
but at least once the filesystem was switched to read-only because of
corruption in the ext3 journal. It seems to be asking for trouble to put
these machines into service as fileservers for up to 200 machines with
this error hanging around!!


The second issue that I am having is related to the machine with the
tape library attached - I am getting SCSI bus resets when trying to
read from or write to the tape. According to Overland the problem is due
to a bug in the RHEL4 aic79xx driver:

"The problem I've come across on every Adaptec in a 2.6 kernel is that
the domain validation gets re-activated by the driver/module when it
loads and even using the aic79xx=dv:{0,0,0} syntax in modprobe.conf
doesn't help. It still activates and then brings the SCSI bus down..."

Obviously it would be unwise to go live with these machines if I cannot
be assured that I can back up and restore successfully!

Has anyone else seen these problems, or does anyone have any information
to add?

All advice gladly accepted (I am four months late with deployment with
no solution in sight!!)

Regards

Gary Mansell




More information about the Linux-PowerEdge mailing list