PERC3/Di failure workaround hypothesis

Robert L Mathews lists at tigertech.com
Fri May 21 18:01:01 CDT 2004


At 5/20/04 2:21 PM, Matt_Domsch at dell.com wrote:

>We believe we have root cause to the SCSI command timeouts seen with
>the PERC3/Di, related to how the RAID controller firmware handles
>read and write caching.

Is this the problem in which the machine can lock up without any errors 
being visible on the console/logs/LED? If you could provide some more 
detailed information, that would help some of us tell whether we will be able 
to provide useful feedback on the same problem you're testing.

In other words, I have no idea whether my problem is due to SCSI command 
timeouts -- I just know the symptoms, which are the same on two machines 
(a 2650 and a 2550 with Perc3/Di):

 - no error message on console/logs/LED
 - machine still pingable
 - network services that don't touch the disk, such as named,
   still running fine
 - everything else that requires disk access is locked up
 - all disk activity has stopped
 - no orange lights on the disks
 - problem persists even with the latest released Perc firmware and
   aacraid driver
 - problem persists even if Ethernet is disabled, so it's not the tg3
   driver

Is this the problem that you are investigating?

In general, it would make me feel better about it if you could tell us 
what you're doing and what you've found. For example: Do you have a 
reproducible test case? What exactly are the symptoms of the issue you're 
working on, so we can tell whether it's the same as ours? And so on.

-- 
Robert L Mathews, Tiger Technologies      http://www.tigertech.net/

 "Ignorance more frequently begets confidence than does knowledge."
                                                           -- Darwin

More information about the Linux-PowerEdge mailing list