PERC3/Di failure workaround hypothesis
Robert L Mathews
lists at tigertech.com
Fri May 21 18:01:01 CDT 2004
At 5/20/04 2:21 PM, Matt_Domsch at dell.com wrote:
>We believe we have root cause to the SCSI command timeouts seen with
>the PERC3/Di, related to how the RAID controller firmware handles
>read and write caching.
Is this the problem in which the machine can lock up without any errors
being visible on the console/logs/LED? If you could provide some more
detailed information, that would help some of us tell if we will be able
to provide useful feedback on the same problem you're testing.
In other words, I have no idea whether my problem is due to SCSI command
timeouts -- I just know the symptoms, which are the same on two machines
(a 2650 and a 2550 with Perc3/Di):
- no error message on console/logs/LED
- machine still pingable
- network services that don't touch the disk, such as named,
  still running fine
- everything else that requires disk access is locked up
- all disk activity has stopped
- no orange lights on the disks
- problem persists even with the latest released Perc firmware and
- problem persists even if ethernet is disabled, so it's not the tg3
Is this the problem that you are investigating?
In general, it would make me feel better about it if you could tell us
what you're doing and what you've found. For example: Do you have a
reproducible test case? What exactly are the symptoms of the issue you're
working on, so we can tell if it's the same as ours?
Robert L Mathews, Tiger Technologies http://www.tigertech.net/
"Ignorance more frequently begets confidence than does knowledge."