Dell Power Edge R710 with CentOS
Markus.Kovero at nebula.fi
Mon May 24 10:12:45 CDT 2010
>>> (eg. in solaris) Dell has been aware of the issue for months without
>>> real fix, just workarounds.
>>I'm not sure what "latest" means, but we did manage to find the root
>>cause of the failure where the MSI bit would get stuck - which also
>>explains why disabling MSI-X worked around it. The right solution is
>>to use code already in the driver to manage the timeout on that bit
>>automatically, which is what we are testing with 5.5+ and expect in
>>newer RHEL kernels ASAP.
>I'm really glad I follow this mailing list and hence came to know of this problem. If Dell has been aware of this issue isn't there some way to notify users? I have ~300 R410 systems here and not a word about this. Dell, how do you expect users to find out!?
Even through support, it took us a couple of months to figure out the severity of the problem, and that understanding eventually came from our own googling, not from any advice from Dell. It seems they're not that interested in keeping the user community informed about such minor details.
From what I've gathered:
Red Hat is investigating (includes workarounds)
and Broadcom has made a fix at the driver level;
AFAIK this is a driver-level "timeout" for the stuck MSI bit.
Disabling C-states seems to work, I think, although it increases the power consumption of the servers.
What I'd like to see is a lower-level fix for the issue, so that systems without the Red Hat glue fix could run with C-states enabled, like they should.