FW: T410 Network Failure

Ryan Pugatch rpug at tripadvisor.com
Thu Sep 3 10:17:31 CDT 2009


Is this issue fixed by using the new driver from support.dell.com and 
NOT having disable_msi?
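
For anyone comparing the two setups: as far as I understand it, the
workaround is normally applied on CentOS 5 with a line such as
"options bnx2 disable_msi=1" in /etc/modprobe.conf, followed by a driver
reload (that file location is my assumption, not something confirmed in
this thread). Below is a rough sketch for checking whether an interface
is still using MSI by looking at /proc/interrupts. The function name
uses_msi and the two sample lines are made up for illustration; they are
not captured from the affected machines.

```python
def uses_msi(interrupts_text, iface):
    """Check whether the IRQ line for iface mentions MSI.

    Returns True or False for the first line that names iface,
    or None if no such line is found.
    """
    for line in interrupts_text.splitlines():
        # Shared IRQ lines separate device names with commas.
        fields = line.replace(",", " ").split()
        if iface in fields[1:]:
            return any("MSI" in field for field in fields)
    return None


# Illustrative lines only (hypothetical IRQ numbers and counts).
SAMPLE_MSI = " 66:   123456   0   PCI-MSI  eth0\n"
SAMPLE_INTX = " 66:   123456   0   IO-APIC-level  eth0\n"

print(uses_msi(SAMPLE_MSI, "eth0"))
print(uses_msi(SAMPLE_INTX, "eth0"))
```

On a live box the first argument would be the contents of
/proc/interrupts, e.g. open("/proc/interrupts").read().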

Thanks,

Ryan

Narendra_K at Dell.com wrote:
> Hello,
> 
> Yes, there might not be a link down message every time this issue is
> seen. In the failed state, you cannot ping the system and you cannot
> ping from the system. With disable_msi=1 we have not seen the issue.
> When the issue occurs, apart from the system becoming unreachable,
> there might not be any logs in dmesg or syslog. The issue is not seen
> with the upstream kernel. Dell and Red Hat are working on this, and we
> should know soon what is going on.
> 
> With regards,
> Narendra K
> 
>> -----Original Message-----
>> From: Ryan Pugatch [mailto:rpug at tripadvisor.com] 
>> Sent: Wednesday, September 02, 2009 2:33 AM
>> To: K, Narendra
>> Cc: akrherz at iastate.edu; linux-poweredge-Lists
>> Subject: Re: FW: T410 Network Failure
>>
>> (sorry, resent as I sent from wrong email originally)
>>
>> Just checked logs again and the copper link down message 
>> hasn't happened every time there was a problem, so that may 
>> not be related.
>>
>> Ryan
>>
>>
>> Ryan Pugatch wrote:
>>> FWIW, I am also having the same issue with some R710s.  They are
>>> part of a hadoop cluster.  Interestingly enough, only 2 out of the
>>> 3 servers in that cluster have experienced the issue so far.  We
>>> also run our corporate mail server on an R710 and that has not shown
>>> any problems yet (except for a weird issue where outgoing TCP
>>> connections would intermittently fail until we restarted the
>>> network interfaces; not sure if this is related, it has only
>>> happened once).
>>>
>>> We are running CentOS 5.3.  All three hadoop machines are running
>>> 2.6.18-128.2.1.el5 and the mail server is running 2.6.18-128.1.10.el5
>>>
>>> It seems that when the network would drop it would log:
>>> 	
>>> kernel: bnx2: eth0 NIC Copper Link is Down
>>>
>>> Not sure that the disable_msi option will fix the two hadoop
>>> machines having the issues, as the problem happens somewhat randomly
>>> and is not easily reproducible.  That being said, we aren't getting
>>> some network related errors in our hadoop logs that we had been
>>> getting previously, so I suspect that is a good sign.  Time will tell!
>>>
>>> Is this issue related to the 2.6.28-rc3 regression specified here? 
>>> http://lkml.indiana.edu/hypermail/linux/kernel/0811.0/01374.html
>>>
>>> I am hoping a fix will make its way to RHEL and downstream to CentOS
>>> (has anyone heard if that is happening?  I'm having trouble finding
>>> a Red Hat or CentOS bug logged).
>>>
>>> Are there any performance concerns with using disable_msi?  I know
>>> that the driver from Dell.com should fix the problem but I'd prefer
>>> to use a driver provided from upstream.
>>>
>>> Ryan Pugatch
>>> Systems Administrator, TripAdvisor
>>>
>>>
>>> Narendra_K at dell.com wrote:
>>>> Hello,
>>>>
>>>> Thanks, this info is of great help.
>>>>
>>>> With regards,
>>>> Narendra K
>>>>
>>>> -----Original Message-----
>>>> From: daryl herzmann [mailto:akrherz at iastate.edu]
>>>> Sent: Thursday, August 13, 2009 7:07 PM
>>>> To: K, Narendra
>>>> Cc: Biligiri, Raghavendra; linux-poweredge-Lists
>>>> Subject: RE: FW: T410 Network Failure
>>>>
>>>> On Thu, 13 Aug 2009, Narendra_K at Dell.com wrote:
>>>>
>>>>> Thanks. Top output need not be at the time of failure. It can be
>>>>> any time, just to get an idea of the resource utilization so
>>>>> that we can replicate it. And general high level detail about the
>>>>> database you are using, e.g. is it an Oracle database?
>>>> It is running PostgreSQL 8.4.  sar reports that the average CPU
>>>> utilization for today is 0.44%.  10% of memory is used.  Network
>>>> utilization is only a few kbps.  I suspect when the failures
>>>> occurred, the machine got hit with a few hundred PostgreSQL
>>>> connections at once, but I have no way to prove it.
>>>>
>>>> sorry again,
>>>>   daryl
>>>>
>>>> _______________________________________________
>>>> Linux-PowerEdge mailing list
>>>> Linux-PowerEdge at lists.us.dell.com
>>>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>> Please read the FAQ at http://lists.us.dell.com/faq
>>


