Redhat errata kernel addresses tg3 driver lockup

Amit_Bhutani@Dell.com
Tue Nov 19 18:11:00 CST 2002


The latest Red Hat errata kernel, 2.4.18-18.8.0, states that it addresses the
"Kernel Crashes in TG3 Driver" issue (Bugzilla ID: 69920).
However, after installing the kernel-source package for the errata RPM and
diffing the errata kernel (2.4.18-18.8.0) against the stock Red Hat 8.0 kernel
(2.4.18-14), it was evident that the tg3 patch was NOT included in the errata
kernel.

Dell is in the process of notifying Red Hat about this. 

The actual patch, originally posted to the Linux Kernel Mailing List, is
reproduced below.

Thanks,

Amit Bhutani
Software Engineer - Linux Development Group
Dell Enterprise Software Development

PATCH:
-----

ChangeSet 1.790, 2002/11/14 14:43:47-05:00, davem at redhat.com

	Fix tg3 net driver to properly disable interrupts during some TX operations


# This patch includes the following deltas:
#	           ChangeSet	1.789   -> 1.790  
#	   drivers/net/tg3.c	1.37    -> 1.38   
#

 tg3.c |   46 ++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 38 insertions(+), 8 deletions(-)


diff -Nru a/drivers/net/tg3.c b/drivers/net/tg3.c
--- a/drivers/net/tg3.c	Fri Nov 15 09:08:21 2002
+++ b/drivers/net/tg3.c	Fri Nov 15 09:08:21 2002
@@ -59,8 +59,8 @@
 
 #define DRV_MODULE_NAME		"tg3"
 #define PFX DRV_MODULE_NAME	": "
-#define DRV_MODULE_VERSION	"1.1"
-#define DRV_MODULE_RELDATE	"Aug 30, 2002"
+#define DRV_MODULE_VERSION	"1.2"
+#define DRV_MODULE_RELDATE	"Nov 14, 2002"
 
 #define TG3_DEF_MAC_MODE	0
 #define TG3_DEF_RX_MODE		0
@@ -2373,13 +2373,28 @@
 	/* No BH disabling for tx_lock here.  We are running in BH disabled
 	 * context and TX reclaim runs via tp->poll inside of a software
 	 * interrupt.  Rejoice!
+	 *
+	 * Actually, things are not so simple.  If we are to take a hw
+	 * IRQ here, we can deadlock, consider:
+	 *
+	 *       CPU1		CPU2
+	 *   tg3_start_xmit
+	 *   take tp->tx_lock
+	 *			tg3_timer
+	 *			take tp->lock
+	 *   tg3_interrupt
+	 *   spin on tp->lock
+	 *			spin on tp->tx_lock
+	 *
+	 * So we really do need to disable interrupts when taking
+	 * tx_lock here.
 	 */
-	spin_lock(&tp->tx_lock);
+	spin_lock_irq(&tp->tx_lock);
 
 	/* This is a hard error, log it. */
 	if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
 		netif_stop_queue(dev);
-		spin_unlock(&tp->tx_lock);
+		spin_unlock_irq(&tp->tx_lock);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return 1;
@@ -2520,7 +2535,7 @@
 		netif_stop_queue(dev);
 
 out_unlock:
-	spin_unlock(&tp->tx_lock);
+	spin_unlock_irq(&tp->tx_lock);
 
 	dev->trans_start = jiffies;
 
@@ -2538,13 +2553,28 @@
 	/* No BH disabling for tx_lock here.  We are running in BH disabled
 	 * context and TX reclaim runs via tp->poll inside of a software
 	 * interrupt.  Rejoice!
+	 *
+	 * Actually, things are not so simple.  If we are to take a hw
+	 * IRQ here, we can deadlock, consider:
+	 *
+	 *       CPU1		CPU2
+	 *   tg3_start_xmit
+	 *   take tp->tx_lock
+	 *			tg3_timer
+	 *			take tp->lock
+	 *   tg3_interrupt
+	 *   spin on tp->lock
+	 *			spin on tp->tx_lock
+	 *
+	 * So we really do need to disable interrupts when taking
+	 * tx_lock here.
 	 */
-	spin_lock(&tp->tx_lock);
+	spin_lock_irq(&tp->tx_lock);
 
 	/* This is a hard error, log it. */
 	if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) {
 		netif_stop_queue(dev);
-		spin_unlock(&tp->tx_lock);
+		spin_unlock_irq(&tp->tx_lock);
 		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
 		       dev->name);
 		return 1;
@@ -2635,7 +2665,7 @@
 	if (TX_BUFFS_AVAIL(tp) <= (MAX_SKB_FRAGS + 1))
 		netif_stop_queue(dev);
 
-	spin_unlock(&tp->tx_lock);
+	spin_unlock_irq(&tp->tx_lock);
 
 	dev->trans_start = jiffies;
 




More information about the Linux-PowerEdge mailing list