Bug 54231 - r8169 driver regression caused by the commit aee77e4accbeb2c86b1d294cd84fec4a12dde3bd
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-22 16:32 UTC by Tomi Orava
Modified: 2013-04-16 21:55 UTC

See Also:
Kernel Version: 3.4.x 3.7.x 3.8.x
Subsystem:
Regression: Yes
Bisected commit-id:


Description Tomi Orava 2013-02-22 16:32:15 UTC
Commit aee77e4accbeb2c86b1d294cd84fec4a12dde3bd switched the r8169 driver to unlimited TX DMA burst. Its commit log, however, notes the following:

    The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
    except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
    CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
    i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
    the smaller burst limit, or if any other versions have similar requirements.

It seems that my Asus M4A78-EM has the following NIC:

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

03:00.0 0200: 10ec:8168 (rev 01)
	Subsystem: 1043:8385
	Flags: bus master, fast devsel, latency 0, IRQ 40
	I/O ports at e800 [size=256]
	Memory at febff000 (64-bit, non-prefetchable) [size=4K]
	Expansion ROM at febc0000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 2
	Capabilities: [48] Vital Product Data
	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
	Capabilities: [60] Express Endpoint, MSI 00
	Capabilities: [84] Vendor Specific Information: Len=4c <?>
	Kernel driver in use: r8169

This old hardware really does need TX DMA bursts limited to 512 bytes; otherwise the NIC hangs completely within a few minutes under the right kind of traffic, with the following error:

Feb 17 15:55:51 tatooine kernel: ------------[ cut here ]------------
Feb 17 15:55:51 tatooine kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x191/0x250()
Feb 17 15:55:51 tatooine kernel: Hardware name: System Product Name
Feb 17 15:55:51 tatooine kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Feb 17 15:55:51 tatooine kernel: ACPI: Invalid Power Resource to register!
Feb 17 15:55:51 tatooine kernel: Modules linked in: iptable_filter ip_tables pktcdvd ebtable_broute bridge stp llc ebtable_nat ebtable_filter ebtables x_tables ipv6 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event ir_lirc_codec lirc_dev snd_seq ir_mce_kbd_decoder snd_timer rc_rc6_mce snd_seq_device kvm_amd snd kvm edac_core mceusb soundcore cdc_acm serio_raw k8temp rc_core joydev snd_page_alloc nouveau ttm drm_kms_helper drm agpgart fb fbdev hwmon i2c_algo_bit cfbcopyarea mxm_wmi video cfbimgblt sr_mod wmi cdrom sd_mod crc_t10dif cfbfillrect button raid0 r8169 mii multipath linear
Feb 17 15:55:51 tatooine kernel: Pid: 0, comm: swapper/1 Not tainted 3.7.8-tmo2 #1
Feb 17 15:55:51 tatooine kernel: Call Trace:
Feb 17 15:55:51 tatooine kernel: <IRQ>  [<ffffffff810373ec>] warn_slowpath_common+0x8c/0xc0
Feb 17 15:55:51 tatooine kernel: [<ffffffff810374c1>] warn_slowpath_fmt+0x41/0x50
Feb 17 15:55:51 tatooine kernel: [<ffffffff81051b56>] ? __queue_work+0x3b6/0x3f0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81065a8f>] ? check_preempt_curr+0x5f/0xa0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81495d01>] dev_watchdog+0x191/0x250
Feb 17 15:55:51 tatooine kernel: [<ffffffff81051b90>] ? __queue_work+0x3f0/0x3f0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81495b70>] ? pfifo_fast_dequeue+0xe0/0xe0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81495b70>] ? pfifo_fast_dequeue+0xe0/0xe0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81045038>] call_timer_fn+0x88/0x150
Feb 17 15:55:51 tatooine kernel: [<ffffffff8104623e>] ? cascade+0x7e/0xa0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81495b70>] ? pfifo_fast_dequeue+0xe0/0xe0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81046498>] run_timer_softirq+0x238/0x280
Feb 17 15:55:51 tatooine kernel: [<ffffffff812b841c>] ? timerqueue_add+0x8c/0xb0
Feb 17 15:55:51 tatooine kernel: [<ffffffff8107e8d4>] ? ktime_get+0x64/0xd0
Feb 17 15:55:51 tatooine kernel: [<ffffffff8103ee51>] __do_softirq+0x111/0x240
Feb 17 15:55:51 tatooine kernel: [<ffffffff81085b2f>] ? tick_program_event+0x1f/0x30
Feb 17 15:55:51 tatooine kernel: [<ffffffff815617cc>] call_softirq+0x1c/0x30
Feb 17 15:55:51 tatooine kernel: [<ffffffff810043dc>] do_softirq+0x3c/0x80
Feb 17 15:55:51 tatooine kernel: [<ffffffff8103f084>] irq_exit+0x44/0xb0
Feb 17 15:55:51 tatooine kernel: [<ffffffff81022c3a>] smp_apic_timer_interrupt+0x8a/0xa0
Feb 17 15:55:51 tatooine kernel: [<ffffffff815611ca>] apic_timer_interrupt+0x6a/0x70
Feb 17 15:55:51 tatooine kernel: <EOI>  [<ffffffff81009f17>] ? default_idle+0xe7/0x1c0
Feb 17 15:55:51 tatooine kernel: [<ffffffff8100a46d>] amd_e400_idle+0xed/0x100
Feb 17 15:55:51 tatooine kernel: [<ffffffff8100ac76>] cpu_idle+0xc6/0xf0
Feb 17 15:55:51 tatooine kernel: [<ffffffff8154d2db>] start_secondary+0x1e0/0x1e7
Feb 17 15:55:51 tatooine kernel: ---[ end trace 9af7f31a2ae58d49 ]---
Feb 17 15:55:51 tatooine kernel: r8169 0000:02:00.0 eth0: link up

The following patch fixes the problem in my case:

--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -77,6 +77,7 @@
 static const int multicast_filter_limit = 32;
 
 #define MAX_READ_REQUEST_SHIFT 12
+#define TX_DMA_BURST_512        5       /* Maximum PCI burst, limited to 512 */
 #define TX_DMA_BURST   7       /* Maximum PCI burst, '7' is unlimited */
 #define InterFrameGap  0x03    /* 3 means InterFrameGap = the shortest one */
 
@@ -4406,8 +4407,14 @@ static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
        void __iomem *ioaddr = tp->mmio_addr;
 
        /* Set DMA burst size and Interframe Gap Time */
-       RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
-               (InterFrameGap << TxInterFrameGapShift));
+
+       if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+               RTL_W32(TxConfig, (TX_DMA_BURST_512 << TxDMAShift) |
+                       (InterFrameGap << TxInterFrameGapShift));
+       } else {
+               RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
+                       (InterFrameGap << TxInterFrameGapShift));
+       }
 }
 
 static void rtl_hw_start(struct net_device *dev)

It seems this fix is needed in all stable versions that contain the original commit, so that this old hardware version works properly.
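
To make the register arithmetic in the patch easier to follow, here is a minimal user-space sketch of how the TxConfig value could be chosen per MAC version. It is not driver code: the enum, the function name tx_config_for and the uppercase macro names are hypothetical, and the shift values (8 for the DMA burst field, 24 for the interframe gap field) are assumed to match the definitions in r8169.c.

```c
#include <stdint.h>

/* Assumed to mirror TxDMAShift and TxInterFrameGapShift in r8169.c. */
#define TX_DMA_SHIFT             8
#define TX_INTERFRAME_GAP_SHIFT  24

#define TX_DMA_BURST_512         5     /* burst limited to 512 bytes */
#define TX_DMA_BURST_UNLIMITED   7     /* '7' is unlimited */
#define INTER_FRAME_GAP          0x03  /* shortest interframe gap */

/* Hypothetical stand-in for the driver's MAC version enum. */
enum mac_version_sketch {
    SKETCH_MAC_VER_11,    /* the chip that needs the smaller burst */
    SKETCH_MAC_VER_OTHER  /* any other version */
};

/* Build the TxConfig value: limited burst for VER_11, unlimited otherwise. */
static uint32_t tx_config_for(enum mac_version_sketch ver)
{
    uint32_t burst = (ver == SKETCH_MAC_VER_11) ? TX_DMA_BURST_512
                                                : TX_DMA_BURST_UNLIMITED;

    return (burst << TX_DMA_SHIFT) |
           ((uint32_t)INTER_FRAME_GAP << TX_INTERFRAME_GAP_SHIFT);
}
```

Under these assumptions, tx_config_for(SKETCH_MAC_VER_11) yields 0x03000500 and any other version yields 0x03000700. Keeping the decision in one helper would also let every place that writes TxConfig share the same check instead of repeating the if/else.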
Comment 1 Tomi Orava 2013-02-22 16:40:49 UTC
Although r8169 has been working just fine on 3.4.31 for the past 5 days, it seems the previous patch missed the second TX DMA burst setting, which should be fixed as well:

--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -77,6 +77,7 @@
 static const int multicast_filter_limit = 32;
 
 #define MAX_READ_REQUEST_SHIFT	12
+#define TX_DMA_BURST_512        5       /* Maximum PCI burst, limited to 512 */
 #define TX_DMA_BURST	7	/* Maximum PCI burst, '7' is unlimited */
 #define InterFrameGap	0x03	/* 3 means InterFrameGap = the shortest one */
 
@@ -4406,8 +4407,14 @@ static void rtl_set_rx_tx_config_registers(struct rtl8169_private *tp)
 	void __iomem *ioaddr = tp->mmio_addr;
 
 	/* Set DMA burst size and Interframe Gap Time */
-	RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
-		(InterFrameGap << TxInterFrameGapShift));
+
+	if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+		RTL_W32(TxConfig, (TX_DMA_BURST_512 << TxDMAShift) |
+			(InterFrameGap << TxInterFrameGapShift));
+	} else {
+		RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
+			(InterFrameGap << TxInterFrameGapShift));
+	}
 }
 
 static void rtl_hw_start(struct net_device *dev)
@@ -5148,8 +5155,13 @@ static void rtl_hw_start_8168(struct net_device *dev)
 
 	rtl_set_rx_mode(dev);
 
-	RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
-		(InterFrameGap << TxInterFrameGapShift));
+	if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+		RTL_W32(TxConfig, (TX_DMA_BURST_512 << TxDMAShift) |
+			(InterFrameGap << TxInterFrameGapShift));
+	} else {
+		RTL_W32(TxConfig, (TX_DMA_BURST << TxDMAShift) |
+			(InterFrameGap << TxInterFrameGapShift));
+	}
 
 	RTL_R8(IntrMask);
Comment 2 Francois Romieu 2013-04-16 21:55:31 UTC
This was diagnosed as a jumbo settings initialization bug and fixed in commit
faf1e7857a1b87cd8baf48c3e962142e21ad417c

-- 
Ueimor