Bug 29282

Summary: Slow throughput on RTL8111E (r8169 driver), NOHZ: local_softirq_pending 08 errors
Product: Drivers Reporter: Edwin (utomatoe)
Component: Network Assignee: Francois Romieu (romieu)
Status: RESOLVED CODE_FIX    
Severity: normal CC: andyrtr, arthur.titeica, enrico.tagliavini, liquid.acid, romieu, stuffcorpse
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35-2.6.37 Subsystem:
Regression: No Bisected commit-id:
Attachments: Fatal cast error

Description Edwin 2011-02-17 02:58:01 UTC
Using the r8169 driver in 2.6.35-2.6.37 I get the following errors when sending a large amount of data:

Feb  8 22:40:16 codis kernel: [47801.730128] NOHZ: local_softirq_pending 08
Feb  8 22:50:22 codis kernel: [48407.812624] r8169 0000:05:00.0: eth0: link up
Feb  8 22:50:22 codis kernel: [48407.812638] NOHZ: local_softirq_pending 08
Feb  8 22:58:13 codis kernel: [48878.640170] r8169 0000:05:00.0: eth0: link up
Feb  8 22:58:13 codis kernel: [48878.640182] NOHZ: local_softirq_pending 08
Feb  8 23:00:39 codis kernel: [49024.430358] r8169 0000:05:00.0: eth0: link up
Feb  8 23:00:39 codis kernel: [49024.430374] NOHZ: local_softirq_pending 08
Feb  8 23:03:00 codis kernel: [49165.080070] r8169 0000:05:00.0: eth0: link up
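
For readers of the log above: the value after local_softirq_pending is a hexadecimal bitmask of pending softirq vectors. The sketch below decodes it, assuming the softirq enum order from include/linux/interrupt.h in kernels of this era (2.6.35-2.6.37); the helper name decode_softirq_pending is made up for illustration and is not part of any kernel tool.

```python
# Softirq vector names in the order of the kernel's softirq enum
# (assumed: include/linux/interrupt.h as of 2.6.35-2.6.37).
# Bit i of the pending mask corresponds to entry i of this list.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def decode_softirq_pending(mask_hex):
    """Return the names of the softirqs set in a hex mask such as '08'."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    # The value reported in the log above.
    print(decode_softirq_pending("08"))
```

Under that assumption, 08 has only bit 3 set, which is the network receive softirq (NET_RX). That fits the warning appearing next to network driver activity.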

Tested against 2 hosts at about 300-400mb/s with no errors. Against another host I can hit 800-900mb/s with the Realtek r8168 driver module.

With the r8169 module, which produces the error messages above, I can only average around 170mb/s or so.

r8169 0000:05:00.0: eth0: RTL8168b/8111b at 0xffffc90011824000, 20:cf:30:76:ff:15, XID 0c200000 IRQ 43

This is an RTL8111E chip on an Asus M4A89GTD Pro/USB3 board.

lspci -nn output:
05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 06)

Was about to try this patch (http://www.spinics.net/lists/linux-net/msg17539.html), but found the Realtek driver worked as well. Will try it in a day or two and report back.
Comment 1 Edwin 2011-02-17 03:03:05 UTC
Oops, can someone reassign to Drivers->Network? I don't seem to have the privileges necessary to reassign.
Comment 2 Francois Romieu 2011-02-22 17:10:19 UTC
Created attachment 48682 [details]
Fatal cast error
Comment 3 Francois Romieu 2011-02-22 17:14:01 UTC
Reassigned.

The first patch above is not needed for your 8168 revision if
you want to try current kernel -git (as of -rc6) but it should
avoid collateral damages with innocent testers.

-- 
Ueimor
Comment 4 Tobias Jakobi 2011-03-20 23:52:43 UTC
I'm hitting a more serious issue here, but with the same output in the kernel log.

Kernel is vanilla 2.6.38.

network adapter:
Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

Just now I was piping some data from one machine to this one through an ssh tunnel. The remote machine is also connected via a GBit network, as is this machine.
Suddenly the system rebooted instantly: the display went black (at first I thought I had exhausted the battery) and soon the BIOS status screen appeared.
Comment 5 Tobias Jakobi 2011-03-20 23:55:40 UTC
Forgot to add that the affected system is a Fujitsu Lifebook A530.
Comment 6 Edwin 2011-03-28 13:13:35 UTC
This patch doesn't apply against 2.6.35 (the function being patched doesn't exist!). Can you provide a patch for 2.6.35? I'll also look into a more recent kernel when I have more time.

(In reply to comment #3)
> Reassigned.
> 
> The first patch above is not needed for your 8168 revision if
> you want to try current kernel -git (as of -rc6) but it should
> avoid collateral damages with innocent testers.
> 
> -- 
> Ueimor
Comment 7 Tobias Jakobi 2011-05-01 21:42:16 UTC
Still occurs with vanilla 2.6.38.4.
Comment 8 Tobias Jakobi 2011-05-01 23:29:30 UTC
And more triaging in this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=32962
Comment 9 Francois Romieu 2011-05-04 19:05:59 UTC
This XID 0c200000 should be correctly recognized but the needed bits are in
David Miller's net-next tree.

Some related comments may be found in:

https://bugzilla.kernel.org/show_bug.cgi?id=34172

-- 
Ueimor
Comment 10 Francois Romieu 2012-06-18 22:37:08 UTC
It should not be possible to trigger local_softirq_pending warnings with
current 3.5-rc since 7dbb491878a2c51d372a8890fa45a8ff80358af1.

Please update status if needed.

-- 
Ueimor