Bug 202275

Summary: Older r8169 family NIC no longer negotiates 1Gb/s links
Product: Drivers
Component: Network
Status: NEW
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.19.12
Subsystem:
Regression: Yes
Bisected commit-id:
Reporter: Pier Luigi Pau (pigipau)
Assignee: drivers_network (drivers_network)
CC: anandrajan, hkallweit1, mwg1812, rm+bko, svalchkov
Attachments: dmesg output from 4.19.12, with module reloading at the end

Description Pier Luigi Pau 2019-01-15 10:33:24 UTC
On a 4.19+ kernel (at least since 4.19.12), the network interface card on my older desktop computer (circa 2009) no longer autonegotiates a 1Gb/s link with my gigabit switch. Networking is operational, but the NIC is limited to a 100Mb/s link. I have seen this behaviour while running a Debian package of a 4.19.12 kernel, as well as a 4.20.2 kernel compiled by myself.

Booting with 4.18.x kernels (tested up to 4.18.20), the same hardware is able to negotiate a 1Gb/s link with no issue.

Relevant lines from the output of lspci -nn:

03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 03)
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 03)

Output of ethtool eth0 on 4.18.20:

Settings for eth0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Full 
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes

Output of ethtool eth0 on 4.19.12:

Settings for eth0:
	Supported ports: [ TP AUI BNC MII FIBRE ]
	Supported link modes:   100baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Advertised link modes:  100baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: No
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: Yes
	Speed: 100Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes

I have no such problem on a different NIC from the same family, which I have on my laptop (lspci reports it similarly to the line above, but as "rev 0a").
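
A quicker check than reading the full ethtool output is the interface's speed attribute in sysfs. The following is only a minimal sketch: link_speed_ok is a hypothetical helper name, and the optional third argument (an alternative sysfs root) exists purely so the function can be tried without real hardware.

```shell
#!/bin/sh
# Sketch: report whether an interface negotiated at least the expected speed.
# link_speed_ok is a hypothetical helper, not part of ethtool or the kernel.
# $1 = interface name, $2 = expected speed in Mb/s, $3 = sysfs root (default /sys)
link_speed_ok() {
    iface=$1
    want=$2
    root=${3:-/sys}
    # The speed attribute holds the current link speed in Mb/s.
    # Reading it can fail (for example when the link is down): return 2 then.
    speed=$(cat "$root/class/net/$iface/speed" 2>/dev/null) || return 2
    [ "$speed" -ge "$want" ]
}

# Example: link_speed_ok eth0 1000 || echo "eth0 is below 1000Mb/s"
```

On the affected boot described above, the example call would be expected to print the warning, since the link comes up at 100Mb/s.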
Comment 1 Heiner Kallweit 2019-01-17 14:36:59 UTC
Could you please provide a full dmesg output?
And does unloading / reloading module r8169 fix the issue?
Comment 2 Pier Luigi Pau 2019-01-17 15:24:29 UTC
Created attachment 280563
dmesg output from 4.19.12, with module reloading at the end

The part after the 71 second mark happens when I unload/reload the module.
Comment 3 Pier Luigi Pau 2019-01-17 15:25:06 UTC
Thanks for your reply. Removing and reloading the r8169 module does indeed fix the issue (until reboot). For the record, the output of ethtool eth0 on 4.19.12 after reloading r8169 is also slightly different:

Settings for eth0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Full 
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes

I assume you need a dmesg output from a 4.19 kernel, so I attached one from a 4.19.12 kernel from Debian stretch-backports. The part after the 71 second mark happens when I reload the module.

Please let me know if you need more information.

Meanwhile, I have compiled a 4.19 (.0) kernel to test, and the NIC also links at 100Mb/s at boot with that kernel.
Comment 4 Heiner Kallweit 2019-01-17 17:14:31 UTC
Thanks a lot. Based on your information, this seems to be a duplicate of the following issue. The link below points to the fix, which hasn't made it into 4.19/4.20 yet.

https://bugzilla.redhat.com/show_bug.cgi?id=1650984#c111
Comment 5 Pier Luigi Pau 2019-01-20 19:30:28 UTC
I have read the bug report you mentioned. The root cause looks similar, although in my case the network hasn't stopped working altogether, as it did for most users there. My NIC simply fails to negotiate a gigabit link and works at 100Mb/s. Also, there is no difference depending on whether I reboot from a 4.18 kernel or from a 4.19 kernel (at least thus far).

I have tried fixes and workarounds from that bug report, and none seems to work for me. More specifically, since the release of kernel 4.19.16, I have compiled that version of the kernel:
- unpatched
- with the patch from comment 62 applied to phy_device.c and the patch from comment 111 applied to r8169.c
- with both the aforementioned patches and the one from comment 80 applied to phy_device.c

with no change in behaviour: my network still comes up at 100Mb/s at boot, and only after rmmod/modprobe does it negotiate a 1Gb/s link. Even increasing the msleep period to 1000 in the patch at comment 80 does not help. Should I try increasing it further? Have I overlooked some other patch that is needed?

Would you need a dmesg output with the patches from comment 106?

For the record, concerning the part about /sys/class/net/eth0/phydev/phy_id from that thread:

$ cat /sys/class/net/eth0/phydev/phy_id
0xc1071002
$ sudo rmmod r8169
$ sudo modprobe r8169
$ cat /sys/class/net/eth0/phydev/phy_id
0x001cc912

$ sudo dmesg |grep XID
[    2.081995] r8169 0000:03:00.0 eth0: RTL8168d/8111d, 00:24:1d:12:f4:01, XID 281000c0, IRQ 24
[    2.083681] r8169 0000:04:00.0 eth1: RTL8168d/8111d, 00:24:1d:12:f3:c0, XID 281000c0, IRQ 25
[  512.113205] r8169 0000:03:00.0 eth0: RTL8168d/8111d, 00:24:1d:12:f4:01, XID 281000c0, IRQ 24
[  512.121506] r8169 0000:04:00.0 eth1: RTL8168d/8111d, 00:24:1d:12:f3:c0, XID 281000c0, IRQ 25

$ sudo dmesg |grep "attached PHY driver"
[   20.910251] Generic PHY r8169-300:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
[   21.070284] Generic PHY r8169-400:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
[  512.133451] RTL8211B Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211B Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
[  512.289884] RTL8211B Gigabit Ethernet r8169-300:00: attached PHY driver [RTL8211B Gigabit Ethernet] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
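
The two phy_id values above can be compared against a PHY driver's ID with a masked match, which is how a PHY driver is normally selected. This is a rough sketch only: phy_id_matches is a hypothetical helper, and the ID/mask pair for the RTL8211B is an assumption that should be verified against drivers/net/phy/realtek.c in the kernel being tested.

```shell
#!/bin/sh
# Sketch of a masked PHY ID comparison, as used when choosing a PHY driver.
# ASSUMPTION: this ID/mask pair is what the realtek PHY driver registers for
# the RTL8211B; check drivers/net/phy/realtek.c before relying on it.
RTL8211B_ID=0x001cc912
RTL8211B_MASK=0x001fffff

# phy_id_matches is a hypothetical helper. $1 = PHY ID as read from sysfs.
# Succeeds when the masked ID equals the masked driver ID.
phy_id_matches() {
    [ $(( $1 & $RTL8211B_MASK )) -eq $(( $RTL8211B_ID & $RTL8211B_MASK )) ]
}

# Example:
#   phy_id_matches 0x001cc912 && echo "RTL8211B driver would match"
#   phy_id_matches 0xc1071002 || echo "no match, generic driver is used"
```

Under that assumption, the ID read at boot (0xc1071002) does not match while the ID read after reloading (0x001cc912) does, which is consistent with the "Generic PHY" and "RTL8211B Gigabit Ethernet" lines in the dmesg output.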
Comment 6 Heiner Kallweit 2019-01-20 20:06:42 UTC
Thanks for the feedback. The issue seems to be caused by a wrong PHY ID being read initially:

$ cat /sys/class/net/eth0/phydev/phy_id
0xc1071002
$ sudo rmmod r8169
$ sudo modprobe r8169
$ cat /sys/class/net/eth0/phydev/phy_id
0x001cc912

Then of course the PHY driver can't be loaded and strange effects may occur like in your case. I too have an older test system with this version of the network chip, but it behaves properly. So I don't think the root cause is the network chip or the driver.
It seems to be specific to this system, but it's hard to imagine what could cause a wrong PHY ID to be read. Before 4.19 the PHY driver was hardcoded and phylib wasn't used, but removing phylib from the driver isn't an option.
For now I think you have to go with the workaround to reload the r8169 module.
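
The reload workaround could be automated along these lines. This is an untested sketch, not a recommended fix: needs_reload and reload_if_needed are hypothetical helper names, the expected ID is simply the value reported in this bug after a reload, and the optional sysfs root argument is there only so the check can be exercised without the hardware.

```shell
#!/bin/sh
# Sketch of the reload workaround: reload r8169 only when the PHY ID read at
# boot differs from the one seen after a successful reload.
# needs_reload and reload_if_needed are hypothetical helpers.
EXPECTED_PHY_ID=0x001cc912   # value reported in this bug after reloading

# $1 = interface, $2 = sysfs root (default /sys).
# Succeeds when the PHY ID is readable and differs from the expected one.
needs_reload() {
    id=$(cat "${2:-/sys}/class/net/$1/phydev/phy_id" 2>/dev/null) || return 1
    [ "$id" != "$EXPECTED_PHY_ID" ]
}

# Must run as root. Reloading takes down every interface driven by r8169.
reload_if_needed() {
    if needs_reload "$1"; then
        rmmod r8169 && modprobe r8169
    fi
}

# Example (as root, for instance from a boot-time script):
#   reload_if_needed eth0
```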
Comment 7 Pier Luigi Pau 2019-01-21 16:57:16 UTC
Thanks for your comment. This inspired me to experiment with BIOS settings, and I have found out that if I change the setting to enable the LAN Boot ROM (which was previously set to 'disabled'), the link is established about two seconds later during boot, but it is a 1Gb/s link the first time.

I don't expect any ill effects, but I am going to wait a day or two to see if there are any, just in case. Please let me know whether I should just mark this as resolved if there are none, or whether there is something that needs testing.
Comment 8 Michael G 2019-06-27 00:29:32 UTC
Has any progress been made with r8169?  I'm currently stuck at kernel 4.18.  Any newer version has a significant impact on my NIC's (r8168) performance.  For example, speed tests via Google show:

Kernel 4.18

Download   Upload
887MB      947MB
922MB      945MB
921MB      939MB
917MB      943MB
924MB      942MB

Kernel 5.1.15

Download   Upload
135MB      928MB
141MB      947MB
144MB      931MB
137MB      949MB
147MB      943MB

As you can see there's quite a hit on downloads.  Downloading a new release of linux (doesn't matter which flavor) feels an awful lot like I'm on a dial-up connection rather than the gigabit fiber I have.
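
To put a number on the difference, the download columns above can be averaged. This is a small sketch only: mean_download is a hypothetical helper that reads rows in the same "Download Upload" layout as the tables above, without the header line.

```shell
#!/bin/sh
# Sketch: average the first column (download) of whitespace-separated results.
# mean_download is a hypothetical helper; input rows look like "887MB 947MB".
mean_download() {
    awk '{ sub(/MB$/, "", $1); sum += $1; n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# Example:
#   printf '887MB 947MB\n922MB 945MB\n' | mean_download
```

Fed the five rows from each table, this gives a mean download of 914.2 on kernel 4.18 and 140.8 on kernel 5.1.15, so roughly 15% of the earlier rate.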

FYI:

michael@Nirvana:~$ lspci | egrep -i --color 'network|ethernet'
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
02:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 30)

Yes, I know Realtek's driver is not open source but I'm using the r8169 driver that's part of the kernel.  Something in the changes applied between kernel 4.18 and 4.19 is likely to be the culprit and no subsequent iteration of the kernel has corrected the problem.
Comment 9 Roman Mamedov 2019-07-20 17:57:26 UTC
@Michael: your issue seems unrelated to the one in this bug report. I suggest that you take a look at the following ones instead:
https://bugzilla.kernel.org/show_bug.cgi?id=202945
https://bugzilla.kernel.org/show_bug.cgi?id=203135