Bug 211659 - [REGRESSION] r8169 cannot restart phy after suspend
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
Reported: 2021-02-09 18:14 UTC by Armin Wolf
Modified: 2021-02-14 17:58 UTC
Kernel Version: 5.11.0-rc6
Regression: Yes


Attachments
dmesg after the first suspend on 5.11.0-rc6 (66.77 KB, text/plain)
2021-02-09 18:14 UTC, Armin Wolf
dmesg after doing a second suspend with kernel 5.11.0-rc6 (75.69 KB, text/plain)
2021-02-09 18:15 UTC, Armin Wolf
dmesg when reloading the module after suspend with kernel 5.11.0-rc6 (95.28 KB, text/plain)
2021-02-09 18:16 UTC, Armin Wolf
dmesg with kernel 4.19.0-171 (partial freeze) (97.91 KB, text/plain)
2021-02-09 18:17 UTC, Armin Wolf

Description Armin Wolf 2021-02-09 18:14:42 UTC
Created attachment 295165
dmesg after the first suspend on 5.11.0-rc6

When waking up from suspend, r8169 fails to restart the phy, preventing any form of networking until a complete reboot. This bug was introduced in commit

e80bd76fbf563cc7ed8c9e9f3bbcdf59b0897f69 r8169: work around power-saving bug on some chip versions

and could be reproduced with
- the latest net-next kernel (5.11.0-rc6)
- stable 4.19.0-171

but not with stable 4.19.0-160.

The bug occurs regularly when suspending the machine, but sometimes everything works fine after suspend.
However, on stable 4.19.0-171, when suspending without any LAN cable plugged in, the kernel partially freezes and the machine has to be restarted with the case switch.

cat /proc/version:
Linux version 5.11.0-rc6-net-next+ (wolf@MX-Linux-Intel) (gcc (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #3 SMP Sat Feb 6 20:41:37 CET 2021

hostnamectl | grep "Operating System":
Operating System: Debian GNU/Linux 10 (buster)

lspci -nn:
00:00.0 Host bridge [0600]: Intel Corporation 2nd Generation Core Processor Family DRAM Controller [8086:0100] (rev 09)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port [8086:0101] (rev 09)
00:02.0 Display controller [0380]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0102] (rev 09)
00:16.0 Communication controller [0780]: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 [8086:1c3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 [8086:1c2d] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller [8086:1c20] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 [8086:1c10] (rev b5)
00:1c.2 PCI bridge [0604]: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 [8086:1c14] (rev b5)
00:1d.0 USB controller [0c03]: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 [8086:1c26] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation H61 Express Chipset Family LPC Controller [8086:1c5c] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller [8086:1c02] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller [8086:1c22] (rev 05)
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 7350/8350 / R5 220] [1002:68fa]
01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Cedar HDMI Audio [Radeon HD 5400/6300/7300 Series] [1002:aa68]
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 05)
Comment 1 Armin Wolf 2021-02-09 18:15:51 UTC
Created attachment 295167
dmesg after doing a second suspend with kernel 5.11.0-rc6
Comment 2 Armin Wolf 2021-02-09 18:16:33 UTC
Created attachment 295169
dmesg when reloading the module after suspend with kernel 5.11.0-rc6
Comment 3 Armin Wolf 2021-02-09 18:17:33 UTC
Created attachment 295171
dmesg with kernel 4.19.0-171 (partial freeze)
Comment 4 Heiner Kallweit 2021-02-09 19:27:34 UTC
Thanks for the report. Seems like the fix for some other chip version triggered a hw issue on RTL8105e. I can't reproduce the issue on RTL8168g.
Could you please test whether the following fixes the issue? (The patch applies up to 5.11, not on net-next.)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0d78408b4..e7a59dc5f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2208,6 +2208,7 @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
 
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_26:
+	case RTL_GIGA_MAC_VER_29 ... RTL_GIGA_MAC_VER_30:
 	case RTL_GIGA_MAC_VER_32 ... RTL_GIGA_MAC_VER_33:
 	case RTL_GIGA_MAC_VER_37:
 	case RTL_GIGA_MAC_VER_39:
@@ -2235,6 +2236,7 @@ static void rtl_pll_power_up(struct rtl8169_private *tp)
 {
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_26:
+	case RTL_GIGA_MAC_VER_29 ... RTL_GIGA_MAC_VER_30:
 	case RTL_GIGA_MAC_VER_32 ... RTL_GIGA_MAC_VER_33:
 	case RTL_GIGA_MAC_VER_37:
 	case RTL_GIGA_MAC_VER_39:
-- 
2.30.0
Comment 5 Heiner Kallweit 2021-02-09 19:29:45 UTC
To avoid misunderstandings: net-next is the development version for 5.12. 5.11-rc isn't net-next.
Comment 6 Armin Wolf 2021-02-09 21:56:37 UTC
My fault, I should have called it net-next and not 5.11-rc6.
I will edit the original bug report to change that.
Comment 7 Armin Wolf 2021-02-09 22:00:31 UTC
I can't edit the original report, so just ignore the last line.
Comment 8 Heiner Kallweit 2021-02-10 12:05:04 UTC
On net-next you can test the following:

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 04231585e..376dfd011 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1252,6 +1252,7 @@ static void rtl_set_d3_pll_down(struct rtl8169_private *tp, bool enable)
 {
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_26:
+	case RTL_GIGA_MAC_VER_29 ... RTL_GIGA_MAC_VER_30:
 	case RTL_GIGA_MAC_VER_32 ... RTL_GIGA_MAC_VER_37:
 	case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_VER_63:
 		if (enable)
-- 
2.30.1
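
For readers unfamiliar with the `case A ... B:` syntax used in the patch above: it is the GNU C case-range extension (supported by GCC and Clang), and each label matches every value from A to B inclusive. The following is only a minimal sketch of that matching logic. The helper name `sketch_d3_pll_down_applies` is hypothetical and does not exist in r8169, and plain integers stand in for the `RTL_GIGA_MAC_VER_*` enum values on the assumption that each constant equals the number in its name.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>

/*
 * Hypothetical helper, NOT part of the r8169 driver.
 * Returns true if mac_version falls into one of the case ranges of the
 * patched switch in rtl_set_d3_pll_down(). Plain ints are an assumed
 * stand-in for the RTL_GIGA_MAC_VER_* enum values.
 */
static bool sketch_d3_pll_down_applies(int mac_version)
{
	switch (mac_version) {
	case 25 ... 26:
	case 29 ... 30:		/* range added by the patch */
	case 32 ... 37:
	case 39 ... 63:
		return true;
	default:
		return false;
	}
}
```

Under these assumptions, `assert(sketch_d3_pll_down_applies(29))` and `assert(!sketch_d3_pll_down_applies(31))` both hold: versions 29 and 30 are now covered, while the gaps at 27, 28, 31 and 38 stay outside the switch.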
Comment 9 Armin Wolf 2021-02-14 15:58:45 UTC
Ok, after some testing, I observed the following:
- the bug only appears after the computer has been disconnected from the power line for some time; after a reboot, the bug seems to be gone
- with the patched 5.11 kernel, the bug never appears

So I believe the patch solved the problem, thank you.
Comment 10 Heiner Kallweit 2021-02-14 16:28:33 UTC
Interesting, thanks for the feedback! Not sure what difference it makes for the NIC whether the system runs on battery or AC. At first glance this sounds more like a BIOS bug. But if the patch can avoid the issue, fine with me.
Comment 11 Armin Wolf 2021-02-14 17:58:24 UTC
Maybe it partly is: the NIC also disappears from the PCIe bus without reboot=pci, which however is a BIOS bug.
I think the bug is resolved now.
