Bug 207205

Summary: r8169 stops working after a while.
Product: Drivers
Reporter: anonymous (fepaw95099)
Component: Network
Assignee: drivers_network (drivers_network)
Status: NEW
Severity: normal
CC: 1990konger, a.nolting, akef, camspam, hkallweit1, naveensix, wgh
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.4.31
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg-4.19.97-gentoo.log
dmesg-5.4.31-gentoo.log
On 5.4.32, r8169 stops working after a while.
ethtool -d enp2s0 before the internet stops working.
ethtool -d enp2s0 after the internet stops working
r8169 emits an error in dmesg when the internet stops working.
early dmesg error
dmesg log from power on to device failure
dmesg-5.14.21-aspm-off

Description anonymous 2020-04-11 22:46:02 UTC
I upgraded from 4.19.97 to 5.4.31.

With 5.4.31, r8169 stops working after a while.
Comment 1 anonymous 2020-04-11 22:46:37 UTC
Created attachment 288359
dmesg-4.19.97-gentoo.log

On 4.19.97, r8169 works fine.
Comment 2 anonymous 2020-04-11 22:47:03 UTC
Created attachment 288361
dmesg-5.4.31-gentoo.log

On 5.4.31, r8169 stops working after a while.
Comment 3 Heiner Kallweit 2020-04-13 16:05:14 UTC
Please re-test with 5.4.32.
Comment 4 anonymous 2020-04-14 06:07:47 UTC
Created attachment 288431
On 5.4.32, r8169 stops working after a while.
Comment 5 Heiner Kallweit 2020-04-14 06:18:51 UTC
Then it would be helpful if you could use "git bisect" to check which commit causes the issue on your system. Your chip version RTL8168f is not that common.
Comment 6 anonymous 2020-04-14 06:29:00 UTC
How can I do an effective bisect between 4.19 and 5.4? I don't know whether "git bisect" is possible between diverged git branches.
Comment 7 Heiner Kallweit 2020-04-14 07:28:22 UTC
You would have to clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git and set v4.19.97 as the last known good version and v5.4.31 as the first known bad one. Git knows how to deal with this.

In a first step you could also get the latest versions for 5.0-5.3 from here and check: https://cdn.kernel.org/pub/linux/kernel/v5.x/
Then we at least know between which two kernel versions it broke.
Comment 8 Heiner Kallweit 2020-04-14 08:23:15 UTC
Also, you could briefly check whether switching off EEE (via ethtool) helps.
Enabling EEE per default is one of the few changes in RTL8168f-specific code.
Comment 9 anonymous 2020-04-16 01:27:30 UTC
After days of tedious work on git bisect, I found the bad commit.

commit 288ac524cf70a8e7ed851a61ed2a9744039dae8d (HEAD)
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Mar 30 17:13:24 2019 +0100

    r8169: disable default rx interrupt coalescing on RTL8168

    It was reported that re-introducing ASPM, in combination with RX
    interrupt coalescing, results in significantly increased packet
    latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes
    the issue. Therefore change the driver's default to disable RX
    interrupt coalescing. Users still have the option to enable RX
    coalescing via ethtool.

    [0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925496

    Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support")
    Reported-by: Mike Crowe <mac@mcrowe.com>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Comment 10 anonymous 2020-04-16 01:29:07 UTC
This might be related, too.

commit 583e6361414903c5206258a30e5bd88cb03c0254 (refs/bisect/bad)
Author: Aaro Koskinen <aaro.koskinen@nokia.com>
Date:   Wed Mar 27 22:35:35 2019 +0200

    net: stmmac: use correct DMA buffer size in the RX descriptor

    We always program the maximum DMA buffer size into the receive descriptor,
    although the allocated size may be less. E.g. with the default MTU size
    we allocate only 1536 bytes. If somebody sends us a bigger frame, then
    memory may get corrupted.

    Fix by using exact buffer sizes.

    Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Comment 11 Heiner Kallweit 2020-04-16 07:43:03 UTC
(In reply to crocket from comment #9)
> After days of tedious work on git bisect, I found the bad commit.
> 
> commit 288ac524cf70a8e7ed851a61ed2a9744039dae8d (HEAD)
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Sat Mar 30 17:13:24 2019 +0100
> 
>     r8169: disable default rx interrupt coalescing on RTL8168
> 
>     It was reported that re-introducing ASPM, in combination with RX
>     interrupt coalescing, results in significantly increased packet
>     latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes
>     the issue. Therefore change the driver's default to disable RX
>     interrupt coalescing. Users still have the option to enable RX
>     coalescing via ethtool.
> 
>     [0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925496
> 
>     Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support")
>     Reported-by: Mike Crowe <mac@mcrowe.com>
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Thanks for the analysis efforts. To verify the finding, you could enable RX coalescing via ethtool and check whether the issue persists.
Comment 12 Heiner Kallweit 2020-04-16 07:53:42 UTC
(In reply to crocket from comment #10)
> This might be related, too.
> 
> commit 583e6361414903c5206258a30e5bd88cb03c0254 (refs/bisect/bad)
> Author: Aaro Koskinen <aaro.koskinen@nokia.com>
> Date:   Wed Mar 27 22:35:35 2019 +0200
> 
>     net: stmmac: use correct DMA buffer size in the RX descriptor
> 
>     We always program the maximum DMA buffer size into the receive descriptor,
>     although the allocated size may be less. E.g. with the default MTU size
>     we allocate only 1536 bytes. If somebody sends us a bigger frame, then
>     memory may get corrupted.
> 
>     Fix by using exact buffer sizes.
> 
>     Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Why do you think it could be related? This commit is about a different network driver.
Comment 13 anonymous 2020-04-16 08:51:53 UTC
(In reply to Heiner Kallweit from comment #12)
> Why do you think it could be related? This commit is about a different
> network driver.

Because bisect/bad points to it.
Comment 14 Heiner Kallweit 2020-04-16 09:20:12 UTC
(In reply to crocket from comment #13)
> (In reply to Heiner Kallweit from comment #12)
> > Why do you think it could be related? This commit is about a different
> > network driver.
> 
> Because bisect/bad points to it.

It can happen that bisecting points to the wrong commit if the issue isn't 100% reproducible within a certain period of time.

Maybe the same applies to the commit mentioned before (RX coalescing), as it's included in 4.19.97 too.

What you can do, recapping what was stated earlier:
- test with latest versions of kernels 5.0 - 5.3
- Switch on RX coalescing via ethtool -C and check
- Disable EEE via ethtool and check
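The point that a bisect can land on the wrong commit when the failure isn't reliably reproducible can be made concrete with a minimal sketch of the binary search that `git bisect` performs. It assumes a linear history where commit 0 is known good and commit n-1 is known bad; `reports_bad` is a hypothetical array of test verdicts and `bisect_first_bad` is an illustrative name, not part of git.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the binary search behind `git bisect` on a
 * linear history (n >= 2). Commit 0 is known good, commit n - 1 is known
 * bad. reports_bad[i] is the verdict the tester gives for commit i; it can
 * be wrong if the bug simply did not show up during the test window.
 * Returns the index that bisection blames as the first bad commit.
 */
size_t bisect_first_bad(const bool *reports_bad, size_t n)
{
    size_t good = 0;      /* last commit marked good */
    size_t bad = n - 1;   /* earliest commit marked bad so far */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (reports_bad[mid])
            bad = mid;    /* bug seen: culprit is at or before mid */
        else
            good = mid;   /* bug not seen: culprit assumed after mid */
    }
    return bad;
}
```

With accurate verdicts and the first bad commit at index 5 of 16, the search returns 5. If the bug merely fails to appear during the test window at commit 7, that single wrong "good" verdict makes the same search return 8, an unrelated commit. This is one way a bisect can end up blaming a commit in a different driver.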
Comment 15 anonymous 2020-04-16 11:43:27 UTC
It turns out that r8169 broke for me at
288ac524cf70a8e7ed851a61ed2a9744039dae8d
r8169: disable default rx interrupt coalescing on RTL8168.

r8169 works fine on the previous commit which is 22bdf7d459ceff6eb06a99364b1d75ecb2fcafe5

I still don't know a workaround or a fix for this issue.

Somehow, executing "sudo ethtool --coalesce enp2s0 rx-usecs 200 rx-frames 4 tx-usecs 200 tx-frames 4" doesn't fix the issue on 288ac524cf70a8e7ed851a61ed2a9744039dae8d
r8169: disable default rx interrupt coalescing on RTL8168.
Comment 16 Heiner Kallweit 2020-04-16 18:54:07 UTC
OK, thanks. Could you please try the following patch on top of 5.4.32?
This flag shouldn't be needed, but several RTL8168 chip versions are quite quirky.


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 870982426..17ba564ea 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -3041,6 +3041,11 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 	rtl_ephy_init(tp, e_info_8168f_1);
 
 	rtl_w0w1_eri(tp, 0x0d4, ERIAR_MASK_0011, 0x0c00, 0xff00);
+
+	if (tp->mac_version == RTL_GIGA_MAC_VER_35) {
+		tp->cp_cmd |= PktCntrDisable;
+		RTL_W16(tp, CPlusCmd, tp->cp_cmd);
+	}
 }
 
 static void rtl_hw_start_8411(struct rtl8169_private *tp)
-- 
2.26.1
Comment 17 anonymous 2020-04-16 22:24:18 UTC
The patch didn't fix the issue.
Comment 18 anonymous 2020-04-17 05:51:29 UTC
After applying the patch, executing "sudo ethtool --coalesce enp2s0 rx-usecs 200 rx-frames 4 tx-usecs 200 tx-frames 4" fixes the issue. I don't know why it does.
Comment 19 Heiner Kallweit 2020-04-17 08:58:40 UTC
Uh, interesting. Let me check with Realtek whether they are aware of any related hw issue with RTL8168f (and maybe other chip versions).
Comment 20 Heiner Kallweit 2020-05-12 19:04:02 UTC
I received feedback from Realtek; however, testing would be needed before a patch can be submitted. It would be great if you could check the following:

W/o the patch from comment 16, if you set the following, is the issue still there or gone?
sudo ethtool --coalesce enp2s0 rx-usecs 200 rx-frames 16 tx-usecs 200 tx-frames 16
Comment 21 anonymous 2020-05-13 13:05:13 UTC
Executing `sudo ethtool --coalesce enp2s0 rx-usecs 200 rx-frames 16 tx-usecs 200 tx-frames 16` on linux 5.4.40 doesn't fix the issue.
Comment 22 Heiner Kallweit 2020-05-13 20:31:52 UTC
Thanks for testing! OK, one (hopefully) final test:

Could you please apply the patch below (w/o the patch from comment 16) and then check these two configs:
sudo ethtool --coalesce enp2s0 rx-usecs 200 rx-frames 0 tx-usecs 0 tx-frames 0
sudo ethtool --coalesce enp2s0 rx-usecs 0 rx-frames 0 tx-usecs 200 tx-frames 0


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index b030993a7..ba55c36fe 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1872,6 +1872,14 @@ static int rtl_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
 
 	RTL_W16(tp, IntrMitigate, w);
 
+	if (rtl_is_8168evl_up(tp)) {
+		if (!rx_fr && !tx_fr)
+			/* disable packet counter */
+			tp->cp_cmd |= PktCntrDisable;
+		else
+			tp->cp_cmd &= ~PktCntrDisable;
+	}
+
 	tp->cp_cmd = (tp->cp_cmd & ~INTT_MASK) | cp01;
 	RTL_W16(tp, CPlusCmd, tp->cp_cmd);
 	rtl_pci_commit(tp);
-- 
2.26.2
Comment 23 anonymous 2020-05-14 12:13:14 UTC
I applied the patch to 5.4.40 and got the following error during compilation.

drivers/net/ethernet/realtek/r8169_main.c: In function ‘rtl_set_coalesce’:
drivers/net/ethernet/realtek/r8169_main.c:2026:8: error: ‘rx_fr’ undeclared (first use in this function)
 2026 |   if (!rx_fr && !tx_fr)
      |        ^~~~~
drivers/net/ethernet/realtek/r8169_main.c:2026:8: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ethernet/realtek/r8169_main.c:2026:18: error: ‘tx_fr’ undeclared (first use in this function)
 2026 |   if (!rx_fr && !tx_fr)
      |                  ^~~~~
Comment 24 Heiner Kallweit 2020-05-14 20:14:11 UTC
Sorry, forgot to mention this: The patch applies on top of linux-next.
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Comment 25 anonymous 2020-05-15 09:55:08 UTC
Does zfs compile against linux-next? My root filesystem is on ZFS.
Comment 26 Heiner Kallweit 2020-05-15 17:18:57 UTC
(In reply to crocket from comment #25)
> Does zfs compile against linux-next? My root filesystem is on ZFS.

I can't really help you with that, just try and see what happens.
Comment 27 anonymous 2020-05-16 07:56:53 UTC
zfs-kmod is not supported by linux-next

 * ERROR: sys-fs/zfs-kmod-0.8.3::gentoo failed (setup phase):
 *   Linux 5.4 is the latest supported version

Do you have to test the patch against linux-next?
Comment 28 Heiner Kallweit 2020-05-16 17:03:34 UTC
The following applies on top of 5.4.41. It doesn't work with every RTL8168 chip version, but it's good enough for tests on your system.

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 3bc6d1ef2..af8275257 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1989,6 +1989,8 @@ static int rtl_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
 	if (IS_ERR(scale))
 		return PTR_ERR(scale);
 
+	tp->cp_cmd |= PktCntrDisable;
+
 	for (i = 0; i < 2; i++, p++) {
 		u32 units;
 
@@ -2012,6 +2014,9 @@ static int rtl_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
 		if (p->frames > RTL_COALESCE_FRAME_MAX || p->frames % 4)
 			return -EINVAL;
 
+		if (p->frames)
+			tp->cp_cmd &= ~PktCntrDisable;
+
 		w <<= RTL_COALESCE_SHIFT;
 		w |= units;
 		w <<= RTL_COALESCE_SHIFT;
-- 
2.26.2
Comment 29 anonymous 2020-05-18 04:43:05 UTC
The issue goes away if I execute `sudo ethtool --coalesce enp2s0 rx-usecs 0 rx-frames 0 tx-usecs 200 tx-frames 0` on the patched linux 5.4.41.
Comment 30 Heiner Kallweit 2020-05-18 20:29:05 UTC
Thanks again for testing! I think I'll submit the patch from comment 22 for net-next. This means it won't get backported to older kernel versions.
The reason is that we still don't know the root cause, or whether the issue affects the RTL8168f chip version in general or is specific to your system.
Comment 31 anonymous 2020-05-19 04:14:52 UTC
Older kernel versions like 5.4?
Comment 32 Heiner Kallweit 2020-05-19 11:01:20 UTC
No. The change will be available from 5.8 on. The reasoning is as stated in my previous comment: apart from the root cause still being unknown, we don't know whether setting this bit might cause issues with other supported chip versions.
Comment 33 anonymous 2020-09-09 05:42:43 UTC
I'm now on 5.8.7. The issue seems to have been fixed on 5.8.7. What did you do?

> ethtool --show-coalesce enp2s0

Coalesce parameters for enp2s0:
Adaptive RX: n/a  TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a

rx-usecs: 0
rx-frames: 1
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 0
tx-frames: 1
tx-usecs-irq: n/a
tx-frames-irq: n/a

rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a

rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a
Comment 34 anonymous 2020-09-10 02:53:02 UTC
I will test it for a few more days and report back.
Comment 35 anonymous 2020-09-11 02:20:48 UTC
Unfortunately, r8169 stops working after a while on linux 5.8.7.
Comment 36 Heiner Kallweit 2020-09-23 09:33:54 UTC
How does it behave if you enable IRQ coalescing as in comment 29?
Comment 37 anonymous 2020-09-23 11:20:28 UTC
After executing `sudo ethtool --coalesce enp2s0 rx-usecs 0 rx-frames 0 tx-usecs 200 tx-frames 0`

$ sudo ethtool --show-coalesce enp2s0
Coalesce parameters for enp2s0:
Adaptive RX: n/a  TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a

rx-usecs: 0
rx-frames: 1
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 205
tx-frames: 0
tx-usecs-irq: n/a
tx-frames-irq: n/a

rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a

rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a

If I execute `sudo ethtool --coalesce enp2s0 rx-usecs 0 rx-frames 0 tx-usecs 200 tx-frames 0`, the internet works for a few minutes and stops working.

The coalesce parameters keep being reset to

Coalesce parameters for enp2s0:
Adaptive RX: n/a  TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a

rx-usecs: 0
rx-frames: 1
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 0
tx-frames: 1
tx-usecs-irq: n/a
tx-frames-irq: n/a

rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a

rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a

After a while, the command can no longer get the internet working again.
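The readback of tx-usecs: 205 after requesting 200 fits how the chip stores the delay: as a 4-bit count of timer units whose size depends on link speed and the INTT bits in CPlusCmd. The sketch below is hypothetical. It assumes the 100 Mbps unit table documented in the r8169 source (2.56, 20.48, 40.96 and 81.92 us for INTT 0 to 3), picks the finest unit that can still express the request, and rounds up in both directions. That rounding is an assumption chosen because it reproduces the values reported in this thread, not a copy of the driver code.

```c
#include <stdint.h>

/* Assumed 4-bit timer field: at most 15 units. */
#define TIMER_FIELD_MAX 15u

/* Assumed RTL8168 timer unit in ns at 100 Mbps, indexed by INTT (0..3). */
static const uint32_t unit_ns_100m[4] = { 2560, 20480, 40960, 81920 };

/* Pick the finest unit that can still express usecs; -1 if out of range. */
int choose_intt(uint32_t usecs)
{
    for (int i = 0; i < 4; i++)
        if (usecs * 1000u <= unit_ns_100m[i] * TIMER_FIELD_MAX)
            return i;
    return -1;
}

/* Requested delay -> number of timer units (rounding up is an assumption). */
uint32_t usecs_to_units(uint32_t usecs, int intt)
{
    return (usecs * 1000u + unit_ns_100m[intt] - 1) / unit_ns_100m[intt];
}

/* Timer units -> delay reported back (rounding up is an assumption). */
uint32_t units_to_usecs(uint32_t units, int intt)
{
    return (units * unit_ns_100m[intt] + 999u) / 1000u;
}
```

Under these assumptions, 200 us selects INTT 1 and 10 units, which reads back as 204.8 us rounded up to 205; the same rounding turns 5 units into 103 us, matching the 100 Mbps readout in comment 55.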
Comment 38 Heiner Kallweit 2020-09-25 07:05:06 UTC
Could you please test the following? It effectively reverts the change referred to in comment 9 for RTL8168f. With this draft patch it shouldn't be needed to manually adjust the coalesce parameters.


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 4fb49fd0d..3bc51978e 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2910,6 +2910,13 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 	rtl_ephy_init(tp, e_info_8168f_1);
 
 	rtl_eri_set_bits(tp, 0x0d4, 0x1f00);
+
+	/* Tx timeouts have been reported w/o this special handling */
+	RTL_W16(tp, IntrMitigate, 0x5151);
+	tp->cp_cmd |= PktCntrDisable;
+	/* set irq coalescing scale to 40us at 1Gbps */
+	tp->cp_cmd = (tp->cp_cmd & ~INTT_MASK) | 0x0001;
+	RTL_W16(tp, CPlusCmd, tp->cp_cmd);
 }
 
 static void rtl_hw_start_8411(struct rtl8169_private *tp)
@@ -3745,10 +3752,10 @@ static void rtl_hw_start_8168(struct rtl8169_private *tp)
 	else
 		RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
 
-	rtl_hw_config(tp);
-
 	/* disable interrupt coalescing */
 	RTL_W16(tp, IntrMitigate, 0x0000);
+
+	rtl_hw_config(tp);
 }
 
 static void rtl_hw_start_8169(struct rtl8169_private *tp)
-- 
2.28.0
Comment 39 anonymous 2020-09-26 02:21:41 UTC
I applied the patch and compiled 5.8.7. r8169 stops working after a few minutes anyway.
Comment 40 Heiner Kallweit 2020-09-27 16:42:06 UTC
To check something, could you please attach the output of ethtool -d enp2s0?
Comment 41 Heiner Kallweit 2020-09-27 17:19:33 UTC
The EPHY config for RTL8168f differs a little bit from the r8168 vendor driver. Not sure whether this can be related to the issue here. Just to be on the safe side, can you test the following patch?

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9e4e6a883..02beefc21 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2901,7 +2901,7 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 		{ 0x08, 0x0001,	0x0002 },
 		{ 0x09, 0x0000,	0x0080 },
 		{ 0x19, 0x0000,	0x0224 },
-		{ 0x00, 0x0000,	0x0004 },
+		{ 0x00, 0x0000,	0x0008 },
 		{ 0x0c, 0x3df0,	0x0200 },
 	};
 
-- 
2.28.0
Comment 42 anonymous 2020-09-29 06:53:40 UTC
I upgraded to linux 5.8.12.

On linux 5.8.12, r8169 seems to work without any patch.

Just to make sure, I will report back in a few days. I need a few days to make sure that it really is fixed.
Comment 43 Heiner Kallweit 2020-09-29 07:02:02 UTC
Good if the issue actually has been gone. I just wonder what could have caused this, because there was no change in r8169 driver from 5.8.7 to 5.8.12.
Comment 44 anonymous 2020-10-02 02:24:30 UTC
I tested for hours and days. The internet stops working in an hour or two on 5.8.12 patched by gentoo linux.

If I apply patches on https://bugzilla.kernel.org/show_bug.cgi?id=207205#c41 and https://bugzilla.kernel.org/show_bug.cgi?id=207205#c38 to gentoo's 5.8.12, the internet stops working quickly.
Comment 45 anonymous 2020-10-03 10:34:26 UTC
Created attachment 292783
ethtool -d enp2s0 before the internet stops working.
Comment 46 anonymous 2020-10-03 10:35:51 UTC
Created attachment 292785
ethtool -d enp2s0 after the internet stops working
Comment 47 anonymous 2020-10-03 10:36:22 UTC
Created attachment 292787
r8169 emits an error in dmesg when the internet stops working.
Comment 48 Heiner Kallweit 2020-10-05 08:37:23 UTC
After checking the RTL8168f firmware there might be an issue due to PHY reset still being in progress on firmware load exit. Could you please test whether the following patch makes a difference?


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0fa99298a..9afd1ef57 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2055,11 +2055,18 @@ static void rtl_release_firmware(struct rtl8169_private *tp)
 
 void r8169_apply_firmware(struct rtl8169_private *tp)
 {
+	int val;
+
 	/* TODO: release firmware if rtl_fw_write_firmware signals failure. */
 	if (tp->rtl_fw) {
 		rtl_fw_write_firmware(tp, tp->rtl_fw);
 		/* At least one firmware doesn't reset tp->ocp_base. */
 		tp->ocp_base = OCP_STD_PHY_BASE;
+
+		/* PHY soft reset may still be in progress */
+		phy_read_poll_timeout(tp->phydev, MII_BMCR, val,
+				      !(val & BMCR_RESET),
+				      50000, 600000, true);
 	}
 }
 
-- 
2.28.0
Comment 49 anonymous 2020-10-06 06:15:43 UTC
Created attachment 292837
early dmesg error

I see this error early in dmesg, but it takes an hour or two after the error for the internet to stop working.
Comment 50 anonymous 2020-10-06 06:17:20 UTC
This is what I see on linux-4.19.146

[    2.704174] r8169 0000:02:00.0 eth1: RTL8168f/8111f, 08:62:66:2c:b6:1f, XID 48000800, IRQ 51
[   15.214431] RTL8211E Gigabit Ethernet r8169-200:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
Comment 51 Heiner Kallweit 2020-10-06 11:07:39 UTC
Sometimes the order of register writes is also relevant. Can you test the following in combination with the patch in comment 48?
Unfortunately, Realtek doesn't release any chip or errata documentation, so only trial and error is possible here.


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9afd1ef57..880666e8d 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2908,6 +2908,13 @@ static void rtl_hw_start_8168f_1(struct rtl8169_private *tp)
 		{ 0x0c, 0x3df0,	0x0200 },
 	};
 
+	/* Tx timeouts have been reported w/o this special handling */
+	RTL_W16(tp, IntrMitigate, 0x5151);
+	tp->cp_cmd |= PktCntrDisable;
+	/* set irq coalescing scale to 40us at 1Gbps */
+	tp->cp_cmd = (tp->cp_cmd & ~INTT_MASK) | 0x0001;
+	RTL_W16(tp, CPlusCmd, tp->cp_cmd);
+
 	rtl_hw_start_8168f(tp);
 
 	rtl_ephy_init(tp, e_info_8168f_1);
@@ -3748,10 +3755,10 @@ static void rtl_hw_start_8168(struct rtl8169_private *tp)
 	else
 		RTL_W8(tp, MaxTxPacketSize, TxPacketMax);
 
-	rtl_hw_config(tp);
-
 	/* disable interrupt coalescing */
 	RTL_W16(tp, IntrMitigate, 0x0000);
+
+	rtl_hw_config(tp);
 }
 
 static void rtl_hw_start_8169(struct rtl8169_private *tp)
-- 
2.28.0
Comment 52 anonymous 2020-10-06 11:19:39 UTC
Do you recommend any other ethernet adaptor company that releases specifications? I don't like proprietary products.
Comment 53 anonymous 2020-10-07 01:42:36 UTC
I applied the last two patches. I don't see an error in dmesg, but the internet still stops working after a while.
Comment 54 Heiner Kallweit 2020-10-07 08:48:46 UTC
The r8168 vendor driver has some additional magic in the chip configuration. Could you please apply the following on top of the other patches and re-test?


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0fa99298a..0c3d61077 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1317,10 +1317,10 @@ static void rtl_link_chg_patch(struct rtl8169_private *tp)
 		   tp->mac_version == RTL_GIGA_MAC_VER_36) {
 		if (phydev->speed == SPEED_1000) {
 			rtl_eri_write(tp, 0x1bc, ERIAR_MASK_1111, 0x00000011);
-			rtl_eri_write(tp, 0x1dc, ERIAR_MASK_1111, 0x00000005);
+			rtl_eri_write(tp, 0x1dc, ERIAR_MASK_1111, 0x0000001f);
 		} else {
 			rtl_eri_write(tp, 0x1bc, ERIAR_MASK_1111, 0x0000001f);
-			rtl_eri_write(tp, 0x1dc, ERIAR_MASK_1111, 0x0000003f);
+			rtl_eri_write(tp, 0x1dc, ERIAR_MASK_1111, 0x0000001f);
 		}
 	} else if (tp->mac_version == RTL_GIGA_MAC_VER_37) {
 		if (phydev->speed == SPEED_10) {
diff --git a/drivers/net/ethernet/realtek/r8169_phy_config.c b/drivers/net/ethernet/realtek/r8169_phy_config.c
index 913d030d7..aa79e97be 100644
--- a/drivers/net/ethernet/realtek/r8169_phy_config.c
+++ b/drivers/net/ethernet/realtek/r8169_phy_config.c
@@ -714,6 +714,14 @@ static void rtl8168f_hw_phy_config(struct rtl8169_private *tp,
 	/* Improve 10M EEE waveform */
 	r8168d_phy_param(phydev, 0x8b86, 0x0000, 0x0001);
 
+	r8168d_phy_param(phydev, 0x8b54, BIT(11), 0);
+	r8168d_phy_param(phydev, 0x8b5d, BIT(11), 0);
+	r8168d_phy_param(phydev, 0x8a7c, BIT(8), 0);
+	r8168d_phy_param(phydev, 0x8a7f, 0, BIT(8));
+	r8168d_phy_param(phydev, 0x8a82, BIT(8), 0);
+	r8168d_phy_param(phydev, 0x8a85, BIT(8), 0);
+	r8168d_phy_param(phydev, 0x8a88, BIT(8), 0);
+
 	rtl8168f_config_eee_phy(phydev);
 }
 
-- 
2.28.0
Comment 55 anonymous 2020-10-08 01:06:01 UTC
No error message in dmesg. The internet stops working soon.

Coalesce parameters for enp2s0:
Adaptive RX: n/a  TX: n/a
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a

rx-usecs: 103
rx-frames: 4
rx-usecs-irq: n/a
rx-frames-irq: n/a

tx-usecs: 103
tx-frames: 4
tx-usecs-irq: n/a
tx-frames-irq: n/a

rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a

rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a

[    2.191101] r8169 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control
[    2.192395] libphy: r8169: probed
[    2.192714] r8169 0000:02:00.0 eth0: RTL8168f/8111f, 08:62:66:2c:b6:1f, XID 480, IRQ 52
[    2.192716] r8169 0000:02:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    5.263531] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   15.398044] RTL8211E Gigabit Ethernet r8169-200:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   15.613734] r8169 0000:02:00.0 enp2s0: Link is Down
[   29.925768] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control rx/tx
Comment 56 Heiner Kallweit 2020-10-08 06:09:23 UTC
Link speed is 100Mbps only, didn't you have 1Gbps before?
Did you ever try with different cable and switch brand?
Comment 57 anonymous 2020-10-09 00:03:03 UTC
The current ethernet is going to be limited to 100Mbps until I change the cable behind the wall. I don't know how to do that, yet.
Comment 58 anonymous 2020-10-09 00:03:22 UTC
The current ethernet -> The current ethernet wall socket
Comment 59 anonymous 2020-10-12 07:58:32 UTC
r8169 doesn't work well on 5.8.14
Comment 60 Heiner Kallweit 2020-10-12 08:24:17 UTC
Maybe it's a board-specific issue. Best test the r8168 vendor driver, and if it works, go with that one.
Comment 61 anonymous 2020-10-12 22:27:16 UTC
Do you know any ethernet adaptor manufacturer that releases specifications?
Comment 62 Heiner Kallweit 2020-10-13 05:47:58 UTC
Most likely on consumer PCIe network cards, apart from Realtek, you will get only Intel chips. Not sure whether they release specs (at least under NDA), but they actively maintain their in-kernel drivers.
Comment 63 anonymous 2020-10-13 07:41:39 UTC
I also have an Intel PCIe ethernet adaptor, but I haven't seen Intel ethernet adaptors on motherboards.

I will test r8168 and come back.
Comment 64 anonymous 2020-10-18 10:57:51 UTC
I tested r8168 for 2 days. It works without an issue. But, `ethtool --show-coalesce` is not supported by r8168.
Comment 65 Heiner Kallweit 2020-10-18 11:34:45 UTC
Good to hear that r8168 works for you. And right, r8168 sets fixed coalesce settings (RTL_W16(tp, IntrMitigate, 0x5f51)). This translates to:

tx-frames: 60
tx-usecs: 200 (100 at 100Mbps)
rx-frames: 4
rx-usecs: 200 (100 at 100Mbps)
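The translation above can be checked with a small decoder. This is a hypothetical sketch that assumes the IntrMitigate layout used by r8169 (bits 15:12 Tx timer units, 11:8 Tx frames in steps of 4, 7:4 Rx timer units, 3:0 Rx frames in steps of 4) and a timer unit of 40 us at 1 Gbps and 20.48 us at 100 Mbps for the INTT setting in use. Rounding the delay up is also an assumption.

```c
#include <stdint.h>

/* Decoded coalescing parameters, in the units ethtool reports. */
struct coalesce {
    uint32_t tx_usecs, tx_frames, rx_usecs, rx_frames;
};

/*
 * Hypothetical decoder for an IntrMitigate value. unit_ns is the timer
 * unit for the current link speed and INTT setting (assumed 40000 ns at
 * 1 Gbps, 20480 ns at 100 Mbps). Frame fields count in steps of 4.
 */
struct coalesce decode_intr_mitigate(uint16_t reg, uint32_t unit_ns)
{
    struct coalesce c;

    /* Timer fields: units * unit size, rounded up to whole microseconds. */
    c.tx_usecs  = (((reg >> 12) & 0xfu) * unit_ns + 999u) / 1000u;
    c.rx_usecs  = (((reg >> 4) & 0xfu) * unit_ns + 999u) / 1000u;
    /* Frame fields: stored value times 4. */
    c.tx_frames = ((reg >> 8) & 0xfu) * 4u;
    c.rx_frames = (reg & 0xfu) * 4u;
    return c;
}
```

Decoding 0x5f51 with a 40 us unit gives tx 200 us / 60 frames and rx 200 us / 4 frames, matching the list above. Decoding 0x5151 with a 20.48 us unit gives 103 us / 4 frames in both directions, which is what comment 55 reports at 100 Mbps after the patch from comment 51. The driver also appears to report 1 frame when both fields of a direction are zero (see comment 33); the sketch does not model that.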
Comment 66 jssblngr 2020-10-28 16:48:15 UTC
I'm experiencing a related issue involving the r8169 driver.

I've no network issues with kernel 5.3.7.  However, testing it with any more recent kernels causes connection issues.

The network comes up fine if rebooting from windows into linux when windows has an active ethernet connection.  However, upon rebooting linux, the driver fails to load correctly.

I'm an intermediate user at best, but please let me know if I can somehow help in isolating this issue.

The only way I've been able to 'fix' the issue is by installing the 8168 driver, blacklisting the 8169, AND switching Network Manager for wicked.
Comment 67 jssblngr 2020-10-28 16:49:14 UTC
Also, my hardware is a Gigabyte AB350M-DS3H motherboard w/ the onboard Realtek adapter.
Comment 68 Heiner Kallweit 2020-10-28 17:46:30 UTC
(In reply to jssblngr from comment #66)
> I'm experiencing a related issue involving the r8169 driver.
> 
> I've no network issues with kernel 5.3.7.  However, testing it with any more
> recent kernels causes connection issues.
> 
What is the exact issue? Link loss? Tx timeout? Dropped packets? ..

> The network comes up fine if rebooting from windows into linux when windows
> has an active ethernet connection.  However, upon rebooting linux, the
> driver fails to load correctly.
> 
What is the exact error?
Best attach a full dmesg log.

> I'm an intermediate user at best, but please let me know if I can somehow
> help in isolating this issue.
> 
Best bisect the issue to identify the offending commit.
(see any tutorial for git bisect)

> The only way I've been able to 'fix' the issue is by installing the 8168
> driver, blacklisting the 8169, AND switching Network Manager for wicked.

To rule out network manager issues, best test w/o one.
Comment 69 jssblngr 2020-10-28 18:08:53 UTC
(In reply to Heiner Kallweit from comment #68)
> What is the exact issue? Link loss? Tx timeout? Dropped packets? ..
There is no link established upon reboot.  Unable to obtain a connection, even if unplugging/replugging ethernet, or reloading the device.  It will begin working after 

> What is the exact error?
> Best attach a full dmesg log.
(Not the full log due to comment length restrictions, but this includes from the first mention of 8169 until the end of the log)
[    9.045362] ccp 0000:0a:00.2: ccp enabled
[    9.046480] libphy: r8169: probed
[    9.046664] r8169 0000:05:00.0 eth0: RTL8168g/8111g, e0:d5:5e:6c:87:18, XID 4c0, IRQ 58
[    9.046666] r8169 0000:05:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    9.055911] ccp 0000:0a:00.2: tee enabled
[    9.055914] ccp 0000:0a:00.2: psp enabled
[    9.084526] nvme nvme0: pci function 0000:09:00.0
[    9.096174] nvme nvme0: 7/0/0 default/read/poll queues
[    9.100910]  nvme0n1: p1 p2
[    9.167438] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[    9.167439] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[    9.266378] r8169 0000:05:00.0 enp5s0: renamed from eth0
[    9.268725] usb-storage 1-4:1.0: USB Mass Storage device detected
[    9.288155] scsi host9: usb-storage 1-4:1.0
[    9.288235] usbcore: registered new interface driver usb-storage
[    9.297073] usbcore: registered new interface driver uas
[    9.607293] [drm] amdgpu kernel modesetting enabled.
[    9.610033] amdgpu: Ignoring ACPI CRAT on non-APU system
[    9.610036] Virtual CRAT table created for CPU
[    9.610044] amdgpu: Topology: Add CPU node
[    9.610113] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[    9.611570] Console: switching to colour dummy device 80x25
[    9.611690] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1458:0x22F1 0xE7).
[    9.611692] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    9.611704] [drm] register mmio base: 0xFCF00000
[    9.611704] [drm] register mmio size: 262144
[    9.611714] [drm] add ip block number 0 <vi_common>
[    9.611714] [drm] add ip block number 1 <gmc_v8_0>
[    9.611715] [drm] add ip block number 2 <tonga_ih>
[    9.611716] [drm] add ip block number 3 <gfx_v8_0>
[    9.611716] [drm] add ip block number 4 <sdma_v3_0>
[    9.611717] [drm] add ip block number 5 <powerplay>
[    9.611718] [drm] add ip block number 6 <dm>
[    9.611718] [drm] add ip block number 7 <uvd_v6_0>
[    9.611719] [drm] add ip block number 8 <vce_v3_0>
[    9.611898] amdgpu 0000:01:00.0: No more image in the PCI ROM
[    9.611919] amdgpu: ATOM BIOS: xxx-xxx-xxx
[    9.611936] [drm] UVD is enabled in VM mode
[    9.611937] [drm] UVD ENC is enabled in VM mode
[    9.611939] [drm] VCE enabled in VM mode
[    9.611960] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[    9.611994] amdgpu 0000:01:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[    9.611995] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[    9.611998] [drm] Detected VRAM RAM=8192M, BAR=256M
[    9.611999] [drm] RAM width 256bits GDDR5
[    9.615039] [TTM] Zone  kernel: Available graphics memory: 8165702 KiB
[    9.615040] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[    9.615040] [TTM] Initializing pool allocator
[    9.615045] [TTM] Initializing DMA pool allocator
[    9.615075] [drm] amdgpu: 8192M of VRAM memory ready
[    9.615077] [drm] amdgpu: 8192M of GTT memory ready.
[    9.615078] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    9.618806] [drm] PCIE GART of 256M enabled (table at 0x000000F400900000).
[    9.618918] [drm] Chained IB support enabled!
[    9.653045] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[    9.653122] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    9.663073] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[    9.762927] [drm] DM_PPLIB: values for Engine clock
[    9.762928] [drm] DM_PPLIB:	 300000
[    9.762928] [drm] DM_PPLIB:	 600000
[    9.762929] [drm] DM_PPLIB:	 900000
[    9.762929] [drm] DM_PPLIB:	 1145000
[    9.762929] [drm] DM_PPLIB:	 1215000
[    9.762929] [drm] DM_PPLIB:	 1257000
[    9.762930] [drm] DM_PPLIB:	 1300000
[    9.762930] [drm] DM_PPLIB:	 1365000
[    9.762930] [drm] DM_PPLIB: Validation clocks:
[    9.762931] [drm] DM_PPLIB:    engine_max_clock: 136500
[    9.762931] [drm] DM_PPLIB:    memory_max_clock: 200000
[    9.762931] [drm] DM_PPLIB:    level           : 8
[    9.762932] [drm] DM_PPLIB: values for Memory clock
[    9.762932] [drm] DM_PPLIB:	 300000
[    9.762932] [drm] DM_PPLIB:	 1000000
[    9.762933] [drm] DM_PPLIB:	 2000000
[    9.762933] [drm] DM_PPLIB: Validation clocks:
[    9.762933] [drm] DM_PPLIB:    engine_max_clock: 136500
[    9.762933] [drm] DM_PPLIB:    memory_max_clock: 200000
[    9.762934] [drm] DM_PPLIB:    level           : 8
[    9.763090] [drm] Display Core initialized with v3.2.84!
[    9.795036] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    9.830311] [drm] UVD and UVD ENC initialized successfully.
[    9.956165] [drm] VCE initialized successfully.
[    9.956706] kfd kfd: Allocated 3969056 bytes on gart
[    9.959337] Virtual CRAT table created for GPU
[    9.959401] amdgpu: Topology: Add dGPU node [0x67df:0x1002]
[    9.959403] kfd kfd: added device 1002:67df
[    9.959405] amdgpu 0000:01:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[    9.960949] [drm] fb mappable at 0xE0E30000
[    9.960950] [drm] vram apper at 0xE0000000
[    9.960950] [drm] size 8294400
[    9.960951] [drm] fb depth is 24
[    9.960951] [drm]    pitch is 7680
[    9.961119] fbcon: amdgpudrmfb (fb0) is primary device
[   10.048543] Console: switching to colour frame buffer device 240x67
[   10.060398] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[   10.081417] [drm] Initialized amdgpu 3.38.0 20150101 for 0000:01:00.0 on minor 0
[   10.331453] scsi 9:0:0:0: Direct-Access     General  UDisk            5.00 PQ: 0 ANSI: 2
[   10.331612] sd 9:0:0:0: Attached scsi generic sg1 type 0
[   10.333396] sd 9:0:0:0: [sdb] 15728640 512-byte logical blocks: (8.05 GB/7.50 GiB)
[   10.333526] sd 9:0:0:0: [sdb] Write Protect is off
[   10.333527] sd 9:0:0:0: [sdb] Mode Sense: 0b 00 00 08
[   10.333649] sd 9:0:0:0: [sdb] No Caching mode page found
[   10.333650] sd 9:0:0:0: [sdb] Assuming drive cache: write through
[   10.347707] PM: Image not found (code -22)
[   10.388197]  sdb: sdb1
[   10.389076] sd 9:0:0:0: [sdb] Attached SCSI removable disk
[   10.571611] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[   10.852810] systemd-journald[238]: Received SIGTERM from PID 1 (systemd).
[   10.869293] printk: systemd: 16 output lines suppressed due to ratelimiting
[   10.994639] SELinux:  Permission watch in class filesystem not defined in policy.
[   10.994646] SELinux:  Permission watch in class file not defined in policy.
[   10.994646] SELinux:  Permission watch_mount in class file not defined in policy.
[   10.994647] SELinux:  Permission watch_sb in class file not defined in policy.
[   10.994648] SELinux:  Permission watch_with_perm in class file not defined in policy.
[   10.994648] SELinux:  Permission watch_reads in class file not defined in policy.
[   10.994651] SELinux:  Permission watch in class dir not defined in policy.
[   10.994651] SELinux:  Permission watch_mount in class dir not defined in policy.
[   10.994652] SELinux:  Permission watch_sb in class dir not defined in policy.
[   10.994652] SELinux:  Permission watch_with_perm in class dir not defined in policy.
[   10.994653] SELinux:  Permission watch_reads in class dir not defined in policy.
[   10.994657] SELinux:  Permission watch in class lnk_file not defined in policy.
[   10.994657] SELinux:  Permission watch_mount in class lnk_file not defined in policy.
[   10.994658] SELinux:  Permission watch_sb in class lnk_file not defined in policy.
[   10.994658] SELinux:  Permission watch_with_perm in class lnk_file not defined in policy.
[   10.994659] SELinux:  Permission watch_reads in class lnk_file not defined in policy.
[   10.994661] SELinux:  Permission watch in class chr_file not defined in policy.
[   10.994662] SELinux:  Permission watch_mount in class chr_file not defined in policy.
[   10.994662] SELinux:  Permission watch_sb in class chr_file not defined in policy.
[   10.994663] SELinux:  Permission watch_with_perm in class chr_file not defined in policy.
[   10.994663] SELinux:  Permission watch_reads in class chr_file not defined in policy.
[   10.994665] SELinux:  Permission watch in class blk_file not defined in policy.
[   10.994666] SELinux:  Permission watch_mount in class blk_file not defined in policy.
[   10.994666] SELinux:  Permission watch_sb in class blk_file not defined in policy.
[   10.994667] SELinux:  Permission watch_with_perm in class blk_file not defined in policy.
[   10.994667] SELinux:  Permission watch_reads in class blk_file not defined in policy.
[   10.994669] SELinux:  Permission watch in class sock_file not defined in policy.
[   10.994670] SELinux:  Permission watch_mount in class sock_file not defined in policy.
[   10.994670] SELinux:  Permission watch_sb in class sock_file not defined in policy.
[   10.994671] SELinux:  Permission watch_with_perm in class sock_file not defined in policy.
[   10.994671] SELinux:  Permission watch_reads in class sock_file not defined in policy.
[   10.994673] SELinux:  Permission watch in class fifo_file not defined in policy.
[   10.994674] SELinux:  Permission watch_mount in class fifo_file not defined in policy.
[   10.994674] SELinux:  Permission watch_sb in class fifo_file not defined in policy.
[   10.994674] SELinux:  Permission watch_with_perm in class fifo_file not defined in policy.
[   10.994675] SELinux:  Permission watch_reads in class fifo_file not defined in policy.
[   10.994732] SELinux:  Permission perfmon in class capability2 not defined in policy.
[   10.994733] SELinux:  Permission bpf in class capability2 not defined in policy.
[   10.994739] SELinux:  Permission perfmon in class cap2_userns not defined in policy.
[   10.994740] SELinux:  Permission bpf in class cap2_userns not defined in policy.
[   10.994783] SELinux:  Class perf_event not defined in policy.
[   10.994783] SELinux:  Class lockdown not defined in policy.
[   10.994783] SELinux: the above unknown classes and permissions will be allowed
[   10.994788] SELinux:  policy capability network_peer_controls=1
[   10.994788] SELinux:  policy capability open_perms=1
[   10.994788] SELinux:  policy capability extended_socket_class=1
[   10.994789] SELinux:  policy capability always_check_network=0
[   10.994789] SELinux:  policy capability cgroup_seclabel=1
[   10.994789] SELinux:  policy capability nnp_nosuid_transition=1
[   10.994790] SELinux:  policy capability genfs_seclabel_symlinks=0
[   11.013848] systemd[1]: Successfully loaded SELinux policy in 102.642ms.
[   11.018942] systemd[1]: RTC configured in localtime, applying delta of -240 minutes to system time.
[   11.055456] systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 22.328ms.
[   11.057501] systemd[1]: systemd v243.9-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[   11.069064] systemd[1]: Detected architecture x86-64.
[   11.069237] systemd[1]: Set hostname to <localhost.localdomain>.
[   11.160169] systemd[1]: /usr/lib/systemd/system/sssd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/sssd.pid → /run/sssd.pid; please update the unit file accordingly.
[   11.161178] systemd[1]: /usr/lib/systemd/system/iscsid.service:11: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsid.pid → /run/iscsid.pid; please update the unit file accordingly.
[   11.161367] systemd[1]: /usr/lib/systemd/system/iscsiuio.service:13: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsiuio.pid → /run/iscsiuio.pid; please update the unit file accordingly.
[   11.162371] systemd[1]: /usr/lib/systemd/system/libvirtd-admin.socket:8: ListenStream= references a path below legacy directory /var/run/, updating /var/run/libvirt/libvirt-admin-sock → /run/libvirt/libvirt-admin-sock; please update the unit file accordingly.
[   11.162639] systemd[1]: /usr/lib/systemd/system/libvirtd-ro.socket:8: ListenStream= references a path below legacy directory /var/run/, updating /var/run/libvirt/libvirt-sock-ro → /run/libvirt/libvirt-sock-ro; please update the unit file accordingly.
[   11.162851] systemd[1]: /usr/lib/systemd/system/libvirtd.socket:6: ListenStream= references a path below legacy directory /var/run/, updating /var/run/libvirt/libvirt-sock → /run/libvirt/libvirt-sock; please update the unit file accordingly.
[   11.163110] systemd[1]: /usr/lib/systemd/system/virtlockd.socket:6: ListenStream= references a path below legacy directory /var/run/, updating /var/run/libvirt/virtlockd-sock → /run/libvirt/virtlockd-sock; please update the unit file accordingly.
[   11.250039] Adding 8237052k swap on /dev/mapper/fedora_localhost--live-swap.  Priority:-2 extents:1 across:8237052k SSFS
[   11.267773] EXT4-fs (dm-0): re-mounted. Opts: (null)
[   11.283092] systemd-journald[620]: Received client request to flush runtime journal.
[   11.445376] acpi_cpufreq: overriding BIOS provided _PSD data
[   11.636956] input: Generic X-Box pad as /devices/pci0000:00/0000:00:08.1/0000:0a:00.3/usb3/3-4/3-4:1.0/input/input11
[   11.638552] usbcore: registered new interface driver xpad
[   11.639852] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[   11.639854] snd_hda_intel 0000:01:00.1: Force to non-snoop mode
[   11.643160] mc: Linux media interface: v0.10
[   11.654650] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[   11.656139] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input12
[   11.656179] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input13
[   11.656208] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input14
[   11.656235] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input15
[   11.656263] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input16
[   11.656292] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input17
[   11.657439] ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\GSA1.SMBI) (20200528/utaddress-204)
[   11.657445] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   11.664143] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[   11.664207] sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
[   11.666025] sp5100-tco sp5100-tco: initialized. heartbeat=60 sec (nowayout=0)
[   11.674313] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC887-VD: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
[   11.674315] snd_hda_codec_realtek hdaudioC1D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[   11.674316] snd_hda_codec_realtek hdaudioC1D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[   11.674317] snd_hda_codec_realtek hdaudioC1D0:    mono: mono_out=0x0
[   11.674318] snd_hda_codec_realtek hdaudioC1D0:    dig-out=0x11/0x0
[   11.674318] snd_hda_codec_realtek hdaudioC1D0:    inputs:
[   11.674319] snd_hda_codec_realtek hdaudioC1D0:      Front Mic=0x19
[   11.674320] snd_hda_codec_realtek hdaudioC1D0:      Rear Mic=0x18
[   11.674321] snd_hda_codec_realtek hdaudioC1D0:      Line=0x1a
[   11.694423] input: PC Speaker as /devices/platform/pcspkr/input/input18
[   11.698803] input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:08.1/0000:0a:00.6/sound/card1/input19
[   11.698857] input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:08.1/0000:0a:00.6/sound/card1/input20
[   11.698888] input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:08.1/0000:0a:00.6/sound/card1/input21
[   11.698941] input: HD-Audio Generic Line Out as /devices/pci0000:00/0000:00:08.1/0000:0a:00.6/sound/card1/input22
[   11.698972] input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:08.1/0000:0a:00.6/sound/card1/input23
[   11.725095] RAPL PMU: API unit is 2^-32 Joules, 1 fixed counters, 163840 ms ovfl timer
[   11.725096] RAPL PMU: hw unit of domain package 2^-16 Joules
[   11.801339] kvm: disabled by bios
[   11.811118] kvm: disabled by bios
[   11.812253] MCE: In-kernel MCE decoding enabled.
[   11.815307] EDAC amd64: F17h_M10h detected (node 0).
[   11.815559] EDAC amd64: Node 0: DRAM ECC disabled.
[   11.825090] kvm: disabled by bios
[   11.848380] EDAC amd64: F17h_M10h detected (node 0).
[   11.848498] EDAC amd64: Node 0: DRAM ECC disabled.
[   11.853361] kvm: disabled by bios
[   11.858921] EDAC amd64: F17h_M10h detected (node 0).
[   11.859159] EDAC amd64: Node 0: DRAM ECC disabled.
[   11.862847] usbcore: registered new interface driver snd-usb-audio
[   11.879448] EDAC amd64: F17h_M10h detected (node 0).
[   11.879524] EDAC amd64: Node 0: DRAM ECC disabled.
[   12.022542] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[   12.024691] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
[   12.135183] RPC: Registered named UNIX socket transport module.
[   12.135184] RPC: Registered udp transport module.
[   12.135185] RPC: Registered tcp transport module.
[   12.135185] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   12.828526] Generic FE-GE Realtek PHY r8169-500:00: attached PHY driver [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-500:00, irq=IGNORE)
[   12.988498] r8169 0000:05:00.0 enp5s0: Link is Down
[   13.038104] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[   13.051113] tun: Universal TUN/TAP device driver, 1.6
[   13.054799] virbr0: port 1(virbr0-nic) entered blocking state
[   13.054801] virbr0: port 1(virbr0-nic) entered disabled state
[   13.054842] device virbr0-nic entered promiscuous mode
[   13.478108] virbr0: port 1(virbr0-nic) entered blocking state
[   13.478110] virbr0: port 1(virbr0-nic) entered listening state
[   13.505793] virbr0: port 1(virbr0-nic) entered disabled state
[   15.441831] r8169 0000:05:00.0 enp5s0: Link is Up - 1Gbps/Full - flow control rx/tx
[   15.441839] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready
[   15.962644] rfkill: input handler disabled
[   20.944837] ------------[ cut here ]------------
[   20.944865] NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
[   20.944901] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x24e/0x260
[   20.944904] Modules linked in: xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter tun ipt_REJECT nf_reject_ipv4 bridge xt_conntrack stp llc ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc edac_mce_amd kvm irqbypass rapl pcspkr wmi_bmof snd_usb_audio k10temp snd_hda_codec_realtek snd_hda_codec_generic sp5100_tco ledtrig_audio joydev snd_usbmidi_lib i2c_piix4 snd_rawmidi snd_hda_codec_hdmi mc snd_hda_intel xpad ff_memless snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore gpio_amdpt gpio_generic acpi_cpufreq ip_tables amdgpu uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2
[   20.944954]  gpu_sched i2c_algo_bit ttm drm_kms_helper ghash_clmulni_intel cec drm nvme nvme_core r8169 ccp wmi video pinctrl_amd fuse
[   20.944974] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.8.15-101.fc31.x86_64 #1
[   20.944978] Hardware name: Gigabyte Technology Co., Ltd. AB350M-DS3H/AB350M-DS3H-CF, BIOS F50d 07/02/2020
[   20.944983] RIP: 0010:dev_watchdog+0x24e/0x260
[   20.944988] Code: 85 c0 75 e5 eb 9c 4c 89 ef c6 05 41 75 23 01 01 e8 b7 05 fb ff 44 89 e1 4c 89 ee 48 c7 c7 48 f9 49 9e 48 89 c2 e8 b4 f5 6e ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[   20.944991] RSP: 0018:ffffb6a0001fce98 EFLAGS: 00010292
[   20.944995] RAX: 000000000000003b RBX: ffff8c9943602800 RCX: 0000000000000000
[   20.944998] RDX: ffff8c994e8a7060 RSI: ffff8c994e898d00 RDI: 0000000000000300
[   20.945001] RBP: ffff8c994c4723dc R08: ffff8c994e898d00 R09: 0000000000000003
[   20.945004] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   20.945007] R13: ffff8c994c472000 R14: ffff8c994c472480 R15: 0000000000000001
[   20.945011] FS:  0000000000000000(0000) GS:ffff8c994e880000(0000) knlGS:0000000000000000
[   20.945014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.945017] CR2: 00007f75d028c00c CR3: 00000003e7074000 CR4: 00000000003406e0
[   20.945020] Call Trace:
[   20.945025]  <IRQ>
[   20.945033]  ? pfifo_fast_enqueue+0x150/0x150
[   20.945041]  call_timer_fn+0x2d/0x130
[   20.945046]  run_timer_softirq+0x183/0x490
[   20.945052]  ? tick_sched_do_timer+0x70/0x70
[   20.945057]  ? __hrtimer_run_queues+0xf5/0x240
[   20.945061]  ? ktime_get+0x38/0x90
[   20.945067]  ? sched_clock+0x5/0x10
[   20.945075]  __do_softirq+0xee/0x2ff
[   20.945080]  asm_call_irq_on_stack+0x12/0x20
[   20.945083]  </IRQ>
[   20.945089]  do_softirq_own_stack+0x37/0x40
[   20.945095]  irq_exit_rcu+0xd8/0xe0
[   20.945100]  sysvec_apic_timer_interrupt+0x34/0x80
[   20.945106]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[   20.945112] RIP: 0010:cpuidle_enter_state+0xc9/0x3e0
[   20.945117] Code: e8 6c 02 7e ff 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 e5 02 00 00 31 ff e8 fe 58 84 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 3b 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[   20.945120] RSP: 0018:ffffb6a0000b3e78 EFLAGS: 00000246
[   20.945123] RAX: ffff8c994e8aa2c0 RBX: ffff8c9944eccc00 RCX: 000000000000001f
[   20.945126] RDX: 0000000000000000 RSI: 00000000249c1da0 RDI: 0000000000000000
[   20.945129] RBP: ffffffff9eb7a160 R08: 00000004e068754d R09: 0000000000000000
[   20.945132] R10: 000000000000000f R11: ffff8c994e8a90e4 R12: 00000004e068754d
[   20.945134] R13: 0000000000000002 R14: 0000000000000002 R15: ffff8c994ceccd80
[   20.945142]  ? cpuidle_enter_state+0xa4/0x3e0
[   20.945146]  cpuidle_enter+0x29/0x40
[   20.945151]  do_idle+0x1c0/0x260
[   20.945155]  cpu_startup_entry+0x19/0x20
[   20.945161]  start_secondary+0x144/0x170
[   20.945166]  secondary_startup_64+0xb6/0xc0
[   20.945170] ---[ end trace fcd28edcb225e439 ]---
[   20.962097] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[   23.111409] rfkill: input handler enabled
[   25.017411] rfkill: input handler disabled
[   27.099592] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[   32.730922] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[   37.844293] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[   42.960286] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[   49.098774] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[   54.724701] r8169 0000:05:00.0 enp5s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).


> Best bisect the issue to identify the offending commit.
> (see any tutorial for git bisect)
I will work on learning how to do this and hopefully narrow down which commit is causing the behavior.

> To rule out network manager issues, best test w/o one.
I just reproduced the issue with both NetworkManager and wicked. I'm unsure how to test without using either.
Comment 70 jssblngr 2020-10-28 18:10:18 UTC
> It will begin working after 

It will begin working after booting into Windows, establishing a working connection, and then rebooting back into Linux. But again, it stops working after any subsequent Linux reboot.
Comment 71 Heiner Kallweit 2020-10-28 19:44:26 UTC
If you cold-boot into Linux, then network is ok even after rebooting to Linux?
If that's the case then the Windows driver seems to set something that the Linux driver isn't aware of. Unfortunately Realtek doesn't provide any public chip documentation.

You said that 5.3.7 is ok. The log you provided is from 5.8.15, did you already test any kernel versions in between?
Comment 72 jssblngr 2020-10-28 19:46:47 UTC
I'm currently testing different kernel versions. It appears to work with no issues up to 5.7.15, and the first version I've found it not working on is 5.8.8. However, I have not tested any kernel versions between 5.7.15 and 5.8.8.

I'm going to try to learn git bisect and dig in, starting with those two versions.
Comment 73 Heiner Kallweit 2020-10-28 20:15:52 UTC
(In reply to jssblngr from comment #72)
> I'm working on testing some different kernel versions currently.  It appears
> to be working with no issues up to 5.7.15, and the first version I've found
> it not working it 5.8.8.  I have not tested any kernel versions between
> 5.7.15 and 5.8.8 however.
> 
> I was going to try to learn and dig into git bisect starting with those two
> versions.
Great, much appreciated!
Comment 74 jssblngr 2020-10-29 00:02:39 UTC
Well, I narrowed it down a tiny bit more.

5.7.17 - Works with no issues.
5.8.6 - Bug with r8169 driver occurs as described above.

I did dig into the source a bit, but there are quite a few changes to r8169_main.c between those two versions, and I'm far too inexperienced as a developer to narrow it down further than that.

Hope that helps a little. Please let me know if there's anything else I could test to narrow down the issue further. Thanks for your help and time, sir!
Comment 75 anonymous 2020-10-29 01:13:02 UTC
Use `git bisect`.
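A minimal sketch of that bisect for the 5.7.17 (good) / 5.8.6 (bad) pair from comment 74, assuming a clone of the stable tree named in comment 7. The helper name start_r8169_bisect is hypothetical; the git commands inside are standard. Passing a pathspec such as `-- drivers/net/ethernet/realtek/` is optional and assumes the culprit is inside the driver directory, which may not hold if a change elsewhere (for example in PCI or power management) is responsible.

```shell
# Hypothetical helper: start a bisect between a known-good and a known-bad tag.
# Run it from inside a clone of the stable kernel tree.
start_r8169_bisect() {
    if [ "$#" -lt 2 ]; then
        echo "usage: start_r8169_bisect <good> <bad> [-- <path>...]" >&2
        return 2
    fi
    good_rev="$1"   # last known good, e.g. v5.7.17
    bad_rev="$2"    # first known bad, e.g. v5.8.6
    shift 2
    # git syntax: git bisect start <bad> <good> [--] [<paths>...]
    # Remaining arguments (e.g. "-- drivers/net/ethernet/realtek/") are
    # passed through and restrict which commits git will offer for testing.
    git bisect start "$bad_rev" "$good_rev" "$@"
}
```

Usage: run `start_r8169_bisect v5.7.17 v5.8.6`, then build and boot the checked-out kernel and mark it with `git bisect good` or `git bisect bad`, repeating until git names the first bad commit; `git bisect reset` returns to the original branch. Because v5.7.17 and v5.8.6 sit on different stable branches, git may first ask you to test their merge base.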
Comment 76 anonymous 2020-11-08 04:16:58 UTC
Is there any blocker?
Comment 77 anonymous 2021-01-22 10:45:24 UTC
I have tested Linux 5.10.7 for a few days. The issue seems to have been fixed in 5.10.7.

Can anyone else confirm?
Comment 78 Naveenkumar 2021-06-14 09:18:19 UTC
I am facing the same issue in kernel 5.12.9.
[   10.338200] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
[   21.173464] ------------[ cut here ]------------
[   21.173487] NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
[   21.173525] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24d/0x260
[   21.173543] Modules linked in: nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core uvcvideo btusb snd_hwdep btrtl iTCO_wdt intel_pmc_bxt btbcm videobuf2_vmalloc at24 snd_seq iTCO_vendor_support btintel videobuf2_memops snd_seq_device mei_hdcp intel_rapl_msr videobuf2_v4l2 bluetooth snd_pcm rndis_host videobuf2_common x86_pkg_temp_thermal cdc_ether intel_powerclamp hp_wmi usbnet sparse_keymap snd_timer
[   21.173819]  videodev coretemp ecdh_generic mii snd rapl rfkill intel_cstate ecc processor_thermal_device i2c_i801 joydev processor_thermal_rfim intel_uncore processor_thermal_mbox mc pcspkr processor_thermal_rapl intel_wmi_thunderbolt mei_me intel_rapl_common wmi_bmof mei soundcore i2c_smbus intel_soc_dts_iosf intel_pch_thermal int3403_thermal hp_accel int340x_thermal_zone lis3lv02d int3400_thermal hp_wireless acpi_thermal_rel acpi_pad zram ip_tables i915 rtsx_pci_sdmmc mmc_core i2c_algo_bit crct10dif_pclmul drm_kms_helper crc32_pclmul cec crc32c_intel ghash_clmulni_intel drm serio_raw wmi rtsx_pci r8169 video fuse
[   21.174005] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.9-300.fc34.x86_64 #1
[   21.174014] Hardware name: HP HP ENVY Notebook/8154, BIOS F.17 07/27/2016
[   21.174020] RIP: 0010:dev_watchdog+0x24d/0x260
[   21.174030] Code: 49 99 fd ff eb a9 4c 89 f7 c6 05 6b 10 2f 01 01 e8 18 73 fa ff 44 89 e9 4c 89 f6 48 c7 c7 68 3f 49 b2 48 89 c2 e8 70 72 16 00 <0f> 0b eb 8a 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44
[   21.174038] RSP: 0018:ffffa6c9c0158eb0 EFLAGS: 00010282
[   21.174047] RAX: 000000000000003b RBX: ffff906049bdfe00 RCX: 0000000000000000
[   21.174054] RDX: ffff9063a3c66720 RSI: ffff9063a3c585c0 RDI: 0000000000000300
[   21.174059] RBP: ffff90604a2243dc R08: 0000000000000000 R09: ffffa6c9c0158ce0
[   21.174065] R10: ffffa6c9c0158cd8 R11: ffffffffb2b45f28 R12: ffff90604a224480
[   21.174071] R13: 0000000000000000 R14: ffff90604a224000 R15: ffff906049bdfe80
[   21.174076] FS:  0000000000000000(0000) GS:ffff9063a3c40000(0000) knlGS:0000000000000000
[   21.174084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.174090] CR2: 00005557093f5a80 CR3: 0000000310a10001 CR4: 00000000003706e0
[   21.174096] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   21.174101] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   21.174107] Call Trace:
[   21.174113]  <IRQ>
[   21.174120]  ? pfifo_fast_enqueue+0x150/0x150
[   21.174130]  call_timer_fn+0x29/0xf0
[   21.174144]  __run_timers.part.0+0x1b1/0x210
[   21.174156]  ? __hrtimer_run_queues+0x129/0x250
[   21.174168]  ? recalibrate_cpu_khz+0x10/0x10
[   21.174180]  ? ktime_get+0x38/0x90
[   21.174187]  ? sched_clock+0x5/0x10
[   21.174199]  run_timer_softirq+0x26/0x50
[   21.174209]  __do_softirq+0xd0/0x28f
[   21.174220]  __irq_exit_rcu+0xbf/0x100
[   21.174228]  sysvec_apic_timer_interrupt+0x72/0x90
[   21.174240]  </IRQ>
[   21.174245]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[   21.174258] RIP: 0010:cpuidle_enter_state+0xc7/0x350
[   21.174272] Code: 8b 3d 05 e4 6a 4e e8 b8 00 7b ff 49 89 c5 0f 1f 44 00 00 31 ff e8 f9 18 7b ff 45 84 ff 0f 85 fa 00 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 06 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d
[   21.174279] RSP: 0018:ffffa6c9c00f3eb0 EFLAGS: 00000246
[   21.174288] RAX: ffff9063a3c6a3c0 RBX: 0000000000000008 RCX: 000000000000001f
[   21.174293] RDX: 0000000000000000 RSI: 000000003161f0f0 RDI: 0000000000000000
[   21.174298] RBP: ffff9063a3c75b80 R08: 00000004ee091a7b R09: 0000000000000018
[   21.174304] R10: 000000000000474e R11: 00000000000032e5 R12: ffffffffb2c58ec0
[   21.174309] R13: 00000004ee091a7b R14: 0000000000000008 R15: 0000000000000000
[   21.174321]  cpuidle_enter+0x29/0x40
[   21.174333]  do_idle+0x1c7/0x270
[   21.174343]  cpu_startup_entry+0x19/0x20
[   21.174351]  secondary_startup_64_no_verify+0xc2/0xcb
[   21.174364] ---[ end trace 52b54b5cc9bb0f86 ]---
[   21.204468] r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
[   21.227284] r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   21.249432] r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   21.271942] r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   21.294329] r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
[   21.316546] r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Comment 79 Heiner Kallweit 2021-06-14 09:47:57 UTC
The symptom "generic tx timeout" is the same, the root cause not necessarily.
Please attach a full dmesg log. Is the issue reproducible? Is it a regression? What was the last known good kernel version?
Comment 80 Naveenkumar 2021-06-14 10:35:02 UTC
Created attachment 297355 [details]
dmesg log from power on to device failure

At [10.004310] the r8169 link comes up, and at [24.962220] it fails with a transmit queue timeout.
Comment 81 Naveenkumar 2021-06-14 10:45:48 UTC
Is the issue reproducible?
A: It always happens after a power-on boot.
The Ethernet port LEDs blink.
The enp1s0 device entry remains, but running ip link up/down or modprobe r8169 produces no dmesg output or any other effect.

Is it a regression?
The kernel came in as a regular Fedora update.

What was the last known good kernel version?
In 5.12.9 the issue is severe and the device is unusable.
However, the problem was observed intermittently over the past two weeks
(kernels 5.12.7 and 5.12.8).
Comment 82 Naveenkumar 2021-06-14 13:32:12 UTC
I tried:
echo auto > /sys/bus/pci/devices/0000:<device>/power/control

as in
https://bugzilla.kernel.org/show_bug.cgi?id=199549#c6

and rebooted the laptop. The link came up and has been stable (for at least 20 minutes now).
Comment 83 Heiner Kallweit 2021-06-14 13:56:53 UTC
In the referenced issue it was about disabling this command.
To make sure we don't misunderstand each other:

You used to have "on" for this property? This would mean Runtime PM was disabled. Setting the property to "auto" enables Runtime PM.
With Runtime PM enabled the issue is gone in your case?
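A minimal sketch for checking which of the two states above a device is in. It reads the sysfs attribute from comment 82 (`power/control`, where "on" keeps Runtime PM disabled and "auto" enables it). The function name rpm_state and the SYSFS_ROOT override are hypothetical, added only so the check can be tried against a fake directory tree.

```shell
# Hypothetical helper: report the runtime PM setting of a PCI device.
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
rpm_state() {
    rpm_ctl="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/power/control"
    if [ ! -r "$rpm_ctl" ]; then
        echo "unknown (cannot read $rpm_ctl)"
        return 1
    fi
    rpm_val=$(cat "$rpm_ctl")
    case "$rpm_val" in
        on)   echo "runtime PM disabled (control=on)" ;;
        auto) echo "runtime PM enabled (control=auto)" ;;
        *)    echo "unexpected value: $rpm_val" ;;
    esac
}
```

For the laptop in comment 78 this would be `rpm_state 0000:01:00.0`; switching it to "auto" as in comment 82 is `echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control`, run as root.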
Comment 84 Naveenkumar 2021-06-14 14:34:39 UTC
(In reply to Heiner Kallweit from comment #83)

> You used to have "on" for this property?
Yes, it looks like the default was 'on'. I had not applied this tweak before.
> With Runtime PM enabled the issue is gone in your case?
So far (~1hr) yes. I will observe for a few days and update.
Comment 85 Heiner Kallweit 2021-06-14 20:09:55 UTC
Hmm, I don't really have an explanation for this yet. Runtime PM kicks in 10s after the interface or link goes down. You (roughly) have:
probe() -> 4s -> open() -> 4s -> link up
Therefore it shouldn't make a difference.
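The timing argument above can be written down as a small check. This is a simplified model, not the driver's logic: it assumes a single idle delay (10s here) that restarts at every listed event, and only asks whether any gap between consecutive events reaches that delay. The function name would_runtime_suspend is hypothetical.

```shell
# Simplified model (an assumption): runtime suspend happens only if some
# gap between consecutive events is at least DELAY seconds.
# Usage: would_runtime_suspend DELAY T0 T1 T2 ...
would_runtime_suspend() {
    delay="$1"; shift
    # Remaining arguments are event timestamps in seconds (probe, open, link up).
    echo "$@" | awk -v d="$delay" '{
        for (i = 2; i <= NF; i++)
            if ($i - $(i - 1) >= d) { print "yes"; exit }
        print "no"
    }'
}
```

For the probe() -> 4s -> open() -> 4s -> link up timeline, `would_runtime_suspend 10 0 4 8` prints `no`: neither 4s gap reaches the 10s delay, so under this model Runtime PM never gets a chance to act in that window.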
Comment 86 Naveenkumar 2021-06-20 01:02:02 UTC
I could not observe any consistent behavior other than the repeated "r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100)." message in dmesg on boot.
It looks like a hardware issue now that the device has become unusable
(no I/O takes place, but the device remains registered). Maybe the bug can be closed.
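A quick way to summarize a saved log such as the one attached in comment 80 is to count the two kinds of r8169 messages quoted in this report: the NETDEV WATCHDOG transmit timeout and the rtl_*_cond poll messages. This is a rough sketch; the function name r8169_summary is hypothetical, and the patterns only cover the message formats that appear in this thread.

```shell
# Hypothetical helper: count r8169 symptoms in a saved dmesg log file.
r8169_summary() {
    log="$1"
    # Watchdog lines name the driver in parentheses, e.g. "enp1s0 (r8169)".
    printf 'tx timeouts: %s\n' \
        "$(grep -c 'NETDEV WATCHDOG: .* (r8169): transmit queue' "$log")"
    # Poll messages look like "rtl_eriar_cond == 1 (loop: 100, delay: 100)";
    # count each condition name together with the value printed after "==".
    grep -oE 'rtl_[a-z_]+_cond == [01]' "$log" | sort | uniq -c |
        awk '{ printf "%s == %s: %d\n", $2, $4, $1 }'
}
```

Usage: `dmesg > boot.log; r8169_summary boot.log`.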
Comment 87 anonymous 2022-01-14 09:59:01 UTC
On 5.10.76, there is no issue.
On 5.15.11, this issue occurs again.
Comment 88 kongdeyuan 2022-07-02 02:42:58 UTC
Hi, how can I find the possible cause of the error in this situation?

root@*****:~# lspci | grep  -i net 
0000:3b:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Killer E3000 2.5GbE Controller (rev 06)
0000:3c:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)

root@*****:~# modprobe  -r  r8169 
root@*****:~# modprobe  r8169
root@*****:~# dmesg | tail -n 6
[  564.018275] r8169 0000:3b:00.0 enp59s0: Link is Down
[  638.485759] r8169 0000:3b:00.0 eth0: RTL8125B, 38:14:28:41:c6:cf, XID 641, IRQ 168
[  638.485777] r8169 0000:3b:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[  638.490836] r8169 0000:3b:00.0 enp59s0: renamed from eth0
[  638.570494] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-3b00:00: attached PHY driver (mii_bus:phy_addr=r8169-0-3b00:00, irq=MAC)
[  638.750857] r8169 0000:3b:00.0 enp59s0: Link is Down
root@*****:~# ip a 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: wlp60s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 40:1c:83:bf:5a:63 brd ff:ff:ff:ff:ff:ff
    inet 172.16.19.90/26 brd 172.16.19.127 scope global dynamic noprefixroute wlp60s0
       valid_lft 82990sec preferred_lft 82990sec
    inet6 fe80::78b4:fb27:a47e:8aac/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
6: enp59s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 38:14:28:41:c6:cf brd ff:ff:ff:ff:ff:ff
root@*****:~# uname -a
Linux 5.18.7-custom #1 SMP PREEMPT_DYNAMIC Sun Jun 26 12:57:06 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
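One way to start answering "how can I check the cause" is to collect the same information the maintainers ask for elsewhere in this bug (a full dmesg rather than `dmesg | tail`, an `ethtool -d` register dump, link state) in one place. A minimal sketch; the function name, directory layout and file names are arbitrary assumptions, not a format the maintainers require.

```shell
#!/bin/sh
# Sketch: gather basic r8169 diagnostics for one interface into a directory.
# Function name and file names are hypothetical choices.

collect_r8169_report() {
    iface=$1
    out=${2:-r8169-report}
    mkdir -p "$out" || return 1
    uname -a > "$out/uname.txt"
    # Full kernel log; may need root if kernel.dmesg_restrict=1
    dmesg > "$out/dmesg.log" 2>&1
    ip -d link show dev "$iface" > "$out/ip-link.txt" 2>&1
    ethtool "$iface" > "$out/ethtool.txt" 2>&1       # speed/duplex/link detected
    ethtool -i "$iface" > "$out/ethtool-i.txt" 2>&1  # driver and firmware info
    ethtool -d "$iface" > "$out/ethtool-d.txt" 2>&1  # register dump
    echo "report written to $out"
}
```

Usage: `collect_r8169_report enp59s0` (as root) right after the link fails, then attach the resulting files to the bug.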
Comment 89 Alexander Nolting 2022-08-15 11:22:57 UTC
I'm fighting with two problems: one with the in-kernel driver r8169 and the other with the alternative driver r8168.

My mainboard, an MSI X570 Gaming Edge WiFi, has the RTL8168H (revision 15).
Since @kongdeyuan described nearly the same issue in his comment, I would also like to ask for some help here.

The in-kernel driver r8169 does not seem to be able to set up the RTL8168H controller properly. The controller is shown as a device in the network configuration, but it is completely unable to establish a link and also does not seem to detect that it is connected to a LAN.

I switched to r8169 because, since kernel 5.18.14, the out-of-tree driver r8168 is no longer able to keep the link up. With the r8168 driver the link goes up and down continuously; with r8169 the link is always down.

Here are some more details:

r8169

# lsmod | grep r816*
r8169                 102400  0
mdio_devres            16384  1 r8169
libphy                172032  3 r8169,mdio_devres,realtek

# ifconfig -a
enp39s0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 00:d8:61:a2:e4:a9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# dmesg | grep r8169
[    4.054788] r8169 0000:27:00.0: enabling device (0000 -> 0003)
[    4.074482] r8169 0000:27:00.0 eth0: RTL8168h/8111h, 00:d8:61:a2:e4:a9, XID 541, IRQ 101
[    4.074488] r8169 0000:27:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[    4.199367] r8169 0000:27:00.0 enp39s0: renamed from eth0
[    5.113071] Generic FE-GE Realtek PHY r8169-0-2700:00: attached PHY driver (mii_bus:phy_addr=r8169-0-2700:00, irq=MAC)
[    5.316504] r8169 0000:27:00.0 enp39s0: Link is Down
[   21.246915] r8169 0000:27:00.0 enp39s0: Link is Down

With r8169 the controller does not show up in NetworkManager.

If someone can guide me on how to debug this in more detail, I'm willing to do so.

Best Alex
Comment 90 Heiner Kallweit 2022-08-15 11:39:53 UTC
This indicates an issue on the physical side. Try another link partner and cable (or, for testing, put a switch in between); there could also be an issue with the RJ45 port.
On my test systems RTL8168h works fine also with the latest kernels.
Comment 91 Alexander Nolting 2022-08-15 11:42:34 UTC
Hello Heiner
There is a switch in between. The cables are wall-mounted, I have tried different ports on the switch, and the same system booted from another disk with Windows does not show any error.
Best Alex
Comment 92 Heiner Kallweit 2022-08-15 11:54:44 UTC
Realtek's Windows driver may have additional workarounds for compatibility issues on the physical side. Having said that I still think that an issue on the physical side is the most likely cause of your problem.
Comment 93 anonymous 2022-08-16 11:16:16 UTC
As far as I know, r8168 doesn't support Linux 5.18 and above yet.
Comment 94 Cameron Rapp 2022-09-11 22:48:07 UTC
I'm seeing this with an r8125 chip supported by this driver too.
Comment 95 Heiner Kallweit 2022-09-12 07:39:22 UTC
(In reply to Cameron Rapp from comment #94)
> I'm seeing this with an r8125 chip supported by this driver too.

Seeing what? I'm asking because this issue has been used to report different issues. Please attach at least a full dmesg log.
Have you also checked with ASPM disabled?
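Before and after testing "with ASPM disabled" it helps to confirm which ASPM-related settings are actually in effect. A minimal sketch, assuming the kernel is built with CONFIG_PCIEASPM (so /sys/module/pcie_aspm/parameters/policy exists); the function names and the CMDLINE/SYSFS overrides are hypothetical. Note that the kernel documentation describes `pcie_aspm=off` as not touching the ASPM configuration at all, so links that firmware already placed in an ASPM state may remain there.

```shell
#!/bin/sh
# Sketch: report which ASPM-related settings appear to be in effect.
# CMDLINE and SYSFS are hypothetical overrides (defaults /proc/cmdline, /sys).

aspm_off_on_cmdline() {
    # Succeeds if "pcie_aspm=off" was passed on the kernel command line
    tr ' ' '\n' < "${CMDLINE:-/proc/cmdline}" | grep -qx 'pcie_aspm=off'
}

aspm_policy() {
    # The active policy is the one shown in [brackets], e.g.
    # "[default] performance powersave powersupersave"
    f="${SYSFS:-/sys}/module/pcie_aspm/parameters/policy"
    [ -r "$f" ] || return 1
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$f"
}
```

Usage: `aspm_off_on_cmdline && echo "pcie_aspm=off is set"; aspm_policy`. The per-link state is shown in the LnkCtl line of `lspci -vv`, which needs root (without it lspci prints "Capabilities: <access denied>").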
Comment 96 Cameron Rapp 2022-09-12 14:15:36 UTC
(In reply to Heiner Kallweit from comment #95)
> (In reply to Cameron Rapp from comment #94)
> > I'm seeing this with an r8125 chip supported by this driver too.
> 
> Seeing what? I'm asking because this issue has been used to report different
> issues. Please attach at least a full dmesg log.
> Have you checked also with ASPM disabled?

It stops working after a while and dumps a call trace in the dmesg log, seemingly whether I'm using it or not. In the log I'm posting, it happens about 80 minutes after boot.
Comment 97 Cameron Rapp 2022-09-12 14:20:44 UTC
Created attachment 301791 [details]
dmesg-5.14.21-aspm-off

Actually, it just happened again 35 minutes in, with ASPM off.
Comment 98 Cameron Rapp 2022-09-12 17:36:36 UTC
It has not happened since I reduced the MTU on all devices on this network from 9000 to 1500. I thought I had already done this as a troubleshooting step, but I had missed one device.
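Missing one device is easy when MTUs are changed by hand. A minimal sketch for the local host that lists interfaces still above the standard 1500-byte MTU by reading /sys/class/net/*/mtu; other machines on the network need the same check. The function names and the SYSFS override are hypothetical; `ip link set dev <iface> mtu 1500` is the standard iproute2 command.

```shell
#!/bin/sh
# Sketch: list interfaces whose MTU is above the standard 1500 bytes.
# SYSFS is a hypothetical override (default /sys).

jumbo_ifaces() {
    for d in "${SYSFS:-/sys}"/class/net/*; do
        name=${d##*/}
        # loopback normally has a large MTU (65536) and is irrelevant here
        case "$name" in lo) continue ;; esac
        [ -r "$d/mtu" ] || continue
        mtu=$(cat "$d/mtu")
        if [ "$mtu" -gt 1500 ]; then
            printf '%s %s\n' "$name" "$mtu"
        fi
    done
}

reset_mtu() {
    # Requires root; lost on reboot unless also set in the network configuration
    ip link set dev "$1" mtu 1500
}
```

Usage: `jumbo_ifaces` prints lines such as `<iface> 9000`; run `reset_mtu <iface>` for each one listed.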
Comment 99 Heiner Kallweit 2022-09-12 19:40:48 UTC
Thanks for the details, interesting. It seems the RTL8125A may have a silicon bug when processing jumbo frames. Unfortunately, Realtek doesn't release datasheets or errata information.
What you could do is check whether the same issue occurs with the r8125 vendor driver from Realtek.
Comment 100 Cameron Rapp 2022-09-12 19:48:52 UTC
I'm sorry, I spoke too soon: it did happen again even without jumbo frames. I will give their driver a try, though. Is there anything else I can share to help with debugging r8169?
Comment 101 Heiner Kallweit 2022-09-12 20:10:20 UTC
Something system-dependent may be involved, because I haven't seen such reports before. You could check whether the behavior is still the same with the latest 5.19 kernel.
Comment 102 Cameron Rapp 2022-09-17 22:27:16 UTC
It looks like my issue is hardware-related: r8125 fails too, but more gracefully. Thanks!
Comment 103 Apostolos 2022-09-20 18:41:18 UTC
I have a problem with r8169 that might be related to these reports. In my case the link is active and I can ping from/to the affected PC, but no other traffic gets through (no ssh/http/etc.). It does not work on newer kernels, up to and including 5.10.140; everything is OK on 4.19.

$ lspci -v
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 04)
	Subsystem: Hewlett-Packard Company RTL810xE PCI Express Fast Ethernet controller
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at 2000 [size=256]
	Memory at 50004000 (64-bit, prefetchable) [size=4K]
	Memory at 50000000 (64-bit, prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: r8169
	Kernel modules: r8169

$ sudo dmesg |grep r8169
[    4.608354] r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
[    4.689687] r8169 0000:01:00.0 eth0: RTL8401, 00:21:cc:5a:6b:5d, XID 240, IRQ 16
[    4.871853] r8169 0000:01:00.0 enp1s0f0: renamed from eth0
[   30.135867] RTL8201CP Ethernet r8169-0-100:00: attached PHY driver [RTL8201CP Ethernet] (mii_bus:phy_addr=r8169-0-100:00, irq=IGNORE)

Thanks!
Comment 104 Heiner Kallweit 2022-10-04 10:30:06 UTC
(In reply to Apostolos from comment #103)
> I have a problem with r8169 that might be related to these reports. In my
> case the link is active, I can ping from/to the affected pc but no other
> traffic (no ssh/http/etc). Not working on all kernels up to 5.10.140.
> Everything ok on 4.19.
> 
> $ lspci -v
> 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI
> Express Fast Ethernet controller (rev 04)
>       Subsystem: Hewlett-Packard Company RTL810xE PCI Express Fast Ethernet
> controller
>       Flags: bus master, fast devsel, latency 0, IRQ 16
>       I/O ports at 2000 [size=256]
>       Memory at 50004000 (64-bit, prefetchable) [size=4K]
>       Memory at 50000000 (64-bit, prefetchable) [size=16K]
>       Capabilities: <access denied>
>       Kernel driver in use: r8169
>       Kernel modules: r8169
> 
> $ sudo dmesg |grep r8169
> [    4.608354] r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM
> control
> [    4.689687] r8169 0000:01:00.0 eth0: RTL8401, 00:21:cc:5a:6b:5d, XID 240,
> IRQ 16
> [    4.871853] r8169 0000:01:00.0 enp1s0f0: renamed from eth0
> [   30.135867] RTL8201CP Ethernet r8169-0-100:00: attached PHY driver
> [RTL8201CP Ethernet] (mii_bus:phy_addr=r8169-0-100:00, irq=IGNORE)
> 
> Thanks!

It seems that commit cdafdc29ef75 ("r8169: sync support for RTL8401 with vendor driver") broke it in your case. Does it work again if you make one of the following functions a no-op?
rtl_hw_start_8401()
rtl8401_hw_phy_config()

It would also be good to know whether the vendor driver r8101 works for you.