Bug 212617

Summary: r8169: Flow Control not working (well)?
Product: Drivers Reporter: Roman Mamedov (rm+bko)
Component: NetworkAssignee: drivers_network (drivers_network)
Status: RESOLVED CODE_FIX    
Severity: normal CC: hkallweit1
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.4.109 Subsystem:
Regression: No Bisected commit-id:

Description Roman Mamedov 2021-04-08 21:09:07 UTC
Hello,

I have my server connected into a router's managed switch (AR8327), which has a 100 Mbit internet uplink coming into it.

This all works fine when using Intel 82580 port on the server.

Then I replug the router into the onboard "Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)".

Now, any upload stream towards the Internet (100 Mbit) will be heavily disrupted by Gigabit-speeds traffic uploading from the same server towards other LAN hosts:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  13.9 MBytes   116 Mbits/sec  181   2.04 MBytes       
[  4]   1.00-2.00   sec  11.2 MBytes  94.4 Mbits/sec  179   1.99 MBytes       
[  4]   2.00-3.00   sec  10.0 MBytes  83.9 Mbits/sec  141   1.11 MBytes       
[  4]   3.00-4.00   sec  11.2 MBytes  94.4 Mbits/sec  165   1.99 MBytes       
[  4]   4.00-5.00   sec  11.2 MBytes  94.4 Mbits/sec   71   2.00 MBytes       
[  4]   5.00-6.00   sec  11.2 MBytes  94.4 Mbits/sec  188   2.00 MBytes       
[  4]   6.00-7.00   sec  1.25 MBytes  10.5 Mbits/sec    1   1.39 KBytes       
[  4]   7.00-8.00   sec  1.25 MBytes  10.5 Mbits/sec    1   1.39 KBytes       
[  4]   8.00-9.00   sec  1.02 MBytes  8.60 Mbits/sec    1   1.39 KBytes       
[  4]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.39 KBytes       
[  4]  10.00-11.00  sec  0.00 Bytes  0.00 bits/sec    1   1.39 KBytes       
[  4]  11.00-12.00  sec  1.07 MBytes  8.99 Mbits/sec    0   5.55 KBytes       
[  4]  12.00-13.00  sec  0.00 Bytes  0.00 bits/sec  866   1.91 MBytes       
[  4]  13.00-14.00  sec   549 KBytes  4.49 Mbits/sec   19   1.88 MBytes       
[  4]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec    0   1.88 MBytes       
[  4]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec    0   1.88 MBytes       
[  4]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec    0   1.88 MBytes       
[  4]  17.00-18.00  sec  0.00 Bytes  0.00 bits/sec    0   1.88 MBytes       
[  4]  18.00-19.00  sec  6.25 MBytes  52.4 Mbits/sec  785   1.89 MBytes       
[  4]  19.00-20.00  sec  10.0 MBytes  83.8 Mbits/sec   97   1.13 MBytes       
^C[  4]  20.00-20.80  sec  7.50 MBytes  78.9 Mbits/sec  135   1.16 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.80  sec  97.7 MBytes  39.4 Mbits/sec  2831             sender
[  4]   0.00-20.80  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

My guess is that the switch has a small packet buffer, and the Intel NIC is saved by the working Flow Control between it and the router; the router can ask the NIC to slow down its transfers while the switch's buffer is full.

With the Realtek NIC, even though:

r8169 0000:04:00.0 rt0: Link is Up - 1Gbps/Full - flow control rx/tx

I have:

Settings for rt0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Full 
	Link partner advertised pause frame use: Symmetric Receive-only
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: d
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: yes

$ sudo ethtool -A rt0 autoneg on rx on tx on
Cannot get device pause settings: Operation not supported

So is it actually working? If so, what else could be the cause of such behavior? Or if not, is it possible to implement/improve it?

Thanks!
Comment 1 Roman Mamedov 2021-04-08 21:11:10 UTC
I should clarify that the router is not doing any actual routing here, it only acts as a VLAN-capable switch, encapsulating the Internet uplink (untagged) into one of VLANs (tagged).
Comment 2 Roman Mamedov 2021-04-08 21:27:33 UTC
Some more details, the device is:

[780834.487733] r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, xxxxxxxx, XID 2c9, IRQ 32
[780834.487735] r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]

And I see that the r8168 vendor driver only enables flow control if Jumbo Frames are not in use:

                //flow control
                if (dev->mtu <= ETH_DATA_LEN)
                        auto_nego |= ADVERTISE_PAUSE_CAP|ADVERTISE_PAUSE_ASYM;

Here jumbo frames are enabled, and the mentioned Gigabit traffic would indeed consist of those.
Comment 3 Roman Mamedov 2021-04-08 22:36:18 UTC
...and somehow, disabling EEE helps lower the severity of this issue by an order of magnitude.

 ethtool --set-eee rt0 eee off

Now the Internet upload stream is no longer impacted at all, but the LAN transfers are still being slowed down. Before there would be an effect on both.

Simultaneous upload to:  Internet     LAN                 

Realtek (EEE on):        unusable     unusable - both connections stall
Realtek (EEE off):       94 Mbit      120-400 Mbit, couple of drops to 4 and 33 Mbit

Intel (EEE unsupported): 94 Mbit      600-700 Mbit
Comment 4 Heiner Kallweit 2021-04-10 21:46:50 UTC
AR8327 (at least certain revisions) has known compatibility issues with EEE. Therefore e.g. in OpenWRT EEE is disabled per default for this chipset.

Unfortunately Realtek doen't release any NIC datasheets, therefore I don't know whether pause (sym or asym) is supported for jumbo frames. But you're right, the vendor driver (apart from r8168 also r8125) doesn't use pause with jumbo frames. Can you switch to a standard MTU to check whether the issue gone then?

Regarding the ethtool question:
r8169 doesn't support getting or setting pause settings. That's what "operation not supported" refers to, it doesn't say anything about whether pause is active or not.
Comment 5 Heiner Kallweit 2021-04-11 11:36:46 UTC
You can give the following a try, like the vendor driver it disables pause advertisement in jumbo mode.


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index eb6da93ac..2a7a9ca55 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2318,6 +2318,7 @@ static void r8168b_1_hw_jumbo_disable(struct rtl8169_private *tp)
 static void rtl_jumbo_config(struct rtl8169_private *tp)
 {
 	bool jumbo = tp->dev->mtu > ETH_DATA_LEN;
+	struct phy_device *phydev = tp->phydev;
 	int readrq = 4096;
 
 	rtl_unlock_config_regs(tp);
@@ -2358,6 +2359,20 @@ static void rtl_jumbo_config(struct rtl8169_private *tp)
 
 	if (pci_is_pcie(tp->pci_dev) && tp->supports_gmii)
 		pcie_set_readrq(tp->pci_dev, readrq);
+
+	/* Chip doesn't support pause in jumbo mode */
+	if (jumbo) {
+		linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+				   phydev->advertising);
+		linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+				   phydev->advertising);
+	} else {
+		linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+				 phydev->advertising);
+		linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+				 phydev->advertising);
+	}
+	phy_start_aneg(phydev);
 }
 
 DECLARE_RTL_COND(rtl_chipcmd_cond)
@@ -4619,8 +4634,6 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
 	if (!tp->supports_gmii)
 		phy_set_max_speed(phydev, SPEED_100);
 
-	phy_support_asym_pause(phydev);
-
 	phy_attached_info(phydev);
 
 	return 0;
-- 
2.31.1
Comment 6 rm 2021-04-11 21:06:26 UTC
On Sat, 10 Apr 2021 21:46:50 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> --- Comment #4 from Heiner Kallweit (hkallweit1@gmail.com) ---
> AR8327 (at least certain revisions) has known compatibility issues with EEE.
> Therefore e.g. in OpenWRT EEE is disabled per default for this chipset.

Actually my router runs OpenWRT 17.01.7 and I did not manually enable EEE
there. I see the port status as:

	enable_eee: 0
	igmp_snooping: 0
	pvid: 0
	link: port:3 link:up speed:1000baseT full-duplex txflow rxflow eee100 eee1000 auto

If it's not enabled on the switch side, it seems weird that toggling it on the
NIC side affects things so much.

> Can you switch to a standard MTU to check whether the issue gone then?

Switching to MTU 1500 reduces the issue severity with EEE on, so it's about
the same as with EEE off, with none of the previous catastrophic stalling.

With EEE off though, there's not really any difference whether jumbo frames are
used or not. So of course I'll pick keeping jumbo frames over keeping EEE.

Also it seems that setting the actual NIC to MTU 1500 is not making a
difference over just limiting MTU of the used VLAN to 1500 (and keeping the
NIC itself at MTU 9000). So in this case any positive effect appears to be not
from reducing the device MTU in the driver, but just from sending smaller
packets than before.
Comment 7 Heiner Kallweit 2021-04-11 21:18:25 UTC
(In reply to rm from comment #6)
> On Sat, 10 Apr 2021 21:46:50 +0000
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > --- Comment #4 from Heiner Kallweit (hkallweit1@gmail.com) ---
> > AR8327 (at least certain revisions) has known compatibility issues with
> EEE.
> > Therefore e.g. in OpenWRT EEE is disabled per default for this chipset.
> 
> Actually my router runs OpenWRT 17.01.7 and I did not manually enable EEE
> there. I see the port status as:
> 
>       enable_eee: 0
>       igmp_snooping: 0
>       pvid: 0
>       link: port:3 link:up speed:1000baseT full-duplex txflow rxflow eee100
> eee1000 auto
> 
> If it's not enabled on the switch side, it seems weird that toggling it on
> the
> NIC side affects things so much.
> 
The port status includes eee100 and eee1000, therefore EEE seems to be active. Looks like enable_eee setting is ignored.
Comment 8 rm 2021-04-11 21:51:37 UTC
On Sun, 11 Apr 2021 11:36:46 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=212617
> 
> --- Comment #5 from Heiner Kallweit (hkallweit1@gmail.com) ---
> You can give the following a try, like the vendor driver it disables pause
> advertisement in jumbo mode.

Thanks, I'll try it some time later when I set up a newer series build, as
it doesn't apply to the 5.4 kernel I'm testing with at the moment.

> diff --git a/drivers/net/ethernet/realtek/r8169_main.c
> b/drivers/net/ethernet/realtek/r8169_main.c
> index eb6da93ac..2a7a9ca55 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -2318,6 +2318,7 @@ static void r8168b_1_hw_jumbo_disable(struct
> rtl8169_private *tp)
>  static void rtl_jumbo_config(struct rtl8169_private *tp)
>  {
>         bool jumbo = tp->dev->mtu > ETH_DATA_LEN;
> +       struct phy_device *phydev = tp->phydev;
>         int readrq = 4096;
> 
>         rtl_unlock_config_regs(tp);
> @@ -2358,6 +2359,20 @@ static void rtl_jumbo_config(struct rtl8169_private
> *tp)
> 
>         if (pci_is_pcie(tp->pci_dev) && tp->supports_gmii)
>                 pcie_set_readrq(tp->pci_dev, readrq);
> +
> +       /* Chip doesn't support pause in jumbo mode */
> +       if (jumbo) {
> +               linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT,
> +                                  phydev->advertising);
> +               linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
> +                                  phydev->advertising);
> +       } else {
> +               linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
> +                                phydev->advertising);
> +               linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
> +                                phydev->advertising);
> +       }
> +       phy_start_aneg(phydev);
>  }
> 
>  DECLARE_RTL_COND(rtl_chipcmd_cond)
> @@ -4619,8 +4634,6 @@ static int r8169_phy_connect(struct rtl8169_private
> *tp)
>         if (!tp->supports_gmii)
>                 phy_set_max_speed(phydev, SPEED_100);
> 
> -       phy_support_asym_pause(phydev);
> -
>         phy_attached_info(phydev);
> 
>         return 0;
Comment 9 Roman Mamedov 2021-04-12 18:32:01 UTC
Turns out the patch does not apply to 5.10 or 5.11 either, and I'm not really comfortable running the RC or mainline kernels on my main machine.

If you think it's still something worth trying (given the other observations I posted), could you make a version of the patch for 5.4?

Or even a smaller one that just disables pause altogether, since we can already try the case of "MTU 1500, pause enabled" as is, and the counterpart remaining to be tested is "MTU 9000, pause disabled".
Comment 10 Heiner Kallweit 2021-04-12 21:46:25 UTC
This one applies on 5.4 and disables pause support unconditionally.


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index bd8decc54..15ab0cad6 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6368,7 +6368,8 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
 	if (!tp->supports_gmii)
 		phy_set_max_speed(phydev, SPEED_100);
 
-	phy_support_asym_pause(phydev);
+	linkmode_clear_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydev->advertising);
+	linkmode_clear_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydev->advertising);
 
 	phy_attached_info(phydev);
 
-- 
2.31.1
Comment 11 Roman Mamedov 2021-04-13 17:13:55 UTC
Thank you. It is *so* much better with this patch!

Now getting a stable and steady 840 Mbit upload to LAN during a concurrent 94 Mbit upload to the Internet.

And now it does not matter whether EEE is on or off, it's all fine even with EEE on.
Comment 12 Roman Mamedov 2021-04-13 18:49:26 UTC
To clarify, it never worked this well before (with pause on), not even with EEE off and MTU 1500.

So while pause indeed better be disabled with jumbo frames since the vendor driver does that, on my setup it looks like pause is not contributing in a positive way even with the standard MTU. Maybe it does in some other scenarios though, or when paired with a different switch.
Comment 13 Heiner Kallweit 2021-04-14 20:41:44 UTC
Fixed with 453a77894efa ("r8169: don't advertise pause in jumbo mode").
Comment 14 Roman Mamedov 2021-04-21 11:48:01 UTC
Thanks!