Bug 209839
Summary: | r8169 (RTL8125B): "rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)" and connectivity loss, caused by small fragmented datagrams | ||
---|---|---|---|
Product: | Drivers | Reporter: | WGH (wgh) |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | REOPENED --- | ||
Severity: | normal | CC: | curtdept, haithcockce, hkallweit1, lubosd, mparka, priit, rm+bko, xplo.bn, ykonotopov |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.9.1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
pcap when the the hang occured
trace when the hang occured distilled hang pcap full dmesg 30 packets to cause crash Test patch v1 Test patch v2 Dmesg output Kernel 5.15 and 6.2 vs. r8169 and r8125 Test results - kernel 6.1.0 Test results - kernel 6.4.0-rc1 |
Description
WGH
2020-10-24 10:15:05 UTC
The problem occurs when both VM and host bridged through a bridge interface as well, so that likely rules macvtap problem out. On tx timeout a chip reset is done and connectivity should recover. However, depending on how confused the chip is, even this may not help 100% (indicated by the error messages that tx/rx fifo's aren't empty). I wonder what could be special with RDP traffic. It should be normal TCP traffic. What may help to analyze the issue: - trace when the tx timeout occurs. Enable trace events (under /sys/kernel/debug/tracing/events/net): - net_dev_start_xmit - net_dev_xmit - net_dev_xmit_timeout - a capture of the network traffic when tx times out Created attachment 293271 [details]
pcap when the the hang occured
Created attachment 293273 [details]
trace when the hang occured
The hang didn't occur immediately, it appears it only happens almost as soon as the desktop is rendered for the first time, and rarely much later. The network topology is as follows: br0 bridge interface with enp6s0 (ethernet adapter) and vnet0 (VM) enslaved. I captured the traffic on the enp6s0 interface. I have edited the trace by adding the realtime column so it's easier to match trace with the pcap. I have randomized MAC addresses in the pcap, and removed a stray IPv6 multicast packet that has my public address in it. I can send you an unedited dump to you privately, if you need it. I didn't fiddle with ethtool settings, everything was left at their defaults (I know that GSO/TSO sometimes produce funny oversized packets in pcaps, though it doesn't seem to be the case this time). Thanks for the logs. Only thing that looks interesting at the first glance is the fragmented packets in the pcap log. Not sure however whether this could conflict with the following in the r8169 driver: /* The driver does not support incoming fragmented frames. * They are seen as a symptom of over-mtu sized frames. */ if (unlikely(rtl8169_fragmented_frame(status))) { dev->stats.rx_dropped++; dev->stats.rx_length_errors++; goto release_descriptor; } Your intuition was right, it does appear it has something to do with fragmented packets. I have distilled the test case to 3 packets you can simply send with tcpreplay to get the hang with 100% reliability (I wish every driver/hardware bug was like that!). (As a side-note, I find it odd that Windows decided to use fragmented IP datagrams. To my knowledge, IP fragmentation is something that is usually avoided. It seems that the qemu device advertises TSO support, but instead of splitting it into TCP segments, oversized packets gets to the bridge device, and gets fragmented on Ethernet output. Which is likely a bug on its own, but still, IP fragments is not something that should hang the network card driver. Disabling Large Send Offload on the Windows side prevents this fragmentation) Created attachment 293293 [details]
distilled hang pcap
sudo tcpreplay -i enp6s0 distilled_hang1.pcap
The tx timeout is triggered by the net core if the tx queue has been stopped for too long and there are pending packets. Can't say how this relates to the dropped incoming fragmented packets, IOW: who stops the tx queue. The fragmented rx packets are dropped only in the driver, therefore I'd expect that the rx/tx hw should be fine. Well, except this chip version has a hw issue with fragmented rx packets. Not sure whether an upper layer could have stopped the tx queue due to the missing (dropped) packets. What we could do fow now is adding a WARN_ONCE if we detect a fragmented incoming packet, knowing that this can cause trouble. I don't really have the expertise with this driver and hardware, but at the first glance that particular comment about fragmented frames seems to be concerned with receiving large Ethernet frames when each ring buffer entry is too small (smaller than actual MTU). Fragmentation on the IP level has nothing to do with it. Unless there's some weird fragmentation/segmentation offload interaction that causes trouble, of course. Oh, just to clarify, I'm running tcpreplay on the machine that has the RTL8125B network card, i.e. I'm sending, not receiving. To verify whether the mentioned fragmented packets rx path is hit: If the issue occurs, do you see rx length errors here? /sys/class/net/<if>/statistics/rx_length_errors It's zero. The only non-zero, in fact, are multicast, rx_bytes, rx_packets, tx_bytes and tx_packets. FWIW, tcpreplaying the pcap from other machine, and watching tcpdump on the affected one, has no ill effects. Packets are visible in tcpdump, but nothing else happens. No error counter increase as well. Ah, then we don't talk about the rx path. If you tcpreplay the "distilled hang" pcap, does the traffic look ok on the wire? > Ah, then we don't talk about the rx path. Sorry for not making it more clear in the beginning. > If you tcpreplay the "distilled hang" pcap, does the traffic look ok on the > wire? The other machine reliably sees only the first 2 packets (out of 3). I mean it's always 2 out of 3, not that the 3rd packet appears unrealiably. OK. Hmm, maybe a HW bug. Can you test whether the same issue occurs with the r8125 vendor driver? There's no errata info available from Realtek, but I can check with a contact in Realtek whether they are aware of any such issue. It's exactly the same with the out-of-tree r8125, except it doesn't say anything in dmesg when the hang happens, and never recovers on its own. Realtek was able to reproduce the issue. RTL8125B may suffer from a similar issue with short packets like RTL8168evl. Could you please check whether the following fixes the issue for you? diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7e0947e29..375f451cc 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4053,7 +4053,13 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb, static bool rtl_test_hw_pad_bug(struct rtl8169_private *tp, struct sk_buff *skb) { - return skb->len < ETH_ZLEN && tp->mac_version == RTL_GIGA_MAC_VER_34; + switch (tp->mac_version) { + case RTL_GIGA_MAC_VER_34: + case RTL_GIGA_MAC_VER_63: + return skb->len < ETH_ZLEN; + default: + return false; + } } static void rtl8169_tso_csum_v1(struct sk_buff *skb, u32 *opts) -- 2.29.2 Seems to help! No hangs, and the other machine receives all 3 packets fine as well. Great! Then I'll check with Realtek whether other chip versions may be affected too before submitting the change. This way bug analysis is fun: Realtek responds within 2 hrs, and you test within 10 mins. My own e-mail to realtek that I sent earlier today, given the lack of positive delivery status notification, seems to be devnulled, though :) Fixed with 2aaf09a0e784 ("r8169: work around short packet hw bug on RTL8125"). The fix was released in Linux 5.9.7. Was this fix removed in 5.10.0 RC5? I still get this error regularly when torrenting. Have not tried 5.9 because ... ryzen 5950x and it works better with 5.10. [1274670.281163] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [1274670.289901] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [1274680.304429] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [1274680.313171] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [1275010.329033] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [1275010.337773] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [1275020.355642] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [1275020.364378] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [1275375.340078] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [1275375.348817] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [1275390.273322] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [1275390.282058] r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). $ lspci | grep RTL 25:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 04) Not sure whether the fix made it to 5.10-rc5 or to a later rc version only. Please test with 5.10.1. There the fix is included. Nope, not fixed. It's still here. jaan 05 07:50:16 Zen kernel: r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). jaan 05 07:50:16 Zen kernel: r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). jaan 05 07:51:56 Zen kernel: r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). jaan 05 07:51:56 Zen kernel: r8169 0000:25:00.0 huginn: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [deemon@Zen ~]$ uname -a Linux Zen 5.10.2-2-MANJARO #1 SMP PREEMPT Tue Dec 22 08:14:42 UTC 2020 x86_64 GNU/Linux The patch from comment 21 is included in 5.10.2. And in comment 22 it was confirmed that it fixes the issue. So maybe we talk about something different here. Do you also see the transmit queue timeouts? Because the log messages are triggered from the chip reset procedure after a tx timeout. Best test with a mainline non-preempt kernel to rule out that the issue is caused by downstream kernel patches. This is Manjaro default "mainline" kernel. There is also "real-time" variant for audio production people and such. No non-preempt options. I have no idea what this "transmit queue" even is, but I do lose connection when this thing is happening (pings get lost for like 5-15 seconds.) But maybe I don't have this "RTL8125B" but just "RTL8125" without B? $ lspci | grep -i realtek 25:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 04) Also no wizard to create myself any such packets for testing, but for me the problem appears every time I use qbittorrent to download stuff. Quite easy way for me to check if the problem is still there. Are you sure that Manjaro didn't add own patches? Best compile a mainline kernel yourself from kernel.org sources. And please attach full dmesg output. Created attachment 294501 [details]
full dmesg
OK, so it is a RTL8125B. Please provide the output of "ethtool -i huginn" to check for the right firmware version being loaded. $ ethtool -i huginn driver: r8169 version: 5.10.2-2-MANJARO firmware-version: rtl8125b-2_0.0.2 07/13/20 expansion-rom-version: bus-info: 0000:25:00.0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: yes supports-priv-flags: no All this looks normal. Just strange that the link breaks immediately once it's established. You could re-test with a different cable and best also a different link partner. Maybe it's a platform issue as your dmesg shows several other errors. It would be helpful if you could check whether any special network packets trigger the issue. Last but not least please check whether the same happens with the r8125 vendor driver. I have never experienced such problem when qbittorrent is not running. And I have NAS, so I move big files over the network quite often back and forth and never have this link drop problem then -- this should eliminate the cable problem (sadly can't swap it easily out, as it is kind of built into the wall). Link partner is my only router, so not really? I could put some switch in-between, but I doubt it will change much. About platform errors -- maybe; Yes, Zen3 is pretty raw also still; and I am eagerly waiting for MSI to provide newer stable BIOSes, but they are slow and release only betas right now, so... don't feel like being their beta tester with possibility to brick the board :D (they just yesterday released newer beta, with "Update to ComboAM4PIV2 1.1.9.0" but ... it's still beta sadly. About special network packets -- I have no idea how. If you provide some more handholding or guide with what packets to test and how, I could try, maybe. >About special network packets -- I have no idea how. If you provide
>some more
>handholding or guide with what packets to test and how, I could try,
>maybe.
The tcpreplay command I posted earlier is literally all you need to run.
ah yes, and I never experienced this link drop problem with my other (previous) computer in the same router port with the same cable with qbittorrent. So I assume it's either this NIC or the NIC drivers issue, not the "other end of the link". But I can do the switch in-between test later. Also will look into the tcpreplay thing a bit later when back at home. Downloaded the distilled_hang1.pcap and ran the command on the NIC. 20 times. No hang. How can I create my own version? :) It's Heiner's call, but I'd say that yours is a different bug.
> How can I create my own version?
Capture the network traffic with Wireshark or tcpdump when the hang occurs. You'll likely notice TCP retransmissions when it happens. If tcpreplaying it gives the same problem - good news, you're mostly there. Wireshark lets you export a selected subset of packets to a new file, which may help to minimize the testcase.
If _incoming_ traffic causes the problem, then you might need to replay the traffic from a different machine.
it's kind of stupid to find the correct package because tcpreplay fails to deal with big packages (does not fragment them?) and just dies. Is there some flag or trick I can use to remedy that? Warning: Unable to send packet: Error with PF_PACKET send() [119]: Message too long (errno = 90) Actual: 118 packets (87206 bytes) sent in 0.009997 seconds Rated: 8723216.9 Bps, 69.78 Mbps, 11803.54 pps Flows: 19 flows, 1900.57 fps, 119 flow packets, 0 non-flow Statistics for network device: huginn Successful packets: 118 Failed packets: 1 Truncated packets: 0 Retried packets (ENOBUFS): 0 Retried packets (EAGAIN): 0 This might be caused by TSO/GSO. (When TSO is active, the in-kernel TCP/IP stack implementation sends seemingly oversized TCP segments to the network card, which cuts it up to appropriately sized segments, offloading the segmentation to the network card instead of CPU. Note that it has nothing to do with IP fragmentation. As a side effect, tcpdump usually captures such impossibly large IP datagrams, even though they will appear normal on the wire.) Try to temporarily disable TSO/GSO with ethtool, that should fix the capture. Created attachment 294519 [details]
30 packets to cause crash
probably takes less than all those 30.
Yup, I can also reproduce it with your pcap. The vendor r8125 driver hangs the same way. However, their driver doesn't even have a fix for my "distilled pcap" yet. (After several rmmod/insmod the r8169 driver to longer detects the link, but r8125 works fine. I think I'll need a reboot, lol) It looks weird that a packet of 1480 bytes is fragmented to 1472 + 8 bytes. Maybe something with your MTU setting is incorrect. Try to reduce your MTU by 8 bytes. This may still not explain the hw issue, but it may help to avoid triggering it. Not quite sure which one of us you are talking, but my network MTU is 1500 like it's by default. Normally the unfragmented traffic in Wireshark is also 1514 bytes "on wire" (technically it should be 1518, 14 header and 4 tail checksum; no idea why wireshark doesn't show this): Ethernet II: 14 bytes IP header: 20 bytes UDP header: 8 bytes data or "payload": 1472 bytes. (checksum: 4 bytes which wireshark doesn't count) https://en.wikipedia.org/wiki/Ethernet_frame#Ethernet_II 1472(data) + 8(udp) + 20(ip) = 1500 MTU size exactly I have no clear understanding why it sometimes starts fragmenting normal size packets to smaller pieces -- maybe because the other end asks for smaller (or bigger) chunks? Smaller makes sense if they are connected over PPPoE or VPN or something (those protocol encapsulations need to fit into the frame also) and instead of multiple fragmentations on the way they ask smaller size from the beginning somehow? Not sure where you got this 1480 and what exactly you mean by it, but when the payload is 1480 (anything over 1472) for some reason, it has to be fragmented to 1472 byte chunks I think. So, will you reopen this bug or let it remain "fixed" and create a new one? :) Technically it's a problem with the same hardware and same driver. Your pcap causes the same problem, and looks strikingly similar to mine, except 1) UDP instead of TCP (inside the fragmented IP datagrams), 2) the mininum IP datagram size (of the last fragment) is 28 instead of 25, 3) the padding workaround implemented earlier doesn't fix it. In my opinion, the underlying problem is likely the same, it's just the fix was incomplete. I'm reopening it. Thank you WGH. Just to clarify one thing -- since the minimum Ethernet frame by IEEE802.3 is 64 bytes/octets (14 header; 46 "payload"; 4 checksum). When the payload is smaller than 46 octets, the padding should be added accordingly by specification. The question is, which component in the linux network stack is responsible for that (I have no clue), but when it should be the driver, then adding padding is not a "workaround" but actual "fix" and how it should have been done in the first place. :-) For chips handled by r8169 the hw is supposed to do the padding. (In reply to Priit O. from comment #48) > Not sure where you got this 1480 and what exactly you mean by it, but when > the payload is 1480 (anything over 1472) for some reason, it has to be > fragmented to 1472 byte chunks I think. The 1480 comes from the recorded pcap file. A 1480 byte UDP packet is fragmented to 1472 + 8 bytes. IP fragmentation is bad in general. Data load should be splitted on a higher level. So the question is why a higher level creates 1480 byte UDP packets instead of 1472 bytes packets. It's hard to tell. libtorrent (which qBittorrent is based on) appears to have MTU discovery code built in its uTP implementation. https://github.com/arvidn/libtorrent/blob/e3f2b016dcd37a9a6e8a94006c7befcf2cb7bfac/src/utp_stream.cpp#L1752 Perhaps the code is simply not written to avoid fragmentation at all costs, so datagrams will be occasionally fragmented before path MTU is probed. "Don't fragment" is set only on the probe datagrams, so other datagrams might still be fragmented. The kernel might also already know that the effective path MTU to some specific host is lower (e.g. discovered by Path MTU Discovery earlier), which would cause fragmentation right away (as opposed to on some distant router). Maybe the libtorrent developer had a reason to not enforce "don't fragment" on all datagrams, as ICMP is known to be occasionally blocked by some retarded ISPs for no reason, causing broken PMTUD (hello, Fastly). TCP has a "clamp MSS" workaround for that scenario, but I don't think there's any for UDP. Anyway, I believe this is tangential to the problem at hand. For information i have the same bug [Tue Jan 19 22:24:00 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:00 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:20 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:20 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:34 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:34 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:44 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [Tue Jan 19 22:24:44 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). When it happens the server is timing out for a few seconds. It happens randomly when there is heavy traffic. I also use qbitorrent. But i never have seen this problem on an older hardware that used the same connexion and the same rj45 cable. I tried to change the rj45 cable but it didnt fix it. Following this thread i tried : debian kernel 5.9 debian kernel 5.10 ubuntu kernel 5.10.8 ubuntu kernel 5.11.0-051100rc4-generic I ll try the same setup on windows 10 pro (if i can do it with WSL2) to see if it s not driver related but hardware related. Please check whether the same issue happens with the r8125 vendor driver. r8125 realtek driver was worse and it didnt log an error in dmesg. I could see enp1s0 going up multiple time on dmesg when the server was not responding. It was also random on heavy traffic but the timed out was way longer like 30s-2mn. The ssh connexion would just never recover when in r8169 it would recover after a few seconds. Hey, Heiner, are you in touch with Realtek regarding the new pcap this time? I contacted Realtek, however have no feedback yet. I searched for similar problem and found out that it was happening with UDP connexion even on windows 10. https://forum-en.msi.com/index.php?threads/realtek-pcie-2-5gbe-family-controller-resetting.345823/page-4 It seems realtek fixed it but just in the windows driver ( 2020/12/31 version ) https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software All the post strongly suggest heavy udp load that lead the driver to crash on windows. It seems exactly the same problem on linux. setting qbittorrent to TCP only and bloquing UDP seems to have stabilized the connexion for one day. Now anyway i just use a usb gigabit ethernet adaptator and everything work fine. Created attachment 294809 [details]
Test patch v1
Realtek provided a workaround proposal. Please test attached patch. i dont know how to do that, is there a guide i could follow ? Apply patch (patch -p1 -i <patch>) on top of linux-next. It should apply also on top of recent stable kernels, but maybe small adjustments are needed. Then build the kernel as usual, procedure may be distro-dependent. i think i managed to install it. I compiled the new r8169.ko module and just copied it over the old one so i would not have to recompile the whole kernel ( and probably fail doing it ). Thank you it s been 1 hour without any timeout so it looks good. Thanks a lot for testing! Few inquiries, just to be on the safe side: - After installing the fresh r8169 module, you rmmod/modprobe'd it? - You reversed the changes described in comment 61? i did not rmmod/modprobe it. I just rebooted. I saw that modinfo r8169 changed, it lost its signature part so my guess is that it did work but i have no idea. So now i did modprobe -r r8169 (and found out the hard way it s directly effective as it s headless server and i was connected throught the realtek :o ) modprobe -v r8169 and i rebooted Yes i reversed what i did to set back udp on in qbittorrent I acutally got the [Fri Jan 22 18:47:35 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [Fri Jan 22 18:47:35 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). But that was before i did the modprobe actions so i hope i was still on the old version. I m not good enought with linux module compilation to know if what i did was the correct way to update it :/ here is what i get with modinfo (no more signature part) xplo@nas:~$ modinfo r8169 filename: /lib/modules/5.10.9-051009-generic/kernel/drivers/net/ethernet/realtek/r8169.ko firmware: rtl_nic/rtl8125b-2.fw firmware: rtl_nic/rtl8125a-3.fw firmware: rtl_nic/rtl8107e-2.fw firmware: rtl_nic/rtl8107e-1.fw firmware: rtl_nic/rtl8168fp-3.fw firmware: rtl_nic/rtl8168h-2.fw firmware: rtl_nic/rtl8168h-1.fw firmware: rtl_nic/rtl8168g-3.fw firmware: rtl_nic/rtl8168g-2.fw firmware: rtl_nic/rtl8106e-2.fw firmware: rtl_nic/rtl8106e-1.fw firmware: rtl_nic/rtl8411-2.fw firmware: rtl_nic/rtl8411-1.fw firmware: rtl_nic/rtl8402-1.fw firmware: rtl_nic/rtl8168f-2.fw firmware: rtl_nic/rtl8168f-1.fw firmware: rtl_nic/rtl8105e-1.fw firmware: rtl_nic/rtl8168e-3.fw firmware: rtl_nic/rtl8168e-2.fw firmware: rtl_nic/rtl8168e-1.fw firmware: rtl_nic/rtl8168d-2.fw firmware: rtl_nic/rtl8168d-1.fw license: GPL softdep: pre: realtek description: RealTek RTL-8169 Gigabit Ethernet driver author: Realtek and the Linux r8169 crew <netdev@vger.kernel.org> srcversion: 7022707DE347B6B940D4E30 alias: pci:v000010ECd00003000sv*sd*bc*sc*i* alias: pci:v000010ECd00008125sv*sd*bc*sc*i* alias: pci:v00000001d00008168sv*sd00002410bc*sc*i* alias: pci:v00001737d00001032sv*sd00000024bc*sc*i* alias: pci:v000016ECd00000116sv*sd*bc*sc*i* alias: pci:v00001259d0000C107sv*sd*bc*sc*i* alias: pci:v00001186d00004302sv*sd*bc*sc*i* alias: pci:v00001186d00004300sv*sd*bc*sc*i* alias: pci:v00001186d00004300sv00001186sd00004B10bc*sc*i* alias: pci:v000010ECd00008169sv*sd*bc*sc*i* alias: pci:v000010FFd00008168sv*sd*bc*sc*i* alias: pci:v000010ECd00008168sv*sd*bc*sc*i* alias: pci:v000010ECd00008167sv*sd*bc*sc*i* alias: pci:v000010ECd00008161sv*sd*bc*sc*i* alias: pci:v000010ECd00008136sv*sd*bc*sc*i* alias: pci:v000010ECd00008129sv*sd*bc*sc*i* alias: pci:v000010ECd00002600sv*sd*bc*sc*i* alias: pci:v000010ECd00002502sv*sd*bc*sc*i* depends: retpoline: Y name: r8169 vermagic: 5.10.9-051009-generic SMP mod_unload still happening [Fri Jan 22 19:42:30 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [Fri Jan 22 19:42:30 2021] r8169 0000:01:00.0 enp1s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100). I just feel it s less frequent but i m not sure at all. I have applied the patch to 5.10.7 (the first hunk with the includes didn't apply cleanly, but it was easy enough to fix). Neither my TCP distilled pcap nor fragmented UDP pcap by Priit causes any hangs, so it seems to work. xplo, could you tcpreplay the attached pcaps on your machine with the patch applied? If they don't cause any hangs, could you try to create a small reproduction pcap the same way we did further up this discussion? Created attachment 294813 [details]
Test patch v2
I attached a second patch version, also based on a proposal from Realtek.
Maybe the first version fixes most of the problem cases, but not all.
forget everything i said i just fail at patching the driver :/ WGH since i reproduce the problem with Priit udp pcap it just means i m still on the old driver. I dont know how to patch / install corectly a module, each google search got me differents ways. it seems if i reboot after modprobe r8169 it fall back to the old driver. Sorry i still have to learn a lot about linux kernel. Without rebooting the pcap udp work fine. So i wont reboot and let it run to see if it time-out again. Perhaps, you didn't rebuild your initramfs, and the old module is still in it? On Ubuntu "update-initramfs -u" seems to be the proper command. Or you remove r8169 from initramfs, it's not needed except your rootfs is e.g. on NFS. Typically this is done in /etc/mkinitcpio.conf. i just wont reboot as it s a home server. so far so good i just got that in dmesg [Sat Jan 23 04:19:13 2021] TCP: enp1s0: Driver has suspect GRO implementation, TCP performance may be compromised. but the smokeping service report no packet loss for all night. On a sidenote i dont really understand why realtek doesn't fix their own driver. It s like the only 2.5gbe ethernet device that are shiped by realtek with nearly all 2020 motherboard o_o Sooner or later Realtek is going to come up with a fixed r8125 Linux driver, but apparently their focus is on the Windows driver, especially as their network chips are used primarily on consumer hardware. Regarding the TCP warning: There has been a number of reports over the last years, with very different network drivers. But I didn't find a good explanation what actually causes the error. One hint was that it can happen if you receive packets bigger than MTU. Do you have any device in your network that may send jumbo packets? i dont know but at night if it was a jumbo packet it was from the internet (if that is even possible). And i did set mtu to 9000. 24h so far the new driver with Test patch v2. Nothing to report expect it s perfectly working. Thank you so much :) Good to hear that patch 2 works for you. It has slightly more performance impact than version 1, therefore if possible I'd go with version 1. But only if version 1 definitely fixes the issue. Could you run version 1 again for some time and check whether it's fine as well? of course performances are important, i m now testing patch1 After one day the patch 1 looks good. By the way update-initramfs -u did work to keep the driver loaded after reboot thx :) Great. Just to be sure: You did a update-initramfs after switching from patch v2 to v2? yes i did it after modprobe r8169 with v1 patch Still good after 3 days. I dont see any problem and it s far better than the realtek driver or the current kernel driver. Great, thanks for the feedback. Then I'll submit patch v1. Should be fixed with 8d520b4de3ed ("r8169: work around RTL8125 UDP hw bug"). Thank you. How can we know in which kernel version it will be merged ? It may take 1-2 weeks until it's applied to 5.11-rc and the stable kernel versions. Just check the change log of new kernel versions. I am also hitting this pretty consistently after some time on an 8111H with all offloads turned off 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15) Subsystem: Realtek Semiconductor Co., Ltd. Device 0123 Flags: bus master, fast devsel, latency 0, IRQ 54, IOMMU group 11 I/O ports at e000 [size=256] Memory at fe804000 (64-bit, non-prefetchable) [size=4K] Memory at fe800000 (64-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Endpoint, MSI 01 Capabilities: [b0] MSI-X: Enable+ Count=4 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [140] Virtual Channel Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00 Capabilities: [170] Latency Tolerance Reporting Capabilities: [178] L1 PM Substates Kernel driver in use: r8169 Please attach a full dmesg incl. the error. Then: - Any specific action how you can trigger it? - Did you test the potential reasons / fixes discussed here? - Is it a regression? Created attachment 296079 [details] attachment-20686-0.html For sure sorry, I had already cleared it with a power cycle on a volatile file system image. When it comes around again I'll try try capture it. -Curtis ________________________________ From: bugzilla-daemon@bugzilla.kernel.org <bugzilla-daemon@bugzilla.kernel.org> Sent: Friday, March 26, 2021 3:19:07 PM To: curtdept@me.com <curtdept@me.com> Subject: [Bug 209839] r8169 (RTL8125B): "rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)" and connectivity loss, caused by small fragmented datagrams https://bugzilla.kernel.org/show_bug.cgi?id=209839 --- Comment #90 from Heiner Kallweit (hkallweit1@gmail.com) --- Please attach a full dmesg incl. the error. Then: - Any specific action how you can trigger it? - Did you test the potential reasons / fixes discussed here? - Is it a regression? -- You may reply to this email to add a comment. You are receiving this mail because: You are on the CC list for the bug. No obvious triggers, short of time, it seems to happen after a few days or so, sometimes as short as a day. Reboot always fixes, causes the NIC to stop responding. Managed to capture this. 237837.769013] ------------[ cut here ]------------ [237837.774087] NETDEV WATCHDOG: eth1 (r8169): transmit queue 0 timed out [237837.781113] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x1f6/0x200 [237837.789995] Modules linked in: xt_connlimit nf_conncount iptable_nat xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT nf_nat nf_flow_table nf_conntrack ipt_REJECT xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY sch_cake rfcomm nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hidp hci_uart btusb btintel bnep bluetooth_6lowpan bluetooth sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred 6lowpan evdev xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port [237837.790226] ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb vfat fat nls_utf8 nls_iso8859_1 nls_cp437 ecdh_generic ecc kpp button_hotplug [237837.907427] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.10.27 #0 [237837.913962] Hardware name: Maxtang FP30/FP30, BIOS FP30T306 06/19/2020 [237837.921049] RIP: 0010:dev_watchdog+0x1f6/0x200 [237837.925905] Code: 48 63 75 28 eb 91 4c 89 ef c6 05 0d 3c d1 00 01 e8 2f 09 fd ff 44 89 e1 4c 89 ee 48 c7 c7 d8 22 4d 82 48 89 c2 e8 b9 e1 10 00 <0f> 0b eb bc 66 0f 1f 44 00 00 49 89 f9 48 8d 87 40 01 00 00 31 c9 [237837.945635] RSP: 0018:ffffa128802b0ec0 EFLAGS: 00010282 [237837.951281] RAX: 0000000000000039 RBX: ffff949700934800 RCX: 0000000000000000 [237837.958946] RDX: ffff949818f9e2c8 RSI: ffff949818f973a0 RDI: 0000000000000300 [237837.966824] RBP: ffff949706eb8480 R08: 0000000000000000 R09: ffffa128802b0d10 [237837.974543] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 [237837.982175] R13: ffff949706eb8000 R14: ffff949706eb83dc R15: ffff949700934880 [237837.989942] FS: 0000000000000000(0000) GS:ffff949818f80000(0000) knlGS:0000000000000000 [237837.998721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [237838.004914] CR2: 0000559c83b1c920 CR3: 0000000107a58000 CR4: 00000000003506e0 [237838.012603] Call Trace: [237838.015336] <IRQ> [237838.017573] ? dev_trans_start+0x70/0x70 [237838.021868] ? dev_trans_start+0x70/0x70 [237838.026170] call_timer_fn.constprop.0+0x11/0x70 [237838.031244] expire_timers+0x94/0xc0 [237838.035152] run_timer_softirq+0x21c/0x230 [237838.039597] ? lapic_next_event+0x18/0x20 [237838.043957] ? clockevents_program_event+0x8a/0xe0 [237838.049172] ? hrtimer_interrupt+0x136/0x280 [237838.053830] __do_softirq+0xb3/0x1de [237838.057761] asm_call_irq_on_stack+0x12/0x20 [237838.062423] </IRQ> [237838.064781] do_softirq_own_stack+0x32/0x40 [237838.069331] irq_exit_rcu+0x83/0xb0 [237838.072987] sysvec_apic_timer_interrupt+0x2e/0x80 [237838.078254] asm_sysvec_apic_timer_interrupt+0x12/0x20 [237838.083785] RIP: 0010:cpuidle_enter_state+0xba/0x280 [237838.089209] Code: ac 0d 7c ff 31 ff 49 89 c5 e8 02 21 7c ff 41 83 e7 01 74 12 9c 58 f6 c4 02 0f 85 b6 01 00 00 31 ff e8 ba 57 80 ff fb 45 85 f6 <0f> 88 bd 00 00 00 4c 2b 2c 24 49 63 c6 48 6b c8 68 48 8d 14 40 48 [237838.109497] RSP: 0018:ffffa12880157eb0 EFLAGS: 00000202 [237838.115232] RAX: ffff949818fa0b40 RBX: ffff94970147a400 RCX: 000000000000001f [237838.122917] RDX: 0000000000000000 RSI: 000000003d112d75 RDI: 0000000000000000 [237838.130642] RBP: ffffffff826c8980 R08: 0000d84febbc0f9b R09: 0000000000000018 [237838.138353] R10: 0000000000027c58 R11: ffff949818f9fde4 R12: 0000000000000002 [237838.146117] R13: 0000d84febbc0f9b R14: 0000000000000002 R15: 0000000000000000 [237838.153849] cpuidle_enter+0x24/0x40 [237838.157780] do_idle+0x196/0x1f0 [237838.161293] cpu_startup_entry+0x14/0x20 [237838.165596] secondary_startup_64_no_verify+0xc2/0xcb [237838.171152] ---[ end trace 23fecbc400b1bc82 ]--- [237838.189098] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237838.201581] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf8d90270 flags=0x0018] [237843.669099] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237843.681078] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf8880250 flags=0x0018] [237848.788563] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237848.830082] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf8000040 flags=0x0018] [237858.771434] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237858.846681] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf8d80200 flags=0x0008] [237863.890870] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237863.933365] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf82e0240 flags=0x0018] [237869.777828] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237869.991911] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf8c70200 flags=0x0008] [237874.897662] r8169 0000:03:00.0 eth1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [237875.476296] r8169 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xfffffffdf8000040 flags=0x0018] Typically RTL8168h runs fine with r8169, it can be found on a lot of consumer mainboards. The timeout warning provides no further details, "rtl_rxtx_empty_cond == 0" is caused by the chip not responding. Therefore it doesn't say anything about the root cause. You could check whether it's a regression, IOW whether some previous kernel version is fine. Also you could check whether it works ok with the r8168 vendor driver from Realtek. (In reply to Heiner Kallweit from comment #93) > Typically RTL8168h runs fine with r8169, it can be found on a lot of > consumer mainboards. The timeout warning provides no further details, > "rtl_rxtx_empty_cond == 0" is caused by the chip not responding. Therefore > it doesn't say anything about the root cause. > > You could check whether it's a regression, IOW whether some previous kernel > version is fine. Also you could check whether it works ok with the r8168 > vendor driver from Realtek. Ya tried vendor driver, stability on it is significantly worse. It does seem like a regression to the mid 5.4 series, but I havent been able to pinpoint which one due to the length of time to repro the error. Hi, I think I am facing similar issue. In my case I have a "Realtek Semiconductor Co., Ltd. Device 8161 (rev 15)", running Ubuntu 20.04, Kernel 5.11.0. Devices names: - waneth (the ethernet connected to Internet cablemodem) - laneth (for the internal lan) - lanbr (the bridge) - wlan24g (one USB3.0 wifi dongle, TP-Link Archer T9UH V2, which uses RTLRTL8814AU, and with drivers from https://github.com/morrownr/8814au) - wlan5g (same wifi dongle) NOTE: I manage the network by systemd (systemd-networkd) and dnsmasq (for DHCP). Assigned those network devices names via systemd. My problem, I believe, started when I've attached to my existing bridge (lanbr) the 2 USB Wifi devices, and started to show in logs: [ 456.011918] r8169 0000:06:00.0 laneth: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100). [ 456.021994] r8169 0000:06:00.0 laneth: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). So, the LAN network, via UTP cable, is not working (WIFI works perfectly) since this behavior. This started to happen 3 days ago, which matches the day I've configured/installed/bridged the USB WIFI devices. Any ideas? (In reply to Gustavo A. Díaz from comment #95) > Hi, > > I think I am facing similar issue. In my case I have a "Realtek > Semiconductor Co., Ltd. Device 8161 (rev 15)", running Ubuntu 20.04, Kernel > 5.11.0. > > Devices names: > > - waneth (the ethernet connected to Internet cablemodem) > - laneth (for the internal lan) > - lanbr (the bridge) > - wlan24g (one USB3.0 wifi dongle, TP-Link Archer T9UH V2, which uses > RTLRTL8814AU, and with drivers from https://github.com/morrownr/8814au) > - wlan5g (same wifi dongle) > > NOTE: I manage the network by systemd (systemd-networkd) and dnsmasq (for > DHCP). Assigned those network devices names via systemd. > > My problem, I believe, started when I've attached to my existing bridge > (lanbr) the 2 USB Wifi devices, and started to show in logs: > > [ 456.011918] r8169 0000:06:00.0 laneth: rtl_txcfg_empty_cond == 0 (loop: > 42, delay: 100). > [ 456.021994] r8169 0000:06:00.0 laneth: rtl_rxtx_empty_cond == 0 (loop: > 42, delay: 100). > > So, the LAN network, via UTP cable, is not working (WIFI works perfectly) > since this behavior. > > This started to happen 3 days ago, which matches the day I've > configured/installed/bridged the USB WIFI devices. > > Any ideas? Never mind, seems in my particular case was a problem with a UTP cable inside my LAN, which was causing flood from one device trying to get an IP via DHCP. Changed the cable, and fixed :S Created attachment 300966 [details]
Dmesg output
Having the same issue with the same chipset (RTL8168h), so throwing some info in as well. (In reply to Heiner Kallweit from comment #90) > Please attach a full dmesg incl. the error. Then: > - Any specific action how you can trigger it? > - Did you test the potential reasons / fixes discussed here? > - Is it a regression? Full dmesg is attached in c.97 It seems to happen simply by booting now. Dmesg shows a timeout not too long after loading the module and then the empty condition errors start occurring. With all this, I can not get any networking up and going now for some reason. This started occurring right after an upgrade to ubuntu from 20.04 to 22.04 to two separate systems. I was running the 5.4 kernel and jumped to 5.15 with the upgrade. For testing, I've tried the driver directly from realtek with no positive change in behavior; in fact, it produces less logs but outright does not provide any interface for NetworkManager to pick up on. I also tested the kmod packages for ubuntu with no luck. I'm getting errors when trying to build both the mainline 5.18-rc7 kernel and 5.17.8 kernel; ~/linux-5.17.8 via 🐍 v3.10.4 ❯ sudo make CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool DESCEND bpf/resolve_btfids CHK include/generated/compile.h CHK kernel/kheaders_data.tar.xz make[1]: *** No rule to make target 'debian/canonical-certs.pem', needed by 'certs/x509_certificate_list'. Stop. make: *** [Makefile:1831: certs] Error 2 ~/linux-5.18-rc7 via 🐍 v3.10.4 ❯ sudo make CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool CHK include/generated/compile.h CC [M] arch/x86/kvm/emulate.o In function ‘reg_read’, inlined from ‘reg_rmw’ at arch/x86/kvm/emulate.c:266:2: arch/x86/kvm/emulate.c:254:27: error: array subscript 32 is above array bounds of ‘long unsigned int[17]’ [-Werror=array-bounds] 254 | return ctxt->_regs[nr]; | ~~~~~~~~~~~^~~~ In file included from arch/x86/kvm/emulate.c:23: arch/x86/kvm/kvm_emulate.h: In function ‘reg_rmw’: arch/x86/kvm/kvm_emulate.h:366:23: note: while referencing ‘_regs’ 366 | unsigned long _regs[NR_VCPU_REGS]; | ^~~~~ cc1: all warnings being treated as errors make[2]: *** [scripts/Makefile.build:288: arch/x86/kvm/emulate.o] Error 1 make[1]: *** [scripts/Makefile.build:550: arch/x86/kvm] Error 2 make: *** [Makefile:1834: arch/x86] Error 2 Any help on the above would be appreciated and I can try building again with that help. Not sure what's going wrong here. FWIW I copied /boot/config-`uname -r`* to .config and, in menuconfig, just open and save it as is (I'm not savvy enough with the kernel to reasonably know what every option should be). As an end user, it absolutely feels like a regression introduced by something. I updated two systems, both with the same Realtek chip, from ubuntu 20.04 to 22.04 and both are now failing in the same way without any other changes. Neither system goes back to prior working behavior when booting from an old kernel either and neither shows working networking when booting from an ubuntu 22.04 live usb. I can also try other things to see what works and does not. Also please let me know if I missed something, it's late and I need to get rest. (In reply to Charles Haithcock from comment #97) > Created attachment 300966 [details] > Dmesg output Seems your report refers to an Ubuntu kernel. Please pay attention to the hint on the kernel bugzilla startpage. Issues with distro kernels should be reported to distro support forums. Alternatively you can build a mainline kernel and re-test. (In reply to Heiner Kallweit from comment #99) > (In reply to Charles Haithcock from comment #97) > > Created attachment 300966 [details] > > Dmesg output > > Seems your report refers to an Ubuntu kernel. Please pay attention to the > hint on the kernel bugzilla startpage. Issues with distro kernels should be > reported to distro support forums. > Alternatively you can build a mainline kernel and re-test. I understand. That's why I also mentioned in my comment I tried building the two most recent kernels available: > I'm getting errors when trying to build both the mainline 5.18-rc7 kernel and > 5.17.8 kernel; It's very likely a problem with how I'm building it but not sure what I chose incorrectly in menuconfig to prevent it from building properly. I don't mind helping to isolate an issue within the requirements outlined, but I need some help doing so. Best don't reuse the existing .config but create a clean new one with "make defconfig". (In reply to Heiner Kallweit from comment #101) > Best don't reuse the existing .config but create a clean new one with "make > defconfig". Thank you! I found various tutorials online which pointed out to just copy over the .config file; I'm unfamiliar with building kernels with just make, where I work I sometimes compile kernels but it's a super convoluted process and I don't even touch make or any build toolchain directly. Interestingly, everything works now. This morning, I attempted to boot both ubuntu 21.04 as well as fedora 36 from usb and both had proper networking activity. After rebooting to the kernel which experiened the issue in both systems, the issue is no longer present. This is exceedingly odd due to several reboots for both systems last night into the kernel whih originally experienced the issue as well as all the older kernels (I couldn't boot into one of them due to issues with grub finding / but didn't bother pursuing fixing that). Apologies! If it happens again, should a new bz be opened? (In reply to Charles Haithcock from comment #102) > (In reply to Heiner Kallweit from comment #101) > > Best don't reuse the existing .config but create a clean new one with "make > > defconfig". > > Thank you! I found various tutorials online which pointed out to just copy > over the .config file; I'm unfamiliar with building kernels with just make, > where I work I sometimes compile kernels but it's a super convoluted process > and I don't even touch make or any build toolchain directly. > > Interestingly, everything works now. This morning, I attempted to boot both > ubuntu 21.04 as well as fedora 36 from usb and both had proper networking > activity. After rebooting to the kernel which experiened the issue in both > systems, the issue is no longer present. This is exceedingly odd due to > several reboots for both systems last night into the kernel whih originally > experienced the issue as well as all the older kernels (I couldn't boot into > one of them due to issues with grub finding / but didn't bother pursuing > fixing that). > > Apologies! If it happens again, should a new bz be opened? This would be better. The bug report here is about RTL8125B and issues caused by small fragmented packets. Even though you faced similar symptoms it's a different story. I run a network-based desktop with rootfs via iSCSI and /home via NFS4. I also do a lot of development in Golang. All this together means it's a very network heavy system. I get the error from the title at every boot, but it recovers. Then, during work (e.g. when "gopls" kicks in), I get various intermittent network hangs (anywhere between 5 and 30 seconds). Sometimes the whole system permanently hangs (even the mouse cursor freezes) and I have to reset. Unfortunately, there are absolutely no warning/error messages in dmesg when hangs occur. When the full freeze happens, I can't access the system in any way to check - no wonder, as it fully depends on the network - but it responds to ping. I intend to get a RS232 port working in the system to capture some final kernel logs, if there are any. What did help with performance and these hangs, though, was switching to the out-of-tree r8168 kernel module. Hello, With an add-on PCI-E card using RTL8125A (XID 609), driver: r8169 version: 5.10.164-rm1+ firmware-version: rtl8125a-3_0.0.1 08/24/19 expansion-rom-version: bus-info: 0000:06:00.0 supports-statistics: yes supports-test: no supports-eeprom-access: no supports-register-dump: yes supports-priv-flags: no Got a backtrace once, and then just these timeout messages once in a while, I do not observe any unrecoverable hang or significant link downtime. The system also has an RTL8125B (XID 641) on the motherboard, that one seems fine. [382619.488071] ------------[ cut here ]------------ [382619.488074] NETDEV WATCHDOG: rt1 (r8169): transmit queue 0 timed out [382619.488084] WARNING: CPU: 21 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x25c/0x260 [382619.488084] Modules linked in: tcp_diag(E) udp_diag(E) inet_diag(E) unix_diag(E) dm_snapshot(E) xt_set(E) ip_set_hash_net(E) ip_set(E) nfnetlink(E) veth(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) i2c_dev(E) sit(E) tunnel4(E) ip_tunnel(E) xt_comment(E) xt_multiport(E) xt_limit(E) xt_length(E) xt_CT(E) xt_state(E) xt_conntrack(E) ip6t_rpfilter(E) ipt_rpfilter(E) ip6table_nat(E) ip6table_raw(E) ip6table_mangle(E) iptable_raw(E) ip6table_filter(E) ip6_tables(E) iptable_mangle(E) xt_CHECKSUM(E) iptable_nat(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_tcpudp(E) cpufreq_userspace(E) cpufreq_conservative(E) iptable_filter(E) ip_tables(E) cpufreq_ondemand(E) x_tables(E) cpufreq_powersave(E) fuse(E) nbd(E) 8021q(E) garp(E) mrp(E) bridge(E) stp(E) llc(E) tcp_bbr(E) xfs(E) crc32_generic(E) loop(E) dm_thin_pool(E) edac_mce_amd(E) kvm_amd(E) radeon(E) kvm(E) snd_pcsp(E) ttm(E) snd_pcm(E) irqbypass(E) drm_kms_helper(E) snd_timer(E) rapl(E) snd(E) cec(E) [382619.488124] huawei_cdc_ncm(E) sp5100_tco(E) soundcore(E) wmi_bmof(E) watchdog(E) qmi_wwan(E) cdc_ncm(E) k10temp(E) ccp(E) evdev(E) option(E) drm(E) cdc_wdm(E) cdc_ether(E) joydev(E) cp210x(E) rng_core(E) usb_wwan(E) usbnet(E) i2c_algo_bit(E) usbserial(E) mii(E) sg(E) acpi_cpufreq(E) button(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) dm_crypt(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) sr_mod(E) xor(E) cdrom(E) raid6_pq(E) raid0(E) multipath(E) linear(E) dm_cache_smq(E) dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) dm_mod(E) libcrc32c(E) crc32c_generic(E) hid_generic(E) usbhid(E) raid1(E) hid(E) sd_mod(E) uas(E) usb_storage(E) crc32_pclmul(E) crc32c_intel(E) md_mod(E) ghash_clmulni_intel(E) aesni_intel(E) libaes(E) crypto_simd(E) cryptd(E) glue_helper(E) r8169(E) ahci(E) xhci_pci(E) libahci(E) realtek(E) nvme(E) mdio_devres(E) xhci_hcd(E) i2c_piix4(E) libata(E) nvme_core(E) libphy(E) usbcore(E) [382619.488167] t10_pi(E) crc_t10dif(E) usb_common(E) scsi_mod(E) crct10dif_generic(E) crct10dif_pclmul(E) crct10dif_common(E) wmi(E) gpio_amdpt(E) gpio_generic(E) [382619.488175] CPU: 21 PID: 0 Comm: swapper/21 Tainted: G E 5.10.164-rm1+ #324 [382619.488176] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B550 PG Velocita, BIOS P2.10 08/04/2021 [382619.488178] RIP: 0010:dev_watchdog+0x25c/0x260 [382619.488180] Code: eb a9 48 8b 1c 24 c6 05 e0 09 cd 00 01 48 89 df e8 09 32 fa ff 44 89 e9 48 89 c2 48 89 de 48 c7 c7 20 85 f5 a4 e8 6f 4e 15 00 <0f> 0b eb 86 0f 1f 44 00 00 48 83 ec 40 48 89 5c 24 10 48 89 6c 24 [382619.488181] RSP: 0018:ffff98f9005e4eb0 EFLAGS: 00010282 [382619.488182] RAX: 0000000000000000 RBX: ffff88d480c6a000 RCX: ffff88e37ef5ca08 [382619.488183] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88e37ef5ca00 [382619.488184] RBP: ffff88d480c6a3dc R08: 0000000000000000 R09: ffff98f9005e4cd0 [382619.488185] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88d4a16ff080 [382619.488185] R13: 0000000000000000 R14: ffff88d480c6a480 R15: 0000000000000001 [382619.488187] FS: 0000000000000000(0000) GS:ffff88e37ef40000(0000) knlGS:0000000000000000 [382619.488188] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [382619.488188] CR2: fffff8a002e02000 CR3: 000000012be94000 CR4: 0000000000750ee0 [382619.488189] PKRU: 55555554 [382619.488190] Call Trace: [382619.488192] <IRQ> [382619.488195] ? pfifo_fast_enqueue+0x180/0x180 [382619.488197] call_timer_fn+0x29/0x100 [382619.488199] __run_timers.part.0+0x1d4/0x250 [382619.488200] ? ktime_get+0x38/0xa0 [382619.488203] ? native_x2apic_icr_read+0x20/0x20 [382619.488204] ? lapic_next_event+0x1d/0x30 [382619.488206] ? clockevents_program_event+0xad/0x130 [382619.488207] run_timer_softirq+0x26/0x50 [382619.488210] __do_softirq+0xc3/0x275 [382619.488212] asm_call_irq_on_stack+0x12/0x20 [382619.488213] </IRQ> [382619.488216] do_softirq_own_stack+0x36/0x50 [382619.488218] irq_exit_rcu+0x92/0xd0 [382619.488221] sysvec_apic_timer_interrupt+0x36/0x80 [382619.488222] asm_sysvec_apic_timer_interrupt+0x12/0x20 [382619.488225] RIP: 0010:cpuidle_enter_state+0xc5/0x360 [382619.488226] Code: 8b 3d af e4 b5 5b e8 3a a0 a1 ff 49 89 c5 0f 1f 44 00 00 31 ff e8 5b ac a1 ff 45 84 ff 0f 85 ff 00 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 0b 01 00 00 49 63 ce be 68 00 00 00 4c 2b 2c 24 48 [382619.488227] RSP: 0018:ffff98f9001efea8 EFLAGS: 00000246 [382619.488228] RAX: ffff88e37ef6fbc0 RBX: 0000000000000002 RCX: 000000000000001f [382619.488229] RDX: 0000000000000000 RSI: 0000000022a8da03 RDI: 0000000000000000 [382619.488229] RBP: ffff88d4a0a78c00 R08: 00015bfd8aab86e1 R09: 000000000000000a [382619.488230] R10: 0000000000007d99 R11: ffff88e37ef6eb24 R12: ffffffffa51a4ca0 [382619.488231] R13: 00015bfd8aab86e1 R14: 0000000000000002 R15: 0000000000000000 [382619.488233] ? cpuidle_enter_state+0xb5/0x360 [382619.488234] cpuidle_enter+0x37/0x60 [382619.488237] do_idle+0x1fb/0x2c0 [382619.488238] cpu_startup_entry+0x19/0x20 [382619.488241] secondary_startup_64_no_verify+0xb0/0xbb [382619.488243] ---[ end trace b6bb48d8503993d8 ]--- [382619.501118] r8169 0000:06:00.0 rt1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). ... [444128.054196] r8169 0000:06:00.0 rt1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [446939.932765] r8169 0000:06:00.0 rt1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). ... [450948.856709] r8169 0000:06:00.0 rt1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [451152.886863] r8169 0000:06:00.0 rt1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). [451308.789226] r8169 0000:06:00.0 rt1: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100). Try disabling ASPM L1.1, either in BIOS or using the respective sysfs property. Hello, It appears to be disabled (for 0000:06:00.0): # grep . `find /sys/ -type f | grep aspm` /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/0000:03:00.0/0000:04:00.0/link/l1_aspm:1 /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/0000:03:08.0/0000:07:00.0/link/l1_aspm:0 /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/link/l1_aspm:1 /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/0000:03:07.0/0000:06:00.0/link/l1_aspm:0 /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.0/link/l1_aspm:1 /sys/devices/pci0000:00/0000:00:01.2/0000:02:00.1/link/l1_aspm:1 /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.0/link/l0s_aspm:0 /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.0/link/l1_aspm:0 /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.3/link/l0s_aspm:0 /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.3/link/l1_aspm:0 /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.1/link/l0s_aspm:0 /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.1/link/l1_aspm:0 /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/link/l1_1_aspm:0 /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/link/l1_aspm:1 /sys/devices/pci0000:00/0000:00:07.1/0000:09:00.0/link/l0s_aspm:0 /sys/devices/pci0000:00/0000:00:07.1/0000:09:00.0/link/l1_aspm:0 grep: /sys/module/pcie_aspm/uevent: Permission denied /sys/module/pcie_aspm/parameters/policy:[default] performance powersave powersupersave /sys/module/radeon/parameters/aspm:-1 Yes, even L1 is disabled, so ASPM should not be the root cause. You could test whether latest linux-next is better, and you can test with r8125 vendor driver. I am not 100% sure if this is related but I am encountering the same kind of error. My setup is proxmox with kernel 5.15, but during tests I have tried also 6.2. As attachment I am providing kernel.log from 1st of May, when I tried all combinations of kernel version as well as using r8169 and vendor's r8125. The case is that vendor driver r8125 is working without any problems but r8169 is crashing (however, restoring the activity after some time). From the log please note: May 1 11:27:55 pve kernel: [ 291.063523] vmbr0: port 2(enp3s0) entered forwarding state May 1 11:28:16 pve kernel: [ 311.589586] ------------[ cut here ]------------ May 1 11:28:16 pve kernel: [ 311.589654] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out May 1 11:28:16 pve kernel: [ 311.589820] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x212/0x220 You may also see r8125 "correct behavior" in similar cases. Now something that I didn't noticed in previous comments - for me the error happens when I unplug the network cable and plug in again. So, I am able to reproduce it without any problems (no need to wait). Maybe from reporter's perspective it was similar, just carrier went down etc.? I do not know how to attach file (if passible - somebody can attach it? the this comment can be deleted): https://data.online-mp.pl/s/GrM7tNzGRLRdz9X (In reply to Marcin from comment #110) > I do not know how to attach file (if passible - somebody can attach it? the > this comment can be deleted): > > https://data.online-mp.pl/s/GrM7tNzGRLRdz9X See the "Attachments" part, there's an option "Add an attachment". Could you please re-test with 6.4-rc1 and r8169? I'v heard before about an issue that on a specific RTL8168 chip version and maybe under specific circumstances a tx timeout occured after removing and re-plugging the cable. Maybe some hw erratum. Created attachment 304240 [details]
Kernel 5.15 and 6.2 vs. r8169 and r8125
Thank you. File is attached.
For the request to re-test with 6.4-rc1 and r8169 - I may have some difficulties as I do not have proper (spare) hardware for such a test.
But maybe would it be ok to create VM (with Linux 6.4-rc1 and with included r8169) and passthrough PCIe device to it?
My setup is a following (01:00.0 is add-on card, 02:00.0 and 03:00.0 are built-in): 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 04) Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller Kernel driver in use: r8125 Kernel modules: r8169, r8125 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 04) Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller Kernel driver in use: r8125 Kernel modules: r8169, r8125 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 04) Subsystem: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller Kernel driver in use: r8125 Kernel modules: r8169, r8125 Not sure whether the VM use case is a fully valid test case. But give it a try. Do the built-in and add-on NICs behave the same? I'm asking because the BIOS may do some special initialization for the built-in NIC, resulting in a different behavior. (In reply to Heiner Kallweit from comment #114) > Not sure whether the VM use case is a fully valid test case. But give it a > try. Sure, I will do this test. But I have a question (it was very long time since I have compiled kernel). If I did "make defconfig" - compilation was fast but almost without any modules. So, I did copied /boot/config-6.1.0-0.deb11.6-amd64 as .config and executed "make olddefconfig". Will it be ok (or there is better way to match configuration as much as possible)? BTW. I will do the test on VM with: [root@r8169 ~]# uname -a Linux r8169 6.1.0-0.deb11.6-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.15-1~bpo11+1 (2023-03-16) x86_64 GNU/Linux and then with new "6.4-rc1" (which I am compiling now). Bot kernels. With firmware for RTL8125B "not installed" and "installed". So, in total there will be 4 tests. However, in case "VM Test Platform" is not valid test case... I will need to think about different approach (). > > Do the built-in and add-on NICs behave the same? Yes. By means - the same error is thrown in dmesg. I didn't noticed any differences in behavior. BTW. This is Odroid H2+ - and "built-in" NICs are simply soldered Realtek RTL8125B chips assigned to two exposed (from CPU) PCIe lanes (from remaining 4 - 1 lane is assigned to "add-on" RTL8125B). Created attachment 304247 [details]
Test results - kernel 6.1.0
Created attachment 304248 [details]
Test results - kernel 6.4.0-rc1
Please see as provided (in comment #c116 and #c117) - for both kernel versions - I am able to reproduce the issue (by simply plugging the cable out and in). Remark - please note the timing between: [ 13.789561] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 14.418911] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=515 'systemd' [ 17.325981] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 2.5Gbps/Full - flow control off [ 17.326031] IPv6: ADDRCONF(NETDEV_CHANGE): enx001e064549e5: link becomes ready [ 150.591000] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 164.609442] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 2.5Gbps/Full - flow control off [ 351.022591] ------------[ cut here ]------------ [ 351.022636] NETDEV WATCHDOG: enx001e064549e5 (r8169): transmit queue 0 timed out 9408 ms 164 - plug in the cable - dmesg is showing immediately the change "Link is Up" BUT - network is not active (in both ingress - eg. pinging my host from other machine and egress - pinging other host from test machine). 351 - when stack trace is thrown - network is full active again. It mean that "inactivity" lasted for more than 3 minutes (kernel 6.4) or 2 minutes (kernel 6.1)? Just a guess - besides of the reason (root cause of the issue, which is the most important here) - interface seems to become active after: 6.1.0 - 120 seconds + ~timeout (a few seconds?) 6.4.0 - 180 seconds + ~timeout (a few seconds?) Comment for above - timing between "Link is Up" (interface is inactive) and stack trace (interface becomes active again) is really RANDOM, as see below. Tests WITH or WITHOUT firmware are giving the same results, here is 6.4.0-rc1 without firmware: [ 13.083300] Console: switching to colour frame buffer device 160x50 [ 13.100055] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device [ 13.116609] Error: Driver 'pcspkr' is already registered, aborting... [ 13.395255] r8169 0000:00:10.0: Direct firmware load for rtl_nic/rtl8125b-2.fw failed with error -2 [ 13.395270] r8169 0000:00:10.0: Unable to load firmware rtl_nic/rtl8125b-2.fw (-2) [ 13.425483] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-80:00: attached PHY driver (mii_bus:phy_addr=r8169-0-80:00, irq=MAC) [ 13.502455] r8169 0000:00:10.0 enx001e064549e5: No native access to PCI extended config space, falling back to CSI [ 13.529332] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 14.155465] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=513 'systemd' [ 17.113067] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 2.5Gbps/Full - flow control off [ 17.113177] IPv6: ADDRCONF(NETDEV_CHANGE): enx001e064549e5: link becomes ready [ 59.694671] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 77.076825] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 2.5Gbps/Full - flow control off [ 143.526494] ------------[ cut here ]------------ [ 143.526540] NETDEV WATCHDOG: enx001e064549e5 (r8169): transmit queue 0 timed out 5664 ms [ 143.526608] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x215/0x220 [ 143.526620] Modules linked in: snd_pcm snd_timer bochs snd drm_vram_helper drm_ttm_helper ttm soundcore virtio_console pcspkr drm_kms_helper virtio_balloon button evdev joydev serio_raw sg drm sunrpc fuse configfs efi _pstore qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic virtio_net net_failover crct10dif_common failover virtio_scsi ata_generic ata_piix libata r8169 uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev i2c_piix4 realtek virtio usbcore mdio_devres libphy psmouse virtio_ring scsi_mod usb_common scsi_com mon floppy [ 143.526797] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1 #8 [ 143.526802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 [ 143.526805] RIP: 0010:dev_watchdog+0x215/0x220 Additional aspect - the problem exists no matter if connection is 2.5Gbps or 1Gbps: [ 13.087589] fbcon: bochs-drmdrmfb (fb0) is primary device [ 13.127481] Console: switching to colour frame buffer device 160x50 [ 13.136263] Error: Driver 'pcspkr' is already registered, aborting... [ 13.145607] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device [ 14.481616] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-80:00: attached PHY driver (mii_bus:phy_addr=r8169-0-80:00, irq=MAC) [ 14.642369] r8169 0000:00:10.0 enx001e064549e5: No native access to PCI extended config space, falling back to CSI [ 14.669778] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 15.307257] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=514 'systemd' [ 17.654585] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 1Gbps/Full - flow control off [ 17.654615] IPv6: ADDRCONF(NETDEV_CHANGE): enx001e064549e5: link becomes ready [ 37.268371] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 44.092858] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 1Gbps/Full - flow control off [ 70.052069] r8169 0000:00:10.0 enx001e064549e5: Link is Down [ 78.661951] r8169 0000:00:10.0 enx001e064549e5: Link is Up - 1Gbps/Full - flow control off [ 153.913556] ------------[ cut here ]------------ [ 153.913598] NETDEV WATCHDOG: enx001e064549e5 (r8169): transmit queue 0 timed out 8352 ms [ 153.913638] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x215/0x220 [ 153.913651] Modules linked in: snd_pcm snd_timer bochs drm_vram_helper drm_ttm_helper snd ttm soundcore virtio_console virtio_balloon pcspkr drm_kms_helper button evdev joydev serio_raw sg drm fuse sunrpc configfs efi_pstore qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic uhci_hcd crct10dif_common r8169 ehci_hcd virtio_net realtek net_failover ata_generic virtio_scsi failover mdio_devres psmouse libphy virtio_pci usbcore virtio_pci_legacy_dev virtio_pci_modern_dev ata_piix virtio libata i2c_piix4 scsi_mod virtio_ring usb_common scsi_common floppy [ 153.913816] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc1 #8 [ 153.913821] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014 [ 153.913825] RIP: 0010:dev_watchdog+0x215/0x220 [ 153.913830] Code: ff ff ff 4c 89 e7 c6 05 ba 12 09 01 01 e8 43 62 fa ff 45 89 f8 44 89 f1 4c 89 e6 48 89 c2 48 c7 c7 80 32 4e 8b e8 5b c7 73 ff <0f> 0b e9 2e ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 [ 153.913835] RSP: 0018:ffffb3c600003ea0 EFLAGS: 00010286 Thanks a lot for testing! As I said, apparently this a HW erratum. Unfortunately Realtek doesn't publish such information. The vendor drivers seem to work around this by doing a full hw initialization on each link-up. To simulate this for testing: Supposedly runtime pm for the NIC isn't active on your system. Do a "echo auto > /sys/class/net/<if>/power/control" and unplug the cable for more than 10s. Then NIC goes into runtime suspend with PCIe switching to D3hot. The PME triggered by link-up initializes the HW. You should also see the PCIe PME interrupt count incrementing in /proc/interrupts. If you could give a little bit more explanation. One thing is that for Proxmox hypervisor ("pve" host) I have disabled ASPM. The reason is that vendor driver (which I am using now) when getting ingress traffic was "waking up slowly". It was visible by iperf3 -s for inbound. I was getting ~300mbit, ~300mbit, ~300mbit, then after 3-5 seconds: 2.47 Gbit, 2.47 Gbit. I have found the solution - I was able to make it "always fast" by changing the ASPM settings in BIOS or giving parameters in command line as given below (see "pcie_port_pm=off pcie_aspm.policy=performance"). And remark - r8169 was NOT impacted by performance by ASPM - it was fast always. But it has the issue, which we are discussing here. [root@pve ~]# dmesg | head -3 [ 0.000000] microcode: microcode updated early to revision 0x3e, date = 2022-09-16 [ 0.000000] Linux version 5.15.107-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.107-1 (2023-04-20T10:05Z) () [ 0.000000] Command line: BOOT_IMAGE=/@/boot/vmlinuz-5.15.107-1-pve root=UUID=7719ce9c-5acb-4d3d-ae50-112101732256 ro rootflags=subvol=@ quiet nosgx intel_iommu=on iommu=pt pcie_port_pm=off pcie_aspm.policy=performance [root@pve ~]# ll /sys/class/net/enp*s0/power/control -rw-r--r-- 1 root root 4096 May 12 07:38 /sys/class/net/enp1s0/power/control -rw-r--r-- 1 root root 4096 May 12 07:38 /sys/class/net/enp2s0/power/control [root@pve ~]# cat /sys/class/net/enp*s0/power/control auto auto [root@pve ~]# Note, that I haven't executed the "echo" as given by you - my two interfaces are already in auto control. The 3rd one (enp3s0) is not displayed as it is forwarded to VM (so, host machine stops seeing this interface). Now the same for VM itself - on which I have been executing all tests. The interface which was forwarded by PCIe passthrough is enx001e064549e5. The enx26ee9bd6b647 is virtio network. But as you can see - they are both in "auto" as well (without doing any additional command): [root@r8169 ~]# dmesg | head -3 [ 0.000000] Linux version 6.4.0-rc1 (root@PK) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #8 SMP PREEMPT_DYNAMIC Thu May 11 18:35:51 CEST 2023 [ 0.000000] Command line: BOOT_IMAGE=/@/boot/vmlinuz-6.4.0-rc1 root=UUID=8f1ac5cf-d262-4729-ba23-dd0513ee3db7 ro rootflags=subvol=@ quiet [ 0.000000] x86/fpu: x87 FPU will use FXSAVE [root@r8169 ~]# dmesg | head -3 [ 0.000000] Linux version 6.4.0-rc1 (root@PK) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #8 SMP PREEMPT_DYNAMIC Thu May 11 18:35:51 CEST 2023 [ 0.000000] Command line: BOOT_IMAGE=/@/boot/vmlinuz-6.4.0-rc1 root=UUID=8f1ac5cf-d262-4729-ba23-dd0513ee3db7 ro rootflags=subvol=@ quiet [ 0.000000] x86/fpu: x87 FPU will use FXSAVE [root@r8169 ~]# ll /sys/class/net/enx*/power/control -rw-r--r-- 1 root root 4096 May 12 21:10 /sys/class/net/enx26ee9bd6b647/power/control -rw-r--r-- 1 root root 4096 May 12 21:10 /sys/class/net/enx001e064549e5/power/control [root@r8169 ~]# cat /sys/class/net/enx*/power/control auto auto [root@r8169 ~]# Now /proc/interrupts - which one is PCIe PME? [root@r8169 ~]# cat /proc/interrupts CPU0 CPU1 0: 33 0 IO-APIC 2-edge timer 1: 0 9 IO-APIC 1-edge i8042 6: 0 3 IO-APIC 6-edge floppy 8: 1 0 IO-APIC 8-edge rtc0 9: 0 0 IO-APIC 9-fasteoi acpi 10: 0 70 IO-APIC 10-fasteoi virtio0 11: 0 0 IO-APIC 11-fasteoi uhci_hcd:usb1 12: 33 0 IO-APIC 12-edge i8042 14: 0 0 IO-APIC 14-edge ata_piix 15: 0 0 IO-APIC 15-edge ata_piix 24: 1193 0 PCI-MSIX-0000:00:10.0 0-edge enx001e064549e5 25: 0 0 PCI-MSIX-0000:00:05.0 0-edge virtio1-config 26: 0 0 PCI-MSIX-0000:00:05.0 1-edge virtio1-control 27: 0 0 PCI-MSIX-0000:00:05.0 2-edge virtio1-event 28: 837 0 PCI-MSIX-0000:00:05.0 3-edge virtio1-request 29: 0 1008 PCI-MSIX-0000:00:05.0 4-edge virtio1-request 30: 0 0 PCI-MSIX-0000:00:12.0 0-edge virtio3-config 31: 0 0 PCI-MSIX-0000:00:12.0 1-edge virtio3-input.0 32: 0 0 PCI-MSIX-0000:00:12.0 2-edge virtio3-output.0 33: 0 0 PCI-MSIX-0000:00:08.0 0-edge virtio2-config 34: 0 3 PCI-MSIX-0000:00:08.0 1-edge virtio2-virtqueues NMI: 0 0 Non-maskable interrupts LOC: 6841 4121 Local timer interrupts SPU: 0 0 Spurious interrupts PMI: 0 0 Performance monitoring interrupts IWI: 0 0 IRQ work interrupts RTR: 0 0 APIC ICR read retries RES: 455 935 Rescheduling interrupts CAL: 3366 4221 Function call interrupts TLB: 7 12 TLB shootdowns TRM: 0 0 Thermal event interrupts THR: 0 0 Threshold APIC interrupts DFR: 0 0 Deferred Error APIC interrupts MCE: 0 0 Machine check exceptions MCP: 1 1 Machine check polls ERR: 0 MIS: 0 PIN: 0 0 Posted-interrupt notification event NPI: 0 0 Nested posted-interrupt event PIW: 0 0 Posted-interrupt wakeup event [root@r8169 ~]# On host (hypervisor) it looks differently (I see some PME PCI records): [root@pve ~]# cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 0: 17 0 0 0 IR-IO-APIC 2-edge timer 8: 0 0 0 0 IR-IO-APIC 8-fasteoi rtc0 9: 0 0 0 0 IR-IO-APIC 9-fasteoi acpi 14: 0 0 0 0 IR-IO-APIC 14-fasteoi INT3453:00, INT3453:01, INT3453:03 15: 0 0 0 0 IR-IO-APIC 15-fasteoi INT3453:02 20: 0 0 0 27 IR-IO-APIC 20-fasteoi i801_smbus 31: 0 0 0 0 IR-IO-APIC 31-fasteoi idma64.0, i2c_designware.0 32: 0 0 0 0 IR-IO-APIC 32-fasteoi idma64.1, i2c_designware.1 33: 0 0 0 0 IR-IO-APIC 33-fasteoi idma64.2, i2c_designware.2 34: 0 0 0 0 IR-IO-APIC 34-fasteoi idma64.3, i2c_designware.3 39: 0 0 725786 0 IR-IO-APIC 39-fasteoi mmc0 120: 0 0 0 0 DMAR-MSI 0-edge dmar0 121: 0 0 0 0 DMAR-MSI 1-edge dmar1 122: 0 0 0 0 IR-PCI-MSI 311296-edge PCIe PME 123: 0 0 0 0 IR-PCI-MSI 327680-edge PCIe PME 124: 0 0 0 0 IR-PCI-MSI 329728-edge PCIe PME 125: 0 0 3333801 0 IR-PCI-MSI 294912-edge ahci[0000:00:12.0] 126: 0 0 0 0 IR-PCI-MSI 344064-edge xhci_hcd 127: 0 5966557 0 0 IR-PCI-MSI 524288-edge enp1s0-0 128: 0 0 0 0 IR-PCI-MSI 524289-edge enp1s0-1 129: 0 0 0 0 IR-PCI-MSI 524290-edge enp1s0-2 130: 0 0 0 0 IR-PCI-MSI 524291-edge enp1s0-3 131: 0 0 0 0 IR-PCI-MSI 524292-edge enp1s0-4 132: 0 0 0 0 IR-PCI-MSI 524293-edge enp1s0-5 133: 0 0 0 0 IR-PCI-MSI 524294-edge enp1s0-6 134: 0 0 0 0 IR-PCI-MSI 524295-edge enp1s0-7 135: 0 0 0 0 IR-PCI-MSI 524296-edge enp1s0-8 136: 0 0 0 0 IR-PCI-MSI 524297-edge enp1s0-9 137: 0 0 0 0 IR-PCI-MSI 524298-edge enp1s0-10 138: 0 0 0 0 IR-PCI-MSI 524299-edge enp1s0-11 139: 0 0 0 0 IR-PCI-MSI 524300-edge enp1s0-12 140: 0 0 0 0 IR-PCI-MSI 524301-edge enp1s0-13 141: 0 0 0 0 IR-PCI-MSI 524302-edge enp1s0-14 142: 0 0 0 0 IR-PCI-MSI 524303-edge enp1s0-15 143: 0 6050052 0 0 IR-PCI-MSI 524304-edge enp1s0-16 144: 0 0 0 0 IR-PCI-MSI 524305-edge enp1s0-17 145: 0 0 0 0 IR-PCI-MSI 524306-edge enp1s0-18 146: 0 0 0 0 IR-PCI-MSI 524307-edge enp1s0-19 147: 0 0 0 0 IR-PCI-MSI 524308-edge enp1s0-20 148: 0 0 1 0 IR-PCI-MSI 524309-edge enp1s0-21 149: 0 0 0 0 IR-PCI-MSI 524310-edge enp1s0-22 150: 0 0 0 0 IR-PCI-MSI 524311-edge enp1s0-23 151: 0 0 0 0 IR-PCI-MSI 524312-edge enp1s0-24 152: 0 0 0 0 IR-PCI-MSI 524313-edge enp1s0-25 153: 0 0 0 0 IR-PCI-MSI 524314-edge enp1s0-26 154: 0 0 0 0 IR-PCI-MSI 524315-edge enp1s0-27 155: 0 0 0 0 IR-PCI-MSI 524316-edge enp1s0-28 156: 0 0 0 0 IR-PCI-MSI 524317-edge enp1s0-29 157: 0 0 0 0 IR-PCI-MSI 524318-edge enp1s0-30 158: 0 0 0 0 IR-PCI-MSI 524319-edge enp1s0-31 159: 0 0 36 0 IR-PCI-MSI 245760-edge mei_me 160: 0 8028785 0 0 IR-PCI-MSI 1048576-edge enp2s0-0 161: 0 0 0 0 IR-PCI-MSI 1048577-edge enp2s0-1 162: 0 0 0 0 IR-PCI-MSI 1048578-edge enp2s0-2 163: 0 0 0 0 IR-PCI-MSI 1048579-edge enp2s0-3 164: 0 0 0 0 IR-PCI-MSI 1048580-edge enp2s0-4 165: 0 0 0 0 IR-PCI-MSI 1048581-edge enp2s0-5 166: 0 0 0 0 IR-PCI-MSI 1048582-edge enp2s0-6 167: 0 0 0 0 IR-PCI-MSI 1048583-edge enp2s0-7 168: 0 0 0 0 IR-PCI-MSI 1048584-edge enp2s0-8 169: 0 0 0 0 IR-PCI-MSI 1048585-edge enp2s0-9 170: 0 0 0 0 IR-PCI-MSI 1048586-edge enp2s0-10 171: 0 0 0 0 IR-PCI-MSI 1048587-edge enp2s0-11 172: 0 0 0 0 IR-PCI-MSI 1048588-edge enp2s0-12 173: 0 0 0 0 IR-PCI-MSI 1048589-edge enp2s0-13 174: 0 0 0 0 IR-PCI-MSI 1048590-edge enp2s0-14 175: 0 0 0 0 IR-PCI-MSI 1048591-edge enp2s0-15 176: 0 8839858 0 0 IR-PCI-MSI 1048592-edge enp2s0-16 177: 0 0 0 0 IR-PCI-MSI 1048593-edge enp2s0-17 178: 0 0 0 0 IR-PCI-MSI 1048594-edge enp2s0-18 179: 0 0 0 0 IR-PCI-MSI 1048595-edge enp2s0-19 180: 0 0 0 0 IR-PCI-MSI 1048596-edge enp2s0-20 181: 0 0 3 0 IR-PCI-MSI 1048597-edge enp2s0-21 182: 0 0 0 0 IR-PCI-MSI 1048598-edge enp2s0-22 183: 0 0 0 0 IR-PCI-MSI 1048599-edge enp2s0-23 184: 0 0 0 0 IR-PCI-MSI 1048600-edge enp2s0-24 185: 0 0 0 0 IR-PCI-MSI 1048601-edge enp2s0-25 186: 0 0 0 0 IR-PCI-MSI 1048602-edge enp2s0-26 187: 0 0 0 0 IR-PCI-MSI 1048603-edge enp2s0-27 188: 0 0 0 0 IR-PCI-MSI 1048604-edge enp2s0-28 189: 0 0 0 0 IR-PCI-MSI 1048605-edge enp2s0-29 190: 0 0 0 0 IR-PCI-MSI 1048606-edge enp2s0-30 191: 0 0 0 0 IR-PCI-MSI 1048607-edge enp2s0-31 192: 0 4458 0 0 IR-PCI-MSI 1572864-edge vfio-msix[0](0000:03:00.0) 224: 0 0 0 53 IR-PCI-MSI 32768-edge i915 225: 362 0 0 0 IR-PCI-MSI 229376-edge snd_hda_intel:card0 NMI: 1188 1269 1183 1176 Non-maskable interrupts LOC: 34826494 35213804 34565557 34451312 Local timer interrupts SPU: 0 0 0 0 Spurious interrupts PMI: 1188 1269 1183 1176 Performance monitoring interrupts IWI: 16 0 0 7 IRQ work interrupts RTR: 0 0 0 0 APIC ICR read retries RES: 622077 574339 560971 611903 Rescheduling interrupts CAL: 4991583 5201007 4762941 4735719 Function call interrupts TLB: 239654 192327 182074 188543 TLB shootdowns TRM: 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 Threshold APIC interrupts DFR: 0 0 0 0 Deferred Error APIC interrupts MCE: 0 0 0 0 Machine check exceptions MCP: 294 295 295 295 Machine check polls ERR: 0 MIS: 0 PIN: 170890 183860 164263 165410 Posted-interrupt notification event NPI: 0 0 0 0 Nested posted-interrupt event PIW: 0 0 0 0 Posted-interrupt wakeup event [root@pve ~]# I'm not a virtualization expert, maybe PME doesn't exist on the guest. To ensure that runtime pm kicks in you could unplug the cable a little longer (say 30s). Then /sys/class/net/<if>/device/power_state should show D3hot for the NIC. Does the problem also occur if you set runtime pm control attribute to "on"? |