Bug 203703 - 5.1 regression makes r8169 Ethernet connection inoperable if fq_codel qdisc is used
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Stephen Hemminger
Reported: 2019-05-25 04:55 UTC by Sergey Kondakov
Modified: 2024-01-29 09:54 UTC
Kernel Version: 5.1.4
Regression: No

Attachments
kernel config (218.63 KB, text/plain)
2019-05-25 04:55 UTC, Sergey Kondakov
r8169_stuck-on_5.9.1.txt (18.11 KB, text/plain)
2020-10-23 07:09 UTC, Sergey Kondakov

Description Sergey Kondakov 2019-05-25 04:55:19 UTC
Created attachment 282937
kernel config

After updating from 5.0.x to 5.1.x, my network started halting less than an hour after boot, with "network unreachable" messages for any connection attempt and these lines in the kernel log:
[34441.731088] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
[34441.731126] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x21a/0x220
[34441.731128] Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq af_packet ts_bm xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter rfcomm bnep zram msr it87 hwmon_vid snd_hda_codec_hdmi snd_usb_audio snd_usbmidi_lib rc_avermedia btusb snd_rawmidi snd_hda_codec_realtek btrtl snd_hda_codec_generic snd_seq_device btbcm ledtrig_audio tuner_simple tuner_types ath9k btintel tuner tda7432 ath9k_common ath9k_hw bluetooth tvaudio msp3400 ath amd64_edac_mod bttv snd_hda_intel edac_mce_amd kvm_amd snd_hda_codec snd_hda_core mac80211 snd_hwdep tea575x kvm tveeprom videobuf_dma_sg videobuf_core snd_pcm_oss
[34441.731180]  rc_core snd_mixer_oss v4l2_common videodev irqbypass mxm_wmi wmi_bmof amdgpu media pcspkr k10temp snd_pcm cfg80211 r8169 fam15h_power realtek sp5100_tco i2c_piix4 chash libphy gpu_sched ttm rfkill mac_hid hid_generic usbhid uas usb_storage sd_mod ohci_pci serio_raw ohci_hcd xhci_pci ehci_pci ehci_hcd xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs
[34441.731218] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G          IO      5.1.4-1320.g0739fa4-HSF #1 openSUSE Tumbleweed (unreleased)
[34441.731220] Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
[34441.731224] RIP: 0010:dev_watchdog+0x21a/0x220
[34441.731227] Code: 49 63 4c 24 e0 eb 8c 4c 89 ef c6 05 a7 8d 0e 01 01 e8 9a dd fa ff 89 d9 4c 89 ee 48 c7 c7 c0 51 9b 9a 48 89 c2 e8 1a e2 44 ff <0f> 0b eb be 66 90 0f 1f 44 00 00 48 c7 47 08 00 00 00 00 48 c7 07
[34441.731230] RSP: 0018:ffff8c1cede03e40 EFLAGS: 00010286
[34441.731233] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[34441.731235] RDX: 0000000000000007 RSI: ffff8c19c5d14dc8 RDI: 0000000000000001
[34441.731238] RBP: ffff8c1cdcd8e4a0 R08: 0000000000000103 R09: 0000000000000000
[34441.731240] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c1cdcd8e508
[34441.731243] R13: ffff8c1cdcd8e000 R14: 0000000000000001 R15: ffff8c1cdc31d080
[34441.731246] FS:  0000000000000000(0000) GS:ffff8c1cede00000(0000) knlGS:0000000000000000
[34441.731248] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34441.731251] CR2: 00007fec8043aa48 CR3: 00000004067d4000 CR4: 00000000000406e0
[34441.731253] Call Trace:
[34441.731256]  <IRQ>
[34441.731263]  ? qdisc_put_unlocked+0x30/0x30
[34441.731269]  call_timer_fn+0xaa/0x300
[34441.731279]  ? qdisc_put_unlocked+0x30/0x30
[34441.731283]  run_timer_softirq+0x1df/0x530
[34441.731291]  ? read_hpet+0x124/0x140
[34441.731302]  __do_softirq+0xf3/0x4c5
[34441.731315]  irq_exit+0xef/0x100
[34441.731319]  smp_apic_timer_interrupt+0xb5/0x270
[34441.731324]  apic_timer_interrupt+0xf/0x20
[34441.731327]  </IRQ>
[34441.731331] RIP: 0010:native_safe_halt+0xe/0x10
[34441.731334] Code: f0 80 48 02 20 48 8b 00 a8 08 75 c3 e9 7c ff ff ff 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 86 40 52 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 76 40 52 00 f4 c3 90 90 0f 1f 44 00
[34441.731337] RSP: 0018:ffffb835c196beb0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[34441.731340] RAX: ffff8c19c5d14440 RBX: 0000000000000001 RCX: 0000000000000000
[34441.731342] RDX: ffff8c19c5d14440 RSI: 0000000000000006 RDI: ffff8c19c5d14440
[34441.731344] RBP: ffffffff9ae3f360 R08: 0000000000000001 R09: 0000000000000000
[34441.731347] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[34441.731349] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[34441.731367]  default_idle+0x1f/0x180
[34441.731374]  default_idle_call+0x31/0x40
[34441.731378]  do_idle+0x211/0x2b0
[34441.731386]  cpu_startup_entry+0x19/0x20
[34441.731391]  start_secondary+0x185/0x1e0
[34441.731397]  secondary_startup_64+0xa4/0xb0
[34441.731415] irq event stamp: 422277267
[34441.731419] hardirqs last  enabled at (422277266): [<ffffffff991e1b58>] console_unlock.part.14+0x438/0x5a0
[34441.731423] hardirqs last disabled at (422277267): [<ffffffff9900383b>] trace_hardirqs_off_thunk+0x1a/0x1c
[34441.731426] softirqs last  enabled at (422277248): [<ffffffff99165d97>] irq_enter+0x67/0x70
[34441.731430] softirqs last disabled at (422277249): [<ffffffff99165e8f>] irq_exit+0xef/0x100
[34441.731432] ---[ end trace 05ead7daf10a5f51 ]---

Reloading the r8169 module doesn't fix that, but I was able to work around the issue by changing the qdisc from fq_codel to the "safe default" of pfifo_fast. With that, the network continues to work as if nothing had happened. The only seemingly relevant information I could find is this discussion: https://lkml.org/lkml/2019/2/9/44

I set the qdisc with CONFIG_DEFAULT_NET_SCH="fq_codel" in the kernel config and with `tc qdisc replace dev ${interface} root fq_codel limit 500000 flows 50000 target 25ms interval 200ms` in tuned's script.sh.
Comment 1 Heiner Kallweit 2019-05-25 20:38:26 UTC
The root cause of the issue could be in quite different places in the network stack. It's not necessarily an r8169 issue. It would be best if you could bisect the issue.
Comment 2 Cong Wang 2019-05-28 17:36:27 UTC
Hmm, you narrowed the problem down to fq_codel, which is great. But I don't see any relevant changes to fq_codel between 5.0 and 5.1, so the cause is probably somewhere else. As Heiner suggested, a bisect would help a lot.
Comment 3 Sergey Kondakov 2019-05-28 20:06:26 UTC
Indeed, it may not be directly related to fq_codel. Today I noticed a few strange things about it:
1) During the last boot everything worked for more than an hour without me changing the qdisc or resetting the network. But that dmesg message still appeared less than 10 minutes after boot. I couldn't watch where it went for long because the electricity in my apartment got cut off.
2) Once the issue has manifested, just changing the qdisc or removing and re-inserting the r8169 module doesn't help. Or, at least, it isn't guaranteed to help. Usually I have to "disable" and "enable" "networking" in NetworkManager one or several times, whatever that does, and then change the qdisc to stay safe until the next boot.

My current binary distribution is about as unsuited as they come for recompiling and manually reinstalling kernels (I use OBS servers to auto-build and update the kernel package with my config), so I planned to wait for at least a few more minor releases or one major kernel release before going for a full bisect, in the hope that this is a consequence of a known mishap.

I even thought that it may be a result of openSUSE pushing some raw patches against the latest Intel vulnerabilities. By the way, you may check them out at https://github.com/openSUSE/kernel-source/tree/stable/patches.suse
The builds are in:
https://build.opensuse.org/project/monitor/home:X0F:HSF:Kernel - my fork
https://build.opensuse.org/package/binaries/home:X0F:HSF:Kernel/kernel-HSF/standard - that package in particular
Comment 4 Sergey Kondakov 2019-07-13 09:10:07 UTC
Just updated to 5.2. The bad news is that the aforementioned warning didn't go away. The "good" (?) news is that it showed itself after about 6 hours of uptime, and currently, at 18 hours, the complete glitch has not happened yet. I've noticed that fq_codel does a lot of requeues and that new_flow_count is constantly rising, but I don't remember whether that differs from how it behaved prior to the glitch. For some reason there are also packet drops, on a machine with a 6-thread FX-6100 CPU and a 50 Mbit/s connection.

Here is the `tc qdisc` dump at ~18 hours of uptime, with only mild YouTube watching and a Tor relay (capped at 128/512 KB rate/burst) in the background:
qdisc fq_codel 0: dev enp4s0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 3225700533 bytes 3411973 pkt (dropped 18, overlimits 0 requeues 79903) 
 backlog 0b 0p requeues 79903
  maxpacket 2846 drop_overlimit 0 new_flow_count 40713 ecn_mark 0
  new_flows_len 0 old_flows_len 0
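
For tracking these counters over time, a minimal Python sketch along these lines could pull the numbers out of the dump above. The function name `parse_qdisc_stats` and the choice of fields are illustrative assumptions, not part of iproute2, and the regular expressions only target the text layout shown in this comment, so they may need adjusting for other `tc` versions.

```python
import re


def parse_qdisc_stats(text):
    """Extract a few counters from the statistics dump of a single qdisc.

    Hypothetical helper: it only understands the layout shown in this comment.
    """
    stats = {}
    sent = re.search(
        r"Sent (\d+) bytes (\d+) pkt "
        r"\(dropped (\d+), overlimits (\d+) requeues (\d+)\)",
        text,
    )
    if sent:
        keys = ("sent_bytes", "sent_pkts", "dropped", "overlimits", "requeues")
        stats.update(zip(keys, map(int, sent.groups())))
    # fq_codel-specific counters printed as "name value" pairs.
    for key in ("maxpacket", "drop_overlimit", "new_flow_count", "ecn_mark"):
        match = re.search(rf"\b{key} (\d+)", text)
        if match:
            stats[key] = int(match.group(1))
    return stats


# The dump from this comment, copied verbatim.
SAMPLE = """\
qdisc fq_codel 0: dev enp4s0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
 Sent 3225700533 bytes 3411973 pkt (dropped 18, overlimits 0 requeues 79903)
 backlog 0b 0p requeues 79903
  maxpacket 2846 drop_overlimit 0 new_flow_count 40713 ecn_mark 0
  new_flows_len 0 old_flows_len 0
"""

stats = parse_qdisc_stats(SAMPLE)
# Share of sent packets that were requeued at least once (roughly 2.3% here).
requeue_ratio = stats["requeues"] / stats["sent_pkts"]
```

Comparing two such snapshots taken some time apart would show how fast requeues and new_flow_count actually grow, which is the part that was not recorded before the glitch.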
Comment 5 Dave Täht 2019-08-09 03:26:35 UTC
The requeues, new_flow_count, and dropped stats show fq_codel working as designed. I think.
 
However I'm puzzled as to why you'd have any drops at all at a 50mbit "rate". Does the device imposing that limit use pause frames?

(normally we shape using sqm-scripts or these days via cake to slightly below the provided rate)
Comment 6 Heiner Kallweit 2019-08-09 07:38:09 UTC
Two questions regarding the affected system:
1. Which RTL8168 chip version is it? (dmesg | grep XID)
2. TSO activated?

At least one chip version (RTL8168evl) has a hw issue with TSO, therefore it's disabled by default.
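
To answer the TSO question from a script, a sketch like the following could read the offload states from `ethtool -k` output. `parse_offloads` and `read_offloads` are hypothetical helpers, and SAMPLE is made-up illustrative output, not taken from the affected machine. The parser assumes the usual `feature-name: on|off [fixed]` line layout.

```python
import subprocess


def parse_offloads(ethtool_k_output):
    """Map feature names from `ethtool -k` text to True (on) / False (off)."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # header line such as "Features for enp4s0:"
        state = value.split()[0]  # ignore a trailing "[fixed]" marker
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def read_offloads(interface):
    """Run `ethtool -k` for one interface (needs ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_offloads(result.stdout)


# Illustrative text only; real output lists many more features.
SAMPLE = """\
Features for enp4s0:
rx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

offloads = parse_offloads(SAMPLE)
tso_active = offloads.get("tcp-segmentation-offload", False)
```

On the affected machine, `read_offloads("enp4s0")` would be the call to use instead of the sample text.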
Comment 7 Sergey Kondakov 2019-08-09 14:02:10 UTC
(In reply to Dave Täht from comment #5)
> The requeues, new_flow_count, and dropped stats show fq_codel working as
> designed. I think.
>  
> However I'm puzzled as to why you'd have any drops at all at a 50mbit
> "rate". Does the device imposing that limit use pause frames?
> 
> (normally we shape using sqm-scripts or these days via cake to slightly
> below the provided rate)

So, my PC with r8169 is connected to a cheapo 100mbit switch, which is connected to a Huawei EchoLife HG8245 GPON terminal, which is connected via fiber to an ISP that somehow limits bandwidth to 50Mbits.

I recently switched to cake and was able to reproduce the problem with it… but I forgot exactly how. Now I have: tc qdisc replace dev ${interface} root cake ethernet ether-vlan rtt 100ms flows diffserv4 ack-filter split-gso

I think it involved setting 50ms (I can get about 40-60ms to some big in-country sites, but google stuff is about 90ms) and autorate-ingress. Judging by the statistics, it was dropping and removing a whole bunch of acks at the time. But now it has been working fine for days with statistics like this:
qdisc cake 8003: dev enp4s0 root refcnt 2 bandwidth unlimited diffserv4 flows nonat nowash ack-filter split-gso rtt 100.0ms noatm overhead 42 mpu 84 
 Sent 1105703468 bytes 1264192 pkt (dropped 3, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 162Kb of 15140Kb
 capacity estimate: 0ibit
 min/max network layer size:           28 /    1475
 min/max overhead-adjusted size:       84 /    1517
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh          0ibit        0ibit        0ibit        0ibit
  target          5.0ms        5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms      100.0ms
  pk_delay         12us          8us         14us          9us
  av_delay          4us          3us          4us          4us
  sp_delay          2us          2us          2us          3us
  backlog            0b           0b           0b           0b
  pkts          1019412       238119         6102          562
  bytes      1065322775     36965315      3368008        51779
  way_inds        20562         4859            0            0
  way_miss        99399        10891           19          155
  way_cols            0            0            0            0
  drops               3            0            0            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            1            6            0            1
  bk_flows            0            0            0            0
  un_flows            0            0            0            0
  max_len          2816        11672          554          590
  quantum          1514         1514         1514         1514

(In reply to Heiner Kallweit from comment #6)
> Two questions regarding the affected system:
> 1. Which RTL8168 chip version is it? (dmesg | grep XID)
> 2. TSO activated?
> 
> At least one chip version (RTL8168evl) has a hw issue with TSO, therefore
> it's disabled per default.

1) RTL8168evl/8111evl, XID 2c9, IRQ 42
2) Yes but I think that it manifested before I recently started enabling everything I could via ethtool in a boot-up script: `ethtool -K ${interface} tx-nocache-copy on rx on tx on sg on tso on ufo on gso on gro on lro on rxvlan on txvlan on ntuple on rxhash on`
Comment 8 Dave Täht 2019-08-09 17:01:47 UTC
(In reply to Sergey Kondakov from comment #7)
> (In reply to Dave Täht from comment #5)
> > The requeues, new_flow_count, and dropped stats show fq_codel working as
> > designed. I think.
> >  
> > However I'm puzzled as to why you'd have any drops at all at a 50mbit
> > "rate". Does the device imposing that limit use pause frames?
> > 
> > (normally we shape using sqm-scripts or these days via cake to slightly
> > below the provided rate)
> 
> So, my PC with r8169 is connected to cheapo 100mbit switch which is
> connected to Huawei EchoLife HG8245 GPON Terminal that is connected via
> fiber to ISP that limits bandwidth to 50Mbits somehow.
> 
> I recently switched to cake and I was able to reproduce the problem with it…
> but I forgot exactly how. Now I have: tc qdisc replace dev ${interface} root
> cake ethernet ether-vlan rtt 100ms flows diffserv4 ack-filter split-gso
> 
> I think it was setting 50ms (I can get about 40-60ms to some big in-country
> sites but google stuff is about 90ms) and autorate-ingress. Judging by
> statistics, it was dropping and removing whole bunch of acks at the time.
> But now it's working fine for days with statistics like this:
> qdisc cake 8003: dev enp4s0 root refcnt 2 bandwidth unlimited diffserv4
> flows nonat nowash ack-filter split-gso rtt 100.0ms noatm overhead 42 mpu 84 
>  Sent 1105703468 bytes 1264192 pkt (dropped 3, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0
>  memory used: 162Kb of 15140Kb
>  capacity estimate: 0ibit
>  min/max network layer size:           28 /    1475
>  min/max overhead-adjusted size:       84 /    1517
>  average network hdr offset:           14
> 
>                    Bulk  Best Effort        Video        Voice
>   thresh          0ibit        0ibit        0ibit        0ibit
>   target          5.0ms        5.0ms        5.0ms        5.0ms
>   interval      100.0ms      100.0ms      100.0ms      100.0ms
>   pk_delay         12us          8us         14us          9us
>   av_delay          4us          3us          4us          4us
>   sp_delay          2us          2us          2us          3us
>   backlog            0b           0b           0b           0b
>   pkts          1019412       238119         6102          562
>   bytes      1065322775     36965315      3368008        51779
>   way_inds        20562         4859            0            0
>   way_miss        99399        10891           19          155
>   way_cols            0            0            0            0
>   drops               3            0            0            0
>   marks               0            0            0            0
>   ack_drop            0            0            0            0
>   sp_flows            1            6            0            1
>   bk_flows            0            0            0            0
>   un_flows            0            0            0            0
>   max_len          2816        11672          554          590
>   quantum          1514         1514         1514         1514

It looks like this box is originating traffic, and not acting as a router?

If your intent was to shape traffic the topology would be more like

gpon - this box - switch - your other boxes

and you'd set cake's bandwidth param a bit below 50mbit. (and also shape inbound)

diffserv4 is primarily intended for video-heavy loads.

ack-filtering doesn't do much unless you are on an asymmetric connection (1gb down, 50mbit up) 

> (In reply to Heiner Kallweit from comment #6)
> > Two questions regarding the affected system:
> > 1. Which RTL8168 chip version is it? (dmesg | grep XID)
> > 2. TSO activated?
> > 
> > At least one chip version (RTL8168evl) has a hw issue with TSO, therefore
> > it's disabled per default.
> 
> 1) RTL8168evl/8111evl, XID 2c9, IRQ 42
> 2) Yes but I think that it manifested before I recently started enabling
> everything I could via ethtool in a boot-up script: `ethtool -K ${interface}
> tx-nocache-copy on rx on tx on sg on tso on ufo on gso on gro on lro on
> rxvlan on txvlan on ntuple on rxhash on`

At 100mbit you really don't want or need gro/tso etc. It just bulks packets up for no reason (and cake automagically splits them again). If it's off by default, leave it off. That said, if it's on by default and causing problems, well...
Comment 9 Sergey Kondakov 2019-08-09 20:42:23 UTC
(In reply to Dave Täht from comment #8)
> 
> It looks like this box is originating traffic, and not acting as a router?
> 
> If your intent was to shape traffic the topology would be more like
> 
> gpon - this box - switch - your other boxes
> 
> and you'd set cake's bandwidth param a bit below 50mbit. (and also shape
> inbound)
> 
> diffserv4 is primarily intended for video-heavy loads.
> 
> ack-filtering doesn't do much unless you are on an asymmetric connection
> (1gb down, 50mbit up) 
> 

Yes, it's not a router. What I would like to achieve is to manage the upstream and downstream queues automatically and heuristically, and to recognize and mark QoS classes appropriately, so that in case of sudden congestion or system overload the less important ones may be sacrificed through delays or drops. On the endpoints. Not just because nothing can be done about the ISP's GPON boxes, but because the actual originator/receiver "knows" best what's what and how it should be handled, the ISP's internal backbone shenanigans notwithstanding.

You know, a network scheduler.

I don't know much about ack-filtering, but it was hyped up by its creators somewhere in https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/410 as something that squeezes out some benefits without any downsides.

> At 100mbit you really don't want or need gro/tso etc. It just bulks packets
> up for no reason (and cake automagically splits them again). If it's off by
> default, leave it off. That said if it's on be default and causing problems,
> well...


So, I will try disabling those explicitly and bringing fq_codel back to see if it hangs again. HOWEVER… "If it's off by default, leave it off" is NOT a good axiom to follow, especially on Linux. Many if not most things are inadequately configured by default, especially for the desktop. Such as KMS TearFree on AMD and Intel GPUs being disabled by default, bluez not compiling in SixAxis/DualShock support, autogroup being enabled and screwing up actual process prioritization, kernel.sched_rr_timeslice_ms defaulting to an embarrassing 100ms with CFS granularity on par with it, input, audio and GPU kernel processes not having the highest FIFO realtime priority, high-Hz mice not using their actual polling rate unless forced, or never (bug#60586 on USB2 but !3 and bug#82571 - on USB3 but !2), PulseAudio not even trying to use the available 24-32bit 48-192khz or to figure out the optimal minimal buffer/latency (talk about "buffer-bloat", huh…) from what the hardware advertises, and so on.

Low-latency high-throughput networking is nice, perpetually desirable, and _should_ be used all the time if available, BUT I would gladly sacrifice networking latency tenfold and bandwidth by half if it would otherwise cause a frame-length stutter on my video output, delay or skip input or, worse, create an audio buffer under/overrun or crackle.

Right now, if I enable my ath9k WiFi on this PC, my very nice, stable 8ms audio (32b/96khz in hardware, float/192khz in the software JACK pipeline, as required by the software limiter processing stage) may start to throw occasional x-run errors and crackle. If I disable MSI on it, it will ruin everything with its interrupt storm, even though this PC can handle complete saturation of all CPU cores with multimedia load and have 0 x-runs. But on my old laptop enabling MSI on ath9k WiFi hangs it… or something, I don't remember, but I gave up on it completely.

Thankfully, Ethernet in general and r8169 in particular don't seem to have this problem (unless that newest 003bd5b4a7b4a94b501e3a1e2e7c9df6b2a94ed4 patch in the 5.2.8 release from bug#204079 changes it…). But interactive I/O and A/V latencies always stand above network performance. There should never be any network-related kernel workload that could compromise those, which means that if buffers need to get bigger and more stuff needs to be offloaded to the PHY, then so be it. Networking should stand in line just behind kernel interactive input/output and userland multimedia processing, but before all of the latency-insensitive muck. A/V networking and wireless I/O would be a "grey area" in this.

The point is that, for now, it would be very foolish to blindly trust developer defaults in anything unless reasoning behind them is clearly defined and factually correct. They often have very strange and… single-minded priorities and compromises.
Comment 10 Dave Täht 2019-08-10 23:23:03 UTC
That was a good rant, thank you. I too have issues with linux audio. I used to work on ardour until it became too hard to get a good result.

I'm incidentally one of the authors of the latest ath9k code, but have not tried to run it under an R/T kernel. The current default of a 300 byte quantum creates a lot of extra processing; I'd try an MTU or two instead.

However, signalling intent at the endpoints as you are trying to do won't work. The bottleneck router needs to share the same interpretation of your intent, and it doesn't. Stick (for example, there are many others to choose from) an edgerouter X w/openwrt in front of the gpon box, set up cake and see what you get.

Also, sometimes, the devs are right. If this card is buggy with gso, don't use it.
Comment 11 Sergey Kondakov 2019-08-19 19:13:49 UTC
(In reply to Dave Täht from comment #10)
> That was a good rant, thank you. I too have issues with linux audio. I used
> to work on ardour until it became too hard to get a good result.
> 
> I'm incidentally one of the authors of the latest ath9k code, but have not
> tried to run it under an R/T kernel. The current default of a 300 byte
> quantum creates a lot of extra processing, I'd try a MTU or two instead.
> 
> However, signalling intent over the endpoints as you are trying to do won't
> work. The bottleneck router needs to share the same interpretation of your
> intent and it doesn't. Stick an (for example - many others to choose from)
> edgerouter X w/openwrt in front of the gpon box and setup cake and see what
> you get.
> 
> Also, sometimes, the devs are right. If this card is buggy with gso, don't
> use it.

Thanks! I'm really baffled that, among:
* audio/kernel (DAC/ADC drivers);
* audio/userspace (JACK, PA, realtime processing pipeline, apps);
* video/kernel (CPU & GPU RAM management, GPU drivers);
* video/userspace (libdrm, Mesa, LLVM, X11/compositor, Wayland, apps and their rendering pipelines);
* input/kernel (USB, HID, sensors);
* input/userspace (libevdev, libinput, apps);
* network/kernel (TCP/IP & WiFi & BT stacks, filtering, qdisc/shaping?, drivers);
* network/userspace (apps);
network may be considered anything but the lowest priority for CPU time. Although with interactive/realtime traffic and remote I/O (such as wireless input devices and displays) the point does become moot. The same goes for storage, where random reads are much, much more important than writes: prioritising reads may horribly screw up sequential writes, but doing otherwise may halt any or all of the above.

For audio, I really hope PipeWire & GStreamer succeed as a universal userspace realtime A/V stream pipelining framework that replaces PA & JACK and maybe gives a push to ffmpeg, conventional DAWs and fancy, processing-heavy video players (mpv, vlc) and editors. All of those should scale better with the system's parallelization and acceleration capabilities; JACK's LV2 filters, for example, aren't great with threading at all. But userspace can't do much if the kernel's priorities are all upside down.

My kernel isn't from the RT branch; that one updates too slowly nowadays and its benefits are not clear. But I used RT priority on all latency-sensitive processes and rearranged all I could to minimize the I/O latency of the GPU, DAC & JACK and USB, which does wonders on Linux even on such an old system. Windows is one up on it in one case, though: Bluetooth and Sony DualShock gamepads. There are no DS4 polling rate controls on Linux, and the BT stack is finicky with off-brand controllers. And what have they done to the DS3's multitude of pressure sensors for each button… just ignored all of them in the kernel!


But to the point: is that "quantum" as in codel's and cake's 'quantum' setting, or is there something similar at a more generic driver level? I may try that in codel, but doesn't cake lack the 'target' and 'quantum' settings? I'm also thinking about setting target to 15-20ms by default instead of 5ms, as you once suggested here: https://github.com/systemd/systemd/issues/9725#issuecomment-414079283 If I actually had latency-sensitive remote devices, I would want 4-8ms. For the DS4 via BT on Windows I use 7ms polling, with 1ms on wired devices. With things like mice and styli you would want to keep it to the minimum at all times. But those still come with their WiFi/BT mystery dongles, so that's irrelevant for now.

The router is indeed the problem that may always screw up any attempt at anything. Especially if it's the ISP's "black box" that they force you to "buy" but even then try to lock with their own password, which, I'm pretty sure, is technically illegal, but so is country-wide censorship and surveillance, and that did not stop them… But can another box in front of the ISP's make anything better? What I'm worried about is:
* proper congestion control for the whole path, mainly ECN;
* distinction between realtime/interactive traffic (remote I/O, realtime A/V streaming/conferencing, gaming) and the rest.
However, it's not just about transit points dropping or slowing down different classes of traffic; it's also about the behaviour of the system under 100% CPU load or RAM exhaustion, momentary and constant. Ideally, it should delay (dynamically increase buffers), then drop the lowest-priority traffic, then all traffic, then halt network driver and stack activity, and only after that affect the rest. It should not be an "all or nothing" situation where either the network stack always gets what it wants and nothing else does, or it drops all of its activity during any interference.

We can't expect ISPs to behave sanely until state auditing agencies actually start doing their job of technical quality control for communication infrastructure instead of whatever they are doing now. So it all may be an exercise in futility. But at least we can control the endpoints.

If something is assuredly buggy then it should be blacklisted and not exposed to userspace controls. If the defaults were sane, that would have instilled actual trust in the developers' intentions, but right now only distro maintainers bring some kind of sanity, even when they don't understand much more than a random user. For example, by default the entire kernel is configured to the bare minimum, as per Linus' own instructions, and even maintainers are discouraged from enabling debug options. Devs _actually_ expect everyone to make their own situational builds after encountering every single problem, which is ridiculous. So you don't even know which "debug" options just add some useful verbose lines and use an insignificant amount of RAM, and which will crap out your entire log and/or slow down the whole system 10 times over: actual debug versus paranoia. And all that comes with a "safe for a 2-core per 4-sockets 10-year old headless remote-controlled web-server with a fiber-connected high-latency storage-array" default configuration.

More to the point: I tested with codel and TSO disabled, then with cake and TSO/GSO/GRO disabled, but it still halted. cake seems to be able to actually recover (at least partially) on its own after a minute or a few, with increased "delay" stats and some drops, and it doesn't trigger the halt as fast as codel. I `rmmod r8169` just in case and re-load the module. But in the "stuck" state it doesn't recover right after a module & qdisc reload, only seconds or a minute later. I can't find a correlation between settings, network activity (even a hundred or a few KBs is enough to suddenly "choke" after gobbling up tens of gigabytes at the full 50mbits for hours) and the time it takes for the halt to manifest, only that cake uses codel's 'flows' and I'm not keen on changing that.
Comment 12 Sergey Kondakov 2019-09-03 07:56:08 UTC
So, after fiddling around with all kinds of networking parameters and trying to revert the r8169 changes between 5.0 and 5.1, I think I have actually figured out what triggers the issue. It's these being set too low:
* net.core.netdev_budget_usecs lower than 500
* net.core.netdev_budget lower than 100
* net.core.dev_weight lower than 30
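
A quick way to check a machine against these three values is sketched below in Python. The thresholds are only the empirical ones from this comment for this one system, not documented kernel limits, and the names `REPORTED_MINIMUMS`, `read_net_core` and `below_reported_minimums` are hypothetical. The sketch assumes the sysctls are exposed under /proc/sys/net/core, as is usual.

```python
from pathlib import Path

# Lower bounds observed in this comment on one machine; treat them as
# rough hints, not as limits guaranteed by the kernel.
REPORTED_MINIMUMS = {
    "netdev_budget_usecs": 500,
    "netdev_budget": 100,
    "dev_weight": 30,
}


def read_net_core(names, root="/proc/sys/net/core"):
    """Read integer net.core.* sysctls; silently skip unreadable ones."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(root) / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            pass
    return values


def below_reported_minimums(values, minimums=REPORTED_MINIMUMS):
    """Return {name: (current, minimum)} for every value under its bound."""
    return {
        name: (values[name], minimum)
        for name, minimum in minimums.items()
        if name in values and values[name] < minimum
    }


if __name__ == "__main__":
    current = read_net_core(REPORTED_MINIMUMS)
    for name, (value, minimum) in below_reported_minimums(current).items():
        print(f"net.core.{name} = {value} is below the reported {minimum}")
```

An empty result only means the three values are at or above the numbers reported here; it says nothing about other tunables.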

The lower they are, the sooner the qdisc kernel warning comes, and then all transfers on the interface halt. After a while some packets come through, but the backlog keeps filling up and many or most packets get dropped (rx_missed rises in `ethtool -S enp4s0`, and so do drops in `tc qdisc show`). It may recover after 5-10 minutes, but to make sure that it did so fully I have to reinsert the r8169 module (even then, it may not start working normally for tens of seconds), and if the parameters aren't changed, it will happen again. Strangely, when the issue manifests, the coalescence parameters get silently reset to 'rx-frames 1 tx-frames 1 rx-usecs 0 tx-usecs 0' from whatever they had been set to. Otherwise, manual ethtool changes of any parameters do not seem to influence the issue, except maybe, again, too conservative coalescence such as 'rx-usecs 250 tx-usecs 500' or 'rx-frames 22 tx-frames 44'.

During normal operation, I have never seen the backlog be non-zero, codel show its 'memory_used' stat, or anything being dropped or missed. So, in the end, I had to put more generous values into those sysctl parameters and increase my audio buffering from 8 to 12 ms, because otherwise perfectly dropless audio became unsustainable due to that or some other tuning. But WiFi does not seem to influence audio stability anymore either. I did:
* net.core.dev_weight=39
* net.core.dev_weight_rx_bias=3
* net.core.dev_weight_tx_bias=2
* net.core.netdev_budget=117
* net.core.netdev_budget_usecs=751
* net.core.rps_sock_flow_entries=1024
* /sys/class/net/*/queues/rx-0/rps_flow_cnt = 1024 (there is only 1 rx queue)
* ethtool -K enp4s0 tx-nocache-copy off rx on tx on sg on ufo on gro on gso off tso on lro on rxvlan on txvlan on ntuple on rxhash on
* ethtool -C enp4s0 rx-frames 0 tx-frames 0 rx-usecs 125 tx-usecs 250 adaptive-rx on adaptive-tx on (but adaptive doesn't apply and real Rus/Tus is 120/240)
* tc qdisc replace dev enp4s0 root fq_codel limit 102400 flows 10240 target 15ms interval 100ms quantum 3028 ecn
* tc qdisc replace dev wlp3s0 root fq_codel limit 102400 flows 5120 target 20ms interval 100ms quantum 2327 ecn (maxmtu of wifi for some reason is 2304 so I've put quantum at ~1.01 of that "just to make sure", as codel by default puts ethernet's quantum to ~1.01 of 1500 mtu)
* tc qdisc replace dev lo root fq_codel limit 102400 flows 1024 target 4ms interval 20ms ecn (for local DNS caching and sockets of daemons)

Seems fine now. Honestly, I am no longer sure whether it was a regression or whether a change in those sysctl parameters just coincided with the kernel update. Still, it is not adequate behaviour on the kernel's part to hang all network activity semi-randomly like that.
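
For anyone who wants to check their own machine against the three lower bounds listed at the top of this comment, here is a minimal sketch in Python. The bounds are only the empirical values from this one system, not kernel-defined limits, and the helper names (`read_net_core`, `below_thresholds`) are made up for illustration:

```python
from pathlib import Path

# Lower bounds below which the stalls were observed in this report.
# Empirical values from one machine, not limits defined by the kernel.
THRESHOLDS = {
    "netdev_budget_usecs": 500,
    "netdev_budget": 100,
    "dev_weight": 30,
}


def read_net_core(proc_root="/proc/sys/net/core"):
    """Read the current values of the three tunables; missing files are skipped."""
    values = {}
    for name in THRESHOLDS:
        path = Path(proc_root) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def below_thresholds(values):
    """Return the sorted names of tunables that are lower than the reported bounds."""
    return sorted(
        name for name, value in values.items()
        if name in THRESHOLDS and value < THRESHOLDS[name]
    )


if __name__ == "__main__":
    current = read_net_core()
    for name in below_thresholds(current):
        print(f"{name}={current[name]} is below {THRESHOLDS[name]}")
```

With the values I ended up with (39 / 117 / 751) nothing is reported.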
Comment 13 Heiner Kallweit 2019-09-03 08:53:59 UTC
Puh, impressive analysis. From an r8169 maintainer's perspective it's of course good news that the issue doesn't seem to be caused by a driver bug.
I agree that the kernel should handle the situation more gently. The question is which sub-system / module this would refer to.
Comment 14 Sergey Kondakov 2019-09-20 16:23:40 UTC
(In reply to Heiner Kallweit from comment #13)
> Puh, impressive analysis. From an r8169 maintainer's perspective it's of
> course good news that the issue doesn't seem to be caused by a driver bug.
> I agree that the kernel should handle the situation more gently. The
> question is which sub-system / module this would refer to.

Thanks. Have you been able to reproduce it too?

After figuring out safe parameters I haven't had a single network "halt", dropped IP packet, missed Ethernet frame, video frame stutter or audio dropout, even under full system stress-load on an old clunker of mine. Even requeues are almost non-existent. I had to sacrifice some audio latency, but now network, graphics, audio and input are all at their peak interactivity. The YouTube player in FF fills its buffer with big chunks so fast that it spends most of the time waiting, with zero network activity, until it needs the next "portion", and download speed from the German openSUSE distro update build-servers increased from ~300KB to ~1.5MB. That shows that either the old parameters were that bad (which is most likely), they recently upgraded their servers, or some transit line on the path from Russia's midland to Germany got better. Local DNS caching is also finally working correctly; previously I thought that there was some bug in unbound that made it fail semi-randomly.

This holds on both fq_codel and cake; cake even raises its quantum from 300 to 1514 on the WiFi connection by itself. Although, with the more relaxed audio latency, it might not be a problem even if it didn't.

I doubt it's relevant but here's some sysctl overrides that I also did:
net.core.netdev_tstamp_prequeue=0
net.core.somaxconn=4096
net.core.optmem_max=1048576
net.core.rmem_default=1048576
net.core.rmem_max=134217728
net.core.wmem_default=1048576
net.core.wmem_max=134217728
net.ipv4.tcp_mem=32768 196608 262144
net.ipv4.tcp_rmem=16384 1048576 134217728
net.ipv4.tcp_wmem=16384 1048576 134217728
net.ipv4.udp_mem=32768 196608 262144
net.ipv4.udp_rmem_min=16384
net.ipv4.udp_wmem_min=16384
net.ipv4.tcp_mtu_probing=2
net.ipv6.conf.default.mtu=1480
net.ipv4.tcp_min_snd_mss=48
net.ipv4.tcp_base_mss=256
net.ipv4.tcp_timestamps=2
net.ipv4.tcp_reordering=2
net.ipv4.tcp_max_reordering=1000
net.ipv4.tcp_tso_win_divisor=25
net.ipv4.tcp_tw_reuse=2
net.ipv4.tcp_ecn=1
net.ipv4.tcp_allowed_congestion_control=bbr reno cubic scalable highspeed bic cdg dctcp westwood hybla htcp vegas nv veno lp yeah illinois
net.ipv4.tcp_available_congestion_control = reno bbr bic cdg cubic dctcp westwood highspeed hybla htcp vegas nv veno scalable lp yeah illinois
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.ip_local_port_range=18000 65535
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_comp_sack_nr=132
net.unix.max_dgram_qlen=8192
Comment 15 Heiner Kallweit 2019-09-26 08:33:27 UTC
On my test systems used for network driver development I never saw this problem. However they are only slightly loaded and don't have a complex setup like yours.
Comment 17 Sergey Kondakov 2020-10-20 14:00:44 UTC
Ah, I've changed the motherboard to one for the LGA2011 socket, which has 'r8169 0000:0d:00.0 eth0: RTL8168evl/8111evl, XID 2c9, IRQ 66' on it, and the issue returned (not as aggressively and without kernel warnings, but still with needless requeuing, delaying or dropping of packets/connections) even on kernel 5.9.1, but this time no amount of fiddling with any system options seems to help. I've resorted to replacing r8169 with the out-of-tree r8168 and it works flawlessly so far.
Comment 18 Heiner Kallweit 2020-10-20 19:30:39 UTC
Do you run the system with forced interrupt threading (CONFIG_PREEMPT_RT or threadirqs command line parameter)?
Comment 19 Sergey Kondakov 2020-10-20 21:37:16 UTC
(In reply to Heiner Kallweit from comment #18)
> Do you run the system with forced interrupt threading (CONFIG_PREEMPT_RT or
> threadirqs command line parameter)?

Yes, both, in fact, CONFIG_PREEMPT_RT and CONFIG_IRQ_FORCED_THREADING for the sake of low-latency audio and vsync/compositing. Although, I don't see 'threadirqs' parameter now, I probably thought that it was implied by CONFIG_IRQ_FORCED_THREADING or something else and removed it.
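
A quick way to confirm whether 'threadirqs' is really on the running kernel's command line, as a small Python sketch. The helper name is hypothetical. As far as I understand, a CONFIG_PREEMPT_RT kernel forces interrupt threading regardless of this parameter, so a negative result here does not by itself mean that interrupts are not threaded:

```python
from pathlib import Path


def has_threadirqs(cmdline):
    """True if the bare 'threadirqs' token is present on a kernel command line."""
    return "threadirqs" in cmdline.split()


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.is_file():
        print("threadirqs set:", has_threadirqs(cmdline_file.read_text()))
```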
Comment 20 Heiner Kallweit 2020-10-20 22:09:13 UTC
OK, there's a general known issue with napi_schedule_irqoff() under forced irq threading. See following very recent fix:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=424a646e072a887aa87283b53aa6f8b19c2a7bef
It should be backported to the stable versions very soon.
Can you apply this patch and re-test with r8169?
Comment 21 Sergey Kondakov 2020-10-21 12:12:45 UTC
Explicitly added 'threadirqs' parameter and applied linked patch to new 5.9.1 build, used system for about 6 hours with low network activity but then connections started to get "stuck" again.

The kernel log got a suspicious "NETDEV WATCHDOG: enp13s0 (r8169): transmit queue 0 timed out" at about the same time. `ethtool -i enp13s0` says 'firmware-version: rtl8168e-3_0.0.4 03/27/12', by the way. `tc qdisc` at about the same time showed:

qdisc cake 8003: dev enp13s0 root refcnt 2 bandwidth unlimited diffserv4 flows nonat nowash ack-filter no-split-gso rtt 80ms noatm overhead 42 mpu 84 
 Sent 117300483 bytes 894392 pkt (dropped 55, overlimits 0 requeues 7) 
 backlog 0b 0p requeues 7
 memory used: 42014b of 92080Kb
 capacity estimate: 0ibit
 min/max network layer size:           28 /    1560
 min/max overhead-adjusted size:       84 /    1602
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh          0ibit        0ibit        0ibit        0ibit
  target            4ms          4ms          4ms          4ms
  interval         80ms         80ms         80ms         80ms
  pk_delay        1.64s        1.38s          4us         76us
  av_delay        149ms       34.1ms          0us          4us
  sp_delay        279us          1us          0us          2us
  backlog            0b           0b           0b           0b
  pkts            25439       835490           12         3741
  bytes        27602686     89505596          648       196972
  way_inds          147        28855            0            0
  way_miss         3409        30537            3           81
  way_cols            0            0            0            0
  drops               0            3            0            0
  marks               0            1            0            0
  ack_drop            0           52            0            0
  sp_flows           12           26            1            1
  bk_flows            0            0            0            0
  un_flows            0            0            0            0
  max_len          7305         5792           54          590
  quantum          1514         1514         1514         1514

Used Wi-Fi to get r8168 back, `tc qdisc` shows:

qdisc cake 8004: dev enp13s0 root refcnt 2 bandwidth unlimited diffserv4 flows nonat nowash ack-filter no-split-gso rtt 80ms noatm overhead 42 mpu 84 
 Sent 13149956 bytes 91039 pkt (dropped 0, overlimits 0 requeues 1) 
 backlog 0b 0p requeues 1
 memory used: 16640b of 92080Kb
 capacity estimate: 0ibit
 min/max network layer size:           28 /    1560
 min/max overhead-adjusted size:       84 /    1602
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh          0ibit        0ibit        0ibit        0ibit
  target            4ms          4ms          4ms          4ms
  interval         80ms         80ms         80ms         80ms
  pk_delay         13us         13us          0us         11us
  av_delay          3us          3us          0us          4us
  sp_delay          2us          2us          0us          2us
  backlog            0b           0b           0b           0b
  pkts            18348        69261            0          940
  bytes         1497390     11607601            0        44965
  way_inds          339         1410            0            0
  way_miss         2132        11116            0           19
  way_cols            0            0            0            0
  drops               0            0            0            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows           11            2            0            0
  bk_flows            0            0            0            0
  un_flows            0            0            0            0
  max_len           644         5792            0          590
  quantum          1514         1514         1514         1514
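
To compare the two dumps above numerically, the per-tin delay rows can be pulled out of the text. This is a minimal Python sketch that assumes the layout shown above (a tin-name header directly above the `thresh` row, delays suffixed with us/ms/s). The function names and the 1 s limit are illustrative choices of mine, not anything provided by tc:

```python
import re

# Unit suffixes as they appear in the dumps above.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(text):
    """Convert a delay such as '1.64s', '34.1ms' or '4us' to seconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms|s)", text)
    if match is None:
        raise ValueError(f"unrecognised delay: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def tin_delays(tc_output, row="pk_delay"):
    """Map tin name -> delay in seconds for one row of the cake statistics."""
    lines = [line.strip() for line in tc_output.splitlines()]
    tins = values = None
    for prev, cur in zip(lines, lines[1:]):
        fields = cur.split()
        if fields[:1] == ["thresh"]:
            # Tin names sit on the line above 'thresh'; split on runs of
            # two or more spaces so that "Best Effort" stays one name.
            tins = re.split(r"\s{2,}", prev)
        elif fields[:1] == [row]:
            values = fields[1:]
    if tins is None or values is None or len(tins) != len(values):
        raise ValueError(f"could not find a usable {row!r} row")
    return {tin: to_seconds(value) for tin, value in zip(tins, values)}


def slow_tins(tc_output, limit_s=1.0):
    """Names of tins whose peak delay exceeds limit_s (an assumed threshold)."""
    return [tin for tin, delay in tin_delays(tc_output).items() if delay > limit_s]
```

Fed the r8169 dump, `slow_tins` should return the Bulk and Best Effort tins (1.64s and 1.38s), while for the r8168 dump it returns an empty list.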
Comment 22 Heiner Kallweit 2020-10-21 12:37:02 UTC
OK, thanks for testing. r8169 and r8168 use different features of net subsystem, therefore the test result doesn't necessarily indicate a driver bug. Best would still be a bisect, but as you wrote earlier it's difficult in your setup.
r8169 uses the byte queue limit feature, not sure whether this could be related to your problem. But it might be worth a try to adjust BQL setting via sysfs.
Comment 23 Sergey Kondakov 2020-10-22 00:50:17 UTC
Yes, bisecting is almost impossible, especially since the problem often manifests slowly and gradually, and I don't even know if that exact chip ever worked fine on r8169. `tc qdisc` and the lack of scarier dmesg messages don't actually show how extensive the problem is: the drop and requeue numbers may not be big, but once they start counting at all, most connections slow down and/or freeze as if all servers had started to respond slowly or not at all. Gajim, a Python XMPP/Jabber client, straight up crashes and can't restart until I `rmmod r8169` (or at least switch off the link) because it just can't handle whatever is happening in the network stack.

I might play around with BQL limits under r8169 when I feel like checking it out again, but isn't that only for transmit and not receive?
Comment 24 Heiner Kallweit 2020-10-22 13:51:05 UTC
W/o bisect basically everything is a shot in the dark. What else you could try:
- use r8169 from 5.0 on top of a recent kernel (you said it was still ok in 5.0)
- rx irq coalescing was removed in 5.1, as a better replacement you could do:
  use latest linux-next and set:
  echo 20000 > /sys/class/net/<if>/gro_flush_timeout
  echo 1 > /sys/class/net/<if>/napi_defer_hard_irqs

Both your affected systems have a RTL8168evl. However this chip version is very common, therefore something seems to be special in your setup, else I would expect much more such reports.
Comment 25 Sergey Kondakov 2020-10-23 07:09:18 UTC
Created attachment 293147 [details]
r8169_stuck-on_5.9.1.txt

Tried out napi-patched 5.9.1 with napi_defer_hard_irqs and gro_flush_timeout options, it worked perfectly for about 12 hours of uptime and then got stuck completely with a familiar kernel trace. So far the only consistent things I've noticed are:
1) r8169 uses MSI-X while r8168 uses legacy MSI.
2) If coalesce parameters were customized for r8169, they're silently reset to default 0/1-0/1 when glitching starts.
3) Usually pk_delay of tc-cake is <20us even on sustained ~100% all-core CPU loads but on glitch it maxes out to >1s.
4) For AM3 motherboard and 5.2 kernel it was enough to increase net.core.netdev_budget* but here it seems to have no influence now.
5) Torrent downloading with summary speed of >2MB/s, a lot of active seeders and a lot of connections allowed in a client while having XMPP client logged-in and playing streamed video in background seems to be a good way to trigger it. Except that it may work fine without a single drop or delay for a while.

A bisect would be just as much of a shot in the dark with such a long trigger; it would be easier to buy a PCIe Ethernet card with another chip. Besides, to properly install a kernel here it has to be packaged on a build-server. I haven't tried manual installation for years, since dropping Gentoo.

As for what's special on my system: pretty much all tunables are set for the most aggressive preemption and process scheduling latency possible. As much of the kernel's interrupt handling as possible is offloaded to threads, and audio/video/input/SSD access is bumped in priority while networking and HDD/complex I/O scheduling is sacrificed.

For example, scheduler precision is maximized with:
kernel.sched_autogroup_enabled=0 - crucial for priorities to work as expected.
# inspired by https://probablydance.com/2019/12/30/measuring-mutexes-spinlocks-and-how-bad-the-linux-scheduler-really-is/
kernel.sched_latency_ns=100000
kernel.sched_min_granularity_ns=100000
kernel.sched_wakeup_granularity_ns=99
kernel.sched_nr_migrate=12
kernel.sched_migration_cost_ns=99000
kernel.timer_migration=0
kernel.sched_cfs_bandwidth_slice_us=100
kernel.sched_tunable_scaling=1
kernel.sched_child_runs_first=1
kernel.sched_rt_period_us=2000000
kernel.sched_rt_runtime_us=1500000
kernel.sched_rr_timeslice_ms=1
So anything that is not expecting to be preempted will have a bad time for the sake of me not having a bad time due to output stuttering and UI hanging.
And RCU stuff: rcu_nocbs=0-126 rcu_nocb_poll rcutree.kthread_prio=1 rcutree.use_softirq=0 rcupdate.rcu_task_ipi_delay=3333 rcutree.rcu_idle_lazy_gp_delay=4 rcutree.rcu_idle_gp_delay=1 io_delay=none
Although, at the beginning I've tested this bug with much more relaxed scheduling & RCU parameters and without CONFIG_PREEMPT_RT
Comment 26 WGH 2020-10-24 09:01:59 UTC
I had the same driver hang on my new system that I have been running for a week. It's a new RTL8125B, XID 641 (support for which got mainlined only in Linux 5.9). The problem occurred when I attempted to connect to a Windows 7 VM via RDP, but that might be just a coincidence.

[94171.732301] NETDEV WATCHDOG: enp6s0 (r8169): transmit queue 0 timed out
[94171.732319] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x232/0x240
[94171.732321] Modules linked in: macvtap macvlan tap veth xt_MASQUERADE xt_CHECKSUM xt_comment bridge stp llc ip6table_raw ip6table_nat iptable_raw iptable_nat bpfilter fuse btrfs blake2b_generic xor zstd_compress lzo_compress raid6_pq sctp kvm_amd kvm amdgpu irqbypass mfd_core gpu_sched ttm ghash_clmulni_intel nct6775 hwmon_vid k10temp efivarfs
[94171.732345] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.1-gentoo #6
[94171.732348] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B550 Extreme4, BIOS P1.20 08/13/2020
[94171.732353] RIP: 0010:dev_watchdog+0x232/0x240
[94171.732357] Code: 85 c0 75 e5 eb 9c 4c 89 ef c6 05 c5 e6 c9 00 01 e8 b3 1d fc ff 44 89 e1 48 89 c2 4c 89 ee 48 c7 c7 c8 12 a8 91 e8 1e 9d 7e ff <0f> 0b e9 7a ff ff ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 c7 47
[94171.732360] RSP: 0018:ffff9e8d00003ea0 EFLAGS: 00010286
[94171.732363] RAX: 0000000000000000 RBX: ffff8c1aa331d600 RCX: 0000000000000000
[94171.732365] RDX: ffff8c1aaea278a0 RSI: ffff8c1aaea17820 RDI: 0000000000000300
[94171.732368] RBP: ffff8c1aa306a440 R08: ffff8c1aaea17820 R09: 00000000000006e3
[94171.732370] R10: ffffffff922cdd78 R11: ffff9e8d00003d48 R12: 0000000000000000
[94171.732372] R13: ffff8c1aa306a000 R14: ffff8c1aa306a440 R15: 0000000000000000
[94171.732374] FS:  0000000000000000(0000) GS:ffff8c1aaea00000(0000) knlGS:0000000000000000
[94171.732377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[94171.732380] CR2: 00007f8d50009048 CR3: 0000001e9b98a000 CR4: 0000000000350ef0
[94171.732382] Call Trace:
[94171.732385]  <IRQ>
[94171.732390]  ? qdisc_put_unlocked+0x30/0x30
[94171.732395]  call_timer_fn+0x2d/0x130
[94171.732399]  run_timer_softirq+0x393/0x450
[94171.732403]  ? tick_sched_handle.isra.0+0x40/0x40
[94171.732405]  ? __hrtimer_run_queues+0xfd/0x260
[94171.732408]  ? ktime_get+0x4a/0xc0
[94171.732412]  __do_softirq+0xe1/0x2bf
[94171.732416]  asm_call_irq_on_stack+0x12/0x20
[94171.732418]  </IRQ>
[94171.732423]  do_softirq_own_stack+0x36/0x40
[94171.732427]  irq_exit_rcu+0x9a/0xa0
[94171.732432]  sysvec_apic_timer_interrupt+0x2e/0x80
[94171.732435]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[94171.732440] RIP: 0010:cpuidle_enter_state+0xd5/0x380
[94171.732443] Code: c4 0f 1f 44 00 00 31 ff e8 88 77 91 ff 80 7c 24 0f 00 74 12 9c 58 f6 c4 02 0f 85 8d 02 00 00 31 ff e8 7f c0 96 ff fb 45 85 f6 <0f> 88 20 01 00 00 49 63 c6 be 68 00 00 00 4c 2b 24 24 48 89 c2 48
[94171.732446] RSP: 0018:ffffffff91c03e58 EFLAGS: 00000202
[94171.732448] RAX: ffff8c1aaea2a5c0 RBX: ffff8c1aa5d6b000 RCX: 000000000000001f
[94171.732451] RDX: 0000000000000000 RSI: 0000000021bf5c7a RDI: 0000000000000000
[94171.732453] RBP: ffffffff91cdce00 R08: 000055a610a6578d R09: ffff8c1aa77c9000
[94171.732455] R10: ffff8c1aaea29584 R11: ffff8c1aaea29564 R12: 000055a610a6578d
[94171.732458] R13: ffffffff91cdcee8 R14: 0000000000000002 R15: ffff8c1aa5d6b000
[94171.732463]  ? cpuidle_enter_state+0xb8/0x380
[94171.732466]  cpuidle_enter+0x37/0x60
[94171.732470]  do_idle+0x1c9/0x240
[94171.732473]  cpu_startup_entry+0x19/0x20
[94171.732476]  start_kernel+0x50a/0x52c
[94171.732480]  secondary_startup_64+0xa4/0xb0
[94171.732484] ---[ end trace 896922ae98389a20 ]---
[94171.744374] r8169 0000:06:00.0 enp6s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
[94171.752789] r8169 0000:06:00.0 enp6s0: rtl_rxtx_empty_cond_2 == 0 (loop: 42, delay: 100).
(I did link up/down at this point)
[94262.048882] r8169 0000:06:00.0 enp6s0: Link is Down
[94264.253871] RTL8125B 2.5Gbps internal r8169-600:00: attached PHY driver [RTL8125B 2.5Gbps internal] (mii_bus:phy_addr=r8169-600:00, irq=IGNORE)
[94264.418592] r8169 0000:06:00.0 enp6s0: Link is Down
Comment 27 Heiner Kallweit 2020-10-24 09:35:54 UTC
(In reply to WGH from comment #26)
> I had the same driver hang on my new system that I have been running for a
> week. It's new RTL8125B, XID 641 (support for which got mainlined only in
> Linux 5.9). The problem occurred when I attempted to connect to Windows 7 VM
> via RDP, but that might be just a coincidence.
> 
This bug report is about a regression in 5.1. The trace you posted is a generic tx timeout; the root cause can be anything. Also the report is about a different chip version. Having said that, it's most likely a different issue. What you could do:
- check whether you can reproduce the issue
- check the 5.9-rc kernels whether there was one w/o this issue, so that you 
  have a basis for a bisect
Comment 28 WGH 2020-10-24 09:39:46 UTC
RDP activity is triggering this bug very easily somehow. No other activity including downloading packages, downloading games from Steam, running iperf in all possible directions, copying VM images with SSH+rsync could trigger this.

The first time I got this stacktrace AND two rtl_rxtx_empty_cond warnings.

The next times connectivity disappears, and after ~5-60 seconds I get two rtl_rxtx_empty_cond warnings, and connectivity recovers.
Comment 29 Zhuravlev Uriy 2021-02-12 10:39:10 UTC
Hello all! 

I have the same issue, not only with r8169 but also with e1000e and ibt (I have many devices), but it's mostly gone if I change fq_codel to pfifo_fast. 

> tc qdisc replace dev eth1 root pfifo_fast

Interestingly, it happens only with a 1Gb connection; at 100Mb it is stable.
I suppose we have some regression in fq_codel itself. All kernels from 5.4 up to 5.10 are affected (that's what I tested).
Comment 30 Sergey Kondakov 2021-02-12 11:01:40 UTC
(In reply to Zhuravlev Uriy from comment #29)
> Hello all! 
> 
> I have the same issue but not only for r8169 but also for e1000e and ibt (I
> have many devices) but it's mostly gone if I change fq_codel to pfifo_fast. 
> 
> > tc qdisc replace dev eth1 root pfifo_fast
> 
> Interestingly, it happens only with a 1Gb connection; at 100Mb it is stable.
> I suppose we have some regression in fq_codel itself. All kernels from 5.4
> up to 5.10 are affected (that's what I tested).

I can also reproduce it with tc-cake (with the 'no-split-gso' option) and even tc-sfq. And it seems to be a lower-level issue because it started happening on the third-party r8168 too, as well as on the kernel's r8169, now even without complaints in dmesg, or not happening after complaints like "WARNING: CPU: 7 PID: 75 at kernel/softirq.c:484 __raise_softirq_irqoff+0x64/0x110" with a trace starting with "napi_watchdog+0x78/0x90". But with codel and cake it happens much sooner and more often; only a full reboot fixes things, while just resetting the driver modules or replacing the qdisc after the fact doesn't. There is no relevant output in dmesg, as if network activity hanging is OK.

After some tweaks it seems to work, despite the aforementioned warnings at boot-up. Like:
* not enabling /sys/class/net/${interface}/napi_defer_hard_irqs
* setting:
net.core.dev_weight=8
net.core.dev_weight_rx_bias=3
net.core.dev_weight_tx_bias=2
net.core.busy_poll=5
net.core.netdev_max_backlog=1024
net.core.netdev_budget=48
net.core.netdev_budget_usecs=5000
* doing this at interface bring-up:
for txq in /sys/class/net/${interface}/queues/tx-*; do
        echo "42" > ${txq}/byte_queue_limits/limit_min
        echo "65536" > ${txq}/byte_queue_limits/limit_max
done
echo "9999" > /sys/class/net/${interface}/gro_flush_timeout
ip l set "${interface}" gso_max_size 8192 gso_max_segs 8 multicast on txqueuelen 256
ip l set "${interface}" mtu 9194

But it's all complete guesswork.
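
To verify that the per-interface values above actually stuck after bring-up, they can be read back from sysfs. This is a rough Python sketch, assuming the same sysfs paths as in the shell snippet above; the function names are hypothetical and the `sysfs_root` argument exists only so the code can be pointed at a test directory:

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is absent."""
    path = Path(path)
    if not path.is_file():
        return None
    return int(path.read_text().strip())


def tx_tuning(interface, sysfs_root="/sys/class/net"):
    """Collect GRO/NAPI settings and per-queue BQL limits for one interface."""
    base = Path(sysfs_root) / interface
    report = {
        "gro_flush_timeout": read_int(base / "gro_flush_timeout"),
        "napi_defer_hard_irqs": read_int(base / "napi_defer_hard_irqs"),
        "bql": {},
    }
    for txq in sorted((base / "queues").glob("tx-*")):
        bql = txq / "byte_queue_limits"
        # (limit_min, limit_max) for each transmit queue
        report["bql"][txq.name] = (
            read_int(bql / "limit_min"),
            read_int(bql / "limit_max"),
        )
    return report
```

For example, `tx_tuning("enp13s0")` would be expected to report `(42, 65536)` for `tx-0` if the loop above was applied; a `None` means the file does not exist on that kernel.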
Comment 31 Zhuravlev Uriy 2021-02-14 03:19:11 UTC
I also solved this issue - just disable the ASPM for the kernel (pcie_aspm=off).
Comment 32 Sergey Kondakov 2021-02-14 03:37:24 UTC
(In reply to Zhuravlev Uriy from comment #31)
> I also solved this issue - just disable the ASPM for the kernel
> (pcie_aspm=off).

Except that both my boards are too old to have it enabled by default and I've force-disabled it in the kernel from the beginning anyway. Also, there were some reports that it doesn't actually get disabled for good unless you put 'pcie_aspm=force pcie_aspm.policy=performance'; I don't know if that ever got fixed.
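
The ASPM policy the kernel has currently selected can be read from /sys/module/pcie_aspm/parameters/policy, where the active entry is shown in square brackets. A small Python sketch for extracting it follows (the helper name is made up). Note that this only shows the selected policy; the per-link state is a separate matter and is better checked in the LnkCtl lines of `lspci -vv`:

```python
import re
from pathlib import Path

POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(policy_text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if POLICY_FILE.is_file():
        print("active ASPM policy:", active_aspm_policy(POLICY_FILE.read_text()))
    else:
        print("no pcie_aspm policy file on this system")
```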
