When running iperf3 on a single-queue device (a WLAN device), iperf3 reports out-of-order (OOO) frames. After investigating, I found that the frames are already out of order at ndo_xmit() itself (only a few per stream).

$ sudo tc -s qdisc show dev wlp4s0f0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 792082458 bytes 523933 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

Four iperf3 sessions are run on different ports, with the rules below to prioritize them:

iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5201 -j DSCP --set-dscp-class cs1
iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5202 -j DSCP --set-dscp-class cs0
iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5203 -j DSCP --set-dscp-class cs4
iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5204 -j DSCP --set-dscp-class cs6

FYI, my problem seems quite similar to the one in https://www.spinics.net/lists/netdev/msg490934.html, but that fix is already part of 5.4.18.

Two workarounds found:
- booting with `nosmp` (similar to commenting out TCQ_F_NOLOCK, which I haven't tried)
- switching to fq_codel, which solves the problem (I forgot the exact tc parameters for fq_codel)
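For reference, the class selectors used in the rules above translate to DSCP values and TOS bytes as sketched below. This is only an illustration of the standard encoding (CSn is DSCP n*8, and DSCP occupies the upper six bits of the TOS byte); the helper name `class_selector_to_tos` and the `PORT_CLASSES` table are made up for this example and are not part of iptables or iperf3.

```python
# Hypothetical helper: convert a DSCP class selector name (cs0..cs7)
# to its DSCP value and the resulting IPv4 TOS byte.
def class_selector_to_tos(name):
    lowered = name.lower()
    if not (lowered.startswith("cs") and lowered[2:].isdigit()):
        raise ValueError(f"not a class selector: {name!r}")
    n = int(lowered[2:])
    if not 0 <= n <= 7:
        raise ValueError(f"class selector out of range: {name!r}")
    dscp = n << 3      # CSn uses the top three bits of the six-bit DSCP field
    tos = dscp << 2    # DSCP sits above the two ECN bits in the TOS byte
    return dscp, tos

# Port-to-class mapping copied from the iptables rules in this report.
PORT_CLASSES = {5201: "cs1", 5202: "cs0", 5203: "cs4", 5204: "cs6"}

for port, cls in PORT_CLASSES.items():
    dscp, tos = class_selector_to_tos(cls)
    print(f"port {port}: {cls} -> DSCP {dscp}, TOS {tos:#04x}")
```

This only shows what the mangle rules write into the IP header; how the kernel and the wireless stack then map that value to a queue or band is not covered here.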
A couple of comments:
- It wasn't fq_codel that I tried, as our driver doesn't support BQL (I can't recollect what it was; I just copy-pasted it).
- I tried running https://raw.githubusercontent.com/jakob-tsd/pfifo_stress/master/pfifo_stress.py but saw no issues (as we are using wireless devices, I commented out setup_network and configured them manually).
Spoke too soon: I am able to reproduce it with the direction changed:

$ sudo ./pfifo_stress.py
expected ctr 0x1, received 0x41a2
expected ctr 0x2e481c, received 0x2e4838
expected ctr 0x2e493f, received 0x2e496d
expected ctr 0x2e497d, received 0x2e497e
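The "expected ctr ..., received ..." lines above suggest a sequence-counter check on the receive side. A minimal sketch of that kind of check follows; it is not the actual pfifo_stress.py code, the function name `find_out_of_order` is hypothetical, and the resync rule (expect the last received value plus one) is an assumption.

```python
# Hypothetical sketch of a receive-side ordering check: every packet carries
# an increasing counter, and any packet whose counter is not the expected
# next value is reported.
def find_out_of_order(counters):
    """Return (expected, received) pairs for every counter that breaks the sequence."""
    mismatches = []
    expected = None  # unknown until the first packet arrives
    for received in counters:
        if expected is not None and received != expected:
            mismatches.append((expected, received))
        # Assumption: resynchronise on whatever was actually received.
        expected = received + 1
    return mismatches

# Example: packets 3 and 4 swapped on the wire.
for expected, received in find_out_of_order([1, 2, 4, 3, 5, 6]):
    print(f"expected ctr {expected:#x}, received {received:#x}")
```

With this resync rule a single swapped pair produces several reports, so the number of printed lines is not the number of reordered packets.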
uname: `Linux BH2-LAB-13 5.4.16-050416-generic #202001300040 SMP Thu Jan 30 06:06:07 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux`
nr_cpus=4

$ sudo tc -s qdisc show dev wlp4s0f0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 11539860196 bytes 250866528 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

$ sudo tc -s qdisc show dev wlp8s0f0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 364 bytes 6 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
Update from the netdev mailing list: the issue looks like a duplicate of https://www.spinics.net/lists/linux-can/msg03239.html. It was bisected to the commit below. A few patches were proposed but none of them works, so reverting is possibly the only solution.

ba27b4cdaaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")
Any update on this? I saw some discussion on the list, but then it died down.
Does the following commit fix this bug?

commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Feb 18 18:15:44 2020 +0100

    Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"

    This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe

    Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6
    ("net: dev: introduce support for sch BYPASS for lockless qdisc").
    I can't find any working solution other than a plain revert.

    This will introduce some minor performance regressions for
    pfifo_fast qdisc. I plan to address them in net-next with more
    indirect call wrapper boilerplate for qdiscs.

    Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
    Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
I installed the 5.7 kernel (5.7.0-050700-generic #202006082127 x86_64), which has that commit, ran the tests below, and can confirm that the issue is resolved:
* iperf 4-stream test
* pfifo_stress.py