Bug 206497 - pfifo_fast: netdev xmit sends frames OOO to devices
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-11 13:08 UTC by Chaitanya T K
Modified: 2020-06-16 15:39 UTC

See Also:
Kernel Version: 5.4.18
Subsystem:
Regression: No
Bisected commit-id:



Description Chaitanya T K 2020-02-11 13:08:19 UTC
When running iperf3 on a single-queue device (a WLAN device), iperf3 reports out-of-order (OOO) frames. Investigation showed that the frames are already out of order when they reach ndo_xmit(), i.e. the stack itself hands them to the driver out of order. Only a few frames per stream are affected.

$ sudo tc -s qdisc show dev wlp4s0f0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 792082458 bytes 523933 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
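
For reference, the priomap shown above is what pfifo_fast uses to pick one of its 3 bands: the map is indexed with the low four bits of the packet priority, and band 0 is dequeued first. Below is a minimal sketch of that lookup using the priomap values from the output above. The function name is illustrative only and is not a kernel API.

```python
# priomap copied from the `tc -s qdisc show` output above (16 entries).
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
TC_PRIO_MAX = 15  # only the low four bits of the priority are used


def band_for_priority(priority: int) -> int:
    """Return the pfifo_fast band (0 is served first) for a packet priority."""
    return PRIOMAP[priority & TC_PRIO_MAX]


if __name__ == "__main__":
    for prio in (0, 1, 6):
        print(f"priority {prio} -> band {band_for_priority(prio)}")
```

Packets that carry the same priority land in the same band, so a single stream would be expected to leave the qdisc in FIFO order.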


Four iperf3 sessions are run on different ports, with the rules below used to prioritize them:

iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5201 -j DSCP --set-dscp-class cs1
iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5202 -j DSCP --set-dscp-class cs0
iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5203 -j DSCP --set-dscp-class cs4
iptables -A OUTPUT -o lo -t mangle -d localhost -p udp --dport 5204 -j DSCP --set-dscp-class cs6

FYI, my problem seems to be quite similar to the one in https://www.spinics.net/lists/netdev/msg490934.html but that fix is already part of 5.4.18.

Two workarounds found:
 - Booting with `nosmp` (similar to commenting out TCQ_F_NOLOCK, which I haven't tried).
 - Switching to fq_codel also solves the problem (I forgot the exact tc parameters used for fq_codel).
Comment 1 Chaitanya T K 2020-02-12 15:33:34 UTC
A couple of comments:

- It wasn't fq_codel that I tried, as our driver doesn't support BQL (I can't recollect what it was; I had just copy-pasted it).
- Tried running https://raw.githubusercontent.com/jakob-tsd/pfifo_stress/master/pfifo_stress.py but saw no issues (as we are using wireless devices, I commented out setup_network and configured them manually).
Comment 2 Chaitanya T K 2020-02-12 16:35:22 UTC
Spoke too soon:

Able to repro with the direction changed:

$ sudo ./pfifo_stress.py 
expected ctr 0x1, received 0x41a2
expected ctr 0x2e481c, received 0x2e4838
expected ctr 0x2e493f, received 0x2e496d
expected ctr 0x2e497d, received 0x2e497e
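
The "expected ctr ..., received ..." lines above suggest a receiver that compares each packet's counter against the next value it expects. Below is a minimal sketch of that style of check. It is not the code of pfifo_stress.py; the function name and the resync-after-mismatch behaviour are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def find_mismatches(counters: Iterable[int], start: int = 1) -> List[Tuple[int, int]]:
    """Return (expected, received) pairs for every counter that breaks the sequence.

    After each packet the checker resyncs: the next expected value is always
    the last received counter plus one.
    """
    expected = start
    mismatches = []
    for received in counters:
        if received != expected:
            mismatches.append((expected, received))
        expected = received + 1
    return mismatches


if __name__ == "__main__":
    # Counters 3 and 4 arrive swapped in this hypothetical sequence.
    for exp, got in find_mismatches([1, 2, 4, 3, 5]):
        print(f"expected ctr {exp:#x}, received {got:#x}")
```

A check like this flags loss as well as reordering, so a mismatch on its own does not show which of the two occurred.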
Comment 3 Chaitanya T K 2020-02-12 16:37:16 UTC
uname `Linux BH2-LAB-13 5.4.16-050416-generic #202001300040 SMP Thu Jan 30 06:06:07 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux`

nr_cpus=4


$ sudo tc -s qdisc show dev wlp4s0f0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 11539860196 bytes 250866528 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 

$ sudo tc -s qdisc show dev wlp8s0f0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 364 bytes 6 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
Comment 4 Chaitanya T K 2020-02-20 09:49:19 UTC
Update from the netdev mailing list:

The issue looks like a duplicate of https://www.spinics.net/lists/linux-can/msg03239.html, which was bisected to the commit below. A few patches were proposed but none works; possibly reverting is the only solution.

ba27b4cdaaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")
Comment 5 Chaitanya T K 2020-05-21 10:58:57 UTC
Any update on this? I saw some discussion on the list, but it seems to have died down?
Comment 6 Cong Wang 2020-06-12 19:22:27 UTC
Does the following commit fix this bug?


commit 379349e9bc3b42b8b2f8f7a03f64a97623fff323
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Feb 18 18:15:44 2020 +0100

    Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"
    
    This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe
    
    Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6
    ("net: dev: introduce support for sch BYPASS for lockless qdisc").
    I can't find any working solution other than a plain revert.
    
    This will introduce some minor performance regressions for
    pfifo_fast qdisc. I plan to address them in net-next with more
    indirect call wrapper boilerplate for qdiscs.
    
    Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
    Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Comment 7 Chaitanya T K 2020-06-16 15:39:12 UTC
I have installed the 5.7 kernel (5.7.0-050700-generic #202006082127 x86_64), which has that commit, ran the tests below, and can confirm that the issue is resolved.

* iperf 4 stream test
* pfifo_stress.py
