Created attachment 278555 [details]
kernel config

Copying from the machine to another server (the protocol does not matter) causes a kernel crash when a tc setting with SFQ is active.

The machine has a Qualcomm Killer NIC:

lspci |grep Killer
03:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 13)

I use traffic control with SFQ:

tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0

Now I try to copy a big file (124GB, an image of a partition) to an NFS share on another Linux server (same kernel version). It does not matter whether it is an NFS, Samba or any other kind of share. It also does not matter whether I use the cp or the rsync command.

The target share is for example:

grep base /proc/mounts
jaguar.grafnetz:/base /mnt/base nfs4 rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.9,local_lock=none,addr=192.168.0.7 0 0

df shows this NFS share, called base, when mounted:

jaguar.grafnetz:/base 11718572032 6012592128 5705979904 52% /mnt/base

Now I use a simple cp command:

cp big-fime.dd.image /mnt/base/test_01

The machine crashes after 7833735168 bytes have reached the target server, about 7.8 GB (with G=1000^3). I can reproduce this crash.

The good thing is: I figured out that no kernel crash happens when I do not use:

tc qdisc add dev enp3s0 root handle 1: sfq
tc qdisc show dev enp3s0

(So I commented it out of my local start script and rebooted the system.)

Result: no crash any more. Copying the big file (124GB) completed without a kernel crash.

Additional Information...
NIC is configured with IPv4:

haswell ~ # ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.9  netmask 255.255.255.0  broadcast 192.168.0.255
        ether d4:3d:7e:bd:89:44  txqueuelen 1000  (Ethernet)
        RX packets 7399483  bytes 511559908 (487.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 91781850  bytes 47176316774 (43.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 19

ethtool enp3s0
Settings for enp3s0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Current message level: 0x000060e4 (24804)
                               link ifup rx_err tx_err hw wol
        Link detected: yes

While copying over the Gigabit-Network, speed is near maximum:

ifstat
      enp2s0
 KB/s in  KB/s out
    0.06      0.18
 8348.65     31.60
117536.2    435.11
118049.0    435.04
119100.9    434.84
118889.7    435.19
119004.1    444.53
119061.4    440.47
119102.8    444.04
119077.4    444.39
119084.1    432.32
119089.6    439.71
[...]

So, perhaps the sfq-Kernel-module has a bug. I use the vanilla kernel from kernel.org and sfq is compiled as a module.

/usr/src/linux # grep SFQ .config
CONFIG_NET_SCH_SFQ=m

Perhaps important: the server with the target-share also uses sfq with the same settings without a problem. It runs stable.
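The on/off comparison from the report (crash with sfq attached, no crash without it) could probably also be run without editing the start script and rebooting. Below is a minimal sketch, not a tested procedure: `sfq_ab` and the `RUN` variable are hypothetical names made up for illustration, the "on" command is the one from the report, and the "off" command assumes that deleting the root qdisc with `tc qdisc del dev DEV root` is enough to fall back to the default qdisc. By default the helper only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper for one leg of an A/B test (sfq attached vs. removed).
# Default is a dry run that prints the command; set RUN=1 to execute it (needs root).
sfq_ab() {
    dev="$1"
    mode="$2"
    case "$mode" in
        # Command taken from the report.
        on)  cmd="tc qdisc add dev $dev root handle 1: sfq" ;;
        # Assumption: removing the root qdisc restores the default one.
        off) cmd="tc qdisc del dev $dev root" ;;
        *)   echo "usage: sfq_ab DEV on|off" >&2; return 2 ;;
    esac
    if [ "${RUN:-0}" = "1" ]; then
        # Run the command, then show the resulting qdisc as in the report.
        $cmd && tc qdisc show dev "$dev"
    else
        echo "$cmd"
    fi
}

sfq_ab enp3s0 on
sfq_ab enp3s0 off
```

Running the same copy once after `RUN=1 sfq_ab enp3s0 on` and once after `RUN=1 sfq_ab enp3s0 off` would give the same comparison as above within a single boot.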
The ifstat-output is from the target machine. So the incoming file is shown as "KB/s in".
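For reference, the figures quoted in the report can be converted with a few lines of shell. This is only a rough sketch: `kbs_to_mbit` and `bytes_to_gb` are hypothetical helper names, and the first one assumes that ifstat's "KB" means 1024 bytes (pass 1000 as the second argument if it means 1000 bytes). Either way the result is close to the 1000Mb/s link speed that ethtool reports.

```shell
#!/bin/sh
# Convert an ifstat rate in KB/s to Mbit/s.
# Assumption: 1 KB = 1024 bytes unless a different unit is passed as $2.
kbs_to_mbit() {
    LC_ALL=C awk -v kbs="$1" -v unit="${2:-1024}" \
        'BEGIN { printf "%.1f\n", kbs * unit * 8 / 1000000 }'
}

# Convert a byte count to decimal gigabytes (G = 1000^3, as in the report).
bytes_to_gb() {
    LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'
}

kbs_to_mbit 119084.1        # one of the steady-state "KB/s in" samples; prints 975.5
kbs_to_mbit 119084.1 1000   # same sample with 1 KB = 1000 bytes; prints 952.7
bytes_to_gb 7833735168      # bytes received before the crash; prints 7.83
```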
Same behaviour with fresh Linux Kernel 4.18.8.
Hi Mario, thanks for your report. Can you please provide a stack trace for the crash?
Created attachment 278639 [details]
Picture of the Screen after the Oops.

The messages after the copy command.
Created attachment 279143 [details]
netem crash kernel also

I hit almost the same issue and reproduced it on 4.14.71.

After system startup, run:

[~]# tc qdisc add dev eth0 root netem delay 5ms

Then just ping the IP address of eth0 from another machine, and the kernel will crash.

4.14.70 is ok under the same operation
4.19 is ok
4.18.10+ is ok, couldn't confirm versions before 4.18.10
4.14.79: netem is ok.
I have a similar situation.

1. From another computer, send a file to my server.
scp some_file to _my_server

2. Add an sfq qdisc to the default class (see the last line)
tc qdisc add dev ifb1 root handle 0:0 hfsc default 100
tc class add dev ifb1 parent 0:0 classid 1 hfsc sc rate 500mbit ul rate 500mbit
tc class add dev ifb1 parent 0:1 classid 100 hfsc sc rate 200mbit ul rate 200mbit
-> tc qdisc add dev ifb1 parent 0:100 sfq perturb 10

3. A kernel crash occurs without any log. The whole system freezes.

The problem occurs with any 5.x kernel (already tested 5.3.10). With 4.x, it works like a charm.
Same situation with Kernel 5.3.11.
I upgraded my kernel to version 5.8.8, and the freezes continue. (I used the same .config file that I compiled 4.x with, and on a 4.x kernel it is ok.)

1. From another computer, send a file to my server.
$ scp test.bin to_my_server

2. Add an sfq qdisc to the default class (see the last line)
$ tc qdisc add dev ifb1 root handle 0:0 hfsc default 100
$ tc class add dev ifb1 parent 0:0 classid 1 hfsc sc rate 500mbit ul rate 500mbit
$ tc class add dev ifb1 parent 0:1 classid 100 hfsc sc rate 200mbit ul rate 200mbit
$ -> tc qdisc add dev ifb1 parent 0:100 sfq perturb 10

3. A kernel crash occurs without any log. The whole system freezes.

Thanks in advance.