Bug 109581
Summary: | WARNING: CPU: 1 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x166/0x2da() | ||
---|---|---|---|
Product: | Networking | Reporter: | Pawel Staszewski (pstaszewski) |
Component: | Other | Assignee: | Stephen Hemminger (stephen) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | allfox, arch, asilva, bugzilla.kernel.org, ccav2000, koct9i, kosmo, lucas.bocchi, marco, matthew4196, michael, nbehnken, philipp, sebastian.kricner, sebastian_ml, shemming, stasn77, szg00000, toracat, xiyou.wangcong |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.3.3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
attachment-29507-0.html
Log of git bisect Log of 2nd git bisect |
Description
Pawel Staszewski
2015-12-18 17:12:15 UTC
How to reproduce this? (In reply to Cong Wang from comment #1) > How to reproduce this? I'm periodically getting the same warnings using hfsc+fq_codel. I have thousands such classes+qdiscs and I don't know exactly wich of them causing warn_on. Changing qdisc to sfq (or any other not codel) solves problem. If I replace to HFSC->PRIO->3xFQ_CODELs then WARN_ON disappears, but qdisc_warn_nonwc appears: [Вс янв 3 08:02:00 2016] qdisc_peek_len: prio qdisc E52A: is non-work-conserving? [Вс янв 3 08:24:18 2016] HFSC: prio qdisc FFFE: is non-work-conserving? And one more thing noticed: IPs in such problematic qdiscs are under DNS amplification DDoS attack. Blocking dns-flood before the shaper almost certainly eliminates the appearance of warnings. It looks like this can happen under load with HFSC when a class becomes empty. codel may drop packets when overloaded which makes HFSC upset "this is not the packet you are looking for". And sometimes this leads to a total loss of traffic on the interface with hfsc shaper (while increasing the CPU usage and memory consumption) until you restart shaper. *** Bug 110581 has been marked as a duplicate of this bug. *** This can also be reproduced on OpenWRT trunk on at least the x86_64 platform with Linux 4.4. If the qos-scripts package is installed and enabled with the default settings, uploading any file more than about 2 megabytes large will trigger a flood of the above-posted kernel messages, consuming 100% of the CPU and halting all outgoing traffic. I am observing exactly the same issue on kernel 4.2.8 with HFSC->SFQ. I have 16 ifb interfaces with about 200 HFSC classes on each. The problem is visible even if routed traffic is very low (100Mbit). Bug can be reproduced on any platform with linux kernel 4.3.3 with that script: tc qdisc del dev vlan3703 root tc qdisc add dev vlan3703 handle 1: root hfsc default 100 tc class add dev vlan3703 parent 1: classid 1:1 hfsc ls m2 600Mbit ul m2 600Mbit tc class add dev vlan3703 parent 1:1 classid 1:2 hfsc ls m2 600Mbit ul m2 600Mbit tc class add dev vlan3703 parent 1:2 classid 1:3 hfsc ls m2 600Mbit ul m2 600Mbit tc class add dev vlan3703 parent 1:2 classid 1:100 hfsc ls m2 1kbit tc filter add dev vlan3703 parent 1: protocol ip prio 3 u32 match ip dst 0/0 flowid 1:3 tc qdisc add dev vlan3703 parent 1:3 handle 3: sfq perturb 60 tc qdisc del dev vlan3703 ingress tc qdisc add dev vlan3703 handle ffff: ingress tc filter add dev vlan3703 parent ffff: protocol ip prio 50 \ u32 match ip src 0/0 police rate 600Mbit \ burst 60M mtu 32k drop flowid :2 generate any traffic by iperf that will go out/in interface where we have hfsc *** Bug 112011 has been marked as a duplicate of this bug. *** (In reply to Pawel Staszewski from comment #9) > Bug can be reproduced on any platform with linux kernel 4.3.3 > with that script: > tc qdisc del dev vlan3703 root > tc qdisc add dev vlan3703 handle 1: root hfsc default 100 > tc class add dev vlan3703 parent 1: classid 1:1 hfsc ls m2 600Mbit ul m2 > 600Mbit > > tc class add dev vlan3703 parent 1:1 classid 1:2 hfsc ls m2 600Mbit ul m2 > 600Mbit > > > tc class add dev vlan3703 parent 1:2 classid 1:3 hfsc ls m2 600Mbit ul m2 > 600Mbit > tc class add dev vlan3703 parent 1:2 classid 1:100 hfsc ls m2 1kbit > > tc filter add dev vlan3703 parent 1: protocol ip prio 3 u32 match ip dst 0/0 > flowid 1:3 > tc qdisc add dev vlan3703 parent 1:3 handle 3: sfq perturb 60 > > tc qdisc del dev vlan3703 ingress > tc qdisc add dev vlan3703 handle ffff: ingress > tc filter add dev vlan3703 parent ffff: protocol ip prio 50 \ > u32 match ip src 0/0 police rate 600Mbit \ > burst 60M mtu 32k drop flowid :2 > > generate any traffic by iperf that will go out/in interface where we have > hfsc I tried this on the latest kernel, after s/vlan3703/eth0/, I don't see any warning. Checked with kernel 4.4.3 and 4.4.4 - and there is no warning also... It was still happening for me on 4.4.3. I haven't had a chance to try 4.4.4 yet. Yes just checked now with latest 4.5 kernel: [14177.967927] WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x13d/0x262() [14177.967929] Modules linked in: ipt_NETFLOW(O) xt_set ip_set_hash_net ip_set ixgbe mdio ipmi_si x86_pkg_temp_thermal [14177.967934] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W O 4.5.0 #3 [14177.967935] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013 [14177.967937] 0000000000000000 ffff88022fcc3858 ffffffff8123fbb1 0000000000000000 [14177.967938] 0000000000000009 ffff88022fcc3898 ffffffff81040bee ffff88022fcc3888 [14177.967940] ffffffff81501aae 0000000000000000 ffff88021f3cc800 00000033941115bf [14177.967941] Call Trace: [14177.967942] <IRQ> [<ffffffff8123fbb1>] dump_stack+0x4d/0x64 [14177.967948] [<ffffffff81040bee>] warn_slowpath_common+0x97/0xb1 [14177.967950] [<ffffffff81501aae>] ? hfsc_dequeue+0x13d/0x262 [14177.967951] [<ffffffff81040c1d>] warn_slowpath_null+0x15/0x17 [14177.967952] [<ffffffff81501aae>] hfsc_dequeue+0x13d/0x262 [14177.967955] [<ffffffff814f9082>] __qdisc_run+0xb1/0x16f [14177.967958] [<ffffffff814e0923>] __dev_queue_xmit+0x2f5/0x494 [14177.967959] [<ffffffff814e0ad8>] dev_queue_xmit+0xb/0xd [14177.967961] [<ffffffff814e904d>] neigh_connected_output+0x9c/0xb5 [14177.967963] [<ffffffff8151c103>] ip_finish_output2+0x228/0x288 [14177.967965] [<ffffffff81557f12>] ? ipt_do_table+0x387/0x3a9 [14177.967966] [<ffffffff8151c9de>] ip_finish_output+0x13e/0x151 [14177.967968] [<ffffffff8151da2b>] ip_output+0x4f/0xb8 [14177.967969] [<ffffffff81557ffe>] ? iptable_filter_hook+0x46/0x48 [14177.967971] [<ffffffff81510a5c>] ? nf_iterate+0x33/0x5e [14177.967972] [<ffffffff8151a857>] ip_forward_finish+0x53/0x5a [14177.967973] [<ffffffff8151ab48>] ip_forward+0x2ea/0x412 [14177.967974] [<ffffffff8151a804>] ? ip_frag_mem+0x3e/0x3e [14177.967976] [<ffffffff81519138>] ip_rcv_finish+0x244/0x27c [14177.967978] [<ffffffff815196ed>] ip_rcv+0x26d/0x2f0 [14177.967979] [<ffffffff81519102>] ? ip_rcv_finish+0x20e/0x27c [14177.967981] [<ffffffff81518ef4>] ? pskb_may_pull+0x30/0x30 [14177.967982] [<ffffffff814de115>] __netif_receive_skb_core+0x4b3/0x52c [14177.967984] [<ffffffff81546fe2>] ? inet_gro_receive+0x1a6/0x1b8 [14177.967985] [<ffffffff814de1dc>] __netif_receive_skb+0x4e/0x60 [14177.967987] [<ffffffff814de36a>] netif_receive_skb_internal+0x52/0x84 [14177.967988] [<ffffffff814dedbd>] napi_gro_receive+0x45/0xa1 [14177.967994] [<ffffffffa005af1f>] ixgbe_clean_rx_irq+0x5e6/0x6db [ixgbe] [14177.967997] [<ffffffffa005ba26>] ixgbe_poll+0x495/0x60b [ixgbe] [14177.968000] [<ffffffff8105a4bf>] ? ttwu_do_activate.constprop.74+0x57/0x5c [14177.968001] [<ffffffff8105b1ec>] ? try_to_wake_up+0x1b5/0x1c7 [14177.968003] [<ffffffff814dead6>] net_rx_action+0xdd/0x247 [14177.968005] [<ffffffff810435b4>] __do_softirq+0xae/0x1a9 [14177.968006] [<ffffffff81043802>] irq_exit+0x37/0x45 [14177.968008] [<ffffffff81004b4f>] do_IRQ+0xa1/0xba [14177.968011] [<ffffffff815c933f>] common_interrupt+0x7f/0x7f [14177.968012] <EOI> [<ffffffff814a6aab>] ? cpuidle_enter_state+0x12d/0x185 [14177.968015] [<ffffffff814a6a6c>] ? cpuidle_enter_state+0xee/0x185 [14177.968016] [<ffffffff814a6b25>] cpuidle_enter+0x12/0x14 [14177.968018] [<ffffffff81067879>] cpu_startup_entry+0x131/0x1b0 [14177.968020] [<ffffffff8102a6e8>] start_secondary+0xe1/0xe5 [14177.968021] ---[ end trace 7ca47b9601695df7 ]--- And this iformation was produced on iproute tc rules update so tc qdisc del dev xxx root on two interfaces and then tc -f -b tc.file changes in foles made was replacing sfq and sfb qdisc to fq_codel After this first qdisc change and reload can't reproduce this by trying to reload tc again.... It is totally random - dont know what is producing this exactly. Currently wihout any changes logs below: Mar 16 15:54:23 RTR_TM_01 kernel: [16465.514531] ---[ end trace 7ca47b9601695e3a ]--- Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931477] ------------[ cut here ]------------ Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931484] WARNING: CPU: 3 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x13d/0x262() Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931485] Modules linked in: ipt_NETFLOW(O) xt_set ip_set_hash_net ip_set ixgbe mdio ipmi_si x86_pkg_temp_thermal Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931490] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W O 4.5.0 #3 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931491] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931492] 0000000000000000 ffff88022fcc3858 ffffffff8123fbb1 0000000000000000 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931494] 0000000000000009 ffff88022fcc3898 ffffffff81040bee ffff88022fcc3888 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931496] ffffffff81501aae 0000000000000000 ffff880201ef7800 0000003c148464c8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931497] Call Trace: Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931498] <IRQ> [<ffffffff8123fbb1>] dump_stack+0x4d/0x64 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931503] [<ffffffff81040bee>] warn_slowpath_common+0x97/0xb1 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931505] [<ffffffff81501aae>] ? hfsc_dequeue+0x13d/0x262 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931506] [<ffffffff81040c1d>] warn_slowpath_null+0x15/0x17 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931508] [<ffffffff81501aae>] hfsc_dequeue+0x13d/0x262 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931510] [<ffffffff814f9082>] __qdisc_run+0xb1/0x16f Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931512] [<ffffffff814e0923>] __dev_queue_xmit+0x2f5/0x494 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931514] [<ffffffff814e0ad8>] dev_queue_xmit+0xb/0xd Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931516] [<ffffffff814e904d>] neigh_connected_output+0x9c/0xb5 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931518] [<ffffffff8151c103>] ip_finish_output2+0x228/0x288 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931520] [<ffffffff81557f12>] ? ipt_do_table+0x387/0x3a9 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931521] [<ffffffff8151c9de>] ip_finish_output+0x13e/0x151 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931522] [<ffffffff8151da2b>] ip_output+0x4f/0xb8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931523] [<ffffffff81557ffe>] ? iptable_filter_hook+0x46/0x48 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931525] [<ffffffff81510a5c>] ? nf_iterate+0x33/0x5e Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931526] [<ffffffff8151a857>] ip_forward_finish+0x53/0x5a Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931528] [<ffffffff8151ab48>] ip_forward+0x2ea/0x412 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931529] [<ffffffff8151a804>] ? ip_frag_mem+0x3e/0x3e Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931531] [<ffffffff81519138>] ip_rcv_finish+0x244/0x27c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931532] [<ffffffff815196ed>] ip_rcv+0x26d/0x2f0 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931534] [<ffffffff81519102>] ? ip_rcv_finish+0x20e/0x27c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931536] [<ffffffff81518ef4>] ? pskb_may_pull+0x30/0x30 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931537] [<ffffffff814de115>] __netif_receive_skb_core+0x4b3/0x52c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931539] [<ffffffff81546fe2>] ? inet_gro_receive+0x1a6/0x1b8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931540] [<ffffffff814de1dc>] __netif_receive_skb+0x4e/0x60 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931542] [<ffffffff814de36a>] netif_receive_skb_internal+0x52/0x84 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931543] [<ffffffff814dedbd>] napi_gro_receive+0x45/0xa1 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931549] [<ffffffffa005af1f>] ixgbe_clean_rx_irq+0x5e6/0x6db [ixgbe] Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931552] [<ffffffffa005ba26>] ixgbe_poll+0x495/0x60b [ixgbe] Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931554] [<ffffffff814dead6>] net_rx_action+0xdd/0x247 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931556] [<ffffffff810435b4>] __do_softirq+0xae/0x1a9 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931557] [<ffffffff81043802>] irq_exit+0x37/0x45 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931559] [<ffffffff81004b4f>] do_IRQ+0xa1/0xba Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931562] [<ffffffff815c933f>] common_interrupt+0x7f/0x7f Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931562] <EOI> [<ffffffff814a6aa6>] ? cpuidle_enter_state+0x128/0x185 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931566] [<ffffffff814a6a6c>] ? cpuidle_enter_state+0xee/0x185 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931567] [<ffffffff814a6b25>] cpuidle_enter+0x12/0x14 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931569] [<ffffffff81067879>] cpu_startup_entry+0x131/0x1b0 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931571] [<ffffffff8102a6e8>] start_secondary+0xe1/0xe5 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931573] ---[ end trace 7ca47b9601695e3b ]--- Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931584] ------------[ cut here ]------------ Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931588] WARNING: CPU: 5 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x13d/0x262() Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931590] Modules linked in: ipt_NETFLOW(O) xt_set ip_set_hash_net ip_set ixgbe mdio ipmi_si x86_pkg_temp_thermal Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931594] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W O 4.5.0 #3 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931595] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931596] 0000000000000000 ffff88022fd43858 ffffffff8123fbb1 0000000000000000 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931598] 0000000000000009 ffff88022fd43898 ffffffff81040bee ffff88022fd43888 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931599] ffffffff81501aae 0000000000000000 ffff880201ef7800 0000003c14846b58 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931601] Call Trace: Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931601] <IRQ> [<ffffffff8123fbb1>] dump_stack+0x4d/0x64 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931605] [<ffffffff81040bee>] warn_slowpath_common+0x97/0xb1 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931607] [<ffffffff81501aae>] ? hfsc_dequeue+0x13d/0x262 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931608] [<ffffffff81040c1d>] warn_slowpath_null+0x15/0x17 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931609] [<ffffffff81501aae>] hfsc_dequeue+0x13d/0x262 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931611] [<ffffffff814f9082>] __qdisc_run+0xb1/0x16f Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931613] [<ffffffff814e0923>] __dev_queue_xmit+0x2f5/0x494 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931614] [<ffffffff814e0ad8>] dev_queue_xmit+0xb/0xd Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931616] [<ffffffff814e904d>] neigh_connected_output+0x9c/0xb5 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931617] [<ffffffff8151c103>] ip_finish_output2+0x228/0x288 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931619] [<ffffffff81557f12>] ? ipt_do_table+0x387/0x3a9 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931620] [<ffffffff8151c9de>] ip_finish_output+0x13e/0x151 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931621] [<ffffffff8151da2b>] ip_output+0x4f/0xb8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931623] [<ffffffff81557ffe>] ? iptable_filter_hook+0x46/0x48 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931624] [<ffffffff81510a5c>] ? nf_iterate+0x33/0x5e Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931625] [<ffffffff8151a857>] ip_forward_finish+0x53/0x5a Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931626] [<ffffffff8151ab48>] ip_forward+0x2ea/0x412 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931628] [<ffffffff8151a804>] ? ip_frag_mem+0x3e/0x3e Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931629] [<ffffffff81519138>] ip_rcv_finish+0x244/0x27c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931631] [<ffffffff815196ed>] ip_rcv+0x26d/0x2f0 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931633] [<ffffffff81519102>] ? ip_rcv_finish+0x20e/0x27c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931634] [<ffffffff81518ef4>] ? pskb_may_pull+0x30/0x30 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931636] [<ffffffff814de115>] __netif_receive_skb_core+0x4b3/0x52c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931637] [<ffffffff81546fe2>] ? inet_gro_receive+0x1a6/0x1b8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931638] [<ffffffff814de1dc>] __netif_receive_skb+0x4e/0x60 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931640] [<ffffffff814de36a>] netif_receive_skb_internal+0x52/0x84 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931641] [<ffffffff814dedbd>] napi_gro_receive+0x45/0xa1 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931645] [<ffffffffa005af1f>] ixgbe_clean_rx_irq+0x5e6/0x6db [ixgbe] Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931648] [<ffffffffa005ba26>] ixgbe_poll+0x495/0x60b [ixgbe] Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931650] [<ffffffff812ed59a>] ? credit_entropy_bits+0x1e1/0x284 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931652] [<ffffffff814dead6>] net_rx_action+0xdd/0x247 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931653] [<ffffffff810435b4>] __do_softirq+0xae/0x1a9 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931655] [<ffffffff81043802>] irq_exit+0x37/0x45 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931656] [<ffffffff81004b4f>] do_IRQ+0xa1/0xba Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931658] [<ffffffff815c933f>] common_interrupt+0x7f/0x7f Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931659] <EOI> [<ffffffff814a6aa6>] ? cpuidle_enter_state+0x128/0x185 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931661] [<ffffffff814a6a6c>] ? cpuidle_enter_state+0xee/0x185 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931663] [<ffffffff814a6b25>] cpuidle_enter+0x12/0x14 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931664] [<ffffffff81067879>] cpu_startup_entry+0x131/0x1b0 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931666] [<ffffffff8102a6e8>] start_secondary+0xe1/0xe5 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931667] ---[ end trace 7ca47b9601695e3c ]--- Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931670] ------------[ cut here ]------------ Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931674] WARNING: CPU: 4 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x13d/0x262() Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931675] Modules linked in: ipt_NETFLOW(O) xt_set ip_set_hash_net ip_set ixgbe mdio ipmi_si x86_pkg_temp_thermal Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931679] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G W O 4.5.0 #3 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931680] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931681] 0000000000000000 ffff88022fd03858 ffffffff8123fbb1 0000000000000000 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931683] 0000000000000009 ffff88022fd03898 ffffffff81040bee ffff88022fd03888 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931684] ffffffff81501aae 0000000000000000 ffff880201ef7800 0000003c14847098 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931686] Call Trace: Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931687] <IRQ> [<ffffffff8123fbb1>] dump_stack+0x4d/0x64 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931690] [<ffffffff81040bee>] warn_slowpath_common+0x97/0xb1 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931691] [<ffffffff81501aae>] ? hfsc_dequeue+0x13d/0x262 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931693] [<ffffffff81040c1d>] warn_slowpath_null+0x15/0x17 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931694] [<ffffffff81501aae>] hfsc_dequeue+0x13d/0x262 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931696] [<ffffffff814f9082>] __qdisc_run+0xb1/0x16f Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931698] [<ffffffff814e0923>] __dev_queue_xmit+0x2f5/0x494 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931699] [<ffffffff814e0ad8>] dev_queue_xmit+0xb/0xd Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931700] [<ffffffff814e904d>] neigh_connected_output+0x9c/0xb5 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931702] [<ffffffff8151c103>] ip_finish_output2+0x228/0x288 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931703] [<ffffffff81557f12>] ? ipt_do_table+0x387/0x3a9 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931704] [<ffffffff8151c9de>] ip_finish_output+0x13e/0x151 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931706] [<ffffffff8151da2b>] ip_output+0x4f/0xb8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931707] [<ffffffff81557ffe>] ? iptable_filter_hook+0x46/0x48 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931708] [<ffffffff81510a5c>] ? nf_iterate+0x33/0x5e Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931709] [<ffffffff8151a857>] ip_forward_finish+0x53/0x5a Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931710] [<ffffffff8151ab48>] ip_forward+0x2ea/0x412 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931712] [<ffffffff8151a804>] ? ip_frag_mem+0x3e/0x3e Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931713] [<ffffffff81519138>] ip_rcv_finish+0x244/0x27c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931715] [<ffffffff815196ed>] ip_rcv+0x26d/0x2f0 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931717] [<ffffffff81519102>] ? ip_rcv_finish+0x20e/0x27c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931718] [<ffffffff81518ef4>] ? pskb_may_pull+0x30/0x30 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931720] [<ffffffff814de115>] __netif_receive_skb_core+0x4b3/0x52c Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931721] [<ffffffff81546fe2>] ? inet_gro_receive+0x1a6/0x1b8 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931722] [<ffffffff814de1dc>] __netif_receive_skb+0x4e/0x60 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931724] [<ffffffff814de36a>] netif_receive_skb_internal+0x52/0x84 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931725] [<ffffffff814dedbd>] napi_gro_receive+0x45/0xa1 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931728] [<ffffffffa005af1f>] ixgbe_clean_rx_irq+0x5e6/0x6db [ixgbe] Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931731] [<ffffffffa005ba26>] ixgbe_poll+0x495/0x60b [ixgbe] Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931733] [<ffffffff8104380e>] ? irq_exit+0x43/0x45 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931734] [<ffffffff81004b4f>] ? do_IRQ+0xa1/0xba Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931735] [<ffffffff814dead6>] net_rx_action+0xdd/0x247 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931737] [<ffffffff810435b4>] __do_softirq+0xae/0x1a9 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931738] [<ffffffff81043802>] irq_exit+0x37/0x45 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931739] [<ffffffff81004b4f>] do_IRQ+0xa1/0xba Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931741] [<ffffffff815c933f>] common_interrupt+0x7f/0x7f Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931741] <EOI> [<ffffffff814a6aa6>] ? cpuidle_enter_state+0x128/0x185 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931744] [<ffffffff814a6a6c>] ? cpuidle_enter_state+0xee/0x185 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931746] [<ffffffff814a6b25>] cpuidle_enter+0x12/0x14 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931747] [<ffffffff81067879>] cpu_startup_entry+0x131/0x1b0 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931749] [<ffffffff8102a6e8>] start_secondary+0xe1/0xe5 Mar 16 15:55:13 RTR_TM_01 kernel: [16514.931750] ---[ end trace 7ca47b9601695e3d ]--- Ok it has something to do with default class that hfcs has. When I put all my rules and for default class i will attach sfq qdisc like this: qdisc add dev vlan4011 parent 11:8000 handle 8000: sfq perturb 120 then no more Warning is produced. But when I put same rules with default class that has differend qdisc like: qdisc add dev vlan4011 parent 11:8000 handle 8000: fq_codel quantum 1514 noecn Then I have randomly produced Warning WARNING: CPU: 4 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x13d/0x262() This is running now from an hour without any warning produced in log... hope this is it. So far another hour without warning. Ok - this wont help ... after few hours of running: Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053884] WARNING: CPU: 5 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x13d/0x262() Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053885] Modules linked in: ipt_NETFLOW(O) xt_set ip_set_hash_net ip_set ixgbe mdio ipmi_si x86_pkg_temp_thermal Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053890] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W O 4.5.0 #3 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053891] Hardware name: Supermicro X10SLL-F/X10SLL-SF/X10SLL-F/X10SLL-SF, BIOS 1.0a 06/11/2013 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053891] 0000000000000000 ffff88022fd43858 ffffffff8123fbb1 0000000000000000 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053893] 0000000000000009 ffff88022fd43898 ffffffff81040bee ffff88022fd43888 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053894] ffffffff81501aae 0000000000000000 ffff88021e0db000 000000615ae3d3fe Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053896] Call Trace: Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053897] <IRQ> [<ffffffff8123fbb1>] dump_stack+0x4d/0x64 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053900] [<ffffffff81040bee>] warn_slowpath_common+0x97/0xb1 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053902] [<ffffffff81501aae>] ? hfsc_dequeue+0x13d/0x262 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053903] [<ffffffff81040c1d>] warn_slowpath_null+0x15/0x17 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053905] [<ffffffff81501aae>] hfsc_dequeue+0x13d/0x262 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053907] [<ffffffff814f9082>] __qdisc_run+0xb1/0x16f Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053909] [<ffffffff814e0923>] __dev_queue_xmit+0x2f5/0x494 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053910] [<ffffffff814e0ad8>] dev_queue_xmit+0xb/0xd Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053911] [<ffffffff814e904d>] neigh_connected_output+0x9c/0xb5 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053913] [<ffffffff8151c103>] ip_finish_output2+0x228/0x288 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053914] [<ffffffff81557f12>] ? ipt_do_table+0x387/0x3a9 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053916] [<ffffffff8151c9de>] ip_finish_output+0x13e/0x151 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053918] [<ffffffff8151da2b>] ip_output+0x4f/0xb8 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053920] [<ffffffff81557ffe>] ? iptable_filter_hook+0x46/0x48 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053922] [<ffffffff81510a5c>] ? nf_iterate+0x33/0x5e Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053924] [<ffffffff8151a857>] ip_forward_finish+0x53/0x5a Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053926] [<ffffffff8151ab48>] ip_forward+0x2ea/0x412 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053927] [<ffffffff8151a804>] ? ip_frag_mem+0x3e/0x3e Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053930] [<ffffffff81519138>] ip_rcv_finish+0x244/0x27c Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053932] [<ffffffff815196ed>] ip_rcv+0x26d/0x2f0 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053934] [<ffffffff81519102>] ? ip_rcv_finish+0x20e/0x27c Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053937] [<ffffffff81518ef4>] ? pskb_may_pull+0x30/0x30 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053939] [<ffffffff814de115>] __netif_receive_skb_core+0x4b3/0x52c Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053941] [<ffffffff81546fe2>] ? inet_gro_receive+0x1a6/0x1b8 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053943] [<ffffffff814de1dc>] __netif_receive_skb+0x4e/0x60 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053945] [<ffffffff814de36a>] netif_receive_skb_internal+0x52/0x84 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053946] [<ffffffff814dedbd>] napi_gro_receive+0x45/0xa1 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053951] [<ffffffffa005af1f>] ixgbe_clean_rx_irq+0x5e6/0x6db [ixgbe] Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053956] [<ffffffffa005ba26>] ixgbe_poll+0x495/0x60b [ixgbe] Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053959] [<ffffffff81079c98>] ? hrtimer_start_range_ns+0x127/0x136 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053961] [<ffffffff814dead6>] net_rx_action+0xdd/0x247 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053963] [<ffffffff810435b4>] __do_softirq+0xae/0x1a9 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053965] [<ffffffff81043802>] irq_exit+0x37/0x45 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053967] [<ffffffff81004b4f>] do_IRQ+0xa1/0xba Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053970] [<ffffffff815c933f>] common_interrupt+0x7f/0x7f Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053970] <EOI> [<ffffffff814a6aa6>] ? cpuidle_enter_state+0x128/0x185 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053974] [<ffffffff814a6a6c>] ? cpuidle_enter_state+0xee/0x185 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053977] [<ffffffff814a6b25>] cpuidle_enter+0x12/0x14 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053979] [<ffffffff81067879>] cpu_startup_entry+0x131/0x1b0 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053981] [<ffffffff8102a6e8>] start_secondary+0xe1/0xe5 Mar 16 18:45:59 RTR_TM_01 kernel: [26761.053982] ---[ end trace 7ca47b9601695e8b ]--- Mar 16 18:45:59 RTR_TM_01 kernel: [26761.143801] ------------[ cut here ]------------ No changes made - only traffic if higher and higher - nothing more. Ok - problem resolved by changing qdisc for all classes from fq_codel to sfb No warning from 6 hours. So something is fromg with fq_codel + hfsc - as Stephen said HFSC is upset when fq_codel is releasing/holding some packets not in time that it should be :) Hi, you mean sfq perturb? Yes, i use sfq perturb, too. Created attachment 209551 [details] attachment-29507-0.html My sched use only hfsc with linklayer adsl and sfq pure and the bug still occurs. Isn't a exclusive codel sched problem. I believe that problem is on hsfc sched. Em 16/03/2016 20:23, <bugzilla-daemon@bugzilla.kernel.org> escreveu: > https://bugzilla.kernel.org/show_bug.cgi?id=109581 > > --- Comment #21 from Sebastian Kricner <sebastian.kricner@tuxwave.net> --- > Hi, > > you mean sfq perturb? > Yes, i use sfq perturb, too. > > -- > You are receiving this mail because: > You are on the CC list for the bug. > Just for info, the setup here is like the example at the bottom of the page: http://linux-ip.net/articles/hfsc.en/ Except that sfq perturb also is being used for upstream and rate policing at downstream. Still, as told, no problems with 4.0.9. In my oppinion hfsc is important as it is in my oppinion the best QoS scheduler. It also here seems to be just an hfsc problem, as i tested it as i reported that error, too. Also as i also use sfq and that problem still appeared, it is unlikely that adding sfq perturb helps, like reported at comment 17. Also being using pure ethernet (to cable modem). No sfq perturb but sfb (stohastic fair blue) - with this qdisc have no warning Can confirm that warning exist with qdiscs fq_codel , sfq So checked with sfb - and problem don't exist. Also can confirm that problem don't exist with pfifo qdisc attached to classes. Similary to other, it's hard to say how to reproduce this bug in lab. But also I've got such problem. I'm using hfsc + fq_codel with ifb. My trace in dmesg is: 2016-03-24T09:49:07.116204+01:00 bramkarz kernel: [851816.202967] ------------[ cut here ]------------ 2016-03-24T09:49:07.116204+01:00 bramkarz kernel: [851816.202969] WARNING: CPU: 1 PID: 22501 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x2f0/0x300 [sch_hfsc]() 2016-03-24T09:49:07.116205+01:00 bramkarz kernel: [851816.202970] Modules linked in: nf_log_common xt_TRACE nf_conntrack_netlink sch_fq_codel sch_hfsc act_mirred cls_u32 sch_ingress ip6t_rt ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables tun 8021q mrp garp stp llc xt_NFLOG xt_geoip(O) xt_TARPIT(O) ipv6 xt_conntrack iptable_filter xt_length xt_CLASSIFY xt_mark xt_connmark iptable_mangle xt_comment xt_tcpudp xt_nat xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ip_tables x_tables nfnetlink_log nfnetlink tcp_diag inet_diag ifb crc32_pclmul aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper button e1000 sr_mod cdrom crc32c_intel ata_piix dm_mirror dm_region_hash dm_log dm_mod 2016-03-24T09:49:07.116206+01:00 bramkarz kernel: [851816.202994] CPU: 1 PID: 22501 Comm: dhcpd Tainted: G W O 4.4.3 #1 2016-03-24T09:49:07.116207+01:00 bramkarz kernel: [851816.202995] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 2016-03-24T09:49:07.116207+01:00 bramkarz kernel: [851816.202995] 0000000000000286 00000000ec6e0a88 ffff880069103e50 ffffffff813ba5a6 2016-03-24T09:49:07.116208+01:00 bramkarz kernel: [851816.202996] 0000000000000000 ffffffffa02b5c4d ffff880069103e88 ffffffff810490dc 2016-03-24T09:49:07.116208+01:00 bramkarz kernel: [851816.202998] ffff88005eb50200 00000c1af0462a22 ffff8800649f9000 ffff8800649f9490 2016-03-24T09:49:07.116208+01:00 bramkarz kernel: [851816.202999] Call Trace: 2016-03-24T09:49:07.116210+01:00 bramkarz kernel: [851816.202999] <IRQ> [<ffffffff813ba5a6>] dump_stack+0x63/0x81 2016-03-24T09:49:07.116210+01:00 bramkarz kernel: [851816.203003] [<ffffffff810490dc>] warn_slowpath_common+0x7c/0xb0 2016-03-24T09:49:07.116211+01:00 bramkarz kernel: [851816.203004] [<ffffffff81049205>] warn_slowpath_null+0x15/0x20 2016-03-24T09:49:07.116211+01:00 bramkarz kernel: [851816.203006] [<ffffffffa02b45a0>] hfsc_dequeue+0x2f0/0x300 [sch_hfsc] 2016-03-24T09:49:07.116211+01:00 bramkarz kernel: [851816.203007] [<ffffffff8133e8be>] __qdisc_run+0x3e/0x1b0 2016-03-24T09:49:07.116212+01:00 bramkarz kernel: [851816.203008] [<ffffffff8131d7ef>] net_tx_action+0x10f/0x180 2016-03-24T09:49:07.116212+01:00 bramkarz kernel: [851816.203010] [<ffffffff8104c942>] __do_softirq+0xe2/0x1d0 2016-03-24T09:49:07.116213+01:00 bramkarz kernel: [851816.203011] [<ffffffff813c71ec>] do_softirq_own_stack+0x1c/0x30 2016-03-24T09:49:07.116214+01:00 bramkarz kernel: [851816.203012] <EOI> [<ffffffff8104c773>] do_softirq.part.17+0x33/0x40 2016-03-24T09:49:07.116214+01:00 bramkarz kernel: [851816.203014] [<ffffffff8104c7f8>] __local_bh_enable_ip+0x78/0x80 2016-03-24T09:49:07.116215+01:00 bramkarz kernel: [851816.203015] [<ffffffff813c57f9>] _raw_spin_unlock_bh+0x19/0x20 2016-03-24T09:49:07.116215+01:00 bramkarz kernel: [851816.203016] [<ffffffff813b14b2>] packet_poll+0x82/0x150 2016-03-24T09:49:07.116215+01:00 bramkarz kernel: [851816.203018] [<ffffffff81304974>] sock_poll+0x44/0x100 2016-03-24T09:49:07.116217+01:00 bramkarz kernel: [851816.203019] [<ffffffff8113487b>] do_select+0x34b/0x7d0 2016-03-24T09:49:07.116217+01:00 bramkarz kernel: [851816.203020] [<ffffffff81110e7c>] ? __slab_free+0x14c/0x1e0 2016-03-24T09:49:07.116217+01:00 bramkarz kernel: [851816.203021] [<ffffffff81134280>] ? poll_select_copy_remaining+0x140/0x140 2016-03-24T09:49:07.116218+01:00 bramkarz kernel: [851816.203023] [<ffffffff81134280>] ? poll_select_copy_remaining+0x140/0x140 2016-03-24T09:49:07.116218+01:00 bramkarz kernel: [851816.203024] [<ffffffff81134280>] ? poll_select_copy_remaining+0x140/0x140 2016-03-24T09:49:07.116219+01:00 bramkarz kernel: [851816.203025] [<ffffffff81134280>] ? poll_select_copy_remaining+0x140/0x140 2016-03-24T09:49:07.116219+01:00 bramkarz kernel: [851816.203026] [<ffffffff81134280>] ? poll_select_copy_remaining+0x140/0x140 2016-03-24T09:49:07.116220+01:00 bramkarz kernel: [851816.203027] [<ffffffff8104c7f8>] ? __local_bh_enable_ip+0x78/0x80 2016-03-24T09:49:07.116221+01:00 bramkarz kernel: [851816.203028] [<ffffffff81320908>] ? __dev_queue_xmit+0x258/0x4f0 2016-03-24T09:49:07.116221+01:00 bramkarz kernel: [851816.203030] [<ffffffff81110dde>] ? __slab_free+0xae/0x1e0 2016-03-24T09:49:07.116222+01:00 bramkarz kernel: [851816.203031] [<ffffffff813136d7>] ? __skb_recv_datagram+0x187/0x580 2016-03-24T09:49:07.116222+01:00 bramkarz kernel: [851816.203032] [<ffffffff81134ed1>] core_sys_select+0x1d1/0x2e0 2016-03-24T09:49:07.116222+01:00 bramkarz kernel: [851816.203034] [<ffffffff81313b30>] ? skb_free_datagram+0x10/0x40 2016-03-24T09:49:07.116223+01:00 bramkarz kernel: [851816.203035] [<ffffffff8138afb9>] ? inet_recvmsg+0x79/0xa0 2016-03-24T09:49:07.116224+01:00 bramkarz kernel: [851816.203036] [<ffffffff81302a40>] ? move_addr_to_user+0x60/0x80 2016-03-24T09:49:07.116227+01:00 bramkarz kernel: [851816.203037] [<ffffffff8130362a>] ? SYSC_recvfrom+0x10a/0x160 2016-03-24T09:49:07.116228+01:00 bramkarz kernel: [851816.203039] [<ffffffff8103811e>] ? kvm_clock_get_cycles+0x1e/0x20 2016-03-24T09:49:07.116228+01:00 bramkarz kernel: [851816.203040] [<ffffffff81095860>] ? ktime_get_ts64+0x40/0xf0 2016-03-24T09:49:07.116229+01:00 bramkarz kernel: [851816.203041] [<ffffffff81135095>] SyS_select+0xb5/0x110 2016-03-24T09:49:07.116229+01:00 bramkarz kernel: [851816.203043] [<ffffffff813c5cee>] entry_SYSCALL_64_fastpath+0x12/0x71 2016-03-24T09:49:07.116229+01:00 bramkarz kernel: [851816.203043] ---[ end trace e25ee0cc37829169 ]--- kernel 4.4.3 And another, shorter trace: 2016-03-24T11:44:35.754129+01:00 bramkarz kernel: [858744.742396] ------------[ cut here ]------------ 2016-03-24T11:44:35.754129+01:00 bramkarz kernel: [858744.742399] WARNING: CPU: 1 PID: 30450 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x2f0/0x300 [sch_hfsc]() 2016-03-24T11:44:35.754130+01:00 bramkarz kernel: [858744.742399] Modules linked in: nf_log_common xt_TRACE nf_conntrack_netlink sch_fq_codel sch_hfsc act_mirred cls_u32 sch_ingress ip6t_rt ip6table_raw ip6t able_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables tun 8021q mrp garp stp llc xt_NFLOG xt_geoip(O) xt_TARPIT(O) ipv6 xt_conntrack iptable_filter xt_length xt_CLA SSIFY xt_mark xt_connmark iptable_mangle xt_comment xt_tcpudp xt_nat xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ip_tables x_tables nfnetlink_log nfn etlink tcp_diag inet_diag ifb crc32_pclmul aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper button e1000 sr_mod cdrom crc32c_intel ata_piix dm_mirror dm_region_hash dm_log dm_mod 2016-03-24T11:44:35.754130+01:00 bramkarz kernel: [858744.742417] CPU: 1 PID: 30450 Comm: multitail Tainted: G W O 4.4.3 #1 2016-03-24T11:44:35.754131+01:00 bramkarz kernel: [858744.742418] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 2016-03-24T11:44:35.754131+01:00 bramkarz kernel: [858744.742419] 0000000000000286 0000000046285df0 ffff880069103e08 ffffffff813ba5a6 2016-03-24T11:44:35.754132+01:00 bramkarz kernel: [858744.742420] 0000000000000000 ffffffffa02b5c4d ffff880069103e40 ffffffff810490dc 2016-03-24T11:44:35.754132+01:00 bramkarz kernel: [858744.742421] ffff88005eb50200 00000c34251521e0 ffff880064931800 ffff880064931c90 2016-03-24T11:44:35.754133+01:00 bramkarz kernel: [858744.742423] Call Trace: 2016-03-24T11:44:35.754133+01:00 bramkarz kernel: [858744.742423] <IRQ> [<ffffffff813ba5a6>] dump_stack+0x63/0x81 2016-03-24T11:44:35.754134+01:00 bramkarz kernel: [858744.742426] [<ffffffff810490dc>] warn_slowpath_common+0x7c/0xb0 2016-03-24T11:44:35.754134+01:00 bramkarz kernel: [858744.742428] [<ffffffff81049205>] warn_slowpath_null+0x15/0x20 2016-03-24T11:44:35.754134+01:00 bramkarz kernel: [858744.742429] [<ffffffffa02b45a0>] hfsc_dequeue+0x2f0/0x300 [sch_hfsc] 2016-03-24T11:44:35.754135+01:00 bramkarz kernel: [858744.742431] [<ffffffff8133e8be>] __qdisc_run+0x3e/0x1b0 2016-03-24T11:44:35.754135+01:00 bramkarz kernel: [858744.742432] [<ffffffff8131d7ef>] net_tx_action+0x10f/0x180 2016-03-24T11:44:35.754136+01:00 bramkarz kernel: [858744.742433] [<ffffffff8104c942>] __do_softirq+0xe2/0x1d0 2016-03-24T11:44:35.754136+01:00 bramkarz kernel: [858744.742435] [<ffffffff8104cb69>] irq_exit+0x89/0x90 2016-03-24T11:44:35.754136+01:00 bramkarz kernel: [858744.742436] [<ffffffff810050ff>] do_IRQ+0x4f/0xd0 2016-03-24T11:44:35.754137+01:00 bramkarz kernel: [858744.742437] [<ffffffff813c6782>] common_interrupt+0x82/0x82 2016-03-24T11:44:35.754137+01:00 bramkarz kernel: [858744.742438] <EOI> 2016-03-24T11:44:35.754138+01:00 bramkarz kernel: [858744.742438] ---[ end trace e25ee0cc37829bdd ]--- This occurs on 4.4.7 in openwrt [1212366.541370] WARNING: CPU: 0 PID: 0 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x19c/0x704 [sch_hfsc]() [1212366.550828] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mwl8k mac80211 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_log_common nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat em_nbyte sch_teql sch_htb cls_basic sch_prio sch_tbf sch_dsmark sch_pie act_ipt x_tables em_cmp sch_gred em_meta em_text sch_codel sch_red sch_fq sch_sfq act_police act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ifb [1212366.632018] CPU: 0 PID: 0 Comm: swapper Tainted: G W 4.4.7 #3 [1212366.639105] Hardware name: Marvell Kirkwood (Flattened Device Tree) [1212366.645571] Backtrace: [1212366.648231] [<c0015898>] (dump_backtrace) from [<c0015aa0>] (show_stack+0x18/0x1c) [1212366.656010] r7:bf00b210 r6:bf0094f8 r5:00000595 r4:00000000 [1212366.661924] [<c0015a88>] (show_stack) from [<c01be94c>] (dump_stack+0x20/0x28) [1212366.669375] [<c01be92c>] (dump_stack) from [<c001d940>] (warn_slowpath_common+0x94/0xbc) [1212366.677678] [<c001d8ac>] (warn_slowpath_common) from [<c001da0c>] (warn_slowpath_null+0x24/0x2c) [1212366.686682] r8:76827c9b r7:00000002 r6:c6fb3580 r5:c667ec00 r4:c667ecc8 [1212366.693663] [<c001d9e8>] (warn_slowpath_null) from [<bf0094f8>] (hfsc_dequeue+0x19c/0x704 [sch_hfsc]) [1212366.703145] [<bf00935c>] (hfsc_dequeue [sch_hfsc]) from [<c0309ba4>] (__qdisc_run+0xc0/0x198) [1212366.711893] r10:c050a000 r9:0000003f r8:00000000 r7:c6f6bb40 r6:c6fb3580 r5:fffff969 [1212366.719979] r4:c667ec00 [1212366.722713] [<c0309ae4>] (__qdisc_run) from [<c02e983c>] (net_tx_action+0xf8/0x110) [1212366.730587] r10:00000001 r9:00000100 r8:c050a000 r7:40000002 r6:c0510f40 r5:c052945c [1212366.738676] r4:00000000 [1212366.741406] [<c02e9744>] (net_tx_action) from [<c00201d4>] (__do_softirq+0xac/0x23c) [1212366.749370] r7:40000002 r6:c0529454 r5:c052945c r4:00000002 [1212366.755268] [<c0020128>] (__do_softirq) from [<c00205f8>] (irq_exit+0x90/0xcc) [1212366.762702] r10:00000000 r9:c050bf28 r8:c7802200 r7:00000001 r6:00000000 r5:c050f504 [1212366.770791] r4:00000000 [1212366.773524] [<c0020568>] (irq_exit) from [<c0044918>] (__handle_domain_irq+0x8c/0xa8) [1212366.781575] r5:c050f504 r4:00000000 [1212366.785362] [<c004488c>] (__handle_domain_irq) from [<c00093b8>] (orion_handle_irq+0x74/0xa0) [1212366.794103] r9:c050bf28 r8:00000001 r7:c0551ab8 r6:c780601c r5:00008000 r4:0000000f [1212366.802113] [<c0009344>] (orion_handle_irq) from [<c0009e30>] (__irq_svc+0x50/0x64) [1212366.809985] Exception stack(0xc050bf28 to 0xc050bf70) [1212366.815235] bf20: 00000000 00000000 00000000 60000013 c050a000 c050c0bc [1212366.823637] bf40: 00000000 00000000 c7ffc6c0 c0510570 00000001 c050bf84 c050bf78 c050bf78 [1212366.832044] bf60: c0012fcc c003ed0c 60000013 ffffffff [1212366.837290] r10:00000001 r9:c0510570 r8:c7ffc6c0 r7:c050bf5c r6:ffffffff r5:60000013 [1212366.845373] r4:c003ed0c [1212366.848127] [<c003ecd4>] (default_idle_call) from [<c003edb0>] (cpu_startup_entry+0xa0/0x16c) [1212366.856872] [<c003ed10>] (cpu_startup_entry) from [<c000df08>] (rest_init+0x64/0x7c) [1212366.864833] r7:c0528e74 r4:c0528e90 [1212366.868632] [<c000dea4>] (rest_init) from [<c04e3d38>] (start_kernel+0x3bc/0x458) [1212366.876325] [<c04e397c>] (start_kernel) from [<00008048>] (0x8048) [1212366.882719] ---[ end trace 69972347d20c58cb ]--- Created attachment 233411 [details] Log of git bisect Hello all, I ran a git bisect between linux-stable v4.1.32 (no problems) and v4.4.20. The bisect log is attached. git bisect offered 8d12ded3dd499e38e8022fe3ec53920d085e57a3 as the first bad commit: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8d12ded3dd499e38e8022fe3ec53920d085e57a3 "Merge branch 'perf/urgent' into perf/core, before applying dependent patches" The problem is probably that it's a pretty big commit. If there's a way to do a git bisect on this commit let me know and I'll run another git bisect. Regards, Sebastian Created attachment 233481 [details] Log of 2nd git bisect Hello all, I ended up doing another bisect, between d499c106843afa0703a68c64662bf42a16421aec (good) and 8d12ded3dd499e38e8022fe3ec53920d085e57a3 (bad). git bisect now suggested a5d28090405038ca1f40c13f38d6d4285456efee is the first bad commit: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=a5d28090405038ca1f40c13f38d6d4285456efee "codel: fix maxpacket/mtu confusion" To verify I checked out commit 8d12ded3dd499e38e8022fe3ec53920d085e57a3 and removed a5d28090405038ca1f40c13f38d6d4285456efee. No problems. I reapplied the codel patch and the problems started again. [ 64.950393] ------------[ cut here ]------------ [ 64.950404] WARNING: CPU: 1 PID: 2000 at net/sched/sch_hfsc.c:1429 hfsc_dequeue+0x327/0x340 [sch_hfsc]() [ 64.950405] Modules linked in: ifb sch_fq_codel sch_hfsc sch_ingress act_mirred em_u32 cls_u32 xts gf128mul ctr ccm nls_iso8859_15 nls_cp850 vfat fat cbc dm_crypt dm_mod i915 snd_hda_codec_hdmi cfbfillrect snd_hda_codec_realtek snd_hda_codec_generic fbcon cfbimgblt bitblit softcursor i2c_algo_bit font cfbcopyarea drm_kms_helper snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm drm snd_hda_core arc4 i2c_core iwldvm mac80211 iwlwifi cfg80211 xhci_pci snd_timer xhci_hcd ehci_pci intel_gtt agpgart ehci_hcd snd usbcore r8169 coretemp rfkill fb fbdev hwmon soundcore crc32c_intel usb_common mii psmouse video backlight evdev [ 64.950451] CPU: 1 PID: 2000 Comm: firefox-bin Not tainted 4.1.0-rc5+ #22 [ 64.950453] Hardware name: LENOVO 3354ALG/3354ALG, BIOS H3ET75WW(1.12) 03/03/2015 [ 64.950454] 0000000000000000 ffffffffa00f3d8f ffffffff8136c5a4 0000000000000000 [ 64.950458] ffffffff81042fc7 ffff8800d774a400 000000003c6b78c8 ffff8800c488a000 [ 64.950460] ffff8800c488a490 ffff8800c7cd9900 ffffffffa00f2847 ffff8800d774a400 [ 64.950463] Call Trace: [ 64.950464] <IRQ> [<ffffffff8136c5a4>] ? dump_stack+0x40/0x50 [ 64.950474] [<ffffffff81042fc7>] ? warn_slowpath_common+0x77/0xb0 [ 64.950477] [<ffffffffa00f2847>] ? hfsc_dequeue+0x327/0x340 [sch_hfsc] [ 64.950480] [<ffffffff81308def>] ? __qdisc_run+0x3f/0x1b0 [ 64.950483] [<ffffffff812ec943>] ? net_tx_action+0xd3/0x160 [ 64.950486] [<ffffffff81045a18>] ? __do_softirq+0xe8/0x1e0 [ 64.950489] [<ffffffff81045c46>] ? irq_exit+0x76/0xa0 [ 64.950492] [<ffffffff81030889>] ? smp_apic_timer_interrupt+0x39/0x50 [ 64.950495] [<ffffffff813714fb>] ? apic_timer_interrupt+0x6b/0x70 [ 64.950496] <EOI> [ 64.950497] ---[ end trace a3de6dd48cf75d29 ]--- [ 64.950505] ------------[ cut here ]------------ Hi all Update here with kernel 4.10.1 [39542.654334] WARNING: CPU: 26 PID: 0 at net/sched/sch_hfsc.c:1400 hfsc_dequeue+0x13e/0x277 [39542.654335] Modules linked in: bridge stp llc veth bonding ipmi_si [39542.654341] CPU: 26 PID: 0 Comm: swapper/26 Tainted: G W O 4.10.1 #1 [39542.654342] Call Trace: [39542.654344] <IRQ> [39542.654347] dump_stack+0x63/0x7f [39542.654349] __warn+0xc0/0xdb [39542.654351] warn_slowpath_null+0x18/0x1a [39542.654352] hfsc_dequeue+0x13e/0x277 [39542.654353] __qdisc_run+0xf0/0x21b [39542.654355] net_tx_action+0xdc/0xef [39542.654356] __do_softirq+0xbc/0x1c9 [39542.654358] irq_exit+0x51/0x5f [39542.654360] smp_apic_timer_interrupt+0x29/0x34 [39542.654363] apic_timer_interrupt+0x89/0x90 [39542.654366] RIP: 0010:cpuidle_enter_state+0x148/0x1a3 [39542.654367] RSP: 0018:ffffc90003333e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 [39542.654368] RAX: ffff88046fc153c0 RBX: 0000000000000001 RCX: 000000000000001f [39542.654369] RDX: 0000000000000000 RSI: ffff88046fc12b18 RDI: 0000000000000000 [39542.654369] RBP: ffffc90003333ec8 R08: 0000000000000000 R09: 0000000000000000 [39542.654370] R10: ffffc90003333e40 R11: ffff88086d387000 R12: ffff88046c4f4400 [39542.654371] R13: 000023f6bddd31c7 R14: 0000000000000000 R15: 000023f6bddd0579 [39542.654371] </IRQ> [39542.654373] ? cpuidle_enter_state+0x101/0x1a3 [39542.654374] cpuidle_enter+0x12/0x14 [39542.654376] do_idle+0x129/0x1a7 [39542.654378] cpu_startup_entry+0x1a/0x1f [39542.654379] start_secondary+0xd6/0xd9 [39542.654382] start_cpu+0x14/0x14 [39542.654383] ---[ end trace de75ea4e514f5380 ]--- Typical configuration fot hfsc on vlan interface: tc qdisc del dev vlan1201 root tc qdisc add dev vlan1201 handle 1: root hfsc default 100 tc class add dev vlan1201 parent 1: classid 1:100 hfsc ls m2 2100Mbit ul m2 2100Mbit tc qdisc add dev vlan1201 parent 1:100 handle 100: fq_codel limit 3000 noecn Same hfsc/qdisc configuration with kernel 4.8.7 - different server [9893310.154619] WARNING: CPU: 16 PID: 0 at net/sched/sch_hfsc.c:1419 hfsc_dequeue+0x13e/0x277 [9893310.154620] Modules linked in: bonding ixgbe ipmi_si x86_pkg_temp_thermal mdio [9893310.154624] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W 4.8.7 #1 [9893310.154625] 0000000000000000 ffff88046fd03930 ffffffff81228c34 0000000000000000 [9893310.154627] 0000000000000000 ffff88046fd03970 ffffffff81039f70 0000058b66562800 [9893310.154629] 0000000000000000 ffff88085739b000 00008c9633170125 0000000000000001 [9893310.154631] Call Trace: [9893310.154632] <IRQ> [<ffffffff81228c34>] dump_stack+0x4d/0x63 [9893310.154636] [<ffffffff81039f70>] __warn+0xc0/0xdb [9893310.154638] [<ffffffff8103a038>] warn_slowpath_null+0x18/0x1a [9893310.154640] [<ffffffff814eca5b>] hfsc_dequeue+0x13e/0x277 [9893310.154643] [<ffffffff814e46db>] __qdisc_run+0xf0/0x21f [9893310.154645] [<ffffffff814cc0f4>] __dev_queue_xmit+0x2a1/0x483 [9893310.154647] [<ffffffff814cc2e1>] dev_queue_xmit+0xb/0xd [9893310.154649] [<ffffffff814d4378>] neigh_connected_output+0x98/0xaf [9893310.154651] [<ffffffff81502c77>] ip_finish_output2+0x234/0x276 [9893310.154652] [<ffffffff81503da1>] ip_finish_output+0x10b/0x117 [9893310.154654] [<ffffffff815044f0>] ip_output+0x4f/0xb8 [9893310.154656] [<ffffffff8153c827>] ? iptable_filter_hook+0x46/0x48 [9893310.154657] [<ffffffff814f8efb>] ? nf_iterate+0x41/0x5b [9893310.154659] [<ffffffff815014b4>] ip_forward_finish+0x53/0x58 [9893310.154661] [<ffffffff815017a3>] ip_forward+0x2ea/0x33c [9893310.154662] [<ffffffff81501461>] ? ip_frag_mem+0x3e/0x3e [9893310.154664] [<ffffffff814ffdf9>] ip_rcv_finish+0x257/0x2b5 [9893310.154665] [<ffffffff8150039f>] ip_rcv+0x2a4/0x326 [9893310.154666] [<ffffffff814ffba2>] ? inet_del_offload+0x40/0x40 [9893310.154668] [<ffffffff814c8e78>] __netif_receive_skb_core+0x3c8/0x4fa [9893310.154670] [<ffffffff8152e510>] ? inet_gro_receive+0x1e1/0x1f1 [9893310.154671] [<ffffffff814c8fc2>] __netif_receive_skb+0x18/0x5a [9893310.154672] [<ffffffff814c9047>] netif_receive_skb_internal+0x43/0x78 [9893310.154673] [<ffffffff814c9e69>] napi_gro_receive+0x45/0x94 [9893310.154678] [<ffffffffa00e3226>] ixgbe_clean_rx_irq+0x697/0x6d3 [ixgbe] [9893310.154682] [<ffffffffa00e3cb1>] ixgbe_poll+0x4cc/0x601 [ixgbe] [9893310.154683] [<ffffffff814c92da>] net_rx_action+0xc9/0x226 [9893310.154685] [<ffffffff8103cab0>] __do_softirq+0xb8/0x1a8 [9893310.154687] [<ffffffff8103ccec>] irq_exit+0x37/0x45 [9893310.154688] [<ffffffff81014acc>] do_IRQ+0x9d/0xb5 [9893310.154689] [<ffffffff815a227f>] common_interrupt+0x7f/0x7f [9893310.154690] <EOI> [<ffffffff8149b138>] ? cpuidle_enter_state+0x13d/0x195 [9893310.154693] [<ffffffff8149b0ff>] ? cpuidle_enter_state+0x104/0x195 [9893310.154694] [<ffffffff8149b1b2>] cpuidle_enter+0x12/0x14 [9893310.154696] [<ffffffff81061501>] cpu_startup_entry+0x130/0x196 [9893310.154698] [<ffffffff81027193>] start_secondary+0xd0/0xd3 [9893310.154700] ---[ end trace 49d515475d6252b8 ]--- Greetings. I'm using a Raspberry Pi as my home router, with HFSC and fq_codel on kernel 4.4.50. Two Realtek 8153 USB ethernet dongles as WAN and LAN interfaces. By default, sometimes the error about hfsc_dequeue would stuff the log. I could't tell if it's the same problem talking here. It looks like turning off offloads could help. I turned off all GSO GRO TSO UFO as I'm building a router. It looks like turning off GRO and GSO is adequate to workaround the problem. And about the codel patch, it sounds logical to me. I read its code, looks like it done the right thing. I'm a newbie to kernel source, but it's true that codel should stop dropping when queue is shorter than a single MTU, rather than a whole GSO train. I didn't find the actual problem in it, but it looks right, just like every other problematic code. Another trace from kernel 4.10.9 [57130.658057] ------------[ cut here ]------------ [57130.658068] WARNING: CPU: 11 PID: 0 at net/sched/sch_hfsc.c:1400 hfsc_dequeue+0x136/0x276 [57130.658069] Modules linked in: xt_statistic bonding ipmi_si x86_pkg_temp_thermal [57130.658075] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G W 4.10.9 #1 [57130.658076] Call Trace: [57130.658079] <IRQ> [57130.658085] dump_stack+0x63/0x7f [57130.658088] __warn+0xc2/0xe0 [57130.658090] warn_slowpath_null+0x18/0x1a [57130.658091] hfsc_dequeue+0x136/0x276 [57130.658095] __qdisc_run+0x10b/0x21e [57130.658099] net_tx_action+0xda/0xf1 [57130.658102] __do_softirq+0xa8/0x1bc [57130.658104] irq_exit+0x5d/0x6b [57130.658108] smp_apic_timer_interrupt+0x2a/0x36 [57130.658113] apic_timer_interrupt+0x89/0x90 [57130.658117] RIP: 0010:cpuidle_enter_state+0xff/0x161 [57130.658118] RSP: 0018:ffffc900032bbe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 [57130.658120] RAX: ffff88087fcb5dc0 RBX: ffff88046d128c00 RCX: 000000000000001f [57130.658120] RDX: 0000000000000001 RSI: ffff88087fcb3518 RDI: 0000000000000000 [57130.658121] RBP: ffffc900032bbea8 R08: 0000000000000000 R09: 0000000000000000 [57130.658122] R10: ffffffff81c06050 R11: ffff88086dd1d000 R12: 0000000000000001 [57130.658122] R13: 000033f5c494280a R14: 000033f5c493bf9a R15: 0000000000000000 [57130.658123] </IRQ> [57130.658125] ? cpuidle_enter_state+0xbd/0x161 [57130.658127] cpuidle_enter+0x12/0x14 [57130.658130] do_idle+0x125/0x1b6 [57130.658132] cpu_startup_entry+0x1d/0x1f [57130.658133] start_secondary+0xd8/0xdd [57130.658137] start_cpu+0x14/0x14 [57130.658138] ---[ end trace 479898aee34e9363 ]--- (In reply to Sebastian Kemper from comment #32) > Created attachment 233481 [details] > Log of 2nd git bisect > > Hello all, > > I ended up doing another bisect, between > d499c106843afa0703a68c64662bf42a16421aec (good) and > 8d12ded3dd499e38e8022fe3ec53920d085e57a3 (bad). git bisect now suggested > a5d28090405038ca1f40c13f38d6d4285456efee is the first bad commit: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/ > ?id=a5d28090405038ca1f40c13f38d6d4285456efee > > "codel: fix maxpacket/mtu confusion" > > To verify I checked out commit 8d12ded3dd499e38e8022fe3ec53920d085e57a3 and > removed a5d28090405038ca1f40c13f38d6d4285456efee. No problems. I reapplied > the codel patch and the problems started again. Can anyone else confirm this? This might be a fix: https://patchwork.ozlabs.org/patch/803560/ Ah, nope. This is bug in fq_codel logic around "We cant call qdisc_tree_reduce_backlog() if our qlen is 0". I'll fix it too. See "[PATCH RFC] net_sched/codel: do not defer queue length update" in netdev@ Yes this patch solved problem. Thank you very much, for resolving this issue. When is that patch incorporated in the kernel mainline? If so, which version already includes that patch? Regards Sebastian Kricner I just added comment about patch cause there was no choice like - tested with net-next kernel :) W dniu 2017-08-22 o 19:13, bugzilla-daemon@bugzilla.kernel.org pisze: > https://bugzilla.kernel.org/show_bug.cgi?id=109581 > > --- Comment #42 from Sebastian Kricner (sebastian.kricner@tuxwave.net) --- > Thank you very much, for resolving this issue. > > When is that patch incorporated in the kernel mainline? If so, which version > already includes that patch? > > Regards > > Sebastian Kricner > WOW, you guys are great! I see the patch in kernel 12.10, so I tried it on my home router, which is a Raspberry Pi. However, it seems the problem is still there: Sep 4 22:27:11 router kernel: [247398.001750] ------------[ cut here ]------------ Sep 4 22:27:11 router kernel: [247398.001828] WARNING: CPU: 0 PID: 0 at net/sched/sch_hfsc.c:1402 hfsc_dequeue+0x348/0x370 [sch_hfsc] Sep 4 22:27:11 router kernel: [247398.001839] Modules linked in: xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_TCPMSS xt_tcpudp pppoe pppox ppp_generic slhc cls_u32 sch_hfsc xt_conntrack iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat sch_fq_codel nf_conntrack bridge stp llc cfg80211 rfkill cdc_ether r8152 snd_bcm2835(C) snd_pcm snd_timer snd uio_pdrv_genirq uio i2c_dev ip_tables x_tables ipv6 overlay i2c_bcm2835 fixed Sep 4 22:27:11 router kernel: [247398.002050] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G WC 4.12.10-v7 #1 Sep 4 22:27:11 router kernel: [247398.002058] Hardware name: BCM2835 Sep 4 22:27:11 router kernel: [247398.002092] [<8010f7c0>] (unwind_backtrace) from [<8010bd10>] (show_stack+0x20/0x24) Sep 4 22:27:11 router kernel: [247398.002111] [<8010bd10>] (show_stack) from [<80467410>] (dump_stack+0xc8/0x114) Sep 4 22:27:11 router kernel: [247398.002129] [<80467410>] (dump_stack) from [<8011d1b4>] (__warn+0xe8/0x114) Sep 4 22:27:11 router kernel: [247398.002146] [<8011d1b4>] (__warn) from [<8011d2b0>] (warn_slowpath_null+0x30/0x38) Sep 4 22:27:11 router kernel: [247398.002171] [<8011d2b0>] (warn_slowpath_null) from [<7f360e08>] (hfsc_dequeue+0x348/0x370 [sch_hfsc]) Sep 4 22:27:11 router kernel: [247398.002200] [<7f360e08>] (hfsc_dequeue [sch_hfsc]) from [<80669ef8>] (__qdisc_run+0x88/0x274) Sep 4 22:27:11 router kernel: [247398.002223] [<80669ef8>] (__qdisc_run) from [<8063f648>] (__dev_queue_xmit+0x1f4/0x734) Sep 4 22:27:11 router kernel: [247398.002244] [<8063f648>] (__dev_queue_xmit) from [<8063fba4>] (dev_queue_xmit+0x1c/0x20) Sep 4 22:27:11 router kernel: [247398.002264] [<8063fba4>] (dev_queue_xmit) from [<8064b780>] (neigh_direct_output+0x1c/0x20) Sep 4 22:27:11 router kernel: [247398.002283] [<8064b780>] (neigh_direct_output) from [<806849dc>] (ip_finish_output2+0x1d0/0x3a8) Sep 4 22:27:11 router kernel: [247398.002301] [<806849dc>] (ip_finish_output2) from [<806864f0>] (ip_finish_output+0x138/0x204) Sep 4 22:27:11 router kernel: [247398.002317] [<806864f0>] (ip_finish_output) from [<8068706c>] (ip_output+0xc4/0x12c) Sep 4 22:27:11 router kernel: [247398.002338] [<8068706c>] (ip_output) from [<80682fa4>] (ip_forward_finish+0xdc/0xf4) Sep 4 22:27:11 router kernel: [247398.002361] [<80682fa4>] (ip_forward_finish) from [<80683364>] (ip_forward+0x3a8/0x4bc) Sep 4 22:27:11 router kernel: [247398.002382] [<80683364>] (ip_forward) from [<80681020>] (ip_rcv_finish+0x230/0x490) Sep 4 22:27:11 router kernel: [247398.002402] [<80681020>] (ip_rcv_finish) from [<806817b4>] (ip_rcv+0x320/0x558) Sep 4 22:27:11 router kernel: [247398.002422] [<806817b4>] (ip_rcv) from [<8063ae00>] (__netif_receive_skb_core+0x750/0xb88) Sep 4 22:27:11 router kernel: [247398.002442] [<8063ae00>] (__netif_receive_skb_core) from [<8063d428>] (__netif_receive_skb+0x20/0x7c) Sep 4 22:27:11 router kernel: [247398.002462] [<8063d428>] (__netif_receive_skb) from [<8063fbf4>] (netif_receive_skb_internal+0x30/0x3e8) Sep 4 22:27:11 router kernel: [247398.002482] [<8063fbf4>] (netif_receive_skb_internal) from [<8063ffd0>] (netif_receive_skb+0x24/0x98) Sep 4 22:27:11 router kernel: [247398.002634] [<8063ffd0>] (netif_receive_skb) from [<7f2bb154>] (br_netif_receive_skb+0x50/0x68 [bridge]) Sep 4 22:27:11 router kernel: [247398.002807] [<7f2bb154>] (br_netif_receive_skb [bridge]) from [<7f2bb224>] (br_pass_frame_up+0xb8/0x118 [bridge]) Sep 4 22:27:11 router kernel: [247398.002976] [<7f2bb224>] (br_pass_frame_up [bridge]) from [<7f2bb3f0>] (br_handle_frame_finish+0x128/0x4cc [bridge]) Sep 4 22:27:11 router kernel: [247398.003143] [<7f2bb3f0>] (br_handle_frame_finish [bridge]) from [<7f2bb904>] (br_handle_frame+0x170/0x2cc [bridge]) Sep 4 22:27:11 router kernel: [247398.003247] [<7f2bb904>] (br_handle_frame [bridge]) from [<8063a898>] (__netif_receive_skb_core+0x1e8/0xb88) Sep 4 22:27:11 router kernel: [247398.003266] [<8063a898>] (__netif_receive_skb_core) from [<8063d428>] (__netif_receive_skb+0x20/0x7c) Sep 4 22:27:11 router kernel: [247398.003287] [<8063d428>] (__netif_receive_skb) from [<8063fbf4>] (netif_receive_skb_internal+0x30/0x3e8) Sep 4 22:27:11 router kernel: [247398.003307] [<8063fbf4>] (netif_receive_skb_internal) from [<80640c98>] (napi_gro_receive+0xec/0x120) Sep 4 22:27:11 router kernel: [247398.003351] [<80640c98>] (napi_gro_receive) from [<7f18776c>] (r8152_poll+0x360/0xfa0 [r8152]) Sep 4 22:27:11 router kernel: [247398.003385] [<7f18776c>] (r8152_poll [r8152]) from [<80640504>] (net_rx_action+0x1d4/0x434) Sep 4 22:27:11 router kernel: [247398.003405] [<80640504>] (net_rx_action) from [<80101628>] (__do_softirq+0x118/0x3b8) Sep 4 22:27:11 router kernel: [247398.003425] [<80101628>] (__do_softirq) from [<801227a4>] (irq_exit+0xf8/0x164) Sep 4 22:27:11 router kernel: [247398.003444] [<801227a4>] (irq_exit) from [<8016e740>] (__handle_domain_irq+0x68/0xc4) Sep 4 22:27:11 router kernel: [247398.003464] [<8016e740>] (__handle_domain_irq) from [<80101508>] (bcm2836_arm_irqchip_handle_irq+0xa4/0xac) Sep 4 22:27:11 router kernel: [247398.003486] [<80101508>] (bcm2836_arm_irqchip_handle_irq) from [<807343fc>] (__irq_svc+0x5c/0x7c) Sep 4 22:27:11 router kernel: [247398.003494] Exception stack(0x80c01ef8 to 0x80c01f40) Sep 4 22:27:11 router kernel: [247398.003505] 1ee0: 80c04178 6e3bc8ec Sep 4 22:27:11 router kernel: [247398.003519] 1f00: 80c01f58 00000000 80c00000 80c03dd0 80c03d6c 80c75290 00000001 bb7ffa40 Sep 4 22:27:11 router kernel: [247398.003534] 1f20: 00000001 80c01f54 80c01f48 80c01f48 801087f0 801087f4 60000013 ffffffff Sep 4 22:27:11 router kernel: [247398.003559] [<807343fc>] (__irq_svc) from [<801087f4>] (arch_cpu_idle+0x30/0x4c) Sep 4 22:27:11 router kernel: [247398.003580] [<801087f4>] (arch_cpu_idle) from [<80733b58>] (default_idle_call+0x34/0x48) Sep 4 22:27:11 router kernel: [247398.003600] [<80733b58>] (default_idle_call) from [<8015c56c>] (do_idle+0xc4/0x148) Sep 4 22:27:11 router kernel: [247398.003616] [<8015c56c>] (do_idle) from [<8015c86c>] (cpu_startup_entry+0x28/0x2c) Sep 4 22:27:11 router kernel: [247398.003642] [<8015c86c>] (cpu_startup_entry) from [<8072dc14>] (rest_init+0x70/0x88) Sep 4 22:27:11 router kernel: [247398.003664] [<8072dc14>] (rest_init) from [<80b00d68>] (start_kernel+0x360/0x3d0) Sep 4 22:27:11 router kernel: [247398.003740] ---[ end trace bc514feaf013158b ]--- Fix supposed to be in patch "net_sched/codel: do not defer queue length update" which isn't applied anywhere. Ah, I thought it's the one in Comment 38. Excuse me for that. Just grabbed the Comment 40 one. Thanks for the patch. About the patch net_sched/codel: do not defer queue length update", has it ever been applied to the mainline kernel? I don't see the patched code there. (In reply to Akemi Yagi from comment #48) > About the patch net_sched/codel: do not defer queue length update", has it > ever been applied to the mainline kernel? I don't see the patched code there. This warning is resolved by: commit 35b42da69e35536da603a50e40aa6c41b2f7b0f8 Author: Cong Wang <xiyou.wangcong@gmail.com> Date: Fri Jun 22 14:33:16 2018 -0700 net_sched: remove a bogus warning in hfsc |