Bug 39372 - Problems with HFSC Scheduler
Summary: Problems with HFSC Scheduler
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Arnaldo Carvalho de Melo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-14 13:07 UTC by Lucas Bocchi
Modified: 2012-08-15 21:55 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.39.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Script that call the qdisc (6.56 KB, application/octet-stream)
2011-07-14 13:11 UTC, Lucas Bocchi
The config file to the kernel (121.13 KB, application/octet-stream)
2011-07-14 13:13 UTC, Lucas Bocchi
The file with the OOPS out (32 bytes, text/plain)
2011-07-14 13:19 UTC, Lucas Bocchi

Description Lucas Bocchi 2011-07-14 13:07:50 UTC
We have a problem with the HFSC scheduler. When we use it with my configuration, at random times the network device with the attached scheduler hangs and the kernel stops working for a long time.

I'll attach the files with the scheduler configuration I use.

Additional Information

Linux optimus 2.6.39.3 #1 SMP Wed Jul 13 09:40:20 BRT 2011 x86_64 GNU/Linux

Gnu C                  4.6.1
Gnu make               3.81
binutils               2.21.52.20110606
util-linux             2.17.2
mount                  support
module-init-tools      3.16
e2fsprogs              1.42-WIP
xfsprogs               3.1.5
PPP                    2.4.5
Linux C Library        2.13
Dynamic linker (ldd)   2.13
Procps                 3.2.8
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               8.5
Modules Loaded         cls_u32 sch_sfq sch_hfsc sch_prio pppoe pppox nf_nat_sip nf_conntrack_sip nf_nat_ftp nf_conntrack_ftp xt_owner ipt_LOG xt_recent xt_hashlimit xt_TCPMSS xt_tcpmss xt_mark xt_connmark xt_state ipt_MASQUERADE ipt_REDIRECT ipt_REJECT xt_tcpudp iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables tun fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc sit tunnel4 ppp_generic slhc ext4 jbd2 crc16 sbs sbshc it87 hwmon_vid coretemp loop kvm_intel kvm snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc psmouse serio_raw parport_pc parport processor pcspkr evdev asus_atk0110 rng_core button thermal_sys ext3 jbd mbcache btrfs zlib_deflate crc32c libcrc32c usbhid hid ide_gd_mod sd_mod crc_t10dif ata_generic pata_acpi uhci_hcd ata_piix libata scsi_mod floppy ehci_hcd ide_pci_generic r8169 8139too 8139cp mii piix ide_core usbcore
Comment 1 Lucas Bocchi 2011-07-14 13:11:26 UTC
Created attachment 65612 [details]
Script that call the qdisc
Comment 2 Lucas Bocchi 2011-07-14 13:13:16 UTC
Created attachment 65622 [details]
The config file to the kernel
Comment 3 Lucas Bocchi 2011-07-14 13:19:26 UTC
Created attachment 65632 [details]
The file with the OOPS out
Comment 4 Andrew Morton 2011-07-14 22:15:05 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 14 Jul 2011 13:07:59 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=39372
> 
>            Summary: Problems with HFSC Scheduler
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.39.3
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: lucas.bocchi@gmail.com
>         Regression: No
> 

It's a warning storm, not really an oops:

Jul 13 18:00:22 optimus kernel: [28933.952120] ------------[ cut here ]------------
Jul 13 18:00:22 optimus kernel: [28933.952171] WARNING: at net/sched/sch_hfsc.c:1427 hfsc_dequeue+0x12c/0x275 [sch_hfsc]()
Jul 13 18:00:22 optimus kernel: [28933.952234] Hardware name: System Product Name
Jul 13 18:00:22 optimus kernel: [28933.952265] Modules linked in: cls_u32 sch_sfq sch_hfsc sch_prio xfs nf_nat_sip nf_conntrack_sip nf_nat_ftp nf_conntrack_ftp xt_owner ipt_LOG xt_recent xt_hashlimit xt_TCPMSS xt_tcpmss xt_mark xt_connmark xt_state ipt_MASQUERADE ipt_REDIRECT ipt_REJECT xt_tcpudp iptable_mangle tun iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables pppoe pppox fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc sit tunnel4 ppp_generic slhc ext4 jbd2 crc16 sbs sbshc it87 hwmon_vid coretemp loop kvm_intel kvm snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd asus_atk0110 soundcore evdev snd_page_alloc rng_core processor parport_pc parport thermal_sys button pcspkr psmouse serio_raw ext3 jbd mbcache btrfs zlib_deflate crc32c libcrc32c sd_mod ide_gd_mod crc_t10dif ata_generic pata_acpi ata_piix libata scsi_mod piix floppy uhci_hcd ide_pci_generic ehci_hcd 8139too 8139cp ide_core r8169 mii usbcore [last unloaded: scsi_wait_scan]
Jul 13 18:00:22 optimus kernel: [28933.952811] Pid: 0, comm: swapper Tainted: G        W   2.6.39.3 #1
Jul 13 18:00:22 optimus kernel: [28933.952843] Call Trace:
Jul 13 18:00:22 optimus kernel: [28933.952866]  <IRQ>  [<ffffffff81031aae>] ? warn_slowpath_common+0x78/0x8c
Jul 13 18:00:22 optimus kernel: [28933.952906]  [<ffffffffa00fb9ec>] ? hfsc_dequeue+0x12c/0x275 [sch_hfsc]
Jul 13 18:00:22 optimus kernel: [28933.952940]  [<ffffffffa000704d>] ? prio_dequeue+0x1c/0x6e [sch_prio]
Jul 13 18:00:22 optimus kernel: [28933.952977]  [<ffffffffa0112920>] ? rtl8139_start_xmit+0x6a/0xf7 [8139too]
Jul 13 18:00:22 optimus kernel: [28933.953012]  [<ffffffff8123408e>] ? __qdisc_run+0x8e/0x115
Jul 13 18:00:22 optimus kernel: [28933.953044]  [<ffffffff8121ae48>] ? net_tx_action+0xef/0x124
Jul 13 18:00:22 optimus kernel: [28933.953075]  [<ffffffff81036bbb>] ? __do_softirq+0xc7/0x192
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff812d5fdc>] ? call_softirq+0x1c/0x26
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff810037ba>] ? do_softirq+0x3c/0x7a
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff81036e4c>] ? irq_exit+0x4a/0x94
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff810156fb>] ? smp_apic_timer_interrupt+0x75/0x82
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff812d578e>] ? apic_timer_interrupt+0xe/0x20
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff812d5793>] ? apic_timer_interrupt+0x13/0x20
Jul 13 18:00:22 optimus kernel: [28933.953105]  <EOI>  [<ffffffff81007be7>] ? mwait_idle+0x8b/0xb7
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff81007bda>] ? mwait_idle+0x7e/0xb7
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff810011e2>] ? cpu_idle+0x9d/0xd7
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff81b91b16>] ? start_kernel+0x3b4/0x3bf
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff81b91140>] ? early_idt_handlers+0x140/0x140
Jul 13 18:00:22 optimus kernel: [28933.953105]  [<ffffffff81b913a8>] ? x86_64_start_kernel+0x104/0x111
Jul 13 18:00:22 optimus kernel: [28933.953105] ---[ end trace 05058983c1ade13b ]---
Jul 13 18:00:22 optimus kernel: [28933.953105] ------------[ cut here ]------------


Here:
	WARN_ON(next_time == 0);

There's some more info in the bugzilla report.
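A stale queue-length counter is enough to reach this WARN_ON. The block below is a heavily simplified model, not the sch_hfsc.c code. It assumes that the dequeue path, when it believes packets are queued but none may be sent yet, arms a timer for the next eligible time, and that next_time stays 0 when no class is backlogged at all. The names toy_hfsc, toy_class and toy_dequeue() are hypothetical, the link-sharing fit-time is ignored, and the real scheduler uses real-time and virtual-time trees rather than a linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NCLASSES 4

/* One leaf class of a hypothetical toy scheduler. */
struct toy_class {
	unsigned int backlog;   /* packets really queued in this class */
	uint64_t eligible_at;   /* when a backlogged class may next send */
};

/* Hypothetical, heavily simplified stand-in for an HFSC root. */
struct toy_hfsc {
	unsigned int qlen;      /* what the scheduler *believes* is queued */
	struct toy_class cl[TOY_NCLASSES];
	bool warned;            /* set where the kernel would hit WARN_ON() */
};

/* Earliest eligible time among backlogged classes, or 0 if none. */
static uint64_t toy_next_time(const struct toy_hfsc *q)
{
	uint64_t next_time = 0;

	for (size_t i = 0; i < TOY_NCLASSES; i++) {
		const struct toy_class *c = &q->cl[i];
		if (c->backlog && (next_time == 0 || c->eligible_at < next_time))
			next_time = c->eligible_at;
	}
	return next_time;
}

/* Serve one packet; returns the class index served, or -1 if none. */
static int toy_dequeue(struct toy_hfsc *q, uint64_t now)
{
	assert(q != NULL);
	if (q->qlen == 0)
		return -1;              /* believed empty: nothing to do */

	for (size_t i = 0; i < TOY_NCLASSES; i++) {
		struct toy_class *c = &q->cl[i];
		if (c->backlog && c->eligible_at <= now) {
			c->backlog--;
			q->qlen--;
			return (int)i;
		}
	}

	/* qlen > 0 but nothing can be sent now, so a timer must be armed
	 * for the next eligible time. If no class holds a packet at all,
	 * there is no such time: next_time stays 0, the condition that
	 * WARN_ON(next_time == 0) reports. */
	if (toy_next_time(q) == 0)
		q->warned = true;
	return -1;
}
```

Setting qlen to 1 while every class has a backlog of 0 makes toy_dequeue() return -1 and set warned; a class that is backlogged but not yet eligible gives a real timer value and leaves warned false.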
Comment 5 Eric Dumazet 2011-07-29 13:27:51 UTC
On Friday 29 July 2011 at 14:29 +0200, Michal Soltys wrote:
> On 11-07-15 00:14, Andrew Morton wrote:
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via
> > the bugzilla web interface).
> > 
> > 
> > Here: WARN_ON(next_time == 0);
> > 
> 
> From the other thread on netfilter-devel:
> 
> > On 11-07-22 11:58, Michal Pokrywka wrote: After bisecting 2.6.39.1 it
> > turned out that the bug is caused independently by two patches:
> > 
> > commit b262a5da755cc6ed0cb4fba230cd9bf4037e1096 sch_sfq: fix peek()
> > implementation
> > 
> > and
> > 
> > commit 9df49f2bfe862573911a080c75a6d81113c5c81d sch_sfq: avoid giving
> > spurious NET_XMIT_CN signals
> > 
> > Reverting these patches makes HFSC work again.
> > 
> 
> This one (upstream 8efa885406359af300d46910642b50ca82c0fe47) seems to be
> the culprit (does reverting only that one cure the problem?)
> 
> It allows SFQ to return success on enqueuing when the packet really
> replaced some other packet in some other flow. This confuses the outer qdisc
> (in this particular case HFSC), which thinks a new packet was actually
> added each time such a situation happens.
> 

Technically speaking, _this_ packet was successfully enqueued.

Returning NET_XMIT_CN or NET_XMIT_SUCCESS should not trigger a bug in the
caller.

> This in turn causes additional dequeues and ends with an attempt
> to schedule non-existent packets, and triggers the warning.
> 

Then it's probably a bug in HFSC: it doesn't understand that SFQ lost a
packet.

I'll take a look, thanks for the report.
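Eric's point, that this particular packet really was queued, is easy to see in a model of SFQ's overflow rule. A minimal sketch under stated assumptions: toy_sfq, toy_sfq_enqueue() and the TOY_XMIT_* values are hypothetical; the drop policy (evict from the longest flow) is simplified to a linear scan with ties going to the lowest index; and only the final decision mirrors the pre-fix "return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;" line quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the meaning of NET_XMIT_SUCCESS / NET_XMIT_CN. */
enum toy_xmit { TOY_XMIT_SUCCESS, TOY_XMIT_CN };

#define TOY_NFLOWS 4

/* Hypothetical SFQ stand-in: per-flow lengths and one shared limit. */
struct toy_sfq {
	unsigned int flow_qlen[TOY_NFLOWS];
	unsigned int qlen;      /* total packets held */
	unsigned int limit;     /* maximum total packets */
};

/* Drop one packet from the longest flow (lowest index wins a tie). */
static void toy_sfq_drop(struct toy_sfq *q)
{
	size_t longest = 0;

	for (size_t i = 1; i < TOY_NFLOWS; i++)
		if (q->flow_qlen[i] > q->flow_qlen[longest])
			longest = i;
	assert(q->flow_qlen[longest] > 0);
	q->flow_qlen[longest]--;
	q->qlen--;
}

/* Pre-fix behavior: report CN only if the victim was in this flow. */
static enum toy_xmit toy_sfq_enqueue(struct toy_sfq *q, size_t flow)
{
	assert(flow < TOY_NFLOWS);
	q->flow_qlen[flow]++;
	if (++q->qlen <= q->limit)
		return TOY_XMIT_SUCCESS;

	unsigned int before = q->flow_qlen[flow];
	toy_sfq_drop(q);
	return (before != q->flow_qlen[flow]) ? TOY_XMIT_CN : TOY_XMIT_SUCCESS;
}
```

With a limit of 3 holding two packets of flow 0 and one of flow 1, enqueuing for flow 2 evicts a flow 0 packet and still returns TOY_XMIT_SUCCESS while the total stays at 3. That is the case where an outer qdisc counting every success ends up one packet ahead.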
Comment 6 Michal Soltys 2011-07-29 13:31:21 UTC
On 11-07-15 00:14, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via
> the bugzilla web interface).
> 
> 
> Here: WARN_ON(next_time == 0);
> 

From the other thread on netfilter-devel:

> On 11-07-22 11:58, Michal Pokrywka wrote: After bisecting 2.6.39.1 it
> turned out that the bug is caused independently by two patches:
> 
> commit b262a5da755cc6ed0cb4fba230cd9bf4037e1096 sch_sfq: fix peek()
> implementation
> 
> and
> 
> commit 9df49f2bfe862573911a080c75a6d81113c5c81d sch_sfq: avoid giving
> spurious NET_XMIT_CN signals
> 
> Reverting these patches makes HFSC work again.
> 

This one (upstream 8efa885406359af300d46910642b50ca82c0fe47) seems to be
the culprit (does reverting only that one cure the problem?)

It allows SFQ to return success on enqueuing when the packet really
replaced some other packet in some other flow. This confuses the outer qdisc
(in this particular case HFSC), which thinks a new packet was actually
added each time such a situation happens.

This in turn causes additional dequeues and ends with an attempt
to schedule non-existent packets, and triggers the warning.
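The mismatch described above can be reproduced with two counters. A minimal sketch, assuming that the outer qdisc increments its own length only when the inner one returns success, and that every overflowing enqueue evicts a packet of some other flow; toy_child, toy_parent and their functions are hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the meaning of NET_XMIT_SUCCESS / NET_XMIT_CN. */
enum toy_xmit { TOY_XMIT_SUCCESS, TOY_XMIT_CN };

/* Hypothetical inner qdisc. Assumption: every overflow evicts a packet
 * from some *other* flow, so (pre-fix) it still reports success. */
struct toy_child {
	unsigned int qlen;
	unsigned int limit;
};

static enum toy_xmit toy_child_enqueue(struct toy_child *c)
{
	if (++c->qlen <= c->limit)
		return TOY_XMIT_SUCCESS;
	c->qlen--;                  /* another flow's packet was dropped */
	return TOY_XMIT_SUCCESS;    /* ...but the parent is not told */
}

/* Hypothetical outer qdisc: counts a packet only on child success. */
struct toy_parent {
	unsigned int qlen;
	struct toy_child *child;
};

static enum toy_xmit toy_parent_enqueue(struct toy_parent *p)
{
	enum toy_xmit err = toy_child_enqueue(p->child);

	if (err != TOY_XMIT_SUCCESS)
		return err;             /* CN: child did not grow, do not count */
	p->qlen++;
	return err;
}

/* Packets the parent believes exist that the child does not hold. */
static unsigned int toy_phantom_packets(const struct toy_parent *p)
{
	assert(p->qlen >= p->child->qlen);
	return p->qlen - p->child->qlen;
}
```

With a child limit of 2, five enqueues leave the child at 2 packets and the parent at 5, so toy_phantom_packets() reports 3: three dequeues the outer scheduler will attempt for packets that do not exist.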


PS:

Removed netfilter from Cc, as it's not really a netfilter issue.
Comment 7 Eric Dumazet 2011-07-29 14:02:12 UTC
Oh well, it seems one qdisc_tree_decrease_qlen(sch, 1) is missing

Maybe the following patch would help...


diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 4536ee6..2a2d287 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -410,7 +410,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* Return Congestion Notification only if we dropped a packet
 	 * from this flow.
 	 */
-	return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
+	if (qlen != slot->qlen)
+		return NET_XMIT_CN;
+
+	/* as we dropped a packet, better let upper stack know this */
+	qdisc_tree_decrease_qlen(sch, 1);
+	return NET_XMIT_SUCCESS;
 }
 
 static struct sk_buff *
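Why the extra qdisc_tree_decrease_qlen(sch, 1) call restores consistency can be checked on a three-level chain like the reporter's (the call trace shows prio_dequeue() above hfsc_dequeue(), and sch_sfq is among the loaded modules). A minimal sketch under stated assumptions: each ancestor counts a packet only when its child returns success, every overflow evicts a packet of another flow, and toy_qdisc, toy_enqueue() and toy_tree_decrease_qlen() are hypothetical. The last one imitates only the counter walk up the parent chain; the real qdisc_tree_decrease_qlen() also notifies parent classes, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the meaning of NET_XMIT_SUCCESS / NET_XMIT_CN. */
enum toy_xmit { TOY_XMIT_SUCCESS, TOY_XMIT_CN };

/* Hypothetical qdisc node: parent is NULL at the root, child is NULL at
 * the leaf. Only the leaf uses limit. */
struct toy_qdisc {
	unsigned int qlen;
	unsigned int limit;
	struct toy_qdisc *parent;
	struct toy_qdisc *child;
};

/* Subtract n from every ancestor of q; q's own qlen is already correct. */
static void toy_tree_decrease_qlen(struct toy_qdisc *q, unsigned int n)
{
	for (struct toy_qdisc *p = q->parent; p != NULL; p = p->parent) {
		assert(p->qlen >= n);
		p->qlen -= n;
	}
}

/* Leaf enqueue; `fixed` selects the patched or the original behavior. */
static enum toy_xmit toy_leaf_enqueue(struct toy_qdisc *leaf, bool fixed)
{
	if (++leaf->qlen <= leaf->limit)
		return TOY_XMIT_SUCCESS;
	leaf->qlen--;                   /* evict a packet of another flow */
	if (fixed)
		toy_tree_decrease_qlen(leaf, 1);   /* tell the ancestors */
	return TOY_XMIT_SUCCESS;
}

/* Inner nodes forward to their child and count the packet on success. */
static enum toy_xmit toy_enqueue(struct toy_qdisc *q, bool fixed)
{
	if (q->child == NULL)
		return toy_leaf_enqueue(q, fixed);
	enum toy_xmit err = toy_enqueue(q->child, fixed);
	if (err == TOY_XMIT_SUCCESS)
		q->qlen++;
	return err;
}
```

With the fix enabled, four enqueues into a leaf with a limit of 2 leave root, middle and leaf all at 2: each ancestor is decremented before it increments on the returning success, so the net change is zero. With the fix disabled, the same workload leaves both ancestors at 4.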
Comment 8 Patrick McHardy 2011-07-29 14:50:42 UTC
On 29.07.2011 16:00, Eric Dumazet wrote:
> Oh well, it seems one qdisc_tree_decrease_qlen(sch, 1) is missing
> 
> Maybe following patch would help...
> 
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 4536ee6..2a2d287 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -410,7 +410,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>       /* Return Congestion Notification only if we dropped a packet
>        * from this flow.
>        */
> -     return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
> +     if (qlen != slot->qlen)
> +             return NET_XMIT_CN;
> +
> +     /* as we dropped a packet, better let upper stack know this */
> +     qdisc_tree_decrease_qlen(sch, 1);
> +     return NET_XMIT_SUCCESS;
>  }
>  

Yeah, that seems to be the correct fix, thanks for looking into this.
Comment 9 Eric Dumazet 2011-07-30 05:22:56 UTC
On Friday 29 July 2011 at 16:11 +0200, Patrick McHardy wrote:

> Yeah, that seems to be the correct fix, thanks for looking into this.

Thanks Patrick, here is the official patch submission then ;)

[PATCH] sch_sfq: fix sfq_enqueue()

commit 8efa88540635 (sch_sfq: avoid giving spurious NET_XMIT_CN signals)
forgot to call qdisc_tree_decrease_qlen() to signal upper levels that a
packet (from another flow) was dropped, leading to various problems. 

With help from Michal Soltys and Michal Pokrywka, who did a bisection.

Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=39372
Debian ref: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631945

Reported-by: Lucas Bocchi <lucas.bocchi@gmail.com>
Reported-and-bisected-by: Michal Pokrywka <wolfmoon@o2.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michal Soltys <soltys@ziu.info>
Acked-by: Patrick McHardy <kaber@trash.net>
---
 net/sched/sch_sfq.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 4536ee6..2a2d287 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -410,7 +410,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* Return Congestion Notification only if we dropped a packet
 	 * from this flow.
 	 */
-	return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
+	if (qlen != slot->qlen)
+		return NET_XMIT_CN;
+
+	/* As we dropped a packet, better let upper stack know this */
+	qdisc_tree_decrease_qlen(sch, 1);
+	return NET_XMIT_SUCCESS;
 }
 
 static struct sk_buff *
Comment 10 David S. Miller 2011-08-01 09:27:52 UTC
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 30 Jul 2011 07:22:42 +0200

> [PATCH] sch_sfq: fix sfq_enqueue()

Applied, thanks.
Comment 11 Florian Mickler 2011-08-08 08:11:55 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit e1738bd9cecc5c867b0e2996470c1ff20f66ba79
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Jul 29 19:22:42 2011 +0000

    sch_sfq: fix sfq_enqueue()
Comment 12 dashang 2011-08-25 15:10:57 UTC
(In reply to comment #7)
> Oh well, it seems one qdisc_tree_decrease_qlen(sch, 1) is missing
> 
> Maybe following patch would help...
> 
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 4536ee6..2a2d287 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -410,7 +410,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>      /* Return Congestion Notification only if we dropped a packet
>       * from this flow.
>       */
> -    return (qlen != slot->qlen) ? NET_XMIT_CN : NET_XMIT_SUCCESS;
> +    if (qlen != slot->qlen)
> +        return NET_XMIT_CN;
> +
> +    /* as we dropped a packet, better let upper stack know this */
> +    qdisc_tree_decrease_qlen(sch, 1);
> +    return NET_XMIT_SUCCESS;
>  }
> 
>  static struct sk_buff *

I had the same problem, but after applying this patch it runs successfully.
But there is one more problem with HFSC scheduling: when I start browsing, tc shows different classes under a different root, which I have not created:
[ class sfq 5:3e parent 5: ] [ class sfq 5:2ec parent 5: ] [ class sfq 5:3c6 parent 5: ]
These classes all appear automatically, and I don't understand why, because when I stop browsing they no longer show up in tc. They are sfq classes, so is this a problem with the sfq patch? Please advise.

This is the output of tc:

qdisc hfsc 1: root refcnt 2 default 1 
 Sent 700540 bytes 871 pkt (dropped 0, overlimits 693 requeues 0) 
 backlog 0b 6p requeues 0 
qdisc sfq 5: parent 1:5 limit 127p quantum 1514b flows 127/1024 perturb 10sec 
 Sent 574303 bytes 483 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 12860b 6p requeues 0 
####################################################################################################
                                                        Classes
####################################################################################################
class hfsc 1: root 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
 period 0 level 2 

class hfsc 1:1 parent 1: sc m1 0bit d 0us m2 800000Kbit 
 Sent 127195 bytes 391 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
 period 390 work 127195 bytes rtwork 125235 bytes level 0 

class hfsc 1:2 parent 1: sc m1 0bit d 0us m2 32768Kbit ul m1 0bit d 0us m2 32768Kbit 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
 period 0 level 0 

class hfsc 1:3 parent 1: sc m1 0bit d 0us m2 32768Kbit ul m1 0bit d 0us m2 32768Kbit 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
 period 0 level 0 

class hfsc 1:4 parent 1: sc m1 0bit d 0us m2 32768Kbit ul m1 0bit d 0us m2 32768Kbit 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
 period 46 work 574245 bytes level 1 

class hfsc 1:5 parent 1:4 leaf 5: sc m1 0bit d 0us m2 1000bit ul m1 0bit d 0us m2 40000bit 
 Sent 587163 bytes 494 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 6p requeues 0 
 period 46 work 574245 bytes rtwork 32261 bytes level 0 

class sfq 5:3e parent 5: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 58b 1p requeues 0 
 allot 1520 

class sfq 5:2ec parent 5: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 12731b 3p requeues 0 
 allot -4320 

class sfq 5:3c6 parent 5: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 71b 1p requeues 0 
 allot 64
Comment 13 Lucas Bocchi 2011-08-25 16:51:27 UTC
Don't worry about that.

man tc-sfq will answer your question.
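The classes dashang sees are SFQ's own per-flow entries. A minimal sketch of how such a listing could come about, assuming (based on the sch_sfq.c of that era, not on anything stated in this thread) that a class dump walks the hash buckets, skips empty ones, and reports bucket i as class minor i + 1. The helper toy_sfq_visible_classes() is hypothetical; the 1024-bucket divisor matches the "flows 127/1024" line in the tc output above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the packets queued in each SFQ hash
 * bucket, collect the class minor numbers a dump would list, assuming
 * only non-empty buckets are reported and bucket i maps to minor i + 1.
 * Returns how many minors were written to `minors` (at most `max`). */
static size_t toy_sfq_visible_classes(const unsigned int *bucket_backlog,
				      size_t divisor,
				      unsigned int *minors, size_t max)
{
	size_t n = 0;

	assert(bucket_backlog != NULL && minors != NULL);
	for (size_t i = 0; i < divisor && n < max; i++)
		if (bucket_backlog[i] != 0)
			minors[n++] = (unsigned int)i + 1;
	return n;
}
```

Under that assumption, buckets 0x3d, 0x2eb and 0x3c5 would be listed as 5:3e, 5:2ec and 5:3c6, and the entries disappear as soon as those flows drain, which matches what dashang observed when browsing stopped.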
Comment 14 Lucas Bocchi 2011-08-25 17:10:22 UTC
Don't worry about that.

man tc-sfq will answer your question.
