Bug 11571
| Summary: | u32_classify Kernel Panic | | |
|---|---|---|---|
| Product: | Networking | Reporter: | m0sia (m0sia) |
| Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | high | | |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.26.5 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | 2.6.18-6-686 Kernel Panic; 2.6.25.6 Kernel Panic | | |
Description
m0sia
2008-09-15 03:35:15 UTC
Created attachment 17782 [details]
2.6.18-6-686 Kernel Panic
Created attachment 17783 [details]
2.6.25.6 Kernel Panic
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 15 Sep 2008 03:35:16 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11571
>
> Summary: u32_classify Kernel Panic
> Product: Networking
> Version: 2.5
> KernelVersion: 2.6.26.5
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Other
> AssignedTo: acme@ghostprotocols.net
> ReportedBy: m0sia@plotinka.ru
>
>
> Distribution: Debian
>
> Problem Description:
> Kernel panic
> [<c023f2d8>] dev_queue_xmit+0x175/0x2a1
> [<c0243861>] neigh_resolve_output+0x1f8/0x21a
> [<c025a784>] ip_finish_output+0x1d7/0x200
> [<c025aa2f>] ip_output+0x6f/0x81
> [<c0258218>] ip_forward_finish+0x2c/0x2e
> [<c0257223>] ip_rcv_finish+0x263/0x27f
> [<c023cc62>] netif_receive_skb+0x2c1/0x32b
> [<f886f26d>] e1000_clean_rx_irq+0x395/0x46f [e1000]
> [<f886f5f7>] e1000_clean+0x52/0x1db [e1000]
> [<c013e8e4>] net_rx_action+0x8a/0x153
> [<c0128bfa>] __do_softirq+0x5d/0xc1
> [<c0128c98>] do_softirq+0x32/0x36
> [<c0185cb8>] do_IRQ+0x52/0x66
> [<c01887fa>] mwait_idle+0x8/0x32
> [<c018418b>] common_interrupt+0x23/0x28
> [<c01887fa>] mwait_idle+0xB/0x32
> [<c0188829>] mwait_idle+0x2f/0x32
> [<c0182545>] cpu_idle+0x88/0x9c
>
>
> Code: 0c 8b 80 90 00 00 00 c7 44 24 14 00 00 00 00 c7 44 24 18 00 00 00 00 89 44
> 24 18 8b 54 24 0c 8b 74 aa 18 85 f6 0f 84 a0 01 00 00 <8b> 46 38 83 00 01 83 50
> 04 00 8b 4c 24 04 8b 46 38 23 81 88 00
> EIP: [<f8bf3670>] u32_classify+0x41/0x23f [cls_u32] SS:ESP 8868:f746fd44
> Kernel panic - not syncing: Fatal exception in interrupt
>
> Steps to reproduce:
>
> tc qdisc add dev eth1 root handle 1: htb
>
> tc class add dev eth1 parent 1: classid 1:1 htb rate 3600Kbit
> tc class add dev eth1 parent 1:1 classid 1:11 htb rate 2800Kbit prio 0
> tc class add dev eth1 parent 1:1 classid 1:15 htb rate 100Kbit ceil 2800Kbit prio 0
> tc class add dev eth1 parent 1:1 classid 1:19 htb rate 100Kbit ceil 500Kbit prio 2
>
>
> N from 10 to 2000
> tc class add dev eth1 parent 1:{11,15,19} classid 1:$N htb rate 1Kbit ceil {$SPEED}Kbit
> tc filter add dev eth1 parent 1: protocol ip pref $N u32 match ip dst $IP flowid 1:$N
>
> Everything worked with N smaller than 2000. The problem first appeared with
> kernel 2.6.18-6-686 and is still present in 2.6.26.5
>
>
> --
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.

Andrew Morton wrote, On 09/16/2008 06:15 PM:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 15 Sep 2008 03:35:16 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=11571
>>
>> Summary: u32_classify Kernel Panic
...
Could you add more details:
- .config
- gzipped cls_u32.o (compiled with CONFIG_DEBUG_INFO on)
- the first part of OOPS if possible.
- more exactly these tc commands from your script
Does this happen while creating or deleting something and is this
easy to reproduce?
Thanks,
Jarek P.
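
The loop in the report's reproduction steps ("N from 10 to 2000") is only described, not shown. A minimal dry-run sketch of such a loop might look like the following; it prints the commands rather than running them. The function name `gen_tc_cmds`, the `PARENT` and `SPEED` defaults, and the mapping from N to a 10.0.x.y address are assumptions made for illustration, since the report does not say how $IP and $SPEED were chosen.

```shell
#!/bin/sh
# Dry-run sketch: prints the tc commands instead of executing them.
# PARENT, SPEED and the 10.0.x.y address scheme are illustrative assumptions;
# the original report does not specify them.

gen_tc_cmds() {
    dev=$1 first=$2 last=$3
    parent=${PARENT:-1:11}   # the report used one of 1:11, 1:15, 1:19
    speed=${SPEED:-512}      # hypothetical ceil value in Kbit

    n=$first
    while [ "$n" -le "$last" ]; do
        # Hypothetical mapping from N to a destination address.
        ip="10.0.$((n / 256)).$((n % 256))"
        echo "tc class add dev $dev parent $parent classid 1:$n htb rate 1Kbit ceil ${speed}Kbit"
        echo "tc filter add dev $dev parent 1: protocol ip pref $n u32 match ip dst $ip flowid 1:$n"
        n=$((n + 1))
    done
}

# Print the commands for N = 10..12.
gen_tc_cmds eth1 10 12
```

Piping the output into `sh` as root would apply the commands. Going by the later discussion in this thread, traffic through the qdisc while filters are being changed is probably needed to trigger the oops, so running the loop on an idle interface may show nothing.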
On Wed, Sep 17, 2008 at 09:38:32PM +0200, Jarek Poplawski wrote:
...
> >> http://bugzilla.kernel.org/show_bug.cgi?id=11571
> Does this happen while creating or deleting something and is this
> easy to reproduce?
If any deleting happens to be involved, please try this patch.
Jarek P.
---
net/sched/cls_u32.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 246f906..9912ad5 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -433,7 +433,9 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 	if (ht->refcnt == 1) {
 		ht->refcnt--;
+		tcf_tree_lock(tp);
 		u32_destroy_hnode(tp, ht);
+		tcf_tree_unlock(tp);
 	} else {
 		return -EBUSY;
 	}
Jarek Poplawski wrote:
> Andrew Morton wrote, On 09/16/2008 06:15 PM:
>
>> (switched to email. Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Mon, 15 Sep 2008 03:35:16 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
>> wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=11571
>>>
>>> Summary: u32_classify Kernel Panic
>>>
> ...
>
> Could you add more details:
> - .config
> - gzipped cls_u32.o (compiled with CONFIG_DEBUG_INFO on)
> - the first part of OOPS if possible.
> - more exactly these tc commands from your script
>
> Does this happen while creating or deleting something and is this
> easy to reproduce?
>
> Thanks,
> Jarek P.
>
It happens on a production server and I can't experiment with this bug.
Now I'm using the MARK iptables target and the fw mark filter because of this
bug. I think it happens when deleting or adding filters (this is done
automatically by a script when a new user is added or a user's speed changes).
I'll try this patch on a test server.

On Thu, Sep 18, 2008 at 01:05:28PM +0600, m0sia wrote:
...
>>>> http://bugzilla.kernel.org/show_bug.cgi?id=11571
>>>>
>>>> Summary: u32_classify Kernel Panic
...
> It happens on a production server and I can't experiment with this bug.
> Now I'm using the MARK iptables target and the fw mark filter because of this
> bug. I think it happens when deleting or adding filters (this is done
> automatically by a script when a new user is added or a user's speed changes).
> I'll try this patch on a test server.

OK, no hurry. BTW, it looks like some traffic is needed in a qdisc while
its filters are modified to trigger this. Probably, turning off
CONFIG_CLS_U32_PERF can make this less visible. On the other hand,
turning on memory debugging: CONFIG_DEBUG_SLAB or CONFIG_SLUB_DEBUG_ON
should be helpful here.

Jarek P.

> On Thu, Sep 18, 2008 at 01:05:28PM +0600, m0sia wrote:
> ...
> >>>> http://bugzilla.kernel.org/show_bug.cgi?id=11571
...
(take 2)

It seems there could be a problem with testing if this patch fixes
this bug, but IMHO it's quite probable, and needed anyway.

Jarek P.

----------------->
pkt_sched: cls_u32: Fix locking in u32_change()

New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.

Reported-by: m0sia <m0sia@plotinka.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
net/sched/cls_u32.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 05d1780..07372f6 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -638,8 +638,9 @@ static int u32_change(struct tcf_proto *tp, unsigned long base, u32 handle,
 			break;

 	n->next = *ins;
-	wmb();
+	tcf_tree_lock(tp);
 	*ins = n;
+	tcf_tree_unlock(tp);

 	*arg = (unsigned long)n;
 	return 0;

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 5 Jan 2009 13:52:45 +0000

> pkt_sched: cls_u32: Fix locking in u32_change()
>
> New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
> so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
> This isn't enough without rmb() on the read side, but on the other
> hand adding such barriers doesn't give any savings, so the lock is
> added instead.
>
> Reported-by: m0sia <m0sia@plotinka.ru>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

Applied and queued up for -stable, thanks Jarek.

For the record: it was acknowledged here:
http://bugzilla.kernel.org/show_bug.cgi?id=12858#c3
the last patch really fixed the bug, so this report can be closed.

Jarek P.
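
For anyone chasing a similar race, the kernel options mentioned earlier in the thread (CONFIG_CLS_U32_PERF, CONFIG_DEBUG_SLAB, CONFIG_SLUB_DEBUG_ON) can be checked with a small helper along these lines. This is only a sketch: the function name `check_kconfig` is made up for illustration, and the `/boot/config-$(uname -r)` path is a common distribution convention that the thread itself does not mention.

```shell
#!/bin/sh
# Sketch: report whether selected options are enabled in a .config-style file.
# check_kconfig is a hypothetical helper; the default path below is an
# assumption about where the distribution stores the running kernel's config.

check_kconfig() {
    cfg=$1; shift
    for opt in "$@"; do
        # Enabled options appear as OPTION=y (built in) or OPTION=m (module).
        if grep -q "^${opt}=[ym]" "$cfg"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
}

cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    check_kconfig "$cfg" CONFIG_CLS_U32_PERF CONFIG_DEBUG_SLAB CONFIG_SLUB_DEBUG_ON
fi
```

A different config file, for example the `.config` of a kernel build tree, can be passed as the first argument instead.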