Bug 6322 - Kernel Panic (tc filter delete panic)
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: i386 Linux
Importance: P2 high
Assignee: Harald Welte
Blocks: 6681
Reported: 2006-04-03 04:26 UTC by Badalian Slava
Modified: 2008-01-28 22:53 UTC
Kernel Version: 2.6.17.4
Regression: Yes


Attachments
Config for kernel (25.87 KB, application/octet-stream), 2006-04-03 04:32 UTC, Badalian Slava
Screenshot (563.10 KB, image/jpeg), 2006-04-03 04:46 UTC, Badalian Slava
Screenshot 2 (463.80 KB, image/jpeg), 2006-04-03 04:46 UTC, Badalian Slava
Screenshot 3 (539.24 KB, image/jpeg), 2006-04-03 04:47 UTC, Badalian Slava
Screen Shot Kernel Panic on 2.6.16.1 (349.56 KB, image/gif), 2006-04-06 06:03 UTC, Badalian Slava
create_nodes.sh (3.78 KB, application/octet-stream), 2006-04-11 23:05 UTC, Badalian Slava
iptables_temp_rules.gz (19.08 KB, application/x-tar), 2006-04-11 23:05 UTC, Badalian Slava
tc_temp_rules.gz (235.95 KB, application/x-tar), 2006-04-11 23:05 UTC, Badalian Slava
Kernel panic on 20.04.05 part 1 (130.74 KB, image/jpeg), 2006-04-20 04:36 UTC, Badalian Slava
Kernel panic on 20.04.05 part 2 (132.59 KB, image/jpeg), 2006-04-20 04:36 UTC, Badalian Slava
Kernel panic on 19.04.05 part 1 (131.77 KB, image/jpeg), 2006-04-20 04:37 UTC, Badalian Slava
Kernel panic on 19.04.05 part 2 (130.74 KB, image/jpeg), 2006-04-20 04:37 UTC, Badalian Slava


Description Badalian Slava 2006-04-03 04:26:20 UTC
Most recent kernel where this bug did not occur: 2.6.15.x (not seen there).

Distribution: Slackware 10.2 updated to current

Hardware Environment: 
00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to I/O Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
01:09.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
01:0a.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
01:0b.0 VGA compatible controller: ATI Technologies Inc 3D Rage II+ 215GTB [Mach64 GTB] (rev 9a)

Software Environment:

Problem Description:
Now running 2.6.16.1, which has worked normally for 2 days. With 2.6.16 there
were 2 kernel panics in 1 week. This last

The server is used as a bridge and traffic shaper. It has more than 10k tc
rules (using hash tables) and over 4k iptables rules. tc uses HTB classes, RED
and SFQ qdiscs, and u32 filters only.
iptables uses the physdev module.

The link carries about 60 Mbit/s at all times, with 2k+ clients.

Steps to reproduce:
I will attach the .config and screenshots (photos of the screen).
Comment 1 Badalian Slava 2006-04-03 04:32:42 UTC
Created attachment 7747
Config for kernel
Comment 2 Badalian Slava 2006-04-03 04:46:06 UTC
Created attachment 7748
Screenshot
Comment 3 Badalian Slava 2006-04-03 04:46:34 UTC
Created attachment 7749
Screenshot 2
Comment 4 Badalian Slava 2006-04-03 04:47:01 UTC
Created attachment 7750
Screenshot 3
Comment 5 Badalian Slava 2006-04-05 12:59:14 UTC
2.6.16.1 - Again!
Comment 6 Stephen Hemminger 2006-04-05 16:55:55 UTC
Is this a regression (i.e. it didn't occur with 2.6.15)? If so, could you use
"git bisect" to identify the changeset? Although I have an idea which one
it is.

The most interesting part of the backtrace has scrolled off the screen!
There is a config option in 2.6.17 to print multiple functions per line
in columns, and that might help show the offender. Could you try 2.6.17-rc1?
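
A minimal sketch of how such a bisection could be started. It assumes a clone of the mainline kernel tree at a hypothetical path (KERNEL_TREE) and takes the v2.6.15 and v2.6.16 tags as the good and bad endpoints, based on the report that 2.6.15.x did not panic. The helper name start_bisect is made up for illustration.

```shell
#!/bin/sh
# Sketch only: start a bisection between a known-bad and a known-good revision.
# start_bisect is a hypothetical helper; it only wraps "git bisect start".

start_bisect() {
    tree=$1
    bad=$2
    good=$3
    # Marks both endpoints and checks out a commit roughly halfway between them.
    git -C "$tree" bisect start "$bad" "$good"
}

# Usage (not run here; KERNEL_TREE is a hypothetical path to a kernel clone):
#   start_bisect "$KERNEL_TREE" v2.6.16 v2.6.15
#   ...build and boot the checked-out kernel, run the tc workload, then record:
#   git -C "$KERNEL_TREE" bisect bad      # if this kernel panics
#   git -C "$KERNEL_TREE" bisect good     # if it survives long enough
#   git -C "$KERNEL_TREE" bisect reset    # when the bisection is finished
```

Since the panic in this report shows up only after days of uptime, each "good" verdict would need a correspondingly long test run.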
Comment 7 Badalian Slava 2006-04-06 05:55:25 UTC
I can't use 2.6.17-rc1; it's unstable.
This machine is the main network QoS box for client connections.

We have now stopped using the bridge and set up simple forwarding instead. If
this variant panics too, I will lose my job =)))

I am attaching another screenshot. It shows the kernel panic on 2.6.16.1.
Comment 8 Badalian Slava 2006-04-06 06:03:25 UTC
Created attachment 7786
Screen Shot Kernel Panic on 2.6.16.1
Comment 9 Stephen Hemminger 2006-04-11 13:19:47 UTC
Could you attach your rules to the bug report?
I think the problem is in the netfilter rules being used.
Comment 10 Badalian Slava 2006-04-11 23:04:01 UTC
OK, I am attaching 3 files:

create_nodes.sh - creates the main tc channels
iptables_temp_rules.gz - rules used for iptables
tc_temp_rules.gz - rules used for tc

==========

create_nodes.sh runs once at startup.

Logic of the script:

1. Create iptables_temp_rules and tc_temp_rules from MySQL
2. # iptables-restore < iptables_temp_rules
3. # sh tc_temp_rules

The script runs every 30 minutes.
Run time:
1 min to create the files
1-3 sec to load the rules into iptables
3-4 min to apply the tc rules

Our external zone is replaced by XXX.XXX in the scripts.
Comment 11 Badalian Slava 2006-04-11 23:05:06 UTC
Created attachment 7838
create_nodes.sh
Comment 12 Badalian Slava 2006-04-11 23:05:26 UTC
Created attachment 7839
iptables_temp_rules.gz
Comment 13 Badalian Slava 2006-04-11 23:05:50 UTC
Created attachment 7840
tc_temp_rules.gz
Comment 14 Badalian Slava 2006-04-20 04:35:13 UTC
There was a kernel panic yesterday and another today, on kernel 2.6.16.1.

I have now updated to 2.6.16.9.

I hope everything will work normally.

Here are screenshots of the panics:
Comment 15 Badalian Slava 2006-04-20 04:36:02 UTC
Created attachment 7914
Kernel panic on 20.04.05 part 1
Comment 16 Badalian Slava 2006-04-20 04:36:28 UTC
Created attachment 7915
Kernel panic on 20.04.05 part 2
Comment 17 Badalian Slava 2006-04-20 04:37:05 UTC
Created attachment 7916
Kernel panic on 19.04.05 part 1
Comment 18 Badalian Slava 2006-04-20 04:37:41 UTC
Created attachment 7917
Kernel panic on 19.04.05 part 2
Comment 19 Badalian Slava 2006-05-02 04:25:31 UTC
New panic. The system keeps working, but I can't reboot or kill the tc processes.

Syslog:
...
May  1 13:35:12 new-bridge-second kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
May  1 21:51:59 new-bridge-second kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
May  1 21:51:59 new-bridge-second kernel:  printing eip:
May  1 21:51:59 new-bridge-second kernel: c10f873f
May  1 21:51:59 new-bridge-second kernel: *pde = 00000000
May  1 21:51:59 new-bridge-second kernel: Oops: 0000 [#1]
May  1 21:51:59 new-bridge-second kernel: SMP
May  1 21:51:59 new-bridge-second kernel: Modules linked in: sch_sfq iptable_filter ip_tables x_tables sch_red sch_htb
May  1 21:51:59 new-bridge-second kernel: CPU:    0
May  1 21:51:59 new-bridge-second kernel: EIP:    0060:[<c10f873f>]    Not tainted VLI
May  1 21:51:59 new-bridge-second kernel: EFLAGS: 00210246   (2.6.16.9 #1)
May  1 21:51:59 new-bridge-second kernel: EIP is at __rb_erase_color+0x89/0x1ad
May  1 21:51:59 new-bridge-second kernel: eax: 00000000   ebx: f7b42b54   ecx: f7b42b54   edx: 00000000
May  1 21:51:59 new-bridge-second kernel: esi: f54d886c   edi: f7a6b468   ebp: f7a6b468   esp: f6f87bbc
May  1 21:51:59 new-bridge-second kernel: ds: 007b   es: 007b   ss: 0068
May  1 21:51:59 new-bridge-second kernel: Process tc (pid: 30527, threadinfo=f6f86000 task=c1ec7540)
May  1 21:51:59 new-bridge-second kernel: Stack: <0>c10036ee f54d8800 00000000 f54d886c 00000000 c10f895a 00000000 f7b42b54
May  1 21:51:59 new-bridge-second kernel:        f7a6b468 f54d8800 f54d886c f7a6b000 f7a6b080 f89bb29e f54d8954 f7a6b468
May  1 21:51:59 new-bridge-second kernel:        00000004 000000d0 00000000 00010006 00010e4d f7a6b000 c11a89ce f7a6b000
May  1 21:51:59 new-bridge-second kernel: Call Trace:
May  1 21:51:59 new-bridge-second kernel:  [<c10036ee>] common_interrupt+0x1a/0x20
May  1 21:51:59 new-bridge-second kernel:  [<c10f895a>] rb_erase+0xf7/0x12d
May  1 21:51:59 new-bridge-second kernel:  [<f89bb29e>] htb_destroy_class+0xec/0x15d [sch_htb]
May  1 21:51:59 new-bridge-second kernel:  [<c11a89ce>] tc_ctl_tclass+0x1b1/0x288
May  1 21:51:59 new-bridge-second kernel:  [<c11a38a6>] rtnetlink_dump_ifinfo+0x6c/0x89
May  1 21:51:59 new-bridge-second kernel:  [<c11a3ef7>] rtnetlink_rcv_msg+0x171/0x233
May  1 21:51:59 new-bridge-second kernel:  [<c11af1bf>] netlink_dump+0x94/0x1e2
May  1 21:51:59 new-bridge-second kernel:  [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May  1 21:51:59 new-bridge-second kernel:  [<c11af665>] netlink_rcv_skb+0x46/0xad
May  1 21:51:59 new-bridge-second kernel:  [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May  1 21:51:59 new-bridge-second kernel:  [<c11af70c>] netlink_run_queue+0x40/0xd0
May  1 21:51:59 new-bridge-second kernel:  [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May  1 21:51:59 new-bridge-second kernel:  [<c11a3d66>] rtnetlink_rcv+0x2e/0x4e
May  1 21:51:59 new-bridge-second kernel:  [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May  1 21:51:59 new-bridge-second kernel:  [<c11aef7c>] netlink_data_ready+0x60/0x62
May  1 21:51:59 new-bridge-second kernel:  [<c11ae10d>] netlink_sendskb+0x32/0x61
May  1 21:51:59 new-bridge-second kernel:  [<c11aec6d>] netlink_sendmsg+0x291/0x304
May  1 21:51:59 new-bridge-second kernel:  [<c118fc71>] sock_sendmsg+0xeb/0x10d
May  1 21:51:59 new-bridge-second kernel:  [<c118fc71>] sock_sendmsg+0xeb/0x10d
May  1 21:51:59 new-bridge-second kernel:  [<c1031b0e>] autoremove_wake_function+0x0/0x57
May  1 21:51:59 new-bridge-second kernel:  [<c10fabb4>] copy_from_user+0x46/0x7e
May  1 21:51:59 new-bridge-second kernel:  [<c1196cd4>] verify_iovec+0x44/0x9e
May  1 21:51:59 new-bridge-second kernel:  [<c1191689>] sys_sendmsg+0x15a/0x272
May  1 21:51:59 new-bridge-second kernel:  [<c10407ee>] filemap_nopage+0x30d/0x38a
May  1 21:51:59 new-bridge-second kernel:  [<c104db04>] do_no_page+0x219/0x278
May  1 21:51:59 new-bridge-second kernel:  [<c102716b>] update_wall_time+0x10/0x3b
May  1 21:51:59 new-bridge-second kernel:  [<c10fabb4>] copy_from_user+0x46/0x7e
May  1 21:51:59 new-bridge-second kernel:  [<c1191c5b>] sys_socketcall+0x28d/0x294
May  1 21:51:59 new-bridge-second kernel:  [<c1002d21>] syscall_call+0x7/0xb
May  1 21:51:59 new-bridge-second kernel: Code: 8b 48 04 89 c2 85 c9 75 ad c7 40 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 8b 53 0c eb 8e 8b 53 08 <8b> 72 04 85 f6 0f 84 82 00 00 00 8b 4a 0c 85 c9 74 0a 83 79 04
Comment 20 Badalian Slava 2006-07-10 09:27:54 UTC
Now on 2.6.17.4. The machine reboots every 3-8 days (a sysctl is set to reboot
on panic). The kernel panics when deleting a tc filter.
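
The exact sysctl in use is not stated in this report. A typical reboot-on-panic setup might look like the fragment below; the 10-second delay and the panic_on_oops line are assumptions added for illustration, not the reporter's configuration.

```
# /etc/sysctl.conf fragment (illustrative values, not taken from this report)
# Reboot automatically 10 seconds after a kernel panic.
kernel.panic = 10
# Treat an oops (such as the NULL pointer dereference in comment 19) as a panic.
kernel.panic_on_oops = 1
```

Without panic_on_oops, an oops like the one in comment 19 kills the offending process but can leave the machine running in a damaged state, which matches the "can't reboot and kill tc processes" symptom described there.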
Comment 21 Badalian Slava 2006-07-17 01:16:38 UTC
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c021dd97
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: sch_sfq cls_u32 sch_red sch_htb iptable_filter ip_tables x_tables ehci_hcd uhci_hcd usbcore
CPU:    0
EIP:    0060:[<c021dd97>]    Not tainted VLI
EFLAGS: 00010246   (2.6.17-gentoo #1)
EIP is at __rb_erase_color+0x89/0x1ad
eax: 00000000   ebx: f574c954   ecx: f574c954   edx: 00000000
esi: f4a6126c   edi: f74d6c68   ebp: f74d6c68   esp: e1b31bb8
ds: 007b   es: 007b   ss: 0068
Process tc (pid: 13132, threadinfo=e1b30000 task=c2157a50)
Stack: 0000007b 0000007b 00000000 f4a6126c 00000000 c021dfb2 00000000 f574c954
       f74d6c68 f4a61200 f4a6126c f74d6800 f74d6880 f89032d4 f4a61354 f74d6c68
       00000004 000000d0 00000000 00010006 0001026a f74d6800 c03199cb f74d6800
Call Trace:
 <c021dfb2> rb_erase+0xf7/0x12d  <f89032d4> htb_destroy_class+0xec/0x15d [sch_htb]
 <c03199cb> tc_ctl_tclass+0x1b1/0x288  <c0315036> rtnetlink_dump_ifinfo+0x6c/0x89
 <c03156c7> rtnetlink_rcv_msg+0x171/0x233  <c031f132> netlink_dump+0x94/0x1b8
 <c0315556> rtnetlink_rcv_msg+0x0/0x233  <c031f5c2> netlink_rcv_skb+0x46/0xad
 <c0315556> rtnetlink_rcv_msg+0x0/0x233  <c031f675> netlink_run_queue+0x4c/0xa6
 <c0315556> rtnetlink_rcv_msg+0x0/0x233  <c0315539> rtnetlink_rcv+0x33/0x50
 <c0315556> rtnetlink_rcv_msg+0x0/0x233  <c031eeb7> netlink_data_ready+0x60/0x62
 <c031e0e0> netlink_sendskb+0x32/0x61  <c031eb80> netlink_sendmsg+0x238/0x2d3
 <c030117b> sock_sendmsg+0xeb/0x10d  <c0126879> update_wall_time_one_tick+0x6/0x8f
 <c012693d> update_wall_time+0x10/0x3b  <c0131036> autoremove_wake_function+0x0/0x57
 <c02201ec> copy_from_user+0x46/0x7e  <c0308698> verify_iovec+0x44/0x9e
 <c0302d5a> sys_sendmsg+0x162/0x285  <c01459e6> __alloc_pages+0x56/0x308
 <c0302ad6> sys_setsockopt+0xbb/0xc4  <c02201ec> copy_from_user+0x46/0x7e
 <c0303349> sys_socketcall+0x28d/0x294  <c0102c1f> sysenter_past_esp+0x54/0x75
Code: 8b 48 04 89 c2 85 c9 75 ad c7 40 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 8b 53 0c eb 8e 8b 53 08 <8b> 72 04 85 f6 0f 84 82 00 00 00 8b 4a 0c 85 c9 74 0a 83 79 04
EIP: [<c021dd97>] __rb_erase_color+0x89/0x1ad SS:ESP 0068:e1b31bb8
Comment 22 Natalie Protasevich 2007-11-07 19:13:16 UTC
Slava, is this problem still there with a recent kernel?
Thanks.
Comment 23 Badalian Slava 2007-11-07 22:54:15 UTC
I changed the logic of the scripts.

Now I use "tc -b" and apply a clean list of rules every time.

As far as I remember, in 2.6.20 the "delete tc FILTER" bug was fixed, but there was still a bug in "tc delete HTB CLASS".

I no longer have problems with tc because I don't delete any children. I delete the root and recreate everything (it takes 1-3 seconds to create over 10k rules).
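
A minimal sketch of that delete-root-and-recreate approach, assuming a single shaped interface. The helper name write_batch, the device name, the rates, and the class ids are hypothetical placeholders, not taken from the attached rule files.

```shell
#!/bin/sh
# Sketch only: write a tc batch file that rebuilds the whole hierarchy from scratch.
# In batch mode each line is a tc command without the leading "tc".

write_batch() {
    dev=$1
    out=$2
    {
        # Drop the whole tree at once instead of deleting single classes or filters.
        echo "qdisc del dev $dev root"
        echo "qdisc add dev $dev root handle 1: htb default 10"
        echo "class add dev $dev parent 1: classid 1:1 htb rate 60mbit"
        echo "class add dev $dev parent 1:1 classid 1:10 htb rate 1mbit ceil 60mbit"
        echo "qdisc add dev $dev parent 1:10 handle 10: sfq perturb 10"
    } > "$out"
}

# Usage (needs root; not run here):
#   write_batch eth0 /tmp/tc_batch
#   tc -force -b /tmp/tc_batch
# -force lets tc continue if "qdisc del" fails because no root qdisc exists yet.
```

Loading everything through one batch file avoids starting a separate tc process per rule, which is consistent with the drop from 3-4 minutes to 1-3 seconds reported in this thread.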

Thanks all.
Comment 24 Badalian Slava 2008-01-28 22:53:35 UTC
I think this can be closed. Recent kernels work fine. Thanks.
