Bug 6681

Summary: TC crash and rule freeze
Product: Networking Reporter: Badalian Slava (slavon.net)
Component: IPV4 Assignee: Stephen Hemminger (stephen)
Status: CLOSED CODE_FIX    
Severity: normal CC: protasnb
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.16-gentoo-r6 Subsystem:
Regression: --- Bisected commit-id:
Bug Depends on: 6322    
Bug Blocks:    
Attachments: change htb to use explicit RCU

Description Badalian Slava 2006-06-12 10:29:15 UTC
Most recent kernel where this bug did not occur:
2.6.16-gentoo-r6
Distribution:
Gentoo
Hardware Environment:
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub
Interface (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics
Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI
Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI
Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface
Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller
(rev 02)
01:09.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
01:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
01:0b.0 SCSI storage controller: Adaptec ASC-39320 U320 (rev 03)
01:0b.1 SCSI storage controller: Adaptec ASC-39320 U320 (rev 03)
Software Environment:
sys-apps/iproute2-2.6.16.20060323
Problem Description:
#cat dmeseg
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
c0217c26
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: sch_sfq cls_u32 sch_red sch_htb iptable_filter ip_tables
x_tables uhci_hcd ehci_hcd usbcore
CPU:    0
EIP:    0060:[<c0217c26>]    Not tainted VLI
EFLAGS: 00010287   (2.6.16-gentoo-r6 #2)
EIP is at __rb_erase_color+0x94/0x1ad
eax: f55f4954   ebx: f724bb54   ecx: f724bb54   edx: 00000000
esi: 00000000   edi: f7679468   ebp: f7679468   esp: f676bbbc
ds: 007b   es: 007b   ss: 0068
Process tc (pid: 24294, threadinfo=f676a000 task=f7d7da90)
Stack: <0>f724bb54 f7679468 00000000 e6bbf154 00000000 c0217e36 00000000 f724bb54
f7679468 e6bbf000 e6bbf06c f7679000 f7679080 f8903366 e6bbf154 f7679468
00000004 000000d0 00000000 00010006 000103c9 f7679000 c0311ffa f7679000
Call Trace:
[<c0217e36>] rb_erase+0xf7/0x12d
[<f8903366>] htb_destroy_class+0xec/0x15d [sch_htb]
[<c0311ffa>] tc_ctl_tclass+0x1b1/0x288
[<c030d69e>] rtnetlink_dump_ifinfo+0x6c/0x89
[<c030dcef>] rtnetlink_rcv_msg+0x171/0x233
[<c031759f>] netlink_dump+0x94/0x1e2
[<c030db7e>] rtnetlink_rcv_msg+0x0/0x233
[<c0317a45>] netlink_rcv_skb+0x46/0xad
[<c030db7e>] rtnetlink_rcv_msg+0x0/0x233
[<c0317aec>] netlink_run_queue+0x40/0xd0
[<c030db7e>] rtnetlink_rcv_msg+0x0/0x233
[<c030db5e>] rtnetlink_rcv+0x2e/0x4e
[<c030db7e>] rtnetlink_rcv_msg+0x0/0x233
[<c031735c>] netlink_data_ready+0x60/0x62
[<c03164ed>] netlink_sendskb+0x32/0x61
[<c031704d>] netlink_sendmsg+0x291/0x304
[<c02f9b0d>] sock_sendmsg+0xeb/0x10d
[<c02f9b0d>] sock_sendmsg+0xeb/0x10d
[<c0131fa6>] autoremove_wake_function+0x0/0x57
[<c021a084>] copy_from_user+0x46/0x7e
[<c0300ae4>] verify_iovec+0x44/0x9e
[<c02fb525>] sys_sendmsg+0x15a/0x272
[<c0140ca6>] filemap_nopage+0x30d/0x38a
[<c0152e83>] page_add_file_rmap+0x2a/0x2e
[<c014e014>] do_no_page+0x219/0x278
[<c021a084>] copy_from_user+0x46/0x7e
[<c02fbaf7>] sys_socketcall+0x28d/0x294
[<c0102ca7>] sysenter_past_esp+0x54/0x75
Code: 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 8b
53 0c eb 8e 8b 53 08 8b 72 04 85 f6 0f 84 82 00 00 00  <8b> 4a 0c 85 c9 74 0a 83
79 04 01 0f 85 00 01 00 00 8b 72 08 85

Steps to reproduce:
Comment 1 Andrew Morton 2006-06-19 15:17:55 UTC
It crashed in net/sched/somewhere.
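
The fault address in the oops can be narrowed down a little. With edx = 00000000 and a faulting address of 0000000c, the failing load looks like a read of the field at offset 12 of a NULL struct rb_node pointer inside __rb_erase_color. The sketch below assumes the 2.6.16 field order (rb_parent, rb_color, rb_right, rb_left) and models i386 pointers as 32-bit integers; the struct and function names are hypothetical and this is not kernel code. Under that assumption offset 12 is rb_left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: models the 2.6.16 struct rb_node as laid out on i386.
 * Pointers are written as uint32_t so the offsets are the same on any host.
 */
struct rb_node_i386 {
    uint32_t rb_parent;  /* offset 0x00 */
    int32_t  rb_color;   /* offset 0x04 */
    uint32_t rb_right;   /* offset 0x08 */
    uint32_t rb_left;    /* offset 0x0c */
};

static_assert(sizeof(struct rb_node_i386) == 16, "unexpected padding");

/*
 * Given the faulting virtual address of a dereference through a NULL
 * node pointer, return the name of the field that was being read.
 */
const char *rb_field_at(size_t fault_addr)
{
    switch (fault_addr) {
    case offsetof(struct rb_node_i386, rb_parent): return "rb_parent";
    case offsetof(struct rb_node_i386, rb_color):  return "rb_color";
    case offsetof(struct rb_node_i386, rb_right):  return "rb_right";
    case offsetof(struct rb_node_i386, rb_left):   return "rb_left";
    default:                                       return "unknown";
    }
}
```

Calling rb_field_at(0x0c) names the field under the assumed layout. If the real layout differs, only the struct definition needs to change.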

Comment 2 Stephen Hemminger 2006-07-17 10:51:34 UTC
Please send the commands that set up this configuration before the crash.
Obviously HTB was configured, but then what?
Comment 3 Badalian Slava 2006-07-17 13:59:32 UTC
See
http://bugzilla.kernel.org/show_bug.cgi?id=6322

create_nodes.sh - creates the main channels
tc_temp_rules.gz - client reconfiguration

create_nodes.sh is run once at startup.
tc_temp_rules.gz is generated and run once an hour.

The crash (and oops) happens in a command like
tc filter del dev ethX protocol ip parent 1:0 prio 5 handle X:XX:XX u32

It hits a different rule each time, but it is always "tc filter del". Uptime before the bug
appears is 1-10 days. I can't find a way to reproduce the bug; it is random. =(

The "tc filter del" process freezes and I can't kill it (kill -9 does not work). A normal reboot
hangs in the init script; only "reboot -n -f" works.
Comment 4 Stephen Hemminger 2006-08-01 14:11:27 UTC
Created attachment 8670 [details]
change htb to use explicit RCU

This patch changes HTB to use RCU. It really is overkill, since the whole qdisc
is covered by a device lock and RTNL mutex. But it is a possibility to test.
Comment 5 Stephen Hemminger 2006-08-02 15:45:10 UTC
Code submitted for 2.6.19; the rbtree wasn't being initialized.
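
To illustrate why an uninitialized tree node is dangerous on removal, here is a minimal sketch of the general guard pattern, not the patch that was submitted: mark a node as "not linked" when it is created and again after removal, and refuse to unlink a node that carries that mark. A doubly linked list stands in for the rbtree to keep the example short, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *prev, *next;
};

/* A node that points to itself is treated as "not linked anywhere". */
void node_init(struct node *n)
{
    n->prev = n;
    n->next = n;
}

bool node_is_linked(const struct node *n)
{
    return n->next != n;
}

/* Insert n right after head; n must have been initialized and be unlinked. */
void list_insert_after(struct node *head, struct node *n)
{
    assert(!node_is_linked(n));
    n->prev = head;
    n->next = head->next;
    head->next->prev = n;
    head->next = n;
}

/*
 * Unlink n only if it is really on a list. Returns false instead of
 * following stale or never-set pointers when n is already unlinked.
 */
bool node_safe_unlink(struct node *n)
{
    if (!node_is_linked(n))
        return false;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);  /* restore the "not linked" mark */
    return true;
}
```

Without the node_init() calls, the pointers of a fresh node hold whatever was in memory, and an unlink (or an rb_erase in the tree case) walks through them. With the guard, a double delete or a delete of a never-inserted node is reported instead of dereferenced.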
Comment 6 Stephen Hemminger 2007-02-23 15:35:07 UTC
This should be fixed by 2.6.19 or later. Reopen if you can still
reproduce the problem.
Comment 7 Badalian Slava 2007-02-24 04:19:47 UTC
Hello,
I use 2.6.18. Sometimes (1-2 times a week) the machine reboots (a sysctl parameter is set to
reboot on oops and panic).
I tried 2.6.20: it panics on every qdisc delete (the PC carries over 300 Mbps of
traffic and uses 2 e1000 Ethernet controllers). If I disconnect the link from the
Ethernet port, everything works normally. On another PC that has no traffic, everything runs normally too. =((

I can't capture the kernel panic output; I need the machine to work all the time.
Can the kernel dump the panic to email, HDD, or serial?

I think it would be very good if tc had something like iptables-restore to load all rules
into a flushed tc in one go. =)

Thanks
Comment 8 Natalie Protasevich 2007-07-08 11:41:00 UTC
Any update on this?
Slava, have you tried recent kernels? Some patches went in recently; can you try 2.6.22-rc7?
Thanks.
Comment 9 Stephen Hemminger 2007-12-10 17:02:01 UTC
*** Bug 8971 has been marked as a duplicate of this bug. ***
Comment 10 Badalian Slava 2008-01-28 22:52:31 UTC
With recent kernels everything works fine for me. I think this bug report can be closed. Thanks, all.