Most recent kernel where this bug did not occur: 2.6.15.x (not seen there)
Distribution: Slackware 10.2 updated to current
Hardware Environment:
00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to I/O Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
01:09.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
01:0a.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
01:0b.0 VGA compatible controller: ATI Technologies Inc 3D Rage II+ 215GTB [Mach64 GTB] (rev 9a)
Software Environment:
Problem Description:
The server now runs 2.6.16.1 and normally works for about 2 days. On 2.6.16 there were 2 kernel panics in 1 week.
This server is used as a bridge + traffic shaper. It has more than 10k tc rules (using hash tables) and over 4k iptables rules.
TC uses: htb/red classes, sfq qdisc, u32 filters only.
iptables uses the physdev module.
The channel carries 60 Mbps at any time, with 2k+ clients.
Steps to reproduce:
I will try to attach the .config and screenshots (photos of the screen).
Created attachment 7747 [details] Config for kernel
Created attachment 7748 [details] Screenshot
Created attachment 7749 [details] Screenshot 2
Created attachment 7750 [details] Screenshot 3
It happened again on 2.6.16.1!
Is this a regression (i.e. it didn't occur with 2.6.15)? If so, could you use "git bisect" to identify the changeset? Although I have an idea which one it is. The most interesting part of the backtrace has scrolled off the screen! There is a config option in 2.6.17 to print multiple functions per line, in columns, and that might help show the offender. Could you try 2.6.17-rc1?
I can't use 2.6.17-rc1. It's unstable. This machine is the main network QoS box for client connections. We have now stopped using the bridge and set up simple forwarding. If this variant panics too, I will lose my job =))) I am attaching another screenshot for you. It's a kernel panic on 2.6.16.1.
Created attachment 7786 [details] Screen Shot Kernel Panic on 2.6.16.1
Could you attach your rules to the bug report? I think the problem is in the netfilter rules being used.
OK, I am attaching 3 files:
create_nodes.sh - creates the TC main channels
iptables_temp_rules.gz - used for iptables
tc_temp_rules.gz - used for tc
==========
create_nodes.sh
Run once on startup.
Logic of the script:
1. Create iptables_temp_rules and tc_temp_rules from mysql;
2. # iptables-restore < iptables_temp_rules
3. # sh tc_temp_rules
The script runs every 30 min.
Run time:
1 min - create the files
1-3 sec - attach the rules to iptables
3-4 min - apply the TC rules
Our external zone is replaced by XXX.XXX in the scripts.
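For readers following along, steps 2 and 3 of the reload cycle described above could be sketched roughly as below. This is not the attached script: the function names and the dry-run flag are hypothetical, and step 1 (generating the two files from MySQL) is left out because its details are not given in the report.

```python
import subprocess

def reload_commands(iptables_file="iptables_temp_rules", tc_file="tc_temp_rules"):
    """Return steps 2 and 3 as (argv, stdin_path) pairs.

    File names are taken from the report; everything else is an assumption.
    """
    return [
        (["iptables-restore"], iptables_file),  # rules are fed on stdin
        (["sh", tc_file], None),                # file of plain tc commands
    ]

def run_reload(dry_run=True):
    """Run (or, with dry_run, only describe) the two reload steps in order."""
    described = []
    for argv, stdin_path in reload_commands():
        described.append(" ".join(argv) + (f" < {stdin_path}" if stdin_path else ""))
        if dry_run:
            continue
        if stdin_path:
            with open(stdin_path, "rb") as fh:
                subprocess.run(argv, stdin=fh, check=True)
        else:
            subprocess.run(argv, check=True)
    return described

if __name__ == "__main__":
    for line in run_reload(dry_run=True):
        print(line)
```

With dry_run left at True nothing is executed, so the sketch can be inspected safely on a machine that is not the shaper.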
Created attachment 7838 [details] create_nodes.sh
Created attachment 7839 [details] iptables_temp_rules.gz
Created attachment 7840 [details] tc_temp_rules.gz
There were kernel panics yesterday and today on kernel 2.6.16.1. I have now updated to 2.6.16.9 and hope everything will work normally. Here are screenshots of the panics:
Created attachment 7914 [details] Kernel panic on 20.04.05 part 1
Created attachment 7915 [details] Kernel panic on 20.04.05 part 2
Created attachment 7916 [details] Kernel panic on 19.04.05 part 1
Created attachment 7917 [details] Kernel panic on 19.04.05 part 2
New panic... the system still works, but I can't reboot or kill the tc processes.
Syslog:
...
May 1 13:35:12 new-bridge-second kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
May 1 21:51:59 new-bridge-second kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
May 1 21:51:59 new-bridge-second kernel: printing eip:
May 1 21:51:59 new-bridge-second kernel: c10f873f
May 1 21:51:59 new-bridge-second kernel: *pde = 00000000
May 1 21:51:59 new-bridge-second kernel: Oops: 0000 [#1]
May 1 21:51:59 new-bridge-second kernel: SMP
May 1 21:51:59 new-bridge-second kernel: Modules linked in: sch_sfq iptable_filter ip_tables x_tables sch_red sch_htb
May 1 21:51:59 new-bridge-second kernel: CPU: 0
May 1 21:51:59 new-bridge-second kernel: EIP: 0060:[<c10f873f>] Not tainted VLI
May 1 21:51:59 new-bridge-second kernel: EFLAGS: 00210246 (2.6.16.9 #1)
May 1 21:51:59 new-bridge-second kernel: EIP is at __rb_erase_color+0x89/0x1ad
May 1 21:51:59 new-bridge-second kernel: eax: 00000000 ebx: f7b42b54 ecx: f7b42b54 edx: 00000000
May 1 21:51:59 new-bridge-second kernel: esi: f54d886c edi: f7a6b468 ebp: f7a6b468 esp: f6f87bbc
May 1 21:51:59 new-bridge-second kernel: ds: 007b es: 007b ss: 0068
May 1 21:51:59 new-bridge-second kernel: Process tc (pid: 30527, threadinfo=f6f86000 task=c1ec7540)
May 1 21:51:59 new-bridge-second kernel: Stack: <0>c10036ee f54d8800 00000000 f54d886c 00000000 c10f895a 00000000 f7b42b54
May 1 21:51:59 new-bridge-second kernel: f7a6b468 f54d8800 f54d886c f7a6b000 f7a6b080 f89bb29e f54d8954 f7a6b468
May 1 21:51:59 new-bridge-second kernel: 00000004 000000d0 00000000 00010006 00010e4d f7a6b000 c11a89ce f7a6b000
May 1 21:51:59 new-bridge-second kernel: Call Trace:
May 1 21:51:59 new-bridge-second kernel: [<c10036ee>] common_interrupt+0x1a/0x20
May 1 21:51:59 new-bridge-second kernel: [<c10f895a>] rb_erase+0xf7/0x12d
May 1 21:51:59 new-bridge-second kernel: [<f89bb29e>] htb_destroy_class+0xec/0x15d [sch_htb]
May 1 21:51:59 new-bridge-second kernel: [<c11a89ce>] tc_ctl_tclass+0x1b1/0x288
May 1 21:51:59 new-bridge-second kernel: [<c11a38a6>] rtnetlink_dump_ifinfo+0x6c/0x89
May 1 21:51:59 new-bridge-second kernel: [<c11a3ef7>] rtnetlink_rcv_msg+0x171/0x233
May 1 21:51:59 new-bridge-second kernel: [<c11af1bf>] netlink_dump+0x94/0x1e2
May 1 21:51:59 new-bridge-second kernel: [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May 1 21:51:59 new-bridge-second kernel: [<c11af665>] netlink_rcv_skb+0x46/0xad
May 1 21:51:59 new-bridge-second kernel: [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May 1 21:51:59 new-bridge-second kernel: [<c11af70c>] netlink_run_queue+0x40/0xd0
May 1 21:51:59 new-bridge-second kernel: [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May 1 21:51:59 new-bridge-second kernel: [<c11a3d66>] rtnetlink_rcv+0x2e/0x4e
May 1 21:51:59 new-bridge-second kernel: [<c11a3d86>] rtnetlink_rcv_msg+0x0/0x233
May 1 21:51:59 new-bridge-second kernel: [<c11aef7c>] netlink_data_ready+0x60/0x62
May 1 21:51:59 new-bridge-second kernel: [<c11ae10d>] netlink_sendskb+0x32/0x61
May 1 21:51:59 new-bridge-second kernel: [<c11aec6d>] netlink_sendmsg+0x291/0x304
May 1 21:51:59 new-bridge-second kernel: [<c118fc71>] sock_sendmsg+0xeb/0x10d
May 1 21:51:59 new-bridge-second kernel: [<c118fc71>] sock_sendmsg+0xeb/0x10d
May 1 21:51:59 new-bridge-second kernel: [<c1031b0e>] autoremove_wake_function+0x0/0x57
May 1 21:51:59 new-bridge-second kernel: [<c10fabb4>] copy_from_user+0x46/0x7e
May 1 21:51:59 new-bridge-second kernel: [<c1196cd4>] verify_iovec+0x44/0x9e
May 1 21:51:59 new-bridge-second kernel: [<c1191689>] sys_sendmsg+0x15a/0x272
May 1 21:51:59 new-bridge-second kernel: [<c10407ee>] filemap_nopage+0x30d/0x38a
May 1 21:51:59 new-bridge-second kernel: [<c104db04>] do_no_page+0x219/0x278
May 1 21:51:59 new-bridge-second kernel: [<c102716b>] update_wall_time+0x10/0x3b
May 1 21:51:59 new-bridge-second kernel: [<c10fabb4>] copy_from_user+0x46/0x7e
May 1 21:51:59 new-bridge-second kernel: [<c1191c5b>] sys_socketcall+0x28d/0x294
May 1 21:51:59 new-bridge-second kernel: [<c1002d21>] syscall_call+0x7/0xb
May 1 21:51:59 new-bridge-second kernel: Code: 8b 48 04 89 c2 85 c9 75 ad c7 40 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 8b 53 0c eb 8e 8b 53 08 <8b> 72 04 85 f6 0f 84 82 00 00 00 8b 4a 0c 85 c9 74 0a 83 79 04
Now on 2.6.17.4. The machine reboots every 3-8 days (sysctl is set to reboot on panic). The kernel panics when deleting a tc filter.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c021dd97
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: sch_sfq cls_u32 sch_red sch_htb iptable_filter ip_tables x_tables ehci_hcd uhci_hcd usbcore
CPU: 0
EIP: 0060:[<c021dd97>] Not tainted VLI
EFLAGS: 00010246 (2.6.17-gentoo #1)
EIP is at __rb_erase_color+0x89/0x1ad
eax: 00000000 ebx: f574c954 ecx: f574c954 edx: 00000000
esi: f4a6126c edi: f74d6c68 ebp: f74d6c68 esp: e1b31bb8
ds: 007b es: 007b ss: 0068
Process tc (pid: 13132, threadinfo=e1b30000 task=c2157a50)
Stack: 0000007b 0000007b 00000000 f4a6126c 00000000 c021dfb2 00000000 f574c954
f74d6c68 f4a61200 f4a6126c f74d6800 f74d6880 f89032d4 f4a61354 f74d6c68
00000004 000000d0 00000000 00010006 0001026a f74d6800 c03199cb f74d6800
Call Trace:
<c021dfb2> rb_erase+0xf7/0x12d
<f89032d4> htb_destroy_class+0xec/0x15d [sch_htb]
<c03199cb> tc_ctl_tclass+0x1b1/0x288
<c0315036> rtnetlink_dump_ifinfo+0x6c/0x89
<c03156c7> rtnetlink_rcv_msg+0x171/0x233
<c031f132> netlink_dump+0x94/0x1b8
<c0315556> rtnetlink_rcv_msg+0x0/0x233
<c031f5c2> netlink_rcv_skb+0x46/0xad
<c0315556> rtnetlink_rcv_msg+0x0/0x233
<c031f675> netlink_run_queue+0x4c/0xa6
<c0315556> rtnetlink_rcv_msg+0x0/0x233
<c0315539> rtnetlink_rcv+0x33/0x50
<c0315556> rtnetlink_rcv_msg+0x0/0x233
<c031eeb7> netlink_data_ready+0x60/0x62
<c031e0e0> netlink_sendskb+0x32/0x61
<c031eb80> netlink_sendmsg+0x238/0x2d3
<c030117b> sock_sendmsg+0xeb/0x10d
<c0126879> update_wall_time_one_tick+0x6/0x8f
<c012693d> update_wall_time+0x10/0x3b
<c0131036> autoremove_wake_function+0x0/0x57
<c02201ec> copy_from_user+0x46/0x7e
<c0308698> verify_iovec+0x44/0x9e
<c0302d5a> sys_sendmsg+0x162/0x285
<c01459e6> __alloc_pages+0x56/0x308
<c0302ad6> sys_setsockopt+0xbb/0xc4
<c02201ec> copy_from_user+0x46/0x7e
<c0303349> sys_socketcall+0x28d/0x294
<c0102c1f> sysenter_past_esp+0x54/0x75
Code: 8b 48 04 89 c2 85 c9 75 ad c7 40 04 01 00 00 00 c7 43 04 00 00 00 00 89 7c 24 04 89 1c 24 e8 6b fe ff ff 8b 53 0c eb 8e 8b 53 08 <8b> 72 04 85 f6 0f 84 82 00 00 00 8b 4a 0c 85 c9 74 0a 83 79 04
EIP: [<c021dd97>] __rb_erase_color+0x89/0x1ad SS:ESP 0068:e1b31bb8
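Both oopses in this report fault at the same place: __rb_erase_color+0x89, reached from htb_destroy_class, with fault address 00000004 and edx: 00000000 in the register dump. The byte marked in the Code: line, <8b> 72 04, decodes on 32-bit x86 as a load from 4 bytes past %edx, which matches a NULL rb-tree node pointer being dereferenced. A minimal sketch of pulling those facts out of an oops text is below; the helper names are hypothetical, the SAMPLE string is a shortened excerpt of the second oops, and the decoder only handles this one instruction form.

```python
import re

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Shortened excerpt of the second oops in this report (Code: line truncated).
SAMPLE = (
    "BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004\n"
    "EIP is at __rb_erase_color+0x89/0x1ad\n"
    "eax: 00000000 ebx: f574c954 ecx: f574c954 edx: 00000000\n"
    "Code: eb 8e 8b 53 08 <8b> 72 04 85 f6 0f 84 82 00 00 00\n"
)

def summarize_oops(text):
    """Extract fault address, faulting symbol+offset and the marked code bytes."""
    addr = re.search(r"virtual address ([0-9a-f]{8})", text)
    eip = re.search(r"EIP is at (\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+", text)
    code = re.search(r"Code:((?:\s+<?[0-9a-f]{2}>?)+)", text)
    fault_bytes = None
    if code:
        tokens = code.group(1).split()
        # The kernel wraps the first byte of the faulting instruction in <>.
        idx = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
        if idx is not None:
            fault_bytes = [t.strip("<>") for t in tokens[idx:idx + 3]]
    return {
        "address": addr.group(1) if addr else None,
        "symbol": eip.group(1) if eip else None,
        "offset": int(eip.group(2), 16) if eip else None,
        "fault_bytes": fault_bytes,
    }

def decode_mov_load(fault_bytes):
    """Decode only 'mov disp8(%reg),%reg' (opcode 8b, mod=01, no SIB, disp < 0x80)."""
    opcode, modrm, disp = (int(b, 16) for b in fault_bytes)
    if opcode != 0x8B or modrm >> 6 != 0b01 or modrm & 7 == 4 or disp >= 0x80:
        return None
    return f"mov 0x{disp:x}(%{REGS[modrm & 7]}),%{REGS[(modrm >> 3) & 7]}"

if __name__ == "__main__":
    summary = summarize_oops(SAMPLE)
    print(summary)
    print(decode_mov_load(summary["fault_bytes"]))
```

For a full disassembly of the Code: line, the kernel tree's scripts/decodecode is the usual tool; the sketch above is only meant to show why the address 00000004 and edx = 0 belong together.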
Slava, is this problem still there with a recent kernel? Thanks.
I changed the logic of the scripts. Now I use "tc -b" and apply a clean list of rules every time. As far as I remember, in 2.6.20 the "delete tc FILTER" bug is fixed, but there is still a bug in "tc delete HTB CLASS". I no longer have problems with TC because I don't delete any children: I delete the root and recreate everything (1-3 seconds to create over 10k rules). Thanks all.
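The workaround described above (never delete individual HTB classes, drop the root qdisc and rebuild everything from one batch file) could look roughly like the sketch below. The device name, rates, class numbering and the function name are illustrative assumptions, not taken from the attached scripts; the u32 filters and RED setup from the real configuration are omitted.

```python
def build_tc_batch(dev, root_rate, clients):
    """Build the text of a tc batch file that tears down and recreates the tree.

    clients is a list of (minor, rate) pairs; minors are printed in hex, as tc
    expects, and should not collide with the root handle 1: used here.
    """
    lines = [
        f"qdisc del dev {dev} root",                 # drops every class and filter at once
        f"qdisc add dev {dev} root handle 1: htb",
        f"class add dev {dev} parent 1: classid 1:1 htb rate {root_rate}",
    ]
    for minor, rate in clients:
        lines.append(f"class add dev {dev} parent 1:1 classid 1:{minor:x} htb rate {rate}")
        lines.append(f"qdisc add dev {dev} parent 1:{minor:x} handle {minor:x}: sfq perturb 10")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical values for illustration only.
    batch = build_tc_batch("eth0", "60mbit", [(0x10, "1mbit"), (0x11, "2mbit")])
    with open("tc_batch_rules", "w") as fh:
        fh.write(batch)
    print(batch, end="")
```

The resulting file would then be applied with "tc -b tc_batch_rules". Note that the initial "qdisc del" line fails when no root qdisc exists yet, which stops the batch unless tc is also given -force, so a first run may need that line removed.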
I think this can be closed. The latest kernels work fine. Thanks.