Bug 10323

Summary: panic using bridging on Linus kernel 2.6.25-rc6
Product: Networking Reporter: Andy Gospodarek (andy)
Component: IPV4 Assignee: Stephen Hemminger (stephen)
Status: CLOSED CODE_FIX    
Severity: normal CC: bunk, jm, nhorman, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.25-rc6 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    

Description Andy Gospodarek 2008-03-25 11:40:30 UTC
Reproducer:
1.  Boot the latest Linus git tree (2.6.25-rc6).
2.  Run the following commands:

brctl addbr br0
brctl stp br0 off
brctl addif br0 eth0
brctl addif br0 eth1
ifconfig br0 up
ifconfig eth0 up
ifconfig eth1 up

3.  Try to pass traffic between hosts connected to eth0 and eth1.
4.  Observe panic:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffff8123b5d8>] __ip_route_output_key+0x187/0x93d
PGD 0
Oops: 0000 [1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: bridge rfcomm l2cap bluetooth autofs4 sunrpc ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables cpufreq_o]
Pid: 0, comm: swapper Not tainted 2.6.25-rc6.sky2.testing #3
RIP: 0010:[<ffffffff8123b5d8>]  [<ffffffff8123b5d8>] __ip_route_output_key+0x187/0x93d
RSP: 0018:ffffffff815885e0  EFLAGS: 00010202
RAX: ffffffff814c1fd8 RBX: 0000000000000000 RCX: 0000000000000002
RDX: ffffffff813d57a0 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffffffff815886a0 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff8123b451 R11: ffff81006995eb40 R12: 0000000000000000
R13: ffffffff815887c8 R14: 0000000000000000 R15: ffffffff81588770
FS:  0000000000000000(0000) GS:ffffffff81416000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 0000000000201000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff814c0000, task ffffffff813d57a0)
Stack:  ffffffff815887c8 0000000000000002 0000000000000001 ffff81007c6ac560
 0000000000000002 0000000000000000 ffff8100010083b8 ffffffff81588618
 ffffffff81588618 0000000181588658 0000000000000002 0000000000000246
Call Trace:
 <IRQ>  [<ffffffff812635b7>] ? icmp_send+0x14f/0x5a4
 [<ffffffff8126373a>] icmp_send+0x2d2/0x5a4
 [<ffffffff8832907f>] :ipt_REJECT:reject_tg+0x45/0x396
 [<ffffffff882ff7a5>] :ip_tables:ipt_do_table+0x4b3/0x556
 [<ffffffff810a3232>] ? check_object+0x159/0x209
 [<ffffffff88305081>] :iptable_filter:ipt_hook+0x18/0x1b
 [<ffffffff81238324>] nf_iterate+0x41/0x81
 [<ffffffff883917bb>] ? :bridge:br_nf_forward_finish+0x0/0x109
 [<ffffffff812383e5>] nf_hook_slow+0x81/0xfb
 [<ffffffff883917bb>] ? :bridge:br_nf_forward_finish+0x0/0x109
 [<ffffffff88391ab1>] :bridge:br_nf_forward_ip+0x1ed/0x214
 [<ffffffff81238324>] nf_iterate+0x41/0x81
 [<ffffffff8838cfb9>] ? :bridge:br_forward_finish+0x0/0x53
 [<ffffffff812383e5>] nf_hook_slow+0x81/0xfb
 [<ffffffff8838cfb9>] ? :bridge:br_forward_finish+0x0/0x53
 [<ffffffff8838d06a>] :bridge:__br_forward+0x5e/0x6e
 [<ffffffff8838d093>] :bridge:br_forward+0x19/0x25
 [<ffffffff8838daf1>] :bridge:br_handle_frame_finish+0x129/0x14d
 [<ffffffff883921ef>] :bridge:br_nf_pre_routing_finish+0x353/0x364
 [<ffffffff88391e9c>] ? :bridge:br_nf_pre_routing_finish+0x0/0x364
 [<ffffffff8123844e>] ? nf_hook_slow+0xea/0xfb
 [<ffffffff88391e9c>] ? :bridge:br_nf_pre_routing_finish+0x0/0x364
 [<ffffffff8839289e>] :bridge:br_nf_pre_routing+0x69e/0x6bd
 [<ffffffff81238324>] nf_iterate+0x41/0x81
 [<ffffffff8838d9c8>] ? :bridge:br_handle_frame_finish+0x0/0x14d
 [<ffffffff812383e5>] nf_hook_slow+0x81/0xfb
 [<ffffffff8838d9c8>] ? :bridge:br_handle_frame_finish+0x0/0x14d
 [<ffffffff8838dccc>] :bridge:br_handle_frame+0x1b7/0x1df
 [<ffffffff8121c701>] netif_receive_skb+0x33d/0x480
 [<ffffffff881525e0>] :sky2:sky2_poll+0x969/0xcb9
 [<ffffffff8104af26>] ? ktime_get+0x11/0x42
 [<ffffffff8121adc8>] net_rx_action+0xd9/0x20e
 [<ffffffff81039b87>] __do_softirq+0x70/0xf1
 [<ffffffff8100d25c>] call_softirq+0x1c/0x28
 [<ffffffff8100e485>] do_softirq+0x39/0x8a
 [<ffffffff810396c0>] irq_exit+0x4e/0x8f
 [<ffffffff8100e781>] do_IRQ+0x145/0x167
 [<ffffffff8100c5e6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff8103184e>] ? sched_clock_idle_wakeup_event+0x44/0x6a
 [<ffffffff81186dc8>] ? acpi_idle_enter_simple+0x1af/0x220
 [<ffffffff81186dbe>] ? acpi_idle_enter_simple+0x1a5/0x220
 [<ffffffff811fd94c>] ? cpuidle_idle_call+0x80/0xb2
 [<ffffffff811fd8cc>] ? cpuidle_idle_call+0x0/0xb2
 [<ffffffff8100b141>] ? default_idle+0x0/0x73
 [<ffffffff8100b0f9>] ? cpu_idle+0xaa/0xf2
 [<ffffffff81291ed6>] ? rest_init+0x5a/0x5c


Code: 48 85 db 0f 85 11 ff ff ff 48 c7 c2 b6 b5 23 81 be 01 00 00 00 48 c7 c7 80 37 3e 81 e8 03 a0 e1 ff e8 9f e3 df ff 45 0f b6 77 14 <49> 8b 44 24 78 4c 8d 9d 70 ff ff ff 41 8b 5f 10 41 ba 10 0
RIP  [<ffffffff8123b5d8>] __ip_route_output_key+0x187/0x93d
 RSP <ffffffff815885e0>
CR2: 0000000000000078
---[ end trace 7b7b0a93664f745c ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!


Other info:

In this case eth0 and eth1 are both sky2 cards, but I'm guessing that doesn't make any difference.

I did some quick debugging and found that the panic is the result of nd_net being NULL here (line 444):

 430 
 431 void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 432 {
 433         struct iphdr *iph;
 434         int room;
 435         struct icmp_bxm icmp_param;
 436         struct rtable *rt = (struct rtable *)skb_in->dst;
 437         struct ipcm_cookie ipc;
 438         __be32 saddr;
 439         u8  tos;
 440         struct net *net;
 441 
 442         if (!rt)
 443                 goto out;
 444         net = rt->u.dst.dev->nd_net;
 445 

I know that's not much info, but I didn't have a ton of time to research this one yet.
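
The failure mode described above can be illustrated with a minimal user-space sketch. This is not the kernel code: the structures below are hypothetical, heavily simplified stand-ins that only mirror the rt->u.dst.dev->nd_net pointer chain from the listing, and route_net() is an illustrative helper name, not a kernel function. The sketch assumes a device whose nd_net was never set, which matches the NULL value observed in the report but is not a confirmed root cause.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures.
 * The real definitions have many more fields and different layouts. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct dst_entry { struct net_device *dev; };
struct rtable { union { struct dst_entry dst; } u; };

/* Illustrative helper (not a kernel function): walk the same pointer
 * chain as line 444 of the listing, checking each link on the way.
 * Returns NULL if the route, its device, or the namespace is missing. */
static struct net *route_net(const struct rtable *rt)
{
    if (!rt || !rt->u.dst.dev)
        return NULL;
    return rt->u.dst.dev->nd_net;   /* may itself be NULL */
}

/* Returns 1 when a usable namespace pointer is reachable from rt. */
static int route_has_net(const struct rtable *rt)
{
    return route_net(rt) != NULL;
}
```

With a device whose nd_net points at a valid struct net, route_has_net() returns 1. With a device whose nd_net was left NULL it returns 0, and any later code that dereferences the result without checking faults at a small offset from address zero, as in the oops above.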
Comment 1 Andy Gospodarek 2008-03-25 11:48:42 UTC
I just tested without nf_conntrack[_ipv4] modules and the test passes.  Off to look at netfilter changes....
Comment 2 Adrian Bunk 2008-03-25 12:20:10 UTC
Does it work with 2.6.24?
Comment 3 Andy Gospodarek 2008-03-25 12:36:28 UTC
Not sure, I'll recompile and check.
Comment 4 Andy Gospodarek 2008-03-25 14:24:20 UTC
It looks like 2.6.24 is good, so I guess I'll start bisecting....
Comment 5 Stephen Hemminger 2008-04-14 18:20:56 UTC
There was a missing case for network namespaces. Either turn off network namespaces (CONFIG_NET_NS) in the configuration, or wait for the patch to get upstream for 2.6.25.
Comment 6 Adrian Bunk 2008-04-15 02:39:33 UTC
Fixed by commit 159d83363b629c91d020734207c1bc788b96af5a
Comment 7 Andy Gospodarek 2008-04-15 06:13:13 UTC
I just finished testing 159d83363b629c91d020734207c1bc788b96af5a and I can confirm that it resolves my issue.