I was trying to use broadcast (ahem, multicast) GRE. This reproducibly results in an Oops and a kernel panic:

htpc2 ~ # ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 4074
        inet 10.1.9.61 netmask 255.255.255.0 broadcast 10.1.9.255
        inet6 2001:a60:10b3:c00:201:c0ff:fe13:db43 prefixlen 64 scopeid 0x0<global>
        inet6 fe80::201:c0ff:fe13:db43 prefixlen 64 scopeid 0x20<link>
        inet6 fdf2:e35b:1a0e:2c28:201:c0ff:fe13:db43 prefixlen 64 scopeid 0x0<global>
        inet6 fdf2:e35b:1a0e:2c28::61 prefixlen 64 scopeid 0x0<global>
        ether 00:01:c0:13:db:43 txqueuelen 1000 (Ethernet)
        RX packets 1943144 bytes 403477948 (384.7 MiB)
        RX errors 0 dropped 65686 overruns 0 frame 0
        TX packets 2135358 bytes 367947113 (350.9 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 20 memory 0xe0700000-e0720000

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 264059 bytes 16600869 (15.8 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 264059 bytes 16600869 (15.8 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

htpc2 ~ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.9.1        0.0.0.0         UG    2      0        0 eth0
10.1.9.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
127.0.0.0       127.0.0.1       255.0.0.0       UG    0      0        0 lo
224.0.0.0       0.0.0.0         240.0.0.0       U     10     0        0 eth0

htpc2 ~ # ip tunnel add test mode gre local 10.1.9.61 remote 224.66.66.66 ttl 16
htpc2 ~ # ip addr add 10.0.0.1/24 dev test
htpc2 ~ # ip link set test up

This instantly results in the following Oops (from /dev/pstore):

Oops#1 Part1
<4>R13: ffffffff816569c0 R14: ffff88043e250008 R15: ffffffff8166fc40
<4>FS: 0000000000000000(0000) GS:ffff88043e240000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00000000000000a2 CR3: 0000000410f41000 CR4: 00000000001407a0
<4>Stack:
<4> ffff880417fb2840 ffff88042c88a1c0 ffff88043e243dcc ffff8800a1644800
<4> ffffffff816569c0 ffffffffa016b33b 0000000000000012 ffff88042c88a1c0
<4> 0000000000000000 ffffffff816569c0 ffff88042d674000 ffffffffa015c4ac
<4>Call Trace:
<4> <IRQ>
<4> [<ffffffffa016b33b>] ? ipgre_rcv+0xb4/0xc5 [ip_gre]
<4> [<ffffffffa015c4ac>] ? gre_cisco_rcv+0x3b/0x89 [gre]
<4> [<ffffffffa015c10b>] ? gre_rcv+0x66/0x8e [gre]
<4> [<ffffffff81342222>] ? ip_local_deliver_finish+0x92/0xfc
<4> [<ffffffff81313374>] ? __netif_receive_skb_core+0x612/0x6a5
<4> [<ffffffff813134e8>] ? process_backlog+0x8a/0x140
<4> [<ffffffff81313825>] ? net_rx_action+0xa5/0x1e4
<4> [<ffffffff81040f2e>] ? __do_softirq+0xf1/0x26d
<4> [<ffffffff8104128a>] ? irq_exit+0x35/0x7a
<4> [<ffffffff81020895>] ? smp_apic_timer_interrupt+0x3b/0x46
<4> [<ffffffff813ea60a>] ? apic_timer_interrupt+0x6a/0x70
<4> <EOI>
<4> [<ffffffff812ee4dd>] ? cpuidle_enter_state+0x43/0xa6
<4> [<ffffffff812ee4d6>] ? cpuidle_enter_state+0x3c/0xa6
<4> [<ffffffff812ee649>] ? cpuidle_idle_call+0x109/0x1e3
<4> [<ffffffff81009b4d>] ? arch_cpu_idle+0x7/0x1a
<4> [<ffffffff8106f2ec>] ? cpu_startup_entry+0x133/0x206
<4>Code: 89 f3 50 44 0f b7 a6 ae 00 00 00 4c 03 a6 c0 00 00 00 41 8b 44 24 10 25 f0 00 00 00 3d e0 00 00 00 75 2c 48 8b 46 58 48 83 e0 fe <80> b8 a2 00 00 00 00 0f 84 53 03 00 00 48 8b 47 18 48 ff 80 48
<1>RIP [<ffffffff8137954f>] ip_tunnel_rcv+0x35/0x3d7
<4> RSP <ffff88043e243d68>
<4>CR2: 00000000000000a2
<4>---[ end trace 6a1568a07dacad07 ]---

Oops#1 Part2
<6>gre: GRE over IPv4 demultiplexor driver
<6>ip_gre: GRE over IPv4 tunneling driver
<1>BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2
<1>IP: [<ffffffff8137954f>] ip_tunnel_rcv+0x35/0x3d7
<4>PGD 410f42067 PUD 410f43067 PMD 0
<4>Oops: 0000 [#1] PREEMPT SMP
<4>Modules linked in: ip_gre gre bnep autofs4 nls_iso8859_15 nls_cp850 vfat fat configfs uinput snd_aloop snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device fuse tun hid_topseed hid_generic iwldvm led_class mac80211 usbhid cdc_acm snd_hda_codec_hdmi coretemp snd_hda_codec_realtek pcspkr iwlwifi cfg80211 btusb i915 8250_pci snd_hda_intel i2c_algo_bit intel_agp snd_hda_codec i2c_i801 intel_gtt r8169 snd_pcm drm_kms_helper snd_page_alloc rtc_cmos mii iTCO_wdt drm snd_timer snd 8250 agpgart soundcore serial_core bluetooth uhci_hcd ehci_pci ehci_hcd xhci_hcd usb_storage
<4>CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.2-gentoo-htpc2 #1
<4>Hardware name: CompuLab Intense-PC/Intense-PC, BIOS CR_2.2.0.400 X64 12/12/2013
<4>task: ffff88043c0fd9a0 ti: ffff88043c0fe000 task.ti: ffff88043c0fe000
<4>RIP: 0010:[<ffffffff8137954f>] [<ffffffff8137954f>] ip_tunnel_rcv+0x35/0x3d7
<4>RSP: 0018:ffff88043e243d68 EFLAGS: 00010246
<4>RAX: 0000000000000000 RBX: ffff88042c88a1c0 RCX: 0000000000000001
<4>RDX: ffff88043e243dcc RSI: ffff88042c88a1c0 RDI: ffff880417fb2840
<4>RBP: ffff880417fb2840 R08: 0000000000000000 R09: 00000000424242e0
<4>R10: ffff8800a1644a08 R11: ffff8800a1ca9680 R12: ffff880410eae054

The kernel then panics, but the panic message is not reliably written to /dev/pstore, so I can't add it here.
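Here is a hand decoding of the Code: line. Treat it as a best-effort reading, not disassembler output. The bytes before the `<80>` marker are `mov 0x10(%r12),%eax; and $0xf0,%eax; cmp $0xe0,%eax; jne ...; mov 0x58(%rsi),%rax; and $-2,%rax`, and the faulting instruction is `cmpb $0x0,0xa2(%rax)`.

That sequence matches the multicast branch of ip_tunnel_rcv(). It checks whether the outer destination is multicast, loads skb->_skb_refdst, and clears the SKB_DST_NOREF bit. It then reads one byte of the route, presumably the `rt_is_input` field that `rt_is_output_route()` tests.

The C sketch below replays that arithmetic using the register values from this Oops. The function names are hypothetical. The offsets 0x58 and 0xa2 only apply to this particular 3.13.2 build.

```c
#include <stdint.h>

/* Values decoded by hand from the Code: line of this Oops (3.13.2, x86_64).
 * They are specific to that build; the field name is an assumption. */
#define RT_IS_INPUT_OFFSET UINT64_C(0xa2) /* cmpb $0x0,0xa2(%rax): assumed rt_is_input */
#define SKB_DST_NOREF      UINT64_C(1)    /* low bit of _skb_refdst is a flag       */

/* Hypothetical helper: the IPv4 multicast test as compiled here:
 *   mov 0x10(%r12),%eax   ; iph->daddr (offset 16 of the IPv4 header)
 *   and $0xf0,%eax
 *   cmp $0xe0,%eax        ; first octet in 224..239?
 * On little-endian x86 the first address octet is the lowest byte. */
int oops_is_multicast(uint32_t daddr_le)
{
    return (daddr_le & 0xf0u) == 0xe0u;
}

/* Hypothetical helper: the address read by the faulting cmpb, given the
 * raw skb->_skb_refdst value loaded by "mov 0x58(%rsi),%rax". */
uint64_t oops_fault_address(uint64_t refdst)
{
    uint64_t rtable = refdst & ~SKB_DST_NOREF; /* and $-2,%rax         */
    return rtable + RT_IS_INPUT_OFFSET;        /* cmpb $0x0,0xa2(%rax) */
}
```

With RAX = 0, as in this Oops, `oops_fault_address(0)` evaluates to 0xa2. That is the value in CR2, so the crash is a read through a NULL route pointer.

R09 (0x424242e0) happens to equal 224.66.66.66, the tunnel's remote address, loaded as a little-endian 32-bit word. `oops_is_multicast()` accepts it.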
From b4ddc591e46a884e77092788ec25c36e42ac3304 Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 3 Mar 2014 20:04:33 +0800
Subject: [PATCH] ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL pointer

when ip_tunnel process multicast packets, it may check if the packet is looped
back packet through 'rt_is_output_route(skb_rtable(skb))' in ip_tunnel_rcv(),
but before that, skb->_skb_refdst has been dropped in iptunnel_pull_header(),
so which leads to a panic.

fix the bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/ipv4/ip_tunnel_core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6156f4e..88b08aa 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -108,7 +108,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
 	nf_reset(skb);
 	secpath_reset(skb);
 	skb_clear_hash_if_not_l4(skb);
-	skb_dst_drop(skb);
 	skb->vlan_tci = 0;
 	skb_set_queue_mapping(skb, 0);
 	skb->pkt_type = PACKET_HOST;
-- 
1.8.3.1
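The ordering problem in this commit message can be modeled in user space. This is a sketch, not kernel code:

- `toy_skb`, `toy_pull_header()` and `toy_tunnel_rcv()` are hypothetical stand-ins for `struct sk_buff`, `iptunnel_pull_header()` and `ip_tunnel_rcv()`.
- The NULL dereference is replaced by an explicit verdict, so the failure can be observed without crashing.
- The `drop_dst` flag selects the behaviour: `true` is 3.13 as shipped (route dropped while pulling the header), `false` is this patch (route kept).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins carrying only the fields the model needs. */
struct toy_rtable {
    bool is_input;              /* models rt->rt_is_input */
};

struct toy_skb {
    struct toy_rtable *dst;     /* models skb_rtable(skb)               */
    bool daddr_is_multicast;    /* models ipv4_is_multicast(iph->daddr) */
};

enum toy_verdict {
    TOY_DELIVER,                /* handed to the tunnel device          */
    TOY_DROP_LOOPED,            /* dropped as our own looped-back copy  */
    TOY_NULL_DEREF              /* where the real kernel would Oops     */
};

/* Models iptunnel_pull_header(): with drop_dst set it releases the
 * route, as skb_dst_drop() does in 3.13. */
static void toy_pull_header(struct toy_skb *skb, bool drop_dst)
{
    if (drop_dst)
        skb->dst = NULL;
}

/* Models the multicast branch of ip_tunnel_rcv(), which runs only
 * after the header has been pulled. */
static enum toy_verdict toy_tunnel_rcv(const struct toy_skb *skb)
{
    if (skb->daddr_is_multicast) {
        if (skb->dst == NULL)
            return TOY_NULL_DEREF;      /* rt_is_output_route(NULL)    */
        if (!skb->dst->is_input)
            return TOY_DROP_LOOPED;     /* output route: looped back   */
    }
    return TOY_DELIVER;
}

/* Receive path in the order the real code runs it. */
enum toy_verdict toy_receive(struct toy_skb *skb, bool drop_dst)
{
    toy_pull_header(skb, drop_dst);
    return toy_tunnel_rcv(skb);
}
```

With `drop_dst` set, every multicast packet reaches the loopback check without a route. That is the NULL dereference reported above.

Clearing it, which is the effect of this patch, makes the check work. However, it also leaves the outer IPv4 route attached to the decapsulated inner packet.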
I've been seeing exactly the same kernel oops on my Ubuntu 13.10 system. I can reproduce the crash by creating multiple LXC containers (each with a bridge of gretap interfaces) and then forcibly destroying them.

I tried applying the above patch (to linux-source-3.11.0, version 3.11.0-18.32), but now I get a crash while the containers, and therefore the gretap interfaces, are being created. Apologies if I am supposed to be using a different kernel! Here is the new oops:

[ 15.448092] BUG: unable to handle kernel paging request at fffffffc
[ 15.448958] IP: [<c15c190d>] ipv6_rcv+0x13d/0x500
[ 15.449524] *pdpt = 0000000001a1a001 *pde = 0000000001a21067 *pte = 0000000000000000
[ 15.450455] Oops: 0000 [#1] SMP
[ 15.450906] Modules linked in: ebt_mark_m ebtable_filter ip_gre gre ip_tunnel dummy macvlan overlayfs xt_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp ip6table_filter ip6_tables iptable_filter ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bridge stp llc nfsd auth_rpcgss nfs_acl nfs lockd dm_multipath sunrpc scsi_dh psmouse microcode fscache virtio_balloon serio_raw lp parport ext2 floppy
[ 15.453411] CPU: 1 PID: 1 Comm: init Not tainted 3.11.10.4 #1
[ 15.453411] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 15.453411] task: f6058000 ti: f60f0000 task.ti: f6036000
[ 15.453411] EIP: 0060:[<c15c190d>] EFLAGS: 00010286 CPU: 1
[ 15.453411] EIP is at ipv6_rcv+0x13d/0x500
[ 15.453411] EAX: fffffffc EBX: eb4ed3c0 ECX: 00000000 EDX: eb4ed3f0
[ 15.453411] ESI: eb486200 EDI: 00000018 EBP: f60f1ef8 ESP: f60f1ecc
[ 15.453411] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 15.453411] CR0: 80050033 CR2: fffffffc CR3: 36253000 CR4: 000006f0
[ 15.453411] Stack:
[ 15.453411] eb4ed3c0 f60f1ef0 c156f5d2 00000000 00000024 ec7b5800 00000001 f3a04380
[ 15.453411] c1937340 eb4ed3c0 c1935e74 f60f1f30 c1544e33 ec7b5800 80000000 00000076
[ 15.453411] 00000000 c1937340 ec7b5800 c1935e88 eb4ed3c0 c1935e88 eb4ed3c0 eb664410
[ 15.453411] Call Trace:
[ 15.453411] [<c156f5d2>] ? ip_rcv_finish+0x62/0x320
[ 15.453411] [<c1544e33>] __netif_receive_skb_core+0x4a3/0x630
[ 15.453411] [<c1544fd6>] __netif_receive_skb+0x16/0x60
[ 15.453411] [<c154503f>] netif_receive_skb+0x1f/0x80
[ 15.453411] [<c1545817>] napi_gro_receive+0x67/0x90
[ 15.453411] [<f8681aff>] gro_cell_poll+0x5f/0xa0 [ip_tunnel]
[ 15.453411] [<c15452a2>] net_rx_action+0xa2/0x180
[ 15.453411] [<c1057531>] __do_softirq+0xc1/0x1d0
[ 15.453411] [<c1057470>] ? remote_softirq_receive+0xb0/0xb0
[ 15.453411] <IRQ>
[ 15.453411] [<c10577a5>] ? irq_exit+0x95/0xa0
[ 15.453411] [<c1617758>] ? smp_apic_timer_interrupt+0x38/0x50
[ 15.453411] [<c16100dc>] ? apic_timer_interrupt+0x34/0x3c
[ 15.453411] Code: f2 01 c2 f6 c1 02 74 09 31 ff 83 c2 02 66 89 7a fe 83 e1 01 74 03 c6 02 00 8b 43 48 83 e0 fe 0f 84 66 01 00 00 8b 80 c4 00 00 00 <8b> 00 8b 80 80 00 00 00 8b 53 4c 89 43 18 89 d0 2b 43 50 83 f8
[ 15.477172] device ext1 entered promiscuous mode
[ 15.453411] EIP: [<c15c190d>] ipv6_rcv+0x13d/0x500 SS:ESP 0068:f60f1ecc
[ 15.453411] CR2: 00000000fffffffc
[ 15.453411] ---[ end trace c7339aadbfd8dab1 ]---
[ 15.453411] Kernel panic - not syncing: Fatal exception in interrupt
I've decided that my bug is actually a different one, so I've opened a new ticket (https://bugzilla.kernel.org/show_bug.cgi?id=72081). However, the patch above still causes my system to crash.

Regards,
(In reply to Alex Zeffertt from comment #2)
> 
> [ 15.453411] EIP: [<c15c190d>] ipv6_rcv+0x13d/0x500 SS:ESP 0068:f60f1ecc
> [ 15.453411] CR2: 00000000fffffffc
> [ 15.453411] ---[ end trace c7339aadbfd8dab1 ]---
> [ 15.453411] Kernel panic - not syncing: Fatal exception in interrupt

Hi Alex, that patch does indeed cause this panic. The new patch below should fix it properly.

Commit 10ddceb22ba (ip_tunnel:multicast process cause panic due to
skb->_skb_refdst NULL pointer) removed the dst-drop call from
ip-tunnel-recv. The following commit reintroduces dst-drop and fixes
the original bug by checking for looped-back packets before releasing
the dst.

Original bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/ipv4/gre_demux.c      | 8 ++++++++
 net/ipv4/ip_tunnel.c      | 3 ---
 net/ipv4/ip_tunnel_core.c | 1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 1863422f..250be74 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -182,6 +182,14 @@ static int gre_cisco_rcv(struct sk_buff *skb)
 	int i;
 	bool csum_err = false;
 
+#ifdef CONFIG_NET_IPGRE_BROADCAST
+	if (ipv4_is_multicast(ip_hdr(skb)->daddr)) {
+		/* Looped back packet, drop it! */
+		if (rt_is_output_route(skb_rtable(skb)))
+			goto drop;
+	}
+#endif
+
 	if (parse_gre_header(skb, &tpi, &csum_err) < 0)
 		goto drop;
 
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 78a89e6..a82a22d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -416,9 +416,6 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
 
 #ifdef CONFIG_NET_IPGRE_BROADCAST
 	if (ipv4_is_multicast(iph->daddr)) {
-		/* Looped back packet, drop it! */
-		if (rt_is_output_route(skb_rtable(skb)))
-			goto drop;
 		tunnel->dev->stats.multicast++;
 		skb->pkt_type = PACKET_BROADCAST;
 	}
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6f847dd..8d69626 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -108,6 +108,7 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
 	nf_reset(skb);
 	secpath_reset(skb);
 	skb_clear_hash_if_not_l4(skb);
+	skb_dst_drop(skb);
 	skb->vlan_tci = 0;
 	skb_set_queue_mapping(skb, 0);
 	skb->pkt_type = PACKET_HOST;
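The reordering in this patch can be modeled in user space. The sketch below makes these assumptions:

- `toy_demux_is_looped_back()` plays the role of the new check in gre_cisco_rcv(). That check runs while the outer route is still attached.
- `toy_pull_header()` then drops the route, as iptunnel_pull_header() does again now that skb_dst_drop() is restored. Nothing after that point looks at the route.
- The types and function names are hypothetical user-space stand-ins, not kernel structures.
- A route is attached when the demux handler runs, as it is for locally delivered IPv4 packets.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins carrying only the fields the model needs. */
struct toy_rtable {
    bool is_input;              /* models rt->rt_is_input */
};

struct toy_skb {
    struct toy_rtable *dst;     /* models skb_rtable(skb); assumed non-NULL on entry */
    bool daddr_is_multicast;    /* models ipv4_is_multicast(ip_hdr(skb)->daddr)      */
};

enum toy_verdict { TOY_DELIVER, TOY_DROP_LOOPED };

/* Models the check added to gre_cisco_rcv(): it runs before the GRE
 * header is pulled, so the route is still there to inspect. The route
 * is only dereferenced for multicast packets, as in the patch. */
static bool toy_demux_is_looped_back(const struct toy_skb *skb)
{
    return skb->daddr_is_multicast && !skb->dst->is_input;
}

/* Models iptunnel_pull_header() with skb_dst_drop() restored. */
static void toy_pull_header(struct toy_skb *skb)
{
    skb->dst = NULL;
}

/* Order of operations after the fix: loopback check, then header pull
 * (which drops the route), then ip_tunnel_rcv(), which no longer reads
 * the route and is therefore not modeled separately. */
enum toy_verdict toy_receive_fixed(struct toy_skb *skb)
{
    if (toy_demux_is_looped_back(skb))
        return TOY_DROP_LOOPED;
    toy_pull_header(skb);
    return TOY_DELIVER;
}
```

In this model, a looped-back multicast packet is still dropped. Every delivered packet leaves the header pull with no route attached, so the outer IPv4 route never reaches the inner protocol handler.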