Bug 72081

Summary: [PATCH] kernel oops on gretap teardown
Product: Networking
Component: IPV4
Status: NEW
Severity: high
Priority: P1
Hardware: i386
OS: Linux
Kernel Version: 3.11.10.4
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Alex Zeffertt (alex.zeffertt)
Assignee: Stephen Hemminger (stephen)
CC: alan, alex.zeffertt, lucien.xin, szg00000
Attachments: Test t->dev != NULL before dereferencing

Description Alex Zeffertt 2014-03-14 14:50:38 UTC
Created attachment 129391
Test t->dev != NULL before dereferencing

The following kernel oops occurs on teardown of gretap devices.  The devices actually belong to LXC containers which are destroyed forcibly.

[  202.827256] BUG: unable to handle kernel NULL pointer dereference at 000000fc
[  202.828153] IP: [<f867c22a>] ip_tunnel_lookup+0x22a/0x2e0 [ip_tunnel]
[  202.828895] *pdpt = 000000002695c001 *pde = 0000000000000000 
[  202.829568] Oops: 0000 [#1] SMP 
[  202.829970] Modules linked in: ebt_mark_m ebtable_filter ip_gre ip_tunnel gre dummy macvlan overlayfs xt_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp ip6table_filter ip6_tables iptable_filter ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bridge stp llc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache ext2 dm_multipath scsi_dh microcode psmouse serio_raw virtio_balloon lp parport floppy
[  202.831060] CPU: 1 PID: 17426 Comm: manager Tainted: GF            3.11.0-18-generic #32-Ubuntu
[  202.831060] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  202.831060] task: e4956700 ti: f64f2000 task.ti: e491e000
[  202.831060] EIP: 0060:[<f867c22a>] EFLAGS: 00210246 CPU: 1
[  202.831060] EIP is at ip_tunnel_lookup+0x22a/0x2e0 [ip_tunnel]
[  202.831060] EAX: 00000000 EBX: 00000035 ECX: 00000000 EDX: deb13ffc
[  202.831060] ESI: 00000000 EDI: deb16000 EBP: f64f3e44 ESP: f64f3e18
[  202.831060]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  202.831060] CR0: 8005003b CR2: 000000fc CR3: 26b58000 CR4: 000006f0
[  202.831060] Stack:
[  202.831060]  f2b884e8 e02d0002 00000400 deb16000 deb16000 eb340400 00000035 04003e48
[  202.831060]  eb34b180 7afb10ac f64f3e78 f64f3e68 f8685d66 7afb10ac 76fb10ac 00000000
[  202.831060]  00000035 00000000 eb34b180 e3314080 f64f3e8c f86776a6 00000001 0088e800
[  202.831060] Call Trace:
[  202.831060]  [<f8685d66>] ipgre_rcv+0x86/0xe0 [ip_gre]
[  202.831060]  [<f86776a6>] gre_cisco_rcv+0x36/0x80 [gre]
[  202.831060]  [<f8677390>] gre_rcv+0x50/0x80 [gre]
[  202.831060]  [<c157d157>] ip_local_deliver_finish+0x97/0x220
[  202.831060]  [<c157d444>] ip_local_deliver+0x44/0x90


The attached patch fixes this and I can now destroy multiple LXC containers (and their gretap interfaces) simultaneously without causing an oops.  I don't know whether this is the right way to go about fixing the problem though....
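
For readers without the attachment, the idea of testing t->dev before dereferencing can be sketched in user space roughly as follows. This is not the attached patch and not the kernel's ip_tunnel_lookup(): the types model_dev and model_tunnel, the function model_lookup(), and the flag MODEL_IFF_UP are hypothetical stand-ins, and the assumption that a tunnel can still be on the lookup list after its dev pointer is gone is only inferred from the oops above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's IFF_UP flag. */
#define MODEL_IFF_UP 0x1u

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct model_dev {
    unsigned int flags;
};

/* Hypothetical, heavily simplified stand-in for struct ip_tunnel. */
struct model_tunnel {
    struct model_tunnel *next;  /* next tunnel in the lookup list */
    struct model_dev *dev;      /* assumed to be NULL during teardown */
    unsigned int saddr;         /* local address of the tunnel */
    unsigned int daddr;         /* remote address of the tunnel */
};

/*
 * Return the first tunnel that matches both addresses and whose device
 * is present and up. Entries with a NULL dev are skipped instead of
 * being dereferenced.
 */
struct model_tunnel *model_lookup(struct model_tunnel *head,
                                  unsigned int saddr, unsigned int daddr)
{
    struct model_tunnel *t;

    for (t = head; t != NULL; t = t->next) {
        if (t->saddr != saddr || t->daddr != daddr)
            continue;
        if (t->dev == NULL)     /* the added test: no device, skip */
            continue;
        if (!(t->dev->flags & MODEL_IFF_UP))
            continue;
        return t;
    }
    return NULL;
}
```

With a list holding one half-destroyed tunnel (dev == NULL) followed by a live one for the same address pair, model_lookup() returns the live tunnel rather than faulting on the first entry. A check like this only hides the symptom in the lookup path; it does not explain how the entry came to have a NULL dev.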

Regards,

Alex
Comment 1 Xin Long 2014-03-15 05:41:00 UTC
(In reply to Alex Zeffertt from comment #0)

> The attached patch fixes this and I can now destroy multiple LXC containers
> (and their gretap interfaces) simultaneously without causing an oops.  I
> don't know whether this is the right way to go about fixing the problem
> though....
> 
I think the crucial question is why the *dev* member of struct ip_tunnel is NULL. Perhaps the fix belongs in the code that leads to t->dev being NULL, such as where the tunnel is created or changed.

Otherwise, you can send your idea to the network dev list <netdev@vger.kernel.org> for discussion.