Bug 8895 (fib6_clean)

Summary: An ioctl to delete an ipv6 tunnel leads to a kernel panic
Product: Networking Reporter: Vincent Perrier (clowncoder)
Component: IPV6 Assignee: Hideaki YOSHIFUJI (yoshfuji)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, clowncoder, dassanjib.in, davem, protasnb, qmiao, zhangwf
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.22.3 and also 2.6.21.5 Subsystem:
Regression: No Bisected commit-id:
Attachments: patch of the printk and dump of traces

Description Vincent Perrier 2007-08-16 12:31:09 UTC
Most recent kernel where this bug did not occur: ?
Distribution: lfs and fedora
Hardware Environment: User Mode Linux and VMware
Software Environment: a modified version of mip6d (IP mobility daemon)
Problem Description: The mip6d HA was modified to add redundancy: when one HA is interrupted, the other takes over, which leads to some creation and deletion of routes and tunnels.
Note: The HA IP address known by the mobile router (MR) stays the same; the slave HA takes it over with an override neighbor advertisement message. So the tunnel between the mobile router and the HA(s) keeps the same end addresses.
The problem occurs when Ctrl-C is pressed on the master HA: the slave takes over, but sometimes the master gets a kernel panic.

Here is the dump of the master:

ICMPv6 NA: someone advertises our address on eth1!
Slab corruption: ip6_dst_cache start=0867ed00, len=224
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<08157c46>](dst_destroy+0x79/0xad)
0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b
Prev obj: start=0867ec08, len=224
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<08157b05>](dst_alloc+0x26/0x62)
000: 00 00 00 00 00 00 00 00 00 00 00 00 40 41 6f 08
010: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=0867edf8, len=224
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<08157b05>](dst_alloc+0x26/0x62)
000: 00 00 00 00 00 00 00 00 00 00 00 00 60 41 99 0b
010: 00 00 ff ff 00 00 00 00 7d df ff ff 00 00 00 00
BUG: failure at net/ipv6/ip6_fib.c:1151/fib6_del_route()!
Kernel panic - not syncing: BUG!

EIP: 0073:[<080e10b4>] CPU: 0 Not tainted ESP: 007b:bf6d0398 EFLAGS: 00000246
    Not tainted
EAX: ffffffda EBX: 00000006 ECX: 000089f2 EDX: bf6d0428
ESI: 00000000 EDI: 0815c150 EBP: bf6d0458 DS: 007b ES: 007b
08a37ae4:  [<0806ba80>] show_regs+0xb4/0xb9
08a37b10:  [<0805a044>] panic_exit+0x25/0x3f
08a37b24:  [<0807b088>] notifier_call_chain+0x21/0x46
08a37b44:  [<0807b123>] __atomic_notifier_call_chain+0x17/0x19
08a37b60:  [<0807b13a>] atomic_notifier_call_chain+0x15/0x17
08a37b7c:  [<0806fff6>] panic+0x52/0xdd
08a37b9c:  [<081bb8d2>] fib6_del_route+0x112/0x175
08a37bc0:  [<081bb9c6>] fib6_del+0x91/0xcc
08a37bdc:  [<081bbba8>] fib6_clean_node+0x26/0x73
08a37bf4:  [<081bba8a>] fib6_walk_continue+0x89/0x11f
08a37c04:  [<081bbb57>] fib6_walk+0x37/0x62
08a37c18:  [<081bbc23>] fib6_clean_tree+0x2e/0x31
08a37c4c:  [<081bbc83>] fib6_prune_clones+0x15/0x1a
08a37c64:  [<081bb9de>] fib6_del+0xa9/0xcc
08a37c7c:  [<081bbba8>] fib6_clean_node+0x26/0x73
08a37c94:  [<081bba8a>] fib6_walk_continue+0x89/0x11f
08a37ca4:  [<081bbb57>] fib6_walk+0x37/0x62
08a37cb8:  [<081bbc23>] fib6_clean_tree+0x2e/0x31
08a37cec:  [<081bbc51>] fib6_clean_all+0x2b/0x48
08a37d10:  [<081b9d15>] rt6_ifdown+0x12/0x17
08a37d24:  [<081b56e3>] addrconf_ifdown+0x54/0x275
08a37d40:  [<081b562d>] addrconf_notify+0x18a/0x1ec
08a37d5c:  [<0807b088>] notifier_call_chain+0x21/0x46
08a37d7c:  [<0807b257>] __raw_notifier_call_chain+0x17/0x19
08a37d98:  [<0807b26e>] raw_notifier_call_chain+0x15/0x17
08a37db4:  [<08153c18>] dev_close+0x5e/0x68
08a37dcc:  [<0815619e>] unregister_netdevice+0xb7/0x1bc
08a37ddc:  [<081d75d7>] ip6_tnl_ioctl+0x1a9/0x1d2
08a37e34:  [<0815578c>] dev_ifsioc+0x3b9/0x3d9
08a37e54:  [<08155a71>] dev_ioctl+0x2c5/0x300
08a37e9c:  [<0814b435>] sock_ioctl+0x230/0x243
08a37ebc:  [<080b0801>] do_ioctl+0x21/0x5a
08a37ed8:  [<080b0ba8>] vfs_ioctl+0x1ec/0x209
08a37f00:  [<080b0bf3>] sys_ioctl+0x2e/0x4b
08a37f28:  [<0805a7ae>] handle_syscall+0x86/0xa0
08a37f74:  [<08068d00>] handle_trap+0xd8/0xe1
08a37f90:  [<080690f3>] userspace+0x138/0x180
08a37fdc:  [<0805a4d1>] fork_handler+0x74/0x7c
08a37ffc:  [<a55a5a5a>] 0xa55a5a5a


Program received signal SIGSEGV, Segmentation fault.
0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6
(gdb)



Program received signal SIGSEGV, Segmentation fault.
0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6
#1  0x080676df in os_dump_core () at arch/um/os-Linux/util.c:109
#2  0x0805a05a in panic_exit (self=0x825d674, unused1=0, unused2=0x8277ee0)
    at arch/um/kernel/um_arch.c:477
#3  0x0807b088 in notifier_call_chain (nl=0x8277ec0, val=0, v=0x8277ee0,
    nr_to_call=-2, nr_calls=0x0) at kernel/sys.c:163
#4  0x0807b123 in __atomic_notifier_call_chain (nh=0x8277ec0, val=0,
    v=0x8277ee0, nr_to_call=-1, nr_calls=0x0) at kernel/sys.c:256
#5  0x0807b13a in atomic_notifier_call_chain (nh=0x8277ec0, val=0, v=0x8277ee0)
    at kernel/sys.c:266
#6  0x0806fff6 in panic (fmt=0x8217b25 "BUG!") at kernel/panic.c:99
#7  0x081bb8d2 in fib6_del_route (fn=0x0, rtp=0x8abd568, info=0x0)
    at net/ipv6/ip6_fib.c:1151
#8  0x081bb9c6 in fib6_del (rt=0x867ed00, info=0x0) at net/ipv6/ip6_fib.c:1193
#9  0x081bbba8 in fib6_clean_node (w=0x8a37c20) at net/ipv6/ip6_fib.c:1322
#10 0x081bba8a in fib6_walk_continue (w=0x8a37c20) at net/ipv6/ip6_fib.c:1264
#11 0x081bbb57 in fib6_walk (w=0x8a37c20) at net/ipv6/ip6_fib.c:1306
#12 0x081bbc23 in fib6_clean_tree (root=0x8abd440,
    func=0x81bbc88 <fib6_prune_clone>, prune=1, arg=0x867edf8)
    at net/ipv6/ip6_fib.c:1360
#13 0x081bbc83 in fib6_prune_clones (fn=0x8abd440, rt=0x867edf8)
    at net/ipv6/ip6_fib.c:1394
#14 0x081bb9de in fib6_del (rt=0x867edf8, info=0x0) at net/ipv6/ip6_fib.c:1184
#15 0x081bbba8 in fib6_clean_node (w=0x8a37cc0) at net/ipv6/ip6_fib.c:1322
#16 0x081bba8a in fib6_walk_continue (w=0x8a37cc0) at net/ipv6/ip6_fib.c:1264
#17 0x081bbb57 in fib6_walk (w=0x8a37cc0) at net/ipv6/ip6_fib.c:1306
#18 0x081bbc23 in fib6_clean_tree (root=0x8272dac,
    func=0x81b9ce2 <fib6_ifdown>, prune=0, arg=0xb994160)
    at net/ipv6/ip6_fib.c:1360
#19 0x081bbc51 in fib6_clean_all (func=0x81b9ce2 <fib6_ifdown>, prune=0,
    arg=0xb994160) at net/ipv6/ip6_fib.c:1372
#20 0x081b9d15 in rt6_ifdown (dev=0xb994160) at net/ipv6/route.c:1944
#21 0x081b56e3 in addrconf_ifdown (dev=0xb994160, how=0)
    at net/ipv6/addrconf.c:2400
#22 0x081b562d in addrconf_notify (this=0x82721c4, event=2, data=0xb994160)
    at net/ipv6/addrconf.c:2358
#23 0x0807b088 in notifier_call_chain (nl=0x8283e94, val=2, v=0xb994160,
    nr_to_call=-10, nr_calls=0x0) at kernel/sys.c:163
#24 0x0807b257 in __raw_notifier_call_chain (nh=0x8283e94, val=2, v=0xb994160,
    nr_to_call=-1, nr_calls=0x0) at kernel/sys.c:451
#25 0x0807b26e in raw_notifier_call_chain (nh=0x8283e94, val=2, v=0xb994160)
    at kernel/sys.c:459
#26 0x08153c18 in dev_close (dev=0xb994160) at net/core/dev.c:1015
#27 0x0815619e in unregister_netdevice (dev=0xb994160) at net/core/dev.c:3451
#28 0x081d75d7 in ip6_tnl_ioctl (dev=0xb994160, ifr=0x8a37e6c, cmd=35314)
    at net/ipv6/ip6_tunnel.c:1266
#29 0x0815578c in dev_ifsioc (ifr=0x8a37e6c, cmd=35314) at net/core/dev.c:2816
#30 0x08155a71 in dev_ioctl (cmd=35314, arg=0xbf6d0428) at net/core/dev.c:2995
#31 0x0814b435 in sock_ioctl (file=0x832a348, cmd=35314, arg=3211592744)
    at net/socket.c:909
#32 0x080b0801 in do_ioctl (filp=0x16, cmd=35314, arg=3211592744)
    at fs/ioctl.c:30
#33 0x080b0ba8 in vfs_ioctl (filp=0x832a348, fd=6, cmd=6, arg=3211592744)
    at fs/ioctl.c:159
#34 0x080b0bf3 in sys_ioctl (fd=6, cmd=35314, arg=3211592744) at fs/ioctl.c:179
#35 0x0805a7ae in handle_syscall (r=0x867a894)
    at arch/um/kernel/skas/syscall.c:38
#36 0x08068d00 in handle_trap (pid=10640, regs=0x867a894, local_using_sysemu=2)
    at arch/um/os-Linux/skas/process.c:173
#37 0x080690f3 in userspace (regs=0x867a894)
    at arch/um/os-Linux/skas/process.c:330
#38 0x0805a4d1 in fork_handler () at arch/um/kernel/skas/process.c:96
#39 0xa55a5a5a in ?? ()
(gdb)



Steps to reproduce:
Comment 1 Anonymous Emailer 2007-08-16 13:16:34 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 16 Aug 2007 12:24:05 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8895
Comment 2 Vincent Perrier 2007-08-18 03:06:35 UTC
I did another test:
Modification of file:   ip6_fib.c

static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
                           struct nl_info *info)
{
. . .
printk("0 ATOMIC  %d\n", atomic_read(&rt->rt6i_ref));
        if (atomic_read(&rt->rt6i_ref) != 1) {
printk("1 ATOMIC  %d\n", atomic_read(&rt->rt6i_ref));
                /* This route is used as dummy address holder in some split
                 * nodes. It is not leaked, but it still holds other resources,
                 * which must be released in time. So, scan ascendant nodes
                 * and replace dummy references to this route with references
                 * to still alive ones.
                 */
                while (fn) {
                        if (!(fn->fn_flags&RTN_RTINFO) && fn->leaf == rt) {
                                fn->leaf = fib6_find_prefix(fn);
                                atomic_inc(&fn->leaf->rt6i_ref);
                                rt6_release(rt);
                        }
                        fn = fn->parent;
                }
                /* No more references are possible at this point. */
                if (atomic_read(&rt->rt6i_ref) != 1)
 printk("2 ATOMIC  %d", atomic_read(&rt->rt6i_ref));

. . .

Result in the console:


. . .
0 ATOMIC  1
0 ATOMIC  1
0 ATOMIC  1
0 ATOMIC  1
Slab corruption: ip6_dst_cache start=08506160, len=224
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<08157c46>](dst_destroy+0x79/0xad)
0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b
Prev obj: start=08506068, len=224
Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
Last user: [<08157b05>](dst_alloc+0x26/0x62)
000: 00 00 00 00 00 00 00 00 00 00 00 00 40 b4 26 08
010: 00 00 ff ff 01 00 00 00 00 00 00 00 00 00 00 00
Next obj: start=08506258, len=224
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<08157c46>](dst_destroy+0x79/0xad)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
0 ATOMIC  1
0 ATOMIC  1
0 ATOMIC  1
0 ATOMIC  1802201963
1 ATOMIC  1802201963
2 ATOMIC  1802201963
Program received signal SIGSEGV, Segmentation fault.
rt6_fill_node (skb=0x8b1e7e8, rt=0x8506160, dst=0x0, src=0x0, iif=0, type=25,
    pid=0, seq=0, prefix=0, flags=0) at net/ipv6/route.c:2145
2145                    table = rt->rt6i_table->tb6_id;
(gdb) quit
Comment 3 weifeng zhang 2007-08-22 21:53:14 UTC
We have also encountered this problem, which happens randomly, and it is also present on 2.6.18. Our log is somewhat different, but we believe the cause is the same.

kernel BUG in fib6_del_route at /sw/release/dev/autorel/platform/linux/net/ipv6/ip6_fib.c:865!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT
NIP: C01C59E0 LR: C01C588C CTR: C01C64EC
REGS: cc5299f0 TRAP: 0700   Tainted: P       (2.6.18-ctc8247)
MSR: 00029032 <EE,ME,IR,DR>  CR: 44008824  XER: 00000000
TASK = cfbc47b0[700] 'nsm' THREAD: cc528000
GPR00: 00000002 CC529AA0 CFBC47B0 CFAC9380 C0222614 C022252C 0000040B 00000008
GPR08: C0252910 00000004 00000102 C0252910 24000822 102B6330 0FFFF000 00000000
GPR16: 00000001 FFFFFFFF 00000000 7F8207D0 C0250000 C84C7E30 C8626080 00000000
GPR24: C0290000 C0250000 CC528000 C0252910 C02A8C90 CDFEEDA0 CDFEEE44 00000000
NIP [C01C59E0] fib6_del+0x214/0x610
LR [C01C588C] fib6_del+0xc0/0x610
Call Trace:
[CC529AA0] [C01C588C] fib6_del+0xc0/0x610 (unreliable)
[CC529AE0] [C01C32E8] ip6_route_del+0x12c/0x1d4
[CC529B10] [C01C42C0] inet6_rtm_delroute+0x50/0x6c
[CC529B90] [C0166320] rtnetlink_rcv_msg+0x194/0x250
[CC529BC0] [C016C924] netlink_run_queue+0xd8/0x17c
[CC529BF0] [C0166418] rtnetlink_rcv+0x3c/0x68
[CC529C10] [C016BD20] netlink_data_ready+0x70/0xcc
[CC529C20] [C016ADAC] netlink_sendskb+0x34/0x88
[CC529C40] [C016BC34] netlink_sendmsg+0x270/0x2ec
[CC529CB0] [C014D504] sock_sendmsg+0xac/0xf4
[CC529DB0] [C014F36C] sys_sendmsg+0x1d0/0x26c
[CC529F00] [C014F7F4] sys_socketcall+0x1d8/0x1dc
[CC529F40] [C00042A0] ret_from_syscall+0x0/0x38
Instruction dump:
70090004 40820014 801f0010 7fe3fb78 7f80e800 419e0074 83ff0000 2f9f0000
409effdc 801d00a4 2f800001 419e0008 <0fe00000> 7ec5b378 7ea6ab78 38600019
Kernel panic - not syncing: Aiee, killing interrupt handler!
 <0>Rebooting in 1 seconds..
Comment 4 Hideaki YOSHIFUJI 2007-08-24 01:11:33 UTC
Do you use a vanilla kernel?
Is there any trivial way to reproduce this?
Comment 5 qmiao 2007-08-28 18:47:47 UTC
The bug is caused by a race condition between deleting an IPv6 address (process context) and the DAD timer (softirq context).
The sequence of execution is as follows:
1. (process context) add ipv6 address => new ifp => new ifp->rt => start dad timer
ifp->rt is not inserted into fib6 tree, it will be inserted into fib6 tree by addrconf_dad_completed()

2. (process context) delete ipv6 address => dst_free(ifp->rt) => ifp->rt->u.dst is queued on dst_garbage_list
addrconf.c:__ipv6_ifa_notify():3564: dst_free(&ifp->rt->u.dst)

3. (softirq context) dad timer expired => addrconf_dad_completed() insert ifp->rt into fib6 tree

4. (softirq context) dst_gc timer expired => dst_run_gc() free ifp->rt->u.dst

5. (process context) shutdown interface => fib6_clean_tree() => fib6_walk() => access already freed rt6_info


Solution to fix the bug:
Delete the DAD timer before deleting the IPv6 address, instead of after deleting it (addrconf.c:ipv6_del_addr()).
Comment 6 Vincent Perrier 2007-08-29 05:01:12 UTC
Hello, I tried deleting the DAD timer before deleting the address; that is not this bug.
I added lots of printks to see more, and here is what I found:

I sometimes saw the following call tree:
  ip6_route_add
  ...
        fib6_add
                ...

                if (fn->leaf == NULL) {
                        fn->leaf = rt;
                        atomic_inc(&rt->rt6i_ref);
                }
                ...
                err = fib6_add_rt2node(fn, rt, info);
                ...
(Here err was not null)
                ...
                if (err) {
                        ...
                        dst_free(&rt->u.dst);
                        ...
                        }

It seems that this dst_free is wrong at this point: it decrements
rt6i_ref, but the address is still in use.
I do not understand exactly what happens; the following patch hides
my problem, but certainly does not solve it.


diff -Naur linux-2.6.22.5/net/ipv6/ip6_fib.c clownix_linux-2.6.22.5/net/ipv6/ip6_fib.c
--- linux-2.6.22.5/net/ipv6/ip6_fib.c   2007-08-23 01:23:54.000000000 +0200
+++ clownix_linux-2.6.22.5/net/ipv6/ip6_fib.c   2007-08-29 13:10:35.000000000 +0200
@@ -696,6 +696,7 @@
 {
        struct fib6_node *fn, *pn = NULL;
        int err = -ENOMEM;
+       int bug_8895_clownix_provisional_workaround = 0;

        fn = fib6_add_1(root, &rt->rt6i_dst.addr, sizeof(struct in6_addr),
                        rt->rt6i_dst.plen, offsetof(struct rt6_info, rt6i_dst));
@@ -760,6 +761,7 @@
                }

                if (fn->leaf == NULL) {
+                       bug_8895_clownix_provisional_workaround = 1;
                        fn->leaf = rt;
                        atomic_inc(&rt->rt6i_ref);
                }
@@ -793,7 +795,8 @@
                        atomic_inc(&pn->leaf->rt6i_ref);
                }
 #endif
-               dst_free(&rt->u.dst);
+               if (!bug_8895_clownix_provisional_workaround)
+                       dst_free(&rt->u.dst);
        }
        return err;
Comment 7 qmiao 2007-08-29 18:01:35 UTC
My kernel version is 2.6.18, and CONFIG_IPV6_SUBTREES is not enabled.

Can you explain or find out why fib6_add_rt2node returns an error (-EEXIST)?
Comment 8 Vincent Perrier 2007-08-29 23:19:30 UTC
Your bug is not the same as mine. I had a kernel crash approximately every 5 Ctrl-C of the user software, and with my patch (which does not fix the underlying problem) I can press Ctrl-C every 10 seconds all day.
The EEXIST error may be caused by mistakes in the user software or by something else, I don't know. But I went through the following error path:

 for (iter = fn->leaf; iter; iter=iter->u.dst.rt6_next) {
                /*
                 *      Search for duplicates
                 */

                if (iter->rt6i_metric == rt->rt6i_metric) {
                        /*
                         *      Same priority level
                         */

                        if (iter->rt6i_dev == rt->rt6i_dev &&
                            iter->rt6i_idev == rt->rt6i_idev &&
                            ipv6_addr_equal(&iter->rt6i_gateway,
                                            &rt->rt6i_gateway)) {
                                if (!(iter->rt6i_flags&RTF_EXPIRES))
THIS IS WHERE I RETURNED ---------->   return -EEXIST;
                                iter->rt6i_expires = rt->rt6i_expires;
                                if (!(rt->rt6i_flags&RTF_EXPIRES)) {
                                        iter->rt6i_flags &= ~RTF_EXPIRES;
                                        iter->rt6i_expires = 0;
                                }
                                return -EEXIST;
                        }
                }
Comment 9 qmiao 2007-08-29 23:33:07 UTC
        fib6_add
                ...

                if (fn->leaf == NULL) {
                        fn->leaf = rt;    <--**-- rt is assigned to fn->leaf
                        atomic_inc(&rt->rt6i_ref);
                }
                ...
                err = fib6_add_rt2node(fn, rt, info); <-**- return -EEXIST
                ...
(Here err was not null)
                ...
                if (err) {
                        ...
                        dst_free(&rt->u.dst); <--**-- Actually rt is still in tree (fn->leaf = rt /* see above */)
                        ...
                        }
Comment 10 Vincent Perrier 2007-08-30 11:22:25 UTC
Yes, that is also what I think. I also tried setting fn->leaf to NULL, but that did not work, because many other things have to be done to remove rt from the tree. So the kernel experts will have to find a way to clean up fn->leaf when fib6_add_rt2node returns an error.

What happens afterwards: in my case, a call to ip_route_output (triggered by a message output) increments rt6i_ref again and the leaf lives its normal life, but the crash occurs long after that. The rt6i_ref is one too low, so the route is freed while there is still one user of it, and then the 0x6b6b6b pattern appears.

I have not yet seen any bad effects from my simple patch, so everything is fine for me.
Thank you for pointing out the other bug; I think I may have seen it too, but I am not sure.
Comment 11 Vincent Perrier 2007-08-30 23:00:57 UTC
Created attachment 12639 [details]
patch of the printk and dump of traces
Comment 12 Vincent Perrier 2007-12-21 02:21:50 UTC
Why has this bug not been fixed? It is old and completely clear:
in file ip6_fib.c, line 796 of the vanilla kernel 2.6.23.11, the dst_free
can cause a kernel crash, as qmiao wrote:
       fib6_add
                ...

                if (fn->leaf == NULL) {
                        fn->leaf = rt;    <--**-- rt is assigned to fn->leaf
                        atomic_inc(&rt->rt6i_ref);
                }
                ...
                err = fib6_add_rt2node(fn, rt, info); <-**- return -EEXIST
                ...
(Here err was not null)
                ...
                if (err) {
                        ...
                        dst_free(&rt->u.dst); <--**-- Actually rt is still in tree (fn->leaf = rt /* see above */)
                        ...
                        }
Comment 13 Natalie Protasevich 2008-02-10 20:25:39 UTC
It looks like the code in question is still there. I will forward this to netdev.
Comment 14 Alan 2009-03-18 10:10:37 UTC
(pinging DaveM)
Comment 15 Alan 2009-05-21 20:48:34 UTC
Date:   Fri Apr 18 01:46:19 2008 -0700

    [IPV6]: Fix dangling references on error in fib6_add().
    
    Fixes bugzilla #8895
Comment 16 Sanjib 2020-02-17 17:28:48 UTC
Hi Alan,

It would be a great favour if you could point me to the fix patch for this issue.
Comment 17 Alan 2020-02-19 02:12:31 UTC
No idea - we fixed it a decade ago