Latest working kernel version: none known Earliest failing kernel version: none tested Distribution: Debian Hardware Environment: Intel(R) Pentium(R) 4 CPU 2.80GHz, HyperThreaded Software Environment: SMT kernel, Debian glibc 2.7 Problem Description: When an IFF_TUN (/dev/net/tun) device has more than 1023 IPv6 neighbors, a process context crash occurs. Backtrace follows: BUG: unable to handle kernel NULL pointer dereference at 0000001d IP: [<f8b375bf>] :ipv6:ip6_dst_lookup_tail+0x95/0x15a *pde = 00000000 Oops: 0000 [#14] SMP Modules linked in: ipx p8022 psnap llc p8023 i915 drm tun cpufreq_ondemand binfmt_misc fuse nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack ipv6 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd intel_agp psmouse soundcore agpgart button processor snd_page_alloc parport_pc parport iTCO_wdt evdev pcspkr dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom e100 mii ehci_hcd uhci_hcd usbcore unix Pid: 9950, comm: tunload Tainted: G D (2.6.26.3 #8) EIP: 0060:[<f8b375bf>] EFLAGS: 00210246 CPU: 0 EIP is at ip6_dst_lookup_tail+0x95/0x15a [ipv6] EAX: 00000000 EBX: 00000000 ECX: ef4abdac EDX: 00000000 ESI: ef4abd3c EDI: ef64ca00 EBP: ef4abcb8 ESP: ef4abc64 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process tunload (pid: 9950, ti=ef4aa000 task=f7d45320 task.ti=ef4aa000) Stack: ef4abd58 ef4abdac f7cc0c00 ef4abc80 f8b36918 00000000 ef673e40 ef4abcc0 f8b381b2 00000002 f7cc0c00 ef7c3e00 f7cc0e24 00000000 ef4abca8 ef4abca8 c030bcfa ef4abcc0 00000000 ef4abed4 00000000 ef4abcc0 f8b377d5 ef4abdbc Call Trace: [<f8b36918>] ? ip6_cork_release+0x2e/0x52 [ipv6] [<f8b381b2>] ? ip6_push_pending_frames+0x1c9/0x3d9 [ipv6] [<c030bcfa>] ? _spin_unlock_bh+0xd/0xf [<f8b377d5>] ? ip6_dst_lookup+0xe/0x10 [ipv6] [<f8b4c2b2>] ? rawv6_sendmsg+0x25d/0xc08 [ipv6] [<c0151022>] ? filemap_fault+0x203/0x3d5 [<c02e8de0>] ? inet_sendmsg+0x2e/0x50 [<c02a24b8>] ? sock_sendmsg+0xcc/0xf0 [<c01365f5>] ? autoremove_wake_function+0x0/0x3a [<c0136799>] ? remove_wait_queue+0x30/0x34 [<f8a08fbe>] ? tun_chr_aio_read+0x298/0x31f [tun] [<c0211d67>] ? copy_from_user+0x2a/0x114 [<c02a2790>] ? sys_sendto+0xa5/0xc5 [<c02b3713>] ? neigh_periodic_timer+0x0/0x17a [<c01365f5>] ? autoremove_wake_function+0x0/0x3a [<c02a348f>] ? sys_socketcall+0x141/0x262 [<c0102f99>] ? sysenter_past_esp+0x6a/0x91 ======================= Code: 22 83 fb 9b 74 37 8b 4d b0 8b 01 e8 35 96 77 c7 8b 45 b0 c7 00 00 00 00 00 89 d8 83 c4 48 5b 5e 5f 5d c3 8b 4d b0 8b 39 8b 47 2c <f6> 40 1d de 74 23 31 db 89 d8 83 c4 48 5b 5e 5f 5d c3 64 a1 04 EIP: [<f8b375bf>] ip6_dst_lookup_tail+0x95/0x15a [ipv6] SS:ESP 0068:ef4abc64 ---[ end trace 1035c8e1d028e84b ]--- Steps to reproduce: Test case available at: http://www.remlab.net/files/divers/tunload.c
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Sun, 31 Aug 2008 09:44:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11469 > > Summary: TUN with 1024 neighbours: ip6_dst_lookup_tail NULL crash > Product: Networking > Version: 2.5 > KernelVersion: 2.6.26.3 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV6 > AssignedTo: yoshfuji@linux-ipv6.org > ReportedBy: rdenis@simphalempin.com > > > Latest working kernel version: none known > Earliest failing kernel version: none tested > Distribution: Debian > Hardware Environment: Intel(R) Pentium(R) 4 CPU 2.80GHz, HyperThreaded > Software Environment: SMT kernel, Debian glibc 2.7 > Problem Description: > When an IFF_TUN (/dev/net/tun) device has more than 1023 IPv6 neighbors, a > process context crash occurs. Backtrace follows: > > BUG: unable to handle kernel NULL pointer dereference at 0000001d > IP: [<f8b375bf>] :ipv6:ip6_dst_lookup_tail+0x95/0x15a > *pde = 00000000 > Oops: 0000 [#14] SMP > Modules linked in: ipx p8022 psnap llc p8023 i915 drm tun cpufreq_ondemand > binfmt_misc fuse nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 > nf_conntrack ipv6 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss > snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event > snd_seq snd_timer snd_seq_device snd intel_agp psmouse soundcore agpgart > button > processor snd_page_alloc parport_pc parport iTCO_wdt evdev pcspkr dm_mirror > dm_log dm_snapshot dm_mod sg sr_mod cdrom e100 mii ehci_hcd uhci_hcd usbcore > unix > > Pid: 9950, comm: tunload Tainted: G D (2.6.26.3 #8) > EIP: 0060:[<f8b375bf>] EFLAGS: 00210246 CPU: 0 > EIP is at ip6_dst_lookup_tail+0x95/0x15a [ipv6] > EAX: 00000000 EBX: 00000000 ECX: ef4abdac EDX: 00000000 > ESI: ef4abd3c EDI: ef64ca00 EBP: ef4abcb8 ESP: ef4abc64 > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > Process tunload (pid: 9950, ti=ef4aa000 task=f7d45320 task.ti=ef4aa000) > Stack: ef4abd58 ef4abdac f7cc0c00 ef4abc80 f8b36918 00000000 ef673e40 > ef4abcc0 > f8b381b2 00000002 f7cc0c00 ef7c3e00 f7cc0e24 00000000 ef4abca8 > ef4abca8 > c030bcfa ef4abcc0 00000000 ef4abed4 00000000 ef4abcc0 f8b377d5 > ef4abdbc > Call Trace: > [<f8b36918>] ? ip6_cork_release+0x2e/0x52 [ipv6] > [<f8b381b2>] ? ip6_push_pending_frames+0x1c9/0x3d9 [ipv6] > [<c030bcfa>] ? _spin_unlock_bh+0xd/0xf > [<f8b377d5>] ? ip6_dst_lookup+0xe/0x10 [ipv6] > [<f8b4c2b2>] ? rawv6_sendmsg+0x25d/0xc08 [ipv6] > [<c0151022>] ? filemap_fault+0x203/0x3d5 > [<c02e8de0>] ? inet_sendmsg+0x2e/0x50 > [<c02a24b8>] ? sock_sendmsg+0xcc/0xf0 > [<c01365f5>] ? autoremove_wake_function+0x0/0x3a > [<c0136799>] ? remove_wait_queue+0x30/0x34 > [<f8a08fbe>] ? tun_chr_aio_read+0x298/0x31f [tun] > [<c0211d67>] ? copy_from_user+0x2a/0x114 > [<c02a2790>] ? sys_sendto+0xa5/0xc5 > [<c02b3713>] ? neigh_periodic_timer+0x0/0x17a > [<c01365f5>] ? autoremove_wake_function+0x0/0x3a > [<c02a348f>] ? sys_socketcall+0x141/0x262 > [<c0102f99>] ? sysenter_past_esp+0x6a/0x91 > ======================= > Code: 22 83 fb 9b 74 37 8b 4d b0 8b 01 e8 35 96 77 c7 8b 45 b0 c7 00 00 00 00 > 00 89 d8 83 c4 48 5b 5e 5f 5d c3 8b 4d b0 8b 39 8b 47 2c <f6> 40 1d de 74 23 > 31 > db 89 d8 83 c4 48 5b 5e 5f 5d c3 64 a1 04 > EIP: [<f8b375bf>] ip6_dst_lookup_tail+0x95/0x15a [ipv6] SS:ESP 0068:ef4abc64 > ---[ end trace 1035c8e1d028e84b ]--- > > > Steps to reproduce: > > Test case available at: http://www.remlab.net/files/divers/tunload.c >
Hi. On Sun, Aug 31, 2008 at 11:13:04AM -0700, Andrew Morton (akpm@linux-foundation.org) wrote: > > When an IFF_TUN (/dev/net/tun) device has more than 1023 IPv6 neighbors, a > > process context crash occurs. Backtrace follows: Does this problem still exist? > > BUG: unable to handle kernel NULL pointer dereference at 0000001d > > IP: [<f8b375bf>] :ipv6:ip6_dst_lookup_tail+0x95/0x15a > > *pde = 00000000 > > Oops: 0000 [#14] SMP > > Modules linked in: ipx p8022 psnap llc p8023 i915 drm tun cpufreq_ondemand > > binfmt_misc fuse nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 > > nf_conntrack ipv6 snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss > > snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi > snd_seq_midi_event > > snd_seq snd_timer snd_seq_device snd intel_agp psmouse soundcore agpgart > button > > processor snd_page_alloc parport_pc parport iTCO_wdt evdev pcspkr dm_mirror > > dm_log dm_snapshot dm_mod sg sr_mod cdrom e100 mii ehci_hcd uhci_hcd > usbcore > > unix > > > > Pid: 9950, comm: tunload Tainted: G D (2.6.26.3 #8) By whom it was tainted?
Le vendredi 5 septembre 2008 14:41:50 Evgeniy Polyakov, vous avez
Hi. On Fri, Sep 05, 2008 at 07:03:38PM +0300, Rémi Denis-Courmont (rdenis@simphalempin.com) wrote: > > Does this problem still exist? > > Yes. With 2.6.27-rc5-b380b0d4f7dffcc235c0facefa537d4655619101, I get this: I will try to work it out this weekend, thanks for the info. > > > > Pid: 9950, comm: tunload Tainted: G D (2.6.26.3 #8) > > > > By whom it was tainted? > > Its own self. I just failed to copy the very first crash trace. Likely it will not help, since subsequent crashes are results of the first one and rarely contain good info, but dump you have presented is not tainted, so problem is not in some other place.
Hi. On Sun, Aug 31, 2008 at 11:13:04AM -0700, Andrew Morton (akpm@linux-foundation.org) wrote: > > BUG: unable to handle kernel NULL pointer dereference at 0000001d > > IP: [<f8b375bf>] :ipv6:ip6_dst_lookup_tail+0x95/0x15a > > *pde = 00000000 > > Oops: 0000 [#14] SMP Attached patch fixes the problem. Since dst entry is allowed not to have neighbour entry, flush it just like with incomplete one. This drops performance of your application with more than 1024 neighbours to 1024 messages, to fix it you should tune ipv6 routing parameters (gc intervals, gc threshold, maximum number of entries and so on). There may be another problem with perfomance though, at least I was able to bump it 10 times with different settings, but still two times smaller than with 4k neighbours. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 8b67ca0..582dde5 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -934,7 +934,7 @@ static int ip6_dst_lookup_tail(struct sock *sk, * dst entry and replace it instead with the * dst entry of the nexthop router */ - if (!((*dst)->neighbour->nud_state & NUD_VALID)) { + if (!(*dst)->neighbour || !((*dst)->neighbour->nud_state & NUD_VALID)) { struct inet6_ifaddr *ifp; struct flowi fl_gw; int redirect;
Le dimanche 7 septembre 2008 21:11:09 Evgeniy Polyakov, vous avez
From: R
Hi David. On Mon, Sep 08, 2008 at 01:15:47PM -0700, David Miller (davem@davemloft.net) wrote: > But I don't like this patch for several reasons: > > 1) Slapping on a NULL check in response to a OOPS at that exact > location is usually a very big red flag, and deserves high scrutiny > instead of blind acceptance. > > 2) Looking at the indentation of this DAD code block (it's all one tab > too much) it's obviously a very shitty cut and paste job. If the > coding style was too difficult to get right, what does this say > about that change that brought the code here, semantically? :-/ > > This means we should figure out how this code got to this place, > and what kind of invariants existed at the old location that might > make this dst->neighbour dereference valid, and what implications > there are for the fact that it can now be NULL. > > Really, we really need to understand much more deeply this situation. Well, yes. The whole 'optimistic' dad looks a bit suspicious. I think failed dst entry without neighbour is a result of the 'static' dst entry returned by the above route lookup and previously neighbour was not used at all. This patch fixes the opps, but may be just hiding a problem, but reading how this optimistic duplicate address detection works, I see no strict requirements that returned route entry has to have neighbour, so this check actually can be a right fix. I've added Neil Horman, who created the patch 1.5 years ago, to the copy list.
On Tue, Sep 09, 2008 at 12:34:18AM +0400, Evgeniy Polyakov wrote: > Hi David. > > On Mon, Sep 08, 2008 at 01:15:47PM -0700, David Miller (davem@davemloft.net) > wrote: > > But I don't like this patch for several reasons: > > > > 1) Slapping on a NULL check in response to a OOPS at that exact > > location is usually a very big red flag, and deserves high scrutiny > > instead of blind acceptance. > > > > 2) Looking at the indentation of this DAD code block (it's all one tab > > too much) it's obviously a very shitty cut and paste job. If the > > coding style was too difficult to get right, what does this say > > about that change that brought the code here, semantically? :-/ > > > > This means we should figure out how this code got to this place, > > and what kind of invariants existed at the old location that might > > make this dst->neighbour dereference valid, and what implications > > there are for the fact that it can now be NULL. > > > > Really, we really need to understand much more deeply this situation. > > Well, yes. The whole 'optimistic' dad looks a bit suspicious. I think > failed dst entry without neighbour is a result of the 'static' dst entry > returned by the above route lookup and previously neighbour was not > used at all. This patch fixes the opps, but may be just hiding a > problem, but reading how this optimistic duplicate address detection > works, I see no strict requirements that returned route entry has to > have neighbour, so this check actually can be a right fix. > > I've added Neil Horman, who created the patch 1.5 years ago, to the > copy list. > I agree, while I don't normally like to fix an oops with just a NULL check, since as dave mentions, thats a pretty big red flag about not having thought through the problem, I think in this case its probably the right fix. having a dst entry without a corresponding neightbor entry is going to be a rarity to say the least, but it can happen (Evgeniy's above example, for instance, or perhaps a situation in which the arp cache was explicitly purged at just the right time from userspace). Regardless, making sure we have one by explicitly checking the pointer seems like the right thing to do. As Evgeniy noted, the Optimistic DAD code provides no explicit guarantee that we have a neigh entry for this dst. We just want to be sure we forward packets to an incomplete neighbour that is marked as optimistic to the router. We already do this dst->neighbour NULL check in many other locations (giving us precident, see ip6_output_finish and ip6_forward as just a few examples). I think its right to do the same check in the Optimistic DAD code. I just never hit it in my testing since its an uncommon case. As for the indentation, I'm pretty sure this was the only point I wrote/used this code, so I'm loathe to believe that the indentation was a cut and paste error. Unfortunately, that just implies it was my own stupidity and poor eyesight that created that uglynesss. Apologies Dave, I'll submit a patch shortly to correct that. Thanks & Regards Neil > -- > Evgeniy Polyakov > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
replied via email on netdev. I think the NULL check is the right solution. Thats how we do it in the rest of the routing code.
On Mon, Sep 08, 2008 at 01:15:47PM -0700, David Miller wrote: > From: R
Hi. On Tue, Sep 09, 2008 at 07:32:15AM -0400, Neil Horman (nhorman@tuxdriver.com) wrote: > + /* > + * Here if the dst entry we've looked up > + * has a neighbour entry that is in the INCOMPLETE > + * state and the src address from the flow is > + * marked as OPTIMISTIC, we release the found > + * dst entry and replace it instead with the > + * dst entry of the nexthop router > + */ > + if (dst->neighbour && !((*dst)->neighbour->nud_state & NUD_VALID)) { ^^^ This should be (*dst)->neighbour, isn't it?
On Tue, Sep 09, 2008 at 06:31:05PM +0400, Evgeniy Polyakov wrote: > Hi. > > On Tue, Sep 09, 2008 at 07:32:15AM -0400, Neil Horman (nhorman@tuxdriver.com) > wrote: > > + /* > > + * Here if the dst entry we've looked up > > + * has a neighbour entry that is in the INCOMPLETE > > + * state and the src address from the flow is > > + * marked as OPTIMISTIC, we release the found > > + * dst entry and replace it instead with the > > + * dst entry of the nexthop router > > + */ > > + if (dst->neighbour && !((*dst)->neighbour->nud_state & NUD_VALID)) { > ^^^ > This should be (*dst)->neighbour, isn't it? > Dang it, yes it should. Sorry. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> ip6_output.c | 64 +++++++++++++++++++++++++++++------------------------------ 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a4402de..e6671bd 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -943,39 +943,39 @@ static int ip6_dst_lookup_tail(struct sock *sk, } #ifdef CONFIG_IPV6_OPTIMISTIC_DAD - /* - * Here if the dst entry we've looked up - * has a neighbour entry that is in the INCOMPLETE - * state and the src address from the flow is - * marked as OPTIMISTIC, we release the found - * dst entry and replace it instead with the - * dst entry of the nexthop router - */ - if (!((*dst)->neighbour->nud_state & NUD_VALID)) { - struct inet6_ifaddr *ifp; - struct flowi fl_gw; - int redirect; - - ifp = ipv6_get_ifaddr(net, &fl->fl6_src, - (*dst)->dev, 1); - - redirect = (ifp && ifp->flags & IFA_F_OPTIMISTIC); - if (ifp) - in6_ifa_put(ifp); - - if (redirect) { - /* - * We need to get the dst entry for the - * default router instead - */ - dst_release(*dst); - memcpy(&fl_gw, fl, sizeof(struct flowi)); - memset(&fl_gw.fl6_dst, 0, sizeof(struct in6_addr)); - *dst = ip6_route_output(net, sk, &fl_gw); - if ((err = (*dst)->error)) - goto out_err_release; - } + /* + * Here if the dst entry we've looked up + * has a neighbour entry that is in the INCOMPLETE + * state and the src address from the flow is + * marked as OPTIMISTIC, we release the found + * dst entry and replace it instead with the + * dst entry of the nexthop router + */ + if ((*dst)->neighbour && !((*dst)->neighbour->nud_state & NUD_VALID)) { + struct inet6_ifaddr *ifp; + struct flowi fl_gw; + int redirect; + + ifp = ipv6_get_ifaddr(net, &fl->fl6_src, + (*dst)->dev, 1); + + redirect = (ifp && ifp->flags & IFA_F_OPTIMISTIC); + if (ifp) + in6_ifa_put(ifp); + + if (redirect) { + /* + * We need to get the dst entry for the + * default router instead + */ + dst_release(*dst); + memcpy(&fl_gw, fl, sizeof(struct flowi)); + memset(&fl_gw.fl6_dst, 0, sizeof(struct in6_addr)); + *dst = ip6_route_output(net, sk, &fl_gw); + if ((err = (*dst)->error)) + goto out_err_release; } + } #endif return 0; > -- > Evgeniy Polyakov >
From: Neil Horman <nhorman@tuxdriver.com> Date: Tue, 9 Sep 2008 11:39:55 -0400 > On Tue, Sep 09, 2008 at 06:31:05PM +0400, Evgeniy Polyakov wrote: > > This should be (*dst)->neighbour, isn't it? > > > Dang it, yes it should. Sorry. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Applied with the following commit message. I'll queue this up for -stable as well. Thanks! ipv6: Fix OOPS in ip6_dst_lookup_tail(). This fixes kernel bugzilla 11469: "TUN with 1024 neighbours: ip6_dst_lookup_tail NULL crash" dst->neighbour is not necessarily hooked up at this point in the processing path, so blindly dereferencing it is the wrong thing to do. This NULL check exists in other similar paths and this case was just an oversight. Also fix the completely wrong and confusing indentation here while we're at it. Based upon a patch by Evgeniy Polyakov. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
fixed by commit e550dfb0c2c31b6363aa463a035fc9f8dcaa3c9b