Bug 29252

Summary: IPv6 doesn't work in a kvm guest.
Product: Networking Reporter: Kusanagi Kouichi (slash)
Component: IPV6Assignee: Hideaki YOSHIFUJI (yoshfuji)
Status: CLOSED CODE_FIX    
Severity: normal CC: florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38-rc5 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 27352    

Description Kusanagi Kouichi 2011-02-16 13:03:11 UTC
apt-get and ping6 don't work in a kvm guest. It appears that neighbor discovery is not working. I'm using tap, virtio-net and vhost. I did bisect and results is

d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 is the first bad commit
commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4
Author: David S. Miller <davem@davemloft.net>
Date:   Mon Jan 24 16:01:58 2011 -0800

    ipv6: Always clone offlink routes.

    Do not handle PMTU vs. route lookup creation any differently
    wrt. offlink routes, always clone them.

    Reported-by: PK <runningdoglackey@yahoo.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 c4a34e9e36bd8cc81e898738f2d4344f6647d472f0fcd4b093ec22784798f3951203391f4567b278 M      net

reverting this commit for guest kernel fixes the probrem.
Comment 1 Andrew Morton 2011-02-17 22:25:52 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 16 Feb 2011 13:03:15 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=29252
> 
>            Summary: IPv6 doesn't work in a kvm guest.
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.38-rc5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV6
>         AssignedTo: yoshfuji@linux-ipv6.org
>         ReportedBy: slash@ac.auone-net.jp
>         Regression: Yes
> 
> 
> apt-get and ping6 don't work in a kvm guest. It appears that neighbor
> discovery
> is not working. I'm using tap, virtio-net and vhost. I did bisect and results
> is
> 
> d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 is the first bad commit
> commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4
> Author: David S. Miller <davem@davemloft.net>
> Date:   Mon Jan 24 16:01:58 2011 -0800
> 
>     ipv6: Always clone offlink routes.
> 
>     Do not handle PMTU vs. route lookup creation any differently
>     wrt. offlink routes, always clone them.
> 
>     Reported-by: PK <runningdoglackey@yahoo.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> :040000 040000
>
> c4a34e9e36bd8cc81e898738f2d4344f6647d472f0fcd4b093ec22784798f3951203391f4567b278
> M      net
> 
> reverting this commit for guest kernel fixes the probrem.
>
Comment 2 David S. Miller 2011-02-17 22:32:23 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 17 Feb 2011 14:25:17 -0800

>> https://bugzilla.kernel.org/show_bug.cgi?id=29252
 ...
>> apt-get and ping6 don't work in a kvm guest. It appears that neighbor
>> discovery
>> is not working. I'm using tap, virtio-net and vhost. I did bisect and
>> results
>> is
>> 
>> d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 is the first bad commit
>> commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4
>> Author: David S. Miller <davem@davemloft.net>
>> Date:   Mon Jan 24 16:01:58 2011 -0800
>> 
>>     ipv6: Always clone offlink routes.
>> 
>>     Do not handle PMTU vs. route lookup creation any differently
>>     wrt. offlink routes, always clone them.
>> 
>>     Reported-by: PK <runningdoglackey@yahoo.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>> 
>> :040000 040000
>>
>> c4a34e9e36bd8cc81e898738f2d4344f6647d472f0fcd4b093ec22784798f3951203391f4567b278
>> M      net
>> 
>> reverting this commit for guest kernel fixes the probrem.

Thanks for the report, I'll look into this.
Comment 3 Kusanagi Kouichi 2011-03-06 16:00:52 UTC
v2.6.38-rc7-163-gfb62c00 doesn't work. Incidentally, I found that this bug affects only 32bit guest.
Comment 4 David S. Miller 2011-03-09 23:58:04 UTC
Ok, the following should address both bugs, #29252 and #30462, please
give it some testing.

--------------------
ipv6: Don't create clones of nonexthop routes forever.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462

In commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 ("ipv6: Always
clone offlink routes.") we forced the kernel to always clone offlink
routes.

The reason we do that is to make sure we never bind an inetpeer to a
prefixed route.

The logic turned on here has existed in the tree for many years,
but was always off due to a protecting CPP define.  So perhaps
it's no surprise that there is a logic bug here.

The issue is that there is nothing that stops the cloning process.
Every route call will find the same route in the same condition,
reclone it, and insert.

This causes two problems:

1) If the in6_rt_ins() call succeeds, we get tons of crap in the
   routing tree.

2) If the int6_rt_ins() call fails, we get no networking to that
   destination.

We've had reports of both cases.

End the loop and the failures by keying on the RTF_CACHE flag which
will only be set in the clone routes.

Reported-by: slash@ac.auone-net.jp
Reported-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 904312e..8963635 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -739,8 +739,10 @@ restart:
 
 	if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP))
 		nrt = rt6_alloc_cow(rt, &fl->fl6_dst, &fl->fl6_src);
-	else
+	else if (!(rt->rt6i_flags & RTF_CACHE))
 		nrt = rt6_alloc_clone(rt, &fl->fl6_dst);
+	else
+		goto out2;
 
 	dst_release(&rt->dst);
 	rt = nrt ? : net->ipv6.ip6_null_entry;
Comment 5 David S. Miller 2011-03-10 03:20:01 UTC
From: David Miller <davem@davemloft.net>
Date: Wed, 09 Mar 2011 15:58:18 -0800 (PST)

> Ok, the following should address both bugs, #29252 and #30462, please
> give it some testing.
> 
> --------------------
> ipv6: Don't create clones of nonexthop routes forever.

Nevermind, this patch has problems, I'm still debugging and trying to
come up with a proper fix.

Thanks in advance for your patience.
Comment 6 David S. Miller 2011-03-10 04:04:25 UTC
From: David Miller <davem@davemloft.net>
Date: Wed, 09 Mar 2011 19:20:12 -0800 (PST)

> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Mar 2011 15:58:18 -0800 (PST)
> 
>> Ok, the following should address both bugs, #29252 and #30462, please
>> give it some testing.
>> 
>> --------------------
>> ipv6: Don't create clones of nonexthop routes forever.
> 
> Nevermind, this patch has problems, I'm still debugging and trying to
> come up with a proper fix.
> 
> Thanks in advance for your patience.

Ok, I'm more confident in this version of the fix.  It passes all of
my tests, and I've added instrumentation to make sure various cases
are performing the operations the way I expect them to.

--------------------
ipv6: Don't create clones of host routes.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462

In commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 ("ipv6: Always
clone offlink routes.") we forced the kernel to always clone offlink
routes.

The reason we do that is to make sure we never bind an inetpeer to a
prefixed route.

The logic turned on here has existed in the tree for many years,
but was always off due to a protecting CPP define.  So perhaps
it's no surprise that there is a logic bug here.

The problem is that we canot clone a route that is already a
host route (ie. has DST_HOST set).  Because if we do, an identical
entry already exists in the routing tree and therefore the
ip6_rt_ins() call is going to fail.

This sets off a series of failures and high cpu usage, because when
ip6_rt_ins() fails we loop retrying this operation a few times in
order to handle a race between two threads trying to clone and insert
the same host route at the same time.

Fix this by simply using the route as-is when DST_HOST is set.

Reported-by: slash@ac.auone-net.jp
Reported-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/route.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 904312e..e7db701 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -739,8 +739,10 @@ restart:
 
 	if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP))
 		nrt = rt6_alloc_cow(rt, &fl->fl6_dst, &fl->fl6_src);
-	else
+	else if (!(rt->dst.flags & DST_HOST))
 		nrt = rt6_alloc_clone(rt, &fl->fl6_dst);
+	else
+		goto out2;
 
 	dst_release(&rt->dst);
 	rt = nrt ? : net->ipv6.ip6_null_entry;
Comment 7 Kusanagi Kouichi 2011-03-11 08:38:08 UTC
Your patch fixes the problem. Thanks.
Comment 8 Ernst Sjöstrand 2011-03-11 19:45:05 UTC
Hi,

this patch solved the issue I reported. Applied the patch on -rc8.

Regards
//Ernst

On Thu, Mar 10, 2011 at 05:04, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Mar 2011 19:20:12 -0800 (PST)
>
>> From: David Miller <davem@davemloft.net>
>> Date: Wed, 09 Mar 2011 15:58:18 -0800 (PST)
>>
>>> Ok, the following should address both bugs, #29252 and #30462, please
>>> give it some testing.
>>>
>>> --------------------
>>> ipv6: Don't create clones of nonexthop routes forever.
>>
>> Nevermind, this patch has problems, I'm still debugging and trying to
>> come up with a proper fix.
>>
>> Thanks in advance for your patience.
>
> Ok, I'm more confident in this version of the fix.  It passes all of
> my tests, and I've added instrumentation to make sure various cases
> are performing the operations the way I expect them to.
>
> --------------------
> ipv6: Don't create clones of host routes.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
> Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462
>
> In commit d80bc0fd262ef840ed4e82593ad6416fa1ba3fc4 ("ipv6: Always
> clone offlink routes.") we forced the kernel to always clone offlink
> routes.
>
> The reason we do that is to make sure we never bind an inetpeer to a
> prefixed route.
>
> The logic turned on here has existed in the tree for many years,
> but was always off due to a protecting CPP define.  So perhaps
> it's no surprise that there is a logic bug here.
>
> The problem is that we canot clone a route that is already a
> host route (ie. has DST_HOST set).  Because if we do, an identical
> entry already exists in the routing tree and therefore the
> ip6_rt_ins() call is going to fail.
>
> This sets off a series of failures and high cpu usage, because when
> ip6_rt_ins() fails we loop retrying this operation a few times in
> order to handle a race between two threads trying to clone and insert
> the same host route at the same time.
>
> Fix this by simply using the route as-is when DST_HOST is set.
>
> Reported-by: slash@ac.auone-net.jp
> Reported-by: Ernst Sjöstrand <ernstp@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/ipv6/route.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 904312e..e7db701 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -739,8 +739,10 @@ restart:
>
>        if (!rt->rt6i_nexthop && !(rt->rt6i_flags & RTF_NONEXTHOP))
>                nrt = rt6_alloc_cow(rt, &fl->fl6_dst, &fl->fl6_src);
> -       else
> +       else if (!(rt->dst.flags & DST_HOST))
>                nrt = rt6_alloc_clone(rt, &fl->fl6_dst);
> +       else
> +               goto out2;
>
>        dst_release(&rt->dst);
>        rt = nrt ? : net->ipv6.ip6_null_entry;
> --
> 1.7.4.1
>
>
Comment 9 David S. Miller 2011-03-11 20:15:44 UTC
From: Ernst Sjöstrand <ernstp@gmail.com>
Date: Fri, 11 Mar 2011 20:44:39 +0100

> this patch solved the issue I reported. Applied the patch on -rc8.

Thank you for testing.
Comment 10 Florian Mickler 2011-03-14 17:24:02 UTC
There was a commit merged for 2.6.38-rc9 (or final) which mentions this bug
report:

commit 7343ff31ebf01691ea4515d3126467434b9d22d6
Author: David S. Miller <davem@davemloft.net>
Date:   Wed Mar 9 19:55:25 2011 -0800

    ipv6: Don't create clones of host routes.