Bug 9816

Summary: cannot replace route
Product: Networking Reporter: Andreas Schwab (schwab)
Component: IPV4Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX    
Severity: normal CC: protasnb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9243, 9493    

Description Andreas Schwab 2008-01-25 13:23:49 UTC
Latest working kernel version: 2.6.24-rc8
Earliest failing kernel version: 2.6.24
Distribution: SuSE 10.3
Hardware Environment: ppc32/ppc64
Software Environment:
Problem Description:
ip route replace always fails with EEXIST.  (del+add still works.)

Steps to reproduce:
# ip ro show 127.0.0.0/8
127.0.0.0/8 dev lo  scope link 
# ip ro rep 127.0.0.0/8 dev lo
RTNETLINK answers: File exists

Broken by bd566e7525b5986864e8d6eb5b67640abcd284a9.
Comment 1 Stephen Hemminger 2008-01-25 13:35:22 UTC
You are replacing route with itself!  
If you use a different route it works.

# ip ro
default via 192.168.1.1 dev eth1 
192.168.1.0/24 dev eth1  proto kernel  scope link  src 192.168.1.9 
192.168.66.0/24 dev eth1  scope link 
192.168.99.0/24 dev dummy0  proto kernel  scope link  src 192.168.99.99

# ip ro replace 192.168.66.0/24 dev dummy0

# ip ro
default via 192.168.1.1 dev eth1 
192.168.1.0/24 dev eth1  proto kernel  scope link  src 192.168.1.9 
192.168.66.0/24 dev dummy0  scope link 
192.168.99.0/24 dev dummy0  proto kernel  scope link  src 192.168.99.99

# ip ro replace 192.168.99.0/24 via 192.168.66.1 dev dummy0
Comment 2 Andreas Schwab 2008-01-25 13:51:24 UTC
I need to add a gateway to a route.  That works perfectly fine with previous versions.
Comment 3 Andreas Schwab 2008-01-25 13:56:25 UTC
# ip ro sh 10.0.0.0/8
10.0.0.0/8 dev eth0  scope link 
# ip ro rep 10.0.0.0/8 via 192.168.1.1 dev eth0 src 10.204.0.116
RTNETLINK answers: File exists
Comment 4 Andreas Schwab 2008-01-25 14:02:47 UTC
What's wrong with replacing a route with itself?
Comment 5 Anonymous Emailer 2008-01-25 14:26:30 UTC
Reply-To: akpm@linux-foundation.org

> On Fri, 25 Jan 2008 13:23:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9816
> 
>            Summary: cannot replace route
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.24
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: schwab@suse.de
> 
> 
> Latest working kernel version: 2.6.24-rc8
> Earliest failing kernel version: 2.6.24
> Distribution: SuSE 10.3
> Hardware Environment: ppc32/ppc64
> Software Environment:
> Problem Description:
> ip route replace always fails with EEXIST.  (del+add still works.)
> 
> Steps to reproduce:
> # ip ro show 127.0.0.0/8
> 127.0.0.0/8 dev lo  scope link 
> # ip ro rep 127.0.0.0/8 dev lo
> RTNETLINK answers: File exists
> 
> Broken by bd566e7525b5986864e8d6eb5b67640abcd284a9.
> 

There is a bit more discussion in the bugzilla writeup.

I'd agree with Andrea: replacing a route with itself a) used to work and b)
should still work (surely)?
Comment 6 Jarek Poplawski 2008-01-25 15:09:34 UTC
Andrew Morton wrote, On 01/25/2008 11:26 PM:

>> On Fri, 25 Jan 2008 13:23:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org
>> wrote:
>> http://bugzilla.kernel.org/show_bug.cgi?id=9816

...

> I'd agree with Andrea: replacing a route with itself a) used to work and b)
> should still work (surely)?

...on the other hand:

$ touch file1
$ cp file1 file1
cp: `file1' and `file1' are the same file
$ mv file1 file1
mv: `file1' and `file1' are the same file

and: 'everything' in 'linux' is file...

ergo: route cannot replace with itself!

Regards,
Jarek P.
Comment 7 Anonymous Emailer 2008-01-25 19:20:28 UTC
Reply-To: akpm@linux-foundation.org

> On Sat, 26 Jan 2008 00:11:57 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote:
> Andrew Morton wrote, On 01/25/2008 11:26 PM:
> 
> >> On Fri, 25 Jan 2008 13:23:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
> >> http://bugzilla.kernel.org/show_bug.cgi?id=9816
> 
> ...
> 
> > I'd agree with Andrea: replacing a route with itself a) used to work and b)
> > should still work (surely)?
> 
> ...on the other hand:
> 
> $ touch file1
> $ cp file1 file1
> cp: `file1' and `file1' are the same file
> $ mv file1 file1
> mv: `file1' and `file1' are the same file
> 
> and: 'everything' in 'linux' is file...
> 
> ergo: route cannot replace with itself!
> 

That's not a very good analogy - the source is a kernel object.  A better
example would be:

linux-2.6.24-rc8:

echo foo > /tmp/1
echo bar > /tmp/2
echo foo > /tmp/1

linux-2.6.24:

echo foo > /tmp/1
echo bar > /tmp/2
echo foo > /tmp/1
sh: cannot write /tmp/1: Inalid argument

But whatever.   It used to work.  People's scripts will break.  Regression.
Comment 8 Joonwoo Park 2008-01-25 21:16:04 UTC
2008/1/26, Andrew Morton <akpm@linux-foundation.org>:
>
> But whatever.   It used to work.  People's scripts will break.  Regression.
>

Also I thought that 'replace with itself' should be error as like Jarek.
But it used to work and patch made a regression, it's my bad :(

I think Julian's recent patches on the list would work for this
replacement issue and regression.

Thanks,
Joonwoo
Comment 9 Jarek Poplawski 2008-01-26 03:37:42 UTC
On Sat, Jan 26, 2008 at 02:16:01PM +0900, Joonwoo Park wrote:
> 2008/1/26, Andrew Morton <akpm@linux-foundation.org>:
> >
> > But whatever.   It used to work.  People's scripts will break.  Regression.
> >
> 
> Also I thought that 'replace with itself' should be error as like Jarek.
> But it used to work and patch made a regression, it's my bad :(

Actually, I don't think 'replace with itself' should be an error. I've
only meant that lack of this possibility shouldn't be necessarily seen
as error - there could be arguments for both sides.

IMHO, there should be simply analyzed pros and cons of doing it in
this particular place, so: if there is any gain in doing this, and if
possible complications or problems with performance, security etc.
don't prevail such a gain.

And I don't think a regression argument should be valid at all if
there are removed any unlogical, error-prone or otherwise problematic
options (I don't know if this is such case), even if they are not
obvious bugs - especially if it's still possible to do the same
'proper' way.

Regards,
Jarek P.
Comment 10 Jarek Poplawski 2008-01-26 04:07:22 UTC
On Fri, Jan 25, 2008 at 07:20:26PM -0800, Andrew Morton wrote:
...
> That's not a very good analogy - the source is a kernel object.  A better
> example would be:
> 
> linux-2.6.24-rc8:
> 
> echo foo > /tmp/1
> echo bar > /tmp/2
> echo foo > /tmp/1
> 
> linux-2.6.24:
> 
> echo foo > /tmp/1
> echo bar > /tmp/2
> echo foo > /tmp/1
> sh: cannot write /tmp/1: Inalid argument

Probably you are right. When I have some time to build and boot 2.6.24
at last, I hope I'll find what this example is about...

> But whatever.   It used to work.  People's scripts will break.  Regression.

Do you mean people didn't have to change their scripts before, e.g.
from linux 2.0.? Scripts are kind of programs, and both APIs and some
'undocumented' features are achanging...

Regards,
Jarek P.

Regards,
Jarek P.
Comment 11 Jarek Poplawski 2008-01-26 06:06:47 UTC
On Sat, Jan 26, 2008 at 12:40:36PM +0100, Jarek Poplawski wrote:
> On Sat, Jan 26, 2008 at 02:16:01PM +0900, Joonwoo Park wrote:
> > 2008/1/26, Andrew Morton <akpm@linux-foundation.org>:
> > >
> > > But whatever.   It used to work.  People's scripts will break. 
> Regression.
> > >
> > 
> > Also I thought that 'replace with itself' should be error as like Jarek.
> > But it used to work and patch made a regression, it's my bad :(
> 
> Actually, I don't think 'replace with itself' should be an error. I've
> only meant that lack of this possibility shouldn't be necessarily seen
> as error - there could be arguments for both sides.

...On the other hand, after some re-thinking, I actually think 'replace
with itself' should be considered a bug. I wondered about the possible
reason of this behaviour in a file system, and it seems replace just
means things like overwrite, so old thing is always supposed to be
destroyed (of course it's a matter of implementation or conditions in
which moment this destruction takes place).

So, 'replace with itself' is simply ambiguous: we can always delete the
object first, to prepare the place for replacement, and find there is
nothing to do after this - and it's probably not what somebody wanted.
And, after re-reading this bugzilla report, I'm pretty sure the thing
should be done with 'ip route change' (but I didn't check if 2.6.24
knows about this...).

Jarek P.
Comment 12 Andreas Schwab 2008-01-26 06:27:38 UTC
Jarek Poplawski <jarkao2@gmail.com> writes:

> And, after re-reading this bugzilla report, I'm pretty sure the thing
> should be done with 'ip route change' (but I didn't check if 2.6.24
> knows about this...).

$ man ip
[...]
   ip route add - add new route
   ip route change - change route
   ip route replace - change or add new one
[...]

According to this "replace" should be a superset of "change".

Also, please check out comment#3, it also fails for replacing a route
with something different (it's a route to an ipsec tunnel).

Andreas.
Comment 13 Jarek Poplawski 2008-01-26 06:28:47 UTC
On Sat, Jan 26, 2008 at 03:10:10PM +0100, Jarek Poplawski wrote:
...
> [...] so old thing is always supposed to be
> destroyed (of course it's a matter of implementation or conditions in
> which moment this destruction takes place).
> 
> So, 'replace with itself' is simply ambiguous: we can always delete the
> object first, to prepare the place for replacement, and find there is
> nothing to do after this - and it's probably not what somebody wanted.

As a matter of fact, the moment of destruction doesn't even matter:
assuming the replaced thing is destroyed in all 'common' cases, doing
this at the end isn't probably wanted as well - and skipping this
action makes an exception - what IMHO proves the concept is at least
inconsistent.

Jarek P.
Comment 14 Jarek Poplawski 2008-01-26 07:15:05 UTC
On Sat, Jan 26, 2008 at 03:27:00PM +0100, Andreas Schwab wrote:
> Jarek Poplawski <jarkao2@gmail.com> writes:
> 
> > And, after re-reading this bugzilla report, I'm pretty sure the thing
> > should be done with 'ip route change' (but I didn't check if 2.6.24
> > knows about this...).
> 
> $ man ip
> [...]
>    ip route add - add new route
>    ip route change - change route
>    ip route replace - change or add new one
> [...]
> 
> According to this "replace" should be a superset of "change".

According to this "replace" should be ...ambiguous. I could read this
"my/proper(?) way":

    ip route replace - change with new one or add new one

And ...man could be wrong too after all! (...but not me!)

> Also, please check out comment#3, it also fails for replacing a route
> with something different (it's a route to an ipsec tunnel).

It all depends on which routes should be considered different (and it
should be specified somewhere BTW...).

But, I should've added my all reasoning was more about logic, and since
in real life change and replace are often equivalent, and iproute is
famous from using such equivalents in many places, your claim WRT this
man page could be completely right!

There should be only considered, if realization of this doesn't imply
bugs in some other places, like the one which fix caused this
"regression".

Regards,
Jarek P.
Comment 15 Jarek Poplawski 2008-01-26 17:09:00 UTC
On Sat, Jan 26, 2008 at 04:19:34PM +0100, Jarek Poplawski wrote:
> On Sat, Jan 26, 2008 at 03:27:00PM +0100, Andreas Schwab wrote:
> > Jarek Poplawski <jarkao2@gmail.com> writes:
> > 
> > > And, after re-reading this bugzilla report, I'm pretty sure the thing
> > > should be done with 'ip route change' (but I didn't check if 2.6.24
> > > knows about this...).
> > 
> > $ man ip
> > [...]
> >    ip route add - add new route
> >    ip route change - change route
> >    ip route replace - change or add new one
> > [...]
> > 
> > According to this "replace" should be a superset of "change".
> 
> According to this "replace" should be ...ambiguous. I could read this
> "my/proper(?) way":
> 
>     ip route replace - change with new one or add new one
> 
> And ...man could be wrong too after all! (...but not me!)

After some checks it seems man is right - ie. WRT iproute.c! (...hmm?)
And you read this right: '"replace" should be a superset of "change"'.

> > Also, please check out comment#3, it also fails for replacing a route
> > with something different (it's a route to an ipsec tunnel).

But comment#3 is "ambiguous"... It looks like you don't want to show
us too much... So, apparently you change the route, but it seems this
route exists; you have this:
  10.0.0.0/8 dev eth0  scope link 
but probably also something like this:
  default via 192.168.1.1 dev eth0 src 10.204.0.116

So, I doubt there is any "real" change attempted here. It looks more
like a question if program should allow for changing the form of route
entries even if they mean the same, and if this should be reported as
error at all? But maybe I miss something...

Jarek P.
Comment 16 Jarek Poplawski 2008-01-26 23:56:37 UTC
On Sun, Jan 27, 2008 at 02:11:26AM +0100, Jarek Poplawski wrote:
...
> But comment#3 is "ambiguous"... It looks like you don't want to show
> us too much... So, apparently you change the route, but it seems this
> route exists; you have this:
>   10.0.0.0/8 dev eth0  scope link 
> but probably also something like this:
>   default via 192.168.1.1 dev eth0 src 10.204.0.116
> 
> So, I doubt there is any "real" change attempted here. It looks more
> like a question if program should allow for changing the form of route
> entries even if they mean the same, and if this should be reported as
> error at all? But maybe I miss something...

...But, on the other hand, default route shoudn't affect adding or
changing specific routes, so this behaviour is wrong, and IMHO this
change should be reverted or fixed.

Regards,
Jarek P.
Comment 17 Anonymous Emailer 2008-01-27 01:48:31 UTC
Reply-To: ja@ssi.bg


	Hello,

On Sun, 27 Jan 2008, Jarek Poplawski wrote:

> But comment#3 is "ambiguous"... It looks like you don't want to show
> us too much... So, apparently you change the route, but it seems this
> route exists; you have this:
>   10.0.0.0/8 dev eth0  scope link 
> but probably also something like this:
>   default via 192.168.1.1 dev eth0 src 10.204.0.116

	On replace the problem arises when same fib_info (priority, 
protocol, prefsrc, metrics, nexthops) is used in another route or routing 
table. In such cases single copy of structure is used with reference 
counter, all routes share pointer to such fib_info structure which
saves memory when we have many routes using same gateway, for example.

> So, I doubt there is any "real" change attempted here. It looks more
> like a question if program should allow for changing the form of route
> entries even if they mean the same, and if this should be reported as
> error at all? But maybe I miss something...

	No, simply the last change in 2.6.24 is wrong to assume 
duplication is evident in fib_info reference counter. And such check
is only on ip route replace/change. I'm appending brief FIB information
for your reference:

FIB - Forwarding Information Base

- Routes are organized in routing tables
- For "fib_hash" algorithm routing tables have 33 zones (for prefix
lengths 0..32), routing lookup walks them from 32 to 0 to find a
node containing all routing information
- Zones are implemented as hash tables where nodes are hashed by
key (prefix=network) because there can be lots of prefixes in a zone.
- Nodes can be stored with other methods, eg. trie, where nodes are
searched (we hope faster) by prefix and length, no zones are used
in this case
- Nodes have a list of aliases (tos+type+scope+fib_info ptr) sorted by
decreasing TOS because TOS=0 must be a last hit when looking for route.
type is unicast, local, prohibit, etc. scope is host, link, etc.
- fib_info is a structure containing protocol (kernel, boot, zebra, etc),
prefsrc, priority (metric), metrics, nexthop(s). Fallback routes have
higher value for priority, they are used if more priority routes
disappear or their nexthops are dead.
- fib_info structures are organized in 2 global hash tables, one
keyed by prefsrc and another by nexthop_count+protocol+prefsrc+priority
- fib_info is a shared structure, different aliases can point to same
fib_info, even aliases from different prefixes, from different routing
tables. By this way if fib_info contains multipath route then many
aliases share same route path scheduling context.
- Nexthop contains gateway, output device, scope and weight. Weight
is used for path scheduling where nexthops have relative priority
compared to other nexthops in multipath route.
- There can be many aliases with same tos, there can be alternative
routes (aliases) with same tos and priority (metric) but only one alias
with particular tos, type, scope and fib_info can exist to avoid duplicate
alternative routes.
- The operation to replace route includes replacing of alias. The alias
in node (table -> prefix/len) is matched by tos and fib_info priority and
they can not be changed. The parameters that are changed are type, scope
and fib_info (except priority).

* routing table
	* node (prefix/len)
		* alias (tos, type, scope)
			-> fib_info (priority, protocol, prefsrc, metrics)
				* nexthop (gateway, outdev, scope, weight)

read '*' for 'many' and '->' for 'counted reference'

Regards

--
Julian Anastasov <ja@ssi.bg>
Comment 18 Jarek Poplawski 2008-01-27 03:10:41 UTC
On Sun, Jan 27, 2008 at 11:49:06AM +0200, Julian Anastasov wrote:
...
>       No, simply the last change in 2.6.24 is wrong to assume 
> duplication is evident in fib_info reference counter. And such check
> is only on ip route replace/change. I'm appending brief FIB information
> for your reference:

Thank you very much! I've just planned to look more at this code today,
and maybe try to find how your new patch copes with this problem, so
this description should be really helpful!

But, since we agree now it's a bug and regression in 2.6.24, it seems
this should be fixed as soon as possible, and since your patch isn't
probably tested enough for stable, it looks like safe bet to revert
Joonwoo's patch, at least until this all is more verified.

Regards,
Jarek P.
Comment 19 Rafael J. Wysocki 2008-01-27 03:13:12 UTC
(In reply to comment #17)
> Reply-To: ja@ssi.bg
> 
> 
>         Hello,
> 
> On Sun, 27 Jan 2008, Jarek Poplawski wrote:
> 
> > But comment#3 is "ambiguous"... It looks like you don't want to show
> > us too much... So, apparently you change the route, but it seems this
> > route exists; you have this:
> >   10.0.0.0/8 dev eth0  scope link 
> > but probably also something like this:
> >   default via 192.168.1.1 dev eth0 src 10.204.0.116
> 
>         On replace the problem arises when same fib_info (priority, 
> protocol, prefsrc, metrics, nexthops) is used in another route or routing 
> table. In such cases single copy of structure is used with reference 
> counter, all routes share pointer to such fib_info structure which
> saves memory when we have many routes using same gateway, for example.
> 
> > So, I doubt there is any "real" change attempted here. It looks more
> > like a question if program should allow for changing the form of route
> > entries even if they mean the same, and if this should be reported as
> > error at all? But maybe I miss something...
> 
>         No, simply the last change in 2.6.24 is wrong to assume 
> duplication is evident in fib_info reference counter. And such check
> is only on ip route replace/change. I'm appending brief FIB information
> for your reference:

Is there a fix for 2.6.24, available somewhere?
Comment 20 Natalie Protasevich 2008-06-02 15:35:57 UTC
Any updates on this problem? Has aforementioned patches (#8) resolved the problem in the latest kernel? - thanks.
Comment 21 Andreas Schwab 2008-06-06 11:03:07 UTC
Fixed by c18865f39276435abb9286f9a816cb5b66c99a00 ([IPV4] fib: fix route replacement, fib_info is shared).