Latest working kernel version: 2.6.24-rc8 Earliest failing kernel version: 2.6.24 Distribution: SuSE 10.3 Hardware Environment: ppc32/ppc64 Software Environment: Problem Description: ip route replace always fails with EEXIST. (del+add still works.) Steps to reproduce: # ip ro show 127.0.0.0/8 127.0.0.0/8 dev lo scope link # ip ro rep 127.0.0.0/8 dev lo RTNETLINK answers: File exists Broken by bd566e7525b5986864e8d6eb5b67640abcd284a9.
You are replacing route with itself! If you use a different route it works. # ip ro default via 192.168.1.1 dev eth1 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.9 192.168.66.0/24 dev eth1 scope link 192.168.99.0/24 dev dummy0 proto kernel scope link src 192.168.99.99 # ip ro replace 192.168.66.0/24 dev dummy0 # ip ro default via 192.168.1.1 dev eth1 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.9 192.168.66.0/24 dev dummy0 scope link 192.168.99.0/24 dev dummy0 proto kernel scope link src 192.168.99.99 # ip ro replace 192.168.99.0/24 via 192.168.66.1 dev dummy0
I need to add a gateway to a route. That works perfectly fine with previous versions.
# ip ro sh 10.0.0.0/8 10.0.0.0/8 dev eth0 scope link # ip ro rep 10.0.0.0/8 via 192.168.1.1 dev eth0 src 10.204.0.116 RTNETLINK answers: File exists
What's wrong with replacing a route with itself?
Reply-To: akpm@linux-foundation.org > On Fri, 25 Jan 2008 13:23:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9816 > > Summary: cannot replace route > Product: Networking > Version: 2.5 > KernelVersion: 2.6.24 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: schwab@suse.de > > > Latest working kernel version: 2.6.24-rc8 > Earliest failing kernel version: 2.6.24 > Distribution: SuSE 10.3 > Hardware Environment: ppc32/ppc64 > Software Environment: > Problem Description: > ip route replace always fails with EEXIST. (del+add still works.) > > Steps to reproduce: > # ip ro show 127.0.0.0/8 > 127.0.0.0/8 dev lo scope link > # ip ro rep 127.0.0.0/8 dev lo > RTNETLINK answers: File exists > > Broken by bd566e7525b5986864e8d6eb5b67640abcd284a9. > There is a bit more discussion in the bugzilla writeup. I'd agree with Andrea: replacing a route with itself a) used to work and b) should still work (surely)?
Andrew Morton wrote, On 01/25/2008 11:26 PM: >> On Fri, 25 Jan 2008 13:23:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org >> wrote: >> http://bugzilla.kernel.org/show_bug.cgi?id=9816 ... > I'd agree with Andrea: replacing a route with itself a) used to work and b) > should still work (surely)? ...on the other hand: $ touch file1 $ cp file1 file1 cp: `file1' and `file1' are the same file $ mv file1 file1 mv: `file1' and `file1' are the same file and: 'everything' in 'linux' is file... ergo: route cannot replace with itself! Regards, Jarek P.
Reply-To: akpm@linux-foundation.org > On Sat, 26 Jan 2008 00:11:57 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > Andrew Morton wrote, On 01/25/2008 11:26 PM: > > >> On Fri, 25 Jan 2008 13:23:49 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=9816 > > ... > > > I'd agree with Andrea: replacing a route with itself a) used to work and b) > > should still work (surely)? > > ...on the other hand: > > $ touch file1 > $ cp file1 file1 > cp: `file1' and `file1' are the same file > $ mv file1 file1 > mv: `file1' and `file1' are the same file > > and: 'everything' in 'linux' is file... > > ergo: route cannot replace with itself! > That's not a very good analogy - the source is a kernel object. A better example would be: linux-2.6.24-rc8: echo foo > /tmp/1 echo bar > /tmp/2 echo foo > /tmp/1 linux-2.6.24: echo foo > /tmp/1 echo bar > /tmp/2 echo foo > /tmp/1 sh: cannot write /tmp/1: Inalid argument But whatever. It used to work. People's scripts will break. Regression.
2008/1/26, Andrew Morton <akpm@linux-foundation.org>: > > But whatever. It used to work. People's scripts will break. Regression. > Also I thought that 'replace with itself' should be error as like Jarek. But it used to work and patch made a regression, it's my bad :( I think Julian's recent patches on the list would work for this replacement issue and regression. Thanks, Joonwoo
On Sat, Jan 26, 2008 at 02:16:01PM +0900, Joonwoo Park wrote: > 2008/1/26, Andrew Morton <akpm@linux-foundation.org>: > > > > But whatever. It used to work. People's scripts will break. Regression. > > > > Also I thought that 'replace with itself' should be error as like Jarek. > But it used to work and patch made a regression, it's my bad :( Actually, I don't think 'replace with itself' should be an error. I've only meant that lack of this possibility shouldn't be necessarily seen as error - there could be arguments for both sides. IMHO, there should be simply analyzed pros and cons of doing it in this particular place, so: if there is any gain in doing this, and if possible complications or problems with performance, security etc. don't prevail such a gain. And I don't think a regression argument should be valid at all if there are removed any unlogical, error-prone or otherwise problematic options (I don't know if this is such case), even if they are not obvious bugs - especially if it's still possible to do the same 'proper' way. Regards, Jarek P.
On Fri, Jan 25, 2008 at 07:20:26PM -0800, Andrew Morton wrote: ... > That's not a very good analogy - the source is a kernel object. A better > example would be: > > linux-2.6.24-rc8: > > echo foo > /tmp/1 > echo bar > /tmp/2 > echo foo > /tmp/1 > > linux-2.6.24: > > echo foo > /tmp/1 > echo bar > /tmp/2 > echo foo > /tmp/1 > sh: cannot write /tmp/1: Inalid argument Probably you are right. When I have some time to build and boot 2.6.24 at last, I hope I'll find what this example is about... > But whatever. It used to work. People's scripts will break. Regression. Do you mean people didn't have to change their scripts before, e.g. from linux 2.0.? Scripts are kind of programs, and both APIs and some 'undocumented' features are achanging... Regards, Jarek P. Regards, Jarek P.
On Sat, Jan 26, 2008 at 12:40:36PM +0100, Jarek Poplawski wrote: > On Sat, Jan 26, 2008 at 02:16:01PM +0900, Joonwoo Park wrote: > > 2008/1/26, Andrew Morton <akpm@linux-foundation.org>: > > > > > > But whatever. It used to work. People's scripts will break. > Regression. > > > > > > > Also I thought that 'replace with itself' should be error as like Jarek. > > But it used to work and patch made a regression, it's my bad :( > > Actually, I don't think 'replace with itself' should be an error. I've > only meant that lack of this possibility shouldn't be necessarily seen > as error - there could be arguments for both sides. ...On the other hand, after some re-thinking, I actually think 'replace with itself' should be considered a bug. I wondered about the possible reason of this behaviour in a file system, and it seems replace just means things like overwrite, so old thing is always supposed to be destroyed (of course it's a matter of implementation or conditions in which moment this destruction takes place). So, 'replace with itself' is simply ambiguous: we can always delete the object first, to prepare the place for replacement, and find there is nothing to do after this - and it's probably not what somebody wanted. And, after re-reading this bugzilla report, I'm pretty sure the thing should be done with 'ip route change' (but I didn't check if 2.6.24 knows about this...). Jarek P.
Jarek Poplawski <jarkao2@gmail.com> writes: > And, after re-reading this bugzilla report, I'm pretty sure the thing > should be done with 'ip route change' (but I didn't check if 2.6.24 > knows about this...). $ man ip [...] ip route add - add new route ip route change - change route ip route replace - change or add new one [...] According to this "replace" should be a superset of "change". Also, please check out comment#3, it also fails for replacing a route with something different (it's a route to an ipsec tunnel). Andreas.
On Sat, Jan 26, 2008 at 03:10:10PM +0100, Jarek Poplawski wrote: ... > [...] so old thing is always supposed to be > destroyed (of course it's a matter of implementation or conditions in > which moment this destruction takes place). > > So, 'replace with itself' is simply ambiguous: we can always delete the > object first, to prepare the place for replacement, and find there is > nothing to do after this - and it's probably not what somebody wanted. As a matter of fact, the moment of destruction doesn't even matter: assuming the replaced thing is destroyed in all 'common' cases, doing this at the end isn't probably wanted as well - and skipping this action makes an exception - what IMHO proves the concept is at least inconsistent. Jarek P.
On Sat, Jan 26, 2008 at 03:27:00PM +0100, Andreas Schwab wrote: > Jarek Poplawski <jarkao2@gmail.com> writes: > > > And, after re-reading this bugzilla report, I'm pretty sure the thing > > should be done with 'ip route change' (but I didn't check if 2.6.24 > > knows about this...). > > $ man ip > [...] > ip route add - add new route > ip route change - change route > ip route replace - change or add new one > [...] > > According to this "replace" should be a superset of "change". According to this "replace" should be ...ambiguous. I could read this "my/proper(?) way": ip route replace - change with new one or add new one And ...man could be wrong too after all! (...but not me!) > Also, please check out comment#3, it also fails for replacing a route > with something different (it's a route to an ipsec tunnel). It all depends on which routes should be considered different (and it should be specified somewhere BTW...). But, I should've added my all reasoning was more about logic, and since in real life change and replace are often equivalent, and iproute is famous from using such equivalents in many places, your claim WRT this man page could be completely right! There should be only considered, if realization of this doesn't imply bugs in some other places, like the one which fix caused this "regression". Regards, Jarek P.
On Sat, Jan 26, 2008 at 04:19:34PM +0100, Jarek Poplawski wrote: > On Sat, Jan 26, 2008 at 03:27:00PM +0100, Andreas Schwab wrote: > > Jarek Poplawski <jarkao2@gmail.com> writes: > > > > > And, after re-reading this bugzilla report, I'm pretty sure the thing > > > should be done with 'ip route change' (but I didn't check if 2.6.24 > > > knows about this...). > > > > $ man ip > > [...] > > ip route add - add new route > > ip route change - change route > > ip route replace - change or add new one > > [...] > > > > According to this "replace" should be a superset of "change". > > According to this "replace" should be ...ambiguous. I could read this > "my/proper(?) way": > > ip route replace - change with new one or add new one > > And ...man could be wrong too after all! (...but not me!) After some checks it seems man is right - ie. WRT iproute.c! (...hmm?) And you read this right: '"replace" should be a superset of "change"'. > > Also, please check out comment#3, it also fails for replacing a route > > with something different (it's a route to an ipsec tunnel). But comment#3 is "ambiguous"... It looks like you don't want to show us too much... So, apparently you change the route, but it seems this route exists; you have this: 10.0.0.0/8 dev eth0 scope link but probably also something like this: default via 192.168.1.1 dev eth0 src 10.204.0.116 So, I doubt there is any "real" change attempted here. It looks more like a question if program should allow for changing the form of route entries even if they mean the same, and if this should be reported as error at all? But maybe I miss something... Jarek P.
On Sun, Jan 27, 2008 at 02:11:26AM +0100, Jarek Poplawski wrote: ... > But comment#3 is "ambiguous"... It looks like you don't want to show > us too much... So, apparently you change the route, but it seems this > route exists; you have this: > 10.0.0.0/8 dev eth0 scope link > but probably also something like this: > default via 192.168.1.1 dev eth0 src 10.204.0.116 > > So, I doubt there is any "real" change attempted here. It looks more > like a question if program should allow for changing the form of route > entries even if they mean the same, and if this should be reported as > error at all? But maybe I miss something... ...But, on the other hand, default route shoudn't affect adding or changing specific routes, so this behaviour is wrong, and IMHO this change should be reverted or fixed. Regards, Jarek P.
Reply-To: ja@ssi.bg Hello, On Sun, 27 Jan 2008, Jarek Poplawski wrote: > But comment#3 is "ambiguous"... It looks like you don't want to show > us too much... So, apparently you change the route, but it seems this > route exists; you have this: > 10.0.0.0/8 dev eth0 scope link > but probably also something like this: > default via 192.168.1.1 dev eth0 src 10.204.0.116 On replace the problem arises when same fib_info (priority, protocol, prefsrc, metrics, nexthops) is used in another route or routing table. In such cases single copy of structure is used with reference counter, all routes share pointer to such fib_info structure which saves memory when we have many routes using same gateway, for example. > So, I doubt there is any "real" change attempted here. It looks more > like a question if program should allow for changing the form of route > entries even if they mean the same, and if this should be reported as > error at all? But maybe I miss something... No, simply the last change in 2.6.24 is wrong to assume duplication is evident in fib_info reference counter. And such check is only on ip route replace/change. I'm appending brief FIB information for your reference: FIB - Forwarding Information Base - Routes are organized in routing tables - For "fib_hash" algorithm routing tables have 33 zones (for prefix lengths 0..32), routing lookup walks them from 32 to 0 to find a node containing all routing information - Zones are implemented as hash tables where nodes are hashed by key (prefix=network) because there can be lots of prefixes in a zone. - Nodes can be stored with other methods, eg. trie, where nodes are searched (we hope faster) by prefix and length, no zones are used in this case - Nodes have a list of aliases (tos+type+scope+fib_info ptr) sorted by decreasing TOS because TOS=0 must be a last hit when looking for route. type is unicast, local, prohibit, etc. scope is host, link, etc. - fib_info is a structure containing protocol (kernel, boot, zebra, etc), prefsrc, priority (metric), metrics, nexthop(s). Fallback routes have higher value for priority, they are used if more priority routes disappear or their nexthops are dead. - fib_info structures are organized in 2 global hash tables, one keyed by prefsrc and another by nexthop_count+protocol+prefsrc+priority - fib_info is a shared structure, different aliases can point to same fib_info, even aliases from different prefixes, from different routing tables. By this way if fib_info contains multipath route then many aliases share same route path scheduling context. - Nexthop contains gateway, output device, scope and weight. Weight is used for path scheduling where nexthops have relative priority compared to other nexthops in multipath route. - There can be many aliases with same tos, there can be alternative routes (aliases) with same tos and priority (metric) but only one alias with particular tos, type, scope and fib_info can exist to avoid duplicate alternative routes. - The operation to replace route includes replacing of alias. The alias in node (table -> prefix/len) is matched by tos and fib_info priority and they can not be changed. The parameters that are changed are type, scope and fib_info (except priority). * routing table * node (prefix/len) * alias (tos, type, scope) -> fib_info (priority, protocol, prefsrc, metrics) * nexthop (gateway, outdev, scope, weight) read '*' for 'many' and '->' for 'counted reference' Regards -- Julian Anastasov <ja@ssi.bg>
On Sun, Jan 27, 2008 at 11:49:06AM +0200, Julian Anastasov wrote: ... > No, simply the last change in 2.6.24 is wrong to assume > duplication is evident in fib_info reference counter. And such check > is only on ip route replace/change. I'm appending brief FIB information > for your reference: Thank you very much! I've just planned to look more at this code today, and maybe try to find how your new patch copes with this problem, so this description should be really helpful! But, since we agree now it's a bug and regression in 2.6.24, it seems this should be fixed as soon as possible, and since your patch isn't probably tested enough for stable, it looks like safe bet to revert Joonwoo's patch, at least until this all is more verified. Regards, Jarek P.
(In reply to comment #17) > Reply-To: ja@ssi.bg > > > Hello, > > On Sun, 27 Jan 2008, Jarek Poplawski wrote: > > > But comment#3 is "ambiguous"... It looks like you don't want to show > > us too much... So, apparently you change the route, but it seems this > > route exists; you have this: > > 10.0.0.0/8 dev eth0 scope link > > but probably also something like this: > > default via 192.168.1.1 dev eth0 src 10.204.0.116 > > On replace the problem arises when same fib_info (priority, > protocol, prefsrc, metrics, nexthops) is used in another route or routing > table. In such cases single copy of structure is used with reference > counter, all routes share pointer to such fib_info structure which > saves memory when we have many routes using same gateway, for example. > > > So, I doubt there is any "real" change attempted here. It looks more > > like a question if program should allow for changing the form of route > > entries even if they mean the same, and if this should be reported as > > error at all? But maybe I miss something... > > No, simply the last change in 2.6.24 is wrong to assume > duplication is evident in fib_info reference counter. And such check > is only on ip route replace/change. I'm appending brief FIB information > for your reference: Is there a fix for 2.6.24, available somewhere?
Any updates on this problem? Has aforementioned patches (#8) resolved the problem in the latest kernel? - thanks.
Fixed by c18865f39276435abb9286f9a816cb5b66c99a00 ([IPV4] fib: fix route replacement, fib_info is shared).