Bug 204005

Summary: Code in __mkroute_input isn't fully correct
Product: Networking Reporter: cliff chen (cliff.chen)
Component: IPV4 Assignee: Stephen Hemminger (stephen)
Status: NEW ---    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.10.0-862 Subsystem:
Regression: No Bisected commit-id:

Description cliff chen 2019-06-27 09:33:27 UTC
In the function __mkroute_input(), there is an issue in the code below:
......
rt_cache:
                if (rt_cache_valid(rth)) { <<======
                        skb_dst_set_noref(skb, &rth->dst);
                        goto out;
                }

......
Once a route lookup fails, rth->rt_type is set to unreachable (7).
However, once the route becomes correct again, the stale entry is still used, because the condition rt_cache_valid(rth) only checks the rt_genid of the cache entry against the rt_genid of the network namespace.
So even after the route has recovered, the lookup always gets the failed route from the cache.
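
To illustrate the check described above, here is a minimal user-space sketch. It is not the kernel code: every model_* name is hypothetical, and it assumes, as described in this report, that validity depends only on comparing generation ids. The value 7 matches RTN_UNREACHABLE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum { MODEL_RTN_UNICAST = 1, MODEL_RTN_UNREACHABLE = 7 };

struct model_net {
    int rt_genid;               /* generation id of the network namespace */
};

struct model_rtable {
    int rt_genid;               /* generation id copied when the entry was cached */
    int rt_type;                /* result of the lookup that created the entry */
};

/*
 * Models the condition as described in this report: only the two
 * generation ids are compared. rt_type is never looked at, so a cached
 * MODEL_RTN_UNREACHABLE entry stays "valid" until net->rt_genid changes.
 */
static bool model_rt_cache_valid(const struct model_rtable *rth,
                                 const struct model_net *net)
{
    return rth != NULL && rth->rt_genid == net->rt_genid;
}
```

In this model an entry with rt_type MODEL_RTN_UNREACHABLE passes the check as long as both ids match, and stops passing only after net->rt_genid is incremented.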
One test environment:
1) host1:
add ip1 on interface x

2) host2 (proxy ARP):
2.1) add ip2 on interface y1 with a /32 prefix
2.2) add no IP on interface y2
Note: x, y1 and y2 are in the same layer 2 network
enable forwarding on interface y1
set ip3 as ARP proxy on interface y1

2.3) add ip3 on interface z, which can be any interface that isn't in the same layer 2 network as y1 and y2.

3) Run the test below on host1 to check whether an ARP reply comes back:
arping -I x -s ip1 ip3

Possible cause:
Since the ARP request is a broadcast, interface y2 can receive it first.
Because forwarding isn't enabled on y2, the route lookup fails; this is correct.
However, when the ARP request is then received on y1, the route always fails even though the result from fib_lookup is successful. This is all because of the condition rt_cache_valid(rth):
the rt_genid in the cache entry hasn't changed, and
the rt_genid in the network namespace hasn't changed either.
Therefore it never recovers until I either
bring y2 down, or
run "ip route flush cache"
to increase the rt_genid in the network namespace.
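
The sequence above can be sketched as a small user-space model. This is only an illustration of my analysis, not kernel code: all sketch_* names are hypothetical, and it assumes that packets arriving on y1 and y2 end up consulting the same cached entry.

```c
#include <stdbool.h>

enum sketch_type { SKETCH_UNICAST = 1, SKETCH_UNREACHABLE = 7 };

struct sketch_net {
    int rt_genid;                   /* namespace generation id */
};

struct sketch_route {
    bool in_use;                    /* slot holds a cached result */
    int rt_genid;                   /* generation id at caching time */
    enum sketch_type rt_type;       /* cached lookup result */
};

/* Assumption: one cache slot is shared by both input interfaces. */
struct sketch_cache {
    struct sketch_route slot;
};

/* Models the input path: consult the cache first, else do a fresh lookup. */
static enum sketch_type sketch_input_route(struct sketch_net *net,
                                           struct sketch_cache *cache,
                                           bool forwarding_enabled)
{
    struct sketch_route *rth = &cache->slot;

    /* Cache hit: only generation ids are compared, not the input interface. */
    if (rth->in_use && rth->rt_genid == net->rt_genid)
        return rth->rt_type;

    /* Cache miss: the result depends on the receiving interface's forwarding. */
    rth->in_use = true;
    rth->rt_genid = net->rt_genid;
    rth->rt_type = forwarding_enabled ? SKETCH_UNICAST : SKETCH_UNREACHABLE;
    return rth->rt_type;
}

/* Models the effect of the workaround: bump the namespace id. */
static void sketch_flush_cache(struct sketch_net *net)
{
    net->rt_genid++;
}
```

In this model, a first call with forwarding_enabled == false (the request seen on y2) caches SKETCH_UNREACHABLE, a second call with forwarding_enabled == true (the request seen on y1) still returns the cached failure, and only after sketch_flush_cache() does the y1 call return SKETCH_UNICAST.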

thanks
Cliff
Comment 1 cliff chen 2019-06-28 01:17:42 UTC
Please let me know if you need any more information!