Bug 43327 - IP routing: cached route is applied to wrong network interface
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
Reported: 2012-06-01 00:18 UTC by Daniel Schnabel
Modified: 2013-04-30 19:30 UTC

Kernel Version: 3.1.10
Regression: No

Attachments
Setup of the use case (1.80 KB, text/plain)
2012-06-01 00:20 UTC, Daniel Schnabel
Patch to fix ICMP redirect issue (813 bytes, patch)
2013-04-30 17:47 UTC, Daniel Schnabel

Description Daniel Schnabel 2012-06-01 00:18:37 UTC
IP routing: cached route is applied to wrong network interface


Dynamic route changes such as ICMP redirects are stored in the kernel's routing cache. This cache can be displayed with "route -nC" or "ip route show cache".
Routes in this cache are used before the Routing Policy Database (RPDB) is consulted. In the use case described below, a wrong route entry is created in the cache.

This is my network setup:
* the Linux machine has 2 network interfaces (eth0 and eth1) with IP addresses in different subnets
** eth0: 172.16.124.217/24 (Subnet A)
** eth1: 172.16.128.219/24 (Subnet B)
* IP rules to accomplish two default gateways
** root@myBox:~# ip rule show
0:      from all lookup local
32764:  from 172.16.128.219 lookup E1
32765:  from 172.16.124.217 lookup E0
32766:  from all lookup main
32767:  from all lookup default
** root@myBox:~# ip route show table E0
default via 172.16.124.254 dev eth0
** root@myBox:~# ip route show table E1
default via 172.16.128.254 dev eth1
* Both gateways are connected to Subnet C
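
The rule set above amounts to a source-based table selection. A minimal Python sketch of that selection follows. It is a toy model, not the kernel's RPDB code; the names RULES, TABLES and select_route are hypothetical, and the local, main and default rules are left out.

```python
import ipaddress

# Routing tables from the report: each holds one default route (gateway, device).
TABLES = {
    "E0": ("172.16.124.254", "eth0"),
    "E1": ("172.16.128.254", "eth1"),
}

# Source-based rules as printed by "ip rule show": (priority, source, table).
# The local/main/default rules are intentionally omitted from this toy model.
RULES = [
    (32764, "172.16.128.219", "E1"),
    (32765, "172.16.124.217", "E0"),
]


def select_route(src):
    """Return (gateway, device) from the first rule whose source matches."""
    src_ip = ipaddress.ip_address(src)
    for _prio, rule_src, table in sorted(RULES):
        if src_ip == ipaddress.ip_address(rule_src):
            return TABLES[table]
    return None  # the real RPDB would continue with the main table here


print(select_route("172.16.124.217"))  # ('172.16.124.254', 'eth0')
print(select_route("172.16.128.219"))  # ('172.16.128.254', 'eth1')
```

Without any cached redirect, each source address therefore leaves through the gateway of its own subnet.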

This is what it looks like:

    ************                                                                   #      ************
    * Subnet A *                                                                   #      * Subnet C *
    ************             +-------------------+      +-------------------+      #      ************
                             |                   |      |                   |      #
         +-------------------+ GW 172.16.124.254 +------+ GW 172.16.124.18  +------#---------------+
         | 172.16.124.217    |                   |      |                   |      #               |
  +------+--------+          +-------------------+      +---------+---------+      #               |
  |     eth0      |                                               |                #      +--------+----------+
  |               |                                               |                #      |      Target       |
  | Linux Machine |                                      ##################        #      |   IP 10.20.2.252  |
  |               |                                               |                #      +--------+----------+
  |     eth1      |                                               |                #
  +------+--------+          +-------------------+                |                #
         | 172.16.128.219    |                   |                |                #
         +-------------------+ GW 172.16.128.254 +----------------+                #
                             |                   |                                 #
    ************             +-------------------+                                 #
    * Subnet B *                                                                   #
    ************                                                                   #
     
     
     
I can ping the target from both interfaces:
ping 10.20.2.252 -I 172.16.124.217
ping 10.20.2.252 -I 172.16.128.219

When pinging from eth0 (172.16.124.217), the gateway 172.16.124.254 returns a redirect to the gateway 172.16.124.18, since that gateway is in the same network:
root@myBox:~# ping 10.20.2.252 -I 172.16.124.217
PING 10.20.2.252 (10.20.2.252) from 172.16.124.217 : 56(84) bytes of data.
64 bytes from 10.20.2.252: icmp_seq=1 ttl=63 time=81.4 ms
From 172.16.124.254: icmp_seq=1 Redirect Host(New nexthop: 172.16.124.18)
64 bytes from 10.20.2.252: icmp_seq=2 ttl=63 time=0.277 ms
64 bytes from 10.20.2.252: icmp_seq=3 ttl=63 time=0.238 ms
64 bytes from 10.20.2.252: icmp_seq=4 ttl=63 time=0.236 ms

And this redirect will create a new entry in the cache table:
root@myBox:~# route -nC | grep 172.16.124.18
172.16.124.217  10.20.2.252     172.16.124.18         0      0        2 eth0

So far so good. Here comes the problem.

When I now ping the same target from eth1 (172.16.128.219), it no longer works:
root@myBox:~# ping 10.20.2.252 -I 172.16.128.219
PING 10.20.2.252 (10.20.2.252) from 172.16.128.219 : 56(84) bytes of data.
From 172.16.128.219 icmp_seq=2 Destination Host Unreachable
From 172.16.128.219 icmp_seq=3 Destination Host Unreachable
From 172.16.128.219 icmp_seq=4 Destination Host Unreachable
^C
--- 10.20.2.252 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2999ms

I check the cache table and notice another entry:
root@portwell19:~# route -nC | grep 172.16.124.18
172.16.124.217  10.20.2.252     172.16.124.18         0      0        2 eth0
172.16.128.219  10.20.2.252     172.16.124.18         0      0        7 eth1

That means eth1 is now trying to reach 10.20.2.252 using the gateway 172.16.124.18. It's obvious that this won't work since eth1 is in a different subnet.
So the entry in the cache table is wrong. After clearing the cache with "ip route flush table cache" the ping from eth1 works again.
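
Why the second cache entry cannot work can be checked with a few lines of Python. This is only an illustration of the subnet arithmetic, using the addresses and /24 prefixes from the setup above; the helper name on_link is hypothetical.

```python
import ipaddress


def on_link(gateway, iface_cidr):
    """True if the gateway lies inside the subnet configured on the interface."""
    net = ipaddress.ip_interface(iface_cidr).network
    return ipaddress.ip_address(gateway) in net


print(on_link("172.16.124.18", "172.16.124.217/24"))  # eth0 -> True
print(on_link("172.16.124.18", "172.16.128.219/24"))  # eth1 -> False
```

The redirected gateway is directly reachable only from eth0, so a cache entry that pairs it with eth1 points at a next hop that eth1 cannot reach directly.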

I did some research:
The routing cache is backed by an AVL tree of Internet peers. Each peer is stored in a structure called inet_peer (include/net/inetpeer.h). The lookup is done by a call to inet_getpeer_v4() in net/ipv4/route.c, which takes the destination address (10.20.2.252 in my case) as its first argument. So if the destination address matches, the peer is returned and saved to the cache table regardless of the source address.
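
The effect of a destination-only lookup can be reproduced with a toy model in Python. This is not kernel code: a dict stands in for the AVL tree, the class and method names are hypothetical, and it assumes that the gateway learned from the redirect is kept with the peer entry.

```python
class DestOnlyPeerCache:
    """Toy peer cache keyed by destination address only (models the bug)."""

    def __init__(self):
        self._peers = {}  # destination -> gateway learned from a redirect

    def learn_redirect(self, dst, new_gateway):
        # The source address of the flow that triggered the redirect is lost.
        self._peers[dst] = new_gateway

    def lookup(self, src, dst, default_gateway):
        # src is ignored on purpose: this mirrors the destination-only lookup.
        return self._peers.get(dst, default_gateway)


cache = DestOnlyPeerCache()
# Redirect received on eth0 for the target 10.20.2.252.
cache.learn_redirect("10.20.2.252", "172.16.124.18")

# eth0 gets the redirected gateway, which is correct.
print(cache.lookup("172.16.124.217", "10.20.2.252", "172.16.124.254"))
# eth1 gets the same gateway, although it belongs to Subnet A.
print(cache.lookup("172.16.128.219", "10.20.2.252", "172.16.128.254"))
```

Both calls print 172.16.124.18, which matches the two entries shown by "route -nC" above.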

Two possible fixes I can think of:
* A peer lookup should be done not only by the destination address but also by the source address (or netmask)
* The inet_peer structure should contain a field for the source address (or netmask). Then after lookup via inet_getpeer_v4() check the source address (or netmask) of the returned peer.
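
The first proposal can be sketched in Python as a peer cache keyed by destination and source address together. This is a toy model under my own assumptions, not the attached patch and not kernel code; the class and method names are hypothetical.

```python
class SourceAwarePeerCache:
    """Toy peer cache keyed by (destination, source) instead of destination."""

    def __init__(self):
        self._peers = {}  # (destination, source) -> learned gateway

    def learn_redirect(self, src, dst, new_gateway):
        # Remember which local source address the redirect was learned for.
        self._peers[(dst, src)] = new_gateway

    def lookup(self, src, dst, default_gateway):
        # A redirect learned for one source no longer leaks to another one.
        return self._peers.get((dst, src), default_gateway)


cache = SourceAwarePeerCache()
cache.learn_redirect("172.16.124.217", "10.20.2.252", "172.16.124.18")

print(cache.lookup("172.16.124.217", "10.20.2.252", "172.16.124.254"))
print(cache.lookup("172.16.128.219", "10.20.2.252", "172.16.128.254"))
```

With the composite key, the first lookup returns the redirected gateway 172.16.124.18 and the second one falls back to 172.16.128.254, the default gateway of eth1.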
Comment 1 Daniel Schnabel 2012-06-01 00:20:55 UTC
Created attachment 73482 [details]
Setup of the use case
Comment 2 Daniel Schnabel 2012-06-01 00:28:05 UTC
Please disregard the ugly ASCII art in the description and refer to the attachment "Setup of the use case" instead :) I haven't found a way to edit the description text yet.
Comment 3 Alan 2012-07-11 14:48:31 UTC
If you've not already done so please report this first to netdev@vger.kernel.org
Comment 4 Daniel Schnabel 2012-07-11 18:00:54 UTC
I'm not able to report this to netdev@vger.kernel.org

"Your message wasn't delivered due to a permission or security issue. It may have been rejected by a moderator, the address may only accept e-mail from certain senders, or another restriction may be preventing delivery."
Comment 5 Daniel Schnabel 2013-04-30 17:47:59 UTC
Created attachment 100321 [details]
Patch to fix ICMP redirect issue

I fixed the issue and attached the patch. Our systems have been running successfully for 8 months now with this fix.
Comment 6 Stephen Hemminger 2013-04-30 19:30:09 UTC
Route cache was completely removed in recent kernels, so your patch has no
(In reply to comment #4)
> I'm not able to report this to netdev@vger.kernel.org
> 
> "Your message wasn't delivered due to a permission or security issue. It may
> have been rejected by a moderator, the address may only accept e-mail from
> certain senders, or another restriction may be preventing delivery."

kernel.org does not accept HTML mail, and certain senders with excess spam are blacklisted.
Comment 7 Stephen Hemminger 2013-04-30 19:30:41 UTC
(In reply to comment #5)
> Created an attachment (id=100321) [details]
> fixes ICMP redirect issue
> 
> I fixed the issue and attached the patch. Our systems have been running
> successfully for 8 months now with this fix.

The route cache was completely removed in the 3.6 kernel. Your patch is no longer relevant.
