Bug 49311 (inet_select_addr) - Incorrect ARP behavior when multiple/none IPv4 address assigned to interface
Summary: Incorrect ARP behavior when multiple/none IPv4 address assigned to interface
Status: NEW
Alias: inet_select_addr
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-22 13:53 UTC by Sergey Popovich
Modified: 2016-02-15 20:18 UTC

See Also:
Kernel Version: 3.7-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Extend inet_select_addr() to match most specific address (1.68 KB, patch)
2012-10-22 13:53 UTC, Sergey Popovich
ARP case 1 system configuration and tcpdump captures (2.98 KB, application/octet-stream)
2012-10-22 13:53 UTC, Sergey Popovich
ARP case 2 system configuration and tcpdump captures (3.01 KB, application/octet-stream)
2012-10-22 13:54 UTC, Sergey Popovich
bash script to reproduce bug (1.96 KB, application/octet-stream)
2012-10-22 13:55 UTC, Sergey Popovich
ICMP Reply on net/ipv4/icmp_errors_use_inbound_ifaddr (related) (1.21 KB, application/octet-stream)
2012-10-22 13:56 UTC, Sergey Popovich

Description Sergey Popovich 2012-10-22 13:53:00 UTC
Created attachment 84271 [details]
Extend inet_select_addr() to match most specific address



Hello.

I have discovered the following three cases where IPv4 neighbour discovery
(cases 1 and 2) and ICMP reply generation (case 3) do not work as expected:
  1. The interface has no IPv4 address assigned and arp_ignore is set to 0
     (default).
  2. The interface has multiple IPv4 addresses assigned from the same IP subnet
     but with different subnet masks (overlapping address space).
  3. When sysctl net/ipv4/icmp_errors_use_inbound_ifaddr is 1, the ICMP reply
     is sent from the first address configured on the interface instead of the
     most specific address (matching subnetwork), or from the first address on
     any network interface (again with no matching done).

Each problem in more detail:
----------------------------
1. A common ISP configuration for conserving IPv4 addresses is the so-called
"ip unnumbered" address assignment scheme: instead of allocating each customer
a /30 subnet for one IPv4 address (two addresses go unused: network and
broadcast), the customer is assigned a single address with a wider mask
(for example /24).
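
As a rough illustration of the saving, here is a small sketch. The 10.0.1.0/24 block is borrowed from the example in this report, and reserving exactly one address for the shared gateway is an assumption made for the arithmetic:

```python
import ipaddress

block = ipaddress.ip_network("10.0.1.0/24")

# Classic numbered scheme: one /30 per customer link.
# Each /30 holds 4 addresses: network, router, customer, broadcast.
per_link_subnets = list(block.subnets(new_prefix=30))
customers_numbered = len(per_link_subnets)

# "ip unnumbered" scheme: the whole /24 is one subnet, so every host
# address except the shared gateway can be handed to a customer.
usable_hosts = block.num_addresses - 2    # minus network and broadcast
customers_unnumbered = usable_hosts - 1   # minus the gateway (assumed single)

print(customers_numbered, customers_unnumbered)
```

Under these assumptions the same /24 serves 64 customers with /30 links and 253 customers with the unnumbered scheme.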

Many network equipment vendors have implemented this for a long time.

Linux also supports it (although it still needs to be configured),
thanks to the arp_ignore sysctl.

The configuration scheme on Linux looks like this:

 PC1              |             |
  ip: 10.0.1.2/24 |             | Linux Router
  gw: 10.0.1.1    |--------eth0-| Lo0: 10.10.10.10/32
                                | Lo255: 10.0.1.1/24
 PC2              |--------eth1-|        10.0.2.1/24
  ip: 10.0.1.3/24 |             | eth[0-2]: no ip address
  gw: 10.0.1.1    |             | ip route 10.0.1.2/32 dev eth0 src 10.0.1.1
                              +-| ip route 10.0.1.3/32 dev eth1 src 10.0.1.1
                              | | ip route 10.0.2.2/32 dev eth2 src 10.0.2.1
 PC3              |-----eth2--+
  ip: 10.0.2.2/24 |
  gw: 10.0.2.1    |

PC1-3 - hosts with Linux 3.2.23-3-amd64 (debian) kernels.

No IP address is assigned on eth0-2, but IPv4 is enabled. The following sysctl settings are used (at least):
  sysctl net/ipv4/ip_forward=1
  sysctl net/ipv4/conf/all/forwarding=1      // default
  sysctl net/ipv4/conf/default/forwarding=1  // default
  sysctl net/ipv4/conf/all/proxy_arp=0
  sysctl net/ipv4/conf/default/proxy_arp=1   // needed for communication between
                                             // PC1 & PC2
  sysctl net/ipv4/conf/all/arp_ignore=0      // default
  sysctl net/ipv4/conf/default/arp_ignore=0  // default

The main routing table (254) contains routes to the customers' IP addresses.
Lo0   - dummy network interface named Lo(opback)0; hosts the IP address of the
        router (10.10.10.10).
Lo255 - dummy network interface named Lo(opback)255, created with
  ip link add name lo255 type dummy
However, a dummy interface is not required to host the addresses: they may be
configured on ANY interface, even a real physical NIC, but NOT on the system
loopback (127.0.0.1). Routes on the loopback are considered local to the host
and are placed into the local (255) routing table. Because the Linux policy
routing rules look up the local table first (the rule at priority 0), the
"ip unnumbered" scheme cannot work there: the /24 route found in the local
table wins even if more specific (/32) routes exist in the main table.
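
The rule ordering described above can be modelled with a short sketch. The rule priorities are the Linux defaults; the table contents and the `lookup` helper are hypothetical and only mirror the scenario in this report, they are not read from a real system:

```python
import ipaddress

# Default Linux policy rules: (priority, table name), consulted in order.
RULES = [(0, "local"), (32766, "main"), (32767, "default")]

# Hypothetical table contents modelling the scenario in this report:
# the /24 ends up in "local" while the per-customer /32 sits in "main".
TABLES = {
    "local": [("10.0.1.0/24", "lo")],
    "main": [("10.0.1.2/32", "eth0")],
    "default": [],
}

def lookup(dst):
    """Return (table, prefix, dev) from the first table that has a match."""
    addr = ipaddress.ip_address(dst)
    for _prio, table in sorted(RULES):
        matches = [
            (ipaddress.ip_network(prefix), dev)
            for prefix, dev in TABLES[table]
            if addr in ipaddress.ip_network(prefix)
        ]
        if matches:
            # Longest-prefix match applies only within a single table.
            net, dev = max(matches, key=lambda m: m[0].prefixlen)
            return table, str(net), dev
    return None

print(lookup("10.0.1.2"))
```

In this model the lookup for 10.0.1.2 stops in the local table at the /24 and never reaches the /32 in main, which is why the addresses have to live on a non-loopback interface.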

PROBLEM:
  If more than one IP address is assigned in the system, the ARP Request
  generated by the kernel in the NUD PROBE phase takes its source address from
  the first IP address assigned to the interface (and with ip unnumbered there
  is no such address on the interface), or else from the first address found on
  any interface with scope <= LINK. The src parameter of the route (ip(8)) is
  not taken into account even if configured.
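
The selection behaviour described in this PROBLEM statement can be approximated with the following sketch. It is a simplified model, not the kernel code: scope checks are ignored, the function name `select_addr_first_match` is made up for illustration, and the interface order (lo0 before lo255) is an assumption:

```python
import ipaddress

def select_addr_first_match(ifaces, dev, dst=None):
    """Simplified model of first-match source selection (scope ignored).

    ifaces: ordered mapping of device name -> list of "addr/len" strings.
    """
    first = None
    for text in ifaces.get(dev, []):
        ifa = ipaddress.ip_interface(text)
        # No destination given, or destination inside this subnet: take it.
        if dst is None or ipaddress.ip_address(dst) in ifa.network:
            return str(ifa.ip)
        if first is None:
            first = str(ifa.ip)  # remember the first address on the device
    if first is not None:
        return first
    # Device has no address at all: fall back to the first address found
    # on any interface, without matching it against the destination.
    for addrs in ifaces.values():
        if addrs:
            return str(ipaddress.ip_interface(addrs[0]).ip)
    return None

# Case 1 layout from the diagram: eth0 is unnumbered.
case1 = {
    "lo0": ["10.10.10.10/32"],
    "lo255": ["10.0.1.1/24", "10.0.2.1/24"],
    "eth0": [],
}
print(select_addr_first_match(case1, "eth0", "10.0.1.2"))
```

With this layout the model picks 10.10.10.10 for a probe towards 10.0.1.2, because eth0 itself carries no address and the fallback does no subnet matching.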

Here is a confirmation captured in a lab environment that reflects the scheme
described above.
  
  reading from file arp-probe-bug.pcap, link-type EN10MB (Ethernet)
!--- PROBE phase begins ---
13:28:57.395181 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.2 tell 10.10.10.10, length 28
13:28:58.395257 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.2 tell 10.10.10.10, length 28
13:28:59.395207 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.2 tell 10.10.10.10, length 28
!--- FAILED phase begins ---
13:29:01.393739 08:00:27:3b:63:ae > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.2 tell 10.0.1.1, length 28
13:29:01.393862 0a:00:27:00:00:00 > 08:00:27:3b:63:ae, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.0.1.2 is-at 0a:00:27:00:00:00, length 46

As can be seen from the capture, the router sends ARP with a source address
(10.10.10.10) that is NOT from the subnet where the destination address 10.0.1.2
resides (that subnet is configured on Lo255).

This causes renewal of the NUD entry to go into the FAILED state on the Linux
router, and neighbour resolution starts again from the beginning; at that point
the 'src' parameter associated with the /32 route is honoured and the address
is resolved successfully.

Hosts PC1-3 ignored the ARP requests with the "fake" source IP address (Linux
behavior). Network equipment from other vendors (especially security devices)
may treat this as an attack and take configured actions.

See attachment arp-case-1.tar.xz for the network stack configuration of the Linux router.

2. The next case is closely related to the first, but the configuration has changed:

 PC1              |             |
  ip: 10.0.1.2/24 |             | Linux Router
  gw: 10.0.1.1    |--------+    | Lo0: 10.10.10.10/32
                           |    | Lo255: 10.0.2.1/24
 PC2                       +----| eth0: 10.0.1.1/24
  ip: 10.0.1.130/25 |      |    |       10.0.1.129/25
  gw: 10.0.1.129    |------+    |

We have overlapping address space (10.0.1.0/24 contains 10.0.1.128/25).
This can happen when a network has grown from 10.0.1.128/25 to 10.0.1.0/24
but not all hosts have been migrated to the new settings.

In this case we see the same behavior as in the first case, but the source
address is taken from eth0: it is the first address that matches the subnet
(10.0.1.1), even though a more specific one exists (the /25).
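
The overlap can be checked with a few lines. This is only an illustration using the eth0 addresses from the diagram; the variable names are arbitrary:

```python
import ipaddress

eth0 = [ipaddress.ip_interface("10.0.1.1/24"),
        ipaddress.ip_interface("10.0.1.129/25")]
dst = ipaddress.ip_address("10.0.1.130")

# Both configured subnets contain the destination ...
matching = [ifa for ifa in eth0 if dst in ifa.network]
# ... and the /25 lies entirely inside the /24.
nested = eth0[1].network.subnet_of(eth0[0].network)

# A first-match scan stops at the first entry, i.e. the /24 address.
first_match = matching[0].ip

print([str(i) for i in matching], nested, first_match)
```

Both addresses qualify as a source for 10.0.1.130, so the order in which they were configured decides the result when only the first match is taken.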

reading from file arp-probe-bug.pcap, link-type EN10MB (Ethernet)
!--- PROBE phase begins ---
14:15:59.716100 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.130 tell 10.0.1.1, length 28
14:16:00.715429 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.130 tell 10.0.1.1, length 28
14:16:01.715302 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.130 tell 10.0.1.1, length 28
!--- FAILED phase begins ---
14:16:03.713596 08:00:27:3b:63:ae > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.1.130 tell 10.0.1.129, length 28
14:16:03.713663 0a:00:27:00:00:00 > 08:00:27:3b:63:ae, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.0.1.130 is-at 0a:00:27:00:00:00, length 46

See attachment arp-case-2.tar.xz for network stack configuration of Linux router (sysctls identical to first case).

3. With the same scheme as in case 2, when running tracepath (traceroute) we
get the ICMP answer from 10.0.1.1 and NOT from 10.0.1.129 as expected.
  sysctl net/ipv4/icmp_errors_use_inbound_ifaddr = 1

# tracepath -n 10.0.2.2
 1:  10.0.1.130                                            0.130ms pmtu 1500
 1:  no reply // Here should be 10.0.1.129, but see next output from tcpdump

16:19:32.043941 08:00:27:3b:63:ae > 0a:00:27:00:00:00, ethertype IPv4 (0x0800), length 590: (tos 0xc0, ttl 64, id 61526, offset 0, flags [none], proto ICMP (1), length 576)
    10.0.1.1 > 10.0.1.130: ICMP time exceeded in-transit, length 556
        (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 1500)
    10.0.1.130.55610 > 10.0.2.2.44444: UDP, length 1472

--------------------------------------------------------------------------
A proposed patch is attached.

It extends inet_select_addr() in net/ipv4/devinet.c to look for the most
specific network configured on the interface. It was tested in our lab
environment and found to work correctly.
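
The idea of matching the most specific network can be sketched as follows. This is not the attached patch, only a minimal model of longest-prefix selection over the addresses of one interface; the function name `select_addr_most_specific` is hypothetical:

```python
import ipaddress

def select_addr_most_specific(addrs, dst):
    """Return the local address whose subnet contains dst with the longest
    prefix, or None when no configured subnet contains dst."""
    target = ipaddress.ip_address(dst)
    best = None
    for text in addrs:
        ifa = ipaddress.ip_interface(text)
        if target not in ifa.network:
            continue
        # Keep the candidate with the longest (most specific) prefix.
        if best is None or ifa.network.prefixlen > best.network.prefixlen:
            best = ifa
    return str(best.ip) if best else None

eth0 = ["10.0.1.1/24", "10.0.1.129/25"]
print(select_addr_most_specific(eth0, "10.0.1.130"))
print(select_addr_most_specific(eth0, "10.0.1.2"))
```

For the case 2 layout this model answers 10.0.1.129 for PC2 (10.0.1.130) and 10.0.1.1 for PC1 (10.0.1.2), independent of the order in which the addresses were added.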

The attached arp-probe-bug.bsh script can be used to reproduce the bug.

In general: even if inet_select_addr() is updated with our patch, there is still
one place that breaks source address selection for kernel-generated IPv4
traffic:

ip_route_output_slow() (net/ipv4/route.c)
FIB_RES_PREFSRC()      (include/net/ip_fib.h)

In fib_info_update_nh_saddr() in net/ipv4/fib_semantics.c, inet_select_addr()
is called with dst == nh->nh_gw, but if the network is directly connected then
nh_gw == 0, and we get the first available address instead of one selected by
the packet destination.

However, this happens only when the route doesn't have a prefsrc ('src'
parameter) associated with it.
Comment 1 Sergey Popovich 2012-10-22 13:53:57 UTC
Created attachment 84281 [details]
ARP case 1 system configuration and tcpdump captures
Comment 2 Sergey Popovich 2012-10-22 13:54:39 UTC
Created attachment 84291 [details]
ARP case 2 system configuration and tcpdump captures
Comment 3 Sergey Popovich 2012-10-22 13:55:11 UTC
Created attachment 84301 [details]
bash script to reproduce bug
Comment 4 Sergey Popovich 2012-10-22 13:56:16 UTC
Created attachment 84311 [details]
ICMP Reply on net/ipv4/icmp_errors_use_inbound_ifaddr (related)
Comment 5 Alan 2012-10-23 10:22:53 UTC
Please post a summary of this to netdev@vger.kernel.org (you don't need to be subscribed). We use bugzilla for tracking reported bugs rather than for necessarily fixing them.
