Bug 199573 - VRF: ICMPV6 Echo Reply failed to egress if ingress pkt Src is IPV6 Global and Dest is IPV6 Link Local.
Summary: VRF: ICMPV6 Echo Reply failed to egress if ingress pkt Src is IPV6 Global an...
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV6 (show other bugs)
Hardware: All Linux
: P1 blocking
Assignee: Hideaki YOSHIFUJI
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-30 12:53 UTC by Sukumar
Modified: 2019-06-19 12:27 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.14.28
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Sukumar 2018-04-30 12:53:34 UTC
VRF:  ICMPV6 Echo Reply failed to egress if ingress pkt Src is IPV6 Global and Dest is IPV6 Link Local.

CONFIGURATION AND PACKET FLOW:
==============================

1) Created VRF device(Vrf_258) and enslaved network device(v1_F4246) to this
VRF. Assigned Global address 2001::1 and fe80::1 to VLAN device.

/exos/bin # ifconfig -a v1_F4246
v1_F4246  Link encap:Ethernet  HWaddr 00:04:96:98:C9:18  
          inet6 addr: 2001::1/64 Scope:Global
          inet6 addr: fe80::1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:52 errors:0 dropped:0 overruns:0 frame:0
          TX packets:98 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3024 (2.9 KiB)  TX bytes:8116 (7.9 KiB)

/exos/bin # ifconfig -a vrf_258 
vrf_258   Link encap:Ethernet  HWaddr 00:04:96:98:C9:18  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP RUNNING NOARP MASTER  MTU:65536  Metric:1
          RX packets:27 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1512 (1.4 KiB)  TX bytes:0 (0.0 B)

/exos/bin # ip link show vrf_258
14: vrf_258: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:04:96:98:c9:18 brd ff:ff:ff:ff:ff:ff

/exos/bin # ip link show v1_F4246
54: v1_F4246: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrf_258 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 00:04:96:98:c9:18 brd ff:ff:ff:ff:ff:ff


2) Received Ping request to local intf IP 2001::1 from neigbour DUT with following SRC/Dest.

  Rx ICMPV6 Echo Req Pkt's SRC IP = fe800000:0:0:2, DEST IP =  20010000:0:0:1

3) Incoming packet SRC is IPv6 Global address so skb->dev changed to vrf_258 by ip receive function.

4) Since it is local delivered packet  icmpv6_echo_reply() is invoked. This function tries to send reply with
   SRC IP = 20010000:0:0:1 
   DEST IP =  fe800000:0:0:2
   flowi6_oif = vrf_dev258

5) ip6_dst_lookup() is failed for this packet and got dropped.
 
 fl6.flowi6_oif = icmp6_iif(skb); 

 flowi6_oif is having vrf device index when ip6_dst_lookup() is invoked.

Pkt flows to ip6_dst_lookup() -> ip6_dst_lookup_tail()->ip6_route_output_flags()->l3mdev_link_scope_lookup()->vrf_link_scope_lookup()
 
There is a check in  vrf_link_scope_lookup() which is dropping the reply packet.

In this case flowi6_oif and dev->ifindex are same.

..
if (fl6->flowi6_oif == dev->ifindex) {
                dst = &net->ipv6.ip6_null_entry->dst;
                dst_hold(dst);
                return dst;
        }
..


TEMP FIX:
=========

Return Ingress vlan device's ifIndex instead of skb->dev's.

static int icmp6_iif(const struct sk_buff *skb)
{
        int iif = IP6CB(skb)->iif; // skb->dev->ifindex;

..
..
}
Comment 1 Sukumar 2018-05-08 10:13:49 UTC
Hi David,

Thanks for your suggested fix.

 Ping response with Src Global and Dest Link-Local is a valid
scenario. In many circumstances we check ping to Global address from
the interface which have only link-local configured.

 I tried your changes with small correction. I am trying to send ping
response packet out with Src Global and Dest Link local so the check
should be


/* Link local, Multicast should not have vrf device as egress */
if (fl6->flowi6_oif == dev->ifindex && rt6_need_strict(&fl6->saddr)) {

  ..

  return dst;

}

After adding the corrected check the packet proceed further.

Now rtlookup is failing with oif as vrf device (Src=20010000:0:0:1,
Dest=fe800000:0:0:2, oif=vrf_258).

rt = vrf_ip6_route_lookup(net, dev, fl6, fl6->flowi6_oif, flags);

vrf_ip6_route_lookup() calls ip6_pol_route() which returned error.

 Routing table
==========

 /exos/bin #   ip -6 route show table 258
anycast 2001:: dev v1_F4246 proto kernel metric 0 pref medium
local 2001::1 dev v1_F4246 proto kernel metric 0 pref medium
2001::/64 dev v1_F4246 proto kernel metric 256 pref medium
anycast fe80:: dev v1_F4246 proto kernel metric 0 pref medium
local fe80::1 dev v1_F4246 proto kernel metric 0 pref medium
fe80::/64 dev v1_F4246 proto kernel metric 256 pref medium
ff00::/8 dev v1_F4246 metric 256 pref medium


To make route look pass set FLOWI_FLAG_SKIP_NH_OIF to
fl6->flowi6_flags in vrf_link_scope_lookup() routine for the global
source address.


Proposed code changes:
===================

changes in-line as "Fix"


static struct dst_entry *vrf_link_scope_lookup(const struct net_device *dev,
                                              struct flowi6 *fl6)
{
..
..

       /* VRF device does not have a link-local address and
         * sending packets to link-local or mcast addresses over
         * a VRF device does not make sense
         */
        if (fl6->flowi6_oif == dev->ifindex &&
rt6_need_strict(&fl6->saddr)) {     => Fix1<=
         ..
         ..
        }

        if (!ipv6_addr_any(&fl6->saddr))
               flags |= RT6_LOOKUP_F_HAS_SADDR;

        => Fix2 start here<=
       /* Skip NH vrf oif for route lookup */
       if (fl6->flowi6_oif == dev->ifindex) {
          fl6->flowi6_flags  |= FLOWI_FLAG_SKIP_NH_OIF;
       }
       => Fix2 end here<=
      ..
      ..


}


On Fri, May 4, 2018 at 9:02 AM, David Ahern <dsahern@gmail.com> wrote:
> On 4/30/18 6:58 AM, Sukumar Gopalakrishnan wrote:
>> VRF: ICMPV6 Echo Reply failed to egress if ingress pkt Src is IPV6
>> Global and Dest is IPV6 Link Local.
>
> ...
>
>> if (fl6->flowi6_oif == dev->ifindex) {
>
> try adding ' && !rt6_need_strict(saddr)' to the above.
>
> If it works, add a comment above the line noting this case (link local
> dest and global source) submit a patch.
>
>
>>                 dst = &net->ipv6.ip6_null_entry->dst;
>>                 dst_hold(dst);
>>                 return dst;
>>         }
>> ..
>>
>>
>> TEMP FIX:
>> =========
>>
>> Return Ingress vlan device's ifIndex instead of skb->dev's.
>>
>> static int icmp6_iif(const struct sk_buff *skb)
>> {
>>         int iif = IP6CB(skb)->iif; // skb->dev->ifindex;
>>
>> ..
>> ..
>> }
>>
>
Comment 2 Stephen Suryaputra 2019-06-18 16:06:19 UTC
This is fixed in the following commit in net tree:

commit e1ae5c2ea478
Author: Stephen Suryaputra <ssuryaextr@gmail.com>
Date:   Mon Jun 10 10:32:50 2019 -0400

    vrf: Increment Icmp6InMsgs on the original netdev

    Get the ingress interface and increment ICMP counters based on that
    instead of skb->dev when the the dev is a VRF device.

    This is a follow up on the following message:
    https://www.spinics.net/lists/netdev/msg560268.html

    v2: Avoid changing skb->dev since it has unintended effect for local
        delivery (David Ahern).
    Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
    Reviewed-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
Comment 3 Stephen Suryaputra 2019-06-19 12:27:27 UTC
(In reply to Stephen Suryaputra from comment #2)
> This is fixed in the following commit in net tree:
> 
> commit e1ae5c2ea478
> Author: Stephen Suryaputra <ssuryaextr@gmail.com>
> Date:   Mon Jun 10 10:32:50 2019 -0400
> 
>     vrf: Increment Icmp6InMsgs on the original netdev
> 
>     Get the ingress interface and increment ICMP counters based on that
>     instead of skb->dev when the the dev is a VRF device.
> 
>     This is a follow up on the following message:
>     https://www.spinics.net/lists/netdev/msg560268.html
> 
>     v2: Avoid changing skb->dev since it has unintended effect for local
>         delivery (David Ahern).
>     Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
>     Reviewed-by: David Ahern <dsahern@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

My apology. This commit alone doesn't fix the bug.

Note You need to log in before you can comment on or make changes to this bug.