Bug 215105 - Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables rules
Summary: Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 ...
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV6 (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Hideaki YOSHIFUJI
URL:
Keywords: trivial
Depends on:
Blocks:
 
Reported: 2021-11-22 15:10 UTC by qtmlabs
Modified: 2021-11-30 05:42 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.4-rc1 and later
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description qtmlabs 2021-11-22 15:10:46 UTC
The kernel leaks memory when a `fib` rule is present in ipv6 nftables firewall rules and a suppress_prefix rule is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming packet will leak an allocation in ip6_dst_cache slab cache.

After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to this commit: https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26

The problem with that change is that the generic args->flags always have FIB_LOOKUP_NOREF set[1][2] but the ip6-specific flag RT6_LOOKUP_F_DST_NOREF might not be, leading to fib6_rule_suppress not decreasing the refcount when needed.

How to reproduce:
 - Add the following nftables rule to a prerouting chain: `meta nfproto ipv6 fib saddr . mark . iif oif missing drop`. Can be done with:
    sudo nft create table inet test
    sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
    sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
 - Run `sudo ip -6 rule add table main suppress_prefixlength 0`
 - Watch `sudo slabtop -o | grep ip6_dst_cache` memory usage increase with every incoming ipv6 packet

Suggested fix: Expose the protocol-specific flags to the protocol specific `suppress` function, and check the protocol-specific `flags` argument for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount, like this: https://gist.github.com/msizanoen1/36a2853467a9bd34fadc5bb3783fde0f

[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
[2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
Comment 1 Jason A. Donenfeld 2021-11-22 15:28:47 UTC
Can you post the patch with your analysis message in the body of it to netdev@vger.kernel.org, and include a proper Signed-off-by line on the patch? That will garner the eyeballs of a lot more people and will result in this being fixed most likely.
Comment 2 qtmlabs 2021-11-22 23:48:40 UTC
A Signed-off-by line requires disclosing your legal name, which I am not willing to disclose. However the patch is trivial and uncopyrightable, and is the only (obvious) way to fix it. What's the recommended course of action in this case?
Comment 3 Jason A. Donenfeld 2021-11-22 23:59:06 UTC
Oh, shucks, okay. This really does look like a worthwhile patch with valuable analysis. I'll ask people more knowledgeable about process stuff what to do here.
Comment 4 Jason A. Donenfeld 2021-11-23 12:51:25 UTC
Alright, posted it on your behalf: https://lore.kernel.org/netdev/20211123124832.15419-1-Jason@zx2c4.com/

If you want to tweak this or send different revisions, I'll sign off and remail whatever you need.
Comment 5 Jason A. Donenfeld 2021-11-29 18:39:05 UTC
Nobody really chimed in after I mentioned that it should be reviewed, but it got merged anyway, so: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=cdef485217d30382f3bf6448c54b4401648fe3f1
Comment 6 Jason A. Donenfeld 2021-11-29 18:39:22 UTC
You can probably close this bug report and mark it as resolved.

Note You need to log in before you can comment on or make changes to this bug.