Bug 211911
Summary: | Panic in skb_find_text(), regression in 5.11 vs 5.10 | ||
---|---|---|---|
Product: | Networking | Reporter: | Kris Karas (bugs-a21) |
Component: | Netfilter/Iptables | Assignee: | networking_netfilter-iptables (networking_netfilter-iptables) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | willemb |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.11 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Script to trigger the bug on vulnerable systems |
Description
Kris Karas
2021-02-24 06:54:15 UTC
Got it via bisect: 97550f6fa59254435d864b92603de3ca4b5a99f8 is the first bad commit commit 97550f6fa59254435d864b92603de3ca4b5a99f8 Author: Willem de Bruijn <willemb@google.com> Date: Sat Jan 9 17:18:33 2021 -0500 net: compound page support in skb_seq_read skb_seq_read iterates over an skb, returning pointer and length of the next data range with each call. It relies on kmap_atomic to access highmem pages when needed. An skb frag may be backed by a compound page, but kmap_atomic maps only a single page. There are not enough kmap slots to always map all pages concurrently. Instead, if kmap_atomic is needed, iterate over each page. As this increases the number of calls, avoid this unless needed. The necessary condition is captured in skb_frag_must_loop. I tried to make the change as obvious as possible. It should be easy to verify that nothing changes if skb_frag_must_loop returns false. Tested: On an x86 platform with CONFIG_HIGHMEM=y CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y CONFIG_NETFILTER_XT_MATCH_STRING=y Run ip link set dev lo mtu 1500 iptables -A OUTPUT -m string --string 'badstring' -algo bm -j ACCEPT dd if=/dev/urandom of=in bs=1M count=20 nc -l -p 8000 > /dev/null & nc -w 1 -q 0 localhost 8000 < in Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> include/linux/skbuff.h | 1 + net/core/skbuff.c | 28 +++++++++++++++++++++++----- 2 files changed, 24 insertions(+), 5 deletions(-) Added Willem to the CC list, as he is the patch author. Additional notes: This machine that is effected is x86_64, so CONFIG_HIGHMEM does not apply. Similarly, CONFIG_DEBUG_KMAP=n Thanks for the report. Apologies for the issue. To be clear, your config does not have CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y, right? I structured the patch to be as obvious as possible a NOOP when skb_frag_must_loop returns false. Your stack shows locally generated UDP replies. These are normal packets, smaller than MTU, no UDP_GSO? I would expect those to be wholly linear, no frags whatsoever. On top of that, skb_find_text even bails after offset 32. Hi Willem, Indeed, all of the KMAP debug options are set to false. As this is a border router, kernel config options are set rather conservatively. It has a Ryzen 5 CPU; and the ethernet interface in question reports itself as a RTL 8168h/8111h (r8169 kernel driver). I'm not positive about the GSO; I looked for any references to UDP_SEGMENT within the ISC BIND source, and found none, so blindly assume "no" there. CONFIG_NET_MPLS is unset, as is the _GSO. The packets themselves are small. The incoming queries are 75 bytes in length, the outgoing response 70 bytes (including the ethernet frame). Well below MTU (which is at a default 1500). No fragmentation was reported by wireshark in any of the packet captures. As for skb_find_text bailing at offset 32 (I assume the IP, not the frame), that's right at the end of the search target, which is offsets 30 through 31. Thanks for helping with this! Kris Thanks for the extra details. Let me try to reproduce it locally. The __stack_chk_fail means a stack overflow was discovered. The odd thing is that the patch really only modifies code that touches skb frags. Which these packets should not have. I agree that this sounds odd, given that no frags are involved. I should have been more rigorous, and recompiled 5.11.1 with just this one commit backed out, to prove that the "git bisect" didn't pick the wrong commit. I'll try that later today after my 8yo is done with her Zoom school; don't want to crash her video feed. :-) Just to follow up on comment 6, my router is now running vanilla 5.11.1 with 97550f6fa592 backed out, has seen the iptables string match rule hit 161 times as I pen this, and no OOPSen or other dmesg output has occurred. So this is definitely the errant patch, even though it doesn't sound likely given the gating factors mentioned in comment 3. That's pretty clear. I hope no one's zoom school was impacted :) I haven't gotten to writing a reproducer yet. Will do over the weekend and get back. Created attachment 295499 [details]
Script to trigger the bug on vulnerable systems
Willem, I've written a test script for you that demonstrates the bug, uploaded above. It uses "nping" (usually bundled with the "nmap" package) to generate raw test packets, and also iptables (or ip6tables) to dynamically create a rule to catch the nping output. Script is a bit verbose with sanity-checking, but should clean up after itself. Kris Thanks a lot, Kris! Looking into it right now. The nping reproducer will help a lot. Commit 97550f6fa592 ("net: compound page support in skb_seq_read") adds 4B field frag_off to skb_seq_state, growing it from 40 to 48B on 64-bit platforms. This state is stored in the control block, as shown by TS_SKB_CB(). I assumed that this was like other .._SKB_CB macros and points to skb->cb[], which is 48B. But in skb_find_text it points into stack allocated ts_state: struct ts_state { unsigned int offset; char cb[40]; }; where cb is 40 bytes, so the write to frag_off corrupts the stack. At first attempt this fixes it: struct ts_state { unsigned int offset; - char cb[40]; + char cb[40]; }; Will take a closer look, and write it in such a way that this does not happen again next time skb_seq_state grows. the from 40 to 48 is not a typo: 4B field + 4B padding: $ pahole -C skb_seq_state net/core/skbuff.o struct skb_seq_state { __u32 lower_offset; /* 0 4 */ __u32 upper_offset; /* 4 4 */ __u32 frag_idx; /* 8 4 */ __u32 stepped_offset; /* 12 4 */ struct sk_buff * root_skb; /* 16 8 */ struct sk_buff * cur_skb; /* 24 8 */ __u8 * frag_data; /* 32 8 */ __u32 frag_off; /* 40 4 */ /* size: 48, cachelines: 1, members: 8 */ /* padding: 4 */ /* last cacheline: 48 bytes */ }; I'm being an idiot here, but, would something such as: struct ts_state { unsigned int offset; char cb[sizeof(struct skb_seq_state)]; }; work? Yes it would. Not an idiotic suggestion! But I think the textsearch infra in include/linux/textsearch.h is intended not to depend on skbuffs. Even if in practice it has no other user. We can avoid introducing the dependency by doing the reverse where both structs are known. In skb_find_text: BUILD_BUG_ON(sizeof(struct skb_seq_state) > sizeof(state.cb)); I'd rather get rid of the weird untyped state.cb altogether. Taking a look at that. But likely comment 12 + this is the shortest patch and therefore most suitable solution for stable branches. That's an honorable goal, to keep unexpected dependencies out of the kernel. Kudos for that! I can say with a bit of a laugh that, having run custom-configured kernels in production environments since late 1993, that the kernel is resplendent with unexpected dependencies. Reading the code, there are zero comments that accompany skb_seq_state in include/linux/skbuff.h, which is to say, not one reference to downstream users requiring synchronous code updates. I think each owner of downstream code that embeds anonymous and undocumented size_t's into their own code owes it to the community to add a comment before the upstream structure, something like this: /* When adding, removing, or re-arranging struct skb_seq_state, * please update "char cb[40];" within "struct ts_state" in file * include/linux/textsearch.h so that the "40" matches your new * sizeof(struct skb_seq_state). -- John Q. Maintainer */ That way, authors who need to update skb_seq_state will know what else they need to tweak when writing the patch. As it is, it is maddening that there is no definitive way to know who else's code you need to edit. Rant aside, I wonder if there are other undocumented users of sizeof(struct skb_seq_state) besides textsearch? In particular, in 5.11, netconsole doesn't work on any one of my systems. I see several that all report, do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 On the system I'm writing this on now, I see this in the logs: springdale kernel: igb_poll+0x0/0x1240 exceeded budget in poll [...] springdale kernel: Call Trace: springdale kernel: __netpoll_send_skb+0x1d1/0x230 springdale kernel: netpoll_send_skb+0x11/0x30 springdale kernel: br_dev_xmit+0x248/0x3e0 springdale kernel: netpoll_start_xmit+0x110/0x1b0 springdale kernel: __netpoll_send_skb+0x14b/0x230 springdale kernel: netpoll_send_udp+0x2b3/0x3f0 springdale kernel: write_msg+0x121/0x140 [...] Might be just a coincidence, something else affecting socket buffers. That sounds like a separate unrelated issue. The BUILD_BUG_ON is intended to be a more strict, because also machine verifiable, version of documentation. Coupled with git blame pointing to a git commit message that can be much more detailed than inline comments. When I send a patch, I'll add you as Reported-by: . Let me know if you would rather I do not. The commit has been merged in time for 5.11.3 Where the commit is " author Willem de Bruijn <willemb@google.com> 2021-03-01 15:09:44 +0000 committer David S. Miller <davem@davemloft.net> 2021-03-01 15:25:24 -0800 commit b228c9b058760500fda5edb3134527f629fc2dc3 (patch) tree da9c298ac2224e735f2d0f56c28a48e7324c8701 parent 2353db75c3db1dd26ff9c8feccfd3543a9cb73be (diff) net: expand textsearch ts_state to fit skb_seq_state The referenced commit expands the skb_seq_state used by skb_find_text with a 4B frag_off field, growing it to 48B. This exceeds container ts_state->cb, causing a stack corruption: [ 73.238353] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: skb_find_text+0xc5/0xd0 [ 73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4 [ 73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 73.260078] Call Trace: [ 73.264677] dump_stack+0x57/0x6a [ 73.267866] panic+0xf6/0x2b7 [ 73.270578] ? skb_find_text+0xc5/0xd0 [ 73.273964] __stack_chk_fail+0x10/0x10 [ 73.277491] skb_find_text+0xc5/0xd0 [ 73.280727] string_mt+0x1f/0x30 [ 73.283639] ipt_do_table+0x214/0x410 The struct is passed between skb_find_text and its callbacks skb_prepare_seq_read, skb_seq_read and skb_abort_seq read through the textsearch interface using TS_SKB_CB. I assumed that this mapped to skb->cb like other .._SKB_CB wrappers. skb->cb is 48B. But it maps to ts_state->cb, which is only 40B. skb->cb was increased from 40B to 48B after ts_state was introduced, in commit 3e3850e989c5 ("[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder"). Increase ts_state.cb[] to 48 to fit the struct. Also add a BUILD_BUG_ON to avoid a repeat. The alternative is to directly add a dependency from textsearch onto linux/skbuff.h, but I think the intent is textsearch to have no such dependencies on its callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911 Fixes: 97550f6fa592 ("net: compound page support in skb_seq_read") Reported-by: Kris Karas <bugs-a17@moonlit-rail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> " https://patchwork.kernel.org/project/netdevbpf/patch/20210301150944.138500-1-willemdebruijn.kernel@gmail.com/ Excellent. Thank you, Willem! I am running the diff against the vanilla 5.11.2, and just as I have started typing, the text match has hit 50 times with no complaints. Shall we label this bug as Closed/Code-Fix ? Thanks for verifying Kris. Yes, sounds good. It appears that this patch, while in netdev, has not made it to 5.11. Both 5.11.3 and now 5.11.4 are missing it. Can we push this to Greg KH's queue? Good catch. The patch is in the stable queue, waiting to be merged https://patchwork.kernel.org/bundle/netdev/stable/?state=* For details on stable backports for networking, see also https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html (I had to myself, to recall the URL of the stable queue) |