Bug 218033
Summary: | kernel tried to execute NX-protected page - exploit attempt? (uid: 0) | ||
---|---|---|---|
Product: | Memory Management | Reporter: | CM76 (cmaff76) |
Component: | Page Allocator | Assignee: | Andrew Morton (akpm) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | bagasdotme |
Priority: | P3 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg.202310211711, dmesg.202310180543, dmesg.202310221752 | ||
Description
CM76
2023-10-21 17:31:43 UTC
Created attachment 305274 [details]
dmesg.202310211711
I managed to load the dump in crash on another machine. I also attached the dmesg of the crash dump that happened when I was running the Mainline/Stable version Kernel v6.5. I attached the same two dmesg earlier by mistake.

-------------------
crash> set -p
    PID: 0
COMMAND: "swapper/1"
   TASK: ffff9441c0958000  (1 of 4)  [THREAD_INFO: ffff9441c0958000]
    CPU: 1
  STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0  TASK: ffff9441c0958000  CPU: 1  COMMAND: "swapper/1"
 #0 [ffffb3c380138610] machine_kexec at ffffffff9acafa3b
 #1 [ffffb3c380138670] __crash_kexec at ffffffff9ae133f3
 #2 [ffffb3c380138738] crash_kexec at ffffffff9ae14de2
 #3 [ffffb3c380138748] oops_end at ffffffff9ac52131
 #4 [ffffb3c380138770] page_fault_oops at ffffffff9acc77b0
 #5 [ffffb3c3801387d0] kernelmode_fixup_or_oops at ffffffff9acc7962
 #6 [ffffb3c380138810] __bad_area_nosemaphore at ffffffff9acc7ba5
 #7 [ffffb3c380138868] bad_area_nosemaphore at ffffffff9acc7ce6
 #8 [ffffb3c380138878] do_kern_addr_fault at ffffffff9acc7d8b
 #9 [ffffb3c3801388a0] exc_page_fault at ffffffff9bd41864
#10 [ffffb3c3801388d0] asm_exc_page_fault at ffffffff9be00bc7
    [exception RIP: unknown or invalid address]
    RIP: ffff9441c9734458  RSP: ffffb3c380138980  RFLAGS: 00010282
    RAX: ffff9441c9734458  RBX: ffff9441c9734400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff9441c9734400
    RBP: ffffb3c380138990   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff9441c9734400
    R13: 00000000000005dc  R14: ffff9441c49dda00  R15: ffffffff9e55ec40
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffffb3c380138980] skb_release_head_state at ffffffff9ba16117
#12 [ffffb3c380138998] consume_skb at ffffffff9ba18c13
#13 [ffffb3c3801389b0] tcp_mtu_probe at ffffffff9bb26405
#14 [ffffb3c380138a00] tcp_write_xmit at ffffffff9bb269f9
#15 [ffffb3c380138a68] __tcp_push_pending_frames at ffffffff9bb26f77
#16 [ffffb3c380138a88] tcp_rcv_established at ffffffff9bb1edf4
#17 [ffffb3c380138ad8] tcp_v4_do_rcv at ffffffff9bb30169
#18 [ffffb3c380138b00] tcp_v4_rcv at ffffffff9bb32482
#19 [ffffb3c380138b80] ip_protocol_deliver_rcu at ffffffff9baf424c
#20 [ffffb3c380138bb8] ip_local_deliver_finish at ffffffff9baf44a7
#21 [ffffb3c380138bd8] ip_local_deliver at ffffffff9baf454e
#22 [ffffb3c380138c38] ip_sublist_rcv_finish at ffffffff9baf467f
#23 [ffffb3c380138c58] ip_sublist_rcv at ffffffff9baf4811
#24 [ffffb3c380138ce0] ip_list_rcv at ffffffff9baf4c62
#25 [ffffb3c380138d48] __netif_receive_skb_list_core at ffffffff9ba3d12d
#26 [ffffb3c380138dc8] netif_receive_skb_list_internal at ffffffff9ba3d763
#27 [ffffb3c380138e38] napi_complete_done at ffffffff9ba3df24
#28 [ffffb3c380138e68] bnx2_poll_msix at ffffffffc02cb121 [bnx2]
#29 [ffffb3c380138ea0] __napi_poll at ffffffff9ba3e0b3
#30 [ffffb3c380138ed8] net_rx_action at ffffffff9ba3e631
#31 [ffffb3c380138f60] __do_softirq at ffffffff9bd5a349
#32 [ffffb3c380138fd0] __irq_exit_rcu at ffffffff9acff925
#33 [ffffb3c380138fe0] irq_exit_rcu at ffffffff9acffc7e
#34 [ffffb3c380138ff0] common_interrupt at ffffffff9bd3d724
--- <IRQ stack> ---
#35 [ffffb3c3800cbd68] asm_common_interrupt at ffffffff9be00e27
    [exception RIP: cpuidle_enter_state+218]
    RIP: ffffffff9bd4239a  RSP: ffffb3c3800cbe18  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff9442f7c7ec00  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffb3c3800cbe68   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff9d0d24a0
    R13: 0000000000000003  R14: 0000000000000003  R15: 00000afefc75867b
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#36 [ffffb3c3800cbe70] cpuidle_enter at ffffffff9b9ac56e
#37 [ffffb3c3800cbe98] call_cpuidle at ffffffff9ad68843
#38 [ffffb3c3800cbea8] cpuidle_idle_call at ffffffff9ad6e0fd
#39 [ffffb3c3800cbee8] do_idle at ffffffff9ad6e202
#40 [ffffb3c3800cbf08] cpu_startup_entry at ffffffff9ad6e48d
#41 [ffffb3c3800cbf20] start_secondary at ffffffff9ac9e6c9
#42 [ffffb3c3800cbf50] secondary_startup_64_no_verify at ffffffff9ac00263

crash> dis -rl 0xffff9441c9734458
dis: WARNING: ffff9441c9734458: no associated kernel symbol found
0xffff9441c9734458:  add    %al,(%rax)

crash> kmem 0xffff9441c9734458
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff9441c0e46c00      512       1114   1504     94     8k  skbuff_fclone_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffdf6ec425cd00  ffff9441c9734000     0     16         15     1
  FREE / [ALLOCATED]
  [ffff9441c9734400]

PAGE              PHYSICAL   MAPPING           INDEX  CNT  FLAGS
ffffdf6ec425cd00  109734000  dead000000000004      0    1  17ffffc0010200 slab,head
crash>
----------------------

Created attachment 305276 [details]
dmesg.202310180543
This probably has nothing to do with the Broadcom bnx2 driver: the server crashed with "tx-nocache-copy" set to off. I added the dmesg as an attachment; the backtrace and the kmem output for the RIP address are below. I ran qbittorrent on a different server with the same hardware config back in June this year. That server was running mainline/stable kernel 6.3.x, then 6.4.0, and it never rebooted once.

----------------------------
crash> bt
PID: 0  TASK: ffff90b60095b300  CPU: 1  COMMAND: "swapper/1"
 #0 [ffffb9fdc0138610] machine_kexec at ffffffffa48afa3b
 #1 [ffffb9fdc0138670] __crash_kexec at ffffffffa4a133f3
 #2 [ffffb9fdc0138738] crash_kexec at ffffffffa4a14de2
 #3 [ffffb9fdc0138748] oops_end at ffffffffa4852131
 #4 [ffffb9fdc0138770] page_fault_oops at ffffffffa48c77b0
 #5 [ffffb9fdc01387d0] kernelmode_fixup_or_oops at ffffffffa48c7962
 #6 [ffffb9fdc0138810] __bad_area_nosemaphore at ffffffffa48c7ba5
 #7 [ffffb9fdc0138868] bad_area_nosemaphore at ffffffffa48c7ce6
 #8 [ffffb9fdc0138878] do_kern_addr_fault at ffffffffa48c7d8b
 #9 [ffffb9fdc01388a0] exc_page_fault at ffffffffa5941864
#10 [ffffb9fdc01388d0] asm_exc_page_fault at ffffffffa5a00bc7
    [exception RIP: unknown or invalid address]
    RIP: ffff90b602a3ca58  RSP: ffffb9fdc0138980  RFLAGS: 00010282
    RAX: ffff90b602a3ca58  RBX: ffff90b602a3ca00  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff90b602a3ca00
    RBP: ffffb9fdc0138990   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff90b602a3ca00
    R13: 00000000000005c8  R14: ffff90b6035f4800  R15: ffffffffa815ec40
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffffb9fdc0138980] skb_release_head_state at ffffffffa5616117
#12 [ffffb9fdc0138998] consume_skb at ffffffffa5618c13
#13 [ffffb9fdc01389b0] tcp_mtu_probe at ffffffffa5726405
#14 [ffffb9fdc0138a00] tcp_write_xmit at ffffffffa57269f9
#15 [ffffb9fdc0138a68] __tcp_push_pending_frames at ffffffffa5726f77
#16 [ffffb9fdc0138a88] tcp_rcv_established at ffffffffa571edf4
#17 [ffffb9fdc0138ad8] tcp_v4_do_rcv at ffffffffa5730169
#18 [ffffb9fdc0138b00] tcp_v4_rcv at ffffffffa5732482
#19 [ffffb9fdc0138b80] ip_protocol_deliver_rcu at ffffffffa56f424c
#20 [ffffb9fdc0138bb8] ip_local_deliver_finish at ffffffffa56f44a7
#21 [ffffb9fdc0138bd8] ip_local_deliver at ffffffffa56f454e
#22 [ffffb9fdc0138c38] ip_sublist_rcv_finish at ffffffffa56f467f
#23 [ffffb9fdc0138c58] ip_sublist_rcv at ffffffffa56f4811
#24 [ffffb9fdc0138ce0] ip_list_rcv at ffffffffa56f4c62
#25 [ffffb9fdc0138d48] __netif_receive_skb_list_core at ffffffffa563d12d
#26 [ffffb9fdc0138dc8] netif_receive_skb_list_internal at ffffffffa563d763
#27 [ffffb9fdc0138e38] napi_complete_done at ffffffffa563df24
#28 [ffffb9fdc0138e68] bnx2_poll_msix at ffffffffc056e121 [bnx2]
#29 [ffffb9fdc0138ea0] __napi_poll at ffffffffa563e0b3
#30 [ffffb9fdc0138ed8] net_rx_action at ffffffffa563e631
#31 [ffffb9fdc0138f60] __do_softirq at ffffffffa595a349
#32 [ffffb9fdc0138fd0] __irq_exit_rcu at ffffffffa48ff925
#33 [ffffb9fdc0138fe0] irq_exit_rcu at ffffffffa48ffc7e
#34 [ffffb9fdc0138ff0] common_interrupt at ffffffffa593d724
--- <IRQ stack> ---
#35 [ffffb9fdc00cbd68] asm_common_interrupt at ffffffffa5a00e27
    [exception RIP: cpuidle_enter_state+218]
    RIP: ffffffffa594239a  RSP: ffffb9fdc00cbe18  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff90b737c7ec00  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffb9fdc00cbe68   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffffa6cd24a0
    R13: 0000000000000004  R14: 0000000000000004  R15: 0000509709bfcf38
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#36 [ffffb9fdc00cbe70] cpuidle_enter at ffffffffa55ac56e
#37 [ffffb9fdc00cbe98] call_cpuidle at ffffffffa4968843
#38 [ffffb9fdc00cbea8] cpuidle_idle_call at ffffffffa496e0fd
#39 [ffffb9fdc00cbee8] do_idle at ffffffffa496e202
#40 [ffffb9fdc00cbf08] cpu_startup_entry at ffffffffa496e48d
#41 [ffffb9fdc00cbf20] start_secondary at ffffffffa489e6c9
#42 [ffffb9fdc00cbf50] secondary_startup_64_no_verify at ffffffffa4800263

crash> kmem ffff90b602a3ca58
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff90b600e46d00      512        771    864     54     8k  skbuff_fclone_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  fffffac1440a8f00  ffff90b602a3c000     0     16         12     4
  FREE / [ALLOCATED]
  [ffff90b602a3ca00]

PAGE              PHYSICAL   MAPPING           INDEX  CNT  FLAGS
fffffac1440a8f00  102a3c000  dead000000000001      0    1  17ffffc0010200 slab,head
crash>

Created attachment 305278 [details]
dmesg.202310221752
(In reply to CM76 from comment #0)
> I believe this is also an issue with the Broadcom bnx2 drivers since it only
> seems to happen when I enable "tx-nocache-copy" in ethtool.
>
> The issue started when I was running Mainline/stable Kernel v6.5.x on
> another machine. After googling a bit I landed on an article from Red Hat
> that pointed at the possibility of an issue caused by failing hardware. I
> was renting the server, so I didn't bother to file a bug report and assumed
> it was the server that was going bad. But then it happened again on my other
> server as soon as I switched the bittorrent client to the same one I was
> using on that other server. I turned "tx-nocache-copy" off and ran mainline
> kernel v6.5 (on Ubuntu 23.04) for a day or two without issue. After that I
> switched the kernel back to Ubuntu's kernel (v6.2) and the server ran for a
> couple more days without issue. Two days ago I turned "tx-nocache-copy" on
> again out of curiosity (kernel v6.2), and the server didn't run into any
> issue with this setting set to on. This morning I upgraded to Ubuntu 23.10,
> which runs their version of Kernel v6.5. The kernel panicked and the server
> rebooted a couple of hours later.
>

Please perform bisection (see Documentation/admin-guide/bug-bisect.rst in the kernel sources for how). Also, please test latest mainline (currently v6.6-rc7).

I reinstalled/re-provisioned my main server to go back to Ubuntu 23.04 (kernel v6.2.x) two days ago. I'll keep it running with its v6.2.x kernel for a couple more days before I try 6.6-RC and git bisect <v6.5.x. Thanks for your time.

I've been using 6.5.8 since yesterday and it hasn't crashed despite everything I threw at the server/bittorrent client. I wish I had known about "decode_stacktrace.sh" earlier; I wouldn't have overlooked the sparsely detailed "net:" patch when I got back to filing the bug report on Saturday. I started on the 18th but gave up as I couldn't load the crash dump from that day in the crash utility.

[...]
[88609.634236] ? asm_exc_page_fault (/build/linux-D15vQj/linux-6.5.0/arch/x86/include/asm/idtentry.h:570)
[88609.634249] ? skb_release_head_state (/build/linux-D15vQj/linux-6.5.0/include/linux/skbuff.h:4572 /build/linux-D15vQj/linux-6.5.0/net/core/skbuff.c:997)
[88609.634260] consume_skb (/build/linux-D15vQj/linux-6.5.0/net/core/skbuff.c:1007 (discriminator 1) /build/linux-D15vQj/linux-6.5.0/net/core/skbuff.c:1022 (discriminator 1) /build/linux-D15vQj/linux-6.5.0/net/core/skbuff.c:1238 (discriminator 1) /build/linux-D15vQj/linux-6.5.0/net/core/skbuff.c:1232 (discriminator 1))
[88609.634269] tcp_mtu_probe (/build/linux-D15vQj/linux-6.5.0/net/ipv4/tcp_output.c:2446)
[...]

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/ipv4/tcp_output.c?h=v6.5.9#n2446
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/net?h=v6.5.9&id=e8dc72cb8312c1175a832b2e69239a23e8f7d570

Thanks again.
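
For anyone cross-checking the two crash sessions in this report: in both dumps the faulting RIP equals RAX and lies inside the allocated skbuff_fclone_cache object that RDI/RBX point to. The short Python sketch below only redoes that address arithmetic from the values quoted in the bt and kmem output. The helper names (offset_in_object, summarize) are made up for illustration, and the slot index assumes objects are packed at a stride equal to OBJSIZE (8k slab / 16 objects = 512 bytes), which the kmem output suggests but does not state outright. It does not try to say which struct sk_buff field sits at that offset.

```python
# Addresses copied from the two `crash> bt` / `crash> kmem` sessions in this report.
OBJSIZE = 512  # OBJSIZE column for skbuff_fclone_cache in both kmem outputs

DUMPS = [
    # (label, faulting RIP, [ALLOCATED] object address, slab MEMORY base)
    ("first session", 0xFFFF9441C9734458, 0xFFFF9441C9734400, 0xFFFF9441C9734000),
    ("second session", 0xFFFF90B602A3CA58, 0xFFFF90B602A3CA00, 0xFFFF90B602A3C000),
]


def offset_in_object(rip, obj):
    """Byte offset of the faulting RIP from the start of the slab object."""
    return rip - obj


def summarize(dumps=DUMPS, objsize=OBJSIZE):
    """Return one dict per dump with the offset, a bounds check and the slot index."""
    rows = []
    for label, rip, obj, slab_base in dumps:
        offset = offset_in_object(rip, obj)
        rows.append({
            "label": label,
            "offset": offset,
            # True when the RIP points into the object's own memory.
            "inside": 0 <= offset < objsize,
            # Assumption: object stride equals OBJSIZE (hypothetical, see lead-in).
            "index": (obj - slab_base) // objsize,
        })
    return rows


RESULTS = summarize()
for row in RESULTS:
    print(f"{row['label']}: RIP is object+{row['offset']:#x}, "
          f"inside object: {row['inside']}, slab slot: {row['index']}")
```

Both sessions come out at the same offset, object+0x58, so the two crashes jumped to the same position inside the skb rather than to unrelated addresses.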