Bug 219247
Summary: | Segfault in Qt apps running on Linux kernel 6.10.8 ARM with LPAE | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Andrew (quark) |
Component: | Other | Assignee: | drivers_video-other |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | michal.pecio, triad |
Priority: | P3 | ||
Hardware: | ARM | ||
OS: | Linux | ||
Kernel Version: | 6.10 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg; gdb Kate on 6.10.1 and 6.9.12; More detailed dmesg; dmesg; config; Script used for build | ||
One way to track it down is to bisect: https://docs.kernel.org/admin-guide/bug-bisect.html
Sadly, I've not seen anything similar recently, so you could be the only one with the issue and the only one who can unwind it.

Created attachment 306829 [details]
gdb Kate on 6.10.1 and 6.9.12
Well, I compiled Linux kernel 6.10.1 and got the same problem, so the error appeared between 6.9.12 and 6.10.1. Since nothing similar has been reported recently, I assumed it was related to the hardware of my system. According to ChangeLog-6.10, there were some changes in the panfrost video driver, which is used on my system. I ran Kate under gdb on 6.10.1 and 6.9.12; the report is attached.

Created attachment 306833 [details]
More detailed dmesg.
The mwifiex errors also occurred with other versions of the Linux kernel; after several attempts it worked.
It seems my assumption about panfrost was wrong; at least it was not mentioned in the errors.
You really could bisect, because there are literally a dozen people in the world with the same hardware configuration.

git bisect start 0129910096573d08ecb139b20e2940682f248186 bb67b270b37e8bd9c96829d58ffe758635651e90
Bisecting: a merge base must be tested
[a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
Bisecting: 7202 revisions left to test after this (roughly 13 steps)
[33e02dc69afbd8f1b85a51d74d72f139ba4ca623] Merge tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 33e02dc69afbd8f1b85a51d74d72f139ba4ca623
Bisecting: 3570 revisions left to test after this (roughly 12 steps)
[29c73fc794c83505066ee6db893b2a83ac5fac63] Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
git bisect bad 29c73fc794c83505066ee6db893b2a83ac5fac63
Bisecting: 2006 revisions left to test after this (roughly 11 steps)
[0450d2083be6bdcd18c9535ac50c55266499b2df] Merge tag '6.10-rc-smb-fix' of git://git.samba.org/sfrench/cifs-2.6
git bisect bad 0450d2083be6bdcd18c9535ac50c55266499b2df
Bisecting: 839 revisions left to test after this (roughly 10 steps)
[b426433c03a6eb547515edbe74ebb3a90b9979dd] Merge tag 'mtd/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
git bisect good b426433c03a6eb547515edbe74ebb3a90b9979dd
Bisecting: 425 revisions left to test after this (roughly 9 steps)
[f0cd69b8cca6a5096463644d6dacc9f991bfa521] Merge tag 'random-6.10-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
git bisect bad f0cd69b8cca6a5096463644d6dacc9f991bfa521
Bisecting: 246 revisions left to test after this (roughly 8 steps)
[4853f1f6ace32c68a04287353e428c4cfc3fa8ed] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
git bisect bad 4853f1f6ace32c68a04287353e428c4cfc3fa8ed
Bisecting: 83 revisions left to test after this (roughly 6 steps)
[31456ffa7b73f2b46b07b9b863eed332d05f5c23] platform/x86: thinkpad_acpi: Use correct keycodes for volume and brightness keys
git bisect good 31456ffa7b73f2b46b07b9b863eed332d05f5c23
Bisecting: 41 revisions left to test after this (roughly 5 steps)
[484bae9e4d6acb5eec39e1ea47f9aa43f11b154d] platform/x86: Add new Dell UART backlight driver
git bisect good 484bae9e4d6acb5eec39e1ea47f9aa43f11b154d
Bisecting: 22 revisions left to test after this (roughly 4 steps)
[aff00427579d4c915ee92553f712e4c632185e6e] ARM: 9379/1: coresight: tpda: drop owner assignment
git bisect good aff00427579d4c915ee92553f712e4c632185e6e
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[7b749aad1faa5bcb23b45b7126f677ab17324c40] ARM: 9393/1: mm: Use conditionals for CFI branches

7b749aad1faa5bcb23b45b7126f677ab17324c40 build error:
...
drivers/uio/uio.c: In function ‘uio_mmap_dma_coherent’:
drivers/uio/uio.c:795:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
795 | addr = (void *)mem->addr;
|
...
arm-linux-gnueabihf-ld: kernel/trace/fgraph.o: in function `unregister_ftrace_graph':
fgraph.c:(.text+0xa60): undefined reference to `ftrace_stub_graph'
arm-linux-gnueabihf-ld: fgraph.c:(.text+0xa64): undefined reference to `ftrace_stub_graph'
arm-linux-gnueabihf-ld: kernel/trace/fgraph.o: in function `.LANCHOR0':
fgraph.c:(.data+0x1c): undefined reference to `ftrace_stub_graph'
...
make[2]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[1]: *** [/aaa/ttt/kernel/Makefile:1160: vmlinux] Error 2
make[1]: *** Waiting for unfinished jobs....
...
make: *** [Makefile:240: __sub-make] Error 2

git bisect skip 7b749aad1faa5bcb23b45b7126f677ab17324c40
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[393999fa96273bab8d6efb2f4724030916afd61b] ARM: 9389/2: mm: Define prototypes for all per-processor calls
git bisect good 393999fa96273bab8d6efb2f4724030916afd61b
Bisecting: 10 revisions left to test after this (roughly 3 steps)
[eebadafc3b14d9426fa9cc3ab0da0e48367c7114] ARM: 9398/1: Fix userspace enter on LPAE with CC_OPTIMIZE_FOR_SIZE=y
git bisect bad eebadafc3b14d9426fa9cc3ab0da0e48367c7114
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[de7f60f0b03175ff056f18996d7e2577bc4baa65] ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN
git bisect good de7f60f0b03175ff056f18996d7e2577bc4baa65
Bisecting: 1 revision left to test after this (roughly 1 step)
[7af5b901e84743c608aae90cb0e429702812c324] ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement
git bisect bad 7af5b901e84743c608aae90cb0e429702812c324
7af5b901e84743c608aae90cb0e429702812c324 is the first bad commit
commit 7af5b901e84743c608aae90cb0e429702812c324
Author: Linus Walleij <linus.walleij@linaro.org>
Date: Mon Mar 25 08:31:13 2024 +0100

    ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement

    With LPAE enabled, privileged no-access cannot be enforced using CPU domains as such feature is not available. This patch implements PAN by disabling TTBR0 page table walks while in kernel mode.

    The ARM architecture allows page table walks to be split between TTBR0 and TTBR1. With LPAE enabled, the split is defined by a combination of TTBCR T0SZ and T1SZ bits. Currently, an LPAE-enabled kernel uses TTBR0 for user addresses and TTBR1 for kernel addresses with the VMSPLIT_2G and VMSPLIT_3G configurations. The main advantage for the 3:1 split is that TTBR1 is reduced to 2 levels, so potentially faster TLB refill (though usually the first level entries are already cached in the TLB).

    The PAN support on LPAE-enabled kernels uses TTBR0 when running in user space or in kernel space during user access routines (TTBCR T0SZ and T1SZ are both 0). When running user accesses are disabled in kernel mode, TTBR0 page table walks are disabled by setting TTBCR.EPD0. TTBR1 is used for kernel accesses (including loadable modules; anything covered by swapper_pg_dir) by reducing the TTBCR.T0SZ to the minimum (2^(32-7) = 32MB). To avoid user accesses potentially hitting stale TLB entries, the ASID is switched to 0 (reserved) by setting TTBCR.A1 and using the ASID value in TTBR1. The difference from a non-PAN kernel is that with the 3:1 memory split, TTBR1 always uses 3 levels of page tables.

    As part of the change we are using preprocessor elif definied() clauses so balance these clauses by converting relevant precedingt ifdef clauses to if defined() clauses.

    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

arch/arm/Kconfig | 22 +++++++++++++--
arch/arm/include/asm/assembler.h | 1 +
arch/arm/include/asm/pgtable-3level-hwdef.h | 9 ++++++
arch/arm/include/asm/ptrace.h | 1 +
arch/arm/include/asm/uaccess-asm.h | 44 ++++++++++++++++++++++++++++-
arch/arm/include/asm/uaccess.h | 26 ++++++++++++++++-
arch/arm/kernel/asm-offsets.c | 1 +
arch/arm/kernel/suspend.c | 8 ++++++
arch/arm/lib/csumpartialcopyuser.S | 20 ++++++++++++-
arch/arm/mm/fault.c | 29 +++++++++++++++++++
10 files changed, 155 insertions(+), 6 deletions(-)
----
This does not affect the mwifiex/mwifiex_sdio errors in dmesg, only the segfault errors. LPAE is enabled in my kernel config.

(In reply to Andrew from comment #6)
> git bisect [...]
Thx; can I CC you when forwarding this bug by mail? Note, this will expose your email address to the public!

Yes, you can.
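For anyone who wants to re-check the bisect session quoted above, the verdicts can be pulled out of the pasted transcript mechanically. The following is only a minimal sketch: `bisect_decisions` and `SAMPLE` are hypothetical names, and the sample is a shortened excerpt of the session (its last three verdicts), not the full log.

```python
import re

# Hypothetical helper: extracts the good/bad/skip verdicts from a pasted
# `git bisect` transcript like the one quoted in this bug.
DECISION = re.compile(r"^git bisect (good|bad|skip) ([0-9a-f]{40})$")


def bisect_decisions(transcript):
    """Return (verdict, commit) pairs in the order they were entered."""
    decisions = []
    for line in transcript.splitlines():
        match = DECISION.match(line.strip())
        if match:
            decisions.append((match.group(1), match.group(2)))
    return decisions


# Shortened excerpt of the session above (last three verdicts only).
SAMPLE = """\
git bisect bad eebadafc3b14d9426fa9cc3ab0da0e48367c7114
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[de7f60f0b03175ff056f18996d7e2577bc4baa65] ARM: 9357/2: Reduce the number of #ifdef CONFIG_CPU_SW_DOMAIN_PAN
git bisect good de7f60f0b03175ff056f18996d7e2577bc4baa65
git bisect bad 7af5b901e84743c608aae90cb0e429702812c324
"""

if __name__ == "__main__":
    for verdict, commit in bisect_decisions(SAMPLE):
        print(verdict, commit)
```

Lines that are not verdicts (the "Bisecting:" progress lines, commit subjects, build output) are ignored, so the whole comment can be pasted in unchanged.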
Created attachment 306924 [details]
dmesg
Tried version 6.10.1 (commit 0129910096573d08ecb139b20e2940682f248186).
Changing the config to
# CONFIG_CPU_TTBR0_PAN is not set
was reverted automatically during the build.
The segfault does not occur in Qt apps with
# CONFIG_ARM_PAN is not set
The patch did not help.
Maybe I should have checked on another commit?
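Since a hand-edited option was reported above as being reverted during the build, it may help to check the final .config rather than the edited one. A minimal sketch, assuming the standard .config text format; `read_options` and `SAMPLE` are hypothetical names, the option names are the ones mentioned in this bug, and the sample text is made up for illustration rather than taken from the attached config.

```python
import re

# Option names taken from the comments in this bug.
OPTIONS = ("CONFIG_ARM_LPAE", "CONFIG_ARM_PAN", "CONFIG_CPU_TTBR0_PAN")


def read_options(config_text, names=OPTIONS):
    """Map each option to its value, to None if it is explicitly unset,
    or to 'missing' if it does not appear in the file at all."""
    state = {name: "missing" for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            if unset.group(1) in state:
                state[unset.group(1)] = None
            continue
        name, sep, value = line.partition("=")
        if sep and name in state:
            state[name] = value
    return state


# Illustrative fragment only, not the attached config.
SAMPLE = """\
CONFIG_ARM_LPAE=y
# CONFIG_ARM_PAN is not set
"""

if __name__ == "__main__":
    for name, value in read_options(SAMPLE).items():
        print(name, value)
```

In practice the text would be read from the .config in the build directory after the build has finished.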
Created attachment 306925 [details]
config
I have obtained a Chromebook XE303C12 to see if I can replicate this problem. What I do not understand is how you installed Devuan on the machine, since the download pages only offer AMD64 and x86_64 images. Any hints on how to replicate your set-up?

In the meantime I have discovered that Debian (upstream of Devuan, I guess) actually disables PAN. I don't know why they do this, but I have noted that doing so increases syscall performance. Maybe there are other reasons why they do this as well?

Created attachment 307119 [details]
Script used for build
Devuan is a fork of Debian with the same architectures in the repository. The process of creating the system image is the same for any Debian-based system and was borrowed from the Kali Linux ARM build script, with minor changes to the repository links. The kernel was cross-compiled in a chrooted AMD64 Kali Linux minimal environment (this was the very first option and is still used; to install the build dependencies, lines 289-307 should be uncommented), and the result is a deb package of the kernel. I will attach the script that was used for the build. The Chromebook was switched to developer mode. You can also test with a Debian system; with a custom kernel there will probably be no difference except for the disk partition scheme and boot config. The only reason Devuan was named is that it was already installed. Ready-made system images can be taken from https://gitlab.com/quarkscript/linarm#recoveryinstalltest-disk-images
Last compiled kernel packages with patch and without PAN:
https://drive.usercontent.google.com/download?id=1ad3jASrJmvncOsWUGn52Dpo0Hr9vY0UB&export=download
https://drive.usercontent.google.com/download?id=1k5zkXGKBq7J2zoND1WZM6I-wxqLj8zqS&export=download

Just checked the kernel on Debian 12 and got the same issue.
[ 95.037173] lxqt-session: unhandled page fault (11) at 0x0048a000, code 0x207
[ 95.037182] [0048a000] *pgd=48499003, *pmd=bdaf4003
[ 95.037198] CPU: 0 PID: 414 Comm: lxqt-session Not tainted 6.10.1-dirty #1
[ 95.037207] Hardware name: Samsung Exynos (Flattened Device Tree)
[ 95.037214] PC is at 0xb51c0cf4
[ 95.037224] LR is at 0xb5394215
[ 95.037231] pc : [<b51c0cf4>] lr : [<b5394215>] psr: 200f0030
[ 95.037237] sp : be8eb850 ip : 004885bc fp : 01422c18
[ 95.037244] r10: 004885bc r9 : 00000000 r8 : be8f38c0
[ 95.037250] r7 : 00000002 r6 : be8eb8b4 r5 : ffb3b4bf r4 : 00489ffa
[ 95.037256] r3 : be8eb8a0 r2 : be8ed2b0 r1 : 002380ae r0 : 00989681
[ 95.037263] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
[ 95.037271] Control: 30c5387d Table: 4784d480 DAC: fffffffd
Used system https://drive.usercontent.google.com/download?id=1q9i1FITox8Q-cmYsz-UCqsgtMyB_DYp-&export=download&authuser=0

Hi Andrew,
I was pointed to this bug because I found a similar one (or the same): https://lore.kernel.org/linux-arm-kernel/20241111233817.2f824c19@foxbook/T/#u
What helped me was running the crashing program under strace (which thankfully didn't crash) and noticing strange behavior of syscalls right before the crash. Specifically, it was cacheflush failing with EFAULT for no reason, and this was actually the direct and relatively obvious cause of gdb segfaulting a moment later.
If you can see similar cacheflush failures in your crashing applications, it's likely the same bug. If not, knowing what your applications are doing before they crash might still offer some clue, hopefully.

Russell wrote this patch: https://lore.kernel.org/linux-arm-kernel/ZzMsMFNSHLOKEeEW@shell.armlinux.org.uk/
Andrew, could you see if it solves your crash too?

Yes, I can confirm. It solved my crash too.
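The strace check suggested above (looking for cacheflush calls that fail with EFAULT) can be automated over a saved trace. This is a rough sketch only: `failed_calls` and `SAMPLE` are hypothetical names, the regular expression assumes strace's usual `name(args) = -1 ERRNO (text)` line layout, and the sample lines use made-up addresses rather than output captured from the affected machine.

```python
import re

# Matches a failed call in strace's usual "name(args) = -1 ERRNO (text)" form.
FAILED_CALL = re.compile(
    r"(?P<name>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)"
)


def failed_calls(trace_text, name="cacheflush", errno="EFAULT"):
    """Return the trace lines where syscall `name` failed with `errno`."""
    hits = []
    for line in trace_text.splitlines():
        match = FAILED_CALL.search(line)
        if match and match.group("name") == name and match.group("errno") == errno:
            hits.append(line.strip())
    return hits


# Illustrative lines with made-up addresses, not real output from this bug.
SAMPLE = """\
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
cacheflush(0xb6f00000, 0xb6f00040, 0) = 0
cacheflush(0xb6f01000, 0xb6f01040, 0) = -1 EFAULT (Bad address)
"""

if __name__ == "__main__":
    for hit in failed_calls(SAMPLE):
        print(hit)
```

A trace saved to a file can be read and passed to `failed_calls` as text; an empty result would suggest looking at what else the application does right before the crash.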
Created attachment 306828 [details]
dmesg
Trying to run LxQt on a Chromebook XE303C12 with Devuan 4 and Linux kernel 6.10.8 results in a segmentation fault (in LxQt). There are no such problems with Linux kernel 6.9.12 or earlier. With Linux kernel 6.10.8 it is possible to run Xfce4, but trying to run, for example, Kate ends in a segmentation fault. Mesa 20.3.5, patched for partial hardware acceleration, preserves this acceleration in Xfce4. mpv works with acceleration regardless of the Linux kernel version. dmesg does not show anything significantly new compared to the previous kernel version.