Bug 206669
| Summary: | Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | John Paul Adrian Glaubitz (glaubitz) |
| Component: | PPC-64 | Assignee: | platform_ppc-64 |
| Status: | ASSIGNED | Tree: | Mainline |
| Severity: | normal | CC: | aneesh.kumar, matorola, michael, msuchanek, nickpiggin, pmenzel+bugzilla.kernel.org |
| Priority: | P1 | Version: | 2.5 |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 5.4.x | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Backtrace of host system crashing with little-endian kernel; kern.log containing some crash dumps | | |
Description from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de), February 26, 2020:

Created attachment 287605 [details]
Backtrace of host system crashing with little-endian kernel

We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
unstable, hosting a big-endian ppc64 virtual machine running the same kernel
in big-endian mode.

When building OpenJDK-11 on the big-endian VM, the test suite crashes the
little-endian *host* system with the kernel backtrace in the attachment. The
problem reproduces with both kernel 4.19.98 and 5.4.13, with guest and host
running 5.4.x.

--- Comment #1 from Nick Piggin (npiggin) ---

Thanks for the report; we need to get more data about the first BUG if we
can. What function in your vmlinux contains address 0xc00000000017a778?
(Use nm or objdump, etc.)

Is that the first message you get? No warnings or anything else earlier in
the dmesg?

Also 0xc0000000002659a0 would be interesting.

When reproducing, do you ever get a clean trace of the first bug?

Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?

Thanks,
Nick
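One general way to answer this kind of question is to sort the symbol table
numerically and take the last symbol at or below the faulting address. A
minimal sketch, assuming GNU nm and gawk (for strtonum) and an uncompressed
vmlinux with a symbol table:

    # Print the symbol containing a given kernel text address (sketch;
    # assumes GNU awk and that vmlinux still has its symbol table).
    nm -n vmlinux-5.4.0-0.bpo.3-powerpc64le |
      awk -v addr=0xc00000000017a778 \
        'strtonum("0x" $1) <= strtonum(addr) { sym = $3 } END { print sym }'

Grepping around the address prefix, as done in the next comment, is a
quicker manual variant of the same idea.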
--- Comment #2 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---

(In reply to npiggin from comment #1)
> Thanks for the report; we need to get more data about the first BUG if
> we can. What function in your vmlinux contains address
> 0xc00000000017a778? (Use nm or objdump, etc.)

That seems to be select_task_rq_fair:

    root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le | grep -C5 c00000000017a
    c000000000448550 T select_estimate_accuracy
    c000000000170d20 t select_fallback_rq
    c000000000e4c940 D select_idle_mask
    c000000000179f10 t select_idle_sibling
    c00000000018fd80 t select_task_rq_dl
    c00000000017a640 t select_task_rq_fair
    c000000000177f50 t select_task_rq_idle
    c00000000018c9e0 t select_task_rq_rt
    c00000000019c800 t select_task_rq_stop
    c000000000927710 t selem_alloc.isra.6
    c000000000926e50 t selem_link_map
    root@watson:/boot#

> Is that the first message you get? No warnings or anything else earlier
> in the dmesg?

Correct. You can see the login prompt of the host watson directly after
booting up.

> Also 0xc0000000002659a0 would be interesting.

Looks like that's ring_buffer_record_off:

    root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le | grep -C5 c0000000002659
    c0000000002667e0 T ring_buffer_read_finish
    c00000000026b4b0 T ring_buffer_read_page
    c000000000265e10 T ring_buffer_read_prepare
    c000000000265ef0 T ring_buffer_read_prepare_sync
    c000000000269ae0 T ring_buffer_read_start
    c000000000265950 T ring_buffer_record_disable
    c000000000266070 T ring_buffer_record_disable_cpu
    c000000000265970 T ring_buffer_record_enable
    c0000000002660c0 T ring_buffer_record_enable_cpu
    c00000000026d470 T ring_buffer_record_is_on
    c00000000026d480 T ring_buffer_record_is_set_on
    c000000000265990 T ring_buffer_record_off
    c000000000265a10 T ring_buffer_record_on
    c000000000266da0 T ring_buffer_reset
    c000000000266a90 T ring_buffer_reset_cpu
    c000000000267cd0 T ring_buffer_resize
    c00000000026d400 T ring_buffer_set_clock
    root@watson:/boot#

FWIW, the kernel image comes from this Debian package:

http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

> When reproducing, do you ever get a clean trace of the first bug?

I have logged everything that showed up on the console during and after the
crash. After that, the machine no longer responds and has to be hard-reset.

> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?

I will try that. Is there anything to be considered for the kernel running
inside the big-endian VM?
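For reference, panic_on_oops can be flipped at runtime either through procfs
or through sysctl; the two forms below are equivalent, and neither persists
across reboots:

    # Make the kernel panic (rather than try to continue) after an oops:
    echo 1 > /proc/sys/kernel/panic_on_oops
    # or, equivalently:
    sysctl -w kernel.panic_on_oops=1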
--- Comment #3 from Nick Piggin (npiggin) ---

Thanks. Okay, it looks like what's happening here is that something crashes
in select_task_rq_fair (a kernel data access fault). The kernel is able to
print out those first two lines, but then it calls die(), which ends up
calling oops_enter(), which calls tracing_off(), which calls
tracer_tracing_off() and crashes there, and that goes around the same cycle,
again printing only the first two lines.

Nothing obvious as to why those accesses in particular are crashing. The
first data address is 0xc000000002bfd038, the second is 0xc0000007f9070c08.
Not vmalloc space, not above the 1TB segment.

Do you have tracing / ftrace enabled in the host kernel for any reason?
Turning that off might let the oops message get printed.

> FWIW, the kernel image comes from this Debian package:
> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

Okay. Any chance you could test an upstream kernel?

> I have logged everything that showed up on the console during and after
> the crash. After that, the machine no longer responds and has to be
> hard-reset.
>
> > Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?
>
> I will try that.

Don't bother testing panic_on_oops after the above -- it happens after
oops_begin(), so it won't help, unfortunately. Attempting to get into xmon
might, though, if you boot with xmon=on. Try that if tracing wasn't enabled,
or if disabling it doesn't help.

> Is there anything to be considered for the kernel running inside the
> big-endian VM?

Not that I'm aware of, really. It certainly shouldn't be able to crash the
host even if the guest was doing something stupid.

Thanks,
Nick

--- Comment #4 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---

(In reply to npiggin from comment #3)
> Do you have tracing / ftrace enabled in the host kernel for any
> reason? Turning that off might let the oops message get printed.

Seems that this is the case in the Debian kernel, yes:

    root@watson:~# grep -i ftrace /boot/config-5.4.0-0.bpo.3-powerpc64le
    CONFIG_KPROBES_ON_FTRACE=y
    CONFIG_HAVE_KPROBES_ON_FTRACE=y
    CONFIG_HAVE_DYNAMIC_FTRACE=y
    CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
    CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
    CONFIG_FTRACE=y
    CONFIG_FTRACE_SYSCALLS=y
    CONFIG_DYNAMIC_FTRACE=y
    CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
    CONFIG_FTRACE_MCOUNT_RECORD=y
    # CONFIG_FTRACE_STARTUP_TEST is not set
    root@watson:~#

Do you have the kernel command-line option at hand that disables ftrace?
Is it just ftrace=off?

> Okay. Any chance you could test an upstream kernel?

Sure, absolutely. Any preference on the version number?

> Don't bother testing panic_on_oops after the above -- it happens after
> oops_begin(), so it won't help, unfortunately.

Okay.

> Attempting to get into xmon might, though, if you boot with xmon=on.
> Try that if tracing wasn't enabled, or if disabling it doesn't help.

Okay. I will try to disable ftrace first, then retrigger the crash.

> Not that I'm aware of, really. It certainly shouldn't be able to crash
> the host even if the guest was doing something stupid.

I agree.
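As an aside, booting with xmon=on (Nick's suggestion in comment #3) is a
kernel command-line change; a minimal sketch for a Debian system, assuming
GRUB is the bootloader (adjust accordingly for petitboot or another loader):

    # /etc/default/grub -- append xmon=on to the kernel command line
    # (assumption: GRUB; other parameters on the line are kept as-is):
    GRUB_CMDLINE_LINUX="xmon=on"
    # then regenerate the GRUB configuration and reboot:
    update-grub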
--- Comment #5 from Nick Piggin (npiggin) ---

> Do you have the kernel command-line option at hand that disables ftrace?
> Is it just ftrace=off?

Hmm, not sure -- Documentation/admin-guide/kernel-parameters.txt seems to
say that wouldn't work. I thought it might only be going down that path if
you have already done some tracing. Perhaps ensure
/sys/kernel/debug/tracing/tracing_on is set to 0, and then
`echo 1 > /sys/kernel/debug/tracing/free_buffer` before you start the test.

> Sure, absolutely. Any preference on the version number?

Current head if you're feeling lucky, but v5.5 if not. But you can give the
ftrace test a try with the Debian kernel first if you've got it ready to go.

> Okay. I will try to disable ftrace first, then retrigger the crash.

Cool.

Thanks,
Nick

---

(In reply to npiggin from comment #5)
> I thought it might only be going down that path if you have already done
> some tracing. Perhaps ensure /sys/kernel/debug/tracing/tracing_on is set
> to 0, and then `echo 1 > /sys/kernel/debug/tracing/free_buffer` before
> you start the test.

I have done this now and I'm performing the test. Let's see if we can get
some more output.
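Put together, the test from comment #5 amounts to the following, assuming
debugfs is mounted at /sys/kernel/debug (on newer kernels the same files
also appear under /sys/kernel/tracing):

    # Stop recording into the ftrace ring buffer, then free its pages so
    # the oops path has nothing live to touch during the test:
    echo 0 > /sys/kernel/debug/tracing/tracing_on
    echo 1 > /sys/kernel/debug/tracing/free_buffer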
---

> Current head if you're feeling lucky, but v5.5 if not. But you can give
> the ftrace test a try with the Debian kernel first if you've got it
> ready to go.

I think I will try the latest 5.5.x first, after the test with the Debian
kernel and tracing turned off.

I have set /sys/kernel/debug/tracing/tracing_on to "0" and
/sys/kernel/debug/tracing/free_buffer to "1", and it seems I can no longer
reproduce the issue. I will have to do more testing to see whether that is
just an artifact or really related.

Created attachment 287823 [details]
kern.log containing some crash dumps
I have another trace of the crash. Not sure whether this was with tracing disabled.
Does that help?
--- Comment #9 from Aneesh Kumar KV ---

Also, can you try disabling THP?

    echo "never" > /sys/kernel/mm/transparent_hugepage/enabled

-aneesh

---

(In reply to Aneesh Kumar KV from comment #9)
> Also, can you try disabling THP? echo "never" >
> /sys/kernel/mm/transparent_hugepage/enabled

Yes. Just disabled.
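To double-check that the change took effect, the sysfs file can be read
back; the active mode is the one shown in brackets (sample output below is
illustrative):

    # Verify transparent hugepages are off; expected output marks the
    # active mode in brackets, e.g. "always madvise [never]":
    cat /sys/kernel/mm/transparent_hugepage/enabled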
FWIW, the machine crashed again some minutes ago, but I didn't have a serial
console open to capture the trace. It seems I can provoke the crash by
running the glibc test suite in a big-endian guest VM.

---

The machine just crashed with the IPMI console open, but the only message
the kernel printed was:

    watson login: [ 1809.138398] KVM: couldn't grab cpu 115

I was not watching the kernel buffer at the time, though. I will try to
provoke the crash again while keeping the kernel log open.

---

Another crash:

    watson login: [17667512263.751484] BUG: Unable to handle kernel data access at 0xc000000ff06e4838
    [17667512263.751507] Faulting instruction address: 0xc00000000017a778
    [17667512263.751513] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751517] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751521] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751525] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751529] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751533] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751537] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751541] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751545] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751548] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751552] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751556] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751560] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751564] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751569] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751574] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751578] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751583] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751587] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751591] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751596] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751600] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751604] Thread overran stack, or stack corrupted
    [17667512263.751608] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [17667512263.751612] Faulting instruction address: 0xc0000000002659a0
    [17667512263.751615] Thread overran stack, or stack corrupted
    [17667512263.751618] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
    [ 1835.743178] BUG: Unable to handle unknown paging fault at 0xc000000000c4b363
    [ 1835.743180] Faulting instruction address: 0x00000000
    [17667512263.751633] Faulting instruction address: 0xc0000000002659a0
    [ 1835.743195] Oops: Kernel access of bad area, sig: 11 [#1]
    [ 1835.743198] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
    [ 1835.743203] Modules linked in:
    [17667512263.751652] Thread overran stack, or stack corrupted
    [ 1835.743205] watson login: [30887.552539] KVM: CPU 4 seems to be stuck
    [30900.094713] watchdog: CPU 8 detected hard LOCKUP on other CPUs 0
    [30900.094730] watchdog: CPU 8 TB:15863742878763, last SMP heartbeat TB:15855546563837 (16008ms ago)
    [30908.222926] watchdog: BUG: soft lockup - CPU#80 stuck for 22s! [CPU 4/KVM:2698]
    [30908.374929] watchdog: BUG: soft lockup - CPU#112 stuck for 22s! [CPU 23/KVM:2717]
    [30908.426934] watchdog: BUG: soft lockup - CPU#120 stuck for 22s! [CPU 16/KVM:2710]
    [30909.570962] rcu: INFO: rcu_sched self-detected stall on CPU
    [30909.570970] rcu: 120-....: (5059 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=421758/421758 fqs=2378
    [30912.095025] watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [CPU 18/KVM:2712]
    [30912.127027] watchdog: BUG: soft lockup - CPU#40 stuck for 23s! [CPU 22/KVM:2716]
    [30912.155026] watchdog: BUG: soft lockup - CPU#56 stuck for 23s! [CPU 27/KVM:2721]
    [30912.175028] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [CPU 26/KVM:2720]
    [30912.195028] watchdog: BUG: soft lockup - CPU#72 stuck for 23s! [CPU 19/KVM:2713]
    [30912.547038] watchdog: BUG: soft lockup - CPU#136 stuck for 22s! [CPU 8/KVM:2702]
    [30912.619040] watchdog: BUG: soft lockup - CPU#144 stuck for 22s! [CPU 5/KVM:2699]

---

Still reproduces with Linux 5.10.46 from Debian Bullseye.

---

After a day and a half I have managed to get BE Debian installed in a VM :}

You said "running the glibc testsuite" was enough to trigger it. Do you mean
from the glibc git tree? I can't get the upstream or the Debian-packaged
glibc sources to build. Both fail with:

    ../include/setjmp.h:42:3: error: static assertion failed: "size of jmp_buf != 656"
       42 |   _Static_assert (sizeof (type) == size, \
          |   ^~~~~~~~~~~~~~

I guess I'm doing something wrong. Any pointers on what your setup is?

---

Hi Michael! Thanks a lot for looking into this!

If you have installed a Debian unstable big-endian system, the easiest way
to get such a setup is by creating sbuild chroots. You should set up one for
both powerpc and ppc64:

    $ sbuild-createchroot --arch=powerpc
    $ sbuild-createchroot --arch=ppc64

and then build the glibc package with sbuild for both powerpc and ppc64 in
parallel, which is what makes the VM and the host crash during the test
suite:

    $ dget -u https://deb.debian.org/debian/pool/main/g/glibc/glibc_2.32-2.dsc

In one shell:

    $ sbuild -d sid --arch=ppc64 --no-arch-all glibc_2.32-2.dsc

and in a second one:

    $ sbuild -d sid --arch=powerpc --no-arch-all glibc_2.32-2.dsc

If glibc doesn't trigger the crash, try gcc-10 or llvm-toolchain-13:

    $ dget -u https://deb.debian.org/debian/pool/main/l/llvm-toolchain-13/llvm-toolchain-13_13.0.0~+rc2-3.dsc
    $ dget -u https://deb.debian.org/debian/pool/main/g/gcc-11/gcc-11_11.2.0-5.dsc

---

The POWER server crashes with 100% reproducibility when building GCC in a
powerpc chroot and in a ppc64 chroot on the ppc64 KVM instance at the same
time. And I assume it's the test suite that kills both the KVM instance and
the host system.

---

There seems to be a related discussion:

https://yhbt.net/lore/all/20200831091523.GC29521@kitsune.suse.cz/T/

That thread suspects commit 10d91611f426d4bafd2a83d966c36da811b2f7ad as the
cause:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10d91611f426d4bafd2a83d966c36da811b2f7ad

---

In case anyone runs into this issue: a workaround is disabling
"dynamic_mt_modes":

    # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes

This fixes the crashes for me with a 5.15.x kernel.
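To make that workaround persist across reboots, one option is to set the
module parameter at load time through a modprobe configuration fragment (a
sketch; the file name under /etc/modprobe.d/ is arbitrary):

    # /etc/modprobe.d/kvm_hv.conf -- applied whenever the kvm_hv module
    # loads. If kvm_hv is built into the kernel rather than a module, pass
    # kvm_hv.dynamic_mt_modes=0 on the kernel command line instead.
    options kvm_hv dynamic_mt_modes=0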