Bug 206669 - Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
Summary: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: PPC-64
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: platform_ppc-64
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-25 15:26 UTC by John Paul Adrian Glaubitz
Modified: 2021-01-11 11:49 UTC
CC: 3 users

See Also:
Kernel Version: 5.4.x
Tree: Mainline
Regression: No


Attachments
Backtrace of host system crashing with little-endian kernel (134.61 KB, text/plain)
2020-02-25 15:26 UTC, John Paul Adrian Glaubitz
kern.log containing some crash dumps (500.21 KB, text/plain)
2020-03-07 21:56 UTC, John Paul Adrian Glaubitz

Description John Paul Adrian Glaubitz 2020-02-25 15:26:14 UTC
Created attachment 287605 [details]
Backtrace of host system crashing with little-endian kernel

We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian unstable hosting a big-endian ppc64 virtual machine running the same kernel in big-endian mode.

When building OpenJDK-11 on the big-endian VM, the testsuite crashes the *host* system which is little-endian with the following kernel backtrace. The problem reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host running 5.4.x.

Backtrace attached.
Comment 1 npiggin 2020-02-26 04:06:53 UTC
bugzilla-daemon@bugzilla.kernel.org wrote on February 26, 2020 1:26 am:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> Created attachment 287605 [details]
> Backtrace of host system crashing with little-endian kernel
> 
> When building OpenJDK-11 on the big-endian VM, the testsuite crashes the
> *host* system which is little-endian with the following kernel backtrace.
> The problem reproduces both with kernel 4.19.98 as well as 5.4.13, both
> guest and host running 5.4.x.
> 
> Backtrace attached.

Thanks for the report; we need to get more data about the first BUG if
we can. What function in your vmlinux contains address
0xc00000000017a778? (Use nm or objdump etc.) Is that the first message
you get? No warnings or anything else earlier in the dmesg?

Also 0xc0000000002659a0 would be interesting.

When reproducing, do you ever get a clean trace of the first bug? Could
you try setting /proc/sys/kernel/panic_on_oops and reproducing?
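For reference, a minimal sketch of setting that sysctl before reproducing (standard procfs path per Documentation/admin-guide/sysctl/kernel.rst; needs root):

```shell
# Make the kernel panic on the first oops instead of trying to
# continue, so later recursive faults can't scribble over the log.
echo 1 > /proc/sys/kernel/panic_on_oops
# equivalent: sysctl -w kernel.panic_on_oops=1
```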

Thanks,
Nick
Comment 2 John Paul Adrian Glaubitz 2020-02-26 07:26:31 UTC
(In reply to npiggin from comment #1)
> Thanks for the report, we need to get more data about the first BUG if 
> we can. What function in your vmlinux contains address 
> 0xc00000000017a778? (use nm or objdump etc)

Seems to be select_task_rq_fair:

root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c00000000017a
c000000000448550 T select_estimate_accuracy
c000000000170d20 t select_fallback_rq
c000000000e4c940 D select_idle_mask
c000000000179f10 t select_idle_sibling
c00000000018fd80 t select_task_rq_dl
c00000000017a640 t select_task_rq_fair
c000000000177f50 t select_task_rq_idle
c00000000018c9e0 t select_task_rq_rt
c00000000019c800 t select_task_rq_stop
c000000000927710 t selem_alloc.isra.6
c000000000926e50 t selem_link_map
root@watson:/boot#
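As a side note, this lookup can be scripted instead of eyeballed: with `nm -n` the symbols come out sorted numerically, and since the addresses are fixed-width lowercase hex, the containing symbol is simply the last one at or below the faulting address. A sketch, assuming an unstripped vmlinux is at hand (on Debian it ships in the matching -dbg package; the filename below is an example):

```shell
# Find the symbol containing a faulting instruction address.
# String comparison is safe here because nm prints fixed-width
# lowercase hex, so lexical order equals numeric order.
ADDR=c00000000017a778   # faulting address, no 0x prefix
nm -n vmlinux-5.4.0-0.bpo.3-powerpc64le |
    awk -v a="$ADDR" '$1 <= a { s = $3 } $1 > a { exit } END { print s }'
```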

> Is that the first message you get? No warnings or anything else
> earlier in the dmesg?

Correct. You can see the login prompt of the host machine watson directly after it boots up.

> Also 0xc0000000002659a0 would be interesting.

Looks like that's ring_buffer_record_off:

root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c0000000002659
c0000000002667e0 T ring_buffer_read_finish
c00000000026b4b0 T ring_buffer_read_page
c000000000265e10 T ring_buffer_read_prepare
c000000000265ef0 T ring_buffer_read_prepare_sync
c000000000269ae0 T ring_buffer_read_start
c000000000265950 T ring_buffer_record_disable
c000000000266070 T ring_buffer_record_disable_cpu
c000000000265970 T ring_buffer_record_enable
c0000000002660c0 T ring_buffer_record_enable_cpu
c00000000026d470 T ring_buffer_record_is_on
c00000000026d480 T ring_buffer_record_is_set_on
c000000000265990 T ring_buffer_record_off
c000000000265a10 T ring_buffer_record_on
c000000000266da0 T ring_buffer_reset
c000000000266a90 T ring_buffer_reset_cpu
c000000000267cd0 T ring_buffer_resize
c00000000026d400 T ring_buffer_set_clock
root@watson:/boot#

FWIW, the kernel image comes from this Debian package:

http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

> When reproducing, do you ever get a clean trace of the first bug?

I have logged everything that showed in the console during and after the crash. After that, the machine no longer responds and has to be hard-reset.

> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?

I will try that.

Anything to be considered for the kernel running inside the big-endian VM?
Comment 3 npiggin 2020-02-26 09:29:19 UTC
bugzilla-daemon@bugzilla.kernel.org wrote on February 26, 2020 5:26 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> --- Comment #2 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
> Seems to be select_task_rq_fair:
> [...]
> 
> Looks like that's ring_buffer_record_off:
> [...]

Thanks.

Okay, it looks like what's happening here is that something crashes in
select_task_rq_fair (kernel data access fault). It's able to print
those first two lines, but then it calls die(), which ends up calling
oops_enter(), which calls tracing_off(), which calls
tracer_tracing_off() and crashes there, going around the same cycle
and only ever printing the first two lines.

Nothing obvious as to why those accesses in particular are crashing.
The first data address is 0xc000000002bfd038, the second is
0xc0000007f9070c08. Not vmalloc space, not above the 1TB segment.

Do you have tracing / ftrace enabled in the host kernel for any
reason? Turning that off might let the oops message get printed.

> 
> FWIW, the kernel image comes from this Debian package:
> 
>> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

Okay. Any chance you could test an upstream kernel? 
> 
>> When reproducing, do you ever get a clean trace of the first bug?
> 
> I have logged everything that showed in the console during and after
> the crash. After that, the machine no longer responds and has to be
> hard-reset.
> 
>> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?
> 
> I will try that.

Don't bother testing that after the above -- panic_on_oops happens
after oops_begin(), so it won't help unfortunately.

Attempting to get into xmon might, though, if you boot with xmon=on.
Try that if tracing wasn't enabled, or if disabling it doesn't help.
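A sketch of enabling xmon for the next boot on a Debian-style GRUB setup (the file layout is an assumption; the xmon=on parameter itself is documented in Documentation/admin-guide/kernel-parameters.txt):

```shell
# Prepend xmon=on to the kernel command line and regenerate grub.cfg.
sed -i 's/^GRUB_CMDLINE_LINUX="/&xmon=on /' /etc/default/grub
update-grub
reboot
# After reboot, verify with:  grep -o 'xmon=on' /proc/cmdline
```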

> 
> Anything to be considered for the kernel running inside the big-endian VM?
> 

Not that I'm aware of really. Certainly it shouldn't be able to crash
the host even if the guest was doing something stupid.

Thanks,
Nick
Comment 4 John Paul Adrian Glaubitz 2020-02-26 10:28:27 UTC
(In reply to npiggin from comment #3)
> Do you have tracing / ftrace enabled in the host kernel for any
> reason? Turning that off might let the oops message get printed.

Seems that this is the case in the Debian kernel, yes:

root@watson:~# grep -i ftrace /boot/config-5.4.0-0.bpo.3-powerpc64le 
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
root@watson:~#

Do you have the kernel command option at hand which disables ftrace on the command line? Is it just ftrace=off?

> > FWIW, the kernel image comes from this Debian package:
> > 
> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb
> 
> Okay. Any chance you could test an upstream kernel? 

Sure, absolutely. Any preference on the version number?

> Don't bother testing that after the above -- panic_on_oops happens
> after oops_begin(), so it won't help unfortunately.

Okay.

> Attempting to get into xmon might, though, if you boot with xmon=on.
> Try that if tracing wasn't enabled, or if disabling it doesn't help.

Okay. I will try to disable ftrace first, then retrigger the crash.

> > 
> > Anything to be considered for the kernel running inside the big-endian VM?
> > 
> 
> Not that I'm aware of really. Certainly it shouldn't be able to crash
> the host even if the guest was doing something stupid.

I agree.
Comment 5 npiggin 2020-02-26 11:08:14 UTC
bugzilla-daemon@bugzilla.kernel.org wrote on February 26, 2020 8:28 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> --- Comment #4 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
> Seems that this is the case in the Debian kernel, yes:
> [...]
> 
> Do you have the kernel command option at hand which disables ftrace on the
> command line? Is it just ftrace=off?

Hmm, not sure, Documentation/admin-guide/kernel-parameters.txt seems
to say that wouldn't work.

I thought it might only be going down that path if you have already done
some tracing. Perhaps ensure /sys/kernel/debug/tracing/tracing_on is set
to 0, and then `echo 1 > /sys/kernel/debug/tracing/free_buffer` before
you start the test.
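The two steps above, as a sketch (standard debugfs mount point assumed; both writes need root; the control files are described in Documentation/trace/ftrace.rst):

```shell
# Stop the trace ring buffer and release its pages before reproducing,
# so the oops path has as little tracing state to touch as possible.
echo 0 > /sys/kernel/debug/tracing/tracing_on
echo 1 > /sys/kernel/debug/tracing/free_buffer
cat /sys/kernel/debug/tracing/tracing_on   # confirm it reads back 0
```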

>> > FWIW, the kernel image comes from this Debian package: [...]
>> 
>> Okay. Any chance you could test an upstream kernel? 
> 
> Sure, absolutely. Any preference on the version number?

Current head if you're feeling lucky, but v5.5 if not. You can give
the ftrace test a try with the Debian kernel first if you've got it
ready to go.

>> Don't bother testing that after the above -- panic_on_oops happens
>> after oops_begin(), so it won't help unfortunately.
> 
> Okay.
> 
>> Attempting to get into xmon might, though, if you boot with xmon=on.
>> Try that if tracing wasn't enabled, or if disabling it doesn't help.
> 
> Okay. I will try to disable ftrace first, then retrigger the crash.

Cool

Thanks,
Nick
Comment 6 John Paul Adrian Glaubitz 2020-02-26 12:02:20 UTC
(In reply to npiggin from comment #5)
> I thought it might only be going down that path if you have already done
> some tracing. Perhaps ensure /sys/kernel/debug/tracing/tracing_on is set
> to 0, and then `echo 1 > /sys/kernel/debug/tracing/free_buffer` before
> you start the test.

I have done this now and I'm performing the test. Let's see if we can get some more output.

> >> Okay. Any chance you could test an upstream kernel? 
> > 
> > Sure, absolutely. Any preference on the version number?
> 
> Current head if you're feeling lucky, but v5.5 if not. But you can
> give the ftrace test a try with the debian kernel first if you've got
> it ready to go.

I think I will try the latest 5.5.x first, after the test with the Debian kernel with tracing turned off.
Comment 7 John Paul Adrian Glaubitz 2020-02-27 16:07:49 UTC
I have set /sys/kernel/debug/tracing/tracing_on to "0" and /sys/kernel/debug/tracing/free_buffer to "1" and it seems I can no longer reproduce the issue.

I will have to do more testing to see if that's just an artifact or really related.
Comment 8 John Paul Adrian Glaubitz 2020-03-07 21:56:19 UTC
Created attachment 287823 [details]
kern.log containing some crash dumps

I have another trace of the crash. Not sure whether this was with tracing disabled.

Does that help?
Comment 9 Aneesh Kumar KV 2020-03-10 12:25:53 UTC
Also, can you try disabling THP? echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
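A sketch of the suggested change, with a read-back to confirm (standard sysfs path; needs root):

```shell
# Disable transparent hugepages until the next boot.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# The active setting is shown in brackets, e.g. "always madvise [never]".
cat /sys/kernel/mm/transparent_hugepage/enabled
```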

-aneesh
Comment 10 John Paul Adrian Glaubitz 2020-03-10 12:28:07 UTC
(In reply to Aneesh Kumar KV from comment #9)
> Also, can you try disabling THP. echo "never" >
> /sys/kernel/mm/transparent_hugepage/enabled 

Yes. Just disabled.

FWIW, the machine just crashed some minutes ago but I didn't have a serial console open to capture the trace.
Comment 11 John Paul Adrian Glaubitz 2020-03-14 10:08:45 UTC
It seems I can provoke the crash by running the glibc testsuite in a big-endian guest VM.

The machine just crashed with the IPMI console open but the only message the kernel printed was:

"watson login: [ 1809.138398] KVM: couldn't grab cpu 115"

I was not watching the kernel log buffer at the time, though. I will try to provoke the crash again while keeping the kernel log open.
Comment 12 John Paul Adrian Glaubitz 2020-03-16 18:13:34 UTC
Another crash:

watson login: [17667512263.751484] BUG: Unable to handle kernel data access at 0xc000000ff06e4838
[17667512263.751507] Faulting instruction address: 0xc00000000017a778
[17667512263.751513] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751517] Faulting instruction address: 0xc0000000002659a0
[17667512263.751521] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751525] Faulting instruction address: 0xc0000000002659a0
[17667512263.751529] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751533] Faulting instruction address: 0xc0000000002659a0
[17667512263.751537] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751541] Faulting instruction address: 0xc0000000002659a0
[17667512263.751545] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751548] Faulting instruction address: 0xc0000000002659a0
[17667512263.751552] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751556] Faulting instruction address: 0xc0000000002659a0
[17667512263.751560] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751564] Faulting instruction address: 0xc0000000002659a0
[17667512263.751569] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751574] Faulting instruction address: 0xc0000000002659a0
[17667512263.751578] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751583] Faulting instruction address: 0xc0000000002659a0
[17667512263.751587] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751591] Faulting instruction address: 0xc0000000002659a0
[17667512263.751596] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751600] Faulting instruction address: 0xc0000000002659a0
[17667512263.751604] Thread overran stack, or stack corrupted
[17667512263.751608] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[17667512263.751612] Faulting instruction address: 0xc0000000002659a0
[17667512263.751615] Thread overran stack, or stack corrupted
[17667512263.751618] BUG: Unable to handle kernel data access at 0xc0000007f9070c08
[ 1835.743178] BUG: Unable to handle unknown paging fault at 0xc000000000c4b363
[ 1835.743180] Faulting instruction address: 0x00000000
[17667512263.751633] Faulting instruction address: 0xc0000000002659a0
[ 1835.743195] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1835.743198] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 1835.743203] Modules linked in:
[17667512263.751652] Thread overran stack, or stack corrupted
[ 1835.743205]
Comment 13 John Paul Adrian Glaubitz 2020-03-19 20:22:05 UTC
watson login: [30887.552539] KVM: CPU 4 seems to be stuck
[30900.094713] watchdog: CPU 8 detected hard LOCKUP on other CPUs 0
[30900.094730] watchdog: CPU 8 TB:15863742878763, last SMP heartbeat TB:15855546563837 (16008ms ago)
[30908.222926] watchdog: BUG: soft lockup - CPU#80 stuck for 22s! [CPU 4/KVM:2698]
[30908.374929] watchdog: BUG: soft lockup - CPU#112 stuck for 22s! [CPU 23/KVM:2717]
[30908.426934] watchdog: BUG: soft lockup - CPU#120 stuck for 22s! [CPU 16/KVM:2710]
[30909.570962] rcu: INFO: rcu_sched self-detected stall on CPU
[30909.570970] rcu:     120-....: (5059 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=421758/421758 fqs=2378 
[30912.095025] watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [CPU 18/KVM:2712]
[30912.127027] watchdog: BUG: soft lockup - CPU#40 stuck for 23s! [CPU 22/KVM:2716]
[30912.155026] watchdog: BUG: soft lockup - CPU#56 stuck for 23s! [CPU 27/KVM:2721]
[30912.175028] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [CPU 26/KVM:2720]
[30912.195028] watchdog: BUG: soft lockup - CPU#72 stuck for 23s! [CPU 19/KVM:2713]
[30912.547038] watchdog: BUG: soft lockup - CPU#136 stuck for 22s! [CPU 8/KVM:2702]
[30912.619040] watchdog: BUG: soft lockup - CPU#144 stuck for 22s! [CPU 5/KVM:2699]
