Running the kernel ftrace self tests triggers an oops on RISC-V with 5.8.0; this did not occur with 5.4.0.

How to reproduce:

Test: linux/tools/testing/selftests/ftrace
run with sudo ftrace -vvv function_graph

ftrace test basic2.tc causes the oops:

[ 455.134397] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.134451] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.134508] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.134564] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.134607] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.134649] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.134754] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 455.135305] Oops [#1]
[ 455.145642] Modules linked in: binfmt_misc ofpart redboot cmdlinepart cfi_cmdset_0001 cfi_probe cfi_util gen_probe physmap map_funcs chipreg virtio_rng mtd uio_pdrv_genirq uio sch_fq_codel drm drm_panel_orientation_quirks backlight ip_tables x_tables autofs4 virtio_net net_failover failover virtio_blk
[ 455.196320] CPU: 2 PID: 20 Comm: migration/2 Tainted: G W 5.8.0-2-generic #2
[ 455.197001] epc: 0000000000000000 ra : ffffffe0002b7acc sp : ffffffe1f5c8bd70
[ 455.197581] gp : ffffffe0017223a0 tp : ffffffe1f5c76780 t0 : ffffffe1f5c8bd78
[ 455.222311] t1 : ffffffe0002769a0 t2 : ffffffe1f5c8be00 s0 : ffffffe1f5c8bd80
[ 455.223098] s1 : ffffffe1f1063ba0 a0 : ffffffe0002b7ba6 a1 : ffffffe0002769a0
[ 455.223846] a2 : ffffffe1f5c8be00 a3 : 0000000000000010 a4 : 0000000000000002
[ 455.224507] a5 : ffffffe1fecbfad8 a6 : 00000000000000ff a7 : 0000000000000001
[ 455.225351] s2 : ffffffe1f1063bc4 s3 : ffffffffffffffff s4 : ffffffe001724210
[ 455.226199] s5 : 0000000000000001 s6 : 0000000000000003 s7 : 0000000000000003
[ 455.226924] s8 : 0000000000000004 s9 : 0000000000000002 s10: 0000000000000000
[ 455.227522] s11: 0000000000000003 t3 : 000000000000b67e t4 : 00000000000192f5
[ 455.228111] t5 : ffffffe0017221a0 t6 : ffffffe000c02d24
[ 455.228542] status: 0000000000000100 badaddr: 0000000000000000 cause: 000000000000000c
Occurs in 5.8.8 too.
regression between 5.6 (ok) and 5.7 (crashes)
This is a RISC-V specific issue, bisected down to:

cfafe260137418d0265d0df3bb18dc494af2b43e is the first bad commit
commit cfafe260137418d0265d0df3bb18dc494af2b43e
Author: Atish Patra <atish.patra@wdc.com>
Date: Tue Mar 17 18:11:43 2020 -0700

    RISC-V: Add supported for ordered booting method using HSM
Issue still in 5.9-rc6
On Sat, 26 Sep 2020 22:02:35 +0000
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=209317
>
> --- Comment #4 from Colin Ian King (colin.king@canonical.com) ---
> Issue still in 5.9-rc6

Atish,

As the issue bisects down to your commit, care to take a look at this?
(And take ownership of this bug.)

-- Steve
On Mon, 2020-09-28 at 11:13 -0400, Steven Rostedt wrote:
> Atish,
>
> As the issue bisects down to your commit, care to take a look at this?
> (And take ownership of this bug.)

Yes, I am already looking into this. Colin informed me about the bug over the weekend.

I couldn't change the ownership as I am not part of the editbugs group. I have sent an email to helpdesk@kernel.org for access.
Hi Alan and Zong,

I initially suspected that ftrace broke between v5.6 and v5.7, as Colin pointed out. I couldn't find any way the HSM patch could be related. Zong's ftrace patching code was also merged in that release. However, I was able to reproduce the issue on an older kernel (v5.4) as well, on both QEMU and Unleashed hardware.

Here are the steps:

mount -t debugfs none /sys/kernel/debug/
cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo function > current_tracer

Setting function_graph works the first time, but writing any other tracer afterwards crashes immediately. Can you take a look to check whether the bug is in the ftrace infrastructure code?
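For repeated runs, the tracer switch in the steps above can be wrapped in a small helper. This is only a sketch: the function name switch_tracers and its optional directory argument are made up for illustration, and the default path assumes debugfs is already mounted at /sys/kernel/debug.

```shell
#!/bin/sh
# Write function_graph and then function to current_tracer, the order
# reported to crash. switch_tracers is a hypothetical helper name; the
# optional first argument lets it be pointed at a scratch directory for
# a dry run instead of the real tracing directory.
switch_tracers() {
    dir=${1:-/sys/kernel/debug/tracing}
    for tracer in function_graph function; do
        # Stop at the first failed write so the failing step is obvious.
        echo "$tracer" > "$dir/current_tracer" || return 1
        echo "set tracer: $tracer"
    done
}
```

Running switch_tracers with no argument as root on an affected system should reproduce the crash on the second write; passing a temporary directory only exercises the script itself.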
Hi Atish,

I can set aside some time to look at it together. If anyone here fixes it or has ideas, please share the information. Thanks.
On Sun, Oct 4, 2020 at 11:08 PM Zong Li <zong.li@sifive.com> wrote:
> Hi Atish,
>
> I can take out some time to take a look at it together, if anyone here
> fixes it or has ideas, please share the information, thanks.

Thanks. Here is an observation, in case it helps: across kernels, the panic trace seems to indicate that one of the first two functions after patching is corrupted, either rcu_momentary_dyntick_idle or stop_machine_yield [1].

I suspect the nop was not replaced with the correct auipc+jalr pair?

[1] https://elixir.bootlin.com/linux/v5.9-rc7/source/kernel/stop_machine.c#L213
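For anyone comparing a dump of a patched call site against what it should contain, here is a rough sketch of how an auipc+jalr pair is encoded for a given PC-relative offset. It follows the standard RISC-V encodings and assumes the link register is ra (x1), as in the usual call sequence; that the kernel's patching code uses exactly this register and layout is an assumption here, and encode_call is a made-up helper name.

```shell
#!/bin/sh
# Print the two 32-bit instruction words "auipc ra, hi20" and
# "jalr ra, lo12(ra)" for a PC-relative call offset.
# encode_call is a hypothetical helper; using ra (x1) is an assumption.
encode_call() {
    offset=$(( $1 ))
    # jalr sign-extends its 12-bit immediate, so add 0x800 before taking
    # the upper 20 bits to compensate when bit 11 of the offset is set.
    hi20=$(( ((offset + 0x800) >> 12) & 0xfffff ))
    lo12=$(( offset & 0xfff ))
    # auipc: imm[31:12] | rd=x1 | opcode 0x17  -> base word 0x00000097
    auipc=$(( (hi20 << 12) | 0x97 ))
    # jalr: imm[11:0] | rs1=x1 | funct3=0 | rd=x1 | opcode 0x67 -> 0x000080e7
    jalr=$(( (lo12 << 20) | 0x80e7 ))
    printf '%08x %08x\n' "$auipc" "$jalr"
}
```

For example, encode_call 0x1234 prints 00001097 234080e7. A patched site that still holds nops, or holds only one half of such a pair, would be consistent with the suspicion above.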