Bug 209317
| Summary: | ftrace kernel self test failure on RISC-V on 5.8, regression from 5.4.0 | | |
|---|---|---|---|
| Product: | Tracing/Profiling | Reporter: | Colin Ian King (colin.king) |
| Component: | Ftrace | Assignee: | Steven Rostedt (rostedt) |
| Status: | NEW --- | | |
| Severity: | high | | |
| Priority: | P1 | | |
| Hardware: | Other | | |
| OS: | Linux | | |
| Kernel Version: | 5.8.0 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Description
Colin Ian King
2020-09-18 14:33:50 UTC
Occurs in 5.8.8 too.

Regression between 5.6 (ok) and 5.7 (crashes).

This is a RISC-V specific issue, bisected down to:

cfafe260137418d0265d0df3bb18dc494af2b43e is the first bad commit
commit cfafe260137418d0265d0df3bb18dc494af2b43e
Author: Atish Patra <atish.patra@wdc.com>
Date:   Tue Mar 17 18:11:43 2020 -0700

    RISC-V: Add supported for ordered booting method using HSM

Issue still in 5.9-rc6

On Sat, 26 Sep 2020 22:02:35 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=209317
>
> --- Comment #4 from Colin Ian King (colin.king@canonical.com) ---
> Issue still in 5.9-rc6

Atish,

As the issue bisects down to your commit, care to take a look at this? (And take ownership of this bug.)

-- Steve

On Mon, 2020-09-28 at 11:13 -0400, Steven Rostedt wrote:
> Atish,
>
> As the issues bisects down to your commit, care to take a look at this.
> (And take ownership of this bug)

Yes. I am already looking into this. Colin informed me about the bug over the weekend.

I couldn't change the ownership as I am not part of the editbugs group. I have sent an email to helpdesk@kernel.org for access.

Hi Alan and Zong,

I initially suspected that ftrace broke between v5.6 and v5.7, as Colin pointed out. I couldn't find any way the HSM patch could be related. Zong's ftrace patching code was also merged in that release. However, I was able to reproduce the issue in the older kernel (v5.4) as well, on both QEMU and Unleashed hardware. Here are the steps:

    mount -t debugfs none /sys/kernel/debug/
    cd /sys/kernel/debug/tracing
    echo function_graph > current_tracer
    echo function > current_tracer

It works the first time with function_graph, but writing any other tracer afterwards crashes immediately.
Can you take a look to check whether the bug is in the ftrace infrastructure code?

Hi Atish,

I can set aside some time to look at it together. If anyone here fixes it or has ideas, please share the information. Thanks.
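The reproduction steps reported in this thread can be wrapped in a small helper. This is only a sketch: the function names `switch_tracer` and `reproduce` and the `TRACING_DIR` override are assumptions made for illustration and do not come from the report. It checks the tracefs `available_tracers` file before writing `current_tracer`, so a missing tracer is reported instead of producing a confusing write error.

```shell
#!/bin/sh
# Hypothetical helper, not from the report: switch tracers in the order
# that triggered the crash. TRACING_DIR defaults to the path used in the
# thread; point it at a scratch directory for a dry run.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

switch_tracer() {
    # $1 = tracer name. Refuse names the kernel does not advertise.
    if ! grep -qw "$1" "$TRACING_DIR/available_tracers"; then
        echo "tracer '$1' not available" >&2
        return 1
    fi
    echo "$1" > "$TRACING_DIR/current_tracer"
    echo "current_tracer is now: $(cat "$TRACING_DIR/current_tracer")"
}

reproduce() {
    # Same order as in the thread: function_graph first, then function.
    switch_tracer function_graph || return 1
    switch_tracer function || return 1
}
```

On a real system, debugfs must be mounted first (`mount -t debugfs none /sys/kernel/debug/`) and `reproduce` must run as root. On an affected RISC-V kernel the second switch is expected to crash the machine, so it is best run in QEMU or on a board that can be reset.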
On Sun, Oct 4, 2020 at 11:08 PM Zong Li <zong.li@sifive.com> wrote:
> Hi Atish,
>
> I can take out some time to take a look at it together, if anyone here
> fixes it or has ideas, please share the information, thanks.

Thanks. I observed the following, in case it helps. Across kernels, the panic trace seems to indicate that one of the first two functions after patching is corrupted: rcu_momentary_dyntick_idle or stop_machine_yield [1].

[1] https://elixir.bootlin.com/linux/v5.9-rc7/source/kernel/stop_machine.c#L213

I suspect the nop was not replaced with the correct auipc+jalr pair?
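To check the auipc+jalr hypothesis against a memory dump of a patched call site, it helps to know which two instruction words a correct patch should contain. The sketch below computes them from the standard RISC-V encodings of `auipc` and `jalr`. The function name `encode_call` is hypothetical, and using `ra` (x1) as both the scratch and link register is an assumption for illustration; confirm the register against the kernel version being debugged. An unpatched site would instead hold the RISC-V nop, `0x00000013`.

```shell
#!/bin/sh
# Hypothetical helper: print the "auipc ra, hi20" and "jalr ra, lo12(ra)"
# words that jump from PC to TARGET. Assumes TARGET is within +/-2 GiB of PC.
encode_call() {
    # $1 = address of the auipc instruction, $2 = call target
    offset=$(( ($2 - $1) & 0xffffffff ))
    # jalr sign-extends its 12-bit immediate, so round the upper part
    # by adding 0x800 before masking off the low 12 bits.
    hi=$(( (offset + 0x800) & 0xfffff000 ))
    lo=$(( offset & 0xfff ))
    auipc=$(( hi | 0x97 ))            # opcode 0x17, rd = x1
    jalr=$(( (lo << 20) | 0x80e7 ))   # opcode 0x67, rd = x1, rs1 = x1
    printf '%08x %08x\n' "$auipc" "$jalr"
}
```

For example, `encode_call 0x1000 0x2000` prints `00001097 000080e7`. Comparing such expected words with the bytes actually found at the start of rcu_momentary_dyntick_idle or stop_machine_yield would show whether the patching produced a well-formed pair.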