Bug 211339
Summary: | Skipped function traces | ||
---|---|---|---|
Product: | Tracing/Profiling | Reporter: | pierre.gondois |
Component: | Ftrace | Assignee: | Steven Rostedt (rostedt) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | ARM | ||
OS: | Linux | ||
Kernel Version: | v5.11-rc1 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Trace of a cpu with a bad tracing_graph_pause value
Set tracing_graph_pause at init_task |
Description
pierre.gondois
2021-01-25 10:18:35 UTC
*** Bug 211341 has been marked as a duplicate of this bug. *** I don't currently have an arm board up and running to test on (I have some, but they are all in a box as I restructure my office). I'm not able to reproduce this on x86_64. Is this something easily reproducible? If so, can you bisect it. If you decide to do so, I would use quilt (https://wiki.debian.org/UsingQuilt), with your update. That way, you can easily add and remove your "TheTag" patch. $ git diff > thetag.patch $ patch -p1 -R < thetag.patch $ quilt import thetag.patch $ git bisect start <good> <bad> $ quilt push $ make ... $ quilt pop $ git bisect [good|bad] $ quilt push [..] There are a few things that could cause function graph not to print. One is the current->tracing_graph_pause being set. The other is if the current->ret_stack is not updated yet. But I see the above has idle threads, and that should be updated before tracing begins. What if you enable function tracing instead? Do you see the same thing? # trace-cmd start -p function -l MyFunc -e ... ? Hello, Thank you for the quick answer and the advice. It seems that the issue is related to the (struct task_struct)->tracing_graph_pause field. In arch/arm64/kernel/suspend.c:cpu_suspend(), the idle task is doing: cpu_suspend() { // tracing_graph_pause is incremented from 0->1 pause_graph_tracing(); ... sleep ... // tracing_graph_pause is decremented from 1->0 unpause_graph_tracing(); } In the meantime, setting the current_tracer to "function_graph" issue a call to kernel/trace/fgraph.c:graph_init_task(), and this for each CPU alive: graph_init_task() { // Set tracing_graph_pause whatever the previous value was atomic_set(&t->tracing_graph_pause, 0); ... } This means the following scenario can happen (with (swapper/1)->tracing_graph_pause being tgp1): CPU 0 |tgp1 | CPU 1 | 0 |cpu_suspend() { |0->1 | pause_graph_tracing(); | 1 | ... sleep ... graph_init_task() |1->0 | ... sleep ... | 1 | ... sleep ... |0->-1| unpause_graph_tracing() | -1 |} This results in the functions of CPU1 being traced only in the cpu_suspend() function. I attached the trace generated of such a CPU. I tested this on v5.11-rc4 on a Juno-r2 board. I think using atomic_set() instead of atomic_inc()/atomic_dec() should resolve it. Please let me know if I can help to reproduce it. Regards, Pierre Created attachment 294977 [details]
Trace of a cpu with a bad tracing_graph_pause value
When tracing_graph_pause is set to 0, the CPU is only tracing the function inside the cpu_suspend() function. The tracing is disabled when getting out of the function.
There was a typo in the table: CPU 0 |tgp1 | CPU 1 | 0 |cpu_suspend() { |0->1 | pause_graph_tracing(); | 1 | ... sleep ... graph_init_task() |1->0 | ... sleep ... | 0 | ... sleep ... |0->-1| unpause_graph_tracing() | -1 |} Created attachment 294981 [details]
Set tracing_graph_pause at init_task
The attached patch makes init_task initialize the tracing_graph_pause variable in the task struct, and does not touch it at start up of function graph tracing.
Can you see if this fixes your issue.
This is fixing the issue :) Thank you for the patch, Pierre Fixed with: "fgraph: Initialize tracing_graph_pause at task creation" https://github.com/torvalds/linux/commit/7e0a9220467dbcfdc5bc62825724f3e52e50ab31 |