Bug 211339

Summary: Skipped function traces
Product: Tracing/Profiling Reporter: pierre.gondois
Component: FtraceAssignee: Steven Rostedt (rostedt)
Status: RESOLVED CODE_FIX    
Severity: normal    
Priority: P1    
Hardware: ARM   
OS: Linux   
Kernel Version: v5.11-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: Trace of a cpu with a bad tracing_graph_pause value
Set tracing_graph_pause at init_task

Description pierre.gondois 2021-01-25 10:18:35 UTC
Hello,
I added this function and called it from kernel/sched/fair.c:select_task_rq_fair (but it could be somewhere else):
void noinline MyFunc(void) {
  trace_printk("TheTag\n");
  return;
}

I then issued the following commands:
# /root/trace-cmd start -e cpu_frequency -e cpu_idle -e sched_overutilized -e sched_switch -e sched_wakeup -p function_graph -l MyFunc
Wait a bit ...
# /root/trace-cmd stop
# /root/trace-cmd extract -o /root/trace.dat

it gives me the following extracts:
<idle>-0     [003]  2308.555100: cpu_idle:             state=4294967295 cpu_id=3
<idle>-0     [003]  2308.555359: bputs:                MyFunc: TheTag
<idle>-0     [003]  2308.555457: sched_wakeup:         comm=kcompactd0 pid=53 prio=120 target_cpu=003

and
kworker/u12:2-1899  [003]  2307.953111: funcgraph_entry:                   |  MyFunc() {
kworker/u12:2-1899  [003]  2307.953114: bputs:                MyFunc: TheTag
kworker/u12:2-1899  [003]  2307.953116: funcgraph_exit:         6.900 us   |  }

As shown, it seems the function "MyFunc" is sometimes not traced.

Issuing the following commands:
# echo "MyFunc" > /sys/kernel/debug/tracing/set_ftrace_filter
# echo "function_graph" > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/kernel/debug/tracing/tracing_on
Wait a bit ...
# echo 0 > /sys/kernel/debug/tracing/tracing_on
# /root/trace-cmd extract -o /root/trace.dat

gives me the a similar output (available in trace2.dat):
<idle>-0     [002]  3057.671350: bputs:                MyFunc: TheTag
kworker/2:1H-90    [002]  3057.671606: funcgraph_entry:                   |  MyFunc() {
kworker/2:1H-90    [002]  3057.671609: bputs:                MyFunc: TheTag
kworker/2:1H-90    [002]  3057.671610: funcgraph_exit:         6.120 us   |  }
<idle>-0     [002]  3057.676063: bputs:                MyFunc: TheTag
kworker/2:1H-90    [002]  3057.678275: funcgraph_entry:                   |  MyFunc() {
kworker/2:1H-90    [002]  3057.678278: bputs:                MyFunc: TheTag
kworker/2:1H-90    [002]  3057.678280: funcgraph_exit:         6.120 us   |  }

As shown, the function "MyFunc" is sometimes not traced.
I am using trace-cmd built from git@github.com:rostedt/trace-cmd.git:b9bb3ae87e7c102013a5a717399281247d5be967
and a kernel based on v5.11-rc1 with patches that should not be related to the issue.

There was apparently no events lost as the per_cpu/cpuX/stats/overrun entries are still at 0.

Regards,
Pierre
Comment 1 pierre.gondois 2021-01-25 12:39:00 UTC
*** Bug 211341 has been marked as a duplicate of this bug. ***
Comment 2 Steven Rostedt 2021-01-25 19:46:24 UTC
I don't currently have an arm board up and running to test on (I have some, but they are all in a box as I restructure my office). I'm not able to reproduce this on x86_64.

Is this something easily reproducible? If so, can you bisect it. If you decide to do so, I would use quilt (https://wiki.debian.org/UsingQuilt), with your update. That way, you can easily add and remove your "TheTag" patch.

 $ git diff > thetag.patch
 $ patch -p1 -R < thetag.patch
 $ quilt import thetag.patch
 $ git bisect start <good> <bad>
 $ quilt push
 $ make ...
 $ quilt pop
 $ git bisect [good|bad]
 $ quilt push 
 [..]

There are a few things that could cause function graph not to print. One is the current->tracing_graph_pause being set.

The other is if the current->ret_stack is not updated yet. But I see the above has idle threads, and that should be updated before tracing begins. What if you enable function tracing instead? Do you see the same thing?

 # trace-cmd start -p function -l MyFunc -e ...

?
Comment 3 pierre.gondois 2021-01-27 22:50:05 UTC
Hello,
Thank you for the quick answer and the advice. It seems that the issue is related to the (struct task_struct)->tracing_graph_pause field.

In arch/arm64/kernel/suspend.c:cpu_suspend(), the idle task is doing:
cpu_suspend() {
  // tracing_graph_pause is incremented from 0->1
  pause_graph_tracing();
  ... sleep ...
  // tracing_graph_pause is decremented from 1->0
  unpause_graph_tracing();
}

In the meantime, setting the current_tracer to "function_graph" issue a call to kernel/trace/fgraph.c:graph_init_task(), and this for each CPU alive:
graph_init_task() {
  // Set tracing_graph_pause whatever the previous value was
  atomic_set(&t->tracing_graph_pause, 0);
  ...
}

This means the following scenario can happen (with (swapper/1)->tracing_graph_pause being tgp1):
       CPU 0        |tgp1 |        CPU 1
                    |  0  |cpu_suspend() {
                    |0->1 |  pause_graph_tracing();
                    |  1  |  ... sleep ...
graph_init_task()   |1->0 |  ... sleep ...
                    |  1  |  ... sleep ...
                    |0->-1|  unpause_graph_tracing() 
                    | -1  |}

This results in the functions of CPU1 being traced only in the cpu_suspend() function. I attached the trace generated of such a CPU. I tested this on v5.11-rc4 on a Juno-r2 board.
I think using atomic_set() instead of atomic_inc()/atomic_dec() should resolve it.
Please let me know if I can help to reproduce it.

Regards,
Pierre
Comment 4 pierre.gondois 2021-01-27 22:56:41 UTC
Created attachment 294977 [details]
Trace of a cpu with a bad tracing_graph_pause value

When tracing_graph_pause is set to 0, the CPU is only tracing the function inside the cpu_suspend() function. The tracing is disabled when getting out of the function.
Comment 5 pierre.gondois 2021-01-27 23:29:01 UTC
There was a typo in the table:
       CPU 0        |tgp1 |        CPU 1
                    |  0  |cpu_suspend() {
                    |0->1 |  pause_graph_tracing();
                    |  1  |  ... sleep ...
graph_init_task()   |1->0 |  ... sleep ...
                    |  0  |  ... sleep ...
                    |0->-1|  unpause_graph_tracing() 
                    | -1  |}
Comment 6 Steven Rostedt 2021-01-28 15:59:12 UTC
Created attachment 294981 [details]
Set tracing_graph_pause at init_task

The attached patch makes init_task initialize the tracing_graph_pause variable in the task struct, and does not touch it at start up of function graph tracing.

Can you see if this fixes your issue.
Comment 7 pierre.gondois 2021-01-28 17:16:39 UTC
This is fixing the issue :)

Thank you for the patch,
Pierre
Comment 8 pierre.gondois 2021-02-07 20:57:16 UTC
Fixed with: "fgraph: Initialize tracing_graph_pause at task creation"
https://github.com/torvalds/linux/commit/7e0a9220467dbcfdc5bc62825724f3e52e50ab31