Bug 211255 - trace-cmd report -r hides some events unless -F is used
Summary: trace-cmd report -r hides some events unless -F is used
Status: RESOLVED CODE_FIX
Alias: None
Product: Tools
Classification: Unclassified
Component: Trace-cmd/Kernelshark (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Default virtual assignee for Trace-cmd and kernelshark
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-18 11:46 UTC by Douglas RAILLARD
Modified: 2021-03-10 22:38 UTC (History)
1 user (show)

See Also:
Kernel Version: v5.9-rc8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
trace.dat reproducer (1.31 MB, application/x-xz)
2021-01-18 11:46 UTC, Douglas RAILLARD
Details

Description Douglas RAILLARD 2021-01-18 11:46:36 UTC
Created attachment 294725 [details]
trace.dat reproducer

Hi,

These 2 commands should print the same number of events on the attached trace.dat:

$ trace-cmd report -N -r funcgraph_exit trace.dat 2>/dev/null | grep funcgraph_exit | wc -l
11645

$ trace-cmd report -N -F funcgraph_exit -r funcgraph_exit trace.dat 2>/dev/null | grep funcgraph_exit | wc -l
24826

I could not find any pattern when quickly eyeballing the results. The events that are dropped seem to be random, but the result is repeatable.

$ trace-cmd --version                                                                                                               
trace-cmd version 2.9.1 (b9bb3ae87e7c102013a5a717399281247d5be967)

I wanted to try a more recent version but I'll need to update my build environment first to use musl libc as we use static binaries, which will take some time.
Comment 1 Steven Rostedt 2021-01-18 23:34:30 UTC
It appears that the '-N' is not stopping the function graph plugin from being run. When I ran your first command (without grep) I see:

       trace-cmd-884   [003] 11237.706547: funcgraph_entry:                   |  mutex_unlock() {
       trace-cmd-884   [003] 11237.706554: funcgraph_entry:        3.320 us   |    __mutex_unlock_slowpath();
       trace-cmd-884   [003] 11237.706561: funcgraph_exit:        func=0xffff800011135c58 overrun=0x0 calltime=0xa387b81b45c rettime=0xa387b81f14c depth=0
       trace-cmd-884   [003] 11237.706565: funcgraph_entry:        3.020 us   |  __fsnotify_parent();


The '__mutex_unlock_slowpath();' (with the semi-colon) tells me that trace-cmd pulled in the two events and printed them as one (funcgraph_entry and funcgraph_exit), which is why you see less of them in that output.

I'll go and investigate further. Thanks for reporting!
Comment 2 Steven Rostedt 2021-01-19 00:08:32 UTC
Taking a quick look, it appears that -N has never stopped the builtin ftrace plugins from loading, so that's not a regression (been there since 2010!)

What the ftrace plugin does, is for funcgraph_entry calls, it checks if the next event to be printed is a funcgraph_exit, and if so, it checks if it is the return of the current fungraph_entry call, and if that matches, then it will "eat" or merge the funcgraph_exit event into the funcgraph_entry output. That is, the fungraph_exit event is never shown.

Now, if we make -N not enable the internal plugins, you get the expected result.

With a version where I commented out the plugin loading:

 $ trace-cmd report -N -r funcgraph_exit /tmp/trace-issue.dat | grep funcgraph_exit | wc -l
24826

Which is what you expect. Note, without loading the internal plugins we get:

       trace-cmd-884   [003] 11237.706547: funcgraph_entry:      --> mutex_unlock (0)
       trace-cmd-884   [003] 11237.706554: funcgraph_entry:      --> __mutex_unlock_slowpath (1)
       trace-cmd-884   [003] 11237.706558: funcgraph_exit:       <-- __mutex_unlock_slowpath (1) (start: a387b81d66c  end: a387b81e364) over: 1
       trace-cmd-884   [003] 11237.706561: funcgraph_exit:       <-- mutex_unlock (0) (start: a387b81b45c  end: a387b81f14c) over: 0
       trace-cmd-884   [003] 11237.706565: funcgraph_entry:      --> __fsnotify_parent (0)
       trace-cmd-884   [003] 11237.706568: funcgraph_exit:       <-- __fsnotify_parent (0) (start: a387b81ff48  end: a387b820b14) over: 0
       trace-cmd-884   [003] 11237.706570: funcgraph_entry:      --> __sb_end_write (0)


For the output. But I guess that should be expected. Maybe that's the change we should make to fix this. That is, to have -N not load those plugins. Or we could add another option to do so, which means -N doesn't get modified. Something like: --no-internal-plugins?

Thoughts?
Comment 3 Douglas RAILLARD 2021-01-19 10:21:22 UTC
Hi Steven,

Given the current documentation of -N, it seems appropriate to disable internal plugins too. It's probably not widely used anyway beyond people that want a reproducible result not impacted by the environment or for debugging.

That's probably more complicated to implement but maybe events that are asked to be displayed in raw format with -r should not be fed to plugins and displayed directly ?

There is another combo that I forgot to mention and that gives the expected result for some reason:

$ trace-cmd report -R trace.dat 2>/dev/null | grep funcgraph_exit | wc -l               
24826
Comment 4 Steven Rostedt 2021-01-19 14:24:02 UTC
Yeah, you may be right to disable the function tracing macros there too. It is a little more complex, as we are separating the trace-cmd library code (which we want other tools to use) from trace-cmd itself, and that code is in the library. We need to change the API to tell that function "don't load ftrace plugins".

But that would still be easier than having -r do it. The reason is that this is even lower in the code, at the function graph tracer plugin logic. The logic of the funcgraph_entry does a "peak" at the next event to be shown. If it's a matching funcgraph_exit, it will print the the event differently, and it will consume the funcgraph_exit entry. That is, it will be like the funcgraph_exit event never existed. Without the -N, the -r is actually correct. Because it only affects funcgraph_exit entries that would be normally shown.

When you ran "-F", it filtered out the fungraph_entry, which kept the missing funcgraph_exit events from being consumed, and thus they were displayed.

The -R, acts like -N, in that it means "raw event" and in that case, it does not run the registered plugin display overrides, and it too keeps the funcgraph_entry from consuming the funcgraph_exits.

I think having -N stop the plugins from being loaded is the correct solution here.
Comment 5 Steven Rostedt 2021-01-19 14:49:05 UTC
Actually, making -N not loading internal plugins is not that complex. The design needs to be changed slightly, but for the better.

Stay tuned!
Comment 6 Steven Rostedt 2021-03-10 22:38:44 UTC
Fixed by 9dec229a3 ("trace-cmd: Add flags to APIs for opening trace file")

https://lore.kernel.org/linux-trace-devel/20210205075638.1283393-1-tz.stoyanov@gmail.com

Note You need to log in before you can comment on or make changes to this bug.