Bug 196935

Summary: Perf annotate usage limitations
Product: Tracing/Profiling
Component: Perf tool
Reporter: Martin Liška (mliska)
Assignee: Arnaldo Carvalho de Melo (acme)
CC: jolsa
Status: NEEDINFO
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.12
Subsystem:
Regression: No
Bisected commit-id:

Description Martin Liška 2017-09-13 12:43:10 UTC
Hello.

I've hit several limitations of perf in the scenario described below. Some of these observations could perhaps be split into separate PRs:

I had a benchmark and wanted to compare its speed when built with two versions of the GCC compiler. I quickly identified that the gap is caused by a single function. To understand the difference in more detail, I wanted to compare all event types for that symbol, so I ran:

   perf record -e ..all_events.. ./benchmark

Then I observed that:

1) $ perf annotate symbol
In the default TUI mode, one can see the annotation for all counters, but the name of the counter is not shown in the header.

2) $ perf annotate symbol --stdio
This annotates only a *single* counter type:
 Percent |      Source code & Disassembly of gcc_peak.none for branch-instructions:u of symbol htab_traverse (3634 samples)
....

Here I would expect all event types to be displayed.

3) An --event (-e) option is missing for annotate; it would be helpful.
4) A --percent-limit option is missing for annotate; it would also be helpful.
5) perf report asks the user to choose an event type; annotate could maybe do the same.

Thanks,
Martin
Comment 1 Arnaldo Carvalho de Melo 2018-04-02 14:57:58 UTC
1) Can you try the new 'perf annotate --stdio2 symbol' to see if it is any better?

Exercising it a bit here (noticed some stuff to improve, sigh):

# perf record -a -e '{cycles,instructions,cache-misses}' sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.252 MB perf.data (3229 samples) ]
# cat ~/.perfconfig
[annotate]
	hide_src_code = true
# perf annotate --stdio2 _raw_spin_lock 
_raw_spin_lock() /lib/modules/4.16.0-rc7/build/vmlinux
Event: anon group { cycles, instructions, cache-misses }

  0.00   0.00  16.67      → callq  __fentry__           
                            xor    %eax,%eax            
                            mov    $0x1,%edx            
  0.00   7.14   0.00        lock   cmpxchg %edx,(%rdi)  
100.00  92.86  83.33        test   %eax,%eax            
                          ↓ jne    16                   
                            repz   retq                 
                      16:   mov    %eax,%esi            
                          → jmpq   ffffffff810eaed0 <queued_spin_lock_slowpath>
#
# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock 
_raw_spin_lock() /proc/kcore
Event: anon group { cycles, instructions, cache-misses }

  0.00   0.00  16.67        nop                         
                            xor    %eax,%eax            
                            mov    $0x1,%edx            
  0.00   7.14   0.00        lock   cmpxchg %edx,(%rdi)  
100.00  92.86  83.33        test   %eax,%eax            
                          ↓ jne    16                   
                            repz   retq                 
                      16:   mov    %eax,%esi            
                          → jmpq   0xffffffffb30eaed0   
#
Comment 2 Arnaldo Carvalho de Melo 2018-04-02 15:15:23 UTC
2) With --stdio2, if the events were grouped, i.e. using {event,event,event}, then, as you can see in the previous comment, all of them will appear, and the event description lists the event names (in the same order as the N columns, one per event).
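The per-event column layout can be post-processed to approximate items 3) and 4) from the original report (choosing one event and applying a percent limit), which the reporter found missing from perf annotate. The following is a minimal Python sketch, not part of perf: `filter_annotation` and `SAMPLE` are hypothetical names. The parser assumes the --stdio2 layout from Comment 1, i.e. one leading percentage column per grouped event, in the order listed on the `Event:` line, with those columns left blank on lines that carry no samples.

```python
import re

# Header perf prints for a grouped annotation, e.g.
# "Event: anon group { cycles, instructions, cache-misses }"
GROUP_RE = re.compile(r"^Event: .*\{\s*(.*?)\s*\}")
PERCENT_RE = re.compile(r"^\d+\.\d+$")

# Trimmed excerpt of the --stdio2 output from Comment 1.
SAMPLE = """\
_raw_spin_lock() /lib/modules/4.16.0-rc7/build/vmlinux
Event: anon group { cycles, instructions, cache-misses }

  0.00   0.00  16.67      → callq  __fentry__
                            xor    %eax,%eax
  0.00   7.14   0.00        lock   cmpxchg %edx,(%rdi)
100.00  92.86  83.33        test   %eax,%eax
                          ↓ jne    16
"""


def filter_annotation(text, event, limit):
    """Hypothetical helper: return the instructions whose percentage for
    `event` is >= `limit` (a per-event --percent-limit emulation).

    Raises ValueError if `event` is not one of the grouped events.
    """
    events = []
    kept = []
    for line in text.splitlines():
        header = GROUP_RE.match(line)
        if header:
            # Column order follows the order of names in the header.
            events = [e.strip() for e in header.group(1).split(",")]
            if event not in events:
                raise ValueError(f"{event!r} not in group {events}")
            continue
        n = len(events)
        fields = line.split()
        if n == 0 or len(fields) <= n:
            continue  # before the header, blank, or too short to hold N columns
        if not all(PERCENT_RE.match(f) for f in fields[:n]):
            continue  # leading fields are not percentages: no samples here
        pct = float(fields[events.index(event)])
        if pct >= limit:
            kept.append(" ".join(fields[n:]))
    return kept


if __name__ == "__main__":
    for insn in filter_annotation(SAMPLE, "cache-misses", 10.0):
        print(insn)
```

For example, `filter_annotation(SAMPLE, "cache-misses", 10.0)` keeps only the `callq __fentry__` and `test %eax,%eax` lines. Lines without percentages, such as the `jne` branch, are always dropped. Because the sketch scrapes human-oriented output, it would break if the column layout changes.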

If you don't group them, yeah, it shows just the first one in all cases. That's a bug: it should show the annotation for the first event, then, when you exit the annotation for that event, go to the next one, and so on. Right now this only works for --tui, i.e. if you do:

   perf record -e cycles,instructions,cache-misses

Then go to the --tui annotation for a specific symbol:

   perf annotate _raw_spin_lock

It'll start with the annotation for cycles. If you then press 'q', it will go to the next one, 'instructions', and then to 'cache-misses'; the last 'q' will exit perf altogether.

So we need to improve this to allow cycling through the events. It is also necessary to print the name of the event being annotated on the first line. And we need to make --group (forced grouping) work for 'perf annotate' as it does for 'perf report': even when {} was not used to group the events (i.e. to ask the kernel perf subsystem to reserve N events for simultaneous scheduling, failing if that is not possible), we would be able to ask for --group and get N columns, one for each event.

These issues just show that the annotation code is better tested when used from 'perf report'. I'll try to fix them; thanks for reporting.
Comment 3 Arnaldo Carvalho de Melo 2018-04-02 19:34:56 UTC
Take a look at:

git.kernel.org/acme/c/3b6469df3250

This implements showing the event name for --tui.
Comment 4 Arnaldo Carvalho de Melo 2018-04-02 19:56:17 UTC
Ditto for --stdio2:

https://git.kernel.org/acme/c/4defe4f719a6
Comment 5 Arnaldo Carvalho de Melo 2018-04-02 20:02:52 UTC
Grrr, those need redoing: the information needed here is the samples/period/percentages for the symbol being annotated, not for all the samples :-\ Ok, will get back to this later, out of time now...
Comment 6 Martin Liška 2018-04-03 08:06:31 UTC
Thanks, Arnaldo, for working on that.
Comment 7 Arnaldo Carvalho de Melo 2018-04-03 20:11:46 UTC
Ok, reworked; it should be better now. I'm running tests before pushing to Ingo. Please take a look at my perf/core branch, for instance, for the TUI:

https://git.kernel.org/acme/c/6920e2854e9a