Hello.

I've hit various limitations of perf in the scenario described below; maybe some of my observations can be turned into separate PRs.

Having a benchmark, I wanted to compare its speed between two versions of the GCC compiler. I quickly identified that the gap is caused by a single function. Then I wanted to understand the difference in more detail, so I wanted to compare all event types for that symbol. Thus I ran:

  perf record -e ..all_events.. ./benchmark

Then I observed that:

1) $ perf annotate symbol:
   Using TTY one can see the annotation for all counters, but the name of the counter is not written in the header.

2) perf annotate symbol --stdio:
   Does the annotation for just a *single* counter type:

   Percent | Source code & Disassembly of gcc_peak.none for branch-instructions:u of symbol htab_traverse (3634 samples)
   ....

   Here I would expect all event types to be displayed.

3) --event (-e) is missing for annotate; it would be helpful.

4) --percent-limit is missing for annotate; it would also be helpful.

5) $ perf report asks the user for an event type; annotate could maybe do the same.

Thanks,
Martin
1) Can you try the new 'perf annotate --stdio2 symbol' to see if it is any better?

Exercising it a bit here (noticed some stuff to improve, sigh):

  # perf record -a -e '{cycles,instructions,cache-misses}' sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.252 MB perf.data (3229 samples) ]
  # cat ~/.perfconfig
  [annotate]
  hide_src_code = true
  # perf annotate --stdio2 _raw_spin_lock
  _raw_spin_lock() /lib/modules/4.16.0-rc7/build/vmlinux
  Event: anon group { cycles, instructions, cache-misses }

    0.00   0.00  16.67      → callq  __fentry__
                              xor    %eax,%eax
                              mov    $0x1,%edx
    0.00   7.14   0.00        lock   cmpxchg %edx,(%rdi)
  100.00  92.86  83.33        test   %eax,%eax
                            ↓ jne    16
                              repz   retq
                        16:   mov    %eax,%esi
                            → jmpq   ffffffff810eaed0 <queued_spin_lock_slowpath>
  #
  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock
  _raw_spin_lock() /proc/kcore
  Event: anon group { cycles, instructions, cache-misses }

    0.00   0.00  16.67        nop
                              xor    %eax,%eax
                              mov    $0x1,%edx
    0.00   7.14   0.00        lock   cmpxchg %edx,(%rdi)
  100.00  92.86  83.33        test   %eax,%eax
                            ↓ jne    16
                              repz   retq
                        16:   mov    %eax,%esi
                            → jmpq   0xffffffffb30eaed0
  #
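The three percentage columns in the output above can be read as one column per event in the group, each normalized independently over the samples that landed in the symbol. Below is a minimal Python sketch of that arithmetic. The per-instruction sample counts are hypothetical, chosen only so that the ratios reproduce the figures shown (the real counts are not given in this thread), and `column_percentages` is an illustrative helper, not part of perf.

```python
# Hypothetical per-instruction sample counts for _raw_spin_lock, one entry per
# event in the group. These numbers are NOT from the thread; they are picked
# only so that the ratios match the percentages in the output above.
EVENTS = ["cycles", "instructions", "cache-misses"]

samples = {
    "callq __fentry__":         [0, 0, 1],
    "lock cmpxchg %edx,(%rdi)": [0, 1, 0],
    "test %eax,%eax":           [4, 13, 5],
}


def column_percentages(samples_by_insn, n_events):
    """Normalize each event column by that event's total within the symbol."""
    totals = [sum(row[i] for row in samples_by_insn.values())
              for i in range(n_events)]
    return {
        insn: [round(100.0 * row[i] / totals[i], 2) if totals[i] else 0.0
               for i in range(n_events)]
        for insn, row in samples_by_insn.items()
    }


percentages = column_percentages(samples, len(EVENTS))

# Print one row per instruction, with one percentage column per event.
print("  ".join(f"{name:>12}" for name in EVENTS))
for insn, cols in percentages.items():
    print("  ".join(f"{p:12.2f}" for p in cols), " ", insn)
```

Each column sums to 100 on its own, which is why a single instruction can show 100.00 for cycles while showing 92.86 and 83.33 for the other two events.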
2) With --stdio2, if the events were grouped, i.e. recorded using {event,event,event}, then, as you can see in the previous comment, all of them will appear, and the event description has the event names (they should be in the order in which they appear in the N columns, one for each).

If you don't group them, yeah, it shows just the first one in all cases. It's a bug: it shows the annotation for the first event, and then, when you exit the annotation for this event, it should go to the next one, and so on, but this actually works only for the --tui right now, i.e. if you do:

  perf record -e cycles,instructions,cache-misses

Then go to the --tui annotation for a specific symbol:

  perf annotate _raw_spin_lock

It'll start with the annotation for cycles, then if you press 'q' it will go to the next one, 'instructions', and then to 'cache-misses'; the last 'q' will exit perf altogether.

So we need to improve this to allow cycling through the events. It is also necessary to print the name of the event being annotated on the first line, and we need to make --group (forced group) work for 'perf annotate' as it does for 'perf report', i.e. even when {} was not used to ask for grouping the events (which forces the kernel perf subsystem to try to reserve N events for simultaneous scheduling, or to fail if that is not possible), we would be able to ask for --group and have N columns, one for each event.

The issues here just show that the annotation code is better tested when used from 'perf report'. I'll try and fix these issues, thanks for reporting.
Take a look at: git.kernel.org/acme/c/3b6469df3250, where showing the event name is implemented for --tui.
Ditto for --stdio2: https://git.kernel.org/acme/c/4defe4f719a6
Grrr, those need redoing: the information needed here is the samples/period/percentages for the symbol being annotated, not for all the samples :-\ OK, will get back to this later, out of time now...
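To illustrate the distinction drawn above, here is a small Python sketch that computes header figures scoped to one symbol rather than to the whole session. It is only one possible reading of "samples/period/percentages for the symbol": the records, the `Sample` type and `symbol_header_stats` are all hypothetical and do not reflect perf's internal data structures or the actual patches.

```python
from collections import namedtuple

# Hypothetical sample records (symbol, period); not taken from any real perf.data.
Sample = namedtuple("Sample", ["symbol", "period"])

session = [
    Sample("_raw_spin_lock", 1000),
    Sample("_raw_spin_lock", 3000),
    Sample("htab_traverse", 2000),
    Sample("htab_traverse", 2000),
    Sample("main", 2000),
]


def symbol_header_stats(samples, symbol):
    """Return (nr_samples, period, percent of session period) for one symbol."""
    mine = [s for s in samples if s.symbol == symbol]
    total_period = sum(s.period for s in samples)
    period = sum(s.period for s in mine)
    percent = 100.0 * period / total_period if total_period else 0.0
    return len(mine), period, percent


nr, period, pct = symbol_header_stats(session, "_raw_spin_lock")
print(f"whole session:  {len(session)} samples")
print(f"_raw_spin_lock: {nr} samples, period {period}, {pct:.2f}% of the session")
```

A header built from the whole-session count would describe every symbol identically, whereas the symbol-scoped figures change with the function being annotated.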
Thanks, Arnaldo, for working on that.
Ok, reworked, should be better now, doing tests to push to Ingo, please take a look at my perf/core branch, for instance, for the TUI: https://git.kernel.org/acme/c/6920e2854e9a