Bug 216441

Summary: [BISECTED] perf v5.19+ breaks `perf top -p` for multi-threaded processes
Product: Tracing/Profiling
Component: Perf tool
Reporter: Echo J. (aidas957)
Assignee: Arnaldo Carvalho de Melo (acme)
Status: ASSIGNED
Severity: normal
Priority: P1
CC: adrian.hunter, hvtaifwkbgefbaei, jolsa, sahan.h.fernando
Hardware: All
OS: Linux
Kernel Version: 6.0-rc3
Regression: Yes
Attachments: Proposed fix

Description Echo J. 2022-09-02 09:03:42 UTC
Hello,

I was trying to figure out why "perf top -p" wasn't working on my Arch Linux system (it returned the error "Failed to mmap with 22 (Invalid argument)"), and I eventually found out that downgrading to perf v5.18 made "perf top -p" work again.

Later I did some bisecting and found that ae4f8ae16a07896403c90305d4b9be27f657c1fc is the problematic commit.

I confirmed this issue happens on the v5.18.16 and v5.19.4/v5.19.5 Arch kernels (I even tried a v6.0-rc2 mainline kernel and the same error occurs, so it should occur on v6.0-rc3 as well).

I also updated to perf v6.0-rc3 and the same error occurs.

To reproduce this bug, compile and install perf from at least the v5.19 kernel tag and run `perf top -p` as a user with the PID of some multi-threaded process (web browsers are a good option).

You should get the "Failed to mmap with 22 (Invalid argument)" message on your screen.

This issue doesn't happen without the "-p" argument, though.

A few users reported this issue on both Intel and AMD CPUs (so it's not specific to one CPU vendor).

`perf record` works fine for the problematic processes.

I just found out that "perf top -p" only breaks for multi-threaded processes (single-threaded ones are fine) and no, specifying a thread ID doesn't have any effect.

As for my system, I currently use a Ryzen 5 4600H CPU on Arch Linux with the 5.19.5.arch1 kernel.
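
For a predictable multi-threaded target (instead of a web browser), something like the sketch below could be used. It is only an illustration: the helper names `spin_until` and `run_busy_threads` and the `MAX_THREADS` limit are made up for this example and are not part of perf. It starts a few busy threads with pthreads and prints the PID to pass to `perf top -p`.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 64	/* arbitrary limit for this sketch */

/* Busy-loop until the wall-clock deadline passed in via arg. */
static void *spin_until(void *arg)
{
	const time_t deadline = *(const time_t *)arg;
	volatile unsigned long counter = 0;

	while (time(NULL) < deadline)
		counter++;
	return NULL;
}

/*
 * Start nthreads busy threads, let them run for about `seconds` seconds
 * and join them. Returns 0 on success, -1 on bad arguments or if a
 * thread could not be created.
 */
int run_busy_threads(int nthreads, int seconds)
{
	pthread_t tids[MAX_THREADS];
	time_t deadline;
	int started = 0, ret = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS || seconds < 0)
		return -1;

	deadline = time(NULL) + seconds;
	printf("pid %d: %d threads for ~%d s\n", (int)getpid(), nthreads, seconds);
	fflush(stdout);

	for (; started < nthreads; started++) {
		if (pthread_create(&tids[started], NULL, spin_until, &deadline) != 0) {
			ret = -1;
			break;
		}
	}
	/* Join whatever was started; deadline stays valid until here. */
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return ret;
}
```

Calling `run_busy_threads(4, 60)` from a `main()` and building with `cc -pthread` should give a process that stays busy for about a minute; the printed PID can then be passed to `perf top -p` from another terminal.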
Comment 1 Adrian Hunter 2022-09-02 11:59:35 UTC
On 2/09/22 12:03, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216441
> 
>             Bug ID: 216441
>            Summary: [BISECTED] perf v5.19+ breaks `perf top -p` for
>                     multi-threaded processes
>            Product: Tracing/Profiling
>            Version: unspecified
>     Kernel Version: 6.0-rc3
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Perf tool
>           Assignee: acme@kernel.org
>           Reporter: aidas957@gmail.com
>                 CC: adrian.hunter@intel.com, jolsa@kernel.org
>         Regression: Yes
> 
> Hello,
> 
> I was trying to figure out why "perf top -p" wasn't working on my Arch Linux
> system (returning the error "Failed to mmap with 22 (Invalid argument)") and I
> eventually found out that downgrading to perf v5.18 made "perf top -p" work
> again
> 
> Later I did some bisecting and found out
> ae4f8ae16a07896403c90305d4b9be27f657c1fc is the problematic commit

commit ae4f8ae16a07896403c90305d4b9be27f657c1fc
Author: Adrian Hunter <adrian.hunter@intel.com>
Date:   Tue May 24 10:54:31 2022 +0300

    libperf evlist: Allow mixing per-thread and per-cpu mmaps
    
    mmap_per_evsel() will skip events that do not match the CPU, so all CPUs
    can be iterated in any case.
    
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

> 
> I confirmed this issue happens on v5.18.16 and v5.19.4/v5.19.5 Arch kernels (I
> even tried a v6.0-rc2 mainline kernel and the same error occurs so it should
> occur on v6.0-rc3 as well)
> 
> I even tried updating to perf v6.0-rc3 and the same error occurs
> 
> To reproduce this bug, you should compile and install perf with at least the
> v5.19 kernel tag and run `perf top -p` as an user with the PID of some
> multi-threaded process (web browsers are a good option)
> 
> You should get the "Failed to mmap with 22 (Invalid argument)" message on your
> screen
> 
> This issue doesn't happen without the "-p" argument though
> 
> A few users reported this issue on both Intel and AMD CPUs (so it's not some
> CPU vendor-only bug)
> 
> `perf record` works fine for the problematic processes

In fact, it does *not* work for multi-threaded targets with:

	perf record --per-thread -p

> 
> I just found out that "perf top -p" only breaks for multi-threaded processes
> (single-threaded ones are fine) and no, specifying a thread ID doesn't have any
> effect

I will see how best to fix it.
Comment 2 Arnaldo Carvalho de Melo 2022-09-02 19:19:56 UTC
If I do:

⬢[acme@toolbox perf-urgent]$ git log -2
commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Sep 2 16:15:39 2022 -0300

    Revert "libperf evlist: Check nr_mmaps is correct"

    This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Sep 2 16:13:41 2022 -0300

    Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"

    This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
⬢[acme@toolbox perf-urgent]$

it works again. Can the reporter please try this?
Comment 3 Echo J. 2022-09-02 23:24:04 UTC
(In reply to Arnaldo Carvalho de Melo from comment #2)
> If I do:
> 
> ... 
> 
> It works again, can the reporter please try this?

The issue is gone after reverting those 2 commits, as expected (the former one is only needed to fix a build error, though).
Comment 4 Adrian Hunter 2022-09-03 09:50:15 UTC
Created attachment 301736 [details]
Proposed fix

This is the fix I have so far.
Comment 5 Echo J. 2022-09-03 10:04:05 UTC
(In reply to Adrian Hunter from comment #4)
> Created attachment 301736 [details]
> Proposed fix
> 
> This is the fix I have so far.

I just applied your fix from LKML and it seems to work.

The LKML version had a missing commit hash, though, so I had to generate one from some random data.
Comment 6 Sahan Fernando 2022-10-12 08:51:30 UTC
(In reply to Adrian Hunter from comment #4)
> Created attachment 301736 [details]
> Proposed fix
> 
> This is the fix I have so far.

I can also confirm that your patch works.
Comment 7 Sami Farin 2024-02-23 11:46:32 UTC
perf from kernel 6.1.75 works, with 6.6.18 I get:
"The cycles:P event is not supported."

With this patch:

--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
 
        evlist__for_each_entry(evlist, counter) {
 try_again:
-               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
-                                    top->evlist->core.threads) < 0) {
+               if (evsel__open(counter, counter->core.cpus,
+                       counter->core.threads) < 0) {
 
"perf top" returns 
  Available samples
  10K cycles:P
  28K cycles:P
and I can select one and continue...


processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 183
model name	: 13th Gen Intel(R) Core(TM) i5-13600K
stepping	: 1
microcode	: 0x11d
Comment 8 Adrian Hunter 2024-03-14 06:21:34 UTC
On 23/02/24 13:46, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216441
> 
> --- Comment #7 from Sami Farin (hvtaifwkbgefbaei@gmail.com) ---
> perf from kernel 6.1.75 works, with 6.6.18 I get:
> "The cycles:P event is not supported."
> 
> With this patch:
> 
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
> 
>         evlist__for_each_entry(evlist, counter) {
>  try_again:
> -               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
> -                                    top->evlist->core.threads) < 0) {
> +               if (evsel__open(counter, counter->core.cpus,
> +                       counter->core.threads) < 0) {
> 
> "perf top" returns 
>   Available samples
>   10K cycles:P
>   28K cycles:P
> and I can select one and continue...
> 
> 
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 183
> model name      : 13th Gen Intel(R) Core(TM) i5-13600K
> stepping        : 1
> microcode       : 0x11d
> 

That is a different issue. It is a hybrid machine. The fix
is below, but I don't know why it wasn't cc'ed to stable.

commit 5fa695e7da4975e8d21ce49f3718d6cf00ecb75e
Author: Kan Liang <kan.liang@linux.intel.com>
Date:   Thu Dec 14 06:46:11 2023 -0800

    perf top: Use evsel's cpus to replace user_requested_cpus
    
    perf top errors out on a hybrid machine
     $perf top
    
     Error:
     The cycles:P event is not supported.
    
    The perf top expects that the "cycles" is collected on all CPUs in the
    system. But for hybrid there is no single "cycles" event which can cover
    all CPUs. Perf has to split it into two cycles events, e.g.,
    cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
    If a event is opened on the unsupported CPU. The open fails. That's the
    reason of the above error out.
    
    Perf should only open the cycles event on the corresponding CPU. The
    commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
    core PMU maps") intersect the requested CPU map with the CPU map of the
    PMU. Use the evsel's cpus to replace user_requested_cpus.
    
    The evlist's threads are also propagated to the evsel's threads in
    __perf_evlist__propagate_maps(). For a system-wide event, perf appends
    a dummy event and assign it to the evsel's threads. For a per-thread
    event, the evlist's thread_map is assigned to the evsel's threads. The
    same as the other tools, e.g., perf record, using the evsel's threads
    when opening an event.
    
    Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
    Reviewed-by: Ian Rogers <irogers@google.com>
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hector Martin <marcan@marcan.st>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/
    Link: https://lore.kernel.org/r/20231214144612.1092028-1-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ed83afeeced0..13e609c0c693 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1026,8 +1026,8 @@ static int perf_top__start_counters(struct perf_top *top)
 
 	evlist__for_each_entry(evlist, counter) {
 try_again:
-		if (evsel__open(counter, top->evlist->core.user_requested_cpus,
-				     top->evlist->core.threads) < 0) {
+		if (evsel__open(counter, counter->core.cpus,
+				counter->core.threads) < 0) {
 
 			/*
 			 * Specially handle overwrite fall back.
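
The CPU-mask reasoning in the commit message above can be illustrated with a small model. This is only a sketch and not perf's code: CPU sets are simplified to 64-bit bitmasks instead of perf's `struct perf_cpu_map`, the names `cpu_mask_t`, `cpu_range`, `evsel_cpus` and `open_would_succeed` are hypothetical, and the CPU numbering in the usage note is an assumed layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per logical CPU; a simplification of perf's CPU maps. */
typedef uint64_t cpu_mask_t;

/* Mask with bits first..last (inclusive) set; needs first <= last <= 63. */
static cpu_mask_t cpu_range(unsigned int first, unsigned int last)
{
	cpu_mask_t upto_last = (last == 63) ? ~0ULL : ((1ULL << (last + 1)) - 1);
	cpu_mask_t below_first = (1ULL << first) - 1;

	return upto_last & ~below_first;
}

/* CPUs an event should be opened on: requested CPUs that its PMU covers. */
static cpu_mask_t evsel_cpus(cpu_mask_t requested, cpu_mask_t pmu_cpus)
{
	return requested & pmu_cpus;
}

/* In this model, opening fails if any CPU in open_on is outside the PMU. */
static bool open_would_succeed(cpu_mask_t open_on, cpu_mask_t pmu_cpus)
{
	return (open_on & ~pmu_cpus) == 0;
}
```

With an assumed layout of CPUs 0-11 for cpu_core and 12-19 for cpu_atom, `open_would_succeed(cpu_range(0, 19), cpu_range(0, 11))` is false, which corresponds to opening with user_requested_cpus, while `open_would_succeed(evsel_cpus(cpu_range(0, 19), cpu_range(0, 11)), cpu_range(0, 11))` is true, which corresponds to opening with the evsel's own cpus.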
Comment 9 kan.liang 2024-03-14 12:21:34 UTC
On 2024-03-14 2:21 a.m., Adrian Hunter wrote:
> That is a different issue. It is a hybrid machine. The fix
> is below, but I don't know why it wasn't cc'ed to stable.

Yes, the stable Cc was missed in the original fix.
I recently resent it to the stable list, but I'm not sure when it will be
backported.

https://lore.kernel.org/stable/20240308151239.2414774-1-kan.liang@linux.intel.com/

Thanks,
Kan