Hello,

I was trying to figure out why "perf top -p" wasn't working on my Arch Linux system (it returned the error "Failed to mmap with 22 (Invalid argument)"), and I eventually found out that downgrading to perf v5.18 made "perf top -p" work again.

Later I did some bisecting and found that ae4f8ae16a07896403c90305d4b9be27f657c1fc is the problematic commit.

I confirmed this issue happens on the v5.18.16 and v5.19.4/v5.19.5 Arch kernels (I even tried a v6.0-rc2 mainline kernel and the same error occurs, so it should occur on v6.0-rc3 as well).

I even tried updating to perf v6.0-rc3 and the same error occurs.

To reproduce this bug, compile and install perf from at least the v5.19 kernel tag and run `perf top -p` as a user with the PID of some multi-threaded process (web browsers are a good option).

You should get the "Failed to mmap with 22 (Invalid argument)" message on your screen.

This issue doesn't happen without the "-p" argument, though.

A few users reported this issue on both Intel and AMD CPUs (so it's not a vendor-specific bug).

`perf record` works fine for the problematic processes.

I just found out that "perf top -p" only breaks for multi-threaded processes (single-threaded ones are fine) and no, specifying a thread ID doesn't have any effect.

As for my system, I currently use a Ryzen 5 4600H CPU on Arch Linux with kernel version 5.19.5.arch1.
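For a more controlled target than a web browser, a small multi-threaded program can stand in for the "multi-threaded process" in the reproduction steps. This is only a minimal sketch, assuming POSIX threads are available; `busy_worker` and `run_busy_threads` are hypothetical names made up for illustration and are not part of perf.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Hypothetical worker: burn CPU until the shared deadline passes. */
static void *busy_worker(void *arg)
{
    assert(arg != NULL);
    time_t deadline = *(const time_t *)arg;
    volatile unsigned long counter = 0;

    while (time(NULL) < deadline)
        counter++;
    return NULL;
}

/*
 * Hypothetical helper: start up to MAX_THREADS busy threads that run for
 * roughly `seconds` seconds, print the PID to profile, and wait for them.
 * Returns the number of threads that were created and joined.
 */
int run_busy_threads(int nthreads, int seconds)
{
    pthread_t tids[MAX_THREADS];
    time_t deadline = time(NULL) + seconds;
    int started = 0, joined = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    printf("target pid: %d\n", (int)getpid());
    fflush(stdout);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, busy_worker, &deadline) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```

Calling `run_busy_threads(4, 60)` from `main()` (built with `-pthread`) keeps four threads busy for about a minute; the printed PID can then be passed to `perf top -p` from another terminal.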
On 2/09/22 12:03, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216441
>
>             Bug ID: 216441
>            Summary: [BISECTED] perf v5.19+ breaks `perf top -p` for
>                     multi-threaded processes
>            Product: Tracing/Profiling
>            Version: unspecified
>     Kernel Version: 6.0-rc3
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Perf tool
>           Assignee: acme@kernel.org
>           Reporter: aidas957@gmail.com
>                 CC: adrian.hunter@intel.com, jolsa@kernel.org
>         Regression: Yes
>
> Hello,
>
> I was trying to figure out why "perf top -p" wasn't working on my Arch Linux
> system (returning the error "Failed to mmap with 22 (Invalid argument)") and I
> eventually found out that downgrading to perf v5.18 made "perf top -p" work
> again
>
> Later I did some bisecting and found out
> ae4f8ae16a07896403c90305d4b9be27f657c1fc is the problematic commit

commit ae4f8ae16a07896403c90305d4b9be27f657c1fc
Author: Adrian Hunter <adrian.hunter@intel.com>
Date:   Tue May 24 10:54:31 2022 +0300

    libperf evlist: Allow mixing per-thread and per-cpu mmaps

    mmap_per_evsel() will skip events that do not match the CPU, so all CPUs
    can be iterated in any case.

    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

>
> I confirmed this issue happens on v5.18.16 and v5.19.4/v5.19.5 Arch kernels (I
> even tried a v6.0-rc2 mainline kernel and the same error occurs so it should
> occur on v6.0-rc3 as well)
>
> I even tried updating to perf v6.0-rc3 and the same error occurs
>
> To reproduce this bug, you should compile and install perf with at least the
> v5.19 kernel tag and run `perf top -p` as an user with the PID of some
> multi-threaded process (web browsers are a good option)
>
> You should get the "Failed to mmap with 22 (Invalid argument)" message on your
> screen
>
> This issue doesn't happen without the "-p" argument though
>
> A few users reported this issue on both Intel and AMD CPUs (so it's not some
> CPU vendor-only bug)
>
> `perf record` works fine for the problematic processes

In fact *not* for multi-threaded targets with:

    perf record --per-thread -p

>
> I just found out that "perf top -p" only breaks for multi-threaded processes
> (single-threaded ones are fine) and no, specifying a thread ID doesn't have any
> effect

I will see how best to fix it.
If I do:

⬢[acme@toolbox perf-urgent]$ git log -2
commit dfeb0bc60782471c293938e71b1a1117cfac2cb3 (HEAD -> perf/urgent)
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Sep 2 16:15:39 2022 -0300

    Revert "libperf evlist: Check nr_mmaps is correct"

    This reverts commit 4ce47d842d4c16c07b135b8a7975b8f0672bcc0e.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

commit 78cd283f6b8ab701cb35eafd5af8140560a88f16
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Sep 2 16:13:41 2022 -0300

    Revert "libperf evlist: Allow mixing per-thread and per-cpu mmaps"

    This reverts commit ae4f8ae16a07896403c90305d4b9be27f657c1fc.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
⬢[acme@toolbox perf-urgent]$

It works again. Can the reporter please try this?
(In reply to Arnaldo Carvalho de Melo from comment #2)
> If I do:
>
> ...
>
> It works again, can the reporter please try this?

The issue is gone after reverting those 2 commits, as expected (the former one is only for fixing a build error, though).
Created attachment 301736 [details]
Proposed fix

This is the fix I have so far.
(In reply to Adrian Hunter from comment #4)
> Created attachment 301736 [details]
> Proposed fix
>
> This is the fix I have so far.

I just applied your fix from the LKML and it seems to work.

The LKML version had a missing commit hash, though, so I had to generate one from some random data.
(In reply to Adrian Hunter from comment #4)
> Created attachment 301736 [details]
> Proposed fix
>
> This is the fix I have so far.

I can also confirm that your patch works.
perf from kernel 6.1.75 works, with 6.6.18 I get:
"The cycles:P event is not supported."

With this patch:

--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
 
        evlist__for_each_entry(evlist, counter) {
 try_again:
-               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
-                                    top->evlist->core.threads) < 0) {
+               if (evsel__open(counter, counter->core.cpus,
+                                    counter->core.threads) < 0) {

"perf top" returns
Available samples
10K cycles:P
28K cycles:P
and I can select one and continue...

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 183
model name      : 13th Gen Intel(R) Core(TM) i5-13600K
stepping        : 1
microcode       : 0x11d
On 23/02/24 13:46, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216441
>
> Sami Farin (hvtaifwkbgefbaei@gmail.com) changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |hvtaifwkbgefbaei@gmail.com
>
> --- Comment #7 from Sami Farin (hvtaifwkbgefbaei@gmail.com) ---
> perf from kernel 6.1.75 works, with 6.6.18 I get:
> "The cycles:P event is not supported."
>
> With this patch:
>
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
>
>         evlist__for_each_entry(evlist, counter) {
>  try_again:
> -               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
> -                                    top->evlist->core.threads) < 0) {
> +               if (evsel__open(counter, counter->core.cpus,
> +                                    counter->core.threads) < 0) {
>
> "perf top" returns
> Available samples
> 10K cycles:P
> 28K cycles:P
> and I can select one and continue...
>
>
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 183
> model name      : 13th Gen Intel(R) Core(TM) i5-13600K
> stepping        : 1
> microcode       : 0x11d
>

That is a different issue. It is a hybrid machine. The fix
is below, but I don't know why it wasn't cc'ed to stable.

commit 5fa695e7da4975e8d21ce49f3718d6cf00ecb75e
Author: Kan Liang <kan.liang@linux.intel.com>
Date:   Thu Dec 14 06:46:11 2023 -0800

    perf top: Use evsel's cpus to replace user_requested_cpus

    perf top errors out on a hybrid machine
     $perf top

     Error:
     The cycles:P event is not supported.

    The perf top expects that the "cycles" is collected on all CPUs in the
    system. But for hybrid there is no single "cycles" event which can cover
    all CPUs. Perf has to split it into two cycles events, e.g.,
    cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
    If a event is opened on the unsupported CPU. The open fails. That's the
    reason of the above error out.

    Perf should only open the cycles event on the corresponding CPU. The
    commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
    core PMU maps") intersect the requested CPU map with the CPU map of the
    PMU. Use the evsel's cpus to replace user_requested_cpus.

    The evlist's threads are also propagated to the evsel's threads in
    __perf_evlist__propagate_maps(). For a system-wide event, perf appends
    a dummy event and assign it to the evsel's threads. For a per-thread
    event, the evlist's thread_map is assigned to the evsel's threads. The
    same as the other tools, e.g., perf record, using the evsel's threads
    when opening an event.

    Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
    Reviewed-by: Ian Rogers <irogers@google.com>
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Hector Martin <marcan@marcan.st>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/
    Link: https://lore.kernel.org/r/20231214144612.1092028-1-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ed83afeeced0..13e609c0c693 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1026,8 +1026,8 @@ static int perf_top__start_counters(struct perf_top *top)
 
        evlist__for_each_entry(evlist, counter) {
 try_again:
-               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
-                                    top->evlist->core.threads) < 0) {
+               if (evsel__open(counter, counter->core.cpus,
+                                    counter->core.threads) < 0) {
 
                        /*
                         * Specially handle overwrite fall back.
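The CPU-mask reasoning in the commit message above can be illustrated with plain bitmasks. This is only a rough sketch, not perf code: perf uses `struct perf_cpu_map`, whereas `cpu_mask_t`, `cpu_range`, `event_cpus` and `unsupported_opens` are hypothetical names invented here, and the CPU numbering in the usage note is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-CPU bitmask; perf itself uses struct perf_cpu_map. */
typedef uint64_t cpu_mask_t;

/* Bit for a single CPU number. */
static cpu_mask_t cpu_bit(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    return (cpu_mask_t)1 << cpu;
}

/* Mask covering CPUs first..last, inclusive. */
cpu_mask_t cpu_range(int first, int last)
{
    cpu_mask_t mask = 0;

    for (int cpu = first; cpu <= last; cpu++)
        mask |= cpu_bit(cpu);
    return mask;
}

/* CPUs an event should be opened on: requested CPUs that its PMU covers. */
cpu_mask_t event_cpus(cpu_mask_t requested, cpu_mask_t pmu)
{
    return requested & pmu;
}

/* Number of CPUs in open_on that the PMU does not cover (each open would fail). */
int unsupported_opens(cpu_mask_t open_on, cpu_mask_t pmu)
{
    cpu_mask_t bad = open_on & ~pmu;
    int count = 0;

    while (bad) {
        bad &= bad - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

Assuming, purely for illustration, that CPUs 0-11 belong to cpu_core and CPUs 12-19 to cpu_atom, opening the cpu_atom event on every requested CPU gives `unsupported_opens(all, atom) == 12`, while opening it on `event_cpus(all, atom)` gives 0. That is the difference between passing the evlist's user_requested_cpus and the evsel's own cpus.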
On 2024-03-14 2:21 a.m., Adrian Hunter wrote:
> On 23/02/24 13:46, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=216441
>>
>> Sami Farin (hvtaifwkbgefbaei@gmail.com) changed:
>>
>>            What    |Removed                     |Added
>> ----------------------------------------------------------------------------
>>                  CC|                            |hvtaifwkbgefbaei@gmail.com
>>
>> --- Comment #7 from Sami Farin (hvtaifwkbgefbaei@gmail.com) ---
>> perf from kernel 6.1.75 works, with 6.6.18 I get:
>> "The cycles:P event is not supported."
>>
>> With this patch:
>>
>> --- a/tools/perf/builtin-top.c
>> +++ b/tools/perf/builtin-top.c
>> @@ -1027,8 +1027,8 @@ static int perf_top__start_counters(struct perf_top *top)
>>
>>         evlist__for_each_entry(evlist, counter) {
>>  try_again:
>> -               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
>> -                                    top->evlist->core.threads) < 0) {
>> +               if (evsel__open(counter, counter->core.cpus,
>> +                                    counter->core.threads) < 0) {
>>
>> "perf top" returns
>> Available samples
>> 10K cycles:P
>> 28K cycles:P
>> and I can select one and continue...
>>
>>
>> processor       : 0
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 183
>> model name      : 13th Gen Intel(R) Core(TM) i5-13600K
>> stepping        : 1
>> microcode       : 0x11d
>>
>
> That is a different issue. It is a hybrid machine. The fix
> is below, but I don't know why it wasn't cc'ed to stable.

Yes, the stable tag was missed in the original fix. I resent it to stable
recently, but I'm not sure when it will be backported.

https://lore.kernel.org/stable/20240308151239.2414774-1-kan.liang@linux.intel.com/

Thanks,
Kan

>
> commit 5fa695e7da4975e8d21ce49f3718d6cf00ecb75e
> Author: Kan Liang <kan.liang@linux.intel.com>
> Date:   Thu Dec 14 06:46:11 2023 -0800
>
>     perf top: Use evsel's cpus to replace user_requested_cpus
>
>     perf top errors out on a hybrid machine
>      $perf top
>
>      Error:
>      The cycles:P event is not supported.
>
>     The perf top expects that the "cycles" is collected on all CPUs in the
>     system. But for hybrid there is no single "cycles" event which can cover
>     all CPUs. Perf has to split it into two cycles events, e.g.,
>     cpu_core/cycles/ and cpu_atom/cycles/. Each event has its own CPU mask.
>     If a event is opened on the unsupported CPU. The open fails. That's the
>     reason of the above error out.
>
>     Perf should only open the cycles event on the corresponding CPU. The
>     commit ef91871c960e ("perf evlist: Propagate user CPU maps intersecting
>     core PMU maps") intersect the requested CPU map with the CPU map of the
>     PMU. Use the evsel's cpus to replace user_requested_cpus.
>
>     The evlist's threads are also propagated to the evsel's threads in
>     __perf_evlist__propagate_maps(). For a system-wide event, perf appends
>     a dummy event and assign it to the evsel's threads. For a per-thread
>     event, the evlist's thread_map is assigned to the evsel's threads. The
>     same as the other tools, e.g., perf record, using the evsel's threads
>     when opening an event.
>
>     Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
>     Reviewed-by: Ian Rogers <irogers@google.com>
>     Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>     Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>     Cc: Hector Martin <marcan@marcan.st>
>     Cc: Marc Zyngier <maz@kernel.org>
>     Cc: Mark Rutland <mark.rutland@arm.com>
>     Cc: Namhyung Kim <namhyung@kernel.org>
>     Closes: https://lore.kernel.org/linux-perf-users/ZXNnDrGKXbEELMXV@kernel.org/
>     Link: https://lore.kernel.org/r/20231214144612.1092028-1-kan.liang@linux.intel.com
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index ed83afeeced0..13e609c0c693 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1026,8 +1026,8 @@ static int perf_top__start_counters(struct perf_top *top)
>
>         evlist__for_each_entry(evlist, counter) {
>  try_again:
> -               if (evsel__open(counter, top->evlist->core.user_requested_cpus,
> -                                    top->evlist->core.threads) < 0) {
> +               if (evsel__open(counter, counter->core.cpus,
> +                                    counter->core.threads) < 0) {
>
>                         /*
>                          * Specially handle overwrite fall back.
>