Bug 206817 - trace-cmd record/start will only contain cores from the affinity mask
Summary: trace-cmd record/start will only contain cores from the affinity mask
Status: CLOSED INVALID
Alias: None
Product: Tools
Classification: Unclassified
Component: Trace-cmd/Kernelshark (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Default virtual assignee for Trace-cmd and kernelshark
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-11 13:18 UTC by nolange79
Modified: 2020-03-12 22:16 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.19.98
Subsystem:
Regression: No
Bisected commit-id:


Attachments
musl built executable (369.12 KB, application/octet-stream)
2020-03-12 19:47 UTC, nolange79
Details

Description nolange79 2020-03-11 13:18:49 UTC
Hello,

I started with a 4-core Atom System, where I isolated all cores except #0 with isolcpus=1-3.
trace-cmd appears to only record traces from CPUs that are contained in the affinity mask, so I will only get CPU0 undless I' change affinity with taskset.

trace-cmd has an option -M that should allow specifying the CPU set, but this does not seem to do a thing for me (tried with -M -1, -M 0-3, -M 7).
Comment 1 Steven Rostedt 2020-03-11 13:40:39 UTC
Hi! Where trace-cmd runs should not affect which CPUs it records. In fact, I depend on that (I sometimes need to record isolated tasks without trace-cmd on that CPU too).

What are you running on the isolated CPUs? If the CPUs are idle and isolated, they many not produce any events, which means trace-cmd wont see anything.

What do you get if you do:

 # trace-cmd record -e sched -- taskset -c 1 ls

Which places the ls command on CPU 1 and runs it (while trace-cmd is recording normally).

Then do a: trace-cmd report --cpu 1

I just tried it on my machine with isocpus=1 and it shows the ls trace events on the isolated CPU.
Comment 2 nolange79 2020-03-11 14:41:17 UTC
I run alot on all cores, if I set affinity to all cpus, I get 100s of MB per second with -e all.

# taskset  -p -c 0-3 $$
# trace-cmd record  -e sunrpc:svc_recv 

trace-cmd record  -e sunrpc:svc_recv 

Hit Ctrl^C to stop recording
CPU0 data recorded at offset=0x34d000
    0 bytes in size
CPU1 data recorded at offset=0x34d000
    0 bytes in size
CPU2 data recorded at offset=0x34d000
    0 bytes in size
CPU3 data recorded at offset=0x34d000
    0 bytes in size

# taskset  -p -c 0 $$
# trace-cmd record  -e sunrpc:svc_recv 

Hit Ctrl^C to stop recording
CPU0 data recorded at offset=0x34d000
    0 bytes in size

I should note that I am using v 2.9.dev,
which appears to have no --cpu option.

I am not starting a process like you do, but just trying to trace.
Comment 3 Steven Rostedt 2020-03-11 15:25:32 UTC
The "--cpu" is for trace-cmd report, and has existed for a long time (but not documented, so you can't use bash completion with it :-p). I should change that. It was originally used for debugging, but I think it can now be considered a full feature.

But, just so I understand, are you only seeing the "0 bytes in size" for the CPU that you pin trace-cmd to?  That is, it only shows the dump for one CPU? I'm not seeing that for me.

Also, can you show the full version line of 'trace-cmd -v'

Thanks!
Comment 4 Steven Rostedt 2020-03-12 14:26:20 UTC
Just to clarify. The "-c" option of trace-cmd record does not set the affinity of the recorder. It sets the masks inside the kernel to tell the kernel what CPUs to trace. Thus if you say "-c 0" it will only trace CPU 0 inside the kernel. It does not affect what CPU trace-cmd runs on.

Is that what you expected?
Comment 5 nolange79 2020-03-12 17:49:38 UTC
trace-cmd version 2.9.dev (not-a-git-repo)

Its built from sources trace-cmd-kernelshark-v1.1.tar.gz with musl as C library, *which seems to be the issue*. (I built several tools static, so I can just drop them in anywhere)
Building with glibc does not have the issue.

BTW, I had to add these two patches

diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index 3b187e3..f394f6f 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -11,6 +11,7 @@
 #include <regex.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <limits.h>
 #include <errno.h>
 
 #include <linux/time64.h>


diff -burN trace-cmd-kernelshark-v1.1.org/tracecmd/trace-mem.c trace-cmd-kernelshark-v1.1/tracecmd/trace-mem.c
--- trace-cmd-kernelshark-v1.1.org/tracecmd/trace-mem.c	2019-10-08 16:28:21.000000000 +0200
+++ trace-cmd-kernelshark-v1.1/tracecmd/trace-mem.c	2020-01-21 00:08:08.518044701 +0100
@@ -30,7 +30,7 @@
 static int kmem_cache_alloc_node_type;
 static int kmem_cache_free_type;
 
-struct tep_format_field *common_type_field;
+extern struct tep_format_field *common_type_field;
 
 struct tep_format_field *kmalloc_callsite_field;
 struct tep_format_field *kmalloc_bytes_req_field;
Comment 6 nolange79 2020-03-12 17:50:40 UTC
"But, just so I understand, are you only seeing the "0 bytes in size" for the CPU that you pin trace-cmd to?  That is, it only shows the dump for one CPU? I'm not seeing that for me."

Yes, that's the issue, it does only trace the CPUs from the affinity mask.
Comment 7 Steven Rostedt 2020-03-12 18:25:15 UTC
(In reply to nolange79 from comment #5)
> trace-cmd version 2.9.dev (not-a-git-repo)
> 
> Its built from sources trace-cmd-kernelshark-v1.1.tar.gz with musl as C
> library, *which seems to be the issue*. (I built several tools static, so I
> can just drop them in anywhere)
> Building with glibc does not have the issue.
> 
> BTW, I had to add these two patches
>

Can you create another BZ with this issue in it?

Thanks!
Comment 8 Steven Rostedt 2020-03-12 18:35:35 UTC
(In reply to nolange79 from comment #6)
> "But, just so I understand, are you only seeing the "0 bytes in size" for
> the CPU that you pin trace-cmd to?  That is, it only shows the dump for one
> CPU? I'm not seeing that for me."
> 
> Yes, that's the issue, it does only trace the CPUs from the affinity mask.

What kernel is this? The affinity of trace-cmd should not affect what it records. When I do the same command:

# trace-cmd record -e rpm:rpm_idle
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x623000
    0 bytes in size
CPU1 data recorded at offset=0x623000
    0 bytes in size
CPU2 data recorded at offset=0x623000
    0 bytes in size
CPU3 data recorded at offset=0x623000
    0 bytes in size
CPU4 data recorded at offset=0x623000
    0 bytes in size
CPU5 data recorded at offset=0x623000
    0 bytes in size
CPU6 data recorded at offset=0x623000
    0 bytes in size
CPU7 data recorded at offset=0x623000
    0 bytes in size

# taskset -c 0 -p $$
pid 1603's current affinity list: 0-2,4-7
pid 1603's new affinity list: 0
# trace-cmd record -e rpm:rpm_idle
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x623000
    0 bytes in size
CPU1 data recorded at offset=0x623000
    0 bytes in size
CPU2 data recorded at offset=0x623000
    0 bytes in size
CPU3 data recorded at offset=0x623000
    0 bytes in size
CPU4 data recorded at offset=0x623000
    0 bytes in size
CPU5 data recorded at offset=0x623000
    0 bytes in size
CPU6 data recorded at offset=0x623000
    0 bytes in size
CPU7 data recorded at offset=0x623000
    0 bytes in size

[ I had cpu 3 as an iso cpu ]
Comment 9 nolange79 2020-03-12 19:47:09 UTC
Debian kernel 5.4.0-4-amd64 #1 SMP Debian 5.4.19-1 at home,
custom 4.19.98 at work.

As said, this issue only shows up if I use musl as C-library, it doesnt show up if I use glibc.
I'll add the executable
Comment 10 nolange79 2020-03-12 19:47:46 UTC
Created attachment 287897 [details]
musl built executable
Comment 11 Steven Rostedt 2020-03-12 22:16:26 UTC
I compiled trace-cmd against musl, and found it's a bug in the musl libc implementation of sysconf(). Looking at the source we have:

        case JT_NPROCESSORS_CONF & 255:
        case JT_NPROCESSORS_ONLN & 255: ;
                unsigned char set[128] = {1};
                int i, cnt;
                __syscall(SYS_sched_getaffinity, 0, sizeof set, set);
                for (i=cnt=0; i<sizeof set; i++)
                        for (; set[i]; set[i]&=set[i]-1, cnt++);
                return cnt;

Which appears to do the same thing for both CONF and ONLN, and both incorrectly. It takes the affinity of the running task and returns that! trace-cmd uses this to get the number of processors running on the machine, as according to the man page we have:

       - _SC_NPROCESSORS_CONF
              The number of processors configured.  See also get_nprocs_conf(3).

        - _SC_NPROCESSORS_ONLN
              The number of processors currently online (available).  See also get_nprocs_conf(3).


Which isn't what the sysconf() function of musl does.

I'm closing this as its a bug in musl, not trace-cmd. But if you want to send patches to add a work around, feel free to do so. Please send patches to linux-trace-devel@vger.kernel.org.

Note, the tracecmd_count_cpus() function that calls sysconf() has a backup incase sysconf doesn't work. But it only does the backup if sysconf() returns a negative number. Which if musl did, it would have worked fine. In fact, by commenting out the call to sysconf() and force it to use its backup, it works normally.

Note You need to log in before you can comment on or make changes to this bug.