Bug 190711 - Process accounting: Using the NETLINK inface, the command TASKSTATS_CMD_GET returns -EINVAL
Summary: Process accounting: Using the NETLINK inface, the command TASKSTATS_CMD_GET r...
Status: NEW
Alias: None
Product: Process Management
Classification: Unclassified
Component: Other (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: process_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-19 10:54 UTC by Martin Steigerwald
Modified: 2016-12-27 10:37 UTC (History)
3 users (show)

See Also:
Kernel Version: 4.8.14
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Martin Steigerwald 2016-12-19 10:54:22 UTC
In Bug 190271 - process accounting sometimes does not work I mentioned two issues that cause process accounting in process and system monitor atop to fail¹. After Gerlof, the developer of atop, did some more research to point to it being indeed an upstream kernel issue, I report the second issue as well:

[1] http://atoptool.nl/


2) When using the NETLINK inface, the command TASKSTATS_CMD_GET 
consequently returns -EINVAL.

The code that is used by the atopacctd daemon is based on the demo code 
'getdelays.c' that can be found in the kernel source code tree
(..../linux/Documentation/accounting/getdelays.c). Also this 'getdelays' 
program does not work any more (also -EINVAL on the same call)
with the newer kernels. I really spent a lot of time on this issue to 
get the code running (there are many places in the kernel code where
-EINVAL for this call can be given), but I did not succeed. It is really 
an incompatibility introduced by the kernel code.
It would be nice if the kernel maintainers provide a working version of 
the getdelays program in the kernel source tree.

I only experience this problem on Debian8 with a 4.8 kernel (virtual 
machine with 4 cores).
On CentOS7 with a 4.8 kernel it works fine (physical machine with 4 cores).

I will anyhow adapt atopacctd for this issue that it detects and logs 
the -EINVAL and terminates.
The current version of atopacctd keeps running which is not useful at all.


I copy and paste his detailed analysis:

Since there are many reasons for the error EINVAL in this piece kernel 
code involved, I added many printk's to find out where the error is 
generated exactly.

My analysis:

1) The NETLINK command mentioned above is meant to specify the range of 
CPUs for which taskstats has to be activated. In the system call, this 
is specified via a string which is "0-3" for all four CPUs in my system. 
In the kernel function 'cmd_attr_register_cpumask' (source file 
kernel/taskstats.c), this string is converted into a bitmap (via a call 
to function 'parse'). In this case, the bitmap contains 
0x000000000000000f. This bitmap is one unsigned long because it provides 
enough bits for four CPUs. However, if needed a bitmap might consist of 
more unsigned longs.

2) After creating the bitmap,the function 'cmd_attr_register_cpumask' 
calls function 'add_del_listener' to register the bitmap.
The first call issued by this function is:

         if (!cpumask_subset(mask, cpu_possible_mask))
                 return -EINVAL;

and this is the call that causes error EINVAL.


3) The function 'cpumask_subset' (inline function in source file 
include/linux/cpumask.h) contains:

static inline int cpumask_subset(const struct cpumask *src1p,
                                  const struct cpumask *src2p)
{
         return bitmap_subset(cpumask_bits(src1p), cpumask_bits(src2p), nr_cpumask_bits);
}

4) The function 'bitmap_subset' (inline function in source file 
include/linux/bitmap.h) verifies if the bits that are set in the first 
bitmap are a subset of the bits set in the second bitmap. That second 
bitmap is in this case the variable 'cpu_possible_mask'. This is a 
bitmap that contains a bit *for every possible CPU* that might be 
present. The number of bits in this bitmap is defined by the 
configuration parameter CONFIG_NR_CPUS=512 (see .config) which contains 
512 on my system (default).

So for this specific call to 'bitmap_subset', the first pointer refers 
to the bitmask of one unsigned long describing the four CPUs that I 
specified in my NETLINK call while the second pointer refers to 
'cpu_possible_mask' consisting of 8 unsigned longs for 512 CPUs. The 
third argument specifies the number of bits to be checked which is 
'nr_cpumask_bits' that contains 512! IMHO, this is not correct. It means 
that the bits in the two bitmaps are compared over a length of 8 
unsigned longs, while the first bitmap only contains one unsigned long.

5) I verified the conclusion of point 4) by the printk's in front of the 
call mentioned for point 2):


static int add_del_listener(pid_t pid, const struct cpumask *mask, int 
isadd)
{
         struct listener_list *listeners;
         struct listener *s, *tmp, *s2;
         unsigned int cpu, i;
         int ret = 0;

         printk(KERN_INFO "GELA: NR_CPUS=%d\n", NR_CPUS);
         printk(KERN_INFO "GELA: nr_cpumask_bits=%d", nr_cpumask_bits);
         printk(KERN_INFO "GELA: nr_cpu_ids=%d", nr_cpu_ids);

         for (i=0; i < BITS_TO_LONGS(nr_cpumask_bits); i++) {
                 printk(KERN_INFO "GELA: %016lx %016lx",
                         *(cpumask_bits(mask)+i),
                         *(cpumask_bits(cpu_possible_mask)+i) );
         }

         if (!cpumask_subset(mask, cpu_possible_mask))
                 return -EINVAL;



When I run my test program, in dmesg I find the following messages:

[   36.016248] GELA: NR_CPUS=512
[   36.016249] GELA: nr_cpumask_bits=512
[   36.016250] GELA: nr_cpu_ids=4
[   36.016252] GELA: 000000000000000f 000000000000000f
[   36.016253] GELA: ffff88005a3cae14 0000000000000000    <-- from here 
bits in the first bitmap are not part of that bitmap!
[   36.016254] GELA: ffff88005fdd7d40 0000000000000000
[   36.016255] GELA: ffffffff81612630 0000000000000000
[   36.016256] GELA: 000000000000000c 0000000000000000
[   36.016257] GELA: ffffffff8131c193 0000000000000000
[   36.016258] GELA: ffffffff816125c0 0000000000000000
[   36.016259] GELA: ffffffff818e57c0 0000000000000000

This output also shows the variable 'nr_cpu_ids' that defines how many 
CPUs are currently present in the system. When I use this variable as 
third parameter in the call to 'bitmap_subset' (see code in point 3), 
the problem does not occur any more because only 4 bits will be check in 
both bitmaps.

6) With the original code, I modified the configuration parameter 
CONFIG_NR_CPUS to 8 instead of 512 to see if I can bypass the problem. 
This means that the bitmap 'cpu_possible_mask' only consists of one 
unsigned long now. No problem occurs for the application now and 
taskstats works fine.

7) I do not know exactly why the problem did not occur with older kernel 
versions. I suspect configuration parameter CONFIG_CPUMASK_OFFSTACK 
which really influences the code related to the cpumask handling. In all 
kernels that had NO problems, this parameter was set to 'y'. In the 
kernel WITH problems, this configuration parameter is not present at all 
(so effectively 'n').

Note You need to log in before you can comment on or make changes to this bug.