Bug 190271

Summary: process accounting sometimes does not work
Product: Process Management Reporter: Martin Steigerwald (Martin)
Component: Other Assignee: process_other
Status: NEW ---    
Severity: normal CC: martin.steigerwald, Martin, mh+kernel-bugzilla, rdimanos, wavexx
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.7 Subsystem:
Regression: No Bisected commit-id:
Attachments: source code for atopacct test program

Description Martin Steigerwald 2016-12-13 15:05:09 UTC
Created attachment 247561 [details]
source code for atopacct test program

While investigating Debian bug #833997 ("atop: process accounting does not work" [1]), Gerlof, the developer of the atop process monitoring tool [2], found that process accounting sometimes does not work on recent kernels (4.7, 4.8, 4.9, and probably others), while it does work on 3.16.

[1] https://bugs.debian.org/833997
[2] http://atoptool.nl/

Here is his explanation:

-------------------------------------------------------
1) Sometimes process accounting does not work at all.

The acct() system call (which activates process accounting) returns 0,
which means that process accounting was activated successfully.
However, no process accounting records are written at all. This
situation can be reproduced with the attached program 'acctdemo.c'.
When the program prints "found a process accounting record!", everything
is fine and process accounting writes to the file '/tmp/mypacct'. When it
repeatedly prints 'No process accounting record yet....', process
accounting does not work and will not start working. You may have to
start the program several times before you hit this situation
(preferably while starting and finishing lots of processes in the meantime).
This problem is probably caused by a new mechanism introduced in the
kernel code (..../linux/kernel/acct.c) called 'slow accounting',
and it has to be fixed in the kernel code.

I experience this problem on Debian8 with a 4.8 kernel and on CentOS7 
with a 4.8 kernel.
-------------------------------------------------------
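
For readers without the attachment, the check the explanation above relies on (has any record reached the accounting file?) could look roughly like the sketch below. This is not the attached acctdemo.c. It assumes the kernel writes version 3 records (struct acct_v3 from glibc's <sys/acct.h>, which depends on CONFIG_BSD_PROCESS_ACCT_V3), and the helper name count_acct_records is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/acct.h>

/*
 * Hypothetical helper (not taken from acctdemo.c): count the complete
 * accounting records in 'path' and print the command name of each one.
 * Assumption: records use the struct acct_v3 layout.
 * Returns the number of records, or -1 if the file cannot be opened.
 */
long count_acct_records(const char *path)
{
	struct acct_v3 rec;
	long n = 0;
	FILE *fp;

	assert(path != NULL);

	fp = fopen(path, "rb");
	if (fp == NULL)
		return -1;

	/* Read one fixed-size record at a time; a trailing partial record is ignored. */
	while (fread(&rec, sizeof rec, 1, fp) == 1) {
		/* ac_comm is not guaranteed to be NUL-terminated, so bound the print. */
		printf("record %ld: %.*s\n", n, (int)sizeof rec.ac_comm, rec.ac_comm);
		n++;
	}

	fclose(fp);
	return n;
}
```

With accounting enabled on '/tmp/mypacct', a result that stays at 0 while processes keep exiting would correspond to the failing case described above.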

I tested his test program and can reproduce the problem with a 4.9 kernel I compiled today:

merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
^C

I attach the source code of his test program to this bugreport.

There is another bug related to process accounting that I will report as well.
Comment 1 Martin Steigerwald 2016-12-19 10:54:44 UTC
I have now reported the other bug as well:

Bug 190711 - Process accounting: Using the NETLINK interface, the command TASKSTATS_CMD_GET returns -EINVAL
Comment 2 Dmitry Romanov 2016-12-26 21:23:06 UTC
Hello Martin

It seems I reported the same problem with process accounting:

Bug 180841 - Process accounting sometimes do not append records for terminated processes 
https://bugzilla.kernel.org/show_bug.cgi?id=180841

I think I found the source of the problem and a patch that corrects it
(both described in my bug report).

About acctdemo.c:
If a delay of at least 1 jiffy (for example, sleep(1)) is added to
acctdemo.c between switching on process accounting with acct(ACCTFILE)
and the first process finishing, the problem reproduces almost always.
So the following change is needed:

...
	if (acct(ACCTFILE) == -1)
	{
		perror("Switch on accounting");
		exit(1);
	}

	sleep(1);	/* delay so that the problem reproduces more reliably */

	if ( fork() == 0 )	// fork new process
		exit(0);	// child process: finish
...
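
Since the fragment above elides the rest of the program, here is a self-contained sketch of the same sequence: switch accounting on, wait, let one child exit, then poll the file. It is not the attached acctdemo.c. The function names, the delay_sec and polls parameters, and the "file is non-empty" test for a record are assumptions made for illustration. It must run as root, because acct() requires CAP_SYS_PACCT.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Size of 'path' in bytes, or -1 if it cannot be stat'ed. */
long acct_file_size(const char *path)
{
	struct stat st;

	assert(path != NULL);
	return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/*
 * Hypothetical driver: returns 1 if a record showed up in 'path',
 * 0 if none appeared after 'polls' one-second checks, -1 on error.
 */
int try_accounting(const char *path, unsigned delay_sec, int polls)
{
	int fd, i, found = 0;
	pid_t pid;

	/* acct() needs an existing file, so create (or truncate) it first. */
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd == -1)
		return -1;
	close(fd);

	if (acct(path) == -1)		/* switch accounting on */
		return -1;

	sleep(delay_sec);		/* the delay suggested in the comment above */

	pid = fork();
	if (pid == -1) {
		acct(NULL);		/* switch accounting off again */
		return -1;
	}
	if (pid == 0)
		_exit(0);		/* child: finish immediately */
	waitpid(pid, NULL, 0);

	/* Assumption: any growth of the file means a record was written. */
	for (i = 0; i < polls && !found; i++) {
		if (acct_file_size(path) > 0)
			found = 1;
		else
			sleep(1);
	}

	acct(NULL);			/* switch accounting off */
	return found;
}
```

Called from a root process as try_accounting("/tmp/mypacct", 1, 5), a return value of 0 would correspond to the repeated "No process accounting record yet...." output, and a delay_sec of 0 to the original test without the added sleep.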