Bug 190271 - process accounting sometimes does not work
Summary: process accounting sometimes does not work
Status: NEW
Alias: None
Product: Process Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: process_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-13 15:05 UTC by Martin Steigerwald
Modified: 2016-12-27 10:36 UTC
CC List: 5 users

See Also:
Kernel Version: 4.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
source code for atopacct test program (764 bytes, text/x-csrc)
2016-12-13 15:05 UTC, Martin Steigerwald

Description Martin Steigerwald 2016-12-13 15:05:09 UTC
Created attachment 247561
source code for atopacct test program

While investigating Debian bug #833997 ("atop: process accounting does not work" [1]), Gerlof, the developer of the atop process monitoring tool [2], found that process accounting sometimes does not work on recent kernels (4.7, 4.8, 4.9, and probably others), while it does work on 3.16.

[1] https://bugs.debian.org/833997
[2] http://atoptool.nl/

Here is his explanation:

-------------------------------------------------------
1) Sometimes process accounting does not work at all.

The acct() system call (to activate process accounting) returns 0,
which means that process accounting was activated successfully.
However, no process accounting records are written whatsoever. This
situation can be reproduced with the program 'acctdemo.c' that you can
find as an attachment. When this program gives the message
"found a process accounting record!", the situation is okay and
process accounting works fine to the file '/tmp/mypacct'. When the
message 'No process accounting record yet....' is given repeatedly,
process accounting does not work and will not start working. You might
have to start this program several times before you get into this
situation (preferably start/finish lots of processes in the meantime).
This problem is probably caused by a new mechanism introduced in the
kernel code (..../linux/kernel/acct.c) that is called 'slow accounting'
and it has to be solved in the kernel code.

I experience this problem on Debian 8 and on CentOS 7, both with a
4.8 kernel.
-------------------------------------------------------

I tested his test program and can reproduce the problem with a 4.9 kernel that I compiled today:

merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
Yeeeeah, found a process accounting record!
merkaba:/tmp> ./acctdemo
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
No process accounting record yet....
^C

I have attached the source code of his test program to this bug report.

There is another, related process accounting bug that I will report as well.
Comment 1 Martin Steigerwald 2016-12-19 10:54:44 UTC
I have now reported the other bug as well:

Bug 190711 - Process accounting: Using the NETLINK inface, the command TASKSTATS_CMD_GET returns -EINVAL
Comment 2 Dmitry Romanov 2016-12-26 21:23:06 UTC
Hello Martin

It seems I reported the same problem with process accounting:

Bug 180841 - Process accounting sometimes do not append records for terminated processes 
https://bugzilla.kernel.org/show_bug.cgi?id=180841

I think I have found the source of the problem and a patch that corrects it
(both are described in my bug report).

About acctdemo.c:
If a delay of at least 1 jiffy (for example, sleep(1)) is added to the
acctdemo.c test between switching on process accounting with acct(ACCTFILE)
and the first process finishing, the problem reproduces almost always.
So the following change is needed:

...
	if (acct(ACCTFILE) == -1)
	{
		perror("Switch on accounting");
		exit(1);
	}

	sleep(1);	/* delay so that the problem reproduces more reliably */

	if (fork() == 0)	/* fork new process */
		exit(0);	/* child process: finish */
...
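
For readers without the attachment at hand, the reproduction steps discussed in this report can be sketched as a pair of helper functions. This is not the attached acctdemo.c, only a minimal sketch under stated assumptions: the helper names file_size() and acct_record_appears() are hypothetical, the caller is assumed to have the CAP_SYS_PACCT capability (normally root), and a record is assumed to have been written as soon as the accounting file is no longer empty.

```c
#define _DEFAULT_SOURCE         /* for the acct() declaration in glibc */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: size of 'path' in bytes, or -1 if stat() fails. */
static long file_size(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/*
 * Hypothetical helper, not taken from acctdemo.c.
 * Switches accounting on to 'path', waits 'delay_s' seconds, lets one
 * child process finish and then checks up to 'tries' times (one second
 * apart) whether the accounting file has grown.
 * Returns 1 if a record appeared, 0 if none appeared, -1 on a setup
 * error (for example EPERM when CAP_SYS_PACCT is missing).
 */
static int acct_record_appears(const char *path, unsigned delay_s, int tries)
{
	int fd, i, found = 0;
	pid_t pid;

	/* acct() needs an existing file; start from an empty one */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return -1;
	close(fd);

	if (acct(path) == -1)		/* switch on accounting */
		return -1;

	sleep(delay_s);			/* the delay suggested in comment 2 */

	pid = fork();
	if (pid == -1) {
		acct(NULL);		/* switch accounting off again */
		return -1;
	}
	if (pid == 0)
		_exit(0);		/* child process: finish at once */
	waitpid(pid, NULL, 0);

	for (i = 0; i < tries && !found; i++) {
		if (file_size(path) > 0)
			found = 1;	/* at least one record was written */
		else
			sleep(1);
	}

	acct(NULL);			/* switch accounting off again */
	return found;
}
```

A call such as acct_record_appears("/tmp/mypacct", 1, 5) made as root would correspond to the modified test described above, with a return value of 0 matching the repeated 'No process accounting record yet....' case. Because accounting is system-wide, any process that exits while it is switched on can add a record, so a result of 1 only shows that records are being written at all.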
