Bug 150781 - systemd KillUserProcesses=yes and btrfs scrub
Summary: systemd KillUserProcesses=yes and btrfs scrub
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 normal
Assignee: Josef Bacik
Depends on:
Reported: 2016-07-30 19:59 UTC by Chris Murphy
Modified: 2016-08-01 18:17 UTC

See Also:
Kernel Version: kernel-4.7.0-0.rc7.git3.1.fc25.x86_64
Tree: Fedora
Regression: No


Description Chris Murphy 2016-07-30 19:59:24 UTC

'btrfs scrub' is interrupted, and its status/statistical information is lost, if the user logs out of their current shell session before the scrub completes while systemd-logind has KillUserProcesses=yes set in logind.conf. When it is set to no, the process survives and whatever accounting it's doing is allowed to complete.


1. Use systemd 230+, and make sure KillUserProcesses=yes is set in /etc/systemd/logind.conf.
2. In GNOME Terminal, run btrfs scrub start.
3. Log out before the scrub completes.
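
Step 1 can be checked with a short script. This is only a minimal Python sketch, assuming the setting lives in the [Login] section of /etc/systemd/logind.conf. It ignores drop-in directories and the compiled-in default, and the function name is hypothetical.

```python
import configparser
from typing import Optional


def kill_user_processes_enabled(conf_text: str) -> Optional[bool]:
    """Return the KillUserProcesses= value found in logind.conf text.

    Returns None when the option is absent or commented out, in which case
    the effective value is whatever default systemd was built with.
    """
    # interpolation=None keeps '%' characters literal; strict=False tolerates
    # repeated keys (the last one wins).
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_option("Login", "KillUserProcesses"):
        return None
    # getboolean accepts yes/no, true/false, on/off and 1/0.
    return parser.getboolean("Login", "KillUserProcesses")
```

Passing it the contents of /etc/systemd/logind.conf should return True on a system configured as in step 1.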


[root@localhost ~]# btrfs scrub status /
scrub status for 592866ce-c4d0-40ae-bb0c-9de365a1bc00
	scrub started at Sat Jul 30 12:59:05 2016, interrupted after 00:00:05, not running
	total bytes scrubbed: 2.71GiB with 0 errors

There's some evidence that kernel threads continue working beyond 5 seconds. So I'm not sure whether the scrub is actually finishing and it's just the killing of the user space program that makes the scrub statistics and accounting wrong, or whether the kernel threads eventually die off as well and the scrub is in fact incomplete.

Additional information:

top and ps report status S for the btrfs scrub process prior to logout; immediately after logout, status goes to Z.

top and ps report status D and R for the btrfs balance process before and after logging out. And journalctl -f reports relocation effects well beyond the user logout, so I'm pretty sure balance is completing.

top and ps report status D and R for btrfs replace process before and after logging out. And after kernel threads cease, 'btrfs scrub status' and 'fi show' indicate the migration happened correctly, and the file system works as expected.

Expected results:

Not sure, but the loss of scrub statistics and status is suboptimal even if the scrub kernel process continues to work and fixes problems.
Comment 1 Chris Murphy 2016-07-31 00:36:58 UTC
KillUserProcesses has been supported for a while, so it's not necessary to use systemd v230; just make sure KillUserProcesses=yes to reproduce. systemd v230 is the first to default to KillUserProcesses=yes.
Comment 2 Chris Murphy 2016-08-01 18:17:18 UTC
Looks like the scrub user space code is doing the accounting, so when it goes to status Z on logout, the accounting stops. Subsequent status checks are stale for the entire duration of the continuing scrub done by the kernel, which proceeds unimpeded. Once the kernel scrub is done, btrfs scrub status reports an interruption, which actually means the user space accounting was interrupted, not the scrub itself.
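
Until that is fixed, a script can at least flag the misleading state. This is a rough Python sketch that looks for the "interrupted after ..., not running" wording seen in the output in the description. It assumes that exact wording, which other btrfs-progs versions may not print, and the function name is hypothetical.

```python
import re
from typing import Optional

# Wording taken from the 'btrfs scrub status' output quoted in this report.
INTERRUPTED = re.compile(r"interrupted after (\d+):(\d{2}):(\d{2}), not running")


def interrupted_after_seconds(status_text: str) -> Optional[int]:
    """Return the recorded run time in seconds if the status says 'interrupted'.

    Returns None when the text does not contain the 'interrupted' wording.
    A match only means the user space accounting stopped at that point;
    it does not show whether the kernel scrub itself stopped.
    """
    match = INTERRUPTED.search(status_text)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```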

The fix needed is for the user space scrub code to just issue commands and poll for status, rather than doing the statistical accounting work itself. This already appears to be the case for btrfs replace, and is also consistent with how LVM tools work for similar operations like pvmove and lvchange --syncaction.
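
The issue-and-poll model can be sketched as follows. This is an illustration in Python, not btrfs-progs code: read_progress is a hypothetical stand-in for whatever call asks the kernel for the current scrub state, and the dictionary keys are made up for the example.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    read_progress: Callable[[], Dict[str, object]],
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Poll the (hypothetical) kernel-side state until the scrub is not running.

    The client keeps no counters of its own. Every number comes from
    read_progress, so a client that is killed and restarted later still
    reports the same totals as one that ran the whole time.
    """
    while True:
        progress = read_progress()
        if not progress["running"]:
            return progress
        sleep(interval)
```

With this split, killing the polling process on logout loses nothing, because the authoritative state never lived in that process.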

