'btrfs scrub' is interrupted, and its status/statistical information is lost, if the user logs out of their current shell session before the scrub completes while systemd-logind is configured with KillUserProcesses=yes in logind.conf. When it is set to no, the process survives and whatever accounting it's doing is allowed to complete.
1. Use systemd 230+ and make sure KillUserProcesses=yes is set in /etc/systemd/logind.conf.
2. In GNOME Terminal, run btrfs scrub start.
3. Log out before the scrub completes.
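To confirm step 1 before reproducing, something like the following could be used. It's only a sketch: the function name kill_user_processes_setting is made up for illustration, and it reads a single logind.conf-style file, so it does not account for drop-in files or the compiled-in default.

```shell
# Hypothetical helper: print the last uncommented KillUserProcesses= value
# found in a logind.conf-style file. Prints "unset" when the key is absent
# or the file cannot be read (the compiled-in default applies in that case).
kill_user_processes_setting() {
    conf="${1:-/etc/systemd/logind.conf}"
    # Commented lines start with '#', so they never match this pattern.
    value=$(sed -n 's/^[[:space:]]*KillUserProcesses[[:space:]]*=[[:space:]]*//p' "$conf" 2>/dev/null | tail -n 1)
    printf '%s\n' "${value:-unset}"
}
```

Calling kill_user_processes_setting with no argument checks /etc/systemd/logind.conf; a result of "unset" means the systemd build default is in effect.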
[root@localhost ~]# btrfs scrub status /
scrub status for 592866ce-c4d0-40ae-bb0c-9de365a1bc00
scrub started at Sat Jul 30 12:59:05 2016, interrupted after 00:00:05, not running
total bytes scrubbed: 2.71GiB with 0 errors
There's some evidence that kernel threads continue working beyond 5 seconds. So I'm not sure whether the scrub is actually finishing and it's just the killing of the user space program that causes the scrub statistics and accounting to be wrong, or whether the kernel threads eventually die off as well and the scrub is in fact incomplete.
top and ps report status S for the btrfs scrub process prior to logout; immediately after logout, the status goes to Z.
top and ps report status D and R for the btrfs balance process before and after logging out. journalctl -f also reports relocation activity well beyond the user logout, so I'm pretty sure balance is completing.
top and ps report status D and R for the btrfs replace process before and after logging out. After the kernel threads cease, 'btrfs replace status' and 'fi show' indicate the migration happened correctly, and the file system works as expected.
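For reference, the state letters above can be decoded with a small helper along these lines. This is just a sketch; describe_proc_state is a name I made up, and it only looks at the first character of the STAT column that ps prints.

```shell
# Hypothetical helper: translate the leading letter of a ps STAT value
# into a short description. Only the states seen in this report are handled.
describe_proc_state() {
    case "$1" in
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        D*) echo "uninterruptible sleep (usually I/O)" ;;
        Z*) echo "zombie (exited, not yet reaped by its parent)" ;;
        *)  echo "other" ;;
    esac
}
```

It can be fed from ps, for example describe_proc_state "$(ps -o stat= -p PID)" with PID replaced by the process of interest. A Z state means the user space program has already exited, which matches the scrub process being killed at logout.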
Not sure, but the loss of scrub statistics and status is suboptimal even if the scrub kernel process continues to work and fixes problems.
The KillUserProcesses option has been supported for a while, so it's not necessary to use systemd v230; just make sure KillUserProcesses=yes is set to reproduce. systemd v230 is the first release to default to KillUserProcesses=yes.
It looks like the scrub user space code is doing the accounting, so when it goes to status Z on logout, the accounting stops. Subsequent status checks are stale for the entire duration of the continuing scrub done by the kernel, which proceeds unimpeded. Once the kernel scrub is done, btrfs scrub status reports an interruption, which actually means the user space accounting was interrupted, not the scrub itself.
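A script that checks on a scrub could classify the status text roughly as below. This is a sketch only: scrub_state is a hypothetical name, the "interrupted after" wording comes from the output pasted earlier in this report, and the "finished after" and "running for" patterns are my assumption about what the other states print.

```shell
# Hypothetical helper: read 'btrfs scrub status' text on stdin and print a
# one-word state. Only "interrupted after" is confirmed by the output in
# this report; the other two patterns are assumed wording.
scrub_state() {
    status=$(cat)
    case "$status" in
        *"interrupted after"*) echo interrupted ;;
        *"finished after"*)    echo finished ;;   # assumed wording
        *"running for"*)       echo running ;;    # assumed wording
        *)                     echo unknown ;;
    esac
}
```

Usage would be btrfs scrub status / | scrub_state. Given the behavior described above, an "interrupted" result after a logout may only mean the accounting stopped, so it can't be taken as proof that the kernel-side scrub itself stopped.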
Fix needed is for user space scrub code to just issue commands, and poll for status, rather than doing the statistical accounting work. This already appears to be the case for btrfs replace, and is also consistent with how LVM tools work for similar operations like pvmove and lvchange --syncaction.