Bug 199857

Summary: dmcrypt threads compete excessively with user processes for CPU time
Product: Process Management Reporter: Vito Caputo (vcaputo)
Component: OtherAssignee: Tejun Heo (tj)
Status: RESOLVED CODE_FIX    
Severity: normal CC: cb-kernel, eballetbo, gmazyland, lskrejci, maxxozz, Rondom, snitzer, thuryn1, ua_bugzilla_linux-kernel
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.12+ Subsystem:
Regression: Yes Bisected commit-id:
Attachments: git_bisect.log

Description Vito Caputo 2018-05-28 03:49:19 UTC
Whenever I have significant IO happening on my LVM2+dmcrypt filesystem, a heavyweight git checkout operation on the linux source tree for example, my mp3 player becomes starved for CPU time long enough to cause audio dropouts.

It's like dejavu from using my 486 PC in the 90s when I had to upgrade my system from IDE to SCSI to prevent this from happening in Linux back then.  But it ceased being a problem for over a decade until relatively recently.

I've bisected the offending commit to a1b891:

commit a1b89132dc4f61071bdeaab92ea958e0953380a1
Author: Tim Murray <timmurray@google.com>
Date:   Fri Apr 21 11:11:36 2017 +0200

    dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
    
    Running dm-crypt with workqueues at the standard priority results in IO
    competing for CPU time with standard user apps, which can lead to
    pipeline bubbles and seriously degraded performance.  Move to using
    WQ_HIGHPRI workqueues to protect against that.
    
    Signed-off-by: Tim Murray <timmurray@google.com>
    Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>


When I revert this commit, the problem completely vanishes.  I've brought this up on LKML and CCd Tim and Enric, but it seems to have gone nowhere.

I'm under the impression the developers are using such powerful machines to try reproduce this that there's enough spare CPU even with dmcrypt running like gangbusters.

My machine is a Core2Duo 1.8ghz w/Samsung SSD 840 EVO 250GB, and it's _trivial_ to reproduce this problem.  If you don't have such a machine, just use a raspberry pi or something.

I'm using the music player `cmus` to listen to mp3s, the distribution is Debian 9 on amd64.  All it takes is a /big/ git checkout, like switching between old tags and master on the linux tree.  Once the IO gets going, the audio starts dropping.
Comment 1 Vito Caputo 2018-05-28 18:16:05 UTC
I would just like to note what I suspect is going on with people being unable to reproduce this is they're not using sufficiently small machines.

The process scheduling decisions based on priority become relatively unimportant when there's such an abundance of CPU resources all runnable processes get a chance to run frequently enough to avoid starving the audio buffers, regardless of the order.

However, in situations where the processes are contending for more scarce CPU resources, the scheduling decisions and process priorities become important.

What I am observing is a regular priority user git process generating a large amount of IO.  This IO is becoming transformed into substantial CPU work via dmcrypt.  By proxy through this lvm->dmcrypt storage circuit, this regular priority process is effectively elevated to a higher priority, by virtue of the IO it generated being processed in a high-priority workqueue.

The mp3 player process, also a regular priority process, becomes starved for CPU by the dmcrypt workqueue long enough for audio buffer underruns to occur.  The git user process has effectively DoS'd the mp3 player, by leveraging dmcrypt to elevate the priority of its workload in the CPU domain through IO on a dmcrypt volume.

I've also confirmed mplaying the MP3s from a tmpfs suffers equally.  It is not a dmcrypt IO contention issue causing the MP3 playback to stutter, this is a CPU scheduling problem.

It's also reproducible using `mplayer` instead of `cmus`.  I just now reproduced it trivially by copying an MP3 to /tmp, running `mplayer /tmp/foo.mp3`, then going into my clone of Linus' linux git tree which is checked out to master and running `git checkout v3.12`.  A few seconds later, disk activity picks up and the music begins stuttering regularly until the checkout finishes.

Part of the problem here is the crossing of IO->CPU resource domains dmcrypt performs without translating scheduling/priority information.  As-is, a regular priority user process can leverage IO on a dmcrypt volume to hog the CPU of the system by simply generating a huge amount of IO.
Comment 2 Christian Buchinger 2018-10-14 16:09:53 UTC
Vito,

good find. I hit the same with my i3-2120 CPU, btrfs on dm-crypt.

I was about to blame the scheduler or MQ related things but it looks more like I can confirm it's caused by said commit.

When vm dirty cache is full and writeback kicks off, I see 4 kworker (dmcrypt_write) threads taking all CPU resources. This leads to audio stuttering (pulseaudio) and a lagging desktop (plasma).

I reverted to the latest kernel without that patch in the gentoo tree (4.9.122) and audio and desktop both stay responsive even on high IO load.

I will try to revert the patch in a newer 4.18 kernel and report back.
Comment 3 Mike Snitzer 2018-10-16 15:45:10 UTC
Assigning to Tejun as next step, we have issues with or without commit a1b891 so we need "something else" -- open to suggestions.
Comment 4 Christian Buchinger 2018-10-16 17:39:46 UTC
Can confirm, a bisect revealed the same commit for me.

I did not find much information about the underlying problem this commit is addressing.

I found a message on Patchwork regarding Bullhead: https://patchwork.kernel.org/patch/9692215/
In this proposed patch WQ_CPU_INTENSIVE gets _replaced_ with WQ_HIGHPRI. But with commit a1b891 WQ_HIGHPRI was _added_ to WQ_CPU_INTENSIVE.

Maybe someone could point us to more information. Old discussions or a test case perhaps. I'd like to have a counterpart for my tests.

My current test is as simple as a dd if=/dev/urandom of=foo and see (hear) if pulseaudio has dropouts.
Comment 5 Christian Buchinger 2018-10-16 17:40:49 UTC
Created attachment 279055 [details]
git_bisect.log
Comment 6 Maksim 2019-11-13 13:11:30 UTC
I observe similar problems, device-mapper layout like this:

> RAID (mdadm raid1) --> LUKS --> LVM --> xfs


Everything works, until the moment when my scripts starts the deletion of a large number of files (for example, a directory of about 80G, with many small files).
Scripts simple use find: 
```
"ionice -c 3 ${FIND_CMD} -delete && sleep 5"
```

as you can see uses ionice and sleep. But even then *system stuck*:

+  high value IO-wait : time waiting for I/O completion
+  high value system load avg over the last 1 min (~300 in our case)
+  drop net-packet (timeout on Layer7-HTTP-checks) 

In logs tasks blocked:
```
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.322249] INFO: task kworker/0:2:29122 blocked for more than 120 seconds.
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.326437]       Not tainted 4.15.0-62-generic #69-Ubuntu
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.330075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333725] kworker/0:2     D    0 29122      2 0x80000000
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333814] Workqueue: xfs-cil/dm-5 xlog_cil_push_work [xfs]
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333818] Call Trace:
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333832]  __schedule+0x24e/0x880
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333900]  ? xlog_bdstrat+0x2b/0x60 [xfs]
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333906]  schedule+0x2c/0x80
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333964]  xlog_state_get_iclog_space+0x105/0x2d0 [xfs]
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.333970]  ? wake_up_q+0x80/0x80
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334024]  xlog_write+0x163/0x6e0 [xfs]
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334080]  xlog_cil_push+0x2a7/0x410 [xfs]
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334086]  ? __switch_to_asm+0x41/0x70
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334136]  xlog_cil_push_work+0x15/0x20 [xfs]
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334142]  process_one_work+0x1de/0x420
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334146]  worker_thread+0x228/0x410
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334150]  kthread+0x121/0x140
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334154]  ? process_one_work+0x420/0x420
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334158]  ? kthread_create_worker_on_cpu+0x70/0x70
Nov  6 23:59:44 lb-svc-01 kernel: [4873194.334162]  ret_from_fork+0x35/0x40
```

|  iotop shows:  |          |
| DISK READ | DISK WRITE | SWAPIN | IO> | COMMAND |
| 0.00 B/s  | 0.00 B/s   | 0.00 % | 99.99%| [dmcrypt_write] |
| | |

I will try to reproduce it on a clean freshly installed system
Comment 7 Vito Caputo 2019-11-20 07:20:03 UTC
It's unbelievable that this is still broken in mainline kernels.

I can't even do screen captures to disk without reverting a1b89132.  I'm not even talking compressed video, just trying to get some uncompressed screenshots of a game under development for publishing using a dumb loop like:

`for ((i = 0; i < 999999; i++)); do xwd -id $gamewin -out $i.xwd; sleep .1s; done`

makes the game halt whenever writeback occurs because I use dmcrypt for FDE.

Why is this problem being ignored by upstream?  It's insanity, there's no way I'm the only person experiencing this and it makes Linux w/FDE on the desktop behave *horribly*.  It's embarrassingly bad.

Commit a1b89132 is a *REGRESSION* and it's been ignored since 4.12!
Comment 8 Milan Broz 2019-11-20 07:51:00 UTC
TBH, I agree. I am personally reverting this patch for my laptop for the last several releases. It seems that soem hw is more sensitive here, but it does not matter.

Mike, could you please think again about reverting a1b89132 for dm-crypt?
Comment 9 Mike Snitzer 2019-11-20 22:46:36 UTC
(In reply to Milan Broz from comment #8)
> TBH, I agree. I am personally reverting this patch for my laptop for the
> last several releases. It seems that soem hw is more sensitive here, but it
> does not matter.
> 
> Mike, could you please think again about reverting a1b89132 for dm-crypt?

I've staged the revert for 5.5, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.5&id=f612b2132db529feac4f965f28a1b9258ea7c22b
Comment 10 Tomáš Hurýn 2019-12-05 15:06:52 UTC
I have the same problem - system (KDE Plasma) stucks when kernel starts to flush on device. I've just tried to apply the patch on my kernel and the result is noticeable better. But still - when I create two 1GB files, the system is stuck for a while. But it's more important to me that system is responsive. It's not a problem for me when file is written in 2 minutes latter. I can try to remove WQ_CPU_INTENSIVE flags also and I will see.
Comment 11 Milan Broz 2019-12-05 15:19:39 UTC
FYI: revert patch mentioned in comment #9 is now in stable kernel trees.