Bug 199857
Summary: | dmcrypt threads compete excessively with user processes for CPU time | ||
---|---|---|---|
Product: | Process Management | Reporter: | Vito Caputo (vcaputo) |
Component: | Other | Assignee: | Tejun Heo (tj) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | cb-kernel, eballetbo, gmazyland, lskrejci, maxxozz, Rondom, snitzer, thuryn1, ua_bugzilla_linux-kernel |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.12+ | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | git_bisect.log |
Description
Vito Caputo
2018-05-28 03:49:19 UTC
I would just like to note what I suspect is going on with people being unable to reproduce this is they're not using sufficiently small machines. The process scheduling decisions based on priority become relatively unimportant when there's such an abundance of CPU resources all runnable processes get a chance to run frequently enough to avoid starving the audio buffers, regardless of the order. However, in situations where the processes are contending for more scarce CPU resources, the scheduling decisions and process priorities become important. What I am observing is a regular priority user git process generating a large amount of IO. This IO is becoming transformed into substantial CPU work via dmcrypt. By proxy through this lvm->dmcrypt storage circuit, this regular priority process is effectively elevated to a higher priority, by virtue of the IO it generated being processed in a high-priority workqueue. The mp3 player process, also a regular priority process, becomes starved for CPU by the dmcrypt workqueue long enough for audio buffer underruns to occur. The git user process has effectively DoS'd the mp3 player, by leveraging dmcrypt to elevate the priority of its workload in the CPU domain through IO on a dmcrypt volume. I've also confirmed mplaying the MP3s from a tmpfs suffers equally. It is not a dmcrypt IO contention issue causing the MP3 playback to stutter, this is a CPU scheduling problem. It's also reproducible using `mplayer` instead of `cmus`. I just now reproduced it trivially by copying an MP3 to /tmp, running `mplayer /tmp/foo.mp3`, then going into my clone of Linus' linux git tree which is checked out to master and running `git checkout v3.12`. A few seconds later, disk activity picks up and the music begins stuttering regularly until the checkout finishes. Part of the problem here is the crossing of IO->CPU resource domains dmcrypt performs without translating scheduling/priority information. As-is, a regular priority user process can leverage IO on a dmcrypt volume to hog the CPU of the system by simply generating a huge amount of IO. Vito, good find. I hit the same with my i3-2120 CPU, btrfs on dm-crypt. I was about to blame the scheduler or MQ related things but it looks more like I can confirm it's caused by said commit. When vm dirty cache is full and writeback kicks off, I see 4 kworker (dmcrypt_write) threads taking all CPU resources. This leads to audio stuttering (pulseaudio) and a lagging desktop (plasma). I reverted to the latest kernel without that patch in the gentoo tree (4.9.122) and audio and desktop both stay responsive even on high IO load. I will try to revert the patch in a newer 4.18 kernel and report back. Assigning to Tejun as next step, we have issues with or without commit a1b891 so we need "something else" -- open to suggestions. Can confirm, a bisect revealed the same commit for me. I did not find much information about the underlying problem this commit is addressing. I found a message on Patchwork regarding Bullhead: https://patchwork.kernel.org/patch/9692215/ In this proposed patch WQ_CPU_INTENSIVE gets _replaced_ with WQ_HIGHPRI. But with commit a1b891 WQ_HIGHPRI was _added_ to WQ_CPU_INTENSIVE. Maybe someone could point us to more information. Old discussions or a test case perhaps. I'd like to have a counterpart for my tests. My current test is as simple as a dd if=/dev/urandom of=foo and see (hear) if pulseaudio has dropouts. Created attachment 279055 [details]
git_bisect.log
I observe similar problems, device-mapper layout like this:
> RAID (mdadm raid1) --> LUKS --> LVM --> xfs
Everything works, until the moment when my scripts starts the deletion of a large number of files (for example, a directory of about 80G, with many small files).
Scripts simple use find:
```
"ionice -c 3 ${FIND_CMD} -delete && sleep 5"
```
as you can see uses ionice and sleep. But even then *system stuck*:
+ high value IO-wait : time waiting for I/O completion
+ high value system load avg over the last 1 min (~300 in our case)
+ drop net-packet (timeout on Layer7-HTTP-checks)
In logs tasks blocked:
```
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.322249] INFO: task kworker/0:2:29122 blocked for more than 120 seconds.
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.326437] Not tainted 4.15.0-62-generic #69-Ubuntu
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.330075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333725] kworker/0:2 D 0 29122 2 0x80000000
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333814] Workqueue: xfs-cil/dm-5 xlog_cil_push_work [xfs]
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333818] Call Trace:
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333832] __schedule+0x24e/0x880
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333900] ? xlog_bdstrat+0x2b/0x60 [xfs]
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333906] schedule+0x2c/0x80
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333964] xlog_state_get_iclog_space+0x105/0x2d0 [xfs]
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.333970] ? wake_up_q+0x80/0x80
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334024] xlog_write+0x163/0x6e0 [xfs]
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334080] xlog_cil_push+0x2a7/0x410 [xfs]
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334086] ? __switch_to_asm+0x41/0x70
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334136] xlog_cil_push_work+0x15/0x20 [xfs]
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334142] process_one_work+0x1de/0x420
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334146] worker_thread+0x228/0x410
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334150] kthread+0x121/0x140
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334154] ? process_one_work+0x420/0x420
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334158] ? kthread_create_worker_on_cpu+0x70/0x70
Nov 6 23:59:44 lb-svc-01 kernel: [4873194.334162] ret_from_fork+0x35/0x40
```
| iotop shows: | |
| DISK READ | DISK WRITE | SWAPIN | IO> | COMMAND |
| 0.00 B/s | 0.00 B/s | 0.00 % | 99.99%| [dmcrypt_write] |
| | |
I will try to reproduce it on a clean freshly installed system
It's unbelievable that this is still broken in mainline kernels. I can't even do screen captures to disk without reverting a1b89132. I'm not even talking compressed video, just trying to get some uncompressed screenshots of a game under development for publishing using a dumb loop like: `for ((i = 0; i < 999999; i++)); do xwd -id $gamewin -out $i.xwd; sleep .1s; done` makes the game halt whenever writeback occurs because I use dmcrypt for FDE. Why is this problem being ignored by upstream? It's insanity, there's no way I'm the only person experiencing this and it makes Linux w/FDE on the desktop behave *horribly*. It's embarrassingly bad. Commit a1b89132 is a *REGRESSION* and it's been ignored since 4.12! TBH, I agree. I am personally reverting this patch for my laptop for the last several releases. It seems that soem hw is more sensitive here, but it does not matter. Mike, could you please think again about reverting a1b89132 for dm-crypt? (In reply to Milan Broz from comment #8) > TBH, I agree. I am personally reverting this patch for my laptop for the > last several releases. It seems that soem hw is more sensitive here, but it > does not matter. > > Mike, could you please think again about reverting a1b89132 for dm-crypt? I've staged the revert for 5.5, please see: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.5&id=f612b2132db529feac4f965f28a1b9258ea7c22b I have the same problem - system (KDE Plasma) stucks when kernel starts to flush on device. I've just tried to apply the patch on my kernel and the result is noticeable better. But still - when I create two 1GB files, the system is stuck for a while. But it's more important to me that system is responsive. It's not a problem for me when file is written in 2 minutes latter. I can try to remove WQ_CPU_INTENSIVE flags also and I will see. FYI: revert patch mentioned in comment #9 is now in stable kernel trees. |