Created attachment 224171 [details]
kernel log for a test on 4.7.0rc6

SHORT SUMMARY
=============

Given an md RAID10 (here: imsm) with plain dm-crypt (aes-xts-plain64) on top, issuing a simple...

#> dd if=/dev/zero of=/dev/mapper/crypted-device bs=512K status=progress

... will very quickly (as in: <5 seconds) eat up all memory and force the OOM killer into action **if** a check/repair/resync on the RAID is actually in progress, meaning both the minimum and maximum RAID speed are set to a value greater than 0. If no such action is in progress, memory consumption still spikes to almost the total RAM, but the system stays stable and large memory consumers can be started without any trouble.

The issue will very likely also trigger with a mkfs.ext4 on the crypted device, but that is a less reliable trigger than the dd command above.

DETAILS
=======

Using direct I/O with dd/mkfs.ext4 prevents this from happening completely; the excessive memory usage is gone entirely as well. Adjusting vm.dirty_background_ratio and vm.dirty_ratio does in fact "hide" the problem, but suitable values vary with system memory: whereas with 32 GiB of RAM setting dirty_background_ratio=5 and dirty_ratio=10 works mostly fine, with 8 GiB of RAM you have to more than halve that again, at the very least. Setting both to 0 behaves the same as using direct I/O and shows neither excessive memory usage nor any problems.

While dd is running, slabinfo clearly highlights bio-3 and kmalloc-256 as the biggest consumers.
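To see why workable dirty ratios shrink with RAM size, it helps to convert the percentages into an absolute amount of dirty page cache allowed before writers are throttled. A minimal sketch; the helper name is mine, and using total RAM instead of the kernel's "available memory" base is a deliberate simplification:

```shell
# Rough illustration: vm.dirty_ratio is a percentage of memory, so the
# same ratio permits far more in-flight dirty data on a larger box.
# dirty_mib is a hypothetical helper; the kernel actually applies the
# ratio to available memory, not total RAM, so this only approximates.
dirty_mib() {
    ram_gib=$1; ratio=$2
    echo $(( ram_gib * 1024 * ratio / 100 ))
}
dirty_mib 32 10   # dirty_ratio=10 on a 32 GiB box: ~3276 MiB may be dirty
dirty_mib 8 5     # halved ratio on an 8 GiB box:    ~409 MiB
```

This is why a ratio that merely "hides" the problem on 32 GiB has to be halved again on 8 GiB: the absolute headroom shrinks with the machine.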
Here is a ranking made by Michal Hocko:

$ zcat slabinfo.txt.gz | awk '{printf "%s %d\n" , $1, $6*$15}' | head_and_tail.sh 133 | paste-with-diff.sh | sort -n -k3

                      initial  diff [#pages]
radix_tree_node          3444     2592
debug_objects_cache       388    46159
file_lock_ctx             114   138570
buffer_head              5616   238704
kmalloc-256               328   573164
bio-3                      24  1118984

Since I am setting up a new machine, I have to do all tests in a live environment booted from USB -- mainly OpenSuSE Tumbleweed and Fedora Rawhide. Yet I have not been able to reproduce this exact problem in any kind of VM I tried with the same images. It is only 100% reproducible on the real hardware, which complicates matters, since after all my testing is done I have a nice long resync ahead of me.

Monitoring memory usage with free shows that both "used" and "buffer/cache" rise almost equally until memory is exhausted (with some variation, but in general this holds true). The memory is clearly spent in-kernel, as no process is eating up all that memory.

All my tests were done on an IMSM-based RAID10. I cannot say for certain whether this is a contributing factor; unfortunately I can only test with this particular array. I did several tests with an external disk connected through USB3 and a RAID10 on top of it (across several partitions), but I was unable to reproduce the problem there -- probably because of differences in the underlying interface/subsystem (SATA3 vs. USB3).

While the dd command is running, one can slowly increase the resync speed through the sysctl knobs. There is a certain threshold, in my case somewhere north of 8 MiB/s: once you cross it, you run into the OOM; below it, things stay fine. If a sync_action is in progress but the speed is at zero, it behaves the same as if no action were in progress: everything works, memory usage is still excessive, but big memory consumers can be started and the system can be used without the OOM killer getting in the way.
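For reference, the resync speed knobs mentioned above can be adjusted like this. A sketch only: the array name md126 is an example (imsm arrays often appear as md126/md127), and the exact values are the ones that bracket the ~8 MiB/s threshold on my system:

```shell
# Global md resync speed limits, in KiB/s (these are the sysctl knobs
# referred to above). Requires root.
sysctl -w dev.raid.speed_limit_min=1000
sysctl -w dev.raid.speed_limit_max=8192

# Per-array override via sysfs; md126 is an example array name.
echo 8192 > /sys/block/md126/md/sync_speed_max

# Check what the array is currently doing and how fast:
cat /sys/block/md126/md/sync_action
cat /proc/mdstat
```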
TEST SYSTEM
===========

RAID10 (imsm) over 4 disks (SATA3) with a 64 KiB chunk size on a Z170-based system with 32 GiB of RAM (CPU: i7-6700K). All tests were done on either an OpenSuSE Tumbleweed or Fedora Rawhide live system booted from USB.
Created attachment 224181 [details]
slabinfo progress log for 4.7.0rc6

This is a time-ordered progression of slabinfo captures taken while dd was running, up to the point where the OOM killer kicked in.
Since I could only select one component for this bug, I would kindly ask to cc the md (and mm?) folks as well, as it is still unclear where the real problem lies and I suspect it touches all of those areas.
I am able to reproduce this on:

Linux localhost 4.8.8-2-ARCH #1 SMP PREEMPT Thu Nov 17 14:51:03 CET 2016 x86_64 GNU/Linux

# cryptsetup -v --cipher aes-xts-plain64 --key-size 512 --hash sha512 --iter-time 5000 --use-urandom luksFormat /dev/sdc1
# cryptsetup open /dev/sdc1 encryption_test
# dd if=/dev/zero of=/dev/mapper/encryption_test status=progress bs=1M

Without the following settings:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

this will slow the machine to a halt; the load climbs to 10+ and the system becomes unresponsive. I tested the same drive without LUKS/dm-crypt and was able to run dd without issue.

The problem also subsides completely when using direct I/O, or equivalently with:

vm.dirty_background_ratio = 0
vm.dirty_ratio = 0

Tested using an external USB hard drive.
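For completeness, the direct I/O variant of the write test looks like this; a sketch using the device name from the reproduction steps above (oflag=direct is standard GNU dd):

```shell
# Same write test, but bypassing the page cache entirely. With direct
# I/O the runaway in-kernel memory growth described above does not
# occur. Requires root and destroys data on the target device.
dd if=/dev/zero of=/dev/mapper/encryption_test bs=1M oflag=direct status=progress
```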
The title may also not be the best fit for this bug, as the problem is reproducible on multiple types of disk, not just a RAID array undergoing repairs.
I would like to amend my earlier comment: I actually cannot reproduce this on my internal RAID controller. I am using a Z170 chipset; is it possible this is related to some interrupt congestion?