Many other users and I have an issue where disk writes start fast (e.g. 200 MB/sec), but after intensive disk usage they end up 100+ times slower (e.g. 2 MB/sec), and never get fast again until we run "echo 3 > /proc/sys/vm/drop_caches".

This issue happens on systems with any 4.x kernel, i386 arch, 16+ GB RAM. It doesn't happen if we use 3.x kernels (i.e. it's a regression) or any 64-bit kernels (i.e. it only affects i386).

My initial bug report was in Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1698118

I included a test case there, which essentially says "Copy /lib around 100 times. You'll see that the first copy happens in 5 seconds, and the 30th copy may need more than 800 seconds".

Here is my latest version of the script (basically, step (3) below):
1) . /etc/os-release; echo -n "$VERSION, $(uname -r), $(dpkg --print-architecture), RAM="; awk '/MemTotal:/ { print $2 }' /proc/meminfo
2) mount /dev/sdb2 /mnt && rm -rf /mnt/tmp/lib && mkdir -p /mnt/tmp/lib && sync && echo 3 > /proc/sys/vm/drop_caches && chroot /mnt
3) mkdir -p /tmp/lib; cd /tmp/lib; s=/lib; d=1; echo -n "Copying $s to $d: "; while /usr/bin/time -f %e sh -c "cp -a '$s' '$d'; sync"; do s=$d; d=$((($d+1)%100)); echo -n "Copying $s to $d: "; done

And here are some results, where you can see that all 4.x+ i386 kernels are affected:
-----------------------------------------------------------------------------
14.04 (Trusty Tahr), 3.13.0-24-generic, i386, RAM=16076400 [Live CD]
8-13 secs

15.04 (Vivid Vervet), 3.19.0-15-generic, i386, RAM=16083080 [Live CD]
5-7 secs

15.10 (Wily Werewolf), 4.2.0-16-generic, i386, RAM=16082536 [Live CD]
4-350 secs

16.04.2 LTS (Xenial Xerus), 3.19.0-80-generic, i386, RAM=16294832 [HD install]
10-25 secs

16.04.2 LTS (Xenial Xerus), 4.2.0-42-generic, i386, RAM=16294392 [HD install]
14-89 secs

16.04.2 LTS (Xenial Xerus), 4.4.0-79-generic, i386, RAM=16293556 [HD install]
15-605 secs

16.04.2 LTS (Xenial Xerus), 4.8.0-54-generic, i386, RAM=16292708 [HD install]
6-160 secs

16.04.2 LTS (Xenial Xerus), 4.12.0-041200rc5-generic, i386, RAM=16292588 [HD install]
46-805 secs

16.04.2 LTS (Xenial Xerus), 4.8.0-36-generic, amd64, RAM=16131028 [Live CD]
4-11 secs

An example single run of the script:
-----------------------------------------------------------------------------
16.04.2 LTS (Xenial Xerus), 4.8.0-54-generic, i386, RAM=16292708 [HD install]
-----------------------------------------------------------------------------
Copying /lib to 1: 37.23
Copying 1 to 2: 6.74
Copying 2 to 3: 6.88
Copying 3 to 4: 7.89
Copying 4 to 5: 7.91
Copying 5 to 6: 9.03
Copying 6 to 7: 8.46
Copying 7 to 8: 8.10
Copying 8 to 9: 8.93
Copying 9 to 10: 10.51
Copying 10 to 11: 10.33
Copying 11 to 12: 11.08
Copying 12 to 13: 11.78
Copying 13 to 14: 14.18
Copying 14 to 15: 18.42
Copying 15 to 16: 23.19
Copying 16 to 17: 61.08
Copying 17 to 18: 155.88
Copying 18 to 19: 141.96
Copying 19 to 20: 152.98
Copying 20 to 21: 163.03
Copying 21 to 22: 154.85
Copying 22 to 23: 137.13
Copying 23 to 24: 146.08
Copying 24 to 25:

Thank you!
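As a quick sanity check alongside the script, the lowmem/highmem split and the dirty page counts that turn out to matter below can be read from /proc/meminfo (the HighTotal/LowTotal fields only appear on highmem-enabled 32-bit kernels):

  grep -E 'MemTotal|HighTotal|LowTotal|Dirty|Writeback' /proc/meminfo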
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

hm, that's news to me.

Does anyone have access to a large i386 setup? Interested in reproducing this and figuring out what's going wrong?

On Thu, 22 Jun 2017 06:25:49 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=196157
>
> Bug ID: 196157
> Summary: 100+ times slower disk writes on 4.x+/i386/16+RAM, compared to 3.x
> Product: Memory Management
> Kernel Version: 4.x
> Component: Page Allocator
> Assignee: akpm@linux-foundation.org
> Reporter: alkisg@gmail.com
>
> [...]
On 22/06/2017 10:37 PM, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> hm, that's news to me.
>
> Does anyone have access to a large i386 setup? Interested in
> reproducing this and figuring out what's going wrong?
>

I can arrange ssh/vnc access to an i386 box with 16 GB RAM that has the issue, if some kernel dev wants to work on it. Please PM me for details - and also tell me your preferred distro.
On Thu 22-06-17 12:37:36, Andrew Morton wrote:
[...]
> > Many other users and I have an issue where disk writes start fast (e.g.
> > 200 MB/sec), but after intensive disk usage they end up 100+ times slower
> > (e.g. 2 MB/sec), and never get fast again until we run "echo 3 >
> > /proc/sys/vm/drop_caches".

What is your dirty limit configuration? Is your highmem dirtyable (highmem_is_dirtyable)?

> > This issue happens on systems with any 4.x kernel, i386 arch, 16+ GB RAM.
> > It doesn't happen if we use 3.x kernels (i.e. it's a regression) or any
> > 64-bit kernels (i.e. it only affects i386).

I remember we've had some changes in the way dirty memory is throttled, and 32b would be more sensitive to those changes. Anyway, I would _strongly_ discourage you from using 32b kernels with that much memory. You are going to hit walls constantly, and many of those issues will be inherent. Some of them less so, but rather non-trivial to fix without regressing somewhere else. You can tune your system somehow, but this will be fragile no matter what.

Sorry to say that, but 32b systems with tons of memory are far from the priority of most mm people. Just use a 64b kernel. There are more pressing problems to deal with.
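The dirty limit configuration being asked about can be dumped in one go with standard sysctl mechanics (equivalent to reading the corresponding files under /proc/sys/vm):

  sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes \
         vm.dirty_background_bytes vm.highmem_is_dirtyable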
On 23/06/2017 10:13 AM, Michal Hocko wrote:
> On Thu 22-06-17 12:37:36, Andrew Morton wrote:
>
> What is your dirty limit configuration? Is your highmem dirtyable
> (highmem_is_dirtyable)?
>
[...]

Hi, I'm attaching below all my settings from /proc/sys/vm.

I think the regression also affects 4 GB and 8 GB RAM i386 systems, but not in an exponential manner; i.e. copies there appear only 2-3 times slower than they used to be on 3.x kernels.

Now I don't know the kernel internals, but if disk copies show up as 2-3 times slower, and the regression is in memory management, wouldn't that mean that the memory management is *hundreds* of times slower, for it to show up in disk writing benchmarks?

I.e. I'm afraid this regression doesn't affect 16+ GB RAM systems only; it just happens that it's clearly visible there. And it might even affect 64-bit systems with even more RAM; but I don't have any such system to test with.

Kind regards,
Alkis

root@pc:/proc/sys/vm# grep . *
admin_reserve_kbytes:8192
block_dump:0
compact_unevictable_allowed:1
dirty_background_bytes:0
dirty_background_ratio:10
dirty_bytes:0
dirty_expire_centisecs:1500
dirty_ratio:20
dirtytime_expire_seconds:43200
dirty_writeback_centisecs:1500
drop_caches:3
extfrag_threshold:500
highmem_is_dirtyable:0
hugepages_treat_as_movable:0
hugetlb_shm_group:0
laptop_mode:0
legacy_va_layout:0
lowmem_reserve_ratio:256 32 32
max_map_count:65530
min_free_kbytes:34420
mmap_min_addr:65536
mmap_rnd_bits:8
nr_hugepages:0
nr_overcommit_hugepages:0
nr_pdflush_threads:0
oom_dump_tasks:1
oom_kill_allocating_task:0
overcommit_kbytes:0
overcommit_memory:0
overcommit_ratio:50
page-cluster:3
panic_on_oom:0
percpu_pagelist_fraction:0
stat_interval:1
swappiness:60
user_reserve_kbytes:131072
vdso_enabled:1
vfs_cache_pressure:100
watermark_scale_factor:10
On Fri 23-06-17 10:44:36, Alkis Georgopoulos wrote:
> On 23/06/2017 10:13 AM, Michal Hocko wrote:
[...]
> I think the regression also affects 4 GB and 8 GB RAM i386 systems, but
> not in an exponential manner; i.e. copies there appear only 2-3 times
> slower than they used to be on 3.x kernels.

If the regression shows on 4-8GB 32b systems then the priority of fixing it would certainly be much higher.

> Now I don't know the kernel internals, but if disk copies show up as 2-3
> times slower, and the regression is in memory management, wouldn't that mean
> that the memory management is *hundreds* of times slower, for it to show up
> in disk writing benchmarks?

Well, it is hard to judge what the real problem is here, but you have to realize that a 32b system has some fundamental issues which come from how the memory is split between the kernel (lowmem - 896MB at maximum) and highmem. The more memory you have, the more lowmem is consumed by kernel data structures. Just consider that ~160MB of this space is eaten by struct pages to describe 16GB of memory. There are other data structures which can only live in low memory.

> I.e. I'm afraid this regression doesn't affect 16+ GB RAM systems only;
> it just happens that it's clearly visible there.
>
> And it might even affect 64-bit systems with even more RAM; but I don't have
> any such system to test with.

Not really. 64b systems do not need the kernel/userspace split because the address space is large enough. If there are any regressions since 3.0 then we are certainly interested in hearing about them.

> root@pc:/proc/sys/vm# grep . *
> dirty_ratio:20
> highmem_is_dirtyable:0

This means that highmem is not dirtyable, so only 20% of the free lowmem (+ page cache in that region) is considered, and writers might get throttled quite early (this might be a really low number when the lowmem is congested already). Do you see the same problem when enabling highmem_is_dirtyable = 1?
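To put the ~160MB figure above in context: it follows from simple arithmetic, assuming roughly 40 bytes per struct page (the exact size depends on kernel configuration):

  # 16 GB of RAM in 4 KB pages, ~40 bytes of struct page each:
  echo $(( (16 * 1024 * 1024 * 1024 / 4096) * 40 / (1024 * 1024) ))   # -> 160 (MB)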
On 23/06/2017 02:38 PM, Michal Hocko wrote:
> This means that highmem is not dirtyable, so only 20% of the free
> lowmem (+ page cache in that region) is considered, and writers might
> get throttled quite early (this might be a really low number when the
> lowmem is congested already). Do you see the same problem when enabling
> highmem_is_dirtyable = 1?

Excellent advice! :)
Indeed, setting highmem_is_dirtyable=1 completely eliminates the issue!

Is that something that should be =1 by default, i.e. should I notify the Ubuntu developers that the defaults they ship aren't appropriate, or is it something that only 16+ GB RAM owners should adjust in their local configuration?

Thanks a lot!

Results of 2 test runs, with highmem_is_dirtyable=0 and 1:

1) echo 0 > highmem_is_dirtyable:
-----------------------------------------------------------------------------
16.04.2 LTS (Xenial Xerus), 4.8.0-56-generic, i386, RAM=16292548
-----------------------------------------------------------------------------
Copying /lib to 1: 21.47
Copying 1 to 2: 5.54
Copying 2 to 3: 6.63
Copying 3 to 4: 4.69
Copying 4 to 5: 5.38
Copying 5 to 6: 8.50
Copying 6 to 7: 9.34
Copying 7 to 8: 8.78
Copying 8 to 9: 9.48
Copying 9 to 10: 10.89
Copying 10 to 11: 10.52
Copying 11 to 12: 11.28
Copying 12 to 13: 14.70
Copying 13 to 14: 17.71
Copying 14 to 15: 52.43
Copying 15 to 16: 92.52
...

2) echo 1 > highmem_is_dirtyable:
-----------------------------------------------------------------------------
16.04.2 LTS (Xenial Xerus), 4.8.0-56-generic, i386, RAM=16292548
-----------------------------------------------------------------------------
Copying /lib to 1: 18.60
Copying 1 to 2: 6.09
Copying 2 to 3: 6.04
Copying 3 to 4: 7.04
Copying 4 to 5: 6.28
Copying 5 to 6: 5.03
Copying 6 to 7: 6.50
Copying 7 to 8: 4.82
Copying 8 to 9: 5.49
Copying 9 to 10: 5.88
Copying 10 to 11: 5.09
Copying 11 to 12: 5.70
Copying 12 to 13: 5.19
Copying 13 to 14: 4.55
Copying 14 to 15: 4.69
Copying 15 to 16: 4.76
Copying 16 to 17: 5.38
Copying 17 to 18: 4.59
Copying 18 to 19: 4.26
Copying 19 to 20: 4.47
Copying 20 to 21: 4.32
Copying 21 to 22: 4.33
Copying 22 to 23: 5.55
Copying 23 to 24: 4.73
Copying 24 to 25: 4.80
Copying 25 to 26: 5.06
Copying 26 to 27: 16.84
Copying 27 to 28: 5.28
Copying 28 to 29: 5.45
Copying 29 to 30: 12.35
Copying 30 to 31: 5.90
Copying 31 to 32: 4.90
Copying 32 to 33: 4.76
Copying 33 to 34: 4.37
Copying 34 to 35: 5.82
Copying 35 to 36: 4.55
Copying 36 to 37: 8.80
Copying 37 to 38: 5.07
Copying 38 to 39: 5.69
Copying 39 to 40: 4.88
Copying 40 to 41: 5.26
Copying 41 to 42: 4.69
Copying 42 to 43: 5.10
Copying 43 to 44: 4.79
Copying 44 to 45: 4.54
Copying 45 to 46: 7.46
Copying 46 to 47: 5.54
Copying 47 to 48: 4.86
Copying 48 to 49: 6.12
Copying 49 to 50: 5.37
Copying 50 to 51: 7.63
Copying 51 to 52: 6.37
Copying 52 to 53: 5.81
...
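For anyone applying the workaround, toggling the setting at runtime and persisting it across reboots is the usual sysctl mechanics, nothing specific to this bug:

  sysctl -w vm.highmem_is_dirtyable=1                       # immediate
  echo 'vm.highmem_is_dirtyable = 1' >> /etc/sysctl.conf    # persistent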
On Mon 26-06-17 08:28:07, Alkis Georgopoulos wrote:
> On 23/06/2017 02:38 PM, Michal Hocko wrote:
[...]
> Excellent advice! :)
> Indeed, setting highmem_is_dirtyable=1 completely eliminates the issue!
>
> Is that something that should be =1 by default,

Unfortunately, this is not something that can be applied in general. It can lead to premature OOM killer invocations. E.g. a direct write to a block device cannot use highmem, yet there won't be anything to throttle those writes properly. Unfortunately, our documentation is silent about this setting. I will post a patch later.
On 26/06/2017 08:46 AM, Michal Hocko wrote:
> Unfortunately, this is not something that can be applied in general.
> It can lead to premature OOM killer invocations. E.g. a direct write
> to a block device cannot use highmem, yet there won't be anything to
> throttle those writes properly. Unfortunately, our documentation is
> silent about this setting. I will post a patch later.

I should also note that highmem_is_dirtyable was 0 in all the 3.x kernel tests that I did; yet they didn't have the "slow disk writes" issue.

I.e. I think that setting highmem_is_dirtyable=1 works around the issue, but is not the exact point that caused the regression we see in 4.x kernels...

--
Kind regards,
Alkis Georgopoulos
On Mon 26-06-17 10:02:23, Alkis Georgopoulos wrote:
> On 26/06/2017 08:46 AM, Michal Hocko wrote:
[...]
> I should also note that highmem_is_dirtyable was 0 in all the 3.x kernel
> tests that I did; yet they didn't have the "slow disk writes" issue.

Yes, this is possible. There were some changes in the dirty memory throttling that could lead to visible behavior changes. I remember that ab8fabd46f81 ("mm: exclude reserved pages from dirtyable memory") had a noticeable effect. The patch is something that we really want, and it is unfortunate that it has eaten some more of the dirtyable lowmem.

> I.e. I think that setting highmem_is_dirtyable=1 works around the issue, but
> is not the exact point that caused the regression we see in 4.x kernels...

Yes, as I've said, this is a workaround for something that is an inherent 32b lowmem/highmem issue.
I've been working on a system with highmem_is_dirtyable=1 for a couple of hours.

While the disk benchmark showed no performance hit under intense disk activity, there are other serious problems that make this workaround unusable.

I.e. when there's intense disk activity, the mouse cursor moves with extreme lag, like 1-2 fps. Switching with alt+tab from e.g. thunderbird to pidgin needs 10 seconds. kswapd hits 100% CPU usage. Etc etc, the system becomes unusable until the disk activity settles down. I was testing via SSH before, so I hadn't noticed the extreme lag.

All those symptoms go away when resetting highmem_is_dirtyable=0.

So currently 32-bit installations with 16 GB RAM have no option but to remove the extra RAM...

About ab8fabd46f81 ("mm: exclude reserved pages from dirtyable memory"), would it make sense for me to compile a kernel and test if everything works fine without it? I.e. if we see that this caused all those regressions, would it be revisited?

And an unrelated idea: is there any way to tell Linux to use a limited amount of RAM for the page cache, e.g. only 1 GB?

Kind regards,
Alkis Georgopoulos
On Thu 29-06-17 09:14:55, Alkis Georgopoulos wrote:
[...]
> So currently 32-bit installations with 16 GB RAM have no option but to
> remove the extra RAM...

Or simply install a 64b kernel. You can keep the 32b userspace if you need it, but running a 32b kernel will always be a fight.

> About ab8fabd46f81 ("mm: exclude reserved pages from dirtyable memory"),
> would it make sense for me to compile a kernel and test if everything
> works fine without it? I.e. if we see that this caused all those
> regressions, would it be revisited?

The patch makes a lot of sense in general. I do not think we will revert it based on a configuration which is rare. We might come up with some tweaks in the dirty memory throttling, but that area is quite tricky already. You can of course test without this commit applied (I believe you would have to go and check out ab8fabd46f81 and revert the commit, because a later revert sounds more complicated to me. I might be wrong here because I haven't tried it myself though).

> And an unrelated idea: is there any way to tell Linux to use a limited
> amount of RAM for the page cache, e.g. only 1 GB?

No.
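A rough sketch of the test described above, untested per Michal's own caveat (the commit hash is the one from the thread; branch name is arbitrary):

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  cd linux
  git checkout -b test-dirtyable ab8fabd46f81   # the commit in question
  git revert --no-edit ab8fabd46f81             # back it out on top of itself
  # then configure, build and boot this kernel as usual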
On 29/06/2017 10:16 AM, Michal Hocko wrote:
>
> Or simply install a 64b kernel. You can keep the 32b userspace if you need
> it, but running a 32b kernel will always be a fight.

Results with a 64-bit kernel on a 32-bit userspace:

16.04.2 LTS (Xenial Xerus), 4.4.0-83-generic, i386, RAM=16131400
Copying /lib to 1: 27.00
Copying 1 to 2: 9.37
Copying 2 to 3: 8.80
Copying 3 to 4: 9.13
Copying 4 to 5: 9.25
Copying 5 to 6: 8.08
Copying 6 to 7: 8.00
Copying 7 to 8: 8.85
Copying 8 to 9: 8.67
Copying 9 to 10: 8.55
Copying 10 to 11: 8.67
Copying 11 to 12: 8.15
Copying 12 to 13: 7.57
Copying 13 to 14: 8.05
Copying 14 to 15: 8.22
Copying 15 to 16: 8.35
Copying 16 to 17: 8.50
Copying 17 to 18: 8.30
Copying 18 to 19: 7.97
Copying 19 to 20: 7.81
Copying 20 to 21: 7.11
Copying 21 to 22: 8.20
Copying 22 to 23: 7.54
Copying 23 to 24: 7.96
Copying 24 to 25: 8.04
Copying 25 to 26: 7.87
Copying 26 to 27: 7.70
Copying 27 to 28: 8.33
Copying 28 to 29: 6.88
Copying 29 to 30: 7.18

It doesn't have the 32-bit slowness issue, and it's "only" 2 times slower than the full 64-bit installation (so maybe there's an additional delay involved somewhere in userspace)...

...but it's also hard to set up (e.g. Ubuntu doesn't allow the 4.8 32-bit kernel to coexist with the 4.8 64-bit one because they have the same file names, so the 64-bit kernel needs to be 4.4), and it doesn't run some applications, e.g. VirtualBox or the proprietary nvidia drivers...

Thank you very much for your continuous input on this; we'll see what we can do to locally avoid the issue, probably just tell sysadmins to avoid using -pae with more than 8 GB RAM.
Greetings,

I'd like to point out that this bug seems identical to the one I reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=110031

While setting vm.highmem_is_dirtyable=1 is indeed a workaround, it is in no way the reason for this bug, which got introduced in Linux v4.2.0 (v4.0.x and v4.1.x work perfectly fine in this respect, and with vm.highmem_is_dirtyable=0, so it's not related to any change between Linux v3 and Linux v4).

Another notable fact is that even 64-bit kernels were affected by this bug until Linux v4.8.4 came out, at which point the 64-bit compiled kernel worked fine again for me, while the 32-bit one kept crawling after about 4 GB of disk writes had occurred.

Note also that this bug seems glibc-version-dependent, since one Linux distro (Rosa 2012 64-bit, a Mandriva fork) was affected while another (PCLinuxOS, also a 64-bit distro forked from Mandriva) was not on my system (with the same kernel configuration).

Definitely, there's something fishy in one of the v4.2 commits that badly broke disk writes and/or the disk caching code.

I am extremely worried that after v4.1 becomes unmaintained, people who, like me, need to run some old 32-bit systems on their computer (e.g. to compile 32-bit binaries compatible with old systems) will be left without a solution other than running an outdated kernel with potential security holes.

Please, pretty please, do not give up on fixing this showstopper bug...
I found a very nice summary of the problem, along with all the known workarounds, here:
http://flaterco.com/kb/PAE_slowdown.html

The author says that recompiling the kernel with this option on solved the issue for him:
VMSPLIT_2G: Processor type and features → Memory split = 2G/2G user/kernel split (was 3G/1G user/kernel split)

I wonder, since 32-bit kernels with 8+ GB RAM are unusable anyway (as is also mentioned in Documentation/vm/highmem.txt), would it be possible to change that setting upstream in the kernel, so that it enables itself at runtime if it detects more than 8 GB RAM?
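In .config terms, the change the author describes amounts to the following fragment (symbols as they exist for x86-32; shown here as a sketch of the relevant lines):

  CONFIG_VMSPLIT_2G=y
  # CONFIG_VMSPLIT_3G is not set
  CONFIG_PAGE_OFFSET=0x80000000

Note that the split is a compile-time choice (it fixes the kernel's virtual base address), which is why it cannot simply be flipped at runtime.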
(In reply to Alkis Georgopoulos from comment #14)
> I found a very nice summary of the problem, along with all the known
> workarounds, here:
> http://flaterco.com/kb/PAE_slowdown.html
>
> The author says that recompiling the kernel with this option on solved the
> issue for him:
> VMSPLIT_2G: Processor type and features → Memory split = 2G/2G user/kernel
> split (was 3G/1G user/kernel split)

The problem is that this split causes many memory-hungry applications to become unusable, since they are already a very tight fit with 3 GB available for their virtual address space. So, again, such a workaround is totally unsuitable.

No, the only solution is to find the commit which is the culprit for that *regression* which occurred between v4.1 and v4.2.0, and to plainly and simply revert it.
Well, if my following thoughts are correct:
1) No one will work on pinpointing the "commit which is the culprit". Or, it can't be pinpointed because it's not a single commit but a collection of essential commits that need more and more low memory,
2) Setting Memory split = 2G/2G does solve the issue,
3) And that causes a regression with apps that need more than 2 GB of address space,
then, personally, I prefer that to what we currently have.

I believe the impact now is far more grave than some rare application that would need more than 2 GB under i386.
(In reply to Alkis Georgopoulos from comment #16)
> Well, if my following thoughts are correct:
> 1) No one will work on pinpointing the "commit which is the culprit". Or, it
> can't be pinpointed because it's not a single commit but a collection of
> essential commits that need more and more low memory,

I'm no Linux kernel expert, but in addition to the fact that the bug was introduced in v4.2.0, there's also the fact that this bug did affect 64-bit kernels for me (on a Rosa 2012 64-bit installation, on the same computer with 32 GB of RAM) *and* that it got fixed for 64-bit kernels in Linux v4.8.4 (see my comment dated 2017-07-21 10:04:15 UTC)... This should help someone knowledgeable in this field of expertise to dramatically narrow down the potential causes and bogus code changes for this bug.

> 2) Setting Memory split = 2G/2G does solve the issue,

No, it's just a workaround.

> 3) And that causes a regression with apps that need more than 2 GB of
> address space,
> then, personally, I prefer that to what we currently have.
>
> I believe the impact now is far more grave than some rare application that
> would need more than 2 GB under i386.

Please, do not judge from your particular needs... It may be suitable for *your* needs, but it is not for mine.
Bug still present for 32-bit kernels in v4.13.2.
Bug still present for 32-bit kernels in v4.14.2.
Bug still present for 32-bit kernels in v4.16.3, and with the "end of life" of v4.1 getting close, I'm worried I will be left without any option to run a maintained Linux kernel on 32-bit machines with 16 GB of memory or more...
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

https://bugzilla.kernel.org/show_bug.cgi?id=196157

People are still hurting from this. It does seem a pretty major regression for highmem machines.

I'm surprised that we aren't hearing about this from distros. Maybe it only affects a subset of highmem machines?

Anyway, can we please take another look at it? Seems that we messed up highmem dirty pagecache handling in the 4.2 timeframe.

Thanks.
On Thu, 4/19/18, Andrew Morton <akpm@linux-foundation.org> wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> https://bugzilla.kernel.org/show_bug.cgi?id=196157
>
> People are still hurting from this. It does seem a pretty major
> regression for highmem machines.
>
> I'm surprised that we aren't hearing about this from distros. Maybe it
> only affects a subset of highmem machines?

Supposition: could it be that it only affects distros with a given glibc version (my affected machines run glibc v2.13)?

Please also take note that I encountered this bug on the 64-bit flavor of the same distro (Rosa 2012), on 64-bit capable machines, with Linux v4.2+ and until Linux v4.8.4 was released (and another interesting fact is that another 64-bit distro on the same machines was not affected at all by this bug, which would reinforce my suspicion about a glibc-triggered and glibc-version-dependent bug).

> Anyway, can we please take another look at it? Seems that we messed up
> highmem dirty pagecache handling in the 4.2 timeframe.

Oh, yes, please, do have a look! :-D

In the meantime, could you guys also consider extending the lifetime of the v4.1 kernel until this ***showstopper*** bug is resolved in the mainline kernel?

Many (many, many, many) thanks in advance!
Bug still present in v4.17.0, and with v4.1.52 now being marked "end of life", I will be left without any upgrade path... Pretty please, consider keeping v4.1 alive till this *showstopper* bug gets solved!
Bug still present for 32-bit kernels in v4.18.1, and now v4.1 (the last working Linux kernel for 32-bit machines with 16 GB or more RAM) has gone unmaintained...
On Fri 17-08-18 09:01:41, Thierry wrote:
> Bug still present for 32-bit kernels in v4.18.1, and now v4.1 (the last
> working Linux kernel for 32-bit machines with 16 GB or more RAM) has
> gone unmaintained...

Have you tried setting highmem_is_dirtyable as suggested elsewhere?

I would like to stress that 16GB with 32b kernels doesn't play really nice. Even small changes (a larger kernel memory footprint) can lead to all sorts of problems. I would really recommend using 64b kernels instead. There shouldn't be any real reason to stick with a 32b highmem-based kernel for such a large beast. I strongly doubt the CPU itself would be 32b only.
On Fri, 8/17/18, Michal Hocko <mhocko@kernel.org> wrote:

> Have you tried setting highmem_is_dirtyable as suggested elsewhere?

I tried everything, and yes, that too, to no avail. The only solution is to limit the available RAM to less than 12 GB, which is just unacceptable for me.

> I would like to stress that 16GB with 32b kernels doesn't play really nice.

I would like to stress that 32 GB of RAM played totally nice and very smoothly with v4.1 and older kernels... This got broken in v4.2 and has never been repaired since. This is a very nasty regression, and my suggestion to keep v4.1 maintained till the regression finally gets worked around fell on deaf ears...

> Even small changes (a larger kernel memory footprint) can lead to all sorts of
> problems. I would really recommend using 64b kernels instead. There shouldn't be
> any real reason to stick with a 32b highmem-based kernel for such a large beast.
> I strongly doubt the CPU itself would be 32b only.

The reasons are many (one of them being able to run old 32-bit Linux distros, but without the bugs and security flaws of old, unmaintained kernels). But the reasons are not the problem here. The problem is that v4.2 introduced a bug (*) that has never been fixed since.

A shame, really. :-(

(*) and that bug also affected 64-bit kernels at first, mind you, till v4.8.4 got released; see my comment in my initial report here:
https://bugzilla.kernel.org/show_bug.cgi?id=110031#c14
On Fri 17-08-18 11:29:45, Thierry wrote:
> On Fri, 8/17/18, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> But the reasons are not the problem here. The problem is that v4.2
> introduced a bug (*) that has never been fixed since.
>
> A shame, really. :-(

Well, I guess nobody is disputing that this is really annoying. I do agree! On the other hand, nobody has come up with an acceptable solution. I would love to dive into solving this, but there are so many other things to work on with much higher priority. Really, my todo list is huge and growing. 32b kernels with that much memory are simply not all that high on that list, because there is a clear possibility of running a 64b kernel on hardware which supports it. I fully understand your frustration and feel sorry about that, but there are only so many of us working on this subsystem. If you are willing to dive into this, then by all means; I am pretty sure you will find help and support, but be warned this is not really trivial.

good luck
Created attachment 279043 [details]
Disk read/write speed

Disk (write) speed is normal until midnight, when the updatedb/file integrity check runs.

Created attachment 279045 [details]
Disk operations per second

Disk operations per second show (only) a slight decrease after updatedb.

Created attachment 279047 [details]
Disk merged operations per second

Disk merged operations seem to fail after updatedb.
Just want to comment on this... I'm testing a 4.14.71 32-bit kernel with 12GB of RAM on an Intel Xeon E5620 (2.4GHz) with 8 cores/threads.

I run a cp -a/rm of a /lib directory to tmp on the same filesystem in a loop. Everything runs smoothly until lots of inodes/dentries are visited on the hard disk due to the updatedb/file integrity check at midnight. It is depicted quite clearly in the images I attached. For some reason the merged disk operations seem to fail after that, resulting in poor write performance.

When updatedb runs at midnight, netdata shows an increase in slab memory until it tops out at a maximum of 275MB, while kernel stack is around 3MB and page cache is around 17MB. The slab memory shows an unreclaimable part of 47MB (constant) and a reclaimable part of 228MB (dynamic).

When I flush the dentry/inode cache, the system goes back to normal behavior:

echo 2 > /proc/sys/vm/drop_caches

During the slowdown, free -lm shows a decrease of available low memory, as if memory were being allocated constantly (and never freed) even though updatedb etc. had finished, and slab memory tops out at 275MB.

Memory at slowdown:
             total       used       free     shared    buffers     cached
Mem:         12164       6034       6130          0        272       2044
Low:           631        575         55
High:        11533       5458       6074
-/+ buffers/cache:        3716       8448
Swap:         1991          0       1991

Memory after dentry/inode cache flush:
             total       used       free     shared    buffers     cached
Mem:         12164       4036       8127          0        279        236
Low:           631        408        223
High:        11533       3628       7904
-/+ buffers/cache:        3521       8643
Swap:         1991          0       1991

Some 2GB of high memory is freed at the same time when the cache is flushed.
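A simple way to watch the suspected lowmem/slab growth while the copy loop runs (field names as they appear in /proc/meminfo; LowFree only shows up on highmem kernels):

  watch -n 60 "grep -E 'LowFree|SReclaimable|SUnreclaim|Dirty' /proc/meminfo"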
I just wanted to confirm that this bug affected me when I tried to make a backup image on an i386 kernel 4.x with only 8GB of RAM. Write speeds to SSD became as slow as ~2MB/sec, on all 3 SSDs connected to the system, even on SSDs that weren't participating in the backup. After troubleshooting for much of the night, and reproducing the bug every time, I finally found the right hint, booted amd64, and everything worked ok. The problem is that the naive expectation is that i386 should be legacy, stable, good enough - or even preferred - for a simple job like imaging a disk on a 5 year old Intel PC (Core i7-3520M). It's odd to find out that the opposite is the case.
P.S. Some more details:
- Thinkpad T430s with i7-3520M, 8GB RAM, and 3 SSDs (1x mSATA, 2x SATA)
- OS: Parted Magic 2019-05, default options (legacy BIOS boot, 32-bit distro loaded entirely to RAM)
- Started Clonezilla from within the Parted Magic desktop

Of course, running a Linux desktop entirely in RAM just to use text-based Clonezilla is rather resource-inefficient to begin with. I see the folly of that. But the idea was that I could multi-task easily that way, e.g. watch CPU loads and perhaps use gparted; no web browser or network traffic was involved. On the plus side, this method was also a great way to reproduce and analyze the bug, and if I had *not* used the graphical desktop, I probably couldn't have (easily) tested read and write rates and other performance parameters (CPU) during and after Clonezilla's work. Multi-tasking without a desktop is a little beyond my typical routine; without a desktop I would typically just use a single terminal window (root shell?).

Most revealing troubleshooting:
- Write speed troubleshooting with dd to various test files showed the slowdown
- Troubleshooting read rates with hdparm showed the read rates remained fast

I am tempted to do some more testing, such as:
- once the bug manifests itself again, check if writes to /dev/null are also slowed down, seeing as it slowed down all disk writes permanently without recovery unless rebooted.
- any other suggestions for test devices I could write to with dd to perhaps narrow down the bug?

Thanks!
Kernel: 5.1.5-pmagic64
dd writes are:
19.2 GB/s from /dev/zero to /dev/null
194 MB/s from /dev/urandom to /dev/null
182 MB/s from /dev/zero to /media/sda2/test64
97.3 MB/s from /dev/urandom to /media/sda2/test64

Kernel: 5.1.5-pmagic (i386)
dd writes are:
33.8 GB/s from /dev/zero to /dev/null
149 MB/s from /dev/urandom to /dev/null
147 MB/s from /dev/zero to /media/sda2/test32
72.3 MB/s from /dev/urandom to /media/sda2/test32

Now, after writing about 5GB with dd to /media/sda2/test32, during which the write rate apparently dropped steeply (had to cancel after 20 minutes), performance remains low as follows:
32.7 GB/s from /dev/zero to /dev/null (same)
149 MB/s from /dev/urandom to /dev/null (same)
1.0 MB/s from /dev/zero to /media/sda2/test32 (150x lower)
1.1 MB/s from /dev/urandom to /media/sda2/test32 (70x lower)
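For context, numbers like the above are typically produced with dd invocations along these lines (a sketch only; the exact block sizes and flags used in the tests were not stated):

  dd if=/dev/zero of=/media/sda2/test32 bs=1M count=5120 conv=fdatasync
  dd if=/dev/urandom of=/media/sda2/test32 bs=1M count=1024 conv=fdatasync

conv=fdatasync makes dd flush before reporting, so the printed rate reflects actual disk throughput rather than page-cache speed.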
Hello everyone, I registered only to post my findings about this bug in case it helps. I'm no expert in Linux, nor in English... :-)

Preamble: Lubuntu distro tested (also on Debian Stretch with a 4.x kernel, which I uninstalled because of this), with kernel:
Linux jerry 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
HDDs: 4 SATA drives (2 WD, 2 Seagate), surface checked and with no defects.

How to reproduce the bug (in my experience): cp a couple of terabytes from disk to disk (SATA HDDs in my case), and randomly (sometimes after 5 minutes, sometimes after an hour), the speed will drop to a crawl (1-5 MB/sec); you can check this with iotop.

My findings: with these options in /etc/sysctl.conf:

vm.dirty_background_ratio=5
vm.dirty_ratio=10
vm.dirty_bytes = 200000000
vm.swappiness=60

the copy regains speed, up to 75 MB/sec (once every 45 seconds it drops to 0, but it picks up again after some part of the buffers gets cleared; you can see this behaviour in htop, where a yellow line (buffers) disappears at the end of the memory section, and iotop shows a gain in speed right after).

But if you additionally add cron tasks like these (spaced 15 seconds apart, or 30 seconds if you want):

*/1 * * * * /bin/echo 2 > /proc/sys/vm/drop_caches
*/1 * * * * ( sleep 15; /bin/echo 2 > /proc/sys/vm/drop_caches )
*/1 * * * * ( sleep 30; /bin/echo 2 > /proc/sys/vm/drop_caches )
*/1 * * * * ( sleep 45; /bin/echo 2 > /proc/sys/vm/drop_caches )

the copy speed reaches the theoretical max speed of my HDDs (about 130 MB/sec) every now and then. So I think the problem is in the disk buffering portion of the kernel, because when the buffer is empty the copy runs at full speed, and then it slows down again when the buffer fills up.

I hope this helps to find the bug, because I'm not going back to Windows 10. The bug also seems to exist in FreeBSD... so... perhaps an open-source driver? Thanks, and sorry for my poor English; I hope this explanation conveys what I'm trying to say.

FreeBSD search for similar problems:
https://www.google.com/search?q=freebsd+slow+disk+writes&oq=freebsd+slow+disk+writes
I found a solution that works for every kernel I tried:

Download the kernel source you want from:
https://cdn.kernel.org/pub/linux/kernel/v5.x/

and compile it yourself, and... it works OK!!!!

So far I have tried kernels 4.16 and 5.4, with no problem at all.
(In reply to fernando from comment #37)
> I found a solution that works for every kernel I tried:
>
> Download the kernel source you want from:
>
> https://cdn.kernel.org/pub/linux/kernel/v5.x/
>
> and compile it yourself, and... it works OK!!!!
>
> So far I have tried kernels 4.16 and 5.4, with no problem at all.

I'm afraid the problem is NOT solved. The last kernel version I tried (v5.4.4) did see an improvement (i.e. the problem surfaces later, after a larger amount of data has been written; e.g. in my test-case compilation, it appears at 50% instead of at 25% of the total compilation), but the problem is still there, and only kernels v4.1.x and older are exempt from it.

Note that the amount of RAM you are using also impacts how fast the problem arises (or whether it arises at all). I'm using 32 GB here.