Bug 196729

Summary: System becomes unresponsive when swapping - Regression since 4.10.x
Product: Memory Management Reporter: Steven Haigh (netwiz)
Component: Page Allocator    Assignee: Andrew Morton (akpm)
Status: RESOLVED OBSOLETE    
Severity: normal CC: admin, auxsvr, bugzilla, cachapa, cfeck, code, damir.esenberlin, daniel.m.jordan, dion, dreamer.tan+kernel, dushistov, egorfedorovichletov, freepng.site, gdesmott, goodmirek, haydn.reysenbach, hi-angel, howaboutsynergy, iam, ikalvachev, iodreamify, jan, jhasse, jim, kortrax11, lander.noterman, lskrejci, lukycrociato, m.novosyolov, marc, nemesis, netwiz, russianneuromancer, taz.007, teppot, ultra10e, xjjzyx, ying.huang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.11.x / 4.12.x Subsystem:
Regression: Yes Bisected commit-id:
Attachments: vmstat-4.10.17-10Gb-noswap.log (OK - OOM running)
vmstat-4.10.17-10Gb.log (OK with swapping)
vmstat-20Gb.log (OK - all in RAM)
vmstat-10Gb.log (NOT OK - System Unresponsive)
read_vmstat.c
8Gb-noswap.tar.gz
8Gb-swap-on-file.tar.gz
8Gb-swap-on-ssd.tar.gz
signature.asc
signature.asc
log of read_vmstat when filling RAM with new tabs in Chromium

Description Steven Haigh 2017-08-22 11:17:08 UTC
I have 10Gb of RAM in this system and run Fedora 26. If I launch Cities: 
Skylines with no swap space, things run well performance wise until I get an 
OOM - and it all dies - which is expected.

When I turn on swap to /dev/sda2 which resides on an SSD, I get complete 
system freezes while swap is being accessed.

The first swap was after loading a saved game, then launching kmail in the 
background. This caused ~500Mb to be swapped to /dev/sda2 on an SSD. The 
system froze for about 8 minutes - barely being able to move the mouse. The 
HDD LED was on constantly during the entire time.

To rule out a possible glibc malloc issue, I started the game via jemalloc - 
but experienced even more severe freezes while swapping. I gave up waiting 
after 13 minutes of unresponsiveness - not even being able to move the mouse 
properly.

During these hangs, I could type into a Konsole window, and some of the 
typing took 3+ minutes to appear on the screen (yay for buffers?).

I have tested this with the default vm tunable values, as well as the 
following:
vm.swappiness = 1
vm.min_free_kbytes = 32768
vm.vfs_cache_pressure = 60

I noticed that when I do eventually get screen updates, all 8 cpus (4 cores / 
2 threads) show 100% CPU usage - and kswapd is right up there in the process 
list for CPU usage. Sadly I haven't been able to capture this information 
fully yet due to said unresponsiveness.

(more to come in comments & attachments)
Comment 1 Steven Haigh 2017-08-22 11:18:57 UTC
First - using kernel 4.10.17 - which does not show any issues in swapping:

I tried doing: swapoff /dev/sda2

Attached output as vmstat-4.10.17-10Gb-noswap.log

18:27:00 - Launched Cities: Skylines
18:27:30 - Started loading the saved game
18:28:25 - About this time the game started doing its thing. Started scrolling 
around.
18:28:47 - System stopped responding and then C:S was killed by the OOM 
killer
Comment 2 Steven Haigh 2017-08-22 11:19:41 UTC
Created attachment 258045 [details]
vmstat-4.10.17-10Gb-noswap.log (OK - OOM running)
Comment 3 Steven Haigh 2017-08-22 11:21:32 UTC
Created attachment 258047 [details]
vmstat-4.10.17-10Gb.log (OK with swapping)

Second test, same kernel with swap turned on:

I have attached the vmstat output that goes with the following timestamps for 
system utilisation:

15:32:30 - Launch Skylines
15:33:00 - Load the saved game
15:34:11 - Saved game loaded ok.
15:35:00 - Launch Chrome.
15:35:36 - Chrome launched - System responding ok.
15:36:00 - Browsing a few web sites
15:36:50 - Exit Chrome
15:37:30 - Exit Cities: Skylines.

You'll note that there are very few missing vmstat lines - however I did 
notice the following missing:
        15:35:10
        15:35:12
        15:35:15
        15:35:20
        15:35:26
        15:35:29
        15:35:30

Attachment is vmstat-4.10.17-10Gb.log
Comment 4 Steven Haigh 2017-08-22 11:24:02 UTC
Created attachment 258049 [details]
vmstat-20Gb.log (OK - all in RAM)

Now using kernel 4.11.x (same happens with 4.12.x) - and testing with 20Gb of RAM in the system - meaning no swapping.

Attached as: vmstat-20Gb.log

Timestamps of events:
21:57 - launch the game from within Steam.
21:58:00 - Load the saved game.
21:58:48 - Saved game is loaded and I'm scrolling around in the map.
22:00:00 - Hit the quit to desktop button.
22:00:31 - Am back to desktop with all RAM free again.
Comment 5 Steven Haigh 2017-08-22 11:25:59 UTC
Created attachment 258051 [details]
vmstat-10Gb.log (NOT OK - System Unresponsive)

I now drop back to 10Gb of RAM to test the swapping under 4.11.x kernel.

Log attached as vmstat-10Gb.log

Timestamps:
22:10:00 - Launched the game from within Steam
22:11:00 - Load the same saved game from the previous log
22:12:01 - Saved game is loaded and I can scroll around. Noted a slight
pause when swpd went to 256 - but otherwise all is well.
22:13:00 - Launched Google Chrome browser to make the system swap.

After this point, the whole system went to hell. You'll note many missing
vmstat entries up until around 22:22 when I managed to exit from the game
back to desktop via the normal means (and not getting annoyed and doing a
pkill from tty2).

As such, the system went nuts for ~9 minutes until I was able to exit the
game and stop the thrashing.

I note that with 20Gb RAM - as the system never touches swap, I can still
play the game, browse the web with the Chrome browser, read / write
email, and even watch a DVB-T broadcast in VLC without having any more
than a minor pause in the game for less than a second.
Comment 6 Steven Haigh 2017-08-22 11:29:50 UTC
So overall, this seems to indicate a regression between kernel 4.10.x (I'm pretty sure I tested all ok with 4.10.15?) and the newer 4.11 and 4.12 builds.

I made contact with Rik van Riel and Ying Huang (whom I will attempt to add 
to this as CCs for comment). They don't believe it is a swapping issue - 
however, Rik's view was:

> There is ZERO swap space in use.
>
> In other words, it is not actually swapping,
> but thrashing through the page cache.
>
> You may want to email the people who worked on page cache
> replacement stuff recently, and the linux-mm mailing list
> as well.
Comment 7 Andrew Morton 2017-08-22 22:56:22 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 22 Aug 2017 11:17:08 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=196729
> 
>             Bug ID: 196729
>            Summary: System becomes unresponsive when swapping - Regression
>                     since 4.10.x
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.11.x / 4.12.x
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: netwiz@crc.id.au
>         Regression: No

So it's "Regression: yes".  More info at the bugzilla link.

> [... full bug description quoted, snipped ...]
Comment 8 Michal Hocko 2017-08-23 13:54:38 UTC
Created attachment 258067 [details]
read_vmstat.c

On Tue 22-08-17 15:55:30, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 22 Aug 2017 11:17:08 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
[...]
> Sadly I haven't been able to capture this information 
> > fully yet due to said unresponsiveness.

Please try to collect /proc/vmstat in the background and provide the
collected data. Something like

while true
do
	cat /proc/vmstat > vmstat.$(date +%s)
	sleep 1s
done

If the system turns out so busy that it won't be able to fork a process
or write the output (which you will see by checking timestamps of files
and looking for holes) then you can try the attached proggy
./read_vmstat output_file timeout output_size

Note you might need to increase the mlock rlimit to lock everything into
memory.
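As a minimal sketch of the "looking for holes" step above: the loop writes one vmstat.<epoch> file per second, so any pair of adjacent timestamps more than a couple of seconds apart marks a stall. The directory and timestamps below are fabricated for illustration:

```shell
# List gaps longer than 2 seconds between vmstat.<epoch> snapshots.
# Demo data only - real runs would scan the collection directory.
dir=$(mktemp -d)
touch "$dir"/vmstat.1503496391 "$dir"/vmstat.1503496392 "$dir"/vmstat.1503496451
prev=0
for f in "$dir"/vmstat.*; do
    t=${f##*.}                      # epoch timestamp from the filename
    if [ "$prev" -ne 0 ] && [ $((t - prev)) -gt 2 ]; then
        echo "gap of $((t - prev))s before $t"
    fi
    prev=$t
done
rm -r "$dir"
```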
Comment 9 Steven Haigh 2017-08-23 14:41:38 UTC
Created attachment 258069 [details]
8Gb-noswap.tar.gz

On Wednesday, 23 August 2017 11:38:48 PM AEST Michal Hocko wrote:
> [...]
> 
> If the system turns out so busy that it won't be able to fork a process
> or write the output (which you will see by checking timestamps of files
> and looking for holes) then you can try the attached proggy
> ./read_vmstat output_file timeout output_size
> 
> Note you might need to increase the mlock rlimit to lock everything into
> memory.

Thanks Michal,

I have upgraded PCs since I initially put together this data - however I was 
able to get strange behaviour by pulling out an 8Gb RAM stick in my new system 
- leaving it with only 8Gb of RAM.

All these tests are performed with Fedora 26 and kernel 4.12.8-300.fc26.x86_64

I have attached 3 files with output.

8Gb-noswap.tar.gz contains the output of /proc/vmstat running on 8Gb of RAM 
with no swap. Under this scenario, I was expecting the OOM reaper to just kill 
the game when memory allocated became too high for the amount of physical RAM. 
Interestingly, you'll notice a massive hang in the output before the game is 
terminated. I didn't see this before.

8Gb-swap-on-file.tar.gz contains the output of /proc/vmstat, still with 8Gb of 
RAM - but with swap on an 8Gb file (/swapfile) on the PCIe SSD, created via:
	# dd if=/dev/zero of=/swapfile bs=1G count=8
	# mkswap /swapfile
	# swapon /swapfile

Some times (all in UTC+10):
23:58:30 - Start loading the saved game
23:59:38 - Load ok, all running fine
00:00:15 - Load Chrome
00:01:00 - Quit the game

The game seemed to run ok with no real issue - and a lot was swapped to the 
swap file. I'm wondering if it was purely the speed of the PCIe SSD that 
caused this appearance - as the creation of the file with dd completed at 
~1.4GB/sec.

8Gb-swap-on-ssd.tar.gz contains the results of adding a 32Gb SATA-based SSD to 
the system and using the entire block device as swap via:
	# mkswap -f /dev/sda
	# swapon /dev/sda

There were many pauses and periods of unresponsiveness while this was loading - 
however we eventually got there.

Some timings (all in UTC+10 again):
00:06:33 - Load the saved game
00:11:22 - Saved game loaded - somewhat responsive
00:12:00 - Load Chrome
00:13:07 - Quit the game + chrome

For the sake of information, the following is a speed test on the SSD in 
question:
# dd if=/dev/zero of=/dev/sda bs=1M count=8192 conv=fsync
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 44.923 s, 191 MB/s
# dd if=/dev/sda of=/dev/null bs=1M count=8192 conv=fsync
dd: fsync failed for '/dev/null': Invalid argument
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 30.7414 s, 279 MB/s

Running the game on the exact same system with 16Gb of RAM and no swap works 
perfectly - even with multitasking - as we never end up filling physical RAM.

As there is some data missing though, should I still attempt to compile + run 
the program provided? I'm not quite clear on the mlock rlimit mention - I 
haven't really had to debug anything like this before.
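On the mlock rlimit mentioned above: read_vmstat presumably mlock()s its buffers, so the per-process RLIMIT_MEMLOCK has to be large enough. A hedged sketch (raising the limit needs root or a matching limits.conf entry; the read_vmstat invocation is the one Michal gave):

```shell
# Show the current per-process mlock limit (KiB, or "unlimited"):
ulimit -l
# As root (or with a raised 'memlock' entry in /etc/security/limits.conf):
#   ulimit -l unlimited
#   ./read_vmstat output_file timeout output_size
```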
Comment 10 Steven Haigh 2017-08-23 14:41:39 UTC
Created attachment 258071 [details]
8Gb-swap-on-file.tar.gz
Comment 11 Steven Haigh 2017-08-23 14:41:39 UTC
Created attachment 258073 [details]
8Gb-swap-on-ssd.tar.gz
Comment 12 Steven Haigh 2017-08-23 14:41:39 UTC
Created attachment 258075 [details]
signature.asc
Comment 13 Michal Hocko 2017-08-24 12:41:44 UTC
On Thu 24-08-17 00:30:40, Steven Haigh wrote:
> [...]
> 
> 8Gb-noswap.tar.gz contains the output of /proc/vmstat running on 8Gb of RAM 
> with no swap. Under this scenario, I was expecting the OOM reaper to just 
> kill the game when memory allocated became too high for the amount of 
> physical RAM. Interestingly, you'll notice a massive hang in the output 
> before the game is terminated. I didn't see this before.

I have checked a few gaps, e.g. vmstat.1503496391 vs vmstat.1503496451, which
are one minute apart. The most notable thing is that there are only very few
pagecache pages
			[base]		[diff]
nr_active_file  	1641    	3345
nr_inactive_file        1630    	4787

So there is not much to reclaim without swap. The more important thing
is that we keep reclaiming and refaulting that memory

workingset_activate     5905591 	1616391
workingset_refault      33412538        10302135
pgactivate      	42279686        13219593
pgdeactivate    	48175757        14833350

pgscan_kswapd   	379431778       126407849
pgsteal_kswapd  	49751559        13322930

so we are effectively thrashing over the very small amount of
reclaimable memory. This is something that we cannot detect right now.
It is even questionable whether the OOM killer would be an appropriate
action. Your system did recover, and then it is always hard to decide
whether a disruptive action is more appropriate. One minute of
unresponsiveness is certainly annoying though. Your system is obviously
under-provisioned for the load you want to run.

It is quite interesting to see that we do not really have too many
direct reclaimers during this time period
allocstall_normal       30      	1
allocstall_movable      490     	88
pgscan_direct_throttle  0       	0
pgsteal_direct  	24434   	4069
pgscan_direct   	38678   	5868
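For reference, [base]/[diff] deltas like the tables above can be derived from any two collected snapshots with a short awk sketch (the snapshot files and counter values below are fabricated for the demo):

```shell
# Diff two /proc/vmstat snapshots: print counter, base value, and delta.
dir=$(mktemp -d)
printf 'pgscan_kswapd 100\npgsteal_kswapd 40\n' > "$dir/vmstat.base"
printf 'pgscan_kswapd 350\npgsteal_kswapd 90\n' > "$dir/vmstat.later"
# First pass stores base values; second pass prints base and difference.
awk 'NR==FNR { base[$1]=$2; next }
     { printf "%-16s %12d %12d\n", $1, base[$1], $2 - base[$1] }' \
    "$dir/vmstat.base" "$dir/vmstat.later"
rm -r "$dir"
```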
 
> 8Gb-swap-on-file.tar.gz contains the output of /proc/vmstat still with 8Gb of 
> RAM - but creating a file with swap on the PCIe SSD /swapfile with size 8Gb 
> via:
>       # dd if=/dev/zero of=/swapfile bs=1G count=8
>       # mkswap /swapfile
>       # swapon /swapfile
> 
> Some times (all in UTC+10):
> 23:58:30 - Start loading the saved game
> 23:59:38 - Load ok, all running fine
> 00:00:15 - Load Chrome
> 00:01:00 - Quit the game
> 
> The game seemed to run ok with no real issue - and a lot was swapped to the 
> swap file. I'm wondering if it was purely the speed of the PCIe SSD that 
> caused this appearance - as the creation of the file with dd completed at 
> ~1.4GB/sec.

Swap IO tends to be really scattered and its performance is not great
even on fast storage AFAIK.
 
Anyway your original report sounded like a regression. Were you able to
run the _same_ workload on an older kernel without these issues?
Comment 14 Steven Haigh 2017-08-24 14:20:08 UTC
Created attachment 258079 [details]
signature.asc

On Thursday, 24 August 2017 10:41:39 PM AEST Michal Hocko wrote:
> [...]
> 
> One minute of unresponsiveness is certainly annoying though. Your system
> is obviously under-provisioned for the load you want to run.

Yes, I understand that the system is really not suitable - however I believe 
the test is useful - even from an informational point of view :)

> > [...]
> 
> Swap IO tends to be really scattered and the IO performance is not really
> great even on a fast storage AFAIK.
> 
> Anyway your original report sounded like a regression. Were you able to
> run the _same_ workload on an older kernel without these issues?

When I tried the same tests with swap on an SSD under kernel 4.10.x (I believe 
the latest I tried was 4.10.25?), swap on the SSD did not cause any issues 
or periods of system unresponsiveness.

The file attached in the original bug report "vmstat-4.10.17-10Gb.log" was 
taken on my old system with 10Gb of RAM - and there were no significant pauses 
while swapping.

I do find it interesting that the newer '8Gb-swap-on-file.tar.gz' does not 
show any issues. I wonder if it would be helpful to attempt the same using a 
file on the SSD that was a swap disk in the '8Gb-swap-on-ssd.tar.gz' so we 
have a constant device - but with a file on the SSD instead of the entire 
block device. That would at least expose any issues on the same device in file 
vs block mode? Or maybe even if there's a difference just having the file on a 
much (much!) faster drive?
Comment 15 Huang Ying 2017-08-28 07:14:14 UTC
Comparing

a) vmstat-4.10.17-10Gb.log (OK with swapping) and
b) vmstat-10Gb.log (NOT OK - System Unresponsive):

si/so is low in both files, and si/so in a) is higher than in b), so the problem may be that we swap less than before?

bi stays high in b). I guess we encountered thrashing of file pages.
Comment 16 Steven Haigh 2017-11-27 02:37:31 UTC
To give this a bit of a nudge, I've been seeing reports of others having similar issues. See:
https://www.reddit.com/r/Fedora/comments/7f0dht/system_freezes_for_45min_in_lowmemory_conditions/

Also lodged on the RH BZ a while ago:
https://bugzilla.redhat.com/show_bug.cgi?id=1472336
Comment 17 Daniel 2018-04-02 01:47:19 UTC
Hi,

I’ve experienced what I believe is the same problem. The problem has gone away completely for me after I bumped vm.min_free_kbytes way up to 393216.

As soon as the system ran out of physical memory, the system would freeze for at least 2 minutes and often up to 45 minutes. GNOME desktop would stop. I could move the mouse cursor, and ping the system from a remote computer; but not connect over SSH or do anything other than wave the mouse about. The system clock on the top of GNOME would stop updating for 45 minutes. (Maaaybe it would move forward 1 minute after 20 minutes and still be 19 minutes out of sync.)

I've been having this issue for years on multiple different computer configurations with 8+ GiB of memory and large swap partitions. I never saw more than maybe 5 MiB in use on the swap partition. After tuning min_free_kbytes, the swap partition is now being used properly and the system only does the occasional (and expected) 1-second stutter when running low on physical memory.
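For reference, a config fragment (a sketch; the path and filename are my assumption) that would persist a vm.min_free_kbytes value like the one above across reboots:

```
# /etc/sysctl.d/99-min-free-kbytes.conf
# 393216 KiB = 384 MiB reserved - the value from this comment,
# not a general recommendation; tune for your own RAM size.
vm.min_free_kbytes = 393216
```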

I also run Fedora and have kept up with the latest stable release.

Aside 1: The issue would persist with swap off, just like Steven Haigh describes.
Aside 2: The problem happened much more frequently when I used Btrfs. After switching to XFS, it happens less frequently (weekly instead of daily).
Comment 18 lou 2018-05-07 00:49:26 UTC
Please refer also to this bug report. It is the same problem and has existed for eleven (11!) years if one can believe that.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356

I personally experience this on two 4GB laptops running live versions (no swap) of Debian 8.6 - 9.4, Fedora 25 - 28, and Ubuntu, with a myriad of desktop shells: Gnome, Mate, Cinnamon, KDE, etc. (One laptop I expanded to 8GB now, but it doesn't matter - it just takes a little longer to freeze the system.)

Various browsers, from Firefox 52 to the current Developer 60, plus Chrome and Chromium.

Certain combos eat up memory faster (Gnome has a memory leak, for example, in which it consumes memory for every window drawn and NEVER relinquishes that memory without a restart of gnome-shell); new Firefox or Chrom* eat up memory much more quickly than older ESR versions (54 and under) of FF.

Under the best combo/circumstances, I can open 25-30 FF tabs before the system SUDDENLY SEIZES (observe the "USB live stick" light flashing non-stop, as if swapping, even though there is no swap on live versions). If I don't catch it within literal seconds and Ctrl-Alt-F5 to an open root console where I can kill the FF process and save the "live" session, the computer is entirely unresponsive and requires power cycling.

In rare instances, tens of minutes or even hours (4, 8, 12) later, the system *might* finally respond to the request to drop to the console. Keystrokes to issue the kill command can take minutes per key, but when successful, I've seen the load reported after the kill as high as 75.

Truly amazing.


It's difficult to fathom that this critical a bug in memory management has gone un-addressed/un-noticed for so long, but alas. I can't recall where, but I've read this behavior ONLY occurs on 64-bit kernels and is un-reproducible on 32-bit kernels.

Also, on non-"live" installs with swap configured, one can watch the hard drive light come on and remain solid to the same effect. Power cycle time. I've read from others that they've determined swap isn't even really being used, so I'm not sure what the thrashing going on is (and it must be read thrashing, because on live versions there's no swap and the USB drive light is steadily active as well).

I just run Linux to not run Windows. Basic browsing, text document editing, file management, a few cli's and an instant messaging program typically opened simultaneously. Nothing computationally heavy but memory intensive (at least for the web browser) for sure.

STILL - it's difficult to believe the OS cannot handle this situation with some sort of message, or by killing a window / throwing an error about an opened Firefox tab or something - rather, it simply fills up the memory (I watch on gnome-system-monitor's Resources tab now) to 99% and then it's too late.

I really don't know technically how the Out Of Memory killer works/is supposed to work, but it sure isn't doing anything here.
Comment 19 SlayerProof32 2018-05-18 01:41:55 UTC
I experience this too. I've tested kernels 4.17 rc8, 4.16.8, 4.12.8, and 4.14 across Manjaro Linux, Ubuntu Linux, openSUSE Leap 15, and Fedora 28.

Steps to trigger:
-Open firefox with many tabs, or any other high memory usage program
-Wait a second
-System freezes. Sometimes the only fix is a hard reboot

Other findings:
-I notice really high cpu load averages if the system unfreezes
-If the system is not frozen, it is highly unresponsive on high memory usage when swapping
-Hard drive indicator light stays solidly on when system is frozen (excessive hard disk use)
-The reason the system freezes is because it is swapping 

Tested on:
-Intel i5-520M with 4GB RAM / 4GB swap (Lenovo T410)
-Intel E6400 with 3GB RAM / 3GB swap

This bug is really hard to deal with because it usually requires a hard restart. Please fix ASAP if possible
Comment 20 SlayerProof32 2018-05-18 01:44:50 UTC
My fedora report here : https://bugzilla.redhat.com/show_bug.cgi?id=1577528
Comment 21 SlayerProof32 2018-05-18 01:51:18 UTC
If someone gives me some debugging instructions, I'll gladly follow them.
Comment 22 SlayerProof32 2018-05-19 01:36:26 UTC
https://askubuntu.com/questions/41778/computer-freezing-on-almost-full-ram-possibly-disk-cache-problem

https://unix.stackexchange.com/questions/28175/system-hanging-when-it-runs-out-of-memory

https://bbs.archlinux.org/viewtopic.php?id=231087

More than 5 (non-bug-tracker) websites report this bug, with recent posts up to 2018. This memory issue is FATAL to people who rely on Linux for gaming/working. Please don't let this issue go unnoticed for 11 more years. It is critical to system function that memory management is good. The issue actually doesn't only occur when swapping - swapping, however, greatly aggravates this bug.
Comment 23 lou 2018-05-19 20:32:29 UTC
@SlayerProof32 #19
> -The reason the system freezes is because it is swapping 

I don't think this is the case, at least not in the traditional sense. If you read my comment above yours: I'm running live versions of Linux where no swap is configured.

I suppose it could be paging portions of code into and out of regular memory, but there are no hard disk writes. On my USB sticks, I note the light comes on steady when the system freezes. It's reading (something).

I also noted the bug doesn't exhibit in 32-bit systems (I'd read previously somewhere).

Eleven years.

Shocking.
Comment 24 SlayerProof32 2018-05-19 21:45:20 UTC
@lou #20

I have confirmed that swapping is not the issue and you are correct.
On my Linux system, I tried swapoff -a and, to no surprise, it still crashed on high memory usage, and there was still excessive drive usage. Thanks for the insight.
Comment 25 SlayerProof32 2018-05-19 21:45:55 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=199763
I filed another report.
Comment 26 Ivan Kalvachev 2018-06-18 17:01:18 UTC
I'm just a user, haunted by similar issues.

1. If you have zswap compiled and enabled, try to disable it.
ZSwap uses some RAM as a pool for compressed pages, e.g. it reserves 500MB with the idea that it could fit 1000MB of swapped pages in there.


2. I think that there is something very broken in "Transparent HugePage Support" (THP), especially in kernel 4.15 and later. A bug in 4.16 memory compaction led to very noticeable swap pressure. The issue has been fixed, but disabling THP was also able to work around it.
If you have major issues, try to build a kernel with that option disabled and see if the problem remains as severe.
Comment 27 SlayerProof32 2018-06-19 16:25:32 UTC
What happens for me, on basically any system I use, is that Linux runs out of memory first, then SLAMS my swap disk with writes, which I believe overloads my system's I/O subsystem and causes a complete system freeze. I have swappiness set to 60.
Comment 28 lou 2018-09-23 08:21:34 UTC
Update:

I'd commented in detail about this bug (comment 18.23 above). I run live versions of Linux on a 4 GB Core i5 laptop (and also on another 4 GB Pentium laptop).

Just wanted to add:

I've added 4 GB of RAM to the Core i5 laptop for 8 GB total now. This obviously helps a lot. Now some notes for 8 GB RAM instances.

With Fedora 28, the system will still seize up with maybe two dozen FF tabs opened/active, or fewer depending on what's happening (video, etc.).

I came back here to note that, I'm currently using a Live Debian Stretch (9.5).

There are obviously significant differences in the way these variants of Linux manage memory.


Why?

Because under the same system conditions (Gnome, same s/w programs installed and/or running), I can open WAY more tabs in FF on Debian; open more simultaneous programs, without fear of a sudden system heart-attack.

In fact, it is much harder for me to cause the system freeze in Debian, even when approaching 50 tabs opened in FF Developer 63...

I understand there are underlying Fedora vs Debian system differences like systemd vs init, Wayland vs Xorg, Gnome versions (3.28.1 vs 3.22.5) and kernel revisions (4.16.3-301.fc28.x86_64 vs 4.9.110-1 (2018-07-05)), but all in all I find Debian WAYYYY more forgiving and more manageable, ESPECIALLY in light of this FATAL flaw AND the known Gnome memory leak bug, which can easily be remedied in Debian by restarting Gnome (via Alt-F2, r) to free that memory back up. (The only way to accomplish this in Fedora is to actually log out of your session, because of Wayland limitations.)

Anyway, I just thought it's another data point to add to the mystery.

I still have to keep resource monitor opened even in Stretch, just in case, but I only crashed Stretch once over the past 3 months or so when I was in the 80's (mem % used) and let a video play for 2 hrs without checking up.

Normally anyway, that percentage isn't rising above the 70's in my typical "working" environment.

Finally, I'd like to mention to those asking for logs, etc., for this issue: realize that WHEN this issue occurs, it *is* essentially a heart attack for the system. There is no recourse and no way to gather logs. EVERYTHING seizes up, usually never to come back. A hard power-cycle is the only recourse, and NO logs which would shed light on the issue are written. EVERYTHING stops, including log writing.

This is the reality.

I *do* have a few logs from an old (non-live) Jessie 8.7.1 install, for a few times when the system did revive after hours, and there's nothing in there that would shed light on the issue. The few entries in the log that I've researched pointed to no other instances/causes of this same issue.

It would be nice, after 11 or 12 years of this issue, if someone higher up and more knowledgeable in the development "food chain" would simply replicate the issue; it's not really that hard to do at all.

It honestly is a show-stopper.

Ciao.
Comment 29 SlayerProof32 2018-12-21 04:49:16 UTC
These kinds of reports are all over the web. Please, someone who knows how memory management works, fix this behavior.
Comment 30 ValdikSS 2018-12-25 10:01:23 UTC
I can confirm this regression. My system does not freeze with 4.9.140 and is overall totally usable on high memory usage and swap (right now 7.5 out of 8 GB RAM is used and 2 GB is swapped), but it becomes unresponsive with 4.19.10 when memory consumption is close to my RAM limit of 8 GB.

My system is:
Lenovo Thinkpad X220 laptop, Intel Core i7-2640M CPU (Sandy Bridge), Intel HD 3000 GPU, 8GB RAM, 8GB SWAP, SSD, UEFI mode.
Fedora 29, kernels 4.9.140 (self-built), 4.19.10-300.fc29.x86_64 from repository.
I'm using KDE Plasma 5.

You can easily reproduce this issue with the following steps:
1. Add "mem=2G" to the kernel command line. If you use GRUB, press the "e" key in the bootloader menu and append "mem=2G" to the "linux" line. Your system will be limited to 2 GB RAM.
2. Boot the system (press F10 in bootloader menu after editing)
3. Run several heavy applications: web browser, word processor, GIMP. Open gmail web interface in browser, it's also heavy.

Expected result:
The system (at least the GUI, including the mouse cursor) does not freeze/hang, but properly utilizes swap, as could be seen with the 4.9.140 kernel (before the regression). The system is usable, all applications are still accessible, music plays without hiccups, and you can continue doing your work.

Actual result:
The system (at least the GUI, including the mouse cursor) freezes/hangs with the 4.19.10 kernel, with very little chance of recovering by itself. The disk activity LED is constantly lit. The system is unusable.
Comment 31 SlayerProof32 2018-12-25 17:11:47 UTC
This issue appears to slightly improve in 4.20. Part of the issue is that things like klogin are getting swapped out, as well as other parts of the GUI. We need a way to tell the kernel what to swap out first (like Firefox tabs, or open programs). The kernel should also start swapping very slowly at, say, 65% memory usage, and then increase swapping as memory fills, not slam swap at 95% usage. We also should make sure the OOM killer is doing its job, instead of waiting for Alt-SysRq-F. As RAM frees up, swap should be unloaded immediately, slowly, like macOS does.
Comment 32 lou 2019-02-13 21:48:25 UTC
I know @SlayerProof32 posted that this was rectified with kernel >4.17.5 in https://bugzilla.redhat.com/show_bug.cgi?id=1577528 , but the bug still exists. Easily reproducible.

Tested on a 3GB desktop (Core 2 Quad), running Live Ubuntu 18.10 LTS off of a flash drive (pendrivelinux.com). Kernel 4.18.0-10.

FF 63.0. I set the download folder to a dir on the hard drive so as not to deliberately stress free RAM.

With the system showing about 1.5 GB free (System Monitor), trying to d/l this 1.25 GB ROM image from Mega ( https://mega.nz/#!KUAyRKjJ!3hALO7dkuyFdE41BTWf1OfHaZmdTA-Kzd8q0HYiMbYs ), the d/l gets to 100% but System Monitor shows RAM at 97% or 98%, the flash drive lights up and stays lit. System frozen, as I've reported previously with my other laptops.

I've mitigated this on the laptops because they now have 8GB of RAM. BUT-- even then, I can STILL crash those systems using Live Debian (or whatever flavor). It just takes more stressing (more open tabs, bigger d/l's, whatever) to get there, but it does.
Comment 33 ValdikSS 2019-03-10 22:40:29 UTC
How to reproduce (light):

1. Open web browser
2. Run the following command:
stress --vm 1 --vm-hang 0 --vm-bytes "$(awk '/MemAvailable/ {print $2"000"}' /proc/meminfo)"
3. Navigate to https://www.tumblr.com/explore/trending in web browser

Actual result:
The system is very slow, almost unresponsive. The HDD LED is constantly lit.




How to reproduce (heavy):

1. Open web browser
2. Run the following command:
stress --vm 1 --vm-hang 0 --vm-bytes "$(awk '/MemAvailable/ {print $2"000"}' /proc/meminfo)"
3. Navigate to https://mail.google.com/ in web browser, while being logged into Gmail.

Actual result:
The system is unresponsive. The HDD LED is constantly lit.
Comment 34 frfgr 2019-03-31 15:03:22 UTC

Please note: the methods mentioned below require some background knowledge, and you should take a proper backup before trying them.

I have tested and found that kernel 4.19.32 may not have this problem; in kernel 5.0.5 the problem is obvious.
Under a kernel with this problem, I found a workaround: if the problem occurs, run the following immediately, with the appropriate permissions:

        sync && echo 3 > /proc/sys/vm/drop_caches && sync && echo 3 > /proc/sys/vm/drop_caches && sync && echo 3 > /proc/sys/vm/drop_caches

This drops some of the system's caches, which may affect the next few seconds of IO, but it should alleviate the unresponsiveness.
Comment 35 pbo 2019-06-04 07:04:39 UTC
This problem is still happening as of kernel 5.0.17 (Fedora 30) - shouldn't this be reflected in "Kernel version" of this issue?
Comment 36 ValdikSS 2019-06-04 10:38:33 UTC
Those who experience the issue, try to set the following sysctl settings:

vm.swappiness=100
vm.watermark_scale_factor=200

It greatly helps on my PC.
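A minimal sketch of applying these settings, both for the running kernel and persistently; the sysctl.d drop-in file name below is an arbitrary choice, not part of the original suggestion:

```shell
# Apply the suggested values to the running kernel (requires root).
sysctl -w vm.swappiness=100
sysctl -w vm.watermark_scale_factor=200

# Persist them across reboots; the drop-in file name is arbitrary.
printf '%s\n' 'vm.swappiness=100' 'vm.watermark_scale_factor=200' \
    > /etc/sysctl.d/90-swap-responsiveness.conf
```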
Comment 37 Luca Osvaldo M. 2019-06-29 13:59:55 UTC
I am experiencing the same on Ubuntu: when the system starts swapping even small quantities of RAM, it just locks up until I manually trigger the OOM killer via SysRq keys.


This does not happen on FreeBSD, where I can easily swap gigabytes of memory without even a single slowdown.
Comment 38 Luca Osvaldo M. 2019-06-29 17:48:58 UTC
(In reply to ValdikSS from comment #36)
> Those who experience the issue, try to set the following sysctl settings:
> 
> vm.swappiness=100
> vm.watermark_scale_factor=200
> 
> It greatly helps on my PC.

I confirm this also works very nicely for me.

Thanks. Now it doesn't slow down at all.
Comment 39 Luca Osvaldo M. 2019-06-29 23:06:49 UTC
Actually, it did not solve the problem on my AMD desktop, but so far my Intel laptop seems fine.
Comment 40 Luca Osvaldo M. 2019-07-02 01:10:52 UTC
I switched from BTRFS to EXT4 and this did seem to solve the problem, at least for me.
Comment 41 ValdikSS 2019-07-03 13:28:58 UTC
(In reply to Luca from comment #40)
> I switched from BTRFS to EXT4 and this did seem to solve the problem, at
> least for me.

Do you use swap file or swap partition? I use swap partition, so it shouldn't matter, but I indeed have btrfs for /.
Comment 42 Luca Osvaldo M. 2019-07-03 13:52:25 UTC
(In reply to ValdikSS from comment #41)
> (In reply to Luca from comment #40)
> > I switched from BTRFS to EXT4 and this did seem to solve the problem, at
> > least for me.
> 
> Do you use swap file or swap partition? I use swap partition, so it
> shouldn't matter, but I indeed have btrfs for /.

I always used a swap partition. 
That's why I was surprised that when I switched back to ext4, this wasn't happening anymore.

It was even happening when using zram only.
Comment 43 Mikhail Novosyolov 2019-07-24 18:37:15 UTC
I encountered the same or a similar bug on BTRFS + a 5400 RPM HDD + swap on a separate partition. Unfortunately, that notebook is not mine and is far away from me, which makes it hard to run experiments. Kernels 4.15 and 4.18 both had this problem, which can be reliably reproduced by running

# stress --vm 2 --vm-bytes 2000M --vm-keep

on that notebook with 4 GB RAM.
After running stress, the Load Average jumps above 11.0 and the whole system freezes.
Information about its hardware: https://linux-hardware.org/index.php?probe=af73180d0c
(HDD is in bad condition as you may see in smartctl by the URL above, but I still believe it's not the reason why swap is not being used properly).

Some other users complain about similar problems here: https://forum.rosalinux.ru/viewtopic.php?t=9387 (in Russian). Andreas17 there also uses BTRFS. I see that here, in my case, in bug#199763, and also in another case of a similar problem that I know of, there are too many people with BTRFS, but this may be a coincidence.
Comment 44 Mikhail Novosyolov 2019-07-24 18:39:47 UTC
Did anybody try to reproduce it in a virtual environment? It would allow bisecting the kernel automatically.
Comment 45 Luca Osvaldo M. 2019-07-24 18:40:11 UTC
(In reply to Mikhail Novosyolov from comment #43)
> I encountered the same or a similar bug on BTRFS + HDD 5400 RPM + swap on a
> separate partition. Unfortunately, that notebook is not mine and is far away
> from me, what makes it hard to make experiments, kernels 4.15 and 4.18 both
> did have this problem, which can be reliably reproduced by running
> 
> # stress --vm 2 --vm-bytes 2000M --vm-keep
> 
> on that notebook with 4 GB RAM.
> After running stress Load Average bumps above 11.0 and the whole system
> freezes.
> Information about its hardware:
> https://linux-hardware.org/index.php?probe=af73180d0c
> (HDD is in bad condition as you may see in smartctl by the URL above, but I
> still believe it's not the reason why swap is not being used properly).
> 
> Some other users complain about similar problems here:
> https://forum.rosalinux.ru/viewtopic.php?t=9387 (in Russian). Andreas17
> there also uses BTRFS. I see that here, in my case, in bug#199763 and also
> in another case of similar problem that I know there are too many people
> with BTRFS, but this may be a coincidence.

it's not a coincidence; I did a lot of tests, and it was happening on ALL of my machines that use btrfs as the root partition
Comment 46 Mikhail Novosyolov 2019-07-24 19:05:51 UTC
Did anyone try to reproduce it on (open)SUSE, especially with their LTS kernel 4.12?
SUSE uses BTRFS by default and develops it; there is a chance that they might have caught and fixed or worked around this problem, or maybe their default scheduler/kernel options/etc. prevent it.
Comment 47 Jim Rees 2019-08-06 03:44:50 UTC
This bug is being discussed on lkml:
https://lkml.org/lkml/2019/8/4/15

I'm not going to participate there, but someone should point them to this bug and point out that everything worked fine until 4.10. Sometimes things that used to work and then got broken rate a higher priority.
Comment 48 Mikhail Novosyolov 2019-08-06 06:47:43 UTC
(In reply to Jim Rees from comment #47)
> This bug is being discussed on lkml:
> https://lkml.org/lkml/2019/8/4/15
> 
> I'm not going to participate there, but someone should point them to this
> bug and point out that everything worked fine until 4.10. Sometimes things
> that used to work and then got broken rate a higher priority.

I believe it is not relevant. They are discussing the problem of memory allocation in general, and that behaviour did not change much in recent kernels. But what we are discussing is a regression in the use of swap in kernels >= 4.10 (or >= 4.11?). In that thread, the topic starter suggests running swapoff, but what we are discussing is the opposite: an enabled swap not being used.
Comment 49 GYt2bW 2019-08-14 09:39:13 UTC
Just an idea, try reproducing with kernel patch `le9g.patch`:

```
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dbdc46a84f63..7a0b7e32ff45 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2445,6 +2445,13 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			BUG();
 		}
 
+    if (NR_ACTIVE_FILE == lru) {
+      long long kib_active_file_now=global_node_page_state(NR_ACTIVE_FILE) * MAX_NR_ZONES;
+      if (kib_active_file_now <= 256*1024) {
+        nr[lru] = 0; //don't reclaim any Active(file) (see /proc/meminfo) if they are under 256MiB
+        continue;
+      }
+    }
 		*lru_pages += size;
 		nr[lru] = scan;
 	}
```



see: https://gist.github.com/constantoverride/84eba764f487049ed642eb2111a20830#gistcomment-2997481
(^ scroll a bit up for some details of what the patch does)
Comment 50 Mikhail Novosyolov 2019-08-28 21:53:00 UTC
(In reply to ValdikSS from comment #36)
> Those who experience the issue, try to set the following sysctl settings:
> 
> vm.swappiness=100
> vm.watermark_scale_factor=200
> 
> It greatly helps on my PC.

It did not change anything. Still only around ~15 MB is swapped while 3.5 out of 4 GB of RAM is used. This results in a high load average (9-15) when loading new tabs in Chromium.
Comment 51 Mikhail Novosyolov 2019-08-28 23:22:24 UTC
Created attachment 284677 [details]
log of read_vmstat when filling RAM with new tabs in Chromium

(In reply to Michal Hocko from comment #8)
> Created attachment 258067 [details]
> read_vmstat.c
> 
> On Tue 22-08-17 15:55:30, Andrew Morton wrote:
> > 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Tue, 22 Aug 2017 11:17:08 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
> [...]
> > Sadly I haven't been able to capture this information 
> > > fully yet due to said unresponsiveness.
> 
> Please try to collect /proc/vmstat in the bacground and provide the
> collected data. Something like
> 
> while true
> do
>       cat /proc/vmstat > vmstat.$(date +%s)
>       sleep 1s
> done
> 
> If the system turns out so busy that it won't be able to fork a process
> or write the output (which you will see by checking timestamps of files
> and looking for holes) then you can try the attached proggy
> ./read_vmstat output_file timeout output_size
> 
> Note you might need to increase the mlock rlimit to lock everything into
> memory.

I am facing the following issue on this [1] hardware:
- when new tabs are opened in Chromium, swap (on SSD) is not used: 0K of swap is used. After about 2.5 GB out of the total 4 GB RAM becomes used, about 15-50 MB of swap can be used, sometimes up to ~570 MB, but not more;
- this causes the Load Average to jump from a normal ~0.9 to 9-15 when loading a new tab in Chromium;
- so the system in general freezes from time to time when working in the web browser and switching between tabs and/or opening new ones

I ran your program read_vmstat (./read_vmstat vmstat.log 5s) and then ran a script that opened many tabs in Chromium, a new tab every 5 s; after all tabs were opened, about ~3.5 GB of RAM was in use, but only about 15 MB was swapped; then I Ctrl+C'ed ./read_vmstat. The collected log is attached.

Kernel was 4.15.0-58-generic in Ubuntu.

[1] https://linux-hardware.org/?probe=414558f152
Comment 52 Mikhail Novosyolov 2019-08-28 23:45:17 UTC
Forgot to write that if I ran
$ stress --vm 2 --vm-bytes 1000M --vm-keep
swap was eventually used normally.
Comment 53 ValdikSS 2019-09-12 18:41:42 UTC
Recently, since about kernel 5.2.7, the issue is either gone or present to a much lesser extent.
Right now I'm running kernel 5.2.11 and finally I can keep Firefox and VirtualBox running at the same time, with 3G+ in swap, and the system does not freeze.

Could anyone affected by this issue try newer kernels?

Mikhail, I have a spare laptop which I can set up for you for tests. Do you have the time and desire to investigate this issue?
Comment 54 Mikhail Novosyolov 2019-09-12 20:55:37 UTC
(In reply to ValdikSS from comment #53)
> Recently, about since kernel 5.2.7, the issue is either gone or present to
> much less extent.
> Right now I'm running kernel 5.2.11 and finally I can keep Firefox and
> VirtualBox running at the same time, with 3G+ in swap, and the system does
> not freeze.
> 
> Could anyone affected by this issue try newer kernels?

First of all, did you reset all your custom sysctls to default values?

1) One user (ilfat@) reported that:
1.1) on kernel 4.19.57 and Ubuntu kernel 4.15.0-54 his system did not swap correctly: all memory was full but only a small part of the swap was used, which led to freezes
1.2) on kernel 4.19.67 and Ubuntu kernels 4.15.0-60 and 4.15.0-62 the system swaps normally in general, but he tweaked vm.watermark_scale_factor to make it swap better
1.3) Ilfat had exactly the same problem on both ext4 and btrfs

So, something was fixed upstream and backported to the LTS kernel 4.19 and to the Ubuntu kernel. I don't know what. And that issue is 100% not in BTRFS; it is another problem, or another aspect of the problem.

2) I did not see much difference between Ubuntu kernels 54 and 60, 62 in what is described in comment#51. So, that mysterious fix did not help.

3) another user (andreas@) reports the same as in (2):
https://forum.rosalinux.ru/viewtopic.php?p=101903&sid=621857320f4d1a566e0cfa6e80ff4a8c#p101903

4) trying to overcome the issues from comment#51, I built Ubuntu kernel 4.15.0 with 2 patches:

* https://abf.io/mikhailnov/kernel-desktop-4.15/blob/master/le9-rosa.patch
* https://abf.io/mikhailnov/kernel-desktop-4.15/blob/master/Chromium-OS-low-memory-patchset.patch

and set kernel options:
# https://bugzilla.kernel.org/show_bug.cgi?id=196729#c36
vm.watermark_scale_factor=100
vm.unevictable_activefile_kbytes=100000
#vm.swappiness=80
# https://bugs.chromium.org/p/chromium/issues/detail?id=263561#c16
# Disable swap read-ahead
vm.page-cluster=0

After that, I _think_ it became a bit better (I cannot prove it with numbers; I just was unable to make the system become unresponsive, but the load average remained the same). Unfortunately, the main user of that notebook did not use it for some days, so right now I can't say, based on her feedback, whether the situation has improved and the system no longer microfreezes from time to time. Let's wait a bit more. And still, that may be a coincidence, not the result of the patches and/or tweaked sysctls.

I was able to deadlock that system by opening too many tabs in Chromium, but that is not what those patches should have solved. nohang/earlyoom would probably have helped if it had been used.

> 
> Mikhail, I have a spare laptop which I can setup for you for tests. Do you
> have time and wish to investigate this issue?

I don't have ideas on how to investigate it, or how to measure the result. Maybe PSI metrics can tell something; I did not try looking at them. And even more, I don't understand what the problem is ;)
Comment 55 GYt2bW 2019-09-12 22:19:26 UTC
On an unrelated note (but since btrfs was thought to be a problem at some point), I've discovered that btrfs with zstd:5 (or worse, zstd:15) can cause (at least) mouse cursor stuttering (like it was skipping frames), while zstd:1 doesn't (likely because of the low CPU usage during compression), regardless of how fast/many writes are happening on the SSD (2-6M/s with zstd:15, 38-50+M/s with zstd:1), apparently due to high CPU usage during the compression. (zstd unspecified means zstd:3, aka the default)

Normal CPU usage by itself (e.g. during compiling) doesn't cause such stuttering though. I've tested this on a Lenovo Ideapad Z575, 16G RAM, Kingston SSD SA400S37240G firmware SBFK71F1, and I've personally switched to zstd:1

ie.
```
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
index 6b9e29d050f3..02ffdb27c360 100644
--- a/fs/btrfs/zstd.c
+++ b/fs/btrfs/zstd.c
@@ -22,7 +22,7 @@
 
 #define ZSTD_BTRFS_MAX_WINDOWLOG 17
 #define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
-#define ZSTD_BTRFS_DEFAULT_LEVEL 3
+#define ZSTD_BTRFS_DEFAULT_LEVEL 1
 #define ZSTD_BTRFS_MAX_LEVEL 15
 /* 307s to avoid pathologically clashing with transaction commit */
 #define ZSTD_BTRFS_RECLAIM_JIFFIES (307 * HZ)
```

but zstd:1 in /etc/fstab should also work, unless you're using a kernel too old to know about it (hence why I prefer using the patch anyway)
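For reference, a hedged sketch of the fstab route mentioned above (the UUID and mount options other than the compression level are placeholders, not from this thread):

```
# /etc/fstab entry mounting a btrfs root with zstd level 1 compression
UUID=<your-fs-uuid>  /  btrfs  compress=zstd:1,noatime  0  0
```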
Comment 56 Mikhail Novosyolov 2019-09-12 22:26:33 UTC
(In reply to howaboutsynergy from comment #55)
> On an unrelated note(but since btrfs was thought to be a problem at some
> point), I've discovered that btrfs with zstd:5 (or worse zstd:15) can cause
> (at least) mouse cursor stuttering(like it was skipping frames), while
> zstd:1 doesn't(likely because of the low CPU usage during compression),
> regardless of how fast/many writes are happening on the SSD(2-6M/s with
> zstd:15, 38-50+M/s with zstd:1), apparently due to high CPU usage during the
> compression. (zstd unspecified means zstd:3 aka default)

I tried moving the Chromium cache from btrfs to tmpfs; nothing improved, and it seems the Load Average peaks became even a bit higher (regarding what is described in comment#51).
Comment 57 Mikhail Novosyolov 2019-09-12 23:00:05 UTC
(In reply to Mikhail Novosyolov from comment #54)
> 
> So, something was fixed in upstream, backported to LTS kernel 4.19 and to
> Ubuntu kernel. I don't know what. And that issue is 100% not in BTRFS but is
> another problem or another aspect of the problem.
> 
I suspect commit 2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa "mm: vmscan: scan anonymous pages on file refaults"

https://github.com/torvalds/linux/commit/2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa

It appeared in kernel 5.3 and was backported to 4.19.62 and Ubuntu kernel 4.15.0-59

(In reply to ValdikSS from comment #53)
> Recently, about since kernel 5.2.7, the issue is either gone or present to
> much less extent.
> Right now I'm running kernel 5.2.11 and finally I can keel Firefox and
> VirtualBox running at the same time, with 3G+ in swap, and the system does
> not freeze.
... and to 5.2.3

But that commit "fixes" commits from the era of kernels 3.8 and 3.9, so probably it is not the one I'm looking for.
Comment 58 ValdikSS 2019-12-27 13:05:44 UTC
I have an idea why this bug is much worse with BTRFS than with EXT4: BTRFS has much bigger read/write amplification, up to 10x higher than EXT4.
Comment 59 Mikhail Novosyolov 2019-12-27 13:34:45 UTC
(In reply to ValdikSS from comment #58)
> I have an idea why this bug is much worse with BTRFS than with EXT4: BTRFS
> has much bigger read/write amplification, up to 10x higher than EXT4.

You mean that when e.g. the Chromium browser writes its cache, it loads IO "up to 10x higher than EXT4", and, when IO is also loaded by swapping, that causes microfreezes? Are you sure that the operations causing write amplification are not done in the background?
Comment 60 ValdikSS 2019-12-27 13:40:30 UTC
(In reply to Mikhail Novosyolov from comment #59)
> You mean that when e.g. Chromium browser writes its cache, it loads IO "up
> to 10x higher than EXT4"

Yes. This is also true for read operations, not only for write.

> and, when IO is also loaded by swapping, it causes
> microfreezes?

Yes, probably

> Are you sure that operations that cause write amplifications
> are not done in background?

I'm not sure, but people on the Rosa forum blamed BTRFS for this bug. I'm pretty sure it's not directly tied to BTRFS, but write and read amplification may explain why the lags are more severe with this FS.

Check https://arxiv.org/pdf/1707.08514.pdf and https://habr.com/ru/post/476414/
Comment 61 Mikhail Novosyolov 2019-12-27 14:57:24 UTC
(In reply to ValdikSS from comment #60)
> I'm not sure, but people on the Rosa forum blamed BTRFS for this bug. I'm pretty
> sure it's not directly tied to BTRFS, but write and read amplification
> may explain why the lags are more severe with this FS.

People here and on the ROSA forum blamed BTRFS for the problem that swap is not being used, but not for the microfreezes...
Comment 62 Christoph 2019-12-29 10:41:53 UTC
I don't use btrfs, but only ext4 on an SSD. Since updating my system from kernel 5.1.7 to kernel 5.3.12 in Tumbleweed, I get regular ~1-2 second freezes (e.g. mouse pointer hangs in X11, or characters don't appear while typing in Konsole) while Blender renders and swaps out finished tiles. This didn't happen with the previous kernel.
Comment 63 RussianNeuroMancer 2020-03-04 13:29:19 UTC
Christoph, your issue is different. Please fill separate bugreport.
Comment 64 RussianNeuroMancer 2020-03-10 21:36:46 UTC
On the HP Stream 7 tablet (1 GB RAM) there was a similar regression, but it started with Linux 4.12 instead of 4.10. Last year I tried bisecting and several workarounds such as autostart cleanup, sysctl tweaks, zram, etc. But in the end it seems Linux 5.5.8 solved the issue, at least in this particular use-case (a device with 1 GB RAM): 5.5.8 makes swapping perform like it did with 4.11.
Comment 65 Chris Murphy 2020-03-15 20:31:01 UTC
I primarily test by building webkitgtk [1], and I experience the same loss of system responsiveness whether / is ext4 or Btrfs. But I do see a difference in top and iotop.
https://drive.google.com/open?id=12jpQeskPsvHmfvDjWSPOwIWSz09JIUlk

This is an extreme case of refaulting: the system is out of memory and swap, and since kswapd and btrfs threads are using a lot of CPU, I'm guessing the faults are a mix of anonymous pages and file pages. At this point the system is really lost, which is why the UX is the same with ext4 and btrfs; but behind the scenes more does seem to be going on. There might be other workloads which aren't as extreme, thereby exposing the difference. Two possible sources of the heavy CPU for btrfs threads: decompression and checksumming. If it's true there is near-constant reclaim happening, it's not just a simple minimum 4K read but rather a 128K minimum, because all Btrfs compressed files use a 128K extent size; the extent is then decompressed, and then requires reading the csum tree and computing the csum on the read to compare. Ordinarily this is cheap, but in this situation it's possibly resulting in a lot of extra congestion; however, this is the limit of my knowledge, so it's just speculation.

Btrfs write amplification is a known issue (wandering trees problem). But that appears to not be the issue in this example.

It might be that this problem is better dealt with by cgroupsv2 protecting certain tasks from reclaim, and thus reducing the problem on any file system. But Btrfs alone (for now) does have more sophisticated cgroupsv2 IO isolation control as well.
https://www.spinics.net/lists/cgroups/msg24743.html

The upstream GNOME and KDE developers are aware of the loss of responsiveness problem and have done quite a lot of preliminary work in GNOME 3.34 with more work on the way.
https://blogs.gnome.org/benzea/2019/10/01/gnome-3-34-is-now-managed-using-systemd/

You can today take advantage of this cgroupsv2 work by running resource hungry tasks as a systemd user unit in Fedora 31.
https://blogs.gnome.org/benzea/2019/10/01/gnome-3-34-is-now-managed-using-systemd/#comment-14833

I expect in the next 6-12 months (it's a guesstimate) there will be additional work in GNOME to protect the user session or what I vaguely call the "GUI stack" from reclaim, and thus improve its responsiveness at the expense of the resource hungry process.


[1] first two lines; set -j to RAM in GiB +2 GiB; i.e. if you have 8G RAM, use -j 10; more jobs makes the problem happen faster.
https://trac.webkit.org/wiki/BuildingGtk
Comment 66 Chris Murphy 2020-03-15 20:48:34 UTC
Another resource, quite long but has a tl;dr, and reviewed by some of the cgroups/resource control folks:
https://chrisdown.name/2018/01/02/in-defence-of-swap.html

I'll use somewhat technically sloppy language, but hopefully a useful metaphor: there's incidental swap and heavy swap. Incidental swap is when some file or anonymous page really isn't needed, and it's good to evict it to free up memory. But heavy swap (or even reclaim) is not at all incidental; it becomes a serious performance impediment. The reality is that some tasks just take gobs of memory, and swap isn't a good substitute. But some swap is useful for freeing up memory for things that need to stay in memory. 

I'm finding that for the incidental swap need, swap-on-ZRAM is quite useful. You are exchanging an IO-bound task for a memory+CPU task; it also forces such pages to be pinned into memory. So there's no free lunch. Conservative use would be a ZRAM device around 1/4 of RAM, up to a max of 50% of RAM. It's not bad or wrong to use more; it's just that some workloads, like the webkitgtk example, have such significant need that it'll actually turn 1/2 of memory into swap, which in effect means you have 50% less RAM for that task. It actually makes the problem worse. Really, the near term is a) build with flags that cause fewer resources to be used in the first place, or b) build in a systemd user session and limit resources that way, or c) buy more RAM, or d) use a conventional swap partition, possibly with zswap to keep the most frequent pages in a small RAM cache pool that's big enough for the task to eventually complete, and just suffer the ensuing lack of GUI responsiveness.

Anyway, it's a bit complicated, with lots of moving parts. It probably needs more sophisticated use of perf, and maybe even bpf could be useful in figuring out where the various bottlenecks are.
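A hedged sketch of the conservative swap-on-ZRAM sizing described above (a zram device at 1/4 of RAM); the device setup needs root, and `zram0` being unused is an assumption:

```shell
# Size a swap-on-ZRAM device at 1/4 of physical RAM, per the conservative
# guideline above (around 1/4 of RAM, at most 1/2).
mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
zram_kib=$((mem_kib / 4))

modprobe zram                              # load the zram module
echo "${zram_kib}K" > /sys/block/zram0/disksize
mkswap /dev/zram0                          # format the device as swap
swapon --priority 100 /dev/zram0           # prefer it over disk-backed swap
```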
Comment 67 BaNru 2020-09-24 04:07:47 UTC
@lou +1

Sorry for my Google translate.


This bug is very many years old. This is *Bug 12309* !!! Why didn't anyone remember?


I'm tired of him. Was in the early 2010s for 512 MB of RAM. Was in the early 2010s for 512 MB of RAM. and 2 GB of RAM. Mid 2010s with 4GB of RAM. And for the past five years, 16 GB of RAM has not gone anywhere.


There is a funny article in Russian about this bug, lurkmore.to/12309, with links to the original bug reports and the "problem solutions" in each new kernel. According to the anonymous author, this useful feature is lovingly and carefully carried over into every fresh kernel.


Earlier I tried rebuilding kernels according to advice found on the Internet (BFQ). Sometimes it helped a little; some builds gave me enough time to urgently close programs. But the most effective measure was vm.swappiness = 90 plus a script (conky + dialog) that shows a notification when memory use approaches that point, so I can free up memory in time.
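A hypothetical sketch of the kind of low-memory notifier described above: it polls `/proc/meminfo` and warns before heavy swapping starts. The 10% threshold and 5-second poll interval are assumptions, not values from the comment, and `notify-send` (from libnotify) stands in for the conky + dialog setup the commenter actually used.

```shell
# Warn when MemAvailable drops below a fraction of MemTotal
THRESHOLD_PCT=10

while sleep 5; do
    total=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
    avail=$(awk '/^MemAvailable/ {print $2}' /proc/meminfo)
    pct=$(( avail * 100 / total ))
    if [ "$pct" -lt "$THRESHOLD_PCT" ]; then
        notify-send "Low memory" "Only ${pct}% of RAM available - close some programs"
    fi
done
```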
Comment 69 Egor 2022-01-20 08:08:27 UTC
Are there any changes?
Comment 70 Konstantin Kharlamov 2022-03-27 21:57:22 UTC
Modern kernels handle swapping situations much better.

Also, these days the Multi-LRU patchset should pretty much resolve the problem. It is not yet upstream, but is used in downstream kernels such as linux-zen and liquorix-kernel. There is hope it will be merged by 5.19¹

1: https://www.phoronix.com/scan.php?page=news_item&px=MGLRU-Not-For-5.18
Comment 72 Konstantin Kharlamov 2024-08-02 19:48:35 UTC
(In reply to Konstantin Kharlamov from comment #70)
> Also, these days the Multi-LRU patchset should pretty much resolve the
> problem. It is not yet upstream, but is used in downstream kernels such as
> linux-zen and liquorix-kernel. There is hope it will be merged by 5.19¹

MLRU patches were merged long ago, should perhaps this issue be closed? Is Steven Haigh (the OP) by any chance still here?
Comment 73 Ivan Kalvachev 2024-08-02 20:35:59 UTC
I've commented here 6 years ago, having similar issues with swap thrashing.

I can attest that despite web browsers managing to eat more memory than the RAM available, my system hasn't gone unresponsive recently.
Comment 74 Steven Haigh 2024-08-03 02:24:36 UTC
Yeah - I'm still around.

These days, it's become cheap enough to just have more RAM - so to be honest, I haven't seen an issue like this again in a number of years - but now I don't bother with less than 32Gb of RAM for a desktop.

Likely, this has just become obsolete now - so closing as such.
Comment 75 Konstantin Kharlamov 2024-08-03 04:38:41 UTC
To future readers: if you're still seeing this, make sure the file `/sys/kernel/mm/lru_gen/enabled` exists and its value is `0x0007` (that's just the configuration I tested). AFAIK the kernel doesn't enable MGLRU by default, but at the same time AFAIK all major distros enable it in their kernels.
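The check above can be done with a couple of commands. This is a minimal sketch: the sysfs path is the one from the comment, and writing `y` to enable all MGLRU features follows the kernel's multigen_lru admin-guide documentation (needs root, and a kernel built with CONFIG_LRU_GEN).

```shell
# Is Multi-Gen LRU available, and which features are enabled?
if [ -f /sys/kernel/mm/lru_gen/enabled ]; then
    cat /sys/kernel/mm/lru_gen/enabled   # 0x0007 means all features on
else
    echo "lru_gen not present: kernel built without CONFIG_LRU_GEN"
fi

# To enable all features (as root):
#   echo y > /sys/kernel/mm/lru_gen/enabled
```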
Comment 76 Marc Neiger 2024-08-05 07:54:59 UTC
(In reply to Steven Haigh from comment #74)
> These days, its become cheap enough to just have more RAM - so to be honest,
> I haven't seen an issue like this again in a number of years - but now I
> don't bother with less than 32Gb of RAM for a desktop.
> 
> Likely, this has just become obsolete now - so closing as such.

Sorry to object, but I'm reading this thread because the issue is still fresh and relevant to me. Some people are stuck living with 4Gb every day.

However
(In reply to Konstantin Kharlamov from comment #75) 
This is helpful. Like others above, I have not seen the issue show up badly in the last few months using Mint 21.3 and kernel 6.8 (and perhaps since 5.15 or 5.19). I cannot easily revert to an older kernel just for the test, but the machine currently reports `/sys/kernel/mm/lru_gen/enabled` with the value `0x0007`, as suggested.

The latter is fair enough to mandate closure.