Sometimes kswapd0 uses 100% of one core. This happens on all 4.0+ kernels; some time ago I also tested several 3.0+ kernels, and the problem was present there as well.
drop_caches didn't help.
Adding swap didn't work.
There is no pattern to when the problem appears. It can stay away for 1-2 weeks, or appear twice in a day. Most of the time it persists indefinitely until I reboot the server; rarely it cures itself without a reboot.
perf top output with the i915 module enabled:
+ 98,22% 0,90% [kernel] [k] kswapd
+ 93,07% 0,32% [kernel] [k] shrink_zone
+ 87,41% 3,71% [kernel] [k] shrink_slab
+ 56,55% 1,11% [i915] [k] i915_gem_shrinker_scan
+ 50,65% 46,24% [i915] [k] i915_gem_shrink
+ 23,28% 4,32% [kernel] [k] super_cache_count
+ 18,22% 2,33% [kernel] [k] list_lru_count_one
+ 15,26% 0,00% [kernel] [k] ret_from_fork
+ 15,26% 0,00% [kernel] [k] kthread
+ 11,84% 11,83% [kernel] [k] _raw_spin_lock
46,24% [i915] [k] i915_gem_shrink
12,11% [kernel] [k] _raw_spin_lock
4,28% [kernel] [k] super_cache_count
3,74% [i915] [k] i915_vma_unbind
3,67% [kernel] [k] shrink_slab
3,20% [i915] [k] i915_gem_object_put_pages
2,78% [kernel] [k] __list_lru_count_one.isra.0
This is with i915 blacklisted. It didn't change the bug's behavior in any way; only the output differs:
+ 97,30% 2,32% [kernel] [k] kswapd
+ 83,40% 0,65% [kernel] [k] shrink_zone
+ 69,79% 7,77% [kernel] [k] shrink_slab
+ 59,73% 10,66% [kernel] [k] super_cache_count
+ 46,97% 5,76% [kernel] [k] list_lru_count_one
+ 30,73% 30,73% [kernel] [k] _raw_spin_lock
+ 23,84% 0,00% [kernel] [k] ret_from_fork
+ 23,84% 0,00% [kernel] [k] kthread
+ 9,63% 7,23% [kernel] [k] __list_lru_count_one.isra.0
+ 6,18% 0,82% [kernel] [k] zone_balanced
+ 5,05% 2,28% [kernel] [k] shrink_lruvec
30,63% [kernel] [k] _raw_spin_lock
10,84% [kernel] [k] super_cache_count
7,66% [kernel] [k] shrink_slab
7,28% [kernel] [k] __list_lru_count_one.isra.0
5,29% [kernel] [k] list_lru_count_one
3,69% [kernel] [k] memcg_cache_id
3,62% [kernel] [k] _raw_spin_unlock
2,63% [kernel] [k] zone_watermark_ok_safe
2,53% [kernel] [k] mem_cgroup_iter
2,44% [kernel] [k] shrink_lruvec
2,36% [kernel] [k] kswapd
1,35% [kernel] [k] _raw_spin_lock
This is very probably a regression; kernel 3.14.58 appears to be unaffected.
The problem appeared in kernel 4.1 and still exists.
A few observations:
* I can reproduce this at will on any 4.1 kernel with limited physical RAM (tested on a 2GB machine) and zram
* I cannot reproduce this with no swap enabled
* I cannot reproduce this on the same kernels when limiting RAM by kernel parameters (I've tried mem=1024M on a 2GB machine; collaborators have tested mem=2048M on 4GB machines)
* I cannot reproduce this on kernel 4.0.9 or earlier
* When the condition is triggered, kswapd jumps from 0% to 99+% CPU over a couple seconds
* kswapd will apparently never recover on its own, but will bounce around between 97-100% CPU for hours and overnight
* Relieving memory pressure does not help directly: I've frequently returned to >1GB freemem and nearly empty swap, while kswapd remained pegged
* Killing a specific process (or probably the last of a pair or a set, it's hard to tell) *does* help: kswapd drops to 0% CPU nearly instantly
* I do not know how to determine which process/set will be the "magic" one: it doesn't seem to be predictable by start order, and is rarely if ever the same process that triggered the condition
Quickly allocate largish blocks of memory (100MB) in separate processes. On a 2GB machine with 2.8GB zram configured, the condition triggers reliably at ~27 100MB blocks allocated. I can then allocate an additional ~13-15 100MB blocks before mallocs fail.
It is sometimes possible to avoid the condition when allocating gradually, and it does not occur at all when the allocations are in a single process.
This test case was created to resemble the launch of a multiprocess web browser with several tabs open, which is where I first heard about the issue. Reproducing with the web browser was less predictable but still possible.
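The reproduction steps above can be sketched in a short script. This is a hypothetical reproducer, not the exact test case used in this report: it allocates 100MB blocks in separate processes, touching every page so the memory is actually resident, and holds them until released. The block size and count (~27 blocks on a 2GB machine) are taken from the description above; the function and variable names are my own.

```python
# Hypothetical reproducer sketch: allocate largish blocks of memory in
# separate processes, mimicking a multiprocess web browser opening tabs.
# Run it on a small-RAM machine with zram/swap enabled to apply pressure.
import multiprocessing
import sys

def hog(size_bytes, ready, release):
    # Touch every page so the allocation is actually backed by RAM,
    # not just reserved address space.
    buf = bytearray(size_bytes)
    for i in range(0, size_bytes, 4096):
        buf[i] = 1
    ready.set()      # tell the parent this block is resident
    release.wait()   # hold the memory until the parent says stop

def allocate_blocks(count, size_bytes):
    procs = []
    release = multiprocessing.Event()
    for _ in range(count):
        ready = multiprocessing.Event()
        p = multiprocessing.Process(target=hog,
                                    args=(size_bytes, ready, release))
        p.start()
        ready.wait()  # allocate one block at a time, like pressing Enter
        procs.append(p)
    release.set()
    for p in procs:
        p.join()
    return count

if __name__ == "__main__":
    # ~27 blocks of 100MB triggered the condition on the 2GB test machine.
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 27
    allocate_blocks(n, 100 * 1024 * 1024)
```

Allocating in separate processes matters: as noted above, the condition does not occur when the same total is allocated within a single process.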
Patch suggested by Kirill Shutemov in http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html , applied to kernel 4.1.14, seems to resolve this issue.
I've run about a dozen tests so far, and have not been able to trigger the condition except possibly one time (the first time), whereas I can trigger it every time under the unpatched kernel.
When kswapd hit 100% this time, I allocated another chunk and kswapd immediately recovered, which would never happen under the unpatched kernel.
I didn't give kswapd any time to recover on its own (I was expecting the old behaviour), so I don't know if it would have done so without the additional allocation.
I have not seen any similar behaviour since that first test, but will be doing more testing today, and will update again if I do.
I've found a workaround that works well for me so far: create a file /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents and reboot:
vm.min_free_kbytes=67584
The idea behind this workaround is a post by Kirill A. Shutemov on LKML (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this Gallium OS bug report: https://github.com/GalliumOS/galliumos-distro/issues/52
Would be interesting to know if this helps others
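For anyone trying this, a quick way to check whether the setting actually took effect after reboot is to read the procfs file directly. A minimal Python sketch (the 67584 threshold is the value suggested in this thread; reading /proc/sys avoids depending on the sysctl binary):

```python
# Check whether the vm.min_free_kbytes workaround is active by reading
# the current value straight from procfs (Linux only).
def min_free_kbytes():
    # The file holds the current value as plain decimal text.
    with open("/proc/sys/vm/min_free_kbytes") as f:
        return int(f.read().strip())

if __name__ == "__main__":
    current = min_free_kbytes()
    print("vm.min_free_kbytes =", current)
    if current < 67584:
        print("workaround not active (expected at least 67584)")
```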
(In reply to Tim Edwards from comment #5)
> I've found a workaround that works well for me so far: create a file
> /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents
> and reboot:
> The idea behind this workaround is a post by Kirill A. Shutemov on LKML
> (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this
> Gallium OS bug report:
> Would be interesting to know if this helps others
It doesn't help for me.
I have an ASUS notebook with an Intel Celeron 847 and 2GB RAM.
I'm using Arch Linux with kernel 4.4.3, and when I rebuild the kernel with the makepkg command, kswapd0 uses 100% of one core after a few minutes.
(In reply to jacky from comment #6)
> (In reply to Tim Edwards from comment #5)
> > I've found a workaround that works well for me so far: create a file
> > /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents
> > and reboot:
> > vm.min_free_kbytes=67584
> > The idea behind this workaround is a post by Kirill A. Shutemov on LKML
> > (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this
> > Gallium OS bug report:
> > https://github.com/GalliumOS/galliumos-distro/issues/52
> > Would be interesting to know if this helps others
> No use for me.
> I has a asus notebook whith Intel Celeron 847 and 2GB RAM.
> I'm using arch linux with kernel 4.4.3, and when I rebuild the kernel with
> makepkg command, kswapd0 uses 100% of one core after few minutes.
When kswapd0 uses 100% of one core, this is the free command output:
              total        used        free      shared  buff/cache   available
Mem:        1866212       66036     1465132        1088      335044     1605068
Swap:       1952764           0     1952764
As you can see, the memory is enough for my system, and swap usage is 0%.
I've been seeing problems like this since the v2 kernel (6 years ago) when doing large amounts of buffered I/O writes, i.e., when writing hundreds of megabytes of data to database devices or to an NFS filesystem as part of a database backup. Where I work, we're currently on Linux 3.10 and we're still seeing it.
Various things we've read say that it might be related to kernel memory fragmentation, i.e., you can have lots of regular memory free, but that doesn't help. It's related to balancing kernel memory across NUMA nodes, slabs, and zones.
In some cases, disabling NUMA might help. We also changed our databases to use direct I/O (i.e., non-buffered, which avoids thrashing Linux virtual memory with useless buffering for a database server that already does its own caching). If possible, we try not to do large database backups to NFS.
And of course, we always get the answer, "Upgrade to the newest version of Linux. There's a patch there that sounds like it might fix it".
I wish kswapd had some clearer monitoring info to figure out why it's spinning.
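Something like the monitoring wished for above can be approximated from /proc/vmstat. A minimal sketch (my own, not an existing tool): it samples the kswapd-related counters and prints the deltas; a pgscan_kswapd count that keeps climbing while free memory stays high suggests kswapd is scanning without reclaiming. Counter names vary slightly across kernel versions, hence the prefix matching.

```python
# Sample kswapd-related reclaim counters from /proc/vmstat (Linux only)
# and print per-interval deltas to see whether kswapd is spinning.
import time

PREFIXES = ("pgscan_kswapd", "pgsteal_kswapd", "pageoutrun")

def kswapd_counters():
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            key, _, value = line.partition(" ")
            # Prefix match: older kernels split these per zone,
            # e.g. pgscan_kswapd_normal vs. the newer pgscan_kswapd.
            if key.startswith(PREFIXES):
                counters[key] = int(value)
    return counters

def watch(interval=2.0, samples=5):
    prev = kswapd_counters()
    for _ in range(samples):
        time.sleep(interval)
        cur = kswapd_counters()
        deltas = {k: cur[k] - prev.get(k, 0) for k in cur}
        print(deltas)
        prev = cur

if __name__ == "__main__":
    watch()
```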
Created attachment 241121 [details]
I have been affected by this bug since kernel 2.x. Right now I'm hitting it with kernel 4.7.5.
The behavior in this most recent kernel has changed a bit: now I can see in the journalctl logs that firefox invokes the oom-killer. In previous kernel versions there was nothing in the syslog. But despite this message, firefox did not get killed by the oom-killer, and as usual I had to switch to a root console session to launch "killall firefox" (it takes minutes to do that, as the system is thrashing). Only after this is firefox killed and kswapd0 stops tearing the PC apart.
Attaching the relevant journalctl log.
Ignore the drm errors in the log; they happen when I switch from X to the console, or back.
Oh, I forgot to mention that the bug happens regardless of whether swap is enabled or not.
Happens with no swap, no mount, only reading from SSD:
To me it seems like kswapd is getting stuck in an infinite loop. It may be some sort of buffer overflow, which explains why it is triggered by allocating multiple blocks of memory. Allocating gradually allows kswapd to empty its buffer before the next blocks come in, therefore not triggering the bug. After the buffer overflows (or worse, wraps around) the invalid data triggers an infinite loop that devours the CPU.
Created attachment 280229 [details]
echo m > /proc/sysrq-trigger; echo t > /proc/sysrq-trigger; dmesg > dmesg.log
dmesg -n 7
echo m > /proc/sysrq-trigger
echo t > /proc/sysrq-trigger
dmesg -s 1000000 > foo
when kswapd0 is starting to hang the system
Created attachment 280231 [details]
sysctl -a | grep vm > sysctl.txt
my sysctl vm.*
Reproduced on kernel 4.19.8
uname -a: Linux linux-fdsk 4.19.8-1-default #1 SMP PREEMPT Sun Dec 9 20:08:37 UTC 2018 (9cae63f) x86_64 x86_64 x86_64 GNU/Linux
Steps to reproduce:
Download my simple ramhog.py script: https://gist.github.com/Fak3/3c6bf52000651b00d9ab03e5b6bae677
Launch it, then press enter until it hogs all the ram
The kswapd0 process will hang the system, taking all the CPU and all disk I/O forever (I waited a few hours):
> python3 ramhog.py
Expected results: the system should not hang; the OOM killer should kill some process to free some RAM.
I managed to get some logs when kswapd is starting to go rogue:
see my attachments to this bug: sysctl, dmesg, meminfo
Created attachment 280233 [details]
/proc/meminfo when kswapd is starting to hang the system
oh, some more details:
The script hang test above was done on the openSUSE Tumbleweed distro, with an SSD drive.
This bug has haunted me for about 10 years, regardless of sysctl settings, swap, drive type, and kernel. All that is required is to open enough tabs in a web browser to eat all the RAM.
oh, more info:
"echo 3 > /proc/sys/vm/drop_caches" does not solve this bug