Bug 65201

Summary: kswapd0 randomly high cpu load
Product: Memory Management Reporter: nleo (nleo)
Component: Other Assignee: Andrew Morton (akpm)
Status: NEW ---    
Severity: normal CC: 00cpxxx, cdlscpmv, chris, d, dclowes1, ddstreet, dek94, edward.donovan, hdefendme, howaboutsynergy, ivanov.maxim, jc, jharbold, johan.radivoj, jonathan, kenny.macdermid, kevin, liststuff, mail+kernel-bugzilla, me, mihail.zenkov, mikhail.krestjaninoff, mvanross, n.sherlock, patrakov, ponymarzanna, rainer, rjmcy, robincello, sakhnik, samkostka, sanderboom, seo.d, serianox, sgnn7, someuniquename, Wilhelm.Buchmueller, xpaint, zarnovican
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.12 Subsystem:
Regression: No Bisected commit-id:
Attachments: kmsg dump
ftrace (function_graph)
ftrace (vmscan tracepoints)
/proc/vmstat (time 0)
/proc/vmstat (time 5s)
/proc/zoneinfo
/proc/pagetypeinfo
/proc/buddyinfo
vmstat -m (time 0)
vmstat -m (time 5s)

Description nleo 2013-11-19 19:40:40 UTC
kswapd0 randomly loads one CPU core to 100%.

Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013 x86_64 GNU/Linux

No swap enabled

Previously the same laptop ran Ubuntu 12.04 with a 32-bit PAE 3.2 kernel, and there was no such problem.

[root@localhost ~]# free -mh
             total       used       free     shared    buffers     cached
Mem:          3.8G       2.4G       1.3G         0B       150M       508M
-/+ buffers/cache:       1.8G       2.0G
Swap:           0B         0B         0B


[root@localhost ~]# cat /proc/meminfo
MemTotal:        3935792 kB
MemFree:         1381360 kB
Buffers:          154216 kB
Cached:           533096 kB
SwapCached:            0 kB
Active:          1958896 kB
Inactive:         438004 kB
Active(anon):    1740916 kB
Inactive(anon):   136292 kB
Active(file):     217980 kB
Inactive(file):   301712 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              2064 kB
Writeback:             0 kB
AnonPages:       1709628 kB
Mapped:           196696 kB
Shmem:            167620 kB
Slab:              81516 kB
SReclaimable:      61312 kB
SUnreclaim:        20204 kB
KernelStack:        1696 kB
PageTables:        13088 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1967896 kB
Committed_AS:    3498576 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      361304 kB
VmallocChunk:   34359300731 kB
HardwareCorrupted:     0 kB
AnonHugePages:    157696 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       18476 kB
DirectMap2M:     4059136 kB

And I can't kill it. I heard that killing it is not a good idea, but I tried just for fun.
Comment 1 Aaron Tomlin 2013-11-20 23:32:02 UTC
(In reply to nleo from comment #0)
> kswapd0 randomly load one core of CPU by 100%

You cannot issue a SIGKILL to 'kswapd' since it is
a kernel thread.

> CommitLimit:     1967896 kB
> Committed_AS:    3498576 kB
                   ^^^^^^^

You seem to be overcommitting memory: Committed_AS exceeds CommitLimit.
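
For reference, a rough sketch of that comparison. It assumes the usual /proc/meminfo layout, and the function name `commit_headroom` is made up for illustration. Note that CommitLimit is only enforced in strict mode (vm.overcommit_memory=2), so exceeding it is not by itself an error. With no swap and the default vm.overcommit_ratio of 50, CommitLimit is simply half of MemTotal (3935792 / 2 = 1967896 kB here).

```shell
# Print CommitLimit minus Committed_AS in kB (negative means overcommitted).
# Reads a meminfo-format file; defaults to /proc/meminfo.
commit_headroom() {
    awk '$1 == "CommitLimit:"  { limit = $2 }
         $1 == "Committed_AS:" { used = $2 }
         END { printf "%d\n", limit - used }' "${1:-/proc/meminfo}"
}

# Example:
#   commit_headroom
```

With the two values quoted above this prints -1530680.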
Comment 2 Andrew Morton 2013-11-22 00:57:01 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 19 Nov 2013 19:40:40 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=65201
> 
>             Bug ID: 65201
>            Summary: kswapd0 randomly high cpu load
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.12
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: nleo@nm.ru
>         Regression: No
> 
> kswapd0 randomly load one core of CPU by 100%
> 
> Linux localhost 3.12.0-1-ARCH #1 SMP PREEMPT Wed Nov 6 09:06:27 CET 2013
> x86_64
> GNU/Linux
> 
> No swap enabled
> 
> Befor on same laptop was installed Ubuntu 12.04 and kernel 3.2 32-bit pae,
> and
> there is no such problem.
> 
> [root@localhost ~]# free -mh
>              total       used       free     shared    buffers     cached
> Mem:          3.8G       2.4G       1.3G         0B       150M       508M
> -/+ buffers/cache:       1.8G       2.0G
> Swap:           0B         0B         0B

hm, I wonder what kswapd is up to.

Could you please make it happen again and then

dmesg -n 7
dmesg -c
echo m > /proc/sysrq-trigger
echo t > /proc/sysrq-trigger
dmesg -s 1000000 > foo

then send us foo?

Comment 3 Mihail Zenkov 2015-04-21 14:54:19 UTC
Created attachment 174671 [details]
kmsg dump

Sometimes I have the same problem. I don't have swap; I run kernel 3.19.0 (i686) compiled without CONFIG_SWAP.
Comment 4 Anatoli Sakhnik 2015-04-30 04:24:52 UTC
My Acer C720 occasionally suffers from this too. Turning swap on/off doesn't help. Dropping caches *does* help:

# echo 3 > /proc/sys/vm/drop_caches  # 1 isn't enough

My next guess would be to try deactivating zswap.
Comment 5 Anatoli Sakhnik 2015-05-03 06:33:59 UTC
Zswap isn't to blame, and dropping caches may or may not help. Here is the output of `sudo perf top`:

  26,24%  [kernel]                         [k] _raw_spin_lock
  14,72%  [kernel]                         [k] _raw_spin_unlock
   6,62%  [kernel]                         [k] super_cache_count
   4,97%  [kernel]                         [k] shrink_slab.part.12
   4,92%  [kernel]                         [k] list_lru_count_one
   2,15%  [i2c_designware_core]            [k] 0x0000000000000099
   1,86%  [kernel]                         [k] shrink_lruvec
   1,74%  [kernel]                         [k] mem_cgroup_iter
   1,61%  [kernel]                         [k] native_read_tsc
   1,55%  [kernel]                         [k] delay_tsc
   1,52%  [kernel]                         [k] kswapd
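
Several of the top symbols here (super_cache_count, shrink_slab, list_lru_count_one) sit on the slab shrinker path rather than in page-cache reclaim. That may be why writing 1 to drop_caches was not enough in comment #4: 1 drops only the page cache, 2 drops reclaimable slab objects (dentries and inodes), and 3 drops both. A small sketch for comparing the two pools before and after dropping caches; the function name `cache_pools` is illustrative and the usual /proc/meminfo layout is assumed.

```shell
# Print page cache and reclaimable slab sizes (kB) from a meminfo-format file.
# Defaults to /proc/meminfo. "SwapCached:" is not matched by "Cached:".
cache_pools() {
    awk '$1 == "Cached:"       { printf "pagecache_kb=%d\n", $2 }
         $1 == "SReclaimable:" { printf "slab_reclaimable_kb=%d\n", $2 }' "${1:-/proc/meminfo}"
}

# Example (as root):
#   cache_pools; sync; echo 3 > /proc/sys/vm/drop_caches; cache_pools
```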
Comment 6 Marzanna 2015-11-09 20:45:11 UTC
(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.

I have the same hardware. After a system upgrade (currently running kernel 4.2.0) I get high CPU usage when a "heavy" web site opens. If the suggested workaround (dropping caches) doesn't help, I just quit the web browser and everything returns to normal.
Comment 7 Sam Kostka 2015-11-10 19:40:15 UTC
Same here, also on an Acer C720 running Arch.  kswapd0 takes up a whole core whenever swap is being used.  I run the Arch kernel, with a small patch to the chromeos_laptop driver to enable my trackpad.

The weird thing is that neither memory nor swap is that full.  Memory is at 50% utilization, and swap is only at 8%, according to xfce4-taskmanager.  It seems like Google Docs is the worst offender for triggering this issue.
Comment 8 Mark 2015-11-16 20:18:55 UTC
I had this bug, and for me the cause turned out to be my /tmp directory,
which is a tmpfs (to gain speed and spare my SSD).

df /tmp 
gave
tmpfs            3880480 2449036   1431444  95% /tmp

After removing junk from /tmp/ the system returned to normal.

Also in my case I had no swap, and sufficient free memory.

Would be interested to know if this works for you.
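
This fits how tmpfs is accounted: its pages show up under Shmem (and Cached) in /proc/meminfo, and without swap they cannot be written out, so a nearly full tmpfs pins that much RAM. A minimal sketch that reports Shmem as a percentage of MemTotal; the function name `shmem_percent` is hypothetical and the usual /proc/meminfo layout is assumed.

```shell
# Print Shmem as a whole-number percentage of MemTotal (truncated).
# Reads a meminfo-format file; defaults to /proc/meminfo.
shmem_percent() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "Shmem:"    { shmem = $2 }
         END { if (total > 0) printf "%d\n", shmem * 100 / total }' "${1:-/proc/meminfo}"
}

# Example:
#   shmem_percent
#   df -h -t tmpfs     # shows which tmpfs mounts hold the data
```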
Comment 9 serianox 2016-01-19 06:19:40 UTC
Same problem here on a C720P Chromebook. It happens on several different distros, such as Arch, Ubuntu, and Xubuntu. I downgraded to the 4.1.x kernel and the issue became less frequent (it needs much more memory pressure to trigger). Then I downgraded to the 3.17 kernel and the issue was gone completely. None of the previous suggestions and workarounds worked for me; only downgrading the kernel did.
Comment 10 Tim Edwards 2016-02-09 11:23:59 UTC
Same problem here on Acer C720 Chromebook. I have 2GB of swap space on the SSD (I replaced the original 16GB M2 SSD with a 256GB version) and whenever swap is used I get this problem.

Linux localhost 4.2.0-27-generic #32-Ubuntu SMP Fri Jan 22 04:49:08 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

`echo 3 > /proc/sys/vm/drop_caches` (1 isn't enough) works around the issue for me too.
Comment 11 Anatoli Sakhnik 2016-02-09 12:49:06 UTC
I haven't suffered from the bug since I compiled the kernel myself: https://aur.archlinux.org/packages/linux-c720/ . Apparently I compiled out whatever was causing the trouble, but I didn't try to bisect for the culprit.
Comment 12 serianox 2016-02-09 19:23:59 UTC
(In reply to Anatoli Sakhnik from comment #11)
> I didn't suffer from the bug since compiled kernel myself:
> https://aur.archlinux.org/packages/linux-c720/ . Apparently, I compiled out
> something causing the trouble, but I didn't try to bisect what was the
> culprit.

This bug seems to affect 2GB models only. Do you have the 2GB or the 4GB version? What changes did you make to your kernel?
Comment 13 Anatoli Sakhnik 2016-02-09 20:22:17 UTC
Mine is 2G. I didn't change anything in the kernel source code, but switched off many options in the config file: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .

Even today, if I boot the stock Arch kernel, the bug reappears; if I boot linux-c720, kswapd0 stays quiet. In theory, I could experiment with configurations in between the stock one and mine to narrow down the issue.
Comment 15 Anatoli Sakhnik 2016-02-09 20:39:48 UTC
I have no idea yet.
Comment 16 Marzanna 2016-02-10 10:45:39 UTC
To avoid this bug I installed ChromeOS on my C720 (with 2GB RAM). I was happy with the performance until today, when I noticed lags. For some reason this bug appeared suddenly. There was no update. The kernel version is 3.8.11, the stock ChromeOS kernel.
Comment 17 serianox 2016-02-14 06:12:55 UTC
(In reply to Anatoli Sakhnik from comment #13)
> Mine is 2G. I didn't change anything in the kernel source code, but switched
> off many options in the config file:
> https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720 .
> 
> Even today, if I boot stock arch kernel, the bug regresses; if I boot
> linux-c720, kswapd0 is still. In theory, I could experiment with different
> configurations in between stock's and mine to triage the issue.

Could you please share your kernel configuration so I can try your AUR package and solve this issue once and for all? :) Thanks in advance.
Comment 18 Anatoli Sakhnik 2016-02-14 07:25:17 UTC
There it is: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-c720
Comment 19 Jonathan Stoppani 2016-02-15 12:23:55 UTC
We encounter this regularly on AWS, but only on t2.small instances, which indeed are the only ones we run which have 2GB of RAM.

We use the latest Ubuntu 15.10 AMIs as found here https://cloud-images.ubuntu.com/locator/ec2/. Please let me know if we can do anything to help track this down.
Comment 20 Tim Edwards 2016-02-21 07:26:28 UTC
The workaround suggested above (echo 3 > /proc/sys/vm/drop_caches) doesn't work consistently for me on kernel 4.2.0 (Ubuntu 15.10) on an Acer C720 Chromebook.

I've found another workaround that works well for me so far: create a file /etc/sysctl.d/60-workaround-kswapd-allcpu.conf with the following contents and reboot:
vm.min_free_kbytes=67584

The idea behind this workaround comes from a post by Kirill A. Shutemov on LKML (http://lkml.iu.edu//hypermail/linux/kernel/1601.2/03564.html) and this GalliumOS bug report: https://github.com/GalliumOS/galliumos-distro/issues/52

It would be interesting to know if this helps others.
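
A minimal sketch of that step, for anyone who wants to script it. The helper name `write_min_free_conf` is made up for illustration. `sysctl -p FILE` loads the settings from a file immediately, but whether applying the value without a reboot is enough to stop an already spinning kswapd0 is not something this sketch establishes.

```shell
# Write the vm.min_free_kbytes setting to a sysctl.d file.
# $1: value in kB
# $2: target file (optional; the default path needs root)
write_min_free_conf() {
    conf=${2:-/etc/sysctl.d/60-workaround-kswapd-allcpu.conf}
    printf 'vm.min_free_kbytes=%s\n' "$1" > "$conf"
}

# Example (as root):
#   cat /proc/sys/vm/min_free_kbytes      # current value, for comparison
#   write_min_free_conf 67584
#   sysctl -p /etc/sysctl.d/60-workaround-kswapd-allcpu.conf
```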
Comment 21 Srdjan Grubor 2016-03-04 20:23:01 UTC
Same problem here:
- No swap machine
- Wily (U15.10) - 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- 1GB RAM

- `meminfo` - Should have enough RAM to not swap though buffers do seem high

MemTotal:        1014932 kB
MemFree:          231296 kB
MemAvailable:     871180 kB
Buffers:          580684 kB
Cached:            47812 kB
SwapCached:            0 kB
Active:           547952 kB
Inactive:         164364 kB
Active(anon):      84280 kB
Inactive(anon):     4288 kB
Active(file):     463672 kB
Inactive(file):   160076 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               224 kB
Writeback:             0 kB
AnonPages:         83800 kB
Mapped:            39688 kB
Shmem:              4768 kB
Slab:              48008 kB
SReclaimable:      31172 kB
SUnreclaim:        16836 kB
KernelStack:        1936 kB
PageTables:         3844 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      507464 kB
Committed_AS:     314640 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       13524 kB
VmallocChunk:   34359717628 kB
HardwareCorrupted:     0 kB
AnonHugePages:     49152 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       53248 kB
DirectMap2M:     1126400 kB

- kernel config: https://gist.github.com/sgnn7/cbb41ce21d3a927eca27

- strace shows nothing interesting

- `perf` report:
Samples: 12K of event 'cpu-clock', Event count (approx.): 3245250000
Overhead  Command  Shared Object      Symbol
  19.34%  kswapd0  [kernel.kallsyms]  [k] shrink_lruvec
  17.04%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_iter
   8.60%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_zone_lruvec
   6.57%  kswapd0  [kernel.kallsyms]  [k] shrink_slab
   5.47%  kswapd0  [kernel.kallsyms]  [k] global_dirty_limits
   4.18%  kswapd0  [kernel.kallsyms]  [k] domain_dirty_limits
   3.71%  kswapd0  [kernel.kallsyms]  [k] mem_cgroup_get_lru_size
   3.59%  kswapd0  [kernel.kallsyms]  [k] super_cache_count
   3.27%  kswapd0  [kernel.kallsyms]  [k] get_lru_size
   3.26%  kswapd0  [kernel.kallsyms]  [k] throttle_vm_writeout
   2.20%  kswapd0  [kernel.kallsyms]  [k] css_next_descendant_pre
   2.15%  kswapd0  [kernel.kallsyms]  [k] blk_flush_plug_list
   1.96%  kswapd0  [kernel.kallsyms]  [k] shrink_zone
   1.73%  kswapd0  [kernel.kallsyms]  [k] _raw_spin_lock
   1.59%  kswapd0  [kernel.kallsyms]  [k] __list_lru_count_one.isra.2
   1.43%  kswapd0  [kernel.kallsyms]  [k] list_lru_count_one
   1.37%  kswapd0  [kernel.kallsyms]  [k] memcg_kmem_is_active
   1.27%  kswapd0  [kernel.kallsyms]  [k] __raw_callee_save___pv_queued_spin_unlock
...


I'm going to try gdb, changing swappiness, changing vm.min_free_kbytes, and reducing buffer limits in that order and report back but most likely I'll have one shot before the bug goes away for the next few days.
Comment 22 Srdjan Grubor 2016-03-04 21:39:59 UTC
Cont'd from previous post

In order of attempts on a live system:
- gdb didn't work at all since kernel wasn't built w/ debugging flags
- hotload of 10 and 0 swappiness (from 60) didn't make the kswapd process reduce cpu usage
- hotload of vm.min_free_kbytes=64K (from 4K) didn't make the process reduce cpu usage
- hotload of vm.dirty_background_ratio=5 (from 10) didn't make the process reduce cpu usage
- hotload of vm.dirty_ratio=10 (from 20) didn't make the process reduce cpu usage
- hotload of vm.dirty_background_ratio=15 (from 5) didn't make the process reduce cpu usage
- hotload of vm.dirty_ratio=25 (from 10) didn't make the process reduce cpu usage
- live swapon on a new 256MB swapfile didn't reduce process use
- live swapoff and swapon after that also didn't drop cpu usage


Sidenote: We're using Docker so I'm not sure if that is contributing to the situation.
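
When flipping several tunables on a live system like this, it may help to record the starting values first so they can be restored afterwards. A minimal sketch, with a made-up function name, that prints the tunables touched above in sysctl.conf syntax; the optional directory argument exists only so the function can be pointed at a copy of /proc/sys/vm.

```shell
# Print selected vm tunables as "vm.NAME=VALUE" lines.
# $1: directory to read from (optional; defaults to /proc/sys/vm)
snapshot_vm_tunables() {
    root=${1:-/proc/sys/vm}
    for name in swappiness min_free_kbytes dirty_background_ratio dirty_ratio; do
        if [ -r "$root/$name" ]; then
            printf 'vm.%s=%s\n' "$name" "$(cat "$root/$name")"
        fi
    done
}

# Example:
#   snapshot_vm_tunables > /root/vm-before.conf
#   ... experiment ...
#   sysctl -p /root/vm-before.conf        # restore (as root)
```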
Comment 23 cdlscpmv 2016-03-08 04:28:41 UTC
Good news! I was able to get rid of the bug completely by setting the `mem` kernel parameter to a value slightly less than physical memory. I own an Acer C720 (2GB model), and setting `mem=1920M` does the job.

The idea came to me after reading the aforementioned bug report on GitHub[1]. I hope this gives some clue to the issue.

[1]: https://github.com/GalliumOS/galliumos-distro/issues/52
Comment 24 Maxim Ivanov 2016-03-09 15:30:59 UTC
Created attachment 208411 [details]
ftrace (function_graph)
Comment 25 Maxim Ivanov 2016-03-09 15:31:36 UTC
Created attachment 208421 [details]
ftrace (vmscan tracepoints)
Comment 26 Maxim Ivanov 2016-03-09 15:32:36 UTC
Created attachment 208431 [details]
/proc/vmstat (time 0)
Comment 27 Maxim Ivanov 2016-03-09 15:33:01 UTC
Created attachment 208441 [details]
/proc/vmstat (time 5s)
Comment 28 Maxim Ivanov 2016-03-09 15:33:30 UTC
Created attachment 208451 [details]
/proc/zoneinfo
Comment 29 Maxim Ivanov 2016-03-09 15:33:47 UTC
Created attachment 208461 [details]
/proc/pagetypeinfo
Comment 30 Maxim Ivanov 2016-03-09 15:34:09 UTC
Created attachment 208471 [details]
/proc/buddyinfo
Comment 31 Maxim Ivanov 2016-03-09 15:34:45 UTC
Created attachment 208481 [details]
vmstat -m (time 0)
Comment 32 Maxim Ivanov 2016-03-09 15:35:35 UTC
Created attachment 208491 [details]
vmstat -m (time 5s)
Comment 33 Maxim Ivanov 2016-03-09 15:59:37 UTC
I am able to semi-reliably reproduce this (or a very similar?) problem on a setup very close to the one in comment #21:

- kernel: 4.2.0-30-generic (ubuntu 15.10)
- 2 GB RAM, 1 CPU, running under Xen (EC2 t2.small instance)
- docker with LVM thin-pool storage backend, running 3 containers, no memory limits set for their memcg's
- server is mostly idling (load average 0.0-0.1)

To reproduce it I have to:

1. set vm.overcommit_memory=1
2. initiate some disk activity:
     find / -xdev -type f |xargs -P10 -n1 md5sum &>/dev/null &
     find /var/lib/docker -type f |xargs -P10 -n1 md5sum &>/dev/null &

3. run some memory allocations until you hit OOM
    for x in {1..200}; do ./memalloc & : ; done

memalloc above is a simple C program which allocates 100MB and memsets it with 'x':

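#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  int block_mb = 100;
  /* Same size as the original test: 100 * 1024 * 1000 bytes (~100MB). */
  size_t len = (size_t)block_mb * 1024 * 1000;
  char *buf;

  printf("allocing %dMB: ", block_mb);
  fflush(stdout);
  buf = malloc(len);
  if (!buf) {
    printf("FAILED!\n");
    exit(EXIT_FAILURE);
  }
  printf("ok\n");
  /* Touch every page so the memory is actually committed, not just reserved. */
  memset(buf, 'x', len);
  /* Hold the memory long enough for the other instances to pile up. */
  sleep(180);
  return 0;
}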


Once you hit OOM and the console slows down, it is time to press CTRL+C, run `pkill memalloc`, and then check top. Many times `kswapd0` spins and then recovers within tens of seconds, but once in a while it stays there for hours (I didn't have the patience to check for longer).

Once I triggered the bug, I tried to get as much information as possible from the running system. I am attaching /proc/*info files (some taken 5 s apart), ftrace output from the event tracer (vmscan events only), and ftrace output from the function_graph tracer. Let me know if you need more information.


To recover, enough memory has to be freed in a short period of time. Sometimes dropping caches helps, and sometimes I also needed to close applications/containers, but I never had to reboot to recover.
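
For comparing two /proc/vmstat snapshots like the attached ones, a small sketch that prints only the counters that changed. The function name `vmstat_delta` is made up for illustration, and the counter names worth watching (for example pgscan_kswapd* and pgsteal_kswapd*) vary between kernel versions. Scan counters that keep rising while the matching steal counters stay flat would suggest that kswapd is scanning without reclaiming anything.

```shell
# Print "counter delta" for every counter whose value differs between
# two /proc/vmstat snapshots. $1: earlier snapshot, $2: later snapshot.
vmstat_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         ($1 in before) && $2 != before[$1] {
             printf "%s %d\n", $1, $2 - before[$1]
         }' "$1" "$2"
}

# Example:
#   cat /proc/vmstat > vmstat.0; sleep 5; cat /proc/vmstat > vmstat.5
#   vmstat_delta vmstat.0 vmstat.5
```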
Comment 34 Maxim Ivanov 2016-03-09 16:05:18 UTC
It would be very helpful to have output similar to the ftrace function_graph tracer but with function arguments and return values. From the look of it, though, `pgdat_balanced` for some reason keeps returning false even though /proc/zoneinfo shows that the number of free pages is much higher than any watermark.


The problem description and recovery method very closely resemble the discussion around kernel 3.7 (https://lkml.org/lkml/2012/11/28/88):

> The zonelist reclaim in kswapd would do
> nothing because all high watermarks are met, but the compaction logic
> would find its own requirements unmet and loop over the zones again.
> Indefinitely, until some third party would free enough memory to help
> meet the higher compaction watermark.
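
To eyeball the free-pages-versus-watermark comparison mentioned above, a rough sketch that condenses /proc/zoneinfo to one line per zone. It assumes the layout used by 4.x kernels ("pages free", "high" and "managed" lines inside each "Node N, zone NAME" block), and the function name `zone_summary` is illustrative. The kernel's own balance check also takes the allocation order and lowmem reserves into account, so this is only an approximation.

```shell
# Print "ZONE free=N high=N managed=N" (values in pages) for each zone.
# Reads a zoneinfo-format file; defaults to /proc/zoneinfo.
zone_summary() {
    awk 'function emit() {
             if (zone != "")
                 printf "%s free=%d high=%d managed=%d\n", zone, free, high, managed
         }
         /^Node/                      { emit(); zone = $4; free = high = managed = 0 }
         $1 == "pages" && $2 == "free" { free = $3 }
         $1 == "high"                 { high = $2 }     # per-cpu "high:" lines do not match
         $1 == "managed"              { managed = $2 }
         END                          { emit() }' "${1:-/proc/zoneinfo}"
}

# Example:
#   zone_summary
```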
Comment 35 virgosun 2016-04-30 13:38:28 UTC
(In reply to Anatoli Sakhnik from comment #4)
> My Acer C720 too suffers occasionally. Turning swap on/off doesn't help.
> Dropping caches *does* help:
> 
> # echo 3 > /proc/sys/vm/drop_caches  # 1 isn't enough
> 
> Next my guess would be to try to deactivate zswap.

The above workaround works for me on kernel 4.4.2, Debian jessie.

The bug happens randomly after heavy web browsing on kernel 4.5.
After downgrading to the stable jessie 3.16 kernel, the bug was gone.
After upgrading to 4.4.2, the bug came back.
Comment 36 mail+kernel-bugzilla 2016-07-25 18:46:14 UTC
Same thing on Thinkpad X220 with 8 GB RAM running Ubuntu 14.04, with Ubuntu's Kernel 3.16.0-77-generic.

Swap is disabled.

kswapd0 runs on high CPU and the HD light is on all the time during this (no idea why).

After 20 (!) minutes the OOM killer manages to kill a process to resolve the situation.
Comment 37 Nicholas Sherlock 2016-08-25 06:51:45 UTC
Same problem on Amazon's t2.nano instance (512MB of RAM). Seemed to be triggered by doing a bunch of file IO. This is a brand new install of Ubuntu 16.04. I have no swap enabled, and yet:

top - 06:42:57 up  1:58,  1 user,  load average: 2.43, 2.66, 2.31
Tasks: 125 total,   3 running, 122 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  6.9 sy,  0.0 ni,  0.0 id,  0.9 wa,  0.0 hi,  0.0 si, 90.1 st
KiB Mem :   498416 total,   348096 free,    49772 used,   100548 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   411900 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
   29 root      20   0       0      0      0 R 65.0  0.0 103:16.64 kswapd0
14343 root      20   0       0      0      0 R  2.9  0.0   0:00.82 python

Running "echo 1 > /proc/sys/vm/drop_caches" didn't fix the problem, but it did fix it immediately with "3".

Also, my /tmp isn't full at all (6.5GB / 85% left on root).
Comment 38 Nicholas Sherlock 2016-08-25 07:10:24 UTC
A workaround for machines running under Xen has been found over on Ubuntu's bug tracker, see comment #69:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

The workaround is to disable hot-add of memory:

touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot
Comment 39 David Kellum 2016-08-30 16:43:55 UTC
I tried the same Ubuntu-inspired "disable hot-add of memory" (and CPU) workaround under AWS EC2 HVM, CentOS 7.x with the mainline (elrepo) 4.4.15 kernel: no such luck, I still see this occasionally.
Comment 40 Dan Streetman 2016-10-01 17:51:02 UTC
I detailed why this bug happens here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126

This appears to be fixed by Mel Gorman's patch series to change memory reclaim from "per zone" to "per node":
https://marc.info/?l=linux-mm&m=146797052519026

So this bug should be fixed with the latest kernel.
Comment 41 mail+kernel-bugzilla 2016-10-02 20:18:14 UTC
(In reply to Dan Streetman from comment #40)
> I detailed why this bug happens here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
>
> So this bug should be fixed with the latest kernel.

Can you clarify? The link you mention seems to talk mainly about Xen. Do you think the latest kernel will also fix it for non-Xen machines?
Comment 42 Dan Streetman 2016-10-02 20:30:38 UTC
(In reply to mail+kernel-bugzilla from comment #41)
> (In reply to Dan Streetman from comment #40)
> > I detailed why this bug happens here:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457/comments/126
> >
> > So this bug should be fixed with the latest kernel.
> 
> Can you clarify, the link you mention seems to talk mainly about Xen. Do you
> think the latest kernel will fix it also for non-Xen machines?

What does your /proc/zoneinfo look like?  Do you have a system with (approx.) <= 4GB of RAM and a Normal zone with few managed pages?
Comment 43 mail+kernel-bugzilla 2016-10-02 20:45:13 UTC
(In reply to Dan Streetman from comment #42)
> what does your /proc/zoneinfo look like?  do you have a system with (approx)
> <= 4g and Normal zone with few managed pages?

My zoneinfo file right now looks like this: https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94

(I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment #36.)
Comment 44 Dan Streetman 2016-10-02 21:20:24 UTC
(In reply to mail+kernel-bugzilla from comment #43)
> (In reply to Dan Streetman from comment #42)
> > what does your /proc/zoneinfo look like?  do you have a system with
> (approx)
> > <= 4g and Normal zone with few managed pages?
> 
> My zoneinfo file right now looks like this:
> https://gist.github.com/nh2/7ba7375d5c8de797714f7a909e6f0c94
> 
> (I upgraded from 8 GB to 16 GB memory recently though, after I wrote comment
> #36.)

That zoneinfo doesn't look like you're seeing the same problem, so if you are seeing consistent, sustained (not just transient) 100% cpu from kswapd, I think it's a different problem from what I described in comment 40.
Comment 45 Sam Kostka 2016-10-13 21:25:14 UTC
I'm assuming by latest kernel you mean 4.8?  If so I'm looking forward to Arch pushing it through testing :)
Comment 46 phocean 2016-11-15 21:32:23 UTC
I am having the same issue on Fedora 24 with kernel 4.8.6, so I guess the fix has not been pushed there, or it does not fix anything.
It is a huge showstopper, as I need to transfer many files between two USB disks.
kswapd0 climbs to the top of the process list after a while and slowly degrades overall performance until I have to hard-reboot the machine in the middle of a transfer.
Comment 47 Sam Kostka 2016-11-15 21:36:22 UTC
My guess is Fedora didn't put the changes through or something, because 4.8 has DEFINITELY fixed it for me.  I used to have to reboot about twice daily due to this, but ever since I upgraded to 4.8 it hasn't happened once.
Comment 48 Mikalai Daronin 2016-11-20 21:49:36 UTC
I'm on openSUSE with 4.8.8 and still have this issue.
Comment 49 Bruno Jesus 2016-12-09 01:27:14 UTC
I'm on Debian with 4.8.7 and still have this issue.
Comment 50 Wilhelm Buchmüller 2017-01-06 04:39:29 UTC
4.8.13-100.fc23.i686+PAE #1
/dev/sda is Samsung SSD 850 EVO 250GB 

swapoff -va
sysctl vm.drop_caches=3

Problem (these always cause heavy kswapd0 load):
  cat /dev/sda >> /dev/zero
  hdparm -t /dev/sda
  ddrescue /dev/sda /dev/zero  -vf
  hexdump /dev/sda
  dd if=/dev/sda of=/dev/zero
  etc.

No problem (read speed ~500MB/s, except hdparm ):
  hdparm   --direct -t /dev/sda
  dd   iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
  ddrescue --direct    /dev/sda /dev/zero -vf  -b 4096 -c 8192
Comment 51 Douglas Clowes 2017-01-08 23:03:59 UTC
I am not sure if this is the same bug, but for me kswapd0 goes high-cpu following a page allocation failure in xhci_segment_alloc and I think that this has been occurring since moving to 4.8 on Fedora 24. I don't remember experiencing it before that. Currently on 4.8.15.

I normally boot with 3 or 4 USB 3.0 disks attached and, after the upgrade to 4.8.x noticed that kswapd0 was running at 100%. I went back to 4.7.x and no problem. Searches on this issue frequently referred to USB disks so I unplugged and rebooted.

If I unplug all of my USB 3.0 devices I get a normal boot, even with a USB weather station, keyboard, and mouse attached. Sometimes one or two USB 3.0 disks are OK too. If I boot with all of the USB 3.0 disks attached, I get a kworker page allocation failure, and after boot kswapd0 is at high CPU, usually split across 2-4 cores.

If I boot with two USB 3.0 disks and get a normal boot (no page allocation failure and normal kswapd) and then plug in a hub with the rest of the disks (and a USB 3.0 card reader) I get the page allocation failure at that point and kswapd0 goes high-cpu.

I have not looked at them all, but whenever I see kswapd0 high-cpu and I do look, there is the page allocation failure in the log.

The 'perf top' command seems to show different information from time to time but the top contenders are frequently 'shrink_inactive_list', 'inactive_list_is_low', 'find_next_bit', 'shrink_node_memcg', '_raw_spin_lock' to name a few.

Makes me wonder if the xhci allocation failure is the trigger, and fails to clean up on the error exit path, and kswapd0 is just a hapless victim. There is a stack trace (on an Ubuntu kernel) of the page allocation failure in the dmesg attached to https://bugzilla.redhat.com/show_bug.cgi?id=1395825 on this issue, but I have more if it would help.

I have 19GiB free on a 24GiB machine so there should be no memory shortage to prompt swapping or the page allocation failure.

I had also frequently noticed that not all of my USB disks were mounted after boot and that I had to remove and reinsert a disk to use it. IIRC this affected my USB 2.0 disks too, and it was already happening before the upgrade to 4.8.
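
A page allocation failure can occur with plenty of free memory when the request is for a high-order (physically contiguous) block and the free memory is fragmented. Whether that is what happens here is an assumption, not something established in this report. A minimal sketch for checking it, which parses /proc/buddyinfo (column i holds the number of free blocks of 2^i pages); the function names are illustrative:

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        result[(node, parts[3])] = [int(c) for c in parts[4:]]
    return result


def free_pages_at_or_above(counts, order):
    """Free pages held in blocks of at least the given order."""
    return sum(n << o for o, n in enumerate(counts) if o >= order)


def read_buddyinfo(path="/proc/buddyinfo"):
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

If `free_pages_at_or_above(counts, N)` is zero or very small for the order N reported in the "page allocation failure" line of dmesg, fragmentation is a plausible explanation even with many GiB free overall.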
Comment 52 Dan Streetman 2017-01-12 20:14:07 UTC
> Problem, causes always heavy kswapd0 load:
>  cat /dev/sda >> /dev/zero
>  hdparm -t /dev/sda
>  ddrescue /dev/sda /dev/zero  -vf
>  hexdump /dev/sda
>  dd if=/dev/sda of=/dev/zero
>  etc.

of course those cause kswapd work, all those commands will fill your page cache and kswapd is responsible for clearing those pages out.

kswapd running isn't a problem, if it's doing work.  kswapd running *without* doing work is the problem.  When you stop running those commands, does kswapd catch up and stop using cpu?  If so, that's normal.  If not, and it never stops using cpu, that's the problem.

> No problem (read speed ~500MB/s, except hdparm ):
>  hdparm   --direct -t /dev/sda
>  dd   iflag=direct if=/dev/sda of=/dev/zero bs=1073741824
>  ddrescue --direct    /dev/sda /dev/zero -vf  -b 4096 -c 8192

the difference is those commands bypass the page cache - so the page cache doesn't fill up and kswapd doesn't need to clear it out.
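
One way to tell "kswapd doing work" from "kswapd spinning" is to sample the kswapd counters in /proc/vmstat twice and compare them. A minimal sketch, assuming the relevant counters have names starting with `pgscan_kswapd` (pages scanned) and `pgsteal_kswapd` (pages reclaimed); the exact names vary between kernel versions (older kernels add per-zone suffixes), and the function names and the 5-second interval are illustrative:

```python
import time


def kswapd_counters(vmstat_text):
    """Sum the kswapd scan/steal counters found in /proc/vmstat text."""
    totals = {"pgscan_kswapd": 0, "pgsteal_kswapd": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        for prefix in totals:
            # Older kernels split these per zone, e.g. pgscan_kswapd_normal.
            if name.startswith(prefix):
                totals[prefix] += int(value)
    return totals


def kswapd_delta(before, after):
    """Counter increase between two samples."""
    return {key: after[key] - before[key] for key in before}


def sample_kswapd(interval=5.0, path="/proc/vmstat"):
    """Read the counters twice, interval seconds apart, and return the delta."""
    with open(path) as f:
        first = kswapd_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = kswapd_counters(f.read())
    return kswapd_delta(first, second)
```

Calling `print(sample_kswapd())` while kswapd0 is busy shows how many pages it scanned and reclaimed in the interval. High CPU use with `pgsteal_kswapd` staying flat would suggest it is running without making progress.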
Comment 53 Dan Streetman 2017-01-12 20:41:51 UTC
> I am not sure if this is the same bug, but for me kswapd0 goes high-cpu
> following a page allocation failure in xhci_segment_alloc and I think that
> this has been occurring since moving to 4.8 on Fedora 24

from your dmesg, it certainly doesn't look like the same bug.
Comment 54 Victor Bo 2018-12-18 17:55:52 UTC
(In reply to Dan Streetman from comment #52)

> of course those cause kswapd work, all those commands will fill your page
> cache and kswapd is responsible for clearing those pages out.
> 
> kswapd running isn't a problem, if it's doing work.  kswapd running
> *without* doing work is the problem.  When you stop running those commands,
> does kswapd catch up and stop using cpu?  If so, that's normal.  If not, and
> it never stops using cpu, that's the problem.

But why does kswapd write to storage so aggressively when there is no data to flush (no swap is set up)?
Comment 55 Roman Evstifeev 2019-01-01 12:38:18 UTC
I reproduced the bug on the most recent kernel. I have extracted sysctl, meminfo and dmesg logs: please see my comments and attachments on the same bug: https://bugzilla.kernel.org/show_bug.cgi?id=110501#c15
I also wrote a simple Python script that eats RAM and reproduces the bug 100% of the time for me.
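
The script itself is not included in this comment. Purely as an illustration of the general idea, and not the script referred to above, here is a minimal sketch that allocates anonymous memory in chunks and writes to every page so the memory becomes resident. The function name, the chunk size, and the assumed 4096-byte page size are hypothetical:

```python
def eat_ram(chunk_mib=64, max_chunks=None):
    """Allocate zero-filled chunks and touch each page; return the chunks held."""
    page = 4096  # assumed page size
    chunks = []
    try:
        while max_chunks is None or len(chunks) < max_chunks:
            buf = bytearray(chunk_mib * 1024 * 1024)
            # Write one byte per page so the kernel has to back it with real memory.
            for off in range(0, len(buf), page):
                buf[off] = 1
            chunks.append(buf)
    except MemoryError:
        pass  # stop growing, keep what was allocated so far
    return chunks
```

With `max_chunks=None` this keeps allocating until allocation fails, and on a machine without swap it may get the process killed by the OOM killer first, so set a limit when experimenting.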
Comment 56 d 2022-02-11 23:03:10 UTC
Reproduced on the latest CentOS kernel, 3.10.0-1160.53.

It's so strange that this keeps happening. I tried disabling swap and everything else, but it makes no difference. There's 100 GB of free RAM and yet it still happens.
Comment 57 John E. Harbold 2022-05-17 02:05:31 UTC
I'm working with CentOS Linux kernel version 3.10.0-1160.49.1 and I also noticed that kswapd0 runs for over 20 seconds and seems to cause a kernel panic.  In examining the kswapd() code, I see that it has an infinite loop.  It can only break from this loop if the function kthread_should_stop() returns true.  This function tests whether the KTHREAD_SHOULD_STOP bit is set in the current task's flags.  This bit will only be set if a call to to_live_kthread() returns a non-NULL pointer to the kernel thread.  If the pointer is NULL, then the KTHREAD_SHOULD_STOP bit will not be set.  This may be the problem behind this bug.

Anyone have a comment?
Comment 58 John E. Harbold 2022-06-16 01:06:32 UTC
I believe this bug has been fixed in later versions of the Linux kernel.  I tested kswapd0 by writing a C program that creates a large memory map for a file.  I then wrapped that program to launch several thousand instances of it.  My computer is running version 5.17 of the Linux kernel.  The computer bogged down, but kswapd0 did not have a problem.
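
The C program is not attached. As a rough illustration of the kind of test described, and not the actual program, here is a minimal Python sketch that maps a temporary file and writes one byte to each page of the mapping. The function name and the use of a temporary file are assumptions:

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def touch_file_mapping(size_bytes):
    """Map a temporary file of size_bytes, dirty every page, return pages touched."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size_bytes)  # extend the empty file to the requested size
        with mmap.mmap(f.fileno(), size_bytes) as m:
            for off in range(0, size_bytes, PAGE):
                m[off] = 1
            # Read the bytes back to count how many pages were written.
            touched = sum(m[off] for off in range(0, size_bytes, PAGE))
    return touched
```

Running many instances at once with a large `size_bytes` puts file-backed pages under reclaim pressure, which is the situation the test above was meant to create.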

I believe this bug should be closed.