Bug 197783

Summary:        System is using much more swap than all processes' VmSwap combined
Product:        Memory Management
Reporter:       Matt Whitlock (kernel)
Component:      Other
Assignee:       Andrew Morton (akpm)
Status:         RESOLVED INVALID
Severity:       normal
CC:             nate
Priority:       P1
Hardware:       Intel
OS:             Linux
Kernel Version: 4.13.9-gentoo
Regression:     No

Description Matt Whitlock 2017-11-06 07:12:24 UTC
This has happened to me a few times now. After about a week of uptime, my system gets into a state where it is low on MemAvailable and swap usage is high, yet most of the swap usage is unaccounted for.

My system is in this bizarre state right now. Here's some info:

# free -k
              total        used        free      shared  buff/cache   available
Mem:        8165636     7228592      229640       53004      707404      459780
Swap:       8388604     3459820     4928784

# cat /proc/meminfo
MemTotal:        8165636 kB
MemFree:          229516 kB
MemAvailable:     459668 kB
Buffers:              44 kB
Cached:           658544 kB
SwapCached:       283532 kB
Active:          1703556 kB
Inactive:         982180 kB
Active(anon):    1539520 kB
Inactive(anon):   534456 kB
Active(file):     164036 kB
Inactive(file):   447724 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:       8388604 kB
SwapFree:        4928784 kB
Dirty:                84 kB
Writeback:             0 kB
AnonPages:       2026444 kB
Mapped:           246356 kB
Shmem:             53004 kB
Slab:             171892 kB
SReclaimable:      48816 kB
SUnreclaim:       123076 kB
KernelStack:       13840 kB
PageTables:        30740 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12471420 kB
Committed_AS:   10150000 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
DirectMap4k:     7227548 kB
DirectMap2M:     1159168 kB

# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       8388604 3459820 32767

# awk '/^VmSwap:/ { total += $2 } END { print total }' /proc/*/status   
1041100

# cat /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfsd4_files          859    992    264   31    2 : tunables    0    0    0 : slabdata     32     32      0
nfsd4_lockowners      80     80    392   20    2 : tunables    0    0    0 : slabdata      4      4      0
nfsd4_openowners      72     72    432   18    2 : tunables    0    0    0 : slabdata      4      4      0
rpc_inode_cache      100    100    640   25    4 : tunables    0    0    0 : slabdata      4      4      0
ext4_groupinfo_4k     28     28    144   28    1 : tunables    0    0    0 : slabdata      1      1      0
bio-3                460    494    832   19    4 : tunables    0    0    0 : slabdata     26     26      0
ip6-frags             20     20    200   20    1 : tunables    0    0    0 : slabdata      1      1      0
RAWv6                120    120   1088   30    8 : tunables    0    0    0 : slabdata      4      4      0
UDPv6                104    104   1216   26    8 : tunables    0    0    0 : slabdata      4      4      0
tw_sock_TCPv6         85     85    240   17    1 : tunables    0    0    0 : slabdata      5      5      0
request_sock_TCPv6    104    104    312   26    2 : tunables    0    0    0 : slabdata      4      4      0
TCPv6                156    180   2176   15    8 : tunables    0    0    0 : slabdata     12     12      0
kcopyd_job             0      0   3312    9    8 : tunables    0    0    0 : slabdata      0      0      0
cfq_io_cq            144    144    112   36    1 : tunables    0    0    0 : slabdata      4      4      0
cfq_queue             68     68    240   17    1 : tunables    0    0    0 : slabdata      4      4      0
mqueue_inode_cache     72     72    896   18    4 : tunables    0    0    0 : slabdata      4      4      0
f2fs_extent_tree    3025   3304     72   56    1 : tunables    0    0    0 : slabdata     59     59      0
discard_entry          0      0     88   46    1 : tunables    0    0    0 : slabdata      0      0      0
free_nid           11390  11390     24  170    1 : tunables    0    0    0 : slabdata     67     67      0
f2fs_inode_cache     684    684    880   18    4 : tunables    0    0    0 : slabdata     38     38      0
xfs_rui_item           0      0    672   24    4 : tunables    0    0    0 : slabdata      0      0      0
xfs_rud_item           0      0    152   26    1 : tunables    0    0    0 : slabdata      0      0      0
xfs_ili             2499   8640    168   24    1 : tunables    0    0    0 : slabdata    360    360      0
xfs_inode           4201   8432    960   17    4 : tunables    0    0    0 : slabdata    496    496      0
xfs_efd_item         380    380    416   19    2 : tunables    0    0    0 : slabdata     20     20      0
xfs_buf_item         416    416    248   16    1 : tunables    0    0    0 : slabdata     26     26      0
xfs_trans            810    828    216   18    1 : tunables    0    0    0 : slabdata     46     46      0
xfs_da_state          68     68    480   17    2 : tunables    0    0    0 : slabdata      4      4      0
bio-2                687    687    320   25    2 : tunables    0    0    0 : slabdata     28     28      0
isofs_inode_cache      0      0    616   26    4 : tunables    0    0    0 : slabdata      0      0      0
fat_inode_cache        0      0    704   23    4 : tunables    0    0    0 : slabdata      0      0      0
fat_cache              0      0     40  102    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_transaction_s      0      0    256   16    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_journal_handle      0      0     48   85    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_journal_head      0      0    120   34    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_revoke_table_s      0      0     16  256    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_revoke_record_s   5632   5632     32  128    1 : tunables    0    0    0 : slabdata     44     44      0
ext4_inode_cache      64     64   1008   16    4 : tunables    0    0    0 : slabdata      4      4      0
ext4_allocation_context     64     64    128   32    1 : tunables    0    0    0 : slabdata      2      2      0
ext4_io_end         1728   1728     64   64    1 : tunables    0    0    0 : slabdata     27     27      0
ext4_extent_status   2958   2958     40  102    1 : tunables    0    0    0 : slabdata     29     29      0
mbcache                0      0     56   73    1 : tunables    0    0    0 : slabdata      0      0      0
dio                    0      0    640   25    4 : tunables    0    0    0 : slabdata      0      0      0
pid_namespace         56     56   2224   14    8 : tunables    0    0    0 : slabdata      4      4      0
posix_timers_cache    810    901    232   17    1 : tunables    0    0    0 : slabdata     53     53      0
ip4-frags           1276   1276    184   22    1 : tunables    0    0    0 : slabdata     58     58      0
ip_fib_trie       364633 412675     48   85    1 : tunables    0    0    0 : slabdata   4855   4855      0
tw_sock_TCP          119    119    240   17    1 : tunables    0    0    0 : slabdata      7      7      0
request_sock_TCP     104    104    312   26    2 : tunables    0    0    0 : slabdata      4      4      0
TCP                  298    336   1984   16    8 : tunables    0    0    0 : slabdata     21     21      0
dax_cache             21     21    768   21    4 : tunables    0    0    0 : slabdata      1      1      0
request_queue         80     80   2024   16    8 : tunables    0    0    0 : slabdata      5      5      0
blkdev_requests        0      0    296   27    2 : tunables    0    0    0 : slabdata      0      0      0
blkdev_ioc           624    624    104   39    1 : tunables    0    0    0 : slabdata     16     16      0
user_namespace        72     72    440   18    2 : tunables    0    0    0 : slabdata      4      4      0
sock_inode_cache     790    975    640   25    4 : tunables    0    0    0 : slabdata     39     39      0
skbuff_fclone_cache    310    450    448   18    2 : tunables    0    0    0 : slabdata     25     25      0
file_lock_cache     1504   1577    208   19    1 : tunables    0    0    0 : slabdata     83     83      0
fsnotify_mark_connector    680    680     24  170    1 : tunables    0    0    0 : slabdata      4      4      0
net_namespace         68     68   1920   17    8 : tunables    0    0    0 : slabdata      4      4      0
shmem_inode_cache   2902   3816    656   24    4 : tunables    0    0    0 : slabdata    159    159      0
proc_inode_cache    2350   2350    640   25    4 : tunables    0    0    0 : slabdata     94     94      0
sigqueue             100    100    160   25    1 : tunables    0    0    0 : slabdata      4      4      0
bdev_cache            76     76    832   19    4 : tunables    0    0    0 : slabdata      4      4      0
kernfs_node_cache  16048  16048    120   34    1 : tunables    0    0    0 : slabdata    472    472      0
mnt_cache           1119   1764    384   21    2 : tunables    0    0    0 : slabdata     84     84      0
inode_cache         2604   2604    568   28    4 : tunables    0    0    0 : slabdata     93     93      0
dentry             11989  24465    192   21    1 : tunables    0    0    0 : slabdata   1165   1165      0
buffer_head         9654  18486    104   39    1 : tunables    0    0    0 : slabdata    474    474      0
vm_area_struct     33416  33649    176   23    1 : tunables    0    0    0 : slabdata   1463   1463      0
mm_struct            697    697    960   17    4 : tunables    0    0    0 : slabdata     41     41      0
files_cache          414    414    704   23    4 : tunables    0    0    0 : slabdata     18     18      0
signal_cache         540    540    896   18    4 : tunables    0    0    0 : slabdata     30     30      0
sighand_cache        302    330   2112   15    8 : tunables    0    0    0 : slabdata     22     22      0
task_struct          893    924   2688   12    8 : tunables    0    0    0 : slabdata     77     77      0
Acpi-Operand        1624   1624     72   56    1 : tunables    0    0    0 : slabdata     29     29      0
Acpi-Parse          3036   3431     56   73    1 : tunables    0    0    0 : slabdata     47     47      0
Acpi-State           357    357     80   51    1 : tunables    0    0    0 : slabdata      7      7      0
Acpi-Namespace       918    918     40  102    1 : tunables    0    0    0 : slabdata      9      9      0
anon_vma           15147  15147     80   51    1 : tunables    0    0    0 : slabdata    297    297      0
radix_tree_node    36342  50708    584   28    4 : tunables    0    0    0 : slabdata   1811   1811      0
dma-kmalloc-8192       0      0   8192    4    8 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-4096       0      0   4096    8    8 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-2048       0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-1024       0      0   1024   16    4 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-512       16     16    512   16    2 : tunables    0    0    0 : slabdata      1      1      0
dma-kmalloc-256        0      0    256   16    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-128        0      0    128   32    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-64         0      0     64   64    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-32         0      0     32  128    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-16         0      0     16  256    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-8          0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-192        0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-96         0      0     96   42    1 : tunables    0    0    0 : slabdata      0      0      0
kmalloc-8192         148    148   8192    4    8 : tunables    0    0    0 : slabdata     37     37      0
kmalloc-4096         390    513   4096    8    8 : tunables    0    0    0 : slabdata     65     65      0
kmalloc-2048        1536   2544   2048   16    8 : tunables    0    0    0 : slabdata    159    159      0
kmalloc-1024         948   1056   1024   16    4 : tunables    0    0    0 : slabdata     66     66      0
kmalloc-512         2124   4144    512   16    2 : tunables    0    0    0 : slabdata    259    259      0
kmalloc-256        15714  22336    256   16    1 : tunables    0    0    0 : slabdata   1396   1396      0
kmalloc-192        17887  20601    192   21    1 : tunables    0    0    0 : slabdata    981    981      0
kmalloc-128         6719  14144    128   32    1 : tunables    0    0    0 : slabdata    442    442      0
kmalloc-96          4164   8778     96   42    1 : tunables    0    0    0 : slabdata    209    209      0
kmalloc-64        778560 784832     64   64    1 : tunables    0    0    0 : slabdata  12263  12263      0
kmalloc-32          7168   7168     32  128    1 : tunables    0    0    0 : slabdata     56     56      0
kmalloc-16          3072   3072     16  256    1 : tunables    0    0    0 : slabdata     12     12      0
kmalloc-8         781470 983552      8  512    1 : tunables    0    0    0 : slabdata   1921   1921      0
kmem_cache_node      256    256     64   64    1 : tunables    0    0    0 : slabdata      4      4      0
kmem_cache           144    144    256   16    1 : tunables    0    0    0 : slabdata      9      9      0

# cat /etc/sysctl.d/local.conf
vm.dirty_expire_centisecs = 30000
vm.extfrag_threshold = 100
vm.page-cluster = 0
vm.swappiness = 10


As you can see, 3459820 kB of swap is used, yet the sum of VmSwap for all processes in the system is only 1041100 kB. Can it really be that the *kernel* itself is using so much swap?
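
For reference, the comparison can be reproduced with something like the following (a rough sketch; as far as I understand, VmSwap only counts swap entries in a process's page tables, so swap slots backing tmpfs/shmem, or slots still held by the swap cache, may not show up in any process's total):

# awk '/^VmSwap:/ { total += $2 } END { printf "VmSwap total: %d kB\n", total }' /proc/[0-9]*/status
# awk '/^(SwapTotal|SwapFree|SwapCached|Shmem):/ { print }' /proc/meminfo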

Note that my only swap device is zram, and it has "discard" enabled:

[   19.969853] zram0: detected capacity change from 0 to 8589934592
[   22.004098] Adding 8388604k swap on /dev/zram0.  Priority:32767 extents:1 across:8388604k SSDsc
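
Since the only swap device is zram, its own counters might help cross-check how much data it is actually holding (a sketch; if I read Documentation/blockdev/zram.txt correctly, the first two mm_stat columns are the original and compressed data sizes in bytes):

# cat /sys/block/zram0/disksize
# cat /sys/block/zram0/mm_stat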

Even if I bring the system all the way down to only running init, udevd, and a bash shell (single-user mode), I still find that over 2 GB of swap is in use.

Could anyone point me to a way to find out what is using all this swap?
Comment 1 Matt Whitlock 2017-11-06 07:32:53 UTC
A possibly related observation: if I run "find / -xdev" in one terminal and "free -s1" in another terminal, I can watch the available memory steadily decline as the find spiders the file system. That's the _available_ memory, not the _free_ memory. Correct me if I'm wrong, but page cache and dentry cache shouldn't affect the _available_ memory, as cache entries are supposed to be reclaimable. My file system is XFS.
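
To see whether that decline just tracks the slab caches, something like this in a third terminal might help (a rough sketch; it samples MemAvailable alongside the slab counters once per second while the find runs):

# while sleep 1; do awk '/^MemAvailable|^SReclaimable|^SUnreclaim/ { printf "%s %s kB  ", $1, $2 } END { print "" }' /proc/meminfo; done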
Comment 2 Matt Whitlock 2017-11-07 01:03:51 UTC
And can anyone explain this? (I turned off my swap after killing enough processes for that to become possible.)

# free
              total        used        free      shared  buff/cache   available
Mem:        8165636     2749000     4902576       26232      514060     4964520
Swap:             0           0           0

# echo 3 > /proc/sys/vm/drop_caches

# free
              total        used        free      shared  buff/cache   available
Mem:        8165636      638988     7328140       26232      198508     7177208
Swap:             0           0           0

Merely dropping the caches increases available memory by more than 2 GB. Again, shouldn't reclaimable caches already be counted in available memory, so that dropping them makes little difference to that figure?

It's as though something in the page cache or dentry cache is holding on to large amounts of unaccounted-for kernel memory that is only released when the cache objects are destroyed.
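
One way to narrow that down might be to drop the two kinds of cache separately and watch the slab counters (a sketch; per Documentation/sysctl/vm.txt, writing 1 drops only the page cache and 2 drops only the reclaimable slab objects such as dentries and inodes):

# sync
# grep -E 'MemAvailable|^SReclaimable|^SUnreclaim' /proc/meminfo
# echo 1 > /proc/sys/vm/drop_caches
# grep -E 'MemAvailable|^SReclaimable|^SUnreclaim' /proc/meminfo
# echo 2 > /proc/sys/vm/drop_caches
# grep -E 'MemAvailable|^SReclaimable|^SUnreclaim' /proc/meminfo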

Any ideas?
Comment 3 Matt Whitlock 2018-01-22 16:31:05 UTC
Note: This report may be related to
https://bugs.kde.org/show_bug.cgi?id=368838#c31
Comment 4 Matt Whitlock 2018-02-16 21:38:09 UTC
I'm nearly positive that this is due to some kind of leak in plasmashell. The kernel appears to free the memory when plasmashell is terminated. I'm closing this report as invalid, at least until and unless we discover that plasmashell is running into a kernel bug.