Bug 206117

Summary: loss of responsiveness during heavy swap
Product: Memory Management Reporter: Chris Murphy (bugzilla)
Component: Page AllocatorAssignee: Andrew Morton (akpm)
Status: NEW ---    
Severity: normal CC: bugzilla, feng.tang, lilydjwg
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.4.5 Subsystem:
Regression: No Bisected commit-id:
Attachments: /proc/vmstat OK
/proc/vmstat LOST
sysrq+t while unresponsive
/proc/vmstat every 1s
journalctl -k -o short-monotonic
read_vmstat output
read_vmstat output vm.swappiness=100

Description Chris Murphy 2020-01-07 20:24:41 UTC
Summary:
Upon executing certain unprivileged commands that (apparently) requests excessive resources, the system responsiveness decays to the point a reasonable user will no longer consider it viable, i.e. reach for the power button. (This is not a new issue but is mainly a tracker for discussion on linux-mm@ about congestion.)

Reproducible:
Always.
Is not a regression, is fairly trivial to reproduce (eventually anyway).

Hardware:
various, baremetal and VM, but for this report: i7-2820QM, 8G RAM, 8G swap on plain partition Samsung 840 EVO

Reproduce steps:
1. Install Fedora 30 or 31
2. Build webkitgtk per the first two lines here:
https://trac.webkit.org/wiki/BuildingGtk


Actual result:

~5-10 minutes system becomes slow to respond, mouse pointer stutters; intermittent recovery to normal behavior, at ~15-20 minutes, mouse pointer freezes there is no update to the GUI for at least 10 minutes although 30+ minutes is common. Sometimes the oom-killer is invoked before then, but usually it isn't. But that's well beyond the reasonable loss of control of the system so this isn't an oom-killer bug; also I understand oom-killer is mainly interested in kernel survival not user space performance.

Remote ssh session is possible during this time, but even 10s top updates are only updated every 5-15 minutes (variable).

Expected result:

*shrug* I suppose the expectation is totally idealistic: that somehow unprivileged processes can't crater the responsiveness of the system.



This is in response to:
https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f

Seems similar to aspects reported in:
"System locks with kswapd0 and kworker taking full IO and mem"
https://bugzilla.kernel.org/show_bug.cgi?id=99471
Comment 1 Chris Murphy 2020-01-07 20:25:19 UTC
Created attachment 286667 [details]
/proc/vmstat OK

During the compile but system responsiveness is OK.
Comment 2 Chris Murphy 2020-01-07 20:26:16 UTC
Created attachment 286669 [details]
/proc/vmstat LOST

Compiling continues, but system responsiveness is lost (everything frozen, mouse pointer, clock).
Comment 3 Chris Murphy 2020-01-07 20:28:13 UTC
Created attachment 286671 [details]
sysrq+t while unresponsive

Compiling, GUI frozen, issued within a few seconds of attached /proc/vmstat remotely by ssh, which is sluggish (few seconds delays not minutes, so this isn't the worst it can get).
Comment 4 Chris Murphy 2020-01-08 21:01:42 UTC
Created attachment 286695 [details]
/proc/vmstat every 1s

$ while sleep 1; do cat /proc/vmstat && date +%s;done

And then run ninja. This is one of those less common instances where kernel oom-killer did trigger rather than the system getting wedged in indefinitely. But there's about an hour of data here. 

At unix time 1578512362
GUI frozen, including mouse pointer, ~10s at a time, for practical purposes the system is not usable, but perhaps a patient user would tolerate this; dmesg takes ~20s to start producing output; sysrq+t issued at monotonic time [ 5088.xxxxxx], I'll attach dmesg separately.
Comment 5 Chris Murphy 2020-01-08 21:04:28 UTC
Created attachment 286697 [details]
journalctl -k -o short-monotonic

This matches up with comment 4's attachment.
Comment 6 Chris Murphy 2020-01-10 06:12:09 UTC
Created attachment 286729 [details]
read_vmstat output

Using Michal Hocko's read_vmstat.c to record ~12 minutes while running 'ninja' to compile webkitgtk. I killed ninja ~20s before killing read_vmstat. The system is essentially unusable for 7 minutes prior to killing ninja which could only be done remotely (which took maybe 1/2 minute to enter and execute the kill command; ssh was very sluggish). During those 7 minutes, swap was not even 1/2 full, but the GUI frozen most of the time including the mouse pointer.
Comment 7 Chris Murphy 2020-01-11 23:55:43 UTC
Same sequence but with vm.swappiness=100. GUI responsiveness is unchanged. Less heat (no fans for most of this). When I ssh in, it took about a minute, and I also get this call trace which I haven't seen before.

  927.259025] rfkill: input handler disabled
[ 1950.449811] sshd: page allocation failure: order:0, mode:0x800(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
[ 1950.449818] CPU: 7 PID: 9930 Comm: sshd Not tainted 5.4.10-200.fc31.x86_64 #1
[ 1950.449819] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOS    MBP81.88Z.0050.B00.1804101331 04/10/18
[ 1950.449820] Call Trace:
[ 1950.449827]  dump_stack+0x66/0x90
[ 1950.449831]  warn_alloc.cold+0x7b/0xdf
[ 1950.449833]  __alloc_pages_slowpath+0xddd/0xe10
[ 1950.449836]  __alloc_pages_nodemask+0x2f2/0x340
[ 1950.449838]  alloc_slab_page+0x1a1/0x320
[ 1950.449840]  new_slab+0x464/0x4e0
[ 1950.449842]  ___slab_alloc+0x507/0x6a0
[ 1950.449846]  ? avc_alloc_node+0x27/0x130
[ 1950.449849]  ? context_struct_compute_av+0x382/0x4a0
[ 1950.449851]  ? avc_alloc_node+0x27/0x130
[ 1950.449852]  __slab_alloc+0x1c/0x30
[ 1950.449854]  kmem_cache_alloc+0x1ef/0x220
[ 1950.449856]  avc_alloc_node+0x27/0x130
[ 1950.449858]  avc_compute_av+0x79/0x1b0
[ 1950.449860]  avc_has_perm_noaudit+0xf4/0x130
[ 1950.449863]  security_get_user_sids+0x3b2/0x4f0
[ 1950.449866]  sel_write_user+0x154/0x240
[ 1950.449867]  ? sel_write_relabel+0x270/0x270
[ 1950.449869]  selinux_transaction_write+0x45/0x70
[ 1950.449872]  vfs_write+0xb6/0x1a0
[ 1950.449874]  ksys_write+0x5f/0xe0
[ 1950.449877]  do_syscall_64+0x5b/0x1a0
[ 1950.449880]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1950.449882] RIP: 0033:0x7f17174924b7
[ 1950.449884] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1950.449885] RSP: 002b:00007ffc85877728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1950.449887] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f17174924b7
[ 1950.449888] RDX: 0000000000000034 RSI: 000055992e7e0360 RDI: 0000000000000003
[ 1950.449889] RBP: 00007ffc858787a0 R08: 00000000ffffffff R09: 00007ffc858775b0
[ 1950.449890] R10: 000055992e7cd900 R11: 0000000000000246 R12: 0000000000000003
[ 1950.449891] R13: 00007ffc85878850 R14: 000055992e7e0360 R15: 000055992e7e0360
[ 1950.449892] Mem-Info:
[ 1950.449896] active_anon:1564525 inactive_anon:262785 isolated_anon:214
                active_file:21887 inactive_file:19827 isolated_file:0
                unevictable:1896 dirty:85 writeback:0 unstable:0
                slab_reclaimable:21884 slab_unreclaimable:57453
                mapped:30442 shmem:13676 pagetables:11469 bounce:0
                free:30331 free_pcp:1063 free_cma:0
[ 1950.449899] Node 0 active_anon:6258100kB inactive_anon:1051140kB active_file:87548kB inactive_file:79308kB unevictable:7584kB isolated(anon):856kB isolated(file):0kB mapped:121768kB dirty:340kB writeback:0kB shmem:54704kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1950.449900] Node 0 DMA free:15344kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1950.449903] lowmem_reserve[]: 0 2110 7814 7814 7814
[ 1950.449905] Node 0 DMA32 free:40504kB min:18216kB low:22768kB high:27320kB active_anon:1776156kB inactive_anon:311760kB active_file:17080kB inactive_file:18484kB unevictable:2828kB writepending:0kB present:2255864kB managed:2190176kB mlocked:0kB kernel_stack:292kB pagetables:6884kB bounce:0kB free_pcp:596kB local_pcp:4kB free_cma:0kB
[ 1950.449908] lowmem_reserve[]: 0 0 5703 5703 5703
[ 1950.449910] Node 0 Normal free:65476kB min:65620kB low:77928kB high:90236kB active_anon:4481988kB inactive_anon:740100kB active_file:69784kB inactive_file:61516kB unevictable:4756kB writepending:640kB present:6027264kB managed:5848656kB mlocked:0kB kernel_stack:14668kB pagetables:38992kB bounce:0kB free_pcp:3656kB local_pcp:404kB free_cma:0kB
[ 1950.449913] lowmem_reserve[]: 0 0 0 0 0
[ 1950.449914] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15344kB
[ 1950.449920] Node 0 DMA32: 395*4kB (UME) 390*8kB (UME) 502*16kB (UME) 218*32kB (UME) 148*64kB (UME) 51*128kB (UME) 8*256kB (UME) 5*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 41340kB
[ 1950.449925] Node 0 Normal: 1287*4kB (UMEH) 2659*8kB (UMEH) 1499*16kB (UMEH) 394*32kB (UME) 16*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64036kB
[ 1950.449931] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1950.449931] 127863 total pagecache pages
[ 1950.449934] 72589 pages in swap cache
[ 1950.449935] Swap cache stats: add 7841412, delete 7772527, find 1238474/3508218
[ 1950.449936] Free swap  = 2017800kB
[ 1950.449937] Total swap = 8391840kB
[ 1950.449938] 2074779 pages RAM
[ 1950.449938] 0 pages HighMem/MovableOnly
[ 1950.449939] 61231 pages reserved
[ 1950.449939] 0 pages cma reserved
[ 1950.449940] 0 pages hwpoisoned
[ 1950.449942] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.449943]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.449944]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.449994] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.449996]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.449997]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450011] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450012]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450013]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450027] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450028]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450029]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450047] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450048]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450049]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450063] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450064]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450065]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450078] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450079]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450079]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450121] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450122]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450123]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450140] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450141]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450142]   node 0: slabs: 213, objs: 11928, free: 0
[ 1950.450157] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
[ 1950.450158]   cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0
[ 1950.450159]   node 0: slabs: 213, objs: 11928, free: 0
[chris@fmac ~]$
Comment 8 Chris Murphy 2020-01-11 23:58:53 UTC
Created attachment 286767 [details]
read_vmstat output vm.swappiness=100

read_vmstate capture during compile, with vm.swappiness=100
~12 minutes