Summary: Upon executing certain unprivileged commands that (apparently) requests excessive resources, the system responsiveness decays to the point a reasonable user will no longer consider it viable, i.e. reach for the power button. (This is not a new issue but is mainly a tracker for discussion on linux-mm@ about congestion.) Reproducible: Always. Is not a regression, is fairly trivial to reproduce (eventually anyway). Hardware: various, baremetal and VM, but for this report: i7-2820QM, 8G RAM, 8G swap on plain partition Samsung 840 EVO Reproduce steps: 1. Install Fedora 30 or 31 2. Build webkitgtk per the first two lines here: https://trac.webkit.org/wiki/BuildingGtk Actual result: ~5-10 minutes system becomes slow to respond, mouse pointer stutters; intermittent recovery to normal behavior, at ~15-20 minutes, mouse pointer freezes there is no update to the GUI for at least 10 minutes although 30+ minutes is common. Sometimes the oom-killer is invoked before then, but usually it isn't. But that's well beyond the reasonable loss of control of the system so this isn't an oom-killer bug; also I understand oom-killer is mainly interested in kernel survival not user space performance. Remote ssh session is possible during this time, but even 10s top updates are only updated every 5-15 minutes (variable). Expected result: *shrug* I suppose the expectation is totally idealistic: that somehow unprivileged processes can't crater the responsiveness of the system. This is in response to: https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f Seems similar to aspects reported in: "System locks with kswapd0 and kworker taking full IO and mem" https://bugzilla.kernel.org/show_bug.cgi?id=99471
Created attachment 286667 [details] /proc/vmstat OK During the compile but system responsiveness is OK.
Created attachment 286669 [details] /proc/vmstat LOST Compiling continues, but system responsiveness is lost (everything frozen, mouse pointer, clock).
Created attachment 286671 [details] sysrq+t while unresponsive Compiling, GUI frozen, issued within a few seconds of attached /proc/vmstat remotely by ssh, which is sluggish (few seconds delays not minutes, so this isn't the worst it can get).
Created attachment 286695 [details] /proc/vmstat every 1s $ while sleep 1; do cat /proc/vmstat && date +%s;done And then run ninja. This is one of those less common instances where kernel oom-killer did trigger rather than the system getting wedged in indefinitely. But there's about an hour of data here. At unix time 1578512362 GUI frozen, including mouse pointer, ~10s at a time, for practical purposes the system is not usable, but perhaps a patient user would tolerate this; dmesg takes ~20s to start producing output; sysrq+t issued at monotonic time [ 5088.xxxxxx], I'll attach dmesg separately.
Created attachment 286697 [details] journalctl -k -o short-monotonic This matches up with comment 4's attachment.
Created attachment 286729 [details] read_vmstat output Using Michal Hocko's read_vmstat.c to record ~12 minutes while running 'ninja' to compile webkitgtk. I killed ninja ~20s before killing read_vmstat. The system is essentially unusable for 7 minutes prior to killing ninja which could only be done remotely (which took maybe 1/2 minute to enter and execute the kill command; ssh was very sluggish). During those 7 minutes, swap was not even 1/2 full, but the GUI frozen most of the time including the mouse pointer.
Same sequence but with vm.swappiness=100. GUI responsiveness is unchanged. Less heat (no fans for most of this). When I ssh in, it took about a minute, and I also get this call trace which I haven't seen before. 927.259025] rfkill: input handler disabled [ 1950.449811] sshd: page allocation failure: order:0, mode:0x800(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0 [ 1950.449818] CPU: 7 PID: 9930 Comm: sshd Not tainted 5.4.10-200.fc31.x86_64 #1 [ 1950.449819] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOS MBP81.88Z.0050.B00.1804101331 04/10/18 [ 1950.449820] Call Trace: [ 1950.449827] dump_stack+0x66/0x90 [ 1950.449831] warn_alloc.cold+0x7b/0xdf [ 1950.449833] __alloc_pages_slowpath+0xddd/0xe10 [ 1950.449836] __alloc_pages_nodemask+0x2f2/0x340 [ 1950.449838] alloc_slab_page+0x1a1/0x320 [ 1950.449840] new_slab+0x464/0x4e0 [ 1950.449842] ___slab_alloc+0x507/0x6a0 [ 1950.449846] ? avc_alloc_node+0x27/0x130 [ 1950.449849] ? context_struct_compute_av+0x382/0x4a0 [ 1950.449851] ? avc_alloc_node+0x27/0x130 [ 1950.449852] __slab_alloc+0x1c/0x30 [ 1950.449854] kmem_cache_alloc+0x1ef/0x220 [ 1950.449856] avc_alloc_node+0x27/0x130 [ 1950.449858] avc_compute_av+0x79/0x1b0 [ 1950.449860] avc_has_perm_noaudit+0xf4/0x130 [ 1950.449863] security_get_user_sids+0x3b2/0x4f0 [ 1950.449866] sel_write_user+0x154/0x240 [ 1950.449867] ? sel_write_relabel+0x270/0x270 [ 1950.449869] selinux_transaction_write+0x45/0x70 [ 1950.449872] vfs_write+0xb6/0x1a0 [ 1950.449874] ksys_write+0x5f/0xe0 [ 1950.449877] do_syscall_64+0x5b/0x1a0 [ 1950.449880] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1950.449882] RIP: 0033:0x7f17174924b7 [ 1950.449884] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 1950.449885] RSP: 002b:00007ffc85877728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 1950.449887] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f17174924b7 [ 1950.449888] RDX: 0000000000000034 RSI: 000055992e7e0360 RDI: 0000000000000003 [ 1950.449889] RBP: 00007ffc858787a0 R08: 00000000ffffffff R09: 00007ffc858775b0 [ 1950.449890] R10: 000055992e7cd900 R11: 0000000000000246 R12: 0000000000000003 [ 1950.449891] R13: 00007ffc85878850 R14: 000055992e7e0360 R15: 000055992e7e0360 [ 1950.449892] Mem-Info: [ 1950.449896] active_anon:1564525 inactive_anon:262785 isolated_anon:214 active_file:21887 inactive_file:19827 isolated_file:0 unevictable:1896 dirty:85 writeback:0 unstable:0 slab_reclaimable:21884 slab_unreclaimable:57453 mapped:30442 shmem:13676 pagetables:11469 bounce:0 free:30331 free_pcp:1063 free_cma:0 [ 1950.449899] Node 0 active_anon:6258100kB inactive_anon:1051140kB active_file:87548kB inactive_file:79308kB unevictable:7584kB isolated(anon):856kB isolated(file):0kB mapped:121768kB dirty:340kB writeback:0kB shmem:54704kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 1950.449900] Node 0 DMA free:15344kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 1950.449903] lowmem_reserve[]: 0 2110 7814 7814 7814 [ 1950.449905] Node 0 DMA32 free:40504kB min:18216kB low:22768kB high:27320kB active_anon:1776156kB inactive_anon:311760kB active_file:17080kB inactive_file:18484kB unevictable:2828kB writepending:0kB present:2255864kB managed:2190176kB mlocked:0kB kernel_stack:292kB pagetables:6884kB bounce:0kB free_pcp:596kB local_pcp:4kB free_cma:0kB [ 1950.449908] lowmem_reserve[]: 0 0 5703 5703 5703 [ 1950.449910] Node 0 Normal free:65476kB min:65620kB low:77928kB high:90236kB active_anon:4481988kB inactive_anon:740100kB active_file:69784kB inactive_file:61516kB unevictable:4756kB writepending:640kB present:6027264kB managed:5848656kB mlocked:0kB kernel_stack:14668kB pagetables:38992kB bounce:0kB free_pcp:3656kB local_pcp:404kB free_cma:0kB [ 1950.449913] lowmem_reserve[]: 0 0 0 0 0 [ 1950.449914] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 15344kB [ 1950.449920] Node 0 DMA32: 395*4kB (UME) 390*8kB (UME) 502*16kB (UME) 218*32kB (UME) 148*64kB (UME) 51*128kB (UME) 8*256kB (UME) 5*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 41340kB [ 1950.449925] Node 0 Normal: 1287*4kB (UMEH) 2659*8kB (UMEH) 1499*16kB (UMEH) 394*32kB (UME) 16*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64036kB [ 1950.449931] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1950.449931] 127863 total pagecache pages [ 1950.449934] 72589 pages in swap cache [ 1950.449935] Swap cache stats: add 7841412, delete 7772527, find 1238474/3508218 [ 1950.449936] Free swap = 2017800kB [ 1950.449937] Total swap = 8391840kB [ 1950.449938] 2074779 pages RAM [ 1950.449938] 0 pages HighMem/MovableOnly [ 1950.449939] 61231 pages reserved [ 1950.449939] 0 pages cma reserved [ 1950.449940] 0 pages hwpoisoned [ 1950.449942] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.449943] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.449944] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.449994] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.449996] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.449997] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450011] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450012] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450013] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450027] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450028] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450029] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450047] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450048] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450049] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450063] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450064] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450065] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450078] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450079] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450079] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450121] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450122] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450123] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450140] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450141] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450142] node 0: slabs: 213, objs: 11928, free: 0 [ 1950.450157] SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO) [ 1950.450158] cache: Acpi-Operand, object size: 72, buffer size: 72, default order: 0, min order: 0 [ 1950.450159] node 0: slabs: 213, objs: 11928, free: 0 [chris@fmac ~]$
Created attachment 286767 [details] read_vmstat output vm.swappiness=100 read_vmstate capture during compile, with vm.swappiness=100 ~12 minutes