After upgrading our systems from 5.3.x to 5.4.51 we started experiencing problems where some processes would enter an uninterruptible sleep state. We do not have a swap enabled or a swap mount point. We have tried updating to 5.7.10 and still experience the same issue. Downgrading to 5.3.x fixes the problem. Looking through dmesg we found the following error message: [819876.687024] ------------[ cut here ]------------ [819876.687038] WARNING: CPU: 22 PID: 434 at kernel/rcu/tree.c:1358 rcu_advance_cbs_nowake+0x53/0x60 [819876.687038] Modules linked in: binfmt_misc(E) intel_rapl_msr(E) dell_smbios(E) wmi_bmof(E) dell_wmi_descriptor(E) sr_mod(E) cdrom(E) dcdbas(E) xt_comment(E) xt_multiport(E) ip6t_rpfilter(E) ip6t_RE JECT(E) nf_reject_ipv6(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_conntrack(E) ebtable_nat(E) ebtable_broute(E) ip6table_nat(E) ip6table_mangle(E) ip6table_security(E) ip6table_raw(E) iptable_nat(E) nf_nat( E) iptable_mangle(E) iptable_security(E) iptable_raw(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable _filter(E) intel_rapl_common(E) vfat(E) fat(E) skx_edac(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clm ulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) wdat_wdt(E) pcspkr(E) sg(E) i2c_i801(E) lpc_ich(E) mei_me(E) mei(E) ioatdma(E) wmi(E) ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) [819876.687069] acpi_power_meter(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) crc32c_intel(E) megaraid_sas(E) mgag200(E) i40e(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fo ps(E) igb(E) ptp(E) pps_core(E) drm_vram_helper(E) dca(E) ttm(E) ahci(E) libahci(E) drm(E) i2c_algo_bit(E) libata(E) uas(E) usb_storage(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [819876.687087] CPU: 22 PID: 434 Comm: kswapd0 Tainted: G E 5.4.51-1.el7.x86_64 #1 [819876.687087] Hardware name: Dell Inc. PowerEdge R740xd/0YNX56, BIOS 2.2.11 06/13/2019 [819876.687090] RIP: 0010:rcu_advance_cbs_nowake+0x53/0x60 [819876.687092] Code: 85 c0 74 f1 48 89 ee 48 89 df e8 68 ff ff ff 84 c0 75 17 48 89 df c6 07 00 0f 1f 40 00 5b 5d c3 48 89 f7 e8 1f df ff ff eb c3 <0f> 0b eb e5 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 48 89 [819876.687093] RSP: 0018:ffffaaf58d713a40 EFLAGS: 00010002 [819876.687094] RAX: 0000000000000001 RBX: ffffffff9a2655c0 RCX: 0000000008326518 [819876.687095] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff9a265140 [819876.687096] RBP: ffff9547a0eeb2c0 R08: ffff95317eefb920 R09: 0000000130dae188 [819876.687096] R10: 0000000000000000 R11: 0000000000000001 R12: ffff953c2a0c1da8 [819876.687097] R13: 000000000002b2c0 R14: ffffaaf58d713b48 R15: ffff95317eefb6d8 [819876.687099] FS: 0000000000000000(0000) GS:ffff9547a0ec0000(0000) knlGS:0000000000000000 [819876.687100] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [819876.687101] CR2: 00007fc07fe2e300 CR3: 00000005ffe0a001 CR4: 00000000007606e0 [819876.687101] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [819876.687102] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [819876.687103] PKRU: 55555554 [819876.687103] Call Trace: [819876.687114] ? radix_tree_node_ctor+0x90/0x90 [819876.687115] __call_rcu+0x2aa/0x4e0 [819876.687120] xas_free_nodes+0x8b/0xb0 [819876.687122] xas_store+0x1a4/0x550 [819876.687130] shadow_lru_isolate+0xed/0x170 [819876.687132] ? scan_shadow_nodes+0x30/0x30 [819876.687134] ? workingset_update_node+0x70/0x70 [819876.687136] __list_lru_walk_one+0x6d/0x170 [819876.687138] ? workingset_update_node+0x70/0x70 [819876.687140] list_lru_walk_one_irq+0x4c/0x70 [819876.687145] do_shrink_slab+0x150/0x310 [819876.687148] shrink_slab+0x239/0x2d0 [819876.687151] shrink_node+0xd5/0x450 [819876.687154] balance_pgdat+0x275/0x560 [819876.687157] kswapd+0x15a/0x3d0 [819876.687160] ? remove_wait_queue+0x60/0x60 [819876.687166] kthread+0xf8/0x130 [819876.687169] ? balance_pgdat+0x560/0x560 [819876.687170] ? kthread_bind+0x10/0x10 [819876.687173] ret_from_fork+0x35/0x40 [819876.687175] ---[ end trace ff791da2812c6a3e ]--- After that message we see our process that is in D state: [820150.702270] INFO: task python:11885 blocked for more than 122 seconds. [820150.702315] Tainted: G W E 5.4.51-1.el7.x86_64 #1 [820150.702348] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [820150.702387] python D 0 11885 11872 0x00000080 [820150.702390] Call Trace: [820150.702401] __schedule+0x2d1/0x6c0 [820150.702404] schedule+0x39/0xa0 [820150.702406] schedule_timeout+0x1c8/0x290 [820150.702413] ? rcu_accelerate_cbs+0x61/0x180 [820150.702415] wait_for_completion+0x123/0x190 [820150.702420] ? wake_up_q+0x70/0x70 [820150.702422] __wait_rcu_gp+0xfd/0x130 [820150.702425] synchronize_rcu+0x4e/0x80 [820150.702427] ? __call_rcu+0x4e0/0x4e0 [820150.702428] ? __bpf_trace_rcu_utilization+0x10/0x10 [820150.702432] packet_release+0x247/0x380 [820150.702437] __sock_release+0x3d/0xc0 [820150.702439] sock_close+0x11/0x20 [820150.702442] __fput+0xbe/0x250 [820150.702446] task_work_run+0x88/0xa0 [820150.702452] exit_to_usermode_loop+0x77/0xc7 [820150.702454] do_syscall_64+0x182/0x1b0 [820150.702457] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [820150.702459] RIP: 0033:0x7fd85ca297a0 [820150.702464] Code: Bad RIP value. [820150.702465] RSP: 002b:00007ffd05cc66b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [820150.702466] RAX: 0000000000000000 RBX: 00007fd85d0cba50 RCX: 00007fd85ca297a0 [820150.702467] RDX: 0000000000000000 RSI: 00007fd85d06a210 RDI: 0000000000000003 [820150.702468] RBP: 0000000000000007 R08: 0000000000000000 R09: 00007fd85d0cc1e7 [820150.702469] R10: 0000000000000083 R11: 0000000000000246 R12: 0000000000000007 [820150.702470] R13: 00007fd85d06a210 R14: 00007fd85ccd4590 R15: 00000000022750a0
https://lkml.org/lkml/2021/9/17/1048