Starting with some 6.5 kernels, a server began experiencing problems with its root ext4 filesystem: processes that access it end up hanging. Here are the relevant messages from dmesg:

[242602.644812] INFO: task jbd2/nvme0n1p2-:127 blocked for more than 122 seconds.
[242602.644817] Tainted: G T 6.5.7-custom #1
[242602.644818] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[242602.644819] task:jbd2/nvme0n1p2- state:D stack:0 pid:127 ppid:2 flags:0x00004000
[242602.644822] Call Trace:
[242602.644824] <TASK>
[242602.644825] __schedule+0x280/0xff0
[242602.644831] ? page_mkclean_one+0x90/0xd0
[242602.644834] schedule+0x52/0xa0
[242602.644836] io_schedule+0x3e/0x70
[242602.644838] folio_wait_bit_common+0x13f/0x310
[242602.644842] ? __pfx_wake_page_function+0x10/0x10
[242602.644844] write_cache_pages+0x12b/0x370
[242602.644847] ? __pfx_ext4_journalled_writepage_callback+0x10/0x10
[242602.644850] ext4_journalled_submit_inode_data_buffers+0x73/0xa0
[242602.644853] jbd2_journal_commit_transaction+0x428/0x16e0
[242602.644856] ? update_load_avg+0x71/0x680
[242602.644859] ? lock_timer_base+0x5c/0x80
[242602.644862] kjournald2+0xa7/0x270
[242602.644865] ? __pfx_autoremove_wake_function+0x10/0x10
[242602.644868] ? __pfx_kjournald2+0x10/0x10
[242602.644870] kthread+0xce/0x100
[242602.644873] ? __pfx_kthread+0x10/0x10
[242602.644875] ret_from_fork+0x2f/0x50
[242602.644877] ? __pfx_kthread+0x10/0x10
[242602.644879] ret_from_fork_asm+0x1b/0x30
[242602.644881] </TASK>
[242602.644901] INFO: task vhost-3838:3850 blocked for more than 122 seconds.
[242602.644902] Tainted: G T 6.5.7-custom #1
[242602.644903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[242602.644903] task:vhost-3838 state:D stack:0 pid:3850 ppid:1 flags:0x00004000
[242602.644905] Call Trace:
[242602.644906] <TASK>
[242602.644906] __schedule+0x280/0xff0
[242602.644908] schedule+0x52/0xa0
[242602.644910] jbd2_log_wait_commit+0xd7/0x150
[242602.644912] ?
__pfx_autoremove_wake_function+0x10/0x10 [242602.644914] jbd2_log_do_checkpoint+0x22f/0x2b0 [242602.644916] __jbd2_log_wait_for_space+0x4e/0x1f0 [242602.644918] add_transaction_credits+0x2e6/0x2f0 [242602.644920] start_this_handle+0xfc/0x5b0 [242602.644922] ? kmem_cache_alloc+0x15d/0x2a0 [242602.644924] jbd2__journal_start+0x125/0x1b0 [242602.644926] ext4_dirty_inode+0x36/0x90 [242602.644928] __mark_inode_dirty+0x4e/0x220 [242602.644931] generic_update_time+0x80/0xd0 [242602.644934] file_update_time+0xc7/0xe0 [242602.644937] ext4_page_mkwrite+0x93/0x550 [242602.644938] do_page_mkwrite+0x52/0xe0 [242602.644941] do_wp_page+0xf3/0xcc0 [242602.644944] ? balance_dirty_pages_ratelimited_flags+0x46/0x340 [242602.644948] __handle_mm_fault+0x2e1/0x2f0 [242602.644951] handle_mm_fault+0x106/0x300 [242602.644953] exc_page_fault+0x1f4/0x560 [242602.644956] asm_exc_page_fault+0x26/0x30 [242602.644960] RIP: 0010:rep_movs_alternative+0x33/0x70 [242602.644964] Code: 40 83 f9 08 73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 48 8b 06 <48> 89 07 48 83 c6 08 48 83 c7 08 83 e9 08 74 df 83 f9 08 73 e8 eb [242602.644965] RSP: 0018:ffffb850805abbe8 EFLAGS: 00010202 [242602.644967] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000a [242602.644968] RDX: 0000000000000000 RSI: ffffb850805abc98 RDI: 00007f16b2de4a00 [242602.644969] RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000001 [242602.644969] R10: ffff961e8dcb0000 R11: 0000000000000000 R12: ffffb850805abe08 [242602.644970] R13: 0000000000000000 R14: ffffb850805abc98 R15: ffff961e8dcb0218 [242602.644971] copyout+0x20/0x40 [242602.644975] _copy_to_iter+0xdc/0x430 [242602.644977] ? translate_desc+0x74/0x160 [242602.644980] tun_do_read+0x282/0x750 [242602.644983] tun_recvmsg+0x6d/0x170 [242602.644985] handle_rx+0x5eb/0xa60 [242602.644987] vhost_worker+0x44/0x70 [242602.644989] vhost_task_fn+0x53/0xc0 [242602.645001] ? 
__pfx_vhost_task_fn+0x10/0x10 [242602.645003] ret_from_fork+0x2f/0x50 [242602.645005] ? __pfx_vhost_task_fn+0x10/0x10 [242602.645007] ret_from_fork_asm+0x1b/0x30 [242602.645008] </TASK> [242602.645009] INFO: task CPU 0/KVM:3852 blocked for more than 122 seconds. [242602.645010] Tainted: G T 6.5.7-custom #1 [242602.645010] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [242602.645011] task:CPU 0/KVM state:D stack:0 pid:3852 ppid:1 flags:0x00000000 [242602.645013] Call Trace: [242602.645014] <TASK> [242602.645014] __schedule+0x280/0xff0 [242602.645016] schedule+0x52/0xa0 [242602.645018] jbd2_log_wait_commit+0xd7/0x150 [242602.645019] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645021] jbd2_log_do_checkpoint+0x22f/0x2b0 [242602.645023] __jbd2_log_wait_for_space+0x4e/0x1f0 [242602.645025] add_transaction_credits+0x2e6/0x2f0 [242602.645026] start_this_handle+0xfc/0x5b0 [242602.645028] ? kmem_cache_alloc+0x15d/0x2a0 [242602.645029] jbd2__journal_start+0x125/0x1b0 [242602.645031] ext4_dirty_inode+0x36/0x90 [242602.645032] __mark_inode_dirty+0x4e/0x220 [242602.645034] generic_update_time+0x80/0xd0 [242602.645036] file_update_time+0xc7/0xe0 [242602.645038] ext4_page_mkwrite+0x93/0x550 [242602.645039] do_page_mkwrite+0x52/0xe0 [242602.645041] do_wp_page+0xf3/0xcc0 [242602.645043] __handle_mm_fault+0x2e1/0x2f0 [242602.645045] handle_mm_fault+0x106/0x300 [242602.645047] __get_user_pages+0x1f5/0x360 [242602.645049] get_user_pages_unlocked+0xcd/0x280 [242602.645051] hva_to_pfn+0xe9/0x420 [242602.645053] kvm_faultin_pfn+0xa5/0x3d0 [242602.645056] ? vmx_vcpu_load+0x1c/0x50 [242602.645058] ? kvm_arch_vcpu_load+0x65/0x210 [242602.645061] kvm_tdp_page_fault+0x127/0x180 [242602.645063] kvm_mmu_page_fault+0x295/0x760 [242602.645064] ? __pfx_emulator_write_gpr+0x10/0x10 [242602.645067] ? kvm_cpu_has_interrupt+0x5f/0x80 [242602.645070] ? 
__check_object_size+0x53/0x2d0 [242602.645072] vmx_handle_exit+0x12b/0x770 [242602.645074] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [242602.645076] kvm_vcpu_ioctl+0x1a3/0x6d0 [242602.645079] __x64_sys_ioctl+0x530/0xa70 [242602.645082] do_syscall_64+0x60/0x90 [242602.645083] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645086] RIP: 0033:0x7f172e60df9f [242602.645087] RSP: 002b:00007f1627ffd8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [242602.645089] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f172e60df9f [242602.645090] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000018 [242602.645090] RBP: 000055ef39425770 R08: 000055ef38594ea0 R09: 00007f16183be330 [242602.645091] R10: 072d79a2806b9d79 R11: 0000000000000246 R12: 0000000000000000 [242602.645092] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [242602.645093] </TASK> [242602.645093] INFO: task CPU 1/KVM:3853 blocked for more than 122 seconds. [242602.645094] Tainted: G T 6.5.7-custom #1 [242602.645095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [242602.645095] task:CPU 1/KVM state:D stack:0 pid:3853 ppid:1 flags:0x00004000 [242602.645097] Call Trace: [242602.645098] <TASK> [242602.645098] __schedule+0x280/0xff0 [242602.645100] schedule+0x52/0xa0 [242602.645102] jbd2_log_wait_commit+0xd7/0x150 [242602.645103] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645105] jbd2_log_do_checkpoint+0x22f/0x2b0 [242602.645107] __jbd2_log_wait_for_space+0x4e/0x1f0 [242602.645109] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645111] add_transaction_credits+0x2e6/0x2f0 [242602.645112] ? __wake_up_common_lock+0x8a/0xd0 [242602.645114] start_this_handle+0xfc/0x5b0 [242602.645116] ? 
kmem_cache_alloc+0x15d/0x2a0 [242602.645117] jbd2__journal_start+0x125/0x1b0 [242602.645119] ext4_page_mkwrite+0x346/0x550 [242602.645120] do_page_mkwrite+0x52/0xe0 [242602.645122] do_wp_page+0xf3/0xcc0 [242602.645124] __handle_mm_fault+0x2e1/0x2f0 [242602.645126] handle_mm_fault+0x106/0x300 [242602.645128] __get_user_pages+0x1f5/0x360 [242602.645130] get_user_pages_unlocked+0xcd/0x280 [242602.645131] hva_to_pfn+0xe9/0x420 [242602.645132] kvm_faultin_pfn+0xa5/0x3d0 [242602.645134] kvm_tdp_page_fault+0x127/0x180 [242602.645135] kvm_mmu_page_fault+0x295/0x760 [242602.645137] ? vmx_vmexit+0x7d/0xd0 [242602.645138] ? vmx_vmexit+0x77/0xd0 [242602.645140] ? vmx_vmexit+0x9e/0xd0 [242602.645141] vmx_handle_exit+0x12b/0x770 [242602.645143] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [242602.645144] kvm_vcpu_ioctl+0x1a3/0x6d0 [242602.645146] __x64_sys_ioctl+0x530/0xa70 [242602.645148] do_syscall_64+0x60/0x90 [242602.645149] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645151] RIP: 0033:0x7f172e60df9f [242602.645152] RSP: 002b:00007f16277fc8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [242602.645153] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f172e60df9f [242602.645154] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019 [242602.645154] RBP: 000055ef39456e40 R08: 000055ef38594ea0 R09: 0000000000000000 [242602.645155] R10: 000000000000000a R11: 0000000000000246 R12: 0000000000000000 [242602.645156] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [242602.645156] </TASK> [242602.645157] INFO: task CPU 2/KVM:3854 blocked for more than 122 seconds. [242602.645158] Tainted: G T 6.5.7-custom #1 [242602.645158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[242602.645159] task:CPU 2/KVM state:D stack:0 pid:3854 ppid:1 flags:0x00004000 [242602.645160] Call Trace: [242602.645161] <TASK> [242602.645161] __schedule+0x280/0xff0 [242602.645163] schedule+0x52/0xa0 [242602.645164] jbd2_log_wait_commit+0xd7/0x150 [242602.645166] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645168] jbd2_log_do_checkpoint+0x22f/0x2b0 [242602.645170] __jbd2_log_wait_for_space+0x4e/0x1f0 [242602.645171] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645173] add_transaction_credits+0x2e6/0x2f0 [242602.645175] ? __jbd2_journal_file_buffer+0x6c/0x1f0 [242602.645176] start_this_handle+0xfc/0x5b0 [242602.645178] ? kmem_cache_alloc+0x15d/0x2a0 [242602.645179] jbd2__journal_start+0x125/0x1b0 [242602.645181] ext4_page_mkwrite+0x346/0x550 [242602.645182] do_page_mkwrite+0x52/0xe0 [242602.645184] do_wp_page+0xf3/0xcc0 [242602.645186] ? asm_sysvec_call_function_single+0x1a/0x20 [242602.645188] __handle_mm_fault+0x2e1/0x2f0 [242602.645190] handle_mm_fault+0x106/0x300 [242602.645192] __get_user_pages+0x1f5/0x360 [242602.645194] get_user_pages_unlocked+0xcd/0x280 [242602.645195] hva_to_pfn+0xe9/0x420 [242602.645196] kvm_faultin_pfn+0xa5/0x3d0 [242602.645198] kvm_tdp_page_fault+0x127/0x180 [242602.645199] kvm_mmu_page_fault+0x295/0x760 [242602.645201] ? sysvec_call_function_single+0xe/0x90 [242602.645203] ? vmx_vmexit+0x7d/0xd0 [242602.645204] ? vmx_vmexit+0x77/0xd0 [242602.645206] ? 
vmx_vmexit+0x9e/0xd0 [242602.645207] vmx_handle_exit+0x12b/0x770 [242602.645209] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [242602.645210] kvm_vcpu_ioctl+0x1a3/0x6d0 [242602.645212] __x64_sys_ioctl+0x530/0xa70 [242602.645214] do_syscall_64+0x60/0x90 [242602.645215] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645217] RIP: 0033:0x7f172e60df9f [242602.645218] RSP: 002b:00007f1626ffb8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [242602.645219] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f172e60df9f [242602.645219] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001a [242602.645220] RBP: 000055ef3945ef50 R08: 000055ef38594ea0 R09: 00000000000000ff [242602.645221] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [242602.645221] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16267fc000 [242602.645222] </TASK> [242602.645223] INFO: task CPU 3/KVM:3855 blocked for more than 122 seconds. [242602.645223] Tainted: G T 6.5.7-custom #1 [242602.645224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [242602.645225] task:CPU 3/KVM state:D stack:0 pid:3855 ppid:1 flags:0x00004000 [242602.645226] Call Trace: [242602.645226] <TASK> [242602.645227] __schedule+0x280/0xff0 [242602.645228] schedule+0x52/0xa0 [242602.645230] jbd2_log_wait_commit+0xd7/0x150 [242602.645231] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645233] jbd2_log_do_checkpoint+0x22f/0x2b0 [242602.645235] __jbd2_log_wait_for_space+0x4e/0x1f0 [242602.645237] ? __pfx_autoremove_wake_function+0x10/0x10 [242602.645239] add_transaction_credits+0x2e6/0x2f0 [242602.645240] start_this_handle+0xfc/0x5b0 [242602.645242] ? 
kmem_cache_alloc+0x15d/0x2a0 [242602.645243] jbd2__journal_start+0x125/0x1b0 [242602.645244] ext4_dirty_inode+0x36/0x90 [242602.645246] __mark_inode_dirty+0x4e/0x220 [242602.645247] generic_update_time+0x80/0xd0 [242602.645249] file_update_time+0xc7/0xe0 [242602.645251] ext4_page_mkwrite+0x93/0x550 [242602.645252] do_page_mkwrite+0x52/0xe0 [242602.645254] do_wp_page+0xf3/0xcc0 [242602.645256] __handle_mm_fault+0x2e1/0x2f0 [242602.645258] handle_mm_fault+0x106/0x300 [242602.645260] __get_user_pages+0x1f5/0x360 [242602.645261] get_user_pages_unlocked+0xcd/0x280 [242602.645263] hva_to_pfn+0xe9/0x420 [242602.645264] kvm_faultin_pfn+0xa5/0x3d0 [242602.645266] ? vmx_vcpu_load+0x1c/0x50 [242602.645267] ? kvm_arch_vcpu_load+0x65/0x210 [242602.645269] kvm_tdp_page_fault+0x127/0x180 [242602.645270] kvm_mmu_page_fault+0x295/0x760 [242602.645271] ? __pfx_emulator_write_gpr+0x10/0x10 [242602.645273] ? kvm_cpu_has_interrupt+0x5f/0x80 [242602.645275] ? __check_object_size+0x53/0x2d0 [242602.645277] vmx_handle_exit+0x12b/0x770 [242602.645279] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [242602.645280] kvm_vcpu_ioctl+0x1a3/0x6d0 [242602.645282] __x64_sys_ioctl+0x530/0xa70 [242602.645284] do_syscall_64+0x60/0x90 [242602.645285] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645287] RIP: 0033:0x7f172e60df9f [242602.645288] RSP: 002b:00007f16267fa8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [242602.645289] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f172e60df9f [242602.645289] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b [242602.645290] RBP: 000055ef39466ee0 R08: 000055ef38594ea0 R09: 00000000ffffffff [242602.645291] R10: 0000000000000028 R11: 0000000000000246 R12: 0000000000000000 [242602.645292] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [242602.645292] </TASK> [242602.645293] INFO: task worker:67152 blocked for more than 122 seconds. 
[242602.645294] Tainted: G T 6.5.7-custom #1 [242602.645295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [242602.645295] task:worker state:D stack:0 pid:67152 ppid:1 flags:0x00000000 [242602.645297] Call Trace: [242602.645297] <TASK> [242602.645298] __schedule+0x280/0xff0 [242602.645299] schedule+0x52/0xa0 [242602.645301] schedule_preempt_disabled+0x9/0x10 [242602.645303] rwsem_down_read_slowpath+0x1f2/0x390 [242602.645305] down_read+0x30/0xa0 [242602.645306] do_madvise+0xe2/0x300 [242602.645308] __x64_sys_madvise+0x27/0x40 [242602.645309] do_syscall_64+0x60/0x90 [242602.645311] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645313] RIP: 0033:0x7f172e611d7b [242602.645313] RSP: 002b:00007f16016f1b18 EFLAGS: 00000206 ORIG_RAX: 000000000000001c [242602.645314] RAX: ffffffffffffffda RBX: 00007f16016f2cdc RCX: 00007f172e611d7b [242602.645315] RDX: 0000000000000004 RSI: 00000000007fb000 RDI: 00007f1600ef2000 [242602.645316] RBP: 00007f1600ef2000 R08: 00007f16016f26c0 R09: 0000000000000081 [242602.645317] R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000801000 [242602.645317] R13: 0000000000000000 R14: 00007f1600ef06b0 R15: 00007f1600ef2000 [242602.645318] </TASK> [242602.645319] INFO: task worker:67153 blocked for more than 122 seconds. [242602.645319] Tainted: G T 6.5.7-custom #1 [242602.645320] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [242602.645320] task:worker state:D stack:0 pid:67153 ppid:1 flags:0x00000000 [242602.645322] Call Trace: [242602.645322] <TASK> [242602.645323] __schedule+0x280/0xff0 [242602.645324] schedule+0x52/0xa0 [242602.645326] schedule_preempt_disabled+0x9/0x10 [242602.645328] rwsem_down_read_slowpath+0x1f2/0x390 [242602.645329] down_read+0x30/0xa0 [242602.645330] do_madvise+0xe2/0x300 [242602.645331] __x64_sys_madvise+0x27/0x40 [242602.645332] do_syscall_64+0x60/0x90 [242602.645334] ? __x64_sys_rt_sigprocmask+0x7e/0xe0 [242602.645336] ? 
syscall_exit_to_user_mode+0x21/0x50 [242602.645337] ? do_syscall_64+0x6c/0x90 [242602.645339] ? do_syscall_64+0x6c/0x90 [242602.645340] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645342] RIP: 0033:0x7f172e611d7b [242602.645342] RSP: 002b:00007f1603cfbb18 EFLAGS: 00000206 ORIG_RAX: 000000000000001c [242602.645343] RAX: ffffffffffffffda RBX: 00007f1603cfccdc RCX: 00007f172e611d7b [242602.645344] RDX: 0000000000000004 RSI: 00000000007fb000 RDI: 00007f16034fc000 [242602.645345] RBP: 00007f16034fc000 R08: 00007f1603cfc6c0 R09: 0000000000000081 [242602.645345] R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000801000 [242602.645346] R13: 0000000000000000 R14: 00007f16016f16b0 R15: 00007f16034fc000 [242602.645347] </TASK> [242602.645348] INFO: task worker:67157 blocked for more than 122 seconds. [242602.645348] Tainted: G T 6.5.7-custom #1 [242602.645349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [242602.645349] task:worker state:D stack:0 pid:67157 ppid:1 flags:0x00000000 [242602.645351] Call Trace: [242602.645351] <TASK> [242602.645351] __schedule+0x280/0xff0 [242602.645353] schedule+0x52/0xa0 [242602.645355] schedule_preempt_disabled+0x9/0x10 [242602.645357] rwsem_down_read_slowpath+0x1f2/0x390 [242602.645358] down_read+0x30/0xa0 [242602.645359] do_madvise+0xe2/0x300 [242602.645360] __x64_sys_madvise+0x27/0x40 [242602.645361] do_syscall_64+0x60/0x90 [242602.645363] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [242602.645364] RIP: 0033:0x7f172e611d7b [242602.645365] RSP: 002b:00007f15ebffeb18 EFLAGS: 00000206 ORIG_RAX: 000000000000001c [242602.645366] RAX: ffffffffffffffda RBX: 00007f15ebfffcdc RCX: 00007f172e611d7b [242602.645367] RDX: 0000000000000004 RSI: 00000000007fb000 RDI: 00007f15eb7ff000 [242602.645367] RBP: 00007f15eb7ff000 R08: 00007f15ebfff6c0 R09: 0000000000000081 [242602.645368] R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000801000 [242602.645369] R13: 0000000000000000 R14: 00007f16020f46b0 R15: 
00007f15eb7ff000
[242602.645369] </TASK>
[242602.645370] INFO: task worker:67160 blocked for more than 122 seconds.
[242602.645371] Tainted: G T 6.5.7-custom #1
[242602.645371] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[242602.645372] task:worker state:D stack:0 pid:67160 ppid:1 flags:0x00000000
[242602.645373] Call Trace:
[242602.645374] <TASK>
[242602.645374] __schedule+0x280/0xff0
[242602.645376] ? _raw_spin_unlock_irqrestore+0x12/0x20
[242602.645378] ? try_to_wake_up+0x1f2/0x3f0
[242602.645380] schedule+0x52/0xa0
[242602.645382] schedule_preempt_disabled+0x9/0x10
[242602.645384] rwsem_down_read_slowpath+0x1f2/0x390
[242602.645385] down_read+0x30/0xa0
[242602.645386] do_madvise+0xe2/0x300
[242602.645387] __x64_sys_madvise+0x27/0x40
[242602.645388] do_syscall_64+0x60/0x90
[242602.645389] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[242602.645391] RIP: 0033:0x7f172e611d7b
[242602.645392] RSP: 002b:00007f15eaffcb18 EFLAGS: 00000206 ORIG_RAX: 000000000000001c
[242602.645393] RAX: ffffffffffffffda RBX: 00007f15eaffdcdc RCX: 00007f172e611d7b
[242602.645394] RDX: 0000000000000004 RSI: 00000000007fb000 RDI: 00007f15ea7fd000
[242602.645394] RBP: 00007f15ea7fd000 R08: 00007f15eaffd6c0 R09: 0000000000000081
[242602.645395] R10: 0000000000000008 R11: 0000000000000206 R12: 0000000000801000
[242602.645395] R13: 0000000000000000 R14: 00007f15eb7fd6b0 R15: 00007f15ea7fd000
[242602.645396] </TASK>
[242602.645397] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

I've not seen this issue with 6.4 kernels. I started to use the 6.5 series from version 6.5.3 onwards, but I think I first saw this error with 6.5.5 or 6.5.6.

# mount | head -1
/dev/nvme0n1p2 on / type ext4 (rw,noatime,nodiratime,nodioread_nolock,discard,nodelalloc)

# zgrep -i ext4 /proc/config.gz
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set
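The hung-task reports above repeat a small number of patterns: the jbd2 commit thread waits in folio_wait_bit_common, most other tasks wait in jbd2_log_wait_commit, and the worker threads wait on a read semaphore in do_madvise. As a rough aid for reading long logs like this one, here is a minimal Python sketch that condenses each report to its task name, PID, and first reliable frame below the scheduler. It assumes raw dmesg text with one message per line; the function name summarize_hung_tasks and the set of frames treated as scheduler frames are illustrative choices, not part of any kernel tool.

```python
import re

# Header of a hung-task report, e.g.
# "INFO: task jbd2/nvme0n1p2-:127 blocked for more than 122 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than \d+ seconds"
)

# A stack frame right after the timestamp, e.g.
# "[242602.644838] folio_wait_bit_common+0x13f/0x310".
# Frames prefixed with "? " are unreliable guesses by the unwinder.
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Assumption: these generic scheduler entry points say nothing about
# what the task is actually waiting for, so they are skipped.
SCHED_FRAMES = {"__schedule", "schedule", "io_schedule", "schedule_preempt_disabled"}


def summarize_hung_tasks(dmesg_text):
    """Return a list of (comm, pid, first reliable non-scheduler frame)."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = [header.group("comm"), int(header.group("pid")), None]
            reports.append(current)
            continue
        if current is None or current[2] is not None:
            continue  # no report open, or its wait point is already known
        frame = FRAME_RE.search(line)
        if frame and not frame.group("unreliable") and frame.group("sym") not in SCHED_FRAMES:
            current[2] = frame.group("sym")
    return [tuple(report) for report in reports]


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 summarize.py
    for comm, pid, frame in summarize_hung_tasks(sys.stdin.read()):
        print(f"{comm} (pid {pid}): {frame}")
```

Applied to the first report in this log, the sketch would yield the task jbd2/nvme0n1p2- with PID 127 and the frame folio_wait_bit_common.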
Please provide the output of:

mount | grep ext4

and:

sudo dumpe2fs -h /dev/partition
To be precise: sudo dumpe2fs -h /dev/nvme0n1p2
The mount line for root I already posted was this:

/dev/nvme0n1p2 on / type ext4 (rw,noatime,nodiratime,nodioread_nolock,discard,nodelalloc)

I'm sorry, but I recreated the root partition for safety. However, here's the dumpe2fs output on a binary copy of the old failing partition (only label changed with tune2fs -L):

# dumpe2fs -h /dev/vg/root-old
dumpe2fs 1.47.0 (5-Feb-2023)
Filesystem volume name: root-old
Last mounted on: /
Filesystem UUID: febed35b-df13-4e6a-a9de-4ae17673322a
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: journal_data user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2621440
Block count: 10485760
Reserved block count: 1048576
Free blocks: 5921591
Free inodes: 2208475
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Sat Apr 25 15:02:31 2020
Last mount time: Sat Oct 14 17:49:31 2023
Last write time: Sat Oct 14 17:49:24 2023
Mount count: 133
Maximum mount count: -1
Last checked: Thu Nov 11 15:55:30 2021
Check interval: 0 (<none>)
Lifetime writes: 79 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 764a85fe-24b0-4a45-96d7-7b43a75d90a2
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0xc4176fe3
Journal features: journal_64bit journal_checksum_v3
Total journal size: 256M
Total journal blocks: 65536
Max transaction length: 65536
Fast commit length: 0
Journal sequence: 0x0080d73c
Journal start: 0
Journal checksum type: crc32c
Journal checksum: 0x6d8e4ded

dumpe2fs: Superblock checksum does not match superblock while trying to open /dev/vg/root-old
*** Run e2fsck now!
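The dump above lists journal_data among the default mount options, which appears consistent with the ext4_journalled_* frames in the jbd2 trace even though the mount line itself shows no data= option. For comparing such dumps between the old and the recreated filesystem, here is a minimal Python sketch that turns dumpe2fs -h output into a dictionary. It assumes the plain "Key: value" layout shown above; the helper names parse_dumpe2fs_header and default_mount_options are hypothetical and not part of e2fsprogs.

```python
def parse_dumpe2fs_header(text):
    """Parse 'Key: value' lines from `dumpe2fs -h` output into a dict.

    Lines without a colon (such as the version banner) are skipped.
    Only the first colon splits key from value, so timestamps such as
    'Sat Apr 25 15:02:31 2020' stay intact.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def default_mount_options(fields):
    """Return the 'Default mount options' field as a list of option names."""
    return fields.get("Default mount options", "").split()


if __name__ == "__main__":
    import sys

    # Example: dumpe2fs -h /dev/vg/root-old 2>/dev/null | python3 parse.py
    header = parse_dumpe2fs_header(sys.stdin.read())
    options = default_mount_options(header)
    print("default mount options:", " ".join(options))
    if "journal_data" in options:
        print("data journalling is enabled by default on this filesystem")
```

Running the same check against the new root would show whether it also defaults to journal_data.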
However, the issue seems to occur on the newly-created root as well:

[21381.865721] INFO: task CPU 0/KVM:7834 blocked for more than 122 seconds.
[21381.865739] Tainted: G T 6.5.7-custom #14
[21381.865744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21381.865748] task:CPU 0/KVM state:D stack:0 pid:7834 ppid:1 flags:0x00004000
[21381.865761] Call Trace:
[21381.865767] <TASK>
[21381.865770] __schedule+0x280/0xff0
[21381.865790] schedule+0x52/0xa0
[21381.865800] jbd2_log_wait_commit+0xd7/0x150
[21381.865809] ? __pfx_autoremove_wake_function+0x10/0x10
[21381.865822] jbd2_log_do_checkpoint+0x22f/0x2b0
[21381.865834] __jbd2_log_wait_for_space+0x4e/0x1f0
[21381.865843] add_transaction_credits+0x2e6/0x2f0
[21381.865851] start_this_handle+0xfc/0x5b0
[21381.865858] ? kmem_cache_alloc+0x15d/0x2a0
[21381.865867] jbd2__journal_start+0x125/0x1b0
[21381.865874] ext4_page_mkwrite+0x346/0x550
[21381.865883] do_page_mkwrite+0x52/0xe0
[21381.865895] do_wp_page+0xf3/0xcc0
[21381.865907] ? vmx_vcpu_load+0x1c/0x50
[21381.865917] ? kvm_arch_vcpu_load+0x65/0x210
[21381.865928] __handle_mm_fault+0x2e1/0x2f0
[21381.865938] handle_mm_fault+0x106/0x300
[21381.865948] __get_user_pages+0x1f5/0x360
[21381.865957] get_user_pages_unlocked+0xcd/0x280
[21381.865966] hva_to_pfn+0xe9/0x420
[21381.865973] kvm_faultin_pfn+0xa5/0x3d0
[21381.865984] kvm_tdp_page_fault+0x127/0x180
[21381.865991] kvm_mmu_page_fault+0x295/0x760
[21381.865999] ? sysvec_call_function+0xe/0x90
[21381.866011] ? vmx_vmexit+0x7d/0xd0
[21381.866018] ? vmx_vmexit+0x77/0xd0
[21381.866025] ?
vmx_vmexit+0x9e/0xd0 [21381.866031] vmx_handle_exit+0x12b/0x770 [21381.866040] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [21381.866050] kvm_vcpu_ioctl+0x1a3/0x6d0 [21381.866061] __x64_sys_ioctl+0x530/0xa70 [21381.866069] do_syscall_64+0x60/0x90 [21381.866077] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21381.866091] RIP: 0033:0x7f4b9c513f9f [21381.866098] RSP: 002b:00007f4b9a25d8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [21381.866106] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4b9c513f9f [21381.866111] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000018 [21381.866115] RBP: 000055764903b770 R08: 000055764716dea0 R09: 0000000000000640 [21381.866119] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [21381.866123] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [21381.866127] </TASK> [21381.866130] INFO: task CPU 1/KVM:7835 blocked for more than 122 seconds. [21381.866135] Tainted: G T 6.5.7-custom #14 [21381.866139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21381.866141] task:CPU 1/KVM state:D stack:0 pid:7835 ppid:1 flags:0x00004000 [21381.866149] Call Trace: [21381.866152] <TASK> [21381.866154] __schedule+0x280/0xff0 [21381.866164] schedule+0x52/0xa0 [21381.866172] jbd2_log_wait_commit+0xd7/0x150 [21381.866178] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.866189] jbd2_log_do_checkpoint+0x22f/0x2b0 [21381.866198] __jbd2_log_wait_for_space+0x4e/0x1f0 [21381.866208] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.866217] add_transaction_credits+0x2e6/0x2f0 [21381.866224] ? ext4_mark_iloc_dirty+0x202/0x5d0 [21381.866230] start_this_handle+0xfc/0x5b0 [21381.866237] ? 
kmem_cache_alloc+0x15d/0x2a0 [21381.866244] jbd2__journal_start+0x125/0x1b0 [21381.866251] ext4_dirty_inode+0x36/0x90 [21381.866258] __mark_inode_dirty+0x4e/0x220 [21381.866269] generic_update_time+0x80/0xd0 [21381.866279] file_update_time+0xc7/0xe0 [21381.866288] ext4_page_mkwrite+0x93/0x550 [21381.866295] do_page_mkwrite+0x52/0xe0 [21381.866303] do_wp_page+0xf3/0xcc0 [21381.866312] __handle_mm_fault+0x2e1/0x2f0 [21381.866322] handle_mm_fault+0x106/0x300 [21381.866333] __get_user_pages+0x1f5/0x360 [21381.866340] get_user_pages_unlocked+0xcd/0x280 [21381.866348] hva_to_pfn+0xe9/0x420 [21381.866354] kvm_faultin_pfn+0xa5/0x3d0 [21381.866362] kvm_tdp_page_fault+0x127/0x180 [21381.866369] kvm_mmu_page_fault+0x295/0x760 [21381.866375] ? __check_object_size+0x53/0x2d0 [21381.866385] vmx_handle_exit+0x12b/0x770 [21381.866394] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [21381.866402] kvm_vcpu_ioctl+0x1a3/0x6d0 [21381.866411] __x64_sys_ioctl+0x530/0xa70 [21381.866417] do_syscall_64+0x60/0x90 [21381.866425] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21381.866434] RIP: 0033:0x7f4b9c513f9f [21381.866439] RSP: 002b:00007f4b99a5c8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [21381.866444] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4b9c513f9f [21381.866448] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019 [21381.866452] RBP: 000055764906ce40 R08: 000055764716dea0 R09: 00007f4a88003010 [21381.866455] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000000 [21381.866459] R13: 0000000000000000 R14: 00007ffcce9ee160 R15: 00007f4b9925d000 [21381.866463] </TASK> [21381.866465] INFO: task CPU 2/KVM:7836 blocked for more than 122 seconds. [21381.866469] Tainted: G T 6.5.7-custom #14 [21381.866472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[21381.866475] task:CPU 2/KVM state:D stack:0 pid:7836 ppid:1 flags:0x00004000 [21381.866481] Call Trace: [21381.866484] <TASK> [21381.866486] __schedule+0x280/0xff0 [21381.866495] schedule+0x52/0xa0 [21381.866503] jbd2_log_wait_commit+0xd7/0x150 [21381.866509] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.866518] jbd2_log_do_checkpoint+0x22f/0x2b0 [21381.866527] __jbd2_log_wait_for_space+0x4e/0x1f0 [21381.866536] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.866546] add_transaction_credits+0x2e6/0x2f0 [21381.866552] ? ext4_mark_iloc_dirty+0x202/0x5d0 [21381.866558] start_this_handle+0xfc/0x5b0 [21381.866564] ? kmem_cache_alloc+0x15d/0x2a0 [21381.866587] jbd2__journal_start+0x125/0x1b0 [21381.866594] ext4_dirty_inode+0x36/0x90 [21381.866600] __mark_inode_dirty+0x4e/0x220 [21381.866609] generic_update_time+0x80/0xd0 [21381.866617] file_update_time+0xc7/0xe0 [21381.866626] ext4_page_mkwrite+0x93/0x550 [21381.866632] do_page_mkwrite+0x52/0xe0 [21381.866641] do_wp_page+0xf3/0xcc0 [21381.866650] __handle_mm_fault+0x2e1/0x2f0 [21381.866660] handle_mm_fault+0x106/0x300 [21381.866670] __get_user_pages+0x1f5/0x360 [21381.866678] get_user_pages_unlocked+0xcd/0x280 [21381.866686] hva_to_pfn+0xe9/0x420 [21381.866692] kvm_faultin_pfn+0xa5/0x3d0 [21381.866700] ? fast_page_fault+0xa8/0x510 [21381.866710] kvm_tdp_page_fault+0x127/0x180 [21381.866715] kvm_mmu_page_fault+0x295/0x760 [21381.866722] ? 
__check_object_size+0x53/0x2d0 [21381.866731] vmx_handle_exit+0x12b/0x770 [21381.866740] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [21381.866748] kvm_vcpu_ioctl+0x1a3/0x6d0 [21381.866757] __x64_sys_ioctl+0x530/0xa70 [21381.866764] do_syscall_64+0x60/0x90 [21381.866771] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21381.866781] RIP: 0033:0x7f4b9c513f9f [21381.866785] RSP: 002b:00007f4b9925b8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [21381.866790] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4b9c513f9f [21381.866794] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001a [21381.866797] RBP: 0000557649074f50 R08: 000055764716dea0 R09: 00000000000000ff [21381.866801] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [21381.866804] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [21381.866808] </TASK> [21381.866811] INFO: task CPU 3/KVM:7837 blocked for more than 122 seconds. [21381.866814] Tainted: G T 6.5.7-custom #14 [21381.866817] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21381.866820] task:CPU 3/KVM state:D stack:0 pid:7837 ppid:1 flags:0x00004000 [21381.866826] Call Trace: [21381.866829] <TASK> [21381.866831] __schedule+0x280/0xff0 [21381.866840] schedule+0x52/0xa0 [21381.866848] jbd2_log_wait_commit+0xd7/0x150 [21381.866854] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.866863] jbd2_log_do_checkpoint+0x22f/0x2b0 [21381.866873] __jbd2_log_wait_for_space+0x4e/0x1f0 [21381.866881] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.866891] add_transaction_credits+0x2e6/0x2f0 [21381.866897] ? __wake_up_common_lock+0x8a/0xd0 [21381.866906] start_this_handle+0xfc/0x5b0 [21381.866913] ? kmem_cache_alloc+0x15d/0x2a0 [21381.866919] jbd2__journal_start+0x125/0x1b0 [21381.866926] ext4_page_mkwrite+0x346/0x550 [21381.866932] do_page_mkwrite+0x52/0xe0 [21381.866941] do_wp_page+0xf3/0xcc0 [21381.866950] ? 
asm_sysvec_apic_timer_interrupt+0x1a/0x20 [21381.866961] __handle_mm_fault+0x2e1/0x2f0 [21381.866971] handle_mm_fault+0x106/0x300 [21381.866981] __get_user_pages+0x1f5/0x360 [21381.866988] get_user_pages_unlocked+0xcd/0x280 [21381.866996] hva_to_pfn+0xe9/0x420 [21381.867001] kvm_faultin_pfn+0xa5/0x3d0 [21381.867010] ? psi_group_change+0x171/0x3c0 [21381.867020] kvm_tdp_page_fault+0x127/0x180 [21381.867026] kvm_mmu_page_fault+0x295/0x760 [21381.867033] ? __check_object_size+0x53/0x2d0 [21381.867041] vmx_handle_exit+0x12b/0x770 [21381.867049] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [21381.867057] kvm_vcpu_ioctl+0x1a3/0x6d0 [21381.867066] __x64_sys_ioctl+0x530/0xa70 [21381.867073] do_syscall_64+0x60/0x90 [21381.867080] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21381.867089] RIP: 0033:0x7f4b9c513f9f [21381.867093] RSP: 002b:00007f4b98a578e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [21381.867098] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4b9c513f9f [21381.867101] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b [21381.867104] RBP: 000055764907cee0 R08: 000055764716dea0 R09: 0000000000000000 [21381.867108] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000 [21381.867111] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [21381.867115] </TASK> [21381.867121] INFO: task kworker/u8:1:8446 blocked for more than 122 seconds. [21381.867124] Tainted: G T 6.5.7-custom #14 [21381.867127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21381.867129] task:kworker/u8:1 state:D stack:0 pid:8446 ppid:2 flags:0x00004000 [21381.867137] Workqueue: writeback wb_workfn (flush-259:0) [21381.867150] Call Trace: [21381.867153] <TASK> [21381.867155] __schedule+0x280/0xff0 [21381.867172] schedule+0x52/0xa0 [21381.867175] jbd2_log_wait_commit+0xd7/0x150 [21381.867176] ? 
__pfx_autoremove_wake_function+0x10/0x10 [21381.867179] jbd2_log_do_checkpoint+0x22f/0x2b0 [21381.867182] __jbd2_log_wait_for_space+0x4e/0x1f0 [21381.867185] ? __pfx_autoremove_wake_function+0x10/0x10 [21381.867188] add_transaction_credits+0x2e6/0x2f0 [21381.867190] ? __wake_up_common_lock+0x8a/0xd0 [21381.867193] start_this_handle+0xfc/0x5b0 [21381.867195] ? stop_this_handle+0xf6/0x110 [21381.867197] mpage_prepare_extent_to_map+0x2da/0x4e0 [21381.867201] ext4_do_writepages+0x251/0xb40 [21381.867203] ? update_sd_lb_stats.constprop.0+0x622/0x8c0 [21381.867206] ext4_writepages+0x9e/0x160 [21381.867208] do_writepages+0xc2/0x1e0 [21381.867211] ? enqueue_entity+0x131/0x390 [21381.867215] __writeback_single_inode+0x31/0x1a0 [21381.867218] writeback_sb_inodes+0x1ee/0x450 [21381.867221] __writeback_inodes_wb+0x47/0xf0 [21381.867224] wb_writeback.isra.0+0x17c/0x1d0 [21381.867227] wb_workfn+0x276/0x3c0 [21381.867229] ? __schedule+0x288/0xff0 [21381.867232] process_one_work+0x20a/0x3a0 [21381.867235] worker_thread+0x4d/0x3d0 [21381.867237] ? __pfx_worker_thread+0x10/0x10 [21381.867239] kthread+0xce/0x100 [21381.867242] ? __pfx_kthread+0x10/0x10 [21381.867244] ret_from_fork+0x2f/0x50 [21381.867247] ? __pfx_kthread+0x10/0x10 [21381.867249] ret_from_fork_asm+0x1b/0x30 [21381.867251] </TASK> [21504.744333] INFO: task jbd2/nvme0n1p2-:483 blocked for more than 122 seconds. [21504.744348] Tainted: G T 6.5.7-custom #14 [21504.744354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21504.744357] task:jbd2/nvme0n1p2- state:D stack:0 pid:483 ppid:2 flags:0x00004000 [21504.744370] Call Trace: [21504.744375] <TASK> [21504.744380] __schedule+0x280/0xff0 [21504.744401] ? page_mkclean_one+0x90/0xd0 [21504.744414] schedule+0x52/0xa0 [21504.744469] io_schedule+0x3e/0x70 [21504.744480] folio_wait_bit_common+0x13f/0x310 [21504.744493] ? __pfx_wake_page_function+0x10/0x10 [21504.744504] write_cache_pages+0x12b/0x370 [21504.744513] ? 
__pfx_ext4_journalled_writepage_callback+0x10/0x10 [21504.744526] ext4_journalled_submit_inode_data_buffers+0x73/0xa0 [21504.744537] jbd2_journal_commit_transaction+0x428/0x16e0 [21504.744549] ? lock_timer_base+0x5c/0x80 [21504.744560] kjournald2+0xa7/0x270 [21504.744570] ? __pfx_autoremove_wake_function+0x10/0x10 [21504.744582] ? __pfx_kjournald2+0x10/0x10 [21504.744592] kthread+0xce/0x100 [21504.744601] ? __pfx_kthread+0x10/0x10 [21504.744608] ret_from_fork+0x2f/0x50 [21504.744616] ? __pfx_kthread+0x10/0x10 [21504.744623] ret_from_fork_asm+0x1b/0x30 [21504.744631] </TASK> [21504.744653] INFO: task uptimed:3408 blocked for more than 122 seconds. [21504.744659] Tainted: G T 6.5.7-custom #14 [21504.744662] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21504.744664] task:uptimed state:D stack:0 pid:3408 ppid:1 flags:0x00000000 [21504.744674] Call Trace: [21504.744676] <TASK> [21504.744679] __schedule+0x280/0xff0 [21504.744688] schedule+0x52/0xa0 [21504.744696] jbd2_log_wait_commit+0xd7/0x150 [21504.744703] ? __pfx_autoremove_wake_function+0x10/0x10 [21504.744713] jbd2_log_do_checkpoint+0x22f/0x2b0 [21504.744723] __jbd2_log_wait_for_space+0x4e/0x1f0 [21504.744732] add_transaction_credits+0x2e6/0x2f0 [21504.744739] start_this_handle+0xfc/0x5b0 [21504.744746] ? 
kmem_cache_alloc+0x15d/0x2a0 [21504.744755] jbd2__journal_start+0x125/0x1b0 [21504.744762] __ext4_new_inode+0x75c/0x1540 [21504.744770] ext4_create+0xf6/0x1d0 [21504.744778] path_openat+0x60f/0x10a0 [21504.744788] do_filp_open+0xaf/0x180 [21504.744798] do_sys_openat2+0xac/0xe0 [21504.744804] __x64_sys_openat+0x54/0xa0 [21504.744810] do_syscall_64+0x60/0x90 [21504.744820] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21504.744832] RIP: 0033:0x7f49289cb1b2 [21504.744839] RSP: 002b:00007fff5f465a20 EFLAGS: 00000202 ORIG_RAX: 0000000000000101 [21504.744847] RAX: ffffffffffffffda RBX: 0000000000000241 RCX: 00007f49289cb1b2 [21504.744852] RDX: 0000000000000241 RSI: 00007f4928ab11b0 RDI: 00000000ffffff9c [21504.744856] RBP: 00007f4928ab11b0 R08: 0000000000000004 R09: 0000000000000001 [21504.744860] R10: 00000000000001b6 R11: 0000000000000202 R12: 00007f4928ab132a [21504.744863] R13: 00007f4928ab132a R14: 0000000000000001 R15: 0000000000000000 [21504.744867] </TASK> [21504.744875] INFO: task vhost-7818:7832 blocked for more than 122 seconds. [21504.744879] Tainted: G T 6.5.7-custom #14 [21504.744882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21504.744884] task:vhost-7818 state:D stack:0 pid:7832 ppid:1 flags:0x00004000 [21504.744891] Call Trace: [21504.744893] <TASK> [21504.744896] __schedule+0x280/0xff0 [21504.744905] schedule+0x52/0xa0 [21504.744914] jbd2_log_wait_commit+0xd7/0x150 [21504.744919] ? __pfx_autoremove_wake_function+0x10/0x10 [21504.744929] jbd2_log_do_checkpoint+0x22f/0x2b0 [21504.744938] __jbd2_log_wait_for_space+0x4e/0x1f0 [21504.744948] add_transaction_credits+0x2e6/0x2f0 [21504.744955] start_this_handle+0xfc/0x5b0 [21504.744961] ? 
kmem_cache_alloc+0x15d/0x2a0 [21504.744968] jbd2__journal_start+0x125/0x1b0 [21504.744975] ext4_dirty_inode+0x36/0x90 [21504.744983] __mark_inode_dirty+0x4e/0x220 [21504.744994] generic_update_time+0x80/0xd0 [21504.745003] file_update_time+0xc7/0xe0 [21504.745012] ext4_page_mkwrite+0x93/0x550 [21504.745019] do_page_mkwrite+0x52/0xe0 [21504.745029] do_wp_page+0xf3/0xcc0 [21504.745039] ? balance_dirty_pages_ratelimited_flags+0x46/0x340 [21504.745047] __handle_mm_fault+0x2e1/0x2f0 [21504.745057] handle_mm_fault+0x106/0x300 [21504.745067] exc_page_fault+0x1f4/0x560 [21504.745079] asm_exc_page_fault+0x26/0x30 [21504.745089] RIP: 0010:rep_movs_alternative+0x33/0x70 [21504.745100] Code: 40 83 f9 08 73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 c3 cc cc cc cc 66 0f 1f 84 00 00 00 00 00 48 8b 06 <48> 89 07 48 83 c6 08 48 83 c7 08 83 e9 08 74 df 83 f9 08 73 e8 eb [21504.745106] RSP: 0018:ffffa5e840567be8 EFLAGS: 00010202 [21504.745112] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000a [21504.745116] RDX: 0000000000000000 RSI: ffffa5e840567c98 RDI: 00007f4b30d62600 [21504.745120] RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000001 [21504.745123] R10: ffff9c4bc0640000 R11: 0000000000000000 R12: ffffa5e840567e08 [21504.745127] R13: 0000000000000000 R14: ffffa5e840567c98 R15: ffff9c4bc0640218 [21504.745132] copyout+0x20/0x40 [21504.745143] _copy_to_iter+0xdc/0x430 [21504.745153] ? translate_desc+0x74/0x160 [21504.745164] tun_do_read+0x282/0x750 [21504.745175] tun_recvmsg+0x6d/0x170 [21504.745183] handle_rx+0x5eb/0xa60 [21504.745192] vhost_worker+0x44/0x70 [21504.745200] vhost_task_fn+0x53/0xc0 [21504.745212] ? __pfx_vhost_task_fn+0x10/0x10 [21504.745235] ret_from_fork+0x2f/0x50 [21504.745242] ? __pfx_vhost_task_fn+0x10/0x10 [21504.745251] ret_from_fork_asm+0x1b/0x30 [21504.745258] </TASK> [21504.745260] INFO: task CPU 0/KVM:7834 blocked for more than 245 seconds. 
[21504.745264] Tainted: G T 6.5.7-custom #14 [21504.745267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21504.745269] task:CPU 0/KVM state:D stack:0 pid:7834 ppid:1 flags:0x00004000 [21504.745276] Call Trace: [21504.745278] <TASK> [21504.745280] __schedule+0x280/0xff0 [21504.745289] schedule+0x52/0xa0 [21504.745298] jbd2_log_wait_commit+0xd7/0x150 [21504.745303] ? __pfx_autoremove_wake_function+0x10/0x10 [21504.745313] jbd2_log_do_checkpoint+0x22f/0x2b0 [21504.745322] __jbd2_log_wait_for_space+0x4e/0x1f0 [21504.745331] add_transaction_credits+0x2e6/0x2f0 [21504.745338] start_this_handle+0xfc/0x5b0 [21504.745344] ? kmem_cache_alloc+0x15d/0x2a0 [21504.745352] jbd2__journal_start+0x125/0x1b0 [21504.745358] ext4_page_mkwrite+0x346/0x550 [21504.745365] do_page_mkwrite+0x52/0xe0 [21504.745374] do_wp_page+0xf3/0xcc0 [21504.745383] ? vmx_vcpu_load+0x1c/0x50 [21504.745391] ? kvm_arch_vcpu_load+0x65/0x210 [21504.745401] __handle_mm_fault+0x2e1/0x2f0 [21504.745411] handle_mm_fault+0x106/0x300 [21504.745421] __get_user_pages+0x1f5/0x360 [21504.745429] get_user_pages_unlocked+0xcd/0x280 [21504.745437] hva_to_pfn+0xe9/0x420 [21504.745444] kvm_faultin_pfn+0xa5/0x3d0 [21504.745454] kvm_tdp_page_fault+0x127/0x180 [21504.745461] kvm_mmu_page_fault+0x295/0x760 [21504.745469] ? sysvec_call_function+0xe/0x90 [21504.745479] ? vmx_vmexit+0x7d/0xd0 [21504.745486] ? vmx_vmexit+0x77/0xd0 [21504.745493] ? 
vmx_vmexit+0x9e/0xd0 [21504.745500] vmx_handle_exit+0x12b/0x770 [21504.745509] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [21504.745518] kvm_vcpu_ioctl+0x1a3/0x6d0 [21504.745528] __x64_sys_ioctl+0x530/0xa70 [21504.745536] do_syscall_64+0x60/0x90 [21504.745544] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21504.745554] RIP: 0033:0x7f4b9c513f9f [21504.745558] RSP: 002b:00007f4b9a25d8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [21504.745564] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4b9c513f9f [21504.745568] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000018 [21504.745572] RBP: 000055764903b770 R08: 000055764716dea0 R09: 0000000000000640 [21504.745576] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [21504.745579] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [21504.745583] </TASK> [21504.745585] INFO: task CPU 1/KVM:7835 blocked for more than 245 seconds. [21504.745589] Tainted: G T 6.5.7-custom #14 [21504.745591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21504.745594] task:CPU 1/KVM state:D stack:0 pid:7835 ppid:1 flags:0x00004000 [21504.745600] Call Trace: [21504.745602] <TASK> [21504.745604] __schedule+0x280/0xff0 [21504.745613] schedule+0x52/0xa0 [21504.745621] jbd2_log_wait_commit+0xd7/0x150 [21504.745627] ? __pfx_autoremove_wake_function+0x10/0x10 [21504.745636] jbd2_log_do_checkpoint+0x22f/0x2b0 [21504.745646] __jbd2_log_wait_for_space+0x4e/0x1f0 [21504.745655] ? __pfx_autoremove_wake_function+0x10/0x10 [21504.745664] add_transaction_credits+0x2e6/0x2f0 [21504.745671] ? ext4_mark_iloc_dirty+0x202/0x5d0 [21504.745677] start_this_handle+0xfc/0x5b0 [21504.745683] ? 
kmem_cache_alloc+0x15d/0x2a0 [21504.745690] jbd2__journal_start+0x125/0x1b0 [21504.745696] ext4_dirty_inode+0x36/0x90 [21504.745703] __mark_inode_dirty+0x4e/0x220 [21504.745712] generic_update_time+0x80/0xd0 [21504.745720] file_update_time+0xc7/0xe0 [21504.745728] ext4_page_mkwrite+0x93/0x550 [21504.745734] do_page_mkwrite+0x52/0xe0 [21504.745743] do_wp_page+0xf3/0xcc0 [21504.745752] __handle_mm_fault+0x2e1/0x2f0 [21504.745762] handle_mm_fault+0x106/0x300 [21504.745772] __get_user_pages+0x1f5/0x360 [21504.745779] get_user_pages_unlocked+0xcd/0x280 [21504.745787] hva_to_pfn+0xe9/0x420 [21504.745793] kvm_faultin_pfn+0xa5/0x3d0 [21504.745801] kvm_tdp_page_fault+0x127/0x180 [21504.745807] kvm_mmu_page_fault+0x295/0x760 [21504.745815] ? __check_object_size+0x53/0x2d0 [21504.745825] vmx_handle_exit+0x12b/0x770 [21504.745833] kvm_arch_vcpu_ioctl_run+0x81b/0x1ed0 [21504.745841] kvm_vcpu_ioctl+0x1a3/0x6d0 [21504.745850] __x64_sys_ioctl+0x530/0xa70 [21504.745856] do_syscall_64+0x60/0x90 [21504.745863] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [21504.745872] RIP: 0033:0x7f4b9c513f9f [21504.745876] RSP: 002b:00007f4b99a5c8e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [21504.745881] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4b9c513f9f [21504.745885] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000019 [21504.745888] RBP: 000055764906ce40 R08: 000055764716dea0 R09: 00007f4a88003010 [21504.745892] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000000 [21504.745895] R13: 0000000000000000 R14: 00007ffcce9ee160 R15: 00007f4b9925d000 [21504.745899] </TASK> [21504.745901] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
The new root filesystem was created with mkfs.ext4 with no special options. Here is the relevant /etc/mke2fs.conf:

[defaults]
	base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
	default_mntopts = acl,user_xattr,journal_data
	enable_periodic_fsck = 0
	blocksize = 4096
	inode_size = 256
	inode_ratio = 16384

[fs_types]
	ext3 = {
		features = has_journal
	}
	ext4 = {
		features = has_journal,extent,huge_file,flex_bg,metadata_csum,64bit,dir_nlink,extra_isize
	}
	small = {
		blocksize = 1024
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 1024
		inode_ratio = 8192
	}
	big = {
		inode_ratio = 32768
	}
	huge = {
		inode_ratio = 65536
	}
	news = {
		inode_ratio = 4096
	}
	largefile = {
		inode_ratio = 1048576
		blocksize = -1
	}
	largefile4 = {
		inode_ratio = 4194304
		blocksize = -1
	}
	hurd = {
		blocksize = 4096
		inode_size = 128
		warn_y2038_dates = 0
	}

And the dumpe2fs -h output from the currently failing system:

dumpe2fs 1.47.0 (5-Feb-2023)
Filesystem volume name: root
Last mounted on: /
Filesystem UUID: b7112037-b835-45ac-b3b3-4d9250c53503
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: journal_data user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 2621440
Block count: 10485760
Reserved block count: 524288
Overhead clusters: 242376
Free blocks: 6899974
Free inodes: 2208475
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Sat Oct 14 18:03:49 2023
Last mount time: Sat Oct 14 18:16:50 2023
Last write time: Sat Oct 14 18:16:48 2023
Mount count: 2
Maximum mount count: -1
Last checked: Sat Oct 14 18:03:49 2023
Check interval: 0 (<none>)
Lifetime writes: 26 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: bf662df3-a40b-4276-905e-70bb51bfd867
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0xdf22e900
Journal features: journal_64bit journal_checksum_v3
Total journal size: 256M
Total journal blocks: 65536
Max transaction length: 65536
Fast commit length: 0
Journal sequence: 0x00001f33
Journal start: 18150
Journal checksum type: crc32c
Journal checksum: 0x70a695b1
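For anyone comparing this against their own systems, here is a minimal sketch of pulling the journal-related fields out of dumpe2fs -h output. It assumes the output has been saved as text with one field per line; SAMPLE holds a few lines copied from the dump above, and parse_header() and default_data_mode() are hypothetical helper names, not part of e2fsprogs.

```python
# A few fields copied from the dumpe2fs -h output above, one per line.
SAMPLE = """\
Default mount options: journal_data user_xattr acl
Block size: 4096
Total journal size: 256M
Total journal blocks: 65536
Journal start: 18150
"""


def parse_header(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps
        # that contain colons themselves stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def default_data_mode(fields):
    """Return the journal_data* entry from the default mount options, if any."""
    for opt in fields.get("Default mount options", "").split():
        if opt.startswith("journal_data"):
            return opt
    return None


fields = parse_header(SAMPLE)
mode = default_data_mode(fields)
journal_bytes = int(fields["Total journal blocks"]) * int(fields["Block size"])
print("default data mode option:", mode)
print("journal size:", journal_bytes // (1024 * 1024), "MiB")
```

The journal size is recomputed from the block counts only as a consistency check against the "Total journal size" field.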
It would be really nice to get a translation from the stack trace offsets to line numbers, but what appears to be happening is that we're starting a journal commit, and to complete the commit, since we are in the default data=ordered mode, we call ext4_journalled_submit_inode_data_buffers(), which in turn calls write_cache_pages() to flush out modified data blocks associated with an inode which had newly allocated blocks (so that we don't accidentally expose stale data blocks if there is a crash, which is a guarantee of data=ordered mode). The write_cache_pages() function then calls some function in mm/filemap.c (this is where a line number translation would be helpful), which calls folio_wait_bit_common(), which presumably is waiting for some memory folio which is undergoing writeback, or is otherwise busy, to complete. This then calls io_schedule(), because we're waiting for some I/O to complete, and this apparently never completes, thus stalling the jbd2 commit operation. All of the other processes which are trying to make changes to the file system are then waiting for the commit to complete, leading to all of the other stack traces.

The question is why is this happening on your system? It could be because of some kind of missed I/O completion interrupt, or some other problem in the block device layer or NVMe driver, but normally if that were the case, there should have been some kind of kernel log messages from those parts of the I/O stack. Were there any that you could see (that perhaps were excerpted out of the bug report, since "obviously" it was assumed this was an ext4 problem, as opposed to ext4 simply being an innocent victim of problems lower down in the storage stack)?

The other question that might be worth asking is what sort of workload does your server run, and how might this be different from what other users might be doing, or what we exercise with our regression tests?
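For the offset-to-line translation, here is a minimal sketch of one way to prepare the input. It assumes the kernel tree's scripts/faddr2line helper and a vmlinux built with debug info from exactly the same build as the running kernel. The script only extracts the func+offset/size frames from dmesg text and prints the command to run; it does not do the translation itself. SAMPLE is a few frames copied from the jbd2 trace, and reliable_frames() and faddr2line_command() are hypothetical helper names.

```python
import re

# Matches frames such as "write_cache_pages+0x12b/0x370". A leading "? "
# marks a frame the unwinder could not confirm, so those are skipped.
FRAME_RE = re.compile(r"(\? )?([A-Za-z_][\w.]*)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")

# Frames copied from the jbd2 hung-task trace in the report.
SAMPLE = (
    "[242602.644836] io_schedule+0x3e/0x70 "
    "[242602.644838] folio_wait_bit_common+0x13f/0x310 "
    "[242602.644842] ? __pfx_wake_page_function+0x10/0x10 "
    "[242602.644844] write_cache_pages+0x12b/0x370 "
    "[242602.644850] ext4_journalled_submit_inode_data_buffers+0x73/0xa0"
)


def reliable_frames(text):
    """Return func+offset/size strings, dropping '?'-prefixed frames."""
    frames = []
    for match in FRAME_RE.finditer(text):
        if match.group(1) is None:
            frames.append(match.group(0))
    return frames


def faddr2line_command(frames, vmlinux="vmlinux"):
    """Build an argv list for scripts/faddr2line (path to vmlinux is assumed)."""
    return ["scripts/faddr2line", vmlinux, *frames]


frames = reliable_frames(SAMPLE)
command = faddr2line_command(frames)
print(" ".join(command))
```

The printed command is meant to be run from the top of the kernel source tree used for the build. If the vmlinux does not match the running kernel, the reported line numbers will be misleading.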
I'm not running anything special on the server; it is a KVM/libvirt host with only a few VMs, none of which use the root filesystem for storage. Besides that, it runs only the usual set of system services and ssh for admin access, so I think most writes to the root filesystem are due to logging. I compiled and rebooted into a new 6.5.7 kernel with only CONFIG_DEBUG_INFO=y, CONFIG_DEBUG_INFO_DWARF5=y and CONFIG_DEBUG_INFO_COMPRESSED_NONE=y added, but the issue has not yet recurred during the almost 3 days of uptime.