Hello,

we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and, I think, 6.4.11) but, as far as I can tell, did not happen on 6.4.4.

[293790.928007] ------------[ cut here ]------------
[293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
[293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
[293790.933694] Modules linked in: xt_owner(E) xt_REDIRECT(E) mptcp_diag(E) xsk_diag(E) raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E) udp_diag(E) inet_diag(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) nbd(E) rbd(E) libceph(E) dns_resolver(E) xt_set(E) ipt_rpfilter(E) ip_set_hash_ip(E) ip_set_hash_net(E) bpf_preload(E) xt_multiport(E) veth(E) wireguard(E) libchacha20poly1305(E) chacha_x86_64(E) poly1305_x86_64(E) ip6_udp_tunnel(E) udp_tunnel(E) curve25519_x86_64(E) libcurve25519_generic(E) libchacha(E) nf_conntrack_netlink(E) xt_nat(E) xt_statistic(E) xt_addrtype(E) ipt_REJECT(E) nf_reject_ipv4(E) ip_set(E) ip_vs_sh(E) ip_vs_wrr(E) ip_vs_rr(E) ip_vs(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) iptable_nat(E) nf_nat(E) iptable_filter(E) ip_tables(E) overlay(E) dummy(E) sunrpc(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E)
[293790.933738]  binfmt_misc(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) virtio_gpu(E) ccp(E) virtio_dma_buf(E) drm_shmem_helper(E) virtio_net(E) vfat(E) kvm(E) i2c_i801(E) drm_kms_helper(E) net_failover(E) irqbypass(E) syscopyarea(E) fat(E) i2c_smbus(E) failover(E) sysfillrect(E) virtio_balloon(E) sysimgblt(E) drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) libata(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
[293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G W E 6.4.12-1.gdc.el9.x86_64 #1
[293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
[293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
[293790.960933] Code: 37 fa ff 0f 0b e9 17 ff ff ff 80 3d 41 59 fc 01 00 0f 85 21 ff ff ff 48 c7 c7 98 03 95 9d c6 05 2d 59 fc 01 01 e8 77 37 fa ff <0f> 0b 41 8b 85 88 09 00 00 e9 00 ff ff ff 66 0f 1f 84 00 00 00 00
[293790.964077] RSP: 0000:ffffb708e7217db8 EFLAGS: 00010086
[293790.965160] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[293790.966340] RDX: ffff905482c5f708 RSI: 0000000000000001 RDI: ffff905482c5f700
[293790.967839] RBP: ffff9029bb0e9e00 R08: 0000000000000000 R09: 00000000ffff7fff
[293790.969496] R10: ffffb708e7217c58 R11: ffffffff9e3e2c88 R12: 00000000000317c0
[293790.970859] R13: ffff903602c317c0 R14: 0000000000000000 R15: ffff905482c726b8
[293790.972085] FS: 00007ff3b66fe640(0000) GS:ffff905482c40000(0000) knlGS:0000000000000000
[293790.973678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[293790.974663] CR2: 00007f16889036c0 CR3: 0000002072e34004 CR4: 0000000000770ee0
[293790.976108] PKRU: 55555554
[293790.977048] Call Trace:
[293790.978013]  <TASK>
[293790.978678]  ? __warn+0x80/0x130
[293790.979727]  ? __cfsb_csd_unthrottle+0x149/0x160
[293790.980824]  ? report_bug+0x195/0x1a0
[293790.981806]  ? handle_bug+0x3c/0x70
[293790.982884]  ? exc_invalid_op+0x14/0x70
[293790.983837]  ? asm_exc_invalid_op+0x16/0x20
[293790.984626]  ? __cfsb_csd_unthrottle+0x149/0x160
[293790.985599]  ? __cfsb_csd_unthrottle+0x149/0x160
[293790.986583]  unregister_fair_sched_group+0x73/0x1d0
[293790.987682]  sched_unregister_group_rcu+0x1a/0x40
[293790.988752]  rcu_do_batch+0x199/0x4d0
[293790.989643]  rcu_core+0x267/0x420
[293790.990418]  __do_softirq+0xc8/0x2ab
[293790.991285]  __irq_exit_rcu+0xb9/0xf0
[293790.992555]  sysvec_apic_timer_interrupt+0x3c/0x90
[293790.993477]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[293790.994171] RIP: 0033:0x7ff4dca91f60
[293790.994801] Code: 75 15 49 8b f7 c5 f8 77 49 ba 80 6c bf f3 f4 7f 00 00 41 ff d2 eb 0d 4b 89 7c 13 f8 49 83 c2 f8 4d 89 57 70 48 8b c3 c5 f8 77 <48> 83 c4 50 5d 4d 8b 97 08 01 00 00 41 85 02 c3 49 8d 14 fc 8b 7a
[293790.997256] RSP: 002b:00007ff3b66fd190 EFLAGS: 00000246
[293790.998138] RAX: 0000000655fd6ed0 RBX: 0000000655fd6ed0 RCX: 0000000000000004
[293790.999184] RDX: 0000000000000000 RSI: 000000066cf4939c RDI: 00007ff4f1180eb7
[293791.000220] RBP: 0000000000000004 R08: 000000066cf48530 R09: 000000066cf493a8
[293791.001274] R10: 00000000000007f0 R11: 00007ff3bc00ca80 R12: 0000000000000000
[293791.002222] R13: 000000066cf49390 R14: 00000000cd9e9272 R15: 00007ff39c033800
[293791.002966]  </TASK>
[293791.003489] ---[ end trace 0000000000000000 ]---
[293791.004440] ------------[ cut here ]------------
[293791.005479] rq->clock_update_flags < RQCF_ACT_SKIP
[293791.005493] WARNING: CPU: 0 PID: 3920513 at kernel/sched/sched.h:1496 update_curr+0x162/0x1d0

Sadly, I don't have more info, but hopefully this stack trace will be enough.
(In reply to Igor Raits from comment #0)
> Hello, we recently got a few kernel crashes with following backtrace.
> Happened on 6.4.12 (and 6.4.11 I think) but did not happen (I think) on
> 6.4.4.

IMO this is triggered by 130ef25f7004ee, which is backported from mainline commit ebb83d84e49b54.
Can you confirm whether current mainline (v6.5) also has this regression?
Hello,

could you please share a reproducible case?

Best,
Cruz Zhao