Bug 217843 - WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
Status: NEW
Alias: None
Product: Process Management
Classification: Unclassified
Component: Scheduler
Hardware: All Linux
Importance: P3 normal
Assignee: Ingo Molnar
Reported: 2023-08-29 18:44 UTC by Igor Raits
Modified: 2023-08-30 07:06 UTC

Regression: Yes

Description Igor Raits 2023-08-29 18:44:41 UTC
Hello, we recently got a few kernel crashes with the following backtrace. It happened on 6.4.12 (and, I think, on 6.4.11) but, as far as I can tell, did not happen on 6.4.4.

[293790.928007] ------------[ cut here ]------------
[293790.929905] rq->clock_update_flags & RQCF_ACT_SKIP
[293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
[293790.933694] Modules linked in: xt_owner(E) xt_REDIRECT(E) mptcp_diag(E) xsk_diag(E) raw_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) tcp_diag(E) udp_diag(E) inet_diag(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) nbd(E) rbd(E) libceph(E) dns_resolver(E) xt_set(E) ipt_rpfilter(E) ip_set_hash_ip(E) ip_set_hash_net(E) bpf_preload(E) xt_multiport(E) veth(E) wireguard(E) libchacha20poly1305(E) chacha_x86_64(E) poly1305_x86_64(E) ip6_udp_tunnel(E) udp_tunnel(E) curve25519_x86_64(E) libcurve25519_generic(E) libchacha(E) nf_conntrack_netlink(E) xt_nat(E) xt_statistic(E) xt_addrtype(E) ipt_REJECT(E) nf_reject_ipv4(E) ip_set(E) ip_vs_sh(E) ip_vs_wrr(E) ip_vs_rr(E) ip_vs(E) xt_MASQUERADE(E) nft_chain_nat(E) xt_mark(E) xt_conntrack(E) xt_comment(E) nft_compat(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) iptable_nat(E) nf_nat(E) iptable_filter(E) ip_tables(E) overlay(E) dummy(E) sunrpc(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E)
[293790.933738]  binfmt_misc(E) tls(E) isofs(E) intel_rapl_msr(E) intel_rapl_common(E) kvm_amd(E) virtio_gpu(E) ccp(E) virtio_dma_buf(E) drm_shmem_helper(E) virtio_net(E) vfat(E) kvm(E) i2c_i801(E) drm_kms_helper(E) net_failover(E) irqbypass(E) syscopyarea(E) fat(E) i2c_smbus(E) failover(E) sysfillrect(E) virtio_balloon(E) sysimgblt(E) drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sg(E) ahci(E) libahci(E) crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E) libata(E) polyval_generic(E) ghash_clmulni_intel(E) sha512_ssse3(E) virtio_blk(E) serio_raw(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) libcrc32c(E) crc32c_intel(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[293790.946262] Unloaded tainted modules: edac_mce_amd(E):1
[293790.956625] CPU: 13 PID: 3837105 Comm: QueryWorker-30f Tainted: G        W   E      6.4.12-1.gdc.el9.x86_64 #1
[293790.957963] Hardware name: RDO OpenStack Compute/RHEL, BIOS edk2-20230301gitf80f052277c8-2.el9 03/01/2023
[293790.959681] RIP: 0010:__cfsb_csd_unthrottle+0x149/0x160
[293790.960933] Code: 37 fa ff 0f 0b e9 17 ff ff ff 80 3d 41 59 fc 01 00 0f 85 21 ff ff ff 48 c7 c7 98 03 95 9d c6 05 2d 59 fc 01 01 e8 77 37 fa ff <0f> 0b 41 8b 85 88 09 00 00 e9 00 ff ff ff 66 0f 1f 84 00 00 00 00
[293790.964077] RSP: 0000:ffffb708e7217db8 EFLAGS: 00010086
[293790.965160] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[293790.966340] RDX: ffff905482c5f708 RSI: 0000000000000001 RDI: ffff905482c5f700
[293790.967839] RBP: ffff9029bb0e9e00 R08: 0000000000000000 R09: 00000000ffff7fff
[293790.969496] R10: ffffb708e7217c58 R11: ffffffff9e3e2c88 R12: 00000000000317c0
[293790.970859] R13: ffff903602c317c0 R14: 0000000000000000 R15: ffff905482c726b8
[293790.972085] FS:  00007ff3b66fe640(0000) GS:ffff905482c40000(0000) knlGS:0000000000000000
[293790.973678] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[293790.974663] CR2: 00007f16889036c0 CR3: 0000002072e34004 CR4: 0000000000770ee0
[293790.976108] PKRU: 55555554
[293790.977048] Call Trace:
[293790.978013]  <TASK>
[293790.978678]  ? __warn+0x80/0x130
[293790.979727]  ? __cfsb_csd_unthrottle+0x149/0x160
[293790.980824]  ? report_bug+0x195/0x1a0
[293790.981806]  ? handle_bug+0x3c/0x70
[293790.982884]  ? exc_invalid_op+0x14/0x70
[293790.983837]  ? asm_exc_invalid_op+0x16/0x20
[293790.984626]  ? __cfsb_csd_unthrottle+0x149/0x160
[293790.985599]  ? __cfsb_csd_unthrottle+0x149/0x160
[293790.986583]  unregister_fair_sched_group+0x73/0x1d0
[293790.987682]  sched_unregister_group_rcu+0x1a/0x40
[293790.988752]  rcu_do_batch+0x199/0x4d0
[293790.989643]  rcu_core+0x267/0x420
[293790.990418]  __do_softirq+0xc8/0x2ab
[293790.991285]  __irq_exit_rcu+0xb9/0xf0
[293790.992555]  sysvec_apic_timer_interrupt+0x3c/0x90
[293790.993477]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[293790.994171] RIP: 0033:0x7ff4dca91f60
[293790.994801] Code: 75 15 49 8b f7 c5 f8 77 49 ba 80 6c bf f3 f4 7f 00 00 41 ff d2 eb 0d 4b 89 7c 13 f8 49 83 c2 f8 4d 89 57 70 48 8b c3 c5 f8 77 <48> 83 c4 50 5d 4d 8b 97 08 01 00 00 41 85 02 c3 49 8d 14 fc 8b 7a
[293790.997256] RSP: 002b:00007ff3b66fd190 EFLAGS: 00000246
[293790.998138] RAX: 0000000655fd6ed0 RBX: 0000000655fd6ed0 RCX: 0000000000000004
[293790.999184] RDX: 0000000000000000 RSI: 000000066cf4939c RDI: 00007ff4f1180eb7
[293791.000220] RBP: 0000000000000004 R08: 000000066cf48530 R09: 000000066cf493a8
[293791.001274] R10: 00000000000007f0 R11: 00007ff3bc00ca80 R12: 0000000000000000
[293791.002222] R13: 000000066cf49390 R14: 00000000cd9e9272 R15: 00007ff39c033800
[293791.002966]  </TASK>
[293791.003489] ---[ end trace 0000000000000000 ]---
[293791.004440] ------------[ cut here ]------------
[293791.005479] rq->clock_update_flags < RQCF_ACT_SKIP
[293791.005493] WARNING: CPU: 0 PID: 3920513 at kernel/sched/sched.h:1496 update_curr+0x162/0x1d0

Sadly, I don't have more info, but hopefully this stack trace will be enough.
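
For anyone collecting several such reports, the two WARNING headers in the trace above can be pulled apart mechanically. The following is only a minimal sketch: the regular expression and the function name `parse_warnings` are illustrative assumptions, not part of any kernel tooling, and the sample text is copied from the two WARNING lines in this report.

```python
import re

# Matches the "WARNING:" header line printed for WARN_ON-style checks.
# The pattern is an assumption derived from the two lines in this report.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_warnings(log_text):
    """Return one dict per WARNING header found in log_text (hypothetical helper)."""
    results = []
    for match in WARNING_RE.finditer(log_text):
        entry = match.groupdict()
        # Convert the purely numeric fields; offsets stay as hex strings.
        for key in ("cpu", "pid", "line"):
            entry[key] = int(entry[key])
        results.append(entry)
    return results


# The two WARNING headers from the trace in this report.
sample = """\
[293790.929919] WARNING: CPU: 13 PID: 3837105 at kernel/sched/sched.h:1561 __cfsb_csd_unthrottle+0x149/0x160
[293791.005493] WARNING: CPU: 0 PID: 3920513 at kernel/sched/sched.h:1496 update_curr+0x162/0x1d0
"""

for w in parse_warnings(sample):
    print(f'{w["file"]}:{w["line"]} in {w["symbol"]} (CPU {w["cpu"]}, PID {w["pid"]})')
```

Grouping the parsed entries by `file` and `line` would show whether other crashes hit the same two checks in kernel/sched/sched.h.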
Comment 1 Bagas Sanjaya 2023-08-30 00:28:59 UTC
(In reply to Igor Raits from comment #0)
IMO this is triggered by 130ef25f7004ee, which is a backport of mainline
commit ebb83d84e49b54.
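
To make the two warning texts in the trace easier to read, here is a toy user-space model of the flag checks they correspond to. It is only a sketch under assumptions: the flag values and the helper names follow my reading of kernel/sched/sched.h, the class `ToyRq` is hypothetical, and nothing here establishes which code path the reported crash actually took.

```python
# Flag values as I understand them from kernel/sched/sched.h (assumption).
RQCF_REQ_SKIP = 0x01
RQCF_ACT_SKIP = 0x02
RQCF_UPDATED = 0x04


class ToyRq:
    """Hypothetical stand-in for a runqueue; only tracks clock_update_flags."""

    def __init__(self):
        self.clock_update_flags = 0
        self.warnings = []

    def _warn_on(self, condition, text):
        # Records the condition text, like the line printed before "WARNING:".
        if condition:
            self.warnings.append(text)

    def update_rq_clock(self):
        # While ACT_SKIP is set, the clock update is skipped entirely.
        if self.clock_update_flags & RQCF_ACT_SKIP:
            return
        self.clock_update_flags |= RQCF_UPDATED

    def rq_clock_start_loop_update(self):
        # First warning in the report: ACT_SKIP is already set on entry.
        self._warn_on(self.clock_update_flags & RQCF_ACT_SKIP,
                      "rq->clock_update_flags & RQCF_ACT_SKIP")
        self.clock_update_flags |= RQCF_ACT_SKIP

    def rq_clock_stop_loop_update(self):
        self.clock_update_flags &= ~RQCF_ACT_SKIP

    def assert_clock_updated(self):
        # Second warning in the report: neither ACT_SKIP nor UPDATED is set.
        self._warn_on(self.clock_update_flags < RQCF_ACT_SKIP,
                      "rq->clock_update_flags < RQCF_ACT_SKIP")


# Starting a loop update twice without stopping in between trips the first check.
rq_a = ToyRq()
rq_a.update_rq_clock()
rq_a.rq_clock_start_loop_update()
rq_a.rq_clock_start_loop_update()

# Reading the clock on a runqueue whose flags are still zero trips the second check.
rq_b = ToyRq()
rq_b.assert_clock_updated()

print(rq_a.warnings)
print(rq_b.warnings)
```

In this model the first message means a loop update was started while an earlier skip was still active, and the second means the clock was read without a prior update in the same critical section.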
Comment 2 Bagas Sanjaya 2023-08-30 00:34:15 UTC
Can you confirm that current mainline (v6.5) also has this regression?