Bug 216388 - On Host, kernel errors in KVM, on guests, it shows CPU stalls
Summary: On Host, kernel errors in KVM, on guests, it shows CPU stalls
Status: RESOLVED MOVED
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 high
Assignee: virtualization_kvm
URL:
Keywords:
: 216399 (view as bug list)
Depends on:
Blocks:
 
Reported: 2022-08-21 07:37 UTC by Robert Dinse
Modified: 2022-09-17 20:23 UTC
CC List: 0 users

See Also:
Kernel Version: 5.19.0 / 5.19.1 / 5.19.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The configuration file used to compile this kernel. (262.93 KB, text/plain)
2022-08-21 07:37 UTC, Robert Dinse
Details
signature.asc (195 bytes, application/pgp-signature)
2022-08-22 23:46 UTC, zhenyuw
Details

Description Robert Dinse 2022-08-21 07:37:09 UTC
Created attachment 301614 [details]
The configuration file used to compile this kernel.

This behavior has persisted across 5.19.0, 5.19.1, and 5.19.2.  While the kernel this example is taken from is tainted (owing to Intel development drivers for GPU virtualization), it is also occurring on non-tainted kernels on servers with no development or third-party modules installed.

INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.
[207177.050049]       Tainted: G     U    I       5.19.2 #1
[207177.050050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[207177.050051] task:CPU 2/KVM       state:D stack:    0 pid: 2343 ppid:     1 flags:0x00000002
[207177.050054] Call Trace:
[207177.050055]  <TASK>
[207177.050056]  __schedule+0x359/0x1400
[207177.050060]  ? kvm_mmu_page_fault+0x1ee/0x980
[207177.050062]  ? kvm_set_msr_common+0x31f/0x1060
[207177.050065]  schedule+0x5f/0x100
[207177.050066]  schedule_preempt_disabled+0x15/0x30
[207177.050068]  __mutex_lock.constprop.0+0x4e2/0x750
[207177.050070]  ? aa_file_perm+0x124/0x4f0
[207177.050071]  __mutex_lock_slowpath+0x13/0x20
[207177.050072]  mutex_lock+0x25/0x30
[207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]
[207177.050084]  intel_vgpu_rw+0xb8/0x1c0 [kvmgt]
[207177.050091]  intel_vgpu_read+0x20d/0x250 [kvmgt]
[207177.050097]  vfio_device_fops_read+0x1f/0x40
[207177.050100]  vfs_read+0x9b/0x160
[207177.050102]  __x64_sys_pread64+0x93/0xd0
[207177.050104]  do_syscall_64+0x58/0x80
[207177.050106]  ? kvm_on_user_return+0x84/0xe0
[207177.050107]  ? fire_user_return_notifiers+0x37/0x70
[207177.050109]  ? exit_to_user_mode_prepare+0x41/0x200
[207177.050111]  ? syscall_exit_to_user_mode+0x1b/0x40
[207177.050112]  ? do_syscall_64+0x67/0x80
[207177.050114]  ? irqentry_exit+0x54/0x70
[207177.050115]  ? sysvec_call_function_single+0x4b/0xa0
[207177.050116]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[207177.050118] RIP: 0033:0x7ff51131293f
[207177.050119] RSP: 002b:00007ff4ddffa260 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[207177.050121] RAX: ffffffffffffffda RBX: 00005599a6835420 RCX: 00007ff51131293f
[207177.050122] RDX: 0000000000000004 RSI: 00007ff4ddffa2a8 RDI: 0000000000000027
[207177.050123] RBP: 0000000000000004 R08: 0000000000000000 R09: 00000000ffffffff
[207177.050124] R10: 0000000000065f10 R11: 0000000000000293 R12: 0000000000065f10
[207177.050124] R13: 00005599a6835330 R14: 0000000000000004 R15: 0000000000065f10
[207177.050126]  </TASK>
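As an aside for readers decoding the register dump above: ORIG_RAX holds the syscall number, so plain shell arithmetic is enough to cross-check it against the call trace (the mapping of 17 to pread64 is the standard x86-64 syscall table):

```shell
# ORIG_RAX in the dump above is 0x0000000000000011, i.e. 17; on x86-64,
# syscall 17 is pread64, which matches the __x64_sys_pread64 frame in the
# call trace: the task was blocked inside a pread64() on the vfio/vGPU fd.
printf '%d\n' 0x11    # prints 17
```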

     I am seeing this on Intel i7-6700k, i7-6850k, and i7-9700k platforms.

     This did not happen on 5.17 kernels, and 5.18 kernels were never stable enough on my platforms to run for more than a few minutes.

     Likewise, 6.0-rc1 has not been stable enough to run in production.  After
less than three hours running on my workstation it locked hard, with even the magic SysRq key unresponsive; only power-cycling the machine brought it back.

     The operating system in use for the host on all machines is Ubuntu 22.04.

     Guests vary: Ubuntu 22.04 is the most common, but Mint, Debian, Manjaro, CentOS, Fedora, Scientific Linux, Zorin, and Windows are also in use.

     I see the same issue manifest on platforms running only Ubuntu guests as on those running guests of varying operating systems.

     The configuration file I used to compile this kernel is attached.  I compiled it with gcc 12.1.0.

     This behavior does not manifest instantly; typically the machine needs to be running 3-7 days before it does.  Once it does, guests keep stalling, and restarting libvirtd does not help.  The only thing that seems to help is a hard reboot of the physical host.  For this reason I believe the issue lies strictly with the host and not the guests.
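Given the 3-7 day window before the stalls appear, one practical approach is to scan the saved kernel logs for every stall report in one pass. A minimal sketch (the patterns are taken from the messages quoted in this report; log file locations vary by distro and are an assumption):

```shell
# Sketch: pull hung-task and RCU-stall reports out of kernel log text.
# Demonstrated on an inline sample; in practice feed it /var/log/kern.log*
# or `dmesg` output (paths and required privileges vary by distro).
filter_stalls() {
    grep -E 'blocked for more than|rcu_(sched|preempt).*detected' "$@"
}

printf '%s\n' \
    'INFO: task CPU 2/KVM:2343 blocked for more than 1228 seconds.' \
    'a harmless log line' \
    'rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks:' \
    | filter_stalls -
```

Run against the real logs, this keeps only the stall reports, which makes it easier to see whether they cluster around a common subsystem.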

     I have listed it with a severity of high since it is completely service-interrupting.
Comment 1 Sean Christopherson 2022-08-22 17:50:41 UTC
+GVT folks

On Sun, Aug 21, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
>
> [...]
> [207177.050075]  intel_vgpu_emulate_mmio_read+0x5d/0x3b0 [kvmgt]

This isn't a KVM problem, it's a KVMGT problem (despite the name, KVMGT is very
much not KVM).

Comment 2 zhenyuw 2022-08-22 23:46:47 UTC
Created attachment 301634 [details]
signature.asc

On 2022.08.22 17:50:33 +0000, Sean Christopherson wrote:
> +GVT folks
>
> > [...]
> >      I am seeing this on Intel i7-6700k, i7-6850k, and i7-9700k platforms.

One recent regression fix on Comet Lake is https://patchwork.freedesktop.org/patch/496987/;
it's on the way to 6.0-rc and will be pushed to 5.19 stable as well. But it looks like this
report affects more platforms? We'll double-check.

Thanks

Comment 3 Robert Dinse 2022-08-23 00:57:26 UTC
     Regarding this being a KVMGT and NOT a KVM problem: while this report does come from a machine where I have Intel GPU virtualization in use, it has also occurred on three machines (i7-6700k and i7-6850k) with no GPU virtualization in use, although KVMGT is configured into the kernel simply because I used the same config file for all of the machines.
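One way to separate "KVMGT built into the kernel config" from "KVMGT actually active on this host" is to check the loaded modules and the mediated-device types the iGPU exposes. A sketch (reads /proc/modules; the PCI address 0000:00:02.0 is the usual slot for the Intel integrated GPU and is an assumption here):

```shell
# Sketch: is kvmgt actually loaded, and does the iGPU expose vGPU types?
if grep -qw '^kvmgt' /proc/modules 2>/dev/null; then
    echo "kvmgt loaded"
else
    echo "kvmgt not loaded"
fi
# 0000:00:02.0 is the typical Intel iGPU address (an assumption).
ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types 2>/dev/null \
    || echo "no mediated vGPU types exposed"
```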
Comment 4 Robert Dinse 2022-08-27 19:42:34 UTC
     I am not seeing this particular CPU stall on 5.19.4, but I am seeing other CPU stalls.  I've opened three different tickets on CPU stalls because they have all been in completely different tasks, but at this point I have to wonder whether there isn't some common code they all call, or a broken structure they all use, or something similar.  Rather than open 40 more tickets that all end up being duplicates, perhaps someone familiar with the internal workings could look at two tickets in addition to this one: #216399, a stall in an MDRAID task, and #216405.  Before I open yet another ticket, here is yet another CPU stall, this time in a worker task:

[  489.383957] INFO: task worker:11403 blocked for more than 122 seconds.
[  489.383962]       Not tainted 5.19.4 #1
[  489.383964] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  489.383965] task:worker          state:D stack:    0 pid:11403 ppid:     1 flags:0x00004002
[  489.383968] Call Trace:
[  489.383970]  <TASK>
[  489.383973]  __schedule+0x367/0x1400
[  489.383980]  schedule+0x58/0xf0
[  489.383983]  io_schedule+0x46/0x80
[  489.383985]  folio_wait_bit_common+0x11e/0x350
[  489.383989]  ? filemap_invalidate_unlock_two+0x50/0x50
[  489.383992]  folio_wait_bit+0x18/0x20
[  489.383994]  folio_wait_writeback+0x2c/0x80
[  489.383997]  wait_on_page_writeback+0x18/0x50
[  489.383999]  __filemap_fdatawait_range+0x98/0x140
[  489.384003]  file_write_and_wait_range+0x83/0xb0
[  489.384005]  ext4_sync_file+0xf3/0x320
[  489.384009]  __x64_sys_fdatasync+0x4e/0xa0
[  489.384012]  ? syscall_enter_from_user_mode+0x50/0x70
[  489.384014]  do_syscall_64+0x58/0x80
[  489.384017]  ? sysvec_apic_timer_interrupt+0x4b/0xa0
[  489.384020]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  489.384022] RIP: 0033:0x7f96e331bb1b
[  489.384025] RSP: 002b:00007f96788c75d0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[  489.384027] RAX: ffffffffffffffda RBX: 00005639414e0860 RCX: 00007f96e331bb1b
[  489.384029] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b
[  489.384030] RBP: 0000563941270890 R08: 0000000000000000 R09: 0000000000000000
[  489.384031] R10: 00007f96788c75f0 R11: 0000000000000293 R12: 0000000000000000
[  489.384033] R13: 00005639412708f8 R14: 00005639425cedd0 R15: 00007ffded76f3d0
[  489.384036]  </TASK>

If this appears to be related I will not generate a ticket, but I am not knowledgeable enough about the internals to know.
Comment 5 Robert Dinse 2022-08-28 21:08:53 UTC
Here is another example:

[98519.357381] Task dump for CPU 10:
[98519.357382] task:Embedded solr q state:R  running task     stack:    0 pid:607931 ppid:     1 flags:0x00000000
[98519.357389] Call Trace:
[98519.357393]  <TASK>
[98519.357399]  ? kvm_clock_get_cycles+0x11/0x20
[98519.357408]  ? ktime_get+0x46/0xc0
[98519.357411]  ? lapic_next_deadline+0x2c/0x40
[98519.357414]  ? clockevents_program_event+0xae/0x130
[98519.357418]  ? tick_program_event+0x43/0x90
[98519.357420]  ? hrtimer_interrupt+0x11f/0x220
[98519.357423]  ? exit_to_user_mode_prepare+0x41/0x1e0
[98519.357427]  ? irqentry_exit_to_user_mode+0x9/0x30
[98519.357430]  ? irqentry_exit+0x1d/0x30
[98519.357432]  ? sysvec_apic_timer_interrupt+0x4b/0xa0
[98519.357436]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[98519.357442]  </TASK>

As you can see, these are happening all over hell and back.
Comment 6 Robert Dinse 2022-09-01 06:09:17 UTC
Installed 5.19.6 on a couple of machines today; still getting CPU stalls, but in random locations:

[    6.601788] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 3 jiffies s: 53 root: 0x10/.
[    6.601802] rcu: blocking rcu_node structures (internal RCU debug):
[    6.601806] Task dump for CPU 4:
[    6.601808] task:systemd-udevd   state:R  running task     stack:    0 pid:  468 ppid:   454 flags:0x0000400a
[    6.604313] Call Trace:
[    6.604324]  <TASK>
[    6.604326]  ? cpumask_any_but+0x35/0x50
[    6.604336]  ? x2apic_send_IPI_allbutself+0x2f/0x40
[    6.604339]  ? do_sync_core+0x2a/0x30
[    6.604342]  ? cpumask_next+0x23/0x30
[    6.604344]  ? smp_call_function_many_cond+0xea/0x370
[    6.604347]  ? text_poke_memset+0x20/0x20
[    6.604350]  ? arch_unregister_cpu+0x50/0x50
[    6.604352]  ? on_each_cpu_cond_mask+0x1d/0x30
[    6.604354]  ? text_poke_bp_batch+0x1fb/0x210
[    6.604358]  ? enter_smm.constprop.0+0x51a/0xa70 [kvm]
[    6.604414]  ? vmx_set_cr0+0x16f0/0x16f0 [kvm_intel]
[    6.604457]  ? enter_smm.constprop.0+0x519/0xa70 [kvm]
[    6.604501]  ? text_poke_bp+0x49/0x70
[    6.604504]  ? __static_call_transform+0x7f/0x120
[    6.604506]  ? arch_static_call_transform+0x87/0xa0
[    6.604508]  ? enter_smm.constprop.0+0x519/0xa70 [kvm]
[    6.604552]  ? __static_call_update+0x16e/0x220
[    6.604554]  ? vmx_set_cr0+0x16f0/0x16f0 [kvm_intel]
[    6.604567]  ? kvm_arch_hardware_setup+0x35a/0x17f0 [kvm]
[    6.604611]  ? __kmalloc_node+0x16c/0x380
[    6.604615]  ? kvm_init+0xa2/0x400 [kvm]
[    6.604654]  ? hardware_setup+0x7e2/0x8cc [kvm_intel]
[    6.604666]  ? vmx_init+0xf9/0x201 [kvm_intel]
[    6.604676]  ? hardware_setup+0x8cc/0x8cc [kvm_intel]
[    6.604685]  ? do_one_initcall+0x47/0x1e0
[    6.604689]  ? kmem_cache_alloc_trace+0x16c/0x2b0
[    6.604692]  ? do_init_module+0x50/0x1f0
[    6.604694]  ? load_module+0x21bd/0x25e0
[    6.604696]  ? ima_post_read_file+0xd5/0x100
[    6.604700]  ? kernel_read_file+0x23d/0x2e0
[    6.604703]  ? __do_sys_finit_module+0xbd/0x130
[    6.604705]  ? __do_sys_finit_module+0xbd/0x130
[    6.604708]  ? __x64_sys_finit_module+0x18/0x20
[    6.604710]  ? do_syscall_64+0x58/0x80
[    6.604713]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.604715]  ? do_syscall_64+0x67/0x80
[    6.604718]  ? switch_fpu_return+0x4e/0xc0
[    6.604720]  ? exit_to_user_mode_prepare+0x184/0x1e0
[    6.604723]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.604725]  ? do_syscall_64+0x67/0x80
[    6.604728]  ? do_syscall_64+0x67/0x80
[    6.604730]  ? do_syscall_64+0x67/0x80                                                                                                                                                                                                                                                                                      
[    6.604732]  ? sysvec_call_function+0x4b/0xa0                                                                                                                                                                                                                                                                               
[    6.604735]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd                                                                                                                                                                                                                                                                     
[    6.604739]  </TASK>     
[    6.697044] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 13 jiffies s: 53 root: 0x10/.                                                                                                                                                                                                           
[    6.697051] rcu: blocking rcu_node structures (internal RCU debug):                                                                                                                                                                                                                                                         
[    6.697052] Task dump for CPU 4:                                                                                                                                                                                                                                                                                            
[    6.697053] task:systemd-udevd   state:R  running task     stack:    0 pid:  468 ppid:   454 flags:0x0000400a                                                                                                                                                                                                               
[    6.697057] Call Trace:                                                                                                                                                                                                                                                                              
[    6.697058]  <TASK>                                                                                                                                                                                                                                                                  
[    6.697059]  ? cpumask_any_but+0x35/0x50                                                                                                                                                                                                                                                                                    
[    6.697065]  ? x2apic_send_IPI_allbutself+0x2f/0x40                                                                                                                                                                                                                                                                         
[    6.697068]  ? do_sync_core+0x2a/0x30                                                                                                                                                                                                                                                                                       
[    6.697071]  ? cpumask_next+0x23/0x30                                                                                                                                                                                                                                                                                       
[    6.697072]  ? smp_call_function_many_cond+0xea/0x370                                                                                                                                                                                                                                                                       
[    6.697075]  ? text_poke_memset+0x20/0x20                                                                                                                                                                                                                                                                                   
[    6.697077]  ? arch_unregister_cpu+0x50/0x50                                                                                                                                                                                                                                                                                
[    6.697080]  ? on_each_cpu_cond_mask+0x1d/0x30                                                                                                                                                                                                                                                                              
[    6.697081]  ? text_poke_bp_batch+0x1fb/0x210                                                                                                                                                                                                                                                                               
[    6.697084]  ? kvm_set_msr_common+0x939/0x1060 [kvm]                                                                                                                                                                                                                                                                        
[    6.697133]  ? vmx_set_efer.part.0+0x160/0x160 [kvm_intel]                                                                                                                                                                                                                                                                  
[    6.697147]  ? kvm_set_msr_common+0x938/0x1060 [kvm]                                                                                                                                                                                                                                                                        
[    6.697187]  ? text_poke_bp+0x49/0x70                                                                                                                                                                                                                                                                                       
[    6.697189]  ? __static_call_transform+0x7f/0x120                                                                                                                                                                                                                                                                           
[    6.697191]  ? arch_static_call_transform+0x87/0xa0                                                                                                                                                                                                                                                                         
[    6.697193]  ? kvm_set_msr_common+0x938/0x1060 [kvm]                                                                                                                                                                                                                                                                        
[    6.697234]  ? __static_call_update+0x16e/0x220                                                                                                                                                                                                                                                                             
[    6.697236]  ? vmx_set_efer.part.0+0x160/0x160 [kvm_intel]                                                                                                                                                                                                                                                                  
[    6.697246]  ? kvm_arch_hardware_setup+0x423/0x17f0 [kvm]                                                                                                                                                                                                                                                                   
[    6.697286]  ? __kmalloc_node+0x16c/0x380                                                                                                                                                                                                                                                                                   
[    6.697290]  ? kvm_init+0xa2/0x400 [kvm]                                                                                                                                                                                                                                                                                    
[    6.697326]  ? hardware_setup+0x7e2/0x8cc [kvm_intel]                                                                                                                                                                                                                                                                       
[    6.697336]  ? vmx_init+0xf9/0x201 [kvm_intel]                                                                                                                                                                                                                                                                              
[    6.697345]  ? hardware_setup+0x8cc/0x8cc [kvm_intel]                                                                                                                                                                                                                                                                       
[    6.697353]  ? do_one_initcall+0x47/0x1e0                                                                                                                                                                                                                                                                                   
[    6.697356]  ? kmem_cache_alloc_trace+0x16c/0x2b0                                                                                                                                                                                                                                                                           
[    6.697359]  ? do_init_module+0x50/0x1f0                                                                                                                                                                                                                                                                                    
[    6.697360]  ? load_module+0x21bd/0x25e0                                                                                                                                                                                                                                                                                    
[    6.697362]  ? ima_post_read_file+0xd5/0x100                                                                                                                                                                                                                                                                                
[    6.697365]  ? kernel_read_file+0x23d/0x2e0                                                                                                                                                                                                                                                                                 
[    6.697368]  ? __do_sys_finit_module+0xbd/0x130                                                                                                                                                                                                                                                                             
[    6.697370]  ? __do_sys_finit_module+0xbd/0x130                                                                                                                                                                                                                                                                             
[    6.697372]  ? __x64_sys_finit_module+0x18/0x20                                                                                                                                                                                                                                                                             
[    6.697373]  ? do_syscall_64+0x58/0x80                                                                                                                                                                                                                                                                                      
[    6.697376]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.697377]  ? do_syscall_64+0x67/0x80
[    6.697379]  ? switch_fpu_return+0x4e/0xc0
[    6.697382]  ? exit_to_user_mode_prepare+0x184/0x1e0
[    6.697384]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.697386]  ? do_syscall_64+0x67/0x80
[    6.697387]  ? do_syscall_64+0x67/0x80
[    6.697389]  ? do_syscall_64+0x67/0x80
[    6.697391]  ? sysvec_call_function+0x4b/0xa0
[    6.697393]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[    6.697397]  </TASK>

[    6.798781] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 23 jiffies s: 53 root: 0x10/.
[    6.798787] rcu: blocking rcu_node structures (internal RCU debug):
[    6.798833] Task dump for CPU 4:
[    6.798952] task:systemd-udevd   state:R  running task     stack:    0 pid:  468 ppid:   454 flags:0x0000400a
[    6.798957] Call Trace:
[    6.798959]  <TASK>
[    6.798960]  ? cpumask_any_but+0x35/0x50
[    6.798967]  ? x2apic_send_IPI_allbutself+0x2f/0x40
[    6.798969]  ? do_sync_core+0x2a/0x30
[    6.800010]  ? cpumask_next+0x23/0x30
[    6.800014]  ? smp_call_function_many_cond+0xea/0x370
[    6.800017]  ? text_poke_memset+0x20/0x20
[    6.800019]  ? arch_unregister_cpu+0x50/0x50
[    6.800024]  ? __SCT__kvm_x86_set_rflags+0x8/0x8 [kvm]
[    6.800096]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800109]  ? on_each_cpu_cond_mask+0x1d/0x30
[    6.800110]  ? text_poke_bp_batch+0xaf/0x210
[    6.800113]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800121]  ? __SCT__kvm_x86_set_rflags+0x8/0x8 [kvm]
[    6.800172]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800180]  ? text_poke_bp+0x49/0x70
[    6.800182]  ? __static_call_transform+0x7f/0x120
[    6.800183]  ? arch_static_call_transform+0x58/0xa0
[    6.800185]  ? __SCT__kvm_x86_set_rflags+0x8/0x8 [kvm]
[    6.800233]  ? __static_call_update+0x62/0x220
[    6.800235]  ? vmx_get_rflags+0x130/0x130 [kvm_intel]
[    6.800243]  ? kvm_arch_hardware_setup+0x581/0x17f0 [kvm]
[    6.800284]  ? __kmalloc_node+0x16c/0x380
[    6.800288]  ? kvm_init+0xa2/0x400 [kvm]
[    6.800324]  ? hardware_setup+0x7e2/0x8cc [kvm_intel]
[    6.800334]  ? vmx_init+0xf9/0x201 [kvm_intel]
[    6.800342]  ? hardware_setup+0x8cc/0x8cc [kvm_intel]
[    6.800350]  ? do_one_initcall+0x47/0x1e0
[    6.800352]  ? kmem_cache_alloc_trace+0x16c/0x2b0
[    6.800355]  ? do_init_module+0x50/0x1f0
[    6.800357]  ? load_module+0x21bd/0x25e0
[    6.800358]  ? ima_post_read_file+0xd5/0x100
[    6.800361]  ? kernel_read_file+0x23d/0x2e0
[    6.800364]  ? __do_sys_finit_module+0xbd/0x130
[    6.800365]  ? __do_sys_finit_module+0xbd/0x130
[    6.800368]  ? __x64_sys_finit_module+0x18/0x20
[    6.800369]  ? do_syscall_64+0x58/0x80
[    6.800371]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.800373]  ? do_syscall_64+0x67/0x80
[    6.800375]  ? switch_fpu_return+0x4e/0xc0
[    6.800377]  ? exit_to_user_mode_prepare+0x184/0x1e0
[    6.800379]  ? syscall_exit_to_user_mode+0x1b/0x40
[    6.800380]  ? do_syscall_64+0x67/0x80
[    6.800382]  ? do_syscall_64+0x67/0x80
[    6.800384]  ? do_syscall_64+0x67/0x80
[    6.800385]  ? sysvec_call_function+0x4b/0xa0
[    6.800387]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[    6.800391]  </TASK>

     Are these related or should I open a new ticket?  These occurred right after boot.
Comment 7 Sean Christopherson 2022-09-01 16:44:22 UTC
On Thu, Sep 01, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
> --- Comment #6 from Robert Dinse (nanook@eskimo.com) ---
> Installed 5.19.6 on a couple of machines today, still getting CPU stalls but
> in
> random locations:

...

>      Are these related or should I open a new ticket?  These occurred right
> after boot.

Odds are very good that all of the stalls are due to one bug.  Stall warnings fire
when a task or CPU waiting on an RCU grace period hasn't made forward progress in
a certain amount of time.  In both cases, many times the CPU yelling that it's
stalled is a victim and not the culprit, i.e. a stalled task/CPU often indicates
that something is broken elsewhere in the system that is preventing forward progress
on _this_ task/CPU.

Normally I would suggest bisecting, but given that v5.18 is broken for you that
probably isn't an option.

In the logs, are there any common patterns (beyond running KVM)?  E.g. any functions
that show up in stack traces in all instances?  If nothing obvious jumps out, it
might be worth uploading a pile of (compressed) traces somewhere so that others can
poke through them; maybe someone will find the needle.
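Sean's suggestion to look for functions common to every trace can be mechanized with a short pipeline. This is only a sketch (the sample file, its path, and the regex are illustrative, not from the thread): it strips each `? func+0x.../0x...` frame down to its function name and counts occurrences across all captured logs.

```shell
# Build a tiny sample log so the pipeline is runnable anywhere;
# on a real system you would point grep at /var/log/kern.log* instead.
cat > /tmp/stall-sample.log <<'EOF'
[    6.697081]  ? text_poke_bp_batch+0x1fb/0x210
[    7.092409]  ? text_poke_bp_batch+0xaf/0x210
[    6.697077]  ? arch_unregister_cpu+0x50/0x50
EOF

# Extract every "func+0x" symbol, strip the offset marker,
# and rank function names by how often they appear.
grep -ohE '[A-Za-z_][A-Za-z_0-9.]*\+0x' /tmp/stall-sample.log \
  | sed 's/+0x$//' | sort | uniq -c | sort -rn
```

Functions that appear at the top across every stall (here `text_poke_bp_batch`, which does show up in several of the traces above) are the first candidates to stare at.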
Comment 8 Robert Dinse 2022-09-01 19:46:38 UTC
     I will scour the logs and see what I can find.  My understanding is that Ubuntu 22.10 is going to be 5.19-based, but Ubuntu does not run a tickless kernel, so they would not see this if it is related to that.  I may try compiling the host machines' kernels non-tickless just to see if that makes a difference.

     I could run 5.18 on my workstation, but it is not busy enough to see these frequently (maybe once a week).  Note that the three traces above were all from a KVM guest rather than a host machine; they came from my web server, which is quite a busy machine.  I tried 6.0.0-rc3 on my workstation and it is still wonky.  It is no longer locking up without an error (or without even the magic SysRq key working), but now video will not play at all from BitChute, video from Odysee plays but without audio, and YouTube gets both audio and video, all in the same browser (Firefox), so I don't know where to even start with that.
Comment 9 Robert Dinse 2022-09-01 21:37:01 UTC
I spent some time digging through web server logs.  These are from the machine that produced the last three CPU stall messages, but this time the stall occurred in apache2, which is useful because that is something I can readily test.  Here is the stall message:

Sep  1 14:26:53 ftp kernel: [   18.819394][  T298] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 3-... } 4 jiffies s: 441 root: 0x8/.
Sep  1 14:26:53 ftp kernel: [   18.819413][  T298] rcu: blocking rcu_node structures (internal RCU debug):
Sep  1 14:26:53 ftp kernel: [   18.819417][  T298] Task dump for CPU 3:
Sep  1 14:26:53 ftp kernel: [   18.819418][  T298] task:httpd           state:R  running task     stack:    0 pid: 2798 ppid:  2460 flags:0x0000400a
Sep  1 14:26:53 ftp kernel: [   18.819424][  T298] Call Trace:
Sep  1 14:26:53 ftp kernel: [   18.819428][  T298]  <TASK>
Sep  1 14:26:53 ftp kernel: [   18.819437][  T298]  ? alloc_pages+0x90/0x1a0
Sep  1 14:26:53 ftp kernel: [   18.819443][  T298]  ? allocate_slab+0x274/0x460
Sep  1 14:26:53 ftp kernel: [   18.819445][  T298]  ? xa_load+0xa6/0xc0
Sep  1 14:26:53 ftp kernel: [   18.819448][  T298]  ? ___slab_alloc.constprop.0+0x50b/0x5f0
Sep  1 14:26:53 ftp kernel: [   18.819451][  T298]  ? kmem_cache_alloc_lru+0x297/0x360
Sep  1 14:26:53 ftp kernel: [   18.819456][  T298]  ? nfs_find_actor+0x90/0x90 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819504][  T298]  ? nfs_alloc_inode+0x21/0x60 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819519][  T298]  ? alloc_inode+0x23/0xc0
Sep  1 14:26:53 ftp kernel: [   18.819526][  T298]  ? nfs_alloc_fhandle+0x30/0x30 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819541][  T298]  ? iget5_locked+0x53/0xa0
Sep  1 14:26:53 ftp kernel: [   18.819543][  T298]  ? list_lru_add+0x13f/0x190
Sep  1 14:26:53 ftp kernel: [   18.819547][  T298]  ? nfs_fhget+0xd2/0x6d0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819570][  T298]  ? nfs_readdir_entry_decode+0x31e/0x440 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819581][  T298]  ? nfs_readdir_page_filler+0x10d/0x4f0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819592][  T298]  ? nfs_readdir_xdr_to_array+0x45e/0x4a0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819602][  T298]  ? nfs_readdir+0x2e6/0xea0 [nfs]
Sep  1 14:26:53 ftp kernel: [   18.819613][  T298]  ? iterate_dir+0x9b/0x1d0
Sep  1 14:26:53 ftp kernel: [   18.819615][  T298]  ? __x64_sys_getdents64+0x84/0x120
Sep  1 14:26:53 ftp kernel: [   18.819616][  T298]  ? __ia32_sys_getdents64+0x120/0x120
Sep  1 14:26:53 ftp kernel: [   18.819618][  T298]  ? do_syscall_64+0x5b/0x80
Sep  1 14:26:53 ftp kernel: [   18.819620][  T298]  ? do_user_addr_fault+0x1c1/0x620
Sep  1 14:26:53 ftp kernel: [   18.819622][  T298]  ? exit_to_user_mode_prepare+0x41/0x1e0
Sep  1 14:26:53 ftp kernel: [   18.819625][  T298]  ? irqentry_exit_to_user_mode+0x9/0x30
Sep  1 14:26:53 ftp kernel: [   18.819626][  T298]  ? irqentry_exit+0x1d/0x30
Sep  1 14:26:53 ftp kernel: [   18.819627][  T298]  ? exc_page_fault+0x86/0x160
Sep  1 14:26:53 ftp kernel: [   18.819628][  T298]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
Sep  1 14:26:53 ftp kernel: [   18.819631][  T298]  </TASK>

     That process is now gone, but the parent process is still running and Apache still seems to be responding fine.  Checking the error log, there were no errors from that PID.
Comment 10 Robert Dinse 2022-09-02 05:46:48 UTC
This >MAY< be a compiler issue.  I was wondering why I seem to be the only one having this problem; given how frequently it occurs for me, I would expect a gazillion me-toos, but so far there have been none.

I know very few seem to be using gcc 12.1.0, which I was using, because I seem to be the only person who had problems compiling 5.18 with it.

Since gcc 12.2.0 was out, I built it today and rebuilt the kernel with it; so far that kernel has not produced any CPU-stall reports.
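A quick way to confirm which compiler actually built the running kernel is `/proc/version`, which embeds the gcc version string. The command is shown here against a sample string so it runs anywhere; the sample line itself is made up for illustration.

```shell
# Sample /proc/version contents (illustrative, not from the reporter's host).
sample='Linux version 5.19.2 (root@ftp) (gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #1 SMP'

# Pull out just the compiler identification.
echo "$sample" | grep -oE 'gcc \(GCC\) [0-9.]+'

# On a live system:
#   grep -oE 'gcc \(GCC\) [0-9.]+' /proc/version
```

This makes it easy to verify that a rebooted box really is running the 12.2.0-built kernel and not a stale 12.1.0 build.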
Comment 11 Robert Dinse 2022-09-02 08:36:28 UTC
Well, it is still happening with gcc 12.2.0, though it seems somewhat less frequent.

[    7.092312] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-... } 3 jiffies s: 389 root: 0x10/.
[    7.092329] rcu: blocking rcu_node structures (internal RCU debug):
[    7.092332] Task dump for CPU 4:
[    7.092334] task:modprobe        state:R  running task     stack:    0 pid: 1502 ppid:     8 flags:0x0000400a
[    7.092338] Call Trace:
[    7.092344]  <TASK>
[    7.092347]  ? __wake_up_common_lock+0x87/0xc0
[    7.092355]  ? sysvec_apic_timer_interrupt+0x90/0xa0
[    7.092361]  ? insn_get_prefixes+0x1f1/0x440
[    7.092365]  ? load_new_mm_cr3+0x7f/0xe0
[    7.092368]  ? cpumask_any_but+0x35/0x50
[    7.092372]  ? x2apic_send_IPI_allbutself+0x2f/0x40
[    7.092375]  ? do_sync_core+0x2a/0x30
[    7.092379]  ? cpumask_next+0x23/0x30
[    7.092381]  ? smp_call_function_many_cond+0xea/0x370
[    7.092386]  ? text_poke_memset+0x20/0x20
[    7.092389]  ? arch_unregister_cpu+0x50/0x50
[    7.092394]  ? __fscache_acquire_cookie+0x4f4/0x500 [fscache]
[    7.092407]  ? on_each_cpu_cond_mask+0x1d/0x30
[    7.092409]  ? text_poke_bp_batch+0xaf/0x210
[    7.092412]  ? __traceiter_fscache_volume+0x60/0x60 [fscache]
[    7.092421]  ? __fscache_acquire_cookie+0x4f4/0x500 [fscache]
[    7.092429]  ? __fscache_acquire_cookie+0x4f4/0x500 [fscache]
[    7.092438]  ? text_poke_bp+0x49/0x70
[    7.092440]  ? __static_call_transform+0x7f/0x120
[    7.092442]  ? arch_static_call_transform+0x87/0xa0
[    7.092446]  ? __static_call_init+0x167/0x210
[    7.092450]  ? static_call_module_notify+0x13e/0x1a0
[    7.092452]  ? blocking_notifier_call_chain_robust+0x72/0xd0
[    7.092456]  ? load_module+0x2068/0x25e0
[    7.092459]  ? ima_post_read_file+0xd5/0x100
[    7.092464]  ? __do_sys_finit_module+0xbd/0x130
[    7.092466]  ? __do_sys_finit_module+0xbd/0x130
[    7.092469]  ? __x64_sys_finit_module+0x18/0x20
[    7.092470]  ? do_syscall_64+0x5b/0x80
[    7.092474]  ? ksys_mmap_pgoff+0x108/0x250
[    7.092478]  ? do_syscall_64+0x67/0x80
[    7.092480]  ? exit_to_user_mode_prepare+0x41/0x1e0
[    7.092485]  ? syscall_exit_to_user_mode+0x1b/0x40
[    7.092487]  ? do_syscall_64+0x67/0x80
[    7.092489]  ? do_syscall_64+0x67/0x80
[    7.092492]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[    7.092496]  </TASK>
Comment 12 Robert Dinse 2022-09-03 01:37:55 UTC
There are four machines where these seem to happen within moments of a boot, so I am going to give the EOL 5.18.19 a try.  They are all guest machines, so I can easily reboot remotely into a working kernel if 5.18 locks up or otherwise does not work.

Another thing I tried was raising the RCU expedited-stall timeout from 20ms to 40ms, but it made no difference.
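For reference, the expedited-stall timeout mentioned above is exposed on 5.19-and-later kernels as a module parameter measured in milliseconds; the ordinary stall timeout is a separate knob measured in seconds. The paths below are from memory, so verify they exist on your kernel before relying on them.

```shell
# Runtime knobs (root required); names assume a 5.19+ kernel.
cat /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout    # expedited, ms
echo 40 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout        # normal, seconds

# Equivalent boot-time parameters:
#   rcupdate.rcu_exp_cpu_stall_timeout=40 rcupdate.rcu_cpu_stall_timeout=21
```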
Comment 13 Robert Dinse 2022-09-03 02:03:29 UTC
I am going to 'five' meant to say 'give' but given there is no edit function here...
Comment 14 Robert Dinse 2022-09-03 05:31:49 UTC
OK: with 5.18.19 there are no "rcu_sched detected expedited stalls" messages, so this is definitely something that broke between 5.18.19 and 5.19.0.
Comment 15 Robert Dinse 2022-09-03 05:37:59 UTC
Please forgive my lack of knowledge regarding git, but is there a way to get a patch that took the kernel from 5.18.19 to 5.19.0 now that earlier releases of 5.19.x are not on the kernel.org site?  I know there is a patch that goes from 5.18.19 to 5.19.6 and one that goes 5.19.5 to 5.19.6 but I just want to look at the changes between 5.18.19 and 5.19.0.
Comment 16 Robert Dinse 2022-09-04 04:17:15 UTC
*** Bug 216399 has been marked as a duplicate of this bug. ***
Comment 17 Robert Dinse 2022-09-04 05:41:22 UTC
5.18.19 has run for a day on my four busiest servers and on my workstation without errors, whereas 5.19.0 would generally generate CPU expedited stall warnings within minutes of boot.  So it is definitely broken going from 5.18.19 -> 5.19.0.
Comment 18 Robert Dinse 2022-09-05 04:06:27 UTC
Since 6.0.0-rc4 came out today, I decided to give it a try.  rc3 did not work with my Intel graphics: it booted and ran okay except there was no display.  rc2 had issues where on some websites, such as odysee.com, video would not play at all even though it was fine on YouTube, while on others, like BitChute, video would play but with no audio.  rc1 ran for three hours and then hard-locked; not even the magic SysRq key worked.

But the fourth time's a charm, it would seem.  rc4 drives the display properly again, and video works on all the websites.  I didn't get any of the RCU expedited CPU stalls, AND it FLIES!  My PHP-based WordPress website loads in an awesome 38ms, and at least half of that is network latency between me and where my servers are located.

So I'm not going to continue to pursue 5.19.  I don't feel really comfortable using release candidates for live workloads, but this is working better than anything ever has.

If 5.19 isn't going to become a long-term support release, perhaps this ticket should just be closed as WONTFIX.
Comment 19 Sean Christopherson 2022-09-06 15:52:52 UTC
On Sat, Sep 03, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216388
> 
> --- Comment #15 from Robert Dinse (nanook@eskimo.com) ---
> Please forgive my lack of knowledge regarding git, but is there a way to get
> a
> patch that took the kernel from 5.18.19 to 5.19.0 now that earlier releases
> of
> 5.19.x are not on the kernel.org site?

Strictly speaking, no.  Stable branches, i.e. v5.18.x in this case, are effectively
forks.  After v5.18.0, everything that goes into v5.18.y is a unique commit, even
if bug fixes are based on an upstream (master branch) commit.

Visually, it's something like this.

v5.18.0 --> v5.18.1 --> v5.18.2 --> v5.18.y
\
 -> ... -> v5.19.0 -> v5.19.1
           \
            -> ... -> v5.20


IIUC, in this situation v5.18.0 isn't stable enough to test on its own, but the
v5.18.19 candidate is fully healthy.  In that case, if you wanted to bisect between
v5.18.0 and v5.19.0 to figure out what broke in v5.19, the least awful approach
would be to first find what commit(s) between v5.18.0 and v5.18.19 fixed the unrelated
instability in v5.18.0, and then manually apply that commit(s) at every stage when
bisecting between v5.18.0 and v5.19.0 to identify the buggy commit that introduced
the CPU/RCU stalls.

> I know there is a patch that goes from 5.18.19 to 5.19.6

I assume you mean v5.18.19 => v5.18.20?

> and one that goes 5.19.5 to 5.19.6 but I just want to look at the changes
> between 5.18.19 and 5.19.0.

If you just want to look at the changes, you can always do

	git diff <commit A>..<commit B>

e.g.

	git diff v5.18.18..v5.19

but that's going to show _all_ changes in a single diff, i.e. pinpointing exactly
what change broke/fixed something is extremely difficult.
Comment 20 Robert Dinse 2022-09-06 21:44:27 UTC
At this point 6.0-rc4 is running flawlessly, so whatever was broken in 5.19 is fixed in 6.0.  If 5.19 is going to be a long-term support release, then it's worth continuing to pursue; if not, there is little point.
Comment 21 Robert Dinse 2022-09-17 19:53:18 UTC
Well shite!  6.0rc4 ran perfectly, but 6.0rc5 is back to massive CPU stalls just like 5.19.  AAAARRRRGGGGHHHH!
Comment 22 Robert Dinse 2022-09-17 20:23:22 UTC
Since this is happening all over, not just in KVM code, AND since it's happening even on machines with no KVM/QEMU guests, this ticket is targeting the wrong code, so I'm closing it and opening a new ticket with more current and extensive details.
