Bug 209867 - CPU soft lockup/stall with nested KVM and SMP
Summary: CPU soft lockup/stall with nested KVM and SMP
Status: RESOLVED CODE_FIX
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm (show other bugs)
Hardware: All Linux
: P1 high
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-26 12:13 UTC by Frantisek Sumsal
Modified: 2020-12-23 12:31 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.9.11-arch2-1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
RHEL 8.2 host (AMD) - full log (59.16 KB, text/plain)
2020-10-26 13:10 UTC, Frantisek Sumsal
Details
Fedora 32 host (AMD) - full log (71.79 KB, text/plain)
2020-10-26 13:10 UTC, Frantisek Sumsal
Details

Description Frantisek Sumsal 2020-10-26 12:13:54 UTC
Hello,

During my systemd CI adventures I've encountered an issue with kernel 5.9.x where the boot freezes at completely random moments because of a CPU soft lockup. From my testing it seems to be reproducible with nested KVM & SMP > 1 (it does happen with SMP == 1 as well, but not always) - see[0].

Reproducer is quite straightforward - enable nested KVM on the host, create a VM, and create a nested KVM VM in that VM. During my testing I used Vagrant[1] (with libvirt backend) for the outer VM, and an image generated by mkosi[2] for the inner VM. Both VMs run the same kernel version.

Hosts:
 * several AMD & Intel servers with RHEL 8.2 (4.18.0-193.19.1.el8_2)
 * AMD desktop with Fedora 32 (5.6.2-300.fc32.x86_64)

The behavior was consistent on all hosts.

Desktop results:
# qemu-system-x86_64 -net none -smp 2 -m 512 -nographic -machine accel=kvm -enable-kvm -cpu host -kernel /boot/vmlinuz-linux -initrd /boot/initramfs-linux.img -append 'debug rw console=ttyS0 root=/dev/sda1' -drive format=raw,file=image.raw
...
[    4.602193] random: dbus-daemon: uninitialized urandom read (12 bytes read)
[    5.538763] random: crng init done
[   28.635398] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd:1]
[   28.638215] Modules linked in: drm agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi crc32_pclmul crc32c_intel serio_raw atkbd libps2 aesni_intel glue_helper crypto_simd cryptd ata_piix floppy i8042 serio
[   28.642668] CPU: 2 PID: 1 Comm: systemd Not tainted 5.9.1-arch1-1 #1
[   28.648865] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[   28.648865] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0
[   28.655420] Code: c3 0d 3d 00 3b 05 61 0a 83 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 00 19 81 8b 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 60 45 d7 8b 48 89 ee 44 89 ff
[   28.655420] RSP: 0018:ffffacf800013b18 EFLAGS: 00000202
[   28.668750] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000000
[   28.668750] RDX: ffff91c39da333e0 RSI: 0000000000000000 RDI: 0000000000000000
[   28.668750] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[   28.668750] R10: 0000000000000140 R11: 0000000000000002 R12: 0000000000000000
[   28.682088] R13: ffff91c39db2d340 R14: 0000000000000140 R15: ffff91c39db2d348
[   28.682088] FS:  00007fd4cbc04340(0000) GS:ffff91c39db00000(0000) knlGS:0000000000000000
[   28.682088] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.682088] CR2: 00007fd4cc9e8520 CR3: 000000001fa8e000 CR4: 0000000000350ee0
[   28.695419] Call Trace:
[   28.695419]  ? __flush_tlb_all+0x30/0x30
[   28.695419]  ? __flush_tlb_all+0x30/0x30
[   28.695419]  on_each_cpu+0x43/0xb0
[   28.695419]  __purge_vmap_area_lazy+0x5d/0x670
[   28.695419]  ? do_jit+0xbdf/0x1cd0
[   28.708758]  ? purge_fragmented_blocks+0xbd/0x1a0
[   28.708758]  _vm_unmap_aliases.part.0+0x110/0x140
[   28.708758]  change_page_attr_set_clr+0xb9/0x1c0
[   28.708758]  set_memory_ro+0x26/0x30
[   28.708758]  bpf_int_jit_compile+0x407/0x42b
[   28.708758]  bpf_prog_select_runtime+0x101/0x1a0
[   28.708758]  bpf_prog_load+0x49a/0x8e0
[   28.722089]  __do_sys_bpf+0x2dd/0x1ea0
[   28.722089]  do_syscall_64+0x33/0x40
[   28.722089]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   28.722089] RIP: 0033:0x7fd4cc91ed5d
[   28.722089] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48
[   28.735427] RSP: 002b:00007ffe18ec4218 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[   28.735427] RAX: ffffffffffffffda RBX: 000055cea99a9ab0 RCX: 00007fd4cc91ed5d
[   28.735427] RDX: 0000000000000070 RSI: 00007ffe18ec4220 RDI: 0000000000000005
[   28.748752] RBP: 0000000000000000 R08: 0070756f7267632f R09: 0000000800000008
[   28.748752] R10: 0000000000000000 R11: 0000000000000246 R12: 000055cea999fb20
[   28.748752] R13: 0000000000000001 R14: 0000000000000001 R15: 000055cea99836a0
[   56.635397] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd:1]
[   56.638254] Modules linked in: drm agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi crc32_pclmul crc32c_intel serio_raw atkbd libps2 aesni_intel glue_helper crypto_simd cryptd ata_piix floppy i8042 serio
[   56.642094] CPU: 2 PID: 1 Comm: systemd Tainted: G             L    5.9.1-arch1-1 #1
[   56.648798] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[   56.648798] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0
[   56.655444] Code: c3 0d 3d 00 3b 05 61 0a 83 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 00 19 81 8b 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 60 45 d7 8b 48 89 ee 44 89 ff
[   56.655444] RSP: 0018:ffffacf800013b18 EFLAGS: 00000202
[   56.655444] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000000
[   56.668871] RDX: ffff91c39da333e0 RSI: 0000000000000000 RDI: 0000000000000000
[   56.668871] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[   56.668871] R10: 0000000000000140 R11: 0000000000000002 R12: 0000000000000000
[   56.668871] R13: ffff91c39db2d340 R14: 0000000000000140 R15: ffff91c39db2d348
[   56.668871] FS:  00007fd4cbc04340(0000) GS:ffff91c39db00000(0000) knlGS:0000000000000000
[   56.682244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   56.682244] CR2: 00007fd4cc9e8520 CR3: 000000001fa8e000 CR4: 0000000000350ee0
[   56.682244] Call Trace:
[   56.682244]  ? __flush_tlb_all+0x30/0x30
[   56.682244]  ? __flush_tlb_all+0x30/0x30
[   56.682244]  on_each_cpu+0x43/0xb0
[   56.682244]  __purge_vmap_area_lazy+0x5d/0x670
[   56.695525]  ? do_jit+0xbdf/0x1cd0
[   56.695525]  ? purge_fragmented_blocks+0xbd/0x1a0
[   56.695525]  _vm_unmap_aliases.part.0+0x110/0x140
[   56.695525]  change_page_attr_set_clr+0xb9/0x1c0
[   56.695525]  set_memory_ro+0x26/0x30
[   56.695525]  bpf_int_jit_compile+0x407/0x42b
[   56.695525]  bpf_prog_select_runtime+0x101/0x1a0
[   56.708855]  bpf_prog_load+0x49a/0x8e0
[   56.708855]  __do_sys_bpf+0x2dd/0x1ea0
[   56.708855]  do_syscall_64+0x33/0x40
[   56.708855]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   56.708855] RIP: 0033:0x7fd4cc91ed5d
[   56.708855] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48
[   56.722223] RSP: 002b:00007ffe18ec4218 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[   56.722223] RAX: ffffffffffffffda RBX: 000055cea99a9ab0 RCX: 00007fd4cc91ed5d
[   56.722223] RDX: 0000000000000070 RSI: 00007ffe18ec4220 RDI: 0000000000000005
[   56.722223] RBP: 0000000000000000 R08: 0070756f7267632f R09: 0000000800000008
[   56.735526] R10: 0000000000000000 R11: 0000000000000246 R12: 000055cea999fb20
[   56.735526] R13: 0000000000000001 R14: 0000000000000001 R15: 000055cea99836a0
...
[   64.578716] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   64.578716] 	(detected by 2, t=18002 jiffies, g=-207, q=1973)
[   64.578716] rcu: All QSes seen, last rcu_preempt kthread activity 18002 (4294896513-4294878511), jiffies_till_next_fqs=2, root ->qsmask 0x0
[   64.578716] rcu: rcu_preempt kthread starved for 18002 jiffies! g-207 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=0
[   64.588745] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[   64.588745] rcu: RCU grace-period kthread stack dump:
[   64.588745] task:rcu_preempt     state:R stack:    0 pid:   11 ppid:     2 flags:0x00004000
[   64.602163] Call Trace:
[   64.602163]  __schedule+0x292/0x830
[   64.602163]  schedule+0x46/0xf0
[   64.602163]  schedule_timeout+0x99/0x170
[   64.602163]  ? __next_timer_interrupt+0x100/0x100
[   64.602163]  rcu_gp_kthread+0x5a4/0xbe0
[   64.602163]  ? __note_gp_changes+0x190/0x190
[   64.602163]  kthread+0x142/0x160
[   64.602163]  ? __kthread_bind_mask+0x60/0x60
[   64.615482]  ret_from_fork+0x22/0x30
...

Server results:
...
[   32.051205] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[   32.051237] Modules linked in:
[   32.051237] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.1-arch1-1 #1
[   32.051237] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
[   32.051237] RIP: 0010:smp_call_function_many_cond+0x2a3/0x2f0
[   32.051237] Code: c3 0d 3d 00 3b 05 61 0a 83 01 89 c7 0f 83 f4 fd ff ff 48 63 c7 49 8b 55 00 48 03 14 c5 00 19 41 bb 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75 f7 eb c9 48 c7 c2 60 45 97 bb 48 89 ee 44 89 ff
[   32.051237] RSP: 0018:ffffa661c0013d98 EFLAGS: 00000202
[   32.051237] RAX: 0000000000000011 RBX: 0000000000000000 RCX: 0000000000000004
[   32.051237] RDX: ffff9e955db320a0 RSI: 0000000000000000 RDI: 0000000000000004
[   32.051237] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000004
[   32.051237] R10: 0000000000000005 R11: 0000000000000005 R12: 0000000000000000
[   32.051237] R13: ffff9e955da2d340 R14: 0000000000000140 R15: ffff9e955da2d348
[   32.051237] FS:  0000000000000000(0000) GS:ffff9e955da00000(0000) knlGS:0000000000000000
[   32.051237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   32.051237] CR2: 0000000000000000 CR3: 000000001840e000 CR4: 00000000000406f0
[   32.051237] Call Trace:
[   32.051237]  ? _raw_spin_unlock+0x16/0x30
[   32.051237]  ? text_poke_loc_init+0x160/0x160
[   32.051237]  ? text_poke_loc_init+0x160/0x160
[   32.051237]  on_each_cpu+0x43/0xb0
[   32.051237]  text_poke_bp_batch+0x1d7/0x200
[   32.051237]  text_poke_finish+0x1b/0x26
[   32.051237]  arch_jump_label_transform_apply+0x16/0x30
[   32.051237]  static_key_slow_inc_cpuslocked+0x7a/0x90
[   32.051237]  static_key_slow_inc+0x16/0x20
[   32.051237]  ? kvm_init_platform+0x16/0x16
[   32.051237]  activate_jump_labels+0x2f/0x32
[   32.051237]  do_one_initcall+0x59/0x234
[   32.051237]  kernel_init_freeable+0x1b0/0x1f5
[   32.051237]  ? rest_init+0xbf/0xbf
[   32.051237]  kernel_init+0xa/0x111
[   32.051237]  ret_from_fork+0x22/0x30
...


Frankly, I'm at wits' end, as I've been noticing similar issues since kernel 5.8.x and still can pinpoint what's going on (again, see [0]), thus my aplogies if I filed this under a wrong component.

Thank you.


[0] https://github.com/systemd/systemd-centos-ci/pull/295#issuecomment-682519585
[1]
Vagrant.configure("2") do |config|
  config.vm.box = "generic/arch"
  config.vm.provider :libvirt do |libvirt|
    libvirt.cpus = 4
    libvirt.memory = "2048"
    libvirt.driver = "kvm"
    libvirt.nested = true
    libvirt.cpu_mode = "host-model"
    libvirt.random :model => "random"
  end
end

[2] # mkosi -b -d arch --qemu-headless -t gpt_ext4
Comment 1 Frantisek Sumsal 2020-10-26 13:10:30 UTC
Created attachment 293197 [details]
RHEL 8.2 host (AMD) - full log
Comment 2 Frantisek Sumsal 2020-10-26 13:10:47 UTC
Created attachment 293199 [details]
Fedora 32 host (AMD) - full log
Comment 3 Frantisek Sumsal 2020-11-02 16:24:03 UTC
Clarification: the issue seems to appear only on AMD CPUs. I went through several runs and tests in the "AMD[0] rack" suffer from the soft lockup above, but the same workload passes on machines from the "Intel[1] rack"

[0] AMD Opteron 63xx class CPU (family: 0x15, model: 0x2, stepping: 0x0)
[1] Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (family: 0x6, model: 0x3a, stepping: 0x9)
Comment 4 Frantisek Sumsal 2020-11-09 10:59:27 UTC
Results with kernel 5.9.6:

[    4.353614] PCI: Using configuration type 1 for extended access
[    4.361708] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    4.363625] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[   64.373614] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   64.376918] rcu: 	3-...0: (0 ticks this GP) idle=95a/1/0x4000000000000000 softirq=18/18 fqs=6000 last_accelerate: 0000/e77e dyntick_enabled: 0
[   64.376918] 	(detected by 0, t=18002 jiffies, g=-1123, q=62)
[   64.376918] Sending NMI from CPU 0 to CPUs 3:
[  244.390281] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  244.393584] rcu: 	3-...0: (0 ticks this GP) idle=95a/1/0x4000000000000000 softirq=18/18 fqs=24002 last_accelerate: 0000/ba73 dyntick_enabled: 0
[  244.393584] 	(detected by 0, t=72007 jiffies, g=-1123, q=62)
[  244.393584] Sending NMI from CPU 0 to CPUs 3:
[  424.406947] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  424.410251] rcu: 	3-...0: (0 ticks this GP) idle=95a/1/0x4000000000000000 softirq=18/18 fqs=42004 last_accelerate: 0000/8d68 dyntick_enabled: 0
[  424.410251] 	(detected by 0, t=126012 jiffies, g=-1123, q=62)
[  424.410251] Sending NMI from CPU 0 to CPUs 3:
qemu-system-x86_64: terminating on signal 15 from pid 31982 (timeout)
Comment 5 Frantisek Sumsal 2020-11-27 10:21:53 UTC
I noticed there's a MSR access error when trying to online secondary CPUs, which may be relevant:

[    3.969876] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[    3.973256] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    3.976544] Spectre V2 : Mitigation: Full AMD retpoline
[    3.979874] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    3.983210] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    3.986544] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[    3.990704] Freeing SMP alternatives memory: 32K
[    3.997866] smpboot: CPU0: AMD Opteron 63xx class CPU (family: 0x15, model: 0x2, stepping: 0x0)
[    4.001938] Performance Events: Fam15h core perfctr, AMD PMU driver.
[    4.003261] ... version:                0
[    4.006576] ... bit width:              48
[    4.009900] ... generic registers:      6
[    4.013234] ... value mask:             0000ffffffffffff
[    4.016567] ... max period:             00007fffffffffff
[    4.019900] ... fixed-purpose events:   0
[    4.023233] ... event mask:             000000000000003f
[    4.026887] rcu: Hierarchical SRCU implementation.
[    4.030952] smp: Bringing up secondary CPUs ...
[    4.034030] x86: Booting SMP configuration:

[    4.036581] .... node  #0, CPUs:      #1
[    1.328014] kvm-clock: cpu 1, msr 8801041, secondary cpu clock
[    1.328014] smpboot: CPU 1 Converting physical 0 to logical die 1
[    1.328014] unchecked MSR access error: WRMSR to 0x48 (tried to write 0x0000000000000000) at rIP: 0xffffffff9da6c984 (native_write_msr+0x4/0x20)
[    1.328014] Call Trace:
[    1.328014]  x86_spec_ctrl_setup_ap+0x34/0x50
[    1.328014]  identify_secondary_cpu+0x6c/0x80
[    1.328014]  smp_store_cpu_info+0x45/0x50
[    1.328014]  start_secondary+0x58/0x160
[    1.328014]  secondary_startup_64+0xb6/0xc0
[    6.088346] kvm-guest: stealtime: cpu 1, msr 1e66e080
[    6.094247]  #2
[    1.328014] kvm-clock: cpu 2, msr 8801081, secondary cpu clock
[    1.328014] smpboot: CPU 2 Converting physical 0 to logical die 2
[    6.123987] kvm-guest: stealtime: cpu 2, msr 1e6ae080
Comment 6 taz.007 2020-12-01 08:39:26 UTC
I've got a similar stack trace, but with a completely different context. This is a small htpc box, no kvm involved, Intel CPU, 32bits.

déc 01 05:01:20 Aspire kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [sshd:20874]
déc 01 05:01:20 Aspire kernel: Modules linked in: mptcp_diag tcp_diag udp_diag raw_diag inet_diag rpcsec_gss_krb5 md4 cmac nls_utf8 cifs libdes dns_resolver fscache fuse hwmon_vid nouveau ath5k snd_hda_codec_hdmi ath mxm_wmi ttm snd_hda_codec_realtek mac80211 snd_hda_codec_generic drm_kms_helper ledtrig_audio snd_hda_intel cfg80211 snd_intel_dspcfg mousedev cec input_leds snd_hda_codec rc_core snd_hda_core syscopyarea rfkill hid_generic sysfillrect snd_hwdep wmi_bmof libarc4 snd_pcm sysimgblt snd_timer fb_sys_fops coretemp usbhid uas hid pcspkr i2c_algo_bit usb_storage snd nv_tco soundcore forcedeth i2c_nforce2 wmi evdev nfsd auth_rpcgss tcp_bbr nfs_acl lockd grace sunrpc sg drm nfs_ssc agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 ohci_pci ehci_pci ehci_hcd ohci_hcd
déc 01 05:01:20 Aspire kernel: CPU: 2 PID: 20874 Comm: sshd Tainted: G      D           5.9.9-arch1-1 #1
déc 01 05:01:20 Aspire kernel: Hardware name: Acer Aspire R3610/FMCP7A-ION-LE, BIOS P01-A4 11/03/2009
déc 01 05:01:20 Aspire kernel: EIP: queued_spin_lock_slowpath+0x42/0x200
déc 01 05:01:20 Aspire kernel: Code: 8b 01 0f b6 d2 c1 e2 08 30 e4 09 d0 a9 00 01 ff ff 0f 85 21 01 00 00 85 c0 74 15 8b 01 84 c0 74 0f 8d b4 26 00 00 00 00 f3 90 <8b> 01 84 c0 75 f8 b8 01 00 00 00 66 89 01 64 ff 05 c0 1e bd c6 c3
déc 01 05:01:20 Aspire kernel: EAX: 00000101 EBX: df4f1ea4 ECX: c6bf5e68 EDX: 00000000
déc 01 05:01:20 Aspire kernel: ESI: 00000001 EDI: df4f1e58 EBP: df4f1dcc ESP: df4f1dc8
déc 01 05:01:20 Aspire kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00000202
déc 01 05:01:20 Aspire kernel: CR0: 80050033 CR2: 0048a107 CR3: 218a8000 CR4: 000006d0
déc 01 05:01:20 Aspire kernel: Call Trace:
déc 01 05:01:20 Aspire kernel:  ? _raw_spin_lock+0x2c/0x30
déc 01 05:01:20 Aspire kernel:  __change_page_attr_set_clr+0x45/0x740
déc 01 05:01:20 Aspire kernel:  ? _vm_unmap_aliases.part.0+0x114/0x130
déc 01 05:01:20 Aspire kernel:  change_page_attr_set_clr+0xd0/0x2a0
déc 01 05:01:20 Aspire kernel:  set_memory_ro+0x1b/0x20
déc 01 05:01:20 Aspire kernel:  bpf_prog_select_runtime+0x16c/0x1b0
déc 01 05:01:20 Aspire kernel:  bpf_migrate_filter+0xe2/0x130
déc 01 05:01:20 Aspire kernel:  bpf_prog_create_from_user+0x147/0x190
déc 01 05:01:20 Aspire kernel:  ? hardlockup_detector_perf_cleanup+0x70/0x70
déc 01 05:01:20 Aspire kernel:  do_seccomp+0x22d/0x9a0
déc 01 05:01:20 Aspire kernel:  ? security_task_prctl+0x38/0x90
déc 01 05:01:20 Aspire kernel:  prctl_set_seccomp+0x27/0x40
déc 01 05:01:20 Aspire kernel:  __ia32_sys_prctl+0x87/0x4f0
déc 01 05:01:20 Aspire kernel:  __do_fast_syscall_32+0x40/0x70
déc 01 05:01:20 Aspire kernel:  do_fast_syscall_32+0x29/0x60
déc 01 05:01:20 Aspire kernel:  do_SYSENTER_32+0x15/0x20
déc 01 05:01:20 Aspire kernel:  entry_SYSENTER_32+0x9f/0xf2
déc 01 05:01:20 Aspire kernel: EIP: 0xb7fd0549
déc 01 05:01:20 Aspire kernel: Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
déc 01 05:01:20 Aspire kernel: EAX: ffffffda EBX: 00000016 ECX: 00000002 EDX: 004f4d5c
déc 01 05:01:20 Aspire kernel: ESI: 00000000 EDI: 00000001 EBP: b7b6ae1c ESP: bffff6ec
déc 01 05:01:20 Aspire kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000292

kernel 5.9.9-arch1-1

Feel free to report if it's non related and I'll open a new bug report about it.
Comment 7 Frantisek Sumsal 2020-12-23 12:31:05 UTC
So, it looks like the issue was either resolved in kernel 5.9.12+ (currently on 5.9.14) or upgrade of the hypervisors to CentOS 8.3 (4.18.0-240.1.1.el8_3) helped. Unfortunately, I have no way to easily check which one of them is the real fix here.

As for you, Taz, please open a new bug if you still encounter the issue you mentioned, so it won't get forgotten.

Note You need to log in before you can comment on or make changes to this bug.