Bug 198833 - Random system lockup with: kernel: watchdog: BUG: soft lockup - CPU stuck
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-20 00:14 UTC by Adam Puleo
Modified: 2020-03-29 04:26 UTC (History)
CC List: 2 users

See Also:
Kernel Version: 4.15.2-1.el7.elrepo.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Description of problem and messages file (37.04 KB, application/zip)
2018-02-20 00:14 UTC, Adam Puleo
kern.log with soft lockup and RCU stall information (2.73 MB, text/plain)
2020-03-29 04:19 UTC, ilkka.prusi

Description Adam Puleo 2018-02-20 00:14:10 UTC
Created attachment 274255
Description of problem and messages file

I have attached two files:
Description.txt: More detailed output from the system.
messages: The messages file both before and during the lockup, and the boot messages.

[1.] One line summary of the problem:
Random system lockup with: kernel: watchdog: BUG: soft lockup - CPU stuck

[2.] Full description of the problem/report:
Hello,

About every 24 hours my computer locks up. The 4.15.2 kernel wrote the crash logs to the messages file, which I have attached. Since upgrading to the 4.15.3 kernel, the system still locks up, but I no longer receive any messages on the console or in the messages file.

The messages are of the form:
Feb 10 19:03:09 nas kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]
...
Feb 10 19:03:09 nas kernel: Call Trace:
Feb 10 19:03:09 nas kernel: native_flush_tlb_others+0x7d/0x130
Feb 10 19:03:09 nas kernel: flush_tlb_mm_range+0xab/0x110
Feb 10 19:03:09 nas kernel: zap_page_range+0xcd/0x140
Feb 10 19:03:09 nas kernel: SyS_madvise+0x40e/0x8f0
Feb 10 19:03:09 nas kernel: ? __audit_syscall_entry+0xac/0xf0
Feb 10 19:03:09 nas kernel: ? syscall_trace_enter+0x1cd/0x2b0
Feb 10 19:03:09 nas kernel: do_syscall_64+0x74/0x1b0
Feb 10 19:03:09 nas kernel: entry_SYSCALL_64_after_hwframe+0x21/0x86
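
The watchdog lines above follow a regular pattern, so a messages file can be scanned for them to see which CPUs and processes are involved. The following is a minimal sketch, not a diagnostic tool: the function name parse_soft_lockups and the regular expression are illustrative assumptions based only on the single line format quoted in this report.

```python
import re

# Matches the watchdog line format quoted in this report, e.g.
# "... watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]"
# Assumption: other kernel versions may word this message differently.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft lockup message found in an iterable of log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


sample = [
    "Feb 10 19:03:09 nas kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [splunkd:1999]",
    "Feb 10 19:03:09 nas kernel: Call Trace:",
]
for event in parse_soft_lockups(sample):
    print(event)
```

To scan a real log, pass an open file object instead of the sample list. Grouping the results by CPU or by process name may show whether the lockups always hit the same task.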

How can I troubleshoot the problem further?

This is a bare metal system that I'm using to learn Splunk at home.

Thank you.
Comment 1 Darksurf 2018-10-06 22:45:17 UTC
Is there something I can do to help troubleshoot this issue? I'm having the same issue on Ryzen 2500U laptops with the 4.18 and 4.19-rc kernels. The machine doesn't even boot: it hangs, repeating this soft lockup error over and over during the boot attempt. I have to boot with noapic, set from the GRUB command line.
Comment 2 ilkka.prusi 2020-03-29 04:17:11 UTC
I am seeing this problem quite often (currently on 5.5.13). Last night, while the computer was idle, I got a lot of soft lockup and RCU stall dumps in kern.log.
Comment 3 ilkka.prusi 2020-03-29 04:19:45 UTC
Created attachment 288113
kern.log with soft lockup and RCU stall information

Log from last night (computer idle) with plenty of soft lockup and RCU stall information.
Comment 4 ilkka.prusi 2020-03-29 04:23:12 UTC
Bug #207009 (acpi_idle_enter related) might be related to this bug.

Mar 29 01:50:51 Amaranthea kernel: [ 8316.436645] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Mar 29 01:50:51 Amaranthea kernel: [ 8316.436654] rcu:  10-...0: (5 GPs behind) idle=2a6/0/0x1 softirq=89288/89288 fqs=2619 
Mar 29 01:50:51 Amaranthea kernel: [ 8316.436660]       (detected by 7, t=5252 jiffies, g=455173, q=336)
Mar 29 01:50:51 Amaranthea kernel: [ 8316.436663] Sending NMI from CPU 7 to CPUs 10:
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437660] NMI backtrace for cpu 10
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437662] CPU: 10 PID: 0 Comm: swapper/10 Tainted: G            E     5.5.13 #1
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437663] Hardware name: System manufacturer System Product Name/TUF B450-PLUS GAMING, BIOS 2008 12/06/2019
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437663] RIP: 0010:__x86_indirect_thunk_rbp+0x3/0x20
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437665] Code: 00 00 00 0f 1f 40 00 0f ae e8 ff e7 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f ae e8 <ff> e5 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 2e 0f 1f 84 00 00
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437666] RSP: 0018:ffffc900003d4f20 EFLAGS: 00000002
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437667] RAX: 0000000000000000 RBX: ffff8887fea9fe80 RCX: 0000000000000000
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437668] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90000c53a58
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437669] RBP: ffffffff810fa460 R08: 0000000000000000 R09: ffff8887f2c07800
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437669] R10: ffff8887f2c07618 R11: ffff8887feaac9f8 R12: ffff8887fea9fe40
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437670] R13: ffff8887fea9fe40 R14: ffff8887fea9ff38 R15: ffffc90000c53a58
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437671] FS:  0000000000000000(0000) GS:ffff8887fea80000(0000) knlGS:0000000000000000
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437672] CR2: 00007fd7fcb9e180 CR3: 00000007dbc64000 CR4: 00000000003406e0
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437672] Call Trace:
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437672]  <IRQ>
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437673]  ? __hrtimer_run_queues+0xf6/0x2c0
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437673]  ? hrtimer_interrupt+0x10e/0x240
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437674]  ? smp_apic_timer_interrupt+0x6c/0x160
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437674]  ? apic_timer_interrupt+0xf/0x20
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437675]  </IRQ>
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437675]  ? cpuidle_enter_state+0xc9/0x410
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437676]  ? cpuidle_enter_state+0xa4/0x410
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437676]  ? cpuidle_enter+0x29/0x40
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437677]  ? do_idle+0x1d7/0x260
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437677]  ? cpu_startup_entry+0x19/0x20
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437678]  ? start_secondary+0x166/0x1c0
Mar 29 01:50:51 Amaranthea kernel: [ 8316.437678]  ? secondary_startup_64+0xa4/0xb0
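
The stall report above gives the stalled CPU, the detecting CPU, and the stall duration in jiffies (t=5252). Converting jiffies to seconds needs the kernel tick rate (CONFIG_HZ), which is not stated in this report. The sketch below assumes HZ=250 purely for illustration; the function name summarize_stall and both regular expressions are likewise assumptions based on the two log lines quoted here.

```python
import re

# Patterns modeled on the two rcu lines quoted in this report.
STALLED_CPU = re.compile(r"rcu:\s+(?P<cpu>\d+)-\S+: \((?P<gps>\d+) GPs behind\)")
DETECTED_BY = re.compile(r"\(detected by (?P<by>\d+), t=(?P<jiffies>\d+) jiffies")


def summarize_stall(lines, hz=250):
    """Collect stall details from log lines.

    hz is the kernel tick rate (CONFIG_HZ). The default of 250 is an
    assumption; check the running kernel's config for the real value.
    """
    summary = {}
    for line in lines:
        stalled = STALLED_CPU.search(line)
        if stalled:
            summary["stalled_cpu"] = int(stalled.group("cpu"))
            summary["gps_behind"] = int(stalled.group("gps"))
        detected = DETECTED_BY.search(line)
        if detected:
            jiffies = int(detected.group("jiffies"))
            summary["detected_by"] = int(detected.group("by"))
            summary["jiffies"] = jiffies
            summary["seconds"] = jiffies / hz
    return summary


sample = [
    "[ 8316.436645] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:",
    "[ 8316.436654] rcu:  10-...0: (5 GPs behind) idle=2a6/0/0x1 softirq=89288/89288 fqs=2619",
    "[ 8316.436660]       (detected by 7, t=5252 jiffies, g=455173, q=336)",
]
print(summarize_stall(sample))
```

Under the HZ=250 assumption, 5252 jiffies works out to roughly 21 seconds. With a different tick rate the figure changes proportionally, so the hz argument should be set from the actual kernel configuration.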
Comment 5 ilkka.prusi 2020-03-29 04:26:32 UTC
Linux version 5.5.13, gcc version 9.3.0. Output of cat /proc/modules:

snd_seq_dummy 16384 2 - Live 0x0000000000000000 (E)
snd_hrtimer 16384 2 - Live 0x0000000000000000 (E)
snd_seq_midi 20480 0 - Live 0x0000000000000000 (E)
snd_seq_midi_event 16384 1 snd_seq_midi, Live 0x0000000000000000 (E)
snd_seq 86016 11 snd_seq_dummy,snd_seq_midi,snd_seq_midi_event, Live 0x0000000000000000 (E)
nf_tables 176128 0 - Live 0x0000000000000000 (E)
nfnetlink 16384 1 nf_tables, Live 0x0000000000000000 (E)
binfmt_misc 24576 1 - Live 0x0000000000000000 (E)
nls_ascii 16384 1 - Live 0x0000000000000000 (E)
nls_cp437 20480 1 - Live 0x0000000000000000 (E)
vfat 20480 1 - Live 0x0000000000000000 (E)
fat 86016 1 vfat, Live 0x0000000000000000 (E)
snd_hda_codec_realtek 126976 1 - Live 0x0000000000000000 (E)
amdgpu 4648960 13 - Live 0x0000000000000000 (E)
eeepc_wmi 16384 0 - Live 0x0000000000000000 (E)
asus_wmi 36864 1 eeepc_wmi, Live 0x0000000000000000 (E)
edac_mce_amd 32768 0 - Live 0x0000000000000000 (E)
snd_hda_codec_hdmi 73728 1 - Live 0x0000000000000000 (E)
snd_hda_codec_generic 94208 1 snd_hda_codec_realtek, Live 0x0000000000000000 (E)
snd_usb_audio 290816 2 - Live 0x0000000000000000 (E)
ledtrig_audio 16384 2 snd_hda_codec_realtek,snd_hda_codec_generic, Live 0x0000000000000000 (E)
kvm_amd 114688 0 - Live 0x0000000000000000 (E)
snd_usbmidi_lib 36864 1 snd_usb_audio, Live 0x0000000000000000 (E)
snd_hda_intel 57344 3 - Live 0x0000000000000000 (E)
kvm 790528 1 kvm_amd, Live 0x0000000000000000 (E)
battery 20480 1 asus_wmi, Live 0x0000000000000000 (E)
sparse_keymap 16384 1 asus_wmi, Live 0x0000000000000000 (E)
video 53248 1 asus_wmi, Live 0x0000000000000000 (E)
snd_intel_dspcfg 20480 1 snd_hda_intel, Live 0x0000000000000000 (E)
joydev 28672 0 - Live 0x0000000000000000 (E)
irqbypass 16384 1 kvm, Live 0x0000000000000000 (E)
snd_rawmidi 45056 2 snd_seq_midi,snd_usbmidi_lib, Live 0x0000000000000000 (E)
snd_hda_codec 163840 4 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_intel, Live 0x0000000000000000 (E)
rfkill 28672 4 asus_wmi, Live 0x0000000000000000 (E)
crct10dif_pclmul 16384 1 - Live 0x0000000000000000 (E)
snd_seq_device 16384 3 snd_seq_midi,snd_seq,snd_rawmidi, Live 0x0000000000000000 (E)
crc32_pclmul 16384 0 - Live 0x0000000000000000 (E)
snd_hda_core 102400 5 snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_intel,snd_hda_codec, Live 0x0000000000000000 (E)
gpu_sched 36864 1 amdgpu, Live 0x0000000000000000 (E)
ghash_clmulni_intel 16384 0 - Live 0x0000000000000000 (E)
mc 57344 1 snd_usb_audio, Live 0x0000000000000000 (E)
wmi_bmof 16384 0 - Live 0x0000000000000000 (E)
r8169 94208 0 - Live 0x0000000000000000 (E)
snd_hwdep 16384 2 snd_usb_audio,snd_hda_codec, Live 0x0000000000000000 (E)
aesni_intel 368640 0 - Live 0x0000000000000000 (E)
ttm 118784 1 amdgpu, Live 0x0000000000000000 (E)
snd_pcm_oss 65536 0 - Live 0x0000000000000000 (E)
drm_kms_helper 233472 1 amdgpu, Live 0x0000000000000000 (E)
snd_mixer_oss 28672 1 snd_pcm_oss, Live 0x0000000000000000 (E)
crypto_simd 16384 1 aesni_intel, Live 0x0000000000000000 (E)
snd_pcm 131072 6 snd_hda_codec_hdmi,snd_usb_audio,snd_hda_intel,snd_hda_codec,snd_hda_core,snd_pcm_oss, Live 0x0000000000000000 (E)
cryptd 24576 2 ghash_clmulni_intel,crypto_simd, Live 0x0000000000000000 (E)
drm 532480 8 amdgpu,gpu_sched,ttm,drm_kms_helper, Live 0x0000000000000000 (E)
snd_timer 40960 3 snd_hrtimer,snd_seq,snd_pcm, Live 0x0000000000000000 (E)
glue_helper 16384 1 aesni_intel, Live 0x0000000000000000 (E)
agpgart 49152 2 ttm,drm, Live 0x0000000000000000 (E)
snd 102400 27 snd_seq,snd_hda_codec_realtek,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_usb_audio,snd_usbmidi_lib,snd_hda_intel,snd_rawmidi,snd_hda_codec,snd_seq_device,snd_hwdep,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer, Live 0x0000000000000000 (E)
realtek 24576 1 - Live 0x0000000000000000 (E)
sr_mod 28672 0 - Live 0x0000000000000000 (E)
libphy 106496 2 r8169,realtek, Live 0x0000000000000000 (E)
sp5100_tco 20480 0 - Live 0x0000000000000000 (E)
soundcore 16384 1 snd, Live 0x0000000000000000 (E)
efi_pstore 16384 0 - Live 0x0000000000000000 (E)
i2c_piix4 28672 0 - Live 0x0000000000000000 (E)
cdrom 73728 1 sr_mod, Live 0x0000000000000000 (E)
k10temp 16384 0 - Live 0x0000000000000000 (E)
i2c_algo_bit 16384 1 amdgpu, Live 0x0000000000000000 (E)
ccp 98304 7 kvm_amd, Live 0x0000000000000000 (E)
sg 36864 0 - Live 0x0000000000000000 (E)
efivars 20480 1 efi_pstore, Live 0x0000000000000000 (E)
rng_core 16384 1 ccp, Live 0x0000000000000000 (E)
wmi 36864 2 asus_wmi,wmi_bmof, Live 0x0000000000000000 (E)
button 24576 0 - Live 0x0000000000000000 (E)
acpi_cpufreq 28672 0 - Live 0x0000000000000000 (E)
nfsd 479232 13 - Live 0x0000000000000000 (E)
parport_pc 36864 0 - Live 0x0000000000000000 (E)
ppdev 24576 0 - Live 0x0000000000000000 (E)
auth_rpcgss 110592 1 nfsd, Live 0x0000000000000000 (E)
lp 20480 0 - Live 0x0000000000000000 (E)
nfs_acl 16384 1 nfsd, Live 0x0000000000000000 (E)
lockd 122880 1 nfsd, Live 0x0000000000000000 (E)
parport 61440 3 parport_pc,ppdev,lp, Live 0x0000000000000000 (E)
grace 16384 2 nfsd,lockd, Live 0x0000000000000000 (E)
sunrpc 491520 18 nfsd,auth_rpcgss,nfs_acl,lockd, Live 0x0000000000000000 (E)
efivarfs 16384 1 - Live 0x0000000000000000 (E)
ip_tables 32768 0 - Live 0x0000000000000000 (E)
x_tables 53248 1 ip_tables, Live 0x0000000000000000 (E)
autofs4 53248 2 - Live 0x0000000000000000 (E)
raid10 65536 0 - Live 0x0000000000000000 (E)
raid456 172032 0 - Live 0x0000000000000000 (E)
libcrc32c 16384 1 raid456, Live 0x0000000000000000 (E)
async_raid6_recov 24576 1 raid456, Live 0x0000000000000000 (E)
async_memcpy 20480 2 raid456,async_raid6_recov, Live 0x0000000000000000 (E)
async_pq 20480 2 raid456,async_raid6_recov, Live 0x0000000000000000 (E)
async_xor 20480 3 raid456,async_raid6_recov,async_pq, Live 0x0000000000000000 (E)
xor 24576 1 async_xor, Live 0x0000000000000000 (E)
async_tx 20480 5 raid456,async_raid6_recov,async_memcpy,async_pq,async_xor, Live 0x0000000000000000 (E)
raid6_pq 122880 3 raid456,async_raid6_recov,async_pq, Live 0x0000000000000000 (E)
raid1 49152 0 - Live 0x0000000000000000 (E)
raid0 24576 0 - Live 0x0000000000000000 (E)
multipath 20480 0 - Live 0x0000000000000000 (E)
linear 20480 0 - Live 0x0000000000000000 (E)
md_mod 180224 6 raid10,raid456,raid1,raid0,multipath,linear, Live 0x0000000000000000 (E)
sd_mod 57344 9 - Live 0x0000000000000000 (E)
input_leds 16384 0 - Live 0x0000000000000000 (E)
evdev 28672 16 - Live 0x0000000000000000 (E)
hid_steam 20480 0 - Live 0x0000000000000000 (E)
hid_generic 16384 0 - Live 0x0000000000000000 (E)
usbhid 65536 0 - Live 0x0000000000000000 (E)
hid 147456 3 hid_steam,hid_generic,usbhid, Live 0x0000000000000000 (E)
ahci 40960 6 - Live 0x0000000000000000 (E)
xhci_pci 20480 0 - Live 0x0000000000000000 (E)
libahci 45056 1 ahci, Live 0x0000000000000000 (E)
crc32c_intel 24576 9 - Live 0x0000000000000000 (E)
xhci_hcd 282624 1 xhci_pci, Live 0x0000000000000000 (E)
libata 286720 2 ahci,libahci, Live 0x0000000000000000 (E)
usbcore 315392 5 snd_usb_audio,snd_usbmidi_lib,usbhid,xhci_pci,xhci_hcd, Live 0x0000000000000000 (E)
scsi_mod 245760 4 sr_mod,sg,sd_mod,libata, Live 0x0000000000000000 (E)
gpio_amdpt 20480 0 - Live 0x0000000000000000 (E)
gpio_generic 16384 1 gpio_amdpt, Live 0x0000000000000000 (E)
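
Each line of the listing above has the same whitespace-separated fields: module name, size in bytes, reference count, the modules that use it (or "-"), load state, address, and an optional taint flag. A minimal sketch of reading that format follows; the function name parse_modules and the dictionary layout are illustrative assumptions, and the sample is three lines copied from the listing.

```python
def parse_modules(text):
    """Parse /proc/modules-style text into a dict keyed by module name."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        name, size, refcount, used_by, state, _address = fields[:6]
        modules[name] = {
            "size": int(size),
            # Assumption: fall back to None if the count is not numeric.
            "refcount": int(refcount) if refcount.isdigit() else None,
            # The list ends with a trailing comma, so drop empty entries.
            "used_by": [] if used_by == "-" else [m for m in used_by.split(",") if m],
            "state": state,
            "taint": fields[6] if len(fields) > 6 else "",
        }
    return modules


sample = """\
snd_seq_midi_event 16384 1 snd_seq_midi, Live 0x0000000000000000 (E)
amdgpu 4648960 13 - Live 0x0000000000000000 (E)
gpu_sched 36864 1 amdgpu, Live 0x0000000000000000 (E)
"""
modules = parse_modules(sample)
largest = max(modules, key=lambda name: modules[name]["size"])
print(largest, modules[largest]["size"])
```

Sorting by size or walking the used_by lists can help narrow down which drivers are worth unloading or blacklisting while testing whether the lockups persist.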
