Created attachment 300727
dmesg output with initcall_debug, no_console_suspend, ignore_loglevel

I believe this bug started recently on kernel 5.17.x. I still need to downgrade to confirm this, but I'm fairly confident the issue was not present before my recent kernel upgrade. So far, testing under the Linux LTS kernel 5.15.32-arch1-1 shows that the issue is not present there.

The hardware is a ThinkPad T495 with an AMD Ryzen 7 PRO 3700U and a Radeon Vega RX10.
Current linux-firmware installed: 20220209.6342082-1

If an X11 server is running, resume fails: the screen never comes back on. Oddly, resume does work from a TTY; I'm not sure whether that detail is relevant.

The attached dmesg was captured with the cmdline options initcall_debug, no_console_suspend and ignore_loglevel to get as much output as possible.

Here are some relevant logs:

[ 82.540316] ACPI: PM: Preparing to enter system sleep state S3
[ 82.547782] ACPI: EC: event blocked
[ 82.547784] ACPI: EC: EC stopped
[ 82.547785] ACPI: PM: Saving platform NVS memory
[ 82.548228] Disabling non-boot CPUs ...
[ 82.550506] smpboot: CPU 1 is now offline
[ 82.553132] smpboot: CPU 2 is now offline
[ 82.555485] smpboot: CPU 3 is now offline
[ 82.557593] smpboot: CPU 4 is now offline
[ 82.559873] smpboot: CPU 5 is now offline
[ 82.561829] smpboot: CPU 6 is now offline
[ 82.563933] smpboot: CPU 7 is now offline
[ 82.565077] ACPI: PM: Low-level resume complete
[ 82.565107] ACPI: EC: EC started
[ 82.565108] ACPI: PM: Restoring platform NVS memory
[ 83.718277] ------------[ cut here ]------------
[ 83.718278] WARNING: CPU: 0 PID: 2572 at drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x34d/0x420
[ 83.718290] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg bnep lm92 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc btusb btrtl btbcm btintel btmtk bluetooth intel_rapl_msr ecdh_generic joydev mousedev intel_rapl_common crc16 edac_mce_amd snd_sof_amd_renoir snd_acp_config kvm_amd iwlmvm snd_sof_amd_acp kvm snd_sof_pci irqbypass snd_sof mac80211 snd_ctl_led snd_soc_acpi crct10dif_pclmul snd_hda_codec_realtek think_lmi crc32_pclmul libarc4 snd_hda_codec_hdmi snd_hda_codec_generic firmware_attributes_class crc32c_intel snd_soc_core ghash_clmulni_intel snd_hda_intel aesni_intel wmi_bmof snd_compress snd_intel_dspcfg iwlwifi snd_intel_sdw_acpi crypto_simd ac97_bus vfat snd_hda_codec snd_pcm_dmaengine cryptd iwlmei fat rapl snd_hda_core snd_pci_acp6x thinkpad_acpi snd_pci_acp5x snd_hwdep tpm_crb ledtrig_audio snd_pcm cfg80211 psmouse sp5100_tco platform_profile snd_rn_pci_acp3x ucsi_acpi zenpower(OE) snd_timer tpm_tis rfkill i2c_piix4
[ 83.718366] typec_ucsi snd ipmi_devintf typec snd_pci_acp3x tpm_tis_core ccp mei ipmi_msghandler r8168(OE) soundcore roles wmi tpm video rng_core i2c_scmi pinctrl_amd mac_hid acpi_cpufreq sg crypto_user acpi_call(OE) fuse bpf_preload ip_tables x_tables usbhid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) serio_raw atkbd libps2 sdhci_pci cqhci sdhci xhci_pci xhci_pci_renesas mmc_core i8042 serio radeon amdgpu gpu_sched drm_ttm_helper ttm
[ 83.718413] CPU: 0 PID: 2572 Comm: systemd-sleep Tainted: P OE 5.17.1-arch1-1 #1 0ea933cb6bfe82a8dc16ab834a4bccdd297f98b7
[ 83.718418] Hardware name: LENOVO 20NKS28F00/20NKS28F00, BIOS R12ET55W(1.25 ) 07/06/2020
[ 83.718421] RIP: 0010:amd_iommu_enable_interrupts+0x34d/0x420
[ 83.718427] Code: ff ff 49 8b 7f 18 89 04 24 e8 9f 36 ee ff 8b 04 24 e9 4b fd ff ff 0f 0b 4d 8b 3f 49 81 ff 50 09 56 99 0f 85 05 fd ff ff eb 96 <0f> 0b 4d 8b 3f 49 81 ff 50 09 56 99 0f 85 f1 fc ff ff eb 82 31 f6
[ 83.718429] RSP: 0018:ffffa787405cbc68 EFLAGS: 00010046
[ 83.718432] RAX: 00000001262cdc89 RBX: 0000000000000000 RCX: 0000000000000000
[ 83.718434] RDX: 000000000000607e RSI: 00000000000059ae RDI: 00000001262c7c0b
[ 83.718436] RBP: 0000000080000000 R08: 0000000000000000 R09: 000000000000000f
[ 83.718437] R10: 0000000079726f6d R11: 000000006d656d20 R12: 000ffffffffffff8
[ 83.718439] R13: 0800000000000000 R14: ffffa787405cbc70 R15: ffff95d48004a800
[ 83.718441] FS: 00007fb3d354fe80(0000) GS:ffff95d76fa00000(0000) knlGS:0000000000000000
[ 83.718443] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 83.718445] CR2: 00007f42204d6ad0 CR3: 000000012dbe8000 CR4: 00000000003506f0
[ 83.718447] Call Trace:
[ 83.718450] <TASK>
[ 83.718455] ? early_enable_iommus+0x1c5/0x300
[ 83.718460] ? enable_iommus_v2+0x8e/0x130
[ 83.718464] syscore_resume+0x4b/0x160
[ 83.718469] suspend_devices_and_enter+0x6d3/0x7d0
[ 83.718476] pm_suspend.cold+0x2fb/0x342
[ 83.718482] state_store+0x71/0xd0
[ 83.718487] kernfs_fop_write_iter+0x11c/0x1b0
[ 83.718493] new_sync_write+0x15c/0x1f0
[ 83.718500] vfs_write+0x1eb/0x280
[ 83.718503] ksys_write+0x67/0xe0
[ 83.718506] do_syscall_64+0x5c/0x80
[ 83.718511] ? do_syscall_64+0x69/0x80
[ 83.718513] ? exc_page_fault+0x72/0x170
[ 83.718517] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 83.718522] RIP: 0033:0x7fb3d3f44257
[ 83.718526] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 83.718528] RSP: 002b:00007ffeda5645a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 83.718531] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fb3d3f44257
[ 83.718532] RDX: 0000000000000004 RSI: 00007ffeda564690 RDI: 0000000000000004
[ 83.718534] RBP: 00007ffeda564690 R08: 000055ba9c2d1230 R09: 0000000000000000
[ 83.718535] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 83.718536] R13: 000055ba9c2cd3c0 R14: 0000000000000004 R15: 00007fb3d403d7a0
[ 83.718540] </TASK>
[ 83.718541] ---[ end trace 0000000000000000 ]---
[ 83.719139] Enabling non-boot CPUs ...
[ 83.719211] x86: Booting SMP configuration:

A bug report was also filed downstream at bugs.archlinux.org.

For any more information, feel free to reach out to me in the comments.
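In case it helps anyone going through the full attachment, here is a rough Python sketch for pulling the WARNING header (timestamp, CPU, PID, source location, symbol) out of a dmesg dump. The function name find_warnings and both regular expressions are my own illustration, not part of any kernel tooling, and the sketch assumes one log entry per line as dmesg normally prints it.

```python
import re

# Matches a dmesg timestamp prefix such as "[   83.718278] ".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s*")
# Matches the header of a kernel WARNING splat (hypothetical helper pattern).
WARN = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at (\S+):(\d+) (\S+)")


def find_warnings(log):
    """Return one dict per timestamped WARNING line in a dmesg excerpt."""
    results = []
    for line in log.splitlines():
        stamp = STAMP.match(line)
        warn = WARN.search(line)
        if stamp and warn:
            results.append({
                "time": float(stamp.group(1)),
                "cpu": int(warn.group(1)),
                "pid": int(warn.group(2)),
                "file": warn.group(3),
                "line": int(warn.group(4)),
                "symbol": warn.group(5),
            })
    return results


# Three lines copied from the log in this report.
sample = """\
[ 82.565108] ACPI: PM: Restoring platform NVS memory
[ 83.718277] ------------[ cut here ]------------
[ 83.718278] WARNING: CPU: 0 PID: 2572 at drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x34d/0x420
"""

warnings = find_warnings(sample)
for w in warnings:
    print(w["time"], w["file"], w["line"], w["symbol"])
```

To run it on a saved log instead of the sample, read the file into a string and pass that to find_warnings.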
Follow-up on this bug: it went away recently on the latest kernel (5.17.9-arch1-1 here), which now triggers a different bug. I will file a separate bug report for that one and link it to this one. The new bug is still in the AMD IOMMU code, inside the same function as in this report, but the oops trace looks a bit different this time.