Bug 215821 - kernel BUG at drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x34d/0x420 when resuming from suspend to RAM
Status: RESOLVED MOVED
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_iommu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-04-08 15:55 UTC by Lahfa Samy
Modified: 2022-06-03 13:34 UTC

See Also:
Kernel Version: 5.17.1-arch1-1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg output with initcall_debug, no_console_suspend, ignore_loglevel (131.17 KB, text/plain)
2022-04-08 15:55 UTC, Lahfa Samy

Description Lahfa Samy 2022-04-08 15:55:32 UTC
Created attachment 300727: dmesg output with initcall_debug, no_console_suspend, ignore_loglevel

This bug appears to have started recently, on kernel 5.17.x. I still need to downgrade to confirm this, but I am fairly confident the issue was not present before a recent kernel upgrade.

So far, testing under the LTS kernel 5.15.32-arch1-1 shows that the issue is not present there.

The hardware is a ThinkPad T495 with an AMD Ryzen 7 PRO 3700U and its integrated Radeon RX Vega 10 GPU.

Currently installed linux-firmware: 20220209.6342082-1

If an X11 server is running, resume fails: the screen never comes back on. Oddly, resuming from a TTY does work; I am not sure whether that detail is relevant.

The attached dmesg was captured with the kernel command-line options initcall_debug, no_console_suspend, and ignore_loglevel, to get more verbose output.

Here are the relevant logs:
[   82.540316] ACPI: PM: Preparing to enter system sleep state S3
[   82.547782] ACPI: EC: event blocked
[   82.547784] ACPI: EC: EC stopped
[   82.547785] ACPI: PM: Saving platform NVS memory
[   82.548228] Disabling non-boot CPUs ...
[   82.550506] smpboot: CPU 1 is now offline
[   82.553132] smpboot: CPU 2 is now offline
[   82.555485] smpboot: CPU 3 is now offline
[   82.557593] smpboot: CPU 4 is now offline
[   82.559873] smpboot: CPU 5 is now offline
[   82.561829] smpboot: CPU 6 is now offline
[   82.563933] smpboot: CPU 7 is now offline
[   82.565077] ACPI: PM: Low-level resume complete
[   82.565107] ACPI: EC: EC started
[   82.565108] ACPI: PM: Restoring platform NVS memory
[   83.718277] ------------[ cut here ]------------
[   83.718278] WARNING: CPU: 0 PID: 2572 at drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x34d/0x420
[   83.718290] Modules linked in: ccm cmac algif_hash algif_skcipher af_alg bnep lm92 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc btusb btrtl btbcm btintel btmtk bluetooth intel_rapl_msr ecdh_generic joydev mousedev intel_rapl_common crc16 edac_mce_amd snd_sof_amd_renoir snd_acp_config kvm_amd iwlmvm snd_sof_amd_acp kvm snd_sof_pci irqbypass snd_sof mac80211 snd_ctl_led snd_soc_acpi crct10dif_pclmul snd_hda_codec_realtek think_lmi crc32_pclmul libarc4 snd_hda_codec_hdmi snd_hda_codec_generic firmware_attributes_class crc32c_intel snd_soc_core ghash_clmulni_intel snd_hda_intel aesni_intel wmi_bmof snd_compress snd_intel_dspcfg iwlwifi snd_intel_sdw_acpi crypto_simd ac97_bus vfat snd_hda_codec snd_pcm_dmaengine cryptd iwlmei fat rapl snd_hda_core snd_pci_acp6x thinkpad_acpi snd_pci_acp5x snd_hwdep tpm_crb ledtrig_audio snd_pcm cfg80211 psmouse sp5100_tco platform_profile snd_rn_pci_acp3x ucsi_acpi zenpower(OE) snd_timer tpm_tis rfkill i2c_piix4
[   83.718366]  typec_ucsi snd ipmi_devintf typec snd_pci_acp3x tpm_tis_core ccp mei ipmi_msghandler r8168(OE) soundcore roles wmi tpm video rng_core i2c_scmi pinctrl_amd mac_hid acpi_cpufreq sg crypto_user acpi_call(OE) fuse bpf_preload ip_tables x_tables usbhid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) serio_raw atkbd libps2 sdhci_pci cqhci sdhci xhci_pci xhci_pci_renesas mmc_core i8042 serio radeon amdgpu gpu_sched drm_ttm_helper ttm
[   83.718413] CPU: 0 PID: 2572 Comm: systemd-sleep Tainted: P           OE     5.17.1-arch1-1 #1 0ea933cb6bfe82a8dc16ab834a4bccdd297f98b7
[   83.718418] Hardware name: LENOVO 20NKS28F00/20NKS28F00, BIOS R12ET55W(1.25 ) 07/06/2020
[   83.718421] RIP: 0010:amd_iommu_enable_interrupts+0x34d/0x420
[   83.718427] Code: ff ff 49 8b 7f 18 89 04 24 e8 9f 36 ee ff 8b 04 24 e9 4b fd ff ff 0f 0b 4d 8b 3f 49 81 ff 50 09 56 99 0f 85 05 fd ff ff eb 96 <0f> 0b 4d 8b 3f 49 81 ff 50 09 56 99 0f 85 f1 fc ff ff eb 82 31 f6
[   83.718429] RSP: 0018:ffffa787405cbc68 EFLAGS: 00010046
[   83.718432] RAX: 00000001262cdc89 RBX: 0000000000000000 RCX: 0000000000000000
[   83.718434] RDX: 000000000000607e RSI: 00000000000059ae RDI: 00000001262c7c0b
[   83.718436] RBP: 0000000080000000 R08: 0000000000000000 R09: 000000000000000f
[   83.718437] R10: 0000000079726f6d R11: 000000006d656d20 R12: 000ffffffffffff8
[   83.718439] R13: 0800000000000000 R14: ffffa787405cbc70 R15: ffff95d48004a800
[   83.718441] FS:  00007fb3d354fe80(0000) GS:ffff95d76fa00000(0000) knlGS:0000000000000000
[   83.718443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.718445] CR2: 00007f42204d6ad0 CR3: 000000012dbe8000 CR4: 00000000003506f0
[   83.718447] Call Trace:
[   83.718450]  <TASK>
[   83.718455]  ? early_enable_iommus+0x1c5/0x300
[   83.718460]  ? enable_iommus_v2+0x8e/0x130
[   83.718464]  syscore_resume+0x4b/0x160
[   83.718469]  suspend_devices_and_enter+0x6d3/0x7d0
[   83.718476]  pm_suspend.cold+0x2fb/0x342
[   83.718482]  state_store+0x71/0xd0
[   83.718487]  kernfs_fop_write_iter+0x11c/0x1b0
[   83.718493]  new_sync_write+0x15c/0x1f0
[   83.718500]  vfs_write+0x1eb/0x280
[   83.718503]  ksys_write+0x67/0xe0
[   83.718506]  do_syscall_64+0x5c/0x80
[   83.718511]  ? do_syscall_64+0x69/0x80
[   83.718513]  ? exc_page_fault+0x72/0x170
[   83.718517]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   83.718522] RIP: 0033:0x7fb3d3f44257
[   83.718526] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[   83.718528] RSP: 002b:00007ffeda5645a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   83.718531] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fb3d3f44257
[   83.718532] RDX: 0000000000000004 RSI: 00007ffeda564690 RDI: 0000000000000004
[   83.718534] RBP: 00007ffeda564690 R08: 000055ba9c2d1230 R09: 0000000000000000
[   83.718535] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[   83.718536] R13: 000055ba9c2cd3c0 R14: 0000000000000004 R15: 00007fb3d403d7a0
[   83.718540]  </TASK>
[   83.718541] ---[ end trace 0000000000000000 ]---
[   83.719139] Enabling non-boot CPUs ...
[   83.719211] x86: Booting SMP configuration:
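
The excerpt above can be pulled out of the full attached dmesg with a short script. This is only a minimal sketch in Python, assuming the log has been saved as plain text (the file name `dmesg.txt` in the usage note is hypothetical) and that each trace is delimited by the usual "cut here" and "end trace" markers. The helper names `extract_traces` and `call_trace_symbols` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Kernel log lines start with a "[   83.718277] " style timestamp.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def extract_traces(log_text):
    """Return every block between a 'cut here' and an 'end trace' marker.

    Each trace is a list of (timestamp, message) tuples, markers included.
    """
    traces = []
    current = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(match.group(1)), match.group(2)
        if "[ cut here ]" in msg:
            current = []
        if current is not None:
            current.append((ts, msg))
            if "[ end trace" in msg:
                traces.append(current)
                current = None
    return traces


def call_trace_symbols(trace):
    """List (symbol, unreliable) pairs found between <TASK> and </TASK>.

    Entries prefixed with '? ' are addresses found on the stack that the
    unwinder could not confirm, so they are flagged as unreliable.
    """
    symbols = []
    in_task = False
    for _, msg in trace:
        text = msg.strip()
        if text == "<TASK>":
            in_task = True
        elif text == "</TASK>":
            in_task = False
        elif in_task and "+0x" in text:
            unreliable = text.startswith("? ")
            name = text[2:] if unreliable else text
            symbols.append((name.split("+", 1)[0], unreliable))
    return symbols
```

For example, `extract_traces(open("dmesg.txt").read())` would return one list per warning or oops in the log, and `call_trace_symbols()` on any of them gives the function names in the call trace, which makes it easier to compare this trace with the one from a later kernel.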

A bug report was also filed downstream at bugs.archlinux.org.

If you need any more information, feel free to ask in the comments.
Comment 1 Lahfa Samy 2022-06-03 13:34:34 UTC
Follow-up: this bug recently went away on the latest kernel, 5.17.9-arch1-1, but that kernel now triggers another bug, for which I will file a separate report and link it to this one.
The new bug is still in the AMD IOMMU code, inside the same function as in this report, but the oops trace looks a bit different this time.
