Bug 214413

Summary: Kernel oops on boot for amdgpu (in si_dpm_set_power_state)
Product: Drivers
Reporter: Marco Piazza (mpiazza)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.14.4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Dmesg log with kernel oops
             Patch to revert ATPX/ATCS global structures

Description Marco Piazza 2021-09-15 14:46:32 UTC
Created attachment 298817 [details]
Dmesg log with kernel oops

Booting a freshly self-compiled 5.14.4 kernel causes the following kernel oops.

[   11.451662] RIP: 0010:si_dpm_set_power_state+0xde3/0x1250 [amdgpu]
[   11.452272] Code: 0f 84 8e f5 ff ff c7 44 24 30 ea ff ff ff 48 c7 c7 38 a5 d6 a0 e8 9d 44 ae ff e9 75 f5 ff ff 45 31 c0 49 8b b4 24 10 0e 00 00 <0f> b7 0e 66 85 c9 0f 84 eb 03 00 00 83 e9 01 48 8d 46 14 48 8d 0c
[   11.452468] RSP: 0018:ffff888106c0baa8 EFLAGS: 00010246
[   11.452530] RAX: ffff8881116a9a00 RBX: 000000000000ffff RCX: 0000000000000000
[   11.452608] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881116c0000
[   11.452686] RBP: ffff8881116c0000 R08: 0000000000000000 R09: ffff8881116a8e58
[   11.452764] R10: ffff8881116c8400 R11: ffff8881116a99d8 R12: ffff8881116a8000
[   11.452841] R13: ffff8881116a8000 R14: 0000000000000000 R15: 0000000000000005
[   11.452919] FS:  00007face07fa040(0000) GS:ffff8881a9400000(0000) knlGS:0000000000000000
[   11.453008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.453073] CR2: 0000000000000000 CR3: 0000000104e7a000 CR4: 00000000000406f0
[   11.453151] Call Trace:
[   11.453190]  ? _raw_spin_unlock_irqrestore+0x15/0x30
[   11.453256]  ? si_dpm_pre_set_power_state+0x506/0xa50 [amdgpu]
[   11.453818]  amdgpu_pm_compute_clocks.part.0+0x31c/0x5c0 [amdgpu]
[   11.454382]  si_dpm_hw_init+0x72/0x80 [amdgpu]
[   11.454928]  amdgpu_device_init.cold+0xd5a/0x1761 [amdgpu]
[   11.455479]  ? pci_conf1_read+0x9f/0xf0
[   11.455533]  ? pci_bus_read_config_word+0x44/0x70
[   11.455595]  amdgpu_driver_load_kms+0x63/0x2e0 [amdgpu]
[   11.456078]  amdgpu_pci_probe+0xf6/0x180 [amdgpu]
[   11.456554]  local_pci_probe+0x3d/0x70
[   11.456603]  ? pci_match_device+0xd2/0x100
[   11.456655]  pci_device_probe+0xf5/0x1b0
[   11.456706]  really_probe.part.0+0xb3/0x2a0
[   11.456761]  __driver_probe_device+0x8b/0x120
[   11.456816]  driver_probe_device+0x19/0xd0
[   11.456868]  __driver_attach+0xa6/0x170
[   11.456916]  ? __device_attach_driver+0xe0/0xe0
[   11.456972]  bus_for_each_dev+0x73/0xb0
[   11.457022]  bus_add_driver+0x106/0x1b0
[   11.457072]  driver_register+0x86/0xd0
[   11.457120]  ? 0xffffffffa0f33000
[   11.459179]  do_one_initcall+0x48/0x200
[   11.461264]  ? kmem_cache_alloc_trace+0x2c9/0x430
[   11.463394]  do_init_module+0x56/0x240
[   11.465492]  __do_sys_finit_module+0xa0/0xe0
[   11.467586]  do_syscall_64+0x43/0x90
[   11.469658]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   11.471773] RIP: 0033:0x7face0986f49
[   11.473851] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 3f 0c 00 f7 d8 64 89 01 48
[   11.478308] RSP: 002b:00007ffe5da504d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   11.480601] RAX: ffffffffffffffda RBX: 00005599b42ece30 RCX: 00007face0986f49
[   11.482914] RDX: 0000000000000000 RSI: 00005599b42eedb0 RDI: 0000000000000014
[   11.485212] RBP: 00005599b42eedb0 R08: 0000000000000000 R09: 0000000000000000
[   11.487483] R10: 0000000000000014 R11: 0000000000000246 R12: 0000000000000000
[   11.489715] R13: 00005599b42f2a10 R14: 0000000000020000 R15: 0000000000000000
[   11.491855] Modules linked in: ext4 crc32c_generic mbcache jbd2 ath3k btusb btrtl btbcm btintel bluetooth jitterentropy_rng uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 sha512_generic videobuf2_common hmac videodev mc drbg ecdh_generic ecc crc16 toshiba_wmi wmi_bmof sparse_keymap amd_freq_sensitivity ath9k ath9k_common ath9k_hw kvm_amd mac80211 ath kvm irqbypass sha256_generic snd_hda_codec_idt snd_hda_codec_generic amdgpu(+) ghash_clmulni_intel ledtrig_audio snd_hda_codec_hdmi deflate cryptd joydev snd_hda_intel evdev mfd_core snd_intel_dspcfg cfg80211 gpu_sched serio_raw efi_pstore toshiba_bluetooth fam15h_power sg snd_hda_codec snd_hda_core snd_hwdep snd_pcm wmi rfkill libarc4 i2c_algo_bit snd_timer snd drm_ttm_helper ttm soundcore drm_kms_helper ac battery video acpi_cpufreq button sch_cake loop msr parport_pc ppdev lp parport drm fuse configfs sunrpc efivarfs ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
[   11.492046]  async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear md_mod hid_generic usbhid hid sd_mod t10_pi uas usb_storage ohci_pci crc32c_intel xhci_pci psmouse i2c_piix4 ahci ohci_hcd libahci ehci_pci xhci_hcd ehci_hcd i2c_core libata usbcore scsi_mod alx usb_common mdio fan thermal
[   11.512445] CR2: 0000000000000000
[   11.515877] ---[ end trace 9d0f57da9351a59c ]---
[   11.524001] ------------[ cut here ]------------
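
For triage, the key fields of the trace above can be pulled out mechanically. The following is only a minimal sketch, not part of the kernel tooling: the helper name summarize_oops and the choice of which lines to sample are assumptions made for illustration. It extracts the faulting symbol, offset, and module from the RIP line and collects the register values. In this oops both RSI and CR2 are zero, and the marked bytes <0f> b7 0e decode to a 16-bit load through RSI, which is consistent with a NULL pointer dereference.

```python
import re

# Three lines copied verbatim from the oops in this report.
OOPS = """\
[   11.451662] RIP: 0010:si_dpm_set_power_state+0xde3/0x1250 [amdgpu]
[   11.452608] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881116c0000
[   11.453073] CR2: 0000000000000000 CR3: 0000000104e7a000 CR4: 00000000000406f0
"""


def summarize_oops(text):
    """Hypothetical helper: return symbol, offset, module and registers."""
    # RIP line format: "RIP: <segment>:<symbol>+<offset>/<size> [<module>]"
    rip = re.search(
        r"RIP: [0-9a-f]+:(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+ \[(\w+)\]", text
    )
    # Register dumps are "<NAME>: <16 hex digits>"; the RIP line is skipped
    # because its value is not a bare 16-digit hex number.
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(R[A-Z0-9]{2}|CR\d): ([0-9a-f]{16})", text)
    }
    return {
        "symbol": rip.group(1),
        "offset": int(rip.group(2), 16),
        "module": rip.group(3),
        "regs": regs,
        # CR2 holds the faulting address; zero suggests a NULL dereference.
        "null_fault": regs.get("CR2") == 0,
    }


if __name__ == "__main__":
    print(summarize_oops(OOPS))
```

The sketch only reads the text of the log; resolving the offset to a source line would additionally need the matching amdgpu module built with debug info.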



This is my hardware:
marco@albireo:~$ lspci 
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Root Complex
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kabini [Radeon HD 8330]
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 0
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 01)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 39)
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 39)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 3a)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller (rev 02)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Function 5
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun PRO [Radeon HD 8570A/8570M]
02:00.0 Network controller: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter (rev 01)
03:00.0 Ethernet controller: Qualcomm Atheros QCA8172 Fast Ethernet (rev 10)


The full dmesg log is attached.

Marco
Comment 1 Artem S. Tashkinov 2021-09-19 12:04:08 UTC
Is this a regression?

Could you please perform a git bisect?
Comment 2 Marco Piazza 2021-09-20 07:30:59 UTC
It is a regression, but it causes no apparent problems.
In fact, the laptop starts and works as usual.

I've found a similar bug described here:
https://gitlab.freedesktop.org/drm/amd/-/issues/1698
Comment 3 Marco Piazza 2021-09-20 09:55:01 UTC
Created attachment 298885 [details]
Patch to revert ATPX/ATCS global structures
Comment 4 Marco Piazza 2021-09-20 09:56:28 UTC
I confirm that applying the above patch makes the oops disappear.
Comment 5 Alex Deucher 2021-09-20 13:52:10 UTC
Can you provide the info requested in:
https://gitlab.freedesktop.org/drm/amd/-/issues/1698#note_1066944
Comment 6 Marco Piazza 2021-10-01 06:40:58 UTC
A patch fixing the bug has been included in the 5.14.9 release.

I downloaded the new version and compiled it, and it booted fine.