Since linux-next-20230418 my MSI Alpha15 laptop (running an up to date debian sid) hangs on boot:

Hardware:
lspci:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD (rev 03)
07:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 500c (rev 01)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub

I bisected this between v6.3-rc7 and next-20230418 and got

commit 4bd763568dbdafdf7cd6b3fcc73f84f1a6f305d1 (HEAD)
Author: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Date:   Tue Apr 11 21:49:01 2023 +0530

    HID: amd_sfh: Support for additional light sensor

    There is support for additional light sensors in the SFH firmware.
    As a result, add support for additional light sensors.

    Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>

As the first faulty commit. I tried to revert this in next-20230418 but this did not solve the problem so I introduced some printk to aid in debugging:

diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index d9b7b01900b5..ed78d99e1f8a 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -86,6 +86,8 @@ void amd_sfh_work(struct work_struct *work)
 	mp2 = container_of(in_data, struct amd_mp2_dev, in_data);
 	mp2_ops = mp2->mp2_ops;
 	if (node_type == HID_FEATURE_REPORT) {
+		printk(KERN_INFO "%s %d: cli_data->feature_report[%u] = %px\n",
+				__func__, __LINE__, current_index, cli_data->feature_report[current_index]);
 		report_size = mp2_ops->get_feat_rep(sensor_index, report_id,
 						    cli_data->feature_report[current_index]);
 		if (report_size)
@@ -263,6 +265,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
 			goto cleanup;
 		}
 		cl_data->feature_report[i] = devm_kzalloc(dev, feature_report_size, GFP_KERNEL);
+		printk(KERN_INFO "%s %d: feature_report[%d] = %px\n", __func__, __LINE__, i, cl_data->feature_report[i]);
 		if (!cl_data->feature_report[i]) {
 			rc = -ENOMEM;
 			goto cleanup;
diff --git a/drivers/hid/amd-sfh-hid/hid_descriptor/amd_sfh_hid_desc.c b/drivers/hid/amd-sfh-hid/hid_descriptor/amd_sfh_hid_desc.c
index 8716a05950c8..6be5e0d9c9ad 100644
--- a/drivers/hid/amd-sfh-hid/hid_descriptor/amd_sfh_hid_desc.c
+++ b/drivers/hid/amd-sfh-hid/hid_descriptor/amd_sfh_hid_desc.c
@@ -143,6 +143,7 @@ static u8 get_feature_report(int sensor_idx, int report_id, u8 *feature_report)
 	struct hpd_feature_report hpd_feature;
 	struct als_feature_report als_feature;
 	u8 report_size = 0;
+	printk(KERN_INFO "%s %d: feature_report = %px\n",__func__, __LINE__, feature_report);
 
 	if (!feature_report)
 		return report_size;

This leads to the following messages in /var/log/kern.log:
[...]
Here the feature_report pointers get initialized correctly:
2023-04-19T14:07:58.307091+02:00 lisa kernel: [ 0.558881][ T233] amd_sfh_hid_client_init 268: feature_report[0] = ffff9dc040bea9e8
2023-04-19T14:07:58.307092+02:00 lisa kernel: [ 0.558964][ T233] amd_sfh_hid_client_init 268: feature_report[1] = ffff9dc040bea7e8
2023-04-19T14:07:58.307093+02:00 lisa kernel: [ 0.559040][ T233] amd_sfh_hid_client_init 268: feature_report[2] = ffff9dc040beade8
2023-04-19T14:07:58.307094+02:00 lisa kernel: [ 0.559115][ T233] amd_sfh_hid_client_init 268: feature_report[3] = ffff9dc040beaf28
2023-04-19T14:07:58.307096+02:00 lisa kernel: [ 0.559178][ T233] amd_sfh_hid_client_init 268: feature_report[4] = ffff9dc040bea9a8
2023-04-19T14:07:58.307098+02:00 lisa kernel: [ 0.559236][ T233] amd_sfh_hid_client_init 268: feature_report[5] = ffff9dc040beab68
20 [...]

The first calls to feature report still have the correct pointers:
2023-04-19T14:07:58.307672+02:00 lisa kernel: [ 5.194507][ T124] amd_sfh_work 89: cli_data->feature_report[3] = ffff9dc040beaf28
2023-04-19T14:07:58.307673+02:00 lisa kernel: [ 5.194511][ T124] get_feature_report 146: feature_report = ffff9dc040beaf28
2023-04-19T14:07:58.307674+02:00 lisa kernel: [ 5.194566][ T345] amd_sfh_work 89: cli_data->feature_report[2] = ffff9dc040beade8
2023-04-19T14:07:58.307676+02:00 lisa kernel: [ 5.194568][ T345] get_feature_report 146: feature_report = ffff9dc040beade8
2023-04-19T14:07:58.307677+02:00 lisa kernel: [ 5.194571][ T345] amd_sfh_work 89: cli_data->feature_report[1] = ffff9dc040bea7e8
2023-04-19T14:07:58.307678+02:00 lisa kernel: [ 5.194571][ T345] get_feature_report 146: feature_report = ffff9dc040bea7e8
2023-04-19T14:07:58.307681+02:00 lisa kernel: [ 5.194856][ T16] amd_sfh_work 89: cli_data->feature_report[4] = ffff9dc040bea9a8
2023-04-19T14:07:58.307683+02:00 lisa kernel: [ 5.194858][ T16] get_feature_report 146: feature_report = ffff9dc040bea9a8
2023-04-19T14:07:58.307684+02:00 lisa kernel: [ 5.196126][ T122] amd_sfh_work 89: cli_data->feature_report[0] = ffff9dc040bea9e8
2023-04-19T14:07:58.307685+02:00 lisa kernel: [ 5.196129][ T122] get_feature_report 146: feature_report = ffff9dc040bea9e8
[...]

The last call to feature report shows that the pointer has been corrupted:
2023-04-19T14:07:58.342881+02:00 lisa kernel: [ 6.207833][ T255] amd_sfh_work 89: cli_data->feature_report[5] = ffff000101010100
2023-04-19T14:07:58.342887+02:00 lisa kernel: [ 6.207835][ T255] get_feature_report 146: feature_report = ffff000101010100
2023-04-19T14:07:58.342889+02:00 lisa kernel: [ 6.207838][ T255] stack segment: 0000 [#1] PREEMPT SMP NOPTI
2023-04-19T14:07:58.342891+02:00 lisa kernel: [ 6.207851][ T255] CPU: 8 PID: 255 Comm: kworker/8:2 Not tainted 6.3.0-rc6-00054-g4bd763568dbd-dirty #494
2023-04-19T14:07:58.342892+02:00 lisa kernel: [ 6.207862][ T255] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021
2023-04-19T14:07:58.342901+02:00 lisa kernel: [ 6.207872][ T255] Workqueue: events amd_sfh_work [amd_sfh]
2023-04-19T14:07:58.342903+02:00 lisa kernel: [ 6.207882][ T255] RIP: 0010:get_feature_report+0x4f/0xf0 [amd_sfh]
2023-04-19T14:07:58.342905+02:00 lisa kernel: [ 6.207892][ T255] Code: d9 f9 48 85 ed 0f 84 88 00 00 00 83 fb 10 0f 84 8a 00 00 00 7e 3a 83 fb 13 74 05 83 fb 16 75 73 48 b8 01 41 51 05 50 00 00 00 <44> 88 65 00 48 89 45 01 b8 7f 00 00 00 66 89 45 0d b8 0f 00 00 00
2023-04-19T14:07:58.342907+02:00 lisa kernel: [ 6.207899][ T752] Bluetooth: MGMT ver 1.22
2023-04-19T14:07:58.342909+02:00 lisa kernel: [ 6.207907][ T255] RSP: 0018:ffffb2e18066be30 EFLAGS: 00010246
2023-04-19T14:07:58.342911+02:00 lisa kernel: [ 6.207908][ T255] RAX: 0000005005514101 RBX: 0000000000000016 RCX: 0000000000000027
2023-04-19T14:07:58.342913+02:00 lisa kernel: [ 6.208366][ T255] RDX: ffff9dc31e817348 RSI: 0000000000000001 RDI: ffff9dc31e817340
2023-04-19T14:07:58.342918+02:00 lisa kernel: [ 6.208702][ T255] RBP: ffff000101010100 R08: 0000000000000000 R09: ffffb2e18066bd00
2023-04-19T14:07:58.350285+02:00 lisa kernel: [ 6.209011][ T255] R10: 0000000000000003 R11: ffffffffbb296aa8 R12: 0000000000000004
2023-04-19T14:07:58.350291+02:00 lisa kernel: [ 6.209309][ T255] R13: 0000000000000004 R14: 0000000000000005 R15: 0000000000000002
2023-04-19T14:07:58.350293+02:00 lisa kernel: [ 6.209601][ T255] FS: 0000000000000000(0000) GS:ffff9dc31e800000(0000) knlGS:0000000000000000
2023-04-19T14:07:58.350295+02:00 lisa kernel: [ 6.209895][ T255] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-04-19T14:07:58.350295+02:00 lisa kernel: [ 6.210182][ T255] CR2: 000055d1f9e7f988 CR3: 00000001194c4000 CR4: 0000000000750ee0
2023-04-19T14:07:58.350297+02:00 lisa kernel: [ 6.210470][ T255] PKRU: 55555554
2023-04-19T14:07:58.350298+02:00 lisa kernel: [ 6.210758][ T255] Call Trace:
2023-04-19T14:07:58.350300+02:00 lisa kernel: [ 6.211041][ T255] <TASK>
2023-04-19T14:07:58.350302+02:00 lisa kernel: [ 6.211321][ T255] amd_sfh_work+0x12b/0x170 [amd_sfh]
2023-04-19T14:07:58.350303+02:00 lisa kernel: [ 6.211603][ T255] process_one_work+0x1ed/0x370
2023-04-19T14:07:58.350305+02:00 lisa kernel: [ 6.211884][ T255] worker_thread+0x45/0x3b0
2023-04-19T14:07:58.350306+02:00 lisa kernel: [ 6.212157][ T255] ? rescuer_thread+0x3a0/0x3a0
2023-04-19T14:07:58.350307+02:00 lisa kernel: [ 6.212428][ T255] kthread+0xd5/0x100
2023-04-19T14:07:58.350309+02:00 lisa kernel: [ 6.212700][ T255] ? kthread_complete_and_exit+0x20/0x20
2023-04-19T14:07:58.350311+02:00 lisa kernel: [ 6.212969][ T255] ret_from_fork+0x22/0x30
2023-04-19T14:07:58.350311+02:00 lisa kernel: [ 6.213237][ T255] </TASK>
2023-04-19T14:07:58.350314+02:00 lisa kernel: [ 6.213502][ T255] Modules linked in: bnep cpufreq_conservative cpufreq_powersave cpufreq_userspace nls_ascii nls_cp437 vfat fat snd_ctl_led btusb btrtl btbcm btintel btmtk bluetooth snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_soc_dmic snd_acp3x_pdm_dma snd_acp3x_rn snd_hda_codec uvcvideo snd_soc_core snd_hwdep jitterentropy_rng videobuf2_vmalloc snd_hda_core uvc snd_acp_pci snd_pcm_oss videobuf2_memops snd_rn_pci_acp3x videobuf2_v4l2 sha512_generic snd_mixer_oss joydev snd_acp_config videodev snd_pcm edac_mce_amd snd_timer snd_soc_acpi msi_wmi ctr drbg ecdh_generic ecc videobuf2_common rapl sparse_keymap wmi_bmof snd k10temp soundcore snd_pci_acp3x ccp battery ac button hid_sensor_accel_3d hid_sensor_prox hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_als(+) hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio amd_pmc acpi_cpufreq hid_sensor_iio_common evdev hid_multitouch serio_raw mt7921e mt7921_common mt76_connac_lib mt76 mac80211
2023-04-19T14:07:58.350324+02:00 lisa kernel: [ 6.213531][ T255] libarc4 cfg80211 rfkill msr fuse dm_mod loop efi_pstore configfs efivarfs autofs4 ext4 crc32c_generic crc16 mbcache jbd2 usbhid amdgpu drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_buddy nvme drm_display_helper xhci_pci r8169 nvme_core xhci_hcd drm_kms_helper realtek hid_sensor_hub t10_pi mdio_devres mfd_core hid_generic usbcore psmouse crc32c_intel amd_sfh syscopyarea crc64_rocksoft i2c_hid_acpi libphy sysfillrect crc64 i2c_hid sysimgblt crc_t10dif i2c_piix4 usb_common crct10dif_generic cec crct10dif_common hid i2c_designware_platform i2c_designware_core
2023-04-19T14:07:58.350326+02:00 lisa kernel: [ 6.216071][ T255] ---[ end trace 0000000000000000 ]---
2023-04-19T14:07:58.949460+02:00 lisa kernel: [ 6.810810][ T498] mt7921e 0000:04:00.0 wlp4s0: renamed from wlan0
2023-04-19T14:08:03.549208+02:00 lisa kernel: [ 11.414036][ T255] RIP: 0010:get_feature_report+0x4f/0xf0 [amd_sfh]
2023-04-19T14:08:03.549220+02:00 lisa kernel: [ 11.414585][ T255] Code: d9 f9 48 85 ed 0f 84 88 00 00 00 83 fb 10 0f 84 8a 00 00 00 7e 3a 83 fb 13 74 05 83 fb 16 75 73 48 b8 01 41 51 05 50 00 00 00 <44> 88 65 00 48 89 45 01 b8 7f 00 00 00 66 89 45 0d b8 0f 00 00 00
2023-04-19T14:08:03.550037+02:00 lisa kernel: [ 11.415394][ T255] RSP: 0018:ffffb2e18066be30 EFLAGS: 00010246
2023-04-19T14:08:03.550044+02:00 lisa kernel: [ 11.415829][ T255] RAX: 0000005005514101 RBX: 0000000000000016 RCX: 0000000000000027
2023-04-19T14:08:03.550876+02:00 lisa kernel: [ 11.416219][ T255] RDX: ffff9dc31e817348 RSI: 0000000000000001 RDI: ffff9dc31e817340
2023-04-19T14:08:03.550880+02:00 lisa kernel: [ 11.416220][ T255] RBP: ffff000101010100 R08: 0000000000000000 R09: ffffb2e18066bd00
2023-04-19T14:08:03.550881+02:00 lisa kernel: [ 11.416220][ T255] R10: 0000000000000003 R11: ffffffffbb296aa8 R12: 0000000000000004
2023-04-19T14:08:03.550883+02:00 lisa kernel: [ 11.416221][ T255] R13: 0000000000000004 R14: 0000000000000005 R15: 0000000000000002
2023-04-19T14:08:03.550885+02:00 lisa kernel: [ 11.416221][ T255] FS: 0000000000000000(0000) GS:ffff9dc31e800000(0000) knlGS:0000000000000000
2023-04-19T14:08:03.550886+02:00 lisa kernel: [ 11.416222][ T255] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2023-04-19T14:08:03.550888+02:00 lisa kernel: [ 11.416223][ T255] CR2: 000055d1f9e7f988 CR3: 00000001194c4000 CR4: 0000000000750ee0
2023-04-19T14:08:03.550890+02:00 lisa kernel: [ 11.416223][ T255] PKRU: 55555554
2023-04-19T14:08:03.669878+02:00 lisa kernel: [ 11.534683][ T845] Generic FE-GE Realtek PHY r8169-0-500:00: attached PHY driver (mii_bus:phy_addr=r8169-0-500:00, irq=MAC)
2023-04-19T14:08:03.852904+02:00 lisa kernel: [ 11.717578][ T451] r8169 0000:05:00.0 enp5s0: Link is Down
2023-04-19T14:08:06.644702+02:00 lisa kernel: [ 14.510007][ T888] wlp4s0: authenticate with 54:67:51:3d:a2:e0
2023-04-19T14:08:06.799995+02:00 lisa kernel: [ 14.663387][ T888] wlp4s0: send auth to 54:67:51:3d:a2:e0 (try 1/3)
2023-04-19T14:08:06.804010+02:00 lisa kernel: [ 14.667965][ T95] wlp4s0: authenticated
2023-04-19T14:08:06.805897+02:00 lisa kernel: [ 14.670778][ T95] wlp4s0: associate with 54:67:51:3d:a2:e0 (try 1/3)
2023-04-19T14:08:06.833179+02:00 lisa kernel: [ 14.697227][ T95] wlp4s0: RX AssocResp from 54:67:51:3d:a2:e0 (capab=0x1411 status=0 aid=1)
2023-04-19T14:08:06.862139+02:00 lisa kernel: [ 14.726209][ T95] wlp4s0: associated
2023-04-19T14:08:06.990877+02:00 lisa kernel: [ 14.854686][ T888] IPv6: ADDRCONF(NETDEV_CHANGE): wlp4s0: link becomes ready
2023-04-19T14:08:13.386780+02:00 lisa kernel: [ 21.251125][ C12] sysrq: Emergency Sync
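
A side note on why the oops is reported as "stack segment" and not as a page fault: the corrupted value ffff000101010100 ends up in RBP, and on x86-64 a memory access that uses RBP as its base with a non-canonical address raises a stack-segment fault. The sketch below only checks canonicality; it assumes 48-bit virtual addresses (4-level paging), and the helper name is_canonical_48 is made up for illustration, it is not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel sources.
 * Assumption: 48-bit virtual addresses (4-level paging), so an
 * address is canonical when bits 63..47 are all zeros or all ones.
 */
int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}
```

With this check the pointers printed by amd_sfh_hid_client_init (for example ffff9dc040beab68) pass, while ffff000101010100 does not, which matches the pointer having been overwritten with unrelated data.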
In 4bd763568dbdafdf7cd6b3fcc73f84f1a6f305d1 the bug can be fixed by increasing MAX_HID_DEVICES:

diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_hid.h b/drivers/hid/amd-sfh-hid/amd_sfh_hid.h
index 528036892c9d..97296f587bc7 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_hid.h
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_hid.h
@@ -11,7 +11,7 @@
 #ifndef AMDSFH_HID_H
 #define AMDSFH_HID_H
 
-#define MAX_HID_DEVICES		5
+#define MAX_HID_DEVICES		6
 #define AMD_SFH_HID_VENDOR	0x1022
 #define AMD_SFH_HID_PRODUCT	0x0001
 
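
One plausible reading of why raising MAX_HID_DEVICES helps: with the limit at 5, the sixth sensor uses index 5 of arrays that only have slots 0..4, so feature_report[5] lands on whatever member follows the array and gets overwritten later. The sketch below shows only the offset arithmetic. struct demo_cl_data is a hypothetical, simplified stand-in (the real driver structure has more members and a different layout), and the helper names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HID_DEVICES 5	/* value before the change quoted above */

/* Hypothetical, simplified stand-in for the driver's client data. */
struct demo_cl_data {
	uint8_t *feature_report[MAX_HID_DEVICES];
	uint8_t *input_report[MAX_HID_DEVICES];
};

/*
 * Byte offset that feature_report[idx] would occupy inside the struct.
 * Pure arithmetic: no out-of-bounds memory is actually touched.
 */
size_t feature_report_slot_offset(size_t idx)
{
	return offsetof(struct demo_cl_data, feature_report) +
	       idx * sizeof(uint8_t *);
}

/* Nonzero when idx is a valid index into feature_report[]. */
int slot_is_in_bounds(size_t idx)
{
	return idx < MAX_HID_DEVICES;
}

/* Nonzero when the slot for idx coincides with the start of input_report[]. */
int slot_aliases_next_member(size_t idx)
{
	return feature_report_slot_offset(idx) ==
	       offsetof(struct demo_cl_data, input_report);
}
```

In this toy layout index 4 is the last valid slot and index 5 coincides with input_report[0], so a later store to the neighbouring member would silently replace the pointer that was stored at index 5 during init.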
The patch above does NOT solve the hang on boot issue for next-20230418, so there probably is another perhaps unrelated bug. This new bug does not leave traces in logs so far.
The other bug in next-20230418 was introduced in aca356f6b0ec7df50f108a36c13ef0c73f9ac5b5 but is no longer present in next-20230419.
The bug which was introduced in aca356f6b0ec7df50f108a36c13ef0c73f9ac5b5 was fixed in

commit 41d8f240eed1fc4b7e3d7bc6a56d92b844ef0956
Author: Mark Brown <broonie@kernel.org>
Date:   Fri Apr 14 17:54:12 2023 +0100

    drm: Fix up merge issue

    Signed-off-by: Mark Brown <broonie@kernel.org>

This commit is present in next-20230420.
Fixed in next-20230426 (commit 37386669887d3f2ccf021322c5558353d20f2387).