Bug 213365

Summary: Elkhart Lake: kernel NULL pointer dereference in intel_pinctrl_get_soc_data
Product: Drivers
Component: Other
Status: RESOLVED ANSWERED
Severity: normal
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.13-rc5
Subsystem:
Regression: No
Bisected commit-id:
Reporter: You-Sheng Yang (vicamo)
Assignee: Andy Shevchenko (andy.shevchenko)
CC: andy.shevchenko, gottwald, vicamo
Attachments: dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200
dsdt of BIOS EHLSFWI1.R00.2091.A00.2002250754 02/25/2020
Patch to solve at least the NULL pointer dereference
Dmesg of HP t550 with 5.18.9 kernel with the patch applied
Dmidecode of the HP t550 system
Acpidump of the HP t550 system

Description You-Sheng Yang 2021-06-08 12:42:19 UTC
Created attachment 297233 [details]
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200

When booting a v5.13-rc5 kernel on an Intel Elkhart Lake CRB FAB B rev 200/201, the kernel always hits a NULL pointer dereference, and any further access to INTC1020 via udev hangs forever:

[    1.267327] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.267374] #PF: supervisor read access in kernel mode
[    1.267398] #PF: error_code(0x0000) - not-present page
[    1.267423] PGD 0 P4D 0 
[    1.267438] Oops: 0000 [#1] SMP NOPTI
[    1.267458] CPU: 3 PID: 161 Comm: systemd-udevd Not tainted 5.13.0-051300rc5-generic #202106062330
[    1.267498] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.2091.A00.2002250754 02/25/2020
[    1.267554] RIP: 0010:strcmp+0xc/0x20
[    1.267577] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
[    1.267654] RSP: 0018:ffffb20f4056fb40 EFLAGS: 00010246
[    1.267678] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb20f4056faf0
[    1.267710] RDX: 0000000000000000 RSI: ffffffffc0289c93 RDI: 0000000000000000
[    1.267741] RBP: ffffb20f4056fb68 R08: 0000000000000000 R09: 0000000000000000
[    1.267771] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffffc028bc40
[    1.267802] R13: ffffffffc028d0e0 R14: 0000000000000000 R15: 0000000000000002
[    1.267832] FS:  00007f301052e880(0000) GS:ffff99c478180000(0000) knlGS:0000000000000000
[    1.267868] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.267895] CR2: 0000000000000000 CR3: 0000000102a38000 CR4: 0000000000350ee0
[    1.267928] Call Trace:
[    1.267944]  ? intel_pinctrl_get_soc_data+0x67/0xc0
[    1.267973]  intel_pinctrl_probe_by_uid+0x13/0x30
[    1.267996]  platform_probe+0x45/0xa0
[    1.268016]  really_probe+0xff/0x480
[    1.268035]  driver_probe_device+0xf0/0x160
[    1.268057]  device_driver_attach+0xab/0xb0
[    1.268079]  __driver_attach+0xb2/0x140
[    1.268099]  ? device_driver_attach+0xb0/0xb0
[    1.268121]  bus_for_each_dev+0x7e/0xc0
[    1.268140]  driver_attach+0x1e/0x20
[    1.268159]  bus_add_driver+0x135/0x1f0
[    1.268179]  driver_register+0x95/0xf0
[    1.268198]  ? 0xffffffffc0290000
[    1.268215]  __platform_driver_register+0x1e/0x20
[    1.268237]  ehl_pinctrl_driver_init+0x1c/0x1000 [pinctrl_elkhartlake]
[    1.268269]  do_one_initcall+0x46/0x1d0
[    1.268290]  ? kmem_cache_alloc_trace+0x11c/0x240
[    1.268315]  do_init_module+0x62/0x290
[    1.268336]  load_module+0x71c/0x7a0
[    1.268355]  __do_sys_finit_module+0xc2/0x120
[    1.268377]  __x64_sys_finit_module+0x1a/0x20
[    1.268398]  do_syscall_64+0x40/0xb0
[    1.268418]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    1.268442] RIP: 0033:0x7f3010aa6f6d
[    1.268461] Code: 28 0d 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cb de 0c 00 f7 d8 64 89 01 48
[    1.268538] RSP: 002b:00007fff7a2be7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    1.268572] RAX: ffffffffffffffda RBX: 000055d53d830e60 RCX: 00007f3010aa6f6d
[    1.268603] RDX: 0000000000000000 RSI: 00007f301098dded RDI: 0000000000000005
[    1.268633] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[    1.268663] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f301098dded
[    1.268693] R13: 0000000000000000 R14: 000055d53d8340c0 R15: 000055d53d830e60
[    1.268725] Modules linked in: video fjes(-) pinctrl_elkhartlake(+)
[    1.268757] CR2: 0000000000000000
[    1.268775] ---[ end trace dbc6bf9db99ca940 ]---
[    1.268796] RIP: 0010:strcmp+0xc/0x20
[    1.268815] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
[    1.271383] RSP: 0018:ffffb20f4056fb40 EFLAGS: 00010246
[    1.272689] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb20f4056faf0
[    1.274003] RDX: 0000000000000000 RSI: ffffffffc0289c93 RDI: 0000000000000000
[    1.275299] RBP: ffffb20f4056fb68 R08: 0000000000000000 R09: 0000000000000000
[    1.276589] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffffc028bc40
[    1.277869] R13: ffffffffc028d0e0 R14: 0000000000000000 R15: 0000000000000002
[    1.279134] FS:  00007f301052e880(0000) GS:ffff99c478180000(0000) knlGS:0000000000000000
[    1.280381] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.281636] CR2: 0000000000000000 CR3: 0000000102a38000 CR4: 0000000000350ee0
Comment 1 You-Sheng Yang 2021-06-08 12:52:22 UTC
Created attachment 297235 [details]
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200

Stripped ANSI escape codes.
Comment 2 You-Sheng Yang 2021-06-08 13:03:30 UTC
It seems EHL uses intel_pinctrl_probe_by_uid, but INTC1020 doesn't have uid. All other modern platforms use intel_pinctrl_probe_by_hid, and they don't have uid either.
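
To illustrate the difference, here is a minimal userspace sketch, not the kernel code. It assumes the HID-style probe takes its match data as the SoC data directly, while the UID-style probe treats the match data as a NULL-terminated table and selects an entry by comparing the device's _UID string. The names struct soc_data, lookup_by_hid and lookup_by_uid are hypothetical stand-ins. The guard on dev_uid is the check whose absence would pass NULL to strcmp, which is consistent with RDI being 0 in the oops above:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-community SoC data (not the kernel struct). */
struct soc_data {
	const char *uid;	/* _UID value this entry applies to */
	int npins;		/* placeholder payload */
};

/* HID-style lookup: the match data already is the SoC data, no _UID needed. */
static const struct soc_data *lookup_by_hid(const void *match_data)
{
	return match_data;
}

/*
 * UID-style lookup: the match data is a NULL-terminated table of entries,
 * and the entry is chosen by the device's _UID string.
 */
static const struct soc_data *lookup_by_uid(const void *match_data,
					    const char *dev_uid)
{
	const struct soc_data *const *table = match_data;
	size_t i;

	/* Firmware provided no _UID: there is nothing to select on. */
	if (!dev_uid)
		return NULL;

	for (i = 0; table[i]; i++) {
		/* Skip entries without a uid instead of handing NULL to strcmp(). */
		if (table[i]->uid && !strcmp(dev_uid, table[i]->uid))
			return table[i];
	}

	return NULL;
}
```

With a table holding entries for "0" and "1", lookup_by_uid(table, "1") returns the second entry, and lookup_by_uid(table, NULL) returns NULL rather than faulting.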
Comment 3 You-Sheng Yang 2021-06-08 13:11:58 UTC
Created attachment 297237 [details]
dsdt of BIOS EHLSFWI1.R00.2091.A00.2002250754 02/25/2020

Add DSDT.
Comment 4 Stefan Gottwald 2022-07-06 11:29:36 UTC
I can second this: I got the same error on an HP t550 with kernels from 5.15.x through 5.19-rc5. The log below is from a tainted kernel. I also tested a vanilla kernel and hit the same issue, but it crashed too early (at least on Ubuntu) to capture a proper log, so the tainted one will have to do:
[   14.590015] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   14.590023] #PF: supervisor read access in kernel mode
[   14.590026] #PF: error_code(0x0000) - not-present page
[   14.590029] PGD 0 P4D 0 
[   14.590032] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   14.590036] CPU: 2 PID: 1006 Comm: systemd-udevd Tainted: G           O      5.15.26 #mainline-lxos_dev-g1647852051
[   14.590041] Hardware name: HP HP Pro t550 Thin Client/8A3B, BIOS N45 v00.21 05/19/2022
[   14.590044] RIP: 0010:strcmp+0xc/0x20
[   14.590050] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 85
[   14.590056] RSP: 0018:ffffa25a80ce3c28 EFLAGS: 00010246
[   14.590059] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa25a80ce3c00
[   14.590062] RDX: 0000000000000000 RSI: ffffffffc0302c93 RDI: 0000000000000000
[   14.590065] RBP: ffffffffc03060e0 R08: 0000000000000000 R09: ffffa25a80ce3a08
[   14.590068] R10: ffff93135a472000 R11: 0000000000000000 R12: ffffffffc0304c40
[   14.590070] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
[   14.590073] FS:  00007f63cac93900(0000) GS:ffff9313bd900000(0000) knlGS:0000000000000000
[   14.590077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.590080] CR2: 0000000000000000 CR3: 000000010f77a000 CR4: 0000000000350ee0
[   14.590083] Call Trace:
[   14.590086]  <TASK>
[   14.590088]  intel_pinctrl_get_soc_data+0x62/0xb0
[   14.590097]  intel_pinctrl_probe_by_uid+0xe/0x30
[   14.590101]  platform_probe+0x3c/0x90
[   14.590106]  really_probe+0x1f2/0x3f0
[   14.590109]  __driver_probe_device+0xfe/0x180
[   14.590113]  driver_probe_device+0x1e/0x90
[   14.590116]  __driver_attach+0xc0/0x1c0
[   14.590118]  ? __device_attach_driver+0xe0/0xe0
[   14.590122]  ? __device_attach_driver+0xe0/0xe0
[   14.590125]  bus_for_each_dev+0x75/0xc0
[   14.590129]  bus_add_driver+0x12b/0x1e0
[   14.590132]  driver_register+0x8f/0xe0
[   14.590135]  ? 0xffffffffc0309000
[   14.590138]  do_one_initcall+0x41/0x200
[   14.590143]  ? kmem_cache_alloc_trace+0x198/0x310
[   14.590148]  do_init_module+0x4c/0x240
[   14.590151]  __do_sys_finit_module+0xae/0x110
[   14.590155]  do_syscall_64+0x59/0xc0
[   14.590160]  ? exit_to_user_mode_prepare+0x1b5/0x240
[   14.590165]  ? syscall_exit_to_user_mode+0x18/0x40
[   14.590168]  ? do_syscall_64+0x69/0xc0
[   14.590171]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   14.590176] RIP: 0033:0x7f63cbced76d
[   14.590179] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 36 0d 00 f7 d8 64 89 01 48
[   14.590184] RSP: 002b:00007ffd073bc648 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   14.590188] RAX: ffffffffffffffda RBX: 000056234541e220 RCX: 00007f63cbced76d
[   14.590191] RDX: 0000000000000000 RSI: 00007f63cb9cf105 RDI: 0000000000000006
[   14.590193] RBP: 00007f63cb9cf105 R08: 0000000000000000 R09: 00007ffd073bc770
[   14.590196] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[   14.590198] R13: 000056234538e950 R14: 0000000000020000 R15: 000056234541e220
[   14.590202]  </TASK>
[   14.590203] Modules linked in: pinctrl_elkhartlake(+) dm_crypt b43 cordic bcma mac80211 cfg80211 ssb i915 joydev video i2c_algo_bit drm_kms_helper cec rc_core psmouse sch_fq_codel parport_pc ppdev usbhid hid_generic lp parport ttm drm ip_tables x_tables autofs4 igel_flash(O) pcspkr mmc_block uas usb_storage sdhci_pci cqhci sdhci
[   14.590234] CR2: 0000000000000000
[   14.590237] ---[ end trace aeb2c09bf7400e3c ]---
[   14.590239] RIP: 0010:strcmp+0xc/0x20
[   14.590242] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 85
[   14.590247] RSP: 0018:ffffa25a80ce3c28 EFLAGS: 00010246
[   14.590250] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa25a80ce3c00
[   14.590252] RDX: 0000000000000000 RSI: ffffffffc0302c93 RDI: 0000000000000000
[   14.590255] RBP: ffffffffc03060e0 R08: 0000000000000000 R09: ffffa25a80ce3a08
[   14.590257] R10: ffff93135a472000 R11: 0000000000000000 R12: ffffffffc0304c40
[   14.590260] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
[   14.590262] FS:  00007f63cac93900(0000) GS:ffff9313bd900000(0000) knlGS:0000000000000000
[   14.590265] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.590268] CR2: 0000000000000000 CR3: 000000010f77a000 CR4: 0000000000350ee0

To solve this issue I added the following patch to the kernel:

--- a/drivers/pinctrl/intel/pinctrl-intel.c
+++ b/drivers/pinctrl/intel/pinctrl-intel.c
@@ -1632,12 +1632,17 @@ const struct intel_pinctrl_soc_data *intel_pinctrl_get_soc_data(struct platform_
        unsigned int i;
 
        adev = ACPI_COMPANION(&pdev->dev);
-       if (adev) {
+       if (adev && !adev->pnp.unique_id) {
+               printk(KERN_ERR "intel_pinctrl_get_soc_data: adev->pnp.unique_id is NULL\n");
+       }
+       if (adev && adev->pnp.unique_id) {
                const void *match = device_get_match_data(&pdev->dev);
 
                table = (const struct intel_pinctrl_soc_data **)match;
                for (i = 0; table[i]; i++) {
-                       if (!strcmp(adev->pnp.unique_id, table[i]->uid)) {
+                       if (!table[i]->uid) {
+                               printk(KERN_ERR "intel_pinctrl_get_soc_data: table[%d]->uid is NULL\n", i);
+                       } else if (!strcmp(adev->pnp.unique_id, table[i]->uid)) {
                                data = table[i];
                                break;
                        }

It is probably not the correct solution, but at least the kernel no longer crashes with it. I will attach the patch, the full dmesg of a vanilla kernel with the patch applied, a dmidecode, and an acpidump as debug information.
Comment 5 Stefan Gottwald 2022-07-06 11:30:43 UTC
Created attachment 301343 [details]
Patch to solve at least the NULL pointer dereference
Comment 6 Stefan Gottwald 2022-07-06 11:31:44 UTC
Created attachment 301344 [details]
Dmesg of HP t550 with 5.18.9 kernel with the patch applied
Comment 7 Stefan Gottwald 2022-07-06 11:32:21 UTC
Created attachment 301345 [details]
Dmidecode of the HP t550 system
Comment 8 Stefan Gottwald 2022-07-06 11:33:02 UTC
Created attachment 301346 [details]
Acpidump of the HP t550 system
Comment 9 Andy Shevchenko 2022-08-25 18:38:01 UTC
The BIOS must be updated. Push your vendor for that.

You may also try to build a kernel with the pinctrl-elkhartlake.c replaced by https://git.yoctoproject.org/linux-yocto/tree/drivers/pinctrl/intel/pinctrl-elkhartlake.c?h=v5.10/standard/x86.
Comment 11 Andy Shevchenko 2022-08-26 09:15:38 UTC
In accordance with https://bugzilla.redhat.com/show_bug.cgi?id=1948468#c11 Intel is deploying a new BIOS for OEMs/ODMs/customers. Nothing to fix in upstream.