Bug 213365 - Elkhart Lake: kernel NULL pointer dereference in intel_pinctrl_get_soc_data
Summary: Elkhart Lake: kernel NULL pointer dereference in intel_pinctrl_get_soc_data
Status: RESOLVED ANSWERED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: Intel Linux
Importance: P1 normal
Assignee: Andy Shevchenko
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-08 12:42 UTC by You-Sheng Yang
Modified: 2022-08-26 09:15 UTC
CC List: 3 users

See Also:
Kernel Version: 5.13-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200 (100.49 KB, application/octet-stream)
2021-06-08 12:42 UTC, You-Sheng Yang
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200 (80.01 KB, text/plain)
2021-06-08 12:52 UTC, You-Sheng Yang
dsdt of BIOS EHLSFWI1.R00.2091.A00.2002250754 02/25/2020 (1.82 MB, text/plain)
2021-06-08 13:11 UTC, You-Sheng Yang
Patch to solve at least the NULL pointer dereference (1.16 KB, patch)
2022-07-06 11:30 UTC, Stefan Gottwald
Dmesg of HP t550 with 5.18.9 kernel with the patch applied (64.06 KB, text/plain)
2022-07-06 11:31 UTC, Stefan Gottwald
Dmidecode of the HP t550 system (13.95 KB, text/plain)
2022-07-06 11:32 UTC, Stefan Gottwald
Acpidump of the HP t550 system (1.19 MB, text/plain)
2022-07-06 11:33 UTC, Stefan Gottwald

Description You-Sheng Yang 2021-06-08 12:42:19 UTC
Created attachment 297233
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200

When booting a v5.13-rc5 kernel on the Intel Elkhart Lake CRB FAB B rev 200/201, it always hits a kernel NULL pointer dereference, and any further access to INTC1020 via udev hangs forever:

[    1.267327] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.267374] #PF: supervisor read access in kernel mode
[    1.267398] #PF: error_code(0x0000) - not-present page
[    1.267423] PGD 0 P4D 0 
[    1.267438] Oops: 0000 [#1] SMP NOPTI
[    1.267458] CPU: 3 PID: 161 Comm: systemd-udevd Not tainted 5.13.0-051300rc5-generic #202106062330
[    1.267498] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.2091.A00.2002250754 02/25/2020
[    1.267554] RIP: 0010:strcmp+0xc/0x20
[    1.267577] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
[    1.267654] RSP: 0018:ffffb20f4056fb40 EFLAGS: 00010246
[    1.267678] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb20f4056faf0
[    1.267710] RDX: 0000000000000000 RSI: ffffffffc0289c93 RDI: 0000000000000000
[    1.267741] RBP: ffffb20f4056fb68 R08: 0000000000000000 R09: 0000000000000000
[    1.267771] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffffc028bc40
[    1.267802] R13: ffffffffc028d0e0 R14: 0000000000000000 R15: 0000000000000002
[    1.267832] FS:  00007f301052e880(0000) GS:ffff99c478180000(0000) knlGS:0000000000000000
[    1.267868] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.267895] CR2: 0000000000000000 CR3: 0000000102a38000 CR4: 0000000000350ee0
[    1.267928] Call Trace:
[    1.267944]  ? intel_pinctrl_get_soc_data+0x67/0xc0
[    1.267973]  intel_pinctrl_probe_by_uid+0x13/0x30
[    1.267996]  platform_probe+0x45/0xa0
[    1.268016]  really_probe+0xff/0x480
[    1.268035]  driver_probe_device+0xf0/0x160
[    1.268057]  device_driver_attach+0xab/0xb0
[    1.268079]  __driver_attach+0xb2/0x140
[    1.268099]  ? device_driver_attach+0xb0/0xb0
[    1.268121]  bus_for_each_dev+0x7e/0xc0
[    1.268140]  driver_attach+0x1e/0x20
[    1.268159]  bus_add_driver+0x135/0x1f0
[    1.268179]  driver_register+0x95/0xf0
[    1.268198]  ? 0xffffffffc0290000
[    1.268215]  __platform_driver_register+0x1e/0x20
[    1.268237]  ehl_pinctrl_driver_init+0x1c/0x1000 [pinctrl_elkhartlake]
[    1.268269]  do_one_initcall+0x46/0x1d0
[    1.268290]  ? kmem_cache_alloc_trace+0x11c/0x240
[    1.268315]  do_init_module+0x62/0x290
[    1.268336]  load_module+0x71c/0x7a0
[    1.268355]  __do_sys_finit_module+0xc2/0x120
[    1.268377]  __x64_sys_finit_module+0x1a/0x20
[    1.268398]  do_syscall_64+0x40/0xb0
[    1.268418]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    1.268442] RIP: 0033:0x7f3010aa6f6d
[    1.268461] Code: 28 0d 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cb de 0c 00 f7 d8 64 89 01 48
[    1.268538] RSP: 002b:00007fff7a2be7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    1.268572] RAX: ffffffffffffffda RBX: 000055d53d830e60 RCX: 00007f3010aa6f6d
[    1.268603] RDX: 0000000000000000 RSI: 00007f301098dded RDI: 0000000000000005
[    1.268633] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[    1.268663] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f301098dded
[    1.268693] R13: 0000000000000000 R14: 000055d53d8340c0 R15: 000055d53d830e60
[    1.268725] Modules linked in: video fjes(-) pinctrl_elkhartlake(+)
[    1.268757] CR2: 0000000000000000
[    1.268775] ---[ end trace dbc6bf9db99ca940 ]---
[    1.268796] RIP: 0010:strcmp+0xc/0x20
[    1.268815] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 89
[    1.271383] RSP: 0018:ffffb20f4056fb40 EFLAGS: 00010246
[    1.272689] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffb20f4056faf0
[    1.274003] RDX: 0000000000000000 RSI: ffffffffc0289c93 RDI: 0000000000000000
[    1.275299] RBP: ffffb20f4056fb68 R08: 0000000000000000 R09: 0000000000000000
[    1.276589] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffffc028bc40
[    1.277869] R13: ffffffffc028d0e0 R14: 0000000000000000 R15: 0000000000000002
[    1.279134] FS:  00007f301052e880(0000) GS:ffff99c478180000(0000) knlGS:0000000000000000
[    1.280381] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.281636] CR2: 0000000000000000 CR3: 0000000102a38000 CR4: 0000000000350ee0
Comment 1 You-Sheng Yang 2021-06-08 12:52:22 UTC
Created attachment 297235
dmesg with v5.13-rc5 kernel on EHL CRB FAB B rev200

Same dmesg with the ANSI escape codes stripped.
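If a colored dmesg capture has to be cleaned up after the fact, a small C filter along these lines could be used. It is a minimal sketch, assuming the escape codes are ANSI CSI sequences (ESC `[`, parameter bytes, one final byte such as `m`), which is what colored terminal output normally uses; `strip_ansi` and `strip_ansi_stream` are hypothetical helper names. Running util-linux `dmesg` with `--color=never` avoids the codes in the first place.

```c
#include <stdio.h>
#include <string.h>

/* Copy src to dst, dropping ANSI CSI escape sequences of the form
 * ESC '[' <bytes 0x20-0x3F>* <final byte 0x40-0x7E>, e.g. "\x1b[31m".
 * dst needs room for strlen(src) + 1 bytes; the output is never longer. */
void strip_ansi(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '\x1b' && src[1] == '[') {
            src += 2;
            /* Skip parameter and intermediate bytes (digits, ';', etc.). */
            while (*src >= 0x20 && *src <= 0x3F)
                src++;
            /* Skip the final byte that ends the sequence, if present. */
            if (*src >= 0x40 && *src <= 0x7E)
                src++;
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* Filter a whole stream, e.g. a saved dmesg log, line by line. A sequence
 * split across two fgets() chunks (lines over 4095 bytes) is not handled. */
void strip_ansi_stream(FILE *in, FILE *out)
{
    char line[4096], clean[4096];

    while (fgets(line, sizeof line, in) != NULL) {
        strip_ansi(line, clean);
        fputs(clean, out);
    }
}
```

Wrapped in `int main(void) { strip_ansi_stream(stdin, stdout); return 0; }`, it works as a pipe filter, e.g. `./strip < dmesg.log > dmesg.txt`.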
Comment 2 You-Sheng Yang 2021-06-08 13:03:30 UTC
It seems EHL uses intel_pinctrl_probe_by_uid, but the INTC1020 device does not have a _UID. All other modern platforms use intel_pinctrl_probe_by_hid, and their devices do not have a _UID either.
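The register dump is consistent with this reading. On x86-64, `strcmp()` receives its first argument in RDI and its second in RSI; the oops shows RDI = 0 (and CR2 = 0) while RSI holds a module address, so the NULL operand is the first one, `adev->pnp.unique_id` in `intel_pinctrl_get_soc_data()`. The sketch below is a simplified userspace model of the two probe styles, not the kernel code: with probe-by-HID the match data is used as the SoC description directly, while probe-by-UID treats it as a NULL-terminated table and picks the entry whose `uid` equals the device's `_UID`. `struct soc_data`, `get_soc_data_by_hid()` and `get_soc_data_by_uid()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct intel_pinctrl_soc_data; only the
 * uid field matters for this illustration. */
struct soc_data {
    const char *uid; /* ACPI _UID this entry describes */
};

/* Probe-by-HID style: the match data is the SoC description itself,
 * so the device's _UID is never consulted. */
const struct soc_data *get_soc_data_by_hid(const void *match_data)
{
    return match_data;
}

/* Probe-by-UID style: the match data is a NULL-terminated table and the
 * entry whose uid equals the device's _UID is selected. Without the
 * unique_id check, a device lacking _UID would hand NULL to strcmp(). */
const struct soc_data *get_soc_data_by_uid(const struct soc_data *const *table,
                                           const char *unique_id)
{
    if (!unique_id)
        return NULL; /* firmware did not provide _UID */

    for (size_t i = 0; table[i]; i++) {
        if (strcmp(unique_id, table[i]->uid) == 0)
            return table[i];
    }
    return NULL; /* no entry for this _UID */
}
```

This is why the HID-based platforms are unaffected by a missing `_UID`: their lookup never reads it.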
Comment 3 You-Sheng Yang 2021-06-08 13:11:58 UTC
Created attachment 297237
dsdt of BIOS EHLSFWI1.R00.2091.A00.2002250754 02/25/2020

Added the DSDT.
Comment 4 Stefan Gottwald 2022-07-06 11:29:36 UTC
I can confirm this: I get the same error on an HP t550 with kernels from 5.15.x through 5.19-rc5. The log below is from a tainted kernel. I also tested a vanilla kernel and hit the same issue, but it crashed too early (at least on Ubuntu) to capture a proper log, so the tainted one will have to do:
[   14.590015] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   14.590023] #PF: supervisor read access in kernel mode
[   14.590026] #PF: error_code(0x0000) - not-present page
[   14.590029] PGD 0 P4D 0 
[   14.590032] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   14.590036] CPU: 2 PID: 1006 Comm: systemd-udevd Tainted: G           O      5.15.26 #mainline-lxos_dev-g1647852051
[   14.590041] Hardware name: HP HP Pro t550 Thin Client/8A3B, BIOS N45 v00.21 05/19/2022
[   14.590044] RIP: 0010:strcmp+0xc/0x20
[   14.590050] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 85
[   14.590056] RSP: 0018:ffffa25a80ce3c28 EFLAGS: 00010246
[   14.590059] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa25a80ce3c00
[   14.590062] RDX: 0000000000000000 RSI: ffffffffc0302c93 RDI: 0000000000000000
[   14.590065] RBP: ffffffffc03060e0 R08: 0000000000000000 R09: ffffa25a80ce3a08
[   14.590068] R10: ffff93135a472000 R11: 0000000000000000 R12: ffffffffc0304c40
[   14.590070] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
[   14.590073] FS:  00007f63cac93900(0000) GS:ffff9313bd900000(0000) knlGS:0000000000000000
[   14.590077] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.590080] CR2: 0000000000000000 CR3: 000000010f77a000 CR4: 0000000000350ee0
[   14.590083] Call Trace:
[   14.590086]  <TASK>
[   14.590088]  intel_pinctrl_get_soc_data+0x62/0xb0
[   14.590097]  intel_pinctrl_probe_by_uid+0xe/0x30
[   14.590101]  platform_probe+0x3c/0x90
[   14.590106]  really_probe+0x1f2/0x3f0
[   14.590109]  __driver_probe_device+0xfe/0x180
[   14.590113]  driver_probe_device+0x1e/0x90
[   14.590116]  __driver_attach+0xc0/0x1c0
[   14.590118]  ? __device_attach_driver+0xe0/0xe0
[   14.590122]  ? __device_attach_driver+0xe0/0xe0
[   14.590125]  bus_for_each_dev+0x75/0xc0
[   14.590129]  bus_add_driver+0x12b/0x1e0
[   14.590132]  driver_register+0x8f/0xe0
[   14.590135]  ? 0xffffffffc0309000
[   14.590138]  do_one_initcall+0x41/0x200
[   14.590143]  ? kmem_cache_alloc_trace+0x198/0x310
[   14.590148]  do_init_module+0x4c/0x240
[   14.590151]  __do_sys_finit_module+0xae/0x110
[   14.590155]  do_syscall_64+0x59/0xc0
[   14.590160]  ? exit_to_user_mode_prepare+0x1b5/0x240
[   14.590165]  ? syscall_exit_to_user_mode+0x18/0x40
[   14.590168]  ? do_syscall_64+0x69/0xc0
[   14.590171]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   14.590176] RIP: 0033:0x7f63cbced76d
[   14.590179] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 36 0d 00 f7 d8 64 89 01 48
[   14.590184] RSP: 002b:00007ffd073bc648 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   14.590188] RAX: ffffffffffffffda RBX: 000056234541e220 RCX: 00007f63cbced76d
[   14.590191] RDX: 0000000000000000 RSI: 00007f63cb9cf105 RDI: 0000000000000006
[   14.590193] RBP: 00007f63cb9cf105 R08: 0000000000000000 R09: 00007ffd073bc770
[   14.590196] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[   14.590198] R13: 000056234538e950 R14: 0000000000020000 R15: 000056234541e220
[   14.590202]  </TASK>
[   14.590203] Modules linked in: pinctrl_elkhartlake(+) dm_crypt b43 cordic bcma mac80211 cfg80211 ssb i915 joydev video i2c_algo_bit drm_kms_helper cec rc_core psmouse sch_fq_codel parport_pc ppdev usbhid hid_generic lp parport ttm drm ip_tables x_tables autofs4 igel_flash(O) pcspkr mmc_block uas usb_storage sdhci_pci cqhci sdhci
[   14.590234] CR2: 0000000000000000
[   14.590237] ---[ end trace aeb2c09bf7400e3c ]---
[   14.590239] RIP: 0010:strcmp+0xc/0x20
[   14.590242] Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 85
[   14.590247] RSP: 0018:ffffa25a80ce3c28 EFLAGS: 00010246
[   14.590250] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa25a80ce3c00
[   14.590252] RDX: 0000000000000000 RSI: ffffffffc0302c93 RDI: 0000000000000000
[   14.590255] RBP: ffffffffc03060e0 R08: 0000000000000000 R09: ffffa25a80ce3a08
[   14.590257] R10: ffff93135a472000 R11: 0000000000000000 R12: ffffffffc0304c40
[   14.590260] R13: 0000000000000000 R14: 0000000000000006 R15: 0000000000000000
[   14.590262] FS:  00007f63cac93900(0000) GS:ffff9313bd900000(0000) knlGS:0000000000000000
[   14.590265] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.590268] CR2: 0000000000000000 CR3: 000000010f77a000 CR4: 0000000000350ee0

To work around this issue, I added the following patch to the kernel:

--- a/drivers/pinctrl/intel/pinctrl-intel.c
+++ b/drivers/pinctrl/intel/pinctrl-intel.c
@@ -1632,12 +1632,17 @@ const struct intel_pinctrl_soc_data *intel_pinctrl_get_soc_data(struct platform_
        unsigned int i;
 
        adev = ACPI_COMPANION(&pdev->dev);
-       if (adev) {
+       if (adev && !adev->pnp.unique_id) {
+               printk(KERN_ERR "intel_pinctrl_get_soc_data: adev->pnp.unique_id is NULL\n");
+       }
+       if (adev && adev->pnp.unique_id) {
                const void *match = device_get_match_data(&pdev->dev);
 
                table = (const struct intel_pinctrl_soc_data **)match;
                for (i = 0; table[i]; i++) {
-                       if (!strcmp(adev->pnp.unique_id, table[i]->uid)) {
+                       if (!table[i]->uid) {
+                               printk(KERN_ERR "intel_pinctrl_get_soc_data: table[%d]->uid is NULL\n", i);
+                       } else if (!strcmp(adev->pnp.unique_id, table[i]->uid)) {
                                data = table[i];
                                break;
                        }

It is probably not the correct solution, but at least the kernel no longer crashes with it. I will attach the patch, the full dmesg of a vanilla kernel with the patch applied, a dmidecode, and an acpidump as debug information.
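The effect of the two checks the patch adds can be modelled in userspace. The sketch below is not kernel code: `fprintf()` stands in for `printk()`, a NULL return stands in for skipping the UID lookup without a match, and `struct soc_entry` and `patched_lookup()` are hypothetical names. It shows that a missing `_UID` now produces an error message instead of a `strcmp()` on NULL, and that table entries without a `uid` are reported and skipped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct intel_pinctrl_soc_data (uid field only). */
struct soc_entry {
    const char *uid;
};

/* Model of the patched lookup: bail out when the device has no _UID,
 * and never pass a NULL table uid to strcmp(). */
const struct soc_entry *patched_lookup(const struct soc_entry *const *table,
                                       const char *unique_id)
{
    if (!unique_id) {
        fprintf(stderr, "intel_pinctrl_get_soc_data: adev->pnp.unique_id is NULL\n");
        return NULL;
    }
    for (unsigned int i = 0; table[i]; i++) {
        if (!table[i]->uid)
            fprintf(stderr, "intel_pinctrl_get_soc_data: table[%u]->uid is NULL\n", i);
        else if (!strcmp(unique_id, table[i]->uid))
            return table[i];
    }
    return NULL;
}
```

With a table of `{"0"}, {NULL}, {"1"}`, looking up `"1"` logs one message for the NULL entry and returns the third entry; looking up with a NULL `_UID` logs the first message and returns NULL.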
Comment 5 Stefan Gottwald 2022-07-06 11:30:43 UTC
Created attachment 301343
Patch to solve at least the NULL pointer dereference
Comment 6 Stefan Gottwald 2022-07-06 11:31:44 UTC
Created attachment 301344
Dmesg of HP t550 with 5.18.9 kernel with the patch applied
Comment 7 Stefan Gottwald 2022-07-06 11:32:21 UTC
Created attachment 301345
Dmidecode of the HP t550 system
Comment 8 Stefan Gottwald 2022-07-06 11:33:02 UTC
Created attachment 301346
Acpidump of the HP t550 system
Comment 9 Andy Shevchenko 2022-08-25 18:38:01 UTC
The BIOS must be updated. Push your vendor for that.

You may also try building a kernel with pinctrl-elkhartlake.c replaced by the version at https://git.yoctoproject.org/linux-yocto/tree/drivers/pinctrl/intel/pinctrl-elkhartlake.c?h=v5.10/standard/x86.
Comment 11 Andy Shevchenko 2022-08-26 09:15:38 UTC
In accordance with https://bugzilla.redhat.com/show_bug.cgi?id=1948468#c11 Intel is deploying a new BIOS for OEMs/ODMs/customers. Nothing to fix in upstream.
