Bug 217517 - Regression introduced by 326e1c208f3f24d14b93f910b8ae32c94923d22c
Summary: Regression introduced by 326e1c208f3f24d14b93f910b8ae32c94923d22c
Status: NEW
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P3 normal
Assignee: acpi_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-01 13:41 UTC by Stephan Bolten
Modified: 2023-06-19 15:37 UTC
CC List: 4 users

See Also:
Kernel Version: 6.3.4 and 6.3.5
Subsystem:
Regression: Yes
Bisected commit-id: 326e1c208f3f24d14b93f910b8ae32c94923d22c


Attachments
journal log of affected boot (188.05 KB, text/plain)
2023-06-01 13:41 UTC, Stephan Bolten
Fix command cancellation (1.07 KB, patch)
2023-06-02 12:24 UTC, Heikki Krogerus
kernel config for plain vanilla compilation (256.93 KB, text/plain)
2023-06-02 16:49 UTC, Stephan Bolten
Fix command cancellation v2 (1.76 KB, patch)
2023-06-05 13:29 UTC, Heikki Krogerus
Fix command cancellation v3 (1.59 KB, patch)
2023-06-06 12:04 UTC, Heikki Krogerus
attachment-23736-0.html (2.61 KB, text/html)
2023-06-06 15:52 UTC, Stephan Bolten
Lenovo ThinkPad P1 splat on 6.3.4-101.fc37 (3.15 KB, text/plain)
2023-06-16 20:51 UTC, Radu Rendec

Description Stephan Bolten 2023-06-01 13:41:48 UTC
Created attachment 304364
journal log of affected boot

Null pointer dereference.

After reverting 326e1c208f3f24d14b93f910b8ae32c94923d22c, the problem is gone and the kernel no longer crashes.

See this discussion for details:

https://bbs.archlinux.org/viewtopic.php?pid=2102715#p2102715

The journal log of the affected boot is attached.



Regards,

Stephan
Comment 1 Heikki Krogerus 2023-06-02 12:24:44 UTC
Created attachment 304366
Fix command cancellation

There is another bug that can cause a null pointer dereference, but I'm not sure whether it is related to this one. Nevertheless, I'm attaching the fix for it. Let me know if it helps.

The kernel is not patched with anything extra; it's a "vanilla" v6.1.31 [1], right?



Can you share your kernel config file?

thanks

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Comment 2 Stephan Bolten 2023-06-02 16:48:16 UTC
Compiled plain vanilla kernel 6.3.5 with patch --> working
Compiled plain vanilla kernel 6.3.5 without patch --> not working / null pointer

All of this was built the Arch way (makepkg -s) using the standard config file (attached).

Thx
Comment 3 Stephan Bolten 2023-06-02 16:49:47 UTC
Created attachment 304368
kernel config for plain vanilla compilation

Comment 4 Stephan Bolten 2023-06-02 16:50:50 UTC
(In reply to Stephan Bolten from comment #2)
> Compiled plain vanilla kernel 6.3.5 with patch --> working
> Compiled plain vanilla kernel 6.3.5 without patch --> not working / null
> pointer
> 
> All of this using the arch approach (makepkg -s) and using standard config
> file (attached).
> 
> Thx

Sorry, the config says it was kernel 6.3.3-arch1.
Comment 5 Heikki Krogerus 2023-06-05 12:28:27 UTC
Thank you. I'll send the fix out now.
Comment 6 Heikki Krogerus 2023-06-05 13:29:38 UTC
Created attachment 304377
Fix command cancellation v2

I'm sorry, but I had to change the patch a little. Can you test it again?

If it works, can I add your Tested-by tag (it will show your email address) to the patch?
Comment 7 Stephan Bolten 2023-06-06 07:09:27 UTC
Re-tested with the new patch from yesterday: still working, no crashes.

Feel free to add the "Tested-by" tag for me.
Comment 8 Heikki Krogerus 2023-06-06 12:04:12 UTC
Created attachment 304382
Fix command cancellation v3

I had to make one more modification to the patch because one of my tests was still failing. Nevertheless, I've now sent this last version out.

Because of the modification, I did not include your Tested-by tag. If you have time, please test this last version and give the tag in a reply to the patch email (you are CC'd). Thank you.
Comment 9 Stephan Bolten 2023-06-06 15:50:55 UTC
Working with the third patch as well; just tested it.
Comment 10 Stephan Bolten 2023-06-06 15:52:46 UTC
Created attachment 304383
attachment-23736-0.html

Tested-by: Stephan Bolten (stephan.bolten@gmx.net)

Comment 11 Radu Rendec 2023-06-16 20:51:11 UTC
Created attachment 304439
Lenovo ThinkPad P1 splat on 6.3.4-101.fc37

I'm seeing this splat on a (Fedora) 6.3.4 kernel, on a ThinkPad P1 gen 4. The stack trace and the code dump look similar, but not identical. Could this be the same bug?

I haven't seen it before kernel 6.3.4, so it looks like a regression. Unfortunately, I don't have a way to reproduce it. Sometimes it happens shortly after boot and sometimes it doesn't happen for days.
Comment 12 Bagas Sanjaya 2023-06-17 04:27:04 UTC
(In reply to Radu Rendec from comment #11)
> Created attachment 304439
> Lenovo ThinkPad P1 splat on 6.3.4-101.fc37
> 
> I'm seeing this splat on a (Fedora) 6.3.4 kernel, on a ThinkPad P1 gen 4.
> The stack trace and the code dump look similar, but not identical. Could
> this be the same bug?
> 
> I haven't seen it before kernel 6.3.4, so it looks like a regression.
> Unfortunately, I don't have a way to reproduce it. Sometimes it happens
> shortly after boot and sometimes it doesn't happen for days.

Please test the proposed patch above.
Comment 13 Radu Rendec 2023-06-19 15:37:03 UTC
(In reply to Bagas Sanjaya from comment #12)
> Please test the proposed patch above.

The Fedora kernel picked up the patch early, before it made it into Linus’ tree and was backported to the upstream stable series (see https://gitlab.com/cki-project/kernel-ark/-/commit/f2c156884d4579452284662c07d2a1d5297c65bb).

I no longer see the issue after taking the new kernel, so I *assume* it is this patch that fixed it. I know, ideally I should reproduce the issue on a locally compiled kernel, then apply the patch and verify that the issue disappears. However, this laptop is my daily driver and rebooting it to test various kernels is too much of a hassle.

Thanks for posting the patch!
