Created attachment 304364 [details] journal log of affected boot Null pointer dereference. After reverting 326e1c208f3f24d14b93f910b8ae32c94923d22c, the problem is gone and the kernel no longer crashes. See this discussion for details: https://bbs.archlinux.org/viewtopic.php?pid=2102715#p2102715 The journal log of the affected boot is attached. Regards, Stephan
Created attachment 304366 [details] Fix command cancellation There is another bug that can cause null pointer dereference, but I'm not sure if that problem is related to this one. Nevertheless, attaching the fix for that one. Let me know if it does anything. The kernel is not patched with anything extra - it's "vanilla" v6.1.31 [1] - right? Can you share your kernel config file? thanks [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Compiled plain vanilla kernel 6.3.5 with patch --> working
Compiled plain vanilla kernel 6.3.5 without patch --> not working / null pointer

All of this using the Arch approach (makepkg -s) and the standard config file (attached).

Thx
Created attachment 304368 [details] kernel config for plain vanilla compilation kernel config for plain vanilla compilation
(In reply to Stephan Bolten from comment #2)
> Compiled plain vanilla kernel 6.3.5 with patch --> working
> Compiled plain vanilla kernel 6.3.5 without patch --> not working / null
> pointer
>
> All of this using the arch approach (makepkg -s) and using standard config
> file (attached).
>
> Thx

Sorry, the config says it was kernel 6.3.3-arch1.
Thank you. I'll send the fix out now.
Created attachment 304377 [details] Fix command cancellation v2 I'm sorry but I had to change the patch a little bit. Can you test it again? If it works, can I add your Tested-by tag (it will show your email address) to the patch?
re-tested with new patch from yesterday - still working - no crashes Feel free to add the "Tested-by" tag for me.
Created attachment 304382 [details] Fix command cancellation v3 I had to make one more modification to the patch - one of my tests was still failing. Nevertheless, I've now sent this last version out. But because of the modification I did not include your Tested-by tag. If you have time, please test this last version, and give the tag as a reply to the patch mail (you are CCd). Thank you.
working with 3rd patch as well - just tested it.
Created attachment 304383 [details] attachment-23736-0.html
Tested-by: Stephan Bolten (stephan.bolten@gmx.net)

-----Original Message-----
From: bugzilla-daemon@kernel.org
To: stephan.bolten@gmx.net
Subject: [Bug 217517] Regression introduced by 326e1c208f3f24d14b93f910b8ae32c94923d22c
Date: 06.06.2023 14:04:12

https://bugzilla.kernel.org/show_bug.cgi?id=217517

Heikki Krogerus (heikki.krogerus@linux.intel.com) changed:
Attachment #304377 [details] is obsolete: 0 -> 1

--- Comment #8 from Heikki Krogerus (heikki.krogerus@linux.intel.com) ---
Created attachment 304382 [details] --> https://bugzilla.kernel.org/attachment.cgi?id=304382&action=edit
Fix command cancellation v3
Created attachment 304439 [details] Lenovo ThinkPad P1 splat on 6.3.4-101.fc37 I'm seeing this splat on a (Fedora) 6.3.4 kernel, on a ThinkPad P1 gen 4. The stack trace and the code dump look similar, but not identical. Could this be the same bug? I haven't seen it before kernel 6.3.4, so it looks like a regression. Unfortunately, I don't have a way to reproduce it. Sometimes it happens shortly after boot and sometimes it doesn't happen for days.
(In reply to Radu Rendec from comment #11)
> Created attachment 304439 [details]
> Lenovo ThinkPad P1 splat on 6.3.4-101.fc37
>
> I'm seeing this splat on a (Fedora) 6.3.4 kernel, on a ThinkPad P1 gen 4.
> The stack trace and the code dump look similar, but not identical. Could
> this be the same bug?
>
> I haven't seen it before kernel 6.3.4, so it looks like a regression.
> Unfortunately, I don't have a way to reproduce it. Sometimes it happens
> shortly after boot and sometimes it doesn't happen for days.

Please test the proposed patch above.
(In reply to Bagas Sanjaya from comment #12)
> Please test the proposed patch above.

The Fedora kernel took the patch early, before it made it into Linus’ tree and all the way back into the upstream stable series (see https://gitlab.com/cki-project/kernel-ark/-/commit/f2c156884d4579452284662c07d2a1d5297c65bb). I no longer see the issue after taking the new kernel, so I *assume* it is this patch that fixed it.

I know, ideally I should reproduce the issue on a locally compiled kernel, then apply the patch and verify that the issue disappears. However, this laptop is my daily driver and rebooting it to test various kernels is too much of a hassle.

Thanks for posting the patch!