Created attachment 295399
journalctl from the failed boot
The affected system is a Dell Latitude E5570 running Arch Linux with the latest firmware/UEFI/etc. The system boots to a general protection fault screen (see https://bugs.archlinux.org/task/69702 - there is an attached screenshot in the first comment).
I've also just tested with linux-next from 20210222 - the problem persists. Blacklisting 'dell_wmi_sysman' is the only solution for reaching a working system.
Tell me if I can be of any help.
Created attachment 295461
journalctl from the failed boot - DELL E5470
The panic also reproduces on a Dell Latitude E5470/0MT53G, BIOS 1.23.3 08/04/2020, with kernel 5.11.1.
First log lines about WMI are:
wmi_bus wmi_bus-PNP0C14:01: WQBC data block query control method not found
acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
and then the boot goes on for a second. Several processes cause stack traces to be printed, pointing all over the place. Blacklisting the module works around the problem.
So if it happens on 5.11.1 as well as on linux-next, then you already have https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/platform/x86/dell-wmi-sysman?h=linux-5.11.y&id=215164bfb7144c5890dd8021ff06e486939862d4 applied.
These machines are quite old, right? They came out in the 2015/2016 time frame.
I would expect that they don't support this interface, but the driver should still have bailed out more gracefully.
Can you please test https://lkml.org/lkml/2021/2/18/748?
It needs to be re-spun based on maintainer feedback, but I think it will fix the issue for you. If it works for you, I'll respin it soon and hopefully you can test the new one as well.
Created attachment 295547
journalctl from 5.11.2 + the patch by Mario
There is no change in the behavior with the above patch.
The panic occurs on my Dell Latitude E5570 as well (figures, same hardware as the E5470) using Fedora 33. The proposed patch does not change anything.
I've prepared and posted a set of patches which deal with various problems with error-exit path cleanups and general robustness of the dell-wmi-sysman driver:
Note it is not entirely clear to me what is going on here, so I'm not sure if these patches fix things but hopefully they will help.
What would be helpful, independent of testing the patches, is if someone could boot a 5.11 kernel with dell-wmi-sysman blacklisted to avoid the problem, and then:
1. Switch to a text-console
2. ssh into the machine and run dmesg -w
3. ssh into the machine a second time and run: "sudo modprobe dell_wmi_sysman dyndbg"
Then collect the log output from "dmesg -w". If there are log messages on the text-console which did not make it into the ssh "dmesg -w" output, take a picture of those.
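The steps above could be scripted roughly as follows. This is only a sketch and was not posted in this report: the helper names, the config file name and the log file name are made up for illustration, and it assumes the distro reads blacklist entries from /etc/modprobe.d.

```shell
#!/bin/sh
# Sketch only: helper names and file names below are assumptions.

# Keep dell_wmi_sysman from being auto-loaded at boot. A "blacklist" entry
# only blocks alias-based auto-loading; an explicit
# "modprobe dell_wmi_sysman" still loads the module.
write_blacklist_conf() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf 'blacklist dell_wmi_sysman\n' \
        > "$conf_dir/dell-wmi-sysman-blacklist.conf"
}

# Step 2 (first ssh session): follow the kernel log and keep a copy.
follow_kernel_log() {
    sudo dmesg -w | tee "${1:-dmesg-dell-wmi-sysman.log}"
}

# Step 3 (second ssh session): load the module with dynamic debug enabled.
load_module_with_debug() {
    sudo modprobe dell_wmi_sysman dyndbg
}

# Run one step at a time by passing its name as the first argument.
# With no argument nothing is executed.
case "${1:-}" in
    blacklist) write_blacklist_conf ;;
    log)       follow_kernel_log ;;
    load)      load_module_with_debug ;;
esac
```

Saved as, say, sysman-debug.sh (a hypothetical name), this would be run once as root with "blacklist" before rebooting, then with "log" in the first ssh session and "load" in the second.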
And if you are capable of building your own kernels then testing the patches would be great too of course (save the emails in "raw" format and then "git am" them).
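For those who have not used "git am" before, applying the saved mails could look roughly like this. It is a sketch under assumptions: the helper name is hypothetical, and it assumes each raw mail was saved as a separate *.patch file whose names sort in series order.

```shell
#!/bin/sh
# Sketch only: the helper name and the *.patch naming are assumptions.

# Apply every saved patch mail in PATCH_DIR to the kernel tree in TREE.
# PATCH_DIR should be an absolute path, because "git -C" changes directory
# into TREE before git opens the files.
apply_saved_patches() {
    tree="$1"
    patch_dir="$2"
    # The shell expands the glob in sorted order, so files named
    # 0001-..., 0002-... are applied in series order.
    git -C "$tree" am "$patch_dir"/*.patch
}
```

If a patch does not apply cleanly, "git am --abort" run inside the tree returns it to the state before the series was started.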
I answered the above question for my situation in https://bugzilla.redhat.com/show_bug.cgi?id=1936171. For completeness' sake I will upload the requested dmesg info here as well.
Created attachment 295965
dmesg during modprobe dell_wmi_sysman
Thanks to Freddy's testing, we now have confirmation that the patches fix things, and I now also know the exact circumstances / root cause which triggers this crash.
I've posted a v2 of the patches, adding one more robustness fix and dropping one patch which needs more testing:
I'll work on getting these merged by Linus and then also on getting them added to the stable kernel series (I'm the drivers/platform/x86 maintainer). In the meantime it would be best if distros carry the v2 patch-series as downstream patches.
I've just tested with Linux Next 20210323 on my Dell Latitude E5570 - everything works as expected. I just couldn't reboot for some reason - the system seemed stuck at some point before the reboot target and I had to forcibly turn it off. But I think this is out of scope of the current report :)
Thank you all for fixing and testing this.
Good luck ahead!