Bug 200455
| Summary: | (Dell Latitude E7440 laptop with Intel Core-i7 CPU) Laptop freeze on hibernate (MEI error) | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Yakov Sh. (yman) |
| Component: | Other | Assignee: | Tomas Winkler (tomas.winkler) |
| Status: | NEEDINFO --- | | |
| Severity: | normal | CC: | alexander.usyskin, bugrfeaturer, georgmueller, pizza, tomasw, yu.c.chen |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 4.17 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | MEI error log photo; kernel panic photo; MEI error log photo; attachment-18619-0.html; fix hw module get/put balance; panic fix | | |
Description
Yakov Sh.
2018-07-09 09:18:18 UTC
Getting logs/workaround:
* wait for a while (5m), then move to another virtual terminal;
* also try switching various options of the 'i915' module and the 'intel' driver.

1. Please blacklist the i915 module and see if it works.
2. You can also try the different pm_test modes to narrow it down.
3. Besides, you can try the test_resume mode to check whether it works.
Please refer to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/power/basic-pm-debugging.txt for details. Thanks.

Besides, you can use git to download the kernel code via
    git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
and leverage git bisect to find the bad commit.

Observations correction: after the system boots, the first hibernation works smoothly; the second freezes the system.

First of all, I tried the latest available kernel in Arch, 4.17.5. Nothing changed. Blacklisting i915 didn't help. But using the debugging guide and working from the console, I managed to get these messages on freeze:
    mei_wdt mei::{LONG_STRING_LIKE_SERIAL_NUMBER}: get hw module failed
    mei_wdt mei::{LONG_SAME_STRING}: Could not enable cl device
After blacklisting the mei and mei_me modules the problem disappears. Interestingly, in the last month, while struggling with hibernation, I got kernel panics twice while rebooting. On the second occurrence (today) I read through the text, and it mentions mei too. I can upload a couple of photos if necessary.

Thanks for your investigation, so this seems to be mei related. Could you please upload the full log (picture) for mei?

There should be a fix for it in v4.17.6:

    commit 7ac3afe1341e20359c0d1f9e1f358d175e52bcf4
    Author: Alexander Usyskin <alexander.usyskin@intel.com>
    Date: Thu Jun 7 00:31:48 2018 +0300

        mei: discard messages from not connected client during power down.

        commit b7a020bff31318fc8785e6f96b1d38c1625cf1fb upstream.

        This fixes regression introduced by commit 8d52af6795c0 ("mei: speed up the power down flow")

        In power down or suspend flow a message can still be received from the FW because the clients fake disconnection. In normal case we interpret messages w/o destination as corrupted and link reset is performed in order to clean the channel, but during power down link reset is already in progress resulting in endless loop. To resolve the issue under power down flow we discard messages silently.

Created attachment 277869 [details]
MEI error log photo
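
For readers following the commit message quoted above, the decision it describes can be sketched roughly as follows. This is only an illustration of the described logic, not the actual mei driver code: the enum names and the `handle_rx` function are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

/* Hypothetical device states; these are NOT the real mei definitions. */
enum dev_state { DEV_ENABLED, DEV_POWER_DOWN };

/* Hypothetical outcomes of handling one message received from the FW. */
enum rx_action { RX_DELIVER, RX_DISCARD, RX_LINK_RESET };

/*
 * Decide what to do with an incoming message.
 * has_client is true when a connected client matches the message.
 */
enum rx_action handle_rx(enum dev_state state, bool has_client)
{
    if (has_client)
        return RX_DELIVER;

    /*
     * No destination. During power down a link reset is already in
     * progress, so starting another one would loop; drop the message.
     */
    if (state == DEV_POWER_DOWN)
        return RX_DISCARD;

    /* In the normal case the message is treated as corrupted. */
    return RX_LINK_RESET;
}
```

With this sketch, `handle_rx(DEV_POWER_DOWN, false)` yields `RX_DISCARD`, while the same message in `DEV_ENABLED` state still yields `RX_LINK_RESET`.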
Created attachment 277871 [details]
kernel panic photo
Created attachment 277873 [details]
MEI error log photo
Comment on attachment 277873 [details]
MEI error log photo
Messages I got trying to hibernate system manually
Comment on attachment 277871 [details]
kernel panic photo
Kernel panic happened once when I tried to reboot laptop, which mentions MEI
Unblacklisted both modules while running 4.17.14-arch1-1-ARCH, problem persists.

Okay, this really looks like another issue. We are trying to analyze this. Is the crash on 4.17.14 exactly the same as in 4.17.1?

Created attachment 277877 [details]
attachment-18619-0.html
I’m on vacation with no email access till Sep 2.
For LMS and PRT tools issues contact Oren Weil jer.
For Mei driver - Tomas Winkler.
You can try to revert this one and check if it helps:

    commit 257355a44b9929e55d6fd47bfff66971dc4de948
    Author: Tomas Winkler <tomas.winkler@intel.com>
    Date: Sun Feb 25 20:07:04 2018 +0200

        mei: make module referencing local to the bus.c

        Module reference counting is relevant only to the mei client devices. Make the implementation clean and move it to bus.c

I also have a Dell Latitude E7440 with the same issue and reverting commit 257355a44b9929e55d6fd47bfff66971dc4de948 solved it.

Thanks Georg for the report, I'm working on the fix.

If I can help with some additional debug messages, just write me what to test.

Created attachment 278055 [details]
fix hw module get/put balance
fix hw module get/put balance
Created attachment 278057 [details]
panic fix
need to unlink client before freeing
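
The two patch titles point at two general patterns: every successful module "get" needs exactly one matching "put", and an object must be removed from any list before it is freed. The sketch below illustrates those patterns only; it is not the content of the attached patches, and every name in it (`hw_module`, `client`, `client_enable`, and so on) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real mei structures. */
struct hw_module {
    int refcount;
    bool unloading;
};

struct client {
    struct hw_module *hw;
    bool holds_ref;       /* set only if this client took a reference */
    struct client *next;  /* link in the device's client list */
};

/* Take a reference; fails if the module is going away. */
bool hw_get(struct hw_module *hw)
{
    if (hw->unloading)
        return false;
    hw->refcount++;
    return true;
}

void hw_put(struct hw_module *hw)
{
    hw->refcount--;
}

bool client_enable(struct client *cl)
{
    if (!hw_get(cl->hw))
        return false;     /* nothing taken, so nothing to put later */
    cl->holds_ref = true;
    return true;
}

/* Put the reference at most once, and only if one was taken. */
void client_disable(struct client *cl)
{
    if (!cl->holds_ref)
        return;
    hw_put(cl->hw);
    cl->holds_ref = false;
}

/* Remove cl from the singly linked list starting at *head, if present. */
void client_unlink(struct client **head, struct client *cl)
{
    for (struct client **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == cl) {
            *pp = cl->next;
            cl->next = NULL;
            return;
        }
    }
}

/* Drop the reference and unlink first, so no list entry points at freed memory. */
void client_destroy(struct client **head, struct client *cl)
{
    client_disable(cl);
    client_unlink(head, cl);
    free(cl);
}
```

In this sketch a failed `client_enable` leaves the count untouched, and calling `client_disable` twice does not drive the count negative, which is the kind of imbalance the "get hw module failed" message could hint at.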
Georg, it would be great if you can check those two patches.

Applying the two patches fixes the issue.

Really appreciate your help, can I add your Tested-by: to the submission?

Sure. Thank you very much for the patches.

I have seen you have posted the patches to LKML, but they are still neither in Linus' tree nor considered for next stable. This bug has "regression: no", even though it is a more or less serious regression. Should I rather ask in Redhat bugzilla to include the patch? They normally track stable.

I hope it will be picked up for 4.19-rc3; it looks like Greg has missed 4.19-rc2. Greg KH is a very busy maintainer. I believe I had marked the patches for stable, so they will eventually land there. I'm sorry we caused this issue, and I made an effort to provide the fix ASAP, but I'm not sure how to speed up the process from this point on.

Kernel 4.18.10 and 4.19-rc5 both contain the patches for the issue. Thank you very much. I think the issue can now be marked as resolved.

Thanks for the report.
Tomas

Works as expected on 4.18.10 with MEI modules unblacklisted. Thanks everyone for the help!