Bug 60612

Summary: Kernel crashes after resume from ram, takes about an hour after resume
Product: Drivers
Reporter: John Yost (AlleyTrotter)
Component: Other
Assignee: Tomas Winkler (tomas.winkler)
Status: CLOSED CODE_FIX
Severity: blocking
CC: aaron.lu, tomasw
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.10.0 3.10.1 3.10.2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: my .config file, my lspci

Description John Yost 2013-07-23 18:37:19 UTC
I'm not really sure how to file this, so I have attached a picture of the error dump. I have several photos from additional dumps, but the system seems to accept only one. My system is an ASRock Z77 Extreme4 motherboard with an Intel Core i5-3570K, a 120 GB Kingston SSD, 32 GB of RAM, and a 2 TB hard drive, running Slackware 14.0 and KDE 4.10.5.
The kernel configuration is from the Slackware current/testing config-generic.
The filesystem is ext4.
Please advise if you need anything additional.
Thanks
John

It seems my photo is too large. How do I get it to you?
Comment 1 John Yost 2013-07-23 18:39:48 UTC
The kernel crashes after a resume from "suspend to RAM" in KDE. The crash consistently occurs about an hour after the resume.
Comment 2 John Yost 2013-07-23 18:50:04 UTC
The photos can be reviewed on my G+ page "AlleyTrotter"
Comment 3 John Yost 2013-07-23 23:02:06 UTC
Some additional info: after resume, my syslog fills with the following two lines:

Jul 23 18:45:37 linux kernel: [ 911.287719] mei_me 0000:00:16.0: reset: init clients timeout hbm_state = 1.
Jul 23 18:45:37 linux kernel: [ 911.287726] mei_me 0000:00:16.0: unexpected reset: dev_state = RESETTING

Sorry I don't have more info.
John
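Editor's note: the flood described in this comment can be measured with a short script. The following is a minimal sketch, not part of the original report; the function name `count_mei_resets` and the regular expression are assumptions based only on the two log lines quoted above.

```python
import re
from collections import Counter

# Matches kernel log lines from the mei_me driver in the format quoted in
# this report, e.g. "kernel: [ 911.287719] mei_me 0000:00:16.0: reset: ...".
MEI_LINE = re.compile(r"kernel: \[\s*\d+\.\d+\] mei_me (?P<dev>\S+): (?P<msg>.+)")


def count_mei_resets(lines):
    """Count the two kinds of mei_me reset messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MEI_LINE.search(line)
        if match is None:
            continue  # not a mei_me message
        msg = match.group("msg")
        if msg.startswith("unexpected reset"):
            counts["unexpected reset"] += 1
        elif msg.startswith("reset"):
            counts["reset"] += 1
    return counts
```

It could be run against the system log with `count_mei_resets(open(path))`, where `path` is wherever the distribution writes kernel messages (assumed here, not stated in the report). A count that keeps growing after a resume would match the behaviour described.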
Comment 4 Aaron Lu 2013-07-24 02:11:53 UTC
This looks like a problem related to MEI; I suppose MEI here means the Intel Management Engine Interface (Intel MEI) Linux driver. Can you please unload that module before S3 and see if the problem is solved?
Comment 5 John Yost 2013-07-24 21:20:55 UTC
Aaron, thanks for the response.
Per your question:

CONFIG_INTEL_MEI=y
CONFIG_INTEL_MEI_ME=y

If this is what you mean by MEI, both are built into the kernel.
What is the reference to S3?

John
Comment 6 Rafael J. Wysocki 2013-07-24 21:23:52 UTC
On Wednesday, July 24, 2013 09:20:55 PM you wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=60612
> 
> --- Comment #5 from John Yost <AlleyTrotter@gmail.com> ---
> Aaron thanks for the response
> Per your question.
> 
> CONFIG_INTEL_MEI=y
> CONFIG_INTEL_MEI_ME=y
> 
> if this is what you mean by MEI, both are built onto kernel
> What is the reference to S3?

This driver may cause problems if you use suspend/resume.

Please try to unset CONFIG_INTEL_MEI in your .config, rebuild the kernel and see if you still experience the problem as described.
Comment 7 Aaron Lu 2013-07-25 01:23:59 UTC
S3 means suspend to ram. Please follow Rafael's suggestion in comment #6, thanks.
Comment 8 John Yost 2013-07-25 12:44:58 UTC
Rafael & Aaron
I have changed the kernel configuration as per your recommendation :
# CONFIG_INTEL_MEI is not set
# CONFIG_INTEL_MEI_ME is not set
and rebuilt
My system has been running successfully with several suspend and resume cycles
I believe that my 'bug' has been solved
Please advise if you require any additional information from me.
Thanks
John
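Editor's note: the three states discussed in this thread (built in, module, not set) can be read back from a .config file to confirm what a given build actually used. This is a minimal sketch under the assumption that the file follows the usual Kconfig output format shown in the comments above; the helper name `kconfig_state` is hypothetical.

```python
def kconfig_state(config_text, option):
    """Return the value of a Kconfig option ('y', 'm', ...) or 'n' if disabled.

    Options that are explicitly "not set" and options that are absent
    are both reported as 'n'.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        # The trailing "=" keeps CONFIG_INTEL_MEI from matching CONFIG_INTEL_MEI_ME.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "n"
```

For example, `kconfig_state(open(".config").read(), "CONFIG_INTEL_MEI")` would distinguish the built-in configuration from comment 5 from the disabled one in comment 8.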
Comment 9 Rafael J. Wysocki 2013-07-25 19:51:30 UTC
On Thursday, July 25, 2013 12:44:58 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=60612
> 
> --- Comment #8 from John Yost <AlleyTrotter@gmail.com> ---
> Rafael & Aaron
> I have changed the kernel configuration as per your recommendation :
> # CONFIG_INTEL_MEI is not set
> # CONFIG_INTEL_MEI_ME is not set
> and rebuilt
> My system has been running successfully with several suspend and resume
> cycles
> I believe that my 'bug' has been solved
> Please advise if you require any additional information from me.

Tomas, I wonder if this is a known issue?

Rafael
Comment 10 Tomas Winkler 2013-07-26 07:47:59 UTC
> > Aaron I have changed the kernel configuration as per your
> > recommendation :
> > # CONFIG_INTEL_MEI is not set
> > # CONFIG_INTEL_MEI_ME is not set
> > and rebuilt
> > My system has been running successfully with several suspend and
> > resume cycles I believe that my 'bug' has been solved Please advise if
> > you require any additional information from me.
> 
> Tomas, I wonder if this is a known issue?

Yes, I've posted a few patches that should resolve that. We got mixed answers, so I am continuing to investigate.
https://lkml.org/lkml/2013/7/24/599

Thanks
Tomas
Comment 11 John Yost 2013-07-26 13:59:56 UTC
Something which may be interesting:
I rebuilt the kernel with MEI as modules.
My system does indeed use the modules, as lsmod shows them loaded.

The system still crashes, but only after the second resume from S3.
On the first resume everything is fine. After the second resume I get the repeated messages in syslog.
After
rmmod mei_me
and
rmmod mei
the system returns to normal.
Hopefully this info will be of use to you.
thanks
john
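Editor's note: the removal order used above (mei_me first, then mei) follows from mei_me depending on mei, which lsmod shows in its "Used by" column. The sketch below derives that order from lsmod output; the function name `mei_unload_order` is hypothetical, and any sizes fed to it are illustrative rather than taken from the reporter's system.

```python
def mei_unload_order(lsmod_text):
    """Return the loaded MEI modules in an order that is safe for rmmod.

    A module that is still used by another MEI module is placed after
    its user, so dependents are removed first.
    """
    users = {}
    for line in lsmod_text.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if len(fields) < 3:
            continue
        # Fourth column, when present, is a comma-separated list of users.
        used_by = fields[3].split(",") if len(fields) > 3 else []
        users[fields[0]] = [u for u in used_by if u]
    loaded = [m for m in ("mei_me", "mei") if m in users]
    # Fewer MEI users means the module can be removed earlier.
    return sorted(loaded, key=lambda m: sum(u in loaded for u in users[m]))
```

The returned list can then be passed, one name at a time, to rmmod before suspending.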
Comment 12 John Yost 2013-07-26 14:05:51 UTC
One additional piece of data: the crash does not occur in 3.9.
Sorry to be such a bother
john
Comment 13 John Yost 2013-07-28 12:33:57 UTC
Problem still exists in 3.10.3

john
Comment 14 Tomas Winkler 2013-07-30 11:06:35 UTC
When you apply this patch on 3.10.X, does it solve the issue?
https://lkml.org/lkml/2013/7/17/219
Thanks
Tomas
Comment 15 John Yost 2013-07-30 17:05:30 UTC
Created attachment 107042 [details]
my .config file
Comment 16 John Yost 2013-07-30 17:06:35 UTC
Created attachment 107043 [details]
my lspci
Comment 17 John Yost 2013-07-30 17:14:04 UTC
When I built kernel 3.10.4 with your patch and with MEI & MEI-ME built in, the system would not function after the second resume from S3. When I tried to start any program from the KDE desktop, a window would open but would be filled with alternating horizontal lines, making it completely unusable. I had to use [Ctrl-Alt-F2] to open a second terminal and kill kwin to regain control. The only usable information in syslog was:
"""
Jul 30 11:55:11 linux kernel: [  603.360126] Restarting tasks ... done.
Jul 30 11:57:31 linux kernel: [  743.356520] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
"""

Then I tried building the same kernel with MEI & MEI-ME built as modules. After starting the KDE desktop all seemed normal. I then tried to suspend to ram and the computer locked up. My only possible action was to do a hard reset on the system.

I added my .config and my lspci in the hope that they will help.
John
Comment 18 John Yost 2013-08-17 18:45:41 UTC
Just a reminder
The MEI-ME flood still occurs in 3.10.7

On a better note, all my external USB devices are now properly enumerated during bootup; for a while some of them were regularly missing.
John
Comment 19 John Yost 2013-08-21 23:53:15 UTC
Just to stay current: the bug is still there in 3.10.9.
John
Comment 20 John Yost 2013-08-29 21:56:03 UTC
Just a quick report.
3.10.10 fixes the mei_me flood after resume from S3.

I built MEI back into 3.10.10 as modules (mei_me and mei) and removed the blacklist.
I can S3 suspend and resume as many times as I like with no flood.
I still don't know what it is used for, even though lspci lists it on my i5 system.

John