Hello, the subject says it: on my old Samsung Q35 laptop (Intel, dual core), resume after suspend to RAM fails beginning with the 3.7 kernel series, while it works up to 3.6.11. The symptoms are: after trying to resume, the backlight switches on and there seems to be HD activity (as indicated by the LED, for about 30 seconds), but then I'm stuck with a blinking cursor (seemingly in 80x24 VGA mode) in the upper left corner of the screen. The keyboard seems dead, including Alt-SysRq, forcing a hard reboot. Bisecting (my first time, so bear with me) starting from 'linux-stable.git' finally gave me:

73201dbec64aebf6b0dca855b523f437972dc7bb is the first bad commit
commit 73201dbec64aebf6b0dca855b523f437972dc7bb
Author: H. Peter Anvin <hpa@linux.intel.com>
Date:   Wed Sep 26 15:02:34 2012 -0700

    x86, suspend: On wakeup always initialize cr4 and EFER

...

which at least sounds relevant to my untrained eye. I don't know how to proceed or what other info you may need, so feel free to advise or ask! Thanks, Ralph
Hi Ralph, Thanks for the report! Hi HPA, Can you please take a look? Thanks.
Hello, the exact same issue here. I was really hoping 3.8.x would solve it, but it didn't. I'm willing to help solve this bug and can provide further debug information if required. Regards
I forgot to mention: my laptop is a Toshiba Satellite U200, not a Samsung Q35. It also has an Intel dual core (not Core 2 Duo), with i915 graphics (maybe relevant). Regards
Hi Alejandro, Can you please confirm if that commit also breaks your system? Thanks.
Hi Aaron, It started to fail right with the 3.7.x series, and still does with 3.8.x. I would love to help solve this. How do I check that specific commit? I'm not used to git, so I will need some guidance. Thanks!
(In reply to comment #5)
First clone Linus' tree:
$ git clone http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Then check out the commit:
$ git reset --hard 73201dbec64aebf6b0dca855b523f437972dc7bb
Now build the kernel and test whether resume fails. If it fails, check out the previous commit:
$ git reset --hard 5a5a51db78ef24aa61a4cb2ae36f07f6fa37356d
Now build the kernel and test again. If resume is OK, then 73201dbec64aebf6b0dca855b523f437972dc7bb is the bad commit that breaks your system. Thanks.
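Checking a single commit against its parent, as above, is the last step of a bisection like the one in the original report. As a rough model of what `git bisect` automates, here is a minimal POSIX shell sketch: given good/bad test results for a linear range of commits (oldest first), a binary search finds the first bad one in roughly log2(n) build-and-test rounds. The helpers `nth` and `first_bad` are hypothetical, and the sketch assumes the range is cleanly "all good, then all bad".

```shell
#!/bin/sh
# Hypothetical model of a bisection: results are listed oldest commit first,
# and every commit before the culprit is assumed "good", every one from the
# culprit onward "bad".

# nth INDEX ARGS...: print the ARGS entry at 0-based INDEX.
nth() {
    nth_i=$1
    shift
    shift "$nth_i"
    printf '%s\n' "$1"
}

# first_bad RESULT...: print the 0-based index of the first "bad" result,
# or -1 if every result is "good".
first_bad() {
    fb_lo=0
    fb_hi=$#                      # search the half-open range [lo, hi)
    while [ "$fb_lo" -lt "$fb_hi" ]; do
        fb_mid=$(( (fb_lo + fb_hi) / 2 ))
        if [ "$(nth "$fb_mid" "$@")" = bad ]; then
            fb_hi=$fb_mid         # culprit is at mid or earlier
        else
            fb_lo=$(( fb_mid + 1 ))  # culprit is after mid
        fi
    done
    if [ "$fb_lo" -eq "$#" ]; then
        echo -1
    else
        echo "$fb_lo"
    fi
}
```

In a real tree, `git bisect start <bad> <good>`, then `git bisect good` or `git bisect bad` after each test, keeps this bookkeeping for you, and `git bisect reset` returns to where you started.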
Hi Aaron, thanks for the info. Following the instructions it was really easy to do. After testing both of them I can confirm that the version corresponding to commit 73201dbec64aebf6b0dca855b523f437972dc7bb did not work, while the one corresponding to commit 5a5a51db78ef24aa61a4cb2ae36f07f6fa37356d worked without problems. Therefore, I can blame that commit for breaking my suspend/resume support :) I hope this can be solved quickly. Anything else you may need (e.g. additional HW info), just ask me. Thanks!
On Thursday, March 21, 2013 05:21:46 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #7 from Alejandro <alejandro.perez.mendez@gmail.com> 2013-03-21 17:21:45 ---

Peter, it looks like we need your help.
73201dbec64aebf6b0dca855b523f437972dc7bb is known buggy, but the bug was believed to be fixed in 1396adc3c2bdc556d4cdd1cf107aa0b6d59fbb1e. So the first thing to find out is if 73201dbec64aebf6b0dca855b523f437972dc7bb with 1396adc3c2bdc556d4cdd1cf107aa0b6d59fbb1e cherry-picked on top of it works or not.
Hi Alejandro, Please follow these steps to test whether commit 1396adc3c2bdc556d4cdd1cf107aa0b6d59fbb1e fixes this problem (in the cloned git tree):
$ git reset --hard 73201dbec64aebf6b0dca855b523f437972dc7bb
$ git cherry-pick 1396adc3c2bdc556d4cdd1cf107aa0b6d59fbb1e
Now build the kernel and test whether resume is OK. Thanks for your help.
Hi all, no luck for me with Aaron's cherry-pick recipe. Still no resume, same symptoms as before. Thanks, Ralph
(In reply to comment #11) Thank you Ralph for the test. Hi Peter, Looks like that commit doesn't fix it?
It didn't work for me either :(. Same behaviour, blinking cursor, no response after trying to resume.
Umm, something interesting happened today. My daily distribution is Arch Linux, which AFAIK provides unmodified kernel packages. Its kernel package fails the same way as the one built from git, showing similar behaviour. However, I just tested the Ubuntu 13.04 live image (pre-release), which ships a (probably modified) 3.8.0 kernel. With that kernel, suspend/resume works. Maybe they just reverted commit 73201dbec64aebf6b0dca855b523f437972dc7bb, I don't really know. But it may be interesting to look into that. Regards, Alejandro
(In reply to comment #14) Hi Alejandro, Is there any update about your finding?
Sorry, I've been busy, so I couldn't try further. I discovered something interesting: using the .config from Ubuntu with vanilla 3.8.0, the computer is able to suspend; using Arch Linux's configuration, it is not. Therefore, it seems to be related to a new kernel configuration parameter, and I'm trying to find out which one. The problem is that the two files (Ubuntu's and Arch Linux's) differ significantly, and compiling the kernel takes more than an hour on my computer. I'm copying entire sections from Ubuntu's config into Arch's (starting with the ACPI section), so that I can narrow down the problematic parameter. I will keep you posted once I discover which specific option it is. If you (as experts) think uploading both files could be useful, just tell me. Regards
Hello again, analysing this is taking longer than I thought, as every single change to the .config file requires rebuilding the whole kernel (2 h at 100% CPU). Since commit 73201dbec64aebf6b0dca855b523f437972dc7bb is supposed to make the difference: is there any particular config option involved in that changeset? I need to narrow my search, or I'm afraid I will never find the problem :). Thanks, Alejandro
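Rather than copying whole sections between the two configs, diffing their `CONFIG_` lines shows exactly which options differ, which may narrow the search before any rebuild. A minimal POSIX shell sketch, assuming standard `sort` and `comm`; `config_diff` is a hypothetical helper name. The kernel tree also ships `scripts/diffconfig` for this purpose.

```shell
#!/bin/sh
# config_diff OLD NEW: compare the CONFIG_ assignments of two kernel .config
# files. Lines only in OLD are printed with "- ", lines only in NEW with "+ ".
# "# CONFIG_FOO is not set" comments are ignored, so an option that is set in
# one file and unset in the other shows up only on the side where it is set.
config_diff() {
    cfg_a=$(mktemp)
    cfg_b=$(mktemp)
    # comm needs both inputs sorted with the same collation; force C locale.
    grep '^CONFIG_' "$1" | LC_ALL=C sort > "$cfg_a"
    grep '^CONFIG_' "$2" | LC_ALL=C sort > "$cfg_b"
    LC_ALL=C comm -23 "$cfg_a" "$cfg_b" | sed 's/^/- /'
    LC_ALL=C comm -13 "$cfg_a" "$cfg_b" | sed 's/^/+ /'
    rm -f "$cfg_a" "$cfg_b"
}
```

For example, `config_diff arch.config ubuntu.config` would list every assignment that differs between the two files, one per line.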
Hello there, I've found the problematic option. It's the PAE-related stuff. With pre-3.7.x kernels, everything worked fine without PAE activated. However, starting from 3.7.0 it seems I need to activate CONFIG_HIGHMEM64G (and related options) to make it work. I only have 2G of RAM, so it does not make sense to me. Does commit 73201dbec64aebf6b0dca855b523f437972dc7bb introduce any change depending on whether PAE is activated or not? Does it assume 64-bit addresses? Ralph, can you confirm whether activating PAE solves the problem for you too? Best regards, Alejandro
Hi Alejandro, you seem to have found something: same behaviour here, enabling CONFIG_HIGHMEM64G (which seems unnecessary with my 1G RAM as well) makes resume work again! Do you or someone else know about any undesired side effects of that option that would disallow its use on low-RAM machines? Thanks a lot for your research, and best regards, Ralph
You are welcome. I do not think enabling PAE has great disadvantages (Ubuntu ships its kernel images with PAE enabled by default), but the seemingly unrelated cause and effect surely indicates a bug somewhere in the kernel. I have other computers, and this one is the only one showing such behaviour. It could be a buggy BIOS, but then IMO it should have been failing before 3.7.x (and it worked fine). It would be great if the kernel developers could locate and fix this bug. Best regards, Alejandro
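To check which highmem mode, and therefore whether PAE, a given `.config` selects, a minimal sketch follows; `highmem_mode` is a hypothetical helper name. On x86-32, CONFIG_HIGHMEM64G selects CONFIG_X86_PAE, so "HIGHMEM64G" here implies a PAE kernel.

```shell
#!/bin/sh
# highmem_mode FILE: report which x86-32 highmem choice a kernel .config makes.
# CONFIG_HIGHMEM64G selects CONFIG_X86_PAE; the other two choices do not.
highmem_mode() {
    if grep -q '^CONFIG_HIGHMEM64G=y' "$1"; then
        echo "HIGHMEM64G (PAE)"
    elif grep -q '^CONFIG_HIGHMEM4G=y' "$1"; then
        echo "HIGHMEM4G (no PAE)"
    elif grep -q '^CONFIG_NOHIGHMEM=y' "$1"; then
        echo "NOHIGHMEM (no PAE)"
    else
        echo "unknown"   # e.g. a 64-bit config, where this choice does not exist
    fi
}
```

For the running kernel, many distributions install its config as `/boot/config-$(uname -r)`; kernels built with CONFIG_IKCONFIG_PROC expose it as `/proc/config.gz` instead, which needs decompressing (for example with `zcat`) first.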
On Monday, April 08, 2013 02:15:45 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=54911
> --- Comment #19 from Ralph Boehm <xurpher@web.de> 2013-04-08 14:15:45 ---

Hi Peter,

It looks like some change in 3.7 broke suspend/resume with highmem settings other than CONFIG_HIGHMEM64G, which worked before, and the breakage is still present (can any of you guys confirm that suspend/resume doesn't work for you with 3.9-rc6?).

Do you have any idea what change might have caused that to happen?
Is this reproducible on arbitrary hardware or is it a specific set of machines?
On Monday, April 08, 2013 07:51:08 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=54911
>
> --- Comment #22 from H. Peter Anvin <hpa@zytor.com> 2013-04-08 19:51:08 ---
> Is this reproducible on arbitrary hardware or is it a specific set of machines?

Well, I haven't tried to reproduce it.

Aaron, any chance to try to reproduce this issue in a lab?
I have several computers, both i686 and x64. It only fails on one of them (I can provide further information upon request): a Toshiba Satellite U200, Vendor ID: GenuineIntel, CPU family: 6, Model: 14, Stepping: 8, with 2G of RAM. It had been suspending/resuming without problems with both the 2.x and 3.x series.
I just confirmed that 3.9-rc6 does not solve the problem. Regards
First of all: since PAE is required for NX, it is generally the preferred configuration these days, regardless of amount of memory. However, it should work, obviously, but why on Earth PAE should have any impact here *in that direction* is bizarre. -hpa
Rafael, can you think of any way we could get the wakeup_header dumped out at suspend time? The other thing I can think of is if we can get a message out giving an idea where it is hanging during startup... -hpa
On Monday, April 08, 2013 10:41:26 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=54911
>
> --- Comment #27 from H. Peter Anvin <hpa@zytor.com> 2013-04-08 22:41:26 ---
> Rafael, can you think of any way we could get the wakeup_header dumped out at suspend time?

That's populated very late, after we've switched every useful output device off.

The only thing we could do would be to use the CMOS RTC memory to store stuff and read it from there on the next boot.

> The other thing I can think of is if we can get a message out giving an idea where it is hanging during startup...

Or add something that will cause the box to reboot to the wakeup code and move it from one place to another to see when it hangs instead of rebooting.
It sounds like we're in text mode so maybe we can just poke into video memory...
No, we aren't in text mode. There simply is no graphics at this point.
In principle, we can use acpi_sleep=s3_bios to revive the graphics early, but that usually doesn't work.
(In reply to comment #23)
> Aaron, any chance to try to reproduce this issue in a lab?

No, I'm afraid not, sorry. I have an old HP Compaq 6531s but it doesn't have this problem.
Umm, you said PAE is required for NX. Is it somehow possible that I have NX enabled in the BIOS, and that is making the resume fail? Maybe this is a shot in the dark. I cannot try it now anyway, since I'm at work and the failure happens on my laptop, but I will try when I get home.
Well, I can confirm this. In the BIOS, the Executable Bit (NX) option was disabled, and that is what makes a non-PAE kernel fail to resume on the 3.7, 3.8 and 3.9 series. Enabling NX in the BIOS solves the problem, and resume works again, at least with 3.8.5 and 3.9-rc6. With any 3.x kernel where x < 7, however, it worked without problems. Regards
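Whether the BIOS setting actually exposes NX to the OS can be checked from a running system: when the firmware disables it, the `nx` flag is normally missing from the `flags` line of /proc/cpuinfo. A minimal sketch follows; `cpu_has_nx` is a hypothetical helper, and it only reports what the CPU advertises, not whether the kernel uses NX (a non-PAE 32-bit kernel cannot, since NX lives in the PAE page-table format).

```shell
#!/bin/sh
# cpu_has_nx [FILE]: succeed (exit 0) if a "flags" line in FILE
# (default /proc/cpuinfo) lists the "nx" CPU feature flag as a whole word.
cpu_has_nx() {
    grep '^flags' "${1:-/proc/cpuinfo}" | grep -qw nx
}
```

Usage: `cpu_has_nx && echo "NX advertised" || echo "NX not advertised"`, run once with the BIOS option on and once with it off.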
NX should not matter at all for non-PAE, but clearly that is not actually happening. Could you install the msr-tools package on your computer and do a:
rdmsr -xc 0xc0000080
... as root, please?
(Specifically while running on the non-PAE kernel with PAE enabled in the BIOS.)
Even better, please do (again, as root):
cd /dev/cpu ; for c in [0-9]*; do rdmsr -p $c -xc 0xc0000080; done
Sure. Executed on a non-PAE kernel, with the Executable Bit option activated in the BIOS (the configuration that actually works), the above command gives the following:
$ cd /dev/cpu ; for c in [0-9]*; do rdmsr -p $c -xc 0xc0000080; done
0x0
0x0
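MSR 0xc0000080 is EFER (the extended feature enable register). To read values like the two `0x0` results above, here is a minimal sketch that names the commonly documented low EFER bits (SCE = bit 0, LME = bit 8, LMA = bit 10, NXE = bit 11); `decode_efer` is a hypothetical helper and deliberately ignores any other bits. Decoded this way, `0x0` means none of these bits, NXE included, is set on either CPU.

```shell
#!/bin/sh
# decode_efer VALUE: print the names of the set EFER bits among SCE (0),
# LME (8), LMA (10) and NXE (11), or "none". VALUE must be a number the shell
# can evaluate, e.g. the 0x-prefixed hex that "rdmsr -xc" prints.
decode_efer() {
    de_v=$(( $1 ))
    de_out=""
    for de_pair in 0:SCE 8:LME 10:LMA 11:NXE; do
        de_bit=${de_pair%%:*}     # bit position before the colon
        de_name=${de_pair#*:}     # bit name after the colon
        if [ $(( (de_v >> de_bit) & 1 )) -eq 1 ]; then
            de_out="$de_out $de_name"
        fi
    done
    if [ -z "$de_out" ]; then
        echo none
    else
        echo "${de_out# }"        # drop the leading space
    fi
}
```

Combined with the loop from the comment above, as root and with the `msr` module loaded: `for c in /dev/cpu/[0-9]*; do decode_efer "$(rdmsr -p "${c##*/}" -xc 0xc0000080)"; done`.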
Hi Peter, any update on this?
ping...
Alejandro, any chance to try 3.10-rc4?
Sure, I will try when I have a moment. I suppose I have to disable NX to replicate the error conditions. I will try tonight (it is 8:23am here). Regards
Hello, sadly it does not work either. Did you change anything in particular, or were you just checking whether it was fixed by another changeset? Regards
The 3.10-rc kernels include a fix that might be related to it. Thanks for testing!
Hi Alejandro, Can you please test if v3.10 fixed your problem? Thanks.
Fixed by 5ff560f "x86, suspend: Handle CPUs which fail to #GP on RDMSR".