Latest working kernel version: 2.6.27.2
Earliest failing kernel version: 2.6.28.3
Distribution: Arch Linux, all problems reproduced using a mainline kernel
Hardware Environment: Samsung NC10 Netbook, purchased 02/2009
Software Environment: Linux, latest BIOS, triggered suspend in early userspace before modules are loaded

Problem Description: The kernel hangs before the video is turned on (but after the kernel takes control) when resuming from suspend. Using pm_trace produces nothing useful, because the RTC does not get updated at all.

Steps to reproduce: Suspend at any time by using pm-suspend or echo mem. The HDD light will blink when resuming, indicating that the kernel has control, but it will hang.
Marking as a regression, because it works in 2.6.27.2 and not in 2.6.28.3.
Does it also happen if the kernel is booted with init=/bin/bash ?
Yes, I tried solely booting off the initramfs (without loading modules) and it still resulted in a hang.
Unfortunately, I have no idea what's wrong. There were not too many suspend-specific changes between 2.6.27 and 2.6.28. Can you check if the latest stable 2.6.27.y still works for you? Also please attach the output of lspci from your box.
Created attachment 20180
lspci for Samsung NC10
I've been triaging it some. The latest working kernel is 2.6.27.6. The 2.6.27.10 kernel, however, fails in the same way, so it seems that this bug actually appeared in the 2.6.27 series. If nobody has further information, I'll continue trying to bisect it down to the commit level if possible. I have also attached the lspci output. I believe that there are different revisions of the Samsung NC10 hardware, because not everybody with one has encountered this bug.
If 2.6.27.6 works for you and 2.6.27.10 doesn't, it should be possible to use bisection to find the -stable commit that broke things for you.
Bisection successful. Here's the bad commit:

--------
fb03039affb5a36920abcfb5523c30ca39098498 is first bad commit
commit fb03039affb5a36920abcfb5523c30ca39098498
Author: Philipp Kohlbecher <xt28@gmx.de>
Date:   Sun Nov 16 12:11:01 2008 +0100

    x86: more general identifier for Phoenix BIOS

    commit 0af40a4b1050c050e62eb1dc30b82d5ab22bf221 upstream.

    Impact: widen the reach of the low-memory-protect DMI quirk
--------

It seems that the kernel's *avoidance* of the first 64k region is causing it to be unable to resume. I don't know why this would be the case.
Great, thanks for bisecting it.

Notify-Also : Philipp Kohlbecher <xt28@gmx.de>
Notify-Also : Ingo Molnar <mingo@elte.hu>
First-Bad-Commit : 0af40a4b1050c050e62eb1dc30b82d5ab22bf221

Ingo, are we going to fix or revert it?
On Tuesday 24 February 2009, Patrick Walton wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12645
> > Subject : DMI low-memory-protect quirk causes resume hang on Samsung NC10
> > Submitter : Patrick Walton <pcwalton@cs.ucla.edu>
> > Date : 2009-02-06 18:35 (18 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0af40a4b1050c050e62eb1dc30b82d5ab22bf221
> >
> >
>
> Yes, it should remain listed, as it's still a problem; there are others
> now experiencing the same issue.
First-Bad-Commit : 0af40a4b1050c050e62eb1dc30b82d5ab22bf221
> > Yes, it should remain listed, as it's still a problem; there
> > are others now experiencing the same issue.

There's a fix from today that might solve this resume-hang bug. Could people who are affected by this bug please test the latest -tip tree:

http://people.redhat.com/mingo/tip.git/README

Is the problem fixed? The patch in question is:

6d7942d: x86: fix 64k corruption-check

Ingo
Created attachment 20528
Patch from Comment #12

x86: fix 64k corruption-check

Impact: fix boot crash

Need to exit early if the addr is far above 64k.

The crash got exposed by:
78a8b35: x86: make e820_update_range() handle small range update
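For readers following along, here is a minimal sketch of why an early exit matters when clamping a memory region to a 64k limit. It is not the actual patch: the names `CHECK_LIMIT` and `scan_size` are hypothetical, and the wrap-around failure mode it illustrates is an assumption about the kind of problem the "exit early" wording addresses, not something stated in the changelog.

```c
#include <stdint.h>

/* Hypothetical limit for illustration: only the first 64 KiB is scanned. */
#define CHECK_LIMIT 0x10000ULL

/*
 * Clamp the region [addr, addr + size) to the part that lies below
 * CHECK_LIMIT. Returns the number of bytes to scan, or 0 when the
 * region starts at or above the limit.
 */
static uint64_t scan_size(uint64_t addr, uint64_t size)
{
    /*
     * Early exit. Without it, CHECK_LIMIT - addr below would wrap
     * around to a huge value for addr > CHECK_LIMIT, because the
     * operands are unsigned.
     */
    if (addr >= CHECK_LIMIT)
        return 0;

    /* Region straddles the limit: keep only the part below it. */
    if (addr + size > CHECK_LIMIT)
        size = CHECK_LIMIT - addr;

    return size;
}
```

With these definitions, a region starting at 0x1000 that extends past the limit is clamped to 0xF000 bytes, and a region starting at 0x100000 yields 0 instead of a wrapped-around size.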
Patrick, please try the patch in comment #13 and see if it helps.
Moving to RESOLVED state since a patch is proposed.
closed by:

commit 6d7942dc2a70a7e74c352107b150265602671588
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Sat Mar 14 14:32:41 2009 -0700

    x86: fix 64k corruption-check

    Impact: fix boot crash

    Need to exit early if the addr is far above 64k.

    The crash got exposed by:
    78a8b35: x86: make e820_update_range() handle small range update

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Cc: <stable@kernel.org>
    LKML-Reference: <49BC2279.2030101@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>