Bug 12645 - DMI low-memory-protect quirk causes resume hang on Samsung NC10
Status: CLOSED CODE_FIX
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All
OS: Linux
Importance: P1 normal
Assigned To: Ingo Molnar
Depends on:
Blocks: 7216 11808
Reported: 2009-02-06 18:35 UTC by Patrick Walton
Modified: 2011-07-30 05:53 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.27.7
Tree: Mainline
Regression: Yes


Attachments
lspci for Samsung NC10 (9.72 KB, text/plain)
2009-02-10 01:40 UTC, Patrick Walton
Patch from Comment #12 (1.22 KB, patch)
2009-03-15 03:25 UTC, Rafael J. Wysocki

Description Patrick Walton 2009-02-06 18:35:15 UTC
Latest working kernel version: 2.6.27.2
Earliest failing kernel version: 2.6.28.3
Distribution: Arch Linux; all problems reproduced using a mainline kernel
Hardware Environment: Samsung NC10 Netbook, purchased 02/2009
Software Environment: Linux, latest BIOS, triggered suspend in early userspace before modules are loaded
Problem Description:
When resuming from suspend, the kernel hangs after it takes control but before the video is turned back on.

Using pm_trace produces nothing useful, because the RTC does not get updated at all.

Steps to reproduce:
Suspend at any time using pm-suspend or echo mem > /sys/power/state. On resume the HDD light blinks, indicating that the kernel has control, but the machine then hangs.
Comment 1 Patrick Walton 2009-02-06 18:35:58 UTC
Marking as a regression, because it works in 2.6.27.2 and not in 2.6.28.3.
Comment 2 Rafael J. Wysocki 2009-02-07 05:39:25 UTC
Does it also happen if the kernel is booted with init=/bin/bash ?
Comment 3 Patrick Walton 2009-02-07 10:18:57 UTC
Yes. I tried booting solely from the initramfs (without loading any modules), and it still hung.
Comment 4 Rafael J. Wysocki 2009-02-07 14:59:00 UTC
Unfortunately, I have no idea what's wrong.

There were not too many suspend-specific changes between 2.6.27 and 2.6.28.

Can you check if the latest stable 2.6.27.y still works for you?

Also please attach the output of lspci from your box.
Comment 5 Patrick Walton 2009-02-10 01:40:55 UTC
Created attachment 20180 [details]
lspci for Samsung NC10
Comment 6 Patrick Walton 2009-02-10 01:43:04 UTC
I've been triaging it some. The latest working kernel is 2.6.27.6. The 2.6.27.10 kernel, however, fails in the same way, so it seems that this bug actually appeared in the 2.6.27 series. If nobody has further information, I'll continue trying to bisect it down to the commit level if possible.

Also I attached the lspci. I believe that there are different revisions of the Samsung NC10 hardware, because not everybody with one has encountered this bug.
Comment 7 Rafael J. Wysocki 2009-02-10 05:39:43 UTC
If 2.6.27.6 works for you and 2.6.27.10 doesn't, it should be possible to use bisection to find the -stable commit that broke things for you.
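Bisection here is a binary search over the ordered commits between the known-good 2.6.27.6 and the known-bad 2.6.27.10; git bisect automates the bookkeeping. A minimal C sketch of the search itself, assuming that once the bug appears it stays present in every later commit. The names first_bad_commit, is_bad_fn and bad_from_threshold are hypothetical, and in real use each predicate call means building, booting and trying a suspend/resume cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: true if the kernel built from commit index i
 * shows the resume hang. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Return the index of the first bad commit in [0, n), assuming commit 0
 * is known good, commit n-1 is known bad, and every commit after the
 * first bad one is also bad. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0, bad = n - 1; /* invariant: good is good, bad is bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;  /* bug already present: search the older half */
        else
            good = mid; /* still works: search the newer half */
    }
    return bad;
}

/* Stand-in predicate for illustration: commits at or after the index
 * stored in ctx are "bad". */
static bool bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

For a range of 100 commits the loop runs about seven times, so only about seven build-and-test cycles are needed rather than up to 100.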
Comment 8 Patrick Walton 2009-02-12 01:19:14 UTC
Bisection successful. Here's the bad commit:

--------

fb03039affb5a36920abcfb5523c30ca39098498 is first bad commit
commit fb03039affb5a36920abcfb5523c30ca39098498
Author: Philipp Kohlbecher <xt28@gmx.de>
Date:   Sun Nov 16 12:11:01 2008 +0100

    x86: more general identifier for Phoenix BIOS
    
    commit 0af40a4b1050c050e62eb1dc30b82d5ab22bf221 upstream.
    
    Impact: widen the reach of the low-memory-protect DMI quirk

--------

It seems that the kernel's *avoidance* of the first 64k region is causing it to be unable to resume. I don't know why this would be the case.
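The bad commit's "Impact" line says it widens the reach of the low-memory-protect DMI quirk. DMI_MATCH entries in the kernel are substring tests on the firmware-reported strings, so, assuming the commit replaced a longer BIOS vendor pattern with a shorter, "more general" one as its title suggests, more machines fall under the quirk. A minimal sketch; quirk_matches and the vendor strings below are hypothetical examples, not values read from the NC10.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a DMI quirk check: the quirk applies when
 * `pattern` occurs anywhere inside the reported BIOS vendor string
 * (substring match, not an exact comparison). */
static bool quirk_matches(const char *bios_vendor, const char *pattern)
{
    return strstr(bios_vendor, pattern) != NULL;
}
```

With these example strings, quirk_matches("Phoenix Technologies Ltd.", "Phoenix Technologies, LTD") is false, while the shorter pattern "Phoenix Technologies" matches the same vendor string, so a machine that previously escaped the quirk now gets it.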
Comment 9 Rafael J. Wysocki 2009-02-12 13:52:14 UTC
Great, thanks for bisecting it.

Notify-Also : Philipp Kohlbecher <xt28@gmx.de>
Notify-Also : Ingo Molnar <mingo@elte.hu>

First-Bad-Commit : 0af40a4b1050c050e62eb1dc30b82d5ab22bf221

Ingo, are we going to fix or revert it?
Comment 10 Rafael J. Wysocki 2009-02-25 14:21:39 UTC
On Tuesday 24 February 2009, Patrick Walton wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.27 and 2.6.28.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.27 and 2.6.28.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=12645
> > Subject		: DMI low-memory-protect quirk causes resume hang on Samsung NC10
> > Submitter	: Patrick Walton <pcwalton@cs.ucla.edu>
> > Date		: 2009-02-06 18:35 (18 days old)
> > First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0af40a4b1050c050e62eb1dc30b82d5ab22bf221
> > 
> > 
> 
> Yes, it should remain listed, as it's still a problem; there are others 
> now experiencing the same issue.

Comment 11 Rafael J. Wysocki 2009-03-02 14:43:48 UTC
First-Bad-Commit : 0af40a4b1050c050e62eb1dc30b82d5ab22bf221
Comment 12 Ingo Molnar 2009-03-15 00:36:29 UTC
> > Yes, it should remain listed, as it's still a problem; there 
> > are others now experiencing the same issue.

There's a fix from today that might solve this resume-hang bug. 
Could people who are affected by this bug please test the latest 
-tip tree:

    http://people.redhat.com/mingo/tip.git/README

Is the problem fixed? The patch in question is:

    6d7942d: x86: fix 64k corruption-check

	Ingo

Comment 13 Rafael J. Wysocki 2009-03-15 03:25:46 UTC
Created attachment 20528 [details]
Patch from Comment #12

x86: fix 64k corruption-check

Impact: fix boot crash

Need to exit early if the addr is far above 64k.

The crash got exposed by:

  78a8b35: x86: make e820_update_range() handle small range update
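The patch description is terse. One plausible reading, stated as an assumption and not as the kernel's actual code: the corruption check walks free memory regions and clamps each one to the 64k limit by subtracting the region's start from the limit. If a region begins above 64k, that unsigned subtraction wraps around instead of going negative, yielding an enormous size. The user-space sketch below shows the failure mode and the early exit; struct region, CHECK_LIMIT and both functions are hypothetical names.

```c
#include <stdint.h>

#define CHECK_LIMIT 0x10000ULL /* 64 KiB: top of the low-memory area being checked */

/* Hypothetical free-memory region reported by firmware. */
struct region {
    uint64_t start;
    uint64_t size;
};

/* Bytes of r that lie below CHECK_LIMIT, with the early exit. */
static uint64_t bytes_below_limit(const struct region *r)
{
    if (r->start >= CHECK_LIMIT)
        return 0; /* early exit: region starts above 64k, nothing to scan */
    if (r->start + r->size > CHECK_LIMIT)
        return CHECK_LIMIT - r->start; /* clamp a region straddling 64k */
    return r->size;
}

/* Same clamp without the early exit: for start > CHECK_LIMIT the
 * subtraction wraps to a huge unsigned value. */
static uint64_t bytes_below_limit_unchecked(const struct region *r)
{
    if (r->start + r->size > CHECK_LIMIT)
        return CHECK_LIMIT - r->start;
    return r->size;
}
```

For a region starting at 0x100000, the guarded version reports 0 bytes, while the unguarded subtraction wraps to a value just below 2^64.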
Comment 14 Zhang Rui 2009-03-18 20:02:01 UTC
Patrick, please try the patch in comment #13 and see whether it helps.
Comment 15 Len Brown 2009-04-14 01:36:34 UTC
Moving to RESOLVED state since a patch has been proposed.
Comment 16 Len Brown 2011-07-30 05:53:42 UTC
closed by:


commit 6d7942dc2a70a7e74c352107b150265602671588
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Sat Mar 14 14:32:41 2009 -0700

    x86: fix 64k corruption-check
    
    Impact: fix boot crash
    
    Need to exit early if the addr is far above 64k.
    
    The crash got exposed by:
    
      78a8b35: x86: make e820_update_range() handle small range update
    
    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Cc: <stable@kernel.org>
    LKML-Reference: <49BC2279.2030101@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
