Bug 27472

Summary: 5bd5a45 breaks resume from suspend on Thinkpad X201
Product: Platform Specific/Hardware Reporter: Björn Schließmann (chronoss)
Component: i386Assignee: CASTET Matthieu (castet.matthieu)
Severity: normal CC: castet.matthieu, florian, hpa, maciej.rutecki, mat, rjw, tglx, wl
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38-rc Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 27352    
Attachments: kernel configuration used
possible fix
potential fix

Description Björn Schließmann 2011-01-23 23:24:18 UTC
Created attachment 44962 [details]
kernel configuration used

Trying v2.6.38-rc1 I noticed the following behaviour with suspending (to ram, i.e. no hibernate) on Lenovo Thinkpad X201:

- Entering suspend works fine.
- After suspending, pressing the power button wakes up the computer (LEDs light up)
- A second or so thereafter, it turns off completely.
- A few more seconds later, computer turns itself on again and boots freshly.

Bisecting lead me to commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c (x86: Add NX protection for kernel data). The commit mentions extensions to the CONFIG_DEBUG_RODATA option. I had this option turned on until now; turning it off doesn't make the described suspend bug go away however. Also checked with v2.6.38-rc2, no change.

I'll attach my .config. Please advise when I can help further.
Comment 1 Rafael J. Wysocki 2011-01-24 22:12:27 UTC
Please test if commit 691513f70d3957939a318da970987b876c720861 works for you.
Comment 2 H. Peter Anvin 2011-01-24 22:27:36 UTC
Specifically, 5bd5a452662bc37c54fb6828db1a3faf87e6511c and 84e1c6bb38eb318e456558b610396d9f1afaabf0 are known broken (and fixed immediately thereafter).  It might be useful to redo the bisect, using "git bisect skip" if you land on either of those two commits.  (If this is meaningful is hard to know without having seen the exact bisect sequence.)
Comment 3 CASTET Matthieu 2011-01-24 22:36:03 UTC
Could you update the Component ?
Your not on 64 bit kernel but 32 bit one (so 691513f7 will be useless).

I already got another report of falling resume
http://marc.info/?l=linux-kernel&m=129545865717572&w=2, I will look into this.

I suspect this come from the fact that 32 bit wakeup code doesn't use
trampoline, but does it own real mode to protected mode transition.

This is reproducible in qemu, so it should be easy to track what's going on.
Comment 4 CASTET Matthieu 2011-01-24 23:29:36 UTC
Ok, kernel triple fault at jmp	pmode_return in wakeup.S

That's because initial_page_table look like :
0x00000000-0x000a0000         640K     RW             GLB NX pte
0x000a0000-0x00100000         384K     RW             GLB x  pte
0x00100000-0x00355000        2388K     ro             GLB x  pte
0x00355000-0x00424000         828K     ro             GLB NX pte
0x00424000-0x00600000        1904K     RW             GLB NX pte
0x00600000-0x07e00000         120M     RW         PSE GLB NX pmd
0x07e00000-0x07ffd000        2036K     RW             GLB NX pte

and we are running at 0x9a000 (in nx aera).

I will see how to fix that tomorrow.
Comment 5 H. Peter Anvin 2011-01-25 00:54:04 UTC
All low memory "trampolines" need to be marked !NX.  I'm working on a unified trampoline patch, but in the meantime we need a fix for this.
Comment 6 CASTET Matthieu 2011-01-25 09:15:11 UTC
Its depends of the trampoline :
for example the smp 32 bits trampoline doesn't enable paging, so it doesn't need to be !NX.

I'll attach a possible fix.
Comment 7 CASTET Matthieu 2011-01-25 09:20:44 UTC
Created attachment 45102 [details]
possible fix
Comment 8 H. Peter Anvin 2011-01-25 18:49:39 UTC
What is the point in flipping this back and forth?  It seems easier to mark it X when it is allocated and leave it like that.
Comment 9 CASTET Matthieu 2011-01-25 20:40:40 UTC
Created attachment 45192 [details]
potential fix

This patch make the first 1MB not NX when ACPI_SLEEP is enabled. This should fix the resume issue on 32 bit kernel.

We could make the first 1MB NX if we change the wake trampoline logic to do something like smp trampoline (or even use it). But that's for another patch. Let's fix the regression for the moment.
Comment 10 H. Peter Anvin 2011-01-25 21:22:40 UTC
This is OK as a test patch, but the proper patch should be a tight bound: it should avoid the trampoline reserved areas (and no, it probably will not use the SMP trampoline per se.)  However, given the trampoline allocation unification patchset I'm working on it might be easier to wait for that.
Comment 11 Björn Schließmann 2011-01-25 22:32:15 UTC
ad #1: I'm afraid it does not work. Compiling at this commit shows the same behaviour as 5bd5a45 (as does tag v2.6.38-rc2).

ad #9: I applied this patch on top of 691513f; it made suspending work again.

ad #10: If you need your patchset tested when it's ready, I'd be happy to help.
Comment 12 H. Peter Anvin 2011-01-25 23:30:18 UTC
I posted an *incomplete* version of this patchset (incomplete because reboot is not yet converted, there are modprobe false positive warnings, and it is all one big patch at this point) at:

Comment 14 Björn Schließmann 2011-01-26 23:21:59 UTC
I applied your unified patchset from #13 to v2.6.38-rc2 and compiled; the kernel failed to wake up from suspend. Would it make sense to test the patch applied upon a different commit?
Comment 15 CASTET Matthieu 2011-01-27 18:54:47 UTC
H. Peter Anvin: but your trampoline stuff won't be ready for 2.6.28 ?
So what could we do ?

Either we apply a patch like the one I proposed in #9, either we revert the NX protection for 32 bit version (but that means path #9 + not doing it for data/bss).

Don't you think #9 is ok for 2.6.28, and we will do a cleaner stuff when your trampoline stuff will be in the next stable version ?

PS : I tried to make a patch that only make the wakeup trampoline not NX, but it failed for an unknown reason. And because I am a bit busy ATM and it is only a temporary solution I got some the #9 patch.
Comment 16 Maciej Rutecki 2011-01-27 19:35:01 UTC
*** Bug 27662 has been marked as a duplicate of this bug. ***
Comment 17 Rafael J. Wysocki 2011-01-27 19:48:29 UTC
Matthieu, can you prepare a patch reverting the NX change for x86_32, please?
Comment 18 CASTET Matthieu 2011-01-27 21:41:13 UTC
Doesn't #9 patch is okay ?
This revert only the problematic part : ie setting NX for the 1 MB were the wakeup trampoline live.

I have another patch that really fix the problem (and merge 32 and 64 bit wakeup code) :
http://castet.matthieu.free.fr/tmp/wake32_use_trampoline.diff , but I think it is too early for 2.6.28.
Comment 19 Rafael J. Wysocki 2011-01-27 22:38:17 UTC
I don't really see right now why the change in is_kernel_text() is necessary
(care to explain?).  Besides, it looks like Peter wanted something more
fine grained.

If your trampoline merging code is _tested_ and known to work, I don't see
much reason deferring it.  It's not exactly a one-liner, but also it's simple
Comment 20 Matthias Hopf 2011-01-28 16:46:33 UTC
The patch from comment 9 fixes *both* this regression, and the regression regarding S3 suspend I described later in a mail on LKML.
Comment 21 CASTET Matthieu 2011-01-28 22:30:49 UTC
The x86: Add NX protection for kernel data [1], make is_kernel_text() start at _text instead of PAGE_OFFSET.

Doing that break the current wakeup trampoline code. Why because it enables pagination while it is in low memory (at address < 1MB). The initial_page_table is load, but 1MB memory is now NX, so we do a triple fault.

I tested the trampoline merging code only on qemu (smp=1 and smp=2) and a x86_32 laptop.

[1] http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=5bd5a452662bc37c54fb6828db1a3faf87e6511c
Comment 22 Rafael J. Wysocki 2011-01-31 17:57:09 UTC
Handled-By : CASTET Matthieu <castet.matthieu@free.fr>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=45192
Comment 23 Rafael J. Wysocki 2011-02-07 09:48:56 UTC
Ignore-Patch : https://bugzilla.kernel.org/attachment.cgi?id=45192

Handled-By : H. Peter Anvin <hpa@zytor.com>
Patch : https://lkml.org/lkml/2011/2/7/55
Comment 24 Björn Schließmann 2011-02-07 17:33:55 UTC
I confirm that, with patch from #23, waking up from suspend works.
Comment 25 Björn Schließmann 2011-02-10 17:27:13 UTC
BTW: At the place where the patch from #23 (d344e38b2c151ca5e5e39f562017127e93912528) entered mainline, a different commit(5d1d0cc87fc0887921993ea0742932e0c8adeda0, bug #28802) breaks resume again (a bit differently though). But cherry-picking d344e38 resolves the issue until 5d1d0cc.
Comment 26 Rafael J. Wysocki 2011-02-10 18:47:00 UTC
What kernel are you referring to?
Comment 27 Björn Schließmann 2011-02-10 19:45:21 UTC
Sorry ;) I mean Linus' "main" 2.6 tree, i.e. the one from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Comment 28 Rafael J. Wysocki 2011-02-10 20:18:14 UTC
What do you mean by "cherry-picking d344e38", then?
Comment 29 Björn Schließmann 2011-02-11 18:29:01 UTC
Well, because this bug here prevents resuming even before the other (i915-related) bug kicks in, masking the other bug; bisecting would normally not be useful. So, at every bisect step, I "git cherry-pick"ed the patch from this bug report (that the trampoline is fixed) and I can concentrate on the i915 bug. After testing, I fed git bisect with "good/bad HEAD^".
Comment 30 Rafael J. Wysocki 2011-02-11 19:01:14 UTC
I see, thanks.  Let's track the new problem in bug #28802, then.
Comment 31 Rafael J. Wysocki 2011-02-12 22:26:22 UTC
*** Bug 28342 has been marked as a duplicate of this bug. ***