Created attachment 44962 [details]
kernel configuration used
Trying v2.6.38-rc1 I noticed the following behaviour with suspending (to ram, i.e. no hibernate) on Lenovo Thinkpad X201:
- Entering suspend works fine.
- After suspending, pressing the power button wakes up the computer (LEDs light up)
- A second or so thereafter, it turns off completely.
- A few more seconds later, computer turns itself on again and boots freshly.
Bisecting lead me to commit 5bd5a452662bc37c54fb6828db1a3faf87e6511c (x86: Add NX protection for kernel data). The commit mentions extensions to the CONFIG_DEBUG_RODATA option. I had this option turned on until now; turning it off doesn't make the described suspend bug go away however. Also checked with v2.6.38-rc2, no change.
I'll attach my .config. Please advise when I can help further.
Please test if commit 691513f70d3957939a318da970987b876c720861 works for you.
Specifically, 5bd5a452662bc37c54fb6828db1a3faf87e6511c and 84e1c6bb38eb318e456558b610396d9f1afaabf0 are known broken (and fixed immediately thereafter). It might be useful to redo the bisect, using "git bisect skip" if you land on either of those two commits. (If this is meaningful is hard to know without having seen the exact bisect sequence.)
Could you update the Component ?
Your not on 64 bit kernel but 32 bit one (so 691513f7 will be useless).
I already got another report of falling resume
http://marc.info/?l=linux-kernel&m=129545865717572&w=2, I will look into this.
I suspect this come from the fact that 32 bit wakeup code doesn't use
trampoline, but does it own real mode to protected mode transition.
This is reproducible in qemu, so it should be easy to track what's going on.
Ok, kernel triple fault at jmp pmode_return in wakeup.S
That's because initial_page_table look like :
0x00000000-0x000a0000 640K RW GLB NX pte
0x000a0000-0x00100000 384K RW GLB x pte
0x00100000-0x00355000 2388K ro GLB x pte
0x00355000-0x00424000 828K ro GLB NX pte
0x00424000-0x00600000 1904K RW GLB NX pte
0x00600000-0x07e00000 120M RW PSE GLB NX pmd
0x07e00000-0x07ffd000 2036K RW GLB NX pte
and we are running at 0x9a000 (in nx aera).
I will see how to fix that tomorrow.
All low memory "trampolines" need to be marked !NX. I'm working on a unified trampoline patch, but in the meantime we need a fix for this.
Its depends of the trampoline :
for example the smp 32 bits trampoline doesn't enable paging, so it doesn't need to be !NX.
I'll attach a possible fix.
Created attachment 45102 [details]
What is the point in flipping this back and forth? It seems easier to mark it X when it is allocated and leave it like that.
Created attachment 45192 [details]
This patch make the first 1MB not NX when ACPI_SLEEP is enabled. This should fix the resume issue on 32 bit kernel.
We could make the first 1MB NX if we change the wake trampoline logic to do something like smp trampoline (or even use it). But that's for another patch. Let's fix the regression for the moment.
This is OK as a test patch, but the proper patch should be a tight bound: it should avoid the trampoline reserved areas (and no, it probably will not use the SMP trampoline per se.) However, given the trampoline allocation unification patchset I'm working on it might be easier to wait for that.
ad #1: I'm afraid it does not work. Compiling at this commit shows the same behaviour as 5bd5a45 (as does tag v2.6.38-rc2).
ad #9: I applied this patch on top of 691513f; it made suspending work again.
ad #10: If you need your patchset tested when it's ready, I'd be happy to help.
I posted an *incomplete* version of this patchset (incomplete because reboot is not yet converted, there are modprobe false positive warnings, and it is all one big patch at this point) at:
I applied your unified patchset from #13 to v2.6.38-rc2 and compiled; the kernel failed to wake up from suspend. Would it make sense to test the patch applied upon a different commit?
H. Peter Anvin: but your trampoline stuff won't be ready for 2.6.28 ?
So what could we do ?
Either we apply a patch like the one I proposed in #9, either we revert the NX protection for 32 bit version (but that means path #9 + not doing it for data/bss).
Don't you think #9 is ok for 2.6.28, and we will do a cleaner stuff when your trampoline stuff will be in the next stable version ?
PS : I tried to make a patch that only make the wakeup trampoline not NX, but it failed for an unknown reason. And because I am a bit busy ATM and it is only a temporary solution I got some the #9 patch.
*** Bug 27662 has been marked as a duplicate of this bug. ***
Matthieu, can you prepare a patch reverting the NX change for x86_32, please?
Doesn't #9 patch is okay ?
This revert only the problematic part : ie setting NX for the 1 MB were the wakeup trampoline live.
I have another patch that really fix the problem (and merge 32 and 64 bit wakeup code) :
http://castet.matthieu.free.fr/tmp/wake32_use_trampoline.diff , but I think it is too early for 2.6.28.
I don't really see right now why the change in is_kernel_text() is necessary
(care to explain?). Besides, it looks like Peter wanted something more
If your trampoline merging code is _tested_ and known to work, I don't see
much reason deferring it. It's not exactly a one-liner, but also it's simple
The patch from comment 9 fixes *both* this regression, and the regression regarding S3 suspend I described later in a mail on LKML.
The x86: Add NX protection for kernel data , make is_kernel_text() start at _text instead of PAGE_OFFSET.
Doing that break the current wakeup trampoline code. Why because it enables pagination while it is in low memory (at address < 1MB). The initial_page_table is load, but 1MB memory is now NX, so we do a triple fault.
I tested the trampoline merging code only on qemu (smp=1 and smp=2) and a x86_32 laptop.
Handled-By : CASTET Matthieu <email@example.com>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=45192
Ignore-Patch : https://bugzilla.kernel.org/attachment.cgi?id=45192
Handled-By : H. Peter Anvin <firstname.lastname@example.org>
Patch : https://lkml.org/lkml/2011/2/7/55
I confirm that, with patch from #23, waking up from suspend works.
BTW: At the place where the patch from #23 (d344e38b2c151ca5e5e39f562017127e93912528) entered mainline, a different commit(5d1d0cc87fc0887921993ea0742932e0c8adeda0, bug #28802) breaks resume again (a bit differently though). But cherry-picking d344e38 resolves the issue until 5d1d0cc.
What kernel are you referring to?
Sorry ;) I mean Linus' "main" 2.6 tree, i.e. the one from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
What do you mean by "cherry-picking d344e38", then?
Well, because this bug here prevents resuming even before the other (i915-related) bug kicks in, masking the other bug; bisecting would normally not be useful. So, at every bisect step, I "git cherry-pick"ed the patch from this bug report (that the trampoline is fixed) and I can concentrate on the i915 bug. After testing, I fed git bisect with "good/bad HEAD^".
I see, thanks. Let's track the new problem in bug #28802, then.
*** Bug 28342 has been marked as a duplicate of this bug. ***
Fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d344e38b2c151ca5e5e39f562017127e93912528