Bug 10927 - Dell laptops: reboot instead of resume
Summary: Dell laptops: reboot instead of resume
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
: 10260 (view as bug list)
Depends on:
Blocks: 7216 10492
  Show dependency tree
 
Reported: 2008-06-16 07:31 UTC by Kirill A. Shutemov
Modified: 2008-07-13 10:48 UTC (History)
8 users (show)

See Also:
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
acpidump (92.70 KB, text/plain)
2008-06-16 07:32 UTC, Kirill A. Shutemov
Details
Debug patch (530 bytes, patch)
2008-06-17 11:53 UTC, Rafael J. Wysocki
Details | Diff
Debug patch #2 (643 bytes, patch)
2008-06-17 14:01 UTC, Rafael J. Wysocki
Details | Diff
Debug patch #3 (735 bytes, patch)
2008-06-18 07:29 UTC, Rafael J. Wysocki
Details | Diff
Debug patch #4 (730 bytes, patch)
2008-06-18 13:18 UTC, Rafael J. Wysocki
Details | Diff
Debug patch #5 (749 bytes, patch)
2008-06-18 13:39 UTC, Rafael J. Wysocki
Details | Diff
Segment normalization patch (1.64 KB, patch)
2008-06-18 15:23 UTC, H. Peter Anvin
Details | Diff
Segment normalization patch (corrected) (1.64 KB, patch)
2008-06-18 15:24 UTC, H. Peter Anvin
Details | Diff
Normalization patch #3 (1.87 KB, patch)
2008-06-19 09:02 UTC, H. Peter Anvin
Details | Diff
Normalization patch #4 (3.84 KB, patch)
2008-06-23 14:11 UTC, H. Peter Anvin
Details | Diff
Normalization patch #5 (3.84 KB, patch)
2008-06-23 14:13 UTC, H. Peter Anvin
Details | Diff
Patch changing the meaning of acpi_wakeup_address (1.05 KB, patch)
2008-06-24 03:02 UTC, Rafael J. Wysocki
Details | Diff
Patch changing the meaning of acpi_wakeup_address #2 (1.14 KB, patch)
2008-06-24 08:04 UTC, Rafael J. Wysocki
Details | Diff
Modified normalization patch #5 (3.90 KB, patch)
2008-06-24 09:10 UTC, Rafael J. Wysocki
Details | Diff
Normalization patch #6 (3.89 KB, patch)
2008-06-24 09:55 UTC, Kirill A. Shutemov
Details | Diff
Normalization patch with the acpi_wakeup_address change, #7 (4.71 KB, patch)
2008-06-24 12:46 UTC, Rafael J. Wysocki
Details | Diff

Description Kirill A. Shutemov 2008-06-16 07:31:22 UTC
Latest working kernel version: v2.6.25-611-gf49688d
Earliest failing kernel version: v2.6.25-612-ge44b7b7
Distribution: ALT Linux Sisyphus
Hardware Environment: Dell XPS M1210
After I type 'echo mem > /sys/power/state' the kernel seems to go to S3 but when I press the button to wake the system restarts like in case of a power on.

I have bisected the bug:

git-bisect start
# bad: [7775c9753b94fe429dc4323360d6502c95e0dd6e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6
git-bisect bad 7775c9753b94fe429dc4323360d6502c95e0dd6e
# good: [e478bec0ba0a83a48a0f6982934b6de079e7e6b3] Linux v2.6.18. Arrr!
git-bisect good e478bec0ba0a83a48a0f6982934b6de079e7e6b3
# good: [bc31d3b2c7d7f2a03721a05cb3c9a3ce8b1e2e5a] [IPSEC] ah: Remove keys from ah_data structure
git-bisect good bc31d3b2c7d7f2a03721a05cb3c9a3ce8b1e2e5a
# good: [bc31d3b2c7d7f2a03721a05cb3c9a3ce8b1e2e5a] [IPSEC] ah: Remove keys from ah_data structure
git-bisect good bc31d3b2c7d7f2a03721a05cb3c9a3ce8b1e2e5a
# good: [d21b05f101ae732d9bc322f13eea2f59c0aa60f5] rdma: SVCRMDA Header File
git-bisect good d21b05f101ae732d9bc322f13eea2f59c0aa60f5
# good: [d21b05f101ae732d9bc322f13eea2f59c0aa60f5] rdma: SVCRMDA Header File
git-bisect good d21b05f101ae732d9bc322f13eea2f59c0aa60f5
# bad: [7cea51be4e91edad05bd834f3235b45c57783f0d] security: fix up documentation for security_module_enable
git-bisect bad 7cea51be4e91edad05bd834f3235b45c57783f0d
# good: [989b0b930218661b504bbb056b309e2c7bcdfb86] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6
git-bisect good 989b0b930218661b504bbb056b309e2c7bcdfb86
# good: [989b0b930218661b504bbb056b309e2c7bcdfb86] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6
git-bisect good 989b0b930218661b504bbb056b309e2c7bcdfb86
# good: [fb8c7fb25d7d754a992481e9f763ec0b5889c4d9] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
git-bisect good fb8c7fb25d7d754a992481e9f763ec0b5889c4d9
# good: [fb8c7fb25d7d754a992481e9f763ec0b5889c4d9] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
git-bisect good fb8c7fb25d7d754a992481e9f763ec0b5889c4d9
# good: [72f74fa25a468f40781e452f1d2528395090d55f] include/asm-x86/ptrace.h: checkpatch cleanups - formatting only
git-bisect good 72f74fa25a468f40781e452f1d2528395090d55f
# bad: [07fe944e87d79f8d7e1b090913fe9f2ace78f41d] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
git-bisect bad 07fe944e87d79f8d7e1b090913fe9f2ace78f41d
# good: [d7bb545d86825e635cab33a1dd81ca0ad7b92887] Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
git-bisect good d7bb545d86825e635cab33a1dd81ca0ad7b92887
# good: [746f2244065ddfbe0c5d339e309db4d2b48f185b] x86: unify arch/x86/kernel/mpparse_64.c
git-bisect good 746f2244065ddfbe0c5d339e309db4d2b48f185b
# good: [17c5aab5b34e351531466e35b154ca86db7d46a9] sata_mv cosmetics
git-bisect good 17c5aab5b34e351531466e35b154ca86db7d46a9
# bad: [9732b6112343df2872518ec6701c8ef729310a05] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb
git-bisect bad 9732b6112343df2872518ec6701c8ef729310a05
# bad: [e44b7b7525ad9d43163ab5e60c784325419e0ea6] x86: move suspend wakeup code to C
git-bisect bad e44b7b7525ad9d43163ab5e60c784325419e0ea6
# good: [03056c88cf65ec8375753900246b36ae1c4b8a33] x86: remove superfluous initialisation in boot code.
git-bisect good 03056c88cf65ec8375753900246b36ae1c4b8a33
# good: [6107a7c4e2a871c37bb6c49e5e8286079f0968f9] x86: use cpu_online()
git-bisect good 6107a7c4e2a871c37bb6c49e5e8286079f0968f9
# good: [df96323dfaebdf7e17cdf0656096e6ab2158ec76] x86: section mismatch fixes, #1
git-bisect good df96323dfaebdf7e17cdf0656096e6ab2158ec76
# good: [f49688d459c5eaa62db3597cbfd3cb13e361d415] x86: coding style fixes to arch/x86/kernel/acpi/sleep.c
git-bisect good f49688d459c5eaa62db3597cbfd3cb13e361d415

Steps to reproduce:
1. echo mem > /sys/power/state
2. press power button.
Comment 1 Kirill A. Shutemov 2008-06-16 07:32:38 UTC
Created attachment 16506 [details]
acpidump
Comment 2 Rafael J. Wysocki 2008-06-16 07:55:25 UTC
So,

commit e44b7b7525ad9d43163ab5e60c784325419e0ea6
Author: Pavel Machek <pavel@suse.cz>
Date:   Thu Apr 10 23:28:10 2008 +0200

    x86: move suspend wakeup code to C

    Signed-off-by: Pavel Machek <pavel@suse.cz>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

is the first bad commit?
Comment 3 Kirill A. Shutemov 2008-06-16 08:02:48 UTC
Yes.
Comment 4 Rafael J. Wysocki 2008-06-16 08:08:14 UTC
Is the system (kernel) 32-bit or 64-bit?
Comment 5 Kirill A. Shutemov 2008-06-16 08:13:58 UTC
64-bit
Comment 6 Rafael J. Wysocki 2008-06-16 08:56:57 UTC
Can you boot the kernel with "acpi_sleep=s3_beep" in the command line and see if it beeps?
Comment 7 Kirill A. Shutemov 2008-06-16 09:26:36 UTC
It doesn't beep.
Comment 8 Rafael J. Wysocki 2008-06-16 09:43:29 UTC
Thanks.

Now, there are two problems possible and I'll have to figure a reliable way of verifying which of them manifests itself.

First, it's possible that we are called with a wrong values of CS:IP in the beginning, so the "movl $wakeup_stack_end, %esp" in arch/x86/kernel/acpi/realmode/wakeup.S places the stack at a wrong location.

Second, the trampoline segment might be wrong.

Would it be possible to try a 32-bit version of the failing kernel on this box?
Comment 9 Pavel Machek 2008-06-17 03:41:34 UTC
Hi!

> Now, there are two problems possible and I'll have to figure a reliable way
> of
> verifying which of them manifests itself.
> 
> First, it's possible that we are called with a wrong values of CS:IP in the
> beginning, so the "movl $wakeup_stack_end, %esp" in
> arch/x86/kernel/acpi/realmode/wakeup.S places the stack at a wrong location.
> 
> Second, the trampoline segment might be wrong.

One way to check that is place jmp $ at the begining -- if it hangs
instead of reboot, it reaches the beggining.

Then place jmp $ after %esp setup, and after first push to see how far
it goes...
									Pavel
Comment 10 Rafael J. Wysocki 2008-06-17 11:53:37 UTC
Created attachment 16528 [details]
Debug patch

So, if we comment out the "pushl $0 ; popfl" thing and the CS:IP are not as expected, we should detect a wrong signature and just halt?

Kirill, please apply the attached patch and see if anything changes, thanks.
Comment 11 Kirill A. Shutemov 2008-06-17 13:43:01 UTC
> Kirill, please apply the attached patch and see if anything changes, thanks.

No. It still reboots.
Comment 12 Rafael J. Wysocki 2008-06-17 14:01:38 UTC
Created attachment 16529 [details]
Debug patch #2

Thanks for testing.

Please apply this patch instead and see if it changes anything.
Comment 13 Kirill A. Shutemov 2008-06-17 14:51:37 UTC
I have played with loop inserting.

If we insert loop before the "pushl $0 ; popfl" it hangs, but it reboots if we insert after.

If we comment out the "pushl $0 ; popfl" it reachs "calll main"(hangs if we insert loop before it), but it reboots then.
Comment 14 Kirill A. Shutemov 2008-06-17 14:58:58 UTC
It hangs with Debug patch #2
Comment 15 H. Peter Anvin 2008-06-17 15:06:04 UTC
Okay, so the stack memory or stack pointer is buggered...
Comment 16 Rafael J. Wysocki 2008-06-17 15:10:26 UTC
(In reply to comment #14)
> It hangs with Debug patch #2 

This is consistent with your observations in Comment #13.

What happens if you comment out the "pushl $0 ; popfl" along with the "calll main", without the debug patch #2?
Comment 17 Rafael J. Wysocki 2008-06-17 15:12:02 UTC
(In reply to comment #15)
> Okay, so the stack memory or stack pointer is buggered...

Hm, well, why the value of DS is correct in that case?
Comment 18 H. Peter Anvin 2008-06-17 15:16:38 UTC
What about SS?
Comment 19 Rafael J. Wysocki 2008-06-17 15:22:16 UTC
My point is, if the value of DS is correct, then the value of SS should be correct too, as we set them both in the same way.  What am I missing?
Comment 20 H. Peter Anvin 2008-06-17 15:27:32 UTC
Looking at it, *how* do you know DS is actually correct?

It seems to me that all we have is the *assumption* that the entrypoint is at CS:0, which isn't inherently a given.
Comment 21 Rafael J. Wysocki 2008-06-17 15:34:04 UTC
(In reply to comment #20)
> It seems to me that all we have is the *assumption* that the entrypoint is at
> CS:0, which isn't inherently a given.

Indeed we assume so, but the signature test should fail if that's not the case and it doesn't.  Well, I'm certainly missing something ...
 
Comment 22 H. Peter Anvin 2008-06-17 15:49:11 UTC
How do you know it's not failing?
Comment 23 H. Peter Anvin 2008-06-17 15:50:35 UTC
Something that does come to mind is if it at all possible we're not actually in real mode when this code is entered?
Comment 24 Rafael J. Wysocki 2008-06-17 15:57:07 UTC
(In reply to comment #22)
> How do you know it's not failing?

Because if it had failed, the kernel with the patch from Comment #10 would have hanged, but it hangs with the patch from Comment #12 instead.
Comment 25 Rafael J. Wysocki 2008-06-17 15:59:28 UTC
(In reply to comment #23)
> Something that does come to mind is if it at all possible we're not actually
> in
> real mode when this code is entered?

In that case we would get an exception on an attempt to load DS, ES and SS with a code segment selector, I think.
Comment 26 H. Peter Anvin 2008-06-17 16:02:51 UTC
No, that's not true at all.  Most code segment selectors are also valid data segment selectors, but will trap on a write through that segment.
Comment 27 H. Peter Anvin 2008-06-17 16:04:29 UTC
Do we have any way to get data out of this box, e.g. a UART serial port?
Comment 28 Rafael J. Wysocki 2008-06-17 16:16:05 UTC
(In reply to comment #26)
> No, that's not true at all.  Most code segment selectors are also valid data
> segment selectors, but will trap on a write through that segment.

Well, that might be going on, then.

Still, I wonder why the old code worked in that case.  It wasn't so much different AFAICS ...
Comment 29 H. Peter Anvin 2008-06-17 16:24:08 UTC
*Grumble*

My $LARGE_POLITICAL_ENTITY for an ICE for stuff like this.

We need to get some info out.  What do we have available?
Comment 30 Pavel Machek 2008-06-18 01:33:22 UTC
> Something that does come to mind is if it at all possible we're not actually
> in
> real mode when this code is entered?

Hmm, that would be bad and against the spec. Not sure if that's
possible. Anything else than CS:0 would be against the spec, too,
IIRC.

Now... what is different from the "old" code... the old code did not
need a stack in real mode at all (if it did not need to call video
bios etc) -- so if you commented out pushl 0; popfl, it went back to
protected mode without using stack at all. (And yes, some strange
machines needed that).

I'm not sure what is going on here; one crazy explanation would be
"%rsp has high bits set"...

Hmm, can we get away with staying on whatever stack is provided to us
by BIOS?
									Pavel
Comment 31 Kirill A. Shutemov 2008-06-18 04:43:14 UTC
(In reply to comment #16)
> What happens if you comment out the "pushl $0 ; popfl" along with the "calll
> main", without the debug patch #2?

It reboots on "pushw $0", line 89.
Comment 32 Rafael J. Wysocki 2008-06-18 07:27:42 UTC
(In reply to comment #29)
> *Grumble*
> 
> My $LARGE_POLITICAL_ENTITY for an ICE for stuff like this.
> 
> We need to get some info out.  What do we have available?

1. We know that the value of DS is correct, so the value of SS should be correct too, because it's the same value, and therefore our assumption that we've been called at CS:0 is satisfied.

2. We know that write accesses to the stack segment cause reboots.
Comment 33 Rafael J. Wysocki 2008-06-18 07:29:13 UTC
Created attachment 16535 [details]
Debug patch #3

Kirill, please apply this patch instead of the debug patch #2 and see if that reboots.
Comment 34 H. Peter Anvin 2008-06-18 08:50:53 UTC
1. We know that the value of DS is correct, so the value of SS should be
correct too, because it's the same value, and therefore our assumption that
we've been called at CS:0 is satisfied.

2. We know that write accesses to the stack segment cause reboots.

---

Assuming that is correct, this means we're either in Protected Mode or we're in real mode, but a transition from protected mode to real mode was botched and there are broken segment attributes bits still in the descriptor registers.

The SMSW instruction can be used to tell if we're in protected mode or not.
Comment 35 H. Peter Anvin 2008-06-18 08:57:00 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> I'm not sure what is going on here; one crazy explanation would be
> "%rsp has high bits set"...

Impossible, unless we're in *long* mode...

	-hpa
Comment 36 Kirill A. Shutemov 2008-06-18 10:44:27 UTC
(In reply to comment #33)
> Created an attachment (id=16535) [details]
> Debug patch #3
> 
> Kirill, please apply this patch instead of the debug patch #2 and see if that
> reboots.

No. It hangs. 
Comment 37 H. Peter Anvin 2008-06-18 11:09:11 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> (In reply to comment #33)
>> Created an attachment (id=16535)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16535&action=view)
>  [details]
>> Debug patch #3
>>
>> Kirill, please apply this patch instead of the debug patch #2 and see if
>> that
>> reboots.
> 
> No. It hangs. 

Okay, I'm confused about this patch; what was the intent with it?
Comment 38 Rafael J. Wysocki 2008-06-18 13:12:24 UTC
That was to check if it would trap on an attempt to write to a know location in the data segment.  It wouldn't, so it means DS is rather not loaded with a code segment selector and obviously the same applies to SS.

Botched tansition from protected mode to real mode seems to be most likely to me at the moment.
Comment 39 Pavel Machek 2008-06-18 13:15:19 UTC
Is there any chance to test 32-bit kernel? With this applied to 32-bit
kernel, we should be able to resume without touching stack at all.

(Doing it in 64-bit kernel is not so easy, because we put address on
the stack and ret, instead of jmp...)

diff --git a/arch/x86/kernel/acpi/realmode/wakeup.S b/arch/x86/kernel/acpi/realmode/wakeup.S
index f9b77fb..b91b2ea 100644
--- a/arch/x86/kernel/acpi/realmode/wakeup.S
+++ b/arch/x86/kernel/acpi/realmode/wakeup.S
@@ -42,22 +42,6 @@ _start:
 
 	movl	$wakeup_stack_end, %esp
 
-	/* Clear the EFLAGS */
-	pushl	$0
-	popfl
-
-	/* Check header signature... */
-	movl	signature, %eax
-	cmpl	$0x51ee1111, %eax
-	jne	bogus_real_magic
-
-	/* Check we really have everything... */
-	movl	end_signature, %eax
-	cmpl	$0x65a22c82, %eax
-	jne	bogus_real_magic
-
-	/* Call the C code */
-	calll	main
 
 	/* Do any other stuff... */
 
 
Comment 40 Rafael J. Wysocki 2008-06-18 13:18:28 UTC
Created attachment 16536 [details]
Debug patch #4

Now, let's try to write to the location pointed to by %sx.

Kirill, please test if it hangs with this patch instead of the previous one.
Comment 41 Kirill A. Shutemov 2008-06-18 13:24:31 UTC
(In reply to comment #40)
> Created an attachment (id=16536) [details]
> Debug patch #4
> 
> Now, let's try to write to the location pointed to by %sx.
> 
> Kirill, please test if it hangs with this patch instead of the previous one.
> 

$ make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  AS      arch/x86/kernel/acpi/realmode/wakeup.o
arch/x86/kernel/acpi/realmode/wakeup.S: Assembler messages:
arch/x86/kernel/acpi/realmode/wakeup.S:97: Error: `(%sp)' is not a valid base/index expression
make[3]: *** [arch/x86/kernel/acpi/realmode/wakeup.o] Памылка 1
make[2]: *** [arch/x86/kernel/acpi/realmode/wakeup.bin] Памылка 2
make[1]: *** [arch/x86/kernel/acpi] Памылка 2
make: *** [arch/x86/kernel] Памылка 2
Comment 42 H. Peter Anvin 2008-06-18 13:33:15 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> Created an attachment (id=16536)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16536&action=view)
> Debug patch #4
> 
> Now, let's try to write to the location pointed to by %sx.

Uhm, (%sp) is not a valid addressing form.  Only %bx, %bp, %si, and %di 
can be used to form 16-bit addresses; addresses which include %bp are 
implicitly %ss-based.

	-hoa
Comment 43 Rafael J. Wysocki 2008-06-18 13:39:32 UTC
Created attachment 16537 [details]
Debug patch #5

My bad, sorry.  This one should compile, hopefully.
Comment 44 Kirill A. Shutemov 2008-06-18 13:59:27 UTC
(In reply to comment #43)
> Created an attachment (id=16537) [details]
> Debug patch #5

It hangs too.
Comment 45 H. Peter Anvin 2008-06-18 14:02:12 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10927
> 
> 
> 
> 
> 
> ------- Comment #44 from kirill@shutemov.name  2008-06-18 13:59 -------
> (In reply to comment #43)
>> Created an attachment (id=16537)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16537&action=view)
>  [details]
>> Debug patch #5
> 
> It hangs too.

What happens if you change "(%bx)" to "%ss:(%bx)"?

	-hpa
Comment 46 Kirill A. Shutemov 2008-06-18 14:16:53 UTC
(In reply to comment #45)
> What happens if you change "(%bx)" to "%ss:(%bx)"?

Reboot.
Comment 47 H. Peter Anvin 2008-06-18 14:25:58 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> (In reply to comment #45)
>> What happens if you change "(%bx)" to "%ss:(%bx)"?
> 
> Reboot.

Okay, it's definitely %ss being buggered, and the only thing that can 
bugger up %ss but not %ds downstream of that code is a botched 
transition out of protected mode.  Rewriting all the real-mode code to 
not use the stack is obviously not realistic (can't make any BIOS calls 
that way), but it would be possible to dart into protected mode using 
only %cs: references to set up a GDT, then setting all the segments to 
sane, real-mode-like segments, and darting back out.

	-hpa
Comment 48 Rafael J. Wysocki 2008-06-18 14:53:43 UTC
I wonder, what happens if we load %ss with 0 initially and then with the %cs value?  The null selector is invalid in protected mode so that might cause the CPU to wipe out the hidden part of the register ...
Comment 49 H. Peter Anvin 2008-06-18 15:03:01 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #48 from rjw@sisk.pl  2008-06-18 14:53 -------
> I wonder, what happens if we load %ss with 0 initially and then with the %cs
> value?  The null selector is invalid in protected mode so that might cause
> the
> CPU to wipe out the hidden part of the register ...

Absolutely nothing.  If you were in protected mode, loading %ds and %ss 
with the same value would set the attribute bits to the same thing. 
Since the attribute bits didn't change, you're in real mode, and there 
are no null selectors in real mode.

	-hpa
Comment 50 Pavel Machek 2008-06-18 15:11:05 UTC
Hidden parts should really be refreshed as soon as anything is written there, but perhaps this helps? (Implementation of Rafael's idea)

--- a/arch/x86/kernel/acpi/realmode/wakeup.S
+++ b/arch/x86/kernel/acpi/realmode/wakeup.S
@@ -34,6 +34,11 @@ _start:
        cli
        cld
 
+       xorw    %ax, %ax
+       movw    %ax, %ds
+       movw    %ax, %es
+       movw    %ax, %ss
+
        /* Set up segments */
        movw    %cs, %ax
        movw    %ax, %ds
Comment 51 Rafael J. Wysocki 2008-06-18 15:13:09 UTC
(In reply to comment #49)

Fun, fun. :-(

Okay, perhaps I'll be able to write this patch, but since I haven't done anything similar for years, I'll need some assistance.
Comment 52 H. Peter Anvin 2008-06-18 15:23:24 UTC
Created attachment 16538 [details]
Segment normalization patch

Segment normalization patch.  Try this one on for size.
Comment 53 H. Peter Anvin 2008-06-18 15:24:42 UTC
Created attachment 16539 [details]
Segment normalization patch (corrected)

Correction to segment normalization patch...
Comment 54 H. Peter Anvin 2008-06-18 15:25:45 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> Created an attachment (id=16538)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16538&action=view)
> Segment normalization patch
> 
> Segment normalization patch.  Try this one on for size.

Don't use this one... minor thinko.

	-hpa
Comment 55 H. Peter Anvin 2008-06-18 20:05:03 UTC
Note: if the "corrected" patch works, we should tidy it up to minimize the dependencies.  In particular, if we let the kernel pre-patch the segmentation information, we can get rid of any dependency on writes.  We can reduce it to requiring only that one segment is readable; it's not 100% clear which one to use -- DS,ES,FS,GS could be null, SS could have too short of a limit, and CS could be non-readable.  CS is still probably the safest one, though.
Comment 56 Kirill A. Shutemov 2008-06-18 22:45:43 UTC
(In reply to comment #53)
> Created an attachment (id=16539) [details]
> Segment normalization patch (corrected)
> 
> Correction to segment normalization patch...
> 

$ make 
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/linux/compile.h
  LD      arch/x86/kernel/acpi/realmode/wakeup.elf
arch/x86/kernel/acpi/realmode/wakeup.o: In function `wakeup_gdt':
arch/x86/kernel/acpi/realmode/wakeup.S:(.data+0x0): undefined reference to `wakeup_gdt_len'
make[3]: *** [arch/x86/kernel/acpi/realmode/wakeup.elf] Error 1
make[2]: *** [arch/x86/kernel/acpi/realmode/wakeup.bin] Error 2
make[1]: *** [arch/x86/kernel/acpi] Error 2
make: *** [arch/x86/kernel] Error 2
Comment 57 H. Peter Anvin 2008-06-18 23:00:28 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> arch/x86/kernel/acpi/realmode/wakeup.o: In function `wakeup_gdt':
> arch/x86/kernel/acpi/realmode/wakeup.S:(.data+0x0): undefined reference to
> `wakeup_gdt_len'

Just add the line:

wakeup_gdt_len = 3*8

... anywhere in the file...

	-hpa
Comment 58 Kirill A. Shutemov 2008-06-18 23:24:10 UTC
It works!
Comment 59 Kirill A. Shutemov 2008-06-18 23:39:54 UTC
I have problem with video restore so I have tried to boot with acpi_sleep=s3_bios and acpi_sleep=s3_mode. It reboots in both case. :( acpi_sleep=s3_beep works fine. Is it related to this bug?
Comment 60 H. Peter Anvin 2008-06-18 23:46:22 UTC
This is with the patch?
Comment 61 Kirill A. Shutemov 2008-06-18 23:55:18 UTC
Yes.
Comment 62 H. Peter Anvin 2008-06-19 08:41:17 UTC
It could be related - clearly the resume code is entered with state scrambled enough that the segment descriptors aren't even sanitized, there is other state that could be in a bizarre state.

However, we should almost certainly explicitly set up a real-mode IDT; that might help and is something we can do easily enough.
Comment 63 H. Peter Anvin 2008-06-19 09:02:20 UTC
Created attachment 16549 [details]
Normalization patch #3

Please try this patch.  It is the segment normalization patch #2 with the addition of explicitly setting the IDT.
Comment 64 Kirill A. Shutemov 2008-06-19 09:28:45 UTC
(In reply to comment #63)
> Created an attachment (id=16549) [details]
> Normalization patch #3
> 
> Please try this patch.  It is the segment normalization patch #2 with the
> addition of explicitly setting the IDT.
> 

It doesn't reboot, but it hangs with acpi_sleep=s3_bios. s3_mode doesn't hang or reboot system, but screen still black.
Comment 65 Pavel Machek 2008-06-19 13:21:54 UTC
Well, see suspend.sf.net for video debugging instructions.

Unless video resume worked before, this is separate bug.
Comment 66 Kirill A. Shutemov 2008-06-19 14:09:19 UTC
Ok. Will this patch be in 2.6.26?
Comment 67 Rafael J. Wysocki 2008-06-19 14:39:12 UTC
(In reply to comment #63)
> Created an attachment (id=16549) [details]
> Normalization patch #3
> 
> Please try this patch.  It is the segment normalization patch #2 with the
> addition of explicitly setting the IDT.

hpa, many thanks for the patch.  If I had tried to prepare one, it would have taken much more time.

It is a regression fix, so it would be nice to push it for 2.6.26.  Will you take care of it or should I push it upstream?
Comment 68 Pavel Machek 2008-06-19 14:50:28 UTC
> > Created an attachment (id=16549)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16549&action=view)
>  [details]
> > Normalization patch #3
> > 
> > Please try this patch.  It is the segment normalization patch #2 with the
> > addition of explicitly setting the IDT.
> 
> hpa, many thanks for the patch.  If I had tried to prepare one, it would have
> taken much more time.
> 
> It is a regression fix, so it would be nice to push it for 2.6.26.  Will you
> take care of it or should I push it upstream?

yes, it is a regression fix, but it works around extremely broken
BIOS, and I'd call it slightly risky: we go (cripled?) real mode ->
protected mode -> real mode -> protected mode now.

(Note that even old code would die horribly as soon as you enabled
acpi_sleep=s3_bios/s3_mode...)

I'd prefer this to go into 2.6.27.

(And yes, hpa's assembly-coding skills are wonderful).
Comment 69 H. Peter Anvin 2008-06-19 15:07:51 UTC
I'd like to clean it up first to not require a writable code segment.
Comment 70 Rafael J. Wysocki 2008-06-19 15:56:52 UTC
On Thursday, 19 of June 2008, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #68 from pavel@suse.cz  2008-06-19 14:50 -------
> 
> > > Created an attachment (id=16549)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=16549&action=view)
> >  --> (http://bugzilla.kernel.org/attachment.cgi?id=16549&action=view)
> [details]
> > > Normalization patch #3
> > > 
> > > Please try this patch.  It is the segment normalization patch #2 with the
> > > addition of explicitly setting the IDT.
> > 
> > hpa, many thanks for the patch.  If I had tried to prepare one, it would
> have
> > taken much more time.
> > 
> > It is a regression fix, so it would be nice to push it for 2.6.26.  Will
> you
> > take care of it or should I push it upstream?
> 
> yes, it is a regression fix, but it works around extremely broken
> BIOS, and I'd call it slightly risky: we go (cripled?) real mode ->
> protected mode -> real mode -> protected mode now.

And what the risk is, actually?

> (Note that even old code would die horribly as soon as you enabled
> acpi_sleep=s3_bios/s3_mode...)
> 
> I'd prefer this to go into 2.6.27.

Shall we let the upstream decide, perhaps?
Comment 71 Rafael J. Wysocki 2008-06-19 15:57:04 UTC
(In reply to comment #69)
> I'd like to clean it up first to not require a writable code segment.

OK
Comment 72 Jason Riedy 2008-06-20 17:55:11 UTC
BTW, the patch fixes my Dell D620 as well.
Comment 73 Adrian Bunk 2008-06-22 02:38:21 UTC
Handled-By      : H. Peter Anvin <hpa@kernel.org> 
Comment 74 Rafael J. Wysocki 2008-06-22 11:00:01 UTC
Patch : http://bugzilla.kernel.org/attachment.cgi?id=16549&action=view
Comment 75 Kirill A. Shutemov 2008-06-22 13:14:43 UTC
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Comment 76 H. Peter Anvin 2008-06-23 14:11:08 UTC
Created attachment 16591 [details]
Normalization patch #4

Updated patch which doesn't require a writable code segment, and with proper metadata for submission.

Please test this out, and if it works, please add it to the ACPI tree.
Comment 77 H. Peter Anvin 2008-06-23 14:13:54 UTC
Created attachment 16592 [details]
Normalization patch #5

Erk, fix trivial braindamage
Comment 78 Kirill A. Shutemov 2008-06-23 16:18:09 UTC
(In reply to comment #77)
> Created an attachment (id=16592) [details]
> Normalization patch #5
> 
> Erk, fix trivial braindamage
> 

It doesn't work. My laptop reboots.
Comment 79 H. Peter Anvin 2008-06-23 16:25:10 UTC
Okay... wtf... perhaps I'm confused as to the meaning of acpi_wakeup_address?  In particular, what variable do I use to get the *physical* address of the wakeup page in low memory?
Comment 80 Rafael J. Wysocki 2008-06-24 02:33:17 UTC
acpi_wakeup_address is a virtual address (virt_to_phys() is applied to it before it's passed to the firmware).
Comment 81 Rafael J. Wysocki 2008-06-24 02:42:57 UTC
And we don't cache the physical address itself anywhere AFAICS.
Comment 82 Rafael J. Wysocki 2008-06-24 03:02:25 UTC
Created attachment 16596 [details]
Patch changing the meaning of acpi_wakeup_address

Please apply this patch additionally and see if that helps.
Comment 83 Kirill A. Shutemov 2008-06-24 05:54:28 UTC
(In reply to comment #82)
> Created an attachment (id=16596) [details]
> Patch changing the meaning of acpi_wakeup_address
> 
> Please apply this patch additionally and see if that helps.
> 

No. Still reboots.
Comment 84 Rafael J. Wysocki 2008-06-24 08:04:33 UTC
Created attachment 16598 [details]
Patch changing the meaning of acpi_wakeup_address #2

Please boot with this patch applied instead of the previous one and see what is printed after "ACPI: Wake-up address set to ".
Comment 85 Rafael J. Wysocki 2008-06-24 09:10:44 UTC
Created attachment 16599 [details]
Modified normalization patch #5

This (slightly modified) normalization patch on top of the patch from Comment  #84 works correctly on my test box.

Kirill, can you test it, please?
Comment 86 Kirill A. Shutemov 2008-06-24 09:51:53 UTC
Yes, it works.
Comment 87 Kirill A. Shutemov 2008-06-24 09:55:03 UTC
Created attachment 16601 [details]
Normalization patch #6

Strip trailing whitespace.
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Comment 88 H. Peter Anvin 2008-06-24 09:59:36 UTC
Okay, I'm confused... what changed, other than formatting?
Comment 89 H. Peter Anvin 2008-06-24 10:02:25 UTC
Nevermind, see the parenthization error.  That'd do it.
Comment 90 H. Peter Anvin 2008-06-24 10:03:23 UTC
Rafael: will you push this upstream?  (I'd rather it flows to you even though I'll come right back our way.)
Comment 91 Rafael J. Wysocki 2008-06-24 12:46:21 UTC
Created attachment 16604 [details]
Normalization patch with the acpi_wakeup_address change, #7

OK

Now, because it's necessary to change the meaning of acpi_wakeup_address for the normalization patch to work, I've merged the normalization patch #6 with the patch from Comment #82 and the result is attached.  That's the one I'm going to push upstream.

I tested it, but Kirill please verify that it works for you.
Comment 92 H. Peter Anvin 2008-06-24 12:50:30 UTC
Well, it's not technically necessary, we could to the virt_to_phys() in acpi_save_state_mem() just as well.
Comment 93 Rafael J. Wysocki 2008-06-24 12:52:22 UTC
Whatever.  Still, it does make sense to cache the physical address, IMO.
Comment 94 Kirill A. Shutemov 2008-06-24 13:27:56 UTC
(In reply to comment #91)
> Created an attachment (id=16604) [details]
> Normalization patch with the acpi_wakeup_address change, #7
>
> I tested it, but Kirill please verify that it works for you.

The patch works properly.
Comment 95 Rafael J. Wysocki 2008-06-24 14:25:29 UTC
Thanks for testing, the patch is in -mm, so hopefully it'll be merged soon.
Comment 96 Kirill A. Shutemov 2008-06-24 14:28:17 UTC
Thanks a lot.
Comment 97 Jason Riedy 2008-06-24 20:18:21 UTC
Little late in responding, but the latest patch works for a D620.  Thank you!
Comment 98 Len Brown 2008-06-26 18:48:06 UTC
marking RESOLVED, as the patch in comment #91 seems to work,
(and is queued in the ACPI test tree)
Comment 99 H. Peter Anvin 2008-06-26 18:55:11 UTC
Sadly, the patch has been reported (by Ingo Molnar) to cause a regression on ASUS A8N-E.
Comment 100 Rafael J. Wysocki 2008-06-27 06:14:04 UTC
But Ingo said later that the problew was unrelated to this patch.
References : http://marc.info/?l=linux-kernel&m=121446903218446&w=4
Comment 101 H. Peter Anvin 2008-06-29 16:46:33 UTC
Okay, then.  Then I consider this issue closed.
Comment 102 Rafael J. Wysocki 2008-07-05 13:00:00 UTC
Regressions list annotation:
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=16604&action=view
Comment 104 Rafael J. Wysocki 2008-07-13 10:48:20 UTC
*** Bug 10260 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.