Distribution: Debian Testing
Hardware Environment: VIA KT400/KT400A/KT600
Software Environment:
Problem Description: waking up from the S3 state leads to an instant reboot.
Steps to reproduce: echo -n "mem" > /sys/power/state, then press the lid button or power button to wake up the laptop.
Created attachment 3849 [details] lspci from my laptop
Created attachment 3850 [details] lspci -V from my laptop
A long discussion/debug process on the acpi-devel mailing list pointed out that the reboot is triggered by a pushl $0 in arch/i386/kernel/acpi/wakeup.S.

Code from arch/i386/kernel/acpi/wakeup.S:

[...]
	mov	$(wakeup_stack - wakeup_code), %sp	# Private stack is needed for ASUS board
	movw	$0x0e00 + 'S', %fs:(0x12)

	pushl	$0	###THIS LINE CAUSES REBOOT!	# Kill any dangerous flags
	popfl

	movl	real_magic - wakeup_code, %eax
[...]

With pushl $0 and popfl commented out, the wakeup code goes on until another issue blocks it later (more on this will be added as soon as we point something out on the acpi-devel mailing list). Two patches were proposed on the ml (wakeup_address.patch and wakeup_gdt.patch), but they don't change things on this laptop (while they may fix other wakeup problems).
So the suspend works, but when you wake up the system, does it reboot instantly, or does it start to resume and then reboot?
It starts resuming and enters wakeup.S, where the pushl $0 triggers an instant reboot; beeping dead-loops were used to narrow it down to a single line. NOTE: a workaround other than removing those two lines has to be found, but other issues freeze (not reboot) this unlucky laptop upon S3 resume. As soon as I understand a bit more about this I'll file a new bugzilla entry.
How could pushl $0 cause a reboot in real mode? I guess it is in protected mode.
I suggest you try the following patch to see whether "pushl $0" still causes a reboot.

--- linux-2.6.10-rc2/arch/i386/kernel/acpi/wakeup.S.orig 2004-12-02 18:18:01.132147992 -0800
+++ linux-2.6.10-rc2/arch/i386/kernel/acpi/wakeup.S 2004-12-02 18:17:34.872140120 -0800
@@ -20,6 +20,8 @@
 wakeup_code_start = .
 	.code16
 
+	xor	%eax, %eax
+	movl	%eax, %cr0
 	movw	$0xb800, %ax
 	movw	%ax,%fs
 	movw	$0x0e00 + 'L', %fs:(0x10)
Sorry for the late answer, but university has been pressing in the last weeks.. Luming, this patch doesn't help, reboot occurs as usual :( could i post other useful info about the system? maybe the DSDT?
Hmm, strange. Would you try replacing

	pushl	$0	###THIS LINE CAUSES REBOOT!	# Kill any dangerous flags

with

	xorw	%ax, %ax
	pushw	%ax

and see what happens? I'm not sure it will help; I just wonder why my objdump shows that the GNU assembler emits 6A instead of 68 for "pushl $0".

PS:
	6A	Push imm8
	68	Push imm16/imm32
no.. usual hated reboot... By the way, objdump gives this code (this is the object file without the last xorw %ax, %ax / push %ax modification you suggested):

**************
linux-2.6.10-rc2-bk9/arch/i386/kernel/acpi/wakeup.o:     file format elf32-i386

Disassembly of section .text:

00000000 <wakeup_start>:
...
  22:	53		push	%ebx
  23:	0e		push	%cs
  24:	66 6a 00	pushw	$0x0
  27:	66 9d		popfw
...
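A note on reading that disassembly: objdump decoded wakeup.o as 32-bit code (elf32-i386), where the 0x66 prefix selects a 16-bit operand, hence "pushw" and "popfw". The wakeup code is assembled under .code16, where the same prefix selects a 32-bit operand, so the bytes 66 6a 00 and 66 9d most likely are the pushl $0 / popfl from the source. Opcode 6A is push imm8 with the immediate sign-extended to the operand size, which is why the assembler can use it instead of 68 for $0. A minimal C sketch of that prefix rule follows; the function names are made up for illustration and are not part of any kernel or binutils API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: effective operand size in bits for an x86
 * instruction, given the default size of the code being executed
 * (16 or 32) and whether a 0x66 operand-size prefix is present.
 * The prefix toggles between the two sizes.
 */
int operand_size_bits(int default_bits, bool has_66_prefix)
{
    assert(default_bits == 16 || default_bits == 32);
    if (!has_66_prefix)
        return default_bits;
    return default_bits == 16 ? 32 : 16;
}

/*
 * Bytes written to the stack by "6a ib" (push imm8): the immediate is
 * sign-extended to the effective operand size before being pushed.
 */
int push_imm8_stack_bytes(int default_bits, bool has_66_prefix)
{
    return operand_size_bits(default_bits, has_66_prefix) / 8;
}
```

Under these assumptions, push_imm8_stack_bytes(16, true) gives 4: in 16-bit code the prefixed push stores four bytes, even though a 32-bit disassembly labels the same bytes "pushw".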
Could this bug be related to bug 3691? Diagnostic information about my machine is posted in that bug. I have an Averatec laptop, and removing those lines worked for a while (about 20 reboots) and I could resume from sleep. But now my laptop has started to show other weird behaviors if the pushl/popfl are missing. Sometimes it reboots, sometimes it hangs during/after resume. And the behavior seems to change depending on whether or not I boot into WinXP (reboot of course) and then try to suspend/resume in linux. I was asking around about what could cause a reboot, and someone suggested that there may be a triple fault triggering a reboot. How can we debug this further? I'm willing to help in any way I can. But I lack the technical experience to troubleshoot this myself. Thanks!
What is the CPU type? It is rather strange that a single PUSH $0 could cause a problem in real mode, because, according to the IA-32 manual, in real-address mode the processor shuts down due to a lack of stack space if the ESP or SP register is 1 when the PUSH instruction is executed, or if the new value of the SP or ESP register is outside the stack segment limit. I want to see two things:
1. Add printk(" virt_to_phys: acpi_wakeup_address %p \n", acpi_wakeup_address);
2. Comment out "mov $(wakeup_stack - wakeup_code), %sp" in wakeup_code.
The CPU is a mobile Athlon XP 2600+. I added the printk right after the point where it reports supported sleep states, and I get the following in dmesg:

virt_to_phys: acpi_wakeup_address c0001000

Commenting out mov $(wakeup_stack - wakeup_code), %sp changes nothing; I still get a reboot. With the pushl statement removed I'm able to resume (boot with vga=normal and use vbetool to get the screen back to life), but problems with the hard disk occur then. More will follow on the mailing list (but that's a different bug).
Created attachment 4790 [details] contents of /proc/cpuinfo
I've been playing with suspend/resume and booting in different ways before I do that. It would appear that my laptop doesn't always run the wakeup code, but if I boot into WinXP, do a suspend/resume, and then boot into linux and try the suspend/resume, it almost always runs the wakeup code. So I think my machine is experiencing two bugs here: 1) it doesn't always run the wakeup code; 2) it reboots on pushl. I added the printout of the acpi_wakeup_address to my code and I get the same address, which after virt_to_phys becomes 00001000. Unfortunately I cannot get my laptop to make sound through the PC speaker, so that debugging trick will not help on my system.
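The two addresses quoted in this thread (c0001000 and 00001000) are consistent with the usual lowmem mapping, where a directly mapped kernel virtual address is the physical address plus a fixed offset. The sketch below assumes the default i386 3G/1G split (an offset of 0xC0000000), which this thread does not confirm for these kernels; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split, so lowmem starts at 0xC0000000. */
#define PAGE_OFFSET_ASSUMED 0xC0000000u

/*
 * Directly mapped (lowmem) kernel virtual address -> physical address.
 * This mirrors the arithmetic virt_to_phys() performs for such addresses.
 */
uint32_t lowmem_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET_ASSUMED);
    return vaddr - PAGE_OFFSET_ASSUMED;
}

/* Real-mode linear address: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With these assumptions, lowmem_virt_to_phys(0xC0001000) is 0x00001000, and real-mode segment 0x0100 with offset 0 addresses that same physical location.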
Could you check the PE bit in CR0? I suspect the CPU is NOT in real mode.
"Could you check PE bit at CR0? I suspect the CPU is NOT in real mode." Forgive my ignorance, but I don't know how to do that. If your solution involves something of the form "if PE then BEEP" Keep in mind that my laptop can't beep for some reason. But perhaps an infinite loop or something else would be helpful. I'm really thinking that the machine is just not being prepared properly. Why else would booting into WinXP before the suspend/resume help? I wonder if that must be causing some sort of initialization or something is getting left over in some weird place in memory. If that were the case how could we figure out what is going on? Maybe there is a fairly standard plan of attack? Some way to rule it out maybe?
How about this: if not in real mode, then skip pushl $0; popfl.
I changed the pushl $0; popfl lines to the following:

	movl	%cr0, %eax
	and	$0x0001, %ax
	test	$0x01, %ax
	je	skip_push_pop
	pushl	$0
	popfl
skip_push_pop:

If I understand that code, it will only skip the push/pop if PE is set in cr0. I don't know asm, so I could be wrong. Please check that code. Here are the results of testing this:
1) Boot into WinXP and do a suspend/resume/reboot. This makes sure that my machine will enter the wakeup code.
2) Boot into the modified kernel, echo mem >/sys/power/state
3) Wait for the machine to sleep and then hit the power button.
The machine resumes correctly. So this means the laptop is in protected mode, right?
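For anyone checking that snippet: test sets ZF when the AND result is zero, and je jumps when ZF is set. So as written, the jump to skip_push_pop appears to be taken when PE is clear, not when it is set; a jne would give the behavior described. If that reading is right, a successful resume with this code would point to PE being clear (real mode) at that point. A small C model of the flag logic, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE 0x1u  /* protection-enable bit, bit 0 of CR0 */

/*
 * Models:   and  $0x0001, %ax
 *           test $0x01, %ax
 *           je   skip_push_pop
 * Returns true when the jump is taken, i.e. when push/pop is skipped.
 */
bool skips_push_pop(uint32_t cr0)
{
    uint16_t ax = (uint16_t)cr0 & CR0_PE;  /* and: keep only bit 0 */
    bool zf = (ax & 0x01) == 0;            /* test: ZF = 1 if the result is 0 */
    return zf;                             /* je: jump if ZF is set */
}
```

Here skips_push_pop() returns true for a CR0 value with PE clear and false for one with PE set, which is the opposite of the intent stated in the comment above.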
Please check whether the change below fixes the issue:

-	mov	$(wakeup_stack - wakeup_code), %sp
+	movl	$(wakeup_stack - wakeup_code), %esp

I suspect the high bits of esp aren't cleared, which makes any kind of 'push' reboot the system (Jason said 'pushw 0' doesn't survive either).
If so, could you disable PE at the beginning of wakeup_code?
> Please check if below change fixes the issue:
> - mov $(wakeup_stack - wakeup_code), %sp
> + movl $(wakeup_stack - wakeup_code), %esp

David, I tried that code with the same 3 step test and the machine rebooted.
Luming, I don't know how to disable PE. I looked it up on the web and found it to be quite a confusing process for someone with no experience. See section 14.5: http://library.n0i.net/hardware/intel80386-programmer-manual/Chap14.html
	movl	%cr0, %eax
	xorl	$0xFFFFFFFE, %eax
	movl	%eax, %cr0
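One thing worth checking in those three lines: XOR with 0xFFFFFFFE inverts bits 1..31 and leaves bit 0 (PE) as it was, so it would not clear PE; an AND with the same mask would. The XOR result also has bit 31 (PG) set whenever it was clear before, and loading CR0 with PG set while PE is clear raises a general-protection fault, so this sequence may be enough to cause a fault on its own. The C sketch below only models the arithmetic; the sample CR0 value 0x00000011 is illustrative and was not read from either laptop.

```c
#include <stdint.h>

#define CR0_PE 0x00000001u  /* protection enable, bit 0 */
#define CR0_PG 0x80000000u  /* paging, bit 31 */

/* What "xorl $0xFFFFFFFE, %eax" computes: flips bits 1..31, keeps bit 0. */
uint32_t cr0_after_xor(uint32_t cr0)
{
    return cr0 ^ 0xFFFFFFFEu;
}

/* What "andl $0xFFFFFFFE, %eax" would compute: clears bit 0 only. */
uint32_t cr0_after_and(uint32_t cr0)
{
    return cr0 & 0xFFFFFFFEu;
}
```

For the sample value 0x00000011, the XOR yields 0xFFFFFFEF (PE still set, PG now set), while the AND yields 0x00000010 (PE cleared, nothing else touched).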
I reverted wakeup.S back to the original version again, and made just the change to clear the PE bit. Then I did my 3 step test and the machine reboots.
Please post your changes before push and pop
Right now the only change is adding those 3 lines:

wakeup_code:
	wakeup_code_start = .
	.code16

	movl	%cr0, %eax
	xorl	$0xFFFFFFFE, %eax
	movl	%eax, %cr0
[...]

The rest is unchanged from linux-2.6.12-rc1.
Perhaps this would be easier to do over IRC. If so, I'm currently logged into #kernel on irc.freenode.net. My nick is lispy.
I tried the modifications that Luming suggested on IRC, but my machine still rebooted. As a sanity check I changed the code back to the version that skips the push/pop if PE is set, and the machine suspends/resumes again. Hardcoding CS and so on to 0x100 and forcing PE to be disabled does not help. This is quite odd. Is it possible that the BIOS does not set the machine to real mode as it should? Perhaps the machine expects to be put into real mode before sleep? Speaking of sleep, it is very late in my timezone so I will work on this more another time. Thanks again.
Did you find out where the reboot happens?
wakeup_code:
	wakeup_code_start = .
	.code16

	movw	$0xb800, %ax
	movw	%ax,%fs
	movw	$0x0e00 + 'L', %fs:(0x10)

	cli
	cld

	# setup data segment
	movw	$0x100, %ax
	movw	%ax, %cs	#### The reboot happens on this line
	movw	%ax, %ds
	movw	%ax, %ss
	mov	$(wakeup_stack - wakeup_code), %sp	# Private stack is needed for ASUS board
	movw	$0x0e00 + 'S', %fs:(0x12)

	# Make sure the PE bit is not set
	movl	%cr0, %eax
	xorl	$0xFFFFFFFE, %eax
	movl	%eax, %cr0

	pushl	$0	# Kill any dangerous flags
	popfl
From what I understand, movw %ax, %cs causes a reboot because the machine is in protected mode. So I tried moving the code that disables the PE bit to before setting up the data segment. But the machine reboots on the line that loads into %cr0 if I do that. I think we are not understanding the problem. I think a couple of things could be happening:
1) The machine is not being put to sleep properly for my BIOS, so it does not resume properly. This could mean that we need to force the machine to real mode before sleeping.
2) My BIOS violates the ACPI spec, and we need to either check if we are in protected mode and deal with that case, or try a different DSDT.
Could you try this: IF PE mode then ljmpl $__KERNEL_CS,$wakeup_pmode_return
Regarding comment #19: the code is wrong. As written, the push/pop will be skipped in real mode.
Please test SMP kernel (see bug 5107) Please test UP kernel with lapic.
I'm assuming this issue is already fixed. Please reopen this bug if:
- it is still present in recent 2.6 kernels and
- you can provide the requested information.
This has not been fixed. Please reopen the bug unless you can demonstrate a fix.

Thanks,
Jason

On Feb 13, 2006, at 2:42 PM, bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=3586
>
> bunk@stusta.de changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|ASSIGNED                    |REJECTED
>          Resolution|                            |INSUFFICIENT_DATA
>
> ------- Additional Comments From bunk@stusta.de  2006-02-13 14:42 -------
> I'm assuming this issue is already fixed.
>
> Please reopen this bug if:
> - it is still present in recent 2.6 kernels and
> - you can provide the requested information.
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.