Bug 4532
Summary: | FPU registers corrupt after S3 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Ben Liblit (liblit) |
Component: | Power-Sleep-Wake | Assignee: | Shaohua (shaohua.li) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla, lool+linux |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.11 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
additional messages in "dmesg" after waking up
raw input logging script
patch to correctly restore FPU registers |
Description
Ben Liblit
2005-04-22 16:22:58 UTC
Created attachment 4974 [details]
additional messages in "dmesg" after waking up

I've attached a record of the new messages in "dmesg" after waking up that were not there before. Perhaps there's a clue here as to the cause.

Initially I was suspicious of the fact that this output includes a "sleeping function called from invalid context" warning containing syscall_call(), which in turn calls sys_write(). Could this be the write system call being run by the "echo" command? Since "echo" is a shell built-in, if the kernel decided to kill the process doing the write, that would actually kill the entire shell.

However, Warren Togami reports that he sees the same "sleeping function called from invalid context" warning on his ThinkPad T41 but that he does not see the shell-closing behavior I report here. So that may be a red herring. (See <https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140257> for the record of that exchange.)

No, the oops message isn't related to your issue. I rather suspect the keyboard driver is broken. Can you try a different kernel, like 2.6.12-rc3?

Unfortunately I have since upgraded. I no longer have the ThinkPad X24 on which I originally reported the problem. I won't object if someone wants to mark this report as UNREPRODUCIBLE.

If someone can reproduce the issue, please reopen.

I am seeing the problem once again using Fedora Core 4 with the latest Fedora kernel (2.6.12-1.1398_FC4) on my ThinkPad X40. I do have a minor additional clue, though: the shell only closes when using "zsh" as root's shell. If I change to "sh" first, the shell does not close. However, the shell does react as though an additional newline had been typed. I.e.:

    sh-3.00# echo 3 >/proc/acpi/sleep
    sh-3.00#
    sh-3.00#

The second blank prompt there suggests that the shell thinks someone pressed <Enter> again after the echo command.

At least once while testing this, though, the shell got into a bad loop where it would wake up and almost immediately go back to sleep again. I could even see extra "echo 3 >/proc/acpi/sleep" commands at the shell prompt each time. So this time it was as though the shell were repeatedly seeing <Up> followed by <Enter>, where <Up> was bringing the echo command back from the shell's command history and <Enter> was running the command again. The only way I could break out of this sleep loop was to race against the system to close root's shell (<Control>-D) before the echo command could be received yet again.

Also, on just one occasion, when I pressed <Control> in order to start typing <Control>-D, the terminal window displayed its popup menu as though I'd pressed the right mouse button or perhaps some keycode which is mapped to <Menu> by the X server.

This is all consistent with the general idea that junk is appearing in the input stream. I guess different shells with different command line editing setups react differently to that junk.

Same thing if you "echo mem >/sys/power/state"? Same in 2.6.13? What if you put a command after it to collect the input, e.g.:

    # echo mem >/sys/power/state; cat > /tmp/input.log << EOF

Interesting that zsh and sh behave differently.

Do you see input garbage if you suspend from a VGA console? This may be an issue in the input sub-system or in X.

Created attachment 5660 [details]
raw input logging script
Len, I like your idea about logging the input after the "echo" command. Rather
than using "cat", though, I wrote a small script that logs input in the most
raw way possible, character by character, until stopped via SIGINT. I'm
attaching that script to this report for future reference.
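The attached script itself is not reproduced in this report. As a rough idea of what such a logger could look like, here is a minimal Python sketch, assuming a POSIX system where the termios module is available. The function name log_raw_input and the one-repr-per-line output format are illustrative choices, not taken from the attachment.

```python
import io
import os
import termios
import tty


def log_raw_input(fd, out):
    """Log every byte read from fd to out, one repr() per line.

    Stops at end of file or when the process receives SIGINT, which
    Python delivers as KeyboardInterrupt. Returns the byte count.
    """
    saved = None
    if os.isatty(fd):
        saved = termios.tcgetattr(fd)
        # Raw mode: no line buffering, no echo, no CR/NL translation.
        # TCSANOW avoids flushing input that is already queued.
        tty.setraw(fd, termios.TCSANOW)
    count = 0
    try:
        while True:
            byte = os.read(fd, 1)
            if not byte:  # end of file
                break
            out.write(repr(byte) + "\n")
            out.flush()
            count += 1
    except KeyboardInterrupt:
        pass
    finally:
        if saved is not None:
            # Put the terminal back the way we found it.
            termios.tcsetattr(fd, termios.TCSANOW, saved)
    return count


# Demo on a pipe instead of a terminal, so nothing has to be suspended.
read_end, write_end = os.pipe()
os.write(write_end, b"\r")
os.close(write_end)
demo_log = io.StringIO()
log_raw_input(read_end, demo_log)
os.close(read_end)
print(demo_log.getvalue(), end="")  # prints b'\r'
```

On a real terminal one would call log_raw_input(sys.stdin.fileno(), sys.stderr) instead of the pipe demo. Note that raw mode turns off the terminal's own signal keys, so <Control>-C arrives as a logged byte and SIGINT has to be sent from elsewhere (for example with kill from another terminal).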
So now I'm running the following command line:
# echo 3 >/proc/acpi/sleep; ./logger
If I run this in a gnome-terminal under X, the logger script records a single
carriage return (\r) appearing on stdin. If I run this under a VGA console, the
logger script records no input at all.
If I run this under zsh, after I manually send SIGINT to the logger script, the
zsh process terminates. This is true whether I'm using X or a VGA console. I
haven't looked into why it terminates more closely, but this is an interesting
question. Is it reading EOF from stdin? Is it being killed by some signal?
These are all open questions. Perhaps I should attach a debugger to that zsh
process, set a breakpoint in _exit(), and see if I can tell why it's getting
there. Does that sound worth checking out?
If I run this under sh, the sh process remains alive and running normally after
I manually send SIGINT to the logger script. Again, this is true under both X
and a VGA console.
If I change the echo command to "echo mem >/sys/power/state", nothing changes.
Behavior is identical in all cases.
So we've now split into two subquestions: (1) why the extra carriage return
under X but not under a VGA console, and (2) why does zsh exit while sh keeps
running?
Curiouser and curiouser!
Since I have no idea how to pursue the spurious carriage return problem, I thought I'd look closer at why zsh exits while sh keeps running. Turns out zsh is getting SIGFPE in a call to difftime(). In the source code, difftime() is called as:

    difftime(time(NULL), lastwatch)

difftime() is a tiny function. Here's the complete disassembly:

    0x005a5b10 <__difftime+0>:   push    %ebp
    0x005a5b11 <__difftime+1>:   mov     %esp,%ebp
    0x005a5b13 <__difftime+3>:   fildl   0xc(%ebp)
    0x005a5b16 <__difftime+6>:   fisubrl 0x8(%ebp)
    0x005a5b19 <__difftime+9>:   pop     %ebp
    0x005a5b1a <__difftime+10>:  ret

That's the whole function. The SIGFPE arises at the "fisubrl" instruction.

Strangely, I can confirm that both arguments to difftime() look perfectly reasonable. For example:

    difftime(1124313430, 1124313394)

If I write a small program that directly calls difftime() with these two values, it returns the expected result, with no SIGFPE. So the failure of difftime() is due to some other environmental state, not just the two arguments. I don't know a lot about the floating point arithmetic environment on x86, but there must be *something* different to explain why that "fisubrl" instruction succeeds in one process and fails in another.

I logged in to let you know that I saw the same zsh exit vs bash extra characters symptom as you, but using 2.6.13-rc6 on my D600. Wow, a SIGFPE -- who would have thunk it?

Thanks, Len! It's good to know I'm not crazy. :-)

Just to clarify, though, the "extra characters" symptom depends on running under X and is independent of zsh/bash. Either shell gave me extra characters under X; neither shell gave me extra characters under VGA. The zsh/bash distinction affects whether the shell exits due to SIGFPE (zsh) or keeps running without any obvious problems (bash). This symptom is independent of X/VGA.

Seems to be an X11 issue rather than a kernel one.

I did some tests. It seems the 'echo' command is a built-in command in zsh. If I write a simple program which does something like 'echo 3 >/proc/acpi/sleep' and invoke the program in zsh, zsh doesn't crash. If I run the echo command directly in zsh, zsh gets an FPE signal. The real cause is still under investigation.
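For readers following the difftime() disassembly quoted earlier in this report, here is a rough Python rendering of its two x87 instructions. It assumes the usual i386 cdecl stack layout (first argument at 0x8(%ebp), second at 0xc(%ebp)) and a 32-bit time_t. The function name difftime_x87_sketch is made up for illustration, and the sketch models only the arithmetic, not the FPU control or status state that the report suspects.

```python
def difftime_x87_sketch(time1, time0):
    """Mirror the arithmetic of the quoted __difftime disassembly.

    Assumption: time1 is the first C argument (0x8(%ebp)) and time0
    the second (0xc(%ebp)), as in difftime(time1, time0).
    """
    # fildl 0xc(%ebp): convert time0 to floating point, push as st(0)
    st0 = float(time0)
    # fisubrl 0x8(%ebp): reverse subtract, st(0) = time1 - st(0)
    st0 = float(time1) - st0
    # ret: the double result is returned in st(0)
    return st0


# The argument values observed in the failing zsh process.
print(difftime_x87_sketch(1124313430, 1124313394))  # prints 36.0
```

With the values from the report the result is simply 36 seconds, which fits the observation that the arguments alone cannot explain the SIGFPE and that something in the process's floating point state must differ.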
Created attachment 6002 [details]
patch to correctly restore FPU registers
*** Bug 3919 has been marked as a duplicate of this bug. ***

The patch has shipped in Linus's git tree. Closed.

Good news. Thanks for hunting this down, David!