Bug 4532

Summary: FPU registers corrupt after S3
Product: ACPI
Reporter: Ben Liblit (liblit)
Component: Power-Sleep-Wake
Assignee: Shaohua (shaohua.li)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, lool+linux
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.11
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
additional messages in "dmesg" after waking up
raw input logging script
patch to correctly restore FPU registers

Description Ben Liblit 2005-04-22 16:22:58 UTC
Distribution: Fedora Linux 3
Hardware Environment: IBM ThinkPad X24
Software Environment: Fedora kernel RPM build "kernel-2.6.11-1.14_FC3"
Problem Description:

With ACPI enabled, I can put my ThinkPad X24 laptop to sleep (S3) and wake it
back up again.  However, if I used the very simple "echo 3 >/proc/acpi/sleep"
technique to sleep, my root shell is closed by the time the laptop resumes.  It
appears that several newlines and an EOF are arriving on my pseudo-TTY from ...
well ... from I have no idea where.

Steps to reproduce:

1. Log in to the text console as root.  Or if you prefer, start a terminal
window and "su" to root.
2. Type "echo 3 >/proc/acpi/sleep".
3. Wake the computer back up.

Actual Results:  If you had logged in to the text console as root, you are no
longer logged in.  You're simply back to the "login:" prompt.  Furthermore, a
NULL control character ("^@") is visible as though it had been typed in at the
"login:" prompt.

If you had used "su" to become root, your root shell is no longer running. 
There are several blank lines below the "echo" command you typed, and then
several blank prompts in the shell from which you typed "su".  It appears as
though the terminal window received (as input) a number of newlines with an EOF
stuck in the middle somewhere.

Expected Results:  Sleeping and waking up should not generate any "phantom"
input.  The shell that did the write should come back in exactly the state we
left it: still logged in, or still su'ed to root.
Comment 1 Ben Liblit 2005-04-22 16:28:58 UTC
Created attachment 4974
additional messages in "dmesg" after waking up

I've attached a record of the new messages in "dmesg" after waking up that were
not there before.  Perhaps there's a clue here as to the cause.

Initially I was suspicious of the fact that this output includes a "sleeping
function called from invalid context" warning containing syscall_call() which
in turn calls sys_write().  Could this be the write system call being run by
the "echo" command?  Since "echo" is a shell built-in, if the kernel decided to
kill the process doing the write, that would actually kill the entire shell.

However, Warren Togami reports that he sees the same "sleeping function called
from invalid context" on his ThinkPad T41 but that he does not see the
shell-closing behavior I report here.  So that may be a red herring.

(See <https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140257> for the
record of that exchange.)
Comment 2 Shaohua 2005-04-26 01:48:03 UTC
No, the oops message isn't related to your issue.
I suspect the keyboard driver is broken. Can you try a different
kernel, like 2.6.12-rc3?
Comment 3 Ben Liblit 2005-05-29 11:41:40 UTC
Unfortunately I have since upgraded.  I no longer have the ThinkPad X24 on which
I originally reported the problem.

I won't object if someone wants to mark this report as UNREPRODUCIBLE.
Comment 4 Shaohua 2005-05-29 17:52:07 UTC
If someone can reproduce the issue, please reopen.
Comment 5 Ben Liblit 2005-07-17 10:16:58 UTC
I am seeing the problem once again using Fedora Core 4 with the latest Fedora
kernel (2.6.12-1.1398_FC4) on my ThinkPad X40.  I do have a minor additional
clue, though:

The shell only closes when using "zsh" as root's shell.  If I change to "sh"
first, the shell does not close.  However, the shell does react as though an
additional newline had been typed.  I.e.:

        sh-3.00# echo 3 >/proc/acpi/sleep
        sh-3.00#
        sh-3.00#

The second blank prompt there suggests that the shell thinks someone pressed
<Enter> again after the echo command.

At least once while testing this, though, the shell got into a bad loop where it
would wake up and almost immediately go back to sleep again.  I could even see
extra "echo 3 >/proc/acpi/sleep" commands at the shell prompt each time.  So
this time it was as though the shell were repeatedly seeing <Up> followed by
<Enter>, where <Up> was bringing the echo command back from the shell's command
history and <Enter> was running the command again.

The only way I could break out of this sleep loop was to race against the system
to close root's shell (<Control>-D) before the echo command could be received
yet again.

Also, on just one occasion, when I pressed <Control> in order to start typing
<Control>-D, the terminal window displayed its popup menu as though I'd pressed
the right mouse button or perhaps some keycode which is mapped to <Menu> by the
X server.

This is all consistent with the general idea that junk is appearing in the input
stream.  I guess different shells with different command line editing setups
react differently to that junk.
Comment 6 Len Brown 2005-08-17 12:26:55 UTC
same thing if you echo mem >/sys/power/state ?
same in 2.6.13?
what if you put a command after it to collect the input, eg

# echo mem >/sys/power/state; cat > /tmp/input.log << EOF

Interesting that zsh and sh behave differently.
Do you see input garbage if you suspend from a VGA console?
This may be an issue in the input sub-system or in X.
Comment 7 Ben Liblit 2005-08-17 13:36:44 UTC
Created attachment 5660
raw input logging script

Len, I like your idea about logging the input after the "echo" command.  Rather
than using "cat", though, I wrote a small script that logs input in the most
raw way possible, character by character, until stopped via SIGINT.  I'm
attaching that script to this report for future reference.

So now I'm running the following command line:
    # echo 3 >/proc/acpi/sleep; ./logger

If I run this in a gnome-terminal under X, the logger script records a single
carriage return (\r) appearing on stdin.  If I run this under a VGA console, the
logger script records no input at all.

If I run this under zsh, after I manually send SIGINT to the logger script, the
zsh process terminates.  This is true whether I'm using X or a VGA console.  I
haven't looked into why it terminates more closely, but this is an interesting
question.  Is it reading EOF from stdin?  Is it being killed by some signal? 
These are all open questions.  Perhaps I should attach a debugger to that zsh
process, set a breakpoint in _exit(), and see if I can tell why it's getting
there.  Does that sound worth checking out?

If I run this under sh, the sh process remains alive and running normally after
I manually send SIGINT to the logger script.  Again, this is true under both X
and a VGA console.

If I change the echo command to "echo mem >/sys/power/state", nothing changes. 
Behavior is identical in all cases.

So we've now split into two subquestions: (1) why the extra carriage return
under X but not under a VGA console, and (2) why does zsh exit while sh keeps
running?

Curiouser and curiouser!
Comment 8 Ben Liblit 2005-08-17 14:40:03 UTC
Since I have no idea how to pursue the spurious carriage return problem, I
thought I'd look closer at why zsh exits while sh keeps running.  Turns out zsh
is getting SIGFPE in a call to difftime().  In the source code, difftime() is
called as:

    difftime(time(NULL), lastwatch)

difftime() is a tiny function.  Here's the complete disassembly:

    0x005a5b10 <__difftime+0>:      push   %ebp
    0x005a5b11 <__difftime+1>:      mov    %esp,%ebp
    0x005a5b13 <__difftime+3>:      fildl  0xc(%ebp)
    0x005a5b16 <__difftime+6>:      fisubrl 0x8(%ebp)
    0x005a5b19 <__difftime+9>:      pop    %ebp
    0x005a5b1a <__difftime+10>:     ret

That's the whole function.  The SIGFPE arises at the "fisubrl" instruction.

Strangely, I can confirm that both arguments to difftime() look perfectly
reasonable.  For example:

    difftime(1124313430, 1124313394)

If I write a small program that directly calls difftime() with these two values,
it returns the expected result, with no SIGFPE.  So the failure of difftime() is
due to some other environmental state, not just the two arguments.  I don't know
a lot about the floating point arithmetic environment on x86, but there must be
*something* different to explain why that "fisubrl" instruction succeeds in one
process and fails in another.
Comment 9 Len Brown 2005-08-18 01:28:14 UTC
I logged in to let you know that I saw the same zsh exit vs bash
extra characters symptom as you, but using 2.6.13-rc6 on my D600.
Wow, a SIGFPE -- who would have thunk it?
Comment 10 Ben Liblit 2005-08-18 11:24:44 UTC
Thanks, Len!  It's good to know I'm not crazy.  :-)

Just to clarify, though, the "extra characters" symptom depends on running under
X and is independent of zsh/bash.  Either shell gave me extra characters under
X; neither shell gave me extra characters under VGA.

The zsh/bash distinction affects whether the shell exits due to SIGFPE (zsh) or
keeps running without any obvious problems (bash).  This symptom is independent
of X/VGA.
Comment 11 Alexey Starikovskiy 2005-09-12 03:38:14 UTC
Seems to be an X11 issue rather than a kernel one.
Comment 12 Shaohua 2005-09-13 02:19:33 UTC
I did some tests.
It seems 'echo' is a built-in command in zsh.
If I write a simple program that does something like 'echo 3
>/proc/acpi/sleep' and invoke that program from zsh, zsh doesn't crash.
If I run the echo command directly in zsh, zsh gets an FPE signal. The real
cause is still under investigation.
Comment 13 Shaohua 2005-09-13 21:46:50 UTC
Created attachment 6002
patch to correctly restore FPU registers
Comment 14 Shaohua 2005-09-13 21:48:28 UTC
*** Bug 3919 has been marked as a duplicate of this bug. ***
Comment 15 Shaohua 2005-11-06 18:49:52 UTC
The patch has shipped in Linus's git tree. Closed.
Comment 16 Ben Liblit 2005-11-06 18:55:20 UTC
Good news.  Thanks for hunting this down, David!