Bug 69751 - serial console does not wake from S3
Summary: serial console does not wake from S3
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Serial
Hardware: i386 Linux
Importance: P1 normal
Assignee: Alan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-31 03:01 UTC by Valerio Vanni
Modified: 2014-09-18 08:55 UTC
CC List: 7 users

See Also:
Kernel Version: All version tried
Subsystem:
Regression: No
Bisected commit-id:


Attachments
logs and configs (152.78 KB, text/plain)
2014-01-31 03:01 UTC, Valerio Vanni
logs 3.12.9 on Lenny (84.12 KB, text/plain)
2014-01-31 03:02 UTC, Valerio Vanni
logs drm-intel-nightly on lenny (79.41 KB, text/plain)
2014-01-31 03:02 UTC, Valerio Vanni
mangled text on serial console (57.88 KB, image/png)
2014-02-10 18:31 UTC, Valerio Vanni
serial console log (note: garbage characters have been converted to spaces loading the txt file in bugzilla) (68.72 KB, text/plain)
2014-03-01 12:22 UTC, Valerio Vanni

Description Valerio Vanni 2014-01-31 03:01:17 UTC
Created attachment 123941 [details]
logs and configs

This crash is also discussed here:
https://bugzilla.kernel.org/show_bug.cgi?id=69521
https://bugzilla.kernel.org/show_bug.cgi?id=69581
(for the i915 and saa7134 warnings)

[1.] One line summary of the problem:    

Kernel 3.12.8 gives a warning during resume from S3 sleep

[2.] Full description of the problem/report:

It also happens with 3.12.6, 3.12.7, 3.12.9, 3.13 and drm-intel-nightly. It does not happen with 2.6.24.7.
The OS is Debian Lenny with a vanilla kernel. The same happens after upgrading to Squeeze.

I suspend the machine with s2ram and it powers down.
During resume it prints that warning, then it seems to work normally,
except for the redirection of the console to the serial port.

I redirect the console to the serial port (with the lilo directive append="console=ttyS0 console=tty0") and read the messages on another machine.
This stops working as soon as I suspend: the port starts sending mangled lines and
does not recover until the next full restart.
Only the serial redirection has this problem; the local console works.
Comment 1 Valerio Vanni 2014-01-31 03:02:02 UTC
Created attachment 123951 [details]
logs 3.12.9 on Lenny
Comment 2 Valerio Vanni 2014-01-31 03:02:27 UTC
Created attachment 123961 [details]
logs drm-intel-nightly on lenny
Comment 3 Valerio Vanni 2014-02-07 03:52:10 UTC
It still happens with 3.12.10.
Comment 4 Valerio Vanni 2014-02-08 01:56:00 UTC
It still happens with 3.13.2.
Comment 5 Peter Hurley 2014-02-10 14:32:21 UTC
Is the scrambled serial console output a regression against some previously working kernel, and if so, what version was that?

Please attach an excerpt of the "mangled lines" on the serial console.
Comment 6 Valerio Vanni 2014-02-10 18:31:24 UTC
Created attachment 125531 [details]
mangled text on serial console

As I wrote in the first message, it worked on 2.6.24.7.

1) I start minicom on the remote machine
2) I boot the local machine, and on the remote all messages are shown
3) I suspend the local machine, and the remote still receives messages during the suspend

As soon as I resume the local machine:
-On the local machine the oops text appears, followed by the regular messages (after the crash, the local console continues to work).
-On the remote machine mangled characters start to appear. Earlier text seems to be mangled too, because inside the mess I can see fragments of the correct messages.
Comment 7 Valerio Vanni 2014-02-12 20:13:33 UTC
I tried with 2.6.34.15: the v4l crash is not present.
To tell the truth, no crash happens at all. But the serial output is already mangled.

So it seems that the regression is somewhere between 2.6.24.7 and 2.6.34.15.
Comment 8 Valerio Vanni 2014-02-22 09:14:07 UTC
2.6.25 fails.
I'd like to try other 2.6.24.x releases (against my working 2.6.24.7) but I cannot find them on www.kernel.org.

Why do some versions have an "extraversion" and others not? For example, 2.6.24, 2.6.25 and 2.6.26 are there without an extraversion, while 2.6.27 has extraversions up to 57.
Comment 9 Peter Hurley 2014-02-26 17:51:54 UTC
(In reply to Valerio Vanni from comment #6)
> As I wrote in the first message, it worked on 2.6.24.7.

Sorry, missed that.

Please supply the output of 'setserial -a /dev/ttyS0' for 2.6.24.7 and verify that the output is the same for 2.6.25, both before and after suspend/resume (or the output if different).
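
The before/after comparison requested above could be scripted roughly as follows. This is only a sketch, not part of setserial: `parse_setserial` and `diff_settings` are hypothetical helper names, the sample text mirrors the `setserial -a` format reported later in this thread, and the changed flag in the usage lines is purely illustrative.

```python
import re

# Sample in the format printed by `setserial -a /dev/ttyS0`.
SAMPLE = """/dev/ttyS0, Line 0, UART: 16550A, Port: 0x03f8, IRQ: 4
        Baud_base: 115200, close_delay: 50, divisor: 0
        closing_wait: 3000
        Flags: spd_normal skip_test
"""


def parse_setserial(text: str) -> dict:
    """Collect every 'key: value' pair of the output into a dict of strings."""
    settings = {}
    # Values end at the next comma or at the end of the line.
    for key, value in re.findall(r"(\w+):[ \t]*([^,\n]+)", text):
        settings[key] = value.strip()
    return settings


def diff_settings(before: dict, after: dict) -> dict:
    """Return {key: (before, after)} for every setting that differs."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


before = parse_setserial(SAMPLE)
after = dict(before, Flags="spd_normal")  # illustrative change only
print(diff_settings(before, after))
# {'Flags': ('spd_normal skip_test', 'spd_normal')}
```

An empty dict from `diff_settings` would mean the reported port settings are identical before and after suspend/resume.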

The other "extraversion"s are in the stable git tree which you can browse here:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/refs/?id=refs/tags/v3.13.5

If the 'setserial' output provides no clue as to the cause of this regression, the only realistic solution is to perform a 'git bisect' to discover what change(s) caused the regression, which will help narrow down the actual problem (e.g., mis-identifying the UART, mis-programming the divisors, or something else).

I can provide some instructions on how to perform a bisect, if necessary.
Comment 10 Valerio Vanni 2014-02-27 17:53:43 UTC
Today I tried many kernels and found that 2.6.24.7 was failing too.

It looked like a partial failure, because only the resume messages were mangled (the normal messages that followed were printed correctly). I went on testing and narrowed it down: the partial failure (as opposed to the total failure, where every message is mangled until the next shutdown) happened up to 3.2.0, and the total failure from 3.2.1 onward.

Now it fails everywhere, with the total failure.
I don't understand the reason for this sudden change, but at the moment I can no longer call it a regression.

What could I look at?

The serial state has always been the same, with every kernel I tried, with partial or total failure, before or after suspend.

newton:~# setserial -a /dev/ttyS0
/dev/ttyS0, Line 0, UART: 16550A, Port: 0x03f8, IRQ: 4
        Baud_base: 115200, close_delay: 50, divisor: 0
        closing_wait: 3000
        Flags: spd_normal skip_test
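
For reference, a minimal sketch of how a 16550-style divisor relates to the `Baud_base` shown above. It assumes the console runs at 9600 baud, the kernel's default when `console=ttyS0` is given without options; the function name `uart_divisor` is hypothetical. Note that the `divisor: 0` field in the setserial output is the custom divisor used with `spd_cust`, not the value programmed into the hardware latch.

```python
def uart_divisor(baud_base: int, baud: int) -> int:
    """Return the divisor latch value for a target baud rate.

    Hypothetical helper for illustration: a 16550-style UART divides
    its base rate by this value to obtain the line speed.
    """
    if baud <= 0:
        raise ValueError("baud must be positive")
    divisor, remainder = divmod(baud_base, baud)
    if divisor == 0 or remainder:
        raise ValueError(f"{baud} baud is not an exact division of {baud_base}")
    return divisor


# Baud_base is taken from the report; 9600 baud is an assumption.
print(uart_divisor(115200, 9600))  # 12
```

If the two ends of the cable ended up disagreeing on the line speed after resume, that would be one plausible source of garbage like the one described in this report.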
Comment 11 Peter Hurley 2014-02-27 19:03:36 UTC
What is the output of:

setserial /dev/ttyS0 autoconfig ^skip_test

Did this problem begin with a distro upgrade?
Was the other machine changed?
Comment 12 Valerio Vanni 2014-02-27 20:34:50 UTC
The command itself gives no output, but if I look again with "setserial -" I find the same values without "skip_test".

No, the problem began with Lenny. I noticed it together with two other warnings during the resume from S3 after a kernel upgrade. The first kernel was 3.12.6, but later I tried all the following kernels as they became available.

The i915 issue has been fixed in the development branch; the v4l one has still received no attention.

The distro upgrades have so far only been tests (and they didn't fix anything).
All of today's tests were on the original Lenny restored from a disk image.

I have disk images of the upgraded distro (I also did some clean installs), so I can do further tests (even dangerous ones).

The other machine (at the other end of the serial cable) was not changed.
To rule out stale state, I followed the same sequence for each reboot of this machine:

-Close minicom on the other machine
-Turn off this machine
-Open minicom on the other machine
-Turn on this machine
-Suspend this machine
-Resume this machine
-Trigger some kernel event to see whether it reaches the other machine
Comment 13 Valerio Vanni 2014-02-27 20:58:22 UTC
I just found this old message:
http://osdir.com/ml/linux.serial/2005-02/msg00000.html
I see one thing in common: if I write something to the serial port myself (while the text is mangled), the kernel is able to write correctly again.
A "foo" is not enough; I have to "cat" a small file.
Comment 14 Valerio Vanni 2014-03-01 12:22:23 UTC
Created attachment 127721 [details]
serial console log (note: garbage characters have been converted to spaces loading the txt file in bugzilla)

I ran a test on a virtual machine.
VMware Player is installed on a Windows XP host.
Inside it I installed Debian Wheezy with the Debian kernel (Wheezy 3.2.0-4-686-pae #1 SMP Debian 3.2.54-2 i686 GNU/Linux). I enabled the serial console in lilo, and in VMware Player I redirected /dev/ttyS1 to a txt file on the host (in append mode).

The result is still the same: as soon as the machine is suspended to RAM, the serial output becomes mangled, and only a manual write to the port stops it.

1) Boot the machine: events are written to the txt file
2) Suspend the machine (not the "vmware suspend virtual machine", but a "suspend to ram" called from the Debian guest as if it were a physical machine): events are written up to "suspending console(s)".
3) Resume the machine: garbage is written to the txt file
4) Trigger some kernel event: garbage is written to the txt file
5) echo "ciao" > /dev/ttyS1: "ciao" is added at the end of the garbage
6) Trigger some event and shut down the machine: events are written to the txt file
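
To locate the point where a raw capture like this turns to garbage, a check along these lines may help. It is only a sketch: the function names and the 0.3 threshold are assumptions, and the sample bytes are synthetic rather than taken from the attachment (whose garbage characters were converted to spaces on upload).

```python
import string

# Byte values considered normal console text (ASCII printable + whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def garbage_ratio(chunk: bytes) -> float:
    """Fraction of bytes in chunk that are not printable ASCII."""
    if not chunk:
        return 0.0
    bad = sum(1 for b in chunk if b not in PRINTABLE)
    return bad / len(chunk)


def first_mangled_line(log: bytes, threshold: float = 0.3):
    """Return the 1-based number of the first line above threshold, or None."""
    for number, line in enumerate(log.split(b"\n"), start=1):
        if garbage_ratio(line) > threshold:
            return number
    return None


# Synthetic example: two clean lines, one line of garbage, one clean line.
sample = b"booting\nsuspending\n\xff\x80\xfe\x00\x9c\xe1\nciao\n"
print(first_mangled_line(sample))  # 3
```

Run against the host-side txt file read in binary mode, the reported line number should fall right after the last message written before suspend, if the behaviour matches the steps above.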
Comment 15 Valerio Vanni 2014-03-03 19:43:24 UTC
To make it clear: the last test was on a totally different setup.
The physical host was neither of the usual two machines but a third one, so I think hardware problems can be excluded.

Note: at the start I filed a bug in the power management/hibernation-suspend section, but they told me to open separate bug reports in the driver sections.

https://bugzilla.kernel.org/show_bug.cgi?id=69351

Could this issue belong in that section rather than in this one?
Comment 16 Peter Hurley 2014-03-03 19:56:27 UTC
I plan to debug this issue later this week, so no need to file another bug in a different section. The information you've provided should be enough to discover the root cause -- thanks!
Comment 17 Valerio Vanni 2014-03-25 15:10:21 UTC
Is there any news on this bug?

I just tried on a testing machine with 3.14-rc8 and also with linux-next to see if the issue had disappeared: no, it's still there.
Comment 18 Valerio Vanni 2014-04-16 03:25:37 UTC
It's still present in 3.14.1, 3.15-rc1 and in -next.
Comment 19 Valerio Vanni 2014-05-10 10:49:18 UTC
Is there any news on this bug?

If there is any other test I can do, I will do it. But I have already tried different hardware and distributions, always with the same result: the output is mangled after resume.

I think it should be easy to reproduce this.
Comment 20 Valerio Vanni 2014-05-30 09:47:24 UTC
It still fails on 3.14.4, 3.15-rc7 and today's -next.
Comment 21 Valerio Vanni 2014-09-18 08:55:30 UTC
The fix from Peter Hurley is this:
https://lkml.org/lkml/2014/7/9/357

commit ae84db9661cafc63d179e1d985a2c5b841ff0ac4 upstream

It's now in 3.16.2, 3.14.18, 3.12.28, 3.10.54, 3.2.63.

I am closing the bug report.
