Bug 102611 - Battery and wireless detection and touchscreen input not working after suspend - Asus T200TA
Summary: Battery and wireless detection and touchscreen input not working after suspen...
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Aaron Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-10 15:17 UTC by Marcin Mielniczuk
Modified: 2016-12-22 03:31 UTC (History)
4 users (show)

See Also:
Kernel Version: 4.1.2 - 4.1.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Log from `lshw` before and after suspend (1.27 KB, text/x-log)
2015-08-10 15:17 UTC, Marcin Mielniczuk
Details
Output of acpidump before suspend (372.01 KB, text/plain)
2015-08-19 15:46 UTC, Marcin Mielniczuk
Details
Output of acpidump after suspend (372.01 KB, text/plain)
2015-08-19 15:47 UTC, Marcin Mielniczuk
Details
Output of dmesg after suspend (55.12 KB, text/plain)
2015-08-19 15:47 UTC, Marcin Mielniczuk
Details
Output of dmesg after attaching the AC power (57.03 KB, text/plain)
2015-08-20 12:43 UTC, Marcin Mielniczuk
Details
Print I2C operation region errors (410 bytes, patch)
2015-08-20 13:01 UTC, Mika Westerberg
Details | Diff
Output of dmesg from the patched kernel 4.1.6 (55.17 KB, text/plain)
2015-08-21 15:11 UTC, Marcin Mielniczuk
Details
Output of dmesg with initcall_debug (179.24 KB, text/plain)
2015-08-24 08:18 UTC, Marcin Mielniczuk
Details
Output of dmesg after disabled ACPI I2C opregion support (178.70 KB, text/plain)
2015-08-28 14:04 UTC, Marcin Mielniczuk
Details
Output of dmesg - computer suspends back on its own. (54.59 KB, text/x-log)
2015-08-31 07:39 UTC, Marcin Mielniczuk
Details
/proc/interrupts (3.34 KB, text/plain)
2015-08-31 12:04 UTC, Marcin Mielniczuk
Details
I2C DW late suspend patch (605 bytes, patch)
2015-08-31 12:32 UTC, Mika Westerberg
Details | Diff
I2C DW late suspend patch for v4.1 (676 bytes, patch)
2015-08-31 13:45 UTC, Mika Westerberg
Details | Diff
Dump device names according to dpm_list (433 bytes, patch)
2015-09-09 08:51 UTC, Aaron Lu
Details | Diff
dmesg after Aaron's patch (208.45 KB, text/plain)
2015-09-10 17:47 UTC, Marcin Mielniczuk
Details
dmesg with debug enabled in Makefiles (183.68 KB, text/plain)
2015-10-25 14:18 UTC, Marcin Mielniczuk
Details
do not attach in dep_walk, instead, rely on defer probe to trigger it (1.46 KB, patch)
2015-10-26 08:07 UTC, Aaron Lu
Details | Diff
On patched 4.3 (220.25 KB, text/x-log)
2015-12-28 09:10 UTC, Marcin Mielniczuk
Details
On patched 4.3, after suspend (218.36 KB, text/plain)
2015-12-28 09:11 UTC, Marcin Mielniczuk
Details

Description Marcin Mielniczuk 2015-08-10 15:17:00 UTC
Created attachment 184601 [details]
Log from `lshw` before and after suspend

I'm using Asus T200TA with Ubuntu 15.04 amd64, kernel 4.1.2 and updated linux-firmware.

The wireless drivers work after operations as described in #102531. Battery and touchscreen work out of the box.

Then the device is suspended. After waking up the notebook, the battery is not detected anymore. KDE5 displayed information that no battery was found, Unity simply didn't display the applet. The battery is detected again after a charges has been plugged in (and the status is updated properly when removing the charger).

The touchscreen stops working, no input can be done through it. The network card isn't detected anymore, see the log from `lshw`. The first command was issued before suspend, the other - after.

The problem is 100% reproducible. I'm eager to provide any other feedback or even create a patch if I am up to it and if I could be guided a little.
Comment 1 Aaron Lu 2015-08-18 07:28:40 UTC
Please attach dmesg output after system resumed, and acpidump too.
Comment 2 Marcin Mielniczuk 2015-08-19 15:46:55 UTC
Created attachment 185211 [details]
Output of acpidump before suspend
Comment 3 Marcin Mielniczuk 2015-08-19 15:47:26 UTC
Created attachment 185221 [details]
Output of acpidump after suspend
Comment 4 Marcin Mielniczuk 2015-08-19 15:47:47 UTC
Created attachment 185231 [details]
Output of dmesg after suspend
Comment 5 Aaron Lu 2015-08-20 09:15:16 UTC
[  201.283355] PM: resume from suspend-to-idle
[  201.385503] i2c_designware 80860F41:00: timeout waiting for bus ready
[  201.401599] PM: noirq resume of devices complete after 118.040 msecs
[  201.758302] PM: early resume of devices complete after 356.530 msecs
[  201.766355] i2c i2c-0: i2c read failed
[  201.778279] i2c i2c-0: i2c read failed
[  202.055521] PM: resume of devices complete after 297.155 msecs
[  202.056211] PM: Finishing wakeup.

I guess the problem is that after resume, i2c controller stops working and battery information is got from it, so battery information is lost too. I have no idea why plug in AC will solve the problem though.

Mike, do you have any ideas on the above error messages?
Comment 6 Marcin Mielniczuk 2015-08-20 11:23:02 UTC
My theory: (newbie here, so might be wrong)

i2c read fails and it's not retried automatically. When the status changes, a read is requested, it succeeds. 

Or maybe the read occurs too early?

I'll attach a dmesg after AC plug-in today.
Comment 7 Mika Westerberg 2015-08-20 12:02:07 UTC
I guess at that point (during resume cycle) the I2C bus is powered off and that's why the adapter times out. I don't see anything obvious in the DSDT related to I2C1 power.
Comment 8 Mika Westerberg 2015-08-20 12:17:57 UTC
Aaron, actually I think what is happening:

[  201.385503] i2c_designware 80860F41:00: timeout waiting for bus ready
[  201.401599] PM: noirq resume of devices complete after 118.040 msecs

The I2C driver has not been resumed yet (it is done after _noirq and _late hooks have been run) and something is trying to access the bus already. Since it cannot be the I2C children devices, I suspect that it is battery or similar through the I2C Operation Region.
Comment 9 Marcin Mielniczuk 2015-08-20 12:43:34 UTC
Created attachment 185321 [details]
Output of dmesg after attaching the AC power
Comment 10 Mika Westerberg 2015-08-20 13:01:07 UTC
Created attachment 185351 [details]
Print I2C operation region errors
Comment 11 Mika Westerberg 2015-08-20 13:02:06 UTC
Marcin,

Can you try the attached patch and attach dmesg after resume. It should print out something if it was I2C operation region access that caused the I2C error.
Comment 12 Marcin Mielniczuk 2015-08-21 08:10:07 UTC
Kernel build fails for me:

CC      drivers/i2c/i2c-core.o
In file included from include/linux/kernel.h:13:0,
                 from include/linux/list.h:8,
                 from include/linux/module.h:9,
                 from drivers/i2c/i2c-core.c:30:
drivers/i2c/i2c-core.c: In function ‘acpi_i2c_space_handler’:
drivers/i2c/i2c-core.c:363:54: error: invalid type argument of ‘->’ (have ‘struct i2c_client’)
   pr_err("ACPI I2C (%u): %u failed with %d\n", client->addr,
                                                      ^
include/linux/printk.h:250:33: note: in definition of macro ‘pr_err’
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                                 ^
  CC      drivers/idle/intel_idle.o

Using gcc 4.8.4 (my build machine is running Linux Mint 17.2 - based on Ubuntu 14.04, and the target machine has Ubuntu 15.04, which uses gcc 4.9)
Comment 13 Marcin Mielniczuk 2015-08-21 08:11:40 UTC
It may be of importance: since I'm building anyway, I took the source of 4.1.6
Comment 14 Mika Westerberg 2015-08-21 08:16:02 UTC
Right, it was changed with 7ef85f5fdd ("i2c: core: Reduce stack size of acpi_i2c_space_handler()").

Please replace "client->" with "client." and it should compile fine.
Comment 15 Aaron Lu 2015-08-21 08:48:35 UTC
> Mike, do you have any ideas on the above error messages?

I guess I'm typing too fast, sorry for mis-typing your name Mika.

(In reply to Mika Westerberg from comment #8)
> Aaron, actually I think what is happening:
> 
> [  201.385503] i2c_designware 80860F41:00: timeout waiting for bus ready
> [  201.401599] PM: noirq resume of devices complete after 118.040 msecs
> 
> The I2C driver has not been resumed yet (it is done after _noirq and _late
> hooks have been run) and something is trying to access the bus already.

Makes sense.

> Since it cannot be the I2C children devices, I suspect that it is battery or
> similar through the I2C Operation Region.

OK, let's see what that is.

> [  201.283355] PM: resume from suspend-to-idle
> [  201.385503] i2c_designware 80860F41:00: timeout waiting for bus ready
> [  201.401599] PM: noirq resume of devices complete after 118.040 msecs
> [  201.758302] PM: early resume of devices complete after 356.530 msecs
> [  201.766355] i2c i2c-0: i2c read failed
> [  201.778279] i2c i2c-0: i2c read failed
> [  202.055521] PM: resume of devices complete after 297.155 msecs
> [  202.056211] PM: Finishing wakeup.

I just noticed that the "i2c read failed" message above is emitted by function acpi_gsb_i2c_read_bytes, which is used by the I2C operation region handler. And according to its position, it happened during the device resume phase. So the above two "i2c read failed" messages means some ASL code access I2C operation region fields(most probably battery related methods), but the bus is still not operational at that time, which is not normal?
Comment 16 Mika Westerberg 2015-08-21 08:56:21 UTC
The I2C host controller _PS0 is not yet executed at that point (since it has not been resumed yet).
Comment 17 Aaron Lu 2015-08-21 09:24:52 UTC
I think acpi_lpss_resume_early will power these controllers up?
Comment 18 Mika Westerberg 2015-08-21 09:28:08 UTC
It will but the error happens in _noirq resume which happens earlier if I'm not mistaken.
Comment 19 Aaron Lu 2015-08-21 12:35:23 UTC
The "timeout waiting for bus ready" error happens in _noirq resume phase, but do the two "i2c read failed" error also happen in that phase? I thought they happen in resume phase.
Comment 20 Mika Westerberg 2015-08-21 12:44:36 UTC
Yes, they happen in resume phase. The debug patch should reveal if I2C access happens from ASL (something else than acpi_gsb_i2c_read_bytes()).
Comment 21 Marcin Mielniczuk 2015-08-21 15:11:59 UTC
Created attachment 185431 [details]
Output of dmesg from the patched kernel 4.1.6
Comment 22 Aaron Lu 2015-08-24 02:39:09 UTC
[   60.767740] PM: resume from suspend-to-idle
[   60.990719] i2c_designware 80860F41:00: timeout waiting for bus ready
[   60.990723] ACPI I2C (102): 6 failed with -110

As Mika explained, this is expected, since the controller is still in powered off state. It's not clear to me why ASL needs to access the I2C operation region at such an early time.

[   61.006818] PM: noirq resume of devices complete after 238.850 msecs
[   61.363656] PM: early resume of devices complete after 356.665 msecs
[   61.371524] i2c i2c-0: i2c read failed
[   61.371537] ACPI I2C (102): 11 failed with -121
[   61.379530] i2c i2c-0: i2c read failed
[   61.379544] ACPI I2C (102): 11 failed with -121

Perhaps this is due to the I2C controller's resume is not done yet?

Marcin,
Please add initcall_debug to kernel cmdline and attach dmesg again after resume, thanks.

[   61.760262] PM: resume of devices complete after 396.528 msecs
[   61.760816] PM: Finishing wakeup.
Comment 23 Marcin Mielniczuk 2015-08-24 07:05:46 UTC
OK, so my command line in Grub would be now:

linux /boot/vmlinuz-4.1.6 root=<UUID> ro quiet splash $vt_handoff initcall_debug

I see that quiet removes most log messages. Should I disable this option too?
Comment 24 Aaron Lu 2015-08-24 07:17:41 UTC
(In reply to Marcin Mielniczuk from comment #23)
> OK, so my command line in Grub would be now:
> 
> linux /boot/vmlinuz-4.1.6 root=<UUID> ro quiet splash $vt_handoff
> initcall_debug
> 
> I see that quiet removes most log messages. Should I disable this option too?

Yes please.
Comment 25 Marcin Mielniczuk 2015-08-24 08:18:29 UTC
Created attachment 185591 [details]
Output of dmesg with initcall_debug
Comment 26 Marcin Mielniczuk 2015-08-24 08:21:31 UTC
Something I noticed once or twice, IIRC when I getting the first log on the patched kernel: sometimes after a wakeup the computer suspends back again without user's will.

Should I give you a dmesg when I experience this once again?
Comment 27 Mika Westerberg 2015-08-24 09:29:43 UTC
[   42.036319] i2c_designware 80860F41:00: timeout waiting for bus ready
[   42.036324] ACPI I2C (102): 6 failed with -110
[   42.052350] call 0000:00:02.0+ returned 0 after 170605 usecs

The device in question seems to be GFX adapter:

    Device (GFX0)
    {
        Name (_ADR, 0x00020000)  // _ADR: Address
...
        Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
        {
            If (^^^I2C1.GLID ())
            {

The driver itself does not do anything in _noirq phase but it's the PCI core which goes and powers on the device resulting _PS0 to be called.
Comment 28 Mika Westerberg 2015-08-24 09:35:50 UTC
I wonder if we can just uninstall the I2C OpReg when the I2C adapter is suspended and reinstall it again on resume?
Comment 29 Aaron Lu 2015-08-25 06:12:56 UTC
I think there are two problems:

1 GFX0 access I2C when I2C is not available yet and a possible solution is provided by Mika in comment #28;
2 the battery device(PNP0C0A:00) doesn't wait for its parent I2C controller(80860F41:00) to finish its resume callback in resume phase:
[   42.409345] PM: early resume of devices complete after 356.715 msecs
... ...
[   42.409637] calling  PNP0C0A:00+ @ 1468, parent: 80860F41:00
... ...
[   42.424747] i2c i2c-0: i2c read failed
[   42.424750] ACPI I2C (102): 11 failed with -121
[   42.424907] call PNP0C0A:00+ returned 0 after 14906 usecs
... ...
[   42.443969] calling  80860F41:00+ @ 1468, parent: platform
[   42.443977] call 80860F41:00+ returned 0 after 6 usecs

It's not clear to me how this can happen, and this may contribute to the above "i2c read failed" error message, which again cause the battery information lost.
Comment 30 Mika Westerberg 2015-08-26 11:31:17 UTC
Marcin,

Can you disable I2C Operation Region support from the kernel and see if things like touchscreen work better after suspend/resume cycle?

You can do it by unselecting following from Kconfig:

  Device Drivers -> I2C Support -> ACPI I2C Operation region support

(The symbol is CONFIG_ACPI_I2C_OPREGION).
Comment 31 Marcin Mielniczuk 2015-08-28 14:03:07 UTC
The bug persists, as before. Attaching new dmesg.
Comment 32 Marcin Mielniczuk 2015-08-28 14:04:39 UTC
Created attachment 186031 [details]
Output of dmesg after disabled ACPI I2C opregion support

Kernel options: root=<UUID> ro initcall_debug splash $vt_handoff
Comment 33 Mika Westerberg 2015-08-31 07:15:30 UTC
Hmm, I don't see any errors in the dmesg. Apart from the battery how about touch screen, is it working after resume?
Comment 34 Marcin Mielniczuk 2015-08-31 07:38:26 UTC
Sorry, I didn't write clearly. Battery info is missing altogether, even before suspend. Wireless and touchscreen are lost after suspend. 

I managed to reproduce the aforementioned "suspend-back" problem: I suspend the device then choose to wake it up. Shortly after the wakeup, the computer suspends back again on its own.

Maybe this is connected, see the new dmesg. I see some brcmfmac-related errors.
Comment 35 Marcin Mielniczuk 2015-08-31 07:39:40 UTC
Created attachment 186261 [details]
Output of dmesg - computer suspends back on its own.

Unfortunately with quiet and without initcall_debug. I've just realized.
Comment 36 Mika Westerberg 2015-08-31 08:06:54 UTC
Can you open the touch screen input device directly and check if it outputs anything after suspend?

Probably best way to test this is to boot to non-GUI mode and then run:

 # echo freeze > /sys/power/state

(wake up the device)

and then run

 # od -x /dev/input/event9

(or whatever is the correct device node for the touch screen).

Also can you attach output of "/proc/interrupts" to this bug.
Comment 37 Marcin Mielniczuk 2015-08-31 08:50:37 UTC
What does this `echo freeze` do? Suspend?

Should I do it on the mainline 4.1.2 kernel or my patched 4.1.6?
Comment 38 Mika Westerberg 2015-08-31 09:04:13 UTC
It does suspend-to-idle (same thing that you do now with the GUI). Please try with your patched kernel.
Comment 39 Marcin Mielniczuk 2015-08-31 11:01:39 UTC
My Ubuntu seems to refuse not to launch GUI. So I tried from tty

`od -x /dev/input/event9` shows touchscreen taps before suspending. Then

# echo freeze /sys/power/state
[...] brcmf_sdio_bus_rxctl: resumed on timeout
[...] brcmf_fil_cmd_data: bus is down. we have nothing to do
[...] brcmf_cfg80211_get_station: Could not get rate (-5)
[...] brcmf_fil_cmd_data: bus is down. we have nothing to do
[...] brcmf_cfg80211_get_station: Could not get rate (-5)
[...] brcmf_fil_cmd_data: bus is down. we have nothing to do
[...] brcmf_link_down: WLC_DISASSOC failed (-5)
[...] brcmf_cfg80211_reg_notifier: not a ISO3166 code

Then `od -x /dev/input/event9` shows touchscreen taps (still!)

Suspended from menu, woke up. `od -x ...` still works, touchscreen in the UI - not.

Then I tried the old kernel; the behavior is identical. The battery applet disappears after suspending via echo. (in patched it's missing all the time)
Comment 40 Marcin Mielniczuk 2015-08-31 11:02:28 UTC
Just to be clear: the first test was on the patched kernel.
Comment 41 Mika Westerberg 2015-08-31 11:12:39 UTC
So what comes to the touchscreen, I think it actually works just fine. Just the Ubuntu GUI somehow loses it.

Does output of 'xinput' change after suspend?
Comment 42 Marcin Mielniczuk 2015-08-31 12:04:36 UTC
Created attachment 186311 [details]
/proc/interrupts
Comment 43 Marcin Mielniczuk 2015-08-31 12:06:06 UTC
The Ubuntu GUI only uses it when using Suspend from GUI. The touchscreen works after suspend if the echo method is used.

Besides, the brcmfmac output from comment #39 appears BEFORE the screen turns off.
Comment 44 Mika Westerberg 2015-08-31 12:32:21 UTC
Created attachment 186321 [details]
I2C DW late suspend patch
Comment 45 Mika Westerberg 2015-08-31 12:33:30 UTC
Marcin, can you try the attached patch with unpatched kernel? It should resume I2C before battery tries to read from it.
Comment 46 Marcin Mielniczuk 2015-08-31 13:30:13 UTC
On 4.1.6: 

$ git apply i2c-dw-late-suspend.diff
error: patch failed: drivers/i2c/busses/i2c-designware-platdrv.c:343
error: drivers/i2c/busses/i2c-designware-platdrv.c: patch does not apply
Comment 47 Mika Westerberg 2015-08-31 13:45:20 UTC
Created attachment 186331 [details]
I2C DW late suspend patch for v4.1

Late suspend patch that should apply to v4.1.
Comment 48 Marcin Mielniczuk 2015-08-31 14:11:31 UTC
Did the first one apply to 4.2? I can build this as well, no problem, provided it's stable. Should I?

The kernel.org website contradicts itself, once says 4.2 is mainline, once that it's stable
Comment 49 Mika Westerberg 2015-08-31 14:21:33 UTC
Yeah, it applies on top of v4.2. v4.2 was just released so it will become stable soon.
Comment 50 Marcin Mielniczuk 2015-08-31 19:49:48 UTC
So,

this patch fixed the issue with battery, it's properly detected after suspend.

Wireless is lost as before, but no wonder - the problem seems to occur during suspend, cf. comment #43. Maybe the i2c read occurs after something has already suspended? The output from brcmfmac is the same as in comment #39

I noticed a few important things as for the touchscreen.

1. The touchscreen input fails if `echo freeze > /sys/power/state` is issued from the gnome-terminal, i.e. from the GUI session
2. When switching to tty, `od` manages to receive touchscreen info correctly.
3. When switching back to tty, the touchscreen starts working back again (simple switch GUI-tty-GUI suffices to work)

I have no clue why switching back to tty magically fixes everything.

I won't be fully available until the end of this week. No kernel rebuild will be possible for sure, debug info - I don't know whether I'll be able to provide it. Anyway, I'll reply ASAP, whenever this possible occurs.
Comment 51 Aaron Lu 2015-09-01 01:40:55 UTC
(In reply to Marcin Mielniczuk from comment #50)
> So,
> 
> this patch fixed the issue with battery, it's properly detected after
> suspend.

Hi Mika,

Thanks for the patch, any plan to send it upstream?
Comment 52 Mika Westerberg 2015-09-02 10:28:50 UTC
(In reply to Aaron Lu from comment #51)
> (In reply to Marcin Mielniczuk from comment #50)
> > So,
> > 
> > this patch fixed the issue with battery, it's properly detected after
> > suspend.
> 
> Hi Mika,
> 
> Thanks for the patch, any plan to send it upstream?

I'm not sure if that is general enough solution for this. I think we should handle platform devices (battery) which use I2C OpRegs in such way that the I2C host controller gets resumed whenever the region is accessed.

Unfortunately I have no idea how to do this. The battery device (BATC) seems to have I2C in its _DEP list but how can we take advantage of this is a good question.
Comment 53 Mika Westerberg 2015-09-02 10:31:49 UTC
Marcin,

Can you next try to disable the Ubuntu GUI completely and do the suspend/resume cycle directly from the console? I'm interested if the touch screen works before and after (use od -x).

If the touch screen works we can, at least, rule out kernel/driver issue.
Comment 54 Marcin Mielniczuk 2015-09-02 14:41:49 UTC
Mika,

I don't really know how I can do it. I've already tried booting with the text parameter on the kernel cmdline but this still boots me into the GUI.
Comment 56 Mika Westerberg 2015-09-03 14:01:39 UTC
Adding Tianyu who did the initial _DEP support for ACPI battery device.
Comment 57 Mika Westerberg 2015-09-03 15:28:09 UTC
Aaron, Tianyu,

The reason why I2C is not resumed before battery is that the ACPI battery driver is acpi_driver instead of a physical device driver (e.g a platform driver). So there is no parent <-> child relationship between the two.
Comment 58 Marcin Mielniczuk 2015-09-06 16:47:52 UTC
Mika,

Yes, the reason the usual text mode didn't work was the fact that Ubuntu 15.04 uses systemd. Thanks.

I booted the system into text mode, and after suspend `od -x ...` prints stuff after touching the screen. So it seems it's not a kernel thing.

I tried to find out the affected component. I did `xinit -- :1` to set up a new X server. Then I enter the newly created X server, suspended from the xterm inside. After suspending in this new environment `od -x ...` failed. Thus, I suspect that Xorg is the affected component. Should I report a bug against Xorg?

One more observation: `rmmod brcmfmac && modprobe brcmfmac` doesn't bring wlan back
Comment 59 Mika Westerberg 2015-09-08 10:58:01 UTC
Yeah, I think you should file the touchscreen issue against Xorg.

I'll try to look what can we do to the battery issue. I'm able to reproduce it here on Asus T100.

For the WLAN issue, I think it may be better to file a separate buf since it is not related to the battery problem.
Comment 60 Mika Westerberg 2015-09-08 10:58:50 UTC
s/buf/bug/
Comment 61 Marcin Mielniczuk 2015-09-08 12:49:57 UTC
Thanks!

Filed a bug #104201 against kernel.
Comment 62 Aaron Lu 2015-09-09 01:09:58 UTC
(In reply to Mika Westerberg from comment #59)
> Yeah, I think you should file the touchscreen issue against Xorg.

Sorry for the late reply and I'll also take a look at this battery to see if I can come up with something.
Comment 63 Mika Westerberg 2015-09-09 08:21:48 UTC
OK, thanks Aaron.
Comment 64 Aaron Lu 2015-09-09 08:46:18 UTC
(In reply to Mika Westerberg from comment #57)
> Aaron, Tianyu,
> 
> The reason why I2C is not resumed before battery is that the ACPI battery
> driver is acpi_driver instead of a physical device driver (e.g a platform
> driver). So there is no parent <-> child relationship between the two.

I see, thanks.

I just noticed that the battery driver will return -EPROBE_DEFER on initial call due to the unavailability of the I2C operation region, its device node should be moved to last of the dpm_list by deferred_probe_work_func, thus maintain the order of parent/child with I2C controller 80860F41:00. And since both the I2C controller and battery are not async suspend enabled, they should be resumed one by one with battery being resumed first. I'll prepare a patch for Marcin to dump the device list of dpm_list on suspend.
Comment 65 Aaron Lu 2015-09-09 08:51:24 UTC
Created attachment 187121 [details]
Dump device names according to dpm_list

Hi Marcin,

Please add initcall_debug and no_console_suspend to kernel cmdline option, and please attach the dmesg after system resumed, thanks.
Comment 66 Marcin Mielniczuk 2015-09-09 10:08:16 UTC
Of course I should apply the patch on a clean kernel, shouldn't I? Is it for 4.1.y?
Comment 67 Aaron Lu 2015-09-10 01:09:54 UTC
(In reply to Marcin Mielniczuk from comment #66)
> Of course I should apply the patch on a clean kernel, shouldn't I? Is it for
> 4.1.y?

Feel free to apply the patch on the tree you have been using, it's a one-line patch for a file that shouldn't have much changes. And yes, it can apply cleanly on 4.1.y.
Comment 68 Marcin Mielniczuk 2015-09-10 17:47:44 UTC
Created attachment 187291 [details]
dmesg after Aaron's patch
Comment 69 Aaron Lu 2015-09-11 03:20:57 UTC
Thanks for the result, the battery device node isn't moved to the tail of dpm_list, I'll see what happened there.
Comment 70 Mika Westerberg 2015-09-11 13:51:37 UTC
BTW, I found following patch floating around:

https://lkml.org/lkml/2015/9/10/218

Maybe it helps here?
Comment 71 Aaron Lu 2015-09-14 06:20:24 UTC
(In reply to Mika Westerberg from comment #70)
> BTW, I found following patch floating around:
> 
> https://lkml.org/lkml/2015/9/10/218
> 
> Maybe it helps here?

Yes, definitely worth a try.
Comment 72 Aaron Lu 2015-09-14 06:24:13 UTC
BTW, please enable debug output for drivers/base/dd.c and drivers/base/power/main.c so that we can see if the battery device and I2C host controller device have deferred probe happened, thanks.

https://www.kernel.org/doc/local/pr_debug.txt
Comment 73 Marcin Mielniczuk 2015-09-14 06:39:37 UTC
Aaron, 

should I rebuild the kernel to enable pr_debug now and attach the dmesg on the new kernel or should I use a kernel compiled in such a way for all the future dmesgs?
Comment 74 Aaron Lu 2015-09-14 07:43:22 UTC
(In reply to Marcin Mielniczuk from comment #73)
> Aaron, 
> 
> should I rebuild the kernel to enable pr_debug now and attach the dmesg on
> the new kernel or should I use a kernel compiled in such a way for all the
> future dmesgs?

For all the future dmesgs, thanks.
Comment 75 Aaron Lu 2015-09-29 02:57:14 UTC
Hi Marcin,

It looks like I mis-understood your comment #73. Can you please build a kernel according to comment #72 and attach dmesg? Thanks.
Comment 76 Marcin Mielniczuk 2015-09-29 17:51:53 UTC
Hmmm... will it suffice to prepend `CFLAGS += -DDEBUG` to make? Or should I do
`CFLAGS_drivers/base/dd.o := -DDEBUG` (and the same for the second)?
Comment 77 Aaron Lu 2015-09-30 07:55:19 UTC
Like this:

diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 6b2a84e7f2be..29e6f4fd61c6 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
 
+CFLAGS_dd.o :=-DDEBUG
diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index f94a6ccfe787..b38bbf0399ea 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_PM_GENERIC_DOMAINS)        +=  domain.o domain_governor.o
 obj-$(CONFIG_HAVE_CLK) += clock_ops.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
+CFLAGS_main.o := -DDEBUG
Comment 78 Marcin Mielniczuk 2015-10-25 14:18:04 UTC
Created attachment 191081 [details]
dmesg with debug enabled in Makefiles

Sorry it took me so long.
Comment 79 Aaron Lu 2015-10-26 08:07:09 UTC
Created attachment 191151 [details]
do not attach in dep_walk, instead, rely on defer probe to trigger it

Marcin,

It seems you missed the dd debug part. Never mind, I just prepared this patch to have all the required stuffs, please apply it on top of v4.3-rc7 and after boot, please attach dmesg, thanks.
Comment 80 Marcin Mielniczuk 2015-11-12 13:27:47 UTC
Aaron, 

Sorry it takes me such a long time but I'm simply a busy student :)

Can I apply it on top of v4.3 final?

Thanks
Comment 81 Aaron Lu 2015-11-13 02:13:08 UTC
Never mind and yes, it can be applied cleanly on top of v4.3.
Comment 82 Aaron Lu 2015-12-16 02:52:35 UTC
Any update?
Comment 83 Zhang Rui 2015-12-28 07:35:16 UTC
ping...
Comment 84 Marcin Mielniczuk 2015-12-28 09:10:16 UTC
Created attachment 198381 [details]
On patched 4.3

Sorry, I've been very busy but I finally have some free time after Christmas.

So, the battery detection doesn't work altogether on the patched kernel, even before the reboot.

The dmesg after suspend will follow in a while.
Comment 85 Marcin Mielniczuk 2015-12-28 09:11:34 UTC
Created attachment 198391 [details]
On patched 4.3, after suspend
Comment 86 Aaron Lu 2016-05-13 05:40:32 UTC
(In reply to Marcin Mielniczuk from comment #84)
> Created attachment 198381 [details]
> On patched 4.3
> 
> Sorry, I've been very busy but I finally have some free time after Christmas.
> 
> So, the battery detection doesn't work altogether on the patched kernel,
> even before the reboot.
> 
> The dmesg after suspend will follow in a while.

Sorry for the long delay.

Can you please set CONFIG_DEBUG_DRIVER to y and upload the dmesg after boot again?
Comment 87 Zhang Rui 2016-05-16 06:47:02 UTC
please attach the output of
"cat /sys/bus/acpi/devices/PNP0C0A\:0*/status" both before and after resume.
Comment 88 Marcin Mielniczuk 2016-05-18 19:29:45 UTC
Should anything unusual than CONFIG_DEBUG_DRIVER=y be enabled?
Comment 89 Aaron Lu 2016-05-20 01:52:18 UTC
Configs for ACPI battery/I2C designware host controllers/etc. should also be enabled and I believe you have already done that.
The CONFIG_DEBUG_DRIVER=y will enable debug print from driver core, I didn't see battery driver gets loaded from your last dmesg so would like to know more.

In you last dmesg, there is:
[    1.099413] calling  acpi_battery_init+0x0/0x2c @ 1
[    1.099428] initcall acpi_battery_init+0x0/0x2c returned 0 after 0 usecs
which means the battery driver has loaded, but I don't see any output of:
[    x.xxxxxx] bus: 'acpi': driver_probe_device: matched device PNP0C0A:00 with driver battery

It could be due to the bettery device's status indicates it's not present, thus Rui has also asked for the status output in comment #87.
Comment 90 Marcin Mielniczuk 2016-05-20 06:50:15 UTC
I doubt I still have the old sources on my computer. It's why I'm asking for the complete set of options.
Comment 91 Aaron Lu 2016-05-20 08:08:18 UTC
No need to use the old sources, just use the latest upstream code and enable that config when building, then attach dmesg after boot is enough for now.
The old patch is used to see the device order in dpm_list but I don't see battery device gets probed in your last dmesg so I need to make sure the battery driver works after boot first.

And please list the status of device PNP0C0A as requested in comment #87.
Comment 92 Marcin Mielniczuk 2016-05-20 08:57:27 UTC
I don't even have the old config with me. I can use the default config for my distro.
Comment 93 Marcin Mielniczuk 2016-05-28 20:17:27 UTC
I'll have access to the affected computer only for two days, after that I won't be able to access it for 3 weeks. So if I don't build the kernel tomorrow, I won't be able to test it during these two days.

So, can you please list all the non-standard kernel build options I should enable?
Comment 94 Zhang Rui 2016-05-30 03:21:45 UTC
can you reproduce the problem for now?
If yes, as the first step, please just attach the output of
"cat /sys/bus/acpi/devices/PNP0C0A\:0*/status" both before and after resume.
Comment 95 Zhang Rui 2016-06-20 02:11:33 UTC
ping...
Comment 96 Marcin Mielniczuk 2016-06-20 08:22:01 UTC
Please tell me which config options have to be enabled for the results to be worth anything. I don't have my old config, I'll base on the default config for the Ubuntu kernel, some old version in fact.

Because otherwise you'll soon tell me I should I compile with some kernel option I didn't remember to enable; the results will be worthless.
Comment 97 Zhang Rui 2016-06-20 08:35:26 UTC
hmmm, for the battery, please make sure CONFIG_ACPI_BATTERY is enabled, for the others, I don't know what drivers you were using, but if you want to reproduce the problem (device not working after resume), you must enable those drivers, right?
Comment 98 Marcin Mielniczuk 2016-06-20 09:04:27 UTC
Ok, I'm building the kernel, I'll hopefully report back today in the evening.
Comment 99 Marcin Mielniczuk 2016-06-22 10:47:31 UTC
The system froze during upgrade, now is unbootable. (I reported the freezes some time ago but can't find the report anymore)
I'll reinstall the system and try again. Expect a report at the beginning of July
Comment 100 Marcin Mielniczuk 2016-07-04 13:22:30 UTC
Installed Ubuntu 16.04 + mainline kernel 4.6.3 (unpatched). The device does not wake up from suspend (pressing the power button seems to do nothing)

What should I do next?
Comment 101 Aaron Lu 2016-07-05 03:32:23 UTC
The original problem is about battery not usable after resume, but now the system failed to resume. So looks like we will need to make it resume first and then continue to track the battery problem.
Just to make sure, do you have CONFIG_INPUT_SOC_BUTTON_ARRAY and CONFIG_GPIO_CRYSTAL_COVE set?
Comment 102 Marcin Mielniczuk 2016-07-05 13:02:47 UTC
CONFIG_INPUT_SOC_BUTTON_ARRAY=m
CONFIG_GPIO_CRYSTAL_COVE=m
Comment 103 Marcin Mielniczuk 2016-07-05 13:08:43 UTC
Exactly which hardware do these options pertain? Why are they important in our case?
Comment 104 Aaron Lu 2016-07-06 05:26:07 UTC
Based on my T100 knowledge, the CONFIG_GPIO_CRYSTAL_COVE is for the CrystalCove PMIC device where the power button is connected to and the CONFIG_INPUT_SOC_BUTTON_ARRAY is the driver for the 5 button keys around the tablet.
Comment 105 Aaron Lu 2016-07-11 01:57:04 UTC
Marcin,
Are you willing to continue debug this issue? If so, we will need to first figure out why resume breaks and then continue the debug of battery issue; if the problem doesn't affect you much and you do not want to spend time on it, we can close it. What do you think?
Comment 106 Marcin Mielniczuk 2016-07-11 07:19:50 UTC
Aaron,
of course.

But you need to know:
- I have continuous access to the device (every day) in July, August and September (possibly barring a week or two)
- Later on, (from October to December) my physical access to the computer will be sparse (one or two weekends a month)

Hopefully we'll manage to fix it during the vacation :)
Please tell me what should I do next.

One observation: the system doesn't really suspend! I called

    # echo freeze > /sys/power/state

Then the display froze. (but stayed lit) The system didn't react to pressing the power button. I raised the elephant and the system successfully rebooted.
Comment 107 Marcin Mielniczuk 2016-07-11 07:22:15 UTC
My above comment may be wrong. I've just tried it on my own laptop and pressing any key wakes the system up. The conclusion "the system doesn't really suspend" may simply be wrong.

Anyway, the conclusion is: the system reacts to raising the elephant.
Comment 108 Marcin Mielniczuk 2016-07-11 07:24:52 UTC
One more note: I'm using intel_idle.max_cstate=1 so as to avoid system freezes (see #109051)
This doesn't seem to affect the suspend behavior
Comment 109 Aaron Lu 2016-07-13 01:56:25 UTC
The easy way to see why freeze/resume stops working is git bisect, but that would cause much time, you may consider doing that if we run out of ideas.

The display stayed lit seems to indicate the freeze failed, probably it's related to the GPU driver. Booting the system with "nomodeset" will get rid of the GPU driver, but then I suppose the display will not be turned on after resume since there is no driver for it. But anyway, it should be worth a try to see if adding "nomodeset" to kernel cmdline could at least make suspend work.
Comment 110 Marcin Mielniczuk 2016-07-13 07:17:47 UTC
I'll do a simple check whether I can suspend at all with a kernel old enough. Some users have reported similar problems with suspend with T200TA on the Arch Linux forum.

After using nomodeset and 

    # echo freeze > /sys/power/state

the display stays lit. The GUI session disappears, in some virtual console there can be seen some messages similar to the ones in comment 39.

The system has not suspended (or has suspended and woken up instantly), since I can "switch" virtual consoles. Written in quotation marks, since pressing ctrl+alt+FN yields a black screen. for any N=1, 2, ..., 12

The system reacts to reisub.
Comment 111 Aaron Lu 2016-07-13 07:53:54 UTC
(In reply to Marcin Mielniczuk from comment #110)
> I'll do a simple check whether I can suspend at all with a kernel old
> enough.

That would be good to know.

> 
> After using nomodeset and 
> 
>     # echo freeze > /sys/power/state
> 
> the display stays lit. The GUI session disappears, in some virtual console
> there can be seen some messages similar to the ones in comment 39.

Is it possible to get rid of the broadcom SDIO wifi driver when testing freeze/resume?

> 
> The system has not suspended (or has suspended and woken up instantly),
> since I can "switch" virtual consoles. Written in quotation marks, since
> pressing ctrl+alt+FN yields a black screen. for any N=1, 2, ..., 12

Looks like the system is in a messed state...

> 
> The system reacts to reisub.

Thanks for the test.
Comment 112 Marcin Mielniczuk 2016-07-13 08:03:35 UTC
Under kernel 4.1.2 the device wakes up from suspend as it used to. So we have a regression.

(In reply to Aaron Lu from comment #111)
> 
> Is it possible to get rid of the broadcom SDIO wifi driver when testing
> freeze/resume?

Will it suffice to blacklist brcmfmac?

As for the git binsearch, this is maybe not the most efficient way, but it'll be feasible. There are Ubuntu builds for almost all 4.x kernel releases here: http://kernel.ubuntu.com/~kernel-ppa/mainline/
Comment 113 Aaron Lu 2016-07-13 09:15:37 UTC
(In reply to Marcin Mielniczuk from comment #112)
> Under kernel 4.1.2 the device wakes up from suspend as it used to. So we
> have a regression.
> 
> (In reply to Aaron Lu from comment #111)
> > 
> > Is it possible to get rid of the broadcom SDIO wifi driver when testing
> > freeze/resume?
> 
> Will it suffice to blacklist brcmfmac?

I think so, or you can also blacklist the sdio host controller driver, something like sdhci_acpi I guess.

> 
> As for the git binsearch, this is maybe not the most efficient way, but
> it'll be feasible. There are Ubuntu builds for almost all 4.x kernel
> releases here: http://kernel.ubuntu.com/~kernel-ppa/mainline/

This is not enough to get the bad commit, only the first bad kernel version.
Comment 114 Marcin Mielniczuk 2016-07-13 09:39:21 UTC
When I blacklisted sdhci_acpi, the system gets stuck in initramfs. I blacklisted brcmfmac, but the problem suspend persists.
Comment 115 Marcin Mielniczuk 2016-07-13 09:55:20 UTC
I forgot about nomodeset then. Now with modprobe.blacklist=brcmfmac nomodeset, I called the command from tty1. 

After issuing it, I get a blank tty with a single underscore. I can freely switch ttys 1-6. I can see everything I typed into tty1. As soon as I enter tty7, I get a black screen forever.

Reisub still responds.
Comment 116 Aaron Lu 2016-07-14 04:12:51 UTC
(In reply to Marcin Mielniczuk from comment #114)
> When I blacklisted sdhci_acpi, the system gets stuck in initramfs. I
> blacklisted brcmfmac, but the problem suspend persists.

My bad, I just realized the MMC storage, where the roofs lives, is also on sdhci host controller.
Comment 117 Marcin Mielniczuk 2016-07-21 07:33:25 UTC
Aaron,

since there were no new ideas from you, I tested two kernel versions. 
4.4.0 is good
4.5.0 is tainted

This may help make some hypotheses.
I'll test 4.4.7 today and report back.
Comment 118 Aaron Lu 2016-07-21 08:42:37 UTC
Thanks for the test, I do not have any idea what could be the cause for the freeze failure.
I guess 4.4.7 still works, since stable kernel doesn't pick major changes normally.
To know the exact commit that breaks freeze, only git bisect is possible I'm afraid.
Comment 119 Marcin Mielniczuk 2016-07-21 08:49:09 UTC
Ok, so I'll test the last from the 4.4.x series.

Can you take a look at the commits which enter kernel 4.5? Maybe you'll find something possibly related?
Comment 120 Marcin Mielniczuk 2016-07-21 08:50:05 UTC
A small note: after `echo freeze` on 4.5+ I can still type into the tty!
Comment 121 Marcin Mielniczuk 2016-07-21 08:50:31 UTC
(and ctrl+alt+del doesn't cause a reboot, reisub is needed)
Comment 122 Aaron Lu 2016-07-22 02:18:46 UTC
(In reply to Marcin Mielniczuk from comment #119)
> Ok, so I'll test the last from the 4.4.x series.
> 
> Can you take a look at the commits which enter kernel 4.5? Maybe you'll find
> something possibly related?

It doesn't seem possible, there are too many commits for a release:
[aaron@aaronlu linux]$ git log --no-merges v4.4..v4.5 |grep ^commit |wc -l
12080
Looking for a single subsystem may be possible, but we do not know which subsystem(e.g. gpu driver, wifi driver, etc.) makes the freeze fail yet.
Comment 123 Marcin Mielniczuk 2016-08-10 07:14:12 UTC
Can you give me the lo and hi for the git binsearch?
Comment 124 Aaron Lu 2016-08-11 01:26:36 UTC
v4.4 is the good(lo) one and v4.5 is the bad(hi) one.
You can start by:
$ git bisect start
$ git bisect good v4.4
$ git bisect bad v4.5
Comment 125 Zhang Rui 2016-09-02 06:38:45 UTC
ping...
Comment 126 Marcin Mielniczuk 2016-11-04 20:51:05 UTC
Sorry that I reply after such a long time. I started bisecting, but until I finished, I had to move to another city and don't have physical access to the machine anymore.

I guess I won't be of much use anymore. Sorry.
Comment 127 Zhang Rui 2016-12-22 01:19:14 UTC
okay.
Bug closed. Please feel free to reopen it if you can provide the bisect result.

Note You need to log in before you can comment on or make changes to this bug.