Bug 203877 - (bisect a3fbfae82b4cb3ff9928e29f34c64d0507cad874 ) Resume from suspend causes reset/crash and corruption on ASUS C302 (tpm_tis.interrupts=0 workaround the issue)
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: Intel Linux
Importance: P1 normal
Assignee: jarkko.sakkinen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-12 15:10 UTC by Chris Osgood
Modified: 2021-03-17 00:43 UTC
CC List: 10 users

See Also:
Kernel Version: 4.19.67+ 5.1+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
journal logs (29.76 KB, text/plain)
2020-04-12 21:56 UTC, Ferry Toth
acpidump from Acer C720P (94.42 KB, text/plain)
2020-06-26 16:35 UTC, Ferry Toth

Description Chris Osgood 2019-06-12 15:10:07 UTC
Starting with kernel 5.1 suspend on my ASUS C302 will crash and reset the machine when resumed. This machine is a Skylake (CAVE) based ChromeBook running Arch Linux.

The suspend itself seems to work to put the machine to sleep but when resuming the machine resets and displays the white "BIOS" boot screen. After this happens the machine can no longer boot using legacy mode and I have to boot to ChromeOS and re-enable legacy boot. Also the headphones get screwed up and will not work until the machine is suspended then resumed in ChromeOS with headphones plugged in. Seems like some sort of possible firmware corruption when the crash occurs.

There is nothing generated in the kernel logs when this happens. The last entry in the log is me pushing the power button to suspend the machine then nothing after that.

Kernel 5.0 works fine. This started with 5.1 and all versions up to the current 5.1.8 are affected.
Comment 1 Chris Osgood 2019-06-13 20:01:36 UTC
I meant to mention this is NOT fixed by the "'nosmt' vs hibernation triple fault during resume" patch in 949525fff5f722245ee2e2b1fe1e860e7e603579

Issue still present in kernel 5.1.9
Comment 2 Jesse Becker 2019-08-09 00:31:53 UTC
Still present in 5.2.6.

5.0.13 works, 5.1.0 onward does not.  Running on an Acer C720 (also a brainwashed chromebook), also running Arch.

This laptop seems to handle things a little better than the ASUS: legacy boot still works, and I do not see any evidence of audio problems or firmware corruption (the filesystems are, however, unhappy with the hard shutdown).  There are no logs after the suspend.
Comment 3 Chris Osgood 2019-08-27 23:43:19 UTC
This bug or a version of it has now been ported over to kernel 4.19.68 (at least that's the one I tested).
 
I suppose I should update the kernel version on this bug but want to see if I can get more confirmation first.
 
I'm running out of workable kernels now. :/
Comment 4 Chris Osgood 2019-08-29 14:15:21 UTC
I find that Arch Linux kernel 4.19.66-1-lts works. Then kernel 4.19.67-1-lts and above is broken (though I have not tested 4.19.69).

So if someone wants to bisect whatever changed between .66 and .67 then that's likely the problem code. Maybe something to do with power management? I'm not sure. This might also find the problem in the 5.1+ kernels.
Comment 5 Jesse Becker 2019-08-29 14:20:33 UTC
> changed between .66 and .67 then

And that is also common to the delta between 5.0.13 and 5.1.0?  That may help narrow down the scope of changes.
Comment 6 Chen Yu 2019-09-09 09:18:07 UTC
(In reply to Chris Osgood from comment #4)
> I find that Arch Linux kernel 4.19.66-1-lts works. Then kernel 4.19.67-1-lts
> and above is broken (though I have not tested 4.19.69).
> 
Since .66 and .67 carry the -lts suffix, it might be more straightforward to test an upstream vanilla kernel.

> So if someone wants to bisect whatever changed between .66 and .67 then
> that's likely the problem code. Maybe something to do with power management?
> I'm not sure. This might also find the problem in the 5.1+ kernels.
Yeah, it looks like you are at the front line to bisect this out : )
Comment 7 Chen Yu 2019-09-30 01:09:48 UTC
@Chris, just wonder if you have time for a bisect if this issue is still there?
Comment 8 Chris Osgood 2019-09-30 14:39:18 UTC
Still broken in kernel 5.3.1.

I haven't had a chance to bisect the issue. I'm busy fighting all sorts of fires related to kernel 5, which in general has been a disaster. Almost every new version breaks more stuff (5.1 broke the chromebooks, 5.2 broke some of our servers, 5.3 broke the e1000e driver, the list goes on).

So I'm still stuck running older version 4 kernels for a while.
Comment 9 Ferry Toth 2019-10-23 07:07:35 UTC
I confirm this issue also occurs on Acer 720P (ex)chromebook (with touch screen + 4Gb ram). After updating Ubuntu to 19.10 (linux v5.3) it suspends fine but will not resume. The issue does not occur  with Ubuntu 19.04 (linux v5.0). 

With 720P the machine just reboots to legacy, no other issues.

Using Ubuntu kernel PPA versions I tested linux v5.0, v5.1, v5.2 and found the issue first occurs with v5.1.

The number of backported patches from 4.19.66 -> 67 should be very limited right? Maybe we can identify from the commit message?
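One way to act on this, sketched under assumptions: list the commits between the two stable tags and keep only those whose subject starts with a given subsystem prefix. The helper name filter_by_prefix is hypothetical, and the tag names v4.19.66 and v4.19.67 assume a linux-stable checkout.

```shell
# filter_by_prefix is a hypothetical helper, not part of git.
# It reads `git log --oneline` output on stdin and prints only the commits
# whose subject line starts with the given prefix (for example "tpm").
filter_by_prefix() {
    awk -v p="$1" '{
        subject = $0
        sub(/^[^ ]+ /, "", subject)      # drop the abbreviated hash
        if (index(subject, p) == 1) print
    }'
}

# Assumed usage inside a linux-stable checkout:
#   git log --oneline v4.19.66..v4.19.67 | filter_by_prefix "tpm"
```

This only narrows the list by commit message; a commit with an unrelated-looking subject could still be the cause, so it does not replace a bisect.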
Comment 10 Ferry Toth 2019-10-23 18:48:14 UTC
From Ubuntu ppa v4.19.80 resumes correctly.
Comment 11 Ferry Toth 2019-11-26 19:29:58 UTC
Any news here?
Comment 12 Jane Soko 2019-11-27 02:53:49 UTC
For me, the original culprit was [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]

> tpm: take TPM chip power gating out of tpm_transmit()

[a3fbfae82b4cb3ff9928e29f34c64d0507cad874]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874
Comment 13 Ferry Toth 2019-12-01 22:33:09 UTC
This evening I built ubuntu eoan master-next.

This is the upcoming kernel 5.3.0-24, based on linux 5.3.13 + UBUNTU: SAUCE: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" + UBUNTU: SAUCE: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"

(see https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/log/?h=master-next)

Unfortunately this kernel does not resolve the issue in this bug.
Comment 14 Chen Yu 2019-12-02 01:08:50 UTC
(In reply to Ferry Toth from comment #13)
> This evening I built ubuntu eoan master-next.
> 
> This is the to be kernel 5.3.0-24 based of linux 5.3.13 + UBUNTU: SAUCE:
> Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" + UBUNTU: SAUCE:
> Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
> 
These two patches do not revert all the changes introduced in a3fbfae82b4cb3ff9928e29f34c64d0507cad874, do they?
> (see
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/log/
> ?h=master-next)
> 
> Unfortunately this kernel does not resolve the issue in this bug.
How about unloading the tpm module, or even unsetting CONFIG_TCG_TPM and rebuilding the kernel?
I have encountered a hibernation issue where the system hangs when entering S4 because tpm is unable to shut down the devices during that phase.
Comment 15 Ferry Toth 2019-12-02 10:02:38 UTC
Maybe "tpm: take TPM chip power gating out of tpm_transmit()" needs reverting too.
Comment 16 Ferry Toth 2019-12-13 23:17:36 UTC
I have long had "tpm_tis.force=1" on the kernel command line.

On 4.15 this causes:
tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
genirq: Flags mismatch irq 9. 00000000 (tpm0) vs. 00000080 (acpi)
tpm tpm0: Unable to request irq: 9 for probe
tpm_tis 00:08: can't request region for resource [mem 0xfed40000-0xfed44fff]
tpm_tis: probe of 00:08 failed with error -16
genirq: Flags mismatch irq 8. 00000080 (rtc0) vs. 00000000 (tpm0)
tpm_inf_pnp 00:08: Found TPM with ID IFX0102

On 5.3.0:
tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis tpm_tis: Could not get TPM timeouts and durations
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102

So something that normally goes wrong now goes more wrong.
Comment 17 Ferry Toth 2019-12-15 19:06:36 UTC
And 5.3.0 with tpm_tis.force=1

ferry@chromium:~$ tpm_version 
Tspi_Context_Connect failed: 0x00003011 - layer=tsp, code=0011 (17), Communication failure

with 4.15.0 with tpm_tis.force=1

ferry@chromium:~$ tpm_version 
  TPM 1.2 Version Info:
  Chip Version:        1.2.4.32
  Spec Level:          2
  Errata Revision:     3
  TPM Vendor ID:       IFX
  Vendor Specific data: 0420036f 0074706d 3338ffff ff
  TPM Version:         01010000
  Manufacturer Info:   49465800
Comment 18 Jan De Luyck 2020-01-03 10:39:27 UTC
I'm seeing something similar on an Asus Zenbook UX305U. Laptop suspends correctly, on resume I am greeted by the bios boot screen (I guess it crashes really quickly).

Nothing visible in the logs in between the suspend and the reboot.
Comment 19 Jan De Luyck 2020-03-21 11:35:03 UTC
Is there anything we can do to further diagnose this issue?
Comment 20 Chris Osgood 2020-03-22 21:25:05 UTC
This is still broken as of 5.5.10

Probably the only way to get this fixed is via a 3rd party kernel developer like Ubuntu/Fedora/GalliumOS or something because currently it seems to be ignored by the mainline kernel devs.

I still haven't had time to bisect the issue and have my kernel pinned at 4.19.66. I think by far the easiest way to find the problem is by comparing 4.19.66 to 4.19.67 and see what changed.

Anyone tried the latest GalliumOS? What kernel does it use?
Comment 21 Chen Yu 2020-03-29 15:39:53 UTC
Hi,
Is it possible to rebuild the latest kernel without "CONFIG_TCG_TPM", apply the following debug patch from
https://patchwork.kernel.org/patch/11464059/
and check:
echo test_resume > /sys/power/disk
echo disk > /sys/power/state
and wait for 5 seconds to see if it could resume automatically?
Comment 22 Ferry Toth 2020-03-31 20:51:36 UTC
Must we disable 'CONFIG_TCG_TPM'? Because it is auto selected:
Selected by [y]:                                                                                                                                                                                                                                                         
- IMA [=y] && INTEGRITY [=y] && HAS_IOMEM [=y] && !UML
Comment 23 Ferry Toth 2020-03-31 21:45:35 UTC
I disabled INTEGRITY and then TCG_TPM in menuconfig and built the kernel.

Then:
root@chromium:~# echo test_resume > /sys/power/disk
root@chromium:~# echo disk > /sys/power/state
-bash: echo: write error: No space left on device

It seems that with this test it is trying to hibernate instead of suspend.
Comment 24 Ferry Toth 2020-03-31 21:55:45 UTC
Closing the laptop lid, it goes to sleep properly (suspend LED blinks). Opening the lid again brings me directly to the boot screen.

So no change. 

This is with vanilla 5.6.0 + ubuntu 'sauce' + the patch from #21 above, with TCG_TPM disabled.
Comment 25 Jan De Luyck 2020-04-10 08:55:44 UTC
The patch should only give more debug output - did you capture anything?
Comment 26 Ferry Toth 2020-04-11 18:40:57 UTC
Nope. All I got was the error.
Comment 27 Chen Yu 2020-04-12 10:40:31 UTC
Ferry, Chris,
I thought you were trying to test hibernation. Let's switch back to
suspend to mem.
The first thing is to figure out what suspend mode you are using:
1. boot with the same kernel with TPM disabled.
2. cat /sys/power/mem_sleep,
    echo 'N' > /sys/module/printk/parameters/console_suspend
    echo 'Y' > /sys/module/printk/parameters/initcall_debug
and leverage pm_test mode to narrow down:

3. echo the different modes to /sys/power/pm_test,
    starting from right to left (freezer to devices to platform...)
   [none] core processors platform devices freezer
    and check whether the system resumes within 5 seconds.

For example:
  echo freezer > /sys/power/pm_test
  echo mem > /sys/power/state

If it resumes successfully, then
   save the dmesg and launch the next test mode:
  echo devices > /sys/power/pm_test
  echo mem > /sys/power/state

and go on until you see a hang during resume.
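The sequence above can be sketched as a small loop. This is only an illustration: pm_test_walk, POWER_DIR and DRY_RUN are hypothetical names, not kernel interfaces, and by default the function only prints the commands it would run. With DRY_RUN=0 it must be run as root and will actually suspend the machine.

```shell
# Hypothetical knobs for this sketch (not kernel interfaces):
#   POWER_DIR  where the power-management sysfs files live
#   DRY_RUN    1 = only print the commands, 0 = really write to sysfs
POWER_DIR=${POWER_DIR:-/sys/power}
DRY_RUN=${DRY_RUN:-1}

# Walk the pm_test levels from the shallowest (freezer) to a real suspend
# (none), i.e. right to left in the list shown by /sys/power/pm_test.
pm_test_walk() {
    for level in freezer devices platform processors core none; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "echo $level > $POWER_DIR/pm_test"
            echo "echo mem > $POWER_DIR/state"
        else
            echo "$level" > "$POWER_DIR/pm_test" || return 1
            echo mem > "$POWER_DIR/state" || return 1
            # Only reached if the machine resumed; keep the log per level.
            dmesg > "dmesg-$level.txt"
        fi
    done
}

pm_test_walk
```

The first level for which no dmesg file is written is the one where resume failed.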
Comment 28 Ferry Toth 2020-04-12 21:54:28 UTC
ferry@chromium:~$ tpm_version 
Tspi_Context_Connect failed: 0x00003011 - layer=tsp, code=0011 (17), Communication failure

ferry@chromium:~$ cat /sys/power/mem_sleep
s2idle [deep]
root@chromium:~# echo 'N' > /sys/module/printk/parameters/console_suspend
root@chromium:~# echo 'Y' > /sys/module/printk/parameters/initcall_debug
-bash: /sys/module/printk/parameters/initcall_debug: Permission denied
(access is denied because the parameter does not exist)


Going from right to left, only "none" fails (i.e. it does not return but goes to the boot screen).
I captured journal tails in pm_test.txt
Comment 29 Ferry Toth 2020-04-12 21:56:08 UTC
Created attachment 288401 [details]
journal logs
Comment 30 Ferry Toth 2020-04-12 21:57:55 UTC
In line 329 it says:
echo processors > /sys/power/pm_test

But that of course was:
echo none > /sys/power/pm_test
Comment 31 Chen Yu 2020-04-13 03:50:19 UTC
(In reply to Ferry Toth from comment #28)
> ferry@chromium:~$ tpm_version 
> Tspi_Context_Connect failed: 0x00003011 - layer=tsp, code=0011 (17),
> Communication failure
> 
> ferry@chromium:~$ cat /sys/power/mem_sleep
> s2idle [deep]
> root@chromium:~# echo 'N' > /sys/module/printk/parameters/console_suspend
> root@chromium:~# echo 'Y' > /sys/module/printk/parameters/initcall_debug
> -bash: /sys/module/printk/parameters/initcall_debug: Permission denied
> (access is denied because the parameter does not exist)
> 
> 
> Going from right to left only none fails (i.e. does not return but goes to
> the boot screen).
> I captured journal tails in pm_test.txt
This shows that the system triggers a reboot while resuming from S3, at the
point where the BIOS is about to jump back to the vector in the OS. Did you update
your BIOS recently?
How about
echo deep > /sys/power/mem_sleep
rtcwake -m mem -s 30? 


And how about:
echo s2idle > /sys/power/mem_sleep
rtcwake -m freeze -s 30?
Comment 32 Ferry Toth 2020-04-13 17:17:39 UTC
No, this is a brainwashed Chromebook. Once Linux is installed, it is impossible to update the BIOS:
ferry@chromium:~$ sudo dmidecode  
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
10 structures occupying 397 bytes.
Table at 0x7F782020.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: coreboot
        Version:                                                                   
        Release Date: 03/02/2017
        ROM Size: 8192 kB
        Characteristics:
                PCI is supported
                PC Card (PCMCIA) is supported
                BIOS is upgradeable
                Selectable boot is supported
                ACPI is supported
                Targeted content distribution is supported
        BIOS Revision: 4.0
        Firmware Revision: 0.0

I'll reboot into linux-5.6 and answer your other suggestions.
Comment 33 Ferry Toth 2020-04-13 17:44:38 UTC
> echo deep > /sys/power/mem_sleep
> rtcwake -m mem -s 30? 

Judging by the blinking LED, it goes into suspend. Waking takes me directly to the BIOS screen.

> echo s2idle > /sys/power/mem_sleep
> rtcwake -m freeze -s 30?

Judging by the non-blinking LED, it is not really suspended, although everything else looks suspended.
Pressing keyboard wakes normally.

journalctl -b -e:
apr 13 19:36:15 chromium kernel: PM: suspend entry (s2idle)
apr 13 19:36:15 chromium kernel: Filesystems sync: 0.000 seconds
apr 13 19:36:15 chromium kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
apr 13 19:36:15 chromium kernel: OOM killer disabled.
apr 13 19:36:15 chromium kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
apr 13 19:36:15 chromium kernel: printk: Suspending console(s) (use no_console_suspend to debug)
apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Stopping disk
apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt blocked
apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt unblocked
apr 13 19:36:15 chromium kernel: hpet_rtc_timer_reinit: 14 callbacks suppressed
apr 13 19:36:15 chromium kernel: hpet: Lost 6192 RTC interrupts
apr 13 19:36:15 chromium kernel: ath: phy0: ASPM enabled: 0x43
apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Starting disk
apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Resetting device
apr 13 19:36:15 chromium kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
apr 13 19:36:15 chromium kernel: ata1.00: configured for UDMA/100
apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Wait for completion timed out.
apr 13 19:36:15 chromium kernel: OOM killer enabled.
apr 13 19:36:15 chromium kernel: Restarting tasks ... done.
apr 13 19:36:15 chromium kernel: PM: suspend exit
Comment 34 Chen Yu 2020-04-14 02:31:53 UTC
(In reply to Ferry Toth from comment #33)
> > echo deep > /sys/power/mem_sleep
> > rtcwake -m mem -s 30? 
> 
> Looking from blinking LED goes into suspend. Waking takes me directly to
> BIOS screen.
> 
> > echo s2idle > /sys/power/mem_sleep
> > rtcwake -m freeze -s 30?
> 
> Looking from non-blinking LED is not really suspended, except everything
> else looks suspended.
> Pressing keyboard wakes normally.
> 
> journalctl -b -e:
> apr 13 19:36:15 chromium kernel: PM: suspend entry (s2idle)
> apr 13 19:36:15 chromium kernel: Filesystems sync: 0.000 seconds
> apr 13 19:36:15 chromium kernel: Freezing user space processes ... (elapsed
> 0.002 seconds) done.
> apr 13 19:36:15 chromium kernel: OOM killer disabled.
> apr 13 19:36:15 chromium kernel: Freezing remaining freezable tasks ...
> (elapsed 0.001 seconds) done.
> apr 13 19:36:15 chromium kernel: printk: Suspending console(s) (use
> no_console_suspend to debug)
> apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
> apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Stopping disk
> apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt blocked
> apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt unblocked
> apr 13 19:36:15 chromium kernel: hpet_rtc_timer_reinit: 14 callbacks
> suppressed
> apr 13 19:36:15 chromium kernel: hpet: Lost 6192 RTC interrupts
> apr 13 19:36:15 chromium kernel: ath: phy0: ASPM enabled: 0x43
> apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Starting disk
> apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Resetting device
> apr 13 19:36:15 chromium kernel: ata1: SATA link up 6.0 Gbps (SStatus 133
> SControl 300)
> apr 13 19:36:15 chromium kernel: ata1.00: configured for UDMA/100
> apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Wait for completion
> timed out.
> apr 13 19:36:15 chromium kernel: OOM killer enabled.
> apr 13 19:36:15 chromium kernel: Restarting tasks ... done.
> apr 13 19:36:15 chromium kernel: PM: suspend exit

suspend to idle works as expected. 

Could you switch back to linux v5.0, as you mentioned in Comment 9, and check what is the default suspend mode :
cat /sys/power/mem_sleep
and try 
 echo deep > /sys/power/mem_sleep
 rtcwake -m mem -s 30
 
 echo s2idle > /sys/power/mem_sleep
 rtcwake -m freeze -s 30
I was wondering whether the default suspend mode is s2idle on v5.0, or whether your BIOS has implicitly been changed, as the OS has no way to control the flow once the system is suspended to S3.
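For comparing kernels, the active mode can be pulled out of mem_sleep with a one-liner. A minimal sketch; active_sleep_mode is a hypothetical helper, and the optional file argument exists only so it can be tried on a saved copy of the output.

```shell
# active_sleep_mode is a hypothetical helper for this sketch.
# The kernel marks the active mode with square brackets, e.g. "s2idle [deep]",
# so print whatever sits between the brackets.
active_sleep_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}

# Assumed usage on the machine under test:
#   active_sleep_mode
```

If this prints deep on both the working and the failing kernel, the default mode did not change between them.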
Comment 35 Ferry Toth 2020-04-14 19:28:41 UTC
On Ubuntu 19.10 linux 4.15.0-1079-oem (working well):
ferry@chromium:~$ cat /sys/power/mem_sleep
s2idle [deep]
root@chromium:~# rtcwake -m mem -s 30
rtcwake: assuming RTC uses UTC ...
rtcwake: /dev/rtc0: unable to find device: No such file or directory
Comment 36 Ferry Toth 2020-04-14 19:49:52 UTC
On Ubuntu 19.10 linux 5.3.0-46 (resume bad):
ferry@chromium:~$ cat /sys/power/mem_sleep
s2idle [deep]
root@chromium:~# rtcwake -m mem -s 30

After 30 sec takes me to boot screen.

root@chromium:~# echo s2idle > /sys/power/mem_sleep
root@chromium:~# rtcwake -m freeze -s 30
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "freeze" using /dev/rtc0 at Tue Apr 14 19:43:43 2020

(wakes)

journalctl -b -e:
apr 14 21:43:46 chromium kernel: PM: suspend entry (s2idle)
apr 14 21:43:46 chromium kernel: Filesystems sync: 0.000 seconds
apr 14 21:43:46 chromium kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
apr 14 21:43:46 chromium kernel: OOM killer disabled.
apr 14 21:43:46 chromium kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
apr 14 21:43:46 chromium kernel: printk: Suspending console(s) (use no_console_suspend to debug)
apr 14 21:43:46 chromium kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
apr 14 21:43:46 chromium kernel: sd 0:0:0:0: [sda] Stopping disk
apr 14 21:43:46 chromium kernel: ACPI: EC: interrupt blocked
apr 14 21:43:46 chromium kernel: ACPI: EC: interrupt unblocked
apr 14 21:43:46 chromium kernel: ath: phy0: ASPM enabled: 0x43
apr 14 21:43:46 chromium kernel: sd 0:0:0:0: [sda] Starting disk
apr 14 21:43:46 chromium kernel: atmel_mxt_ts 1-004a: Resetting device
apr 14 21:43:46 chromium kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
apr 14 21:43:46 chromium kernel: ata1.00: configured for UDMA/100
apr 14 21:43:46 chromium kernel: atmel_mxt_ts 1-004a: Wait for completion timed out.
apr 14 21:43:46 chromium kernel: OOM killer enabled.
apr 14 21:43:46 chromium kernel: Restarting tasks ... done.
apr 14 21:43:46 chromium kernel: PM: suspend exit
Comment 37 Ferry Toth 2020-04-14 19:58:06 UTC
Also, with s2idle enabled on 5.3.0, if I close the laptop lid it does not reboot when I open the lid.
Additionally, it does not wake when I open the lid or press a key.

But it does wake when I press the power button.
Comment 38 Chen Yu 2020-04-27 05:47:38 UTC
(In reply to Ferry Toth from comment #36)
> On Ubuntu 19:10 linux 5.3.0-46 (resume bad):
> ferry@chromium:~$ cat /sys/power/mem_sleep
> s2idle [deep]
> root@chromium:~# rtcwake -m mem -s 30
> 
> After 30 sec takes me to boot screen.
> 
So it reboots during resume. It is quite hard to track at which stage it reboots without a UART log. Since S3 works with the old kernel, the most straightforward way is to do a git bisect to find the offending commit. Otherwise, we would have to add hack code to the resume path to spin the kernel at different places in order to narrow it down.
Comment 39 Ferry Toth 2020-06-14 13:34:21 UTC
I just tried from Ubuntu kernel ppa:
4.19.128 OK
5.0 OK (1c163f4c7b3f621efff9b28a47abb36f7378d783)
5.0.21 OK
5.1-rc1 NOK (9e98c678c2d6ae3a17cb2de55d17f69dddaa231b)
I'll try to bisect (13 steps)
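As a rough sanity check on the step count: bisection halves the candidate range each round, so the number of steps is about the base-2 logarithm of the number of commits between the good and bad revisions. A minimal sketch; bisect_steps is a hypothetical helper, and the commit count in the usage note is only an illustrative input, not a measured value.

```shell
# bisect_steps is a hypothetical helper: given a number of candidate commits,
# print how many halving rounds are needed to get down to a single commit
# (the ceiling of log2 of the count).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Assumed usage inside a kernel checkout:
#   bisect_steps "$(git rev-list --count v5.0..v5.1-rc1)"
#   git bisect start v5.1-rc1 v5.0     # bad revision first, then good
#   ... build, boot, test suspend/resume, then: git bisect good | git bisect bad
```

For example, bisect_steps 8192 prints 13, so 13 steps corresponds to a range of several thousand commits.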
Comment 40 Ferry Toth 2020-06-14 21:02:09 UTC
Pff, this is slow. Up to now I have:
36011ddc78395b59a8a418c37f20bcc18828f1ef good
6bc3fe8e7e172d5584e529a04cf9eec946428768 bad
a50243b1ddcdd766d0d17fbfeeb1a22e62fdc461 now building

10 steps to go. I'll need a few more evenings to complete this.
Comment 41 Ferry Toth 2020-06-15 22:44:04 UTC
While bisecting (I hope to complete tomorrow) I found a workaround (that works for me on Acer 720P).

I always had tpm_tis.force=1 on the kernel command line.
Now I added tpm_tis.interrupts=0.

Wakes fine now with linux 5.6.0.

Full command line:
Kernel command line: BOOT_IMAGE=/@boot/vmlinuz-5.6.0-1011-oem root=UUID=17d2cd1d-cc37-446d-ac0b-933def63c867 ro rootflags=subvol=@ quiet splash tpm_tis.force=1 tpm_tis.interrupts=0 modprobe.blacklist=ehci_hcd,ehci-pci vt.handoff=7
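To confirm that the workaround is actually active on a running system, the booted command line can be checked. A minimal sketch; has_tpm_irq_workaround is a hypothetical helper, and the optional file argument exists only so it can be tried on a saved command line. How to make the parameter persistent depends on the bootloader and is not shown here.

```shell
# has_tpm_irq_workaround is a hypothetical helper for this sketch.
# It succeeds (exit status 0) if the kernel command line contains the exact
# token tpm_tis.interrupts=0, and fails otherwise.
has_tpm_irq_workaround() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qxF 'tpm_tis.interrupts=0'
}

# Assumed usage on the running system:
#   if has_tpm_irq_workaround; then echo "workaround active"; fi
```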
Comment 42 Chris Osgood 2020-06-16 00:05:23 UTC
(In reply to Ferry Toth from comment #41)
> While bysecting (I hope to complete tomorrow) I found a workaround (that
> works for me on Acer 720P).
> 
> I always had tpm_tis.force=1 on the kernel command line.
> Now I added tpm_tis.interrupts=0.
> 
> Wakes fine now with linux 5.6.0.

First of all, thanks for bisecting this!

I can confirm setting tpm_tis.interrupts=0 works for me on ASUS C302 kernel 5.7.2 (Arch latest) and kernel 5.4.46 (Arch LTS). Previously I had no tpm_tis.interrupts setting so it must default to on.

So the question is, why does tpm_tis.interrupts only cause problems on newer kernels? Is it a kernel bug?
Comment 43 Ferry Toth 2020-06-16 20:59:53 UTC
I haven't finished bisecting yet but I am now between
5af7f115886f7ec193171e2e49b8000ddd1e7147 bad 
2f257402ee981720d65080b1e3ce19f693f5c9c3 good
9d4023ed4db6e01ff50cb68d782202c2f50760ae testing this now

This is the next-tpm merge, so it may very well be that I land at Jane's conclusion (#12 above).
Maybe the author has an idea what is going on, Jarkko?
Comment 44 jarkko.sakkinen 2020-06-17 23:24:01 UTC
Please re-test it with v5.8-rc1.
Comment 45 Ferry Toth 2020-06-18 21:13:05 UTC
@jarko just tested with v5.8-rc1, result is the same as 5.1 - 5.6: crashes on resume to boot screen, but setting tpm_tis.interrupts=0 resolves the situation.

Note the original reporter has a brainwashed chromebook Asus C302, I have a brainwashed chromebook Acer 720P. In both cases tpm_tis.interrupts=0 solves the problem.

Other reporters may be experiencing unrelated issues.

I had no time to bisect further today; I will continue tomorrow evening and see if I can confirm [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874
Comment 46 Ferry Toth 2020-06-22 22:13:59 UTC
Indeed, after finishing bisecting:
a3fbfae82b4cb3ff9928e29f34c64d0507cad874 is the first bad commit
Comment 47 jarkko.sakkinen 2020-06-23 01:26:00 UTC
The specific commit that might fix the issue is b160c94be5d2816b62c8ac338605668304242959, which first appeared in v5.7-rc3.
Comment 48 Chen Yu 2020-06-23 01:44:56 UTC
Thanks for bisecting, Ferry.
Hi Jarkko, 
It looks like Ferry has tested v5.8-rc1 and the issue is still there.
Comment 49 jarkko.sakkinen 2020-06-24 22:48:34 UTC
(In reply to Ferry Toth from comment #45)
> @jarko just tested with v5.8-rc1, result is the same as 5.1 - 5.6: crashes
> on resume to boot screen, but setting tpm_tis.interrupts=0 resolves the
> situation.
> 
> Note the original reporter has a brainwashed chromebook Asus C302, I have a
> brainwashed chromebook Acer 720P. In both cases tpm_tis.interrupts=0 solves
> the problem.
> 
> Other reporters may be experiencing unrelated issues.
> 
> I had no time to bisect further today, will do tomorrow evening. and see if
> I can confirm [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874

I found something truly weird based on your dmesg outputs:

% git --no-pager grep IFX0102 drivers/char/tpm
drivers/char/tpm/tpm_infineon.c:	{"IFX0102", 0},
drivers/char/tpm/tpm_tis.c:	{"IFX0102", 0},		/* Infineon */

I.e. 

tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis tpm_tis: Could not get TPM timeouts and durations
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102

The HID is associated with two drivers, and the last log entry tells us that tpm_inf_pnp was successfully initialized.

Given that tpm_tis already showed problems in v4.15, this suggests that the tpm_tis driver should not include IFX0102.

Looking at

Author: Kylene Jo Hall <kjhall@us.ibm.com>
Date:   Sat Apr 22 02:39:52 2006 -0700

    [PATCH] tpm: add HID module parameter
    
    I recently found that not all BIOS manufacturers are using the specified
    generic PNP id in their TPM ACPI table entry.  I have added the vendor
    specific IDs that I know about and added a module parameter that a user can
    specify another HID to the probe list if their device isn't being found by the
    default list.
    
    Signed-off-by: Kylene Hall <kjhall@us.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

and

% git --no-pager grep ATM1200 drivers/char/tpm
drivers/char/tpm/tpm_tis.c:	{"ATM1200", 0},		/* Atmel */
% git --no-pager grep BCM0101 drivers/char/tpm
drivers/char/tpm/tpm_tis.c:	{"BCM0101", 0},		/* Broadcom */
% git --no-pager grep NSC1200 drivers/char/tpm
drivers/char/tpm/tpm_tis.c:	{"NSC1200", 0},		/* National */

It looks like the author was not aware that tpm_infineon.c was already implemented for IFX0102. The errors come from a non-TCG-compatible TPM implementation being used with the TCG TIS driver.

I'm not sure (yet) if this is a full resolution of this bug, but it is obviously something that should be fixed first before drawing any quick conclusions on further actions.

If the issue still persists after fixing this, then it is easier to debug because the bug is scoped down to the tpm_infineon driver.
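To see which of the two drivers ends up bound to the device on a given kernel, the driver symlink in sysfs can be inspected. A minimal sketch; bound_driver is a hypothetical helper, the device ID 00:08 is taken from the dmesg output above, and the second argument exists only so the helper can be pointed at a test tree.

```shell
# bound_driver is a hypothetical helper for this sketch. It prints the name
# of the driver bound to a PNP device, or "none" if no driver symlink exists.
bound_driver() {
    dev=$1
    root=${2:-/sys/bus/pnp/devices}
    link="$root/$dev/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Assumed usage on the affected machine:
#   bound_driver 00:08
```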
Comment 50 jarkko.sakkinen 2020-06-24 22:54:00 UTC
(In reply to jarkko.sakkinen from comment #49)
> (In reply to Ferry Toth from comment #45)
> > @jarko just tested with v5.8-rc1, result is the same as 5.1 - 5.6: crashes
> > on resume to boot screen, but setting tpm_tis.interrupts=0 resolves the
> > situation.
> > 
> > Note the original reporter has a brainwashed chromebook Asus C302, I have a
> > brainwashed chromebook Acer 720P. In both cases tpm_tis.interrupts=0 solves
> > the problem.
> > 
> > Other reporters may be experiencing unrelated issues.
> > 
> > I had no time to bisect further today, will do tomorrow evening. and see if
> > I can confirm [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> > ?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874
> 
> I found something truly weird based on your dmesg outputs:
> 
> % git --no-pager grep IFX0102 drivers/char/tpm
> drivers/char/tpm/tpm_infineon.c:      {"IFX0102", 0},
> drivers/char/tpm/tpm_tis.c:   {"IFX0102", 0},         /* Infineon */
> 
> I.e. 
> 
> tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
> tpm tpm0: tpm_try_transmit: send(): error -5
> tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
> tpm_tis tpm_tis: Could not get TPM timeouts and durations
> tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
> tpm tpm0: tpm_try_transmit: send(): error -5
> tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
> tpm_tis 00:08: Could not get TPM timeouts and durations
> ima: No TPM chip found, activating TPM-bypass!
> tpm_inf_pnp 00:08: Found TPM with ID IFX0102
> 
> The HID is associated with two drivers and the last log entry tells that
> tpm_inf_pnp was successfully initialized.
> 
> Given that tpm_tis showed problems already in the in v4.15, it would clue
> that tpm_tis driver should not include IFX0102.
> 
> Looking at
> 
> Author: Kylene Jo Hall <kjhall@us.ibm.com>
> Date:   Sat Apr 22 02:39:52 2006 -0700
> 
>     [PATCH] tpm: add HID module parameter
>     
>     I recently found that not all BIOS manufacturers are using the specified
>     generic PNP id in their TPM ACPI table entry.  I have added the vendor
>     specific IDs that I know about and added a module parameter that a user
> can
>     specify another HID to the probe list if their device isn't being found
> by the
>     default list.
>     
>     Signed-off-by: Kylene Hall <kjhall@us.ibm.com>
>     Signed-off-by: Andrew Morton <akpm@osdl.org>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> 
> and
> 
> % git --no-pager grep ATM1200 drivers/char/tpm
> drivers/char/tpm/tpm_tis.c:   {"ATM1200", 0},         /* Atmel */
> % git --no-pager grep BCM0101 drivers/char/tpm
> drivers/char/tpm/tpm_tis.c:   {"BCM0101", 0},         /* Broadcom */
> % git --no-pager grep NSC1200 drivers/char/tpm
> drivers/char/tpm/tpm_tis.c:   {"NSC1200", 0},         /* National */
> 
> It looks like the author was not aware that tpm_infineon.c was already
> implemented for IFX0102. The errors come from trying to use a non-TCG
> compatible TPM implementation with the TCG TIS driver.
> 
> I'm not sure (yet) if this is a full resolution of this bug, but it is obviously
> something that should be fixed first, before drawing any quick conclusions about
> further actions.
> 
> If the issue still persists after fixing this, then it is easier to debug
> because the bug is scoped down to the tpm_infineon driver.

93e1b7d42e1edb4ddde6257e9a02513fef26f715
Comment 51 jarkko.sakkinen 2020-06-24 23:05:29 UTC
My hunch is that this is a bug associated specifically with the tpm_infineon driver. It is very rare these days, which explains the somewhat long timeline on discovering the bug.

It is better to fix the HID issue first so that this can be properly validated.
Comment 52 jarkko.sakkinen 2020-06-24 23:05:50 UTC
https://lkml.org/lkml/2020/6/24/1362
Comment 53 jarkko.sakkinen 2020-06-25 02:58:37 UTC
v2: https://lkml.org/lkml/2020/6/24/1476
Comment 54 Ferry Toth 2020-06-25 21:28:19 UTC
Alright, I built 5.8-rc2 with your patch v2. Then I tried resuming in 3 cases, noting the kernel log for each.

no tis params on kernel command line
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
result: reboot on resume

Kernel command line: tpm_tis.force=1
tpm_tis tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis tpm_tis: Could not get TPM timeouts and durations
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
result: reboot on resume

Kernel command line: tpm_tis.force=1 tpm_tis.interrupts=0
tpm_tis tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm_tis 00:08: can't request region for resource [mem 0xfed40000-0xfed44fff]
tpm_tis: probe of 00:08 failed with error -16
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
result: resume correct

Looks like there is another trigger to probe tpm_tis first.
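
The three cases above differ only in which tpm_tis.* parameters are on the kernel command line. As a rough sketch (the helper name tpm_tis_params is hypothetical, and it only recognizes the exact "tpm_tis." spelling), the parameters can be pulled out of a command-line string such as the contents of /proc/cmdline like this:

```python
def tpm_tis_params(cmdline: str) -> dict:
    """Return {param: value} for every tpm_tis.<param>=<value> token.

    Hypothetical helper for illustration; it does not read any files itself.
    """
    prefix = "tpm_tis."
    params = {}
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        # Split "force=1" into ("force", "1"); a bare flag gets an empty value.
        name, _, value = token[len(prefix):].partition("=")
        params[name] = value
    return params


# The three command lines tried in this comment.
for case in ("", "tpm_tis.force=1", "tpm_tis.force=1 tpm_tis.interrupts=0"):
    print(repr(case), "->", tpm_tis_params(case))
```

Only the third case, where interrupts is set to 0, resumed correctly.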
Comment 55 Ferry Toth 2020-06-25 21:35:49 UTC
Maybe this?

pnp 00:08: Plug and Play ACPI device, IDs IFX0102 PNP0c31 (active)
Comment 56 jarkko.sakkinen 2020-06-25 22:48:01 UTC
Can you send acpidump output for this device?
Comment 57 jarkko.sakkinen 2020-06-25 22:48:12 UTC
Or attach.
Comment 58 Ferry Toth 2020-06-26 16:35:32 UTC
Created attachment 289895 [details]
acpidump from Acer C720P
Comment 59 Ferry Toth 2020-06-26 16:39:59 UTC
All other info collected with hw-probe: https://linux-hardware.org/?probe=c858e37129
Comment 60 jarkko.sakkinen 2020-07-01 08:48:57 UTC
I got my hands on C720P. I'll try to reproduce this with that machine.
Comment 61 Ferry Toth 2020-07-01 17:24:08 UTC
Oh, that's good news!
Comment 62 Ferry Toth 2020-07-01 19:28:32 UTC
When I decode the dsdt I see:
    Device (TPM)
    {
        Name (_HID, EisaId ("IFX0102"))  // _HID: Hardware ID
        Name (_CID, EisaId ("PNP0C31"))  // _CID: Compatible ID
...

and 

ferry@delfion:~/tmp/linux/v5.8-rc2$ git --no-pager grep PNP0C31 
drivers/acpi/acpi_pnp.c:        {"PNP0C31"},            /* TPM */
drivers/char/tpm/tpm_tis.c:     {"PNP0C31", 0},         /* TPM */

Does this mean the driver is probed due to PNP0C31?
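
To illustrate the question, here is a simplified sketch of the ID matching. It assumes a driver is a probe candidate whenever any of the device's IDs (_HID or _CID) appears in its table, which is a simplification of the kernel's PNP matching. The tables are taken from the grep output in this thread, with IFX0102 already dropped from tpm_tis (the assumed effect of the patch); the function name is hypothetical.

```python
# Driver ID tables as seen in the greps quoted in this thread.
# Assumption: tpm_tis is shown with IFX0102 removed, as the patch intends.
DRIVER_IDS = {
    "tpm_tis": {"PNP0C31", "ATM1200", "BCM0101", "NSC1200"},
    "tpm_inf_pnp": {"IFX0102"},
}


def candidate_drivers(device_ids, driver_ids=DRIVER_IDS):
    """Map each driver to the device IDs it matches (hypothetical helper)."""
    # Compare case-insensitively: the pnp log line prints "PNP0c31".
    ids = {i.upper() for i in device_ids}
    return {
        drv: sorted(ids & table)
        for drv, table in driver_ids.items()
        if ids & table
    }


# IDs reported for device 00:08: _HID IFX0102, _CID PNP0C31.
print(candidate_drivers(["IFX0102", "PNP0c31"]))
```

Under these assumptions tpm_tis remains a candidate through PNP0C31 even without IFX0102 in its table, which would be consistent with tpm_tis still probing first in the logs above.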
Comment 63 Mervin Beng 2020-07-27 08:25:42 UTC
I run my C302 on Arch Linux with kernel 5.7.10. I was seeing the same resume to BIOS issue on suspend.

I can confirm that tpm_tis.interrupts=0 on boot addresses the resume issue. Thanks Ferry. Is there a downside to using this option?
Comment 64 jarkko.sakkinen 2020-08-28 18:04:15 UTC
The holiday season came. That's why no progress with this. I have the failing laptop in my hands. I'll try to find time next week to reproduce the bug.
Comment 65 Jesse Becker 2021-03-13 14:38:29 UTC
Still present in 5.11.2 (Arch), without additional kernel options.  I'll give tpm_tis.interrupts=0 a try though.
Comment 66 Jesse Becker 2021-03-17 00:43:31 UTC
The workaround of adding tpm_tis.interrupts=0 at boot appears to help.
