Bug 203877
| Summary: | (bisect a3fbfae82b4cb3ff9928e29f34c64d0507cad874) Resume from suspend causes reset/crash and corruption on ASUS C302 (tpm_tis.interrupts=0 workaround the issue) | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Chris Osgood (q45e7uj7) |
| Component: | Other | Assignee: | jarkko.sakkinen |
| Status: | NEEDINFO --- | | |
| Severity: | normal | CC: | boynamedjane, bugs+kernel, ferry.toth, hawson, jad_dyna, jarkko.sakkinen, mervinb, mt51990, yu.c.chen, yunying.sun |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 4.19.67+ 5.1+ | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | journal logs; acpidump from Acer C720P | | |
Description
Chris Osgood
2019-06-12 15:10:07 UTC
I meant to mention this is NOT fixed by the "'nosmt' vs hibernation triple fault during resume" patch in 949525fff5f722245ee2e2b1fe1e860e7e603579

Issue still present in kernel 5.1.9

Still present in 5.2.6. 5.0.13 works, 5.1.0 onward does not. Running on an Acer C720 (also a brainwashed chromebook), also running Arch. This laptop seems to handle things a little better than the ASUS: legacy boot still works, and I do not see any evidence of audio problems or firmware corruption (the filesystems are, however, unhappy with the hard shutdown). There are no logs after the suspend.

This bug or a version of it has now been ported over to kernel 4.19.68 (at least that's the one I tested). I suppose I should update the kernel version on this bug but want to see if I can get more confirmation first. I'm running out of workable kernels now. :/

I find that Arch Linux kernel 4.19.66-1-lts works. Then kernel 4.19.67-1-lts and above is broken (though I have not tested 4.19.69). So if someone wants to bisect whatever changed between .66 and .67 then that's likely the problem code. Maybe something to do with power management? I'm not sure. This might also find the problem in the 5.1+ kernels.

> changed between .66 and .67 then

And is it also common to the delta between 5.0.13 and 5.1.0? That may help narrow down the scope of changes.
(In reply to Chris Osgood from comment #4)
> I find that Arch Linux kernel 4.19.66-1-lts works. Then kernel 4.19.67-1-lts
> and above is broken (though I have not tested 4.19.69).
>

Since 66 and 67 carry the lts suffix, it might be more straightforward to test on an upstream vanilla kernel.

> So if someone wants to bisect whatever changed between .66 and .67 then
> that's likely the problem code. Maybe something to do with power management?
> I'm not sure. This might also find the problem in the 5.1+ kernels.

Yeah, it looks like you are at the front line to bisect this out : )

@Chris, just wondering if you have time for a bisect if this issue is still there?

Still broken in kernel 5.3.1. I haven't had a chance to bisect the issue. I'm busy fighting all sorts of fires related to kernel 5, which in general has been a disaster. Almost every new version breaks more stuff (5.1 broke the chromebooks, 5.2 broke some of our servers, 5.3 broke the e1000e driver, the list goes on). So I'm still stuck running older version 4 kernels for a while.

I confirm this issue also occurs on Acer 720P (ex)chromebook (with touch screen + 4 GB RAM). After updating Ubuntu to 19.10 (linux v5.3) it suspends fine but will not resume. The issue does not occur with Ubuntu 19.04 (linux v5.0). With the 720P the machine just reboots to legacy, no other issues. Using Ubuntu kernel PPA versions I tested linux v5.0, v5.1, v5.2 and found the issue first occurs with v5.1.

The number of backported patches from 4.19.66 -> 67 should be very limited, right? Maybe we can identify it from the commit message?

From Ubuntu ppa v4.19.80 resumes correctly.

Any news here? For me, the original culprit was [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]
> tpm: take TPM chip power gating out of tpm_transmit()

[a3fbfae82b4cb3ff9928e29f34c64d0507cad874]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874

This evening I built ubuntu eoan master-next.
This is the to-be kernel 5.3.0-24, based on linux 5.3.13 + UBUNTU: SAUCE: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" + UBUNTU: SAUCE: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts" (see https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/log/?h=master-next)

Unfortunately this kernel does not resolve the issue in this bug.

(In reply to Ferry Toth from comment #13)
> This evening I built ubuntu eoan master-next.
>
> This is the to be kernel 5.3.0-24 based of linux 5.3.13 + UBUNTU: SAUCE:
> Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" + UBUNTU: SAUCE:
> Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
>

These two patches do not revert all the changes introduced in a3fbfae82b4cb3ff9928e29f34c64d0507cad874, do they?

> (see
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/eoan/log/
> ?h=master-next)
>
> Unfortunately this kernel does not resolve the issue in this bug.

How about unloading the tpm module, or even unsetting CONFIG_TCG_TPM and building the kernel? I have encountered a hibernation issue where the system hangs when issuing S4 because tpm is unable to shut down the devices during that phase.

Maybe "tpm: take TPM chip power gating out of tpm_transmit()" needs reverting too.

I have long had "tpm_tis.force=1" on the kernel command line.

On 4.15 this causes:
tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
genirq: Flags mismatch irq 9. 00000000 (tpm0) vs. 00000080 (acpi)
tpm tpm0: Unable to request irq: 9 for probe
tpm_tis 00:08: can't request region for resource [mem 0xfed40000-0xfed44fff]
tpm_tis: probe of 00:08 failed with error -16
genirq: Flags mismatch irq 8. 00000080 (rtc0) vs. 00000000 (tpm0)
tpm_inf_pnp 00:08: Found TPM with ID IFX0102

On 5.3.0:
tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis tpm_tis: Could not get TPM timeouts and durations
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102

So, something that normally goes wrong now goes more wrong.

And 5.3.0 with tpm_tis.force=1
ferry@chromium:~$ tpm_version
Tspi_Context_Connect failed: 0x00003011 - layer=tsp, code=0011 (17), Communication failure

with 4.15.0 with tpm_tis.force=1
ferry@chromium:~$ tpm_version
TPM 1.2 Version Info:
Chip Version: 1.2.4.32
Spec Level: 2
Errata Revision: 3
TPM Vendor ID: IFX
Vendor Specific data: 0420036f 0074706d 3338ffff ff
TPM Version: 01010000
Manufacturer Info: 49465800

I'm seeing something similar on an Asus Zenbook UX305U. The laptop suspends correctly; on resume I am greeted by the BIOS boot screen (I guess it crashes really quickly). Nothing is visible in the logs between the suspend and the reboot. Is there anything we can do to further diagnose this issue?

This is still broken as of 5.5.10

Probably the only way to get this fixed is via a 3rd party kernel developer like Ubuntu/Fedora/GalliumOS or something, because currently it seems to be ignored by the mainline kernel devs. I still haven't had time to bisect the issue and have my kernel pinned at 4.19.66. I think by far the easiest way to find the problem is to compare 4.19.66 to 4.19.67 and see what changed.

Anyone tried the latest GalliumOS? What kernel does it use?
Hi,
Is it possible to recompile the latest kernel without "CONFIG_TCG_TPM", apply the following debug patch from https://patchwork.kernel.org/patch/11464059/ and check:
echo test_resume > /sys/power/disk
echo disk > /sys/power/state
and wait for 5 seconds to see if it could resume automatically?

Must we disable 'CONFIG_TCG_TPM'? Because it is auto selected:
Selected by [y]:
- IMA [=y] && INTEGRITY [=y] && HAS_IOMEM [=y] && !UML

I disabled INTEGRITY and then TCG_TPM in menuconfig and built the kernel. Then:
root@chromium:~# echo test_resume > /sys/power/disk
root@chromium:~# echo disk > /sys/power/state
-bash: echo: schrijffout: Geen ruimte meer over op apparaat
(that is write error: no more space on device).

It seems with this test it is trying to hibernate instead of suspend. Closing the laptop lid, it goes to sleep properly (suspend LED blinks). Opening the lid again brings me directly to the boot screen. So no change. This is with vanilla 5.6.0 + ubuntu 'sauce' + patch in #21 above - TCG_TPM.

The patch should only give more debug output - did you capture anything?

Nope. All I got was the error.

Ferry, Chris,
I thought you were trying to test hibernation. Let's switch back to suspend to mem. The first thing is to figure out what suspend mode you are using:
1. boot with the same kernel with TPM disabled.
2. cat /sys/power/mem_sleep,
echo 'N' > /sys/module/printk/parameters/console_suspend
echo 'Y' > /sys/module/printk/parameters/initcall_debug
and leverage pm_test mode to narrow down:
3. echo a different mode to /sys/power/pm_test, starting from right to left (freezer to devices to platform...)
[none] core processors platform devices freezer
and check if the system can resume within 5 seconds.
For example:
echo freezer > /sys/power/pm_test
echo mem > /sys/power/state
If it succeeds in resuming, then save the dmesg and launch the next test mode:
echo devices > /sys/power/pm_test
echo mem > /sys/power/state
and go on until you see a hang during resume.
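The right-to-left pm_test walk described above can be sketched as a small shell helper. This is only an illustration, not something posted in the thread: the function name `pm_test_order` and the `dmesg-<level>.txt` file names are made up here, and the commented loop assumes a root shell on a kernel built with PM debugging support.

```shell
#!/bin/sh
# Print the pm_test levels in the order they should be tried (rightmost first).
# $1: contents of /sys/power/pm_test,
#     e.g. "[none] core processors platform devices freezer"
pm_test_order() {
    # Drop the brackets that mark the currently selected level.
    levels=$(printf '%s' "$1" | tr -d '[]')
    out=""
    for level in $levels; do
        # Prepend each level so the final list is reversed.
        out="$level${out:+ $out}"
    done
    printf '%s\n' "$out"
}

# Hypothetical use on a live system (root required). The last level, "none",
# performs a real suspend, so on an affected machine the loop may end in a reboot:
# for level in $(pm_test_order "$(cat /sys/power/pm_test)"); do
#     echo "$level" > /sys/power/pm_test
#     echo mem > /sys/power/state
#     dmesg > "dmesg-$level.txt"
# done
```

Keeping the ordering logic separate from the sysfs writes makes it possible to check the sequence without suspending anything.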
ferry@chromium:~$ tpm_version
Tspi_Context_Connect failed: 0x00003011 - layer=tsp, code=0011 (17), Communication failure

ferry@chromium:~$ cat /sys/power/mem_sleep
s2idle [deep]
root@chromium:~# echo 'N' > /sys/module/printk/parameters/console_suspend
root@chromium:~# echo 'Y' > /sys/module/printk/parameters/initcall_debug
-bash: /sys/module/printk/parameters/initcall_debug: Toegang geweigerd
(that is access denied, because not existing)

Going from right to left only none fails (i.e. does not return but goes to the boot screen).
I captured journal tails in pm_test.txt

Created attachment 288401 [details]
journal logs
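In the `/sys/power/mem_sleep` output shown above, the bracketed entry is the mode the kernel uses when `mem` is written to `/sys/power/state`. A minimal sketch for extracting it follows; the helper name `active_mem_sleep` is an assumption for illustration only.

```shell
#!/bin/sh
# Print the active suspend mode, i.e. the bracketed entry.
# $1: contents of /sys/power/mem_sleep, e.g. "s2idle [deep]"
active_mem_sleep() {
    # Capture whatever sits between the square brackets.
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical use on a live system:
# active_mem_sleep "$(cat /sys/power/mem_sleep)"
```

With the value reported in this thread, `s2idle [deep]`, the helper prints `deep`, which matches the S3 path that reboots on resume.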
In line 329 it says: echo processors > /sys/power/pm_test But that of course was: echo none > /sys/power/pm_test (In reply to Ferry Toth from comment #28) > ferry@chromium:~$ tpm_version > Tspi_Context_Connect failed: 0x00003011 - layer=tsp, code=0011 (17), > Communication failure > > ferry@chromium:~$ cat /sys/power/mem_sleep > s2idle [deep] > root@chromium:~# echo 'N' > /sys/module/printk/parameters/console_suspend > root@chromium:~# echo 'Y' > /sys/module/printk/parameters/initcall_debug > -bash: /sys/module/printk/parameters/initcall_debug: Toegang geweigerd > (that is access denied, because not existing) > > > Going from right to left only none fails (i.e. does not return but goes to > the boot screen). > I captured journal tails in pm_test.txt This has shown that when resuming from BIOS S3 and the BIOS is about to jump back to the vector in OS, the system has triggered a reboot. Did you update your BIOS recently? How about echo deep > /sys/power/mem_sleep rtcwake -m mem -s 30? And how about: echo s2idle > /sys/power/mem_sleep rtcwake -m freeze -s 30? No, this is a brainwashed Chromebook. Once linux installed impossible to update BIOS: ferry@chromium:~$ sudo dmidecode # dmidecode 3.2 Getting SMBIOS data from sysfs. SMBIOS 2.7 present. 10 structures occupying 397 bytes. Table at 0x7F782020. Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: coreboot Version: Release Date: 03/02/2017 ROM Size: 8192 kB Characteristics: PCI is supported PC Card (PCMCIA) is supported BIOS is upgradeable Selectable boot is supported ACPI is supported Targeted content distribution is supported BIOS Revision: 4.0 Firmware Revision: 0.0 I'll reboot into linux-5.6 and answer your other suggestions. > echo deep > /sys/power/mem_sleep > rtcwake -m mem -s 30? Looking from blinking LED goes into suspend. Waking takes me directly to BIOS screen. > echo s2idle > /sys/power/mem_sleep > rtcwake -m freeze -s 30? 
Looking from non-blinking LED is not really suspended, except everything else looks suspended. Pressing keyboard wakes normally. journalctl -b -e: apr 13 19:36:15 chromium kernel: PM: suspend entry (s2idle) apr 13 19:36:15 chromium kernel: Filesystems sync: 0.000 seconds apr 13 19:36:15 chromium kernel: Freezing user space processes ... (elapsed 0.002 seconds) done. apr 13 19:36:15 chromium kernel: OOM killer disabled. apr 13 19:36:15 chromium kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. apr 13 19:36:15 chromium kernel: printk: Suspending console(s) (use no_console_suspend to debug) apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Stopping disk apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt blocked apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt unblocked apr 13 19:36:15 chromium kernel: hpet_rtc_timer_reinit: 14 callbacks suppressed apr 13 19:36:15 chromium kernel: hpet: Lost 6192 RTC interrupts apr 13 19:36:15 chromium kernel: ath: phy0: ASPM enabled: 0x43 apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Starting disk apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Resetting device apr 13 19:36:15 chromium kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) apr 13 19:36:15 chromium kernel: ata1.00: configured for UDMA/100 apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Wait for completion timed out. apr 13 19:36:15 chromium kernel: OOM killer enabled. apr 13 19:36:15 chromium kernel: Restarting tasks ... done. apr 13 19:36:15 chromium kernel: PM: suspend exit (In reply to Ferry Toth from comment #33) > > echo deep > /sys/power/mem_sleep > > rtcwake -m mem -s 30? > > Looking from blinking LED goes into suspend. Waking takes me directly to > BIOS screen. > > > echo s2idle > /sys/power/mem_sleep > > rtcwake -m freeze -s 30? > > Looking from non-blinking LED is not really suspended, except everything > else looks suspended. 
> Pressing keyboard wakes normally. > > journalctl -b -e: > apr 13 19:36:15 chromium kernel: PM: suspend entry (s2idle) > apr 13 19:36:15 chromium kernel: Filesystems sync: 0.000 seconds > apr 13 19:36:15 chromium kernel: Freezing user space processes ... (elapsed > 0.002 seconds) done. > apr 13 19:36:15 chromium kernel: OOM killer disabled. > apr 13 19:36:15 chromium kernel: Freezing remaining freezable tasks ... > (elapsed 0.001 seconds) done. > apr 13 19:36:15 chromium kernel: printk: Suspending console(s) (use > no_console_suspend to debug) > apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache > apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Stopping disk > apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt blocked > apr 13 19:36:15 chromium kernel: ACPI: EC: interrupt unblocked > apr 13 19:36:15 chromium kernel: hpet_rtc_timer_reinit: 14 callbacks > suppressed > apr 13 19:36:15 chromium kernel: hpet: Lost 6192 RTC interrupts > apr 13 19:36:15 chromium kernel: ath: phy0: ASPM enabled: 0x43 > apr 13 19:36:15 chromium kernel: sd 0:0:0:0: [sda] Starting disk > apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Resetting device > apr 13 19:36:15 chromium kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 > SControl 300) > apr 13 19:36:15 chromium kernel: ata1.00: configured for UDMA/100 > apr 13 19:36:15 chromium kernel: atmel_mxt_ts 2-004a: Wait for completion > timed out. > apr 13 19:36:15 chromium kernel: OOM killer enabled. > apr 13 19:36:15 chromium kernel: Restarting tasks ... done. > apr 13 19:36:15 chromium kernel: PM: suspend exit suspend to idle works as expected. 
Could you switch back to linux v5.0, as you mentioned in Comment 9, and check what is the default suspend mode : cat /sys/power/mem_sleep and try echo deep > /sys/power/mem_sleep rtcwake -m mem -s 30 echo s2idle > /sys/power/mem_sleep rtcwake -m freeze -s 30 I was thinking if the default suspend mode is s2idle on v5.0, or your bios has implicitly been adjusted, as there's no chance for OS to control the flow once suspended to S3. On Ubuntu 19:10 linux 4.15.0-1079-oem (working well): ferry@chromium:~$ cat /sys/power/mem_sleep s2idle [deep] root@chromium:~# rtcwake -m mem -s 30 rtcwake: aangenomen wordt dat de hardwareklok UTC bevat... rtcwake: /dev/rtc0: kan apparaat niet vinden: Bestand of map bestaat niet (doesn't exist) On Ubuntu 19:10 linux 5.3.0-46 (resume bad): ferry@chromium:~$ cat /sys/power/mem_sleep s2idle [deep] root@chromium:~# rtcwake -m mem -s 30 After 30 sec takes me to boot screen. root@chromium:~# echo s2idle > /sys/power/mem_sleep root@chromium:~# rtcwake -m freeze -s 30 rtcwake: aangenomen wordt dat de hardwareklok UTC bevat... rtcwake: ontwaking uit 'freeze' via /dev/rtc0 op Tue Apr 14 19:43:43 2020 (wakes) journalctl -b -e: apr 14 21:43:46 chromium kernel: PM: suspend entry (s2idle) apr 14 21:43:46 chromium kernel: Filesystems sync: 0.000 seconds apr 14 21:43:46 chromium kernel: Freezing user space processes ... (elapsed 0.002 seconds) done. apr 14 21:43:46 chromium kernel: OOM killer disabled. apr 14 21:43:46 chromium kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. 
apr 14 21:43:46 chromium kernel: printk: Suspending console(s) (use no_console_suspend to debug) apr 14 21:43:46 chromium kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache apr 14 21:43:46 chromium kernel: sd 0:0:0:0: [sda] Stopping disk apr 14 21:43:46 chromium kernel: ACPI: EC: interrupt blocked apr 14 21:43:46 chromium kernel: ACPI: EC: interrupt unblocked apr 14 21:43:46 chromium kernel: ath: phy0: ASPM enabled: 0x43 apr 14 21:43:46 chromium kernel: sd 0:0:0:0: [sda] Starting disk apr 14 21:43:46 chromium kernel: atmel_mxt_ts 1-004a: Resetting device apr 14 21:43:46 chromium kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) apr 14 21:43:46 chromium kernel: ata1.00: configured for UDMA/100 apr 14 21:43:46 chromium kernel: atmel_mxt_ts 1-004a: Wait for completion timed out. apr 14 21:43:46 chromium kernel: OOM killer enabled. apr 14 21:43:46 chromium kernel: Restarting tasks ... done. apr 14 21:43:46 chromium kernel: PM: suspend exit Also with s2idle enabled on 5.3.0 if I close the laptop lid, I does not reboot when I open the lid. Additionally, it does not wake when I open the lid, or press a key. But it does wake when I press the power button. (In reply to Ferry Toth from comment #36) > On Ubuntu 19:10 linux 5.3.0-46 (resume bad): > ferry@chromium:~$ cat /sys/power/mem_sleep > s2idle [deep] > root@chromium:~# rtcwake -m mem -s 30 > > After 30 sec takes me to boot screen. > So it reboots during resume. it's quite hard to track at which stage it reboots if there's no uart log. Since the S3 works in old kernel, The most straight way is to do a git bisect to find the offender. Or else, we have to add hack code during resume to spin the kernel at different place thus to narrow down. I just tried from Ubuntu kernel ppa: 4.19.128 OK 5.0 OK (1c163f4c7b3f621efff9b28a47abb36f7378d783) 5.0.21 OK 5.1-rc1 NOK (9e98c678c2d6ae3a17cb2de55d17f69dddaa231b) I'll try to bisect (13 steps) Pff, this is slow. 
Up to now I have: 36011ddc78395b59a8a418c37f20bcc18828f1ef good 6bc3fe8e7e172d5584e529a04cf9eec946428768 bad a50243b1ddcdd766d0d17fbfeeb1a22e62fdc461 now building 10 steps to go. I'll need a few more evenings to complete this. While bysecting (I hope to complete tomorrow) I found a workaround (that works for me on Acer 720P). I always had tpm_tis.force=1 on the kernel command line. Now I added tpm_tis.interrupts=0. Wakes fine now with linux 5.6.0. Full command line: Kernel command line: BOOT_IMAGE=/@boot/vmlinuz-5.6.0-1011-oem root=UUID=17d2cd1d-cc37-446d-ac0b-933def63c867 ro rootflags=subvol=@ quiet splash tpm_tis.force=1 tpm_tis.interrupts=0 modprobe.blacklist=ehci_hcd,ehci-pci vt.handoff=7 (In reply to Ferry Toth from comment #41) > While bysecting (I hope to complete tomorrow) I found a workaround (that > works for me on Acer 720P). > > I always had tpm_tis.force=1 on the kernel command line. > Now I added tpm_tis.interrupts=0. > > Wakes fine now with linux 5.6.0. First of all, thanks for bisecting this! I can confirm setting tpm_tis.interrupts=0 works for me on ASUS C302 kernel 5.7.2 (Arch latest) and kernel 5.4.46 (Arch LTS). Previously I had no tpm_tis.interrupts setting so it must default to on. So the question is, why does tpm_tis.interrupts only cause problems on newer kernels? Is it a kernel bug? I haven't finished bisecting yet but I am now between 5af7f115886f7ec193171e2e49b8000ddd1e7147 bad 2f257402ee981720d65080b1e3ce19f693f5c9c3 good 9d4023ed4db6e01ff50cb68d782202c2f50760ae testing this now This is the next-tpm merge, it may very well be that I land at Jane's conclusion (#12 above). Maybe the author has ideas what is going on, Jarko? Please re-test it with v5.8-rc1. @jarko just tested with v5.8-rc1, result is the same as 5.1 - 5.6: crashes on resume to boot screen, but setting tpm_tis.interrupts=0 resolves the situation. Note the original reporter has a brainwashed chromebook Asus C302, I have a brainwashed chromebook Acer 720P. 
In both cases tpm_tis.interrupts=0 solves the problem. Other reporters may be experiencing unrelated issues. I had no time to bisect further today, will do tomorrow evening. and see if I can confirm [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874 Indeed, after finishing bysecting: a3fbfae82b4cb3ff9928e29f34c64d0507cad874 is the first bad commit The specific commit ID would be b160c94be5d2816b62c8ac338605668304242959 that might fix the issue and it appeared first in v5.7-rc3. Thanks for bisecting, Ferry. Hi Jarkko, It looks like Ferry has tested v5.8-rc1 and the issue is still there. (In reply to Ferry Toth from comment #45) > @jarko just tested with v5.8-rc1, result is the same as 5.1 - 5.6: crashes > on resume to boot screen, but setting tpm_tis.interrupts=0 resolves the > situation. > > Note the original reporter has a brainwashed chromebook Asus C302, I have a > brainwashed chromebook Acer 720P. In both cases tpm_tis.interrupts=0 solves > the problem. > > Other reporters may be experiencing unrelated issues. > > I had no time to bisect further today, will do tomorrow evening. and see if > I can confirm [a3fbfae82b4cb3ff9928e29f34c64d0507cad874]: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ > ?id=a3fbfae82b4cb3ff9928e29f34c64d0507cad874 I found something truly weird based on your dmesg outputs: % git --no-pager grep IFX0102 drivers/char/tpm drivers/char/tpm/tpm_infineon.c: {"IFX0102", 0}, drivers/char/tpm/tpm_tis.c: {"IFX0102", 0}, /* Infineon */ I.e. 
tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16) tpm tpm0: tpm_try_transmit: send(): error -5 tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts tpm_tis tpm_tis: Could not get TPM timeouts and durations tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16) tpm tpm0: tpm_try_transmit: send(): error -5 tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts tpm_tis 00:08: Could not get TPM timeouts and durations ima: No TPM chip found, activating TPM-bypass! tpm_inf_pnp 00:08: Found TPM with ID IFX0102 The HID is associated with two drivers and the last log entry tells that tpm_inf_pnp was successfully initialized. Given that tpm_tis showed problems already in the in v4.15, it would clue that tpm_tis driver should not include IFX0102. Looking at Author: Kylene Jo Hall <kjhall@us.ibm.com> Date: Sat Apr 22 02:39:52 2006 -0700 [PATCH] tpm: add HID module parameter I recently found that not all BIOS manufacturers are using the specified generic PNP id in their TPM ACPI table entry. I have added the vendor specific IDs that I know about and added a module parameter that a user can specify another HID to the probe list if their device isn't being found by the default list. Signed-off-by: Kylene Hall <kjhall@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> and % git --no-pager grep ATM1200 drivers/char/tpm drivers/char/tpm/tpm_tis.c: {"ATM1200", 0}, /* Atmel */ % git --no-pager grep BCM0101 drivers/char/tpm drivers/char/tpm/tpm_tis.c: {"BCM0101", 0}, /* Broadcom */ % git --no-pager grep NSC1200 drivers/char/tpm drivers/char/tpm/tpm_tis.c: {"NSC1200", 0}, /* National */ It looks like that that the author was not aware that tpm_infineon.c already was implemented for IFX0102. The errors come from non-TCG compatible TPM implemenation tried to be used with the TCG TIS driver. 
I'm not sure (yet) if this is a full resolution of this bug, but it is obviously something that should be fixed first before drawing any quick conclusions about further actions.

If the issue still persists after fixing this, then it is easier to debug because the bug is scoped down to the tpm_infineon driver.

(In reply to jarkko.sakkinen from comment #49)
> It looks like that that the author was not aware that tpm_infineon.c already
> was implemented for IFX0102. The errors come from non-TCG compatible TPM
> implemenation tried to be used with the TCG TIS driver.
>
> I'm not sure (yet) if this a full resolution of this bug but it is obviously
> something that should be first fixed before making any fast conclusions on
> further actions.
>
> If the issue still persists after fixing this, then it is easier to debug
> because the bug scoped down to the tpm_infineon driver.

93e1b7d42e1edb4ddde6257e9a02513fef26f715

My hunch is that this is a bug associated specifically with the tpm_infineon driver. It is very rare these days, which explains the somewhat long time it took to discover the bug. It is better to fix the HID issue first so that this can be properly validated.

Alright, I built 5.8-rc2 with your patch v2.
Then tried resuming in 3 cases and noting the kernel log.

no tis params on kernel command line
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
result: reboot on resume

Kernel command line: tpm_tis.force=1
tpm_tis tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis tpm_tis: Could not get TPM timeouts and durations
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
result: reboot on resume

Kernel command line: tpm_tis.force=1 tpm_tis.interrupts=0
tpm_tis tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm_tis 00:08: can't request region for resource [mem 0xfed40000-0xfed44fff]
tpm_tis: probe of 00:08 failed with error -16
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
result: resume correct

Looks like there is another trigger to probe tpm_tis first. Maybe this?
pnp 00:08: Plug and Play ACPI device, IDs IFX0102 PNP0c31 (active)

Can you send acpidump output for this device? Or attach.

Created attachment 289895 [details]
acpidump from Acer C720P
All other info collected with hw-probe: https://linux-hardware.org/?probe=c858e37129

I got my hands on C720P. I'll try to reproduce this with that machine.

Oh, that's good news!
When I decode the dsdt I see:
Device (TPM)
{
Name (_HID, EisaId ("IFX0102")) // _HID: Hardware ID
Name (_CID, EisaId ("PNP0C31")) // _CID: Compatible ID
...
and
ferry@delfion:~/tmp/linux/v5.8-rc2$ git --no-pager grep PNP0C31
drivers/acpi/acpi_pnp.c: {"PNP0C31"}, /* TPM */
drivers/char/tpm/tpm_tis.c: {"PNP0C31", 0}, /* TPM */
Does this mean the driver is probed due to PNP0C31?

I run my C302 on Arch Linux with kernel 5.7.10. I was seeing the same resume-to-BIOS issue on suspend. I can confirm that tpm_tis.interrupts=0 on boot addresses the resume issue. Thanks Ferry. Is there a downside to using this option?

The holiday season came. That's why there has been no progress with this. I have the failing laptop in my hands. I'll try to find time next week to reproduce the bug.

Still present in 5.11.2 (Arch), without additional kernel options. I'll give tpm_tis.interrupts=0 a try though.

The workaround of adding tpm_tis.interrupts=0 at boot appears to help.
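For anyone applying the workaround, it can be useful to confirm that the parameter actually reached the running kernel. The following is a minimal sketch under the assumption that the boot command line is read from `/proc/cmdline`; the helper name `has_kernel_param` is hypothetical and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Succeed if the given kernel command line contains the exact parameter.
# $1: full command line (e.g. the contents of /proc/cmdline)
# $2: parameter to look for (e.g. tpm_tis.interrupts=0)
has_kernel_param() {
    # Pad with spaces so only whole, space-delimited words match.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical use on a live system:
# if has_kernel_param "$(cat /proc/cmdline)" tpm_tis.interrupts=0; then
#     echo "workaround active"
# else
#     echo "workaround missing"
# fi
```

Matching whole words avoids a false positive from a longer parameter that merely contains the same text.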