Bug 205935

Summary: Boot freeze on TPM (tpm_tis) line
Product: Drivers
Reporter: Martin Mareš (mmrmartin)
Component: Other
Assignee: drivers_other
Status: ASSIGNED
Severity: normal
CC: jarkko.sakkinen, stefanb, tiwai
Priority: P1
Hardware: All
OS: Linux
See Also: https://bugzilla.suse.com/show_bug.cgi?id=1159152
https://launchpad.net/bugs/1852586
https://bugzilla.kernel.org/show_bug.cgi?id=204121
Kernel Version: 5.3.12; 5.5.rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel log
kernel log (longer)

Description Martin Mareš 2019-12-20 19:47:42 UTC
Created attachment 286383
kernel log

Hi,

I tried to install openSUSE Tumbleweed NET x86_64 Snapshot 20191210 on a Lenovo L490, but the boot got stuck about 4 seconds in, at the line `tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)`. I cannot even toggle the CapsLock light, so I assume the kernel froze.

I did some research and found an Arch forum thread <https://bbs.archlinux.org/viewtopic.php?id=250025> that points to two kernel commits between 5.3.3 and 5.3.4 as the likely cause:

- 7f064c378e2c8c848c7acc3ebba7ec45df1c5492
- 5b359c7c43727e624eac3efc7ad21bd2defea161

The forum also provides a workaround: the `tpm_tis.interrupts=0` boot parameter. That workaround also helped in my case.
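
For anyone who wants to apply the workaround persistently, here is a minimal sketch of adding the parameter to a GRUB-style command line. It assumes a single `GRUB_CMDLINE_LINUX_DEFAULT="..."` line (without a trailing newline) as found in `/etc/default/grub` on many distributions; the helper name `add_kernel_param` is hypothetical and not part of any tool.

```python
import re

# Workaround parameter quoted from the Arch forum thread.
PARAM = "tpm_tis.interrupts=0"


def add_kernel_param(line: str, param: str = PARAM) -> str:
    """Append `param` to a GRUB_CMDLINE_LINUX_DEFAULT="..." line if it is missing.

    Hypothetical helper for illustration only. Lines that do not match the
    expected shape are returned unchanged.
    """
    match = re.match(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")\s*$', line)
    if match is None:
        return line
    prefix, args, suffix = match.groups()
    if param in args.split():
        return line  # parameter already present, nothing to do
    new_args = (args + " " + param).strip()
    return prefix + new_args + suffix


if __name__ == "__main__":
    print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'))
```

The sketch only rewrites the text of the line. The bootloader configuration still has to be regenerated with the distribution's own tooling before the parameter takes effect.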

Ubuntu's Launchpad may have found a solution to this <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852586>. It seems to me they simply reverted those commits <https://www.ubuntuupdates.org/package/canonical_kernel_team/eoan/main/base/linux>:

- SAUCE: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
- SAUCE: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"

I originally submitted this issue to openSUSE Bugzilla, but we agreed to escalate it upstream. I've tested kernels 5.3.12-1 and 5.5.rc1-2.1.gb783fd1 (GIT revision b783fd1229dfeeff09af268db73921bf3f5e0671) and got the same result with both.

It seems to me the bug also affects the L580 and E590 (at least according to forums).

Related:
- https://bugzilla.suse.com/show_bug.cgi?id=1159152 (original openSUSE bug report)
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852586 (possible fix)
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852435 (more details)
- https://bugzilla.kernel.org/show_bug.cgi?id=204121 (maybe related)
- https://bbs.archlinux.org/viewtopic.php?id=250025
Comment 1 Takashi Iwai 2019-12-21 08:36:03 UTC
FWIW, the upstream commits are:

7f064c378e2c8c848c7acc3ebba7ec45df1c5492
    tpm_tis_core: Turn on the TPM before probing IRQ's
5b359c7c43727e624eac3efc7ad21bd2defea161
    tpm_tis_core: Turn on the TPM before probing IRQ's
Comment 2 Takashi Iwai 2019-12-21 08:39:22 UTC
(In reply to Takashi Iwai from comment #1)
> FWIW, the upstream commits are:

Correction,

5b359c7c43727e624eac3efc7ad21bd2defea161
    tpm_tis_core: Turn on the TPM before probing IRQ's
1ea32c83c699df32689d329b2415796b7bfc2f6e
    tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts
Comment 3 jarkko.sakkinen 2019-12-27 05:51:17 UTC
We are reverting the faulting patches ASAP. Thanks for reporting.
Comment 4 jarkko.sakkinen 2019-12-31 15:57:25 UTC
Can you test from git://git.infradead.org/users/jjs/linux-tpmdd.git branch for-linus-v5.5-rc5 and update the results here? Thank you.
Comment 5 jarkko.sakkinen 2019-12-31 15:59:08 UTC
AND: if it works for you I need to ask your permission to add tested-by to the patch so that I can send a legit PR to Linus. Thanks.
Comment 6 Takashi Iwai 2020-01-01 08:50:47 UTC
Martin, as mentioned in openSUSE Bugzilla, I'm building a test kernel package based on 5.5-rc4 with Jarkko's revert patches.  Please give it a try.

BTW, Jarkko, the commits have no proper changelog explaining why these are reverted.  It'd be better to have some background information in the changelog as well as a link to the bug tracker URL or whatever source information about the bug itself.  Thanks.
Comment 7 jarkko.sakkinen 2020-01-02 17:16:38 UTC
Takashi, I fully agree with you and thank you for the suggestion.

I was already going to write something like that in the pull request email, but you are right that it makes sense to document it in the commit message as well.
Comment 8 jarkko.sakkinen 2020-01-03 23:31:45 UTC
Patches out: https://lore.kernel.org/linux-integrity/20200103232935.11314-1-jarkko.sakkinen@linux.intel.com/T/#t

I'll cycle them through linux-integrity for feedback before sending a pull request.
Comment 9 Martin Mareš 2020-01-04 02:20:51 UTC
Thanks, I tested 5.5.0-rc4-1.g06ad70c-default (06ad70c8a1eb780ac39452aebb64f54b1d25872d GIT Branch: users/tiwai/master/tpm-revert), which Takashi built for me, and I was able to boot fine without any boot parameter. So these reverts fixed the issue for me. Feel free to add me to `tested-by` ;-)

Once booted, I got these errors about TPM interrupts (they do not appear if I disable interrupts):

...
[    2.450387] tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
[    2.450680] tpm tpm0: tpm_try_transmit: send(): error -5
[    2.450705] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
...
[    3.913639] irq 31: nobody cared (try booting with the "irqpoll" option)
[    3.913640] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.5.0-rc4-1.g06ad70c-default #1 openSUSE Tumbleweed (unreleased)
[    3.913641] Hardware name: LENOVO 20Q50025MC/20Q50025MC, BIOS R0ZET36W (1.14 ) 11/26/2019
[    3.913641] Call Trace:
[    3.913643]  <IRQ>
[    3.913647]  dump_stack+0x8f/0xd0
[    3.913649]  __report_bad_irq+0x38/0xad
[    3.913651]  note_interrupt.cold+0xb/0x6e
[    3.913652]  handle_irq_event_percpu+0x72/0x80
[    3.913652]  handle_irq_event+0x3c/0x5c
[    3.913653]  handle_fasteoi_irq+0xa3/0x160
[    3.913655]  do_IRQ+0x53/0xe0
[    3.913656]  common_interrupt+0xf/0xf
[    3.913656]  </IRQ>
[    3.913659] RIP: 0010:cpuidle_enter_state+0xce/0x3f0
[    3.913660] Code: 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ef 02 00 00 31 ff e8 0e ef 99 ff e8 89 91 a0 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[    3.913660] RSP: 0018:ffffb5bec010fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[    3.913661] RAX: 0000000080000000 RBX: ffff9640a057a800 RCX: 000000000000001f
[    3.913661] RDX: 0000000000000000 RSI: 000000004041c206 RDI: 0000000000000000
[    3.913662] RBP: ffffffffa4ce2a80 R08: 00000000e94553b2 R09: 000000007fffffff
[    3.913662] R10: 0000000000000005 R11: ffff9640a056df64 R12: 00000000e94553b2
[    3.913662] R13: 0000000000000001 R14: 0000000000000001 R15: ffff963d476c0000
[    3.913665]  ? cpuidle_enter_state+0xc7/0x3f0
[    3.913666]  cpuidle_enter+0x29/0x40
[    3.913668]  do_idle+0x1e9/0x290
[    3.913669]  cpu_startup_entry+0x19/0x20
[    3.913670]  start_secondary+0x164/0x1b0
[    3.913672]  secondary_startup_64+0xb6/0xc0
[    3.913673] handlers:
[    3.913675] [<00000000974bdd58>] tis_int_handler
[    3.913676] Disabling IRQ #31
...

I don't know how to test TPM but I think this is already reported in bug 204121. So it's probably nothing new.
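
In case it helps others compare their own logs, here is a rough sketch that scans kernel log text for the three symptoms in the excerpt above. The message fragments are copied from that excerpt; the function name `find_tpm_irq_symptoms` and the result keys are made up for illustration, and the exact wording may differ on other kernel versions.

```python
import re

# Message fragments taken from the log excerpt above; other kernel
# versions may word these messages differently.
PATTERNS = {
    "tpm_polling_fallback": re.compile(r"TPM interrupt not working, polling instead"),
    "unhandled_irq": re.compile(r"irq (\d+): nobody cared"),
    "irq_disabled": re.compile(r"Disabling IRQ #(\d+)"),
}


def find_tpm_irq_symptoms(log_text: str) -> dict:
    """Return the first matching log line for each known symptom.

    Hypothetical helper: keys are the illustrative names in PATTERNS,
    values are the stripped log lines that matched.
    """
    found = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if name not in found and pattern.search(line):
                found[name] = line.strip()
    return found


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for name, line in find_tpm_irq_symptoms(sys.stdin.read()).items():
        print(f"{name}: {line}")
```

An empty result only means none of these exact strings were seen. It says nothing about whether the TPM itself works.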
Comment 10 Martin Mareš 2020-01-04 03:27:44 UTC
Created attachment 286607 [details]
kernel log (longer)

I sometimes get longer output before the freeze.

(original) Kernel 5.3.12 (without reverts and extra params)
Comment 11 Martin Mareš 2020-01-04 03:30:04 UTC
I read that you plan to make a more complete fix in the future. During testing I found that the original openSUSE kernel (5.3.12) sometimes prints longer output (attachment 286607) to the screen before the freeze. It may be more useful than my previous screenshot.

(All kernels with the reverts work fine.)
Comment 12 jarkko.sakkinen 2020-01-06 17:47:17 UTC
The point of the reverts is to roll back to the best known state.

Also, the PR has been sent:

https://lkml.org/lkml/2020/1/6/521

The error you get will still let the TPM initialize in polling mode.