Bug 205935 - Boot freeze on TPM (tpm_tis) line
Status: ASSIGNED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-12-20 19:47 UTC by Martin Mareš
Modified: 2020-01-06 17:47 UTC
CC List: 3 users

See Also:
Kernel Version: 5.3.12; 5.5.rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel log (2.03 MB, image/jpeg)
2019-12-20 19:47 UTC, Martin Mareš
kernel log (longer) (357.78 KB, image/jpeg)
2020-01-04 03:27 UTC, Martin Mareš

Description Martin Mareš 2019-12-20 19:47:42 UTC
Created attachment 286383
kernel log

Hi,

I wanted to install openSUSE Tumbleweed NET x86_64 Snapshot 20191210 on a Lenovo L490, but the boot got stuck about 4 seconds in, on the line `tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)`. I cannot even toggle the CapsLock LED, so I assume the kernel froze.

I did some research and found an Arch forum thread <https://bbs.archlinux.org/viewtopic.php?id=250025> that points to two kernel commits between 5.3.3 and 5.3.4 that could cause it:

- 7f064c378e2c8c848c7acc3ebba7ec45df1c5492
- 5b359c7c43727e624eac3efc7ad21bd2defea161

The forum also provides a workaround using the `tpm_tis.interrupts=0` boot parameter. That workaround also helped in my case.
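
To confirm that the workaround is actually active on a booted system, the kernel command line can be inspected. The following is a minimal sketch, not part of the original report: the helper names `module_param` and `tpm_interrupts_disabled` are made up for illustration, and the parsing is deliberately naive (it splits on whitespace and ignores quoted values).

```python
def module_param(cmdline: str, name: str):
    """Return the last value given for `name` on a kernel command line, or None.

    Hypothetical helper: splits on whitespace only, so quoted values that
    contain spaces are not handled.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # a later occurrence overrides an earlier one
    return value


def tpm_interrupts_disabled(cmdline: str) -> bool:
    """True if the command line carries the tpm_tis.interrupts=0 workaround."""
    return module_param(cmdline, "tpm_tis.interrupts") == "0"
```

On a running Linux system the string to check can be read from `/proc/cmdline`, for example `tpm_interrupts_disabled(open("/proc/cmdline").read())`.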

The Ubuntu bug tracker may have a solution for this <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852586>. It seems to me they simply reverted those commits <https://www.ubuntuupdates.org/package/canonical_kernel_team/eoan/main/base/linux>:

- SAUCE: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
- SAUCE: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"

I originally submitted this issue to openSUSE Bugzilla but we agreed to escalate this issue to upstream. I've tested Kernel 5.3.12-1 and 5.5.rc1-2.1.gb783fd1 (GIT revision b783fd1229dfeeff09af268db73921bf3f5e0671) and I got the same result.

It seems to me the bug also affects the L580 and E590 (at least according to forums).

Related:
- https://bugzilla.suse.com/show_bug.cgi?id=1159152 (original openSUSE bug report)
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852586 (possible fix)
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852435 (more details)
- https://bugzilla.kernel.org/show_bug.cgi?id=204121 (maybe related)
- https://bbs.archlinux.org/viewtopic.php?id=250025
Comment 1 Takashi Iwai 2019-12-21 08:36:03 UTC
FWIW, the upstream commits are:

7f064c378e2c8c848c7acc3ebba7ec45df1c5492
    tpm_tis_core: Turn on the TPM before probing IRQ's
5b359c7c43727e624eac3efc7ad21bd2defea161
    tpm_tis_core: Turn on the TPM before probing IRQ's
Comment 2 Takashi Iwai 2019-12-21 08:39:22 UTC
(In reply to Takashi Iwai from comment #1)
> FWIW, the upstream commits are:

Correction,

5b359c7c43727e624eac3efc7ad21bd2defea161
    tpm_tis_core: Turn on the TPM before probing IRQ's
1ea32c83c699df32689d329b2415796b7bfc2f6e
    tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts
Comment 3 jarkko.sakkinen 2019-12-27 05:51:17 UTC
We are reverting the faulting patches ASAP. Thanks for reporting.
Comment 4 jarkko.sakkinen 2019-12-31 15:57:25 UTC
Can you test from git://git.infradead.org/users/jjs/linux-tpmdd.git branch for-linus-v5.5-rc5 and update the results here? Thank you.
Comment 5 jarkko.sakkinen 2019-12-31 15:59:08 UTC
AND: if it works for you I need to ask your permission to add tested-by to the patch so that I can send a legit PR to Linus. Thanks.
Comment 6 Takashi Iwai 2020-01-01 08:50:47 UTC
Martin, as mentioned in openSUSE Bugzilla, I'm building a test kernel package based on 5.5-rc4 with Jarkko's revert patches.  Please give it a try.

BTW, Jarkko, the commits have no proper changelog explaining why these are reverted.  It'd be better to have some background information in the changelog as well as a link to the bug tracker URL or whatever source information about the bug itself.  Thanks.
Comment 7 jarkko.sakkinen 2020-01-02 17:16:38 UTC
Takashi, I fully agree with you and thank you for the suggestion.

I was already going to write something like that in the pull request email, but you are right that it makes sense to document it in the commit message as well.
Comment 8 jarkko.sakkinen 2020-01-03 23:31:45 UTC
Patches out: https://lore.kernel.org/linux-integrity/20200103232935.11314-1-jarkko.sakkinen@linux.intel.com/T/#t

I'll cycle them through linux-integrity for feedback before sending a pull request.
Comment 9 Martin Mareš 2020-01-04 02:20:51 UTC
Thanks, I tested 5.5.0-rc4-1.g06ad70c-default (06ad70c8a1eb780ac39452aebb64f54b1d25872d GIT Branch: users/tiwai/master/tpm-revert), which Takashi built for me, and I was able to boot fine without any boot parameter. So these reverts fixed the issue for me. Feel free to add me to `tested-by` ;-)

Now that I am able to boot, I see these errors about TPM interrupts (they do not appear if I disable interrupts):

...
[    2.450387] tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
[    2.450680] tpm tpm0: tpm_try_transmit: send(): error -5
[    2.450705] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
...
[    3.913639] irq 31: nobody cared (try booting with the "irqpoll" option)
[    3.913640] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.5.0-rc4-1.g06ad70c-default #1 openSUSE Tumbleweed (unreleased)
[    3.913641] Hardware name: LENOVO 20Q50025MC/20Q50025MC, BIOS R0ZET36W (1.14 ) 11/26/2019
[    3.913641] Call Trace:
[    3.913643]  <IRQ>
[    3.913647]  dump_stack+0x8f/0xd0
[    3.913649]  __report_bad_irq+0x38/0xad
[    3.913651]  note_interrupt.cold+0xb/0x6e
[    3.913652]  handle_irq_event_percpu+0x72/0x80
[    3.913652]  handle_irq_event+0x3c/0x5c
[    3.913653]  handle_fasteoi_irq+0xa3/0x160
[    3.913655]  do_IRQ+0x53/0xe0
[    3.913656]  common_interrupt+0xf/0xf
[    3.913656]  </IRQ>
[    3.913659] RIP: 0010:cpuidle_enter_state+0xce/0x3f0
[    3.913660] Code: 80 7c 24 0f 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ef 02 00 00 31 ff e8 0e ef 99 ff e8 89 91 a0 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 40 02 00 00 49 63 d5 4c 2b 64 24 10 48 8d 04 52 48
[    3.913660] RSP: 0018:ffffb5bec010fe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[    3.913661] RAX: 0000000080000000 RBX: ffff9640a057a800 RCX: 000000000000001f
[    3.913661] RDX: 0000000000000000 RSI: 000000004041c206 RDI: 0000000000000000
[    3.913662] RBP: ffffffffa4ce2a80 R08: 00000000e94553b2 R09: 000000007fffffff
[    3.913662] R10: 0000000000000005 R11: ffff9640a056df64 R12: 00000000e94553b2
[    3.913662] R13: 0000000000000001 R14: 0000000000000001 R15: ffff963d476c0000
[    3.913665]  ? cpuidle_enter_state+0xc7/0x3f0
[    3.913666]  cpuidle_enter+0x29/0x40
[    3.913668]  do_idle+0x1e9/0x290
[    3.913669]  cpu_startup_entry+0x19/0x20
[    3.913670]  start_secondary+0x164/0x1b0
[    3.913672]  secondary_startup_64+0xb6/0xc0
[    3.913673] handlers:
[    3.913675] [<00000000974bdd58>] tis_int_handler
[    3.913676] Disabling IRQ #31
...

I don't know how to test the TPM, but I think this is already reported in bug 204121, so it's probably nothing new.
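
For anyone comparing logs from other affected machines, a small script can pick out the three symptoms visible in the excerpt above: the polling fallback, the "nobody cared" report, and the disabled IRQ. This is only a sketch; the function name `summarize_tpm_irq` is hypothetical, and the patterns are assumptions derived solely from the message wording quoted in this comment, so they may not match other kernel versions.

```python
import re

# Patterns copied from the wording of the log lines quoted above (assumption:
# other kernel versions may phrase these messages differently).
POLLING = re.compile(r"\[Firmware Bug\]: TPM interrupt not working, polling instead")
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def summarize_tpm_irq(log: str) -> dict:
    """Collect TPM interrupt symptoms from kernel log text (e.g. dmesg output)."""
    summary = {
        "polling_fallback": False,   # TPM fell back to polling mode
        "unhandled_irqs": set(),     # IRQ numbers reported as "nobody cared"
        "disabled_irqs": set(),      # IRQ numbers the kernel disabled
    }
    for line in log.splitlines():
        if POLLING.search(line):
            summary["polling_fallback"] = True
        match = NOBODY_CARED.search(line)
        if match:
            summary["unhandled_irqs"].add(int(match.group(1)))
        match = DISABLED.search(line)
        if match:
            summary["disabled_irqs"].add(int(match.group(1)))
    return summary
```

Fed the excerpt above, this reports the polling fallback together with IRQ 31 as both unhandled and disabled.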
Comment 10 Martin Mareš 2020-01-04 03:27:44 UTC
Created attachment 286607
kernel log (longer)

I sometimes get longer output before the freeze.

(original) Kernel 5.3.12 (without reverts and extra params)
Comment 11 Martin Mareš 2020-01-04 03:30:04 UTC
I read that you will make a more complex fix in the future. During testing I found that the original openSUSE kernel (5.3.12) sometimes prints longer output (attachment 286607) to the screen before the freeze. It may be more useful than my previous screenshot.

(All kernels with the reverts work fine.)
Comment 12 jarkko.sakkinen 2020-01-06 17:47:17 UTC
The point of the reverts is to roll back to the best known state.

Also, the PR has been sent:

https://lkml.org/lkml/2020/1/6/521

Despite the error you get, the TPM will still initialize in polling mode.
