Bug 203561

Summary: BISECTED tpm tpm0: tpm_try_transmit: send(): error -5
Product: Drivers Reporter: Todd Brandt (todd.e.brandt)
Component: OtherAssignee: jarkko.sakkinen
Status: ASSIGNED ---    
Severity: normal CC: jhnmlkvch9, lenb, m.b.anti666, yunying.sun
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 5.1.0-rc4 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 178231    
Attachments: issue.def
otcpl-dell-p5510-xeon-1_mem.html
tpm-error-git-bisect
dell-p5510-test-tpm-fail-commit.html

Description Todd Brandt 2019-05-09 12:08:32 UTC
Created attachment 282689 [details]
issue.def

During the suspend phase of mem and freeze (S3 and S2idle), on the Dell Precision 5510 (i5-6300HQ), I'm consistently getting this error:

[  805.847975] tpm tpm0: tpm_try_transmit: send(): error -5
[  805.847979] tpm tpm0: Error (-5) sending savestate before suspend

In 5.1.0-rc4 the suspend ended up failing because the tpm refused to suspend. In 5.1.0-rc5 and after, the suspend doesn't fail (the second message does not happen), but the first still does. The error is still there but is less catastrophic now.
Comment 1 Todd Brandt 2019-05-09 12:11:00 UTC
Created attachment 282691 [details]
otcpl-dell-p5510-xeon-1_mem.html

dmesg log is in the timeline, click the dmesg button
Comment 2 Jonny Mako 2019-07-06 15:58:07 UTC
*** Bug 204073 has been marked as a duplicate of this bug. ***
Comment 3 jarkko.sakkinen 2019-07-10 20:12:41 UTC
Hi, Nayna Jain and Mimi Zohar kindly investigated this and apparently e891db1a18bf should address the issue. I made that fix post v5.1-rc4.
Comment 4 Jonny Mako 2019-07-20 15:07:42 UTC
I still see this in 5.2

Linux jonmhome 5.2.1-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 14 14:52:52 UTC 2019 x86_64 GNU/Linux
Comment 5 Len Brown 2019-10-02 23:55:56 UTC
In Linux-5.3.0 this message still appears for every s2-idle and ACPI S3
on these two machines, though the actual suspend/resume succeeds:

5.3.0+	otcpl-dell-p5510-xeon-1	freeze-x2431	2431	100.00%
5.3.0+	otcpl-dell-p5510-xeon-1	mem-x2209	2209	100.00%
5.3.0+	otcpl-dell-p5510-xeon-2	freeze-x1093	1093	100.00%
5.3.0+	otcpl-dell-p5510-xeon-2	mem-x570	570	100.00%

We don't see this message in 5.0.0, suggesting this is a regression.
Comment 6 Todd Brandt 2019-10-07 22:36:27 UTC
I've just confirmed that this error was not present in v5.0 and was introduced in v5.1-rc1 on both the dell p5510 machines. I'm doing a git bisect now and will publish it when it's complete.
Comment 7 jarkko.sakkinen 2019-10-08 22:50:24 UTC
I'll try to get my hands on hardware that I could use to reproduce this issue.
Comment 10 Todd Brandt 2019-10-09 07:00:39 UTC
Created attachment 285411 [details]
tpm-error-git-bisect

a3fbfae82b4cb3ff9928e29f34c64d0507cad874 is the first bad commit
commit a3fbfae82b4cb3ff9928e29f34c64d0507cad874
Author: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Date:   Mon Nov 5 02:07:56 2018 +0200
Comment 11 Todd Brandt 2019-10-09 07:08:24 UTC
BISECT SUCCESSFUL: The git bisect was completed correctly this time, and I attached the full log as well for more info (tpm-error-git-bisect).

  The offending commit is: a3fbfae82b4cb3ff9928e29f34c64d0507cad874

Interestingly, the commit that causes this error also causes the suspend/resume to fail. The suspend failure must have been fixed later on, but the underlying issue is still there. This is the dmesg log output snippet for a test on this commit:

[   35.802283] tpm_tis 00:0a: calling pnp_bus_suspend+0x0/0x20 @ 1743, parent: pnp0
[   35.802335] tpm tpm0: tpm_try_transmit: send(): error -5
[   35.802339] tpm tpm0: Error (-5) sending savestate before suspend
[   35.802348] __pnp_bus_suspend(): tpm_pm_suspend+0x0/0x90 returns -5
[   35.802354] dpm_run_callback(): pnp_bus_suspend+0x0/0x20 returns -5
[   35.802360] tpm_tis 00:0a: pnp_bus_suspend+0x0/0x20 returned -5 after 69 usecs
[   35.802364] PM: Device 00:0a failed to suspend: error -5
[   35.802369] PM: suspend of devices aborted after 1110.693 msecs
[   35.802372] PM: Some devices failed to suspend, or early wake event detected

I'll also attach the timeline for this test. The dmesg log is viewable by clicking the "dmesg" button in the upper right hand corner.
Comment 12 Todd Brandt 2019-10-09 07:15:52 UTC
Created attachment 285413 [details]
dell-p5510-test-tpm-fail-commit.html

This is a suspend/resume timeline run on a kernel built from the first bad commit: a3fbfae82b4cb3ff9928e29f34c64d0507cad874. Tests run on the commit just previous work completely (achieve freeze mode and no tpm error occurs).
Comment 13 jarkko.sakkinen 2019-10-14 18:42:53 UTC
(In reply to Todd Brandt from comment #12)
> Created attachment 285413 [details]
> dell-p5510-test-tpm-fail-commit.html
> 
> This is a suspend/resume timeline run on a kernel built from the first bad
> commit: a3fbfae82b4cb3ff9928e29f34c64d0507cad874. Tests run on the commit
> just previous work completely (achieve freeze mode and no tpm error occurs).

The most obvious problem in a3fbfae82b4cb3ff9928e29f34c64d0507cad874 was fixed in 
e891db1a18bf11e02533ec2386b796cfd8d60666:

commit e891db1a18bf11e02533ec2386b796cfd8d60666
Author: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Date:   Fri Mar 22 12:51:20 2019 +0200

    tpm: turn on TPM on suspend for TPM 1.x
    
    tpm_chip_start/stop() should be also called for TPM 1.x devices on
    suspend. Add that functionality back. Do not lock the chip because
    it is unnecessary as there are no multiple threads using it when
    doing the suspend.
    
    Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of tpm_transmit()")
    Reported-by: Paul Zimmerman <pauldzim@gmail.com>
    Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
    Tested-by: Domenico Andreoli <domenico.andreoli@linux.com>
    Signed-off-by: James Morris <james.morris@microsoft.com>


Have you sanity checked that a3fbfae82b4c~1 works?
Comment 14 jarkko.sakkinen 2019-10-14 19:17:18 UTC
The 2nd sanity check would be to try out e891db1a18bf11e02533ec2386b796cfd8d60666 (i.e. to verify that something new after that did not trigger just an issue with the same outcome).
Comment 15 jarkko.sakkinen 2019-10-14 19:17:53 UTC
I'm sorry to bother you with this but I do not have access to the given laptop model.
Comment 16 Len Brown 2019-11-05 02:36:00 UTC
Encountered this regression on an additional machine --
100% of the time over 2,000 suspend to ACPI S3 plus 2,000 suspend to idle
tests on the Dell XPS/11-9P33 running Ubuntu 19.10's 5.3.0-based kernel:

5.3.0-19-generic Dell-XPS-11-9P33	mem-x2000	2000	100.00%
5.3.0-19-generic Dell-XPS-11-9P33	freeze-x2000	2000	100.00%