Bug 202311

Summary: mei_me intermittently over 2,000 ms on resume from suspend
Product: Drivers
Reporter: Len Brown (lenb)
Component: Other
Assignee: Tomas Winkler (tomasw)
Status: ASSIGNED
Severity: normal
CC: alexander.usyskin, lenb, todd.e.brandt, tomas.winkler
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.20
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 178231
Attachments:
  Example of mei_me slowness from CFL-H machine in S3 resume
  sleepgraph timeline with callgraph over mei_me
  expanded callgraph showing mei_reset
  issue.def

Description Len Brown 2019-01-17 01:08:56 UTC
Created attachment 280543: Example of mei_me slowness from CFL-H machine in S3 resume

On several machines, including a Dell XPS13 9360,
the mei_me device sometimes takes over 2,000 ms to resume.
All of resume waits for this device to complete
when this happens.

In endurance tests of 2,000 suspend/resume cycles,
mei_me behaves this way about 20 times (about 1% of the time).

This is seen on both suspend to mem (ACPI S3) and also s2idle.
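
The rate quoted above is simple counting. A minimal sketch, assuming the per-cycle mei_me resume times have already been extracted into a list of milliseconds; the function name and the sample values are hypothetical and are not taken from the attachments:

```python
def slow_resume_rate(durations_ms, threshold_ms=2000.0):
    """Return (count, fraction) of resume durations above threshold_ms."""
    slow = [d for d in durations_ms if d > threshold_ms]
    fraction = len(slow) / len(durations_ms) if durations_ms else 0.0
    return len(slow), fraction


# Hypothetical data shaped like the report: 2,000 cycles, 20 of them slow.
samples = [15.0] * 1980 + [2005.5] * 20
count, fraction = slow_resume_rate(samples)
print(f"{count} slow resumes out of {len(samples)} ({fraction:.1%})")
```

The 2,000 ms threshold mirrors the figure in the summary; a different cutoff can be passed as threshold_ms.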
Comment 1 Todd Brandt 2019-01-17 02:40:47 UTC
Created attachment 280545: sleepgraph timeline with callgraph over mei_me

Sleepgraph timeline with callgraph enabled; only the mei_me callgraphs are shown.
Comment 2 Todd Brandt 2019-01-17 02:42:38 UTC
The delay occurs in mei_reset (see the callgraph timeline):

mei_me_hw_start [mei_me] (2005.518 ms @ 7098.274)

It's waiting for a schedule call.
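
For scanning many such entries, the callgraph line quoted above can be split into fields. This is a rough sketch that assumes every entry has the same "name [module] (N ms @ T)" shape as the one line shown here; the regular expression and function name are hypothetical, and the value after "@" is kept as-is because its unit is not stated in this comment:

```python
import re

# Assumed entry shape, based only on the single line quoted in this comment.
CALLGRAPH_RE = re.compile(
    r"^(?P<func>\S+)\s+\[(?P<module>[^\]]+)\]\s+"
    r"\((?P<duration_ms>[\d.]+)\s*ms\s*@\s*(?P<start>[\d.]+)\)$"
)


def parse_callgraph_entry(line):
    """Return a dict of fields, or None if the line does not match."""
    m = CALLGRAPH_RE.match(line.strip())
    if m is None:
        return None
    return {
        "func": m["func"],
        "module": m["module"],
        "duration_ms": float(m["duration_ms"]),
        "start": float(m["start"]),
    }


entry = parse_callgraph_entry("mei_me_hw_start [mei_me] (2005.518 ms @ 7098.274)")
print(entry)
```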
Comment 3 Tomas Winkler 2019-01-17 09:31:23 UTC
Can you please provide the BIOS and CSE FW versions (cat /sys/class/mei/mei0/fw_ver)?
Thanks
Comment 4 Len Brown 2019-01-17 16:25:36 UTC
Created attachment 280571: expanded callgraph showing mei_reset

If you expand the callgraph in comment #1,
this is what you see.
Comment 5 Todd Brandt 2019-01-17 16:27:34 UTC
$ cat /sys/class/mei/mei0/fw_ver
0:12.0.22.1310
0:12.0.22.1310
0:12.0.22.1310
Comment 6 Todd Brandt 2019-01-17 16:29:39 UTC
As for BIOS version, from the timeline's dmesg header:
# sysinfo | man:Intel Corporation | plat:CoffeeLake H DDR4 RVP | cpu:Genuine Intel(R) CPU 0000 @ 2.10GHz | bios:CNLSFWR1.R00.X181.B00.1812130202 | numcpu:16 | memsz:8053184 | memfr:3353116
Comment 7 Tomas Winkler 2019-02-11 14:49:38 UTC
Can you try disabling the TPM via the BIOS to see if this helps?
Comment 8 Todd Brandt 2019-04-25 15:50:14 UTC
Created attachment 282531: issue.def
Comment 9 Todd Brandt 2019-10-08 20:08:52 UTC
I just verified that the TPM Security BIOS switch has been disabled from the beginning, i.e. this one (it looks like the picture, but not greyed out):

https://kbimg.dell.com/library/KB/DELL_ORGANIZATIONAL_GROUPS/DELL_GLOBAL/Content%20Team/Grey%20TPM%20Lat%207350%201.png
Comment 10 Len Brown 2019-11-15 03:12:37 UTC
Still an issue in Linux 5.4-rc7

Dell XPS 9360 with the latest BIOS, running with SETUP set to "factory defaults", which has the TPM disabled (per above).

/sys/class/dmi/id:
bios_date:05/26/2019
bios_vendor:Dell Inc.
bios_version:2.12.0
board_name:0839Y6

/sys/class/mei/mei0:
dev_state:ENABLED
fw_status:94000245
fw_status:82218506
fw_status:00000030
fw_status:00684004
fw_status:00001F01
fw_status:47C00BC9
fw_ver:0:11.8.65.3590
fw_ver:0:11.8.65.3590
fw_ver:0:11.5.1.1006
hbm_ver:2.0
hbm_ver_drv:2.1
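
When comparing dumps like the one above across machines, it can help to group the repeated attribute lines. A small sketch, assuming each line has the form "name:value" and that a name repeats once per line of the underlying sysfs file; the helper name is hypothetical and only a subset of the values above is reused:

```python
from collections import defaultdict


def parse_attr_dump(text):
    """Group 'name:value' lines into {name: [values...]}, preserving order."""
    attrs = defaultdict(list)
    for line in text.splitlines():
        # Split on the first ':' only, since values such as fw_ver contain ':'.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            attrs[key.strip()].append(value.strip())
    return dict(attrs)


dump = """dev_state:ENABLED
fw_ver:0:11.8.65.3590
fw_ver:0:11.8.65.3590
fw_ver:0:11.5.1.1006
hbm_ver:2.0
hbm_ver_drv:2.1"""

attrs = parse_attr_dump(dump)
print(attrs["fw_ver"])
```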
Comment 11 Len Brown 2020-01-08 15:35:10 UTC
Hi Tomas, what can we do to debug this?
Comment 12 Tomas Winkler 2020-01-08 15:50:20 UTC
As far as I understand, this is not a kernel driver issue but a fw one: the fw is busy with some bookkeeping task.
I've provided all the data to the fw team. As you pointed out in the issue description, this happens 1% of the time, so I'm trying to understand whether that reflects the priority of this issue in real life.