Bug 196129 - EC noirq spinning - Fan blows up on Lenovo Carbon X1 Generation 5 (2017)
Summary: EC noirq spinning - Fan blows up on Lenovo Carbon X1 Generation 5 (2017)
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P3 normal
Assignee: Lv Zheng
URL:
Keywords:
Duplicates: 191181
Depends on:
Blocks:
 
Reported: 2017-06-19 23:46 UTC by Lv Zheng
Modified: 2017-10-13 12:39 UTC
CC List: 17 users

See Also:
Kernel Version: 4.11
Tree: Mainline
Regression: Yes


Attachments
T470 lspci output (30.24 KB, text/plain)
2017-09-12 22:54 UTC, Tomislav Ivek
T470 dmidecode output (15.16 KB, text/plain)
2017-09-12 22:56 UTC, Tomislav Ivek
x1c Generation 5 (2017) acpidump output (780.10 KB, text/plain)
2017-09-14 07:52 UTC, shian

Description Lv Zheng 2017-06-19 23:46:59 UTC
Spawned from bug 191181.

From Fernando Chaves:
> > I can confirm revert  commit  d30283057ecdf8c543ae757ae34db3d7fd2d7732 in
> > kernel 4.11.3  (archlinux) solves acpitz-virtual-0 stuck in 48º, but not
> > solve the fan issue.
> > 
> > I will try in theses days to revert the others commits and see what happen
> 
> not for me on 4.12-rc4. acpitz-virtual-0 is still stuck at +48.0°C when the
> issue appears.

You are right, I was watching another sensor, sorry I was tired.



Now I have reverted all of the following commits in 4.11.3 and the issue is gone: no more fan blowing, and acpitz-virtual-0 is no longer stuck at +48.0°C.

df45db6177f8dde380d44149cca46ad800a00575
750f628be68e8b8e1624d8abd003b9f1fc758ed6
e923e8e79e18fd6be9162f1be6b99a002e9df2cb
c2b46d679b30c5c0d7eb47a21085943242bdd8dc
39a2a2aa3e9e5538984e9130c92a6c889ad86435
d30283057ecdf8c543ae757ae34db3d7fd2d7732
72c77b7ea9ce781f4987840984a462e4456ba98e
46922d2a3aff5122253d97e64500801c08f4f2c0
2a5708409e4e05446eb1a89ecb48641d6fd5d5a9
97cb159fd91d00f8d7d1adeb075503dc0d946bff
eab05ec38073f72389386f4a77fb58c06e246a4c
4c237371f290d1ed3b2071dd43554362137b1cce
c3a696b6e8f8f75f9f75e556a9f9f6472eae2655

I haven't dug into the changes in these commits because I lack the knowledge and time, but if I find time I will try to see what happens in them.

And I forgot to say, my laptop is an X1 Carbon 5th Gen.
Comment 1 Lv Zheng 2017-06-19 23:52:20 UTC
From Fernando:

(In reply to Lv Zheng from comment #161)
> Created attachment 256927 [details]
> [PATCH] ACPI: EC: Revert back to default wait polling style processing in
> noirq stage
> 
> Could someone try to:
> 
> 1. apply this commit.
> 2. boot the kernel with "acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y"
> and let me know the result.
> 3. boot the kernel with "acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y"
> and let me know the result.
> 
> Thanks in advance.


No issues with 2, 3, or without parameters (as ec_suspend_yield defaults to TRUE); tested 5 times with each.

If I boot with acpi.ec_suspend_yield=N, the issue appears on the first suspend.
Comment 2 Lv Zheng 2017-06-19 23:53:11 UTC
From Damjan:

After testing it more carefully, it seems to work with "acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y"

Carbon 5th gen
20HQS0LV00
BIOS N1MET35W (1.20 ) 05/17/2017
Firmware Revision: 1.14
Comment 3 Lv Zheng 2017-06-19 23:54:05 UTC
From Gjorgji Jankovski:

(In reply to Lv Zheng from comment #161)
> Created attachment 256927 [details]
> [PATCH] ACPI: EC: Revert back to default wait polling style processing in
> noirq stage
> 
> Could someone try to:
> 
> 1. apply this commit.
> 2. boot the kernel with "acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y"
> and let me know the result.
> 3. boot the kernel with "acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y"
> and let me know the result.
> 
> Thanks in advance.
After applying the patch it seems to be fixed.
Kernel: 4.11.4
BIOS: N1QET55W (1.30 ) 05/23/2017
Firmware Revision: 1.13
Model: T470
Comment 4 Andreas Lindhé 2017-06-20 12:31:03 UTC
In an attempt to clarify, I made a video demonstrating the bug I am experiencing: https://youtu.be/9NQ9x-Jm99Q.

From the descriptions in bug 191181 I assume this is what this new thread is about. Have I understood this correctly?
Comment 5 Lv Zheng 2017-06-21 01:26:17 UTC
To Andreas Lindhé:

It looks like this happens only occasionally.
Does the fan always go quiet again after running for a longer period?

I'm also not familiar with your problem.
It seems to be related to the firmware version.
Let me confirm this with you:
Can your problem be fixed by applying attachment 256927?
Please tell me your test results for:
1. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y
2. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=N
3. Booting kernel with: acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y

Thanks
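
To confirm which of the three cases a given boot actually corresponds to, the parameter values can be read back from sysfs. The following is only a minimal sketch: it assumes the parameters are exposed under /sys/module/acpi/parameters, that ec_suspend_yield may be absent on kernels without the test patch, and show_ec_params is a hypothetical helper name, not part of any tool.

```shell
# Print the two EC-related module parameters used in the test cases.
# The directory is an optional argument so the function can also be
# pointed at a copy of the sysfs tree; the default path is an assumption.
show_ec_params() {
    param_dir="${1:-/sys/module/acpi/parameters}"
    for name in ec_freeze_events ec_suspend_yield; do
        if [ -r "$param_dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$param_dir/$name")"
        else
            # Parameter file missing, e.g. on a kernel without the patch.
            printf '%s=<not present>\n' "$name"
        fi
    done
}
```

Running show_ec_params with no argument after each boot and pasting its output next to the test result should make reports easier to compare.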
Comment 6 Andreas Lindhé 2017-06-21 06:42:53 UTC
(In reply to Lv Zheng from comment #5)
> To Andreas Lindhé:
> 
> It looks this happens occasionally.
> Can the fan always return silence after running for a longer period?
No. If they start spinning in this manner, they do not stop until I suspend the machine.

> 
> I'm also not familiar with your problem.
> It seems it is firmware version related.
> Let me confirm this with you:
> Can your problem be fixed by applying attachment 256927 [details]?
> Tell me your test result about:
> 1. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y
> 2. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=N
> 3. Booting kernel with: acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y
> 
> Thanks

Sure. I didn't have time during the semester to recompile my kernel, but I'll look into it in a few days and come back to you.
Comment 7 Tomislav Ivek 2017-06-23 09:34:02 UTC
(In reply to Lv Zheng from comment #5)
> To Andreas Lindhé:
> 
> It looks this happens occasionally.
> Can the fan always return silence after running for a longer period?
> 
> I'm also not familiar with your problem.
> It seems it is firmware version related.
> Let me confirm this with you:
> Can your problem be fixed by applying attachment 256927 [details]?
> Tell me your test result about:
> 1. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y
> 2. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=N
> 3. Booting kernel with: acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y
> 
> Thanks

Kernel: 4.12.0-rc4 and -rc6 with patch attachment 256927 [details]
Model: Lenovo ThinkPad T470
BIOS: N1QET55W (1.30 ) 05/23/2017
Firmware Revision: 1.13

The issue is gone in cases 1 and 3 (acpi.ec_suspend_yield=Y). I have tested this over the course of a couple of days, on and off ac power, with several hibernates or full shutdowns in between.

Booting with case 2 (acpi.ec_freeze_events=N acpi.ec_suspend_yield=N) shows the issue immediately after the first resume from suspend: the cpu fan spins fast and the acpitz-virtual-0 temp1 is stuck at 48°C.

There is an odd thing I have noticed in cases 1 and 3, ie. when the fan issue seems to be fixed. Immediately after resume, the thinkpad-isa-0000 fan1 is stuck at 65535 RPM even though the fan seems not to be spinning at all. After a couple of seconds fan1 abruptly changes to 0 RPM and stays there until some load makes the fan spin up and then the sensor shows actual fan speed. All the time acpitz-virtual-0 is responsive and shows the actual temperature.
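
For anyone collecting data on this, the two readings discussed here (acpitz temp1 and the thinkpad fan1 value) can be pulled out of the `sensors` output automatically. This is a rough sketch only: check_sensors is a hypothetical helper, the chip names are taken from the outputs posted in this thread, and treating "acpitz at exactly 48 with the fan running" as suspicious is a heuristic, since 48°C can also be a perfectly valid reading.

```shell
# Read `sensors` output on stdin and summarise the acpitz temperature
# and the thinkpad fan speed in one line.
check_sensors() {
    awk '
        # A chip header is a single word without a colon, e.g. acpitz-virtual-0.
        NF == 1 && index($0, ":") == 0 { chip = $1; next }
        chip ~ /^acpitz/ && $1 == "temp1:" {
            t = $2
            gsub(/[^0-9.]/, "", t)      # strip sign and unit, keep the number
            temp = t
        }
        chip ~ /^thinkpad/ && $1 == "fan1:" { fan = $2 }
        END {
            if (fan + 0 == 65535)
                status = "invalid-fan"  # 65535 looks like an invalid reading
            else if (temp + 0 == 48 && fan + 0 > 0)
                status = "suspect"      # heuristic based on this thread only
            else
                status = "ok"
            printf "acpitz=%s fan=%s status=%s\n", temp, fan, status
        }
    '
}
```

It can be used as `sensors | check_sensors`, for example in a loop after each resume.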
Comment 8 Denis P. 2017-06-25 11:20:02 UTC
(In reply to Lv Zheng from comment #5)
> To Andreas Lindhé:
> 
> It looks this happens occasionally.
> Can the fan always return silence after running for a longer period?
> 
> I'm also not familiar with your problem.
> It seems it is firmware version related.
> Let me confirm this with you:
> Can your problem be fixed by applying attachment 256927 [details]?
> Tell me your test result about:
> 1. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y
> 2. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=N
> 3. Booting kernel with: acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y
> 
> Thanks

Kernel: 4.11.7-041107-generic patched with attachment 256927 [details]
Model: Lenovo ThinkPad T470
BIOS: N1QET55W (1.30 ),Release Date: 05/23/2017
Firmware Revision: 1.13

I also can confirm that the issue is gone when I boot with the kernel parameters 1 and 3. My observations are the same as in comment #7.
Comment 9 Lv Zheng 2017-06-26 02:25:59 UTC
One interesting test could be:

Under /sys/devices/system/cpu, there are "cpu devices".
Could you try setting each CPU's scaling_min_freq to its scaling_max_freq value?
Does this cause the fan to blow up?

If you have intel_pstate enabled, you can also do this in a simpler way by writing "100" to /sys/devices/system/cpu/intel_pstate/min_perf_pct.

If this does cause the fan to blow up, does it return to normal after changing the scaling_min_freq or min_perf_pct values back?

Thanks
Lv
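
The cpufreq variant of this test can be scripted roughly as follows. This is a sketch under assumptions: it expects the usual cpuN/cpufreq layout under /sys/devices/system/cpu, it has to run as root on a real system, the function names are hypothetical, and restoring from cpuinfo_min_freq assumes the original minimum was the hardware minimum, which may not match a distribution's defaults.

```shell
# Set every CPU's scaling_min_freq to its scaling_max_freq.
# The root directory is an optional argument so the function can be
# tried on a copy of the tree first.
pin_min_to_max() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$dir/scaling_max_freq" ] || continue
        cat "$dir/scaling_max_freq" > "$dir/scaling_min_freq"
    done
}

# Put scaling_min_freq back to the hardware minimum reported by the driver.
restore_min_freq() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/cpuinfo_min_freq" ] || continue
        cat "$dir/cpuinfo_min_freq" > "$dir/scaling_min_freq"
    done
}
```

Call pin_min_to_max, watch the fan for a few minutes, then call restore_min_freq and watch again.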
Comment 10 Denis P. 2017-06-26 02:41:35 UTC
(In reply to Lv Zheng from comment #9)
> One interesting test could be:
> 
> Under /sys/devices/system/cpu, there are "cpu devices".
> Could you try to write all cpu's scaling_min_freq values with their
> scaling_max_freq values?
> Can this cause fan blowing up?
> 
> If you have intel_pstate enabled, you can also do this in a simpler way by
> writing "100" to /sys/devices/system/cpu/intel_pstate/min_perf_pct.
> 
> Then if this can cause fan blowing up, will it return normal after changing
> the scaling_min_freq values or min_perf_pct values back?
> 
> Thanks
> Lv

I have tried it:
Original value is:
cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
13

Sensor values with original value:
sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +28.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM

acpitz-virtual-0
Adapter: Virtual device
temp1:        +43.0°C  (crit = +128.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +44.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +42.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +42.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +39.0°C  

Now with the value of 100 for some minutes:
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct

sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +30.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM

acpitz-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  (crit = +128.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +49.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +48.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +49.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +41.5°C  

This does not cause fan to blow up.


Now I wrote again original value:

echo 13 > /sys/devices/system/cpu/intel_pstate/min_perf_pct

sensor values after some minutes:
sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +29.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:           0 RPM

acpitz-virtual-0
Adapter: Virtual device
temp1:        +44.0°C  (crit = +128.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +43.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +40.5°C
Comment 11 H Zeng 2017-06-27 20:13:11 UTC
Sorry, but I do not have the knowledge to apply the patch and compile the kernel myself, so I cannot provide test information for the T470s.

P.S. Is there any patched and compiled kernel I can install and test?

P.P.S. Which version of the kernel will the patch be merged into?
Comment 12 Darrick J. Wong 2017-06-28 15:13:54 UTC
Hi,

I built a 4.11.7 kernel with patch attachment 256927 [details].

/sys/module/acpi/parameters//acpica_version : 20170119
/sys/module/acpi/parameters//aml_debug_output = 0
/sys/module/acpi/parameters//ec_busy_polling = N
/sys/module/acpi/parameters//ec_delay = 500
/sys/module/acpi/parameters//ec_event_clearing = query
/sys/module/acpi/parameters//ec_freeze_events = Y
/sys/module/acpi/parameters//ec_max_queries = 16
/sys/module/acpi/parameters//ec_polling_guard = 550
/sys/module/acpi/parameters//ec_storm_threshold = 8
/sys/module/acpi/parameters//ec_suspend_yield = Y
/sys/module/acpi/parameters//immediate_undock = Y

With these settings, the /first/ suspend worked fine as you'd expect, but after the second suspend/resume cycle the fan came on and stayed on.

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +28.0°C  

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +26.0°C  

acpitz-virtual-0
Adapter: Virtual device
temp1:        +48.0°C  (crit = +128.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        3975 RPM

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +30.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +28.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +27.0°C  (high = +100.0°C, crit = +100.0°C)
Comment 13 Darrick J. Wong 2017-06-28 20:17:53 UTC
Sorry, somehow I forgot to mention that my machine is:

Model: Lenovo ThinkPad T470
BIOS: N1QET55W (1.30), Release Date: 05/23/2017
Firmware Revision: 1.13
Comment 14 Denis P. 2017-06-28 20:25:20 UTC
Interesting. I have the same settings as in comment 12; the only difference is the acpica_version, which is 20170303. For me the fan-blowing issue is resolved with those settings.
Comment 15 Damjan Georgievski 2017-07-01 10:06:38 UTC
attachment 256927 [details] without any kernel command line parameters, fixes the issue for me too.

Kernel 4.12-rc7
ThinkPad X1 Carbon 5th
LENOVO 20HQS0LV00/20HQS0LV00, BIOS N1MET35W (1.20 ) 05/17/2017
EC: Firmware Revision: 1.14
Comment 16 Tomislav Ivek 2017-07-02 11:24:42 UTC
The original symptoms resurfaced today after a week of normal operation with 4.12.0-rc6, patch attachment 256927 [details], acpi.ec_freeze_events=Y, and acpi.ec_suspend_yield=Y. I am getting acpitz-virtual-0 temp1 at 48°C and the fan spinning at max speed until a power off/hibernate. Will try acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y for a couple of days.
Comment 17 Zhang Rui 2017-07-02 14:54:54 UTC
*** Bug 191181 has been marked as a duplicate of this bug. ***
Comment 18 fellowsgarden 2017-07-06 10:48:00 UTC
(In reply to Damjan Georgievski from comment #15)
> attachment 256927 [details] without any kernel command line parameters,
> fixes the issue for me too.
> 
> Kernel 4.12-rc7
> ThinkPad X1 Carbon 5th
> LENOVO 20HQS0LV00/20HQS0LV00, BIOS N1MET35W (1.20 ) 05/17/2017
> EC: Firmware Revision: 1.14

Same here (x1 carbon 2017, i7, bios v1.20). Anything I can do to help (resolve this issue)? Recommended to try and apply one of these "patches" or just to wait and hope that it will be resolved through a normal update, soon.

Running Ubuntu 17.04.

Thanks!
Comment 19 Damjan Georgievski 2017-07-06 12:51:27 UTC
> Same here (x1 carbon 2017, i7, bios v1.20). Anything I can do to help
> (resolve this issue)? Recommended to try and apply one of these "patches" or
> just to wait and hope that it will be resolved through a normal update, soon.
> 
> Running Ubuntu 17.04.

if you know how to compile and setup your own kernel try the patch in attachment 256927 [details]
Comment 20 fellowsgarden 2017-07-06 13:58:50 UTC
(In reply to Damjan Georgievski from comment #19)
> > Same here (x1 carbon 2017, i7, bios v1.20). Anything I can do to help
> > (resolve this issue)? Recommended to try and apply one of these "patches"
> or
> > just to wait and hope that it will be resolved through a normal update,
> soon.
> > 
> > Running Ubuntu 17.04.
> 
> if you know how to compile and setup your own kernel try the patch in
> attachment 256927 [details]

1)
Never done it. If you could point me to a step by step guide that would be great. 

2)
Given that this bug is being discussed here, how likely is it that a fix is going to find its way into an update of Ubuntu 17.04 soon? How soon?
Comment 21 Darrick J. Wong 2017-07-06 17:44:27 UTC
FWIW after a week's worth of wanderings with this laptop (4.11.7), I've observed that about half of the resumes result in the "48C, fans blowing constantly" situation and the rest of the time the fans work as expected.  4.12.0 seems no different.  I've not noticed any difference in the ec-related dmesg spew between the two cases, unfortunately.
Comment 22 fellowsgarden 2017-07-07 07:49:54 UTC
3)
Has the patch (above) been included in the new stable kernel (v. 4.12) which was released earlier this week?
Comment 23 Damjan Georgievski 2017-07-07 10:08:14 UTC
(In reply to fellowsgarden from comment #22)
> 3)
> Has the patch (above) been included in the new stable kernel (v. 4.12) which
> was released earlier this week?

no.

ps. this is not the proper channel for kernel compile/install tutorials. you better ask an Ubuntu forum
Comment 24 Lv Zheng 2017-07-10 01:12:51 UTC
There is still a discussion in the community about whether the noirq stage tuning is useful.
If not, all noirq hooks will be removed instead of adding an option.

Thanks
Lv
Comment 25 fellowsgarden 2017-07-11 17:29:01 UTC
I'm inclined to try out the patch (and learn how to independently elsewhere) for two reasons:
a) I'm eager to have a system which works
b) Further up on this page there's an indication that "more info is needed" Status: NEEDINFO: if I could help, I'd be more than happy to!

But if my help (e.g. to test if the noirq stage modification is useful on my system - a system which doesn't seem to be unique here!) is not needed, then, before attempting to implement the patch, I would like to know if the problem will just "go away by itself" with an update in the not too distant future. -- So, if I may, reiterate my question from my earlier post: Any educated guess as to when a "fix" might be included in the normal ubuntu 17.04 updates? Rough ballpark: Q4_2017 or Q2_2018 or only Q3_2021? By "educated" I mean from your experience with the time it takes for kernel bug fixes to eventually percolate to stable ubuntu releases. Maybe this question, too, belongs on stackexchange and not here. Apologies if so (am new to kernel.org and still learning my way about...)

Thanks.
Comment 26 Lv Zheng 2017-07-12 03:10:34 UTC
Patches submitted here:

https://patchwork.kernel.org/patch/9835825/
https://patchwork.kernel.org/patch/9835823/

Thanks
Comment 27 Lv Zheng 2017-07-12 03:13:00 UTC
To Darrick J. Wong:

I also suspected these fixes might not be the root cause fixes.
I think you can wait until the upstream kernel is patched with the fixes required for ThinkPad X1 Carbon 5th Gen users. Then if you still had problems with the patched upstream kernel, you could file a new bug to report your issue, starting from thermal category.

Thanks
Lv
Comment 28 Darrick J. Wong 2017-07-12 06:16:49 UTC
(In reply to Lv Zheng from comment #27)
> To Darrick J. Wong:
> 
> I also suspected these fixes might not be the root cause fixes.
> I think you can wait until the upstream kernel is patched with the fixes
> required for ThinkPad X1 Carbon 5th Gen users. Then if you still had
> problems with the patched upstream kernel, you could file a new bug to
> report your issue, starting from thermal category.

I'll set myself a reminder to include those two patches the next time I build a kernel for that T470.  Which, given my continuing need to integrate xfs_scrub fixes, probably won't be more than a few days. ;)

--D
Comment 29 Gjorgji Jankovski 2017-07-12 12:02:36 UTC
(In reply to Lv Zheng from comment #26)
> Patches submitted here:
> 
> https://patchwork.kernel.org/patch/9835825/
> https://patchwork.kernel.org/patch/9835823/
> 
> Thanks

Just tried it again with these specific ones and it seems to be fine for now.
Comment 30 Lv Zheng 2017-07-13 02:47:12 UTC
To Gjorgji Jankovski:

Thanks for confirmation.
I'll mark this bug as RESOLVED.

Thanks
Lv
Comment 31 Darrick J. Wong 2017-07-14 05:27:23 UTC
So far so good, after a whole day of testing...
Comment 32 Damjan Georgievski 2017-07-14 10:06:44 UTC
(In reply to Lv Zheng from comment #26)
> Patches submitted here:
> 
> https://patchwork.kernel.org/patch/9835825/
> https://patchwork.kernel.org/patch/9835823/

so far so good on 4.12.1 + the patches.
X1 Carbon (gen 5) + bios 1.22 (new from this week)
Comment 33 Lv Zheng 2017-07-14 10:14:29 UTC
(In reply to Darrick J. Wong from comment #31)
> So far so good, after a whole day of testing...

So your platform is also OK now.

(In reply to Damjan Georgievski from comment #32)
> (In reply to Lv Zheng from comment #26)
> > Patches submitted here:
> > 
> > https://patchwork.kernel.org/patch/9835825/
> > https://patchwork.kernel.org/patch/9835823/
> 
> so far so good on 4.12.1 + the patches.
> X1 Carbon (gen 5) + bios 1.22 (new from this week)

The patches have been merged into linux-pm.git and will appear upstream in the next release cycle.
I'll close the bug.

Thanks for the reports and the tests.
Comment 34 Darrick J. Wong 2017-07-17 06:11:03 UTC
Regrettably, it just happened again, three days in. :(
Comment 35 Andreas Lindhé 2017-07-17 07:55:45 UTC
(In reply to Darrick J. Wong from comment #34)
> Regrettably, it just happened again, three days in. :(

I'm on my fourth day (4.12.0-2 + these patches), and  it still works fine on X1C.
Comment 36 Tomislav Ivek 2017-07-17 19:12:51 UTC
With patch 256927 on 4.12.0-rc6 I got the problem on T470 within ten days, with both acpi.ec_freeze_events=N and Y.

Now I am testing patches 9835823 and 9835825 on kernel 4.12.0 without any specific kernel boot options. A couple of days in, so far so good.
Comment 37 Lv Zheng 2017-07-18 05:08:46 UTC
> Regrettably, it just happened again, three days in.

Now you can file a separate bug, distinct from the X1C users' one, and ask the thermal developers for help.
Comment 38 Gjorgji Jankovski 2017-07-19 08:47:09 UTC
Happened again when unsuspending while docked on a T470.
Comment 39 Tobias Westergaard Kjeldsen 2017-07-21 08:53:23 UTC
I'm on a T470s and have had the same problem as described here.

On 4.11.11 with patches 9835825 and 9835823 applied the problem has now gone.
Comment 40 Tomislav Ivek 2017-08-05 14:13:38 UTC
T470, patched kernel 4.12.4 and the bug is still present. The symptoms do not occur as often as without the patch, but every couple of days the fan still spins fast after resume and acpitz-virtual-0 is stuck at 48°C. BIOS version: N1QET56W, BIOS Revision: 1.31, Firmware Revision: 1.14
Comment 41 Andreas Lindhé 2017-08-05 18:45:29 UTC
I believe that was never the same bug.
Comment 42 Lv Zheng 2017-08-07 01:03:06 UTC
To Tomislav:

Could you try applying the following patches and see if the situation improves:

https://patchwork.kernel.org/patch/9870917/
https://patchwork.kernel.org/patch/9870915/
https://patchwork.kernel.org/patch/9870919/
https://patchwork.kernel.org/patch/9870925/

Thanks
Lv
Comment 43 Tomislav Ivek 2017-08-07 15:22:23 UTC
(In reply to Lv Zheng from comment #42)
> To Tomislav:
> 
> Could you try to patch the followings and see if the situation can be
> improved:
> 
> https://patchwork.kernel.org/patch/9870917/
> https://patchwork.kernel.org/patch/9870915/
> https://patchwork.kernel.org/patch/9870919/
> https://patchwork.kernel.org/patch/9870925/
> 
> Thanks
> Lv


Lv, I am now running 4.13.0-rc4 with the four patches above. So far so good, but as the bug only appears sporadically I would like to test the new kernel for a couple of days under normal workloads.
Comment 44 Lv Zheng 2017-08-08 02:29:43 UTC
Sure, I'll wait until your next feedback and refresh the posted patches with enhanced patch description and your tested-by.
Comment 45 Tomislav Ivek 2017-08-10 14:26:14 UTC
After three days of normal behavior, today the bug resurfaced on T470 running 4.13.0-rc4 and the four latest patches, after a 5-hour standby. This is better than hearing fans spin up every resume, but still not fixed.

I do not see anything suspicious in dmesg except perhaps this message which might be unrelated:
mei_me 0000:00:16.0: can't suspend (mei_me_pm_runtime_suspend [mei_me] returned -62)
mei_me 0000:00:16.0: unexpected reset: dev_state = ENABLED fw status = 90000245 80100306 00000020 00084400 00000000 40400AD9

Anything else I can try? Thank you for your hard work.
Comment 46 Lv Zheng 2017-08-11 02:07:26 UTC
> This is better than hearing fans spin up every resume, but still not fixed.

The patches can help to handle EC events for a longer period during suspend.
Maybe we can tune the timing of the acpi_ec_block_transactions() invocation to make the period even longer.
I'll check with the others to find the latest possible stage at which to invoke acpi_ec_block_transactions().

Thanks
Lv
Comment 47 fellowsgarden 2017-08-26 19:43:08 UTC
Sorry for asking a beginner's question (again): in the links to both patches above it currently still says "state: awaiting upstream". I've googled this a bit but still have not arrived at a clear idea as to how to interpret this.

 - Does "awaiting upstream" imply an indefinite wait?

(Does this mean that eventually these patches will enter some future kernel of, say, 4.18, or have they already entered some specific kernel, e.g. 4.13-rc6?)

Bottom line:

 - How soon will these patches arrive via normal updates (educated guess from past experience)?

Thanks!
Comment 48 H Zeng 2017-08-26 20:01:44 UTC
I am just here to report that this issue seems gone on my T470s with the arrival of kernel 4.12.7.

I am using openSUSE Tumbleweed. It skipped kernel 4.12.1-6 and jumped to 4.12.7 on last Saturday. For one week, with more than a dozen cycles of sleep and awake, my T470s has seen no issue of fan blowing after resume.
Comment 49 Dominik Sandjaja 2017-09-08 18:40:52 UTC
Since 4.12.8 on Fedora 26, the issue is also gone after at least two dozen sleep/wake cycles. X1 Carbon (4th generation).
Comment 50 Lv Zheng 2017-09-11 02:09:51 UTC
(In reply to fellowsgarden from comment #47)
> Sorry for asking a beginner's question (again): in the links to both patches
> above it currently still says "state: awaiting upstream". I've googled this
> a bit but still have not arrived at a clear idea as to how to interpret this.
> 
>  - Does "awaiting upstream" imply an indefinite wait?
> 
> (Does this mean that eventually these patches will enter some future kernel
> of, say, 4.18, or have they already entered some specific kernel, e.g.
> 4.13-rc6?)
> 
> Bottom line:
> 
>  - How soon will these patches arrive via normal updates (educated guess
> from past experience)?

The regression fixes are already upstream in Linus' tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=662591461c4
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c40f956ce9

That's why the bug is closed as a code fix.
If you still suffer from the same issue, please open a new bug, try older kernels, and let us know which is the earliest bad kernel and which is the latest good kernel.
Comment 51 shian 2017-09-11 11:39:34 UTC
(In reply to Dominik Sandjaja from comment #49)
> Since 4.12.8 on Fedora 26, the issue is also gone after at least two dozen
> sleep/wake cycles. X1 Carbon (4th generation).

I am sad: my X1C 5th gen runs kernel 4.12.9, but the bug is still present :(
Comment 52 Lv Zheng 2017-09-12 05:46:56 UTC
If you mean the regression fixes, they should be in 4.13 kernels.
$ git tag --contains 66259146
v4.13
v4.13-rc1
v4.13-rc2
v4.13-rc3
v4.13-rc4
v4.13-rc5
v4.13-rc6
v4.13-rc7

Anyone who can still reproduce this issue on the latest v4.13 kernels (even at a lower reproduction rate), please upload lspci/dmidecode output here.
Thanks in advance.
Comment 53 Jens Axboe 2017-09-12 15:04:00 UTC
FWIW, on my side things are working fine. I upgraded to the latest BIOS the other day and I'm running 4.13+ on my laptop (x1 gen4), and no fan issues.
Comment 54 Tomislav Ivek 2017-09-12 22:54:49 UTC
Created attachment 258341 [details]
T470 lspci output

(In reply to Lv Zheng from comment #52)
> If you mean the regression fixes. It should be in 4.13 kernels.
> $ git tag --contains 66259146
> v4.13
> v4.13-rc1
> v4.13-rc2
> v4.13-rc3
> v4.13-rc4
> v4.13-rc5
> v4.13-rc6
> v4.13-rc7
> 
> Anyone that still can reproduce this issue on latest v4.13 kernels (even
> with lowered replication rate), please upload lspci/dmidecode output here.
> Thanks in advance.

On ThinkPad T470 I do see the issue as reported in comment 45 on 4.13-rc4 with patches:
https://patchwork.kernel.org/patch/9870917/
https://patchwork.kernel.org/patch/9870915/
https://patchwork.kernel.org/patch/9870919/
https://patchwork.kernel.org/patch/9870925/

(will attach dmidecode output below). Next I can try the latest 4.13 as well as the recently published BIOS update.
Comment 55 Tomislav Ivek 2017-09-12 22:56:34 UTC
Created attachment 258343 [details]
T470 dmidecode output

dmidecode output for T470 running kernel 4.13-rc4 with the issue still appearing.
Comment 56 Lv Zheng 2017-09-13 02:01:04 UTC
To Tomislav:

I happen to know that there are many issues related to this phenomenon:
The fan blows up because the EC has failed to obtain the temperature of a specific CPU and returns an invalid temperature value to the BIOS code, which then decides to run the fan at full speed. Normally this is due to a failure of PECI communication, which can result from various causes.

Some of the root causes I learned of recently include:
If a CPU is not in a proper C state, PECI will fail to obtain the temperature.
If someone is using a specific SSD (Samsung SM961 512GB), they have to upgrade their EC firmware.

So please also upload the lspci output here.
And I'd suggest that you file a different bug to investigate the root cause on your platform.

This bug is a regression affecting Lenovo Carbon X1 users, where the root cause might be related to the EC firmware event handling mechanism.

Thanks and best regards
Lv
Comment 57 Lv Zheng 2017-09-14 07:29:22 UTC
To Tomislav:

Could you also upload an acpidump output here.

Thanks
Lv
Comment 58 Lv Zheng 2017-09-14 07:35:39 UTC
To Tomislav:

In the lspci:
3e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 (prog-if 02 [NVM Express])

It's the NVMe model that can reproduce the issue.
It's reported that the issue couldn't be reproduced with specific NVMe models.
Only a limited set of SSD models is affected.

Thanks
Lv
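
Other reporters can check for the same controller before uploading logs. A minimal sketch, assuming the device is named exactly as in the lspci line quoted above (other pci.ids versions may label it differently); has_sm961 is a hypothetical helper name.

```shell
# Read `lspci` output on stdin; succeed if a Samsung SM961/PM961
# NVMe controller is listed. The match string is copied from this thread.
has_sm961() {
    grep -q 'Samsung.*SM961/PM961'
}
```

Typical use would be `lspci | has_sm961 && echo "SM961/PM961 controller present"`.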
Comment 59 shian 2017-09-14 07:52:12 UTC
Created attachment 258371 [details]
x1c Generation 5 (2017)  acpidump output
Comment 60 shian 2017-09-14 08:00:20 UTC
Sorry, English is not my native language. My X1C has the same SM961 NVMe SSD; my acpidump output is attached in comment 59. Hope it is helpful.
Comment 61 Lv Zheng 2017-09-15 04:24:53 UTC
To shian:

Have you confirmed like Tomislav that you are still suffering from this issue after upgrading to latest kernel?

Thanks
Lv
Comment 62 Lv Zheng 2017-09-15 04:30:20 UTC
To shian:

This is an already fixed/closed bug.
We are still leaving comments here because Tomislav reported that the issue still occurs, at a lower reproduction rate, in comment 36, comment 40, comment 43 and comment 45 (this probably should be a different bug).
Comment 63 Lv Zheng 2017-09-15 04:40:29 UTC
To shian:

You are using 4.12.9 while I have confirmed that the fix is in 4.13 kernels in comment 52. Have you tried?
Comment 64 shian 2017-09-15 05:31:11 UTC
I am sorry that I had not fully tested this. I think the issue still exists on my PC, because I found the following patch already included in the Fedora src.rpm package:

$ rpm2cpio kernel-4.12.11-300.fc26.src.rpm| cpio -div
$ cat patch-4.12.11| grep '@@.*acpi_ec_ecdt_probe' -A31
@@ -1812,24 +1812,6 @@ int __init acpi_ec_ecdt_probe(void)
 }
 
 #ifdef CONFIG_PM_SLEEP
-static int acpi_ec_suspend_noirq(struct device *dev)
-{
-	struct acpi_ec *ec =
-		acpi_driver_data(to_acpi_device(dev));
-
-	acpi_ec_enter_noirq(ec);
-	return 0;
-}
-
-static int acpi_ec_resume_noirq(struct device *dev)
-{
-	struct acpi_ec *ec =
-		acpi_driver_data(to_acpi_device(dev));
-
-	acpi_ec_leave_noirq(ec);
-	return 0;
-}
-
 static int acpi_ec_suspend(struct device *dev)
 {
 	struct acpi_ec *ec =
@@ -1851,7 +1833,6 @@ static int acpi_ec_resume(struct device *dev)
 #endif
 
 static const struct dev_pm_ops acpi_ec_pm = {
-	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend_noirq, acpi_ec_resume_noirq)
 	SET_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend, acpi_ec_resume)
 };

After receiving your e-mail, I tested the 4.13.1 kernel and reproduced the problem.
I copied the output record:

[fengzi@x1c ~]% while [ 1 ] ; do sensors | awk '{if ($0 ~ /Package/) temp = $4; else if ($0 ~ /fan/) {fan = $2; unit = $3}} END{print temp"  "fan" "unit}'; sleep 2; done
+48.0°C  0 RPM
+47.0°C  0 RPM
+67.0°C  0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+60.0°C  65535 RPM
+61.0°C  65535 RPM
+49.0°C  65535 RPM
+48.0°C  65535 RPM
+47.0°C  0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+76.0°C  3492 RPM
+49.0°C  4538 RPM
+47.0°C  5208 RPM
+46.0°C  5836 RPM
+45.0°C  6423 RPM
+44.0°C  7025 RPM
+45.0°C  6976 RPM
+46.0°C  6960 RPM
+45.0°C  6960 RPM
+47.0°C  6960 RPM
+43.0°C  6960 RPM
+44.0°C  6960 RPM
^C
[fengzi@x1c ~]% sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +42.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +43.5°C  

acpitz-virtual-0
Adapter: Virtual device
temp1:        +48.0°C  (crit = +128.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +41.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        6960 RPM

[fengzi@x1c ~]% uname -r
4.13.1-303.fc27.x86_64

Do I need to provide additional information?

Thanks
an
Comment 65 shian 2017-09-16 02:56:41 UTC
Oh! Some things I forgot.
In comment 64, Output:
"ERROR: Can't get value of subfeature temp1_input: I/O error"
corresponds to a suspend and wake-up.
Comment 66 Lv Zheng 2017-09-18 01:44:31 UTC
To Shian:

OK, it looks like your issue is a different one. Let me file a different bug for you and categorize it as thermal so that it can be investigated by the right owners.

Thanks
Lv
Comment 67 Lv Zheng 2017-09-18 02:00:34 UTC
To shian:
Split your report to bug 196973.

To Tomislav:
Split your report to bug 196975.

We could track these issues there.

Thanks
Lv
Comment 68 Lv Zheng 2017-09-18 02:04:07 UTC
If the new reports are stuck here, they will only be investigated if the order of the suspend steps matters. So they need to be re-categorized and investigated via their possible root causes.
