Bug 196129
Summary: | EC noirq spinning - Fan blows up on Lenovo Carbon X1 Generation 5 (2017) | ||
---|---|---|---|
Product: | ACPI | Reporter: | Lv Zheng (lv.zheng) |
Component: | EC | Assignee: | Lv Zheng (lv.zheng) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | andreas, danilo, djwong, dominik, fellowsgarden, fengziyonghu, george.sapkin, haoxian.zeng, j.gjorgji, lorenzo.benvenuti, m-bugzilla, nanochaves, nickj21, robert.n.sharp, sander, sbrabec, theoriginal.skullburner, tobias, tomislav.ivek |
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.11 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
T470 lspci output
T470 dmidecode output
x1c Generation 5 (2017) acpidump output |
Description
Lv Zheng
2017-06-19 23:46:59 UTC
From Fernando:
(In reply to Lv Zheng from comment #161)
> Created attachment 256927 [details]
> [PATCH] ACPI: EC: Revert back to default wait polling style processing in
> noirq stage
>
> Could someone try to:
>
> 1. apply this commit.
> 2. boot the kernel with "acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y"
> and let me know the result.
> 3. boot the kernel with "acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y"
> and let me know the result.
>
> Thanks in advance.

No issues with 2, 3, or without parameters (ec_suspend_yield defaults to TRUE); tested 5 times with each.
When booting with acpi.ec_suspend_yield=N, the issue appears on the first suspend.

From Damjan:
After testing it more carefully, it seems to work with "acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y".
Carbon 5th gen 20HQS0LV00
BIOS N1MET35W (1.20 ) 05/17/2017
Firmware Revision: 1.14

From Gjorgji Jankovski:
(In reply to Lv Zheng from comment #161; same request as quoted above)
After applying the patch it seems to be fixed.
Kernel: 4.11.4
BIOS: N1QET55W (1.30 ) 05/23/2017
Firmware Revision: 1.13
Model: T470

In an attempt to clarify, I made a video demonstrating the bug I am experiencing: https://youtu.be/9NQ9x-Jm99Q. From the descriptions in bug 191181, I assume this is what this new thread is about. Have I understood this correctly?

To Andreas Lindhé:
It looks like this happens occasionally.
Does the fan always return to silence after running for a longer period?
I'm also not familiar with your problem.
It seems to be related to the firmware version.
Let me confirm this with you:
Can your problem be fixed by applying attachment 256927 [details]?
Please tell me your test results for:
1. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y
2. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=N
3. Booting kernel with: acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y
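Before reporting results for each case, it can help to confirm which combination the running kernel actually picked up. The following is a minimal sketch, assuming the parameters are readable under /sys/module/acpi/parameters/ (the path shown in the parameter listing later in this thread). The function name ec_case is hypothetical. ec_suspend_yield appears in that listing from a kernel built with attachment 256927, so on other kernels the file may be missing and is reported as '?'.

```shell
#!/bin/sh
# Hypothetical helper: map the EC-related acpi module parameters that are in
# effect after boot to the three test cases listed above.
# The default directory is the sysfs path shown in the parameter listing in
# this thread; pass another directory to inspect a copy of it instead.
ec_case() {
    dir=${1:-/sys/module/acpi/parameters}
    # Boolean module parameters read back as Y or N; '?' marks a missing file.
    freeze=$(cat "$dir/ec_freeze_events" 2>/dev/null) || freeze='?'
    yield=$(cat "$dir/ec_suspend_yield" 2>/dev/null) || yield='?'
    case "$freeze,$yield" in
        N,Y) echo "case 1 (ec_freeze_events=N ec_suspend_yield=Y)" ;;
        N,N) echo "case 2 (ec_freeze_events=N ec_suspend_yield=N)" ;;
        Y,Y) echo "case 3 (ec_freeze_events=Y ec_suspend_yield=Y)" ;;
        *)   echo "untested combination: ec_freeze_events=$freeze ec_suspend_yield=$yield" ;;
    esac
}
```

On the test machine, `cat /proc/cmdline` shows what was passed at boot, and `ec_case` with no argument reports which case that maps to.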
Thanks
(In reply to Lv Zheng from comment #5)
> To Andreas Lindhé:
>
> It looks this happens occasionally.
> Can the fan always return silence after running for a longer period?

No. If they start spinning in this manner, they do not stop until I suspend the machine.

> Tell me your test result about:
> 1. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y
> 2. Booting kernel with: acpi.ec_freeze_events=N acpi.ec_suspend_yield=N
> 3. Booting kernel with: acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y

Sure. I didn't have time during the semester to recompile my kernel, but I'll look into it in a few days and come back to you.

(In reply to Lv Zheng from comment #5)
Kernel: 4.12.0-rc4 and -rc6 with patch attachment 256927 [details]
Model: Lenovo ThinkPad T470
BIOS: N1QET55W (1.30 ) 05/23/2017
Firmware Revision: 1.13

The issue is gone in cases 1 and 3 (acpi.ec_suspend_yield=Y). I have tested this over the course of a couple of days, on and off AC power, with several hibernates or full shutdowns in between.
Booting with case 2 (acpi.ec_freeze_events=N acpi.ec_suspend_yield=N) shows the issue immediately after the first resume from suspend: the CPU fan spins fast and acpitz-virtual-0 temp1 is stuck at 48°C.

There is an odd thing I have noticed in cases 1 and 3, i.e. when the fan issue seems to be fixed. Immediately after resume, thinkpad-isa-0000 fan1 is stuck at 65535 RPM even though the fan does not seem to be spinning at all. After a couple of seconds fan1 abruptly changes to 0 RPM and stays there until some load makes the fan spin up, after which the sensor shows the actual fan speed. The whole time, acpitz-virtual-0 is responsive and shows the actual temperature.

(In reply to Lv Zheng from comment #5)
Kernel: 4.11.7-041107-generic patched with attachment 256927 [details]
Model: Lenovo ThinkPad T470
BIOS: N1QET55W (1.30 ), Release Date: 05/23/2017
Firmware Revision: 1.13

I can also confirm that the issue is gone when I boot with the kernel parameters of cases 1 and 3. My observations are the same as in comment #7.

One interesting test could be:
Under /sys/devices/system/cpu, there are "cpu devices".
Could you try writing each CPU's scaling_max_freq value into its scaling_min_freq?
Does this cause the fan to blow up?
If you have intel_pstate enabled, you can also do this in a simpler way by writing "100" to /sys/devices/system/cpu/intel_pstate/min_perf_pct.
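The experiment requested above can be sketched as two shell functions: one pins every CPU's minimum frequency to its maximum, and one restores the saved minimums. This is a minimal sketch assuming the cpufreq sysfs layout (cpuN/cpufreq/scaling_min_freq and scaling_max_freq) and root access. The function names and the default save-file path /tmp/scaling_min_freq.saved are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the experiment above: pin each CPU's
# scaling_min_freq to its scaling_max_freq, then put the old values back.
# Writing to the real sysfs files requires root.
pin_min_to_max() {
    root=${1:-/sys/devices/system/cpu}
    save=${2:-/tmp/scaling_min_freq.saved}
    : > "$save"                                   # start a fresh save file
    for d in "$root"/cpu[0-9]*/cpufreq; do
        [ -f "$d/scaling_min_freq" ] || continue  # skip if the glob did not match
        # Record "<cpufreq dir> <old min>" so restore_min can undo the change.
        printf '%s %s\n' "$d" "$(cat "$d/scaling_min_freq")" >> "$save"
        cat "$d/scaling_max_freq" > "$d/scaling_min_freq"
    done
}

restore_min() {
    save=${1:-/tmp/scaling_min_freq.saved}
    # Write each recorded minimum back into its cpufreq directory.
    while read -r d old; do
        echo "$old" > "$d/scaling_min_freq"
    done < "$save"
}
```

As root: run pin_min_to_max, watch `sensors` for a few minutes, then run restore_min and check whether the fan settles. On intel_pstate systems, the single write to min_perf_pct mentioned above is the simpler route.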
Then, if this does cause the fan to blow up, does it return to normal after changing the scaling_min_freq or min_perf_pct values back?
Thanks
Lv

(In reply to Lv Zheng from comment #9)
I have tried it. The original value is:
cat /sys/devices/system/cpu/intel_pstate/min_perf_pct
13

Sensor values with the original value:
sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +28.0°C
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 0 RPM
acpitz-virtual-0
Adapter: Virtual device
temp1: +43.0°C (crit = +128.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +44.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +42.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +42.0°C (high = +100.0°C, crit = +100.0°C)
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +39.0°C

Now with the value of 100, after some minutes:
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +30.0°C
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 0 RPM
acpitz-virtual-0
Adapter: Virtual device
temp1: +46.0°C (crit = +128.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +49.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +48.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +49.0°C (high = +100.0°C, crit = +100.0°C)
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +41.5°C

This does not cause the fan to blow up.
Now I wrote the original value back:
echo 13 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
Sensor values after some minutes:
sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +29.0°C
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 0 RPM
acpitz-virtual-0
Adapter: Virtual device
temp1: +44.0°C (crit = +128.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +43.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +43.0°C (high = +100.0°C, crit = +100.0°C)
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +40.5°C

Sorry, but I do not have the knowledge to apply the patch and compile the kernel myself, so I cannot provide test information for the T470s.
P.S. Is there any patched and compiled kernel I can install and test?
P.P.S. Which kernel version will the patch be merged into?

Hi,
I built a 4.11.7 kernel with patch attachment 256927 [details].
/sys/module/acpi/parameters//acpica_version : 20170119
/sys/module/acpi/parameters//aml_debug_output = 0
/sys/module/acpi/parameters//ec_busy_polling = N
/sys/module/acpi/parameters//ec_delay = 500
/sys/module/acpi/parameters//ec_event_clearing = query
/sys/module/acpi/parameters//ec_freeze_events = Y
/sys/module/acpi/parameters//ec_max_queries = 16
/sys/module/acpi/parameters//ec_polling_guard = 550
/sys/module/acpi/parameters//ec_storm_threshold = 8
/sys/module/acpi/parameters//ec_suspend_yield = Y
/sys/module/acpi/parameters//immediate_undock = Y
With these settings, the /first/ suspend worked fine as you'd expect, but after the second suspend/resume cycle the fan came on and stayed on.
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +28.0°C
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +26.0°C
acpitz-virtual-0
Adapter: Virtual device
temp1: +48.0°C (crit = +128.0°C)
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 3975 RPM
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +30.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +28.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +27.0°C (high = +100.0°C, crit = +100.0°C)
Sorry, somehow I forgot to mention my machine:
Model: Lenovo ThinkPad T470
BIOS: N1QET55W (1.30), Release Date: 05/23/2017
Firmware Revision: 1.13

Interesting. I have the same settings as in comment 12; the only difference is the acpica_version, which is 20170303. For me the fan issue is resolved with those settings.

Attachment 256927 [details] without any kernel command-line parameters fixes the issue for me too.
Kernel 4.12-rc7
ThinkPad X1 Carbon 5th
LENOVO 20HQS0LV00/20HQS0LV00, BIOS N1MET35W (1.20 ) 05/17/2017
EC: Firmware Revision: 1.14
The original symptoms resurfaced today after a week of normal operation with 4.12.0-rc6, patch attachment 256927 [details], acpi.ec_freeze_events=Y, and acpi.ec_suspend_yield=Y. I am getting acpitz-virtual-0 temp1 at 48°C and the fan spinning at max speed until a power off/hibernate. Will try acpi.ec_freeze_events=N acpi.ec_suspend_yield=Y for a couple of days.
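Because the symptom only appears after some resumes, a quick check of `sensors` output after each resume can make reports like this one more consistent. The following is a minimal sketch that assumes the chip names seen in this thread (acpitz-*, thinkpad-*) and treats "acpitz temp1 at +48.0°C with fan1 above 0 RPM" as the stuck state. That criterion is an assumption drawn from the reports here, not from any documented EC behaviour, and check_stuck_fan is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check for the symptom reported in this bug: acpitz temp1 stuck
# at +48.0°C while the ThinkPad fan is running. Reads `sensors` text on stdin.
# The 48.0 value and chip names come from reports in this thread, not a spec.
check_stuck_fan() {
    awk '
        # Chip header lines (e.g. acpitz-virtual-0) are one word with no colon.
        NF == 1 && $0 !~ /:/ { chip = $1 }
        chip ~ /^acpitz-/   && $1 == "temp1:" { temp = $2 }
        chip ~ /^thinkpad-/ && $1 == "fan1:"  { rpm = $2 }
        END {
            if (temp == "+48.0°C" && rpm + 0 > 0) { print "STUCK " temp " " rpm; exit 1 }
            print "OK " temp " " rpm
        }'
}
```

`sensors | check_stuck_fan` prints STUCK and returns non-zero in the reported failure state. Per comment 7, fan1 can read 65535 RPM for a few seconds right after a good resume, so it is worth waiting briefly before checking.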
*** Bug 191181 has been marked as a duplicate of this bug. ***

(In reply to Damjan Georgievski from comment #15)
> attachment 256927 [details] without any kernel command line parameters,
> fixes the issue for me too.
>
> Kernel 4.12-rc7
> ThinkPad X1 Carbon 5th
> LENOVO 20HQS0LV00/20HQS0LV00, BIOS N1MET35W (1.20 ) 05/17/2017
> EC: Firmware Revision: 1.14

Same here (X1 Carbon 2017, i7, BIOS v1.20). Is there anything I can do to help resolve this issue? Is it recommended to try applying one of these patches, or just to wait and hope that it will be resolved through a normal update soon?
Running Ubuntu 17.04. Thanks!

> Same here (x1 carbon 2017, i7, bios v1.20). Anything I can do to help
> (resolve this issue)?

If you know how to compile and set up your own kernel, try the patch in attachment 256927 [details].

(In reply to Damjan Georgievski from comment #19)
> if you know how to compile and setup your own kernel try the patch in
> attachment 256927 [details]

1) I've never done it. If you could point me to a step-by-step guide, that would be great.
2) Given that this bug is being discussed here, how likely is it that a fix will find its way into an Ubuntu 17.04 update soon? How soon?

FWIW, after a week's worth of wanderings with this laptop (4.11.7), I've observed that about half of the resumes result in the "48C, fans blowing constantly" situation and the rest of the time the fans work as expected. 4.12.0 seems no different. Unfortunately, I've not noticed any difference in the EC-related dmesg spew between the two cases.
3) Has the patch (above) been included in the new stable kernel (v4.12), which was released earlier this week?

(In reply to fellowsgarden from comment #22)
> 3)
> Has the patch (above) been included in the new stable kernel (v. 4.12) which
> was released earlier this week?

No.
P.S. This is not the proper channel for kernel compile/install tutorials; you'd better ask on an Ubuntu forum.

There is still a discussion in the community about whether the noirq stage tuning is useful. If it is not, all noirq hooks will be removed instead of adding an option.
Thanks
Lv

I'm inclined to try out the patch (and learn how to do so independently elsewhere) for two reasons:
a) I'm eager to have a system which works.
b) Further up on this page there's an indication that more info is needed (Status: NEEDINFO). If I could help, I'd be more than happy to!

But if my help (e.g. testing whether the noirq stage modification is useful on my system, a system which doesn't seem to be unique here) is not needed, then, before attempting to apply the patch, I would like to know if the problem will just "go away by itself" with an update in the not-too-distant future.

So, if I may, I'll reiterate my question from my earlier post: any educated guess as to when a fix might be included in the normal Ubuntu 17.04 updates? Rough ballpark: Q4 2017, Q2 2018, or only Q3 2021? By "educated" I mean based on your experience of how long it takes for kernel bug fixes to percolate to stable Ubuntu releases.

Maybe this question, too, belongs on StackExchange and not here. Apologies if so (I am new to kernel.org and still learning my way about). Thanks.

Patches submitted here:
https://patchwork.kernel.org/patch/9835825/
https://patchwork.kernel.org/patch/9835823/
Thanks

To Darrick J. Wong:
I also suspected these fixes might not be the root-cause fixes.
I think you can wait until the upstream kernel is patched with the fixes required for ThinkPad X1 Carbon 5th Gen users.
Then, if you still have problems with the patched upstream kernel, you could file a new bug to report your issue, starting from the thermal category.
Thanks
Lv

(In reply to Lv Zheng from comment #27)
> Then if you still had problems with the patched upstream kernel, you could
> file a new bug to report your issue, starting from thermal category.

I'll set myself a reminder to include those two patches the next time I build a kernel for that T470. Which, given my continuing need to integrate xfs_scrub fixes, probably won't be more than a few days. ;)
--D

(In reply to Lv Zheng from comment #26)
> Patches submitted here:
>
> https://patchwork.kernel.org/patch/9835825/
> https://patchwork.kernel.org/patch/9835823/

Just tried it again with these specific ones and it seems to be fine for now.

To Gjorgji Jankovski:
Thanks for the confirmation. I'll mark this bug as RESOLVED.
Thanks
Lv

So far so good, after a whole day of testing...

(In reply to Lv Zheng from comment #26)
> Patches submitted here:

So far so good on 4.12.1 + the patches.
X1 Carbon (gen 5) + BIOS 1.22 (new from this week)

(In reply to Darrick J. Wong from comment #31)
> So far so good, after a whole day of testing...

So your platform is also OK now.

(In reply to Damjan Georgievski from comment #32)
> so far so good on 4.12.1 + the patches.
> X1 Carbon (gen 5) + bios 1.22 (new from this week)

The patches have been merged into linux-pm.git and will appear upstream in the next release cycle. I'll close the bug. Thanks for the reports and the tests.
Regrettably, it just happened again, three days in. :(

(In reply to Darrick J. Wong from comment #34)
> Regrettably, it just happened again, three days in. :(

I'm on my fourth day (4.12.0-2 + these patches), and it still works fine on X1C.

With patch 256927 on 4.12.0-rc6 I got the problem on the T470 within ten days, with both acpi.ec_freeze_events=N and Y. Now I am testing patches 9835823 and 9835825 on kernel 4.12.0 without any specific kernel boot options. A couple of days in, so far so good.

> Regrettably, it just happened again, three days in.
Now you can file a bug separate from the X1C users' one, asking the thermal developers for help.
Happened again when unsuspending while docked on a T470.

I'm on a T470s and have had the same problem as described here. On 4.11.11 with patches 9835825 and 9835823 applied, the problem is now gone.

T470, patched kernel 4.12.4, and the bug is still present. The symptoms do not occur as often as without the patch, but every couple of days the fan still spins fast after resume and acpitz-virtual-0 is stuck at 48°C.
BIOS version: N1QET56W, BIOS Revision: 1.31, Firmware Revision: 1.14

I believe that was never the same bug.

To Tomislav:
Could you try applying the following patches and see if the situation improves:
https://patchwork.kernel.org/patch/9870917/
https://patchwork.kernel.org/patch/9870915/
https://patchwork.kernel.org/patch/9870919/
https://patchwork.kernel.org/patch/9870925/
Thanks
Lv

(In reply to Lv Zheng from comment #42)
> To Tomislav:
>
> Could you try to patch the followings and see if the situation can be
> improved:

Lv, I am now running 4.13.0-rc4 with the four patches above. So far so good, but as the bug only appears sporadically, I would like to test the new kernel for a couple of days under normal workloads.

Sure, I'll wait for your next feedback and refresh the posted patches with an enhanced patch description and your Tested-by.

After three days of normal behavior, the bug resurfaced today on the T470 running 4.13.0-rc4 with the four latest patches, after a 5-hour standby. This is better than hearing the fans spin up on every resume, but it is still not fixed.
I do not see anything suspicious in dmesg except perhaps this message, which might be unrelated:
mei_me 0000:00:16.0: can't suspend (mei_me_pm_runtime_suspend [mei_me] returned -62)
mei_me 0000:00:16.0: unexpected reset: dev_state = ENABLED fw status = 90000245 80100306 00000020 00084400 00000000 40400AD9
Anything else I can try? Thank you for your hard work.

> This is better than hearing fans spin up every resume, but still not fixed.
The patches allow EC events to be handled for a longer period during suspend.
Maybe we can tune the timing of the acpi_ec_block_transactions() invocation to make that period even longer.
I'll check with the others to find the latest possible stage at which to invoke acpi_ec_block_transactions().
Thanks
Lv
Sorry for asking a beginner's question (again): the links to both patches above currently still say "state: awaiting upstream". I've googled this a bit but still have no clear idea how to interpret it.
- Does "awaiting upstream" imply an indefinite wait?
(Does this mean that these patches will eventually enter some future kernel, say 4.18, or have they already entered a specific kernel, e.g. 4.13-rc6?)
Bottom line:
- How soon will these patches arrive via normal updates (educated guess from past experience)?
Thanks!

I am just here to report that this issue seems to be gone on my T470s with the arrival of kernel 4.12.7. I am using openSUSE Tumbleweed, which skipped kernels 4.12.1 to 4.12.6 and jumped to 4.12.7 last Saturday. For one week, with more than a dozen sleep/wake cycles, my T470s has not shown the fan blowing after resume.

Since 4.12.8 on Fedora 26, the issue is also gone after at least two dozen sleep/wake cycles. X1 Carbon (4th generation).

(In reply to fellowsgarden from comment #47)
> - Does "awaiting upstream" imply an indefinite wait?
> - How soon will these patches arrive via normal updates (educated guess
> from past experience)?

The regression fixes are already upstream in Linus' tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=662591461c4
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c40f956ce9
That's why the bug is closed as CODE_FIX.
If you still suffer from the same issue, please open a new bug, try older kernels, and let us know the earliest bad kernel and the latest good kernel.

(In reply to Dominik Sandjaja from comment #49)
> Since 4.12.8 on Fedora 26, the issue is also gone after at least two dozen
> sleep/wake cycles. X1 Carbon (4th generation).

I am sad: my X1C 5th gen with kernel 4.12.9 still has the bug. :(

If you mean the regression fixes, they should be in 4.13 kernels.
$ git tag --contains 66259146
v4.13
v4.13-rc1
v4.13-rc2
v4.13-rc3
v4.13-rc4
v4.13-rc5
v4.13-rc6
v4.13-rc7
Anyone who can still reproduce this issue on the latest v4.13 kernels (even at a lower reproduction rate), please upload lspci/dmidecode output here. Thanks in advance.

FWIW, on my side things are working fine. I upgraded to the latest BIOS the other day and I'm running 4.13+ on my laptop (X1 gen4), and no fan issues.

Created attachment 258341 [details]
T470 lspci output

(In reply to Lv Zheng from comment #52)
> If you mean the regression fixes. It should be in 4.13 kernels.
> Anyone that still can reproduce this issue on latest v4.13 kernels (even
> with lowered replication rate), please upload lspci/dmidecode output here.

On a ThinkPad T470 I do see the issue as reported in comment 45 on 4.13-rc4 with patches:
https://patchwork.kernel.org/patch/9870917/
https://patchwork.kernel.org/patch/9870915/
https://patchwork.kernel.org/patch/9870919/
https://patchwork.kernel.org/patch/9870925/
(dmidecode output attached below). Next I can try the latest 4.13 as well as the recently published BIOS update.

Created attachment 258343 [details]
T470 dmidecode output
dmidecode output for T470 running kernel 4.13-rc4 with the issue still appearing.
To Tomislav:
I happen to know that many issues are related to this phenomenon: the fan blows up because the EC has failed to obtain the temperature of a specific CPU and returns an invalid temperature value to the BIOS code, which then decides to blow the fan up. Normally this is caused by a failure of PECI communication, which can have various causes. Some of the root causes I learned about recently include:
- If a CPU is not in a proper C state, PECI fails to obtain the temperature.
- Users with a specific SSD (Samsung SM961 512GB) have to upgrade their EC firmware.
So please also upload the lspci output here. I'd also suggest filing a different bug to investigate the root cause on your platform. This bug is a regression affecting Lenovo Carbon X1 users, where the root cause might be related to the EC firmware event handling mechanism.
Thanks and best regards
Lv

To Tomislav:
Could you also upload acpidump output here?
Thanks
Lv

To Tomislav:
In the lspci output:
3e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 (prog-if 02 [NVM Express])
This is the NVMe model with which the issue can be reproduced. It's reported that the issue could not be reproduced with certain other NVMe models; only a limited number of SSD models are affected.
Thanks
Lv

Created attachment 258371 [details]
x1c Generation 5 (2017) acpidump output
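Lv's diagnosis above points at two details worth reporting alongside any new failure: whether a Samsung SM961/PM961 NVMe controller is present, and the EC firmware revision. The following is a minimal sketch that reads saved `lspci` and `dmidecode -t bios` output (the latter needs root). It assumes dmidecode reports the EC version on a "Firmware Revision:" line, as in the reports quoted in this thread. report_ssd_and_ec is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: summarize the SSD model and EC firmware revision from
# saved command output, so both can be pasted into a bug report.
report_ssd_and_ec() {
    lspci_out=$1   # file containing `lspci` output
    dmi_out=$2     # file containing `dmidecode -t bios` output (run as root)
    if grep -q 'SM961' "$lspci_out"; then
        echo "NVMe: Samsung SM961/PM961 present"
    else
        echo "NVMe: no SM961/PM961 found"
    fi
    # Take the first "Firmware Revision:" value, ignoring leading whitespace.
    ec=$(sed -n 's/^[[:space:]]*Firmware Revision: *//p' "$dmi_out" | head -n 1)
    echo "EC firmware: ${ec:-unknown}"
}
```

For example: `lspci > lspci.txt`, `sudo dmidecode -t bios > bios.txt`, then `report_ssd_and_ec lspci.txt bios.txt`.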
Sorry, English is not my native language. My X1C has the same SM961 NVMe SSD; my acpidump output is in comment 59. I hope it is helpful.

> To shian
Have you confirmed, as Tomislav did, that you still suffer from this issue after upgrading to the latest kernel?
Thanks
Lv
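Whether a given kernel already carries the regression fixes can be approximated from its version number. The following is a minimal sketch, assuming GNU `sort -V` and taking v4.13 as the cutoff from the `git tag --contains 66259146` output in comment 52. It cannot detect distribution backports (such as the Fedora 4.12.11 patch dropping the EC noirq hooks, quoted below), and kernel_has_fix is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: is a kernel release (default: the running one) at least
# v4.13? Per comment 52, `git tag --contains 66259146` lists v4.13-rc1 onward.
# This only compares version numbers: fixes backported into distribution
# kernels (e.g. Fedora 4.12.x) are not detected. Assumes GNU `sort -V`.
kernel_has_fix() {
    rel=${1:-$(uname -r)}
    ver=${rel%%-*}                 # "4.13.1-303.fc27.x86_64" -> "4.13.1"
    # Version-sort the cutoff and the candidate; the cutoff must come first.
    lowest=$(printf '%s\n%s\n' 4.13 "$ver" | sort -V | head -n 1)
    [ "$lowest" = 4.13 ]
}
```

`kernel_has_fix && echo 'v4.13 or later'` checks the running kernel; passing a release string such as 4.12.9-300.fc26.x86_64 checks another one.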
> To shian
This is an already fixed/closed bug. We are still leaving comments here because Tomislav reported a lower reproduction rate in comment 36, comment 40, comment 43 and comment 45 (that should probably be a different bug).

> To shian
You are using 4.12.9, while I have confirmed in comment 52 that the fix is in 4.13 kernels. Have you tried them?

I am so sorry, I have not tested this fully. I think the issue still exists on my PC, because I found the following patch in the Fedora src.rpm package:
$ rpm2cpio kernel-4.12.11-300.fc26.src.rpm | cpio -div
$ cat patch-4.12.11 | grep '@@.*acpi_ec_ecdt_probe' -A31
@@ -1812,24 +1812,6 @@ int __init acpi_ec_ecdt_probe(void)
 }
 
 #ifdef CONFIG_PM_SLEEP
-static int acpi_ec_suspend_noirq(struct device *dev)
-{
-	struct acpi_ec *ec =
-		acpi_driver_data(to_acpi_device(dev));
-
-	acpi_ec_enter_noirq(ec);
-	return 0;
-}
-
-static int acpi_ec_resume_noirq(struct device *dev)
-{
-	struct acpi_ec *ec =
-		acpi_driver_data(to_acpi_device(dev));
-
-	acpi_ec_leave_noirq(ec);
-	return 0;
-}
-
 static int acpi_ec_suspend(struct device *dev)
 {
 	struct acpi_ec *ec =
@@ -1851,7 +1833,6 @@ static int acpi_ec_resume(struct device *dev)
 #endif
 
 static const struct dev_pm_ops acpi_ec_pm = {
-	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend_noirq, acpi_ec_resume_noirq)
 	SET_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend, acpi_ec_resume)
 };
After receiving your e-mail, I tested the 4.13.1 kernel and reproduced the problem.
I copied the output:
[fengzi@x1c ~]% while [ 1 ] ; do sensors | awk '{if ($0 ~ /Package/) temp = $4; else if ($0 ~ /fan/) {fan = $2; unit = $3}} END{print temp" "fan" "unit}'; sleep 2; done
+48.0°C 0 RPM
+47.0°C 0 RPM
+67.0°C 0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+60.0°C 65535 RPM
+61.0°C 65535 RPM
+49.0°C 65535 RPM
+48.0°C 65535 RPM
+47.0°C 0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+76.0°C 3492 RPM
+49.0°C 4538 RPM
+47.0°C 5208 RPM
+46.0°C 5836 RPM
+45.0°C 6423 RPM
+44.0°C 7025 RPM
+45.0°C 6976 RPM
+46.0°C 6960 RPM
+45.0°C 6960 RPM
+47.0°C 6960 RPM
+43.0°C 6960 RPM
+44.0°C 6960 RPM
^C%
[fengzi@x1c ~]% sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +42.0°C (high = +100.0°C, crit = +100.0°C)
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +43.5°C
acpitz-virtual-0
Adapter: Virtual device
temp1: +48.0°C (crit = +128.0°C)
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +41.0°C
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 6960 RPM
[fengzi@x1c ~]% uname -r
4.13.1-303.fc27.x86_64
Do I need to provide additional information?
Thanks

Oh! Something I forgot: in comment 64, the output "ERROR: Can't get value of subfeature temp1_input: I/O error" corresponds to a suspend and wake-up.

To Shian:
OK, it looks like your issue is a different one. Let me file a different bug for you and categorize it as thermal so that it can be investigated by the right owners.
Thanks
Lv

To shian:
Your report has been split into bug 196973.
To Tomislav:
Your report has been split into bug 196975.
We can track these issues there.
Thanks
Lv

If the new reports stayed here, they would only be investigated from the angle of whether the order of the suspend steps matters. So they need to be re-categorized and investigated via their possible root causes.

I still seem to be experiencing this bug on my Lenovo ThinkPad X1 Carbon 4th Generation, on kernel 5.9.11.
See also bug #191181, where others are reporting the same problem on the same system with different kernel versions. acpitz-acpi-0 seems to get stuck at 48°C after suspend/resume around 30-40% of the time, and the fan keeps blowing as a result. The problem can be fixed by suspending and resuming again. I'm not sure if this is exactly the same bug as the one described above, but the symptoms are very similar. Should a new bug report be created for this, since this one has been marked "closed"?

@permaer, comment 69: I just reported the X1 Carbon 4th Gen issue as bug 214205.