Bug 196973 - Fan still blows up after fixing the regression - Lenovo x1c Generation 5 (2017)
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-18 01:47 UTC by Lv Zheng
Modified: 2017-12-14 01:02 UTC
CC List: 3 users

See Also:
Kernel Version: 4.13
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump of x1c gen5 (2017) (780.10 KB, text/plain)
2017-09-18 01:47 UTC, Lv Zheng
full dmesg output after fan blows up (132.93 KB, text/plain)
2017-09-18 04:35 UTC, shian
grep /sys/class/thermal/* output (10.43 KB, text/plain)
2017-10-18 02:05 UTC, shian
sensors record (4.13.5) (2.25 KB, text/plain)
2017-10-18 02:33 UTC, shian
grep /sys/class/hwmon/hwmon*/* (3.35 KB, text/plain)
2017-10-18 03:14 UTC, shian
full sensors output (1.76 KB, text/plain)
2017-10-18 03:36 UTC, shian
boot kernel with idle=[halt/nomwait/poll] or intel_idle.max_cstate=[0-9] (981 bytes, text/plain)
2017-10-18 11:34 UTC, shian
turbostat output in kernel 4.13.5 (42.78 KB, text/plain)
2017-10-18 11:59 UTC, shian
dmidecode output (kernel 4.13.5) (15.53 KB, text/plain)
2017-10-19 06:00 UTC, shian
turbostat output (kernel-4.14.3 BIOSCD[n1mur11w].iso) (5.58 KB, application/x-troff-man)
2017-12-14 00:53 UTC, shian
dmidecode output (kernel-4.14.3 BIOSCD[n1mur11w].iso) (15.53 KB, text/plain)
2017-12-14 00:55 UTC, shian
sensors output (kernel-4.14.3 BIOSCD[n1mur11w].iso) (535 bytes, text/plain)
2017-12-14 00:59 UTC, shian
full dmesg output (kernel-4.14.3 BIOSCD[n1mur11w].iso) (230.79 KB, text/plain)
2017-12-14 01:00 UTC, shian
acpidump output (kernel-4.14.3 BIOSCD[n1mur11w].iso) (812.55 KB, text/plain)
2017-12-14 01:02 UTC, shian

Description Lv Zheng 2017-09-18 01:47:47 UTC
Created attachment 258449 [details]
acpidump of x1c gen5 (2017)

I am sorry that I have not tested this fully. I think the issue still exists on my PC, because I found the following patch in the Fedora src.rpm package:

$ rpm2cpio kernel-4.12.11-300.fc26.src.rpm| cpio -div
$ cat patch-4.12.11| grep '@@.*acpi_ec_ecdt_probe' -A31
@@ -1812,24 +1812,6 @@ int __init acpi_ec_ecdt_probe(void)
 }
 
 #ifdef CONFIG_PM_SLEEP
-static int acpi_ec_suspend_noirq(struct device *dev)
-{
-	struct acpi_ec *ec =
-		acpi_driver_data(to_acpi_device(dev));
-
-	acpi_ec_enter_noirq(ec);
-	return 0;
-}
-
-static int acpi_ec_resume_noirq(struct device *dev)
-{
-	struct acpi_ec *ec =
-		acpi_driver_data(to_acpi_device(dev));
-
-	acpi_ec_leave_noirq(ec);
-	return 0;
-}
-
 static int acpi_ec_suspend(struct device *dev)
 {
 	struct acpi_ec *ec =
@@ -1851,7 +1833,6 @@ static int acpi_ec_resume(struct device *dev)
 #endif
 
 static const struct dev_pm_ops acpi_ec_pm = {
-	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend_noirq, acpi_ec_resume_noirq)
 	SET_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend, acpi_ec_resume)
 };

After receiving your e-mail, I tested the 4.13.1 kernel and reproduced the problem.
I copied the output record:

[fengzi@x1c ~]% while [ 1 ] ; do sensors | awk '{if ($0 ~ /Package/) temp = $4; else if ($0 ~ /fan/) {fan = $2; unit = $3}} END{print temp"  "fan" "unit}'; sleep 2; done
+48.0°C  0 RPM
+47.0°C  0 RPM
+67.0°C  0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+60.0°C  65535 RPM
+61.0°C  65535 RPM
+49.0°C  65535 RPM
+48.0°C  65535 RPM
+47.0°C  0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+76.0°C  3492 RPM
+49.0°C  4538 RPM
+47.0°C  5208 RPM
+46.0°C  5836 RPM
+45.0°C  6423 RPM
+44.0°C  7025 RPM
+45.0°C  6976 RPM
+46.0°C  6960 RPM
+45.0°C  6960 RPM
+47.0°C  6960 RPM
+43.0°C  6960 RPM
+44.0°C  6960 RPM
^C%
[fengzi@x1c ~]% sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +45.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +42.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +43.5°C  

acpitz-virtual-0
Adapter: Virtual device
temp1:        +48.0°C  (crit = +128.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +41.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        6960 RPM

[fengzi@x1c ~]% uname -r
4.13.1-303.fc27.x86_64

Do I need to provide additional information?

Thanks
an
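
The sampled log above mixes three kinds of fan readings: 0 RPM, 65535 RPM, and ordinary speeds that ramp up toward roughly 7000 RPM. A minimal Python sketch for labelling such a log follows. It assumes that 65535 (0xFFFF) is a placeholder for "no valid reading" rather than a real speed, which the thread does not confirm; the function name and labels are hypothetical.

```python
import re

# Assumption: 0xFFFF (65535) is a "no valid reading" placeholder, not a real speed.
INVALID_RPM = 0xFFFF

# Matches lines such as "+48.0°C  0 RPM" printed by the awk one-liner.
SAMPLE_RE = re.compile(r"^[+-]?\d+(?:\.\d+)?°C\s+(\d+)\s+RPM$")


def classify_samples(lines):
    """Label each recognised log line as 'error', 'idle', 'invalid' or 'spinning'."""
    labels = []
    for line in lines:
        line = line.strip()
        if line.startswith("ERROR:"):
            labels.append("error")
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skip shell prompts and anything else unrecognised
        rpm = int(match.group(1))
        if rpm == 0:
            labels.append("idle")
        elif rpm == INVALID_RPM:
            labels.append("invalid")
        else:
            labels.append("spinning")
    return labels
```

Under that assumption, a resume that ends in a run of "spinning" labels corresponds to the fan blow, while a run of "invalid" followed by "idle" corresponds to a good resume.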
Comment 1 Lv Zheng 2017-09-18 01:48:32 UTC
Oh! Some things I forgot.
In comment 1, Output:
"ERROR: Can't get value of subfeature temp1_input: I/O error"
corresponds to a suspend and wake-up.
Comment 2 Lv Zheng 2017-09-18 01:50:01 UTC
Have you tried to boot
Comment 3 Lv Zheng 2017-09-18 01:52:09 UTC
This bug was reported by shian (fengziyonghu@163.com) and split from bug 196129.
Let me ask:

1. could you upload full dmesg here?
2. have you tried to boot with acpi.ec_freeze_events=N?

Thanks
Comment 4 Lv Zheng 2017-09-18 02:01:29 UTC
Split here so that the thermal developers can investigate whether the "temp1_input" failure matters.
Comment 5 shian 2017-09-18 04:35:04 UTC
Created attachment 258455 [details]
full dmesg output after fan blows up


I also copied the output record from the shortest sequence I could find that reproduces the fan blow:

[fengzi@x1c ~]% while ... sensors ...
+44.0°C  0 RPM
+45.0°C  0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+54.0°C  65535 RPM
+45.0°C  65535 RPM
+44.0°C  65535 RPM
+44.0°C  65535 RPM
+44.0°C  0 RPM
ERROR: Can't get value of subfeature temp1_input: I/O error
+57.0°C  65535 RPM
+44.0°C  4470 RPM
+43.0°C  5136 RPM
+44.0°C  5747 RPM
+44.0°C  6396 RPM
+43.0°C  6960 RPM
+43.0°C  6993 RPM
^C%
[fengzi@x1c ~]% cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.13.2-300.fc27.x86_64 root=UUID=c2247d70-f254-4b50-810a-231290113dd1 ro rhgb quiet LANG=zh_CN.UTF-8 acpi.ec_freeze_events=N
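
Each test record in this thread is only meaningful together with the boot parameters it was taken under, such as acpi.ec_freeze_events=N here. The following Python sketch shows one way to pull those parameters out of a /proc/cmdline string so that a record can be labelled automatically. The function name is hypothetical, and the parser is deliberately simple: it does not handle quoted values that contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as 'ro' map to None. Only the first '=' separates
    name from value, so 'root=UUID=...' keeps 'UUID=...' as its value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params
```

For the command line shown above, `parse_cmdline(...)["acpi.ec_freeze_events"]` yields `"N"`.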
Comment 6 Zhang Rui 2017-10-17 05:31:17 UTC
please attach the output of "grep . /sys/class/thermal/thermal*/*" and "grep . /sys/class/thermal/thermal*/device/path" when the error happens.
Comment 7 Lv Zheng 2017-10-17 06:57:26 UTC
Anyway, this looks more like proof that there is an EC FW bug.
What the reverted commit did was make it trigger 100% of the time during S3 exit.

And if there is a cause other than EC FW robustness (this needs to be confirmed by the thermal developers), then apparently the Lenovo EC FW developers should do something to improve it.
So let me assign this to the thermal experts.

OTOH, can a similar test be done on Windows?
Comment 8 Lv Zheng 2017-10-17 07:38:57 UTC
Your symptom cannot be reproduced on platforms that are fixed by the following regression fix:

commit 662591461c4b9a1e3b9b159dbf37648a585ebaae
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Wed Jul 12 11:09:09 2017 +0800

    ACPI / EC: Drop EC noirq hooks to fix a regression

    According to bug reports, although the busy polling mode can make
    noirq stages execute faster, it causes abnormal fan blowing up after
    system resume (see the first link below for a video demonstration)
    on Lenovo ThinkPad X1 Carbon - the 5th Generation.  The problem can
    be fixed by upgrading the EC firmware on that machine.

    However, many reporters confirm that the problem can be fixed by
    stopping busy polling during suspend/resume and for some of them
    upgrading the EC firmware is not an option.

    For this reason, drop the noirq stage hooks from the EC driver
    to fix the regression.

    Fixes: c3a696b6e8f8 (ACPI / EC: Use busy polling mode when GPE is not enabled)
    Link: https://youtu.be/9NQ9x-Jm99Q
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129

The test result on that kind of platform is:
The fan RPM is always reported as 0 except for the moment after S3 resume to desktop. Upon S3 resume to desktop, I see three to four instances of 65536 rpm. At no time did I see an I/O error.

In conclusion, it looks like your issue is a different "invalid CPU temperature" issue and is not related to the original regression report in bug 196129.
Comment 9 shian 2017-10-18 02:05:30 UTC
Created attachment 260251 [details]
grep /sys/class/thermal/* output

The attachment contains some comments that annotate the output,
and it also includes the kernel parameters.
Comment 10 Zhang Rui 2017-10-18 02:23:21 UTC
(In reply to shian from comment #9)
> Created attachment 260251 [details]
> grep /sys/class/thermal/* output
> 
> There are some comments that indicate the output,
> And contains kernel parameters,

Does the error "ERROR: Can't get value of subfeature temp1_input: I/O error" happen while the fan is blowing?
Please also attach the output of "grep . /sys/class/hwmon/hwmon*/*" from both before and after the "ERROR: Can't get value of subfeature temp1_input: I/O error" error.
Comment 11 shian 2017-10-18 02:33:54 UTC
Created attachment 260253 [details]
sensors record (4.13.5)

(In reply to Lv Zheng from comment #8)

After upgrading to 4.13+, it seems the error never occurs when running on battery power.
The attachment is my test record (before this, I had been trying the new kernel for nearly a month).
As the record shows, the "ERROR: Can't get value of subfeature temp1_input: I/O error" message does not always appear.
I found that Lenovo released a new BIOS, n1mur10w. I flashed it, but the fan blow still happens.
Comment 12 Zhang Rui 2017-10-18 02:56:33 UTC
(In reply to shian from comment #11)
> Created attachment 260253 [details]
> sensors record (4.13.5)
> 
> (In reply to Lv Zheng from comment #8)
> 
> After upgrading to 4.13+, It seem the error never occur when the battery
> powered.

you mean AC adapter plugged, right?
can you ever reproduce the error in 4.13+, including battery unpowered?

> The attachment is my test record

on which kernel?
Comment 13 shian 2017-10-18 02:58:46 UTC
Comment on attachment 260253 [details]
sensors record (4.13.5)

>[fengzi@x1c ~]% while [ 1 ] ; do sensors | awk '{if ($1 ~ /Package/) temp=$4; else if ($1 ~ /fan1/) rpm=$2;} END{print temp"  "rpm" RPM"}' ; sleep 2; done
>+40.0°C  0 RPM        /* Start, battery only */
>+40.0°C  0 RPM        /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and good) */
>+45.0°C  65535 RPM
>+40.0°C  65535 RPM
>+40.0°C  65535 RPM
>+39.0°C  65535 RPM
>+40.0°C  0 RPM        /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and good) */
>+48.0°C  65535 RPM
>+40.0°C  65535 RPM
>+59.0°C  65535 RPM    /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and good) */
>+46.0°C  65535 RPM
>+40.0°C  65535 RPM
>+52.0°C  65535 RPM
>+40.0°C  65535 RPM
>+40.0°C  0 RPM        /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and good) */
>+52.0°C  65535 RPM
>+40.0°C  65535 RPM
>+45.0°C  65535 RPM
>+39.0°C  65535 RPM
>+40.0°C  0 RPM        /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and good) */
>+45.0°C  65535 RPM
>+40.0°C  65535 RPM
>+39.0°C  65535 RPM
>+40.0°C  65535 RPM
>+39.0°C  0 RPM        /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* *******Now, I plugged power(wake up and fan blow) */
>+43.0°C  3307 RPM
>+35.0°C  4354 RPM
>+35.0°C  5008 RPM
>+35.0°C  5597 RPM
>+35.0°C  6237 RPM     /* Before I pressed <fn + 4>(suspend) */
>+42.0°C  65535 RPM    /* After I pressed <fn>(wake up and good) */
>+36.0°C  65535 RPM
>+35.0°C  65535 RPM
>+35.0°C  65535 RPM
>+35.0°C  0 RPM        /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and fan blow) */
>+45.0°C  2790 RPM
>+36.0°C  4354 RPM
>+36.0°C  5008 RPM
>+36.0°C  5617 RPM     /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and fan blow) */
>+43.0°C  65535 RPM
>+36.0°C  4261 RPM
>+55.0°C  4966 RPM     /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and fan blow) */
>+45.0°C  65535 RPM
>+36.0°C  4279 RPM
>+53.0°C  4966 RPM     /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and fan blow) */
>+42.0°C  65535 RPM
>+36.0°C  4201 RPM
>+36.0°C  4870 RPM     /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and fan blow) */
>+43.0°C  65535 RPM
>+37.0°C  4219 RPM     /* Before I pressed <fn + 4>(suspend) */
>ERROR: Can't get value of subfeature temp1_input: I/O error        /* After I pressed <fn>(wake up and good) */
>+46.0°C  65535 RPM
>+37.0°C  65535 RPM
>+37.0°C  65535 RPM
>+36.0°C  65535 RPM
>+36.0°C  0 RPM
>^C%
>[fengzi@x1c ~]% cat /proc/cmdline
>BOOT_IMAGE=/vmlinuz-4.13.5-200.fc26.x86_64 root=UUID=c2247d70-f254-4b50-810a-231290113dd1 ro rhgb quiet LANG=zh_CN.UTF-8
Comment 14 Zhang Rui 2017-10-18 03:10:28 UTC
please attach the output of "grep . /sys/class/hwmon/hwmon*/*" when the error happens.
Comment 15 shian 2017-10-18 03:14:31 UTC
Created attachment 260255 [details]
grep /sys/class/hwmon/hwmon*/*

(In reply to Zhang Rui from comment #10)

The "ERROR: Can't get value of subfeature temp1_input: I/O error" message appears only in the first output after wakeup (and not always). After that, it no longer appears, regardless of whether there is a fan blow.
So I cannot capture the output at the moment "ERROR: Can't get value of subfeature temp1_input: I/O error" appears.

(In reply to Zhang Rui from comment #12)
With kernel 4.13.5, I can still reproduce the error with the AC adapter plugged in, but not on battery.
Comment 16 Zhang Rui 2017-10-18 03:27:18 UTC
(In reply to shian from comment #15)
> Created attachment 260255 [details]
> grep /sys/class/hwmon/hwmon*/*
> 
> (In reply to Zhang Rui from comment #10)
> 
> "ERROR: Can't get value of subfeature temp1_input: I/O error" info only
> appears when the the first output after the wakeup(and does not always
> appear), After this, no matter whether there is a fan blow, this info will
> no longer appear.
> So, I can't get output when "ERROR: Can't get value of subfeature
> temp1_input: I/O error" appears.

I see. Then please attach the full sensors output. I want to confirm which sensor device raises the error message.
Comment 17 shian 2017-10-18 03:36:23 UTC
Created attachment 260257 [details]
full sensors output

Haha~ 
It is "iwlwifi-virtual-0".
Comment 18 Zhang Rui 2017-10-18 04:43:19 UTC
then I think I see what the problem is.
the iwlwifi firmware is not loaded when you grep the sysfs interface, thus -EIO is returned.

Does the problem still exist if you wait for a while (say, 5 minutes) before running the sensors command after wakeup?
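
If the I/O error really is a transient condition right after resume, as suggested above, a monitoring script can tolerate it by retrying the read instead of reporting a failure. The Python sketch below illustrates that idea only; it is not something proposed in this thread. The helper names, retry count, and delay are hypothetical, and the exact hwmon path for the iwlwifi sensor varies per system.

```python
import errno
import time


def read_with_retry(read_fn, attempts=5, delay=1.0, sleep=time.sleep):
    """Call read_fn(), retrying only when it fails with EIO.

    Any other error, or EIO on the last attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return read_fn()
        except OSError as exc:
            if exc.errno != errno.EIO or attempt == attempts - 1:
                raise
            sleep(delay)  # give the device time to become ready


def read_millidegrees(path):
    """Read a hwmon temp*_input file, which reports millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip())
```

A caller would wrap the sysfs read, for example `read_with_retry(lambda: read_millidegrees(path))`, where `path` points at the relevant `temp1_input` file.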
Comment 19 shian 2017-10-18 05:10:18 UTC
(In reply to Zhang Rui from comment #18)
 
> does the problem still exists if you wait for a while (say, 5 minutes)
> before running sensors command after wakeup?
No, it will not exist.
As recorded in attachment 260257 [details], when sensors runs for the second time 2 seconds later, the iwlwifi output is correct.
It seems the fan blow is independent of whether the sensor read returned an error?
Comment 20 Zhang Rui 2017-10-18 05:33:15 UTC
(In reply to shian from comment #19)
> (In reply to Zhang Rui from comment #18)
>  
> > does the problem still exists if you wait for a while (say, 5 minutes)
> > before running sensors command after wakeup?
> No, It will not exist.
> As recorded in the attachment 260257 [details], When sensors run for the
> second time after 2 seconds, iwlwifi output is correct.

I mean: after wakeup, wait for a while, and then run the sensors command for the first time. Does the problem exist?

IMO, this is a race problem, and it can be reproduced if you grep the sysfs I/F (via the sensors command) before the iwlwifi firmware is loaded. So if we grep the sysfs I/F later, the problem will not exist.

> It seems the fan blow is independent of whether the sensor has read the
> error?

yes, these are two different problems.

Now, let's look at the fan blow up issue.
Please confirm if this is a duplicate of bug #196975.
Comment 21 shian 2017-10-18 05:53:32 UTC
(In reply to Zhang Rui from comment #20)

> I mean, after wakeup, wait for a while, and then run sensors command for the
> first time. does the problem exist?

Ok, after wakeup, wait for a while, run sensors and output is correct.

> Please confirm if this is a duplicate of bug #196975.

Yes, I think this is the same problem as bug #196975.
Comment 22 Lv Zheng 2017-10-18 07:19:28 UTC
> > does the problem still exists if you wait for a while (say, 5 minutes)
> > before running sensors command after wakeup?
> No, It will not exist.
> As recorded in the attachment 260257 [details], When sensors run for the
> second time after 2 seconds, iwlwifi output is correct.

I think Rui was asking if "FAN still blows up if you wait for a while before running sensors command after wakeup?".

> Yes, I think this is the same problem with bug #196975.

I still do not think they are the same issue.
The symptom is the same whenever the CPU temperature is invalid: the fan blows up.

But this symptom can have various root causes.
And bug #196975 is a known one, related to EC FW specific to that platform, which is known to be different from yours.

Thanks and best regards
Lv
Comment 23 Zhang Rui 2017-10-18 07:32:02 UTC
(In reply to Lv Zheng from comment #22)
> > > does the problem still exists if you wait for a while (say, 5 minutes)
> > > before running sensors command after wakeup?
> > No, It will not exist.
> > As recorded in the attachment 260257 [details], When sensors run for the
> > second time after 2 seconds, iwlwifi output is correct.
> 
> I think Rui was asking if "FAN still blows up if you wait for a while before
> running sensors command after wakeup?".
> 
> > Yes, I think this is the same problem with bug #196975.
> 
> I still do not think they are same issue.
> It's same sympton if CPU temperature is invalid - FAN blows up.
> 
> But this sympton can be caused by various root causes.
> And bug #196975 is a known one related to the EC FW specific to that
> platform which is known to be different from yours.
> 
Why is bug #196975 known to be different from this one?

to confirm it is the same issue, please
1. can the problem be reproduced in 4.9 kernel?
2. in 4.13, do the same test as https://bugzilla.kernel.org/show_bug.cgi?id=196975#c18
Comment 24 shian 2017-10-18 09:07:00 UTC
(In reply to Lv Zheng from comment #22)

> I think Rui was asking if "FAN still blows up if you wait for a while before
> running sensors command after wakeup?".

I tested again and confirmed: when the fan still blows up and I run sensors after waiting a few seconds, it never fails (there is no "ERROR: Can't get value of subfeature temp1_input: I/O error" message, and the iwlwifi temperature is read correctly).

(In reply to Zhang Rui from comment #23)

> 1. can the problem be reproduced in 4.9 kernel?

Must the 4.9 kernel be a vanilla kernel? On Debian 9.1 with the 4.9 LTS kernel, I did not find this problem.

> 2. in 4.13, do the same test as
> https://bugzilla.kernel.org/show_bug.cgi?id=196975#c18

I did some tests (they may not be rigorous enough). I will sort out the test results and attach them later.
Comment 25 shian 2017-10-18 11:34:57 UTC
Created attachment 260267 [details]
boot kernel with idle=[halt/nomwait/poll] or intel_idle.max_cstate=[0-9]

My cpu model : 
$ cat /proc/cpuinfo | grep -i name | head -n 1
model name	: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
Comment 26 shian 2017-10-18 11:59:49 UTC
Created attachment 260269 [details]
turbostat output in kernel 4.13.5

In attachment, I added some comments indicating when a suspend or wake-up occurred.
Comment 27 Zhang Rui 2017-10-19 02:11:25 UTC
Now, I think the iwlwifi error problem is clear, and it is because the firmware is not ready when poking its thermal I/F.

From now on, when we say "the problem", it actually means the fan blowing issue.

It looks like this problem is similar to bug #196975, but the difference is that
intel_idle.max_cstate=5 can fix the problem for #196975, but not for you. And you need intel_idle.max_cstate=3...
Comment 28 Lv Zheng 2017-10-19 04:38:31 UTC
> why bug #196975 is known to be different from this one?

Maybe we should have shian upload the dmidecode output here.
Comment 29 shian 2017-10-19 06:00:40 UTC
Created attachment 260287 [details]
dmidecode output (kernel 4.13.5)

(In reply to Lv Zheng from comment #28)

Yes, bug 196975 and my fan blow seem to be somewhat different.
I tested again:
when booted with intel_idle.max_cstate=3 and suspended/woken 40 times, the fan runs correctly.
And when booted with intel_idle.max_cstate=4, the fan blows after the 2nd wakeup.
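
The results above amount to a small table of intel_idle.max_cstate values against pass/fail outcomes. As a sketch, assuming the fan blow depends monotonically on how deep the allowed C-state is (which this thread suggests but does not prove), the deepest working setting can be picked from such a table as follows. The function name and the dict layout are hypothetical.

```python
def deepest_passing_cstate(results):
    """Pick the deepest max_cstate that passed below the first failure.

    `results` maps an intel_idle.max_cstate value to True (fan behaved)
    or False (fan blow). Returns None if no suitable value was tested.
    """
    failing = [cstate for cstate, ok in results.items() if not ok]
    limit = min(failing) if failing else float("inf")
    passing = [cstate for cstate, ok in results.items() if ok and cstate < limit]
    return max(passing) if passing else None
```

With the two data points reported here, `deepest_passing_cstate({3: True, 4: False})` returns 3.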
Comment 30 Lv Zheng 2017-10-25 07:10:37 UTC
To shian:

Would you please try attachment 260383 [details] to confirm if the problem disappears with kernel boot parameter of "acpi_resume_latency=25"?

If the problem can be fixed by the workaround, please give attachment 260385 [details] a try to see if it can also fix the problem.

Thanks in advance.
Comment 31 shian 2017-10-26 01:28:11 UTC
(In reply to Lv Zheng from comment #30)
> To shian:
> 
> Would you please try attachment 260383 [details] to confirm if the problem
> disappears with kernel boot parameter of "acpi_resume_latency=25"?
> 

Great! I tried attachment 260383 [details] with acpi_resume_latency=25 on master commit ae59df0349ba. After 30 suspend/wakeup cycles, I did not see the fan blow problem.

I will continue to try the attachment 260385 [details] later.
Comment 32 shian 2017-10-26 06:29:22 UTC
Ok, For attachment 260385 [details] I did not find fan blow in 30 wakeup/suspend cycles on kernel tag v4.10 without any boot parameter.
Comment 33 Lv Zheng 2017-10-27 06:29:18 UTC
The patch turns runtime power resources _ON before invoking _WAK.
The following is what I got from the Windows AMLi debugger.

AMLI: ffffe000cb4ad040: AsyncEvalObject(\_SB.PCI0.PEG0.PG00._OFF)
AMLI: ffffe000cb4ad040: AsyncEvalObject(\_SB.PCI0.PEG1.PG01._OFF)
AMLI: ffffe000cb4ad040: AsyncEvalObject(\_SB.PCI0.PEG2.PG02._OFF)
AMLI: ffffe000ca279540: AsyncEvalObject(\_PTS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0._ADR)
...
AMLI: ffffe000cb4aa040: AsyncEvalObject(\_WAK)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0.RP01._ADR)
...
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0.LPCB.EC._Q1C)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0.RP08._ADR)
...
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKA._DIS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKB._DIS)
AMLI: ffffe000cb8b0040: EvalNameSpaceObject(\_TZ.TZ0._CRT)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKC._DIS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKD._DIS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKE._DIS)
AMLI: ffffe000cb8b0040: AsyncEvalObject(\_TZ.TZ0._TMP)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKF._DIS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKG._DIS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LNKH._DIS)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0.PEG2.PG02._ON)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0.PEG1.PG01._ON)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.PCI0.PEG0.PG00._OFF)
AMLI: ffffe000cb4a7040: AsyncEvalObject(\_SB.LID0._LID)
AMLI: ffffe000ca279540: AsyncEvalObject(\_SB.LID0._PSW)
AMLI: ffffe000cb4aa040: AsyncEvalObject(\_SB.PCI0.PEG0.PG00._ON)
...

On Windows, between _PTS and _WAK, there is no _ON/_OFF executed, which probably means something.
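
The observation above can be checked mechanically on a longer trace. The Python sketch below scans AMLI debugger lines of the form shown and lists any _ON or _OFF method evaluated after \_PTS and before \_WAK. The line format is inferred only from this excerpt, and the function name is hypothetical.

```python
import re

# Captures the ACPI path inside "...Object(\PATH)", as seen in the excerpt above.
EVAL_RE = re.compile(r"Object\((\\[^)]+)\)")


def power_calls_during_sleep(trace_lines):
    """Return _ON/_OFF paths evaluated after \\_PTS and before \\_WAK."""
    inside = False
    found = []
    for line in trace_lines:
        match = EVAL_RE.search(line)
        if not match:
            continue  # e.g. the "..." elision lines
        path = match.group(1)
        if path == "\\_PTS":
            inside = True
        elif path == "\\_WAK":
            inside = False
        elif inside and path.rsplit(".", 1)[-1] in ("_ON", "_OFF"):
            found.append(path)
    return found
```

An empty result for a Windows trace is consistent with the statement that no _ON/_OFF runs between _PTS and _WAK. The excerpt here is elided, so it cannot confirm that on its own.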
Comment 34 Lv Zheng 2017-10-31 09:49:36 UTC
I think attachment 260383 [details] cannot fix the issue. :)
I managed to obtain a failing machine from our customers, and tested the patch on 4.11 kernel, it didn't help.

I'll try to find if there is a better solution than attachment 260385 [details] on the failing system.
Comment 35 shian 2017-11-23 01:05:34 UTC
I was worried that the problem on my PC is not the same as on other PCs, so I tested attachment 260383 [details] on v4.11 with acpi_resume_latency=25 to see whether it behaves differently from master commit ae59df0349ba. But I still cannot reproduce the problem on v4.11 with the patch.
Comment 36 shian 2017-11-23 09:18:31 UTC
I flashed the new BIOS n1mur11w, which fixed an issue where the fan might rotate at max speed because the CPU temperature was not read correctly.
But that is not the reason my fan rotated at max speed. I can still reproduce the problem with this BIOS.
Comment 37 shian 2017-12-06 06:13:20 UTC
Today I spent a long time testing the new BIOS. I now find that the fan spins after the machine wakes up because the machine's temperature really is high. When the temperature dropped to normal levels, the fan also stopped. I tested hundreds of suspend/wakeup cycles. The problem seems to be gone?
Comment 38 Zhang Rui 2017-12-13 02:56:20 UTC
Good news.
The fan blows either because the temperature is really high or because the CPU temperature is read incorrectly.
As the latter case seems to be gone, I am closing this bug.
Please feel free to reopen it if it is reproduced again.
Comment 39 shian 2017-12-14 00:53:25 UTC
Created attachment 261173 [details]
turbostat output (kernel-4.14.3 BIOSCD[n1mur11w].iso)

The fan blow happened again.
I can hardly reproduce this problem anymore, so I uploaded all the relevant information from the time this error occurred.
Comment 40 shian 2017-12-14 00:55:29 UTC
Created attachment 261175 [details]
dmidecode output (kernel-4.14.3 BIOSCD[n1mur11w].iso)
Comment 41 shian 2017-12-14 00:59:14 UTC
Created attachment 261177 [details]
sensors output (kernel-4.14.3 BIOSCD[n1mur11w].iso)
Comment 42 shian 2017-12-14 01:00:53 UTC
Created attachment 261179 [details]
full dmesg output (kernel-4.14.3 BIOSCD[n1mur11w].iso)
Comment 43 shian 2017-12-14 01:02:13 UTC
Created attachment 261181 [details]
acpidump output (kernel-4.14.3 BIOSCD[n1mur11w].iso)
