Bug 201721

Summary: fancontrol does not work - regression
Product: Drivers Reporter: Walther Pelser (w.pelser)
Component: Hardware Monitoring Assignee: Jean Delvare (jdelvare)
Status: RESOLVED INVALID    
Severity: normal CC: erik.kaneda, linux
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.19.2 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: part of yast2 systemd-journal
part of yast2 systemd-journal (right one)
4.19.1 fancontrol.pid
4.19.1 pwmconfig.txt
4.19.2 pwmconfig.txt
strace pwmconfig
fancontrol
output of dmesg 4.19.1
output of dmesg 4.19.2
older kernel dmesg
kernel 4.16.13 dmesg
unpatched dsopcode.c (taken from 4.19.1)
dmesg kernel 4.19.2 acpi_enforce_resources=lax
fancontrol 4.19.1
fancontrol acpi_enforce_resources=lax 4.19.2 + 4.16.13

Description Walther Pelser 2018-11-18 09:41:49 UTC
Created attachment 279497 [details]
part of yast2 systemd-journal

With kernel 4.19.2 "fancontrol" does not work.
Chassis-fan is always running with max speed.
See attachment "part of yast2 systemd-journal".
"pwmconfig" can't find a pwm capable fan.
 
Reinstalling kernel 4.19.1 "fancontrol" works fine again.
Comment 1 Walther Pelser 2018-11-18 09:47:09 UTC
Created attachment 279499 [details]
part of yast2 systemd-journal (right one)
Comment 2 Jean Delvare 2018-11-19 08:23:22 UTC
Which hwmon driver(s) are you using?

Please attach the output of pwmconfig with both kernels.
Comment 3 Walther Pelser 2018-11-19 15:52:57 UTC
> Which hwmon driver(s) are you using?
I use sensors-detect and then pwmconfig, or do you mean something else?
When kernel 4.19.2 is in use, there is no fancontrol.pid
Comment 4 Walther Pelser 2018-11-19 15:53:24 UTC
Created attachment 279531 [details]
4.19.1 fancontrol.pid
Comment 5 Walther Pelser 2018-11-19 15:54:40 UTC
Created attachment 279533 [details]
4.19.1 pwmconfig.txt
Comment 6 Walther Pelser 2018-11-19 15:55:30 UTC
Created attachment 279535 [details]
4.19.2 pwmconfig.txt
Comment 7 Walther Pelser 2018-11-19 15:56:16 UTC
Created attachment 279537 [details]
strace pwmconfig
Comment 8 Guenter Roeck 2018-11-19 16:57:37 UTC
Two possible causes (commit log between 4.19.1 and 4.19.2):

5764ffc8a643 hwmon: (pwm-fan) Set fan speed to 0 on suspend
43cba96d9505 hwmon: (pmbus) Fix page count auto-detection.

I don't immediately see how any of those could result in the observed problems.

Unfortunately, the submitter did not tell which hwmon driver(s) are in use, much less provide any information about the affected system. Differences in instantiated hwmon devices and raw attribute names and values would have helped as well, as might have differences in system configuration and the output of dmesg. We don't even know if this is a PC, an embedded arm system, or something else.

Without additional information I don't think there is anything we can do.
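
One way to collect the "instantiated hwmon devices and raw attribute names and values" requested above is to walk the hwmon class directory in sysfs. The following is only a minimal sketch, assuming the usual /sys/class/hwmon layout; the function name dump_hwmon is made up for illustration and is not part of lm-sensors.

```python
from pathlib import Path


def dump_hwmon(root="/sys/class/hwmon"):
    """Return {device: {attribute: value}} for every hwmon device under root."""
    devices = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return devices  # no hwmon class directory on this system
    for dev in sorted(root_path.iterdir()):
        if not dev.is_dir():
            continue
        attrs = {}
        for attr in sorted(dev.iterdir()):
            if not attr.is_file():
                continue  # skip subdirectories and links such as "device"
            try:
                attrs[attr.name] = attr.read_text().strip()
            except (OSError, UnicodeDecodeError):
                attrs[attr.name] = "<unreadable>"  # e.g. a write-only attribute
        devices[dev.name] = attrs
    return devices


if __name__ == "__main__":
    for dev, attrs in dump_hwmon().items():
        print(f"{dev} ({attrs.get('name', '?')})")
        for key, value in attrs.items():
            print(f"  {key} = {value}")
```

Running this once under each kernel and comparing the two outputs would show whether a device such as w83627ehf disappears between versions.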
Comment 9 Walther Pelser 2018-11-19 18:02:07 UTC
OS is openSUSE-Tumbleweed
Driver see attachment
Comment 10 Walther Pelser 2018-11-19 18:02:48 UTC
Created attachment 279543 [details]
fancontrol
Comment 11 Walther Pelser 2018-11-19 19:06:23 UTC
Created attachment 279545 [details]
output of dmesg 4.19.1

with line
[   27.615807] w83627ehf w83627ehf.656: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
Comment 12 Walther Pelser 2018-11-19 19:09:07 UTC
Created attachment 279547 [details]
output of dmesg 4.19.2

missing line
[   27.615807] w83627ehf w83627ehf.656: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
Comment 13 Walther Pelser 2018-11-19 19:14:37 UTC
(In reply to Guenter Roeck from comment #8)
> Two possible causes (commit log between 4.19.1 and 4.19.2):
> 
> 5764ffc8a643 hwmon: (pwm-fan) Set fan speed to 0 on suspend
> 43cba96d9505 hwmon: (pmbus) Fix page count auto-detection.
> 
> I don't immediately see how any of those could result in the observed
> problems.
> 
> Unfortunately, the submitter did not tell which hwmon driver(s) are in use,
> much less provide any information about the affected system. Differences in
> instantiated hwmon devices and raw attribute names and values would have
> helped
I need a little help with how to get the items mentioned above.

> as well, as might have differences in system configuration and the
> output of dmesg. We don't even know if this is a PC, an embedded arm system,
> or something else.
> 
> Without additional information I don't think there is anything we can do.

Thanks
Comment 14 Guenter Roeck 2018-11-19 20:33:58 UTC
See commit 111650510 ("ACPICA: AML interpreter: add region addresses in global list during initialization"). From its description:

"This commit may result in warning messages that look like the following:
    
[    7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts With op_region 0x00000400-0x0000047F (\PMIO) (20180531/utaddress-213)
[    7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
    
However, these messages do not signify regressions. It is a result of
properly adding address ranges within the global address list.
"

Remedy would be to boot with acpi-enforce_resources=lax.
Comment 15 Jean Delvare 2018-11-20 08:34:04 UTC
The right spelling of the option is:

acpi_enforce_resources=lax

(underscore, not dash).
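
To confirm which spelling actually reached the running kernel, the boot command line can be inspected. This is a rough sketch, assuming the command line is readable from /proc/cmdline; check_cmdline and its status labels are hypothetical names, and the "dash-spelling" label only reflects the spelling advice given in this thread.

```python
from pathlib import Path

OPTION = "acpi_enforce_resources"


def check_cmdline(cmdline):
    """Return (status, value) describing how the option appears in cmdline."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == OPTION:
            return ("ok", value)
        if key.replace("-", "_") == OPTION:
            # Spelled with at least one dash, which this thread advises against.
            return ("dash-spelling", value)
    return ("absent", "")


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        print(check_cmdline(path.read_text()))
```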

You got to love the note "these messages do not signify regressions" in the commit message. From a functional perspective, it totally is a regression. And the patch does not fix any actual bug. It references bug #200011 in the commit message, because the issue was noticed while investigating this bug, however the commit does NOT fix that bug (see https://bugzilla.kernel.org/show_bug.cgi?id=200011#c65 ).
Comment 17 Jean Delvare 2018-11-21 08:35:34 UTC
Walther, for completeness, is this a new system, or have you been running older kernels on it? The information I got from the ACPICA guys suggests that kernels up to 4.16 behaved the same as 4.19.3. It would be great if you could tell us whether or not such old kernels were working for you, so that we can confirm we are all talking about the same thing.
Comment 18 Jean Delvare 2018-11-21 08:36:13 UTC
Sorry I meant 4.19.2, not 4.19.3, in previous comment.
Comment 19 Walther Pelser 2018-11-21 15:04:10 UTC
@ Jean
It is not a new system. 
When I installed 4.19.2 the running kernel was 4.19.1. As I cannot solve the fancontrol problem with 4.19.2, I am running 4.19.1 again and everything works fine again.
The "missing line" in comment #12 means to me that the driver is not properly installed, but I could be wrong.
At the moment I am trying to build my own kernel 4.19.2 without the two patches mentioned in comment #8, but there are still problems getting it to run, as it is a localyesconfig kernel.
Comment 20 Jean Delvare 2018-11-21 15:47:06 UTC
My question really is if you have ever been running a kernel older than 4.19.1 on this machine, and if fancontrol was working back then.
Comment 21 Walther Pelser 2018-11-21 16:42:20 UTC
Sorry for this misunderstanding
There were a lot of older kernels with no problems.
Comment 22 Walther Pelser 2018-11-21 18:42:22 UTC
Booting with "acpi-enforce_resources=lax" does not change anything. 
I built a self-compiled 4.19.2 without these patches:
5764ffc8a643 hwmon: (pwm-fan) Set fan speed to 0 on suspend
43cba96d9505 hwmon: (pmbus) Fix page count auto-detection 
by exchanging pmbus.c and pwm-fan.c with the files from 4.19.1.
But fancontrol does not work. 
The precompiled kernels from openSUSE also show the same problem for me.
Comment 23 Guenter Roeck 2018-11-21 21:25:00 UTC
As Jean had pointed out in #15, it should have been "acpi_enforce_resources=lax". Sorry for that.
Comment 24 Erik Kaneda 2018-11-22 02:01:55 UTC
Hi Walther,

Could you try a kernel that is older than 4.17 and post the dmesg of it in a working state? This means that the fan isn't going out of control.
Comment 25 Jean Delvare 2018-11-22 07:30:28 UTC
Walther, I don't mean to be rude but it would really help us help you if you would follow the discussion to avoid testing things which we already know will not work.

(In reply to Walther Pelser from comment #22)
> Booting with "acpi-enforce_resources=lax" does not change anything.

That was expected, see comment #15 for the right spelling.

> I built a self-compiled 4.19.2 without these patches:
> 5764ffc8a643 hwmon: (pwm-fan) Set fan speed to 0 on suspend
> 43cba96d9505 hwmon: (pmbus) Fix page count auto-detection 
> by exchanging pmbus.c and pwm-fan.c with the files from 4.19.1.
> But fancontrol does not work.

Again, that was expected. See comment #14 for the patch which is believed to cause your problem. While the commit ID mentioned by Guenter seems incorrect, a quick search gives the correct commit ID:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=22083c028d0b3ee419232d25ce90367e5b25df8f

That's the commit you should try reverting.
Comment 26 Walther Pelser 2018-11-22 16:21:54 UTC
Created attachment 279611 [details]
older kernel dmesg
Comment 27 Walther Pelser 2018-11-22 16:49:14 UTC
Created attachment 279613 [details]
kernel 4.16.13 dmesg

fancontrol does not work
very surprisingly
Comment 28 Walther Pelser 2018-11-22 16:53:15 UTC
Created attachment 279615 [details]
unpatched dsopcode.c (taken from 4.19.1)

fancontrol is working again
Comment 29 Walther Pelser 2018-11-22 17:02:31 UTC
(In reply to Erik Schmauss from comment #24)
> Hi Walther,
> 
> Could you try a kernel that is older than 4.17 and post the dmesg of it in a
> working state? This means that the fan isn't going out of control.

I have no idea why fancontrol is not working. I used a precompiled kernel from openSUSE (https://build.opensuse.org/package/show/home%3Atiwai%3Akernel%3A4.16/kernel-default). So I can't help in this case.
Comment 30 Walther Pelser 2018-11-22 17:17:46 UTC
(In reply to Guenter Roeck from comment #23)
> As Jean had pointed out in #15, it should have been
> "acpi_enforce_resources=lax". Sorry for that.

I had noticed that comment from Jean. So it was a typo again, but now my own. Would it be useful to try it again with the right "acpi_enforce_resources=lax"?
Comment 31 Walther Pelser 2018-11-22 18:13:14 UTC
(In reply to Jean Delvare from comment #25)
> Walther, I don't mean to be rude but it would really help us help you if you
> would follow the discussion to avoid testing things which we already know
> will not work.
Rudeness without excuse has become a great problem for me. (This does not point at you!) So I avoid filing bugs and try to solve my software problems in private.
> 
> (In reply to Walther Pelser from comment #22)
> > Booting with "acpi-enforce_resources=lax" does not change anything.
> 
> That was expected, see comment #15 for the right spelling.
My typo
> 
> > I built a self-compiled 4.19.2 without these patches:
> > 5764ffc8a643 hwmon: (pwm-fan) Set fan speed to 0 on suspend
> > 43cba96d9505 hwmon: (pmbus) Fix page count auto-detection 
> > by exchanging pmbus.c and pwm-fan.c with the files from 4.19.1.
> > But fancontrol does not work.
> 
> Again, that was expected. See comment #14 for the patch which is believed to
> cause your problem. While the commit ID mentioned by Guenter seems
> incorrect, a quick search gives the correct commit ID:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> ?id=22083c028d0b3ee419232d25ce90367e5b25df8f
I have always had this warning with no real problems:
[    7.871761] ACPI Warning: system_IO range 0x00000428-0x0000042F conflicts With op_region 0x00000400-0x0000047F (\PMIO) (20180531/utaddress-213)
[    7.871769] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
So I did not mention that it had to do with the patch, because the remedy was, in my eyes, no real solution.
> 
> That's the commit you should try reverting.
See attachment "unpatched dsopcode.c (taken from 4.19.1)".
 
Thanks for your efforts.
Walther
Comment 32 Guenter Roeck 2018-11-22 19:00:25 UTC
The problem is

[   28.802635] ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\_SB.PCI0.SBRG.SIOR.HWRE) (20180105/utaddress-247)
[   28.802642] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

not 428-42F. The above message is also seen with 4.16.13, so the driver should not instantiate there either.
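
The check behind this warning is an overlap test between the I/O range a native driver wants and an ACPI OpRegion. The sketch below illustrates that comparison on the message text only; it is not the ACPICA implementation, and parse_conflict and ranges_conflict are hypothetical helper names.

```python
import re

# Matches both the "SystemIO ... OpRegion" and "system_IO ... op_region" forms.
PATTERN = re.compile(
    r"range (0x[0-9A-Fa-f]+)-(0x[0-9A-Fa-f]+) conflicts [Ww]ith \w+ "
    r"(0x[0-9A-Fa-f]+)-(0x[0-9A-Fa-f]+) \(([^)]+)\)"
)


def parse_conflict(line):
    """Return (io_start, io_end, region_start, region_end, region_name) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    io_start, io_end, reg_start, reg_end = (int(g, 16) for g in match.groups()[:4])
    return (io_start, io_end, reg_start, reg_end, match.group(5))


def ranges_conflict(a_start, a_end, b_start, b_end):
    """True if the inclusive ranges [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end


# Driver range 0x295-0x296 against the HWRE OpRegion 0x290-0x299
print(ranges_conflict(0x295, 0x296, 0x290, 0x299))  # True
# The 0x428-0x42F range does not touch that OpRegion
print(ranges_conflict(0x428, 0x42F, 0x290, 0x299))  # False
```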
Comment 33 Jean Delvare 2018-11-23 16:51:24 UTC
(In reply to Walther Pelser from comment #30)
> I had noticed that comment from Jean. So it was a typo again, but now my
> own. Would it be useful to try it again with the right
> "acpi_enforce_resources=lax"?

Yes, of course.
Comment 34 Walther Pelser 2018-11-26 15:53:55 UTC
Created attachment 279659 [details]
dmesg kernel 4.19.2 acpi_enforce_resources=lax

Please look first at line 4 to check whether I added "acpi_enforce_resources=lax" to the boot options in the right way. If yes, the result is: fancontrol does not work. Otherwise give me an example of how to add it in the right way.
Comment 35 Guenter Roeck 2018-11-26 16:41:19 UTC
#34: Looks ok, and the subsequent warning suggests that the driver is indeed instantiated.
Comment 36 Erik Kaneda 2018-11-26 17:51:34 UTC
(In reply to Walther Pelser from comment #27)
> Created attachment 279613 [details]
> kernel 4.16.13 dmesg
> 
> fancontrol does not work
> very surprisingly

Correct me if I'm wrong but this means that your fancontrol never really worked aside from 4.17 through 4.19. So the claim that you made in comment #21 is incorrect. Is that right?

I'm just trying to understand the situation.

Thanks,
Erik
Comment 37 Walther Pelser 2018-11-26 18:20:37 UTC
(In reply to Erik Schmauss from comment #36)
> (In reply to Walther Pelser from comment #27)
> > Created attachment 279613 [details]
> > kernel 4.16.13 dmesg
> > 
> > fancontrol does not work
> > very surprisingly
> 
> Correct me if I'm wrong but this means that your fancontrol never really
> > worked aside from 4.17 through 4.19. So the claim that you made in comment #21
> is incorrect. Is that right?
> 
> I'm just trying to understand the situation.
> 
> Thanks,
> Erik
You are right. I started with fancontrol this year and it worked all the time. I had forgotten this. So I wrote "a lot of older kernels", which was not the case in connection with fancontrol.
Comment 38 Walther Pelser 2018-11-27 15:49:43 UTC
(In reply to Erik Schmauss from comment #36)
> (In reply to Walther Pelser from comment #27)
> > Created attachment 279613 [details]
> > kernel 4.16.13 dmesg
> > 
> > fancontrol does not work
> > very surprisingly
> 
> Correct me if I'm wrong but this means that your fancontrol never really
> > worked aside from 4.17 through 4.19. So the claim that you made in comment #21
> is incorrect. Is that right?
> 
> I'm just trying to understand the situation.
> 
> Thanks,
> Erik

Now fancontrol works with 4.16.13 too. With "acpi_enforce_resources=lax" the fancontrol file has changed. The old one no longer works. You have to run pwmconfig again, which is now possible. The same fancontrol file works with 4.16.13 and 4.19.2
Comment 39 Walther Pelser 2018-11-27 15:52:35 UTC
Created attachment 279671 [details]
fancontrol 4.19.1
Comment 40 Walther Pelser 2018-11-27 15:53:53 UTC
Created attachment 279673 [details]
fancontrol acpi_enforce_resources=lax 4.19.2 + 4.16.13
Comment 41 Walther Pelser 2018-11-27 16:09:38 UTC
With acpi_enforce_resources=lax there comes another problem for me in connection with Thermal Monitor (KDE Plasma). It can be seen in fancontrol too. lmsensor atk0110-acpi is no longer available. For me this is not a good development.
Comment 42 Jean Delvare 2018-11-28 09:01:20 UTC
ATK0110 is an ACPI interface on top of your hardware monitoring chip. It presents (more or less) the same information as the native w83627ehf driver. You should never run both drivers at the same time (and the ACPI resource conflict detection is meant to prevent it).

The only functional difference between the asus_atk0110 driver and the w83627ehf driver is that the former does not support manual fan speed control, and as such can't be used as a backend for the fancontrol script. However I'm pretty certain that Asus implements automatic fan speed control profiles in the BIOS, which are more efficient than a software daemon. So the right thing to do for your system is to use the asus_atk0110 driver (which means you should NOT pass "acpi_enforce_resources=lax") and select your preferred fan speed control profile in the BIOS.
Comment 43 Walther Pelser 2018-11-28 16:01:58 UTC
(In reply to Jean Delvare from comment #42)
> ATK0110 is an ACPI interface on top of your hardware monitoring chip. It
> presents (more or less) the same information as the native w83627ehf driver.
> You should never run both drivers at the same time (and the ACPI resource
> conflict detection is meant to prevent it).

Since the beginning I have run fancontrol with both drivers, as sensors-detect can find them. There is a warning, but they are working without any problems. So why should I change a working system?

> 
> The only functional difference between the asus_atk0110 driver and the
> w83627ehf driver is that the former does not support manual fan speed
> control, and as such can't be used as a backend for the fancontrol script.

So both drivers are needed!

> However I'm pretty certain that Asus implements automatic fan speed control
> profiles in the BIOS, which are more efficient than a software daemon. So
> the right thing to do for your system is to use the asus_atk0110 driver
> (which means you should NOT pass "acpi_enforce_resources=lax") and select
> your preferred fan speed control profile in the BIOS.

My asus-board has only one 4-pin connector for the cpu fan. The chassis fan has only a 3-pin connector. The BIOS controls the cpu fan very well; a software daemon would be too dangerous, I think. But the chassis fan is not properly controlled by the BIOS. Too fast and too loud. Fancontrol can manage this fan as if it were a 4-pin connector, very well.
"acpi_enforce_resources=lax" is needed to have pwmconfig working. But with it or without it the asus_atk0110 driver is NOT available with kernel 4.19.2.

But in the meantime openSUSE has made changes in their rpm-kernel 4.19.5 , so that fancontrol is running again, as it was with kernel 4.19.1.
So far, my problem has gone.

Thanks for your answers, but without the help of openSUSE I would have skipped the discussed patch, because it creates more problems for me than it solves.

Walther 

NEEDINFO is obsolete? Could you change the status?
Comment 44 Guenter Roeck 2018-11-28 16:16:31 UTC
Normally fans can be controlled and managed in the BIOS.  One can normally select the temperatures used to control each fan, pwm vs. DC control, and manual vs. automatic operation (including fancy temperature/speed control). I personally don't use ASUS boards, but I would be quite surprised if ASUS would be any different.
Comment 45 Walther Pelser 2018-11-28 18:15:30 UTC
(In reply to Guenter Roeck from comment #44)
> Normally fans can be controlled and managed in the BIOS.  One can normally
> select the temperatures used to control each fan, pwm vs. DC control, and
> manual vs. automatic operation (including fancy temperature/speed control).
> I personally don't use ASUS boards, but I would be quite surprised if ASUS
> would be any different.

You are right regarding newer asus boards. My board is ten years old and the BIOS works as described. It's still a non uefi board.
Comment 46 Jean Delvare 2018-12-04 13:47:33 UTC
(In reply to Walther Pelser from comment #43)
> Since the beginning I have run fancontrol with both drivers, as sensors-detect
> can find them. There is a warning, but they are working without any
> problems. So why should I change a working system?

I've been driving without my safety belt forever and never had any problem. Why should I start using a safety belt now?

Using 2 drivers for the same device at the same time is simply unsafe. It is racy by design, as the 2 drivers access the same registers without talking to each other. You have been lucky so far, good for you. But someday you will hit the race, and problems will start. Possibly up to a fan stopping completely and your system melting and/or burning. You have been warned.

> So both drivers are needed!

No. IF you insist on having manual fan speed control then the w83627ehf driver is needed INSTEAD OF the asus_atk0110 driver.

So, with your setup, you really want to blacklist the asus_atk0110 driver and run pwmconfig again to reconfigure the fancontrol daemon to use the w83627ehf temperatures as its input.
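
A quick way to see whether both drivers are loaded at the same time is to look at the loaded module list. This is a minimal sketch, assuming both drivers are built as modules (built-in drivers do not appear in /proc/modules); the helper names are hypothetical.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def both_drivers_loaded(proc_modules_text, native="w83627ehf", acpi="asus_atk0110"):
    """True if the native and the ACPI hwmon driver are loaded together."""
    mods = loaded_modules(proc_modules_text)
    return native in mods and acpi in mods


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        if both_drivers_loaded(path.read_text()):
            print("Both w83627ehf and asus_atk0110 are loaded: keep only one.")
        else:
            print("The two drivers are not loaded together.")
```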

> My asus-board has only one 4-pin connector for the cpu fan. The chassis fan
> has only a 3-pin connector. The BIOS controls the cpu fan very well; a
> software daemon would be too dangerous, I think. But the chassis fan is
> not properly controlled by the BIOS. Too fast and too loud. Fancontrol can
> manage this fan as if it were a 4-pin connector, very well.

For completeness, this has little to do with 4-pin vs 3-pin fan connector. The benefit of 4-pin connectors is to allow accurate fan speed monitoring even when fan speed control is in effect and effective speed is very low. But you can still control a fan with a 3-pin connector as long as it never goes in the very low range, or you are not worried about losing monitoring at very low speeds.

(In reply to Walther Pelser from comment #45)
> You are right regarding newer asus boards. My board is ten years old and the
> BIOS works as described. It's still a non uefi board.

For even more completeness, this has nothing to do with UEFI or non-UEFI board.

You may still want to look for a BIOS update, by the way. Asus may have improved BIOS-based fan speed control at some point.