Bug 204807 - Hardware monitoring sensor nct6798d doesn't work unless acpi_enforce_resources=lax is enabled
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform_x86
Hardware: All Linux
Importance: P1 high
Assignee: drivers_platform_x86@kernel-bugs.osdl.org
Reported: 2019-09-10 18:51 UTC by Artem S. Tashkinov
Modified: 2021-04-21 17:09 UTC
CC List: 31 users
Kernel Version: 5.2.11, git
Tree: Mainline
Regression: No


Attachments
acpidump -b (31.30 KB, application/x-xz)
2020-06-29 16:22 UTC, Artem S. Tashkinov
acpidump -b (44.83 KB, application/x-compressed-tar)
2020-08-29 13:48 UTC, myhateisblind
acpidump -b (43.39 KB, application/gzip)
2020-09-06 10:04 UTC, Jaap de Haan
acpidump -b (114.97 KB, application/gzip)
2020-10-09 16:59 UTC, Lars Podszuweit
acpidump -b for 5.9.6 and Asus X570-Plus TUF Gaming (153.33 KB, application/zip)
2020-11-09 14:49 UTC, lemniscattaden
Dmesg after booting kernel 5.9.8 on Asus B550M TUF (86.33 KB, text/plain)
2020-11-18 18:04 UTC, myhateisblind
dmesg after boot 5.9.8 Asus TUF Gaming X570-PLUS (93.29 KB, text/plain)
2020-11-20 21:12 UTC, lemniscattaden
dmesg after boot ASUS PRIME X570-P (108.17 KB, text/plain)
2020-11-21 11:56 UTC, Jaap de Haan
sensors not working acpi conflict (95.26 KB, text/plain)
2020-11-21 12:33 UTC, Facundo
ASUS Prime B460 Plus dmesg (55.86 KB, text/plain)
2020-11-26 11:25 UTC, dflogeras2
dmesg ASUS X570 TUF Gaming Pro (130.99 KB, text/plain)
2021-01-17 15:54 UTC, Thomas Langkamp
acpicump -p ASUS X570 TUF Gaming Pro (79.28 KB, application/gzip)
2021-01-17 16:04 UTC, Thomas Langkamp
dmesg for ASUS ROG CROSSHAIR VIII IMPACT with BIOS : 3204 (135.50 KB, text/plain)
2021-02-14 12:38 UTC, Gregory Duhamel
acpidump for ASUS ROG CROSSHAIR VIII IMPACT with BIOS : 3204 (46.31 KB, application/octet-stream)
2021-02-14 12:41 UTC, Gregory Duhamel
Dmesg for Asus B550M-plus WIFI (84.94 KB, text/plain)
2021-02-22 22:08 UTC, yasin inat
dmesg for Asus ROG STRIX Z490-I (83.17 KB, text/plain)
2021-02-23 20:07 UTC, Michael Coote
dmesg after boot ASUS PRIME X570-P bios 3602 (98.09 KB, text/plain)
2021-03-13 15:18 UTC, Jaap de Haan
acpidump ASUS PRIME X570-P bios 3602 (41.07 KB, application/gzip)
2021-03-13 15:19 UTC, Jaap de Haan
acpidump for Pro B550-C (44.47 KB, application/gzip)
2021-04-21 17:09 UTC, doomwarriorx

Description Artem S. Tashkinov 2019-09-10 18:51:27 UTC
Without this kernel flag I see this on boot:


nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20190509/utaddress-204)
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver


I'd be glad to provide any required information.

This bug needs to be fixed because

1) It doesn't affect Windows
2) Average people will never know how to deal with this issue
3) I cannot ask my motherboard vendor (ASUS) to fix this issue in BIOS because they don't provide support for Linux - they barely provide any support at all.
Comment 1 Artem S. Tashkinov 2019-09-11 14:01:20 UTC
Even with acpi_enforce_resources=lax I'm getting these messages on boot:

nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20190509/utaddress-204)
ACPI: This conflict may cause random problems and system instability
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
Comment 2 Ulf 2020-01-01 17:17:39 UTC
Same in 5.4.6 on an Asus Z170-WS (BIOS 3602 05/24/2019). dmesg says:

nct6775: Found NCT6793D or compatible chip at 0x2e:0x290
ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\_GPE.HWM) (20190816/utaddress-204)
Comment 3 Artem S. Tashkinov 2020-04-20 10:42:57 UTC
Any updates on this one, Rui Zhang?

It's great you've specified HW but it surely looks like no one really cares.
Comment 4 Zhang Rui 2020-06-29 10:15:09 UTC
Please attach the acpidump output.

Well, TBH, there is probably no way to fix this.
The root cause is that ACPI claims some resources that will possibly be used by the ACPI AML code, but the native nct6775 driver also requests the same piece of resource.
Comment 5 Artem S. Tashkinov 2020-06-29 16:22:27 UTC
Created attachment 289943 [details]
acpidump -b

(In reply to Zhang Rui from comment #4)
> Please attach the acpidump output.
> 
> Well, TBH, there is probably no way to fix this.
> The root cause is that ACPI claims some resources that will possibly be used
> by the ACPI AML code, but the native nct6775 driver also requests the same
> piece of resource.

This doesn't seem right because Windows just works and doesn't require any hacks to be 100% functional on this PC. 99% of users will never know how to enable this option and will have a malfunctioning sensor.
Comment 6 myhateisblind 2020-08-29 13:48:36 UTC
Created attachment 292207 [details]
acpidump -b

Same issue with linux-next-git (5.9-rc2ish at the moment) on ASUS TUF B550M motherboard.
Comment 7 Jaap de Haan 2020-09-05 21:42:08 UTC
Same issue on the latest Ubuntu 20.04.1 LTS with an ASUS Prime X570-P, flashed to BIOS v2606.
Comment 8 Artem S. Tashkinov 2020-09-05 23:22:51 UTC
(In reply to Zhang Rui from comment #4)
> Please attach the acpidump output.
> 
> Well, TBH, there is probably no way to fix this.
> The root cause is that ACPI claims some resources that will possibly be used
> by the ACPI AML code, but the native nct6775 driver also requests the same
> piece of resource.

From what I can see, users of pretty much all recent AMD chipset motherboards are affected. This sounds like something which must be fixed because we are talking about a very broad use case.
Comment 9 Jaap de Haan 2020-09-06 10:04:12 UTC
Created attachment 292371 [details]
acpidump -b

dump from my system ASUS Prime X570-P Bios v2606
Comment 10 Clodoaldo Pinto Neto 2020-10-07 19:57:44 UTC
Same issue on 5.8.13 on Gigabyte B550M DS3H.

out 07 07:09:44 d3.localdomain kernel: it87: Found IT8628E chip at 0xa40, revision 1
out 07 07:09:44 d3.localdomain kernel: it87: Beeping is supported
out 07 07:09:44 d3.localdomain kernel: ACPI Warning: SystemIO range 0x0000000000000A45-0x0000000000000A46 conflicts with OpRegion 0x0000000000000A45-0x0000000000000A46 (\GSA1.SIO1) (20200528/utaddress-204)
out 07 07:09:44 d3.localdomain kernel: ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
out 07 07:09:44 d3.localdomain kernel: fuse: init (API version 7.31)
out 07 07:09:44 d3.localdomain systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
out 07 07:09:44 d3.localdomain systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
out 07 07:09:44 d3.localdomain systemd[1]: Failed to start Load Kernel Modules.
out 07 07:09:44 d3.localdomain kernel: audit: type=1130 audit(1602065384.576:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-modules-load comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Comment 11 Lars Podszuweit 2020-10-09 16:59:50 UTC
Created attachment 292911 [details]
acpidump -b

Same issue on ASUS TUF Z390M-PRO GAMING BIOS v2808  


nct6775: Enabling hardware monitor logical device mappings.
nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20200528/utaddress-204)
Comment 12 lemniscattaden 2020-11-09 14:49:10 UTC
Created attachment 293593 [details]
acpidump -b for 5.9.6 and Asus X570-Plus TUF Gaming

I have the same problem with 5.9.6 kernel and Asus X570-Plus TUF Gaming motherboard.
Comment 13 Zhang Rui 2020-11-18 14:28:14 UTC
For all the people in this thread who report the same problem: please attach the full dmesg output after boot.
Comment 14 myhateisblind 2020-11-18 18:04:02 UTC
Created attachment 293727 [details]
Dmesg after booting kernel 5.9.8 on Asus B550M TUF
Comment 15 lemniscattaden 2020-11-20 21:12:07 UTC
Created attachment 293751 [details]
dmesg after boot 5.9.8 Asus TUF Gaming X570-PLUS
Comment 16 Jaap de Haan 2020-11-21 11:56:48 UTC
Created attachment 293755 [details]
dmesg after boot ASUS PRIME X570-P

dmesg after boot ASUS PRIME X570-P
Comment 17 Facundo 2020-11-21 12:33:33 UTC
Created attachment 293757 [details]
sensors not working acpi conflict

Sensors not working with chip Nuvoton NCT6798D on Asus Prime X570-PRO
because of ACPI conflict: ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion
Comment 18 dflogeras2 2020-11-26 11:21:03 UTC
Also affected on ASUS Prime B460-Plus motherboard with v1403 BIOS.
Comment 19 dflogeras2 2020-11-26 11:25:04 UTC
Created attachment 293819 [details]
ASUS Prime B460 Plus dmesg

dmesg of ASUS Prime B460 Plus booting
Comment 20 Thomas Langkamp 2021-01-17 15:53:12 UTC
Same with Nuvoton NCT6798D on Asus TUF Gaming X570 PRO and Kernel 5.10.2-2-MANJARO
Comment 21 Thomas Langkamp 2021-01-17 15:54:00 UTC
Created attachment 294703 [details]
dmesg ASUS X570 TUF Gaming Pro
Comment 22 Thomas Langkamp 2021-01-17 16:04:31 UTC
Created attachment 294705 [details]
acpicump -p ASUS X570 TUF Gaming Pro
Comment 23 Gregory Duhamel 2021-02-14 12:37:20 UTC
Same issue on : DMI: ASUS System Product Name/ROG CROSSHAIR VIII IMPACT, BIOS 3204 01/25/2021

ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20200925/utaddress-204)

ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
Comment 24 Gregory Duhamel 2021-02-14 12:38:43 UTC
Created attachment 295261 [details]
dmesg for ASUS ROG CROSSHAIR VIII IMPACT with BIOS : 3204
Comment 25 Gregory Duhamel 2021-02-14 12:41:00 UTC
Created attachment 295263 [details]
acpidump for ASUS ROG CROSSHAIR VIII IMPACT with BIOS : 3204
Comment 26 yasin inat 2021-02-22 22:08:37 UTC
Created attachment 295403 [details]
Dmesg for Asus B550M-plus WIFI

I have "acpi_enforce_resources=lax" on the kernel command line but still have the conflict problem.

Probably not related, but the rest of the parameters: add_efi_memmap initrd=\amd-ucode.img mitigations=off video=current iommu=pt
Comment 27 Michael Coote 2021-02-23 20:07:30 UTC
Created attachment 295417 [details]
dmesg for Asus ROG STRIX Z490-I

dmesg for Asus ROG STRIX Z490-I. BIOS 11/30/2020 v1003

[    3.194752] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
[    3.194756] ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20200528/utaddress-204)
[    3.194759] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
Comment 28 frank 2021-03-05 06:40:12 UTC
Same also on Asus Prime H310I-Plus


Ubuntu 20.04.2 LTS
Kernel: 5.4.0-66-generic
acpi_enforce_resources=lax enabled

dmesg:

[    4.083054] nct6775: Found NCT6796D or compatible chip at 0x2e:0x290
[    4.083058] ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20190816/utaddress-204)
[    4.083063] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
Comment 29 Jaap de Haan 2021-03-13 15:18:38 UTC
Created attachment 295835 [details]
dmesg after boot ASUS PRIME X570-P bios 3602

ASUS PRIME X570-P after flashing newest bios 3602.
Comment 30 Jaap de Haan 2021-03-13 15:19:39 UTC
Created attachment 295837 [details]
acpidump ASUS PRIME X570-P bios 3602

ACPI Dump ASUS PRIME X570-P BIOS 3602
Comment 31 Zhang Rui 2021-03-18 04:20:47 UTC
I checked the dmesg and acpidump from Jaap,

[    3.497957] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
[    3.497963] ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM) (20200528/utaddress-204)
[    3.497969] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

        Device (AMW0)
        {
            Name (_HID, EisaId ("PNP0C14") /* Windows Management Instrumentation Device */)  // _HID: Hardware ID
            Name (_UID, "ASUSWMI")  // _UID: Unique ID
        ...

So the resource conflict happens between the native nct6775 driver and the ACPI asus_wmi driver.
My understanding is that asus_wmi/asus_nb_wmi do the same thing as nct6775 and expose them to hwmon class as well. If this is true, we can simply ignore these warning messages because "Yes, there is an ACPI driver available for this device. And yes, we can use the asus hwmon I/F instead of the native driver".

But this is just my guess. We need Hans to confirm this.

For the other reporters in this thread, please
1. make sure your kernel is built with CONFIG_ASUS_WMI and CONFIG_ASUS_NB_WMI
2. attach the output of the "sensors" command
because we should be able to see an "asus" sensor in the output, which provides functionality similar to that of the nct6775 driver.
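
The first check in the list above can be scripted. The following is a minimal sketch, assuming the kernel configuration is available as plain text (on many distributions a file such as /boot/config-<release> holds it, but that location is an assumption about your system). The helper name `config_state` and the sample text are hypothetical and not taken from this bug.

```python
import re

def config_state(config_text: str, option: str) -> str:
    """Return the value of a kernel config option ('y', 'm', ...) or 'n' if unset."""
    pattern = re.compile(rf"^{re.escape(option)}=(\S+)$", re.MULTILINE)
    match = pattern.search(config_text)
    # Disabled options appear as "# CONFIG_FOO is not set" or are absent entirely.
    return match.group(1) if match else "n"

# Sample text standing in for a real config file (illustrative only).
sample = """\
CONFIG_ASUS_WMI=m
CONFIG_ASUS_NB_WMI=m
# CONFIG_EEEPC_WMI is not set
"""

for opt in ("CONFIG_ASUS_WMI", "CONFIG_ASUS_NB_WMI"):
    print(opt, config_state(sample, opt))
```

A value of `m` means the driver is built as a module and still has to be loaded before any "asus" entry can show up in the sensors output.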
Comment 32 Zhang Rui 2021-03-18 04:21:29 UTC
Reassign to platform driver category.
Comment 33 Matthew Garrett 2021-03-18 04:32:24 UTC
This isn't a bug - the ACPI tables claim the resource in question, and there's no way we can verify there are no conflicts between ACPI methods that touch that range and the native driver. If you're confident that this is safe on your system then you can boot with acpi_enforce_resources=lax, but we can't make that the default. This will still produce the warning, but the driver will be permitted to load.
Comment 34 Artem S. Tashkinov 2021-03-19 15:09:33 UTC
(In reply to Matthew Garrett from comment #33)
> This isn't a bug - the ACPI tables claim the resource in question, and
> there's no way we can verify there are no conflicts between ACPI methods
> that touch that range and the native driver. If you're confident that this
> is safe on your system then you can boot with acpi_enforce_resources=lax,
> but we can't make that the default. This will still produce the warning, but
> the driver will be permitted to load.

This bug needs to be fixed because

1) It doesn't affect Windows
2) Average people will never know how to deal with this issue
3) I cannot ask my motherboard vendor (ASUS) to fix this issue in BIOS because they don't provide support for Linux - they barely provide any support at all.

OoB experience of Linux users should not be "I don't get any sensors output, how to fix that?" Most users don't even know what and how to Google. They don't know about dmesg either.

That's an effing horrible attitude.

I'm CC'ing Linus because I absolutely hate what's going on.
Comment 35 Artem S. Tashkinov 2021-03-19 15:14:14 UTC
This might not be a classic "bug" but **no one on Earth cares**. What people care about is having their systems work and be supported by Linux out of the box with **no cryptic voodoo applied**. You don't ask Windows users to run bcdedit.exe to fix their hardware, do you?

So, why do Linux users have to edit system configuration files to get an at least comparable experience? Don't get me started on the fact that HWiNFO64 shows up to ten times more hardware sensors and their parameters than lm-sensors.
Comment 36 Artem S. Tashkinov 2021-03-19 15:15:19 UTC
Lastly, this problem affects literally hundreds of thousands of systems. It's not some single broken motherboard or broken EFI, we're talking about multiple classes of hardware.
Comment 37 Matthew Garrett 2021-03-19 19:13:59 UTC
Here's the situation. Your ACPI tables declare that your system firmware may access the addresses associated with your IO sensors. We have no idea what your firmware may do here - it may do nothing (in which case accessing the addresses is completely safe), or it may use them for its own internal monitoring. Sensor hardware frequently uses indexed addressing, which means that accessing a sensor requires something like the following:

1) Write the desired sensor to the index register
2) Read the sensor value from the data register

These can't occur simultaneously, so if both the OS and the firmware are accessing it you risk ending up with something like:

1) Write sensor A to the index register (from the OS)
2) Write sensor B to the index register (from the firmware)
3) Read the sensor value from the data register (returns the value of sensor B to the firmware)
4) Read the sensor value from the data register (returns the value of sensor B to the OS)

The OS asked for the value of sensor A, but received the value of sensor B. From the OS side this is probably not a big deal (you get a weird value in your graphing), but if it happens the other way around the firmware may decide that the system is running out of spec and shut it down to avoid damage. This is not a good user experience.
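
The interleaving described above can be reproduced with a toy model. This is only a sketch: the class `IndexedSensorChip`, the sensor names and the readings 40 and 75 are invented for illustration, and real hardware or firmware behaviour may differ.

```python
import threading

class IndexedSensorChip:
    """Toy model of a chip with one index register and one data register."""

    def __init__(self, values):
        self._values = dict(values)  # sensor name -> current reading
        self._index = None           # last value written to the index register

    def write_index(self, sensor):
        self._index = sensor

    def read_data(self):
        return self._values[self._index]


chip = IndexedSensorChip({"A": 40, "B": 75})

# Unsynchronised interleaving, matching the four steps listed above.
chip.write_index("A")               # 1) OS selects sensor A
chip.write_index("B")               # 2) firmware selects sensor B
firmware_value = chip.read_data()   # 3) firmware reads sensor B, as it intended
os_value = chip.read_data()         # 4) OS reads sensor B, although it asked for A

print("firmware read:", firmware_value)
print("OS read:", os_value)

# Serialising the index write and the data read removes the race,
# but only if every party takes the same lock.
lock = threading.Lock()

def read_sensor(chip, sensor):
    with lock:
        chip.write_index(sensor)
        return chip.read_data()

print("locked OS read:", read_sensor(chip, "A"))
```

The lock in the second half only helps because both callers share it. A native driver and the firmware have no such shared lock unless the driver goes through a firmware interface.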

Why does Windows not have the same problem? Well, in the general case there's nothing stopping it from doing so. Vendor tooling usually takes one of two approaches:

1) They don't use the hardware sensors directly, they use firmware interfaces to them. This is alluded to in comment #31 - on Asus systems, the sensors are available via a WMI interface. Using a firmware interface ensures that the firmware knows what the state of the hardware is, and avoids any race conditions. Your board may well support an alternative firmware interface and Linux simply lacks driver support for it. If so, I'm afraid that the correct solution is to add that driver support. Given that this bug has ended up covering boards from multiple vendors, it's no longer the correct place to handle that, though.
2) The vendor knows that the firmware makes no policy decisions based on the sensor values, so it's safe to access the resources even though the firmware declares that it uses them. The problem with this approach is that *we* have no way of knowing that it's safe, and the consequences of it being unsafe include data loss. Given the choice between users being able to look at system temperatures and users not losing data, we choose to prioritise users not losing data.

Looking at your ACPI tables, we see the following:

    Name (IOHW, 0x0290)

    OperationRegion (SHWM, SystemIO, IOHW, 0x0A)
    Field (SHWM, ByteAcc, NoLock, Preserve)
    {
        Offset (0x05), 
        HIDX,   8, 
        HDAT,   8
    }

This means that there's a region of IO ports starting at address 0x290 and 0x0a addresses long. This is the same region of port IO that your sensor chip uses. Within that address range, we declare that 0x295 is called HIDX, and 0x296 is called HDAT. This is consistent with an index and data register as described above, which means that having the OS access this space directly is likely to race with the firmware (ie, it's dangerous).
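
The address arithmetic above can be checked with a short sketch. The constants come from the ASL quoted in this comment and from the ACPI warning in the description. The helper `field_ports` is a hypothetical name, and the sketch assumes every field is byte-aligned, as it is here.

```python
IOHW = 0x0290          # base port, from Name (IOHW, 0x0290)
REGION_LENGTH = 0x0A   # from OperationRegion (SHWM, SystemIO, IOHW, 0x0A)

# Field layout: Offset (0x05), followed by two 8-bit fields.
fields = [("HIDX", 8), ("HDAT", 8)]
start_offset = 0x05

def field_ports(base, offset, layout):
    """Map each byte-wide field name to its I/O port address."""
    ports = {}
    bit_pos = offset * 8
    for name, width_bits in layout:
        ports[name] = base + bit_pos // 8
        bit_pos += width_bits
    return ports

ports = field_ports(IOHW, start_offset, fields)
region = range(IOHW, IOHW + REGION_LENGTH)   # the OpRegion claimed by ACPI
driver_range = range(0x295, 0x296 + 1)       # SystemIO range from the ACPI warning

print({name: hex(port) for name, port in ports.items()})
print("region:", hex(region.start), "-", hex(region.stop - 1))
print("driver range inside region:", all(p in region for p in driver_range))
```

The two ports requested by the native driver are exactly the HIDX and HDAT fields, which is why the kernel reports the conflict.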

Near here are two methods called RHWM and WHWM. At a guess, that's "Read Hardware Monitoring" and "Write Hardware Monitoring". These not only access the sensors via the registers described above, they do some additional hardware access around it. This is further evidence to support there being some handshaking involved to avoid race conditions - the firmware takes a mutex and appears to hit some other register that may also be used to guard against racing against system management mode. We really, *really* want to be using the firmware methods here rather than touching the sensor chip directly. At this point, direct access isn't so much walking past a sign saying "Danger, keep out", it's a sign saying "Proceed no further or you will die slowly and it will hurt the entire time".

RHWM is referenced from the WMBD method if the first argument to it is RHWM, and WHWM is referenced if the argument is WHWM. WMBD is the WMI dispatcher for the WMI function with identifier "BD" - looking at your _WDG object, which describes the available WMI interfaces, we have the following:

            Name (_WDG, Buffer (0x50)
            {
                /* 0000 */  0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11,  // .^..mN..
                /* 0008 */  0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66,  // .9.. ..f
                /* 0010 */  0x42, 0x43, 0x01, 0x02, 0xA0, 0x47, 0x67, 0x46,  // BC...GgF
                /* 0018 */  0xEC, 0x70, 0xDE, 0x11, 0x8A, 0x39, 0x08, 0x00,  // .p...9..
                /* 0020 */  0x20, 0x0C, 0x9A, 0x66, 0x42, 0x44, 0x01, 0x02,  //  ..fBD..
                /* 0028 */  0x72, 0x0F, 0xBC, 0xAB, 0xA1, 0x8E, 0xD1, 0x11,  // r.......
                /* 0030 */  0x00, 0xA0, 0xC9, 0x06, 0x29, 0x10, 0x00, 0x00,  // ....)...
                /* 0038 */  0xD2, 0x00, 0x01, 0x08, 0x21, 0x12, 0x90, 0x05,  // ....!...
                /* 0040 */  0x66, 0xD5, 0xD1, 0x11, 0xB2, 0xF0, 0x00, 0xA0,  // f.......
                /* 0048 */  0xC9, 0x06, 0x29, 0x10, 0x4D, 0x4F, 0x01, 0x00   // ..).MO..
            })

The format of _WDG is 16 bytes of GUID, 2 bytes of ID or notification data, 1 byte of instance count and 1 byte of flags. The GUID used by asus-wmi corresponds to the first GUID in this file, 97845ED0-4E6D-11DE-8A39-0800200C9A66. That has an ID of 0x4243, or BC - ie, it's not the GUID we're looking for. The next GUID, however, (466747a0-70ec-11de-8a39-0800200c9a66) has an identifier of 0x4244, or BD. So this is the GUID we're looking for. Unfortunately asus-wmi doesn't handle this GUID, so new code will need to be written.
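
The 20-byte record layout described above can be decoded mechanically. The sketch below copies the buffer from the _WDG dump in this comment. The function name `parse_wdg` is hypothetical, and the sketch assumes the GUIDs are stored in the mixed-endian form that Python's `uuid` module calls `bytes_le`, which matches the GUIDs quoted in the text.

```python
import struct
import uuid

WDG = bytes([
    0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11,
    0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66,
    0x42, 0x43, 0x01, 0x02, 0xA0, 0x47, 0x67, 0x46,
    0xEC, 0x70, 0xDE, 0x11, 0x8A, 0x39, 0x08, 0x00,
    0x20, 0x0C, 0x9A, 0x66, 0x42, 0x44, 0x01, 0x02,
    0x72, 0x0F, 0xBC, 0xAB, 0xA1, 0x8E, 0xD1, 0x11,
    0x00, 0xA0, 0xC9, 0x06, 0x29, 0x10, 0x00, 0x00,
    0xD2, 0x00, 0x01, 0x08, 0x21, 0x12, 0x90, 0x05,
    0x66, 0xD5, 0xD1, 0x11, 0xB2, 0xF0, 0x00, 0xA0,
    0xC9, 0x06, 0x29, 0x10, 0x4D, 0x4F, 0x01, 0x00,
])

ENTRY_SIZE = 20  # 16 bytes GUID + 2 bytes ID + 1 byte instance count + 1 byte flags

def parse_wdg(buf):
    """Split a _WDG buffer into (guid, id_bytes, instance_count, flags) tuples."""
    entries = []
    for off in range(0, len(buf), ENTRY_SIZE):
        guid_raw, ident, count, flags = struct.unpack_from("<16s2sBB", buf, off)
        entries.append((str(uuid.UUID(bytes_le=guid_raw)), ident, count, flags))
    return entries

entries = parse_wdg(WDG)
for guid, ident, count, flags in entries:
    print(guid, ident.hex(), repr(ident), count, hex(flags))
```

The second record carries the ID bytes 0x42 0x44, the ASCII characters "BD", which is the entry served by the WMBD method.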

I'm going to close this bug again because it's turned into a generic bug covering different motherboard vendors, and there's no one size fits all solution. For your case the correct way to handle it is for someone to write a driver that uses the 466747a0-70ec-11de-8a39-0800200c9a66 interface to expose the sensor data. I'm afraid I don't have relevant hardware so can't do this myself, but please do open another bug for that.

tl;dr - the kernel message you're seeing is correct. Avoiding it requires a new driver to be written. If you *personally* feel safe in ignoring the risks, you can pass the acpi_enforce_resources=lax option, but that can't be the default because it's unsafe in the general case, and so it isn't the solution to the wider problem.
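
Whether the option is actually in effect can be read from the kernel command line. This is a minimal sketch with a hypothetical helper name, `enforce_resources_mode`. It assumes that `strict` is the behaviour when the parameter is absent and that the last occurrence of the parameter wins, and the example string is illustrative.

```python
def enforce_resources_mode(cmdline: str) -> str:
    """Return the acpi_enforce_resources value found on a kernel command line."""
    mode = "strict"  # assumed default when the parameter is absent
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "acpi_enforce_resources" and sep:
            mode = value  # in this sketch the last occurrence wins
    return mode

example = "add_efi_memmap mitigations=off iommu=pt acpi_enforce_resources=lax"
print(enforce_resources_mode(example))

# On a running Linux system the live command line could be checked with:
#   enforce_resources_mode(open("/proc/cmdline").read())
```

As noted in comment #33, the warning is still printed with `lax`. The difference is that the native driver is permitted to load.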
Comment 38 Jaap de Haan 2021-03-20 07:22:46 UTC
CONFIG_ASUS_WMI=m and I confirmed the module is loaded.

I think I saw some improvements lately in the support of temperature sensors, but I am not sure, because I have no traces of the old state and it has been a long time since I used the UI. I flashed my BIOS recently and hoped things would be solved with that action.

Thanks a lot, Matthew, for this good explanation. For the first time I understood (at an abstract level) what is going on and why it is so. IMO this explanation is really valuable and should be kept in a prominent place, such as the kernel Documentation and a known-issues text file (in a less ASUS-specific form).

I was nearly desperate enough to try the `acpi_enforce_resources=lax` setting, but using it without understanding it felt, to me as an engineer, like touching something "hot". Now that I really get why it is so, I will keep my fingers away from the setting and hope that someone will find out how to get the fan values in the normal driver.

Many thanks for the clarification.
Comment 39 Matthew Garrett 2021-03-20 07:51:03 UTC
As noted in https://twitter.com/james_hilliard/status/1373178256615211012, there's actually a driver here: https://github.com/electrified/asus-wmi-sensors/ . I did a quick search earlier, but managed to miss this somehow.
Comment 40 myhateisblind 2021-03-20 07:57:37 UTC
Are you sure about that driver? The github page says:

"Note: X570/B550/TRX40 boards do not have the WMI interface and are not supported."

And those seem to be the chipsets of all or almost all boards reported in this bug.

Comment 41 Matthew Garrett 2021-03-20 08:16:05 UTC
Interesting, it looks like it uses the same GUID but has a different set of methods. So yes, this driver probably won't work for a bunch of the boards here - it would need to be adapted to add support for the methods that these ones provide.
Comment 42 Artem S. Tashkinov 2021-03-20 15:28:06 UTC
(In reply to Matthew Garrett from comment #37)
> 97845ED0-4E6D-11DE-8A39-0800200C9A66. That has an ID of 0x4243, or BC - ie,
> it's not the GUID we're looking for. The next GUID, however,
> (466747a0-70ec-11de-8a39-0800200c9a66) has an identifier of 0x4244, or BD.
> So this is the GUID we're looking for. Unfortunately asus-wmi doesn't handle
> this GUID, so new code will need to be written.
> 
> I'm going to close this bug again because it's turned into a generic bug
> covering different motherboard vendors, and there's no one size fits all
> solution. For your case the correct way to handle it is for someone to write
> a driver that uses the 466747a0-70ec-11de-8a39-0800200c9a66 interface to
> expose the sensor data. I'm afraid I don't have relevant hardware so can't
> do this myself, but please do open another bug for that.
> 
> tl;dr - the kernel message you're seeing is correct. Avoiding it requires a
> new driver to be written. If you *personally* feel safe in ignoring the
> risks, you can pass the acpi_enforce_resources=lax option, but that can't be
> the default because it's unsafe in the general case, and so it isn't the
> solution to the wider problem.

That's the problem: we have _multiple_ motherboards with _multiple_ different chipsets from _different_ vendors 

1) all having the same glitch
2) all requiring the same workaround
3) working just fine under Windows with no hacks

> My understanding is that asus_wmi/asus_nb_wmi do the same thing as nct6775
> and expose them to hwmon class as well.

And at the same time you're talking about asus_wmi which covers only _certain_ ASUS motherboards, and no one in this discussion has shown it to work or provide the same set of sensors.

And this driver has nothing to do with sensors, linux/drivers/platform/x86/asus-wmi.c:

 * Asus PC WMI hotkey driver

This is not a driver which even tangentially deals with HW sensors found in motherboards affected by this bug.

I don't know why you're trying to sweep this bug under the rug but I really dislike it. The Linux kernel development has always followed common sense principles and it contains a _huge_ number of workarounds just to enable HW which doesn't work according to specs.

At the very least you could printk() this:

"Your motherboard might not be exposing ACPI resources correctly, so you might not get access to your HW sensors. You could add "acpi_enforce_resources=lax" to kernel boot parameters to enable monitoring at your own risk. Please refer to https://bugzilla.kernel.org/show_bug.cgi?id=204807 for more information".

And this still paints Linux in a very bad light, as users hardly care whether ACPI is implemented according to the specifications or not. What they really care about is whether their hardware works and is supported under Linux out of the box. Most Linux users don't even know `dmesg` exists, so they have no way of knowing how to fix the issue.

Lastly, this bug is not fixed.
Comment 43 Artem S. Tashkinov 2021-03-20 15:33:14 UTC
A small correction of my previous comment:

linux/drivers/platform/x86/asus-nb-wmi.c

/*
 * Asus Notebooks WMI hotkey driver
 *
 * Copyright(C) 2010 Corentin Chary <corentin.chary@gmail.com>
 */

This is not related to lm-sensors in any shape or form. I'm really sad how this situation is getting handled: the bug has been known for over 1.5 years, affects literally hundreds of thousands of devices and you're saying that this kernel option might have unintended consequences yet _everyone_ in this thread has enabled it with _zero_ side effects and Windows seemingly has it enabled by default, as no such messages are getting logged in Windows Event Log either when using HWiNFO64 or vendor specific monitoring software.
Comment 44 Artem S. Tashkinov 2021-03-20 15:43:47 UTC
(In reply to Artem S. Tashkinov from comment #42)
> "Your motherboard might not be exposing ACPI resources correctly, so you
> might
> not get access to your HW sensors. You could add
> "acpi_enforce_resources=lax" to kernel boot parameters to enable monitoring
> at your own risk. Please refer to
> https://bugzilla.kernel.org/show_bug.cgi?id=204807 for more information".
 
This message will at least allow various Linux distros to enable the option by default because many are not aware of the bug.
Comment 45 Matthew Garrett 2021-03-20 16:04:47 UTC
Artem,

Nobody is denying there's an issue here. However, the issue is that an additional driver needs to be written for this hardware. Please file a new bug for that and do not keep reopening this one.
Comment 46 Zhang Rui 2021-03-21 18:39:54 UTC
(In reply to Artem S. Tashkinov from comment #43)
> A small correction of my previous comment:
> 
> linux/drivers/platform/x86/asus-nb-wmi.c
> 
> /*
>  * Asus Notebooks WMI hotkey driver
>  *
>  * Copyright(C) 2010 Corentin Chary <corentin.chary@gmail.com>
>  */
> 
> This is not related to lm-sensors in any shape or form.

asus_nb_wmi_init -> asus_wmi_register_driver -> asus_wmi_probe -> asus_wmi_add -> asus_wmi_hwmon_init

Although the warning messages are printed by ACPI code, this is a conflict between the native nct6775 driver and the Asus wmi driver, because the Asus wmi driver accesses the same resources and provides similar functionality. And I'm familiar with neither of them.

> I'm really sad how
> this situation is getting handled: the bug has been known for over 1.5
> years, affects literally hundreds of thousands of devices and you're saying
> that this kernel option might have unintended consequences yet _everyone_ in
> this thread has enabled it with _zero_ side effects and Windows seemingly
> has it enabled by default, as no such messages are getting logged in Windows
> Event Log either when using HWiNFO64 or vendor specific monitoring software.

In Linux, at least for now, I don't see a way to enable the native nct6775 driver by default, and this is true for all native drivers that have a resource conflict with the firmware.

IMO, the root cause is that Linux does not support overriding driver A (the native driver in this case) when driver B (the driver that talks to firmware) is loaded, so we have to disable driver A whenever we know there might be a conflict, even if there is only a 0.01% possibility that driver B will be loaded.

What we can do is write driver B to make this statement true
"ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver" and ignore this message.

(In reply to Artem S. Tashkinov from comment #44)
> (In reply to Artem S. Tashkinov from comment #42)
> > "Your motherboard might not be exposing ACPI resources correctly, so you
> > might
> > not get access to your HW sensors. You could add
> > "acpi_enforce_resources=lax" to kernel boot parameters to enable monitoring
> > at your own risk. Please refer to
> > https://bugzilla.kernel.org/show_bug.cgi?id=204807 for more information".
> 
> This message will at least allow various Linux distros to enable the option
> by default because many are not aware of the bug.

Hmmm, what about the following conditions:
1. "acpi_enforce_resources=" is a global switch. There might be platforms with more than one conflict, or with a conflict other than nct6775. We cannot validate all of them.
2. We may later get new drivers that talk to firmware, and then we could no longer use "acpi_enforce_resources=lax".

But thanks for raising this, I think this also rings a bell that the current message is kind of misleading.
It is true that ACPI covers a series of devices as described in the ACPI spec. But at the same time, ACPI is an interface. Many drivers, including vendor-specific drivers, talk to firmware through the ACPI interface. They depend on ACPI, but they're actually not covered by the ACPI specification, nor by the kernel's drivers/acpi code.

"ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver" makes people feel like it is an ACPI problem, but in many cases it is not; I can only triage them.
Comment 47 Hans de Goede 2021-03-21 19:14:46 UTC
So if someone is willing to spend time on making this work, then here is how I believe this could be made to work (for the case which Matthew Garrett analysed):

1. Modify the nct6775 driver: add a set of nct6775_register_ops to the nct6775_sio_data struct and have every function which sits "below" the probe() function use only these ops for register accesses. Combine this with having sensors_nct6775_init() set these register-ops to the currently used superio register access functions (so that nothing changes for existing users of the driver).

2. Move the nct6775_sio_data struct declaration to a shared header somewhere under include/linux

3. Have a new WMI driver which defines register-ops compatible with the ones expected by the nct6775_sio_data struct, using the RHWM and WHWM methods which Matthew found (note that these should be called through their WMI wrappers). Have this driver instantiate a platform device with its platdata set to this new nct6775_sio_data struct, allowing the nct6775 driver to access the registers this way, using the mutual-exclusion mechanism built into the RHWM and WHWM methods.
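
The three steps above can be modelled in a few lines. This is only a userspace Python sketch of the idea, not kernel code: the names RegisterOps, make_direct_ops and make_wmi_ops, the register numbers, and the dict standing in for the chip are all hypothetical. It shows the sensor logic touching registers only through an ops table, so a direct backend and a firmware-serialized (RHWM/WHWM-style) backend become interchangeable.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RegisterOps:
    """Hypothetical stand-in for the proposed nct6775_register_ops."""
    read: Callable[[int], int]
    write: Callable[[int, int], None]


def make_direct_ops(regs: Dict[int, int]) -> RegisterOps:
    """Backend A: models today's direct Super I/O access (no firmware lock)."""
    def read(reg: int) -> int:
        return regs.get(reg, 0)

    def write(reg: int, value: int) -> None:
        regs[reg] = value & 0xFF  # registers are 8 bits wide in this model

    return RegisterOps(read, write)


def make_wmi_ops(regs: Dict[int, int], firmware_lock: threading.Lock) -> RegisterOps:
    """Backend B: models RHWM/WHWM-style calls that serialize inside the firmware."""
    def read(reg: int) -> int:
        with firmware_lock:
            return regs.get(reg, 0)

    def write(reg: int, value: int) -> None:
        with firmware_lock:
            regs[reg] = value & 0xFF

    return RegisterOps(read, write)


def read_fan_rpm(ops: RegisterOps, hi_reg: int = 0x10, lo_reg: int = 0x11) -> int:
    """Code "below probe()" never touches the hardware except through ops."""
    return (ops.read(hi_reg) << 8) | ops.read(lo_reg)


if __name__ == "__main__":
    chip = {0x10: 0x04, 0x11: 0xB0}  # made-up register contents
    print(read_fan_rpm(make_direct_ops(chip)))
    print(read_fan_rpm(make_wmi_ops(chip, threading.Lock())))
```

Both calls go through the same read_fan_rpm(), which is the point of step 1: existing users keep the direct backend, while a WMI driver could supply its own ops without the sensor logic changing.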

As the drivers/platform/x86 maintainer I would be more than happy to merge a clean driver for step 3. To me this seems quite doable (to someone with some kernel-dev experience + enough time).

Note I believe that this will not be a whole lot of work (but it's not trivial either).
Comment 48 Andy Shevchenko 2021-03-22 10:12:49 UTC
Artem,
Matthew gave a really good explanation of the technical background of what's going on. What you really need is to amend the existing driver(s) or provide a new one to get the functionality you want to have.
Comment 49 Artem S. Tashkinov 2021-03-22 10:51:55 UTC
(In reply to Andy Shevchenko from comment #48)
> Matthew gave a really good explanation on techical background what's going
> on. What you really need is to amend existing driver(s) or provide a new one
> to fulfill the functionality you want to have.

I'm not a programmer, let alone a person who understands the innards of the Linux kernel well enough to even attempt to fix the issue, not to mention that:

> Note I believe that this will not be a whole lot of work (but its not trivial
> either).

Maybe we have ... kernel developers who can do that instead, for instance lm-sensors maintainers. I don't know. I'm confused. I did my best to report the issue. Meanwhile I'll continue to use the hack since I want to monitor my HW right now - not a few years later when someone finally ventures to scratch the itch. Thank you very much ;-)
Comment 50 Hans de Goede 2021-03-22 11:06:10 UTC
> Maybe we have ... kernel developers who can do that instead

You know, kernel developers are humans too, so they need to eat and sleep and stuff too. IOW they don't have unlimited time to spend on helping every Linux user out there without any compensation.

Maybe you have a friend with some kernel-development experience who can help. Or maybe you can find someone who you can pay to fix this for you?
Comment 51 Andy Shevchenko 2021-03-22 11:32:27 UTC
(In reply to Artem S. Tashkinov from comment #49)
> (In reply to Andy Shevchenko from comment #48)
> > Matthew gave a really good explanation on techical background what's going
> > on. What you really need is to amend existing driver(s) or provide a new
> one
> > to fulfill the functionality you want to have.
> 
> I'm not a programmer let alone a person who understand the innards of the
> Linux kernel to even attempt to fix the issue, not to mention that:
> 
> > Note I believe that this will not be a whole lot of work (but its not
> trivial
> > either).
> 
> Maybe we have ... kernel developers who can do that instead, for instance
> lm-sensors maintainers. I don't know. I'm confused. I did my best to report
> the issue. Meanwhile I'll continue to use the hack since I want to monitor
> my HW right now - not a few years later when someone finally ventures to
> scratch the itch. Thank you very much ;-)

Artem,
I feel your pain. Believe me, I have gotten into similar situations myself, despite actually being a kernel developer! I'm often frustrated, but that's how it works in Linux and in OSS in general. The root cause here is the difference between the production models used by the Windows world and the Linux world (and despite downsides like the above, I prefer the latter). For Windows, drivers are made for *THE product*, while in the *nix world drivers try to cover as many products as they can, based on the similarities and compatibility of the corresponding IPs.
That's why people often say "oh, hey, it works in Windows!" Yes, it works, but if and only if you are using the very same *THE product*. A step right or left is suicidal in that model. The Windows model is very fragile because of this and requires 10x more resources to develop the code. The OSS community simply does not have such resources to do the job, and for economic reasons even Micro$oft has found advantages in the OSS model (but not with drivers, unfortunately). The best help for you and for the rest is to be on the constructive side. You see, you may even develop a solution yourself and become a (well paid) kernel developer. Or do it just for fun: look at the example of the Intel IPU3 CIO2 camera glue layer (to support Windows-only platforms), which was done solely by one guy who declared that he didn't even know the C programming language before!

So, please, do not blame people here, it's rather the problem of the model.
Comment 52 frostzeux 2021-03-22 11:35:45 UTC
"That's an effing horrible attitude", Artem (#c34).
*leaving this rant*
Comment 53 Hans de Goede 2021-03-22 14:31:29 UTC
I'm also removing myself from the Cc of this bug because the discussion here does not seem to be productive. If anyone wants to implement the solution which I outlined in comment 47, drop me an email at hdegoede@redhat.com .
Comment 54 Mateusz Jończyk 2021-04-11 08:25:36 UTC
Hello,

I was doing some preparatory work to implement the solution in comment 47 - like analysis of source code.

Unfortunately, it seems like this solution would only work for ASUS boards. All the acpidump outputs in this ticket are from ASUS boards. The "RHWM" and "WHWM" methods are from an interface with UID="ASUSWMI", so they look to be asus-specific.

For ASUS boards there exists a better driver:

https://github.com/electrified/asus-wmi-sensors

so there is probably no reason to implement direct access to nct6775.

Are there any benefits from implementing access to nct6775 as outlined above?

Greetings,
Comment 55 Hans de Goede 2021-04-11 09:40:41 UTC
(In reply to Mateusz Jończyk from comment #54)
> For ASUS boards there exists a better driver:
> 
> https://github.com/electrified/asus-wmi-sensors

Interesting, I wonder why that has not been submitted upstream. I'll open an issue on its GitHub page for that.

> so there is probably no reason to implement direct access to nct6775.
> 
> Are there any benefits from implementing access to nct6775 as outlined above?

No, that was just meant as a possible solution for the reported problem. I agree that using the WMI interface, which presumably is what Asus' Windows tools use, is better.
Comment 56 Hans de Goede 2021-04-11 09:46:53 UTC
Hmm,

asus-wmi-sensors also is not such a great solution, it seems the WMI interface is buggy on some boards and causes fans to stop or get stuck at max speed, which is quite bad, see:

https://github.com/electrified/asus-wmi-sensors#known-issues

So it seems that the situation with sensors on these boards simply sucks and Asus is to blame here. If even the "official" method of accessing the sensors is buggy, then Asus needs to get their firmware fixed, and until that is done users are better off without sensors support.
Comment 57 Mateusz Jończyk 2021-04-11 10:18:05 UTC
(In reply to Hans de Goede from comment #56)
> Hmm,
> 
> asus-wmi-sensors also is not such a great solution, it seems the WMI
> interface is buggy on some boards and causes fans to stop or get stuck at
> max speed, which is quite bad, see:
> 
> https://github.com/electrified/asus-wmi-sensors#known-issues

IMHO, this could be caused by access races, not necessarily by a buggy BIOS. The driver may simply not implement correct synchronization methods. It may be necessary to call some ACPI / WMI methods before and after accessing the sensors to avoid resource conflicts.

As is written in the documentation:
> The more frequently the WMI interface is polled the greater the potential for
> this to happen.

I am also not sure if the driver implements correct locking behavior kernel-wise.
Comment 58 Hans de Goede 2021-04-11 10:27:11 UTC
(In reply to Mateusz Jończyk from comment #57)
> (In reply to Hans de Goede from comment #56)
> > Hmm,
> > 
> > asus-wmi-sensors also is not such a great solution, it seems the WMI
> > interface is buggy on some boards and causes fans to stop or get stuck at
> > max speed, which is quite bad, see:
> > 
> > https://github.com/electrified/asus-wmi-sensors#known-issues
> 
> IMHO, this could be caused by access races, not necessarily by a buggy BIOS.
> The driver may simply not implement correct synchronization methods. It may
> be necessary to call some ACPI / WMI methods before and after accessing the
> sensors to avoid resource conflicts.

Perhaps, but usually WMI methods take the locks which they need on entry and release them on exit. I'm not even sure if an ACPI method (which this ultimately is) can hold locks after it exits, I would not be surprised if all acquired locks are automatically dropped on exit from the interpreter.

Also note that the README also states that on some motherboards the problems are fixed in later BIOS versions, which also points to a race inside the AML code and not a bug in the driver.

> As is written in the documentation:
> > The more frequently the WMI interface is polled the greater the potential
> for
> > this to happen.
> 
> I am also not sure if the driver implements correct locking behavior
> kernel-wise.

I did not check, but this should not matter. Incorrect locking in the driver may mess up the driver's own state, but the WMI code is expected to do its own locking at the AML level, e.g. to protect against similar accesses to the super IO through the ACPI thermal region interface.

Note I'm not claiming that this is definitely not an issue with the driver, it could be. But I've seen a lot of very buggy AML code and I've yet to find a single vendor which does not write very low quality AML code. It seems there is absolutely no code-review done on the AML code and very little QA.
Comment 59 Mateusz Jończyk 2021-04-11 10:30:27 UTC
The Asus X570-Plus TUF Gaming was described in this ticket as not working. It is listed as not supported by this driver on GitHub. So there are some devices without a working WMI interface that would benefit from the handling in comment 47.

> I agree that using the WMI interface, which presumably is what Asus' Windows
> tools use, is better.

It also does not require guessing voltage divider parameters, a requirement which makes raw access to the nct6775 not all that useful.
Comment 60 Kamil Dudka 2021-04-11 11:20:19 UTC
asus-wmi-sensors was already mentioned in comment #39.  I tried it with ASUS PRIME B360-PLUS but no device was matched by the driver.  It could have been some user error though.
Comment 61 Artem S. Tashkinov 2021-04-12 12:39:57 UTC
(In reply to Matthew Garrett from comment #39)
> As noted in https://twitter.com/james_hilliard/status/1373178256615211012,
> there's actually a driver here:
> https://github.com/electrified/asus-wmi-sensors/ . I did a quick search
> earlier, but managed to miss this somehow.

From its description:

Note: X570/B550/TRX40 boards do not have the WMI interface and are not supported.
Comment 62 Kamil Dudka 2021-04-12 13:25:01 UTC
Yes, my board was neither listed as supported, nor as unsupported/unknown in the mentioned README file.
Comment 63 Sydney Meyer 2021-04-12 22:42:20 UTC
Hello all,

perhaps this is the wrong place to ask such a question, but after reading many sites on the interwebs about the issue, I am left with the impression that most people (me included) do not actually understand the implications of turning knobs like "acpi_enforce_resources=lax" on or off. Also, I read a lot of mostly unclear comments about "hardware damage" and would therefore like to ask: what is actually the recommended way to go about this with the situation as it is now? Is this issue perhaps only relevant for manual fan control? With or without "acpi_enforce_resources=lax" and the nct6775 kernel module loaded, the system appears to adjust the fan speed to the load either way, and there aren't any noticeable differences in CPU temps either. So I guess my question basically boils down to this: is there actually something to worry about, apart from not being able to see/control fan speeds? I have just become a little worried with all the contradictory information out there; I also read (on Phoronix) about this [1] and this [2] a few weeks ago. This is an Asus X570 Gaming-E board with a Ryzen 5950X CPU. As a regular user, am I going to fry my little computer by running Linux on it?

I understand that nobody will guarantee anything, of course. I just felt this might be a good place for a qualified answer because, obviously, I don't understand any of this low-level stuff.

Thanks a bunch.

[1] Linux 5.11 Drops AMD Zen Voltage/Current Reporting Over Lack Of Documentation 
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.11-Drops-k10temp-V-C
[2] AMD Ryzen 5000 Temperature Monitoring Support Sent In For Linux 5.12
https://www.phoronix.com/scan.php?page=news_item&px=Zen-3-Desktop-CPU-k10temp
Comment 64 Hans de Goede 2021-04-13 06:11:25 UTC
Sydney, I understand that all the discussion can be somewhat confusing.

It should be perfectly safe to run Linux on your computer (but as you said, there is no warranty). By default Windows also does not come with any software to monitor the nct6775 sensors. So when installing Linux without making any changes, your computer will run the same way as with a pristine (no extra sw installed) Windows install.

Under Linux you will even be able to monitor the CPU temperature using the CPU's builtin temp-sensors. What does not work is monitoring other temperatures, voltages and fan-speeds, nor controlling fan-speeds. But typically a modern motherboard will automatically control the CPU fan speed based on temperature, without needing the OS to do anything. Also, most users typically use their computer for other things than monitoring the computer's temps and voltages.
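
As an illustration of reading those builtin sensors, here is a minimal Python sketch that walks the kernel's hwmon sysfs tree, where each tempN_input file holds a value in millidegrees Celsius. The function name read_hwmon_temps and its root parameter are made up for this example; which chips show up (for instance k10temp on AMD or coretemp on Intel) depends on the system.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, label): degrees_celsius} for each tempN_input found."""
    temps = {}
    base = Path(root)
    if not base.is_dir():
        return temps  # no hwmon support visible (e.g. not running on Linux)
    for chip in sorted(base.glob("hwmon*")):
        name_file = chip / "name"
        if not name_file.is_file():
            continue
        chip_name = name_file.read_text().strip()
        for inp in sorted(chip.glob("temp*_input")):
            channel = inp.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / (channel + "_label")
            try:
                # Prefer the human-readable label when the driver provides one.
                label = label_file.read_text().strip() if label_file.is_file() else channel
                millidegrees = int(inp.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors report an error when read; skip them
            temps[(chip_name, label)] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for (chip_name, label), value in read_hwmon_temps().items():
        print(f"{chip_name:12s} {label:12s} {value:6.1f} C")
```

This only reads files the kernel already exposes, so it needs no special boot parameters.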

Matthew rightly advises against using "acpi_enforce_resources=lax" because that opens races between the firmware and Linux which could result in writing to a different superIO register than intended. This can definitely lead to e.g. stopping the fans even though the CPU is running hot, which is not good, but all modern CPUs have builtin overtemp protection, so at worst the system will simply shut down (1). 
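
To make the race concrete, here is a toy Python model of an index/data style register interface, where an access takes two steps: select a register, then transfer the data. It performs no real port I/O, and the class name, function names and register numbers are hypothetical. It only shows how a write lands in the wrong register when the firmware gets in between the two steps.

```python
class SuperIOModel:
    """Toy model of an index/data register pair; no real port I/O happens."""

    def __init__(self):
        self.index = 0   # register currently selected
        self.regs = {}   # register number -> last value written

    def select(self, reg):
        """Step 1 of an access: choose which register the data port refers to."""
        self.index = reg

    def write_data(self, value):
        """Step 2 of an access: the value goes to whatever is selected now."""
        self.regs[self.index] = value


FAN_PWM = 0x01  # hypothetical register numbers, for illustration only
VOLTAGE = 0x02


def unsynchronized_write():
    """The firmware runs between the OS driver's two steps."""
    chip = SuperIOModel()
    chip.select(FAN_PWM)    # OS driver: select the fan register
    chip.select(VOLTAGE)    # firmware: selects its own register in between
    chip.write_data(0x80)   # OS driver: the value lands in VOLTAGE instead
    return chip.regs


def serialized_write():
    """With mutual exclusion, each two-step access completes before the next."""
    chip = SuperIOModel()
    chip.select(FAN_PWM)
    chip.write_data(0x80)   # OS driver finishes its access first
    chip.select(VOLTAGE)    # firmware access only starts afterwards
    return chip.regs


if __name__ == "__main__":
    print("unsynchronized:", unsynchronized_write())
    print("serialized:    ", serialized_write())
```

In the unsynchronized case the fan register is never written at all, and a register the driver did not intend to touch is modified.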

Theoretically this could also lead to worse outcomes, such as changing your CPU or RAM voltage, which could damage your hardware. I am aware of at least one semi-related case where RAM got seriously overvolted, damaging both the RAM and the CPU. This was not with a Super-IO solution though, but with I2C attached sensor probing.

1) Repeatedly overheating your CPU to where it automatically shuts down is not good for your CPU's health though and will likely shorten its lifetime.

TL;DR: Don't use "acpi_enforce_resources=lax", otherwise running Linux should be safe and everything should work fine.
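
For anyone unsure whether the option is active on their system, the following Python sketch parses a kernel command line such as the contents of /proc/cmdline. The function name is made up for this example. It assumes the documented default of "strict" when the parameter is absent and assumes that the last occurrence wins if the parameter is given more than once.

```python
def acpi_enforce_resources_mode(cmdline: str) -> str:
    """Return the acpi_enforce_resources value found in a kernel command line."""
    mode = "strict"  # documented kernel default when the parameter is absent
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "acpi_enforce_resources" and sep:
            mode = value  # assumption: a later occurrence overrides an earlier one
    return mode


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("acpi_enforce_resources =", acpi_enforce_resources_mode(f.read()))
    except OSError:
        print("could not read /proc/cmdline (not running on Linux?)")
```

A result of "lax" means the resource conflict check has been relaxed and the native driver may race with the firmware as described above.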
Comment 65 Sydney Meyer 2021-04-13 22:04:13 UTC
Hello Hans,

thank you for taking the time to answer my question.

Your analogy to a (if there is such a thing) "pristine" ~20GB Windows installation does indeed make sense, and I had imagined something like this already without understanding it properly, but it is reassuring to hear this from someone who has a much, much better understanding of the subsystem at hand.

Personally, I don't have any need for monitoring voltages or manually adjusting fan curves, etc.

Also, over the past ~15 years, I have not once been let down by following kernel developers' advice, because even if I did not fully understand the issue, or understood it only in hindsight, there has always been deductive (and, as Artem has ascertained, sometimes frustrating, yes, but always deductive) reasoning behind the decisions and defaults, as appears to be the case here (lack of documentation and/or vendor support). A kind of established trust, really. And even if incorrect, _always_ with best knowledge and conscience. IMO, this is all a user can ask for, and unfortunately, albeit mostly in the commercial SW ecosystem, not a given anymore. I would trade these virtues for warranty any day.

TL;DR: Much thanks for answering the question this detailed. Flatter/sweet-talk. Will link your post for people with similar concerns. Big thanks again and have a nice week, Hans et al.
Comment 66 Hans de Goede 2021-04-14 07:58:21 UTC
Sydney, thank you for your kind words, you have put a smile on my face, so thank you.
Comment 67 Artem S. Tashkinov 2021-04-15 09:27:53 UTC
(In reply to Hans de Goede from comment #64)
> 
> Matthew rightly advises against using "acpi_enforce_resources=lax" because
> that opens races between the firmware and Linux which could result in
> writing to another superIO register then intended. This can definitely lead
> to e.g. stopping the fans even though the CPU is running hot, which is not
> good but all modern CPUs have builtin overtemp protection, so at the worst
> the system will simply shutdown (1). 
> 

Multiple users use acpi_enforce_resources=lax and I haven't seen a single report that it's ever broken anything.

AFAIK no one has used this hack to control fans using PWM, so that might indeed lead to unintended consequences.
Comment 68 myhateisblind 2021-04-15 09:30:04 UTC
I use it for that, and had no problem... yet.

Comment 69 Hans de Goede 2021-04-15 09:39:48 UTC
(In reply to Artem S. Tashkinov from comment #67)
> Multiple users use acpi_enforce_resources=lax and I haven't seen a single
> report that it's ever broken anything.

<sigh> Yet I have been on the receiving end of a bug-report where I had to explain to a user that the lm_sensors sensors-detect script had overvolted his RAM ruining both his expensive high-end RAM as well as his expensive top of the line CPU. The user was surprisingly relaxed about all this, which I really appreciated.

And that was while the script was not doing anything which we (the developers) considered dangerous. But the motherboard had a funky setup causing a SMbus *read* transaction to change the voltage.

Mucking with this stuff can be dangerous. As Matthew has explained in his thorough analysis of the DSDT, the DSDT is actually accessing the superio, and if that races with a Linux kernel access, a wrong register may be read from, or worse, written to.

Using acpi_enforce_resources=lax simply is dangerous and we are not going to change the default, period, full-stop.

I welcome further discussions here about how we can *safely* solve hwmon access on various motherboards.

Please stop discussing acpi_enforce_resources=lax, that is not a safe option to use and more discussion about it is not productive.
Comment 70 doomwarriorx 2021-04-21 17:09:13 UTC
Created attachment 296451 [details]
acpidump for Pro B550-C

I can confirm the issue with ASUS System Product Name/Pro B550M-C, BIOS 0214 10/22/2020.

The bug still exists if asus_wmi and eeepc_wmi are blacklisted. Does acpi_wmi still claim the address space even if no consumer/driver is available?
