Bug 196941
Summary: | [iwlwifi] Temperature sensor giving I/O error when WiFi is disabled with rfkill | ||
---|---|---|---|
Product: | Drivers | Reporter: | Armin K. (krejzi) |
Component: | network-wireless | Assignee: | DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | luca |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.13.2 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Armin K.
2017-09-14 09:46:06 UTC
The iwlwifi driver cannot provide the temperature reading when the interface is down (which is the case with rfkill). The temperature is read by the firmware, which is only loaded and running when the interface is up. We can't support this, so the userspace needs to be able to take care of it. As I said, other hardware drivers support this scenario quite perfect: Don't try to read anything if the hardware is powered down. This is breaking userspace, I don't think Linus would be happy with it. It can never have worked with the userspace as is, because this has never changed in the iwlwifi driver. This issue has been discussed before and this was the conclusion we came to. What do you want the iwlwifi driver to do? Lie about the temperature? This is what amdgpu does: amdgpu-pci-0100 Adapter: PCI adapter fan1: N/A temp1: N/A (crit = +0.0°C, hyst = +0.0°C) As in, returns nothing. I/O error is problematic in the OP, as it tries to access a disabled device's registers/whatever. It already displays N/A. So, amdgpu returns 0 as the temperature when the device is not on? 0 is a valid temperature (though unlikely) and there is nothing in the thermal framework indicating that the temperature should be set to 0 when it cannot be read. The iwlwifi driver registers a thermal zone and the only way to read the temperature from it is by calling the get_temperature op. This op can return a temperature and/or an error. There's no such thing as "returns nothing". Unfortunately the thermal framework doesn't seem to specify these cases. I've finally found the amdgpu kernel code for reading temperature. In case of hardware being disabled, they return -EINVAL, which lm_sensors seems to parse correctly. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c#n821 I think that is incorrect. -EINVAL is returned when the user space sends an invalid argument in a request, which is not the case here. In any case, if the userspace understands -EINVAL as N/A, it should also treat -EIO as such. Maybe -EIO is a bit harsh in iwlwifi, but at most I would change it to -ENODATA. -EINVAL would be wrong. -ENODATA sounds good and maybe even makes more sense. And do you think the userspace you treat this error more gracefully than the current -EINVAL? The userspace code is libsensors, right? Okay, I checked the libsensors code and it seems to treat *only* -EIO differently[1]: int sensors_read_sysfs_attr(const sensors_chip_name *name, const sensors_subfeature *subfeature, double *value) { [...] if (res == EOF) { if (errno == EIO) return -SENSORS_ERR_IO; else return -SENSORS_ERR_ACCESS_R; } [...] } I have no idea why it does this, though... I don't see anything related to that in the documentation[2]. [1] https://github.com/groeck/lm-sensors/blob/dfa4518962c81cf1a21471cef958aa9e67a63b68/lib/sysfs.c#L874 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/thermal/sysfs-api.txt Maybe they treat -EIO as "device (driver) broken"? Okay, I'm reopening and I'll provide this fix. Okay, I made the change in our internal tree and it will eventually reach the mainline, following our normal upstreaming process. The fix has been submitted upstream and will soon reach the mainline and stable releases. |