Bug 201427
Summary: | Kernel ignores BIOS setting to NOT use Airplane Switch | ||
---|---|---|---|
Product: | Drivers | Reporter: | Chris (chrisharff) |
Component: | Platform_x86 | Assignee: | drivers_platform_x86 (drivers_platform_x86) |
Status: | NEW --- | ||
Severity: | normal | CC: | fin4478, gabriele.mzt, lenb, pali, pepijndevos, superm1, superm1 |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.15.0-36-generic | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
debugfs output for all possible states of ethernet and wifi switch
Git bisect log of /drivers/platfrom/x86/dell*
possible patch to fix issue
Patch for v4.15 (The kernel version on Ubuntu 18.04 LTS) |
Description
Chris
2018-10-14 19:37:15 UTC
I meant to also mention that this happens with the current Linux Mint "live DVD" (using the 4.15 kernel). That is the same as what I have installed on my HDD, except without any possible customizations by me.

You can make a custom kernel in which the driver for your laptop buttons is disabled. To build a faster and stable custom kernel, install the build dependencies:
sudo apt install build-essential kernel-package qt5-default qt5-qmake qtbase5-dev qtbase5-dev-tools bison flex libelf-dev libssl-dev pkg-config
Download the kernel source from kernel.org. If you have AMD graphics, download the AMD drm-next-4.21-wip kernel source. The kernel configuration files of the official Debian kernels are available in /boot, named after the kernel release. Copy one of them to the linux directory as .config. Connect all your devices and run the command:
make localmodconfig
Create a custom Debian kernel package. Either export CONCURRENCY_LEVEL=2 or use -j 2 with make-kpkg (use the number of threads in your CPU):
fakeroot make-kpkg --initrd kernel_image
Add kernel_headers to the fakeroot command if you need headers. Install the kernel package with Gdebi. To make the custom kernel boot, add this line to /etc/initramfs-tools/modules:
unix
And run:
sudo update-initramfs -u
Reboot.

Thanks, but each time I upgrade the kernel on my machine I would need to compile it like this, so at best it is a temporary solution. So it remains broken across the board. I can't believe I am the only one who finds this annoying; this is a very common computer.

Created attachment 281895 [details]
debugfs output for all possible states of ethernet and wifi switch
I have my bios set up to ignore the switch but disable wifi when Ethernet is connected.
It seems this behaviour is controlled by `static int dell_rfkill_set(void *data, bool blocked)` in `/drivers/platform/x86/dell-laptop.c` so I added the authors to the CC list. For me it works perfectly for disabling wifi when I connect ethernet as set in my bios, but it indeed does not ignore the wifi switch. What raises my suspicion is that status and hwswitch_state are always identical, while their bits seem to have different meanings and they come from different locations. Is this normal? I would be willing to try to fix this issue myself. I'm proficient with C but new to kernel development, so I would appreciate it if someone could point me in the right direction.

The switch is controlled by dell-rbtn. On my laptop, the WiFi button is entirely controlled by the BIOS when this module is not available; on some other systems WiFi buttons/switches stop working altogether. So maybe try to blacklist/unload the module and see if anything changes. This driver has been available for quite some time and I don't see any significant change near release 4.13, both in dell-laptop and dell-rbtn, so I don't know why you are having issues only now.

I wanted to add that the laptops of the Latitude and Precision series have a fallback mechanism to handle the WiFi switch/button which is used in case dell-rbtn is not available. This fallback mechanism is actually all there was before dell-rbtn superseded it. You can force this fallback mechanism by removing/unloading/blacklisting dell-rbtn. If we find that without dell-rbtn everything works as you expect, then one possibility is that your distro started to include dell-rbtn only recently. If this is not the case, then we need to find other causes of this unwanted behavior.

On my laptop, disabling dell_rbtn does not disable the wifi switch. Disabling dell_laptop does disable the switch, but freezes my laptop after a minute or so. My suspicion is still directed at hwswitch_state, which reads identical to status.
This means that "Hardware switch supported" in status also controls "Wifi controlled by switch". I'm pretty sure the value for hwswitch_state is bogus. Status says "WiGig supported" is 0, but "WiGig controlled by switch" is 1 in hwswitch_state because it is just reading the status value. I tried to read the code where hwswitch_state comes from, but it just puts some opaque constants in dell_fill_request and dell_send_request. Is there documentation on these methods? In case these come directly from the bios, this might point to a bios bug? I did update my bios at some point, not sure when the switch started acting up.

(In reply to Pepijn de Vos from comment #8)
> Disabling dell_laptop does disable the switch, but freezes my laptop after a
> minute or so.

I don't think this should happen.

> My suspicion is still directed at hwswitch_state, which reads identical to
> status. This means that "Hardware switch supported" in status also controls
> "Wifi controlled by switch".
>
> I'm pretty sure the value for hwswitch_state is bogus. Status says "WiGig
> supported" is 0, but "WiGig controlled by switch" is 1 in hwswitch_state
> because it is just reading the status value.
>
> I tried to read the code where hwswitch_state comes from, but it just puts
> some opaque constants in dell_fill_request and dell_send_request. Is there
> documentation on these methods?

Most of the information was taken from the libsmbios that Dell released, as also mentioned in the header of the source file. Part of the libsmbios comments are also available in dell-laptop.c. dell_send_request() uses SMM under the hood, so the data should come from the BIOS.

> In case these come directly from the bios, this might point to a bios bug? I
> did update my bios at some point, not sure when the switch started acting up.

I don't know if the issue is due to some BIOS update or something else, but that's definitely a possibility.
However, this means we have two different bugs, because Chris observed a regression, since everything works as expected when using an older kernel. 'git bisect' can be very handy in situations like this one. A BIOS update can definitely cause issues, but I can't say if it's the actual source of the issue.

Thanks for looking into this, as it seems it would otherwise be a persistent thing going into the future. This was repeatable for me between the 4.10 kernel and the 4.13 kernel (it started in one of the 4.13 intermediate point releases). For positive test results, try switching between 4.10 and maybe 4.15 just to be sure.

The easiest way to demonstrate this is to just boot up using the ISO for Linux Mint 18.3. This natively uses the 4.10 kernel, before any kernel changes or upgrades, and the BIOS function to disable the HW switch works as expected.
https://linuxmint.com/edition.php?id=246

With Linux Mint 18.3 installed, I could break/fix this functionality just by switching between the different kernel versions. It actually became broken somewhere in the middle of the 4.13 point releases. Afterwards all kernel releases and point releases exhibited that behavior. I am now using Linux Mint 19.1 and it only allows for 4.15 or 4.18 kernel versions, so it is always broken now.

For me this also worked at some previous time; the break could be the upgrade from Ubuntu 16.04 to 18.04. I'll try the Mint ISO when I have time. The freeze is something completely unrelated that also happens in Windows when I wake from hibernate, seemingly related to graphics drivers... But it definitely appears dell_laptop is controlling more than just a switch.

Chris, could you share the output of the debugfs for the working kernel version? In your initial post I see the same thing, where hwswitch_state==status. I wonder if this is also the case for the working kernel.
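The hwswitch_state==status check asked about above can be done mechanically. This is a minimal sketch, assuming the dump contains `status: 0x...` and `hwswitch_state: 0x...` fields as in the debugfs output posted in this thread. The function names and the SUSPECT_DUMP sample are hypothetical; only the bit names and the working values (0x1011D and 0x8) come from the thread.

```python
import re

# Bit names for hwswitch_state, copied from the debugfs dump in this thread.
HWSWITCH_BITS = {
    0: "Wifi controlled by switch",
    1: "Bluetooth controlled by switch",
    2: "WWAN controlled by switch",
    3: "UWB controlled by switch",
    4: "WiGig controlled by switch",
    7: "Wireless switch config locked",
    8: "Wifi locator enabled",
    15: "Wifi locator setting locked",
}


def parse_rfkill_dump(text):
    """Return the integer values of `status` and `hwswitch_state` found in text."""
    values = {}
    for key in ("status", "hwswitch_state"):
        match = re.search(rf"\b{key}:\s*(0x[0-9A-Fa-f]+)", text)
        if match:
            values[key] = int(match.group(1), 16)
    return values


def decode_bits(value, names):
    """Map each named bit to 0 or 1."""
    return {name: (value >> bit) & 1 for bit, name in names.items()}


def state_mirrors_status(values):
    """True when both fields are present and carry the identical raw value."""
    return "status" in values and values.get("status") == values.get("hwswitch_state")


# Values reported for the working 4.10 kernel later in this thread.
WORKING_DUMP = "return: 0 status: 0x1011D hwswitch_return: 0 hwswitch_state: 0x8"
# Hypothetical dump showing the symptom (not copied from a real machine).
SUSPECT_DUMP = "return: 0 status: 0x1011D hwswitch_return: 0 hwswitch_state: 0x1011D"

for label, dump in (("working", WORKING_DUMP), ("suspect", SUSPECT_DUMP)):
    values = parse_rfkill_dump(dump)
    print(label, "mirrors status:", state_mirrors_status(values))
    for name, bit in decode_bits(values["hwswitch_state"], HWSWITCH_BITS).items():
        print(f"  {name}: {bit}")
```

With the suspect sample, decoding the status value through the hwswitch_state bit names reports "Wifi controlled by switch" and "WiGig controlled by switch" as 1, which is the kind of misreading described above.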
Okay, here is the same info for the bootable USB "live session" (uninstalled) for Linux Mint 18.3, as per my last post:

mint@mint ~ $ inxi -Fxz
System: Host: mint Kernel: 4.10.0-38-generic x86_64 (64 bit gcc: 5.4.0) Desktop: Cinnamon 3.6.6 (Gtk 3.18.9-1ubuntu3.3) Distro: Linux Mint 18.3 Sylvia
Machine: System: Dell (portable) product: Latitude E6420 v: 01 Mobo: Dell model: 0X8R3Y Bios: Dell v: A25 date: 03/06/2018
CPU: Dual core Intel Core i7-2640M (-MCP-) cache: 4096 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 11174 clock speeds: max: 3500 MHz 1: 1040 MHz 2: 3139 MHz
Graphics: Card: Intel 2nd Generation Core Processor Family Integrated Graphics Controller bus-ID: 00:02.0 Display Server: X.Org 1.18.4 drivers: intel (unloaded: fbdev,vesa) Resolution: 1600x900@60.00hz GLX Renderer: Mesa DRI Intel Sandybridge Mobile GLX Version: 3.0 Mesa 17.0.7 Direct Rendering: Yes
Audio: Card Intel 6 Series/C200 Series Family High Definition Audio Controller driver: snd_hda_intel bus-ID: 00:1b.0 Sound: Advanced Linux Sound Architecture v: k4.10.0-38-generic
Network: Card-1: Intel 82579LM Gigabit Network Connection driver: e1000e v: 3.2.6-k port: 3080 bus-ID: 00:19.0 IF: enp0s25 state: down mac: <filter>
  Card-2: Intel Centrino Advanced-N 6205 [Taylor Peak] driver: iwlwifi bus-ID: 02:00.0 IF: wlp2s0 state: down mac: <filter>
Drives: HDD Total Size: 287.3GB (2.7% used)
  ID-1: /dev/sda model: Samsung_SSD_860 size: 256.1GB temp: 0C
  ID-2: USB /dev/sdb model: Cruzer_Glide size: 15.6GB temp: 0C
  ID-3: USB /dev/sdc model: Cruzer_Glide size: 15.7GB temp: 0C
Partition: ID-1: swap-1 size: 2.56GB used: 0.00GB (0%) fs: swap dev: /dev/sda2
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: System Temperatures: cpu: 65.0C mobo: N/A Fan Speeds (in rpm): cpu: N/A
Info: Processes: 192 Uptime: 1 min Memory: 564.7/7860.9MB Init: systemd runlevel: 5 Gcc sys: 5.4.0 Client: Shell (bash 4.3.481) inxi: 2.2.35
mint@mint ~ $
------------------------------------
/sys/kernel/debug/dell_laptop/rfkill

return: 0
status: 0x1011D
Bit 0 : Hardware switch supported: 1
Bit 1 : Wifi locator supported: 0
Bit 2 : Wifi is supported: 1
Bit 3 : Bluetooth is supported: 1
Bit 4 : WWAN is supported: 1
Bit 5 : Wireless keyboard supported: 0
Bit 6 : UWB supported: 0
Bit 7 : WiGig supported: 0
Bit 8 : Wifi is installed: 1
Bit 9 : Bluetooth is installed: 0
Bit 10: WWAN is installed: 0
Bit 11: UWB installed: 0
Bit 12: WiGig installed: 0
Bit 16: Hardware switch is on: 1
Bit 17: Wifi is blocked: 0
Bit 18: Bluetooth is blocked: 0
Bit 19: WWAN is blocked: 0
Bit 20: UWB is blocked: 0
Bit 21: WiGig is blocked: 0

hwswitch_return: 0
hwswitch_state: 0x8
Bit 0 : Wifi controlled by switch: 0
Bit 1 : Bluetooth controlled by switch: 0
Bit 2 : WWAN controlled by switch: 0
Bit 3 : UWB controlled by switch: 1
Bit 4 : WiGig controlled by switch: 0
Bit 7 : Wireless switch config locked: 0
Bit 8 : Wifi locator enabled: 0
Bit 15: Wifi locator setting locked: 0
---------------------------------------
/sys/kernel/debug/iwlwifi/0000:02:00.0/trans/rfkill
[file doesn't exist in directory for USB drive 18.3 "live session"]

Chris

Right, so working kernels have working hwswitch_state. Can confirm that 4.13 works for me too.

I've successfully compiled my first kernel from source. Currently bisecting through it to find the offending commit. Only a dozen steps left, so bear with me...

Created attachment 282031 [details]
Git bisect log of /drivers/platfrom/x86/dell*
I bisected my way from 4.13 to 4.15 selecting only the files related to the dell platform.
I ended up at commit 5246741a3f2e0285394cf74f3105cb252b8f38ad, which seems kinda trivial and unrelated: it just moves an allocation around.
Worth noting is that for some revisions the file /sys/kernel/debug/dell_laptop/rfkill did not exist or returned a "no such device" error. In these cases I tested with the switch itself rather than with the debug output. This means that I may have bisected my way to another unrelated bug. It could be the module just failed and the fallback mechanism was used.
I can either try again with "git bisect skip" in cases where the debugfs fails, or widen the scope to include other files, because I'm not 100% sure the problem is in the dell platform files either.
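To illustrate why "git bisect skip" may only narrow the search to a range of commits rather than a single one, here is a toy model in Python. It is a sketch under stated assumptions, not how git itself is implemented: the commit names c0..c9, the `verdict` function and the untestable window are all hypothetical, and git uses its own heuristic to choose which commit to test next.

```python
def bisect_with_skip(commits, test):
    """Toy bisection over commits ordered oldest to newest.

    commits[0] is assumed known good and commits[-1] known bad.
    test(commit) returns "good", "bad" or "skip" (untestable).
    Returns every commit that could still be the first bad one.
    """
    lo, hi = 0, len(commits) - 1  # newest known good, oldest known bad
    skipped = set()
    while True:
        # Commits strictly between the bounds that have not been skipped yet.
        candidates = [i for i in range(lo + 1, hi) if i not in skipped]
        if not candidates:
            break
        middle = (lo + hi) / 2
        pick = min(candidates, key=lambda i: abs(i - middle))
        result = test(commits[pick])
        if result == "good":
            lo = pick
        elif result == "bad":
            hi = pick
        else:
            skipped.add(pick)
    return commits[lo + 1 : hi + 1]


commits = [f"c{i}" for i in range(10)]
FIRST_BAD = 4           # hypothetical commit that introduced the bug
UNTESTABLE = range(3, 7)  # hypothetical window where the debugfs file is missing


def verdict(commit):
    index = commits.index(commit)
    if index in UNTESTABLE:
        return "skip"
    return "bad" if index >= FIRST_BAD else "good"


def verdict_all_testable(commit):
    return "bad" if commits.index(commit) >= FIRST_BAD else "good"


print(bisect_with_skip(commits, verdict))               # ['c3', 'c4', 'c5', 'c6', 'c7']
print(bisect_with_skip(commits, verdict_all_testable))  # ['c4']
```

In this model the skipped window plus the first testable bad commit remain as suspects, so further narrowing needs some way to make the skipped commits testable.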
I had a similar thought - if this only applies to just my generation of the Dell Latitude, the E6420 and related units (E6400, E6500 series, etc), or perhaps even later models which may have more significant changes. Back around 2017 or so, libSMBIOS was a package from Dell which needed to be compiled and installed. I wanted to extend the time on for the keyboard backlight beyond 30 seconds or so. These days it is the same as the monitor settings. This HW switch setting at least seems to be a regression of some sort. Thanks for your time...

Commit 5246741a3f2e0285394cf74f3105cb252b8f38ad is right after a major refactoring of the driver (https://www.spinics.net/lists/platform-driver-x86/msg13672.html). That buffer is the one used to send the smbios request, so either the request was failing and you were getting out of dell_setup_rfkill() with an error or you were getting garbage in return. This would explain why you were not seeing the rfkill file.

You are right. I tried again using "git bisect skip", and I narrowed it down to a range of commits just before that fix. This was a lengthy process because it more or less degraded to a linear search.

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
f2645fa317b8905b8934f06a0601d5b7fa66aba0
1f8543a5d602b816b9b64a62cafd6caae2af4ca6
ce7ff1cffdaf82354aca5f4c8691e5c85474fbde
307ab2a99d190d3a7949258b8551b66887ce8cf4
da1f607ed6e6a904463396bb6a28bf96584c61cc
1a258e670434f404a4500b65ba1afea2c2b29bba
8b9528a6d9a901b9f933231505fef5630e80ce5a
549b4930f057658dc50d8010e66219233119a4d8
868b8d33f91e431b1961a35baa6b5022639067f3
5246741a3f2e0285394cf74f3105cb252b8f38ad
We cannot bisect more!

Then I did a magic trick: I used "git bisect log", edited out all the skips and replayed the bisection. But I cherry-picked the fix from 5246741a3f2e0285394cf74f3105cb252b8f38ad onto the broken commits, allowing me to narrow down the real culprit.
Ladies, gentlemen, and other beings, without further ado, the offending commit is

549b4930f057658dc50d8010e66219233119a4d8
platform/x86: dell-smbios: Introduce dispatcher for SMM calls

Now I badly need some sleep, but the next step is figuring out what's actually wrong. It'd be so cool to land a patch in the kernel. I'm completely new to kernel debugging though. Where is my gdb and printf?? Any guidance appreciated.

Created attachment 282045 [details]
possible patch to fix issue
Can you see if this patch helps to fix the issue for you?
I believe there might have been a logic error when the driver was converted in that commit.
Looking at the commit with fresh eyes, this indeed seems to be the error. I tried your patch and it works! I'm glad we solved the problem, but honestly a bit salty because I was so excited to debug and fix my first kernel bug. So close...

When you send that patch to LKML, please add a Fixes: tag with the commit which broke the driver, so that the patch gets propagated to -stable releases. Anyway, nice catch! You can add my Acked-by.

Created attachment 282047 [details]
Patch for v4.15 (The kernel version on Ubuntu 18.04 LTS)
Thanks guys, glad to hear the good news. I've sent it to the ML with these extra comments, Pali.
https://lore.kernel.org/lkml/1553696734-31282-1-git-send-email-mario.limonciello@dell.com/T/#u

Thanks to all. That was the first time I reported a kernel-level issue.

Pepijn, I will try your patched kernel. What is the expected path forward from here? It seems there has not been much action recently.