Bug 219514 - PC does not resume from suspend, bisect points to btusb/mediatek
Summary: PC does not resume from suspend, bisect points to btusb/mediatek
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Bluetooth
Hardware: All Linux
Importance: P3 normal
Assignee: linux-bluetooth@vger.kernel.org
URL:
Keywords:
Duplicates: 219290
Depends on:
Blocks:
 
Reported: 2024-11-19 19:43 UTC by Tony Houghton
Modified: 2025-01-09 18:15 UTC
CC List: 14 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg shortly after booting with the "bad" kernel (112.36 KB, text/plain)
2024-11-19 19:43 UTC, Tony Houghton
dmesg shortly after booting, then successfully suspending and resuming with the "good" kernel (121.86 KB, text/plain)
2024-11-19 19:45 UTC, Tony Houghton

Description Tony Houghton 2024-11-19 19:43:46 UTC
Created attachment 307246
dmesg shortly after booting with the "bad" kernel

In recent kernels my system keeps hanging when trying to resume from suspend. The fans power up, as do my keyboard LEDs, but the displays stay off and pressing Caps Lock does not toggle its LED. This means I can't get a log when the problem occurs.

My motherboard is an MSI MAG X670E TOMAHAWK WIFI with an onboard Mediatek wireless device, and I'm using Arch Linux.

lspci: 0f:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
lsusb: 0e8d:0616 MediaTek Inc. Wireless_Device

git bisect points to ceac1cb0259de682d78f5c784ef8e0b13022e9d9 as the first bad commit. It's been difficult to pinpoint it, because the bug isn't 100% consistent, but this commit has exhibited the bug every time I've tried it, and subsequent revisions nearly always do. I've tested the previous commit, 6dc22ab9f085ae165e4ce89d61fb426f94e8a969, several times, and it's successfully resumed every time.
Comment 1 Tony Houghton 2024-11-19 19:45:10 UTC
Created attachment 307247
dmesg shortly after booting, then successfully suspending and resuming with the "good" kernel
Comment 2 Tony Houghton 2024-12-15 21:06:05 UTC
I've copied btusb.c and btmtk.c from 6dc22ab9f085 to a checkout of 6.13-rc2 and changed a few lines to make it compatible with some things that have changed since then:

diff --git a/drivers/bluetooth/btmtk.c b/drivers/bluetooth/btmtk.c
index fe3b892f6c6e..9eeddbb7d991 100644
--- a/drivers/bluetooth/btmtk.c
+++ b/drivers/bluetooth/btmtk.c
@@ -6,7 +6,7 @@
 #include <linux/firmware.h>
 #include <linux/usb.h>
 #include <linux/iopoll.h>
-#include <asm/unaligned.h>
+#include <linux/unaligned.h>
 
 #include <net/bluetooth/bluetooth.h>
 #include <net/bluetooth/hci_core.h>
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 034256c399dd..0e5cc454e2f9 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -17,7 +17,7 @@
 #include <linux/suspend.h>
 #include <linux/gpio/consumer.h>
 #include <linux/debugfs.h>
-#include <asm/unaligned.h>
+#include <linux/unaligned.h>
 
 #include <net/bluetooth/bluetooth.h>
 #include <net/bluetooth/hci_core.h>
@@ -3887,8 +3887,8 @@ static int btusb_probe(struct usb_interface *intf,
        if (id->driver_info & BTUSB_WIDEBAND_SPEECH)
                set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
 
-       if (id->driver_info & BTUSB_VALID_LE_STATES)
-               set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks);
+       if (!(id->driver_info & BTUSB_VALID_LE_STATES))
+               set_bit(HCI_QUIRK_BROKEN_LE_STATES, &hdev->quirks);
 
        if (id->driver_info & BTUSB_DIGIANSWER) {
                data->cmdreq_type = USB_TYPE_VENDOR;

Some guesswork was involved, but it seems to work for me. I'd like to try to get to the bottom of the issue so I don't have to keep patching my kernel. Are there any options I could try?

I've got plenty of experience with C, but not with the kernel, so if you could give me some guidance such as a summary of what changed in ceac1cb0259d, what code paths are taken during suspend/resume and any code tweaks I can try, it would be much appreciated.
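
A possible way to reproduce the starting point for the diff above, sketched under assumptions: the working directory is a clone of the mainline kernel tree that carries the v6.13-rc2 tag, and the function name `restore_old_bt_drivers` and the `GIT` override are hypothetical names used only for this illustration. `git checkout <commit> -- <paths>` copies the two files as they were at 6dc22ab9f085; the include and quirk changes shown in the diff still have to be applied by hand afterwards.

```shell
#!/bin/sh
# Sketch only. GIT can be overridden (for example GIT="echo git")
# to print the commands instead of running them.
GIT=${GIT:-git}

restore_old_bt_drivers() {
    # Start from the release candidate mentioned in the comment.
    $GIT checkout v6.13-rc2 &&
    # Overwrite only the two driver files with their last known-good versions.
    $GIT checkout 6dc22ab9f085 -- \
        drivers/bluetooth/btusb.c drivers/bluetooth/btmtk.c
}
```

Running `git diff` after the manual fix-ups should then show only the small compatibility changes, which makes it easier to see what differs from the known-good drivers.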
Comment 4 Olivier Croquette 2025-01-02 23:55:07 UTC
I also bisected this and came to commit d019930b0049fc2648a6b279893d8ad330596e81, which is in the same area. I also found similar reports:

https://bugzilla.kernel.org/show_bug.cgi?id=219290
https://bugzilla.redhat.com/show_bug.cgi?id=2314036
https://bbs.archlinux.org/viewtopic.php?id=295916
https://bbs.archlinux.org/viewtopic.php?id=299987
https://discussion.fedoraproject.org/t/kernel-6-11-3-200-fc40-unable-to-resume-from-suspend-when-bluetooth-enabled/134008/10
https://discussion.fedoraproject.org/t/system-cannot-wake-up/134199/39

A workaround that works for me and does not require patching the kernel is the following service:

# /etc/systemd/system/bt-fix.service
# 
# Author: Bojan Kseneman
# https://discussion.fedoraproject.org/t/kernel-6-11-3-200-fc40-unable-to-resume-from-suspend-when-bluetooth-enabled/134008/17

[Unit]
Description=Disable Bluetooth before going to sleep
Before=sleep.target
StopWhenUnneeded=yes

[Service]
Type=oneshot
RemainAfterExit=yes

ExecStart=/usr/sbin/rfkill block bluetooth
ExecStop=/usr/sbin/rfkill unblock bluetooth

[Install]
WantedBy=sleep.target



I also wrote to the linux-bluetooth mailing list, with a stack trace from a kernel oops:

https://lore.kernel.org/linux-bluetooth/073c3b772abe84d480913495eea0c4da73607d6e.camel@croquette.de/T/#u
Comment 5 Artem S. Tashkinov 2025-01-04 05:36:43 UTC
*** Bug 219290 has been marked as a duplicate of this bug. ***
Comment 6 Olivier Croquette 2025-01-04 06:48:30 UTC
Quick update: the workaround with rfkill does not seem to work reliably for me. I am left with either not suspending or using an old kernel.
Comment 7 kernel-7xes 2025-01-04 15:27:53 UTC
Since Bug 219290 is marked as a duplicate of this bug, I would like to mention that I also needed to run "rfkill block wifi" to fix resume from hibernate; "rfkill block bluetooth" only fixed resume from suspend.
Comment 8 Bojan Kseneman 2025-01-06 08:53:05 UTC
If you also need to kill wifi, you should replace "rfkill block bluetooth" with "rfkill block all" in the script. It should kill both BT and wifi.

Anyway, I've installed kernel 6.12.8 today and disabled the service and cannot reproduce this issue anymore. Can someone else also confirm this?
Comment 9 Olivier Croquette 2025-01-06 10:18:25 UTC
Thank you Bojan. I now do this to suspend:

sudo rfkill block all && /usr/bin/systemctl suspend

And after waking up:

sudo rfkill unblock all

So far, it worked twice in a row. Let’s see!
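
The two manual steps above could be wrapped in small helpers. This is a minimal sketch, not a tested fix: the function names and the `RUN` dry-run variable are hypothetical, and the commands themselves are the ones quoted in this comment.

```shell
#!/bin/sh
# Sketch only: wraps the two manual steps from the comment above.
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

suspend_with_radios_blocked() {
    # Only suspend if blocking the radios succeeded.
    $RUN sudo rfkill block all && $RUN /usr/bin/systemctl suspend
}

unblock_radios() {
    # Run this manually once the machine is awake again.
    $RUN sudo rfkill unblock all
}
```

The unblock step is kept separate on purpose, since it was run by hand after waking up.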
Comment 10 Bojan Kseneman 2025-01-06 13:09:06 UTC
Yes, use "all" instead of "bluetooth" in both places. However, you don't need to call `/usr/bin/systemctl suspend`, as the service is hooked to sleep.target anyway.
Comment 11 Olivier Croquette 2025-01-06 14:41:23 UTC
I just tested with 6.12.8 and was able to resume from suspend 3 times in a row. That looks good. However it took a long time to get the BT devices to work again. From dmesg:

[  263.253098] PM: suspend exit
[  263.340683] pci_bus 0000:03: Allocating resources
[  263.348625] pci_bus 0000:03: Allocating resources
[  263.365801] r8169 0000:0b:00.0 enp11s0: Link is Down
[  263.386884] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20241106163512
[  263.395274] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-b00:00: attached PHY driver (mii_bus:phy_addr=r8169-0-b00:00, irq=MAC)
[  263.575549] r8169 0000:0b:00.0 enp11s0: Link is Down
[  266.524999] r8169 0000:0b:00.0 enp11s0: Link is Up - 1Gbps/Full - flow control rx/tx
[  284.169397] Bluetooth: hci0: Device setup in 20437679 usecs
[  284.169402] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
[  284.444262] Bluetooth: hci0: AOSP extensions version v1.00
[  284.444270] Bluetooth: hci0: AOSP quality report is supported
[  284.444433] Bluetooth: MGMT ver 1.23
[  292.361312] input: Logitech Wireless Mouse MX Master 3 as /devices/virtual/misc/uhid/.../input/input24
[  292.361454] logitech-hidpp-device input,hidraw4: BLUETOOTH HID v0.15 Keyboard [Logitech Wireless Mouse MX Master 3]
[  292.387566] logitech-hidpp-device: HID++ 4.5 device connected.


So it takes around 20 seconds to set up hci0, and 8 more seconds to find the mouse (though that may be because the mouse was in sleep mode too).

When I resume I see a black screen for a long time, maybe the 20 seconds.
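
The delay can be read off the log mechanically. The following is a rough sketch that assumes dmesg-style lines with a leading `[seconds.micros]` timestamp, as in the excerpt above; the function name `bt_resume_delay` is hypothetical. It reports the setup time the driver itself logs, plus the time elapsed since the preceding "PM: suspend exit" line.

```shell
#!/bin/sh
# Sketch only: summarise Bluetooth setup delay from dmesg-style text on stdin.
bt_resume_delay() {
    awk '
        # Return the leading [seconds.micros] timestamp as a number, or -1.
        function stamp(line) {
            if (!match(line, /^\[ *[0-9]+\.[0-9]+\]/)) return -1
            return substr(line, RSTART + 1, RLENGTH - 2) + 0
        }
        /PM: suspend exit/ { resume = stamp($0); have_resume = 1 }
        /Bluetooth: hci[0-9]+: Device setup in [0-9]+ usecs/ {
            # The number of microseconds is the field just before "usecs".
            for (i = 2; i <= NF; i++) if ($i == "usecs") usecs = $(i - 1)
            if (have_resume)
                printf "setup=%.2fs since_resume=%.2fs\n", usecs / 1000000, stamp($0) - resume
            else
                printf "setup=%.2fs\n", usecs / 1000000
        }
    '
}
```

Fed the excerpt above on stdin, this prints `setup=20.44s since_resume=20.92s`, which matches the roughly 20 seconds estimated in this comment.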
Comment 12 Bojan Kseneman 2025-01-07 09:05:28 UTC
That's odd, it seems to work normally on MT7922. Resuming all devices takes about 3s for me, but then again, I don't have a Bluetooth mouse.

[ 1557.845962] PM: resume devices took 3.106 seconds
Comment 14 Olivier Croquette 2025-01-09 18:15:25 UTC
Thanks for the info, this is the commit then:

commit f5c5661f02b5539d88aea8497f8d0835d165e945
Author: Chris Lu <chris.lu@mediatek.com>
Date:   Mon Sep 23 16:47:05 2024 +0800

    Bluetooth: btusb: mediatek: change the conditions for ISO interface
    
    commit defc33b5541e0a7e45cc2d99d72fbe80a597afc5 upstream.
    
    Change conditions for Bluetooth driver claiming and releasing usb
    ISO interface for MediaTek ISO data transmission.
    
    Signed-off-by: Chris Lu <chris.lu@mediatek.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Cc: Fedor Pchelkin <boddah8794@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

One more discussion about this issue:
https://www.reddit.com/r/Fedora/comments/1hv281d/mt7922_no_longer_causes_kernel_panic_on_resume/
