Bug 215347

Summary: btintel: AX200 (8087:0029): hci crashes and can't recover after repeated rfkill on/off
Product: Drivers Reporter: Hui Wang (hui.wang)
Component: BluetoothAssignee: linux-bluetooth (linux-bluetooth)
Status: RESOLVED CODE_FIX    
Severity: blocking CC: luiz.dentz, tedd.an
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 5.16-rc5 Subsystem:
Regression: No Bisected commit-id:
Attachments: testing script
picture of gnome bluetooth setting
dmesg of the kernel bluetooth-next
btmon log

Description Hui Wang 2021-12-17 11:42:45 UTC
Created attachment 300051 [details]
testing script

We have a couple of Lenovo machines which have Bluetooth hci (8087:0029), If we run a script like below:
for a in {1..100}
do
    echo $a
    rfkill block $HCI0_ID
    sleep 1
    rfkill unblock $HCI0_ID
    sleep 1
done

The kernel will print "hci0: Failed to read MSFT supported features (-110)" after several round of rfkill block/unblock, and the bluetooth driver can't work anymore from that moment. Sometimes 'rmmod btusb;modprobe btusb" could make the bluetooth driver work again, sometimes need to reboot.

We already updated the kernel and linux-firmware to the latest, but still could reproduce above issue.

And we run the same test on other machines with different Bluetooth hci like 8087:0026, they all work well with the testing script.

The 8087:0029 hci loads the intel/ibt-20-1-3.sfi, we suspect this is an issue on the bluetooth firmware ibt-20-1-3.sfi, since 8087:0026 shares the same kernel driver with 0029, but only 0029 has this issue.


And the testing script is attached.
Comment 1 Tedd An 2021-12-18 00:02:34 UTC
Hi Hui Wang,

What's the kernel version?
I tried with the tip of the bluetooth-next(https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/) and didn't see the issue.

Could you capture the btmon log and dmesg log?

Regards,
Comment 2 Hui Wang 2021-12-18 06:12:45 UTC
Hi Tedd,

Forgot to mention, before running the bttest.sh, need to open the gnmoe bluetooth setting, so the testing steps are:
- boot the system
- open a terminal and run 'dmesg -w'
- run gnome-settings->bluetooth (please see the attached picture)
- run bttest.sh
Comment 3 Hui Wang 2021-12-18 06:13:51 UTC
Created attachment 300055 [details]
picture of gnome bluetooth setting
Comment 4 Hui Wang 2021-12-18 06:14:45 UTC
Created attachment 300057 [details]
dmesg of the kernel bluetooth-next
Comment 5 Hui Wang 2021-12-18 06:15:09 UTC
Created attachment 300059 [details]
btmon log
Comment 6 Hui Wang 2021-12-18 06:36:22 UTC
And if the desktop is not gnome, we could replace "gnome-settings->bluetooth" with a command-line "bluetoothctl scan on", so the test steps are:
- boot the system
- open a terminal and run 'dmesg -w'
- open another terminal and run 'bluetoothctl scan on' and wait for 5 seconds
- run bttest.sh
Comment 7 Hui Wang 2021-12-18 06:46:20 UTC
(In reply to Hui Wang from comment #6)
> And if the desktop is not gnome, we could replace
> "gnome-settings->bluetooth" with a command-line "bluetoothctl scan on", so
> the test steps are:
> - boot the system
> - open a terminal and run 'dmesg -w'
> - open another terminal and run 'bluetoothctl scan on' and wait for 5 seconds
> - run bttest.sh

Please ignore this comment, "bluetoothctl scan on" doesn't help reproduce the crashing.
Comment 8 Tedd An 2021-12-22 22:56:34 UTC
It seems like kernel oops in the dmesg and it may causes the issue.

[   46.401807] ------------[ cut here ]------------
[   46.401808] kernel BUG at net/core/skbuff.c:113!
[   46.401819] invalid opcode: 0000 [#1] PREEMPT SMP
[   46.401826] CPU: 7 PID: 142 Comm: kworker/u33:0 Not tainted 5.16.0-rc1+ #1
[   46.401831] Hardware name: LENOVO 20U8S0PQ00/20U8S0PQ00, BIOS R19ET36W (1.20 ) 07/12/2021
[   46.401835] Workqueue: hci0 hci_rx_work [bluetooth]
[   46.401890] RIP: 0010:skb_panic+0x4c/0x4e
[   46.401899] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 e8 00 a7 98 e8 46 11 fb ff <0f> 0b 48 8b 55 08 48 c7 c1 60 b1 71 98 e8 a2 ff ff ff 48 c7 c6 b0
[   46.401903] RSP: 0018:ffffa22980aa3c60 EFLAGS: 00010246
[   46.401908] RAX: 0000000000000088 RBX: 0000000000000000 RCX: 0000000000000000
[   46.401911] RDX: 0000000000000000 RSI: ffffffff989a0551 RDI: 00000000ffffffff
[   46.401914] RBP: ffffa22980aa3c80 R08: 0000000000000000 R09: ffffa22980aa3a50
[   46.401917] R10: 0000000000000001 R11: 0000000000000001 R12: 00000000000000a5
[   46.401919] R13: 00000000000000a5 R14: ffff9409c4e7c300 R15: 00000000000000aa
[   46.401923] FS:  0000000000000000(0000) GS:ffff940ccefc0000(0000) knlGS:0000000000000000
[   46.401927] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.401930] CR2: 00007fae7c01b398 CR3: 000000040c20f000 CR4: 0000000000350ee0
[   46.401933] Call Trace:
[   46.401937]  <TASK>
[   46.401942]  skb_put.cold+0x10/0x10
[   46.401947]  mgmt_device_found+0x3c8/0x4c0 [bluetooth]
[   46.402004]  hci_extended_inquiry_result_evt+0x162/0x230 [bluetooth]
[   46.402054]  hci_event_packet+0x2cd/0x4b0 [bluetooth]
[   46.402099]  ? hci_remote_oob_data_request_evt+0x1f0/0x1f0 [bluetooth]
[   46.402144]  hci_rx_work+0x1b2/0x3d0 [bluetooth]
[   46.402190]  ? queue_delayed_work_on+0x3d/0x50
[   46.402198]  process_one_work+0x1ec/0x3c0
[   46.402204]  worker_thread+0x4d/0x3d0
[   46.402210]  kthread+0x17a/0x1a0
[   46.402214]  ? process_one_work+0x3c0/0x3c0
[   46.402218]  ? set_kthread_struct+0x40/0x40
[   46.402223]  ret_from_fork+0x22/0x30
[   46.402232]  </TASK>
[   46.402234] Modules linked in: rfcomm bnep nls_iso8859_1 intel_rapl_msr intel_rapl_common snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi uvcvideo snd_hda_codec videobuf2_vmalloc snd_hda_core videobuf2_memops btusb snd_hwdep videobuf2_v4l2 btrtl kvm_amd btbcm snd_pcm input_leds btintel videobuf2_common kvm snd_seq_midi btmtk snd_seq_midi_event bluetooth videodev snd_rawmidi joydev iwlmvm ecdh_generic think_lmi thinkpad_acpi mc rapl ecc serio_raw firmware_attributes_class efi_pstore wmi_bmof snd_seq nvram mac80211 platform_profile libarc4 ledtrig_audio snd_seq_device iwlwifi snd_timer snd_rn_pci_acp3x ucsi_acpi ipmi_devintf snd typec_ucsi snd_pci_acp3x cfg80211 k10temp ipmi_msghandler soundcore typec mac_hid sch_fq_codel msr parport_pc ppdev lp parport sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
[   46.402343]  libcrc32c raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log amdgpu iommu_v2 gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect rtsx_pci_sdmmc sysimgblt fb_sys_fops crc32c_intel psmouse cec rc_core drm xhci_pci i2c_piix4 xhci_pci_renesas nvme rtsx_pci r8169 realtek nvme_core wmi video i2c_scmi
[   46.402425] ---[ end trace e9aa209601e9071e ]---
Comment 9 Luiz Von Dentz 2021-12-22 23:45:43 UTC
This is like fixed by:

https://lore.kernel.org/linux-bluetooth/20211213212650.2067066-1-luiz.dentz@gmail.com/T/#t

It was fixup on the original patch and force pushed to bluetooth-next yesterday.
Comment 10 Hui Wang 2021-12-23 05:28:21 UTC
@Tedd and Luiz,


Got it, will re-test with the latest bluetooth-next. And we didn't experience the kernel oops with ubuntu kernel 5.13 and 5.14, this kernel oops only happen in the bluetooth-next kernel, so I guess the root cause is not the kernel oops.

Anyway, I will try to catch the log without the kernel oops (with the fix of comment 9).

Thanks.
Comment 11 Hui Wang 2021-12-23 13:42:33 UTC
Indeed, if kernel oops doesn't happen, this issue can't be reproduced with bluetooth-next kernel anymore.

I will bisect kernel to find the fixing commit/commits for this bug, then close this bug.

Thanks.
Comment 12 Hui Wang 2021-12-25 01:32:32 UTC
Through bisecting, I found the commit cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED" fixed the bug I reported.

The problem is the commit has 10+ depending commits, it is not easy to bakcport the fix to 5.14 and 5.15 kernels.

@Tedd and Luiz,

Do you have plan to backport the fix to stable 5.15 kernel at least?
Comment 13 Hui Wang 2022-01-20 01:22:09 UTC
Since the problem is fixed in the latest kernel. close this bug now.