Bug 189641

Summary: Intel bluetooth 7260 B3/B4 crashing revisited
Product: Drivers Reporter: dflogeras2
Component: BluetoothAssignee: linux-bluetooth (linux-bluetooth)
Status: NEW ---    
Severity: normal CC: agurenko, jaya.p.g
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.8.11 Subsystem:
Regression: No Bisected commit-id:
Attachments: syslog events around bluetooth lock up

Description dflogeras2 2016-12-04 12:59:06 UTC
I am seeing an issue that looks very much like this one:

https://bugzilla.kernel.org/show_bug.cgi?id=120161

I only recently started using bluetooth a fair amount on this hardware (Sony Vaio Pro 13, Intel 7260 wifi/bluetooth).

I am running the firmware that the linked bug says fixes the issue and am sure it is the one being loaded:

[  481.856049] Bluetooth: hci0: read Intel version: 370710018002030d00
[  481.867852] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
[  482.000054] Bluetooth: hci0: Intel Bluetooth firmware patch completed and activated

However, after pairing my BT mouse and using it for a few minutes, I get the same types of message:

[34740.224713] Bluetooth: hci0 command 0x0c1a tx timeout

After which bluetooth is unusable.  I seem to be able to reset the hardware with a rfkill command:

0: sony-wifi: Wireless LAN
        Soft blocked: no
        Hard blocked: no
1: sony-bluetooth: Bluetooth
        Soft blocked: no
        Hard blocked: no
4: phy0: Wireless LAN
        Soft blocked: no
        Hard blocked: no
65: nfc0: NFC
        Soft blocked: no
        Hard blocked: no
66: hci0: Bluetooth
        Soft blocked: no
        Hard blocked: no

So toggling "rfkill block 1" then "rfkill unblock 1" seems to reset the device.

I am running kernel 4.8.11 presently, compiled from source (Gentoo)
Comment 1 dflogeras2 2016-12-04 13:01:06 UTC
Sorry, I forgot to paste this (I realize the firmware filenames don't indicate their exact version)

cksum /lib/firmware/intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
2724065437 25627 /lib/firmware/intel/ibt-hw-37.7.10-fw-1.80.2.3.d.bseq
Comment 2 dflogeras2 2016-12-04 19:59:21 UTC
Another bit of information I have noticed and is possibly helpful, the issue only seems to present itself after a suspend/resume cycle.  After that, it usually occurs within 5 minutes of resuming the laptop.  Once it does, and I force the BT device to reset with the rfkill trick in comment #1, it seems fairly solid after many hours of use.
Comment 3 G Jaya Praveen 2016-12-05 05:52:01 UTC
Hi,

Can you please check the FW Version and let us know to debug the issue.

Please give, 
#sudo hcitool cmd 3f 05

Provide the reponse event for the cmd above.

Thanks,
Jaya Praveen
Comment 4 G Jaya Praveen 2016-12-05 05:54:42 UTC
The latest FW on Intel 7260 B5/B6 is below.

https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=fb328183532ffc7a1a54e84fb53467e275a95b79

Hope this is the Firmware getting used. Please check and reply.
Comment 5 dflogeras2 2016-12-05 11:34:52 UTC
Hello

hcitool cmd 3f 05
01 05 FC 00 37 07 10 01 80 02 03 0D 56

I may be misunderstanding something, but my device loads the B3/B4 variant of the firmware ibt-hw-37.7.10-fw-1.80.2.3.d.bseq

The latest version of that is from Jun 02 2016.

I have the B5/B6 firmwares in /lib/firmware/intel, but the kernel chooses the one stated above for my device.  Please let me know if this is incorrect, and/or how to change it.
Comment 6 G Jaya Praveen 2016-12-05 12:17:16 UTC
Hi,

Got it, The BT card is Intel 7260 B3 and as you said the last released version is Patch :86.

Seems to be a controller issue. Please check to provide the Sniffer logs for the issue is possible from your side, so that this can analyzed.
If not possible, please reply, so that we will try from our side.


Thanks,
Jaya Praveen G
Comment 7 dflogeras2 2016-12-05 12:32:09 UTC
I'm happy to provide a "Sniffer" log, but I do not know how.  If you can point me at documentation to capture I'd gladly try to grab the events of the failure.

I assume you'd like some output from hcidump, but if you tell me the exact invocation that would be most useful to you it would probably save some back and forth.
Comment 8 G Jaya Praveen 2016-12-05 13:19:28 UTC
Hi,

Can you please mention the steps and HID Mouse used to see the issue.

I had tried to reproduce the issue with the same patch and Microsoft Touch Scuplt Mouse (Br/EDR) and I don't see the issue.

Please provide the steps, so that I can reproduce and get the logs from here.

THanks
Comment 9 dflogeras2 2016-12-05 15:50:57 UTC
        Name: Microsoft Sculpt Comfort Mouse
        Alias: Microsoft Sculpt Comfort Mouse
        Class: 0x000580
        Icon: input-mouse
        Paired: yes
        Trusted: yes
        Blocked: no
        Connected: yes
        LegacyPairing: no
        UUID: Service Discovery Serve.. (00001000-0000-1000-8000-00805f9b34fb)
        UUID: Human Interface Device... (00001124-0000-1000-8000-00805f9b34fb)
        UUID: PnP Information           (00001200-0000-1000-8000-00805f9b34fb)
        Modalias: usb:v045Ep07A2d0129

It is not always easily reproducible.  As far as I can tell, usually when it happens the following events leads up to it:

- I am using my laptop, with the BT mouse paired
- I suspend my laptop, while the mouse is still paired
- I turn off the physical power switch on my BT mouse (this may be crucial)
- Later on, I resume my laptop
- Due to a KDE bug, I have to enable my BT adapter (it starts powered, but in the hciconfig DOWN state).  I don't know if this has anything to do with it, if I remember correctly the BT controller would lock up even when KDE resumed with the bluetooth stack UP.
- After that, I turn on the BT mouse power switch
- It pairs and begins to work
- After a few minutes the BT controller locks up
- I can reset the device doing rfkill block/unblock as stated in comment #1
- After that, the BT re-pairs and works consistently (without resetting the BT mouse, I just wait a few seconds and it re-connects)


Usually in my dmesg I see messages such as the following, correlating to the BT controller lockup:
[34740.224713] Bluetooth: hci0 command 0x0c1a tx timeout
[34742.272545] Bluetooth: hci0 command 0x0406 tx timeout
[34745.472341] Bluetooth: hci0 command 0xfc3f tx timeout

[44095.674692] Bluetooth: hci0 command 0x0c03 tx timeout
[44120.249020] Bluetooth: hci0 command 0x0c03 tx timeout

[101098.110912] Bluetooth: hci0 command 0x0804 tx timeout
[101100.158788] Bluetooth: hci0 command 0x041f tx timeout
[101102.206688] Bluetooth: hci0 command 0x0406 tx timeout

[101274.035219] Bluetooth: hci0 command 0x0c03 tx timeout

(Note from the timings, you can see that these are from multiple failures)



Also note, that when I connect the mouse I see the following (unknown main item tag 0x0).  This happens even when the controller is working without lockups so may be unrelated:

[113184.095301] hid-generic 0005:045E:07A2.0019: unknown main item tag 0x0
[113184.095518] input: Microsoft Sculpt Comfort Mouse as /devices/pci0000:00/0000:00:14.0/usb2/2-6/2-6:1.0/bluetooth/hci0/hci0:256/0005:045E:07A2.0019/input/input38
[113184.096339] hid-generic 0005:045E:07A2.0019: input,hidraw1: BLUETOOTH HID v1.29 Mouse [Microsoft Sculpt Comfort Mouse] on 5c:51:4f:0a:9f:8c
Comment 10 dflogeras2 2016-12-05 15:56:54 UTC
https://www.microsoftstore.com/store/msusa/en_US/pdp/Microsoft-Sculpt-Comfort-Mouse/productID.282133900

This is the mouse, mine was purchased in 2013-14 so who knows if the firmware has been changed appreciably since for that model.
Comment 11 G Jaya Praveen 2016-12-07 05:01:25 UTC
Hi,

I tried to reproduce the issue the whole day, am not seeing this issue.
Maynot be Intel controller issue, should be something wrong with remote BTOE (HID MOuse) or with the Vendor Laptop (Sony Viao).
Please check if you can provide a Sniffer Logs for us to analyze. (Bluetooth Ellisys Analyser Logs).
Comment 12 dflogeras2 2016-12-07 12:35:49 UTC
Hi Jaya,

that's unfortunate and thank you for trying to reproduce.

Unfortunately I don't have access to a Ellisys Analyser, and as an end user I'm not about to go buy one :)

The best I can do is offer to send my BT mouse to you for testing if Intel wants to pay the shipping (and your time).
Comment 13 dflogeras2 2016-12-10 03:05:40 UTC
On the same laptop, same software, same kernel, I popped in a cheap ebay bluetooth USB dongle that I borrowed from a colleague.

Bus 002 Device 011: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)


And using the same BT mouse for the last couple days it works seemingly solid.

I still see the messages like:
[158540.771907] hid-generic 0005:045E:07A2.0022: unknown main item tag 0x0

whenever I connect the mouse, but I have not see any of the "tx timeout" messages like I see with the 7260 controller.

I'm not sure if this proves or disproves anything, I just thought I'd add it, as it is information.  I'll keep an eye one it, but I really don't want to sacrifice one of the USB ports (the vaio only has 2).  This is the reason I am trying to use a BT mouse vs. a generic "wireless" mouse requiring a dongle.
Comment 14 dflogeras2 2016-12-11 23:42:26 UTC
I am also noticing that the lock-up is correlated to kernel messages like:

[21351.684277] usb 2-6: reset full-speed USB device number 5 using xhci_hcd

I'm not sure what would cause the kernel to reset the BT controller while it is being used.
Comment 15 G Jaya Praveen 2016-12-12 11:30:25 UTC
Hi,

Maybe can you please provide us the syslogs to see if we cannot debug the issue.
syslogs can be found in /var/logs/syslog.

This is just a try, as without our HID, we don't see the issue.
Comment 16 G Jaya Praveen 2016-12-12 11:32:33 UTC
And which is the KDE Bug that you are mentioning about, as the bt adapter shouldn't be turned off when you do suspend/resume cycle.
Please provide us the KDE bug info.
Comment 17 dflogeras2 2016-12-12 12:14:24 UTC
Created attachment 247451 [details]
syslog events around bluetooth lock up

I have not yet chased down the KDE bug I mentioned.  However I think it is unrelated, for the last several days I have been testing with a new (clean) user account which bluetooth always starts powered up after a resume, and I am still seeing the bluetooth lockup bug.

I'm attaching a system log dump from system resume this morning.  Here are some highlights:

Dec 12 07:28:19 myhostname kernel: usb 2-6:1.0: rebind failed: -517
Dec 12 07:28:19 myhostname kernel: usb 2-6:1.1: rebind failed: -517

I'm not sure what this means, but usb 2-6 is my 7260 bluetooth device.


At 07:28:41 my networking is up, VPN connected, and bluetooth mouse turned on and working.

A few minutes later (07:33:02) the BT controller gets reset without any user intervention that I am aware of, I was just clearing emails.  After this, you can see at 07:33:21 the USB disconnect, which is a direct result from me issuing a "rfkill block" / "rfkill unblock" to manually toggle the device.
Comment 18 dflogeras2 2017-01-22 19:53:14 UTC
Hi again,  I may have some more insight.

After revisiting this issue, and googling a bit more, it seems to be related to USB autosuspend.

When I resume from suspend, the device is in power/control=auto.  In this state, the lock-up can occur.  If I rfkill block/unblock, the device ends up in the power/control=on state and seems to work indefinitely (until the next suspend or reboot).

I've also confirmed that if I resume, and manually do:
echo on | sudo tee /sys/bus/usb/devices/X-Y/power/control

It seems to work indefinitely as well.  Hope that helps.