Bug 198953

Summary: hci_bcm: Streamline runtime PM code change for 4.16 kernel breaks bluetooth on ASUS T100TA
Product: Drivers Reporter: Robert R. Howell (rhowell)
Component: BluetoothAssignee: linux-bluetooth (linux-bluetooth)
Status: RESOLVED CODE_FIX    
Severity: normal CC: jbMacBrodie, jwrdegoede, regressions
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.16-rc3 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Patch to add ASUS T100TAM and Toshiba Encore device info to hci_bcm.c
Patch to avoid the need for DMI quirks to force Interrupt resources as active-low
Patch adding BCM2E84 HID, this is a pre-requisite for the Do-not-tie-GPIO-pin-order patch
Patch: "hci_bcm: Do not tie GPIO pin order to a specific ACPI HID"
acpidump for ASUS T100TA
acpidump for ASUS T100TAM
acpidump for Toshiba Encore (original 8 inch tablet model)
Add Toshiba Encore BCM2E38 to latest version of acpi_match code
Patch: Bluetooth: hci_bcm: Add 6 new ACPI HIDs
acpidump for T100CHI
brcmfmac43241b4-sdio.txt (nvram) file from Toshiba Encore
[PATCH] brcmfmac: Add support for getting nvram contents from EFI variables

Description Robert R. Howell 2018-03-01 07:25:24 UTC
Created attachment 274481 [details]
Patch to add ASUS T100TAM and Toshiba Encore device info to hci_bcm.c

The patch "Bluetooth: hci_bcm: Streamline runtime PM code" by Lukas Wunner, posted at <https://www.spinics.net/lists/linux-bluetooth/msg73529.html>, and incorporated in 4.16-rc3 (and probably rc1 and 2) breaks bluetooth on the ASUS T100TA, T100TAM, and Toshiba Encore computers.  All are Baytrail machines.  Reverting that single patch fixes the problem.  

Bluetooth comes up OK initially, but after about 5 seconds of use seems to die.  Turning power to the bluetooth controller off then on, either in bluetoothctl or in via a system GUI, fixes the problem for about another 5 seconds, but then bluetooth dies again.  The problem happens both with a Logitech MX2 mouse and a bluetooth keyboard.  No error messages seem to be produced in dmesg or elsewhere, but the system stops responding to the mouse or keyboard.

I've also experimented with reverting the series of 13 later patches also be Lukas Wunner posted at <https://www.spinics.net/lists/linux-bluetooth/msg73757.html> but those do NOT seem to have any effect one way or the other on the problem.  The single critical patch is the one listed above, and luckily it can be reverted independently of the other 13.

While bluetooth has been working (before 4.16) on the T100TA with the stock kernel, to make it work for the T100TAM and the Encore I've always needed to add them to the device quirk table (which already includes the T100TA), and for the Encore I need to add the BCM2E38 device ID to the ACPI table.  I've attached a patch which adds that T100TAM and Encore information.  If there is any way it could be incorporated in the mainline code, that would be great.  In any case, you'll need to apply it to the 4.16-rc3 stock hci_bcm.c file if you do want to test for this bug with the T100TAM or the Encore.  You don't need it to test with the T100TA.
Comment 1 jbMacAZ 2018-03-05 03:56:22 UTC
I also see this bluetooth regression also on the Asus T100CHI starting with 4.16-rc1.  Reverting the "streamline runtime PM code" patch also fixes bluetooth on the T100CHI.

Thanks for finding that.

Another approach to the quirk is to change the T100TA entry to 
-			DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100TA"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "T100TA"),
as proposed by Hans de Goede.  This would cover the T100TA/TAS/TAM/TAF...
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2018-03-05 14:45:24 UTC
I added this to the list or regressions of Linux 4.16, but I wonder if the developers in question are aware of it, as many subsystems don't look much at it. Hence:

@Robert: Did you tell Lukas about the regression by mail? 

@jbMacAZ: Is Hans already working on a quirk?
Comment 3 jbMacAZ 2018-03-05 20:49:22 UTC
(In reply to Thorsten Leemhuis from comment #2)
> I added this to the list or regressions of Linux 4.16, but I wonder if the
> developers in question are aware of it, as many subsystems don't look much
> at it. Hence:
> 
> @jbMacAZ: Is Hans already working on a quirk?

Hans proposed the edit on the T100/Ubuntu G+ group.  I've built several T100 custom kernels with the proposed edit, but I have not had explicit feedback.  Hans has not been given any feedback on his proposal.  It seems to work fine though.
Comment 4 Robert R. Howell 2018-03-12 07:16:08 UTC
Thorsten:  I did manage to contact Lukas by email after seeing your comment and I've been testing some debugging suggestions from him on my T100.  We've partially isolated the cause of the problem.  I'll post an update here once we do a few more tests.

@jbMacAZ:  Thanks for the suggestion about using DMI_MATCH.
Comment 5 Hans de Goede 2018-03-15 07:58:53 UTC
Hi All,

So as discussed on the bluetooth list I've hit the same problem of the bcm bluetooth device getting stuck in suspend. My solution is a simple revert of the offending commit, as instead of being just a simple cleanup it has unintended and unwanted side-effects. Anyways lets discuss that further on the list.

My main reason for replying here is because of the adding of extra dmi matches for active-low interrupts. I've come up with an alternative approach which actually removes DMI matches, since I believe that we simply need to always treat the IRQ as active-low if it is declared with an "Interrupt" ACPI resource (the other way is with a "GpioInt" resource).

I'll attach my patch for this, if you can test this on the machines needing an extra DMI match that would be great. I would also appreciate it if you can run:

acpidump -o acpidump.vendorname-modelname

And attach the generated acpidump.vendorname-modelname file here.

Regards,

Hans
Comment 6 Hans de Goede 2018-03-15 08:03:39 UTC
Hi Again,

I also see that you need to add a new BCM2E## ACPI HID for one machine. I'm also working on a way to make that easier. Specifically I've a patch removing the need to specify the GPIO order in the id table, replacing this with code to actually have the driver figure this out itself. My main reason for this is that I do not believe that there is a 1:1 relationship between ACPI HID and the resource order and so far we're just lucky that our 1:1 coupling of these work, also see:

https://fedorapeople.org/~jwrdegoede/bcm-bt-stuff

Which is an overview of info about bcm bt devices which I've extracted from the ACPI tables from over 50 different laptop models.

I'll attach this patch for testing too.

Regards,

Hans
Comment 7 Hans de Goede 2018-03-15 08:07:54 UTC
Created attachment 274739 [details]
Patch to avoid the need for DMI quirks to force Interrupt resources as active-low
Comment 8 Hans de Goede 2018-03-15 08:08:46 UTC
Created attachment 274741 [details]
Patch adding BCM2E84 HID, this is a pre-requisite for the Do-not-tie-GPIO-pin-order patch
Comment 9 Hans de Goede 2018-03-15 08:09:27 UTC
Created attachment 274743 [details]
Patch: "hci_bcm: Do not tie GPIO pin order to a specific ACPI HID"
Comment 10 Robert R. Howell 2018-03-15 17:06:47 UTC
Hi Hans

I'm about to attach the acpidump files you requested from the three types of Baytrail machines I have available:  an ASUS-T100TA, an ASUS-T100TAM, and a Toshiba-Encore.  The Toshiba is their original 8 inch model Baytrail tablet, not the latter 10 inch version.

I'm in the process of testing your patches on the above machines and will post another comment with the results from that shortly.
Comment 11 Robert R. Howell 2018-03-15 17:08:28 UTC
Created attachment 274759 [details]
acpidump for ASUS T100TA
Comment 12 Robert R. Howell 2018-03-15 17:09:29 UTC
Created attachment 274761 [details]
acpidump for ASUS T100TAM
Comment 13 Robert R. Howell 2018-03-15 17:10:36 UTC
Created attachment 274763 [details]
acpidump for Toshiba Encore  (original 8 inch tablet model)
Comment 14 Robert R. Howell 2018-03-15 17:21:26 UTC
Hi Hans

I've tested your patches on all three machines mentioned above (The ASUS T100TA, ASUS T100TAM, and Toshiba Encore) and Bluetooth seems to be working well on all of them.

I started with the 4.16-rc5 kernel, reverted the "Streamline runtime PM code" patch, applied your three patches, then applied one patch of my own (which I'll attach in a minute) which simply adds the Encore BCM2E38 ID to those listed in your third patch.  (I also then add some other necessary Baytrail patches but those don't involve Bluetooth.)

From dmesg I do see the notice your code outputs about the "usually wrong" interrupt polarity for all three devices.  And as far as I can tell in about 1/2 hour of testing, Bluetooth is working well.  I don't see any unusual Bluetooth messages in the dmesg output either.

Thanks for your work on these Baytrail devices.  Let me know if there is anything else I can do.

Bob Howell
Comment 15 Robert R. Howell 2018-03-15 17:23:47 UTC
Created attachment 274765 [details]
Add Toshiba Encore BCM2E38 to latest version of acpi_match code
Comment 16 Hans de Goede 2018-03-15 18:25:24 UTC
Hi Bob,

Thank you for testing this.

As patch of a patch series I've prepared I'm planning on submitting a patch adding a bunch of new ids to the driver. I'll attach my current version. Is it ok if I add:

Reported-and-tested-by: Robert R. Howell <redacted@redacted.dom>

To the commit message for this patch, to credit you for your work on this?

Normally adding a new id like this would be a good candidate for your first kernel patch submission, but since I'm currently doing a bunch of changes to the id table that is not really convenient in this case.

My plan is to submit the irq polarity patch + the id patch which I'm attaching (which still uses the old hardwired gpio ordering stuff) upstream first, with a Cc: stable to get these fixes added to older kernels and then send my other changes as a series on top.

Regards,

Hans
Comment 17 Hans de Goede 2018-03-15 18:26:07 UTC
Created attachment 274767 [details]
Patch: Bluetooth: hci_bcm: Add 6 new ACPI HIDs
Comment 18 Hans de Goede 2018-03-15 18:31:48 UTC
p.s.

Do you have a git repo or some such with the other Bay Trail patches you use somewhere? As part of my ongoing Bay and Cherry Trail support work I'm trying to get all patches from various places, such as:

https://github.com/burzumishi/linux-baytrail-flexx10/tree/master/kernel/patches/v4.8
https://github.com/muhviehstah/linux-baytrail411/commits/master
https://github.com/jfwells/linux-asus-t100ta

Upstream, so I'm always interested in what patches people use / need. Perhaps there is even a patch in there which would be a good candidate for your first upstream patch submission (you seem to know your way around the kernel, so I believe you would make a good kernel dev).

Anyways this is offtopic for this bug, so please drop me a mail at hdegoede@redhat.com about the the other Bay Trail patches you use (or file a new bug for them).
Comment 19 Robert R. Howell 2018-03-15 19:47:59 UTC
Hi Hans

Adding the "Reported-and-tested-by" tag would be fine.  

While I've been building kernels with a few special patches for my Baytrail devices for the last couple years I'm just starting to learn how git is used with the main repositories, and how to post patches and commits to the mailing lists.  So I'm more than happy to have you add the Encore information in your patch.

I don't yet have a public repository for the patches I use -- they are mostly from the files at the Ubuntu ASUS T100 group at <https://plus.google.com/communities/117853703024346186936>, with a few extra of my own.  But I'm certainly happy to share those.  Let me think about the best way to do that and I'll get back to you with a direct email on that.
Comment 20 jbMacAZ 2018-03-15 20:01:45 UTC
Created attachment 274769 [details]
acpidump for T100CHI

Attached is the acpidump for the T100CHI.  I've applied the three patches to 4.16-5c5 (the revert is still applied.)  Bluetooth still works on the T100CHI through at least one BT timeout cycle.

Also, if there are other helpful patches for the T100*, please share them at the T100/Ubuntu group!  It can take months for accepted patches to actually get to the mainstream.

Thanks,
John
Comment 21 Hans de Goede 2018-03-16 10:36:01 UTC
p.s.

Thanks for the apcidumps too.

Question can you attach the nvram file (/lib/firmware/brcm/brcmfmac43####.txt)
you are using on the Toshiba encore ?  I'm wondering what the crystal frequency there is (the xtalfreq=37400 / xtalfreq=26000 line in the textfile), because the loaded patchram file needs to match the frequency.
Comment 22 Robert R. Howell 2018-03-18 03:49:15 UTC
Hi Hans

I'll attach the brcmfmac43241b4-sdio.txt file I'm using for the Toshiba Encore in just a minute.  There was some confusion in my few year old notes about where I obtained the version I was using, so to eliminate that confusion that I made a new copy from the contents of /sys/firmware/efi/efivars/nvram... on the Encore and tested it to be sure WiFi and Bluetooth were still working OK.  That is the version I'm attaching.  It does show that the Encore uses xtalfreq=37400, like the T100's.

Bob
Comment 23 Robert R. Howell 2018-03-18 03:51:05 UTC
Created attachment 274797 [details]
brcmfmac43241b4-sdio.txt (nvram) file from Toshiba Encore
Comment 24 Hans de Goede 2018-03-18 16:54:23 UTC
Ah, the Encore uses the nvram efivar, good. I'm working together with upstream to directly get the data from the efivar for laptops like these, so that users no longer need to manually copy it.

And thanks for attaching it.
Comment 25 Hans de Goede 2018-03-18 18:43:06 UTC
Created attachment 274803 [details]
[PATCH] brcmfmac: Add support for getting nvram contents from EFI variables

Not really relevant to this bug, but FYI here is the patch to make the kernel get the nvram settings for the wifi bits directly from the efi variable.
Comment 26 jbMacAZ 2018-03-20 07:45:50 UTC
(In reply to Hans de Goede from comment #25)
> Created attachment 274803 [details]
> [PATCH] brcmfmac: Add support for getting nvram contents from EFI variables
> 
> Not really relevant to this bug, but FYI here is the patch to make the
> kernel get the nvram settings for the wifi bits directly from the efi
> variable.

The new patch works for T100CHI (4.16-rc6).  I renamed the brcmfmac43241b4-sdio.txt, rebooted and wifi still worked.
Comment 27 Robert R. Howell 2018-03-22 16:31:24 UTC
Hi Hans

I've tested your new NVRAM patch on my T100TA, T100TAM, and Toshiba Encore and it works on all three.  If the ...sdio.txt file is present it uses that, otherwise it successfully gets the information from NVRAM.

There was one side effect you may want to know about.  Originally I had my kernel built with the CONFIG_FW_LOADER_USER_HELPER_FALLBACK option set.  It turns out that when that option is set your NVRAM patch causes an additional 60 second delay in WiFi coming up, as the system waits for some timeout to occur during the "firmware" load.  That is in addition to another 60 second delay which was introduced by CLM related changes in kernel 4.15, as I mentioned in my other bug report at <https://bugzilla.kernel.org/show_bug.cgi?id=198089>.  But both 60 second delays disappear when I build a kernel without CONFIG_FW_LOADER_USER_HELPER_FALLBACK being set, and apparently in most distributions, that option is NOT set.

In testing that NVRAM patch I discovered another issue with the brcmfmac driver, in that after unloading it with "modprobe -r brcmfmac", reloading it with "modprobe brcmfmac" often fails.  However while your patch MAY affect the details of what happens, sort of problem seems to be present at least from 4.15 onward, and perhaps earlier.  The problem may also be intermittent, so sorting this out has become difficult.  With your patch in use, the failure usually occurs as follows.  But something like this (perhaps with slightly different messages) often goes wrong without the NVRAM patch too.  The only way to clear this error is to reboot, so that also makes debugging difficult.  I'll let you know if I can sort out the pattern.

[  135.464064] brcmfmac: brcmf_fw_map_chip_to_name: using brcm/brcmfmac43241b4-sdio.bin for chip 0x004324(17188) rev 0x000005
[  135.464167] usbcore: registered new interface driver brcmfmac
[  140.718849] brcmfmac: brcmf_sdio_bus_stop: Failed to force clock for F2: err -16

Since in normal use I typically don't need to unload and reload the driver, usually this bug doesn't cause a problem.
Comment 28 Robert R. Howell 2018-03-27 15:07:36 UTC
The original "Bluetooth: hci_bcm: Streamline runtime PM code" patch was reverted in commit b09c61522c81886c34966825f9e5afcbfafac446 which is now incorporated in 4.16-rc7.  I've tested rc7 on my ASUS T100TA and T100TAM machines, and this does fix the problem.  When I apply the other device-id specific patches which have always been needed for my Toshiba Encore, the revert also fixes Bluetooth on that machine.