Bug 199231 - swiotlb: coherent allocation failed in iwlwifi
Summary: swiotlb: coherent allocation failed in iwlwifi
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_iommu
URL:
Keywords:
Duplicates: 199447
Depends on:
Blocks:
 
Reported: 2018-03-28 16:06 UTC by kernelbugzilla
Modified: 2018-04-28 19:30 UTC
CC List: 5 users

See Also:
Kernel Version: 4.16.0-041600rc7-generic
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg from 4.16rc7 (122.93 KB, text/plain)
2018-03-28 16:06 UTC, kernelbugzilla
dmesg from 4.15 + backport-iwlwifi (178.25 KB, text/plain)
2018-04-03 17:27 UTC, kernelbugzilla
dmesg from 4.15 + backport-iwlwifi + patch (159.46 KB, text/plain)
2018-04-04 03:40 UTC, kernelbugzilla
Boot with 4.16.3 kernel still fails to load iwlwifi driver (70.48 KB, text/plain)
2018-04-20 13:48 UTC, Stuart

Description kernelbugzilla 2018-03-28 16:06:00 UTC
Created attachment 274979
dmesg from 4.16rc7

4.16 regresses iwlwifi on an Intel Corporation Wireless 8260 (rev 3a) card in a Dell Precision 7510.

The iwlwifi driver consistently fails to initialize immediately after boot with "iwlwifi 0000:02:00.0: swiotlb: coherent allocation failed, size=4096" (full dmesg with stack trace attached).
Comment 1 Emmanuel Grumbach 2018-03-28 18:38:18 UTC
Allocating 4K of coherent memory can't be something overwhelmingly complicated...

Does this work on 4.15?
I doubt it is a bug in iwlwifi really...
This is very early in the init process and this flow hasn't changed for a while.
Nevertheless, if we see that this is a regression, it may help to nail down the problem.
Comment 2 kernelbugzilla 2018-03-28 18:47:16 UTC
> Does this work on 4.15?

Yes, it does. 4.15.0 is fine.

For the record, this is an Ubuntu mainline build (http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc7/), but past experience has shown these to be pretty reliable: the changes are entirely in configuration/packaging and the patchset is not very big.
Comment 3 kernelbugzilla 2018-04-02 19:37:51 UTC
Anything else I can help with that would point to a more specific problem?
Comment 4 Emmanuel Grumbach 2018-04-02 19:55:43 UTC
Can you try with 4.15 and our master branch from https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/ ?

If that works, we may be able to bisect.
Comment 5 kernelbugzilla 2018-04-03 17:27:58 UTC
Created attachment 275097
dmesg from 4.15 + backport-iwlwifi

backport-iwlwifi from commit f75f445080d1eb6059cc29ff5ab55ad12d80b937 fails in new and exciting ways.

dmesg attached - something to do with the firmware.

You know this code better - is this before or after the coherent allocation that fails on 4.16?
Comment 6 Emmanuel Grumbach 2018-04-03 17:37:46 UTC
way after :)

But it is ... weird...

It means that we have a mismatch in the features that are advertised by the firmware.
The firmware is angry at the driver because it didn't expect the TIME_QUOTA_CMD, since the firmware has this logic offloaded now...
But if that's the case, the firmware should have advertised this...
Anyway separate issue...


Should be fine with:

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/quota.c b/drivers/net/wireless/intel/iwlwifi/mvm/quota.c
index 03cd22e88ab0..af837a91fe53 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/quota.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/quota.c
@@ -279,7 +279,7 @@ int iwl_mvm_update_quotas(struct iwl_mvm *mvm,
        lockdep_assert_held(&mvm->mutex);
 
        if (fw_has_capa(&mvm->fw->ucode_capa,
-                       IWL_UCODE_TLV_CAPA_DYNAMIC_QUOTA))
+                       IWL_UCODE_TLV_CAPA_DYNAMIC_QUOTA) || true)
                return 0;
 
        /* update all upon completion */

What we do see is that the firmware is loaded...
So I fear that we need to look in the swiotlb code rather than in iwlwifi...
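
For readers unfamiliar with the capability check that the patch above short-circuits, here is a minimal user-space sketch in C. It is only a model, not the driver's code: `struct fw_caps`, `has_capa`, `driver_sends_quota`, and the bit number `CAPA_DYNAMIC_QUOTA` are hypothetical names chosen for illustration. The idea it shows is that the driver skips its own quota command when the firmware advertises that it handles quotas itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit number standing in for the driver's
 * IWL_UCODE_TLV_CAPA_DYNAMIC_QUOTA capability flag. */
enum { CAPA_DYNAMIC_QUOTA = 3 };

/* Hypothetical capability set advertised by the firmware (one bit each). */
struct fw_caps {
    uint64_t bits;
};

/* True when the firmware advertises capability number `capa` (capa < 64). */
static bool has_capa(const struct fw_caps *caps, unsigned capa)
{
    return (caps->bits >> capa) & 1u;
}

/* True when the driver must send the quota command itself, i.e. when the
 * firmware does NOT advertise that it computes quotas on its own. */
static bool driver_sends_quota(const struct fw_caps *caps)
{
    return !has_capa(caps, CAPA_DYNAMIC_QUOTA);
}
```

In this model, a firmware that offloads the quota logic but fails to advertise the bit would still receive the command from the driver, which is the kind of mismatch described in this comment.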
Comment 7 kernelbugzilla 2018-04-03 17:48:45 UTC
> So I am fearing that we need to look in the swiotlb code rather than in
> iwlwifi....

Oh, fun. How do I help? :)
Comment 8 Emmanuel Grumbach 2018-04-03 19:32:57 UTC
First I'd like to know that it works with the small patch inline in my previous comment.
Comment 9 kernelbugzilla 2018-04-04 03:40:00 UTC
Created attachment 275099
dmesg from 4.15 + backport-iwlwifi + patch

Nope, fails with microcode error.
Comment 10 Emmanuel Grumbach 2018-04-07 19:40:09 UTC
Luca just said there is a problem with the firmware version. Please upgrade the firmware from https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/

thanks.
Comment 11 Matthew Turnbull 2018-04-09 01:20:17 UTC
4.15.15 (gentoo-sources)
iwlwifi-stack-public:master:6932:7803aa0b
firmware version 36.e91976c0.0

This works for me.

4.16.1 (gentoo-sources)
iwlwifi-stack-public:master:6932:7803aa0b
firmware version 36.e91976c0.0

This still fails with "coherent allocation failed".

That does kind of point to it likely being a DMA/swiotlb regression. It looks like there was a refactor of coherent buffer allocation.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2382dc9a3eca644147be83dd2cd0dd64dc9e3e8c
Comment 12 Emmanuel Grumbach 2018-04-09 05:42:12 UTC
I am moving this bug to the relevant people (CC'ed Christoph). I am pretty sure that IOMMU isn't the right component; OTOH, I couldn't find anything related to SWIOTLB in the components...

Intel WiFi folks are still CC'ed to this bug.
Comment 13 Matthew Turnbull 2018-04-14 18:07:47 UTC
I just noticed this on the upstream master:

> swiotlb: fix unexpected swiotlb_alloc_coherent failures
> The code refactoring by commit 0176adb00406 ("swiotlb: refactor coherent
> buffer allocation") made swiotlb_alloc_buffer almost always failing due
> to a thinko: namely, the function evaluates the dma_coherent_ok call
> incorrectly and dealing as if it's invalid. This ends up with weird
> errors like iwlwifi probe failure or amdgpu screen flickering.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e7f06c8beee304ee21b791653fefcd713f48b9a

It's not on the stable 4.16.y branch yet, but I manually applied it to 4.16.2, and I can confirm that it fixed the iwlwifi issues for me.
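
The thinko described in the quoted commit message can be illustrated with a minimal user-space sketch in C. This is not the kernel code: `coherent_ok`, `alloc_buggy`, and `alloc_fixed` are hypothetical names, and the mask comparison is an assumption about what a reachability check of this kind looks like. The sketch only shows how an inverted condition turns every usable address into a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a coherent-reachability check: true when the
 * whole buffer [addr, addr + size) lies at or below the device's DMA mask. */
static bool coherent_ok(uint64_t mask, uint64_t addr, size_t size)
{
    assert(size > 0);
    return addr + size - 1 <= mask;
}

/* Buggy shape: bails out when the address IS usable, so a perfectly good
 * buffer is reported as an allocation failure. */
static bool alloc_buggy(uint64_t mask, uint64_t addr, size_t size)
{
    if (coherent_ok(mask, addr, size))
        return false; /* wrongly treated as the error path */
    return true;
}

/* Fixed shape: bails out only when the address is NOT usable. */
static bool alloc_fixed(uint64_t mask, uint64_t addr, size_t size)
{
    if (!coherent_ok(mask, addr, size))
        return false; /* genuine failure: buffer is out of reach */
    return true;
}
```

With a 32-bit mask (0xFFFFFFFF) and a 4096-byte buffer at address 0x1000, `alloc_buggy` reports failure while `alloc_fixed` succeeds, which mirrors the symptom of a 4 KiB coherent allocation failing on an otherwise healthy system.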
Comment 14 Emmanuel Grumbach 2018-04-20 05:34:12 UTC
*** Bug 199447 has been marked as a duplicate of this bug. ***
Comment 15 Stuart 2018-04-20 13:48:18 UTC
Created attachment 275465
Boot with 4.16.3 kernel still fails to load iwlwifi driver

I manually downloaded and compiled the 4.16.3 kernel using the default configuration for openSUSE Tumbleweed. I still get a failure trying to load the iwlwifi driver.
Comment 16 kernelbugzilla 2018-04-20 17:30:17 UTC
(In reply to Stuart from comment #15)
> Created attachment 275465
> Boot with 4.16.3 kernel still fails to load iwlwifi driver

If you look at the 4.16.y stable branch, the fix commit is nowhere to be found.
Comment 17 Stuart 2018-04-21 11:26:19 UTC
Thanks, the openSUSE team picked up the change in their 4.16.2 kernel update:

# diff /lib/modules/4.16.2-1-default/source/lib/swiotlb.c /srv/ftp/pub/kernel/swiotlb.c
735c735
<       if (!dma_coherent_ok(dev, *dma_handle, size))
---
>       if (dma_coherent_ok(dev, *dma_handle, size))


iwlwifi is working now:

[    5.249858] Intel(R) Wireless WiFi driver for Linux
[    5.249859] Copyright(c) 2003- 2015 Intel Corporation
[    5.249894] iwlwifi 0000:03:00.0: enabling device (0000 -> 0002)
[    5.251436] iwlwifi 0000:03:00.0: loaded firmware version 36.e91976c0.0 op_mode iwlmvm
[    5.284221] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
[    5.367985] iwlwifi 0000:03:00.0: base HW address: 44:85:00:4a:92:9b
[    5.408014] input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:18/PNP0C09:01/PNP0C0D:00/input/input5
[    5.454412] ieee80211 phy0: Failed to initialize wep: -2
[    5.454434] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs'
Comment 18 Pawel 2018-04-28 19:30:20 UTC
Fedora still hasn't picked up this fix, and just released a problematic 4.16.3 update yesterday.
