Bug 205001

Summary: iwlwifi: 9260: ASSERT 22CE upon heavy Rx traffic
Product: Drivers
Reporter: Chris Clayton (chris2553)
Component: network-wireless
Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED DUPLICATE
Severity: normal
CC: bugzilla, chris2553, emmanuel.grumbach, labaunti3
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.3.1
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg showing iwlwifi microcode SW error
bisect log
first bad commit message

Description Chris Clayton 2019-09-26 06:05:30 UTC
Created attachment 285175 [details]
dmesg showing iwlwifi microcode SW error

I'm seeing a Microcode SW error that I can reproduce almost at will by downloading a large file such as the Fedora 30 .iso file. This problem was initially reported under bug 204151 but, as this was a different error, Luca Coelho requested it be reported separately.

The hardware is Intel(R) Wireless-AC 9260 160MHz, REV=0x324.
The firmware version is 46.6bf1df06.0 op_mode iwlmvm

In my simple test case, the downloaded file appears to be OK. I can mount the iso (with fuseiso) and browse the contents.
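
For anyone trying to confirm whether a given download triggered the error, a minimal sketch follows. It counts kernel log lines containing the "Microcode SW error" string quoted in this report. The helper name `count_sw_errors` is hypothetical, and the exact wording of the log line may differ between kernel versions, so treat the marker as an assumption.

```python
import subprocess

# Marker string taken from this report; the exact kernel log wording
# may vary between kernel versions (assumption).
ERROR_MARKER = "Microcode SW error"


def count_sw_errors(log_text: str) -> int:
    """Count log lines that mention the firmware error marker."""
    return sum(1 for line in log_text.splitlines() if ERROR_MARKER in line)


if __name__ == "__main__":
    # Reading the kernel log may require root on systems that restrict dmesg.
    log = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    print(f"{count_sw_errors(log)} line(s) mention {ERROR_MARKER!r}")
```

Running it before and after a large download gives a rough indication of whether the error occurred in between.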
Comment 1 Emmanuel Grumbach 2019-09-26 06:51:09 UTC
Hi,

can you try to load iwlwifi with:

disable_msix=1

and see what happens?

Thanks.
Comment 2 Chris Clayton 2019-09-26 07:13:25 UTC
What happens is:

[ 6315.116362] iwlwifi: unknown parameter 'disable_msix' ignored

modinfo shows that iwlwifi has no parameter containing the string "msix".
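
The same check can be scripted. The sketch below filters the output of `modinfo -p iwlwifi` for parameter names containing a given substring. The helper name `matching_params` is hypothetical, and the parsing assumes each output line has the form `name:description`, which is how `modinfo -p` normally prints parameters.

```python
import subprocess


def matching_params(modinfo_output: str, needle: str) -> list[str]:
    """Return parameter names containing `needle` (case-insensitive).

    Assumes each line of `modinfo -p` output starts with "name:description".
    """
    names = []
    for line in modinfo_output.splitlines():
        name, sep, _ = line.partition(":")
        if sep and needle.lower() in name.lower():
            names.append(name.strip())
    return names


if __name__ == "__main__":
    out = subprocess.run(
        ["modinfo", "-p", "iwlwifi"], capture_output=True, text=True, check=True
    ).stdout
    found = matching_params(out, "msix")
    print(found if found else "no parameter name contains 'msix'")
```

An empty result is consistent with the "unknown parameter" message above: the installed module does not define the parameter.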
Comment 3 Emmanuel Grumbach 2019-09-26 07:32:18 UTC
Oh well.. This is not upstream, but in our internal backport tree :(

So let's do it manually, can you do this:

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
index eb544811759d..63c997e0b800 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/trans.c
@@ -1706,7 +1706,7 @@ iwl_pcie_set_interrupt_capa(struct pci_dev *pdev,
        int max_irqs, num_irqs, i, ret;
        u16 pci_cmd;
 
-       if (!cfg_trans->mq_rx_supported || iwlwifi_mod_params.disable_msix)
+       if (!cfg_trans->mq_rx_supported || iwlwifi_mod_params.disable_msix || true)
                goto enable_msi;
 
        max_irqs = min_t(u32, num_online_cpus() + 2, IWL_MAX_RX_HW_QUEUES);
Comment 4 Chris Clayton 2019-09-26 10:53:52 UTC
Your patch doesn't apply to 5.3.1, but I applied this one:

--- linux-5.3.1/drivers/net/wireless/intel/iwlwifi/pcie/trans.c.iwlwifi-msix-diags      2019-09-26 09:25:43.328456574 +0100
+++ linux-5.3.1/drivers/net/wireless/intel/iwlwifi/pcie/trans.c 2019-09-26 09:29:15.764465124 +0100
@@ -1588,7 +1588,7 @@ static void iwl_pcie_set_interrupt_capa(
        int max_irqs, num_irqs, i, ret;
        u16 pci_cmd;
 
-       if (!trans->cfg->mq_rx_supported)
+       if (!trans->cfg->mq_rx_supported || true)
                goto enable_msi;
 
        max_irqs = min_t(u32, num_online_cpus() + 2, IWL_MAX_RX_HW_QUEUES);


With that patch, I've downloaded the Fedora 30 .iso file eight times and the Microcode SW error has not recurred.
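
One way to confirm that the `|| true` change really forces the MSI path is to look at how many interrupt lines the driver registers. This is a rough sketch, assuming the driver's entries in `/proc/interrupts` contain the string "iwlwifi" and that more than one entry indicates MSI-X (one vector per queue), while a single entry indicates the MSI fallback. The helper name `iwlwifi_irq_lines` is hypothetical.

```python
def iwlwifi_irq_lines(interrupts_text: str) -> list[str]:
    """Return the /proc/interrupts lines that mention iwlwifi."""
    return [
        line.strip()
        for line in interrupts_text.splitlines()
        if "iwlwifi" in line
    ]


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        lines = iwlwifi_irq_lines(f.read())
    for line in lines:
        print(line)
    # Heuristic only: several vectors suggest MSI-X, one suggests MSI.
    if len(lines) > 1:
        print("multiple iwlwifi vectors: probably MSI-X")
    elif len(lines) == 1:
        print("single iwlwifi vector: probably MSI")
    else:
        print("no iwlwifi interrupt found; is the module loaded?")
```

With the patched kernel, a single entry would be the expected outcome under these assumptions.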
Comment 5 Emmanuel Grumbach 2019-09-26 11:30:07 UTC
All right, this is interesting...

Thank you.
Comment 6 Chris Clayton 2019-09-30 22:26:25 UTC
Just a quick update. I've been trying to recreate this problem on 4.19.75 but seem unable to. I've downloaded the large .iso file numerous times including a couple of tests where two downloads of the file were running concurrently.

I didn't notice the problem on 5.2 series kernels, so I'll try and bisect 5.3 kernels over the next couple of days.
Comment 7 Chris Clayton 2019-10-01 12:30:04 UTC
I've bisected and landed at:
3c514bf831ac12356b695ff054bef641b9e99593
iwlwifi: mvm: add a loose synchronization of the NSSN across Rx queues

A kernel built at the parent commit (521dc6c7c74e88fbd02947e4e50a5cb0d49b4395 - iwlwifi: mvm: refactor iwl_mvm_notify_rx_queue) does not exhibit the microcode SW error.

I'll attach the bisect log and 'first bad commit' message in a moment.
Comment 8 Chris Clayton 2019-10-01 12:31:04 UTC
Created attachment 285275 [details]
bisect log
Comment 9 Chris Clayton 2019-10-01 12:31:37 UTC
Created attachment 285277 [details]
first bad commit message
Comment 10 Emmanuel Grumbach 2019-10-01 18:09:40 UTC
Ok, I can't see how this commit could be causing this, but we did see that this commit is causing other issues. So this is a sort of "known issue". For sure, your report helps us.

The problem is that this commit is needed for another problem that is described in its commit message.

Your work is very useful for us, thank you very much.
The ball is now in our court. Thanks.
Comment 11 Chris Clayton 2019-10-16 08:03:33 UTC
On 15/10/2019 18:29, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=205001
> 

What's the latest update on this bug? Will it be fixed in the 5.4 timescale? It is a regression, after all.

> bugzilla@goodbit.net changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |bugzilla@goodbit.net
>
Comment 12 labaunti3 2019-10-24 04:29:16 UTC
It is happening to me too, on 5.3 and 5.4-rc3, but not on 5.2.
Comment 13 Emmanuel Grumbach 2019-11-20 09:17:39 UTC

*** This bug has been marked as a duplicate of bug 204873 ***