Bug 199209 - iwlwifi: 9260: crashes and hangs UI on file uploads over nfs
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 high
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-26 05:38 UTC by sergeidanilov
Modified: 2018-04-08 20:18 UTC

See Also:
Kernel Version: 4.15.12
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output (62.87 KB, text/plain)
2018-03-26 05:38 UTC, sergeidanilov
journalctl output (134.27 KB, text/plain)
2018-03-26 05:38 UTC, sergeidanilov
consistent_crash_after_warning (52.78 KB, application/x-xz)
2018-03-27 04:20 UTC, sergeidanilov
logs_without_tso (31.40 KB, application/x-xz)
2018-03-27 04:21 UTC, sergeidanilov
crash_with_max_amsdu_len_1800.tar.xz (279.00 KB, application/x-xz)
2018-03-28 04:52 UTC, sergeidanilov
fix candidate (1.09 KB, patch)
2018-04-08 09:31 UTC, Emmanuel Grumbach
fix candidate (1.20 KB, patch)
2018-04-08 10:30 UTC, Emmanuel Grumbach

Description sergeidanilov 2018-03-26 05:38:18 UTC
I have my NAS mounted on Linux over NFS, with no custom options:
192.168.1.2:/nfs/Storage /mnt/storage

The 9260 works perfectly for downloads with the patch from bug 198351.

However, it almost immediately hangs my whole OS (UI, keyboard) when I try to upload something. The upload speed drops to about 0.5 MB/s.
The logs show a crash stack trace with "Microcode SW error detected. Restarting 0x101."

Attaching the journalctl and dmesg output with the Microcode SW error.

I also sniffed the network traffic during the upload.
Here is a link to the ~70MB archive: https://drive.google.com/open?id=1unH32-e5EAB6un_dGdtsfOFy6HzxJqRQ
Comment 1 sergeidanilov 2018-03-26 05:38:37 UTC
Created attachment 274939 [details]
dmesg output
Comment 2 sergeidanilov 2018-03-26 05:38:55 UTC
Created attachment 274941 [details]
journalctl output
Comment 3 sergeidanilov 2018-03-26 05:43:02 UTC
My AP is a Netgear R7800.
For sniffing I configured it to run with no security at 80 MHz.
It crashes and hangs the same way at 160 MHz as well.
Comment 4 Emmanuel Grumbach 2018-03-26 08:59:17 UTC
Does it reproduce easily?
Does the FW crash always come after the WARNING in iwl_mvm_tx_tso?

I am very surprised to see this WARNING...

Can you try to disable TSO?
Comment 5 sergeidanilov 2018-03-26 20:50:09 UTC
Yes,
I will collect several crashes today to see whether it always comes after the warning.

Just to confirm: will it be enough to do something like
ethtool -K eth0 tso off
to disable TSO?
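
A minimal sketch for checking whether such a change actually took effect, assuming `ethtool -k <iface>` prints a `tcp-segmentation-offload:` line for the interface. The helper name `tso_state` and the interface name `wlan0` are illustrative and not from this report; the Wi-Fi interface is usually not `eth0`.

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and
# print the TSO state ("on" or "off"), ignoring a trailing "[fixed]".
tso_state() {
    awk '/^tcp-segmentation-offload:/ { print $2 }'
}

# Usage (interface name is an assumption):
#   ethtool -K wlan0 tso off
#   ethtool -k wlan0 | tso_state
```

If the state still reads "on" after the `-K` call, the driver did not accept the change and TSO has to be disabled another way.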
Comment 6 Emmanuel Grumbach 2018-03-26 20:55:26 UTC
(In reply to sergeidanilov from comment #5)
> Yes,
> I will collect several crashes today to see whether it always comes after the warning.

Thanks.

>
> Just to confirm: will it be enough to do something like
> ethtool -K eth0 tso off
> to disable TSO?

Hm... on my system it doesn't work.. :(

The easiest and safest is probably this:

diff --git a/drivers/net/wireless/intel/iwlwifi/cfg/9000.c b/drivers/net/wireless/intel/iwlwifi/cfg/9000.c
index 83d47eb92724..7ef3572ccc26 100644
--- a/drivers/net/wireless/intel/iwlwifi/cfg/9000.c
+++ b/drivers/net/wireless/intel/iwlwifi/cfg/9000.c
@@ -145,7 +145,7 @@ static const struct iwl_tt_params iwl9000_tt_params = {
        .dccm2_len = IWL9000_DCCM2_LEN,                                 \
        .smem_offset = IWL9000_SMEM_OFFSET,                             \
        .smem_len = IWL9000_SMEM_LEN,                                   \
-       .features = IWL_TX_CSUM_NETIF_FLAGS | NETIF_F_RXCSUM,           \
+       .features = NETIF_F_RXCSUM,                                     \
        .thermal_params = &iwl9000_tt_params,                           \
        .apmg_not_supported = true,                                     \
        .mq_rx_supported = true,                                        \
Comment 7 sergeidanilov 2018-03-27 04:20:26 UTC
Emmanuel,
Yes, the crash always happens after the WARNING in iwl_mvm_tx_tso.
I attached three runs in consistent_crash_after_warning.tar.xz.

After disabling TSO it works just fine!
Logs are attached as logs_without_tso.tar.xz.
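
The "does every firmware crash follow the WARNING?" check can be sketched as a log scan. This assumes the kernel log contains the strings `iwl_mvm_tx_tso` and `Microcode SW error detected` as quoted in this report; the function name `crash_after_warning` is illustrative.

```shell
# Hypothetical helper: read a kernel log on stdin and print
# "<crashes preceded by an iwl_mvm_tx_tso line>/<total crashes>".
crash_after_warning() {
    awk '
        /iwl_mvm_tx_tso/ { seen = 1 }
        /Microcode SW error detected/ {
            total++
            if (seen) after++
            seen = 0    # each crash needs its own preceding warning
        }
        END { printf "%d/%d\n", after, total }
    '
}

# Usage:
#   dmesg | crash_after_warning
```

Equal numbers on both sides of the slash mean every crash in the log was preceded by the warning.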
Comment 8 sergeidanilov 2018-03-27 04:20:58 UTC
Created attachment 274955 [details]
consistent_crash_after_warning
Comment 9 sergeidanilov 2018-03-27 04:21:21 UTC
Created attachment 274957 [details]
logs_without_tso
Comment 10 Emmanuel Grumbach 2018-03-27 06:14:38 UTC
Can you please restore the original code and do (as root of course):

echo 1800 >  /sys/kernel/debug/iwlwifi/0000\:02\:00.0/iwlmvm/max_amsdu_len 


Thanks.
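
The PCI address in this debugfs path differs between machines. A sketch that writes the value for every iwlwifi device found, assuming debugfs is mounted at /sys/kernel/debug and the `iwlmvm/max_amsdu_len` entry exists as in the command above; the function name `set_max_amsdu_len` and its base-directory parameter are illustrative.

```shell
# Hypothetical helper: write <len> to max_amsdu_len under every device
# directory in <base> (normally /sys/kernel/debug/iwlwifi).
set_max_amsdu_len() {
    base=$1
    len=$2
    for f in "$base"/*/iwlmvm/max_amsdu_len; do
        # Skip the unexpanded glob when no device directory matches.
        [ -e "$f" ] || continue
        echo "$len" > "$f"
    done
}

# Usage (as root):
#   set_max_amsdu_len /sys/kernel/debug/iwlwifi 1800
```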
Comment 11 Emmanuel Grumbach 2018-03-27 06:28:32 UTC
Also, does this problem happen with iperf? Or only under load with NFS?
Comment 12 sergeidanilov 2018-03-28 04:52:14 UTC
Created attachment 274975 [details]
crash_with_max_amsdu_len_1800.tar.xz
Comment 14 sergeidanilov 2018-03-28 04:55:12 UTC
I tried to reproduce it with iperf, and also with CIFS and FTP.
It works just fine with all of them.

It looks like the problem is specific to NFS. It's consistent with NFS and happens immediately, even at slow network speeds like 1 MB/s and with small files.

After I did echo 1800 > /sys/kernel/debug/iwlwifi/0000\:03\:00.0/iwlmvm/max_amsdu_len
the crash still happens.

I collected new logs in crash_with_max_amsdu_len_1800.tar.xz
Comment 15 Emmanuel Grumbach 2018-03-28 05:51:05 UTC
Ok.... Interesting... 

Thanks.

I am on vacation right now, but I'll try to think about what could be causing this issue.
Comment 16 Emmanuel Grumbach 2018-04-08 08:46:35 UTC
I got back to this.

I could reproduce.

WIP.
Comment 17 Emmanuel Grumbach 2018-04-08 09:31:01 UTC
Created attachment 275161 [details]
fix candidate

Please test the attached patch.
Comment 18 Emmanuel Grumbach 2018-04-08 10:30:07 UTC
Created attachment 275163 [details]
fix candidate

new version.
Comment 19 Emmanuel Grumbach 2018-04-08 11:16:07 UTC
FWIW: I verified that the problem doesn't reproduce with the second version of the fix.
Please confirm that it fixes the problem for you as well.

Waiting for your input before closing the bug.
Comment 20 sergeidanilov 2018-04-08 20:08:22 UTC
I confirm the patch works.
Thanks Emmanuel!
