Bug 203315
Summary: | iwlwifi: 8260: ASSERT 0xEDC with A-MSDU - WIFI-23915 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Johannes Hirte (johannes.hirte) |
Component: | network-wireless | Assignee: | DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | aes368, andersk, cvandesande, gjunk2, hslager, j.straight-kernelbugs, luca, pillarsdotnet, rthomsen6, semi225599, slavi, snow3461, steven, vaaghoofdharry, vladi, wberrier |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=204009 | ||
Kernel Version: | 5.1.0-rc | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
full dmesg log
fw-dump1
fw-dump2
latest firmware for 8260
latest firmware for 8260
latest firmware for 8260 with debug enabled
dmesg with debug-firmware |
Description
Johannes Hirte
2019-04-15 06:54:57 UTC
Still an issue with 5.1-rc7. Anything I can do that helps fixing this?

How often do you get this?

(In reply to Emmanuel Grumbach from comment #2)
> How often do you get this?

When working via NFS, I get this every day. It may take some hours, depending on the workload, but it has always happened. With simple browsing or ssh I've not encountered this. I'll check whether other workloads like rsync trigger this too.

What you can try is to enable debug on iwlwifi:

debug=0xC0800000

Along with the firmware dump [1], this may help to see what happens..

[1] - https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

Please take the time to read the privacy notice here:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects

(In reply to Emmanuel Grumbach from comment #4)
> What you can try is to enable debug on iwlwifi:
>
> debug=0xC0800000
>
> Along with the firmware dump [1], this may help to see what happens..
>
> [1] - https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging
>
> Please take the time to read the privacy notice here:
> https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects

I've loaded iwlwifi with debug=0xC0800000, but /sys/kernel/debug/iwlwifi/ is still missing. CONFIG_ALLOW_DEV_COREDUMP=Y is set. Do I need any other kernel-config?

However, the bug depends on speed and direction. Pushing data in the LAN with rsync triggers the bug immediately, whereas pulling in LAN works perfectly. Synced more than 30GB without any problem. Syncing data via ISP works too. It only happens when pushing data with high speed.

I am not surprised. This firmware crash is about the Tx path. If you can reproduce quickly, you can try to run with tracing. Don't bother with debugfs for iwlwifi. Just put the udev rule in place and check that you have a dump being created when the firmware crashes. You don't need debugfs for that.

Created attachment 282591 [details]
fw-dump1
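The debug=0xC0800000 value suggested earlier in the thread is a bitmask passed as the iwlwifi module parameter. As a minimal sketch, the snippet below only lists which bit positions are set in that value. The helper name set_bits is made up for illustration, and which debug category each bit selects is defined by the driver source and is deliberately not guessed here.

```python
def set_bits(mask: int) -> list[int]:
    """Return the positions of the bits set in mask, lowest bit first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Value quoted in the thread for the iwlwifi "debug" module parameter.
DEBUG_MASK = 0xC0800000

if __name__ == "__main__":
    print(set_bits(DEBUG_MASK))  # [23, 30, 31]
```

Three bits are set, so the mask enables three debug categories rather than full verbose logging.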
Created attachment 282593 [details]
fw-dump2
corresponding crash:
[ 210.734926] iwlwifi 0000:02:00.0: Microcode SW error detected. Restarting 0x82000000.
[ 210.735095] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 210.735098] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 6
[ 210.735100] iwlwifi 0000:02:00.0: Loaded firmware version: 36.9f0a2d68.0
[ 210.735103] iwlwifi 0000:02:00.0: 0x00000EDC | ADVANCED_SYSASSERT
[ 210.735106] iwlwifi 0000:02:00.0: 0x00000220 | trm_hw_status0
[ 210.735108] iwlwifi 0000:02:00.0: 0x00000000 | trm_hw_status1
[ 210.735110] iwlwifi 0000:02:00.0: 0x00024180 | branchlink2
[ 210.735112] iwlwifi 0000:02:00.0: 0x000397EA | interruptlink1
[ 210.735113] iwlwifi 0000:02:00.0: 0x00000000 | interruptlink2
[ 210.735115] iwlwifi 0000:02:00.0: 0x0D8E001C | data1
[ 210.735117] iwlwifi 0000:02:00.0: 0x20000606 | data2
[ 210.735119] iwlwifi 0000:02:00.0: 0x000018B8 | data3
[ 210.735120] iwlwifi 0000:02:00.0: 0xC44010C9 | beacon time
[ 210.735122] iwlwifi 0000:02:00.0: 0xCAA4173A | tsf low
[ 210.735124] iwlwifi 0000:02:00.0: 0x000007CA | tsf hi
[ 210.735125] iwlwifi 0000:02:00.0: 0x00000000 | time gp1
[ 210.735127] iwlwifi 0000:02:00.0: 0x0C1DFEB9 | time gp2
[ 210.735129] iwlwifi 0000:02:00.0: 0x00000001 | uCode revision type
[ 210.735131] iwlwifi 0000:02:00.0: 0x00000024 | uCode version major
[ 210.735132] iwlwifi 0000:02:00.0: 0x9F0A2D68 | uCode version minor
[ 210.735134] iwlwifi 0000:02:00.0: 0x00000201 | hw version
[ 210.735136] iwlwifi 0000:02:00.0: 0x18489008 | board version
[ 210.735138] iwlwifi 0000:02:00.0: 0x20000606 | hcmd
[ 210.735139] iwlwifi 0000:02:00.0: 0x24022001 | isr0
[ 210.735141] iwlwifi 0000:02:00.0: 0x00000000 | isr1
[ 210.735143] iwlwifi 0000:02:00.0: 0x08001802 | isr2
[ 210.735144] iwlwifi 0000:02:00.0: 0x00417CC1 | isr3
[ 210.735146] iwlwifi 0000:02:00.0: 0x00000000 | isr4
[ 210.735148] iwlwifi 0000:02:00.0: 0x0D62001C | last cmd Id
[ 210.735149] iwlwifi 0000:02:00.0: 0x00000000 | wait_event
[ 210.735151] iwlwifi 0000:02:00.0: 0x00000094 | l2p_control
[ 210.735153] iwlwifi 0000:02:00.0: 0x00018030 | l2p_duration
[ 210.735155] iwlwifi 0000:02:00.0: 0x0000000F | l2p_mhvalid
[ 210.735157] iwlwifi 0000:02:00.0: 0x00000085 | l2p_addr_match
[ 210.735159] iwlwifi 0000:02:00.0: 0x0000000D | lmpm_pmg_sel
[ 210.735161] iwlwifi 0000:02:00.0: 0x04120134 | timestamp
[ 210.735163] iwlwifi 0000:02:00.0: 0x00007880 | flow_handler
[ 210.735232] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 210.735234] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 7
[ 210.735237] iwlwifi 0000:02:00.0: 0x00000070 | NMI_INTERRUPT_LMAC_FATAL
[ 210.735238] iwlwifi 0000:02:00.0: 0x00000000 | umac branchlink1
[ 210.735240] iwlwifi 0000:02:00.0: 0xC0086AA4 | umac branchlink2
[ 210.735242] iwlwifi 0000:02:00.0: 0xC0083C90 | umac interruptlink1
[ 210.735244] iwlwifi 0000:02:00.0: 0xC0083C90 | umac interruptlink2
[ 210.735246] iwlwifi 0000:02:00.0: 0x00000800 | umac data1
[ 210.735247] iwlwifi 0000:02:00.0: 0xC0083C90 | umac data2
[ 210.735249] iwlwifi 0000:02:00.0: 0xDEADBEEF | umac data3
[ 210.735251] iwlwifi 0000:02:00.0: 0x00000024 | umac major
[ 210.735252] iwlwifi 0000:02:00.0: 0x9F0A2D68 | umac minor
[ 210.735254] iwlwifi 0000:02:00.0: 0xC088628C | frame pointer
[ 210.735256] iwlwifi 0000:02:00.0: 0xC088628C | stack pointer
[ 210.735258] iwlwifi 0000:02:00.0: 0x004A014E | last host cmd
[ 210.735260] iwlwifi 0000:02:00.0: 0x00000000 | isr status reg
[ 210.735265] ieee80211 phy0: Hardware restart was requested
[ 211.497703] iwlwifi 0000:02:00.0: Microcode SW error detected. Restarting 0x2000000.
[ 211.497870] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 211.497873] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 6
[ 211.497875] iwlwifi 0000:02:00.0: Loaded firmware version: 36.9f0a2d68.0
[ 211.497878] iwlwifi 0000:02:00.0: 0x0000105C | ADVANCED_SYSASSERT
[ 211.497881] iwlwifi 0000:02:00.0: 0x00000220 | trm_hw_status0
[ 211.497883] iwlwifi 0000:02:00.0: 0x00000000 | trm_hw_status1
[ 211.497885] iwlwifi 0000:02:00.0: 0x00024180 | branchlink2
[ 211.497887] iwlwifi 0000:02:00.0: 0x000397EA | interruptlink1
[ 211.497888] iwlwifi 0000:02:00.0: 0x00000000 | interruptlink2
[ 211.497890] iwlwifi 0000:02:00.0: 0xDEADBEEF | data1
[ 211.497892] iwlwifi 0000:02:00.0: 0xDEADBEEF | data2
[ 211.497894] iwlwifi 0000:02:00.0: 0xDEADBEEF | data3
[ 211.497895] iwlwifi 0000:02:00.0: 0x00012F69 | beacon time
[ 211.497897] iwlwifi 0000:02:00.0: 0xBE86F6D3 | tsf low
[ 211.497899] iwlwifi 0000:02:00.0: 0x000007CB | tsf hi
[ 211.497900] iwlwifi 0000:02:00.0: 0x00000000 | time gp1
[ 211.497902] iwlwifi 0000:02:00.0: 0x0000D746 | time gp2
[ 211.497904] iwlwifi 0000:02:00.0: 0x00000001 | uCode revision type
[ 211.497906] iwlwifi 0000:02:00.0: 0x00000024 | uCode version major
[ 211.497907] iwlwifi 0000:02:00.0: 0x9F0A2D68 | uCode version minor
[ 211.497909] iwlwifi 0000:02:00.0: 0x00000201 | hw version
[ 211.497911] iwlwifi 0000:02:00.0: 0x18489008 | board version
[ 211.497913] iwlwifi 0000:02:00.0: 0x0D92001C | hcmd
[ 211.497914] iwlwifi 0000:02:00.0: 0x24022001 | isr0
[ 211.497916] iwlwifi 0000:02:00.0: 0x00000000 | isr1
[ 211.497918] iwlwifi 0000:02:00.0: 0x08001802 | isr2
[ 211.497919] iwlwifi 0000:02:00.0: 0x004134C1 | isr3
[ 211.497921] iwlwifi 0000:02:00.0: 0x00000000 | isr4
[ 211.497923] iwlwifi 0000:02:00.0: 0x0A6F001C | last cmd Id
[ 211.497924] iwlwifi 0000:02:00.0: 0x00000000 | wait_event
[ 211.497926] iwlwifi 0000:02:00.0: 0x000000D4 | l2p_control
[ 211.497928] iwlwifi 0000:02:00.0: 0x00018030 | l2p_duration
[ 211.497930] iwlwifi 0000:02:00.0: 0x00000007 | l2p_mhvalid
[ 211.497931] iwlwifi 0000:02:00.0: 0x00000081 | l2p_addr_match
[ 211.497933] iwlwifi 0000:02:00.0: 0x0000000D | lmpm_pmg_sel
[ 211.497935] iwlwifi 0000:02:00.0: 0x04120134 | timestamp
[ 211.497936] iwlwifi 0000:02:00.0: 0x00005868 | flow_handler
[ 211.498011] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 211.498013] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 7
[ 211.498015] iwlwifi 0000:02:00.0: 0x00000070 | NMI_INTERRUPT_LMAC_FATAL
[ 211.498017] iwlwifi 0000:02:00.0: 0x00000000 | umac branchlink1
[ 211.498018] iwlwifi 0000:02:00.0: 0xC0086AA4 | umac branchlink2
[ 211.498020] iwlwifi 0000:02:00.0: 0xC0083C90 | umac interruptlink1
[ 211.498022] iwlwifi 0000:02:00.0: 0xC0083C90 | umac interruptlink2
[ 211.498023] iwlwifi 0000:02:00.0: 0x00000800 | umac data1
[ 211.498025] iwlwifi 0000:02:00.0: 0xC0083C90 | umac data2
[ 211.498027] iwlwifi 0000:02:00.0: 0xDEADBEEF | umac data3
[ 211.498029] iwlwifi 0000:02:00.0: 0x00000024 | umac major
[ 211.498030] iwlwifi 0000:02:00.0: 0x9F0A2D68 | umac minor
[ 211.498032] iwlwifi 0000:02:00.0: 0xC088628C | frame pointer
[ 211.498034] iwlwifi 0000:02:00.0: 0xC088628C | stack pointer
[ 211.498035] iwlwifi 0000:02:00.0: 0x0050019C | last host cmd
[ 211.498037] iwlwifi 0000:02:00.0: 0x00000000 | isr status reg
[ 211.498041] ieee80211 phy0: Hardware restart was requested
[ 211.985750] iwlwifi 0000:02:00.0: Failing on timeout while stopping DMA channel 8 [0x07fe0003]
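When comparing several crashes, it can help to pull the "0xVALUE | name" pairs out of each error log dump instead of reading them by eye. The following is a rough sketch that assumes the dmesg line layout shown in the log above; the function name parse_error_dump and the SAMPLE excerpt are illustrative only and not part of any iwlwifi tooling.

```python
import re

# Matches the "0xVALUE | name" tail of an iwlwifi error-log line.
ENTRY_RE = re.compile(r"(0x[0-9A-Fa-f]{8}) \| (.+?)\s*$")


def parse_error_dump(lines):
    """Return one list of (name, value) pairs per 'Start IWL Error Log Dump' block."""
    dumps = []
    for line in lines:
        if "Start IWL Error Log Dump" in line:
            dumps.append([])
            continue
        match = ENTRY_RE.search(line)
        if match and dumps:
            value, name = match.groups()
            dumps[-1].append((name, int(value, 16)))
    return dumps


# Excerpt copied from the first crash in the log above.
SAMPLE = """\
[ 210.735095] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
[ 210.735098] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 6
[ 210.735103] iwlwifi 0000:02:00.0: 0x00000EDC | ADVANCED_SYSASSERT
[ 210.735131] iwlwifi 0000:02:00.0: 0x00000024 | uCode version major
[ 210.735132] iwlwifi 0000:02:00.0: 0x9F0A2D68 | uCode version minor
"""

if __name__ == "__main__":
    first = parse_error_dump(SAMPLE.splitlines())[0]
    name, code = first[0]  # first entry of a block is the error code and its label
    fields = dict(first)
    print(f"{name}: 0x{code:X}")
    print(f"{fields['uCode version major']}.{fields['uCode version minor']:08x}")
```

In this log, the major and minor values decode to 36.9f0a2d68, which agrees with the "Loaded firmware version: 36.9f0a2d68.0" line of the same dump.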
Excellent. Did you have tracing running by chance?

(In reply to Emmanuel Grumbach from comment #9)
> Excellent. Did you have tracing running by chance?

The resulting trace is 2.2GB in size. Even compressed (zstd), it's still 142MB. If there isn't a possibility to upload this, I can prepare a download tomorrow.

trace can be downloaded from https://datenkhaos.de/trace.dat.gpg

I got it. You can remove it if you want.

I started to look and the firmware / hardware is getting confused and loses track of the Tx FIFO. I'll try to get someone from the right team to look at this.

Created attachment 282617 [details]
latest firmware for 8260
Can you please try the latest firmware we have for 8260?
I attached it here. There were a few fixes.
Thank you.
(In reply to Emmanuel Grumbach from comment #14)
> Created attachment 282617 [details]
> latest firmware for 8260
>
> Can you please try the latest firmware we have for 8260?
> I attached it here. There were a few fixes.
>
> Thank you.

Still the same error.

I've bisected this to:

8dd2cea8b65012003234b97c7f3dfaa61a3b4bd8 is the first bad commit
commit 8dd2cea8b65012003234b97c7f3dfaa61a3b4bd8
Author: Ilan Peer <ilan.peer@intel.com>
Date: Wed Oct 17 16:51:59 2018 +0300

iwlwifi: mvm: Do not set RTS/CTS protection for P2P Device MAC

As this is not needed and might cause interoperability issues during
pairing with devices that would not reply to RTS frames.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

:040000 040000 5cd6552c692631a4caa302ae4adf379b5e883547 6fd40358be68390d8aad1b7ca4932fd688aa615f M drivers

Ok, either the bisect is wrong or it's only part of the problem. Simply reverting this commit doesn't solve the problem. Tested on top of 5.1-rc7.

This can't really be the culprit. You are not using P2P Device I presume.. and even then, it wouldn't be related to the ASSERT EDC you're seeing. I opened a ticket on the relevant team.

Created attachment 282795 [details]
latest firmware for 8260
Hey,
just to be sure, can you please test with the latest firmware attached here?
I doubt it'll make any difference, but I just want to know that it has not been fixed already.
Thanks.
(In reply to Emmanuel Grumbach from comment #18)
> Created attachment 282795 [details]
> latest firmware for 8260
>
> Hey,
>
> just to be sure, can you please test with the latest firmware attached here?
> I doubt it'll make any difference, but I just want to know that it has not
> been fixed already.
> Thanks.

Still happens with this firmware. Tested with kernel 5.1.3

Created attachment 282821 [details]
latest firmware for 8260 with debug enabled
Hi,
please reproduce with this firmware and produce another dump.
This includes many more probes that will help us to understand what happens.
Thanks.
(In reply to Emmanuel Grumbach from comment #20)
> Created attachment 282821 [details]
> latest firmware for 8260 with debug enabled
>
> Hi,
>
> please reproduce with this firmware and produce another dump.
> This includes many more probes that will help us to understand what happens.
>
> Thanks.

Sadly with this firmware the speed is so limited, I'm not able to reproduce the bug. With debug-firmware, I achieve only 2-3MB/s, without debug it's more than 40MB/s.

This is very very strange.
Can you please roll-back to the previous just to make sure?
Can you share the dmesg output of the run?

(In reply to Emmanuel Grumbach from comment #22)
> This is very very strange.
> Can you please roll-back to the previous just to make sure?

I've double-checked, and it's always the same. With debug-firmware, speed drops to 2-3MB/s.

Created attachment 282823 [details]
dmesg with debug-firmware
Just to be sure, I've tested this with linux 5.1.3 and 5.0.16. With both I get the slowdown with the debug-firmware.

Thanks. And that slowdown prevents the original bug (ASSERT EDC to occur?) I'll let the relevant know..

(In reply to Emmanuel Grumbach from comment #26)
> Thanks. And that slowdown prevents the original bug (ASSERT EDC to occur?)

Yes, I've not tried to figure out what speeds are necessary for triggering this bug. With 10MB/s it didn't happen, but with 40MB/s and more the bug occurs within seconds.

Ok, thanks.

I think I was wrong when I started to look at the firmware. This smells more like a driver issue now.

Is bisection something you could consider? If not, we'll try to see on our end.

*** Bug 203649 has been marked as a duplicate of this bug. ***

(In reply to Emmanuel Grumbach from comment #28)
> Ok, thanks.
>
> I think I was wrong when I started to look at the firmware. This smells more
> like a driver issue now.
>
> Is bisection something you could consider?

Already tried, and this ended in 8dd2cea8b65012003234b97c7f3dfaa61a3b4bd8. A problem is, there are other bugs around. I'll give it another try.

Sorry, I now see that you already mentioned that. Oh well.. Need to do a bit less multi-tasking :)

Chris, what device do you have?

My device is an Intel Corporation Wireless 7265 (rev 59). I can reproduce this within a few minutes of booting a VM with the disk on an NFS share. The crash only seems to happen when using NFS and the Wifi card is hitting fairly high transfer rates >10MiB/s

I can fairly consistently repro this without using NFS on a Lenovo T460s (Intel 8260 WiFi card). If I boot up my laptop on kernel 5.1.3, open a web browser and run a speed test while on WiFi, the speed slows to a crawl and when I check dmesg I see the same trace (although different addresses, not sure if that's relevant). Downgrading the kernel to 5.0.x fixes the issue for me. Please let me know if any additional traces would be useful.
Are you sure you see ASSERT EDC? What device do you have? Please attach the dmesg.

Ah I'm seeing 104B. Perhaps unrelated then?

[ 128.228712] iwlwifi 0000:04:00.0: Microcode SW error detected. Restarting 0x2000000.
[ 128.228853] iwlwifi 0000:04:00.0: Start IWL Error Log Dump:
[ 128.228857] iwlwifi 0000:04:00.0: Status: 0x00000100, count: 6
[ 128.228860] iwlwifi 0000:04:00.0: Loaded firmware version: 36.9f0a2d68.0
[ 128.228864] iwlwifi 0000:04:00.0: 0x0000104B | ADVANCED_SYSASSERT
[ 128.228867] iwlwifi 0000:04:00.0: 0x059002A0 | trm_hw_status0
[ 128.228870] iwlwifi 0000:04:00.0: 0x00000000 | trm_hw_status1
[ 128.228873] iwlwifi 0000:04:00.0: 0x00024180 | branchlink2
[ 128.228876] iwlwifi 0000:04:00.0: 0x000397EA | interruptlink1
[ 128.228878] iwlwifi 0000:04:00.0: 0x00000000 | interruptlink2
[ 128.228881] iwlwifi 0000:04:00.0: 0xC000E000 | data1
[ 128.228884] iwlwifi 0000:04:00.0: 0x000000A0 | data2
[ 128.228887] iwlwifi 0000:04:00.0: 0x000000E0 | data3
[ 128.228889] iwlwifi 0000:04:00.0: 0xB241251B | beacon time
[ 128.228892] iwlwifi 0000:04:00.0: 0xC5EABAED | tsf low
[ 128.228895] iwlwifi 0000:04:00.0: 0x000001E9 | tsf hi
[ 128.228898] iwlwifi 0000:04:00.0: 0x000124B3 | time gp1
[ 128.228901] iwlwifi 0000:04:00.0: 0x049105D5 | time gp2
[ 128.228903] iwlwifi 0000:04:00.0: 0x00000001 | uCode revision type
[ 128.228906] iwlwifi 0000:04:00.0: 0x00000024 | uCode version major
[ 128.228909] iwlwifi 0000:04:00.0: 0x9F0A2D68 | uCode version minor
[ 128.228911] iwlwifi 0000:04:00.0: 0x00000201 | hw version
[ 128.228914] iwlwifi 0000:04:00.0: 0x00489008 | board version
[ 128.228917] iwlwifi 0000:04:00.0: 0x0E00001C | hcmd
[ 128.228919] iwlwifi 0000:04:00.0: 0x00023801 | isr0
[ 128.228922] iwlwifi 0000:04:00.0: 0x10850000 | isr1
[ 128.228925] iwlwifi 0000:04:00.0: 0x08001912 | isr2
[ 128.228927] iwlwifi 0000:04:00.0: 0x40417CC3 | isr3
[ 128.228930] iwlwifi 0000:04:00.0: 0x00000000 | isr4
[ 128.228933] iwlwifi 0000:04:00.0: 0x0E00001C | last cmd Id
[ 128.228935] iwlwifi 0000:04:00.0: 0x00000000 | wait_event
[ 128.228938] iwlwifi 0000:04:00.0: 0x000000D4 | l2p_control
[ 128.228941] iwlwifi 0000:04:00.0: 0x00018020 | l2p_duration
[ 128.228943] iwlwifi 0000:04:00.0: 0x00000007 | l2p_mhvalid
[ 128.228946] iwlwifi 0000:04:00.0: 0x00000081 | l2p_addr_match
[ 128.228949] iwlwifi 0000:04:00.0: 0x0000000D | lmpm_pmg_sel
[ 128.228952] iwlwifi 0000:04:00.0: 0x04120134 | timestamp
[ 128.228954] iwlwifi 0000:04:00.0: 0x0000F808 | flow_handler
[ 128.229027] iwlwifi 0000:04:00.0: Start IWL Error Log Dump:
[ 128.229030] iwlwifi 0000:04:00.0: Status: 0x00000100, count: 7
[ 128.229033] iwlwifi 0000:04:00.0: 0x00000070 | NMI_INTERRUPT_LMAC_FATAL
[ 128.229035] iwlwifi 0000:04:00.0: 0x00000000 | umac branchlink1
[ 128.229037] iwlwifi 0000:04:00.0: 0xC0086AA4 | umac branchlink2
[ 128.229059] iwlwifi 0000:04:00.0: 0xC0083C90 | umac interruptlink1
[ 128.229062] iwlwifi 0000:04:00.0: 0xC0083C90 | umac interruptlink2
[ 128.229064] iwlwifi 0000:04:00.0: 0x00000800 | umac data1
[ 128.229067] iwlwifi 0000:04:00.0: 0xC0083C90 | umac data2
[ 128.229069] iwlwifi 0000:04:00.0: 0xDEADBEEF | umac data3
[ 128.229085] iwlwifi 0000:04:00.0: 0x00000024 | umac major
[ 128.229086] iwlwifi 0000:04:00.0: 0x9F0A2D68 | umac minor
[ 128.229088] iwlwifi 0000:04:00.0: 0xC088628C | frame pointer
[ 128.229089] iwlwifi 0000:04:00.0: 0xC088628C | stack pointer
[ 128.229091] iwlwifi 0000:04:00.0: 0x00AD0118 | last host cmd
[ 128.229093] iwlwifi 0000:04:00.0: 0x00000000 | isr status reg
[ 128.229098] ieee80211 phy0: Hardware restart was requested

Yes, please open a new bug.

still bisecting, but

commit b0d795a9ae558209656b18930c2b4def5f8fdfb8
Author: Mordechay Goodstein <mordechay.goodstein@intel.com>
Date: Sun Oct 21 18:27:26 2018 +0300

iwlwifi: mvm: avoid possible access out of array.

The value in txq_id can be out of array scope,
validate it before accessing the array.

Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com>
Fixes: cf961e16620f ("iwlwifi: mvm: support dqa-mode agg on non-shared queue")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

is definitively bad

this one:

commit a98e2802a654e240d9020546fd29e5632da5f848
Author: Ihab Zhaika <ihab.zhaika@intel.com>
Date: Sun Aug 5 15:05:45 2018 +0300

iwlwifi: correct one of the PCI struct names

One of the cfg struct names is mistakenly "iwl22000", when it should be
"iwl22560".

Chage-Id: If9fbfa4bceef81d028c90c98d47115fbe39da547
Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com>
Fixes: 2f7a3863191a ("iwlwifi: rename the temporary name of A000 to the official 22000")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

seems to be good, but I'm still testing

I'd bet it is 438af9698b0f161286c6e5d24255c3c231988b39

With 438af9698b0f161286c6e5d24255c3c231988b39 I can reproduce the bug. And testing

commit cfbc6c4c5b91c7725ef14465b98ac347d31f2334
Author: Sara Sharon <sara.sharon@intel.com>
Date: Tue Aug 21 15:23:39 2018 +0300

iwlwifi: mvm: support mac80211 TXQs model

produces this error:

[ 59.121470] WARNING: CPU: 1 PID: 2458 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1102 iwl_mvm_tx_mpdu+0x2e2/0x540 [iwlmvm]
[ 59.121478] Modules linked in: cmac rfcomm bnep btusb btrtl btbcm btintel bluetooth hp_wmi ecdh_generic wmi_bmof kvm_amd ccp kvm irqbypass crc32_pclmul joydev aesni_intel iwlmvm uvcvideo mac80211 aes_x86_64 videobuf2_vmalloc crypto_simd videobuf2_memops cryptd glue_helper videobuf2_v4l2 videodev snd_hda_codec_conexant psmouse snd_hda_codec_generic ledtrig_audio videobuf2_common snd_hda_codec_hdmi iwlwifi fam15h_power k10temp snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core i2c_piix4 snd_pcm hid_logitech_hidpp snd_timer cfg80211 snd realtek soundcore r8169 rfkill libphy wmi hp_wireless hid_logitech_dj rtsx_pci_sdmmc mmc_core ehci_pci ehci_hcd xhci_pci xhci_hcd rtsx_pci efivarfs autofs4
[ 59.121536] CPU: 1 PID: 2458 Comm: ssh Not tainted 5.0.0-rc1-00028-gcfbc6c4c5b91 #20
[ 59.121539] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.37 01/04/2019
[ 59.121546] RIP: 0010:iwl_mvm_tx_mpdu+0x2e2/0x540 [iwlmvm]
[ 59.121549] Code: 74 24 18 a8 08 0f 85 35 02 00 00 0f b6 4c 24 18 41 8b 55 00 4c 63 f9 f6 c2 40 74 6e 4b 8d 04 bf 83 bc c3 74 01 00 00 03 74 60 <0f> 0b 48 8b bf a0 00 00 00 48 8b 34 24 41 bc ff ff ff ff e8 26 7d
[ 59.121551] RSP: 0018:ffffa7db89463688 EFLAGS: 00010293
[ 59.121553] RAX: 0000000000000005 RBX: ffffa356afd4a820 RCX: 0000000000000001
[ 59.121555] RDX: 0000000040000050 RSI: 0000000000000001 RDI: ffffa3575c700028
[ 59.121557] RBP: ffffa35752e415a8 R08: 0000000000000008 R09: ffffa3575d7fd684
[ 59.121558] R10: 0000000000000008 R11: 0000000000004188 R12: ffffa3575d7f0500
[ 59.121560] R13: ffffa7db89463728 R14: 0000000000004188 R15: 0000000000000001
[ 59.121562] FS: 00007f0b28994740(0000) GS:ffffa3576f280000(0000) knlGS:0000000000000000
[ 59.121564] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.121565] CR2: 00002604a2adaa80 CR3: 000000031bde8000 CR4: 00000000001406e0
[ 59.121567] Call Trace:
[ 59.121577] ? __local_bh_enable_ip+0x35/0x80
[ 59.121584] iwl_mvm_tx_skb+0x1aa/0x500 [iwlmvm]
[ 59.121590] iwl_mvm_mac_itxq_xmit+0x61/0x90 [iwlmvm]
[ 59.121608] ieee80211_queue_skb+0x229/0x3a0 [mac80211]
[ 59.121620] ieee80211_tx+0xd4/0x130 [mac80211]
[ 59.121632] __ieee80211_subif_start_xmit+0x70f/0xbc0 [mac80211]
[ 59.121636] ? __bpf_prog_run32+0x34/0x60
[ 59.121641] ? skb_send_sock_locked+0x200/0x200
[ 59.121643] ? reqsk_fastopen_remove+0x150/0x150
[ 59.121654] ieee80211_subif_start_xmit+0x3e/0x2d0 [mac80211]
[ 59.121658] ? dev_queue_xmit_nit+0x247/0x250
[ 59.121661] dev_hard_start_xmit+0x90/0x210
[ 59.121664] __dev_queue_xmit+0x755/0x8d0
[ 59.121668] ? __cgroup_bpf_run_filter_skb+0x156/0x200
[ 59.121672] ip_finish_output2+0x271/0x3c0
[ 59.121675] ip_output+0x6b/0x110
[ 59.121679] __ip_queue_xmit+0x158/0x420
[ 59.121683] __tcp_transmit_skb+0x538/0xa40
[ 59.121686] tcp_write_xmit+0x373/0xfd0
[ 59.121689] __tcp_push_pending_frames+0x2d/0xc0
[ 59.121692] tcp_sendmsg_locked+0x4b6/0xcd0
[ 59.121695] tcp_sendmsg+0x23/0x40
[ 59.121698] sock_sendmsg+0x34/0x40
[ 59.121702] sock_write_iter+0x8a/0xf0
[ 59.121706] __vfs_write+0x138/0x190
[ 59.121710] vfs_write+0xb1/0x190
[ 59.121712] ksys_write+0x4a/0xb0
[ 59.121715] do_syscall_64+0x50/0x170
[ 59.121719] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 59.121722] RIP: 0033:0x7f0b2889a828
[ 59.121724] Code: 00 90 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 b5 7e 0d 00 8b 00 85 c0 75 27 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 48 8b 4c 24 28 64 48 33 0c 25 28 00 00 00
[ 59.121726] RSP: 002b:00007ffeb88e9a80 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 59.121728] RAX: ffffffffffffffda RBX: 0000000000004024 RCX: 00007f0b2889a828
[ 59.121729] RDX: 0000000000004024 RSI: 000055f759ad2d10 RDI: 0000000000000003
[ 59.121731] RBP: 000055f759a760e0 R08: 00007f0b27f5a000 R09: 0000000000000000
[ 59.121732] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000003b
[ 59.121734] R13: 0000000000000000 R14: 000000007fffffff R15: 00007ffeb88e9b18
[ 59.121737] ---[ end trace 83442b1426970953 ]---

Can you try to revert 438af9698b0f161286c6e5d24255c3c231988b39?

Thanks a lot. This is tremendously helpful.
Actually this should be enough:

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 5c52469288be..59a70f36e0f7 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -468,7 +468,6 @@ int iwl_mvm_mac_setup_register(struct iwl_mvm *mvm)
 	ieee80211_hw_set(hw, SUPPORTS_VHT_EXT_NSS_BW);
 	ieee80211_hw_set(hw, BUFF_MMPDU_TXQ);
 	ieee80211_hw_set(hw, STA_MMPDU_TXQ);
-	ieee80211_hw_set(hw, TX_AMSDU);
 	ieee80211_hw_set(hw, TX_FRAG_LIST);
 
 	if (iwl_mvm_has_tlc_offload(mvm)) {

On top of linux-5.1.3 I've reverted:

08f7d8b69aaf137db8ee0a2d7c9e6cd6383ae250
679bff239f51388a61a3cb4a512bc3a1d6e66d74
438af9698b0f161286c6e5d24255c3c231988b39 (a little tricky)

and this seems to have fixed it. I've transmitted more than 70GB without error.

Awesome. We'll take it from here. Huge thanks.

*** Bug 203719 has been marked as a duplicate of this bug. ***

I just want to add that I've also had this issue since 5.1-rc1 on 8265 (firmware 36), easily reproducible for me using `iperf -c`, and the one-liner in comment 42 fixes it here too.

I ran into this on the 8260 while forwarding internet for a device connected over wifi (seemed to work without problems without forwarding). I am grateful to have found this thread and patch in comment 42; it worked for me as well. Hope to see the fix merged in soon.

*** Bug 203783 has been marked as a duplicate of this bug. ***

*** Bug 203809 has been marked as a duplicate of this bug. ***

(In reply to Alex Spitzer from comment #47)
> I ran into this on the 8260 while forwarding internet for a device connected
> over wifi (seemed to work without problems without forwarding). I am
> grateful to have found this thread and patch in comment 42; it worked for me
> as well. Hope to see the fix merged in soon.

So, you're saying the patch in comment 42 by itself is sufficient to work around the issue?
(In reply to Johannes Berg from comment #50)
> (In reply to Alex Spitzer from comment #47)
> > I ran into this on the 8260 while forwarding internet for a device connected
> > over wifi (seemed to work without problems without forwarding). I am
> > grateful to have found this thread and patch in comment 42; it worked for me
> > as well. Hope to see the fix merged in soon.
>
> So, you're saying the patch in comment 42 by itself is sufficient to work
> around the issue?

I can confirm it does work around the issue on my 7265 wlan. But I believe it's not a fix, just a workaround. (it disables some functionality I guess)

(In reply to snow3461 from comment #51)
> I can confirm it does work around the issue on my 7265 wlan. But I believe
> it's not a fix, just a workaround. (it disable some functionality I guess)

Thanks. Yes, in a sense; it does disable some functionality that in theory lets you reach a bit higher throughput, but we enabled this functionality by accident rather than intentionally on older devices and (clearly!) never tested it carefully there, so chances are we won't actually have the bandwidth to go back to the older devices and carefully analyze and understand why it breaks there.

(FWIW, we're talking about a device that was released 4 years ago and obviously was in development significantly before that)

I do see this issue on a "newer" intel wifi (Dual Band Wireless AC 8265) and removing that include does fix my dmesg errors as well.

(In reply to Johannes Berg from comment #50)
> (In reply to Alex Spitzer from comment #47)
> > I ran into this on the 8260 while forwarding internet for a device connected
> > over wifi (seemed to work without problems without forwarding). I am
> > grateful to have found this thread and patch in comment 42; it worked for me
> > as well. Hope to see the fix merged in soon.
>
> So, you're saying the patch in comment 42 by itself is sufficient to work
> around the issue?

Yes, I commented out the "ieee80211_hw_set(hw, TX_AMSDU);" line, recompiled the iwlwifi module, and the problems went away for me. I have not tested with newer kernels however. I did this for Linux kernel version 5.1.5.

Fixed by https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/commit/?id=4ea46b6574e83c20fc5e061d354f8c4df777d7f6

Will be sent upstream through the regular process.

oops we had a hiccup in our infra. Commit is in master branch with a different commit ID.

(In reply to vladi from comment #53)
> I do see this issue on a "newer" intel wifi (Dual Band Wireless AC 8265) and
> removing that include does fix my dmesg errors as well.

I had actually been referring to 8265 when I said 4 years ago - 7265 is even older :-)

As you can see in the code change, we really supported/tested this only from 9000 series onwards. I'm still sort of curious what happens, but it remains to be seen if I can find cycles to investigate further. Most likely not.

*** Bug 203929 has been marked as a duplicate of this bug. ***

*** Bug 204009 has been marked as a duplicate of this bug. ***

The patch is still missing upstream. Any chance of getting this in before kernel 5.2 is released?

(In reply to Johannes Hirte from comment #60)
> The patch is still missing upstream. Any chance getting this in before
> kernel 5.2 will be released?

Unfortunately we're already in v5.2-rc7, so it will be difficult to still get it in for v5.2. To gauge the importance of this, does the assert (FW crash) disrupt connectivity much or is it just a hiccup from which we recover with a FW restart?

It's a performance killer. When this happens, transmission drops from ~40MB/s down to 1MB/s.

In my case (duplicate bug 203929), it drops connectivity and doesn't recover. I'm forced to rmmod/modprobe to get connectivity back. But that only keeps it working until the next time it trips this assert.

Thanks for the input! I'll try to squeeze this into v5.2. Let's see if it's accepted.
FYI: I've sent the patch and asked Dave to take it to v5.2 still. Let's see if that happens.

https://patchwork.kernel.org/patch/11029027/

The patch mentioned in C65 (11029027) is marked superseded, and it's definitely not in 5.3-rc3. Does this mean that there is a different fix in place somewhere else, or perhaps will be? Thanks

Sorry - I see it is in the pipeline now. Thank you!

*** Bug 204577 has been marked as a duplicate of this bug. *** |
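For anyone re-testing a kernel, the thread reports that the assert only appears under sustained high-rate transmit (rsync push, `iperf -c`), with roughly 40MB/s triggering it within seconds and 10MB/s not. The sketch below is a hypothetical stand-in for such a load generator, not a tool used by any commenter: the function name push and its parameters are assumptions, and it needs some TCP listener that discards data on the receiving host.

```python
import socket
import time


def push(host: str, port: int, seconds: float = 10.0, chunk: int = 1 << 16) -> float:
    """Send zero-filled data to host:port for `seconds`; return throughput in MB/s."""
    payload = bytes(chunk)
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        while time.monotonic() - start < seconds:
            sock.sendall(payload)
            sent += chunk
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid dividing by zero
    return sent / elapsed / 1e6


# Example call (address and port are placeholders for a host on the LAN):
#     print(f"{push('192.0.2.10', 5001, seconds=30):.1f} MB/s")
```

The returned rate gives a rough check of whether the run actually reached the speeds at which the thread saw the crash; a run that stays at a few MB/s says little about whether the fix works.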