Bug 196715

Summary: iwlwifi: 8265/3168: Packet Injection causes queue hang - WIFILNX-1331
Product: Drivers
Reporter: rosenp
Component: network-wireless
Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED CODE_FIX
Severity: normal
CC: luca
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.12.5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel oops
full log
kernel 4.13.9 not 4.3.
Fix candidate

Description rosenp 2017-08-21 06:33:55 UTC
Created attachment 258035 [details]
kernel oops

As the title says, injecting packets with iwlwifi crashes the driver. The journald log is attached.
Comment 1 Luca Coelho 2017-08-26 17:26:14 UTC
Do you actually get an oops, or just this warning? Even the warning shouldn't happen, but it's not as serious as an oops (obviously).

Can you describe exactly what you did to get this? Also, which NIC is this? It's not so important, but sometimes it's useful.
Comment 2 rosenp 2017-08-26 18:45:55 UTC
I reported it as an oops since these lines followed:

Aug 20 23:28:10 mangix.clevo abrt-dump-journal-oops[868]: abrt-dump-journal-oops: Found oopses: 1
Aug 20 23:28:10 mangix.clevo abrt-dump-journal-oops[868]: abrt-dump-journal-oops: Creating problem directories
Aug 20 23:28:10 mangix.clevo abrt-server[3523]: Deleting problem directory oops-2017-08-20-23:28:10-868-0 (dup of oops-2017-08-18-14:48:49-855-0)
Aug 20 23:28:10 mangix.clevo abrt-notification[3530]: System encountered a non-fatal error in iwl_mvm_set_tx_cmd()
Aug 20 23:28:11 mangix.clevo abrt-dump-journal-oops[868]: Reported 1 kernel oopses to Abrt


I guess it's only a warning.

wireless card is an 8265: 

04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 50)
	Subsystem: Intel Corporation Device [8086:0150]

To get this warning, only an injection test is necessary. 

"aireplay-ng wlan0 -9"

The test fails since injection is not working. I cannot kick myself off the network either.

Firmware is -27.
Comment 3 Luca Coelho 2017-08-26 18:50:21 UTC
Thanks! We'll look into it.
Comment 4 Luca Coelho 2017-09-01 19:32:23 UTC
I created an internal tracking ticket.  I'll report here as it proceeds.
Comment 5 Luca Coelho 2017-10-13 08:30:06 UTC
I checked this issue again and, apparently, it was fixed with the following commit:

6e46496302df ("iwlwifi: mvm: remove DQA non-STA client mode special case")

This patch is in 4.13 and above, but didn't go into 4.12 stable.  Since 4.12 is EOL, can you upgrade to 4.13 and retry?
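
One way to check whether a given kernel release contains that commit, sketched under the assumption that a local clone of the kernel git tree with the relevant release tags is available (stable tags such as v4.13.3 exist only in the linux-stable tree). The function name and its arguments are hypothetical.

```shell
# Hypothetical helper: succeeds if commit 6e46496302df is an ancestor of the
# given ref (for example a release tag) in the given kernel git tree.
# usage: has_dqa_fix /path/to/linux v4.13
has_dqa_fix() {
    git -C "$1" merge-base --is-ancestor 6e46496302df "$2"
}
```

For example, `has_dqa_fix /path/to/linux v4.13 && echo "fix present"` prints the message only when the tag contains the commit.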
Comment 6 rosenp 2017-10-13 22:58:51 UTC
You're going to hate me for this, but I replaced my 8265 in my laptop with an ath9k card. I'll need to find it if I want to do any more testing.

However, I recently built a desktop with built-in Intel wifi. Although the chip is different (3168), it still errors on kernel 4.13.3, which has the aforementioned commit. I uploaded a full log, including early boot. The command I ran was aireplay -9 wlp9somon.

From what I see, on this chip it's an error with firmware and not the driver. So the issue that I originally reported is probably fixed.
Comment 7 rosenp 2017-10-13 22:59:47 UTC
Created attachment 258819 [details]
full log
Comment 8 rosenp 2017-11-02 00:39:25 UTC
Installed the 8265 in my laptop again to debug the other issue. I tried again and the result is even worse. SMBus got messed up and as a result I was unable to use my mouse. Luckily my keyboard doesn't use SMBus. Kernel is 4.3.9 and firmware is version 31.532993.0. The SMBus issue seems unrelated, but this is the first time I've gotten it. Log attached.
Comment 9 rosenp 2017-11-02 00:40:32 UTC
Created attachment 260463 [details]
kernel 4.13.9 not 4.3.
Comment 10 Luca Coelho 2017-11-13 08:24:09 UTC
Okay, thanks for reporting and testing again.

The old bug seems to be fixed but we got a new one.  I'll rename this report to better reflect the new issue.

I don't think the SMBus issue has anything to do with it though.

Both cases (i.e. with 3168 and 8265) seem to be the same now: a queue hang after packet injection.
Comment 11 Luca Coelho 2017-11-17 17:22:28 UTC
I investigated this a bit more and I think the reason is that we should not have the watchdog for injected frames.  I think we are not getting the ACKs from the injected frames, so the watchdog kicks in.

Can you try to load the iwlmvm module with tfd_q_hang_detect=0 (module parameter)? That will disable the queue hang detection.  If you see that the injected frames go out properly and the watchdog doesn't kick in, the problem is solved. :)
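
A minimal sketch of one way to apply the suggestion above, assuming iwlmvm is built as a loadable module, root access via sudo, and that nothing else keeps the module in use. The function and variable names are hypothetical. The sysfs path follows the usual /sys/module/<module>/parameters/<name> layout and only exists if the driver exports the parameter as readable.

```shell
# Hypothetical helper: reload iwlmvm with queue hang detection disabled.
# Assumes root via sudo; the wireless connection drops while the module reloads.
reload_iwlmvm_without_hang_detect() {
    sudo modprobe -r iwlmvm &&
    sudo modprobe iwlmvm tfd_q_hang_detect=0
}

# Standard sysfs location for a module parameter (present only if the
# driver exports it as readable).
hang_detect_param=/sys/module/iwlmvm/parameters/tfd_q_hang_detect

# Hypothetical helper: print the current value, if it is exposed.
show_hang_detect() {
    if [ -r "$hang_detect_param" ]; then
        cat "$hang_detect_param"
    else
        echo "parameter not exposed (is iwlmvm loaded?)"
    fi
}
```

To keep the setting across reboots, the same option can instead go into a modprobe configuration file, as a line `options iwlmvm tfd_q_hang_detect=0` in a .conf file under /etc/modprobe.d/.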
Comment 12 rosenp 2017-11-17 23:14:21 UTC
Looks like you're right. No ACK messages come from injected frames. Tested with aireplay-ng and a different utility.

As for the module parameter, it got rid of all the error messages :). It still doesn't fix the ACK messages not coming in, though. Will do further testing.
Comment 13 Emmanuel Grumbach 2017-11-19 08:58:30 UTC
Created attachment 260715 [details]
Fix candidate

Hi,

can you please try this?
Comment 14 Emmanuel Grumbach 2017-11-19 20:37:24 UTC
The patch is merged in our backport tree.

Will close once you have tested it.
Comment 15 Emmanuel Grumbach 2017-11-22 14:59:20 UTC
Please reopen if needed.