Bug 172431

Summary: iwlwifi: 8260: ASSERT 1007 - WIFILNX-595
Product: Drivers Reporter: Volker Mische (volker.mische)
Component: network-wireless Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED WILL_NOT_FIX    
Severity: normal CC: awes, david.meriin, elitebadger, gomesbascoy, h8uvkyqnpfrzqyc, linuxwifi, luca, mike.cloaked, volker.mische, wael.nasreddine, willing.alexander
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=173101
https://bugzilla.kernel.org/show_bug.cgi?id=196641
Kernel Version: 4.8.0-rc6 4.10.0-rc4 Subsystem:
Regression: No Bisected commit-id:
Attachments: iwlwifi-8000C-22.ucode firmware with debugging enabled
Output of dmesg|grep iwl
My Kernel config file
dmsg output of -27 crash
dmesg|grep iwl output
Core24 with monitor
Firmware with a potential fix (Core24)
Properly signed firmware with a potential fix (Core24)
dmesg|grep iwl output 2017-05-29
dmesg|grep iwl output 2017-06-08
Syslog error for non-queue-stuck 1007 assert
dmesg|grep iwl output 2017-07-02

Description Volker Mische 2016-09-21 20:16:03 UTC
I'm getting error messages like

    Queue 2 stuck for 10000 ms.

when I'm connected to my home 2.4 GHz network with a 40 MHz channel width. It's the only network in range. I have an Intel 8260 chip. It's often Queue 2, but I've also seen it on Queue 0 or 16.

I don't have issues if I set the cfg80211_disable_40mhz_24ghz parameter and am hence connected at 20 MHz (I checked via "iw dev").
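
For anyone who wants to repeat that check, here is a minimal sketch. It assumes the parameter belongs to the cfg80211 module and that `iw dev` prints a `width: N MHz` field on its channel line; the helper name `channel_width` is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: read `iw dev` output on stdin and print the channel
# width in MHz for every interface that reports one.
channel_width() {
    sed -n 's/.*width: \([0-9][0-9]*\) MHz.*/\1/p'
}

# Assumed way to set the parameter persistently (needs root, then a reboot
# or a reload of the wireless modules):
#   echo "options cfg80211 cfg80211_disable_40mhz_24ghz=1" > /etc/modprobe.d/cfg80211.conf
# Then check the width actually in use:
#   iw dev | channel_width
```

If the workaround is active, the helper should print 20 for a 2.4 GHz connection.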

The issue is also there if I set the power_scheme to 1 or if I run on battery without being plugged into the power socket.

I think that's all the information I have for now, I hope I didn't miss anything that was mentioned in the email thread [1].

In case I should create a dump, let me know (it should be easy, it happens every few minutes).

[1]: http://marc.info/?l=linux-wireless&m=147429366505895
Comment 1 Tobias Schmetzer 2016-09-28 19:01:54 UTC
Is it only error messages, or does your Wi-Fi connection drop?
Comment 2 Volker Mische 2016-09-28 20:09:56 UTC
The Wifi connection drops, I get very slow pings back from the router.
Comment 3 Luca Coelho 2016-10-04 07:43:02 UTC
We are getting many reports of this kind of "Queue stuck" problem and are currently investigating it.  Unfortunately no ETA for a fix yet.

Meanwhile, can you check if the workarounds described in the following link help?

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise
Comment 4 Volker Mische 2016-10-04 07:47:03 UTC
Luca, please see the original bug report description for the workarounds that I've tried.
Comment 5 Luca Coelho 2016-10-04 07:57:55 UTC
Oh, I'm sorry, I forgot about that thread.  We'll come back to you with more instructions on how to get more data.  We're getting quite a few reports on this problem and I'm trying to gather all the info I can to take this up with our firmware team.
Comment 6 Luca Coelho 2016-10-11 09:54:44 UTC
Created attachment 241471 [details]
iwlwifi-8000C-22.ucode firmware with debugging enabled

Can you take some logs with the attached firmware, following the instructions here?

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

This will allow us to further debug the issues you are experiencing.

And please make sure you read and understand the privacy aspects of sending such logs to us:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Comment 7 Volker Mische 2016-10-20 20:07:37 UTC
Sorry for the long delay. I've installed the patched firmware. The instructions say that the dump will automatically be created on a kernel >= 4.1. My problem is that I don't have a "/sys/devices/virtual/devcoredump" directory, although my kernel has "CONFIG_ALLOW_DEV_COREDUMP=y" set.

I also don't have "/sys/kernel/debug/iwlwifi" although "CONFIG_IWLWIFI_DEBUG=y"

Looks like I'm still missing some Kernel configuration. Does anyone have a clue?
Comment 8 Emmanuel Grumbach 2016-10-20 20:09:57 UTC
The devcoredump directory should be created only upon a firmware crash, which will happen upon the queue stuck thing you are seeing.
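
A rough sketch of how one might look for a pending dump after a crash. It assumes the usual devcoredump layout, with one `devcdN` directory per dump and a readable `data` file inside it; the helper name `list_devcoredumps` is hypothetical.

```shell
# Hypothetical helper: list the data files of pending device coredumps.
# The directory defaults to the path mentioned in comment 7.
list_devcoredumps() {
    base=${1:-/sys/devices/virtual/devcoredump}
    if [ ! -d "$base" ]; then
        echo "no devcoredump directory (no dump pending)"
        return 1
    fi
    for d in "$base"/devcd*; do
        # Skip the unexpanded glob when no dump exists.
        [ -e "$d/data" ] && echo "$d/data"
    done
    return 0
}

# Assumed usage after a firmware crash (as root):
#   cat "$(list_devcoredumps | head -n 1)" > iwl.dump
```

A non-zero return only means the directory is absent, which by itself is expected until the firmware has crashed at least once.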
Comment 9 Volker Mische 2016-10-21 06:17:48 UTC
I saw crashes via dmesg. Anyway I'll try it tonight again. At least it's good to know that I should have a devcoredump directory.
Comment 10 Volker Mische 2016-11-22 09:49:45 UTC
Sorry for the delay again.

I still don't get a dump. It isn't automatically created. There's no "/sys/devices/virtual/devcoredump" directory.

$ ls /sys/devices/virtual/
bdi  dmi  graphics  hwmon  input  mem  misc  msr  net  powercap  sound  thermal  tty  vc  vtconsole  workqueue

Here's some info from dmesg:

[    6.139636] iwlwifi 0000:01:00.0: loaded firmware version 22.391740.0 op_mode iwlmvm
...
[ 2018.956116] iwlwifi 0000:01:00.0: Queue 16 stuck for 4000 ms.
...
[ 2018.958139] iwlwifi 0000:01:00.0: Microcode SW error detected.  Restarting 0x2000000.

I'll also attach the output of a `dmesg|grep iwl`
Comment 11 Volker Mische 2016-11-22 09:50:55 UTC
Created attachment 245561 [details]
Output of dmesg|grep iwl
Comment 12 Emmanuel Grumbach 2016-11-22 20:37:39 UTC
Can you share your compilation config file?
Do you have DEVCOREDUMP selected?
Comment 13 Volker Mische 2016-11-22 22:24:13 UTC
Created attachment 245761 [details]
My Kernel config file

I've attached my full config file, here's the DEV_COREDUMP part of it:

$ grep DEV_COREDUMP /boot/config-4.8.0-rc6
CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y

I also double checked if I'm currently running that kernel, I do.

Is there any way to check on a running system whether the setting was really applied?
Comment 14 Luca Coelho 2017-01-11 12:22:47 UTC
You need to have mac80211's debugfs flag set, but you don't have it:

 # CONFIG_MAC80211_DEBUGFS is not set

...then you can also enable CONFIG_IWLWIFI_DEBUGFS (which depends on CONFIG_MAC80211_DEBUGFS).  This should bring /sys/kernel/debug/iwlwifi to life.
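
To check all the options mentioned in this thread in one go, something like the following sketch could be used. The helper name `check_kconfig` is made up for illustration, and the config path is simply the one quoted in comment 13.

```shell
# Hypothetical helper: for each option name, report the value found in the
# given kernel config file, or "not set" if there is no CONFIG_<name>= line.
check_kconfig() {
    cfg=$1; shift
    for opt in "$@"; do
        val=$(sed -n "s/^CONFIG_${opt}=\(.*\)/\1/p" "$cfg")
        echo "CONFIG_${opt}: ${val:-not set}"
    done
}

# Example with the config file from comment 13:
#   check_kconfig /boot/config-4.8.0-rc6 DEV_COREDUMP MAC80211_DEBUGFS IWLWIFI_DEBUGFS
# On kernels built with CONFIG_IKCONFIG_PROC, the running kernel's config can
# be dumped first with:  zcat /proc/config.gz > /tmp/running-config
```

The `^` anchor keeps `CONFIG_WANT_DEV_COREDUMP` from being mistaken for `CONFIG_DEV_COREDUMP`.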
Comment 15 Luca Coelho 2017-01-23 09:25:30 UTC
Ping?

We haven't heard anything from you for a long time.  Did you have the chance to get more logs?
Comment 16 Luca Coelho 2017-01-26 06:19:40 UTC
Without more data we can't really proceed with this bug.  Marking it as resolved.

Feel free to reopen if you manage to get the data we need.

Thanks for reporting!
Comment 17 Volker Mische 2017-01-27 17:25:29 UTC
Sorry for the delay, I was on holidays and then wasn't able to reproduce it. Now it happened again, I've sent the dump as described on the wiki page.

I also upgraded my kernel to 4.10.0-rc4.
Comment 18 Luca Coelho 2017-01-27 19:50:52 UTC
Great, thanks for the logs! I'll open an internal bug with your logs.
Comment 19 Emmanuel Grumbach 2017-01-28 17:14:43 UTC
Are you sure you used the firmware that Luca added?
Seems like the debugging wasn't enabled.
Comment 20 Volker Mische 2017-01-28 17:46:08 UTC
I'm so sorry. I indeed used the wrong firmware. Now I'm using the right one, I'll email again once I see a crash.
Comment 21 Emmanuel Grumbach 2017-01-30 08:23:16 UTC
The problem is that we hear noise on the extension channel and hence we can't start transmitting anything.

Please try to move to Core24 release:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/core_release

The new core24 firmware (-27.ucode) includes fixes that may very well help.

Thank you.
Comment 22 Volker Mische 2017-01-30 13:12:53 UTC
For those who run into similar issues, want to give it a try, and run a vanilla kernel like me: make sure you increase `IWL8000_UCODE_API_MAX` in `drivers/net/wireless/intel/iwlwifi/iwl-8000.c` from 26 to 27. Otherwise your firmware won't be loaded.
Comment 23 Volker Mische 2017-01-30 13:48:20 UTC
Created attachment 253571 [details]
dmsg output of -27 crash

At the moment I don't get those "queue stuck" errors. Though the firmware crashed. I've attached my dmesg output. Should I create a new bug report?
Comment 24 Emmanuel Grumbach 2017-01-30 13:51:10 UTC
This is another issue.
Please open a new bug report.
Comment 25 Luca Coelho 2017-01-30 13:53:10 UTC
Forcing the driver to load -27 may or may not work. :)

It's a risky proposal, because we have FW versions for a reason. ;)

In any case, let's see if Core24 (or FW -27) works and if it does, we can try to pinpoint the fix.

I think you should only open a new bug report once you try it with Core24 and the -27 FW... If you mix things, unexpected things can happen.
Comment 26 Luca Coelho 2017-02-07 05:15:39 UTC
Would you be able to retest this with the real Core24 driver (you probably need an older kernel for it to work) and the -27 firmware?

Another option would be to try the wireless-testing tree, which has a driver for the -27 firmware (and is going to reach v4.11).
Comment 27 Volker Mische 2017-02-07 07:05:19 UTC
Luca, the Core24 driver didn't get picked up by my 4.10.0-rc4 Kernel as the maximum version is set to 26.

Bumping that maximum version manually isn't recommended, as you mentioned above. So I don't know what the proper way would be to build a kernel that works with the Core24 driver as expected. If you have a link to some kernel source I could use, I'm happy to try it.
Comment 28 Luca Coelho 2017-02-07 09:01:09 UTC
Unfortunately our Core24 doesn't compile on top of 4.10...  The latest version on top of which it compiles is v4.8.

Our master branch, OTOH, compiles up to v4.9.  I have just pushed the latest version to our tree in git.kernel.org:

https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/
Comment 29 Volker Mische 2017-02-21 13:04:21 UTC
Created attachment 254853 [details]
dmesg|grep iwl output

I've compiled the backport master branch and installed the modules, and it also seems to load the Core24 firmware correctly. There are still frequent crashes, as you can see in the dmesg output. My kernel version is 4.8.0-rc6.
Comment 30 Emmanuel Grumbach 2017-02-21 13:43:38 UTC
That's a different problem, but you are using the right firmware.
Comment 31 Emmanuel Grumbach 2017-02-21 14:08:56 UTC
Created attachment 254855 [details]
Core24 with monitor

Can you please get the firmware dump using the firmware attached here?
Just like Luca explained in comment 6.

Thank you.
Comment 32 Emmanuel Grumbach 2017-02-21 17:38:56 UTC
The data was sent privately and is now attached to the internal ticket.
Comment 33 Luca Coelho 2017-04-26 11:54:27 UTC
According to one of our firmware developers, there are currently two known reasons for this 0x100C ASSERT to happen:

1. Some sort of noise in the platform (usually related to USB 3.0);

2. A bug for which we have a fix candidate;


For 1, we have some questions:

a) Do you use USB 3.0? If so, can you move the device you have connected to another port and see if it helps? It's also worth trying to disconnect the USB 3.0 device to see if it helps.

b) What kind of computer do you have? Is it a laptop or a tablet or...? Can you tell us the model?

c) Is Bluetooth or LTE also enabled in this device?


For 2, we'll provide you with a fix candidate to test.
Comment 34 Volker Mische 2017-04-26 12:16:45 UTC
a) I don't use USB 3.0, at least not actively. I don't have anything connected to any port. Can I somehow disable the USB 3.0 ports completely? Could it be an internal device that is connected to USB 3.0?

b) It's a laptop, a Tuxedo InfinityBook 13 (rev 2). It's one of those whitelabel machines, I've seen other ones from other countries as well. Basis seems to be a Topstar U931, for more information see http://www.pcurtis.com/helios.htm
If you want e.g. `hwinfo` output, let me know.

c) Bluetooth should be disabled, but perhaps it's on without me knowing. I can check the next time I try. It doesn't have LTE.
Comment 35 Luca Coelho 2017-04-26 12:49:42 UTC
Thanks! I'm not sure if you can disable USB 3.0 completely, probably somewhere in the bios?

We're compiling the new version of the firmware for you.  But unfortunately everything seems to be pointing to a platform noise problem. :(
Comment 36 Luca Coelho 2017-04-27 06:37:29 UTC
Created attachment 256087 [details]
Firmware with a potential fix (Core24)

Can you try to reproduce the problem with this firmware?

There is a potential fix for this issue that works in some cases, hopefully it will work for you too.
Comment 37 Volker Mische 2017-05-28 18:26:54 UTC
Sorry for the long delay. I've copied it over the older -27. I tried to use it with my 4.8 Kernel, but it didn't load correctly. This is a `dmesg|grep iwl`:

[    5.874410] Loading modules backported from iwlwifi
[    5.874410] iwlwifi-stack-public:release/LinuxCore24:5768:2a86abaf
[    5.921889] iwlwifi 0000:01:00.0: enabling device (0000 -> 0002)
[    5.946534] iwlwifi 0000:01:00.0: Direct firmware load for iwl-dbg-cfg.ini failed with error -2
[    5.948568] iwlwifi 0000:01:00.0: capa flags index 3 larger than supported by driver
[    5.949438] iwlwifi 0000:01:00.0: loaded firmware version 27.487586.1 op_mode iwlmvm
[    5.997635] iwlwifi 0000:01:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
[    5.999721] iwlwifi 0000:01:00.0: L1 Disabled - LTR Enabled
[    6.000553] iwlwifi 0000:01:00.0: L1 Disabled - LTR Enabled
[    7.042449] iwlwifi 0000:01:00.0: SecBoot CPU1 Status: 0x3090001, CPU2 Status: 0x0
[    7.042452] iwlwifi 0000:01:00.0: Failed to start INIT ucode: -110
[    7.046637] iwlwifi 0000:01:00.0: Failed to run INIT ucode: -110
[    7.046684] iwlwifi 0000:01:00.0: L1 Disabled - LTR Enabled
Comment 38 Luca Coelho 2017-05-29 06:18:29 UTC
Oh, sorry, Volker.  It seems that we compiled the firmware with a signature that only works internally.  I've asked the FW team to provide a properly signed version you can use.
Comment 39 Luca Coelho 2017-05-29 11:10:31 UTC
Created attachment 256757 [details]
Properly signed firmware with a potential fix (Core24)

Can you please try with this new firmware? This one should be properly signed now.
Comment 40 Volker Mische 2017-05-29 11:59:17 UTC
Created attachment 256759 [details]
dmesg|grep iwl output 2017-05-29

I was able to load the firmware. Sadly I still see crashes. Here's the output from `dmesg|grep iwl`.
Comment 41 Luca Coelho 2017-05-29 12:12:13 UTC
:(

Okay, thanks for testing.  Apparently we have some other possible fix candidates, I'll provide you with a new FW for testing soon.
Comment 42 Luca Coelho 2017-06-05 09:00:05 UTC
Volker, we have another firmware for you to test.  I'll send it to you by email.

Due to the complexity of the changes, I need to send you the firmware of our "mainline", which means you will have to use the driver from the master branch of our backport releases, which we publish here:

https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/

If this fixes your problem, we will backport this fix to our Core24 release so you can work with the official version.

Thanks a lot for your cooperation!
Comment 43 Volker Mische 2017-06-07 23:14:53 UTC
Created attachment 256911 [details]
dmesg|grep iwl output 2017-06-08

I've tried the new version with a 4.9 Kernel (the default one from Debian testing). I still get crashes. I've attached the `dmesg|grep iwl` output.
Comment 44 Luca Coelho 2017-06-12 06:40:30 UTC
Thanks Volker.  Back to the drawing board...
Comment 45 Emmanuel Grumbach 2017-06-18 05:02:05 UTC
*** Bug 196109 has been marked as a duplicate of this bug. ***
Comment 46 Nathan Baker 2017-06-18 06:00:39 UTC
Created attachment 257061 [details]
Syslog error for non-queue-stuck 1007 assert
Comment 47 Nathan Baker 2017-06-18 06:02:33 UTC
For what it's worth, my bug (bug 196109) was dup'd to this one because it has the same assert. I don't see the Queue Stuck message, but presumably it's the same root cause?

I've attached my syslog output in case it provides any more useful information for triage. Please see the original bug I filed for more information.
Comment 48 Emmanuel Grumbach 2017-06-19 13:27:33 UTC
@Nathan can you please reply to the questions in comment 33?

Thanks.
Comment 49 Emmanuel Grumbach 2017-06-19 13:28:39 UTC
@Nathan, can you please test the firmware from comment #39?
Comment 50 Nathan Baker 2017-06-19 21:20:10 UTC
a) Do you use USB 3.0? 
- Yes. I can't easily move the device plugged in, but next time I reboot maybe I can try. It's not a great solution though. I can't disconnect the device without dramatically changing my workflow.

b) What kind of computer you have? Is it a laptop or a tablet or...? Can you tell us the model?
- Intel NUC Skull Canyon. It's a very small form factor PC.

c) Is Bluetooth or LTE also enabled in this device?
- Yes. I am using Bluetooth.


I have installed the new firmware and the next time I reboot I will report the results. I will first try without moving the USB3 device to see if the problem clears up. If that doesn't work I will reboot again and move the USB3 device.
Comment 51 Emmanuel Grumbach 2017-06-20 03:39:46 UTC
You don't have to reboot to replace the firmware. Reloading iwlwifi kernel module is enough.
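
A minimal sketch of such a reload, assuming the op mode module is iwlmvm (as the "op_mode iwlmvm" dmesg lines in this thread suggest) and that nothing else holds the modules. The helper name `reload_iwlwifi` and the `DRY_RUN` switch are made up for illustration.

```shell
# Hypothetical helper: reload the iwlwifi driver so that a replaced firmware
# file is picked up. Set DRY_RUN=1 to print the commands instead of running
# them; running them for real needs root and drops the connection briefly.
reload_iwlwifi() {
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    # Remove the op mode module first, then the core driver, then load it again.
    $run modprobe -r iwlmvm && $run modprobe -r iwlwifi && $run modprobe iwlwifi
}

# Preview the commands:   DRY_RUN=1 reload_iwlwifi
# Then confirm the version that got loaded:   dmesg | grep "loaded firmware version"
```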
Comment 52 Nathan Baker 2017-06-20 04:26:35 UTC
Ah, cheers, should have thought of that.

> [339388.935961] Intel(R) Wireless WiFi driver for Linux
> [339388.935962] Copyright(c) 2003- 2015 Intel Corporation
> [339388.938282] iwlwifi 0000:03:00.0: Direct firmware load for
> iwlwifi-8000C-28.ucode failed with error -2
> [339388.938735] iwlwifi 0000:03:00.0: capa flags index 3 larger than
> supported by driver
> [339388.939151] iwlwifi 0000:03:00.0: loaded firmware version 27.532463.0
> op_mode iwlmvm
> [339388.943718] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC
> 8260, REV=0x208
> [339388.946085] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
> [339388.947035] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled

I hope that looks reasonably correct to you.

I'll give it 24 hours before I declare a verdict, but (at the risk of jinxing it) things are looking pretty promising so far!
Comment 53 Nathan Baker 2017-06-20 05:01:17 UTC
May have spoken too soon:

> [342369.176682] ieee80211 phy0: Hardware restart was requested
> [342369.674264] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
> [342369.674621] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
> [342369.804310] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
> [342369.804727] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
> [342379.786546] iwlwifi 0000:03:00.0: Microcode SW error detected. 
> Restarting 0x82000000.
> [342379.786556] iwlwifi 0000:03:00.0: CSR values:
> [342379.786560] iwlwifi 0000:03:00.0: (2nd byte of CSR_INT_COALESCING is
> CSR_INT_PERIODIC_REG)
> [342379.786567] iwlwifi 0000:03:00.0:        CSR_HW_IF_CONFIG_REG: 0X18c89008
> [342379.786583] iwlwifi 0000:03:00.0:          CSR_INT_COALESCING: 0X00000040
> [342379.786598] iwlwifi 0000:03:00.0:                     CSR_INT: 0X00000000
> [342379.786612] iwlwifi 0000:03:00.0:                CSR_INT_MASK: 0X00000000
> [342379.786627] iwlwifi 0000:03:00.0:           CSR_FH_INT_STATUS: 0X00000000
> [342379.786642] iwlwifi 0000:03:00.0:                 CSR_GPIO_IN: 0X00000019
> [342379.786656] iwlwifi 0000:03:00.0:                   CSR_RESET: 0X00000000
> [342379.786671] iwlwifi 0000:03:00.0:                CSR_GP_CNTRL: 0X08040005
> [342379.786686] iwlwifi 0000:03:00.0:                  CSR_HW_REV: 0X00000201
> [342379.786700] iwlwifi 0000:03:00.0:              CSR_EEPROM_REG: 0Xd55555d5
> [342379.786715] iwlwifi 0000:03:00.0:               CSR_EEPROM_GP: 0Xd55555d5
> [342379.786729] iwlwifi 0000:03:00.0:              CSR_OTP_GP_REG: 0Xd55555d5
> [342379.786744] iwlwifi 0000:03:00.0:                 CSR_GIO_REG: 0X001f0042
> [342379.786759] iwlwifi 0000:03:00.0:            CSR_GP_UCODE_REG: 0X00000000
> [342379.786773] iwlwifi 0000:03:00.0:           CSR_GP_DRIVER_REG: 0X00000000
> [342379.786788] iwlwifi 0000:03:00.0:           CSR_UCODE_DRV_GP1: 0X00000000
> [342379.786803] iwlwifi 0000:03:00.0:           CSR_UCODE_DRV_GP2: 0X00000000
> [342379.786817] iwlwifi 0000:03:00.0:                 CSR_LED_REG: 0X00000060
> [342379.786832] iwlwifi 0000:03:00.0:        CSR_DRAM_INT_TBL_REG: 0X883b40f9
> [342379.786847] iwlwifi 0000:03:00.0:        CSR_GIO_CHICKEN_BITS: 0X07800200
> [342379.786861] iwlwifi 0000:03:00.0:             CSR_ANA_PLL_CFG: 0Xd55555d5
> [342379.786876] iwlwifi 0000:03:00.0:      CSR_MONITOR_STATUS_REG: 0Xc03803c0
> [342379.786891] iwlwifi 0000:03:00.0:           CSR_HW_REV_WA_REG: 0X0001001a
> [342379.786905] iwlwifi 0000:03:00.0:        CSR_DBG_HPET_MEM_REG: 0Xffff0000
> [342379.786909] iwlwifi 0000:03:00.0: FH register values:
> [342379.786935] iwlwifi 0000:03:00.0:         FH_RSCSR_CHNL0_STTS_WPTR_REG:
> 0X28629e00
> [342379.786961] iwlwifi 0000:03:00.0:        FH_RSCSR_CHNL0_RBDCB_BASE_REG:
> 0X01de2450
> [342379.786986] iwlwifi 0000:03:00.0:                  FH_RSCSR_CHNL0_WPTR:
> 0X000000c8
> [342379.787012] iwlwifi 0000:03:00.0:         FH_MEM_RCSR_CHNL0_CONFIG_REG:
> 0X80801054
> [342379.787038] iwlwifi 0000:03:00.0:          FH_MEM_RSSR_SHARED_CTRL_REG:
> 0X000000fc
> [342379.787063] iwlwifi 0000:03:00.0:            FH_MEM_RSSR_RX_STATUS_REG:
> 0X07830000
> [342379.787089] iwlwifi 0000:03:00.0:    FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV:
> 0X00000000
> [342379.787115] iwlwifi 0000:03:00.0:                FH_TSSR_TX_STATUS_REG:
> 0X07ff0003
> [342379.787140] iwlwifi 0000:03:00.0:                 FH_TSSR_TX_ERROR_REG:
> 0X00000000
> [342379.787284] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
> [342379.787288] iwlwifi 0000:03:00.0: Status: 0x00000000, count: 6
> [342379.787292] iwlwifi 0000:03:00.0: Loaded firmware version: 27.532463.0
> [342379.787297] iwlwifi 0000:03:00.0: 0x00001007 | ADVANCED_SYSASSERT         
> [342379.787301] iwlwifi 0000:03:00.0: 0x008006F4 | trm_hw_status0
> [342379.787304] iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
> [342379.787308] iwlwifi 0000:03:00.0: 0x0000FECC | branchlink2
> [342379.787311] iwlwifi 0000:03:00.0: 0x00029602 | interruptlink1
> [342379.787315] iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
> [342379.787318] iwlwifi 0000:03:00.0: 0x00030400 | data1
> [342379.787323] iwlwifi 0000:03:00.0: 0x0000040B | data2
> [342379.787326] iwlwifi 0000:03:00.0: 0xDEADBEEF | data3
> [342379.787330] iwlwifi 0000:03:00.0: 0x0F4050BD | beacon time
> [342379.787333] iwlwifi 0000:03:00.0: 0x538D4F42 | tsf low
> [342379.787337] iwlwifi 0000:03:00.0: 0x0000008D | tsf hi
> [342379.787340] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
> [342379.787343] iwlwifi 0000:03:00.0: 0x00976F19 | time gp2
> [342379.787347] iwlwifi 0000:03:00.0: 0x00000001 | uCode revision type
> [342379.787350] iwlwifi 0000:03:00.0: 0x0000001B | uCode version major
> [342379.787359] iwlwifi 0000:03:00.0: 0x00081FEF | uCode version minor
> [342379.787362] iwlwifi 0000:03:00.0: 0x00000201 | hw version
> [342379.787366] iwlwifi 0000:03:00.0: 0x18C89008 | board version
> [342379.787369] iwlwifi 0000:03:00.0: 0x0B07001C | hcmd
> [342379.787373] iwlwifi 0000:03:00.0: 0x80022002 | isr0
> [342379.787376] iwlwifi 0000:03:00.0: 0x00000000 | isr1
> [342379.787380] iwlwifi 0000:03:00.0: 0x0800180A | isr2
> [342379.787383] iwlwifi 0000:03:00.0: 0x004168C5 | isr3
> [342379.787386] iwlwifi 0000:03:00.0: 0x00000000 | isr4
> [342379.787390] iwlwifi 0000:03:00.0: 0x0500001C | last cmd Id
> [342379.787393] iwlwifi 0000:03:00.0: 0x00000000 | wait_event
> [342379.787396] iwlwifi 0000:03:00.0: 0x00000400 | l2p_control
> [342379.787400] iwlwifi 0000:03:00.0: 0x00000020 | l2p_duration
> [342379.787403] iwlwifi 0000:03:00.0: 0x0000003F | l2p_mhvalid
> [342379.787407] iwlwifi 0000:03:00.0: 0x00000000 | l2p_addr_match
> [342379.787410] iwlwifi 0000:03:00.0: 0x0000000D | lmpm_pmg_sel
> [342379.787414] iwlwifi 0000:03:00.0: 0x29051029 | timestamp
> [342379.787417] iwlwifi 0000:03:00.0: 0x0000C8D8 | flow_handler
> [342379.787495] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
> [342379.787500] iwlwifi 0000:03:00.0: Status: 0x00000000, count: 7
> [342379.787506] iwlwifi 0000:03:00.0: 0x00000070 | ADVANCED_SYSASSERT
> [342379.787511] iwlwifi 0000:03:00.0: 0x00000000 | umac branchlink1
> [342379.787516] iwlwifi 0000:03:00.0: 0xC0085E88 | umac branchlink2
> [342379.787520] iwlwifi 0000:03:00.0: 0xC0083660 | umac interruptlink1
> [342379.787525] iwlwifi 0000:03:00.0: 0xC0083660 | umac interruptlink2
> [342379.787531] iwlwifi 0000:03:00.0: 0x00000800 | umac data1
Comment 54 Luca Coelho 2017-06-20 06:10:37 UTC
As I understand from the firmware team, there are a few different issues that can cause this same sysassert.  The firmware in comment 39 had a couple of fixes.  We have some more fixes in another firmware I sent Volker, but you need to use the master branch of our backports (see comment #42).

I'll send you the firmware for that privately as well so we can check if it fixes your problem.
Comment 55 Nathan Baker 2017-06-20 08:17:14 UTC
I built and installed the new driver and loaded the new firmware and I still see the problem. 

The status code is different now (0x80 versus 0x00):

> [ 1263.722870] iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting
> 0x82000000.
> [ 1263.723016] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
> [ 1263.723022] iwlwifi 0000:03:00.0: Status: 0x00000080, count: 6
> [ 1263.723025] iwlwifi 0000:03:00.0: Loaded firmware version: 32.530564.0

Unless there's any value in me sticking with this driver / firmware combo, I think I'll switch back to my distro's.
Comment 56 Emmanuel Grumbach 2017-06-20 08:20:23 UTC
What matters here is the ASSERT code.
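
To pull just that code out of a long log, a sketch like this may help. It assumes the dump lines keep the `0x???????? | ADVANCED_SYSASSERT` layout seen in comment 53; the helper name `sysassert_codes` is hypothetical.

```shell
# Hypothetical helper: print every assert code that precedes
# "| ADVANCED_SYSASSERT" in a kernel log read from stdin.
sysassert_codes() {
    sed -n 's/.*\(0x[0-9A-Fa-f]\{8\}\) | ADVANCED_SYSASSERT.*/\1/p'
}

# Assumed usage:
#   dmesg | sysassert_codes
```

Note that a log may contain more than one error log dump per crash (the second one in comment 53 carries umac fields), so more than one code can be printed.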
Comment 57 Luca Coelho 2017-06-20 08:23:51 UTC
Thanks for testing, Nathan!

It's a shame it didn't work for you either.  Can you confirm that the SYSASSERT is the same one as before (i.e. 0x1007)?
Comment 58 Nathan Baker 2017-06-20 08:36:58 UTC
Yes, sorry, I should have copied that too.

Interestingly, the most recent error I saw uses a different code after "Restarting", but still the same assert code.

> [  332.879424] iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting
> 0x2000000.
> [  332.879560] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
> [  332.879563] iwlwifi 0000:03:00.0: Status: 0x00000080, count: 6
> [  332.879565] iwlwifi 0000:03:00.0: Loaded firmware version: 32.530564.0
> [  332.879567] iwlwifi 0000:03:00.0: 0x00001007 | ADVANCED_SYSASSERT
Comment 59 Nathan Baker 2017-06-20 09:10:23 UTC
Also, I don't know if this is relevant, but the updated driver caused my bluetooth keyboard to stop working.

I thought it might just be a coincidence so I decided to triage it later, but after removing the new driver and firmware the keyboard started working again.
Comment 60 Luca Coelho 2017-06-20 09:51:43 UTC
All these other values don't mean much in this case, so they can change.  We can check a bit further if you attach the dmesg, but I don't think it will help...

Regarding the BT keyboard issue, it's probably unrelated, so we'll just have to make sure BT works fine when we officially release this new firmware.

Since this new FW didn't help, I recommend that you revert to your distro's firmware and driver.  I'll ping the FW guys to see if they have more fixes that may help.
Comment 61 David Meriin 2017-06-21 08:37:48 UTC
Hi Nathan,

Did you have a chance to remove the USB 3.0 device in the end and see if it works for you?

I understand it doesn't work out as a solution for you, but we'd like to see if the USB device is the root cause (as it's already an ongoing known issue) or if it is something else.

Thanks,
David.
Comment 62 Nathan Baker 2017-06-21 09:43:47 UTC
No, I'm afraid I haven't tried that yet. Unfortunately the USB hard drive I have plugged in is pretty heavily tied into my system and my workflow, so I'd have to make some changes to even boot with it unplugged. I'll give it a go at some point, hopefully soon, but I haven't managed to do that yet. Sorry.
Comment 63 Nathan Baker 2017-06-24 03:04:30 UTC
Annoyingly (for me, anyway), unplugging the USB3 hub seems to have made a big difference. I did notice one dip in throughput as though the adapter was resetting, but I didn't see anything written to dmesg. It's been running for several hours now at semi-normal usage (as normal as I can get minus the USB hard drive) without tripping the assert once.

I will press on as much as I can before I need to plug the drive back in to do work. Will be interesting to see what happens.
Comment 64 Nathan Baker 2017-06-24 06:05:29 UTC
And I saw the assert again once I plugged in the USB hub (not immediately, but after some time). Didn't reboot.

Any thoughts about how to work around the noise issue then?
Comment 65 David Meriin 2017-06-25 07:12:49 UTC
Hi Nathan,

Unfortunately we currently don't have a solution for this assert which is caused by the USB interference. Our system engineers are looking for a workaround for this issue. 

This noise interference is observed because of the proximity of the USB port to our NIC. The only workaround I can suggest at the moment is to plug your device into another USB port, if you have one. A different USB port is probably farther from our NIC, so its noise impact would be much less significant.


We'll be sure to let you know as soon as we have a fix for this issue.

Thanks,
David.
Comment 66 Nathan Baker 2017-06-25 08:52:19 UTC
Thanks for the suggestion. Unfortunately, moving the plug to the front ports does not improve the situation. Do keep me advised of any improvements you may make.
Comment 67 Emmanuel Grumbach 2017-06-25 19:51:46 UTC
@Nathan

Have you tried to change the channel / band in which the AP is operating?
Comment 68 Nathan Baker 2017-06-25 21:19:35 UTC
Unfortunately the AP is managed by my ISP and that's not an option I'm given.
Comment 69 Emmanuel Grumbach 2017-06-26 18:39:57 UTC
Volker, anything to contribute before we close this bug?
Comment 70 Volker Mische 2017-06-26 18:44:19 UTC
Emmanuel, in my case I don't have any USB device connected. How close would the USB port need to be to the Wifi chip? I could open my machine and check the layout of the mainboard.
Comment 71 Emmanuel Grumbach 2017-06-26 18:50:45 UTC
Have you tried to change the channel of the AP?
Comment 72 Nathan Baker 2017-06-26 20:04:16 UTC
Hold up, why would you close this bug?

If the claim is that Intel's hardware is buggy and the problem isn't with the driver, are there corresponding reports of problems on Windows with noise from BT or USB3? Is there evidence that there's nothing the driver can do to work around this problem?

Can this bug at least document the need for better diagnostics so "Assert 1007" doesn't mean "one of four or five separate issues which need to be separately triaged"?
Comment 73 Luca Coelho 2017-06-26 20:43:50 UTC
Nathan,

We have a white paper on this subject:

https://www.intel.com/content/www/us/en/io/universal-serial-bus/usb3-frequency-interference-paper.html

Please take a look to understand the details.

The driver doesn't know what is going on.  The firmware cannot tell what kind of noise it is and the problem is that it just keeps trying to find a clean air time to send data but it never comes, so it gives up.

As David mentioned, our system engineers are (and have been!) working hard to try to find a workaround for the issue.
Comment 74 Nathan Baker 2017-06-26 20:53:59 UTC
Thanks for the link. I note that the paper specifically mentions 2.4GHz WiFi as being affected, but I noticed that the problem was much worse on 5GHz (so much so that it was unusable; I had to fall back to 2.4GHz before I could use it).

Is it just that 5GHz WiFi was not in use in 2012 when the paper was written? Or do my problems with the 5GHz network indicate there could be multiple causes?

It's unfortunate that although Intel knew about the problem in 2012, when they designed the NUC6i7KYK which came out last year it was still susceptible to the problem. Perhaps you can bring this feedback to the systems engineers that designing systems to avoid known problems is generally considered good practice.
Comment 75 Luca Coelho 2017-06-26 21:16:14 UTC
Nathan,

I understand your frustration and I hope you will be able to find a way to circumvent the problem.

Regarding 5GHz, I'm not really sure, but from what I understand from the whitepaper, there should be less interference in the 5GHz band.  5GHz is much more sensitive, though, so I'm not sure you'd have better results.

When you tested 5GHz, did you see the same SYSASSERT? Or something else? You could try the firmware and driver we suggested earlier again to test 5GHz specifically and see if the problem is still there.

Another thing to try would be to use a different USB cable.  Maybe some cables have better shielding than others? There were also some other mitigation suggestions in the whitepaper.

And finally, since this is a NUC, and not a laptop, you could try to use an external antenna.  It could help and it shouldn't be as inconvenient as having an external antenna on a laptop. ;)
Comment 76 Nathan Baker 2017-06-26 21:38:00 UTC
Oh, haha, I completely failed to file a bug about it. 5GHz actually gives me a kernel oops.

> Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: PHY ctxt cmd error. ret=-5
> Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: Failed to send MAC context (action:2): -5
> Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: failed to update MAC 00:c2:c6:dd:7f:cf
> Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: Failed to send MAC context (action:2): -5
> Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: failed to update MAC 00:c2:c6:dd:7f:cf
> Jun 27 09:25:12 nathanb-nuc kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000011c
> Jun 27 09:25:12 nathanb-nuc kernel: IP: iwl_mvm_add_sta+0x4f1/0x780 [iwlmvm]
> Jun 27 09:25:12 nathanb-nuc kernel: PGD 4a22e0067
> Jun 27 09:25:12 nathanb-nuc kernel: PUD 4a8c59067
> Jun 27 09:25:12 nathanb-nuc kernel: PMD 0
> Jun 27 09:25:12 nathanb-nuc kernel:
> Jun 27 09:25:12 nathanb-nuc kernel: Oops: 0000 [#1] PREEMPT SMP
> Jun 27 09:25:12 nathanb-nuc kernel: Modules linked in: mmc_block fuse joydev input_leds hid_generic uhid algif_hash algif_skcipher af_alg ctr ccm cmac rfcomm mousedev hid_logitech_hidpp hid_logitech_dj usbhid sn
> Jun 27 09:25:12 nathanb-nuc kernel:  btqca lirc_dev btintel intel_gtt snd ptp syscopyarea sysfillrect i2c_i801 pps_core nuvoton_cir sysimgblt fb_sys_fops soundcore mei i2c_algo_bit shpchp intel_pch_thermal therm
> Jun 27 09:25:12 nathanb-nuc kernel: CPU: 6 PID: 559 Comm: wpa_supplicant Tainted: G           O    4.11.6-3-ARCH #1
> Jun 27 09:25:12 nathanb-nuc kernel: Hardware name:                  /NUC6i7KYB, BIOS KYSKLi70.86A.0037.2016.0603.1032 06/03/2016
> Jun 27 09:25:12 nathanb-nuc kernel: task: ffff8804a8edd700 task.stack: ffffc90002754000

So yeah. Not particularly usable. I assumed it was the same problem, just worse, but I guess since the 2.4GHz problem is noise there's probably actually something else going on with the 5GHz.

I should probably go ahead and file that at some point....
Comment 77 Luca Coelho 2017-06-26 21:44:37 UTC
Ouch! Can you check if this oops looks the same as the ones reported here in bug 195299 (i.e. happens during the recovery flow)?
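
For anyone comparing oops reports across bugs, a quick first check is whether the faulting function in the "IP:" line matches. The following is only a rough sketch, assuming the "IP: symbol+0xoffset/0xsize [module]" layout seen in the log above; the helper name `faulting_location` is made up for illustration and is not part of any kernel tooling.

```python
import re

# Assumed layout, taken from the oops quoted in this thread:
#   IP: iwl_mvm_add_sta+0x4f1/0x780 [iwlmvm]
IP_RE = re.compile(
    r"IP: (?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)

def faulting_location(log_text):
    """Return (symbol, offset, module) for the first IP line, or None."""
    m = IP_RE.search(log_text)
    if m is None:
        return None
    return m.group("symbol"), int(m.group("offset"), 16), m.group("module")

sample = "Jun 27 09:25:12 nathanb-nuc kernel: IP: iwl_mvm_add_sta+0x4f1/0x780 [iwlmvm]"
print(faulting_location(sample))  # ('iwl_mvm_add_sta', 1265, 'iwlmvm')
```

Two reports with the same symbol and offset are likely (not certainly) the same crash; the surrounding call trace still has to be compared by hand.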
Comment 78 Nathan Baker 2017-06-26 22:35:29 UTC
No, happens when I first try to connect.
Comment 79 Luca Coelho 2017-06-26 23:03:49 UTC
Can you post your dmesg in bug 195299? Then we can check if it's the same case.  If it's not, we'll ask you to file a new bug. ;)
Comment 80 Nathan Baker 2017-06-26 23:17:36 UTC
Done. Maybe if that one gets fixed it will help me work around this one :)
Comment 81 Nathan Baker 2017-07-01 06:07:54 UTC
I switched from HDMI to DP this week (for reasons not related to WiFi), and for what it's worth I haven't seen an ASSERT 1007 since (knock on wood).
Comment 82 David Meriin 2017-07-02 11:51:16 UTC
Hi,

Thank you, Nathan, for your input. HDMI could also interfere with the 2.4GHz band.

There are a few more sources that may interfere with your WiFi.
Mische, would you like to check that some of the sources below are not interfering in your case?

1. Electronic devices: HDMI cables, cordless home phones, microwaves, baby monitors, wireless speakers, TVs, etc.  If any of them are close to your laptop, I suggest moving your laptop or shutting those devices down completely to see if it helps.
2. Hard drives with poorly shielded cables: have you added any hard drives or other components to the laptop yourself? One of them could be causing interference.
3. A notebook/laptop with the lid closed and an external monitor connected to it.  This has also been reported as a possible source of interference.

Thanks,
David.
Comment 83 Volker Mische 2017-07-02 14:19:54 UTC
Created attachment 257295 [details]
dmesg|grep iwl output 2017-07-02

Hi David,

I've tried to unplug as many things as I can and put things as far away as possible. I still get errors, I've attached my `dmesg|grep iwl` output. My Kernel version is:

    Linux frea 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux

My laptop was running on battery without anything connected.
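
To make before/after comparisons like this easier, the "Queue N stuck for M ms" messages from the original report can be tallied per queue. This is a minimal sketch, assuming the message wording quoted in the bug description; the sample text below is illustrative input, not real log output, and `tally_stuck_queues` is a made-up helper name.

```python
import re
from collections import Counter

# Message wording as quoted in the bug description: "Queue 2 stuck for 10000 ms."
STUCK_RE = re.compile(r"Queue (\d+) stuck for (\d+) ms")

def tally_stuck_queues(log_text):
    """Count 'Queue N stuck' messages per queue number."""
    return Counter(int(m.group(1)) for m in STUCK_RE.finditer(log_text))

# Illustrative input only; a real run would read saved `dmesg|grep iwl` output.
sample = """\
Queue 2 stuck for 10000 ms.
Queue 0 stuck for 10000 ms.
Queue 2 stuck for 10000 ms.
"""

counts = tally_stuck_queues(sample)
for queue, n in sorted(counts.items()):
    print(f"queue {queue}: {n}")
```

Running it over logs captured before and after unplugging devices gives a rough count to compare, though it says nothing about the cause of each stall.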
Comment 84 David Meriin 2017-07-02 14:22:50 UTC
Hi Mische,

Thank you for trying that out. I had hoped it would help. In any case, I would recommend switching the AP to a different channel, if possible, to see if that helps.
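
For reference when picking another channel, the 2.4GHz channel numbers map to center frequencies as sketched below. This only shows the standard channel-to-frequency arithmetic; which channel (if any) is less affected by the noise in a given setup is not something this thread establishes, and `channel_center_mhz` is a made-up helper name.

```python
def channel_center_mhz(channel):
    """Center frequency in MHz of a 2.4GHz Wi-Fi channel (1-14)."""
    if channel == 14:
        return 2484  # channel 14 does not follow the 5 MHz spacing
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4GHz channel: {channel}")

# The commonly used non-overlapping 20MHz channels.
for ch in (1, 6, 11):
    print(f"channel {ch}: {channel_center_mhz(ch)} MHz")
```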

Thanks,
David.
Comment 85 Emmanuel Grumbach 2017-07-22 18:05:19 UTC
*** Bug 196439 has been marked as a duplicate of this bug. ***
Comment 86 Luca Coelho 2017-07-28 15:16:13 UTC
I talked with the firmware engineers again. They are working on a way to mitigate problems due to noise coming from USB, but the work is at a very early stage and we won't have a fix for this in the foreseeable future.

I have to close this bug as won't fix.
Comment 87 Emmanuel Grumbach 2017-08-12 19:27:13 UTC
*** Bug 196641 has been marked as a duplicate of this bug. ***
Comment 88 Emmanuel Grumbach 2017-10-23 06:28:00 UTC
*** Bug 197331 has been marked as a duplicate of this bug. ***