I'm getting error messages like "Queue 2 stuck for 10000 ms." when I'm connected to my home 2.4 GHz network at 40 MHz. It's the only network in range. I have an Intel 8260 chip. It's often Queue 2, but I've also seen it on Queue 0 and 16. I don't have any issues if I set the cfg80211_disable_40mhz_24ghz parameter and am therefore connected at 20 MHz (I checked via "iw dev"). The issue still occurs if I set power_scheme to 1, or if I run on battery without being plugged into a power socket. I think that's all the information I have for now; I hope I didn't miss anything that was mentioned in the email thread [1]. If I should create a dump, let me know (it should be easy, since it happens every few minutes). [1]: http://marc.info/?l=linux-wireless&m=147429366505895
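For anyone collecting data on which queues are affected, here is a minimal Python sketch that tallies the "Queue N stuck for M ms" messages in saved dmesg text. It assumes only the message format quoted in this report; the file name in the usage note is hypothetical.

```python
import re
from collections import Counter

# Matches driver messages such as
#   "iwlwifi 0000:01:00.0: Queue 16 stuck for 4000 ms."
QUEUE_STUCK_RE = re.compile(r"Queue (\d+) stuck for (\d+) ms")


def count_stuck_queues(dmesg_text: str) -> Counter:
    """Count how many times each TX queue number was reported as stuck."""
    counts: Counter = Counter()
    for match in QUEUE_STUCK_RE.finditer(dmesg_text):
        counts[int(match.group(1))] += 1
    return counts


sample = (
    "iwlwifi 0000:01:00.0: Queue 2 stuck for 10000 ms.\n"
    "iwlwifi 0000:01:00.0: Queue 16 stuck for 4000 ms.\n"
)
print(count_stuck_queues(sample).most_common())
```

To use it on a real log, save the output of `dmesg` to a file (for example a hypothetical `dmesg.txt`) and pass its contents to `count_stuck_queues`.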
Is it only error messages, or does your Wi-Fi connection drop?
The Wi-Fi connection drops; pings to the router come back very slowly.
We are getting many reports of this kind of "Queue stuck" problem and are currently investigating it. Unfortunately, there is no ETA for a fix yet. Meanwhile, can you check whether the workarounds described in the following link help? https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise
Luca, please see the original bug report description for the workarounds that I've tried.
Oh, I'm sorry, I forgot about that thread. We'll get back to you with more instructions on how to collect more data. We're getting quite a few reports of this problem, and I'm trying to gather all the info I can to take this up with our firmware team.
Created attachment 241471 iwlwifi-8000C-22.ucode firmware with debugging enabled
Can you take some logs with the attached firmware, following the instructions here? https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging This will allow us to further debug the issues you are experiencing. And please make sure you read and understand the privacy aspects of sending such logs to us: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Sorry for the long delay. I've installed the patched firmware. The instructions say that the dump will be created automatically on kernels >= 4.1. My problem is that I don't have a "/sys/devices/virtual/devcoredump" directory, although my kernel has "CONFIG_ALLOW_DEV_COREDUMP=y" set. I also don't have "/sys/kernel/debug/iwlwifi", even though "CONFIG_IWLWIFI_DEBUG=y" is set. It looks like I'm still missing some kernel configuration. Does anyone have a clue?
The devcoredump directory is only created when the firmware crashes, which should happen when you hit the "queue stuck" problem you are seeing.
I did see crashes in dmesg. Anyway, I'll try again tonight. At least it's good to know that I should have a devcoredump directory.
Sorry for the delay again. I still don't get a dump: it isn't created automatically, and there's no "/sys/devices/virtual/devcoredump" directory.
$ ls /sys/devices/virtual/
bdi dmi graphics hwmon input mem misc msr net powercap sound thermal tty vc vtconsole workqueue
Here's some info from dmesg:
[ 6.139636] iwlwifi 0000:01:00.0: loaded firmware version 22.391740.0 op_mode iwlmvm
...
[ 2018.956116] iwlwifi 0000:01:00.0: Queue 16 stuck for 4000 ms.
...
[ 2018.958139] iwlwifi 0000:01:00.0: Microcode SW error detected. Restarting 0x2000000.
I'll also attach the output of `dmesg|grep iwl`.
Created attachment 245561 Output of dmesg|grep iwl
Can you share your kernel config file? Do you have DEV_COREDUMP selected?
Created attachment 245761 My kernel config file
I've attached my full config file; here's the DEV_COREDUMP part of it:
$ grep DEV_COREDUMP /boot/config-4.8.0-rc6
CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y
I also double-checked that I'm currently running that kernel, and I am. Is there any way to check on a running system whether the setting is really enabled?
You need to have mac80211's debugfs option enabled, but your config doesn't have it: `# CONFIG_MAC80211_DEBUGFS is not set`. Once it is enabled, you can also enable CONFIG_IWLWIFI_DEBUGFS (which depends on CONFIG_MAC80211_DEBUGFS). This should bring /sys/kernel/debug/iwlwifi to life.
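A minimal Python sketch for checking a kernel config file for the options mentioned above. It only models the single dependency stated in this comment and is not a Kconfig parser. On a running kernel, the config is also available as /proc/config.gz if the kernel was built with CONFIG_IKCONFIG_PROC; otherwise distributions usually ship /boot/config-<version>, as used earlier in this thread.

```python
import gzip
import re
from typing import Dict, List

# Options discussed in this thread. The only dependency modelled is the one
# stated above (IWLWIFI_DEBUGFS needs MAC80211_DEBUGFS); this is not Kconfig.
WANTED = ["CONFIG_DEV_COREDUMP", "CONFIG_MAC80211_DEBUGFS", "CONFIG_IWLWIFI_DEBUGFS"]
DEPENDS_ON = {"CONFIG_IWLWIFI_DEBUGFS": "CONFIG_MAC80211_DEBUGFS"}

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> Dict[str, str]:
    """Map each option to its value; '# X is not set' lines map to 'n'."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            values[m.group(1)] = "n"
    return values


def missing_options(text: str) -> List[str]:
    """Return wanted options that are not =y, plus any unmet dependency."""
    values = parse_config(text)
    problems = [opt for opt in WANTED if values.get(opt) != "y"]
    for opt, dep in DEPENDS_ON.items():
        if values.get(opt) == "y" and values.get(dep) != "y":
            problems.append(f"{opt} requires {dep}")
    return problems


def read_config(path: str) -> str:
    """Read a plain config file or a gzip-compressed one such as /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()
```

For example, `missing_options(read_config("/proc/config.gz"))` returns an empty list when all three options are built in; with the config attached above, it would list CONFIG_MAC80211_DEBUGFS (and CONFIG_IWLWIFI_DEBUGFS, if that one is unset too).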
Ping? We haven't heard anything from you for a long time. Did you have the chance to get more logs?
Without more data we can't really proceed with this bug. Marking it as resolved. Feel free to reopen if you manage to get the data we need. Thanks for reporting!
Sorry for the delay; I was on holiday and then wasn't able to reproduce it. Now it has happened again, and I've sent the dump as described on the wiki page. I also upgraded my kernel to 4.10.0-rc4.
Great, thanks for the logs! I'll open an internal bug with your logs.
Are you sure you used the firmware that Luca added? Seems like the debugging wasn't enabled.
I'm so sorry. I did indeed use the wrong firmware. I'm now using the right one and will email again once I see a crash.
The problem is that we hear noise on the extension channel and hence can't start transmitting anything. Please try moving to the Core24 release: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/core_release The new Core24 firmware (-27.ucode) includes fixes that may very well help. Thank you.
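To make "extension channel" concrete, here is a minimal sketch assuming the standard 2.4 GHz channel plan (channel n centered at 2407 + 5n MHz for channels 1 to 13, channel 14 at 2484 MHz) and a 40 MHz (HT40) setup whose secondary channel sits four channel numbers above or below the primary. It shows that a 40 MHz connection spans twice the spectrum of a 20 MHz one, which is consistent with the reporter seeing no problems at 20 MHz. The function names are illustrative only.

```python
from typing import Tuple


def channel_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Wi-Fi channel."""
    if channel == 14:
        return 2484
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def occupied_span_mhz(primary: int, width_mhz: int = 20,
                      secondary_above: bool = True) -> Tuple[int, int]:
    """Lowest and highest frequency (MHz) covered by a 20 or 40 MHz channel."""
    center = channel_center_mhz(primary)
    if width_mhz == 20:
        return (center - 10, center + 10)
    if width_mhz == 40:
        # HT40: the secondary (extension) channel is 4 channel numbers away.
        secondary = primary + 4 if secondary_above else primary - 4
        other = channel_center_mhz(secondary)
        return (min(center, other) - 10, max(center, other) + 10)
    raise ValueError("only 20 and 40 MHz are modelled here")


print(occupied_span_mhz(1))                              # (2402, 2422)
print(occupied_span_mhz(1, 40))                          # (2402, 2442)
print(occupied_span_mhz(11, 40, secondary_above=False))  # (2432, 2472)
```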
For those who run into similar issues and want to give it a try on a vanilla kernel like me: make sure you increase `IWL8000_UCODE_API_MAX` in `drivers/net/wireless/intel/iwlwifi/iwl-8000.c` from 26 to 27. Otherwise, the -27 firmware won't be loaded.
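Here is a rough sketch of why the bump is needed, assuming the driver requests firmware files named <prefix>-<api>.ucode, starting at the maximum API version and counting down to a minimum. The later "Direct firmware load for iwlwifi-8000C-28.ucode failed with error -2" line, followed by a -27 firmware loading, is consistent with this. The minimum API value of 22 below is hypothetical.

```python
from typing import Iterable, List, Optional


def firmware_candidates(prefix: str, api_max: int, api_min: int) -> List[str]:
    """File names in request order, assuming a count-down from api_max to api_min."""
    return [f"{prefix}-{api}.ucode" for api in range(api_max, api_min - 1, -1)]


def first_available(candidates: Iterable[str], installed: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is actually installed, if any."""
    present = set(installed)
    for name in candidates:
        if name in present:
            return name
    return None


installed = {"iwlwifi-8000C-27.ucode"}
# With the maximum still at 26, the -27 file is never requested.
print(first_available(firmware_candidates("iwlwifi-8000C", 26, 22), installed))  # None
# After bumping the maximum to 27, it is the first file tried.
print(first_available(firmware_candidates("iwlwifi-8000C", 27, 22), installed))  # iwlwifi-8000C-27.ucode
```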
Created attachment 253571 dmesg output of -27 crash
At the moment I don't get those "queue stuck" errors, but the firmware crashed. I've attached my dmesg output. Should I create a new bug report?
This is another issue. Please open a new bug report.
Forcing the driver to load -27 may or may not work. :) It's a risky proposal, because we have FW versions for a reason. ;) In any case, let's see if Core24 (or FW -27) works, and if it does, we can try to pinpoint the fix. I think you should only open a new bug report once you've tried it with Core24 and the -27 FW... If you mix versions, unexpected things can happen.
Would you be able to retest this with the real Core24 driver (you probably need an older kernel for it to work) and the -27 firmware? Another option would be to try the wireless-testing tree, which has a driver for the -27 firmware (and is going to reach v4.11).
Luca, the Core24 driver didn't get picked up by my 4.10.0-rc4 kernel, as the maximum version is set to 26. Bumping that maximum version manually isn't recommended, as you mentioned above, so I don't know the proper way to build a kernel that works with the Core24 driver as expected. If you have a link to kernel sources I could use, I'm happy to try.
Unfortunately our Core24 doesn't compile on top of 4.10... The latest version on top of which it compiles is v4.8. Our master branch, OTOH, compiles up to v4.9. I have just pushed the latest version to our tree in git.kernel.org: https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/
Created attachment 254853 dmesg|grep iwl output
I've compiled the backport master branch and installed the modules, and it seems to load the Core24 firmware correctly. There are still frequent crashes, as you can see in the dmesg output. My kernel version is 4.8.0-rc6.
That's a different problem, but you are using the right firmware.
Created attachment 254855 Core24 with monitor
Can you please get the firmware dump using the firmware attached here, just like Luca explained in comment 6? Thank you.
The data was sent privately and is now attached to the internal ticket.
According to one of our firmware developers, there are currently two known causes of this 0x100C ASSERT:
1. Some sort of noise in the platform (usually related to USB 3.0);
2. A bug for which we have a fix candidate.
For 1, we have some questions:
a) Do you use USB 3.0? If so, can you move the connected device to another port and see if it helps? It's also worth trying to disconnect the USB 3.0 device to see if that helps.
b) What kind of computer do you have? Is it a laptop, a tablet, or something else? Can you tell us the model?
c) Is Bluetooth or LTE also enabled on this device?
For 2, we'll provide you with a fix candidate to test.
a) I don't use USB 3.0, at least not actively. I don't have anything connected to any port. Can I somehow disable the USB 3.0 ports completely? Could it be an internal device that is connected via USB 3.0?
b) It's a laptop, a Tuxedo InfinityBook 13 (rev 2). It's one of those white-label machines; I've seen versions from other countries as well. It seems to be based on a Topstar U931 (for more information, see http://www.pcurtis.com/helios.htm). If you want e.g. `hwinfo` output, let me know.
c) Bluetooth should be disabled, but perhaps it's on without my knowing. I can check the next time I try. It doesn't have LTE.
Thanks! I'm not sure whether you can disable USB 3.0 completely; maybe somewhere in the BIOS? We're compiling the new version of the firmware for you. Unfortunately, though, everything seems to point to a platform noise problem. :(
Created attachment 256087 Firmware with a potential fix (Core24)
Can you try to reproduce the problem with this firmware? It contains a potential fix for this issue that works in some cases; hopefully it will work for you too.
Sorry for the long delay. I've copied it over the older -27 file. I tried to use it with my 4.8 kernel, but it didn't load correctly. This is the output of `dmesg|grep iwl`:
[ 5.874410] Loading modules backported from iwlwifi
[ 5.874410] iwlwifi-stack-public:release/LinuxCore24:5768:2a86abaf
[ 5.921889] iwlwifi 0000:01:00.0: enabling device (0000 -> 0002)
[ 5.946534] iwlwifi 0000:01:00.0: Direct firmware load for iwl-dbg-cfg.ini failed with error -2
[ 5.948568] iwlwifi 0000:01:00.0: capa flags index 3 larger than supported by driver
[ 5.949438] iwlwifi 0000:01:00.0: loaded firmware version 27.487586.1 op_mode iwlmvm
[ 5.997635] iwlwifi 0000:01:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
[ 5.999721] iwlwifi 0000:01:00.0: L1 Disabled - LTR Enabled
[ 6.000553] iwlwifi 0000:01:00.0: L1 Disabled - LTR Enabled
[ 7.042449] iwlwifi 0000:01:00.0: SecBoot CPU1 Status: 0x3090001, CPU2 Status: 0x0
[ 7.042452] iwlwifi 0000:01:00.0: Failed to start INIT ucode: -110
[ 7.046637] iwlwifi 0000:01:00.0: Failed to run INIT ucode: -110
[ 7.046684] iwlwifi 0000:01:00.0: L1 Disabled - LTR Enabled
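The negative numbers in these messages are kernel errno values: on Linux, -2 is ENOENT (the iwl-dbg-cfg.ini file simply isn't present) and -110 is ETIMEDOUT (the driver timed out waiting for the INIT firmware image to start). A minimal Python sketch for decoding such values; it uses the host's errno table, which matches the kernel's numbering only when run on Linux.

```python
import errno
import os


def describe_kernel_error(ret: int) -> str:
    """Decode a negative kernel return value such as -110 into its errno name.

    Uses the host's errno table, which matches the kernel's numbering only
    when this runs on Linux.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{ret} = -{name} ({os.strerror(code)})"


# Values seen in this thread: -2 (firmware file load), -5, and -110 (INIT ucode).
for ret in (-2, -5, -110):
    print(describe_kernel_error(ret))
```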
Oh, sorry, Volker. It seems that we compiled the firmware with a signature that only works internally. I've asked the FW team to provide a properly signed version you can use.
Created attachment 256757 Properly signed firmware with a potential fix (Core24)
Can you please try this new firmware? This one should be properly signed.
Created attachment 256759 dmesg|grep iwl output 2017-05-29
I was able to load the firmware. Sadly, I still see crashes. Here's the output from `dmesg|grep iwl`.
:( Okay, thanks for testing. Apparently we have some other possible fix candidates, I'll provide you with a new FW for testing soon.
Volker, we have another firmware for you to test. I'll send it to you by email. Due to the complexity of the changes, I need to send you the firmware of our "mainline", which means you will have to use the driver from the master branch of our backport releases, which we publish here: https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/ If this fixes your problem, we will backport this fix to our Core24 release so you can work with the official version. Thanks a lot for your cooperation!
Created attachment 256911 dmesg|grep iwl output 2017-06-08
I've tried the new version with a 4.9 kernel (the default one from Debian testing). I still get crashes. I've attached the `dmesg|grep iwl` output.
Thanks Volker. Back to the drawing board...
*** Bug 196109 has been marked as a duplicate of this bug. ***
Created attachment 257061 Syslog error for non-queue-stuck 1007 assert
For what it's worth, my bug (bug 196109) was dup'd to this one because it has the same assert. I don't see the Queue Stuck message, but presumably it's the same root cause? I've attached my syslog output in case it provides any more useful information for triage. Please see the original bug I filed for more information.
@Nathan can you please reply to the questions in comment 33? Thanks.
@Nathan, can you please test the firmware from comment #39?
a) Do you use USB 3.0?
- Yes. I can't easily move the plugged-in device, but maybe I can try next time I reboot. It's not a great solution, though: I can't disconnect the device without dramatically changing my workflow.
b) What kind of computer do you have? Is it a laptop or a tablet or...? Can you tell us the model?
- Intel NUC Skull Canyon. It's a very small form factor PC.
c) Is Bluetooth or LTE also enabled in this device?
- Yes. I am using Bluetooth.
I have installed the new firmware and will report the results the next time I reboot. I will first try without moving the USB3 device to see if the problem clears up. If that doesn't work, I will reboot again and move the USB3 device.
You don't have to reboot to replace the firmware. Reloading iwlwifi kernel module is enough.
Ah, cheers, should have thought of that.
[339388.935961] Intel(R) Wireless WiFi driver for Linux
[339388.935962] Copyright(c) 2003- 2015 Intel Corporation
[339388.938282] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-8000C-28.ucode failed with error -2
[339388.938735] iwlwifi 0000:03:00.0: capa flags index 3 larger than supported by driver
[339388.939151] iwlwifi 0000:03:00.0: loaded firmware version 27.532463.0 op_mode iwlmvm
[339388.943718] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
[339388.946085] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[339388.947035] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
I hope that looks reasonably correct to you. I'll give it 24 hours before I declare a verdict, but (at the risk of jinxing it) things are looking pretty promising so far!
May have spoken too soon:
[342369.176682] ieee80211 phy0: Hardware restart was requested
[342369.674264] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[342369.674621] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[342369.804310] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[342369.804727] iwlwifi 0000:03:00.0: L1 Enabled - LTR Enabled
[342379.786546] iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x82000000.
[342379.786556] iwlwifi 0000:03:00.0: CSR values:
[342379.786560] iwlwifi 0000:03:00.0: (2nd byte of CSR_INT_COALESCING is CSR_INT_PERIODIC_REG)
[342379.786567] iwlwifi 0000:03:00.0: CSR_HW_IF_CONFIG_REG: 0X18c89008
[342379.786583] iwlwifi 0000:03:00.0: CSR_INT_COALESCING: 0X00000040
[342379.786598] iwlwifi 0000:03:00.0: CSR_INT: 0X00000000
[342379.786612] iwlwifi 0000:03:00.0: CSR_INT_MASK: 0X00000000
[342379.786627] iwlwifi 0000:03:00.0: CSR_FH_INT_STATUS: 0X00000000
[342379.786642] iwlwifi 0000:03:00.0: CSR_GPIO_IN: 0X00000019
[342379.786656] iwlwifi 0000:03:00.0: CSR_RESET: 0X00000000
[342379.786671] iwlwifi 0000:03:00.0: CSR_GP_CNTRL: 0X08040005
[342379.786686] iwlwifi 0000:03:00.0: CSR_HW_REV: 0X00000201
[342379.786700] iwlwifi 0000:03:00.0: CSR_EEPROM_REG: 0Xd55555d5
[342379.786715] iwlwifi 0000:03:00.0: CSR_EEPROM_GP: 0Xd55555d5
[342379.786729] iwlwifi 0000:03:00.0: CSR_OTP_GP_REG: 0Xd55555d5
[342379.786744] iwlwifi 0000:03:00.0: CSR_GIO_REG: 0X001f0042
[342379.786759] iwlwifi 0000:03:00.0: CSR_GP_UCODE_REG: 0X00000000
[342379.786773] iwlwifi 0000:03:00.0: CSR_GP_DRIVER_REG: 0X00000000
[342379.786788] iwlwifi 0000:03:00.0: CSR_UCODE_DRV_GP1: 0X00000000
[342379.786803] iwlwifi 0000:03:00.0: CSR_UCODE_DRV_GP2: 0X00000000
[342379.786817] iwlwifi 0000:03:00.0: CSR_LED_REG: 0X00000060
[342379.786832] iwlwifi 0000:03:00.0: CSR_DRAM_INT_TBL_REG: 0X883b40f9
[342379.786847] iwlwifi 0000:03:00.0: CSR_GIO_CHICKEN_BITS: 0X07800200
[342379.786861] iwlwifi 0000:03:00.0: CSR_ANA_PLL_CFG: 0Xd55555d5
[342379.786876] iwlwifi 0000:03:00.0: CSR_MONITOR_STATUS_REG: 0Xc03803c0
[342379.786891] iwlwifi 0000:03:00.0: CSR_HW_REV_WA_REG: 0X0001001a
[342379.786905] iwlwifi 0000:03:00.0: CSR_DBG_HPET_MEM_REG: 0Xffff0000
[342379.786909] iwlwifi 0000:03:00.0: FH register values:
[342379.786935] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X28629e00
[342379.786961] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X01de2450
[342379.786986] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_WPTR: 0X000000c8
[342379.787012] iwlwifi 0000:03:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80801054
[342379.787038] iwlwifi 0000:03:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X000000fc
[342379.787063] iwlwifi 0000:03:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07830000
[342379.787089] iwlwifi 0000:03:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000
[342379.787115] iwlwifi 0000:03:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0003
[342379.787140] iwlwifi 0000:03:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000
[342379.787284] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[342379.787288] iwlwifi 0000:03:00.0: Status: 0x00000000, count: 6
[342379.787292] iwlwifi 0000:03:00.0: Loaded firmware version: 27.532463.0
[342379.787297] iwlwifi 0000:03:00.0: 0x00001007 | ADVANCED_SYSASSERT
[342379.787301] iwlwifi 0000:03:00.0: 0x008006F4 | trm_hw_status0
[342379.787304] iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
[342379.787308] iwlwifi 0000:03:00.0: 0x0000FECC | branchlink2
[342379.787311] iwlwifi 0000:03:00.0: 0x00029602 | interruptlink1
[342379.787315] iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
[342379.787318] iwlwifi 0000:03:00.0: 0x00030400 | data1
[342379.787323] iwlwifi 0000:03:00.0: 0x0000040B | data2
[342379.787326] iwlwifi 0000:03:00.0: 0xDEADBEEF | data3
[342379.787330] iwlwifi 0000:03:00.0: 0x0F4050BD | beacon time
[342379.787333] iwlwifi 0000:03:00.0: 0x538D4F42 | tsf low
[342379.787337] iwlwifi 0000:03:00.0: 0x0000008D | tsf hi
[342379.787340] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
[342379.787343] iwlwifi 0000:03:00.0: 0x00976F19 | time gp2
[342379.787347] iwlwifi 0000:03:00.0: 0x00000001 | uCode revision type
[342379.787350] iwlwifi 0000:03:00.0: 0x0000001B | uCode version major
[342379.787359] iwlwifi 0000:03:00.0: 0x00081FEF | uCode version minor
[342379.787362] iwlwifi 0000:03:00.0: 0x00000201 | hw version
[342379.787366] iwlwifi 0000:03:00.0: 0x18C89008 | board version
[342379.787369] iwlwifi 0000:03:00.0: 0x0B07001C | hcmd
[342379.787373] iwlwifi 0000:03:00.0: 0x80022002 | isr0
[342379.787376] iwlwifi 0000:03:00.0: 0x00000000 | isr1
[342379.787380] iwlwifi 0000:03:00.0: 0x0800180A | isr2
[342379.787383] iwlwifi 0000:03:00.0: 0x004168C5 | isr3
[342379.787386] iwlwifi 0000:03:00.0: 0x00000000 | isr4
[342379.787390] iwlwifi 0000:03:00.0: 0x0500001C | last cmd Id
[342379.787393] iwlwifi 0000:03:00.0: 0x00000000 | wait_event
[342379.787396] iwlwifi 0000:03:00.0: 0x00000400 | l2p_control
[342379.787400] iwlwifi 0000:03:00.0: 0x00000020 | l2p_duration
[342379.787403] iwlwifi 0000:03:00.0: 0x0000003F | l2p_mhvalid
[342379.787407] iwlwifi 0000:03:00.0: 0x00000000 | l2p_addr_match
[342379.787410] iwlwifi 0000:03:00.0: 0x0000000D | lmpm_pmg_sel
[342379.787414] iwlwifi 0000:03:00.0: 0x29051029 | timestamp
[342379.787417] iwlwifi 0000:03:00.0: 0x0000C8D8 | flow_handler
[342379.787495] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[342379.787500] iwlwifi 0000:03:00.0: Status: 0x00000000, count: 7
[342379.787506] iwlwifi 0000:03:00.0: 0x00000070 | ADVANCED_SYSASSERT
[342379.787511] iwlwifi 0000:03:00.0: 0x00000000 | umac branchlink1
[342379.787516] iwlwifi 0000:03:00.0: 0xC0085E88 | umac branchlink2
[342379.787520] iwlwifi 0000:03:00.0: 0xC0083660 | umac interruptlink1
[342379.787525] iwlwifi 0000:03:00.0: 0xC0083660 | umac interruptlink2
[342379.787531] iwlwifi 0000:03:00.0: 0x00000800 | umac data1
As I understand from the firmware team, there are a few different issues that can cause this same sysassert. The firmware in comment 39 had a couple of fixes. We have some more fixes in another firmware I sent Volker, but you need to use the master branch of our backports (see comment #42). I'll send you the firmware for that privately as well so we can check if it fixes your problem.
I built and installed the new driver and loaded the new firmware, and I still see the problem. The status code is different now (0x80 versus 0x00):
[ 1263.722870] iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x82000000.
[ 1263.723016] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[ 1263.723022] iwlwifi 0000:03:00.0: Status: 0x00000080, count: 6
[ 1263.723025] iwlwifi 0000:03:00.0: Loaded firmware version: 32.530564.0
Unless there's any value in me sticking with this driver/firmware combination, I think I'll switch back to my distro's.
What matters here is the ASSERT code.
Thanks for testing, Nathan! It's a shame it didn't work for you either. Can you confirm that the SYSASSERT is the same one as before (i.e. 0x1007)?
Yes, sorry, I should have copied that too. Interestingly, the most recent error I saw has a different code after "Restarting", but still the same assert code:
[ 332.879424] iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
[ 332.879560] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[ 332.879563] iwlwifi 0000:03:00.0: Status: 0x00000080, count: 6
[ 332.879565] iwlwifi 0000:03:00.0: Loaded firmware version: 32.530564.0
[ 332.879567] iwlwifi 0000:03:00.0: 0x00001007 | ADVANCED_SYSASSERT
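Since the value to compare across reports is the ADVANCED_SYSASSERT code rather than the "Restarting" or "Status" values, here is a minimal Python sketch that pulls it, plus the firmware version fields, out of the first "Start IWL Error Log Dump" section of dmesg text. The second section in these dumps carries a different, UMAC-side value, so only the first is parsed. It assumes the "0xVALUE | name" layout shown in this thread. In the earlier 27.532463.0 dump, "uCode version major" 0x1B and "uCode version minor" 0x81FEF decode to 27 and 532463.

```python
import re
from typing import Dict

# Matches lines such as "iwlwifi 0000:03:00.0: 0x00001007 | ADVANCED_SYSASSERT"
FIELD_RE = re.compile(r"0x([0-9A-Fa-f]{8}) \| (.+?)\s*$")
SECTION_MARKER = "Start IWL Error Log Dump:"


def first_error_log(dmesg_text: str) -> Dict[str, int]:
    """Parse the fields of the first error log dump section into {name: value}."""
    sections = dmesg_text.split(SECTION_MARKER)
    if len(sections) < 2:
        return {}
    fields: Dict[str, int] = {}
    for line in sections[1].splitlines():
        m = FIELD_RE.search(line)
        if m:
            fields[m.group(2)] = int(m.group(1), 16)
    return fields


def summarize(fields: Dict[str, int]) -> str:
    """One-line summary: assert code plus major.minor firmware version."""
    code = fields.get("ADVANCED_SYSASSERT")
    if code is None:
        return "no ADVANCED_SYSASSERT found"
    major = fields.get("uCode version major")
    minor = fields.get("uCode version minor")
    if major is None or minor is None:
        return f"assert 0x{code:X}"
    return f"assert 0x{code:X}, firmware {major}.{minor}"
```

Fed the full dump quoted earlier in this thread, `summarize(first_error_log(text))` should give "assert 0x1007, firmware 27.532463"; for the short excerpt above, which lacks the version fields, it reports just "assert 0x1007".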
Also, I don't know if this is relevant, but the updated driver caused my Bluetooth keyboard to stop working. I thought it might just be a coincidence, so I decided to look into it later, but after I removed the new driver and firmware, the keyboard started working again.
All these other values don't mean much in this case, so they can change. We can check a bit further if you attach the dmesg, but I don't think it will help... Regarding the BT keyboard issue, it's probably unrelated, so we'll just have to make sure BT works fine when we officially release this new firmware. Since this new FW didn't help, I recommend that you revert to your distro's firmware and driver. I'll ping the FW guys to see if they have more fixes that may help.
Hi Nathan,
Did you get a chance to remove the USB 3.0 device in the end and see if that helps? I understand it isn't a workable solution for you, but we'd like to see whether the USB device is the root cause (as that is already a known, ongoing issue) or whether it's something else.
Thanks,
David.
No, I'm afraid I haven't tried that yet. Unfortunately the USB hard drive I have plugged in is pretty heavily tied into my system and my workflow, so I'd have to make some changes to even boot with it unplugged. I'll give it a go at some point, hopefully soon, but I haven't managed to do that yet. Sorry.
Annoyingly (for me, anyway), unplugging the USB3 hub seems to have made a big difference. I did notice one dip in throughput as though the adapter was resetting, but I didn't see anything written to dmesg. It's been running for several hours now at semi-normal usage (as normal as I can get minus the USB hard drive) without tripping the assert once. I will press on as much as I can before I need to plug the drive back in to do work. Will be interesting to see what happens.
And I saw the assert again once I plugged in the USB hub (not immediately, but after some time). Didn't reboot. Any thoughts about how to work around the noise issue then?
Hi Nathan,
Unfortunately, we currently don't have a solution for this assert, which is caused by USB interference. Our system engineers are looking for a workaround for this issue. The interference is observed because of the proximity of the USB port to our NIC. The only workaround I can suggest at the moment is to plug your device into another USB port, if you have one. A different USB port is probably farther from our NIC, so its noise impact would be much less significant. We'll be sure to let you know as soon as we have a fix for this issue.
Thanks,
David.
Thanks for the suggestion. Unfortunately, moving the plug to the front ports does not improve the situation. Do keep me advised of any improvements you may make.
@Nathan Have you tried to change the channel / band in which the AP is operating?
Unfortunately the AP is managed by my ISP and that's not an option I'm given.
Volker, anything to contribute before we close this bug?
Emmanuel, in my case I don't have any USB device connected. How close would a USB port need to be to the Wi-Fi chip to cause this? I could open my machine and check the mainboard layout.
Have you tried to change the channel of the AP?
Hold up, why would you close this bug? If the claim is that Intel's hardware is buggy and the problem isn't with the driver, are there corresponding reports of problems on Windows with noise from BT or USB3? Is there evidence that there's nothing the driver can do to work around this problem? Can this bug at least document the need for better diagnostics so "Assert 1007" doesn't mean "one of four or five separate issues which need to be separately triaged"?
Nathan,
We have a white paper on this subject: https://www.intel.com/content/www/us/en/io/universal-serial-bus/usb3-frequency-interference-paper.html
Please take a look to understand the details. The driver doesn't know what is going on. The firmware cannot tell what kind of noise it is; the problem is that it keeps trying to find clean airtime to send data, but it never comes, so it gives up. As David mentioned, our system engineers are (and have been!) working hard to find a workaround for the issue.
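The failure mode described here can be pictured with a toy model. This is a simplification for illustration only, not the actual firmware or driver logic: a frame can only go out in a time slot where the medium is sensed idle, and if no idle slot turns up before a timeout, the queue is reported as stuck. The 4000 ms timeout is taken from the log messages in this thread purely as an example value.

```python
from typing import Iterable, Tuple


def queue_watchdog(busy_per_ms: Iterable[bool], timeout_ms: int) -> Tuple[str, int]:
    """Toy model of a TX queue waiting for clear air.

    busy_per_ms yields one sample per millisecond: True if the medium is
    sensed busy (e.g. because of platform noise), False if it is idle.
    Returns ("sent", t) for the first idle millisecond t, or
    ("stuck", timeout_ms) if no idle slot occurs before the timeout.
    """
    for t, busy in enumerate(busy_per_ms):
        if t >= timeout_ms:
            break
        if not busy:
            return ("sent", t)
    return ("stuck", timeout_ms)


# Occasional noise: the frame goes out as soon as the medium is idle.
print(queue_watchdog([True, True, False, True], timeout_ms=4000))  # ('sent', 2)
# Constant noise: the medium never looks idle, so the queue is declared stuck.
print(queue_watchdog([True] * 5000, timeout_ms=4000))  # ('stuck', 4000)
```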
Thanks for the link. I note that the paper specifically mentions 2.4GHz WiFi as being affected, but I noticed that the problem was much worse on 5GHz (so much so that it was unusable; I had to fall back to 2.4GHz before I could use it). Is it just that 5GHz WiFi was not in use in 2012, when the paper was written? Or do my problems with the 5GHz network indicate there could be multiple causes? It's unfortunate that, although Intel knew about the problem in 2012, the NUC6i7KYK, which came out last year, is still susceptible to it. Perhaps you can pass on to the system engineers the feedback that designing systems to avoid known problems is generally considered good practice.
Nathan, I understand your frustration and I hope you will be able to find a way to circumvent the problem. Regarding 5GHz, I'm not really sure, but from what I understand from the whitepaper, there should be less interference in the 5GHz band. 5GHz is much more sensitive, though, so I'm not sure you'd have better results. When you tested 5GHz, did you see the same SYSASSERT? Or something else? You could try the firmware and driver we suggested earlier again to test 5GHz specifically and see if the problem is still there. Another thing to try would be to use a different USB cable. Maybe some cables have better shielding than others? There were also some other mitigation suggestions in the whitepaper. And finally, since this is a NUC, and not a laptop, you could try to use an external antenna. It could help and it shouldn't be as inconvenient as having an external antenna on a laptop. ;)
Oh, haha, I completely failed to file a bug about it. 5GHz actually gives me a kernel oops:
Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: PHY ctxt cmd error. ret=-5
Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: Failed to send MAC context (action:2): -5
Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: failed to update MAC 00:c2:c6:dd:7f:cf
Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: Failed to send MAC context (action:2): -5
Jun 27 09:25:12 nathanb-nuc kernel: iwlwifi 0000:03:00.0: failed to update MAC 00:c2:c6:dd:7f:cf
Jun 27 09:25:12 nathanb-nuc kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000011c
Jun 27 09:25:12 nathanb-nuc kernel: IP: iwl_mvm_add_sta+0x4f1/0x780 [iwlmvm]
Jun 27 09:25:12 nathanb-nuc kernel: PGD 4a22e0067
Jun 27 09:25:12 nathanb-nuc kernel: PUD 4a8c59067
Jun 27 09:25:12 nathanb-nuc kernel: PMD 0
Jun 27 09:25:12 nathanb-nuc kernel:
Jun 27 09:25:12 nathanb-nuc kernel: Oops: 0000 [#1] PREEMPT SMP
Jun 27 09:25:12 nathanb-nuc kernel: Modules linked in: mmc_block fuse joydev input_leds hid_generic uhid algif_hash algif_skcipher af_alg ctr ccm cmac rfcomm mousedev hid_logitech_hidpp hid_logitech_dj usbhid sn
Jun 27 09:25:12 nathanb-nuc kernel: btqca lirc_dev btintel intel_gtt snd ptp syscopyarea sysfillrect i2c_i801 pps_core nuvoton_cir sysimgblt fb_sys_fops soundcore mei i2c_algo_bit shpchp intel_pch_thermal therm
Jun 27 09:25:12 nathanb-nuc kernel: CPU: 6 PID: 559 Comm: wpa_supplicant Tainted: G O 4.11.6-3-ARCH #1
Jun 27 09:25:12 nathanb-nuc kernel: Hardware name: /NUC6i7KYB, BIOS KYSKLi70.86A.0037.2016.0603.1032 06/03/2016
Jun 27 09:25:12 nathanb-nuc kernel: task: ffff8804a8edd700 task.stack: ffffc90002754000
So yeah. Not particularly usable. I assumed it was the same problem, just worse, but since the 2.4GHz problem is noise, I guess there's probably something else going on with 5GHz.
I should probably go ahead and file that at some point....
Ouch! Can you check if this oops looks the same as the ones reported here in bug 195299 (i.e. happens during the recovery flow)?
No, happens when I first try to connect.
Can you post your dmesg in bug 195299? Then we can check if it's the same case. If it's not, we'll ask you to file a new bug. ;)
Done. Maybe if that one gets fixed it will help me work around this one :)
I switched from HDMI to DP this week (for reasons not related to WiFi), and for what it's worth I haven't seen an ASSERT 1007 since (knock on wood).
Hi,
Thank you, Nathan, for your input. HDMI can also interfere with the 2.4GHz band.
There are a few more sources that may interfere with your Wi-Fi. Mische, would you like to check that none of the sources below are interfering in your case?
1. Electronic devices: HDMI cables, cordless home phones, microwaves, baby monitors, wireless speakers, TVs, etc. If any of them are close to your laptop, I suggest moving the laptop or shutting those devices down completely to see if it helps.
2. Hard drives with poorly shielded cables: have you added any hard drives or other components to the laptop yourself? One of them could be causing interference.
3. A notebook/laptop with the lid closed and an external monitor connected. This setup has also been reported as a possible source of interference.
Thanks,
David.
Created attachment 257295 dmesg|grep iwl output 2017-07-02
Hi David,
I've tried to unplug as many things as I can and to move things as far away as possible. I still get errors; I've attached my `dmesg|grep iwl` output. My kernel version is:
Linux frea 4.9.0-2-amd64 #1 SMP Debian 4.9.18-1 (2017-03-30) x86_64 GNU/Linux
My laptop was running on battery without anything connected.
Hi Mische,
Thank you for trying that out. I had hoped it might help. In any case, I would recommend switching the AP to a different channel, if possible, to see if that helps.
Thanks,
David.
*** Bug 196439 has been marked as a duplicate of this bug. ***
I talked with the Firmware engineers again and they are working on a way to mitigate problems due to noise coming from USB, but it's at very early stages now and we won't have a fix for this in the foreseeable future. I have to close this bug as won't fix.
*** Bug 196641 has been marked as a duplicate of this bug. ***
*** Bug 197331 has been marked as a duplicate of this bug. ***