Created attachment 123621 [details] dmesg After long period of work, connection to wifi was suddenly lost, and a lot of orrors (see dmesg) started flooding to dmesg. NetworkManager was unable to reconnect to the network again, I've tried to switch wifi on/off with rfkill, and finally reloaded the module, which fixed that. This may be a regression since I haven't seen something like that in 3.12.8. Dmesg attached.
Created attachment 123631 [details] lspci -vv
The logs show iwlwifi issues when your system is a really bad shape because of other component. INFO: task bcache_writebac:92 blocked for more than 120 seconds. [ 240.469408] Tainted: G O 3.13.0-1-ab #9 [ 240.469408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 240.469410] bcache_writebac D ffff88032b436600 0 92 2 0x00000000 [ 240.469413] ffff880037c2feb8 0000000000000046 ffff88032b436600 ffff880037c2ffd8 [ 240.469415] 00000000000130c0 00000000000130c0 ffff88032b436600 0000000000000000 [ 240.469417] 0000000000000000 ffff880037c2fe10 ffffffff8148db86 ffff88032b4bfd70 [ 240.469419] Call Trace: [ 240.469425] [<ffffffff8148db86>] ? preempt_schedule+0x36/0x60 [ 240.469429] [<ffffffff8100b1b6>] ? ___preempt_schedule+0x56/0xb0 [ 240.469432] [<ffffffff81067660>] ? kthread_create_on_node+0x60/0x180 [ 240.469452] [<ffffffffa0190fd0>] ? read_dirty_endio+0x60/0x60 [bcache] [ 240.469454] [<ffffffff8148d869>] schedule+0x29/0x70 [ 240.469456] [<ffffffff8106782d>] kthread+0xad/0xf0 [ 240.469458] [<ffffffff81067780>] ? kthread_create_on_node+0x180/0x180 [ 240.469461] [<ffffffff814989ec>] ret_from_fork+0x7c/0xb0 [ 240.469463] [<ffffffff81067780>] ? kthread_create_on_node+0x180/0x180 Something really bad is happending in core kernel... The problem is that I can't tell you to whom this bug should be assign. The only thing I am pretty sure about is that I am not the one who can solve this :)
I've already reported about this bcache bug (also, a regression, too, 3.13 was bad for me), but it doesn't look related with iwlwifi, or is it really somehow? I'm talking about things later, when a lot of iwlwifi errors show up.
[36590.315454] iwlwifi 0000:04:00.0: L1 Enabled; Disabling L0S [36590.315634] iwlwifi 0000:04:00.0: L1 Enabled; Disabling L0S [36596.052696] iwlwifi 0000:04:00.0: FW Error notification: type 0x00000000 cmd_id 0x00 [36596.052700] iwlwifi 0000:04:00.0: FW Error notification: seq 0x0000 service 0x00000000 [36596.052701] iwlwifi 0000:04:00.0: FW Error notification: timestamp 0x 574CEA [36596.052720] iwlwifi 0000:04:00.0: Microcode SW error detected. Restarting 0x2000000. [36596.052721] iwlwifi 0000:04:00.0: CSR values: [36596.052722] iwlwifi 0000:04:00.0: (2nd byte of CSR_INT_COALESCING is CSR_INT_PERIODIC_REG) [36596.052737] iwlwifi 0000:04:00.0: CSR_HW_IF_CONFIG_REG: 0X00489204 [36596.052750] iwlwifi 0000:04:00.0: CSR_INT_COALESCING: 0X8000ff40 [36596.052763] iwlwifi 0000:04:00.0: CSR_INT: 0X00000000 [36596.052777] iwlwifi 0000:04:00.0: CSR_INT_MASK: 0X00000000 [36596.052791] iwlwifi 0000:04:00.0: CSR_FH_INT_STATUS: 0X00000000 [36596.052805] iwlwifi 0000:04:00.0: CSR_GPIO_IN: 0X00000000 [36596.052819] iwlwifi 0000:04:00.0: CSR_RESET: 0X00000000 [36596.052833] iwlwifi 0000:04:00.0: CSR_GP_CNTRL: 0X080403c5 [36596.052847] iwlwifi 0000:04:00.0: CSR_HW_REV: 0X00000144 [36596.052861] iwlwifi 0000:04:00.0: CSR_EEPROM_REG: 0X00000000 [36596.052875] iwlwifi 0000:04:00.0: CSR_EEPROM_GP: 0X80000000 [36596.052885] iwlwifi 0000:04:00.0: CSR_OTP_GP_REG: 0X803a0000 [36596.052899] iwlwifi 0000:04:00.0: CSR_GIO_REG: 0X00080042 [36596.052913] iwlwifi 0000:04:00.0: CSR_GP_UCODE_REG: 0X00000000 [36596.052927] iwlwifi 0000:04:00.0: CSR_GP_DRIVER_REG: 0X00000000 [36596.052941] iwlwifi 0000:04:00.0: CSR_UCODE_DRV_GP1: 0X00000000 [36596.052955] iwlwifi 0000:04:00.0: CSR_UCODE_DRV_GP2: 0X00000000 [36596.052969] iwlwifi 0000:04:00.0: CSR_LED_REG: 0X00000060 [36596.052983] iwlwifi 0000:04:00.0: CSR_DRAM_INT_TBL_REG: 0X882e926f [36596.052994] iwlwifi 0000:04:00.0: CSR_GIO_CHICKEN_BITS: 0X27800200 [36596.053007] iwlwifi 0000:04:00.0: CSR_ANA_PLL_CFG: 0Xd55555d5 [36596.053021] iwlwifi 0000:04:00.0: CSR_HW_REV_WA_REG: 0X0001001a [36596.053035] iwlwifi 0000:04:00.0: CSR_DBG_HPET_MEM_REG: 0Xffff0010 [36596.053036] iwlwifi 0000:04:00.0: FH register values: [36596.053058] iwlwifi 0000:04:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X06a21200 [36596.053072] iwlwifi 0000:04:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X006a1760 [36596.053086] iwlwifi 0000:04:00.0: FH_RSCSR_CHNL0_WPTR: 0X00000030 [36596.053099] iwlwifi 0000:04:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80801114 [36596.053113] iwlwifi 0000:04:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X000000fc [36596.053127] iwlwifi 0000:04:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07030000 [36596.053141] iwlwifi 0000:04:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000 [36596.053155] iwlwifi 0000:04:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0001 [36596.053168] iwlwifi 0000:04:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000 [36596.053292] iwlwifi 0000:04:00.0: Start IWL Error Log Dump: [36596.053294] iwlwifi 0000:04:00.0: Status: 0x00000000, count: 6 [36596.053295] iwlwifi 0000:04:00.0: 0x00000038 | BAD_COMMAND [36596.053297] iwlwifi 0000:04:00.0: 0x00A00220 | uPc [36596.053298] iwlwifi 0000:04:00.0: 0x00000000 | branchlink1 [36596.053299] iwlwifi 0000:04:00.0: 0x00000C2A | branchlink2 [36596.053300] iwlwifi 0000:04:00.0: 0x00015ADC | interruptlink1 [36596.053301] iwlwifi 0000:04:00.0: 0x0001AE90 | interruptlink2 [36596.053302] iwlwifi 0000:04:00.0: 0x00000000 | data1 [36596.053303] iwlwifi 0000:04:00.0: 0xDEADBEEF | data2 [36596.053304] iwlwifi 0000:04:00.0: 0xDEADBEEF | data3 [36596.053305] iwlwifi 0000:04:00.0: 0x0034C7BD | beacon time [36596.053306] iwlwifi 0000:04:00.0: 0x00574CED | tsf low [36596.053308] iwlwifi 0000:04:00.0: 0x00000000 | tsf hi [36596.053309] iwlwifi 0000:04:00.0: 0x00000000 | time gp1 [36596.053310] iwlwifi 0000:04:00.0: 0x00574CED | time gp2 [36596.053311] iwlwifi 0000:04:00.0: 0x00000000 | time gp3 [36596.053312] iwlwifi 0000:04:00.0: 0x0004160F | uCode version [36596.053313] iwlwifi 0000:04:00.0: 0x00000144 | hw version [36596.053314] iwlwifi 0000:04:00.0: 0x00489204 | board version [36596.053315] iwlwifi 0000:04:00.0: 0x00000000 | hcmd [36596.053316] iwlwifi 0000:04:00.0: 0x240220C0 | isr0 [36596.053317] iwlwifi 0000:04:00.0: 0x01000000 | isr1 [36596.053318] iwlwifi 0000:04:00.0: 0x00000002 | isr2 [36596.053320] iwlwifi 0000:04:00.0: 0x0041FCC0 | isr3 [36596.053321] iwlwifi 0000:04:00.0: 0x00000001 | isr4 [36596.053322] iwlwifi 0000:04:00.0: 0x01000112 | isr_pref [36596.053323] iwlwifi 0000:04:00.0: 0x00000000 | wait_event [36596.053324] iwlwifi 0000:04:00.0: 0x00000050 | l2p_control [36596.053325] iwlwifi 0000:04:00.0: 0x00018010 | l2p_duration [36596.053326] iwlwifi 0000:04:00.0: 0x0000003F | l2p_mhvalid [36596.053327] iwlwifi 0000:04:00.0: 0x00000000 | l2p_addr_match [36596.053328] iwlwifi 0000:04:00.0: 0x00000005 | lmpm_pmg_sel [36596.053329] iwlwifi 0000:04:00.0: 0x22121614 | timestamp [36596.053330] iwlwifi 0000:04:00.0: 0x00002838 | flow_handler [36596.053335] ieee80211 phy1: Hardware restart was requested [36596.053344] iwlwifi 0000:04:00.0: FW error in SYNC CMD SCAN_OFFLOAD_UPDATE_PROFILES_CMD [36596.053346] CPU: 0 PID: 2169 Comm: wpa_supplicant Tainted: G O 3.13.0-1-ab #9 [36596.053347] Hardware name: LENOVO 20AN0037RT/20AN0037RT, BIOS GLET64WW (2.18 ) 12/18/2013 [36596.053349] ffff88032acff218 ffff88032450b810 ffffffff8148bd87 ffff88032acfc000 [36596.053352] ffff88032450b880 ffffffffa05f9c58 ffff88032450b848 0000001400000000 [36596.053353] 0000000000000000 ffff8803273c4c80 ffffffff81084560 ffff88032450b848 [36596.053355] Call Trace: [36596.053361] [<ffffffff8148bd87>] dump_stack+0x4d/0x6f [36596.053365] [<ffffffffa05f9c58>] iwl_trans_pcie_send_hcmd+0x668/0x690 [iwlwifi] [36596.053369] [<ffffffff81084560>] ? __wake_up_sync+0x20/0x20 [36596.053373] [<ffffffffa076f171>] iwl_mvm_send_cmd+0x21/0x70 [iwlmvm] [36596.053376] [<ffffffffa0775a4b>] iwl_mvm_config_sched_scan_profiles+0x14b/0x1a0 [iwlmvm] [36596.053378] [<ffffffffa076aa33>] iwl_mvm_mac_sched_scan_start+0x93/0xe0 [iwlmvm] [36596.053392] [<ffffffffa06feab1>] ieee80211_request_sched_scan_start+0x141/0x2d0 [mac80211] [36596.053398] [<ffffffffa070f2a0>] ieee80211_sched_scan_start+0x30/0x40 [mac80211] [36596.053406] [<ffffffffa05ab52b>] nl80211_start_sched_scan+0x77b/0x870 [cfg80211] [36596.053411] [<ffffffff813d5fc4>] genl_family_rcv_msg+0x334/0x380 [36596.053413] [<ffffffff813d6010>] ? genl_family_rcv_msg+0x380/0x380 [36596.053415] [<ffffffff813d60a1>] genl_rcv_msg+0x91/0xd0 [36596.053417] [<ffffffff813d52a9>] netlink_rcv_skb+0xa9/0xc0 [36596.053419] [<ffffffff813d5c78>] genl_rcv+0x28/0x40 [36596.053421] [<ffffffff813d4835>] netlink_unicast+0xd5/0x180 [36596.053423] [<ffffffff813d4ce9>] netlink_sendmsg+0x359/0x760 [36596.053426] [<ffffffff8138ee70>] sock_sendmsg+0xa0/0xc0 [36596.053428] [<ffffffff813908d0>] ? move_addr_to_kernel+0x40/0x90 [36596.053430] [<ffffffff81390879>] ___sys_sendmsg+0x3a9/0x3c0 [36596.053433] [<ffffffff8107c2d0>] ? update_curr+0x80/0x190 [36596.053434] [<ffffffff8107a018>] ? __enqueue_entity+0x78/0x80 [36596.053437] [<ffffffff8107167a>] ? finish_task_switch+0x4a/0xe0 [36596.053439] [<ffffffff8148d2be>] ? __schedule+0x37e/0x900 [36596.053441] [<ffffffff810a53e8>] ? ktime_get_ts+0x48/0xe0 [36596.053443] [<ffffffff81391832>] __sys_sendmsg+0x42/0x80 [36596.053446] [<ffffffff81391882>] SyS_sendmsg+0x12/0x20 [36596.053448] [<ffffffff81498a96>] system_call_fastpath+0x1a/0x1f Things like this.
It seems really weird that we send a bad command to the firmware... This most likely because your system is already in an awful state. I'd suggest to wait until you get a fix for the bcache issue, and then see how iwlwifi behaves
Oh, I haven't thought about that. Okay, I'll wait till some fix for bcache, thanks for helping this far!
do you have news here? I'd like to close the bug
The bcache issue has been fixed by bisecting and reverting(but my solution has not yet been accepted to upstream), this problem arises from time to time erratically. I would say that this could not be reliably reproduced, so you can close it until I'll catch some pattern in this (or maybe the problem goes away by itself).
Created attachment 126501 [details] new dmesg I've checked though the last dmesg and as it turned out the issue has happened today, I've just hasn't noticed it for some reason, and it went away by itself even without reloading a module. Sorry for multiple "delay" statements (this is from snd-usb-audio and is a separate bug, the issue happened before I attached the USB card)
Ok - now I understand what is happening here... Yes - we have a bug in scheduled scan. Thanks for reporting. This feature is disabled in 3.14-rc3 and the patch that disables this feature was CCed to stable 3.13. So I guess we just have to wait until Greg picks it up. Here is the commit that will disable the offending feature: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0822afe8ebb9389997ef677447c7b08e08797de9 In parallel, we will check why the error you are seeing occurs. Please try to apply the aforementioned commit and report back so that I can close the bug.
I installed 3.13.3 (in which this feature is still enabled) and the exact same firmware as you but couldn't reproduce the issue. For the record, this feature is called scheduled scan, it allows the driver to configure the firmware to scan autonomously without waking up the host CPU. Can you please run tracing? I'd need -e mac80211 -e iwlwifi -e iwlwifi_msg Thanks!
Just has learnt about ftrace, I'll revert just applied patch (btw, there wasn't any issues for several hours but this can mean nothing because the issue can present itself 1-2 times a day) and run with it and tracing until something interesting happens then.
Created attachment 126771 [details] dmesg with trace Hello again, This bug was more reproducible than I thought -- I'm getting it each boot actually (not sure if was the case earlier). Attaching new dmesg log (with ftrace) and a trace.dat file (created by trace-cmd).
Created attachment 126781 [details] trace data
(For clarity -- this is the results for kernel without the patch mentioned)
Created attachment 126851 [details] fix scan offload when sched scan is enabled can you please apply this patch (and of course, don't apply the patch that disables the feature). Thanks!
Hello, After this patch the problem was gone, but I cannot connect to my WLAN network anymore. Interestingly, this problem disappears after suspend/resume cycle, and everything works normally.
Created attachment 126871 [details] trace.dat with new patch
Created attachment 126881 [details] dmesg with new patch
Feel free to ask me for additional experiments/info if needed.
How do you try to connect to your AP? Do you use wpa_cli, or the NetworkManager's GUI? I guess I'd need your syslog to see the logs of the supplicant. Thanks!
Hello, I use NetworkManager (I can try other methods if requested). wpa_supplicant generates log like this: Feb 21 01:17:37 abbradarbook wpa_supplicant[3181]: Successfully initialized wpa_supplicant Feb 21 01:17:37 abbradarbook wpa_supplicant[3181]: dbus: wpas_dbus_bss_signal_prop_changed: Unknown Property value 7 Feb 21 01:17:37 abbradarbook wpa_supplicant[3181]: wlp4s0: Failed to initiate AP scan Feb 21 01:17:37 abbradarbook wpa_supplicant[3181]: wlp4s0: Failed to initiate AP scan ... And goes like that. If I kill wpa_supplicant, NM restarts it and everything works properly after that. When I use kernel with feature disabled, everything works from the first try.
I guess that this is the bug because of which we disabled this feature. We are still reviewing the patch I attached, once it will be merged, I'll close this bug. FWIW: the patch that disables scheduled scan has been ported to stable tree.
https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.13/iwlwifi-mvm-disable-scheduled-scan.patch?id=91a6efb86984246385960d8558d07758474c8020
patch is now upstream.