Created attachment 283657 [details] Relevant dmesg output
Running Ubuntu 18.04 LTS with stock kernel (4.15.0-54-generic). Firmware failure with stock firmware from linux-firmware package (version 29.1044073957.0) and firmware supplied by Emmanuel (29.4009927039.0). Takes about 1 day of uptime for the first firmware crash. After that, firmware restarts are fairly common (2-3 per day). Throughput drops by 3-4x after first firmware restart and never recovers. No specific repro steps; simply remain connected to a 5GHz WPA2 Personal network with occasional network activity.
I have also been seeing this sort of behaviour since about April/May with Fedora 30 and various kernel versions. It seems to be load-dependent: it will be fine for a while, but do anything "strenuous" network-wise (like, say, download distribution updates) and it seems more likely to happen, sometimes after less than an hour. ifconfig down/up gets the network back. I thought it was kernel related, but as far as I can determine it is wireless firmware related, as rolling back to a previous Fedora package (one dated prior to May 14th on the Fedora side) seems to revert the issue. But my testing has been limited, as every time I update the box it seems to break again. If anyone has some suggestions for diagnostics I'm willing to assist. Similar systemd journal output from my box is also attached.
Created attachment 283893 [details] systemd journal output, same issue, fedora 30 on asrock taichi x470
To confirm, this started happening for me in May, with kernel version 5.0.17 (as far as I can determine) onwards. So it isn't just with kernel 4.x, and as above I do not believe it to be kernel specific, but rather wireless firmware specific, since a firmware rollback resolves the issue.
There seems to be an issue with the new FW where it doesn't flag support for the new command version, but uses it anyway. The crash is caused by the driver sending a command in the wrong (i.e. old) format. I'll fix this and send a patch/new FW version in a bit.
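A toy model of the mismatch described in the comment above may help readers follow it. This is not driver code: the struct names, the `table_revision` field, and the length check are all hypothetical, chosen only to show how a driver that trusts a missing capability flag ends up sending the old layout to firmware that expects the new one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layouts; the real iwlwifi structs differ. */
struct demo_cmd_v1 {
    uint32_t ops;
};

struct demo_cmd_v2 {
    uint32_t ops;
    uint32_t table_revision; /* hypothetical field added in the new format */
};

/* Driver side: picks the layout from the capability flag the firmware advertised. */
static size_t driver_cmd_len(bool fw_advertises_v2)
{
    return fw_advertises_v2 ? sizeof(struct demo_cmd_v2)
                            : sizeof(struct demo_cmd_v1);
}

/* Firmware side (assumed behaviour): accepts only the layout it implements. */
static bool fw_accepts(bool fw_implements_v2, size_t len)
{
    size_t expected = fw_implements_v2 ? sizeof(struct demo_cmd_v2)
                                       : sizeof(struct demo_cmd_v1);
    return len == expected;
}

/* True when the driver's chosen format matches what the firmware implements. */
static bool command_ok(bool fw_advertises_v2, bool fw_implements_v2)
{
    return fw_accepts(fw_implements_v2, driver_cmd_len(fw_advertises_v2));
}
```

In this model the failing case is `command_ok(false, true)`: the firmware implements the new format without advertising it, so the driver falls back to the old one.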
*** Bug 204213 has been marked as a duplicate of this bug. ***
*** Bug 204153 has been marked as a duplicate of this bug. ***
Created attachment 284709 [details] dmesg output from iwlwifi 3168
I'm getting strange kernel output with Intel(R) Dual Band Wireless AC 3168, REV=0x220. It amounts to just a garbage backtrace. For me, it only happens with the 5.3 release candidates. Everything is good with the linux stable tree. I've updated the firmware but no dice. Am I missing a kernel config option? A BIOS setting perhaps? I've got an ASRock X370 Taichi motherboard. Attached is some output from dmesg and lspci. The first 10 lines were pruned. Around line 180 is the output from lspci.
Created attachment 284873 [details] FW crash messages extracted from kernel log.
Is the patch/new FW mentioned in comment 4 above available yet, please? I'm still seeing this FW crash in 5.3.0-rc7+. I'm using the latest firmware in the linux-firmware git tree (iwlwifi 0000:02:00.0: loaded firmware version 46.6bf1df06.0 op_mode iwlmvm) in the hope that this is the FW referred to in the comment. Relevant messages from the kernel log are attached.
I'm still seeing it every boot and wake from suspend with 5.3.0-rc7 as well, on an 8260 controller.
[ 9.866194] flap.local kernel: iwlwifi 0000:6c:00.0: Microcode SW error detected. Restarting 0x2000000.
[ 9.866330] flap.local kernel: iwlwifi 0000:6c:00.0: Start IWL Error Log Dump:
[ 9.866331] flap.local kernel: iwlwifi 0000:6c:00.0: Status: 0x00000080, count: 6
[ 9.866332] flap.local kernel: iwlwifi 0000:6c:00.0: Loaded firmware version: 36.77d01142.0
[ 9.866334] flap.local kernel: iwlwifi 0000:6c:00.0: 0x00000038 | BAD_COMMAND
...
[ 9.867069] flap.local kernel: iwlwifi 0000:6c:00.0: FW error in SYNC CMD GEO_TX_POWER_LIMIT
...
[ 9.867192] flap.local kernel: WARNING: CPU: 2 PID: 685 at drivers/net/wireless/intel/iwlwifi/mvm/scan.c:1874 iwl_mvm_rx_umac_scan_complete_notif.cold+0xc>
[ 9.867237] flap.local kernel: CPU: 2 PID: 685 Comm: kworker/2:5 Not tainted 5.3.0-0.rc7.git0.1.fc31.x86_64 #1
And kernel is then tainted 512.
Still happens on 5.3.0 on an 8260 controller, any updates?
kernel: iwlwifi 0000:02:00.0: loaded firmware version 36.77d01142.0 op_mode iwlmvm
kernel: iwlwifi 0000:02:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
kernel: iwlwifi 0000:02:00.0: Microcode SW error detected. Restarting 0x82000000.
kernel: iwlwifi 0000:02:00.0: Loaded firmware version: 36.77d01142.0
kernel: iwlwifi 0000:02:00.0: FW error in SYNC CMD GEO_TX_POWER_LIMIT
kernel: CPU: 2 PID: 71 Comm: kworker/2:1 Tainted: P OE 5.3.0-arch1-1-ARCH #1
kernel: Hardware name: HP HP ZBook 15 G3/80D5, BIOS N81 Ver. 01.41 07/16/2019
kernel: Workqueue: events iwl_mvm_async_handlers_wk [iwlmvm]
kernel: Call Trace:
kernel: dump_stack+0x5c/0x80
kernel: iwl_trans_pcie_send_hcmd+0x547/0x560 [iwlwifi]
kernel: ? wait_woken+0x70/0x70
kernel: iwl_trans_send_cmd+0x59/0xb0 [iwlwifi]
kernel: iwl_mvm_send_cmd+0x2e/0x80 [iwlmvm]
kernel: iwl_mvm_get_sar_geo_profile+0x102/0x180 [iwlmvm]
kernel: iwl_mvm_rx_chub_update_mcc+0x10b/0x1a0 [iwlmvm]
kernel: iwl_mvm_async_handlers_wk+0xaa/0x140 [iwlmvm]
kernel: process_one_work+0x1d1/0x3a0
kernel: worker_thread+0x4a/0x3d0
kernel: kthread+0xfb/0x130
kernel: ? process_one_work+0x3a0/0x3a0
kernel: ? kthread_park+0x80/0x80
kernel: ret_from_fork+0x35/0x40
kernel: iwlwifi 0000:02:00.0: Failed to get geographic profile info -5
Created attachment 285073 [details] Crash on 5.3.0
*** Bug 204917 has been marked as a duplicate of this bug. ***
*** Bug 204943 has been marked as a duplicate of this bug. ***
I have just sent a fix for this upstream: https://patchwork.kernel.org/patch/11158395/ It will hopefully be taken to v5.4 soon and from there to stable v5.3, v5.2 and v4.19.
Looking at the original log I attached to this bug report, I see that the uCode major version for my device is 0x1D (29):
[79300.452830] iwlwifi 0000:05:00.0: 0x0000001D | uCode version major
The patch enables GEO_TX_POWER_LIMIT for uCode major version 36 which wouldn't affect my device.
Maybe I misunderstood something about the nature of the issue / patch, but it seems that the original bug report I reported is different than what is being fixed here. If so, should I file another bug or should we reopen this bug?
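Reading the firmware major number out of a log line, as done by hand in the comment above, can be sketched as follows. `fw_major_from_log_line` is a hypothetical helper, not part of any tool; it only understands the two line formats quoted in this thread ("loaded firmware version X.Y.Z" and "0x... | uCode version major").

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns the firmware major number found in one kernel log line, or -1 if
 * the line carries neither of the two formats quoted in this thread.
 */
static int fw_major_from_log_line(const char *line)
{
    static const char key[] = "oaded firmware version"; /* matches "loaded" and "Loaded" */
    static const char dump_key[] = "| uCode version major";
    unsigned int value;
    const char *p, *bar, *hex = NULL;

    /* "... loaded firmware version 36.77d01142.0 op_mode iwlmvm" */
    p = strstr(line, key);
    if (p) {
        p += strlen(key);
        if (*p == ':')          /* the error dump prints "version: 36..." */
            p++;
        if (sscanf(p, " %u.", &value) == 1)
            return (int)value;
    }

    /* "... 0x0000001D | uCode version major" */
    bar = strstr(line, dump_key);
    if (bar) {
        /* Take the last "0x" token that appears before the '|' marker. */
        for (p = strstr(line, "0x"); p && p < bar; p = strstr(p + 2, "0x"))
            hex = p;
        if (hex && sscanf(hex, "%x", &value) == 1)
            return (int)value;
    }

    return -1;
}
```

For the line quoted above, 0x1D is decoded as 29, which is the number the discussion compares against the versions handled by the patch.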
Awesome, thanks Luca!
On 24/09/2019 18:03, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=204151
>
> --- Comment #16 from sharvil.nanavati@gmail.com ---
> Looking at the original log I attached to this bug report, I see that the uCode major version for my device is 0x1D (29):
>
> [79300.452830] iwlwifi 0000:05:00.0: 0x0000001D | uCode version major
>
> The patch enables GEO_TX_POWER_LIMIT for uCode major version 36 which wouldn't affect my device.
>
Even with that patch applied, I'm still getting the crash too. Hardware and uCode version are shown in:
[ 3.447509] iwlwifi 0000:02:00.0: loaded firmware version 46.6bf1df06.0 op_mode iwlmvm
[ 3.493018] iwlwifi 0000:02:00.0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x324
> Maybe I misunderstood something about the nature of the issue / patch, but it seems that the original bug report I reported is different than what is being fixed here. If so, should I file another bug or should we reopen this bug?
>
(In reply to sharvil.nanavati from comment #16)
> Looking at the original log I attached to this bug report, I see that the uCode major version for my device is 0x1D (29):
>
> [79300.452830] iwlwifi 0000:05:00.0: 0x0000001D | uCode version major
>
> The patch enables GEO_TX_POWER_LIMIT for uCode major version 36 which wouldn't affect my device.
>
> Maybe I misunderstood something about the nature of the issue / patch, but it seems that the original bug report I reported is different than what is being fixed here. If so, should I file another bug or should we reopen this bug?
The original bug you reported, the one that has this line in the logs:
[79300.452799] iwlwifi 0000:05:00.0: 0x00000034 | NMI_INTERRUPT_WDG
...has been fixed with commit 0c3d7282233c ("iwlwifi: Add support for SAR South Korea limitation") upstream (it's in v5.3). This fix has also been backported to v5.2.10+.
The new patch I posted here today is not relevant for the "NMI_INTERRUPT_WDG" case. Please make sure you are using v5.2.10 or above.
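The version guidance in the comment above (fix in v5.3, backported to v5.2.10+) can be turned into a small check. This is only an illustration: `has_nmi_wdg_fix` is a hypothetical helper that looks solely at the upstream version numbers named in this thread. It ignores -rc suffixes and distribution backports, so a `false` result means "not known to carry the fix", not "definitely broken".

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true if the release string (as printed by `uname -r`) is v5.2.10
 * or later in the 5.2 series, or any v5.3+ release. Returns false otherwise,
 * including for strings that do not start with "major.minor.patch".
 */
static bool has_nmi_wdg_fix(const char *release)
{
    int major, minor, patch;

    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) != 3)
        return false;
    if (major != 5)
        return major > 5;
    if (minor != 2)
        return minor > 2;
    return patch >= 10;
}
```

With the kernels mentioned in this thread, `4.15.0-54-generic` and `5.0.17` fall below the threshold, while `5.3.1` is above it.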
(In reply to Chris Clayton from comment #18)
> Even with that patch applied, I'm still getting the crash too. Hardware and uCode version are shown in:
>
> [ 3.447509] iwlwifi 0000:02:00.0: loaded firmware version 46.6bf1df06.0 op_mode iwlmvm
> [ 3.493018] iwlwifi 0000:02:00.0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x324
With this device you are probably getting the NMI_INTERRUPT_WDG case. I can't see the kernel version you are using, but it probably doesn't have the fix. Can you please check and report?
For the record, this is the patch upstream for the NMI_INTERRUPT_WDG case:
https://patchwork.kernel.org/patch/11021735/
Thanks for closing the loop on this, Luca. Much appreciated.
(In reply to Luca Coelho from comment #20)
> (In reply to Chris Clayton from comment #18)
> > Even with that patch applied, I'm still getting the crash too. Hardware and uCode version are shown in:
> >
> > [ 3.447509] iwlwifi 0000:02:00.0: loaded firmware version 46.6bf1df06.0 op_mode iwlmvm
> > [ 3.493018] iwlwifi 0000:02:00.0: Detected Intel(R) Wireless-AC 9260 160MHz, REV=0x324
>
> With this device you are probably getting the NMI_INTERRUPT_WDG case. I can't see the kernel version you are using, but it probably doesn't have the fix. Can you please check and report?
>
> For the record, this is the patch upstream for the NMI_INTERRUPT_WDG case:
>
> https://patchwork.kernel.org/patch/11021735/
I'm running 5.3.1 which already includes the patch you've pointed to.
I can generate the Microcode SW error more or less at will by downloading a large file (e.g. the Fedora 30 .iso). I've just done that and I'll attach the output from dmesg in a moment.
Created attachment 285165 [details] dmesg showing Microcode SW error
*** Bug 204983 has been marked as a duplicate of this bug. ***
(In reply to Chris Clayton from comment #22)
> I'm running 5.3.1 which already includes the patch you've pointed to.
>
> I can generate the Microcode SW error more or less at will by downloading a large file (e.g. the Fedora 30 .iso). I've just done that and I'll attach the output from dmesg in a moment.
The error you are getting is a different one:
[ 554.990533] iwlwifi 0000:02:00.0: 0x000022CE | ADVANCED_SYSASSERT
In this case, the signature is SYSASSERT and 0x000022CE. Can you please file a separate report for it?
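Three distinct crash signatures have come up in this thread so far, and each was routed differently. A minimal triage sketch, assuming the log text is available as a C string, could look like the following; `classify_fw_crash` and the enum are hypothetical names, and matching on substrings is a simplification of reading the full error dump.

```c
#include <string.h>

enum fw_crash_kind {
    FW_CRASH_UNKNOWN,
    FW_CRASH_NMI_WDG,     /* NMI_INTERRUPT_WDG: fixed by commit 0c3d7282233c */
    FW_CRASH_GEO_CMD,     /* FW error in SYNC CMD GEO_TX_POWER_LIMIT */
    FW_CRASH_SYSASSERT    /* ADVANCED_SYSASSERT: separate report requested */
};

/* Classifies a log excerpt by the first known signature it contains. */
static enum fw_crash_kind classify_fw_crash(const char *log)
{
    if (strstr(log, "NMI_INTERRUPT_WDG"))
        return FW_CRASH_NMI_WDG;
    if (strstr(log, "GEO_TX_POWER_LIMIT"))
        return FW_CRASH_GEO_CMD;
    if (strstr(log, "ADVANCED_SYSASSERT"))
        return FW_CRASH_SYSASSERT;
    return FW_CRASH_UNKNOWN;
}
```

Checking the signature before picking a patch avoids the confusion seen above, where a fix for one signature was tested against a crash with another.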
*** Bug 205035 has been marked as a duplicate of this bug. ***
Hi, the patch proposed in comment 15 doesn't seem to fix the error found with an AC 3168 card. The latest firmware revision for the AC 3168 is -29, and the patch only removes -36 from the conditions. I had a build that removes both -29 and -36 together, which fixes this FW error on the AC 3168 as expected.
Thanks for the patch you sent You-Sheng! As I replied in the list, the 7265D devices do implement this command, so a better fix would be this:
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 0d2229319261..38d89ee9bd28 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -906,8 +906,10 @@ static bool iwl_mvm_sar_geo_support(struct iwl_mvm *mvm)
 	 * entirely.
 	 */
 	return IWL_UCODE_SERIAL(mvm->fw->ucode_ver) >= 38 ||
-	       IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 29 ||
-	       IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 17;
+	       IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 17 ||
+	       (IWL_UCODE_SERIAL(mvm->fw->ucode_ver) == 29 &&
+		(mvm->trans->hw_rev &
+		 CSR_HW_REV_TYPE_MSK) == CSR_HW_REV_TYPE_7265D);
 }
 
 int iwl_mvm_get_sar_geo_profile(struct iwl_mvm *mvm)
Can someone try it?
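To see which devices the proposed change affects, the predicate can be modelled outside the kernel. This is a standalone sketch, not the driver function: `serial` stands in for `IWL_UCODE_SERIAL(mvm->fw->ucode_ver)` and is assumed, as elsewhere in this thread, to match the firmware version number (29, 36, 46). The two `DEMO_` constants are assumed values for `CSR_HW_REV_TYPE_MSK` and `CSR_HW_REV_TYPE_7265D` and should be checked against the driver headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against the iwlwifi CSR header in your tree. */
#define DEMO_HW_REV_TYPE_MSK   0x000FFF0u
#define DEMO_HW_REV_TYPE_7265D 0x0000210u

/* The check as it stood before the diff (context plus "-" lines). */
static bool geo_support_before(unsigned int serial, uint32_t hw_rev)
{
    (void)hw_rev; /* the old check did not look at the hardware revision */
    return serial >= 38 || serial == 29 || serial == 17;
}

/* The check after the diff: firmware 29 counts only on 7265D hardware. */
static bool geo_support_after(unsigned int serial, uint32_t hw_rev)
{
    return serial >= 38 || serial == 17 ||
           (serial == 29 &&
            (hw_rev & DEMO_HW_REV_TYPE_MSK) == DEMO_HW_REV_TYPE_7265D);
}
```

Under these assumptions, an AC 3168 (firmware 29, REV=0x220 as reported earlier in this thread) passes the old check but fails the new one, so the driver would stop sending GEO_TX_POWER_LIMIT to it, while a device whose revision type matches 7265D keeps the command.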
Okay, I think this patch is good and I have applied it internally. I'll push it out as soon as I get confirmation that the patch I posted in the previous comment works.
I can confirm that the patches in comment 15 and comment 28 together stop the firmware error messages on an HP EliteDesk 800 G3 DM equipped with an Intel AC 3168.
Thanks for confirming, You-Sheng!
Here's the patch that was sent upstream: https://lore.kernel.org/linux-wireless/20191009094853.PfIm3J8o7DN_Femup-OXkJdtmKE7rftk1ODkm7cx-vk@z/
*** Bug 205163 has been marked as a duplicate of this bug. ***
(In reply to Luca Coelho from comment #32)
> Here's the patch that was sent upstream:
> https://lore.kernel.org/linux-wireless/20191009094853.PfIm3J8o7DN_Femup-OXkJdtmKE7rftk1ODkm7cx-vk@z/
That patch (https://git.kernel.org/torvalds/c/12e36d98d3e5acf5fc57774e0a15906d55f30cb9) is not marked for stable afaics. I wonder if it should be, as it contains this line:
```
Fixes: f5a47fae6aa3 ("iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support")
```
And that patch was recently backported to 5.3: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.3.y&id=3a0b7157d6a9833cefd40b5f0fa1cc90285bb4b5
Thus I suspect that 12e36d98d3e5 should be backported, too. I guess Bug 205163 is an indicator for that, as it talks about Linux 5.3.5.
(In reply to Thorsten Leemhuis from comment #34)
> ```
> Fixes: f5a47fae6aa3 ("iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support")
> ```
> And that patch was recently backported to 5.3:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.3.y&id=3a0b7157d6a9833cefd40b5f0fa1cc90285bb4b5
Scratch that, I mixed up the commits, sorry for the noise. But I still wonder if https://git.kernel.org/torvalds/c/12e36d98d3e5acf5fc57774e0a15906d55f30cb9 should be backported, as I see warnings mentioned in Bug 205163 here as well on a system with a 3168.
I also still have these issues on my Aspire A515-51G. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1846016
Hi Lyubomir, please help ACK https://lists.ubuntu.com/archives/kernel-team/2019-October/104807.html if that works for you. It has landed in the Ubuntu linux-oem/linux-oem-osp1 trees, but not yet in the -generic trees due to the lack of sufficient ACKs.
Lyubomir, you may use my PPA https://launchpad.net/~vicamo/+archive/ubuntu/ppa-1846016 for testing.
FWIW this has regressed in 5.5 - tracked in Bug 206395