Bug 207409
Summary: | iwlwifi: 7265: Microcode SW error detected | ||
---|---|---|---|
Product: | Drivers | Reporter: | Olli Salonen (olli.salonen) |
Component: | network-wireless-intel | Assignee: | Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel) |
Status: | NEW --- | ||
Severity: | normal | CC: | arnau.bigas, coolx67, john.aaron.rose, linuxwifi, mg |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 5.3.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg, Kernel log | ||
Created attachment 289615 [details]
Kernel log
I get a similar firmware stack trace, although the symptoms are a bit different: sometimes (maybe twice a week), all my network operations deadlock (including trying to bring down the interface), and a process named "kworker/u8:10+phy0" is stuck at 100% CPU in top. Rebooting the machine fixes it.

lspci -vvv -d ::0280
02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
    Subsystem: Intel Corporation Dual Band Wireless-AC 7265
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 50
    Region 0: Memory at f7100000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [c8] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee00398 Data: 0000
    Capabilities: [40] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+ FLReset-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <32us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
            10BitTagComp- 10BitTagReq- OBFF Via WAKE#, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis- LTR+ OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
            EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Device Serial Number 10-02-b5-ff-ff-9a-ea-61
    Capabilities: [14c v1] Latency Tolerance Reporting
        Max snoop latency: 3145728ns
        Max no snoop latency: 3145728ns
    Capabilities: [154 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
            PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
        L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
            T_CommonMode=0us LTR1.2_Threshold=163840ns
        L1SubCtl2: T_PwrOn=60us
    Kernel driver in use: iwlwifi
    Kernel modules: iwlwifi

ethtool -i wlp2s0
driver: iwlwifi
version: 5.6.15-arch1-1
firmware-version: 29.163394017.0 7265D-29.ucode
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

uname -a
Linux hostname 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux

I first reported the bug in the Arch Linux bug tracker, in case it is relevant: https://bugs.archlinux.org/task/66447?project=1&pagenum=9

Same SW error here. I still have a connection (the Wi-Fi signal shows in GNOME), but requests time out; after a few minutes or a reboot everything is fine. But the error occurs again every few minutes (1-30).
5.8.5-arch1-1
https://pastebin.com/8sDvDf1M

Kernel 5.9 doesn't work either. Looks like no one cares.

I have created a new config file in
> /etc/modprobe.d/iwlwifi.conf
and added

options iwlwifi 11n_disable=1 swcrypto=0 power_save=0
options iwlmvm power_scheme=1
options iwlwifi uapsd_disable=1

My thought process is as follows.
We have 5 GHz with 80 MHz channel bandwidth and 2.4 GHz with 40 MHz channel bandwidth, with and without WPA2 Enterprise (Tunneled TLS | MSCHAPv2).
Maybe there is a problem using 5 GHz + 2.4 GHz, or maybe there is a bug with 2.4 GHz and 40 MHz bandwidth.
I will test further, removing one option after another, to try to find the problem.

For now it looks like
> options iwlwifi power_save=0
does prevent the SW issue. Changing power_scheme to any value doesn't change anything (crashes with scheme 1, 2 and 3).

When setting 11n_disable=1, Linux is kind of confused:
https://ibb.co/g40sbj9
GNOME and the terminal (iw) show 2.4 GHz.
nmcli shows channel 36 and 405 Mbit.
My router says channel 36, 11a, with 54 Mbit.
So for now I would suggest using power_save=0.
I would also suggest blacklisting the Intel Corporation Wireless 7265 for power saving by default in the kernel until someone gets around to bisecting the main issue.

After long testing it seems that

options iwlwifi swcrypto=0
options iwlwifi power_save=0
options iwlmvm power_scheme=1
options iwlwifi uapsd_disable=1

Using these options I get 866MB/s connection speed and no SW firmware errors.
This should be addressed; maybe there are some hw issues with power saving mode.
Therefore, it would be nice to disable them in the kernel for specific vendor:hw IDs.

(In reply to roman from comment #7)
> after long testing it seems that
>
> options iwlwifi swcrypto=0
> options iwlwifi power_save=0
> options iwlmvm power_scheme=1
> options iwlwifi uapsd_disable=1
>
>
> using these options I get 866MB/s connection speed and no SW firmware
> errors.
> This should be addressed, maybe there are some hw issues with power saving
> mode.
>
> Therefore, It would be nice to disable them in kernel for specific vendor:hw
> IDs.

I have a 2165 chip using kernel 5.15.0-52-generic under Ubuntu Jammy. I actually did:

echo 'options iwlwifi power_save=0 uapsd_disable=0' | sudo tee -a /etc/modprobe.d/wifihacks.conf
echo 'options iwlmvm power_scheme=1' | sudo tee -a /etc/modprobe.d/wifihacks.conf

But after rebooting, the wifi still does not work. Help!

I forgot to mention that I have a 2165 chip. However, Intel at https://www.intel.com/content/www/us/en/support/articles/000005511/wireless.html says that they use the same firmware.

Apologies. I keep saying it's a 2165 chip whereas it is in fact a 3165 chip.
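When a modprobe.d file seems to have no effect after a reboot, it can help to check which values the loaded modules actually ended up with. The following is a minimal sketch, assuming the parameters named in this thread are exported under /sys/module/<module>/parameters/ on the running kernel. The function name show_module_params and its optional third argument (an alternative sysfs root, there only so the function can be exercised without the hardware) are made up for illustration. Note also that comment 7 uses uapsd_disable=1 while the echo command above writes uapsd_disable=0, so the two setups are not identical.

```shell
#!/bin/sh
# Print the current value of each listed parameter of a kernel module,
# as read from sysfs. Boolean parameters are typically shown as Y or N.
# $1 = module name, $2 = space-separated parameter names,
# $3 = sysfs module root (hypothetical test hook, defaults to /sys/module)
show_module_params() {
    module=$1
    params=$2
    root=${3:-/sys/module}
    for p in $params; do
        f="$root/$module/parameters/$p"
        if [ -r "$f" ]; then
            printf '%s.%s=%s\n' "$module" "$p" "$(cat "$f")"
        else
            # Module not loaded, or parameter not exported on this kernel.
            printf '%s.%s=<not available>\n' "$module" "$p"
        fi
    done
}

# Example calls for the options discussed in this thread:
#   show_module_params iwlwifi "power_save uapsd_disable swcrypto 11n_disable"
#   show_module_params iwlmvm "power_scheme"
```

If a value printed here differs from what the .conf file requests, the file was probably not picked up when the module was loaded (for example because the module is loaded from the initramfs before the file is seen); that is a guess to check, not a diagnosis.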
Created attachment 288671 [details]
dmesg

I'm running an Intel Compute Stick STK1AW32SC that has Wireless AC 7265 (REV=0x210) WiFi chip. It works fine for some time, but after some days it always reports a microcode SW error.

[442890.970169] iwlwifi 0000:01:00.0: Microcode SW error detected. Restarting 0x2000000.
[442890.970343] iwlwifi 0000:01:00.0: Start IWL Error Log Dump:
[442890.970350] iwlwifi 0000:01:00.0: Status: 0x00000080, count: 6
[442890.970357] iwlwifi 0000:01:00.0: Loaded firmware version: 29.1044073957.0

root@greenhouse:~# uname -a
Linux greenhouse 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

root@greenhouse:~# ethtool -i wlp1s0
driver: iwlwifi
version: 5.3.0-46-generic
firmware-version: 29.1044073957.0
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

root@greenhouse:~# lspci | grep -i network
01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 69)

The WiFi will work after these errors, but I'm using the Bluetooth to periodically query a few BT thermometers and that will stop working after this. A reboot will sort it out.

This is after a SW error:

root@greenhouse:~# bluetoothctl lescan
[NEW] Controller 00:21:5C:BC:2F:E0 greenhouse [default]
[NEW] Device 4C:65:A8:D9:48:2F MJ_HT_V1
[NEW] Device 4C:65:A8:D7:49:1C MJ_HT_V1
[NEW] Device 4C:65:A8:D9:3E:76 MJ_HT_V1
Agent registered
[bluetooth]# exit
Agent unregistered
[DEL] Controller 00:21:5C:BC:2F:E0 greenhouse [default]
root@greenhouse:~# gatttool -b 4C:65:A8:D9:48:2F --char-read --handle=0x18
connect error: Transport endpoint is not connected (107)

Whereas after rebooting it works again:

root@greenhouse:~# gatttool -b 4C:65:A8:D9:48:2F --char-read --handle=0x18
Characteristic value/descriptor: 44
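Since the error only shows up after days of uptime, counting its occurrences in the kernel log makes it easier to compare kernels or module options. Below is a minimal sketch, assuming the log lines have the same wording as the dmesg excerpt in this report. The function names count_fw_errors and fw_versions are made up for illustration, and both read kernel log text from standard input.

```shell
#!/bin/sh
# Count iwlwifi firmware crashes in kernel log text read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so the
# "|| true" keeps the function usable in scripts running under "set -e".
count_fw_errors() {
    grep -c 'iwlwifi .*Microcode SW error detected' || true
}

# List the distinct firmware versions named in the error dumps on stdin.
# Only the first whitespace-delimited token after the label is kept.
fw_versions() {
    sed -n 's/.*Loaded firmware version: \([^ ]*\).*/\1/p' | sort -u
}

# Example use on a live system:
#   dmesg | count_fw_errors
#   journalctl -k -b | fw_versions
```

Running the counter periodically (for example from cron) and logging the result together with the uptime would show roughly how long the device runs before the first error on a given setup.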