Bug 207409 - iwlwifi: 7265: Microcode SW error detected
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel
Hardware: Intel Linux
Importance: P1 normal
Assignee: Default virtual assignee for network-wireless-intel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-22 17:28 UTC by Olli Salonen
Modified: 2020-11-15 15:19 UTC
CC List: 4 users

See Also:
Kernel Version: 5.3.0
Tree: Mainline
Regression: No


Attachments
dmesg (52.48 KB, text/plain)
2020-04-22 17:28 UTC, Olli Salonen
Kernel log (120.90 KB, text/plain)
2020-06-11 18:17 UTC, Max Gautier

Description Olli Salonen 2020-04-22 17:28:08 UTC
Created attachment 288671
dmesg

I'm running an Intel Compute Stick STK1AW32SC that has a Wireless-AC 7265 (REV=0x210) WiFi chip. It works fine for a while, but after a few days it always reports a microcode SW error.

[442890.970169] iwlwifi 0000:01:00.0: Microcode SW error detected.  Restarting 0x2000000.
[442890.970343] iwlwifi 0000:01:00.0: Start IWL Error Log Dump:
[442890.970350] iwlwifi 0000:01:00.0: Status: 0x00000080, count: 6
[442890.970357] iwlwifi 0000:01:00.0: Loaded firmware version: 29.1044073957.0

root@greenhouse:~# uname -a
Linux greenhouse 5.3.0-46-generic #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

root@greenhouse:~# ethtool -i wlp1s0
driver: iwlwifi
version: 5.3.0-46-generic
firmware-version: 29.1044073957.0
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

root@greenhouse:~# lspci | grep -i network
01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 69)

WiFi keeps working after these errors, but I also use Bluetooth to periodically query a few BT thermometers, and that stops working once the error has occurred. A reboot sorts it out.

This is after a SW error:
root@greenhouse:~# bluetoothctl lescan
[NEW] Controller 00:21:5C:BC:2F:E0 greenhouse [default]
[NEW] Device 4C:65:A8:D9:48:2F MJ_HT_V1
[NEW] Device 4C:65:A8:D7:49:1C MJ_HT_V1
[NEW] Device 4C:65:A8:D9:3E:76 MJ_HT_V1
Agent registered
[bluetooth]# exit
Agent unregistered
[DEL] Controller 00:21:5C:BC:2F:E0 greenhouse [default]
root@greenhouse:~# gatttool -b 4C:65:A8:D9:48:2F --char-read --handle=0x18
connect error: Transport endpoint is not connected (107)

Whereas after rebooting it works again:
root@greenhouse:~# gatttool -b 4C:65:A8:D9:48:2F --char-read --handle=0x18
Characteristic value/descriptor: 44
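
To see how often the firmware crashes and whether the Bluetooth failures line up with it, the crash lines can be pulled out of a saved kernel log. This is only a minimal sketch: the function names are made up for illustration, and it assumes the log uses the bracketed dmesg timestamp format shown above.

```shell
#!/bin/sh
# Hypothetical helpers for a kernel log saved with e.g. "dmesg > dmesg.txt".

# Print the dmesg timestamp (seconds since boot) of every firmware crash line.
fw_error_times() {
    grep 'Microcode SW error detected' "$1" |
        sed -n 's/^\[ *\([0-9.]*\)].*/\1/p'
}

# Print how many firmware crash lines the log contains.
fw_error_count() {
    grep -c 'Microcode SW error detected' "$1"
}
```

Running fw_error_count dmesg.txt after a Bluetooth failure shows whether at least one crash was logged since boot; fw_error_times gives the uptime at which each one happened.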
Comment 1 Max Gautier 2020-06-11 18:17:34 UTC
Created attachment 289615
Kernel log
Comment 2 Max Gautier 2020-06-11 18:26:29 UTC
I get a similar firmware stack trace, although the symptoms are a bit different:
sometimes (maybe twice a week), all my network operations deadlock (including trying to bring down the interface), and a process named "kworker/u8:10+phy0" is stuck at 100% CPU in top. Rebooting the machine fixes it.

lspci -vvv -d ::0280
02:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
	Subsystem: Intel Corporation Dual Band Wireless-AC 7265
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 50
	Region 0: Memory at f7100000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00398  Data: 0000
	Capabilities: [40] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <32us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Via WAKE#, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis- LTR+ OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Device Serial Number 10-02-b5-ff-ff-9a-ea-61
	Capabilities: [14c v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [154 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
			   T_CommonMode=0us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=60us
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi


ethtool -i wlp2s0
driver: iwlwifi
version: 5.6.15-arch1-1
firmware-version: 29.163394017.0 7265D-29.ucode
expansion-rom-version: 
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

uname -a
Linux hostname 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux

I first reported the bug in the Arch Linux bug tracker, in case it is relevant:
https://bugs.archlinux.org/task/66447?project=1&pagenum=9
Comment 3 roman 2020-08-28 12:09:45 UTC
Same SW error here.

I still have a connection (WiFi signal shown in GNOME), but requests time out; after a few minutes or a reboot everything is fine.

But the error occurs again every few minutes (1-30).

5.8.5-arch1-1

https://pastebin.com/8sDvDf1M
Comment 4 roman 2020-09-10 06:30:37 UTC
Kernel 5.9 doesn't work either.

Looks like no one cares.
Comment 5 roman 2020-09-10 09:43:56 UTC
I have created a new config file at

  > /etc/modprobe.d/iwlwifi.conf

and added:

options iwlwifi 11n_disable=1 swcrypto=0 power_save=0
options iwlmvm power_scheme=1 
options iwlwifi uapsd_disable=1 


My thought process is as follows:

We have 5 GHz with 80 MHz channel bandwidth
and 2.4 GHz with 40 MHz channel bandwidth,

with and without WPA2 Enterprise (Tunneled TLS | MSCHAPv2).


Maybe there is a problem using 5 GHz and 2.4 GHz together;
maybe there is a bug with 2.4 GHz and 40 MHz bandwidth.

I will test further, removing one option after another, to try to find the problem.
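
For anyone who wants to repeat this test, the options listed above can be written out and inspected with a small script. This is a rough sketch, not a recommended configuration: the function names are hypothetical, and the default path is simply the one mentioned in this comment.

```shell
#!/bin/sh
# Write the candidate workaround options to a modprobe.d-style file.
# With no argument it targets /etc/modprobe.d/iwlwifi.conf (needs root).
write_iwlwifi_conf() {
    conf=${1:-/etc/modprobe.d/iwlwifi.conf}
    cat > "$conf" <<'EOF'
options iwlwifi 11n_disable=1 swcrypto=0 power_save=0
options iwlmvm power_scheme=1
options iwlwifi uapsd_disable=1
EOF
}

# Print every "name=value" pair that a file sets for one module.
# Arguments: config file, module name.
list_module_options() {
    awk -v mod="$2" '$1 == "options" && $2 == mod {
        for (i = 3; i <= NF; i++) print $i
    }' "$1"
}
```

Module options are only read when the module is loaded, so after changing the file the driver has to be reloaded (for example modprobe -r iwlmvm iwlwifi followed by modprobe iwlwifi) or the machine rebooted. Dropping one option at a time then means editing the file and reloading again.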
Comment 6 roman 2020-09-10 14:17:44 UTC
For now it looks like

> options iwlwifi power_save=0

prevents the SW error.

Changing power_scheme to any value doesn't change anything (it crashes with schemes 1, 2 and 3).

With 11n_disable=1 set, Linux reports inconsistent link information:

https://ibb.co/g40sbj9

GNOME and the terminal (iw) show 2.4 GHz,
nmcli shows channel 36 and 405 Mbit/s,
and my router reports channel 36,
11a at 54 Mbit/s.

So for now I would suggest using power_save=0.

I would also suggest disabling power saving by default in the kernel for the
Intel Corporation Wireless 7265
until someone bisects the underlying issue.
Comment 7 roman 2020-09-17 08:30:09 UTC
After long testing, these options seem to work:

options iwlwifi swcrypto=0 
options iwlwifi power_save=0
options iwlmvm power_scheme=1 
options iwlwifi uapsd_disable=1 


With these options I get an 866 Mbit/s connection speed and no SW firmware errors.
This should be addressed; maybe there are some HW issues with power saving mode.

Therefore, it would be nice to disable power saving in the kernel for specific vendor:hw IDs.
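
To confirm that the options above were actually picked up by the loaded modules, the values can be read back from sysfs. A minimal sketch, assuming the parameters are exposed under /sys/module/<module>/parameters/ on the running kernel; the function name and the optional root argument are only there for illustration.

```shell
#!/bin/sh
# Print "module.param=value" for the parameters discussed in this report.
# The optional first argument replaces /sys/module, which is only useful
# for trying the function against a copy of the tree.
show_wifi_params() {
    root=${1:-/sys/module}
    for p in iwlwifi/power_save iwlwifi/swcrypto \
             iwlwifi/uapsd_disable iwlmvm/power_scheme; do
        mod=${p%/*}
        name=${p#*/}
        f="$root/$mod/parameters/$name"
        if [ -r "$f" ]; then
            printf '%s.%s=%s\n' "$mod" "$name" "$(cat "$f")"
        else
            printf '%s.%s=<not readable>\n' "$mod" "$name"
        fi
    done
}
```

Boolean parameters such as power_save usually read back as Y or N rather than 1 or 0. The runtime power save state of an interface can also be checked with iw dev wlp1s0 get power_save (interface name taken from the original report).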
