Bug 199807
Summary: | iwl4965 last firmware version is buggy and should be rolled back | | |
---|---|---|---|
Product: | Drivers | Reporter: | Ryan Underwood (nemesis) |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | CLOSED WILL_NOT_FIX | | |
Severity: | normal | CC: | stf_xl |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 4.15 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | boot dmesg; sample dmesg during failure mode | | |
Description
Ryan Underwood
2018-05-23 01:32:19 UTC
Created attachment 276141
sample dmesg during failure mode
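A saved dmesg capture like this attachment can be scanned to count how often the firmware restarts under each firmware version. The following is a minimal sketch, not part of the original report: the function name `count_firmware_crashes` is hypothetical, and the two patterns are assumed from the iwl4965 messages quoted in this report ("Microcode SW error detected" followed by "Loaded firmware version").

```python
import re
from collections import Counter

# Patterns assumed from the iwl4965 log lines quoted in this bug report.
SW_ERROR = re.compile(r"iwl4965 \S+: Microcode SW error detected")
FW_VERSION = re.compile(r"iwl4965 \S+: Loaded firmware version: (\S+)")


def count_firmware_crashes(lines):
    """Return a Counter mapping firmware version -> number of crashes.

    A crash is a "Microcode SW error detected" line. It is attributed to
    the next "Loaded firmware version" line that follows it in the log.
    Crashes with no following version line are counted under "unknown".
    """
    crashes = Counter()
    pending = 0  # crashes seen but not yet matched to a version line
    for line in lines:
        if SW_ERROR.search(line):
            pending += 1
            continue
        match = FW_VERSION.search(line)
        if match and pending:
            crashes[match.group(1)] += pending
            pending = 0
    if pending:
        crashes["unknown"] += pending
    return crashes
```

It could be called as `count_firmware_crashes(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical file holding saved `dmesg` output. Comparing the counts under each firmware version separates "how often does it crash" from "how badly does the link degrade afterwards", which is the distinction this report turns on.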
Also, "completely fixes the problem" is an overstatement; let me rephrase. The user-visible, unusably slow performance and dropped connections are completely fixed when using the older microcode. Microcode crashes are still visible in the log with the older release, but with the older microcode they do not impact the performance of the link. Example:

[556283.340519] iwl4965 0000:07:00.0: Microcode SW error detected. Restarting 0x82000000.
[556283.340530] iwl4965 0000:07:00.0: Loaded firmware version: 228.57.2.23
[556283.340549] iwl4965 0000:07:00.0: Start IWL Error Log Dump:
[556283.340554] iwl4965 0000:07:00.0: Status: 0x000213E4, count: 5
[556283.340701] iwl4965 0000:07:00.0: Desc Time data1 data2 line
[556283.340708] iwl4965 0000:07:00.0: NMI_INTERRUPT_WDG (0x0004) 1467024232 0x00000002 0x03630000 208
[556283.340713] iwl4965 0000:07:00.0: pc blink1 blink2 ilink1 ilink2 hcmd
[556283.340720] iwl4965 0000:07:00.0: 0x0046C 0x04B30 0x004C2 0x006DE 0x04BCC 0x27A001C
[556283.340725] iwl4965 0000:07:00.0: FH register values:
[556283.340743] iwl4965 0000:07:00.0: FH49_RSCSR_CHNL0_STTS_WPTR_REG: 0X1d1a2700
[556283.340761] iwl4965 0000:07:00.0: FH49_RSCSR_CHNL0_RBDCB_BASE_REG: 0X01090700
[556283.340778] iwl4965 0000:07:00.0: FH49_RSCSR_CHNL0_WPTR: 0X000000f0
[556283.340796] iwl4965 0000:07:00.0: FH49_MEM_RCSR_CHNL0_CONFIG_REG: 0X80809000
[556283.340812] iwl4965 0000:07:00.0: FH49_MEM_RSSR_SHARED_CTRL_REG: 0X0000003c
[556283.340829] iwl4965 0000:07:00.0: FH49_MEM_RSSR_RX_STATUS_REG: 0X03630000
[556283.340846] iwl4965 0000:07:00.0: FH49_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000
[556283.340863] iwl4965 0000:07:00.0: FH49_TSSR_TX_STATUS_REG: 0X07fd0002
[556283.340880] iwl4965 0000:07:00.0: FH49_TSSR_TX_ERROR_REG: 0X00000000
[556283.406101] iwl4965 0000:07:00.0: Timeout stopping DMA channel 1 [0x07fd0002]
[556283.407444] iwl4965 0000:07:00.0: Can't stop Rx DMA.
[556283.407769] ieee80211 phy13: Hardware restart was requested

The 228.61.2.24 firmware was released about 10 years ago and has been in use since then. Removing support for it now does not sound like a good idea. I would check whether your distribution enables PowerSave (disabled by default), which is known to cause firmware crashes. Since the 4.13 kernel there is a warning if PS is enabled:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=438f3d13da5e0714f1add1652865b864a2c36eb7

Disabling powersave (via powertop) does not stop these particular crashes with either the newer firmware or the older one. The difference is just that, in the presence of these occasional crashes, the link doesn't slow to a near-halt with the older firmware. I suspect that few people are still using this hardware with newer kernels. They are part of a potentially larger set of people that includes those who tried and gave up because it didn't work.

I should make clear that I agree there is risk in rolling it back, so it would be nice to figure out what is going on here. It is easy to reproduce:
- swcrypto=1
- powersave on or off
- Some USB dongle running hostapd (I have RTL8818AU for now)
- Multiple simultaneous bulk streams: try downloading multiple YouTube videos at once, for example, or moving multiple files to a CIFS mount simultaneously

Is this reproducible without swcrypto=1?

Without swcrypto=1 the situation is hopeless for other reasons. I am actually amazed that it is not the default. However, I have not tested it recently and will do so again.

What do you mean by hopeless? Does it not associate with the AP? What encryption/settings are you using?

The last time I tried it without swcrypto=1, years ago, the firmware constantly crashed and it was slow. I have always used WPA2 personal, TKIP. What settings? By the way, it's a lot easier to try this hardware yourself and see just how bad it is than for me to explain it to you in this text box. :-)

I have the hardware and it works flawlessly for me. I would try AES instead of TKIP. TKIP is flawed anyway. I wanted to ask you to provide debug logs, but I have lost the willingness to work on this bug. You know the workaround for the problem anyway.

Losing interest in a bug that's been reported by users in every distribution for years? I cannot control what some random router supports, and expecting the user to be in charge of that is frankly ridiculous. This is exactly the kind of problem that provides a poor user experience as soon as a user installs Linux. If you're too lazy to work on it, as you stated, at least leave it open for someone who will. This should be easy enough for you: use the last firmware and download 5 kernel tarballs simultaneously from a fast local mirror. If the firmware doesn't crash, then something is different about your configuration.
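The "download 5 kernel tarballs simultaneously" test can be scripted so that both sides run the same load. This is only a sketch under stated assumptions, not something posted in the thread: the helper names (`fetch_size`, `run_parallel`, `throughput_mbit`) are hypothetical, the mirror URL in the comment is a placeholder, and the script only generates the parallel bulk streams; the firmware crashes still have to be observed in dmesg while it runs.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_size(url, chunk_size=1 << 16):
    """Download url, discard the data, and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def run_parallel(urls, fetch=fetch_size):
    """Fetch every URL at the same time, one thread per URL.

    Returns (list of per-URL byte counts, elapsed seconds).
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        sizes = list(pool.map(fetch, urls))
    return sizes, time.monotonic() - start


def throughput_mbit(sizes, seconds):
    """Aggregate throughput in Mbit/s across all streams."""
    return sum(sizes) * 8 / 1e6 / seconds if seconds > 0 else 0.0


# Example (placeholder URL; substitute a tarball on a fast local mirror):
# urls = ["http://mirror.example/linux-4.15.tar.xz"] * 5
# sizes, seconds = run_parallel(urls)
# print(f"{throughput_mbit(sizes, seconds):.1f} Mbit/s over {len(urls)} streams")
```

Running it once with each firmware version, with `swcrypto=1` set as in the reproduction steps, would give a comparable throughput figure for the two cases instead of a subjective "slows to a near-halt".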