Bug 205193 - iwlwifi: 5200-5: every 300 seconds: Timeout waiting for hardware access (CSR_GP_CNTRL 0x080003d8)
Summary: iwlwifi: 5200-5: every 300 seconds: Timeout waiting for hardware access (CSR_...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-10-15 03:00 UTC by David Lindsay
Modified: 2020-07-17 10:35 UTC (History)
3 users (show)

See Also:
Kernel Version: 5.2.11
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description David Lindsay 2019-10-15 03:00:37 UTC
Hi. I'm user #102,669,340 experiencing Wi-Fi module restarts every 300 seconds/5 minutes, in my case with the Intel Corporation Ultimate N WiFi Link 5300 built into my ThinkPad T400. Yes, fairly old hardware, but (overall) it works very well. I'm running kernel 5.2.11 and iwlwifi-5000-5.ucode version 8.83.5.1 build 33692.

The reason I submit this report is that the module restart process is very noticeable - it causes the entire system (especially mouse cursor motion) to become choppy for about 2-3 seconds. This is kind of disconcerting to experience, and the second or third time it happened (10-15 minutes after powering on this particular machine for the first time) I was scrambling for dmesg, fearing serious hardware failure. The glitches immediately vanished when I tried removing iwlwifi+iwldvm.

I've been using an external USB Wi-Fi dongle (a TP-Link TL-WN722N) - which has worked fine with no issues - for several months now, but the little cable the dongle is attached with (added to mitigate the risk of knocks and bumps damaging the dongle and/or USB port) does impair mobility somewhat, so I am casually curious to find out if it is at all possible to get the builtin Wi-Fi module working.

I'm not really sure what to try at this point. https://bugzilla.kernel.org/show_bug.cgi?id=91171 mentions Wake-on-LAN in the context of the ThinkPad T430, which I've just disabled in the BIOS to no effect.

--

I have two questions.

1. Would it be possible to change the interval of whatever failing operation is taking 300 seconds, so it happens more frequently - say, every 5 seconds? I think that a tunable driver parameter - or custom firmware with hardcoded lowered values (and no-warranty "test" clauses) - would significantly aid debugging.

Developers, testers and random users must currently wait 5 minutes to get feedback about whether a particular change has had any effect or not.

It's also possible that users continue to experience this issue in the wild today, in situations that are not being captured as "adequately" broken because the glitches happen (comparatively) infrequently, and they require more than a couple of moments' patience to verify. If this issue were happening every 5-15 seconds, it may well bring more bug reports out of the woodwork.

I make these arguments as someone who tends to use interpreted programming languages wherever possible to avoid even small compilation times, so I must admit I am heavily biased.


2. If I'm reading https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/log/iwlwifi-5000-5.ucode correctly, it appears that the latest firmware version available for my Wi-Fi module is from... 2011?? (I've verified the firmware blob in git is the same as what I'm currently using.)

Is it possible I'm still seeing issues that have been fixed in newer hardware because those fixes haven't been backported to my card's firmware release? If this is indeed the case, would it be possible to request these fix backports please? :)

I absolutely do not mind testing preproduction/alpha firmwares.

--

In case it is useful, I uploaded the relevant portions of dmesg after running

# modprobe iwlwifi debug=0xfff7ffff 11n_disable=1 fw_restart=0 fw_monitor=1

over here: https://drive.google.com/open?id=1-zz3vqG7UJWc5G_2bli7yAxN3XX5Xkco (1.1MB .txt.xz -> 14MB; xzless suggested)

Here's the start of where everything falls to pieces:

118647 (0.001707) [104989.122942] t400 kernel: iwlwifi 0000:03:00.0: U iwl_prepare_legacy_sensitivity_tbl ofdm: ac 120 mrc 210 x1 120 mrc_x1 240 thresh 100
118648 (0.008252) [104989.131194] t400 kernel: iwlwifi 0000:03:00.0: U iwl_prepare_legacy_sensitivity_tbl cck: ac 200 mrc 400 thresh 86
118649 (0.008435) [104989.139629] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_enqueue_hcmd Sending command SENSITIVITY_CMD (00.a8), seq: 0x0403, 28 bytes at 3[3]:4
118650 (0.009164) [104989.148793] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_set_cmd_in_flight set ref_cmd_in_flight - ref
118651 (0.002729) [104989.151522] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:4 WR: 0x4
118652 (0.178514) [104989.330036] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_enqueue_hcmd Sending command REPLY_LEDS_CMD (00.48), seq: 0x0404, 12 bytes at 4[4]:4
118653 (0.002919) [104989.332955] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:4 WR: 0x5
118654 (1.010435) [104990.343390] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_enqueue_hcmd Sending command REPLY_LEDS_CMD (00.48), seq: 0x0405, 12 bytes at 5[5]:4
118655 (0.011844) [104990.355234] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:4 WR: 0x6
118656 (0.006608) [104990.361842] t400 kernel: iwlwifi 0000:03:00.0: I iwlagn_tx_skb TX to [0|8] Q:0 - seq: 0x0
118657 (0.007219) [104990.369061] t400 kernel: iwlwifi 0000:03:00.0: I iwl_trans_pcie_tx Q: 0 first tx - take ref
118658 (0.007774) [104990.376835] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:0 WR: 0xc
118659 (0.548026) [104990.924861] t400 dhcpcd[43974]: wlp3s0: carrier lost
118660 (0.001884) [104990.926745] t400 kernel: iwlwifi 0000:03:00.0: U iwlagn_mac_set_key enter
118661 (0.006093) [104990.932838] t400 kernel: iwlwifi 0000:03:00.0: U iwl_scan_cancel_timeout Scan cancel timeout
118662 (0.009620) [104990.942458] t400 kernel: iwlwifi 0000:03:00.0: U iwl_do_scan_abort Not performing scan to abort
118663 (0.007841) [104990.950299] t400 kernel: iwlwifi 0000:03:00.0: U iwl_remove_dynamic_key Remove dynamic key: idx=0 sta=0
118664 (0.009053) [104990.959352] t400 kernel: iwlwifi 0000:03:00.0: U iwl_send_add_sta Adding sta 0 (ZZ:ZZ:ZZ:ZZ:ZZ:ZZ) synchronously
118665 (0.008511) [104990.967863] t400 kernel: iwlwifi 0000:03:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command REPLY_ADD_STA
118666 (0.011803) [104990.979666] t400 kernel: iwlwifi 0000:03:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command REPLY_ADD_STA
118667 (0.008960) [104990.988626] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_enqueue_hcmd Sending command REPLY_ADD_STA (00.18), seq: 0x0406, 96 bytes at 6[6]:4
118668 (0.011619) [104991.000245] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:4 WR: 0x7
118669 (0.326422) [104991.326667] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_enqueue_hcmd Sending command REPLY_LEDS_CMD (00.48), seq: 0x0407, 12 bytes at 7[7]:4
118670 (0.010813) [104991.337480] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:4 WR: 0x8
118671 (1.005875) [104992.343355] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_enqueue_hcmd Sending command REPLY_LEDS_CMD (00.48), seq: 0x0408, 12 bytes at 8[8]:4
118672 (0.011055) [104992.354410] t400 kernel: iwlwifi 0000:03:00.0: I iwl_pcie_txq_inc_wr_ptr Q:4 WR: 0x9
118673 (0.732287) [104993.086697] t400 kernel: iwlwifi 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
118674 (0.008170) [104993.094867] t400 kernel: iwlwifi 0000:03:00.0: Current CMD queue read_ptr 3 write_ptr 9
118675 (0.006334) [104993.101201] t400 kernel: iwlwifi 0000:03:00.0: U iwl_pcie_send_hcmd_sync Clearing HCMD_ACTIVE for command REPLY_ADD_STA
118676 (0.008575) [104993.109776] t400 kernel: iwlwifi 0000:03:00.0: U _iwl_disable_interrupts Disabled interrupts
118677 (0.011831) [104993.121607] t400 kernel: ------------[ cut here ]------------
118678 (0.000000) [104993.121607] t400 kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0x080003d8)
118679 (0.000000) [104993.121607] t400 kernel: WARNING: CPU: 0 PID: 44552 at drivers/net/wireless/intel/iwlwifi/pcie/trans.c:2008 iwl_trans_pcie_grab_nic_access+0x1e9/0x220 [iwlwifi]

NB. The first column is a line number, and the second column (in brackets) is a time delta. The dmesg was acquired via:

# journalctl -q -b 0 -o short-monotonic | awk '{ match($0, /^\[([^ ]+)]/, x); printf("(%f) %s\n", x[1]-ox, $0); ox=x[1]; }' | (line selection) | awk '{ printf("%6d %s\n", NR, $0) }'

Let me know if you'd like me to figure tracing out.

--

Thanks for taking the time to look into this!

David

Note You need to log in before you can comment on or make changes to this bug.