While debugging some problems with hanging graphics (https://bugzilla.kernel.org/show_bug.cgi?id=194843), I've found with kasan this bug: May 5 14:34:01 probook kernel: ================================================================== May 5 14:34:01 probook kernel: BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] at addr ffffffffa0562336 May 5 14:34:01 probook kernel: Read of size 2 by task wpa_supplicant/2355 May 5 14:34:01 probook kernel: Address belongs to variable iwl_drv_exit+0xf66f/0x339 [iwlwifi] May 5 14:34:01 probook kernel: CPU: 0 PID: 2355 Comm: wpa_supplicant Not tainted 4.11.0 #179 May 5 14:34:01 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016 May 5 14:34:01 probook kernel: Call Trace: May 5 14:34:01 probook kernel: dump_stack+0x4f/0x66 May 5 14:34:01 probook kernel: kasan_report+0x4da/0x510 May 5 14:34:01 probook kernel: ? iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] May 5 14:34:01 probook kernel: ? wake_atomic_t_function+0x2b0/0x2b0 May 5 14:34:01 probook kernel: __asan_report_load2_noabort+0x14/0x20 May 5 14:34:01 probook kernel: iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_channel_switch_noa_notif+0x40f/0x410 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_mac_ctxt_send_beacon+0xcb0/0xcb0 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_send_cmd_pdu+0x91/0xb0 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_send_cmd+0x160/0x160 [iwlmvm] May 5 14:34:01 probook kernel: iwl_mvm_mac_ctxt_cmd_sta+0xd1/0xe70 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_mac_ctxt_cmd_common+0x1610/0x1610 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_phy_ctxt_apply.constprop.3+0x31f/0x5d0 [iwlmvm] May 5 14:34:01 probook kernel: ? iwl_mvm_ref_taken+0x150/0x150 [iwlmvm] May 5 14:34:01 probook kernel: iwl_mvm_mac_ctx_send+0x68/0x110 [iwlmvm] May 5 14:34:01 probook kernel: iwl_mvm_mac_ctxt_changed+0x68/0x180 [iwlmvm] May 5 14:34:01 probook kernel: iwl_mvm_bss_info_changed+0x2f8/0xec0 [iwlmvm] May 5 14:34:01 probook kernel: ieee80211_bss_info_change_notify+0x177/0x4c0 [mac80211] May 5 14:34:01 probook kernel: ? __ieee80211_recalc_txpower+0x111/0x320 [mac80211] May 5 14:34:01 probook kernel: ieee80211_assign_vif_chanctx+0x7ce/0xf80 [mac80211] May 5 14:34:01 probook kernel: ieee80211_vif_use_channel+0x3ad/0x780 [mac80211] May 5 14:34:01 probook kernel: ieee80211_prep_connection+0x55b/0x1cf0 [mac80211] May 5 14:34:01 probook kernel: ? ieee80211_handle_bss_capability+0x220/0x220 [mac80211] May 5 14:34:01 probook kernel: ? __kmalloc+0x126/0x220 May 5 14:34:01 probook kernel: ieee80211_mgd_auth+0x69d/0xdd0 [mac80211] May 5 14:34:01 probook kernel: ? ieee80211_mlme_notify_scan_completed+0x1c0/0x1c0 [mac80211] May 5 14:34:01 probook kernel: ieee80211_auth+0x13/0x20 [mac80211] May 5 14:34:01 probook kernel: cfg80211_mlme_auth+0x2a7/0x6b0 [cfg80211] May 5 14:34:01 probook kernel: ? unwind_get_return_address+0x1e0/0x1e0 May 5 14:34:01 probook kernel: ? cfg80211_rx_mgmt+0x710/0x710 [cfg80211] May 5 14:34:01 probook kernel: ? parse_station_flags.isra.36+0x490/0x490 [cfg80211] May 5 14:34:01 probook kernel: nl80211_authenticate+0x8f7/0xfe0 [cfg80211] May 5 14:34:01 probook kernel: ? nl80211_parse_key+0xe70/0xe70 [cfg80211] May 5 14:34:01 probook kernel: ? nl80211_pre_doit+0xcd/0x560 [cfg80211] May 5 14:34:01 probook kernel: ? nla_parse+0xde/0x210 May 5 14:34:01 probook kernel: genl_family_rcv_msg+0x5c8/0x10f0 May 5 14:34:01 probook kernel: ? __alloc_skb+0x31f/0x560 May 5 14:34:01 probook kernel: ? genl_rcv+0x40/0x40 May 5 14:34:01 probook kernel: ? try_to_wake_up+0xb8/0x1080 May 5 14:34:01 probook kernel: ? alloc_skb_with_frags+0x8d/0x4c0 May 5 14:34:01 probook kernel: genl_rcv_msg+0x9b/0x120 May 5 14:34:01 probook kernel: netlink_rcv_skb+0x23b/0x340 May 5 14:34:01 probook kernel: ? genl_family_rcv_msg+0x10f0/0x10f0 May 5 14:34:01 probook kernel: genl_rcv+0x23/0x40 May 5 14:34:01 probook kernel: netlink_unicast+0x438/0x620 May 5 14:34:01 probook kernel: ? netlink_attachskb+0x640/0x640 May 5 14:34:01 probook kernel: netlink_sendmsg+0x86f/0xb60 May 5 14:34:01 probook kernel: ? netlink_broadcast+0x10/0x10 May 5 14:34:01 probook kernel: ? netlink_broadcast+0x10/0x10 May 5 14:34:01 probook kernel: sock_sendmsg+0xb5/0xf0 May 5 14:34:01 probook kernel: ___sys_sendmsg+0x6a2/0x8c0 May 5 14:34:01 probook kernel: ? ___sys_recvmsg+0x333/0x590 May 5 14:34:01 probook kernel: ? SYSC_sendto+0x300/0x300 May 5 14:34:01 probook kernel: ? sock_sendmsg+0xb5/0xf0 May 5 14:34:01 probook kernel: ? sock_write_iter+0x1e0/0x3b0 May 5 14:34:01 probook kernel: ? _raw_spin_unlock_irq+0x39/0x60 May 5 14:34:01 probook kernel: ? sock_sendmsg+0xf0/0xf0 May 5 14:34:01 probook kernel: ? __vfs_write+0x299/0x620 May 5 14:34:01 probook kernel: ? vfs_dedupe_get_page.isra.20+0x1d0/0x1d0 May 5 14:34:01 probook kernel: ? __fdget+0xe/0x10 May 5 14:34:01 probook kernel: __sys_sendmsg+0xc1/0x140 May 5 14:34:01 probook kernel: ? __sys_sendmsg+0xc1/0x140 May 5 14:34:01 probook kernel: ? SyS_shutdown+0x170/0x170 May 5 14:34:01 probook kernel: ? vfs_write+0x305/0x490 May 5 14:34:01 probook kernel: ? do_syscall_64+0x24d/0x380 May 5 14:34:01 probook kernel: SyS_sendmsg+0xd/0x20 May 5 14:34:01 probook kernel: entry_SYSCALL_64_fastpath+0x13/0x94 May 5 14:34:01 probook kernel: RIP: 0033:0x7fdd8d285ad7 May 5 14:34:01 probook kernel: RSP: 002b:00007ffcabeb1d98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e May 5 14:34:01 probook kernel: RAX: ffffffffffffffda RBX: 0000000001083770 RCX: 00007fdd8d285ad7 May 5 14:34:01 probook kernel: RDX: 0000000000000000 RSI: 00007ffcabeb1df0 RDI: 0000000000000006 May 5 14:34:01 probook kernel: RBP: 0000000001068ff0 R08: 0000000000000000 R09: 000000000000003f May 5 14:34:01 probook kernel: R10: 000000000107cbd0 R11: 0000000000000246 R12: 0000000000000000 May 5 14:34:01 probook kernel: R13: 0000000000000000 R14: 0000000001068ff0 R15: 000000000107c520 May 5 14:34:01 probook kernel: Memory state around the buggy address: May 5 14:34:01 probook kernel: ffffffffa0562200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 May 5 14:34:01 probook kernel: ffffffffa0562280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 May 5 14:34:01 probook kernel: >ffffffffa0562300: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00 May 5 14:34:01 probook kernel: ^ May 5 14:34:01 probook kernel: ffffffffa0562380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 May 5 14:34:01 probook kernel: ffffffffa0562400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 May 5 14:34:01 probook kernel: ================================================================== May 5 14:34:01 probook kernel: Disabling lock debugging due to kernel taint It's the second time, I'm seeing this out-of-bounds error. Maybe it happens way more often, and this is the root cause of my problems with amdgpu. hardware is a Wireless-AC 8260 lspci: ===== 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1576 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1577 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Carrizo (rev c5) 00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 157b 00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 157c 00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 157c 00:02.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 157c 00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 157b 00:08.0 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1578 00:09.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 157d 00:09.2 Audio device: Advanced Micro Devices, Inc. [AMD] Device 157a 00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 20) 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 49) 00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 49) 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 4a) 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11) 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1570 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1571 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1572 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1573 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1574 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1575 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15) 02:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a) 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
Can you please attach iwlmvm.ko so that I can see exactly what line it is?
I just reviewed the function and I can't find anything suspicious. All the memcpy are well into the their buffers. I'd need you to help me finding what iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 really points to.
Just noticed I've build the kernel without debug infos. Rebuilding with debug now and provide you the requested infos.
Created attachment 256397 [details] iwlmvm.ko.xz iwlmvm.ko xz compressed
probook ~ # cat /sys/module/iwlmvm/sections/.text 0xffffffffa06c8000 (gdb) add-symbol-file drivers/net/wireless/intel/iwlwifi/mvm/iwlmvm.ko 0xffffffffa06c8000 add symbol table from file "drivers/net/wireless/intel/iwlwifi/mvm/iwlmvm.ko" at .text_addr = 0xa06c8000 (y or n) y Reading symbols from drivers/net/wireless/intel/iwlwifi/mvm/iwlmvm.ko...done. (gdb) list *(iwl_mvm_mac_ctxt_cmd_common+0x14b5) 0xffffffffa06f3c55 is in iwl_mvm_mac_ctxt_cmd_common (drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:578). 573 int i; 574 575 sband = mvm->hw->wiphy->bands[band]; 576 577 for_each_set_bit(i, &basic, BITS_PER_LONG) { 578 int hw = sband->bitrates[i].hw_value; 579 if (hw >= IWL_FIRST_OFDM_RATE) { 580 ofdm |= BIT(hw - IWL_FIRST_OFDM_RATE); 581 if (lowest_present_ofdm > hw) 582 lowest_present_ofdm = hw; (gdb) I don't know if the addresses from kasan report are still correct for the new compiled kernel with debug infos. Maybe I have to wait for the next kasan report to be sure.
Confirmed with recompiled kernel with debug infos: BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] at addr ffffffffa050a336 gdb again points at int hw = sband->bitrates[i].hw_value;
ok - I'll try to take a look. Can you please print the value of 'basic' in this function? This can depend on your Access Point. You can also just record the connection flow with tracing, that will do as well. thanks.
Maybe, if you're printing things already, also print "band" and "vif->bss_conf.chandef.center_freq1" printk(KERN_DEBUG "basic=%x, band=%d, cf1=%d\n", basic, band, vif->bss_conf.chandef.center_freq1); somehow I feel that perhaps vif->bss_conf.chandef and the chanctx got out of sync, perhaps in some temporary state
code changes: sband = mvm->hw->wiphy->bands[band]; printk(KERN_DEBUG "basic=%lx, band=%d, cf1=%d\n", basic, band, vif->bss_conf.chandef.center_freq1); for_each_set_bit(i, &basic, BITS_PER_LONG) { int hw = sband->bitrates[i].hw_value; if (hw >= IWL_FIRST_OFDM_RATE) { ofdm |= BIT(hw - IWL_FIRST_OFDM_RATE); if (lowest_present_ofdm > hw) lowest_present_ofdm = hw; } else { BUILD_BUG_ON(IWL_FIRST_CCK_RATE != 0); cck |= BIT(hw); if (lowest_present_cck > hw) lowest_present_cck = hw; } } printk(KERN_DEBUG "basic=%lx, band=%d, cf1=%d\n", basic, band, vif->bss_conf.chandef.center_freq1); error output: [15267.263770] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [15267.264443] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [15267.392773] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [15267.393070] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [15267.475268] basic=15f, band=0, cf1=2437 [15267.475273] basic=15f, band=0, cf1=2437 [15267.475548] basic=15f, band=0, cf1=2437 [15267.475551] basic=15f, band=0, cf1=2437 [15267.475957] basic=15f, band=0, cf1=2437 [15267.475961] basic=15f, band=0, cf1=2437 [15267.476230] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [15267.982314] wlp2s0: authenticate with c8:0e:14:dd:75:4b [15267.985789] basic=15f, band=1, cf1=5530 [15267.985796] ================================================================== [15267.985845] BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x15e4/0x1750 [iwlmvm] at addr ffffffffa051a336 [15267.985851] Read of size 2 by task wpa_supplicant/2366 [15267.985865] Address belongs to variable iwl_drv_exit+0xf66f/0x339 [iwlwifi] [15267.985872] CPU: 0 PID: 2366 Comm: wpa_supplicant Not tainted 4.11.0-00002-g426aae1c9a0e #189 [15267.985877] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016 [15267.985881] Call Trace: [15267.985893] dump_stack+0x4f/0x66 [15267.985899] kasan_report+0x4da/0x510 [15267.985910] ? iwl_mvm_mac_ctxt_cmd_common+0x15e4/0x1750 [iwlmvm] [15267.985919] ? iwl_pcie_hcmd_complete+0xef0/0xef0 [iwlwifi] [15267.985924] __asan_report_load2_noabort+0x14/0x20 [15267.985934] iwl_mvm_mac_ctxt_cmd_common+0x15e4/0x1750 [iwlmvm] [15267.985945] ? iwl_mvm_channel_switch_noa_notif+0x3cf/0x410 [iwlmvm] [15267.985954] ? iwl_mvm_mac_ctxt_send_beacon+0xcb0/0xcb0 [iwlmvm] [15267.985964] ? iwl_mvm_send_cmd_pdu+0x91/0xb0 [iwlmvm] [15267.985973] ? iwl_mvm_send_cmd+0x160/0x160 [iwlmvm] [15267.985983] iwl_mvm_mac_ctxt_cmd_sta+0xd1/0xe70 [iwlmvm] [15267.985993] ? iwl_mvm_mac_ctxt_cmd_common+0x1750/0x1750 [iwlmvm] [15267.986002] ? iwl_mvm_phy_ctxt_apply.constprop.3+0x31f/0x5d0 [iwlmvm] [15267.986011] ? iwl_mvm_ref_taken+0x150/0x150 [iwlmvm] [15267.986021] iwl_mvm_mac_ctx_send+0x68/0x110 [iwlmvm] [15267.986030] iwl_mvm_mac_ctxt_changed+0x68/0x180 [iwlmvm] [15267.986039] iwl_mvm_bss_info_changed+0x2f8/0xec0 [iwlmvm] [15267.986090] ieee80211_bss_info_change_notify+0x177/0x4c0 [mac80211] [15267.986107] ? __ieee80211_recalc_txpower+0x111/0x320 [mac80211] [15267.986123] ieee80211_assign_vif_chanctx+0x7ce/0xf80 [mac80211] [15267.986139] ieee80211_vif_use_channel+0x3ad/0x780 [mac80211] [15267.986155] ieee80211_prep_connection+0x55b/0x1cf0 [mac80211] [15267.986172] ? ieee80211_handle_bss_capability+0x220/0x220 [mac80211] [15267.986178] ? __kmalloc+0x126/0x220 [15267.986193] ieee80211_mgd_auth+0x69d/0xdd0 [mac80211] [15267.986209] ? ieee80211_mlme_notify_scan_completed+0x1c0/0x1c0 [mac80211] [15267.986226] ieee80211_auth+0x13/0x20 [mac80211] [15267.986267] cfg80211_mlme_auth+0x2a7/0x6b0 [cfg80211] [15267.986274] ? unwind_get_return_address+0x1e0/0x1e0 [15267.986287] ? cfg80211_rx_mgmt+0x710/0x710 [cfg80211] [15267.986301] ? parse_station_flags.isra.36+0x490/0x490 [cfg80211] [15267.986314] nl80211_authenticate+0x8f7/0xfe0 [cfg80211] [15267.986328] ? nl80211_parse_key+0xe70/0xe70 [cfg80211] [15267.986341] ? nl80211_pre_doit+0xcd/0x560 [cfg80211] [15267.986346] ? nla_parse+0xde/0x210 [15267.986353] genl_family_rcv_msg+0x5c8/0x10f0 [15267.986358] ? __alloc_skb+0x31f/0x560 [15267.986362] ? genl_rcv+0x40/0x40 [15267.986367] ? try_to_wake_up+0xb8/0x1080 [15267.986372] ? alloc_skb_with_frags+0x8d/0x4c0 [15267.986376] genl_rcv_msg+0x9b/0x120 [15267.986380] netlink_rcv_skb+0x23b/0x340 [15267.986384] ? genl_family_rcv_msg+0x10f0/0x10f0 [15267.986388] genl_rcv+0x23/0x40 [15267.986391] netlink_unicast+0x438/0x620 [15267.986395] ? netlink_attachskb+0x640/0x640 [15267.986399] netlink_sendmsg+0x86f/0xb60 [15267.986403] ? netlink_broadcast+0x10/0x10 [15267.986406] ? netlink_broadcast+0x10/0x10 [15267.986411] sock_sendmsg+0xb5/0xf0 [15267.986414] ___sys_sendmsg+0x6a2/0x8c0 [15267.986418] ? ___sys_recvmsg+0x333/0x590 [15267.986421] ? SYSC_sendto+0x300/0x300 [15267.986424] ? sock_sendmsg+0xb5/0xf0 [15267.986428] ? sock_write_iter+0x1e0/0x3b0 [15267.986435] ? _raw_spin_unlock_irq+0x39/0x60 [15267.986440] ? __enqueue_entity+0x134/0x230 [15267.986443] ? update_curr+0x14b/0x490 [15267.986447] ? put_prev_entity+0xde/0x2770 [15267.986455] ? debug_smp_processor_id+0x17/0x20 [15267.986461] ? __switch_to+0x5ef/0x1160 [15267.986466] ? __fdget+0xe/0x10 [15267.986470] __sys_sendmsg+0xc1/0x140 [15267.986473] ? __sys_sendmsg+0xc1/0x140 [15267.986477] ? SyS_shutdown+0x170/0x170 [15267.986484] ? vfs_write+0x305/0x490 [15267.986488] ? exit_to_usermode_loop+0x75/0xf0 [15267.986492] SyS_sendmsg+0xd/0x20 [15267.986496] entry_SYSCALL_64_fastpath+0x13/0x94 [15267.986501] RIP: 0033:0x7f7fc928dad7 [15267.986505] RSP: 002b:00007ffe288586f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [15267.986512] RAX: ffffffffffffffda RBX: 00000000007b5090 RCX: 00007f7fc928dad7 [15267.986515] RDX: 0000000000000000 RSI: 00007ffe28858750 RDI: 0000000000000006 [15267.986519] RBP: 0000000000001000 R08: 0000000000000000 R09: 000000000000004b [15267.986522] R10: 00000000007b9550 R11: 0000000000000246 R12: 00000000007af1f0 [15267.986525] R13: 00000000007a7a80 R14: 0000000000000004 R15: 00000000007ba410 [15267.986529] Memory state around the buggy address: [15267.986534] ffffffffa051a200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [15267.986538] ffffffffa051a280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [15267.986541] >ffffffffa051a300: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00 [15267.986544] ^ [15267.986547] ffffffffa051a380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [15267.986550] ffffffffa051a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [15267.986553] ================================================================== [15267.986555] Disabling lock debugging due to kernel taint [15267.986579] basic=15f, band=1, cf1=5530 [15267.987400] basic=15, band=1, cf1=5530 [15267.987403] basic=15, band=1, cf1=5530 [15267.990474] wlp2s0: send auth to c8:0e:14:dd:75:4b (try 1/3) [15268.006164] wlp2s0: authenticated [15268.006654] wlp2s0: associate with c8:0e:14:dd:75:4b (try 1/3) [15268.080435] wlp2s0: RX AssocResp from c8:0e:14:dd:75:4b (capab=0x411 status=0 aid=1) [15268.080901] basic=15, band=1, cf1=5530 [15268.080908] basic=15, band=1, cf1=5530 [15268.082659] wlp2s0: associated [15268.082763] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready [15268.108746] basic=15, band=1, cf1=5530 [15268.108753] basic=15, band=1, cf1=5530 [15268.157255] wlp2s0: Limiting TX power to 27 (30 - 3) dBm as advertised by c8:0e:14:dd:75:4b [15268.157268] basic=15, band=1, cf1=5530 [15268.157272] basic=15, band=1, cf1=5530 [15268.168776] basic=15, band=1, cf1=5530 [15268.168783] basic=15, band=1, cf1=5530
Seems like only when we call ieee80211_prep_connection we call ieee80211_get_rates which sets up the rates properly. Since we had a 2.4GHz network configured in the vif first, then we have CCK rates. Now we move bss_conf to a 5GHz network, we should reset the basic rates. We don't do that until we call ieee80211_prep_connection which is very clear in your logs: [15267.986579] basic=15f, band=1, cf1=5530 <== BAD VALUE, CCK rates with band = 5.2GHz [15267.987400] basic=15, band=1, cf1=5530 <== GOOD VALUE [15267.987403] basic=15, band=1, cf1=5530 [15267.990474] wlp2s0: send auth to c8:0e:14:dd:75:4b (try 1/3) I guess we need to see who assigns the band and doesn't update the basic rates. Will try to dig a bit later tonight.
I've noticed the bug occurs often with the following scenario: * connected to network1 2.4GHz * put laptop into suspend and move to other place * resume and connect to network2 5GHz so I tried to reproduce this, but was unsuccessful [94534.583518] PM: Finishing wakeup. [94534.583521] Restarting tasks ... done. [94534.598055] Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014 [94534.605208] Bluetooth: hci0: Device revision is 5 [94534.605219] Bluetooth: hci0: Secure boot is enabled [94534.605223] Bluetooth: hci0: OTP lock is enabled [94534.605227] Bluetooth: hci0: API lock is enabled [94534.605230] Bluetooth: hci0: Debug lock is disabled [94534.605234] Bluetooth: hci0: Minimum firmware build 1 week 10 2014 [94534.605249] Bluetooth: hci0: Found device firmware: intel/ibt-11-5.sfi [94534.686890] thermal thermal_zone5: failed to read out thermal zone (-5) [94534.741661] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready [94534.755095] r8169 0000:01:00.0 enp1s0: link down [94534.755569] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready [94534.761543] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [94534.763056] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94534.763792] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94534.891847] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94534.892139] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94534.977029] basic=15f, band=0, cf1=2437 [94534.977035] basic=15f, band=0, cf1=2437 [94534.977305] basic=15f, band=0, cf1=2437 [94534.977308] basic=15f, band=0, cf1=2437 [94534.977413] basic=15f, band=0, cf1=2437 [94534.977415] basic=15f, band=0, cf1=2437 [94534.977680] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [94534.986038] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94534.986766] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94535.114831] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94535.115134] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94535.198667] basic=15f, band=0, cf1=2437 [94535.198673] basic=15f, band=0, cf1=2437 [94535.199071] basic=15f, band=0, cf1=2437 [94535.199075] basic=15f, band=0, cf1=2437 [94535.199254] basic=15f, band=0, cf1=2437 [94535.199257] basic=15f, band=0, cf1=2437 [94535.199552] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [94535.234885] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [94535.344136] psmouse serio3: synaptics: queried max coordinates: x [..5672], y [..4828] [94535.379918] psmouse serio3: synaptics: queried min coordinates: x [1336..], y [1070..] [94536.631826] Bluetooth: hci0: Waiting for firmware download to complete [94536.632917] Bluetooth: hci0: Firmware loaded in 1988415 usecs [94536.633049] Bluetooth: hci0: Waiting for device to boot [94536.645120] Bluetooth: hci0: Device booted in 11887 usecs [94536.645181] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-11-5.ddc [94536.649028] Bluetooth: hci0: Applying Intel DDC parameters completed [94538.729735] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94538.730429] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94538.860440] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94538.860918] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled [94538.945091] basic=15f, band=0, cf1=2437 [94538.945098] basic=15f, band=0, cf1=2437 [94538.945389] basic=15f, band=0, cf1=2437 [94538.945392] basic=15f, band=0, cf1=2437 [94538.945497] basic=15f, band=0, cf1=2437 [94538.945500] basic=15f, band=0, cf1=2437 [94538.945916] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [94542.277285] wlp2s0: authenticate with c8:0e:14:dd:75:4b [94542.280725] basic=15f, band=1, cf1=5530 [94542.280732] basic=15f, band=1, cf1=5530 [94542.281718] basic=15, band=1, cf1=5530 [94542.281725] basic=15, band=1, cf1=5530 [94542.284638] wlp2s0: send auth to c8:0e:14:dd:75:4b (try 1/3) [94542.299398] wlp2s0: authenticated [94542.301633] wlp2s0: associate with c8:0e:14:dd:75:4b (try 1/3) [94542.364684] wlp2s0: RX AssocResp from c8:0e:14:dd:75:4b (capab=0x411 status=0 aid=2) [94542.365016] basic=15, band=1, cf1=5530 [94542.365020] basic=15, band=1, cf1=5530 [94542.378625] wlp2s0: associated [94542.378810] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready [94542.383729] wlp2s0: Limiting TX power to 27 (30 - 3) dBm as advertised by c8:0e:14:dd:75:4b [94542.383745] basic=15, band=1, cf1=5530 [94542.383749] basic=15, band=1, cf1=5530 [94542.395980] basic=15, band=1, cf1=5530 [94542.395986] basic=15, band=1, cf1=5530 [94542.471287] basic=15, band=1, cf1=5530 [94542.471293] basic=15, band=1, cf1=5530 it's again the [94542.280725] basic=15f, band=1, cf1=5530 but without kasan complaining
we don't recalculate the basic_rates when we update the channel context. Doing this can be a bit tricky. I'll try to come up with a patch and go through a round of reviews internally. I'll let you know.
I reproduced the problem. It comes from the fact that we assign the channel context to the driver before we set the basic rates (and not only). The flow goes like this: ieee80211_mgd_auth | -->ieee80211_prep_connection | -->ieee80211_prep_channel (updates the band) | -->ieee80211_vif_use_channel | -->ieee80211_assign_vif_chanctx (new band, old basic_rates) | -->ieee80211_get_rates (updates the basic_rates) I'll try to see a way to disentangle the flow.
Created attachment 256615 [details] Fix Hi, here is a fix. Can you please test?
doesn't work, with this patch I don't get a connection and dmesg shows me several of this: [ 39.586269] wlp2s0: authenticate with 02:a0:f9:37:8e:a5 [ 39.586311] ------------[ cut here ]------------ [ 39.586370] WARNING: CPU: 1 PID: 2351 at net/mac80211/mlme.c:4372 ieee80211_prep_connection+0x1678/0x2070 [mac80211] [ 39.586375] Modules linked in: cmac rfcomm bnep btusb btrtl btbcm btintel bluetooth ehci_pci rtsx_pci_ms rtsx_pci_sdmmc mmc_core memstick hp_wmi edac_core kvm_amd kvm irqbypass aesni_intel iwlmvm snd_hda_codec_conexan t snd_hda_codec_generic aes_x86_64 crypto_simd mac80211 snd_hda_codec_hdmi cryptd glue_helper iwlwifi snd_hda_intel k10temp cfg80211 snd_hda_codec fam15h_power i2c_piix4 ehci_hcd snd_hwdep snd_hda_core xhci_pci xhci_hcd rfkill r8169 rtsx_pci snd_pcm mii snd_timer snd soundcore wmi i2c_designware_platform i2c_designware_core hp_wireless efivarfs autofs4 [ 39.586509] CPU: 1 PID: 2351 Comm: wpa_supplicant Not tainted 4.11.0-00005-ga0ffa19501cf #192 [ 39.586515] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016 [ 39.586520] Call Trace: [ 39.586533] dump_stack+0x9e/0x11a [ 39.586541] ? _atomic_dec_and_lock+0x176/0x176 [ 39.586550] ? wake_up_klogd+0xe6/0x140 [ 39.586578] ? ieee80211_prep_connection+0x1678/0x2070 [mac80211] [ 39.586586] __warn+0x16d/0x1b0 [ 39.586594] warn_slowpath_null+0x18/0x20 [ 39.586620] ieee80211_prep_connection+0x1678/0x2070 [mac80211] [ 39.586628] ? _raw_spin_unlock_irq+0x110/0x110 [ 39.586654] ? drv_mgd_prepare_tx+0x590/0x590 [mac80211] [ 39.586661] ? vprintk_emit+0x302/0x3b0 [ 39.586668] ? save_stack+0xa3/0xd0 [ 39.586692] ? cfg80211_find_ie_match+0x13a/0x1f0 [cfg80211] [ 39.586700] ? vprintk_default+0x1a/0x20 [ 39.586705] ? vprintk_func+0x2a/0xd0 [ 39.586713] ? load_image_and_restore+0xf6/0xf6 [ 39.586720] ? __kmalloc+0xf6/0x220 [ 39.586745] ieee80211_mgd_auth+0x5d2/0xdd0 [mac80211] [ 39.586752] ? _raw_spin_unlock_irq+0x110/0x110 [ 39.586776] ? ieee80211_mlme_notify_scan_completed+0x450/0x450 [mac80211] [ 39.586783] ? unwind_next_frame+0x1f6/0xd00 [ 39.586809] ieee80211_auth+0x13/0x20 [mac80211] [ 39.586830] cfg80211_mlme_auth+0x430/0xd40 [cfg80211] [ 39.586837] ? unwind_get_return_address+0x182/0x2a0 [ 39.586843] ? unwind_dump.isra.5.part.6+0x320/0x320 [ 39.586862] ? cfg80211_rx_mgmt+0x9f0/0x9f0 [cfg80211] [ 39.586870] ? __unwind_start+0x17b/0x380 [ 39.586876] ? memset+0x31/0x40 [ 39.586897] ? parse_station_flags.isra.36+0x4c0/0x4c0 [cfg80211] [ 39.586904] ? rtnl_kfree_skbs+0x60/0x60 [ 39.586923] nl80211_authenticate+0xb3b/0x13f0 [cfg80211] [ 39.586943] ? nl80211_parse_key+0xfb0/0xfb0 [cfg80211] [ 39.586964] ? nl80211_pre_doit+0xcd/0x560 [cfg80211] [ 39.586970] ? ns_capable+0xe/0x10 [ 39.586979] genl_family_rcv_msg+0x68b/0x1610 [ 39.586986] ? genl_rcv+0x40/0x40 [ 39.586992] ? netlink_insert+0x3d0/0x3d0 [ 39.586998] ? kasan_unpoison_shadow+0x35/0x50 [ 39.587004] ? kasan_slab_alloc+0x12/0x20 [ 39.587011] ? netlink_seq_start+0xe0/0xe0 [ 39.587017] genl_rcv_msg+0x9b/0x120 [ 39.587025] ? trace_raw_output_rcu_utilization+0x150/0x150 [ 39.587031] netlink_rcv_skb+0x125/0x340 [ 39.587036] ? kasan_check_write+0x14/0x20 [ 39.587042] ? genl_family_rcv_msg+0x1610/0x1610 [ 39.587048] genl_rcv+0x23/0x40 [ 39.587055] netlink_unicast+0x422/0x630 [ 39.587061] ? netlink_attachskb+0xa70/0xa70 [ 39.587067] ? unix_stream_sendmsg+0x9ba/0xba0 [ 39.587075] netlink_sendmsg+0xa06/0xe20 [ 39.587082] ? netlink_broadcast+0x10/0x10 [ 39.587089] ? netlink_broadcast+0x10/0x10 [ 39.587098] sock_sendmsg+0xb5/0xf0 [ 39.587104] ___sys_sendmsg+0x758/0x9d0 [ 39.587111] ? SYSC_sendto+0x600/0x600 [ 39.587116] ? sock_sendmsg+0xb5/0xf0 [ 39.587122] ? sock_write_iter+0x2d0/0x610 [ 39.587128] ? sock_sendmsg+0xf0/0xf0 [ 39.587136] ? iterate_fd+0x2d0/0x2d0 [ 39.587143] ? __vfs_write+0x4c5/0x940 [ 39.587149] ? poll_select_copy_remaining+0x3af/0x880 [ 39.587156] ? vfs_dedupe_get_page.isra.20+0x3c0/0x3c0 [ 39.587164] ? pvclock_gtod_unregister_notifier+0x40/0x40 [ 39.587170] ? __fdget+0xe/0x10 [ 39.587176] __sys_sendmsg+0xd9/0x230 [ 39.587181] ? __sys_sendmsg+0xd9/0x230 [ 39.587186] ? SyS_shutdown+0x2d0/0x2d0 [ 39.587192] ? __fdget_raw+0x10/0x10 [ 39.587199] ? SyS_select+0x1ac/0x250 [ 39.587205] ? kasan_check_read+0x11/0x20 [ 39.587212] ? _copy_to_user+0x6b/0x80 [ 39.587220] SyS_sendmsg+0xd/0x20 [ 39.587227] entry_SYSCALL_64_fastpath+0x13/0x94 [ 39.587234] RIP: 0033:0x7f499ac6ead7 [ 39.587239] RSP: 002b:00007ffd194b4b68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 39.587247] RAX: ffffffffffffffda RBX: 00000000011f1ce0 RCX: 00007f499ac6ead7 [ 39.587251] RDX: 0000000000000000 RSI: 00007ffd194b4bc0 RDI: 0000000000000006 [ 39.587256] RBP: 0000000001213dd0 R08: 0000000000000000 R09: 00000000000000a5 [ 39.587260] R10: 0000000000000020 R11: 0000000000000246 R12: 0000000001213dd0 [ 39.587264] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000011f1f30 [ 39.587301] ---[ end trace 13e3cbc096b89c9b ]--- [ 40.025296] wlp2s0: authenticate with 02:a0:f9:36:e9:91 [ 40.025333] ------------[ cut here ]------------ attaching the full dmesg
Created attachment 256617 [details] dmesg output with patch
For testing, you can remove that whole section around line 4372, from rcu_read_lock() to rcu_read_unlock(), and replace it with "rate_flags = 0"
(In reply to Johannes Berg from comment #17) > For testing, you can remove that whole section around line 4372, from > rcu_read_lock() to rcu_read_unlock(), and replace it with "rate_flags = 0" did so and it looks good: [ 63.942990] basic=15f, band=0, cf1=2437 [ 63.943259] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready [ 67.385941] wlp2s0: authenticate with c8:0e:14:dd:75:4b [ 67.389355] basic=15, band=1, cf1=5530 and no kasan error
The fix is now in Linus' tree, but shouldn't this go to stable too?