Bug 195677

Summary: iwlwifi: 8260: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common
Product: Drivers Reporter: Johannes Hirte (johannes.hirte)
Component: network-wirelessAssignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.11.0 Subsystem:
Regression: No Bisected commit-id:
Attachments: iwlmvm.ko.xz
Fix
dmesg output with patch

Description Johannes Hirte 2017-05-05 13:04:19 UTC
While debugging some problems with hanging graphics (https://bugzilla.kernel.org/show_bug.cgi?id=194843), I've found with kasan this bug:

May  5 14:34:01 probook kernel: ==================================================================
May  5 14:34:01 probook kernel: BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] at addr ffffffffa0562336
May  5 14:34:01 probook kernel: Read of size 2 by task wpa_supplicant/2355
May  5 14:34:01 probook kernel: Address belongs to variable iwl_drv_exit+0xf66f/0x339 [iwlwifi]
May  5 14:34:01 probook kernel: CPU: 0 PID: 2355 Comm: wpa_supplicant Not tainted 4.11.0 #179
May  5 14:34:01 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
May  5 14:34:01 probook kernel: Call Trace:
May  5 14:34:01 probook kernel:  dump_stack+0x4f/0x66
May  5 14:34:01 probook kernel:  kasan_report+0x4da/0x510
May  5 14:34:01 probook kernel:  ? iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm]
May  5 14:34:01 probook kernel:  ? wake_atomic_t_function+0x2b0/0x2b0
May  5 14:34:01 probook kernel:  __asan_report_load2_noabort+0x14/0x20
May  5 14:34:01 probook kernel:  iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_channel_switch_noa_notif+0x40f/0x410 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_mac_ctxt_send_beacon+0xcb0/0xcb0 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_send_cmd_pdu+0x91/0xb0 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_send_cmd+0x160/0x160 [iwlmvm]
May  5 14:34:01 probook kernel:  iwl_mvm_mac_ctxt_cmd_sta+0xd1/0xe70 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_mac_ctxt_cmd_common+0x1610/0x1610 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_phy_ctxt_apply.constprop.3+0x31f/0x5d0 [iwlmvm]
May  5 14:34:01 probook kernel:  ? iwl_mvm_ref_taken+0x150/0x150 [iwlmvm]
May  5 14:34:01 probook kernel:  iwl_mvm_mac_ctx_send+0x68/0x110 [iwlmvm]
May  5 14:34:01 probook kernel:  iwl_mvm_mac_ctxt_changed+0x68/0x180 [iwlmvm]
May  5 14:34:01 probook kernel:  iwl_mvm_bss_info_changed+0x2f8/0xec0 [iwlmvm]
May  5 14:34:01 probook kernel:  ieee80211_bss_info_change_notify+0x177/0x4c0 [mac80211]
May  5 14:34:01 probook kernel:  ? __ieee80211_recalc_txpower+0x111/0x320 [mac80211]
May  5 14:34:01 probook kernel:  ieee80211_assign_vif_chanctx+0x7ce/0xf80 [mac80211]
May  5 14:34:01 probook kernel:  ieee80211_vif_use_channel+0x3ad/0x780 [mac80211]
May  5 14:34:01 probook kernel:  ieee80211_prep_connection+0x55b/0x1cf0 [mac80211]
May  5 14:34:01 probook kernel:  ? ieee80211_handle_bss_capability+0x220/0x220 [mac80211]
May  5 14:34:01 probook kernel:  ? __kmalloc+0x126/0x220
May  5 14:34:01 probook kernel:  ieee80211_mgd_auth+0x69d/0xdd0 [mac80211]
May  5 14:34:01 probook kernel:  ? ieee80211_mlme_notify_scan_completed+0x1c0/0x1c0 [mac80211]
May  5 14:34:01 probook kernel:  ieee80211_auth+0x13/0x20 [mac80211]
May  5 14:34:01 probook kernel:  cfg80211_mlme_auth+0x2a7/0x6b0 [cfg80211]
May  5 14:34:01 probook kernel:  ? unwind_get_return_address+0x1e0/0x1e0
May  5 14:34:01 probook kernel:  ? cfg80211_rx_mgmt+0x710/0x710 [cfg80211]
May  5 14:34:01 probook kernel:  ? parse_station_flags.isra.36+0x490/0x490 [cfg80211]
May  5 14:34:01 probook kernel:  nl80211_authenticate+0x8f7/0xfe0 [cfg80211]
May  5 14:34:01 probook kernel:  ? nl80211_parse_key+0xe70/0xe70 [cfg80211]
May  5 14:34:01 probook kernel:  ? nl80211_pre_doit+0xcd/0x560 [cfg80211]
May  5 14:34:01 probook kernel:  ? nla_parse+0xde/0x210
May  5 14:34:01 probook kernel:  genl_family_rcv_msg+0x5c8/0x10f0
May  5 14:34:01 probook kernel:  ? __alloc_skb+0x31f/0x560
May  5 14:34:01 probook kernel:  ? genl_rcv+0x40/0x40
May  5 14:34:01 probook kernel:  ? try_to_wake_up+0xb8/0x1080
May  5 14:34:01 probook kernel:  ? alloc_skb_with_frags+0x8d/0x4c0
May  5 14:34:01 probook kernel:  genl_rcv_msg+0x9b/0x120
May  5 14:34:01 probook kernel:  netlink_rcv_skb+0x23b/0x340
May  5 14:34:01 probook kernel:  ? genl_family_rcv_msg+0x10f0/0x10f0
May  5 14:34:01 probook kernel:  genl_rcv+0x23/0x40
May  5 14:34:01 probook kernel:  netlink_unicast+0x438/0x620
May  5 14:34:01 probook kernel:  ? netlink_attachskb+0x640/0x640
May  5 14:34:01 probook kernel:  netlink_sendmsg+0x86f/0xb60
May  5 14:34:01 probook kernel:  ? netlink_broadcast+0x10/0x10
May  5 14:34:01 probook kernel:  ? netlink_broadcast+0x10/0x10
May  5 14:34:01 probook kernel:  sock_sendmsg+0xb5/0xf0
May  5 14:34:01 probook kernel:  ___sys_sendmsg+0x6a2/0x8c0
May  5 14:34:01 probook kernel:  ? ___sys_recvmsg+0x333/0x590
May  5 14:34:01 probook kernel:  ? SYSC_sendto+0x300/0x300
May  5 14:34:01 probook kernel:  ? sock_sendmsg+0xb5/0xf0
May  5 14:34:01 probook kernel:  ? sock_write_iter+0x1e0/0x3b0
May  5 14:34:01 probook kernel:  ? _raw_spin_unlock_irq+0x39/0x60
May  5 14:34:01 probook kernel:  ? sock_sendmsg+0xf0/0xf0
May  5 14:34:01 probook kernel:  ? __vfs_write+0x299/0x620
May  5 14:34:01 probook kernel:  ? vfs_dedupe_get_page.isra.20+0x1d0/0x1d0
May  5 14:34:01 probook kernel:  ? __fdget+0xe/0x10
May  5 14:34:01 probook kernel:  __sys_sendmsg+0xc1/0x140
May  5 14:34:01 probook kernel:  ? __sys_sendmsg+0xc1/0x140
May  5 14:34:01 probook kernel:  ? SyS_shutdown+0x170/0x170
May  5 14:34:01 probook kernel:  ? vfs_write+0x305/0x490
May  5 14:34:01 probook kernel:  ? do_syscall_64+0x24d/0x380
May  5 14:34:01 probook kernel:  SyS_sendmsg+0xd/0x20
May  5 14:34:01 probook kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
May  5 14:34:01 probook kernel: RIP: 0033:0x7fdd8d285ad7
May  5 14:34:01 probook kernel: RSP: 002b:00007ffcabeb1d98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
May  5 14:34:01 probook kernel: RAX: ffffffffffffffda RBX: 0000000001083770 RCX: 00007fdd8d285ad7
May  5 14:34:01 probook kernel: RDX: 0000000000000000 RSI: 00007ffcabeb1df0 RDI: 0000000000000006
May  5 14:34:01 probook kernel: RBP: 0000000001068ff0 R08: 0000000000000000 R09: 000000000000003f
May  5 14:34:01 probook kernel: R10: 000000000107cbd0 R11: 0000000000000246 R12: 0000000000000000
May  5 14:34:01 probook kernel: R13: 0000000000000000 R14: 0000000001068ff0 R15: 000000000107c520
May  5 14:34:01 probook kernel: Memory state around the buggy address:
May  5 14:34:01 probook kernel:  ffffffffa0562200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May  5 14:34:01 probook kernel:  ffffffffa0562280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May  5 14:34:01 probook kernel: >ffffffffa0562300: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00
May  5 14:34:01 probook kernel:                                      ^
May  5 14:34:01 probook kernel:  ffffffffa0562380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May  5 14:34:01 probook kernel:  ffffffffa0562400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May  5 14:34:01 probook kernel: ==================================================================
May  5 14:34:01 probook kernel: Disabling lock debugging due to kernel taint

It's the second time, I'm seeing this out-of-bounds error. Maybe it happens way more often, and this is the root cause of my problems with amdgpu.

hardware is a Wireless-AC 8260

lspci:
=====
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1576
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1577
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Carrizo (rev c5)
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 157b
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 157c
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 157c
00:02.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 157c
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 157b
00:08.0 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 1578
00:09.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 157d
00:09.2 Audio device: Advanced Micro Devices, Inc. [AMD] Device 157a
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 20)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 49)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 49)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 4a)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1570
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1571
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1572
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1573
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1574
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1575
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)                                                                                             
02:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)                                                                                                                                                        
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
Comment 1 Emmanuel Grumbach 2017-05-11 06:16:57 UTC
Can you please attach iwlmvm.ko so that I can see exactly what line it is?
Comment 2 Emmanuel Grumbach 2017-05-11 06:19:53 UTC
I just reviewed the function and I can't find anything suspicious. All the memcpy are well into the their buffers.
I'd need you to help me finding what iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 really points to.
Comment 3 Johannes Hirte 2017-05-11 11:45:49 UTC
Just noticed I've build the kernel without debug infos. Rebuilding with debug now and provide you the requested infos.
Comment 4 Johannes Hirte 2017-05-11 15:30:48 UTC
Created attachment 256397 [details]
iwlmvm.ko.xz

iwlmvm.ko xz compressed
Comment 5 Johannes Hirte 2017-05-11 15:34:30 UTC
probook ~ # cat /sys/module/iwlmvm/sections/.text
0xffffffffa06c8000

(gdb) add-symbol-file drivers/net/wireless/intel/iwlwifi/mvm/iwlmvm.ko 0xffffffffa06c8000
add symbol table from file "drivers/net/wireless/intel/iwlwifi/mvm/iwlmvm.ko" at
        .text_addr = 0xa06c8000
(y or n) y
Reading symbols from drivers/net/wireless/intel/iwlwifi/mvm/iwlmvm.ko...done.
(gdb) list *(iwl_mvm_mac_ctxt_cmd_common+0x14b5)
0xffffffffa06f3c55 is in iwl_mvm_mac_ctxt_cmd_common (drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:578).
573             int i;
574
575             sband = mvm->hw->wiphy->bands[band];
576
577             for_each_set_bit(i, &basic, BITS_PER_LONG) {
578                     int hw = sband->bitrates[i].hw_value;
579                     if (hw >= IWL_FIRST_OFDM_RATE) {
580                             ofdm |= BIT(hw - IWL_FIRST_OFDM_RATE);
581                             if (lowest_present_ofdm > hw)
582                                     lowest_present_ofdm = hw;
(gdb)

I don't know if the addresses from kasan report are still correct for the new compiled kernel with debug infos. Maybe I have to wait for the next kasan report to be sure.
Comment 6 Johannes Hirte 2017-05-12 20:41:36 UTC
Confirmed with recompiled kernel with debug infos:

BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] at addr ffffffffa050a336

gdb again points at

int hw = sband->bitrates[i].hw_value;
Comment 7 Emmanuel Grumbach 2017-05-14 20:07:49 UTC
ok - I'll try to take a look.

Can you please print the value of 'basic' in this function?
This can depend on your Access Point.
You can also just record the connection flow with tracing, that will do as well.

thanks.
Comment 8 Johannes Berg 2017-05-14 20:54:04 UTC
Maybe, if you're printing things already, also print "band" and "vif->bss_conf.chandef.center_freq1"

  printk(KERN_DEBUG "basic=%x, band=%d, cf1=%d\n",
         basic, band, vif->bss_conf.chandef.center_freq1);

somehow I feel that perhaps vif->bss_conf.chandef and the chanctx got out of sync, perhaps in some temporary state
Comment 9 Johannes Hirte 2017-05-15 15:20:28 UTC
code changes:

        sband = mvm->hw->wiphy->bands[band];

        printk(KERN_DEBUG "basic=%lx, band=%d, cf1=%d\n",
               basic, band, vif->bss_conf.chandef.center_freq1);

        for_each_set_bit(i, &basic, BITS_PER_LONG) {
                int hw = sband->bitrates[i].hw_value;
                if (hw >= IWL_FIRST_OFDM_RATE) {
                        ofdm |= BIT(hw - IWL_FIRST_OFDM_RATE);
                        if (lowest_present_ofdm > hw)
                                lowest_present_ofdm = hw;
                } else {
                        BUILD_BUG_ON(IWL_FIRST_CCK_RATE != 0);

                        cck |= BIT(hw);
                        if (lowest_present_cck > hw)
                                lowest_present_cck = hw;
                }
        }

        printk(KERN_DEBUG "basic=%lx, band=%d, cf1=%d\n",
               basic, band, vif->bss_conf.chandef.center_freq1);

error output:

[15267.263770] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[15267.264443] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[15267.392773] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[15267.393070] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[15267.475268] basic=15f, band=0, cf1=2437
[15267.475273] basic=15f, band=0, cf1=2437
[15267.475548] basic=15f, band=0, cf1=2437
[15267.475551] basic=15f, band=0, cf1=2437
[15267.475957] basic=15f, band=0, cf1=2437
[15267.475961] basic=15f, band=0, cf1=2437
[15267.476230] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[15267.982314] wlp2s0: authenticate with c8:0e:14:dd:75:4b
[15267.985789] basic=15f, band=1, cf1=5530
[15267.985796] ==================================================================
[15267.985845] BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x15e4/0x1750 [iwlmvm] at addr ffffffffa051a336
[15267.985851] Read of size 2 by task wpa_supplicant/2366
[15267.985865] Address belongs to variable iwl_drv_exit+0xf66f/0x339 [iwlwifi]
[15267.985872] CPU: 0 PID: 2366 Comm: wpa_supplicant Not tainted 4.11.0-00002-g426aae1c9a0e #189
[15267.985877] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
[15267.985881] Call Trace:
[15267.985893]  dump_stack+0x4f/0x66
[15267.985899]  kasan_report+0x4da/0x510
[15267.985910]  ? iwl_mvm_mac_ctxt_cmd_common+0x15e4/0x1750 [iwlmvm]
[15267.985919]  ? iwl_pcie_hcmd_complete+0xef0/0xef0 [iwlwifi]
[15267.985924]  __asan_report_load2_noabort+0x14/0x20
[15267.985934]  iwl_mvm_mac_ctxt_cmd_common+0x15e4/0x1750 [iwlmvm]
[15267.985945]  ? iwl_mvm_channel_switch_noa_notif+0x3cf/0x410 [iwlmvm]
[15267.985954]  ? iwl_mvm_mac_ctxt_send_beacon+0xcb0/0xcb0 [iwlmvm]
[15267.985964]  ? iwl_mvm_send_cmd_pdu+0x91/0xb0 [iwlmvm]
[15267.985973]  ? iwl_mvm_send_cmd+0x160/0x160 [iwlmvm]
[15267.985983]  iwl_mvm_mac_ctxt_cmd_sta+0xd1/0xe70 [iwlmvm]
[15267.985993]  ? iwl_mvm_mac_ctxt_cmd_common+0x1750/0x1750 [iwlmvm]
[15267.986002]  ? iwl_mvm_phy_ctxt_apply.constprop.3+0x31f/0x5d0 [iwlmvm]
[15267.986011]  ? iwl_mvm_ref_taken+0x150/0x150 [iwlmvm]
[15267.986021]  iwl_mvm_mac_ctx_send+0x68/0x110 [iwlmvm]
[15267.986030]  iwl_mvm_mac_ctxt_changed+0x68/0x180 [iwlmvm]
[15267.986039]  iwl_mvm_bss_info_changed+0x2f8/0xec0 [iwlmvm]
[15267.986090]  ieee80211_bss_info_change_notify+0x177/0x4c0 [mac80211]
[15267.986107]  ? __ieee80211_recalc_txpower+0x111/0x320 [mac80211]
[15267.986123]  ieee80211_assign_vif_chanctx+0x7ce/0xf80 [mac80211]
[15267.986139]  ieee80211_vif_use_channel+0x3ad/0x780 [mac80211]
[15267.986155]  ieee80211_prep_connection+0x55b/0x1cf0 [mac80211]
[15267.986172]  ? ieee80211_handle_bss_capability+0x220/0x220 [mac80211]
[15267.986178]  ? __kmalloc+0x126/0x220
[15267.986193]  ieee80211_mgd_auth+0x69d/0xdd0 [mac80211]
[15267.986209]  ? ieee80211_mlme_notify_scan_completed+0x1c0/0x1c0 [mac80211]
[15267.986226]  ieee80211_auth+0x13/0x20 [mac80211]
[15267.986267]  cfg80211_mlme_auth+0x2a7/0x6b0 [cfg80211]
[15267.986274]  ? unwind_get_return_address+0x1e0/0x1e0
[15267.986287]  ? cfg80211_rx_mgmt+0x710/0x710 [cfg80211]
[15267.986301]  ? parse_station_flags.isra.36+0x490/0x490 [cfg80211]
[15267.986314]  nl80211_authenticate+0x8f7/0xfe0 [cfg80211]
[15267.986328]  ? nl80211_parse_key+0xe70/0xe70 [cfg80211]
[15267.986341]  ? nl80211_pre_doit+0xcd/0x560 [cfg80211]
[15267.986346]  ? nla_parse+0xde/0x210
[15267.986353]  genl_family_rcv_msg+0x5c8/0x10f0
[15267.986358]  ? __alloc_skb+0x31f/0x560
[15267.986362]  ? genl_rcv+0x40/0x40
[15267.986367]  ? try_to_wake_up+0xb8/0x1080
[15267.986372]  ? alloc_skb_with_frags+0x8d/0x4c0
[15267.986376]  genl_rcv_msg+0x9b/0x120
[15267.986380]  netlink_rcv_skb+0x23b/0x340
[15267.986384]  ? genl_family_rcv_msg+0x10f0/0x10f0
[15267.986388]  genl_rcv+0x23/0x40
[15267.986391]  netlink_unicast+0x438/0x620
[15267.986395]  ? netlink_attachskb+0x640/0x640
[15267.986399]  netlink_sendmsg+0x86f/0xb60
[15267.986403]  ? netlink_broadcast+0x10/0x10
[15267.986406]  ? netlink_broadcast+0x10/0x10
[15267.986411]  sock_sendmsg+0xb5/0xf0
[15267.986414]  ___sys_sendmsg+0x6a2/0x8c0
[15267.986418]  ? ___sys_recvmsg+0x333/0x590
[15267.986421]  ? SYSC_sendto+0x300/0x300
[15267.986424]  ? sock_sendmsg+0xb5/0xf0
[15267.986428]  ? sock_write_iter+0x1e0/0x3b0
[15267.986435]  ? _raw_spin_unlock_irq+0x39/0x60
[15267.986440]  ? __enqueue_entity+0x134/0x230
[15267.986443]  ? update_curr+0x14b/0x490
[15267.986447]  ? put_prev_entity+0xde/0x2770
[15267.986455]  ? debug_smp_processor_id+0x17/0x20
[15267.986461]  ? __switch_to+0x5ef/0x1160
[15267.986466]  ? __fdget+0xe/0x10
[15267.986470]  __sys_sendmsg+0xc1/0x140
[15267.986473]  ? __sys_sendmsg+0xc1/0x140
[15267.986477]  ? SyS_shutdown+0x170/0x170
[15267.986484]  ? vfs_write+0x305/0x490
[15267.986488]  ? exit_to_usermode_loop+0x75/0xf0
[15267.986492]  SyS_sendmsg+0xd/0x20
[15267.986496]  entry_SYSCALL_64_fastpath+0x13/0x94
[15267.986501] RIP: 0033:0x7f7fc928dad7
[15267.986505] RSP: 002b:00007ffe288586f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[15267.986512] RAX: ffffffffffffffda RBX: 00000000007b5090 RCX: 00007f7fc928dad7
[15267.986515] RDX: 0000000000000000 RSI: 00007ffe28858750 RDI: 0000000000000006
[15267.986519] RBP: 0000000000001000 R08: 0000000000000000 R09: 000000000000004b
[15267.986522] R10: 00000000007b9550 R11: 0000000000000246 R12: 00000000007af1f0
[15267.986525] R13: 00000000007a7a80 R14: 0000000000000004 R15: 00000000007ba410
[15267.986529] Memory state around the buggy address:
[15267.986534]  ffffffffa051a200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[15267.986538]  ffffffffa051a280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[15267.986541] >ffffffffa051a300: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00
[15267.986544]                                      ^
[15267.986547]  ffffffffa051a380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[15267.986550]  ffffffffa051a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[15267.986553] ==================================================================
[15267.986555] Disabling lock debugging due to kernel taint
[15267.986579] basic=15f, band=1, cf1=5530
[15267.987400] basic=15, band=1, cf1=5530
[15267.987403] basic=15, band=1, cf1=5530
[15267.990474] wlp2s0: send auth to c8:0e:14:dd:75:4b (try 1/3)
[15268.006164] wlp2s0: authenticated
[15268.006654] wlp2s0: associate with c8:0e:14:dd:75:4b (try 1/3)
[15268.080435] wlp2s0: RX AssocResp from c8:0e:14:dd:75:4b (capab=0x411 status=0 aid=1)
[15268.080901] basic=15, band=1, cf1=5530
[15268.080908] basic=15, band=1, cf1=5530
[15268.082659] wlp2s0: associated
[15268.082763] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[15268.108746] basic=15, band=1, cf1=5530
[15268.108753] basic=15, band=1, cf1=5530
[15268.157255] wlp2s0: Limiting TX power to 27 (30 - 3) dBm as advertised by c8:0e:14:dd:75:4b
[15268.157268] basic=15, band=1, cf1=5530
[15268.157272] basic=15, band=1, cf1=5530
[15268.168776] basic=15, band=1, cf1=5530
[15268.168783] basic=15, band=1, cf1=5530
Comment 10 Emmanuel Grumbach 2017-05-16 17:55:42 UTC
Seems like only when we call ieee80211_prep_connection we call ieee80211_get_rates which sets up the rates properly.
Since we had a 2.4GHz network configured in the vif first, then we have CCK rates. Now we move bss_conf to a 5GHz network, we should reset the basic rates.
We don't do that until we call ieee80211_prep_connection which is very clear in your logs:

[15267.986579] basic=15f, band=1, cf1=5530  <== BAD VALUE, CCK rates with band = 5.2GHz
[15267.987400] basic=15, band=1, cf1=5530   <== GOOD VALUE
[15267.987403] basic=15, band=1, cf1=5530
[15267.990474] wlp2s0: send auth to c8:0e:14:dd:75:4b (try 1/3)

I guess we need to see who assigns the band and doesn't update the basic rates.
Will try to dig a bit later tonight.
Comment 11 Johannes Hirte 2017-05-16 19:08:56 UTC
I've noticed the bug occurs often with the following scenario:

* connected to network1 2.4GHz
* put laptop into suspend and move to other place
* resume and connect to network2 5GHz

so I tried to reproduce this, but was unsuccessful 


[94534.583518] PM: Finishing wakeup.
[94534.583521] Restarting tasks ... done.
[94534.598055] Bluetooth: hci0: Bootloader revision 0.0 build 2 week 52 2014
[94534.605208] Bluetooth: hci0: Device revision is 5
[94534.605219] Bluetooth: hci0: Secure boot is enabled
[94534.605223] Bluetooth: hci0: OTP lock is enabled
[94534.605227] Bluetooth: hci0: API lock is enabled
[94534.605230] Bluetooth: hci0: Debug lock is disabled
[94534.605234] Bluetooth: hci0: Minimum firmware build 1 week 10 2014
[94534.605249] Bluetooth: hci0: Found device firmware: intel/ibt-11-5.sfi
[94534.686890] thermal thermal_zone5: failed to read out thermal zone (-5)
[94534.741661] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[94534.755095] r8169 0000:01:00.0 enp1s0: link down
[94534.755569] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[94534.761543] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[94534.763056] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94534.763792] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94534.891847] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94534.892139] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94534.977029] basic=15f, band=0, cf1=2437
[94534.977035] basic=15f, band=0, cf1=2437
[94534.977305] basic=15f, band=0, cf1=2437
[94534.977308] basic=15f, band=0, cf1=2437
[94534.977413] basic=15f, band=0, cf1=2437
[94534.977415] basic=15f, band=0, cf1=2437
[94534.977680] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[94534.986038] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94534.986766] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94535.114831] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94535.115134] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94535.198667] basic=15f, band=0, cf1=2437
[94535.198673] basic=15f, band=0, cf1=2437
[94535.199071] basic=15f, band=0, cf1=2437
[94535.199075] basic=15f, band=0, cf1=2437
[94535.199254] basic=15f, band=0, cf1=2437
[94535.199257] basic=15f, band=0, cf1=2437
[94535.199552] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[94535.234885] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[94535.344136] psmouse serio3: synaptics: queried max coordinates: x [..5672], y [..4828]
[94535.379918] psmouse serio3: synaptics: queried min coordinates: x [1336..], y [1070..]
[94536.631826] Bluetooth: hci0: Waiting for firmware download to complete
[94536.632917] Bluetooth: hci0: Firmware loaded in 1988415 usecs
[94536.633049] Bluetooth: hci0: Waiting for device to boot
[94536.645120] Bluetooth: hci0: Device booted in 11887 usecs
[94536.645181] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-11-5.ddc
[94536.649028] Bluetooth: hci0: Applying Intel DDC parameters completed
[94538.729735] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94538.730429] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94538.860440] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94538.860918] iwlwifi 0000:02:00.0: L1 Enabled - LTR Disabled
[94538.945091] basic=15f, band=0, cf1=2437
[94538.945098] basic=15f, band=0, cf1=2437
[94538.945389] basic=15f, band=0, cf1=2437
[94538.945392] basic=15f, band=0, cf1=2437
[94538.945497] basic=15f, band=0, cf1=2437
[94538.945500] basic=15f, band=0, cf1=2437
[94538.945916] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[94542.277285] wlp2s0: authenticate with c8:0e:14:dd:75:4b
[94542.280725] basic=15f, band=1, cf1=5530
[94542.280732] basic=15f, band=1, cf1=5530
[94542.281718] basic=15, band=1, cf1=5530
[94542.281725] basic=15, band=1, cf1=5530
[94542.284638] wlp2s0: send auth to c8:0e:14:dd:75:4b (try 1/3)
[94542.299398] wlp2s0: authenticated
[94542.301633] wlp2s0: associate with c8:0e:14:dd:75:4b (try 1/3)
[94542.364684] wlp2s0: RX AssocResp from c8:0e:14:dd:75:4b (capab=0x411 status=0 aid=2)
[94542.365016] basic=15, band=1, cf1=5530
[94542.365020] basic=15, band=1, cf1=5530
[94542.378625] wlp2s0: associated
[94542.378810] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[94542.383729] wlp2s0: Limiting TX power to 27 (30 - 3) dBm as advertised by c8:0e:14:dd:75:4b
[94542.383745] basic=15, band=1, cf1=5530
[94542.383749] basic=15, band=1, cf1=5530
[94542.395980] basic=15, band=1, cf1=5530
[94542.395986] basic=15, band=1, cf1=5530
[94542.471287] basic=15, band=1, cf1=5530
[94542.471293] basic=15, band=1, cf1=5530

it's again the 
[94542.280725] basic=15f, band=1, cf1=5530

but without kasan complaining
Comment 12 Emmanuel Grumbach 2017-05-16 20:12:12 UTC
we don't recalculate the basic_rates when we update the channel context. Doing this can be a bit tricky. I'll try to come up with a patch and go through a round of reviews internally. I'll let you know.
Comment 13 Emmanuel Grumbach 2017-05-18 08:50:22 UTC
I reproduced the problem.

It comes from the fact that we assign the channel context to the driver before we set the basic rates (and not only). The flow goes like this:

ieee80211_mgd_auth
  |
  -->ieee80211_prep_connection
      |
      -->ieee80211_prep_channel (updates the band)
      |
      -->ieee80211_vif_use_channel
          |
          -->ieee80211_assign_vif_chanctx  (new band, old basic_rates)
      |
      -->ieee80211_get_rates (updates the basic_rates)


I'll try to see a way to disentangle the flow.
Comment 14 Emmanuel Grumbach 2017-05-18 11:25:36 UTC
Created attachment 256615 [details]
Fix

Hi,

here is a fix. Can you please test?
Comment 15 Johannes Hirte 2017-05-18 12:30:41 UTC
doesn't work, with this patch I don't get a connection and dmesg shows me several of this:

[   39.586269] wlp2s0: authenticate with 02:a0:f9:37:8e:a5
[   39.586311] ------------[ cut here ]------------
[   39.586370] WARNING: CPU: 1 PID: 2351 at net/mac80211/mlme.c:4372 ieee80211_prep_connection+0x1678/0x2070 [mac80211]
[   39.586375] Modules linked in: cmac rfcomm bnep btusb btrtl btbcm btintel bluetooth ehci_pci rtsx_pci_ms rtsx_pci_sdmmc mmc_core memstick hp_wmi edac_core kvm_amd kvm irqbypass aesni_intel iwlmvm snd_hda_codec_conexan
t snd_hda_codec_generic aes_x86_64 crypto_simd mac80211 snd_hda_codec_hdmi cryptd glue_helper iwlwifi snd_hda_intel k10temp cfg80211 snd_hda_codec fam15h_power i2c_piix4 ehci_hcd snd_hwdep snd_hda_core xhci_pci xhci_hcd 
rfkill r8169 rtsx_pci snd_pcm mii snd_timer snd soundcore wmi i2c_designware_platform i2c_designware_core hp_wireless efivarfs autofs4
[   39.586509] CPU: 1 PID: 2351 Comm: wpa_supplicant Not tainted 4.11.0-00005-ga0ffa19501cf #192
[   39.586515] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
[   39.586520] Call Trace:
[   39.586533]  dump_stack+0x9e/0x11a
[   39.586541]  ? _atomic_dec_and_lock+0x176/0x176
[   39.586550]  ? wake_up_klogd+0xe6/0x140
[   39.586578]  ? ieee80211_prep_connection+0x1678/0x2070 [mac80211]
[   39.586586]  __warn+0x16d/0x1b0
[   39.586594]  warn_slowpath_null+0x18/0x20
[   39.586620]  ieee80211_prep_connection+0x1678/0x2070 [mac80211]
[   39.586628]  ? _raw_spin_unlock_irq+0x110/0x110
[   39.586654]  ? drv_mgd_prepare_tx+0x590/0x590 [mac80211]
[   39.586661]  ? vprintk_emit+0x302/0x3b0
[   39.586668]  ? save_stack+0xa3/0xd0
[   39.586692]  ? cfg80211_find_ie_match+0x13a/0x1f0 [cfg80211]
[   39.586700]  ? vprintk_default+0x1a/0x20
[   39.586705]  ? vprintk_func+0x2a/0xd0
[   39.586713]  ? load_image_and_restore+0xf6/0xf6
[   39.586720]  ? __kmalloc+0xf6/0x220
[   39.586745]  ieee80211_mgd_auth+0x5d2/0xdd0 [mac80211]
[   39.586752]  ? _raw_spin_unlock_irq+0x110/0x110
[   39.586776]  ? ieee80211_mlme_notify_scan_completed+0x450/0x450 [mac80211]
[   39.586783]  ? unwind_next_frame+0x1f6/0xd00
[   39.586809]  ieee80211_auth+0x13/0x20 [mac80211]
[   39.586830]  cfg80211_mlme_auth+0x430/0xd40 [cfg80211]
[   39.586837]  ? unwind_get_return_address+0x182/0x2a0
[   39.586843]  ? unwind_dump.isra.5.part.6+0x320/0x320
[   39.586862]  ? cfg80211_rx_mgmt+0x9f0/0x9f0 [cfg80211]
[   39.586870]  ? __unwind_start+0x17b/0x380
[   39.586876]  ? memset+0x31/0x40
[   39.586897]  ? parse_station_flags.isra.36+0x4c0/0x4c0 [cfg80211]
[   39.586904]  ? rtnl_kfree_skbs+0x60/0x60
[   39.586923]  nl80211_authenticate+0xb3b/0x13f0 [cfg80211]
[   39.586943]  ? nl80211_parse_key+0xfb0/0xfb0 [cfg80211]
[   39.586964]  ? nl80211_pre_doit+0xcd/0x560 [cfg80211]
[   39.586970]  ? ns_capable+0xe/0x10
[   39.586979]  genl_family_rcv_msg+0x68b/0x1610
[   39.586986]  ? genl_rcv+0x40/0x40
[   39.586992]  ? netlink_insert+0x3d0/0x3d0
[   39.586998]  ? kasan_unpoison_shadow+0x35/0x50
[   39.587004]  ? kasan_slab_alloc+0x12/0x20
[   39.587011]  ? netlink_seq_start+0xe0/0xe0
[   39.587017]  genl_rcv_msg+0x9b/0x120
[   39.587025]  ? trace_raw_output_rcu_utilization+0x150/0x150
[   39.587031]  netlink_rcv_skb+0x125/0x340
[   39.587036]  ? kasan_check_write+0x14/0x20
[   39.587042]  ? genl_family_rcv_msg+0x1610/0x1610
[   39.587048]  genl_rcv+0x23/0x40
[   39.587055]  netlink_unicast+0x422/0x630
[   39.587061]  ? netlink_attachskb+0xa70/0xa70
[   39.587067]  ? unix_stream_sendmsg+0x9ba/0xba0
[   39.587075]  netlink_sendmsg+0xa06/0xe20
[   39.587082]  ? netlink_broadcast+0x10/0x10
[   39.587089]  ? netlink_broadcast+0x10/0x10
[   39.587098]  sock_sendmsg+0xb5/0xf0
[   39.587104]  ___sys_sendmsg+0x758/0x9d0
[   39.587111]  ? SYSC_sendto+0x600/0x600
[   39.587116]  ? sock_sendmsg+0xb5/0xf0
[   39.587122]  ? sock_write_iter+0x2d0/0x610
[   39.587128]  ? sock_sendmsg+0xf0/0xf0
[   39.587136]  ? iterate_fd+0x2d0/0x2d0
[   39.587143]  ? __vfs_write+0x4c5/0x940
[   39.587149]  ? poll_select_copy_remaining+0x3af/0x880
[   39.587156]  ? vfs_dedupe_get_page.isra.20+0x3c0/0x3c0
[   39.587164]  ? pvclock_gtod_unregister_notifier+0x40/0x40
[   39.587170]  ? __fdget+0xe/0x10
[   39.587176]  __sys_sendmsg+0xd9/0x230
[   39.587181]  ? __sys_sendmsg+0xd9/0x230
[   39.587186]  ? SyS_shutdown+0x2d0/0x2d0
[   39.587192]  ? __fdget_raw+0x10/0x10
[   39.587199]  ? SyS_select+0x1ac/0x250
[   39.587205]  ? kasan_check_read+0x11/0x20
[   39.587212]  ? _copy_to_user+0x6b/0x80
[   39.587220]  SyS_sendmsg+0xd/0x20
[   39.587227]  entry_SYSCALL_64_fastpath+0x13/0x94
[   39.587234] RIP: 0033:0x7f499ac6ead7
[   39.587239] RSP: 002b:00007ffd194b4b68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   39.587247] RAX: ffffffffffffffda RBX: 00000000011f1ce0 RCX: 00007f499ac6ead7
[   39.587251] RDX: 0000000000000000 RSI: 00007ffd194b4bc0 RDI: 0000000000000006
[   39.587256] RBP: 0000000001213dd0 R08: 0000000000000000 R09: 00000000000000a5
[   39.587260] R10: 0000000000000020 R11: 0000000000000246 R12: 0000000001213dd0
[   39.587264] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000011f1f30
[   39.587301] ---[ end trace 13e3cbc096b89c9b ]---
[   40.025296] wlp2s0: authenticate with 02:a0:f9:36:e9:91
[   40.025333] ------------[ cut here ]------------

attaching the full dmesg
Comment 16 Johannes Hirte 2017-05-18 12:31:55 UTC
Created attachment 256617 [details]
dmesg output with patch
Comment 17 Johannes Berg 2017-05-18 15:18:44 UTC
For testing, you can remove that whole section around line 4372, from rcu_read_lock() to rcu_read_unlock(), and replace it with "rate_flags = 0"
Comment 18 Johannes Hirte 2017-05-18 19:54:26 UTC
(In reply to Johannes Berg from comment #17)
> For testing, you can remove that whole section around line 4372, from
> rcu_read_lock() to rcu_read_unlock(), and replace it with "rate_flags = 0"

did so and it looks good:

[   63.942990] basic=15f, band=0, cf1=2437
[   63.943259] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
[   67.385941] wlp2s0: authenticate with c8:0e:14:dd:75:4b
[   67.389355] basic=15, band=1, cf1=5530

and no kasan error
Comment 19 Johannes Hirte 2017-06-21 11:14:36 UTC
The fix is now in Linus' tree, but shouldn't this go to stable too?