Bug 69661 - mwifiex_usb on MS Surface Pro 1 is unstable
Summary: mwifiex_usb on MS Surface Pro 1 is unstable
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
: 64111 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-01-29 13:03 UTC by Jakub Zeman
Modified: 2016-06-08 13:13 UTC (History)
25 users (show)

See Also:
Kernel Version: 3.15.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Whole dmesg (86.14 KB, text/plain)
2014-01-29 13:04 UTC, Jakub Zeman
Details
config file with CONFIG_DYNAMIC_DEBUG enabled (146.35 KB, application/octet-stream)
2014-02-05 10:43 UTC, Jakub Zeman
Details
dmesg with CONFIG_DYNAMIC_DEBUG enabled (87.02 KB, text/plain)
2014-02-06 19:21 UTC, Jakub Zeman
Details
Disable USB autosuspend on Wifi + BT (1.65 KB, patch)
2014-02-11 06:06 UTC, Zwane Mwaikambo
Details | Diff
debugfs output before/after (3.25 KB, text/plain)
2014-02-12 17:59 UTC, Reyad Attiyat
Details
btusb.c disables usb autosuspend for Marvell devices (1.65 KB, patch)
2014-07-18 22:44 UTC, Bing Zhao
Details | Diff
The patch that is needed by patch in attachment 143461 (2.51 KB, patch)
2014-07-22 04:06 UTC, Bing Zhao
Details | Diff
[PATCH 1] Fix surface type/touch cover (1.32 KB, patch)
2015-06-25 23:24 UTC, Reyad Attiyat
Details | Diff
[PATCH 2] Fix double add error on (1.08 KB, patch)
2015-06-25 23:25 UTC, Reyad Attiyat
Details | Diff
[PATCH 3] Mwifiex hack to fix broken urb zero packet (1.45 KB, patch)
2015-06-25 23:26 UTC, Reyad Attiyat
Details | Diff
[PATCH 4] Set tx padding to 0 (821 bytes, patch)
2015-06-25 23:26 UTC, Reyad Attiyat
Details | Diff
[PATCH 5] Enable rx workqueue when rx pending is high (953 bytes, patch)
2015-06-25 23:27 UTC, Reyad Attiyat
Details | Diff
[PATCH 2] mwifiex: Fix double add error (5.57 KB, patch)
2015-06-29 01:10 UTC, Reyad Attiyat
Details | Diff
[PATCH 3] xhci: Add support for urb_zero_packet (5.04 KB, patch)
2015-06-29 01:11 UTC, Reyad Attiyat
Details | Diff

Description Jakub Zeman 2014-01-29 13:03:32 UTC
This bug is related to bug 68001. I entered new bug because the crash is completely different to bug 68001 with 3.13.0 kernel.

I'm using MS Surface Pro 1 hardware with Debian testing OS The module mwifiex_usb for Marvel 88W8797 USB wireless/bluetooth device crashes under heavy traffic. After the crash kernel is unable to shutdown the computer completely (It hangs after the kernel reports "Now halt".

Whole dmesg is attached and below is one of the crash message:

[  242.023784] INFO: task khubd:121 blocked for more than 120 seconds.
[  242.023792]       Not tainted 3.13.0 #1
[  242.023794] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.023797] khubd           D ffff880037309b10     0   121      2 0x00000000
[  242.023804]  ffff8800373097d0 0000000000000046 0000000000014240 0000000000014240
[  242.023809]  ffff880036a1ffd8 ffff8800373097d0 ffff8800d75e4088 ffff8800d75e4130
[  242.023813]  ffff8800d75e41f0 0000000000000000 0000000000000000 ffff880036a1fc08
[  242.023818] Call Trace:
[  242.023830]  [<ffffffff81357fc1>] ? rpm_resume+0x191/0x660
[  242.023838]  [<ffffffff81094710>] ? prepare_to_wait_event+0xf0/0xf0
[  242.023843]  [<ffffffff81358575>] ? pm_runtime_forbid+0x35/0x50
[  242.023866]  [<ffffffffa03d2476>] ? mwifiex_usb_resume+0xc6/0x130 [mwifiex_usb]
[  242.023894]  [<ffffffffa0027f5e>] ? usb_resume_interface.isra.5+0x9e/0xd0 [usbcore]
[  242.023909]  [<ffffffffa00281c2>] ? usb_resume_both+0x62/0x120 [usbcore]
[  242.023924]  [<ffffffffa00290a1>] ? usb_runtime_resume+0x11/0x20 [usbcore]
[  242.023928]  [<ffffffff813572d7>] ? __rpm_callback+0x27/0x70
[  242.023933]  [<ffffffff81357378>] ? rpm_callback+0x58/0x80
[  242.023937]  [<ffffffff81358296>] ? rpm_resume+0x466/0x660
[  242.023942]  [<ffffffff813584d3>] ? __pm_runtime_resume+0x43/0x70
[  242.023959]  [<ffffffffa00287e6>] ? usb_autoresume_device+0x16/0x40 [usbcore]
[  242.023974]  [<ffffffffa001bb6d>] ? usb_remote_wakeup.part.29+0xd/0x30 [usbcore]
[  242.023981]  [<ffffffff8149cc59>] ? mutex_lock+0x9/0x25
[  242.023997]  [<ffffffffa002042c>] ? hub_thread+0xf0c/0x1320 [usbcore]
[  242.024003]  [<ffffffff81094710>] ? prepare_to_wait_event+0xf0/0xf0
[  242.024018]  [<ffffffffa001f520>] ? usb_reset_device+0x170/0x170 [usbcore]
[  242.024023]  [<ffffffff8107b361>] ? kthread+0xc1/0xe0
[  242.024027]  [<ffffffff8107b2a0>] ? kthread_create_on_node+0x180/0x180
[  242.024032]  [<ffffffff814a564c>] ? ret_from_fork+0x7c/0xb0
[  242.024036]  [<ffffffff8107b2a0>] ? kthread_create_on_node+0x180/0x180
Comment 1 Jakub Zeman 2014-01-29 13:04:06 UTC
Created attachment 123761 [details]
Whole dmesg
Comment 2 Jakub Zeman 2014-02-04 10:51:19 UTC
In 3.13.1 similar behavior:

[  888.423792] usb 3-3: mwifiex_cmd_timeout_func: Timeout cmd id (1391499915.745699) = 0xa4, act = 0x0
[  888.423830] usb 3-3: num_data_h2c_failure = 0
[  888.423833] usb 3-3: num_cmd_h2c_failure = 0
[  888.423836] usb 3-3: num_cmd_timeout = 1
[  888.423838] usb 3-3: num_tx_timeout = 0
[  888.423841] usb 3-3: last_cmd_index = 1
[  888.423844] usb 3-3: last_cmd_id: 16 00 a4 00 16 00 a4 00 7f 00
[  888.423847] usb 3-3: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[  888.423850] usb 3-3: last_cmd_resp_index = 0
[  888.423852] usb 3-3: last_cmd_resp_id: 16 80 7f 80 16 80 a4 80 7f 80
[  888.423855] usb 3-3: last_event_index = 3
[  888.423857] usb 3-3: last_event: 00 00 33 00 17 00 2b 00 00 00
[  888.423860] usb 3-3: data_sent=0 cmd_sent=0
[  888.423863] usb 3-3: ps_mode=1 ps_state=0
[  888.423869] usb 3-3: cmd timeout
[  888.423911] usb 3-3: failed to get signal information
[  898.444265] usb 3-3: mwifiex_cmd_timeout_func: Timeout cmd id (1391499925.761746) = 0xa4, act = 0x0
[  898.444278] usb 3-3: num_data_h2c_failure = 0
[  898.444282] usb 3-3: num_cmd_h2c_failure = 0
[  898.444286] usb 3-3: num_cmd_timeout = 2
[  898.444290] usb 3-3: num_tx_timeout = 0
[  898.444303] usb 3-3: last_cmd_index = 2
[  898.444305] usb 3-3: last_cmd_id: 16 00 a4 00 a4 00 a4 00 7f 00
[  898.444306] usb 3-3: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[  898.444307] usb 3-3: last_cmd_resp_index = 0
[  898.444308] usb 3-3: last_cmd_resp_id: 16 80 7f 80 16 80 a4 80 7f 80
[  898.444310] usb 3-3: last_event_index = 3
[  898.444311] usb 3-3: last_event: 00 00 33 00 17 00 2b 00 00 00
[  898.444312] usb 3-3: data_sent=0 cmd_sent=1
[  898.444313] usb 3-3: ps_mode=1 ps_state=0
[  898.444320] usb 3-3: cmd timeout
[  898.444330] usb 3-3: failed to get signal information
[  898.444640] ------------[ cut here ]------------
[  898.444674] WARNING: CPU: 3 PID: 4240 at drivers/usb/core/urb.c:336 usb_submit_urb+0x3e2/0x400 [usbcore]()
[  898.444675] URB ffff8800a3c89d80 submitted while active
[  898.444676] Modules linked in: tun btusb mwifiex_usb mwifiex cfg80211 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables parport_pc bnep rfcomm ppdev bluetooth lp rfkill parport binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat fuse snd_hda_codec_hdmi snd_hda_codec_realtek hid_multitouch hid_sensor_magn_3d hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_generic hid_sensor_hub usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media tpm_infineon x86_pkg_temp_thermal intel_powerclamp iTCO_wdt iTCO_vendor_support coretemp kvm_intel evdev kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper pcspkr i2c_i801 snd_hda_intel snd_hda_codec snd_hwdep battery lpc_ich i915 ac mfd_core snd_pcm snd_page_alloc snd_timer tpm_tis tpm snd video drm_kms_helper soundcore button drm mei_me i2c_algo_bit i2c_core mei processor ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common ahci libahci libata scsi_mod xhci_hcd thermal ehci_pci thermal_sys ehci_hcd usbcore usb_common
[  898.444722] CPU: 3 PID: 4240 Comm: kworker/u9:1 Not tainted 3.13.1 #1
[  898.444723] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[  898.444727] Workqueue: MWIFIEX_WORK_QUEUE mwifiex_main_work_queue [mwifiex]
[  898.444738]  0000000000000009 ffffffff81498615 ffff88011780bd08 ffffffff8105b722
[  898.444740]  ffff8800a3c89d80 ffff88011780bd58 0000000000000001 ffff8800a0c08b30
[  898.444742]  ffff880117299340 ffffffff8105b787 ffffffffa0022b58 0000000000000020
[  898.444744] Call Trace:
[  898.444750]  [<ffffffff81498615>] ? dump_stack+0x41/0x51
[  898.444754]  [<ffffffff8105b722>] ? warn_slowpath_common+0x72/0x90
[  898.444757]  [<ffffffff8105b787>] ? warn_slowpath_fmt+0x47/0x50
[  898.444765]  [<ffffffffa0011b02>] ? usb_submit_urb+0x3e2/0x400 [usbcore]
[  898.444767]  [<ffffffffa0792129>] ? mwifiex_usb_host_to_card+0xe9/0x220 [mwifiex_usb]
[  898.444770]  [<ffffffffa07c3b05>] ? mwifiex_exec_next_cmd+0x3b5/0x4c0 [mwifiex]
[  898.444773]  [<ffffffffa07c0762>] ? mwifiex_main_process+0x4d2/0x510 [mwifiex]
[  898.444776]  [<ffffffff810744ad>] ? process_one_work+0x16d/0x420
[  898.444778]  [<ffffffff81075076>] ? worker_thread+0x116/0x3b0
[  898.444780]  [<ffffffff81074f60>] ? rescuer_thread+0x330/0x330
[  898.444782]  [<ffffffff8107b361>] ? kthread+0xc1/0xe0
[  898.444784]  [<ffffffff8107b2a0>] ? kthread_create_on_node+0x180/0x180
[  898.444786]  [<ffffffff814a564c>] ? ret_from_fork+0x7c/0xb0
[  898.444788]  [<ffffffff8107b2a0>] ? kthread_create_on_node+0x180/0x180
[  898.444789] ---[ end trace 54eebdf791ad793a ]---
[  898.444792] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  898.444794] usb 3-3: DNLD_CMD: host to card failed
[  898.444803] usb 3-3: failed to get signal information
[  898.444848] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  898.444850] usb 3-3: DNLD_CMD: host to card failed
[  898.444855] usb 3-3: failed to get signal information
[  903.382528] usb 3-3: data: -ENOSR is returned
[  904.406048] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  904.406055] usb 3-3: DNLD_CMD: host to card failed
[  904.406071] usb 3-3: failed to get signal information
[  904.406137] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  904.406141] usb 3-3: DNLD_CMD: host to card failed
[  904.406146] usb 3-3: failed to get signal information
[  910.407662] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  910.407670] usb 3-3: DNLD_CMD: host to card failed
[  910.407718] usb 3-3: failed to get signal information
[  910.407849] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  910.407870] usb 3-3: DNLD_CMD: host to card failed
[  910.407914] usb 3-3: failed to get signal information
[  916.409877] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  916.409883] usb 3-3: DNLD_CMD: host to card failed
[  916.409895] usb 3-3: failed to get signal information
[  916.409946] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  916.409949] usb 3-3: DNLD_CMD: host to card failed
[  916.409957] usb 3-3: failed to get signal information
[  922.416722] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  922.416739] usb 3-3: DNLD_CMD: host to card failed
[  922.416790] usb 3-3: failed to get signal information
[  922.416886] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  922.416889] usb 3-3: DNLD_CMD: host to card failed
[  922.416925] usb 3-3: failed to get signal information
[  928.419941] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  928.419948] usb 3-3: DNLD_CMD: host to card failed
[  928.419996] usb 3-3: failed to get signal information
[  928.420122] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  928.420138] usb 3-3: DNLD_CMD: host to card failed
[  928.420187] usb 3-3: failed to get signal information
[  934.423276] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  934.423285] usb 3-3: DNLD_CMD: host to card failed
[  934.423340] usb 3-3: failed to get signal information
[  934.423494] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  934.423501] usb 3-3: DNLD_CMD: host to card failed
[  934.423551] usb 3-3: failed to get signal information
[  940.419875] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  940.419882] usb 3-3: DNLD_CMD: host to card failed
[  940.419909] usb 3-3: failed to get signal information
[  940.420022] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  940.420032] usb 3-3: DNLD_CMD: host to card failed
[  940.420052] usb 3-3: failed to get signal information
[  946.423251] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  946.423258] usb 3-3: DNLD_CMD: host to card failed
[  946.423273] usb 3-3: failed to get signal information
[  946.423321] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  946.423325] usb 3-3: DNLD_CMD: host to card failed
[  946.423335] usb 3-3: failed to get signal information
[  952.431144] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  952.431148] usb 3-3: DNLD_CMD: host to card failed
[  952.431154] usb 3-3: failed to get signal information
[  952.431179] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  952.431181] usb 3-3: DNLD_CMD: host to card failed
[  952.431187] usb 3-3: failed to get signal information
[  958.433227] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  958.433236] usb 3-3: DNLD_CMD: host to card failed
[  958.433323] usb 3-3: failed to get signal information
[  958.433469] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  958.433476] usb 3-3: DNLD_CMD: host to card failed
[  958.433543] usb 3-3: failed to get signal information
[  964.436367] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  964.436372] usb 3-3: DNLD_CMD: host to card failed
[  964.436380] usb 3-3: failed to get signal information
[  964.436407] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  964.436410] usb 3-3: DNLD_CMD: host to card failed
[  964.436415] usb 3-3: failed to get signal information
[  970.438467] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  970.438476] usb 3-3: DNLD_CMD: host to card failed
[  970.438527] usb 3-3: failed to get signal information
[  970.438684] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  970.438691] usb 3-3: DNLD_CMD: host to card failed
[  970.438738] usb 3-3: failed to get signal information
[  971.435064] usb 3-3: mwifiex_usb_host_to_card: usb_submit_urb failed
[  971.435069] usb 3-3: DNLD_CMD: host to card failed
Comment 3 Jakub Zeman 2014-02-04 14:48:27 UTC
During my usage of ms surface pro 1 with linux I found out that latest kenel releases have problems when WiFi is going to some energy saving mode. Because Marvel 88W8797 doesn't work well I decided to buy another WiFi USB adapter. I chose Asus USB N-10 adapter.

The Asus adapter has the same problems like Marvel 88W8797 but it doesn't report it to any log. When I'm working with WiFi connection (downloading something or if I have at least one SSH connection up) the adapter works well. When I shut down all connection the adapter stops working after some time (similar to Marvel 88W8797) but dmesg shows no errors.

Isn't there something wrong inside the "energy saving" subsytem (if there is something like that)?
Comment 4 Bing Zhao 2014-02-04 23:57:54 UTC
@Comment 2: could you enable dynamic debug and collect the logs for the timeout issue? Thanks!

set CONFIG_DYNAMIC_DEBUG=y in kernel .config and re-compile.
after reboot, run these commands to turn on dynamic debug for mwifiex modules:

echo "module mwifiex +p" > /sys/kernel/debug/dynamic_debug/control
echo "module mwifiex_usb +p" > /sys/kernel/debug/dynamic_debug/control

And, to confirm, you've removed the line
.supports_autosuspend = 1,
from mwifiex/usb.c, correct?
Comment 5 Jakub Zeman 2014-02-05 10:42:19 UTC
Than you for reply. I enabled CONFIG_DYNAMIC_DEBUG=y and removed .supports_autosuspend = 1, from struct usb_driver mwifiex_usb_driver

My kernel config file is attached. I'll let you know if it helps or not.
Comment 6 Jakub Zeman 2014-02-05 10:43:09 UTC
Created attachment 124631 [details]
config file with CONFIG_DYNAMIC_DEBUG enabled
Comment 7 Jakub Zeman 2014-02-06 19:21:24 UTC
Created attachment 124901 [details]
dmesg with CONFIG_DYNAMIC_DEBUG enabled
Comment 8 Jakub Zeman 2014-02-06 19:23:15 UTC
I attached dmesg with CONFIG_DYNAMIC_DEBUG. Did it help?
Comment 9 Bing Zhao 2014-02-07 20:44:49 UTC
Thanks Jakub for testing. It helps. I've reproduced once the tx timeout with your .config file from Comment #6. It was not easy for me to replicate the issue though.
How often does the tx timeout happen to you?
Comment 10 Jakub Zeman 2014-02-08 01:56:23 UTC
It depends where I'm using the WiFi. At home I have Mikrotik access point with CM9 miniPCI card and when I'm connected to it doesn't happen.

But when I'm on some public access points (like Starbucks, McDonald, ...) it happens very often. But I cannot say what type of access point they are using. Last week I was connected to some access point which was created using Windows 7 "Microsoft Virtual WiFi Miniport Adapter" and it was almost unable to work with - the issue happened every 5 minutes.

I was trying it with removed ".supports_autosuspend = 1" and it has no impact to it.
Comment 11 Reyad Attiyat 2014-02-08 22:26:08 UTC
*** Bug 64111 has been marked as a duplicate of this bug. ***
Comment 12 Reyad Attiyat 2014-02-08 22:37:49 UTC
@Jakub
I also experience this same error on the surface pro 2. I have tried different access points as well and have not found out that won't crash the driver eventually.

I noticed you mentioned that this error only recently came up on newer kernel versions. Do you know whcih kernel version was working with wifi? Does this happen with the older firmware too?

You also mentioned that when driver crashes you cannot shutdown or reboot safely. This also happened to my machine with 3.13 but doesn't happen when I upgraded to the wireless-testing branch.

@Bing

You mentioned that you could only reproduce the error when you changed your kernel config. Did you do a diff between the kerenl configs? Any ideas on what configuration would cause this.

I have been reading the logs and the driver and was trying to better understand the crash. It seems its trying to dequeue data from the device but times out. Is this gonna need another firmware update or is this more of a kernel error. If it happens with different kernel configs do you believe it is kernel code related.
Comment 13 Jakub Zeman 2014-02-08 23:53:51 UTC
You can find my answers in the text below.

(In reply to Reyad Attiyat from comment #12)
> @Jakub
> I noticed you mentioned that this error only recently came up on newer
> kernel versions. Do you know whcih kernel version was working with wifi?
> Does this happen with the older firmware too?

I mentioned newer kernels because from my point of view kernels 3.10, 3.12 and 3.13 (including wireless-testing branch) doesn't work on my surface pro well with USB WiFi devices. I tried build in Marvel device and Asus USB N-10 device. Both devices stop working practically in the same way as I described in my previous comments. The only difference is that Asus USB N-10 reports no error messages to dmesg.

Unlike with 3.0.X and 3.2.X - these kernels work with Asus USB N-10 well but I cannot use them because they have problems with stability on my SF Pro (the kernel get stuck often).

> 
> You also mentioned that when driver crashes you cannot shutdown or reboot
> safely. This also happened to my machine with 3.13 but doesn't happen when I
> upgraded to the wireless-testing branch.

That might be true - but I don't use wireless-testing branch because the kernel get stuck with my SF Pro quite often :-) so I don't remember If the shutdown worked well with it or not

When I summarize it:
- I'm using latest kernel (3.13.1 now)
- I bought USB Ethernet adapter and when I need to work with network without problems I disable Wifi and I use USB Ethernet adapter.
- I'm praying for Bing would find the cause of the problem and posted patch asap :)
Comment 14 Reyad Attiyat 2014-02-10 22:21:35 UTC
> I mentioned newer kernels because from my point of view kernels 3.10, 3.12
> and 3.13 (including wireless-testing branch) doesn't work on my surface pro
> well with USB WiFi devices. I tried build in Marvel device and Asus USB N-10
> device. Both devices stop working practically in the same way as I described
> in my previous comments. The only difference is that Asus USB N-10 reports
> no error messages to dmesg.
> 
> Unlike with 3.0.X and 3.2.X - these kernels work with Asus USB N-10 well but
> I cannot use them because they have problems with stability on my SF Pro
> (the kernel get stuck often).
> 

Your devices seems to fair worse with kernel errors than mine does. I only rarely experience the tx timeout issue. When you mention your device gets stuck is this only because of mwifiex_usb crashing or do you have other kernel crashes?

You said that it worked perfectly on your home access point. Is it 802.11g or 802.11n?

I have been playing around with some setting to try and improve reliability. I'm using an 802.11g access point that is basically a linux box with a wirless nic in AP mode. I have disabled WMM in the mwifiex driver and so far haven't experienced the tx timeout. I'm prolly just getting lucky thou.
Comment 15 Bing Zhao 2014-02-11 04:58:14 UTC
@Comment #12,
> You mentioned that you could only reproduce the error when you changed your
> kernel config. 
> Did you do a diff between the kerenl configs? 
> Any ideas on what configuration would cause this.

I did a diff between the two kernel .config files, the small difference is niether USB nor wireless related. So, it doesn't convince me that Jakub's .config file made it happen. Later I did another test using Jakub's config file, but this time it ran two days without timeout.

@Comment #14, 
> I have disabled WMM in the mwifiex driver and so far haven't experienced the
> tx timeout. I'm prolly just getting lucky thou.

Have you got any problem later with this change?
Comment 16 Zwane Mwaikambo 2014-02-11 06:05:50 UTC
@Bing Using 3.13.2, new firmware and the attached patches, i have not run into any issues yet.

patch-surface_no_autosuspend
Comment 17 Zwane Mwaikambo 2014-02-11 06:06:41 UTC
Created attachment 125561 [details]
Disable USB autosuspend on Wifi + BT
Comment 18 Bing Zhao 2014-02-11 06:35:08 UTC
Thanks Zwane for the information and the patch.
Could you confirm that you must apply the patch-sufarce_no_autosuspend in kernel 3.13.2 in order to get rid of the timeouts?
Furthermore, do you have to change .supports_autosuspend = 0 in btusb.c to avoid the mwifiex timeout?
Comment 19 Zwane Mwaikambo 2014-02-11 06:39:29 UTC
Bing, good point, originally i needed both, but i'm going to re-test with that patch backed out.
Comment 20 Zwane Mwaikambo 2014-02-11 07:49:57 UTC
I backed out patch-surface_no_autosuspend, booted and when i ran powertop(1) i got the following;

[  242.528724] INFO: task khubd:43 blocked for more than 120 seconds.
[  242.528731]       Tainted: GF         IO 3.13.2-s2pro+ #1
[  242.528733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.528736] khubd           D ffff88021fa130c0     0    43      2 0x00000000
[  242.528742]  ffff880213d37b50 0000000000000046 ffff8802142ec7d0 00000000000130c0
[  242.528747]  ffff880213d37fd8 00000000000130c0 ffff8801f3542fe0 ffff8802142ec7d0
[  242.528752]  ffff880208a5792e ffff880208a57888 ffff880208a579e8 0000000000000000
[  242.528757] Call Trace:
[  242.528767]  [<ffffffff816cca19>] schedule+0x29/0x70
[  242.528773]  [<ffffffff81472419>] rpm_resume+0x179/0x630
[  242.528779]  [<ffffffff8108a840>] ? finish_wait+0x80/0x80
[  242.528784]  [<ffffffff814738ee>] pm_runtime_forbid+0x3e/0x50
[  242.528790]  [<ffffffff815128f5>] usb_disable_autosuspend+0x15/0x20
[  242.528799]  [<ffffffffa0427f05>] mwifiex_usb_resume+0xd5/0x160 [mwifiex_usb]
[  242.528804]  [<ffffffff81512ede>] usb_resume_interface.isra.5+0xae/0x130
[  242.528809]  [<ffffffff81512fcb>] usb_resume_both+0x6b/0x150
[  242.528813]  [<ffffffff8151451a>] usb_runtime_resume+0x1a/0x20
[  242.528818]  [<ffffffff814721c0>] __rpm_callback+0x30/0x80
[  242.528822]  [<ffffffff81472283>] rpm_callback+0x73/0x90
[  242.528826]  [<ffffffff814726d4>] rpm_resume+0x434/0x630
[  242.528831]  [<ffffffff816cbb94>] ? schedule_timeout+0x164/0x2b0
[  242.528835]  [<ffffffff81473b2c>] __pm_runtime_resume+0x5c/0x90
[  242.528840]  [<ffffffff81513b69>] usb_autoresume_device+0x29/0x60
[  242.528844]  [<ffffffff81507151>] usb_remote_wakeup+0x31/0x70
[  242.528850]  [<ffffffff816ce912>] ? mutex_lock+0x12/0x2f
[  242.528853]  [<ffffffff815097b1>] hub_thread+0x941/0x1610
[  242.528858]  [<ffffffff8108a840>] ? finish_wait+0x80/0x80
[  242.528862]  [<ffffffff81508e70>] ? usb_port_resume+0x5f0/0x5f0
[  242.528867]  [<ffffffff8106bb79>] kthread+0xc9/0xe0
[  242.528871]  [<ffffffff8106bab0>] ? flush_kthread_worker+0x80/0x80
[  242.528878]  [<ffffffff816d81ac>] ret_from_fork+0x7c/0xb0
[  242.528882]  [<ffffffff8106bab0>] ? flush_kthread_worker+0x80/0x80
[  352.393786] audit_printk_skb: 174 callbacks suppressed
Comment 21 Zwane Mwaikambo 2014-02-11 08:04:32 UTC
I got a similar backtrace on a subsequent reboot without even running powertop.
Comment 22 Reyad Attiyat 2014-02-12 17:59:24 UTC
@Bing
> @Comment #14, 
> > I have disabled WMM in the mwifiex driver and so far haven't experienced
> the tx timeout. I'm prolly just getting lucky thou.
> 
> Have you got any problem later with this change?

I did encounter a tx timeout with wmm disabled. I was able to get an output from debugfs after the crash but only from the debug file and not getlog or info.

I notice num_cmd_h2c_fail=0x17 does this mean its a firmware crash or possibly usb driver. From my understanding of the README, h2c_fail is host to card failures and I'm guessing that its try to send the tx_pedning packets.

I'll post a before and after of the output below.
Comment 23 Reyad Attiyat 2014-02-12 17:59:59 UTC
Created attachment 125791 [details]
debugfs output before/after
Comment 24 Bing Zhao 2014-02-13 03:56:16 UTC
@Zwane, thanks for testing. Any chance you could try without the change in btusb.c, meaning .supports_autosuspend = 1 in btusb.c, but other changes in patch-sufarce_no_autosuspend still applied?
I'm asking this because btusb.c could be used by other devices which need .supports_autosuspend = 1.

@Reyad, yes, h2c_fail ("debug" output from debugfs) is host_to_card failure. It tells us there had 17 command failures happened before the TX data timeout. Next time when you get this problem please save a dmesg and/or /var/log/messages (or other files that save kernel/driver logs). The h2c_fail is most likely caused by 8797 driver/firmware handshake failure. The debugfs "info" and "getlog" fail because they try to send commands to firmware while driver & firmware are already out of sync. It could be usb core driver too, but usually we should see some errors reported by usb driver.
Comment 25 Zwane Mwaikambo 2014-02-15 04:18:33 UTC
(In reply to Bing Zhao from comment #24)
> @Zwane, thanks for testing. Any chance you could try without the change in
> btusb.c, meaning .supports_autosuspend = 1 in btusb.c, but other changes in
> patch-sufarce_no_autosuspend still applied?
> I'm asking this because btusb.c could be used by other devices which need
> .supports_autosuspend = 1.

I'll retest with 3.13.2, when i tested with 3.13.0-rc4 i got the timeouts. Thanks for your efforts!
Comment 26 Jakub Zeman 2014-02-16 19:05:23 UTC
I'm using 3.13.3 with removed ".supports_autosuspend = 1" from drivers/net/wireless/mwifiex/usb.c for last 4 days. I was connected via wireless only and the kernel seems fully stable all the time.

So I would say (but I'm not kernel expert) that the problem isn't in the driver or firmware but in the way how kernel handles suspend or power save of wireless devices.
Comment 27 Zwane Mwaikambo 2014-02-22 22:47:47 UTC
(In reply to Zwane Mwaikambo from comment #25)
> (In reply to Bing Zhao from comment #24)
> > @Zwane, thanks for testing. Any chance you could try without the change in
> > btusb.c, meaning .supports_autosuspend = 1 in btusb.c, but other changes in
> > patch-sufarce_no_autosuspend still applied?
> > I'm asking this because btusb.c could be used by other devices which need
> > .supports_autosuspend = 1.
> 
> I'll retest with 3.13.2, when i tested with 3.13.0-rc4 i got the timeouts.
> Thanks for your efforts!

I retested without the btusb change and it becomes less reliable coming out of suspend. Even with the changes it doesn't reliably wakeup.
Comment 28 Jakub Zeman 2014-02-27 17:28:41 UTC
After few weeks I have new crash even with removed ".supports_autosuspend = 1" from drivers/net/wireless/mwifiex/usb.c

[  135.640436] usb 3-3: data: -ENOSR is returned
[  170.234985] usb 3-3: data: -ENOSR is returned
[  836.541246] ------------[ cut here ]------------
[  836.541276] WARNING: CPU: 0 PID: 159 at net/wireless/sme.c:662 __cfg80211_connect_result+0x3db/0x420 [cfg80211]()
[  836.541278] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev bnep rfcomm lp parport btusb bluetooth binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat fuse hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_generic hid_sensor_hub hid_multitouch usbhid hid iTCO_wdt iTCO_vendor_support mwifiex_usb mwifiex cfg80211 uvcvideo videobuf2_vmalloc rfkill videobuf2_memops videobuf2_core videodev media tpm_infineon x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul crc32c_intel evdev ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper pcspkr lpc_ich mfd_core i2c_i801 i915 snd_hda_intel ac drm_kms_helper snd_hda_codec drm snd_hwdep video i2c_algo_bit battery i2c_core snd_pcm tpm_tis snd_page_alloc mei_me snd_timer processor tpm mei button snd soundcore ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common thermal thermal_sys ahci libahci ehci_pci xhci_hcd ehci_hcd libata scsi_mod usbcore usb_common
[  836.541366] CPU: 0 PID: 159 Comm: kworker/u8:5 Not tainted 3.13.3 #1
[  836.541369] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[  836.541377] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[  836.541379]  0000000000000009 ffffffff814a1ce0 0000000000000000 ffffffff8105b7a2
[  836.541384]  0000000000000000 ffff8800373c3dc0 0000000000000000 ffff8800d81db298
[  836.541387]  ffff8800d80a6400 ffffffffa052a7cb 0000000000000001 0000000000000000
[  836.541391] Call Trace:
[  836.541400]  [<ffffffff814a1ce0>] ? dump_stack+0x41/0x51
[  836.541406]  [<ffffffff8105b7a2>] ? warn_slowpath_common+0x72/0x90
[  836.541416]  [<ffffffffa052a7cb>] ? __cfg80211_connect_result+0x3db/0x420 [cfg80211]
[  836.541426]  [<ffffffffa050a2dd>] ? cfg80211_process_wdev_events+0x17d/0x1b0 [cfg80211]
[  836.541435]  [<ffffffffa050a348>] ? cfg80211_process_rdev_events+0x38/0x90 [cfg80211]
[  836.541442]  [<ffffffffa0506015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[  836.541448]  [<ffffffff8107452d>] ? process_one_work+0x16d/0x420
[  836.541451]  [<ffffffff810750f6>] ? worker_thread+0x116/0x3b0
[  836.541455]  [<ffffffff81074fe0>] ? rescuer_thread+0x330/0x330
[  836.541459]  [<ffffffff8107b4b1>] ? kthread+0xc1/0xe0
[  836.541462]  [<ffffffff8107b3f0>] ? kthread_create_on_node+0x180/0x180
[  836.541467]  [<ffffffff814aedcc>] ? ret_from_fork+0x7c/0xb0
[  836.541470]  [<ffffffff8107b3f0>] ? kthread_create_on_node+0x180/0x180
[  836.541473] ---[ end trace 5293a7a94377ed30 ]---
[  836.541519] cfg80211: Calling CRDA to update world regulatory domain
[  836.547387] cfg80211: World regulatory domain updated:
[  836.547393] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  836.547397] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  836.547401] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  836.547404] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[  836.547407] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  836.547410] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 1388.212931] usb 3-3: data: -ENOSR is returned
[ 1393.275121] usb 3-3: data: -ENOSR is returned
[ 1410.665526] usb 3-3: data: -ENOSR is returned
[ 1797.447844] usb 3-3: data: -ENOSR is returned
[ 1797.447858] usb 3-3: data: -ENOSR is returned
[ 1797.447900] usb 3-3: data: -ENOSR is returned
[ 1797.447957] usb 3-3: data: -ENOSR is returned
[ 1797.447981] usb 3-3: data: -ENOSR is returned
[ 1797.448025] usb 3-3: data: -ENOSR is returned
[ 1797.472830] ------------[ cut here ]------------
[ 1797.472859] WARNING: CPU: 0 PID: 159 at net/wireless/sme.c:662 __cfg80211_connect_result+0x3db/0x420 [cfg80211]()
[ 1797.472860] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev bnep rfcomm lp parport btusb bluetooth binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat fuse hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_generic hid_sensor_hub hid_multitouch usbhid hid iTCO_wdt iTCO_vendor_support mwifiex_usb mwifiex cfg80211 uvcvideo videobuf2_vmalloc rfkill videobuf2_memops videobuf2_core videodev media tpm_infineon x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul crc32c_intel evdev ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper pcspkr lpc_ich mfd_core i2c_i801 i915 snd_hda_intel ac drm_kms_helper snd_hda_codec drm snd_hwdep video i2c_algo_bit battery i2c_core snd_pcm tpm_tis snd_page_alloc mei_me snd_timer processor tpm mei button snd soundcore ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common thermal thermal_sys ahci libahci ehci_pci xhci_hcd ehci_hcd libata scsi_mod usbcore usb_common
[ 1797.472915] CPU: 0 PID: 159 Comm: kworker/u8:5 Tainted: G        W    3.13.3 #1
[ 1797.472917] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[ 1797.472922] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[ 1797.472924]  0000000000000009 ffffffff814a1ce0 0000000000000000 ffffffff8105b7a2
[ 1797.472927]  0000000000000000 ffff8800373c3dc0 0000000000000000 ffff8800acd67e18
[ 1797.472929]  ffff8800d80a6400 ffffffffa052a7cb 0000000000000001 0000000000000000
[ 1797.472931] Call Trace:
[ 1797.472937]  [<ffffffff814a1ce0>] ? dump_stack+0x41/0x51
[ 1797.472941]  [<ffffffff8105b7a2>] ? warn_slowpath_common+0x72/0x90
[ 1797.472947]  [<ffffffffa052a7cb>] ? __cfg80211_connect_result+0x3db/0x420 [cfg80211]
[ 1797.472954]  [<ffffffffa050a2dd>] ? cfg80211_process_wdev_events+0x17d/0x1b0 [cfg80211]
[ 1797.472959]  [<ffffffffa050a348>] ? cfg80211_process_rdev_events+0x38/0x90 [cfg80211]
[ 1797.472964]  [<ffffffffa0506015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[ 1797.472967]  [<ffffffff8107452d>] ? process_one_work+0x16d/0x420
[ 1797.472969]  [<ffffffff810750f6>] ? worker_thread+0x116/0x3b0
[ 1797.472971]  [<ffffffff81074fe0>] ? rescuer_thread+0x330/0x330
[ 1797.472974]  [<ffffffff8107b4b1>] ? kthread+0xc1/0xe0
[ 1797.472976]  [<ffffffff8107b3f0>] ? kthread_create_on_node+0x180/0x180
[ 1797.472979]  [<ffffffff814aedcc>] ? ret_from_fork+0x7c/0xb0
[ 1797.472981]  [<ffffffff8107b3f0>] ? kthread_create_on_node+0x180/0x180
[ 1797.472982] ---[ end trace 5293a7a94377ed31 ]---
[ 1797.473008] cfg80211: Calling CRDA to update world regulatory domain
[ 1797.476680] cfg80211: World regulatory domain updated:
[ 1797.476682] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 1797.476695] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 1797.476696] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 1797.476697] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[ 1797.476698] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 1797.476700] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 1931.418080] usb 3-3: data: -ENOSR is returned
[ 2037.566631] ------------[ cut here ]------------
[ 2037.566650] WARNING: CPU: 1 PID: 159 at net/wireless/sme.c:662 __cfg80211_connect_result+0x3db/0x420 [cfg80211]()
[ 2037.566651] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev bnep rfcomm lp parport btusb bluetooth binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat fuse hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_generic hid_sensor_hub hid_multitouch usbhid hid iTCO_wdt iTCO_vendor_support mwifiex_usb mwifiex cfg80211 uvcvideo videobuf2_vmalloc rfkill videobuf2_memops videobuf2_core videodev media tpm_infineon x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul crc32c_intel evdev ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper pcspkr lpc_ich mfd_core i2c_i801 i915 snd_hda_intel ac drm_kms_helper snd_hda_codec drm snd_hwdep video i2c_algo_bit battery i2c_core snd_pcm tpm_tis snd_page_alloc mei_me snd_timer processor tpm mei button snd soundcore ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common thermal thermal_sys ahci libahci ehci_pci xhci_hcd ehci_hcd libata scsi_mod usbcore usb_common
[ 2037.566707] CPU: 1 PID: 159 Comm: kworker/u8:5 Tainted: G        W    3.13.3 #1
[ 2037.566708] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[ 2037.566714] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[ 2037.566715]  0000000000000009 ffffffff814a1ce0 0000000000000000 ffffffff8105b7a2
[ 2037.566718]  0000000000000000 ffff8800373c3dc0 0000000000000000 ffff8800d2d58598
[ 2037.566720]  ffff8800d80a6400 ffffffffa052a7cb 0000000000000001 0000000000000000
[ 2037.566722] Call Trace:
[ 2037.566727]  [<ffffffff814a1ce0>] ? dump_stack+0x41/0x51
[ 2037.566731]  [<ffffffff8105b7a2>] ? warn_slowpath_common+0x72/0x90
[ 2037.566737]  [<ffffffffa052a7cb>] ? __cfg80211_connect_result+0x3db/0x420 [cfg80211]
[ 2037.566743]  [<ffffffffa050a2dd>] ? cfg80211_process_wdev_events+0x17d/0x1b0 [cfg80211]
[ 2037.566748]  [<ffffffffa050a348>] ? cfg80211_process_rdev_events+0x38/0x90 [cfg80211]
[ 2037.566753]  [<ffffffffa0506015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[ 2037.566756]  [<ffffffff8107452d>] ? process_one_work+0x16d/0x420
[ 2037.566759]  [<ffffffff810750f6>] ? worker_thread+0x116/0x3b0
[ 2037.566761]  [<ffffffff81074fe0>] ? rescuer_thread+0x330/0x330
[ 2037.566763]  [<ffffffff8107b4b1>] ? kthread+0xc1/0xe0
[ 2037.566765]  [<ffffffff8107b3f0>] ? kthread_create_on_node+0x180/0x180
[ 2037.566768]  [<ffffffff814aedcc>] ? ret_from_fork+0x7c/0xb0
[ 2037.566770]  [<ffffffff8107b3f0>] ? kthread_create_on_node+0x180/0x180
[ 2037.566771] ---[ end trace 5293a7a94377ed32 ]---
[ 2037.566801] cfg80211: Calling CRDA to update world regulatory domain
[ 2037.570580] cfg80211: World regulatory domain updated:
[ 2037.570583] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[ 2037.570585] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 2037.570586] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 2037.570588] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[ 2037.570589] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 2037.570590] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[ 2052.082963] usb 3-3: data: -ENOSR is returned
[ 2058.570521] usb 3-3: mwifiex_cmd_timeout_func: Timeout cmd id (1393488412.325014) = 0xce, act = 0xf4db
[ 2058.570540] usb 3-3: num_data_h2c_failure = 0
[ 2058.570542] usb 3-3: num_cmd_h2c_failure = 0
[ 2058.570545] usb 3-3: num_cmd_timeout = 1
[ 2058.570547] usb 3-3: num_tx_timeout = 0
[ 2058.570549] usb 3-3: last_cmd_index = 2
[ 2058.570552] usb 3-3: last_cmd_id: 28 00 5b 00 ce 00 28 00 28 00
[ 2058.570554] usb 3-3: last_cmd_act: 13 00 01 00 db f4 13 00 13 00
[ 2058.570557] usb 3-3: last_cmd_resp_index = 1
[ 2058.570559] usb 3-3: last_cmd_resp_id: 28 80 5b 80 28 80 28 80 28 80
[ 2058.570561] usb 3-3: last_event_index = 2
[ 2058.570563] usb 3-3: last_event: 33 00 17 00 2b 00 17 00 2b 00
[ 2058.570566] usb 3-3: data_sent=1 cmd_sent=0
[ 2058.570568] usb 3-3: ps_mode=1 ps_state=0
[ 2182.736504] ------------[ cut here ]------------
[ 2182.736512] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x226/0x230()
[ 2182.736514] NETDEV WATCHDOG: mlan0 (usb): transmit queue 1 timed out
[ 2182.736515] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev bnep rfcomm lp parport btusb bluetooth binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat fuse hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_generic hid_sensor_hub hid_multitouch usbhid hid iTCO_wdt iTCO_vendor_support mwifiex_usb mwifiex cfg80211 uvcvideo videobuf2_vmalloc rfkill videobuf2_memops videobuf2_core videodev media tpm_infineon x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul crc32c_intel evdev ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper pcspkr lpc_ich mfd_core i2c_i801 i915 snd_hda_intel ac drm_kms_helper snd_hda_codec drm snd_hwdep video i2c_algo_bit battery i2c_core snd_pcm tpm_tis snd_page_alloc mei_me snd_timer processor tpm mei button snd soundcore ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common thermal thermal_sys ahci libahci ehci_pci xhci_hcd ehci_hcd libata scsi_mod usbcore usb_common
[ 2182.736573] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W    3.13.3 #1
[ 2182.736574] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[ 2182.736576]  0000000000000009 ffffffff814a1ce0 ffff88011f303e20 ffffffff8105b7a2
[ 2182.736578]  0000000000000001 ffff88011f303e70 0000000000000004 0000000000000002
[ 2182.736580]  ffff8800daa3b000 ffffffff8105b807 ffffffff8174de20 ffffffff00000030
[ 2182.736582] Call Trace:
[ 2182.736584]  <IRQ>  [<ffffffff814a1ce0>] ? dump_stack+0x41/0x51
[ 2182.736591]  [<ffffffff8105b7a2>] ? warn_slowpath_common+0x72/0x90
[ 2182.736594]  [<ffffffff8105b807>] ? warn_slowpath_fmt+0x47/0x50
[ 2182.736599]  [<ffffffff813dce86>] ? dev_watchdog+0x226/0x230
[ 2182.736602]  [<ffffffff813dcc60>] ? dev_graft_qdisc+0x70/0x70
[ 2182.736604]  [<ffffffff810669bc>] ? call_timer_fn+0x2c/0x100
[ 2182.736606]  [<ffffffff813dcc60>] ? dev_graft_qdisc+0x70/0x70
[ 2182.736608]  [<ffffffff810677f9>] ? run_timer_softirq+0x1f9/0x2b0
[ 2182.736610]  [<ffffffff81060502>] ? __do_softirq+0xe2/0x260
[ 2182.736613]  [<ffffffff81060935>] ? irq_exit+0x95/0xa0
[ 2182.736616]  [<ffffffff8103e5cb>] ? smp_apic_timer_interrupt+0x3b/0x50
[ 2182.736619]  [<ffffffff814afa5d>] ? apic_timer_interrupt+0x6d/0x80
[ 2182.736620]  <EOI>  [<ffffffff813896ad>] ? cpuidle_enter_state+0x4d/0xc0
[ 2182.736624]  [<ffffffff813896a3>] ? cpuidle_enter_state+0x43/0xc0
[ 2182.736627]  [<ffffffff813897c9>] ? cpuidle_idle_call+0xa9/0x1d0
[ 2182.736630]  [<ffffffff8101ad85>] ? arch_cpu_idle+0x5/0x30
[ 2182.736634]  [<ffffffff810a51fe>] ? cpu_startup_entry+0xbe/0x280
[ 2182.736637]  [<ffffffff8103c56a>] ? start_secondary+0x20a/0x2b0
[ 2182.736638] ---[ end trace 5293a7a94377ed33 ]---
[ 2182.736642] usb 3-3: 4295437748 : Tx timeout(#1), bss_type-num = 0-0
[ 2192.740919] usb 3-3: 4295440248 : Tx timeout(#2), bss_type-num = 0-0
[ 2202.745170] usb 3-3: 4295442748 : Tx timeout(#3), bss_type-num = 0-0
[ 2212.749523] usb 3-3: 4295445248 : Tx timeout(#4), bss_type-num = 0-0
[ 2222.753844] usb 3-3: 4295447748 : Tx timeout(#5), bss_type-num = 0-0
[ 2232.758214] usb 3-3: 4295450248 : Tx timeout(#6), bss_type-num = 0-0
[ 2242.762533] usb 3-3: 4295452748 : Tx timeout(#7), bss_type-num = 0-0
[ 2252.766839] usb 3-3: 4295455248 : Tx timeout(#8), bss_type-num = 0-0
[ 2262.771155] usb 3-3: 4295457748 : Tx timeout(#9), bss_type-num = 0-0
[ 2272.775406] usb 3-3: 4295460248 : Tx timeout(#10), bss_type-num = 0-0
[ 2282.779707] usb 3-3: 4295462748 : Tx timeout(#11), bss_type-num = 0-0
[ 2292.783988] usb 3-3: 4295465248 : Tx timeout(#12), bss_type-num = 0-0
[ 2302.788323] usb 3-3: 4295467748 : Tx timeout(#13), bss_type-num = 0-0
[ 2312.792621] usb 3-3: 4295470248 : Tx timeout(#14), bss_type-num = 0-0
[ 2322.796988] usb 3-3: 4295472748 : Tx timeout(#15), bss_type-num = 0-0
[ 2332.801279] usb 3-3: 4295475248 : Tx timeout(#16), bss_type-num = 0-0
[ 2342.805482] usb 3-3: 4295477748 : Tx timeout(#17), bss_type-num = 0-0
[ 2352.809840] usb 3-3: 4295480248 : Tx timeout(#18), bss_type-num = 0-0
[ 2362.814148] usb 3-3: 4295482748 : Tx timeout(#19), bss_type-num = 0-0
[ 2372.818447] usb 3-3: 4295485248 : Tx timeout(#20), bss_type-num = 0-0
[ 2382.822730] usb 3-3: 4295487748 : Tx timeout(#21), bss_type-num = 0-0
[ 2392.827000] usb 3-3: 4295490248 : Tx timeout(#22), bss_type-num = 0-0
[ 2402.831271] usb 3-3: 4295492748 : Tx timeout(#23), bss_type-num = 0-0
[ 2412.835566] usb 3-3: 4295495248 : Tx timeout(#24), bss_type-num = 0-0
[ 2422.839773] usb 3-3: 4295497748 : Tx timeout(#25), bss_type-num = 0-0
[ 2432.844130] usb 3-3: 4295500248 : Tx timeout(#26), bss_type-num = 0-0
Comment 29 perpe 2014-03-05 18:54:43 UTC
On the Surface Pro 2 with Ubuntu 13.10 + mainline kernel the bug is still there, although the usb interfaces are more stable than with kernel 3.13. I also use the updated firmware that Bing Zhao had submitted.

The same bug was in the btmrvl module for the sdio variant and was fixed with this commit http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/bluetooth/btmrvl_main.c?id=ea05fea9042620ac3b8ab9a3e5e4d2ed80c89244 (see also https://dev.laptop.org/ticket/12657)
Unfortunately there isn't a bt module for the usb variant of the marvell bt/wifi combo devices, this means we have to use the generic btusb module, which gets into the same kthread race like the btmrvl module before that commit.
Comment 30 Jakub Zeman 2014-05-05 07:13:27 UTC
Hi all,

I was trying kernels 3.13, 3.14 and 3.15 and the problem is still there. See my dmesg from 3.15.0 RC3:

[  808.530054] ------------[ cut here ]------------
[  808.530097] WARNING: CPU: 0 PID: 171 at net/wireless/sme.c:654 __cfg80211_connect_result+0x3b6/0x400 [cfg80211]()
[  808.530100] Modules linked in: tun nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ecb btusb nls_utf8 nls_cp437 vfat fat fuse hidp bluetooth parport_pc ppdev lp parport snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support hid_multitouch hid_sensor_als hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_sensor_hub hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media usb_storage tpm_infineon mwifiex_usb mwifiex cfg80211 rfkill x86_pkg_temp_thermal intel_powerclamp intel_rapl lpc_ich mfd_core coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel evdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr i2c_i801 snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i915 snd_seq snd_seq_device battery snd_timer ac video drm_kms_helper tpm_tis snd drm i2c_algo_bit tpm i2c_core mei_me soundcore mei button processor ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common crc32c_intel thermal thermal_sys ahci libahci libata scsi_mod xhci_hcd ehci_pci ehci_hcd usbcore usb_common
[  808.530254] CPU: 0 PID: 171 Comm: kworker/u8:5 Not tainted 3.15.0-rc3 #1
[  808.530258] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[  808.530272] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[  808.530276]  0000000000000009 ffffffff814b0558 0000000000000000 ffffffff8105cdd2
[  808.530282]  0000000000000000 ffff88003719bdc0 0000000000000000 ffff8800d7945658
[  808.530288]  ffff8800d23dac00 ffffffffa04d6906 0000000000000001 0000000000000000
[  808.530293] Call Trace:
[  808.530307]  [<ffffffff814b0558>] ? dump_stack+0x41/0x51
[  808.530315]  [<ffffffff8105cdd2>] ? warn_slowpath_common+0x72/0x90
[  808.530333]  [<ffffffffa04d6906>] ? __cfg80211_connect_result+0x3b6/0x400 [cfg80211]
[  808.530351]  [<ffffffffa04b53e1>] ? cfg80211_process_wdev_events+0x171/0x1a0 [cfg80211]
[  808.530367]  [<ffffffffa04b5438>] ? cfg80211_process_rdev_events+0x28/0x60 [cfg80211]
[  808.530382]  [<ffffffffa04b1015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[  808.530390]  [<ffffffff810759dd>] ? process_one_work+0x16d/0x420
[  808.530397]  [<ffffffff81076596>] ? worker_thread+0x116/0x3b0
[  808.530404]  [<ffffffff81076480>] ? rescuer_thread+0x330/0x330
[  808.530410]  [<ffffffff8107c741>] ? kthread+0xc1/0xe0
[  808.530417]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[  808.530423]  [<ffffffff814bdb4c>] ? ret_from_fork+0x7c/0xb0
[  808.530429]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[  808.530433] ---[ end trace 4e8d74f4055a7dae ]---
[  808.530533] cfg80211: Calling CRDA to update world regulatory domain
[  808.539390] cfg80211: World regulatory domain updated:
[  808.539403] cfg80211:  DFS Master region: unset
[  808.539407] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  808.539416] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[  808.539423] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[  808.539430] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[  808.539436] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[  808.539443] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[  808.539449] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[ 1409.316761] ------------[ cut here ]------------
[ 1409.316846] WARNING: CPU: 1 PID: 5392 at net/wireless/sme.c:654 __cfg80211_connect_result+0x3b6/0x400 [cfg80211]()
[ 1409.316854] Modules linked in: tun nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ecb btusb nls_utf8 nls_cp437 vfat fat fuse hidp bluetooth parport_pc ppdev lp parport snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support hid_multitouch hid_sensor_als hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_sensor_hub hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media usb_storage tpm_infineon mwifiex_usb mwifiex cfg80211 rfkill x86_pkg_temp_thermal intel_powerclamp intel_rapl lpc_ich mfd_core coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel evdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr i2c_i801 snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm i915 snd_seq snd_seq_device battery snd_timer ac video drm_kms_helper tpm_tis snd drm i2c_algo_bit tpm i2c_core mei_me soundcore mei button processor ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common crc32c_intel thermal thermal_sys ahci libahci libata scsi_mod xhci_hcd ehci_pci ehci_hcd usbcore usb_common
[ 1409.317190] CPU: 1 PID: 5392 Comm: kworker/u8:0 Tainted: G        W     3.15.0-rc3 #1
[ 1409.317196] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[ 1409.317227] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[ 1409.317233]  0000000000000009 ffffffff814b0558 0000000000000000 ffffffff8105cdd2
[ 1409.317240]  0000000000000000 ffff88008cd81dc0 0000000000000000 ffff8800d83ffd58
[ 1409.317246]  ffff8800d23dac00 ffffffffa04d6906 0000000000000001 0000000000000000
[ 1409.317254] Call Trace:
[ 1409.317268]  [<ffffffff814b0558>] ? dump_stack+0x41/0x51
[ 1409.317278]  [<ffffffff8105cdd2>] ? warn_slowpath_common+0x72/0x90
[ 1409.317305]  [<ffffffffa04d6906>] ? __cfg80211_connect_result+0x3b6/0x400 [cfg80211]
[ 1409.317327]  [<ffffffffa04b53e1>] ? cfg80211_process_wdev_events+0x171/0x1a0 [cfg80211]
[ 1409.317350]  [<ffffffffa04b5438>] ? cfg80211_process_rdev_events+0x28/0x60 [cfg80211]
[ 1409.317367]  [<ffffffffa04b1015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[ 1409.317376]  [<ffffffff810759dd>] ? process_one_work+0x16d/0x420
[ 1409.317383]  [<ffffffff81076596>] ? worker_thread+0x116/0x3b0
[ 1409.317390]  [<ffffffff81076480>] ? rescuer_thread+0x330/0x330
[ 1409.317396]  [<ffffffff8107c741>] ? kthread+0xc1/0xe0
[ 1409.317403]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[ 1409.317410]  [<ffffffff814bdb4c>] ? ret_from_fork+0x7c/0xb0
[ 1409.317416]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[ 1409.317420] ---[ end trace 4e8d74f4055a7daf ]---
[ 1409.317636] cfg80211: Calling CRDA to update world regulatory domain
[ 1409.327135] cfg80211: World regulatory domain updated:
[ 1409.327145] cfg80211:  DFS Master region: unset
[ 1409.327150] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[ 1409.327158] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[ 1409.327166] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[ 1409.327172] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[ 1409.327178] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[ 1409.327185] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[ 1409.327191] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
Comment 31 Bing Zhao 2014-05-07 04:27:51 UTC
@Comment #30:
> WARNING: CPU: 0 PID: 171 at net/wireless/sme.c:654
> __cfg80211_connect_result+0x3b6/0x400

There is a fix in sme.c function __cfg80211_connect_result():
https://git.kernel.org/cgit/linux/kernel/git/linville/wireless-testing.git/commit/?id=c1fbb258846dfc425507a093922d2d001e54c3ea

Not certain if it fixes this WARN_ON, but perhaps you want to have a try?
Comment 32 Jakub Zeman 2014-05-12 15:13:03 UTC
(In reply to Bing Zhao from comment #31)
> @Comment #30:
> > WARNING: CPU: 0 PID: 171 at net/wireless/sme.c:654
> > __cfg80211_connect_result+0x3b6/0x400
> 
> There is a fix in sme.c function __cfg80211_connect_result():
> https://git.kernel.org/cgit/linux/kernel/git/linville/wireless-testing.git/
> commit/?id=c1fbb258846dfc425507a093922d2d001e54c3ea
> 
> Not certain if it fixes this WARN_ON, but perhaps you want to have a try?

Hi,

I applied the patch and I'm getting following WARNINGS:
[ 2770.863179] ------------[ cut here ]------------
[ 2770.863233] WARNING: CPU: 3 PID: 171 at net/wireless/sme.c:654 __cfg80211_connect_result+0x3c6/0x410 [cfg80211]()
[ 2770.863237] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat ecb btusb fuse hidp bluetooth parport_pc ppdev lp parport hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_multitouch hid_generic hid_sensor_hub usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media usb_storage tpm_infineon x86_pkg_temp_thermal intel_powerclamp mwifiex_usb evdev mwifiex cfg80211 snd_hda_intel snd_hda_controller rfkill snd_hda_codec snd_hwdep snd_pcm intel_rapl snd_seq snd_seq_device coretemp snd_timer kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore i2c_i801 pcspkr lpc_ich mfd_core ac battery i915 mei_me video mei drm_kms_helper drm tpm_tis tpm i2c_algo_bit i2c_core button processor ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common crc32c_intel ahci libahci libata scsi_mod xhci_hcd ehci_pci ehci_hcd thermal thermal_sys usbcore usb_common
[ 2770.863391] CPU: 3 PID: 171 Comm: kworker/u8:5 Tainted: G        W     3.15.0-rc3 #2
[ 2770.863395] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[ 2770.863410] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[ 2770.863414]  0000000000000009 ffffffff814b0558 0000000000000000 ffffffff8105cdd2
[ 2770.863420]  0000000000000000 ffff88003726bdc0 0000000000000000 ffff8800b67c6a58
[ 2770.863426]  ffff8800368b2800 ffffffffa04cd916 0000000000000001 0000000000000000
[ 2770.863432] Call Trace:
[ 2770.863446]  [<ffffffff814b0558>] ? dump_stack+0x41/0x51
[ 2770.863455]  [<ffffffff8105cdd2>] ? warn_slowpath_common+0x72/0x90
[ 2770.863473]  [<ffffffffa04cd916>] ? __cfg80211_connect_result+0x3c6/0x410 [cfg80211]
[ 2770.863491]  [<ffffffffa04ac3e1>] ? cfg80211_process_wdev_events+0x171/0x1a0 [cfg80211]
[ 2770.863509]  [<ffffffffa04ac438>] ? cfg80211_process_rdev_events+0x28/0x60 [cfg80211]
[ 2770.863523]  [<ffffffffa04a8015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[ 2770.863533]  [<ffffffff810759dd>] ? process_one_work+0x16d/0x420
[ 2770.863540]  [<ffffffff81076596>] ? worker_thread+0x116/0x3b0
[ 2770.863547]  [<ffffffff81076480>] ? rescuer_thread+0x330/0x330
[ 2770.863553]  [<ffffffff8107c741>] ? kthread+0xc1/0xe0
[ 2770.863560]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[ 2770.863567]  [<ffffffff814bdb4c>] ? ret_from_fork+0x7c/0xb0
[ 2770.863572]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[ 2770.863577] ---[ end trace c03b8d25d9f6be37 ]---
[ 2770.863665] cfg80211: Calling CRDA to update world regulatory domain
[ 2770.874422] cfg80211: World regulatory domain updated:
[ 2770.874431] cfg80211:  DFS Master region: unset
[ 2770.874434] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[ 2770.874440] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[ 2770.874445] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[ 2770.874449] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[ 2770.874453] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[ 2770.874456] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[ 2770.874461] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[ 3010.958863] ------------[ cut here ]------------
[ 3010.958920] WARNING: CPU: 1 PID: 171 at net/wireless/sme.c:654 __cfg80211_connect_result+0x3c6/0x410 [cfg80211]()
[ 3010.958927] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat ecb btusb fuse hidp bluetooth parport_pc ppdev lp parport hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio joydev hid_multitouch hid_generic hid_sensor_hub usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media usb_storage tpm_infineon x86_pkg_temp_thermal intel_powerclamp mwifiex_usb evdev mwifiex cfg80211 snd_hda_intel snd_hda_controller rfkill snd_hda_codec snd_hwdep snd_pcm intel_rapl snd_seq snd_seq_device coretemp snd_timer kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel snd aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd soundcore i2c_i801 pcspkr lpc_ich mfd_core ac battery i915 mei_me video mei drm_kms_helper drm tpm_tis tpm i2c_algo_bit i2c_core button processor ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common crc32c_intel ahci libahci libata scsi_mod xhci_hcd ehci_pci ehci_hcd thermal thermal_sys usbcore usb_common
[ 3010.959098] CPU: 1 PID: 171 Comm: kworker/u8:5 Tainted: G        W     3.15.0-rc3 #2
[ 3010.959103] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.05.0050 08/20/2013
[ 3010.959119] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[ 3010.959123]  0000000000000009 ffffffff814b0558 0000000000000000 ffffffff8105cdd2
[ 3010.959130]  0000000000000000 ffff88003726bdc0 0000000000000000 ffff8800d7bd69d8
[ 3010.959135]  ffff8800368b2800 ffffffffa04cd916 0000000000000001 0000000000000000
[ 3010.959141] Call Trace:
[ 3010.959154]  [<ffffffff814b0558>] ? dump_stack+0x41/0x51
[ 3010.959161]  [<ffffffff8105cdd2>] ? warn_slowpath_common+0x72/0x90
[ 3010.959180]  [<ffffffffa04cd916>] ? __cfg80211_connect_result+0x3c6/0x410 [cfg80211]
[ 3010.959199]  [<ffffffffa04ac3e1>] ? cfg80211_process_wdev_events+0x171/0x1a0 [cfg80211]
[ 3010.959215]  [<ffffffffa04ac438>] ? cfg80211_process_rdev_events+0x28/0x60 [cfg80211]
[ 3010.959230]  [<ffffffffa04a8015>] ? cfg80211_event_work+0x15/0x20 [cfg80211]
[ 3010.959240]  [<ffffffff810759dd>] ? process_one_work+0x16d/0x420
[ 3010.959247]  [<ffffffff81076596>] ? worker_thread+0x116/0x3b0
[ 3010.959253]  [<ffffffff81076480>] ? rescuer_thread+0x330/0x330
[ 3010.959260]  [<ffffffff8107c741>] ? kthread+0xc1/0xe0
[ 3010.959266]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[ 3010.959273]  [<ffffffff814bdb4c>] ? ret_from_fork+0x7c/0xb0
[ 3010.959279]  [<ffffffff8107c680>] ? kthread_create_on_node+0x180/0x180
[ 3010.959283] ---[ end trace c03b8d25d9f6be38 ]---

The WiFi works after that but in few hours it hangs completely.
Comment 33 Jakub Zeman 2014-05-15 16:34:17 UTC
Can anybody recommend me some other fix?
Comment 34 kungfoobar+kbt 2014-06-09 01:36:21 UTC
Subscribing to this bug. Using Surface Pro 2, WiFi stops working after several hours. I'll be saving logs each time it hangs.
Comment 35 kungfoobar+kbt 2014-06-23 17:29:31 UTC
Most messages are similar to this one, and WiFi no longer works until I restart. Sometimes even other USB Wifi devices stops working and I have to restart no matter what, and sometimes it doesn't restart, it's locked up trying to shut down something. Sometimes I think it stops working when I plug the charger, sometimes it's pretty random.

Jun 10 02:50:18 dreampad-linux wpa_supplicant[1436]: message repeated 47 times: [ mlan0: CTRL-EVENT-SCAN-STARTED ]
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: Trying to associate with c0:4a:00:e3:22:46 (SSID='xxxxxxx' freq=2412 MHz)
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: CTRL-EVENT-DISCONNECTED bssid=c0:4a:00:e3:22:46 reason=3
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> (mlan0): roamed from BSSID E8:94:F6:6A:F0:A0 (xxxxxxx) to (none) ((none))
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> (mlan0): supplicant interface state: completed -> associating
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: Associated with c0:4a:00:e3:22:46
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <warn> Connection disconnected (reason 3)
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.747924] ------------[ cut here ]------------
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.747949] WARNING: CPU: 1 PID: 3386 at net/wireless/sme.c:655 __cfg80211_connect_result+0x3d6/0x420 [cfg80211]()
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.747951] Modules linked in: hidp hid_sensor_als hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common btusb snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller mwifiex_usb intel_rapl snd_hda_codec mwifiex x86_pkg_temp_thermal snd_hwdep intel_powerclamp coretemp kvm_intel snd_pcm kvm snd_seq_midi snd_seq_midi_event cfg80211 snd_rawmidi uvcvideo crct10dif_pclmul snd_seq dm_multipath crc32_pclmul scsi_dh ghash_clmulni_intel videobuf2_vmalloc videobuf2_memops videobuf2_core aesni_intel videodev snd_seq_device snd_timer aes_x86_64 lrw gf128mul hid_sensor_hub glue_helper joydev ablk_helper mei_me cryptd snd mei lpc_ich hid_multitouch soundcore tpm_infineon i2c_hid rfcomm bnep bluetooth dw_dmac dw_dmac_core i2c_designware_platform binfmt_misc 8250_dw mac_hid i2c_designware_core spi_pxa2xx_platform parport_pc ppdev nls_iso8859_1 lp parport dm_mirror dm_region_hash dm_log hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper drm ahci libahci sdhci_acpi sdhci video
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748001] CPU: 1 PID: 3386 Comm: kworker/u8:2 Not tainted 3.15.0-rc7-wl+ #2
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748003] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.04.0450 01/14/2014
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748010] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748012]  0000000000000009 ffff88002a18fcb8 ffffffff817c4e16 0000000000000000
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748014]  ffff88002a18fcf0 ffffffff810679ad 0000000000000000 0000000000000000
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748017]  ffff8800cdbd84f8 ffff8801f8855800 ffff8801f8ee3000 ffff88002a18fd00
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748019] Call Trace:
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748025]  [<ffffffff817c4e16>] dump_stack+0x45/0x56
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748029]  [<ffffffff810679ad>] warn_slowpath_common+0x7d/0xa0
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748032]  [<ffffffff81067a8a>] warn_slowpath_null+0x1a/0x20
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748043]  [<ffffffffa04417f6>] __cfg80211_connect_result+0x3d6/0x420 [cfg80211]
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748051]  [<ffffffffa041ee33>] cfg80211_process_wdev_events+0x153/0x1d0 [cfg80211]
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748060]  [<ffffffffa041eee8>] cfg80211_process_rdev_events+0x38/0x70 [cfg80211]
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748066]  [<ffffffffa041a01e>] cfg80211_event_work+0x1e/0x30 [cfg80211]
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748070]  [<ffffffff81083542>] process_one_work+0x182/0x450
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748073]  [<ffffffff81084341>] worker_thread+0x121/0x410
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748076]  [<ffffffff81084220>] ? rescuer_thread+0x430/0x430
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748079]  [<ffffffff8108ae82>] kthread+0xd2/0xf0
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748082]  [<ffffffff8108adb0>] ? kthread_create_on_node+0x190/0x190
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748085]  [<ffffffff817d5cbc>] ret_from_fork+0x7c/0xb0
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748087]  [<ffffffff8108adb0>] ? kthread_create_on_node+0x190/0x190
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748089] ---[ end trace ffb17d91dad4b0bc ]---
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.748118] cfg80211: Calling CRDA to update world regulatory domain
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> (mlan0): supplicant interface state: associating -> 4-way handshake
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750651] cfg80211: World regulatory domain updated:
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750653] cfg80211:  DFS Master region: unset
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750654] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750656] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750657] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750658] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm), (N/A)
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750659] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.750660] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: WPA: Failed to set PTK to the driver (alg=3 keylen=16 bssid=c0:4a:00:e3:22:46)
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: CTRL-EVENT-DISCONNECTED bssid=c0:4a:00:e3:22:46 reason=1 locally_generated=1
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: WPA: 4-Way Handshake failed - pre-shared key may be incorrect
Jun 10 02:50:22 dreampad-linux wpa_supplicant[1436]: mlan0: CTRL-EVENT-SSID-TEMP-DISABLED id=0 ssid="xxxxxxx" auth_failures=1 duration=10
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <warn> Connection disconnected (reason -1)
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> (mlan0): supplicant interface state: 4-way handshake -> disconnected
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> Activation (mlan0/wireless): disconnected during association, asking for new key.
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> (mlan0): device state change: activated -> need-auth (reason 'supplicant-disconnect') [100 60 8]
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> NetworkManager state is now CONNECTING
Jun 10 02:50:22 dreampad-linux dbus[476]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Jun 10 02:50:22 dreampad-linux dbus[476]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.855268] usb 2-2: cmd: Scan is blocked during association...
Jun 10 02:50:22 dreampad-linux kernel: [ 5498.855273] usb 2-2: scan failed: -16
Jun 10 02:50:22 dreampad-linux NetworkManager[823]: <info> (mlan0): supplicant interface state: disconnected -> scanning
Jun 10 02:50:22 dreampad-linux dbus[476]: [system] Activating service name='org.freedesktop.hostname1' (using servicehelper)
Jun 10 02:50:22 dreampad-linux dbus[476]: [system] Successfully activated service 'org.freedesktop.hostname1'
Comment 36 David López 2014-06-23 21:03:42 UTC
I have a Surface Pro 1 with 64 bits archlinux and frequently (more than 50% times) wifi doesn't work on start. I've tested several kernels with same problems: 3.14.6, 3.15.1 and lts 3.10.44. Changing usb8797_uapsta.bin to the one in git (541.048 bytes, md5sum e7dbdfad0d030910ff35b3ca18af8bbd) didn't help

However I've noticed that issue only happens with network manager, it doesn't happen with netctl. I'm not sure that netctl works fine, because sometimes wifi seems to oscillate between downloading/disconection and download ratios are poor, and (rarely) it seems to completely disconnect without showing error messages. But it works much better than network manager for me. I suggest you to try it.

This is a dmesg error with network manager, it seems similar to yours:

[   11.630540] ------------[ cut here ]------------
[   11.630562] WARNING: CPU: 1 PID: 92 at net/wireless/sme.c:654 __cfg80211_connect_result+0x421/0x480 [cfg80211]()
[   11.630565] Modules linked in: ecb btusb bluetooth 6lowpan_iphc hid_multitouch joydev mousedev hid_sensor_hub hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media mwifiex_usb mwifiex cfg80211 rfkill tpm_infineon coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mac_hid evdev aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul glue_helper snd_hda_codec_realtek ablk_helper cryptd i915 snd_hda_codec_generic microcode i2c_i801 pcspkr snd_hda_intel snd_hda_controller thermal drm_kms_helper snd_hda_codec lpc_ich snd_hwdep battery tpm_tis drm snd_pcm tpm snd_timer intel_gtt mei_me video snd i2c_algo_bit mei ac i2c_core soundcore button processor
[   11.630626]  ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common ahci libahci ehci_pci libata ehci_hcd xhci_hcd scsi_mod usbcore usb_common
[   11.630643] CPU: 1 PID: 92 Comm: kworker/u8:3 Not tainted 3.15.1-1-ARCH #1
[   11.630645] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.00.0250 01/08/2013
[   11.630653] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[   11.630656]  0000000000000000 000000004a9f0ca3 ffff8800d80b7ca0 ffffffff81507c61
[   11.630660]  0000000000000000 ffff8800d80b7cd8 ffffffff81069acd ffff8800d8ad0000
[   11.630663]  0000000000000000 0000000000000000 ffff8800d0f5c000 ffff8800d0361bb8
[   11.630667] Call Trace:
[   11.630675]  [<ffffffff81507c61>] dump_stack+0x4d/0x6f
[   11.630682]  [<ffffffff81069acd>] warn_slowpath_common+0x7d/0xa0
[   11.630686]  [<ffffffff81069bfa>] warn_slowpath_null+0x1a/0x20
[   11.630695]  [<ffffffffa0875ad1>] __cfg80211_connect_result+0x421/0x480 [cfg80211]
[   11.630704]  [<ffffffffa084fb06>] ? cfg80211_process_wdev_events+0x86/0x1c0 [cfg80211]
[   11.630712]  [<ffffffffa084fc0c>] cfg80211_process_wdev_events+0x18c/0x1c0 [cfg80211]
[   11.630720]  [<ffffffffa084fc78>] cfg80211_process_rdev_events+0x38/0x70 [cfg80211]
[   11.630727]  [<ffffffffa084b10e>] cfg80211_event_work+0x1e/0x30 [cfg80211]
[   11.630732]  [<ffffffff81086178>] process_one_work+0x168/0x450
[   11.630736]  [<ffffffff81086bd2>] worker_thread+0x132/0x3e0
[   11.630741]  [<ffffffff81086aa0>] ? manage_workers.isra.23+0x2d0/0x2d0
[   11.630746]  [<ffffffff8108d3ca>] kthread+0xea/0x100
[   11.630750]  [<ffffffff8108d2e0>] ? kthread_create_on_node+0x1a0/0x1a0
[   11.630756]  [<ffffffff815158fc>] ret_from_fork+0x7c/0xb0
[   11.630759]  [<ffffffff8108d2e0>] ? kthread_create_on_node+0x1a0/0x1a0
[   11.630762] ---[ end trace fb9b3ce9f51da601 ]---
[   11.630816] cfg80211: Calling CRDA to update world regulatory domain
[   11.632667] cfg80211: World regulatory domain updated:
[   11.632670] cfg80211:  DFS Master region: unset
[   11.632671] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[   11.632673] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[   11.632675] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[   11.632676] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[   11.632678] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[   11.632679] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[   11.632680] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[   11.632809] cfg80211: Calling CRDA for country: ES
[   11.635291] cfg80211: Regulatory domain changed to country: ES
[   11.635298] cfg80211:  DFS Master region: ETSI
[   11.635301] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[   11.635304] cfg80211:   (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[   11.635308] cfg80211:   (5150000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[   11.635311] cfg80211:   (5250000 KHz - 5350000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[   11.635314] cfg80211:   (5470000 KHz - 5725000 KHz @ 80000 KHz), (N/A, 2698 mBm), (0 s)
[   11.635316] cfg80211:   (57240000 KHz - 65880000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A)
[   11.739812] usb 3-3: cmd: Scan is blocked during association...
[   11.739822] usb 3-3: scan failed: -16
[   36.665592] usb 3-3: cmd: Scan is blocked during association...
[   36.665600] usb 3-3: scan failed: -16
[   40.026909] usb 3-3: cmd: Scan is blocked during association...
[   40.026923] usb 3-3: scan failed: -16

I wish this information helps
Comment 37 David López 2014-06-25 06:36:28 UTC
The dmesg error message in the (not so frequent) netctl stops is a bit different than networmanager's:

[    5.372733] usb 3-3: driver_version = mwifiex 1.0 (14.68.29.p26) 
[ 2304.805562] usb 3-3: mwifiex_cmd_timeout_func: Timeout cmd id (1403652553.551961) = 0xa4, act = 0x0
[ 2304.805573] usb 3-3: num_data_h2c_failure = 0
[ 2304.805575] usb 3-3: num_cmd_h2c_failure = 0
[ 2304.805577] usb 3-3: is_cmd_timedout = 1
[ 2304.805580] usb 3-3: num_tx_timeout = 0
[ 2304.805582] usb 3-3: last_cmd_index = 2
[ 2304.805584] usb 3-3: last_cmd_id: 7f 00 16 00 a4 00 16 00 a4 00
[ 2304.805586] usb 3-3: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[ 2304.805589] usb 3-3: last_cmd_resp_index = 1
[ 2304.805591] usb 3-3: last_cmd_resp_id: 7f 80 16 80 7f 80 16 80 a4 80
[ 2304.805593] usb 3-3: last_event_index = 1
[ 2304.805595] usb 3-3: last_event: 33 00 33 00 17 00 2b 00 33 00
[ 2304.805597] usb 3-3: data_sent=1 cmd_sent=0
[ 2304.805599] usb 3-3: ps_mode=1 ps_state=0
[ 2304.805605] usb 3-3: cmd timeout
[ 2304.805611] usb 3-3: failed to get signal information
[ 2304.806567] usb 3-3: PREP_CMD: FW is in bad state
[ 2304.806572] usb 3-3: failed to get signal information
[ 2305.307217] usb 3-3: PREP_CMD: FW is in bad state
[ 2305.307228] usb 3-3: failed to get signal information
[ 2305.808080] usb 3-3: PREP_CMD: FW is in bad state
[ 2305.808084] usb 3-3: failed to get signal information
[ 2306.309178] usb 3-3: PREP_CMD: FW is in bad state
[ 2306.309189] usb 3-3: failed to get signal information
[ 2306.809854] usb 3-3: PREP_CMD: FW is in bad state
[ 2306.809859] usb 3-3: failed to get signal information
[ 2307.310858] usb 3-3: PREP_CMD: FW is in bad state
[ 2307.310869] usb 3-3: failed to get signal information
[ 2307.812253] usb 3-3: PREP_CMD: FW is in bad state
[ 2307.812264] usb 3-3: failed to get signal information
[ 2308.312770] usb 3-3: PREP_CMD: FW is in bad state
[ 2308.312781] usb 3-3: failed to get signal information
[ 2308.813084] usb 3-3: PREP_CMD: FW is in bad state
[ 2308.813088] usb 3-3: failed to get signal information
[ 2309.313656] usb 3-3: PREP_CMD: FW is in bad state
[ 2309.313665] usb 3-3: failed to get signal information
[ 2309.814919] usb 3-3: PREP_CMD: FW is in bad state
[ 2309.814930] usb 3-3: failed to get signal information
[ 2310.315803] usb 3-3: PREP_CMD: FW is in bad state
[ 2310.315812] usb 3-3: failed to get signal information
[ 2310.434868] ------------[ cut here ]------------
[ 2310.434899] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x259/0x270()
[ 2310.434908] NETDEV WATCHDOG: mlan0 (usb): transmit queue 1 timed out
[ 2310.434914] Modules linked in: ecb btusb bluetooth 6lowpan_iphc fuse joydev mousedev hid_multitouch hid_sensor_hub hid_generic usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mwifiex_usb mwifiex uvcvideo cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_core videodev rfkill media nls_iso8859_1 nls_cp437 vfat fat coretemp hwmon iTCO_wdt iTCO_vendor_support intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul mac_hid evdev crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper snd_hda_intel ablk_helper cryptd snd_hda_controller snd_hda_codec microcode i915 snd_hwdep snd_pcm snd_timer snd soundcore lpc_ich i2c_i801 pcspkr thermal drm_kms_helper mei_me mei drm tpm_infineon tpm_tis intel_gtt i2c_algo_bit video
[ 2310.435035]  i2c_core button tpm processor battery ac ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common ahci libahci ehci_pci libata xhci_hcd ehci_hcd scsi_mod usbcore usb_common
[ 2310.435069] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.1-1-ARCH #1
[ 2310.435076] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.00.0250 01/08/2013
[ 2310.435082]  0000000000000000 f96ff68ab991e215 ffff88011f203d90 ffffffff81507c61
[ 2310.435090]  ffff88011f203dd8 ffff88011f203dc8 ffffffff81069acd 0000000000000001
[ 2310.435098]  ffff8800d88ef440 0000000000000000 ffff8800c37e5000 0000000000000004
[ 2310.435107] Call Trace:
[ 2310.435113]  <IRQ>  [<ffffffff81507c61>] dump_stack+0x4d/0x6f
[ 2310.435134]  [<ffffffff81069acd>] warn_slowpath_common+0x7d/0xa0
[ 2310.435142]  [<ffffffff81069b4c>] warn_slowpath_fmt+0x5c/0x80
[ 2310.435156]  [<ffffffff814369f9>] dev_watchdog+0x259/0x270
[ 2310.435165]  [<ffffffff814367a0>] ? dev_graft_qdisc+0x80/0x80
[ 2310.435175]  [<ffffffff81076d76>] call_timer_fn+0x36/0x160
[ 2310.435183]  [<ffffffff814367a0>] ? dev_graft_qdisc+0x80/0x80
[ 2310.435191]  [<ffffffff810777a4>] run_timer_softirq+0x274/0x320
[ 2310.435202]  [<ffffffff8106ed6a>] __do_softirq+0xfa/0x2f0
[ 2310.435212]  [<ffffffff8106f0b6>] irq_exit+0x86/0xb0
[ 2310.435221]  [<ffffffff81517a24>] smp_apic_timer_interrupt+0x44/0x50
[ 2310.435233]  [<ffffffff8151651d>] apic_timer_interrupt+0x6d/0x80
[ 2310.435238]  <EOI>  [<ffffffff813d390c>] ? cpuidle_enter_state+0x4c/0xc0
[ 2310.435258]  [<ffffffff813d39b7>] cpuidle_enter+0x17/0x20
[ 2310.435268]  [<ffffffff810b21f5>] cpu_startup_entry+0x385/0x4f0
[ 2310.435280]  [<ffffffff814fcb94>] rest_init+0x84/0x90
[ 2310.435289]  [<ffffffff818f5ea6>] start_kernel+0x448/0x469
[ 2310.435298]  [<ffffffff818f5120>] ? early_idt_handlers+0x120/0x120
[ 2310.435307]  [<ffffffff818f54d7>] x86_64_start_reservations+0x2a/0x2c
[ 2310.435315]  [<ffffffff818f5626>] x86_64_start_kernel+0x14d/0x170
[ 2310.435321] ---[ end trace dcd1feb1372e41d3 ]---
[ 2310.435333] usb 3-3: 4295570104 : Tx timeout(#1), bss_type-num = 0-0
[ 2310.816710] usb 3-3: PREP_CMD: FW is in bad state
[ 2310.816721] usb 3-3: failed to get signal information
[ 2311.317296] usb 3-3: PREP_CMD: FW is in bad state
[ 2311.317307] usb 3-3: failed to get signal information
[ 2311.818871] usb 3-3: PREP_CMD: FW is in bad state
[ 2311.818880] usb 3-3: failed to get signal information
[ 2312.319388] usb 3-3: PREP_CMD: FW is in bad state
[ 2312.319397] usb 3-3: failed to get signal information
[ 2312.821036] usb 3-3: PREP_CMD: FW is in bad state
[ 2312.821045] usb 3-3: failed to get signal information
[ 2313.321748] usb 3-3: PREP_CMD: FW is in bad state
[ 2313.321759] usb 3-3: failed to get signal information
[ 2313.823197] usb 3-3: PREP_CMD: FW is in bad state
[ 2313.823205] usb 3-3: failed to get signal information
[ 2314.323917] usb 3-3: PREP_CMD: FW is in bad state
[ 2314.323926] usb 3-3: failed to get signal information
[ 2314.825128] usb 3-3: PREP_CMD: FW is in bad state
[ 2314.825136] usb 3-3: failed to get signal information
[ 2315.325702] usb 3-3: PREP_CMD: FW is in bad state
[ 2315.325712] usb 3-3: failed to get signal information
[ 2315.827187] usb 3-3: PREP_CMD: FW is in bad state
[ 2315.827195] usb 3-3: failed to get signal information
[ 2316.327849] usb 3-3: PREP_CMD: FW is in bad state
[ 2316.327858] usb 3-3: failed to get signal information
[ 2316.828166] usb 3-3: PREP_CMD: FW is in bad state
[ 2316.828175] usb 3-3: failed to get signal information
[ 2317.328946] usb 3-3: PREP_CMD: FW is in bad state
[ 2317.328957] usb 3-3: failed to get signal information
[ 2317.829793] usb 3-3: PREP_CMD: FW is in bad state
[ 2317.829802] usb 3-3: failed to get signal information
[ 2318.331101] usb 3-3: PREP_CMD: FW is in bad state
[ 2318.331110] usb 3-3: failed to get signal information
[ 2318.831958] usb 3-3: PREP_CMD: FW is in bad state
[ 2318.831969] usb 3-3: failed to get signal information
[ 2319.332592] usb 3-3: PREP_CMD: FW is in bad state
[ 2319.332603] usb 3-3: failed to get signal information
[ 2319.833496] usb 3-3: PREP_CMD: FW is in bad state
[ 2319.833507] usb 3-3: failed to get signal information
[ 2320.334812] usb 3-3: PREP_CMD: FW is in bad state
[ 2320.334822] usb 3-3: failed to get signal information
[ 2320.439522] usb 3-3: 4295573104 : Tx timeout(#2), bss_type-num = 0-0
Comment 38 Bing Zhao 2014-07-01 22:23:53 UTC
Hi, we have recently found some missing memset (for tx_info/rx_info control buffers) which can cause random failures on both command and data path.
I'm not sure if these patches will solve this particular issue or not, but they are worth a try:

#1: https://git.kernel.org/cgit/linux/kernel/git/linville/wireless.git/commit/?id=d76744a93246eccdca1106037e8ee29debf48277

#2: http://marc.info/?l=linux-wireless&m=140425083601658&w=4
Comment 39 kungfoobar+kbt 2014-07-01 22:33:48 UTC
Thank you, Bing! I'll try these patches and test.

Is there a way to reset the device when it freezes? Right now when it happens it makes the system unusable, other USB devices frequently stop working too and I have to reboot (with sysrq, regular rebooting/halting doesn't finish).
Comment 40 Bing Zhao 2014-07-01 22:58:49 UTC
The lockup should be a driver issue. The only way you can do is reboot. I hope you don't have any important data on this machine.
Is the whole system hanging? If not, are you able to get some sysme dumps?

echo w > /proc/sysrq-trigger
echo d > /proc/sysrq-trigger
echo t > /proc/sysrq-trigger
Comment 41 kungfoobar+kbt 2014-07-01 23:57:20 UTC
Usually when that happens, the system is more or less usable, but some or all USB devices stop working and in the syslog I can see this sort of messages all the time:

[ 2039.509483] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 2039.509488] usb 2-2: DNLD_CMD: host to card failed
[ 2039.509504] usb 2-2: failed to get signal information
[ 2045.521492] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 2045.521503] usb 2-2: DNLD_CMD: host to card failed
[ 2045.521526] usb 2-2: failed to get signal information
[ 2045.521704] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 2045.521709] usb 2-2: DNLD_CMD: host to card failed
[ 2045.521726] usb 2-2: failed to get signal information
[ 2046.227364] usb 2-2: 4296711008 : Tx timeout(#3), bss_type-num = 0-0
[ 2051.525237] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 2051.525248] usb 2-2: DNLD_CMD: host to card failed
[ 2051.525286] usb 2-2: failed to get signal information

I do have important data, but usually commited to git repos or backed up in some way. I'll make those dumps when I'm using mwifiex again (I'm using an external dongle now).
Comment 42 Reyad Attiyat 2014-07-02 00:55:45 UTC
Hey Bing Zhao,

I have tested both those patches that run memset. I can report that the driver does seem more stable. I have only done limited testing and it has passed scenarios that would have a high probability of crashing before the patch. I'm running a surface pro 2 with kernel 3.16-rc2.

Please can someone else test these patches I'd love to have this bug closed. I feel like I'm just getting lucky.

Thanks Bing for keeping this bug updated and all your hard work on the kernel. I really appreciate everything you have done for this driver.
Comment 43 Bing Zhao 2014-07-02 01:39:08 UTC
@Reyad, Kudos to the Marvell guys behind me and people like you from the community. Without your consistent supports for testing, debugging, collecting logs, we could not fix this bug. Again, thanks much for all people who worked on this. 

@kungfoobar+kbt, I hope these patches can solve the issue for you too. Please be very careful if you have to put your data in this development machine as the system may crash/hang unexpectedly at any time.
Comment 44 kungfoobar+kbt 2014-07-02 17:13:12 UTC
Just like Reyad, I think it's more stable after the patches, or at least it didn't crash after half an hour of crash-prone activities... I hope it's the final fix! Thank you Bing Zhao!

There's a weird issue though, bluetooth mouse is a bit sluggish on heavy network load, and turning off and on the mouse doesn't fix it, but limiting or stopping the network activities does. I guess it's normal.
Comment 45 kungfoobar+kbt 2014-07-02 21:10:41 UTC
Wireless stopped working. Internet was going slow and dmesg was spitting messages like this:

[ 9128.034995] usb 2-2: PREP_CMD: FW is in bad state
[ 9128.035000] usb 2-2: failed to get signal information
[ 9128.035015] usb 2-2: PREP_CMD: FW is in bad state
[ 9128.035016] usb 2-2: failed to get signal information
[ 9129.819319] usb 2-2: 4297172244 : Tx timeout(#59), bss_type-num = 0-0
[ 9134.042822] usb 2-2: PREP_CMD: FW is in bad state
[ 9134.042829] usb 2-2: failed to get signal information
[ 9134.042852] usb 2-2: PREP_CMD: FW is in bad state
[ 9134.042854] usb 2-2: failed to get signal information
[ 9139.830335] usb 2-2: 4297174744 : Tx timeout(#60), bss_type-num = 0-0
[ 9140.043951] usb 2-2: PREP_CMD: FW is in bad state
[ 9140.043957] usb 2-2: failed to get signal information
[ 9140.043977] usb 2-2: PREP_CMD: FW is in bad state
[ 9140.043979] usb 2-2: failed to get signal information

Then I restarted the router (it's a shitty ADSL router given by the ISP for free that needs to be restarted from time to time), and both WiFi and bluetooth stopped working. Here's my full dmesg for this session: http://pastebin.com/q6YJVKWC

sysrq W http://pastebin.com/5Px1YtRF
sysrq D yelds no result (help message)
sysrq T http://pastebin.com/tHwBmVjv
Comment 46 kungfoobar+kbt 2014-07-03 02:53:07 UTC
Happened again. Whole dmesg with sysrq stuff here: http://www.pasteall.org/52590

It booted up with external WiFi and bluetooth dongles and I disconnected them later on, and it worked for some hours. When it stopped working, I tried to reset the mwifiex device with this http://palimpsest.minivi.com/linux/utils/usbreset/ and the whole system hung, not even sysrq were working.
Comment 47 Zwane Mwaikambo 2014-07-03 06:13:00 UTC
I can confirm that wireless networking is now a lot more stable than before but i have only been running for about a few hours now, whereas it would fail within an hour before. I'll give another status update within a few days.
Comment 48 Bing Zhao 2014-07-03 07:48:04 UTC
Thanks kungfoobar+kbt and Zwane.

@kungfoobar+kbt, based on the logs the issue in Comment #46 appears different than Comment #45 . Will look into details.
Comment 49 Zwane Mwaikambo 2014-07-03 20:27:52 UTC
Tested kernel version was 3.16-rc3. Bing, is there a separate bug open for suspend/resume stability with mwifiex_usb that you're aware of? Thanks!!
Comment 50 Reyad Attiyat 2014-07-03 21:15:23 UTC
(In reply to Zwane Mwaikambo from comment #49)
> Tested kernel version was 3.16-rc3. Bing, is there a separate bug open for
> suspend/resume stability with mwifiex_usb that you're aware of? Thanks!!

Hello Zwane,

Which issues do you experience with suspend/resume? Which version of systemd are you using or which distro have you been running? 

The reason I ask is I had issues with Fedora 20 and Gnome Settings Daemon 3.12. Systemd was reporting the device as a virtual machine which caused gnome-setting-daemon to respond in strange ways. It would only suspend once and then not suspend after that.

Try running systemd-detect-virt If it returns "microsoft" I'd try updating systemd to the newest release. It should return "none" if you're on real hardware.

Good Luck
Comment 51 Reyad Attiyat 2014-07-04 20:44:37 UTC
Hey Bing,

I finally experienced a driver/firmware crash similar to kungkoobar. The driver is still more stable as before it would crash almost everytime I tried downloading anything over 1GB in size. This crash happened after lite internet usage. I should note I have not tested bluetooth and have blacklisted it from loading the module.

[18100.586256] usb 2-2: PREP_CMD: FW is in bad state
[18100.586265] usb 2-2: failed to get signal information
[18100.586518] usb 2-2: PREP_CMD: FW is in bad state
[18100.586523] usb 2-2: failed to get signal information
[18103.289748] ------------[ cut here ]------------
[18103.289780] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x258/0x270()
[18103.289783] NETDEV WATCHDOG: mlan0 (usb): transmit queue 1 timed out
[18103.289785] Modules linked in: ip6t_rpfilter ip6t_REJECT xt_conntrack fuse mwifiex_usb mwifiex cfg80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi hid_sensor_als hid_sensor_magn_3d hid_sensor_incl_3d snd_hda_intel hid_sensor_rotation hid_sensor_gyro_3d hid_sensor_accel_3d snd_hda_controller hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf snd_hda_codec industrialio x86_pkg_temp_thermal coretemp kvm_intel rfkill snd_hwdep iTCO_wdt snd_seq iTCO_vendor_support uvcvideo kvm snd_seq_device vfat fat crc32_pclmul snd_pcm ebtable_nat crc32c_intel ghash_clmulni_intel videobuf2_vmalloc ebtable_broute bridge videobuf2_memops videobuf2_core stp llc microcode v4l2_common ebtable_filter ebtables mei_me videodev joydev ip6table_nat snd_timer mei hid_sensor_hub
[18103.289886]  nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 snd ip6table_mangle pcspkr media ip6table_security hid_multitouch i2c_i801 ip6table_raw ip6table_filter lpc_ich ip6_tables soundcore tpm_infineon iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack tpm_tis tpm iptable_mangle iptable_security iptable_raw i2c_hid i2c_designware_platform i2c_designware_core binfmt_misc i915 i2c_algo_bit drm_kms_helper drm sd_mod i2c_core video
[18103.289952] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W I   3.16.0-rc3+ #108
[18103.289955] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
[18103.289958]  0000000000000009 ffff88011ac03d68 ffffffff8179cbd0 ffff88011ac03db0
[18103.289966]  ffff88011ac03da0 ffffffff8109585d 0000000000000001 ffff8800d827eb18
[18103.289975]  0000000000000000 ffff8800cea522a0 0000000000000004 ffff88011ac03e00
[18103.289983] Call Trace:
[18103.289986]  <IRQ>  [<ffffffff8179cbd0>] dump_stack+0x4e/0x7a
[18103.289997]  [<ffffffff8109585d>] warn_slowpath_common+0x7d/0xa0
[18103.290002]  [<ffffffff810958cc>] warn_slowpath_fmt+0x4c/0x50
[18103.290007]  [<ffffffff81684498>] dev_watchdog+0x258/0x270
[18103.290012]  [<ffffffff81684240>] ? dev_graft_qdisc+0x80/0x80
[18103.290017]  [<ffffffff810a5c1a>] call_timer_fn+0x8a/0x330
[18103.290022]  [<ffffffff810a5b95>] ? call_timer_fn+0x5/0x330
[18103.290027]  [<ffffffff81684240>] ? dev_graft_qdisc+0x80/0x80
[18103.290032]  [<ffffffff810a6144>] run_timer_softirq+0x284/0x3d0
[18103.290038]  [<ffffffff8109c78e>] __do_softirq+0x14e/0x4e0
[18103.290043]  [<ffffffff8109cd05>] irq_exit+0xc5/0xd0
[18103.290049]  [<ffffffff817a9814>] smp_apic_timer_interrupt+0x44/0x50
[18103.290053]  [<ffffffff817a7802>] apic_timer_interrupt+0x72/0x80
[18103.290056]  <EOI>  [<ffffffff81601371>] ? cpuidle_enter_state+0x51/0xc0
[18103.290066]  [<ffffffff816014c7>] cpuidle_enter+0x17/0x20
[18103.290072]  [<ffffffff810eddcc>] cpu_startup_entry+0x3cc/0x790
[18103.290078]  [<ffffffff81794fc1>] rest_init+0x131/0x140
[18103.290085]  [<ffffffff81f22f53>] start_kernel+0x438/0x445
[18103.290090]  [<ffffffff81f22120>] ? early_idt_handlers+0x120/0x120
[18103.290096]  [<ffffffff81f224d7>] x86_64_start_reservations+0x2a/0x2c
[18103.290101]  [<ffffffff81f22617>] x86_64_start_kernel+0x13e/0x14d
[18103.290103] ---[ end trace 95445a2581389d3c ]---
[18103.290108] usb 2-2: 4312781009 : Tx timeout(#1), bss_type-num = 0-0
[18106.582999] usb 2-2: PREP_CMD: FW is in bad state
[18106.583007] usb 2-2: failed to get signal information
[18106.583219] usb 2-2: PREP_CMD: FW is in bad state
[18106.583223] usb 2-2: failed to get signal information
[18108.294023] usb 2-2: 4312786016 : Tx timeout(#2), bss_type-num = 0-0
[18112.579889] usb 2-2: PREP_CMD: FW is in bad state
[18112.579898] usb 2-2: failed to get signal information
[18112.580161] usb 2-2: PREP_CMD: FW is in bad state
[18112.580167] usb 2-2: failed to get signal information
[18118.288422] usb 2-2: 4312796016 : Tx timeout(#3), bss_type-num = 0-0
Comment 52 Zwane Mwaikambo 2014-07-07 04:43:17 UTC
I haven't experienced a crash yet but have only used it for about 8 hours over the past few days (30G+ of traffic).
Comment 53 Zwane Mwaikambo 2014-07-07 06:13:32 UTC
(In reply to Reyad Attiyat from comment #50)
> (In reply to Zwane Mwaikambo from comment #49)
> > Tested kernel version was 3.16-rc3. Bing, is there a separate bug open for
> > suspend/resume stability with mwifiex_usb that you're aware of? Thanks!!
> 
> Hello Zwane,
> 
> Which issues do you experience with suspend/resume? Which version of systemd
> are you using or which distro have you been running? 

Typically the system won't come out of suspend and will require a hard reset. Or it will get to the point of resuming the display but freeze solidly. I have observed this from Ubuntu 13.04 -> 14.04

> The reason I ask is I had issues with Fedora 20 and Gnome Settings Daemon
> 3.12. Systemd was reporting the device as a virtual machine which caused
> gnome-setting-daemon to respond in strange ways. It would only suspend once
> and then not suspend after that.

suspending is not a problem for me, only resume.

> Try running systemd-detect-virt If it returns "microsoft" I'd try updating
> systemd to the newest release. It should return "none" if you're on real
> hardware.

I don't think that's the issue for me, the problem really seems to be bluetooth autosuspend related, my patch workarounds to disable it have made suspend/resume considerably more stable.
Comment 54 Keith Amidon 2014-07-09 20:36:51 UTC
For me, running 3.16rc4 + the two patches from comment 38 wireless seems to connect more reliably and remain operational longer, but it still eventually dies, the very similar to the call trace shown in comment #51.  However, the call trace is slightly different:


Jul 09 11:57:04 kea-tablet kernel: usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1404932224.739614) = 0x107, act = 0x0
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: num_data_h2c_failure = 0
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: num_cmd_h2c_failure = 0
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: is_cmd_timedout = 1
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: num_tx_timeout = 0
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_cmd_index = 3
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_cmd_id: 07 01 07 01 07 01 07 01 07 01
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_cmd_resp_index = 2
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_cmd_resp_id: 07 81 07 81 07 81 07 81 07 81
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_event_index = 3
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: last_event: 58 00 58 00 58 00 58 00 58 00
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: data_sent=1 cmd_sent=0
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: ps_mode=1 ps_state=0
Jul 09 11:57:04 kea-tablet kernel: usb 2-2: cmd timeout
Jul 09 11:57:06 kea-tablet kernel: ------------[ cut here ]------------
Jul 09 11:57:06 kea-tablet kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x259/0x270()
Jul 09 11:57:06 kea-tablet kernel: NETDEV WATCHDOG: mlan0 (usb): transmit queue 1 timed out
Jul 09 11:57:06 kea-tablet kernel: Modules linked in: tun rfcomm fuse ecb btusb bnep bluetooth 6lowpan_iphc openvswitch gre vxlan libcrc32c snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support tpm_infineon coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mwifiex_usb mwifiex aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd uvcvideo cfg80211 snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops videobuf2_core microcode v4l2_common i915 videodev evdev snd_hda_intel joydev rfkill media snd_hda_controller mousedev drm_kms_helper mac_hid pcspkr hid_multitouch snd_hda_codec drm snd_hwdep snd_pcm mei_me snd_timer intel_gtt mei snd i2c_algo_bit soundcore lpc_ich i2c_i801
Jul 09 11:57:06 kea-tablet kernel:  thermal tpm_tis tpm i2c_hid gpio_lynxpoint battery i2c_designware_platform i2c_designware_core i2c_core dw_dmac dw_dmac_core 8250_dw video spi_pxa2xx_platform ac button processor ext4 crc16 mbcache jbd2 hid_sensor_hub hid_generic usbhid hid sd_mod crc_t10dif crct10dif_common ahci libahci libata scsi_mod ehci_pci xhci_hcd ehci_hcd usbcore usb_common sdhci_acpi sdhci led_class mmc_core
Jul 09 11:57:06 kea-tablet kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W     3.16.0-1-mainline #1
Jul 09 11:57:06 kea-tablet kernel: Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.04.0950 05/12/2014
Jul 09 11:57:06 kea-tablet kernel:  0000000000000000 e24e783c1974ec45 ffff88021fa83d90 ffffffff8151c759
Jul 09 11:57:06 kea-tablet kernel:  ffff88021fa83dd8 ffff88021fa83dc8 ffffffff8106e48d 0000000000000001
Jul 09 11:57:06 kea-tablet kernel:  ffff88020f9f1440 0000000000000001 ffff88021021a000 0000000000000004
Jul 09 11:57:06 kea-tablet kernel: Call Trace:
Jul 09 11:57:06 kea-tablet kernel:  <IRQ>  [<ffffffff8151c759>] dump_stack+0x4d/0x6f
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8106e48d>] warn_slowpath_common+0x7d/0xa0
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8106e50c>] warn_slowpath_fmt+0x5c/0x80
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8144bd99>] dev_watchdog+0x259/0x270
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8144bb40>] ? dev_graft_qdisc+0x80/0x80
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8107b826>] call_timer_fn+0x36/0x160
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8144bb40>] ? dev_graft_qdisc+0x80/0x80
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8107c254>] run_timer_softirq+0x274/0x320
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff810736f2>] __do_softirq+0xf2/0x2e0
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff81073a36>] irq_exit+0x86/0xb0
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff815250e4>] smp_apic_timer_interrupt+0x44/0x50
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff8152336d>] apic_timer_interrupt+0x6d/0x80
Jul 09 11:57:06 kea-tablet kernel:  <EOI>  [<ffffffff813e79ec>] ? cpuidle_enter_state+0x4c/0xc0
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff813e7b47>] cpuidle_enter+0x17/0x20
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff810b71ff>] cpu_startup_entry+0x32f/0x520
Jul 09 11:57:06 kea-tablet kernel:  [<ffffffff810469f0>] start_secondary+0x250/0x2f0
Jul 09 11:57:06 kea-tablet kernel: ---[ end trace 67e549ffe0bb0ae8 ]---
Jul 09 11:57:06 kea-tablet kernel: usb 2-2: 4295811000 : Tx timeout(#1), bss_type-num = 0-0
Jul 09 11:57:09 kea-tablet kernel: usb 2-2: PREP_CMD: FW is in bad state
Jul 09 11:57:09 kea-tablet kernel: usb 2-2: scan failed: -1
Jul 09 11:57:16 kea-tablet kernel: usb 2-2: 4295814000 : Tx timeout(#2), bss_type-num = 0-0

This seems to be fatal in that wireless won't start working again until I reboot.  I also see a different trace fairly frequently before this from which it seems to be able to recover, and which seems different from everything else reported here:

Jul 09 11:05:36 kea-tablet kernel: usb 2-2: ASSOC_RESP: failed, status code=1 err=0xfffc a_id=0xffff
Jul 09 11:05:36 kea-tablet kernel: cfg80211: Calling CRDA for country: US
Jul 09 11:05:36 kea-tablet NetworkManager[235]: <warn> Connection disconnected (reason 3)
Jul 09 11:05:36 kea-tablet NetworkManager[235]: <info> (mlan0): supplicant interface state: associating -> disconnected
Jul 09 11:05:36 kea-tablet NetworkManager[235]: <info> (mlan0): supplicant interface state: disconnected -> scanning
Jul 09 11:05:36 kea-tablet kernel: ------------[ cut here ]------------
Jul 09 11:05:36 kea-tablet kernel: WARNING: CPU: 2 PID: 142 at net/wireless/reg.c:1814 reg_process_hint+0x2d1/0x460 [cfg80211]()
Jul 09 11:05:36 kea-tablet kernel: Modules linked in: rfcomm fuse ecb btusb bnep bluetooth 6lowpan_iphc openvswitch gre vxlan libcrc32c snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support tpm_infineon coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mwifiex_usb mwifiex aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd uvcvideo cfg80211 snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops videobuf2_core microcode v4l2_common i915 videodev evdev snd_hda_intel joydev rfkill media snd_hda_controller mousedev drm_kms_helper mac_hid pcspkr hid_multitouch snd_hda_codec drm snd_hwdep snd_pcm mei_me snd_timer intel_gtt mei snd i2c_algo_bit soundcore lpc_ich i2c_i801
Jul 09 11:05:36 kea-tablet kernel:  thermal tpm_tis tpm i2c_hid gpio_lynxpoint battery i2c_designware_platform i2c_designware_core i2c_core dw_dmac dw_dmac_core 8250_dw video spi_pxa2xx_platform ac button processor ext4 crc16 mbcache jbd2 hid_sensor_hub hid_generic usbhid hid sd_mod crc_t10dif crct10dif_common ahci libahci libata scsi_mod ehci_pci xhci_hcd ehci_hcd usbcore usb_common sdhci_acpi sdhci led_class mmc_core
Jul 09 11:05:36 kea-tablet kernel: CPU: 2 PID: 142 Comm: kworker/2:2 Not tainted 3.16.0-1-mainline #1
Jul 09 11:05:36 kea-tablet kernel: Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.04.0950 05/12/2014
Jul 09 11:05:36 kea-tablet kernel: Workqueue: events reg_todo [cfg80211]
Jul 09 11:05:36 kea-tablet kernel:  0000000000000000 000000008432f9c8 ffff880212f13d50 ffffffff8151c759
Jul 09 11:05:36 kea-tablet kernel:  0000000000000000 ffff880212f13d88 ffffffff8106e48d ffff8802029680c0
Jul 09 11:05:36 kea-tablet kernel:  ffff88020f9f0a60 ffff8802029680dc ffff88021fb18000 0ffff88021fb1800
Jul 09 11:05:36 kea-tablet kernel: Call Trace:
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8151c759>] dump_stack+0x4d/0x6f
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8106e48d>] warn_slowpath_common+0x7d/0xa0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8106e5ba>] warn_slowpath_null+0x1a/0x20
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffffa07df421>] reg_process_hint+0x2d1/0x460 [cfg80211]
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffffa07df629>] reg_todo+0x79/0x1a0 [cfg80211]
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81279f79>] ? ioc_release_fn+0x89/0xc0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8108afd8>] process_one_work+0x168/0x450
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8108b60b>] worker_thread+0x6b/0x550
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8108b5a0>] ? init_pwq.part.22+0x10/0x10
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81091d1a>] kthread+0xea/0x100
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8152247c>] ret_from_fork+0x7c/0xb0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 09 11:05:36 kea-tablet kernel: ---[ end trace 67e549ffe0bb0adc ]---
Jul 09 11:05:36 kea-tablet kernel: cfg80211: Calling CRDA for country: US
Jul 09 11:05:36 kea-tablet kernel: ------------[ cut here ]------------
Jul 09 11:05:36 kea-tablet kernel: WARNING: CPU: 1 PID: 93 at net/wireless/reg.c:1814 reg_process_hint+0x2d1/0x460 [cfg80211]()
Jul 09 11:05:36 kea-tablet kernel: Modules linked in: rfcomm fuse ecb btusb bnep bluetooth 6lowpan_iphc openvswitch gre vxlan libcrc32c snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support tpm_infineon coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mwifiex_usb mwifiex aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd uvcvideo cfg80211 snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops videobuf2_core microcode v4l2_common i915 videodev evdev snd_hda_intel joydev rfkill media snd_hda_controller mousedev drm_kms_helper mac_hid pcspkr hid_multitouch snd_hda_codec drm snd_hwdep snd_pcm mei_me snd_timer intel_gtt mei snd i2c_algo_bit soundcore lpc_ich i2c_i801
Jul 09 11:05:36 kea-tablet kernel:  thermal tpm_tis tpm i2c_hid gpio_lynxpoint battery i2c_designware_platform i2c_designware_core i2c_core dw_dmac dw_dmac_core 8250_dw video spi_pxa2xx_platform ac button processor ext4 crc16 mbcache jbd2 hid_sensor_hub hid_generic usbhid hid sd_mod crc_t10dif crct10dif_common ahci libahci libata scsi_mod ehci_pci xhci_hcd ehci_hcd usbcore usb_common sdhci_acpi sdhci led_class mmc_core
Jul 09 11:05:36 kea-tablet kernel: CPU: 1 PID: 93 Comm: kworker/1:2 Tainted: G        W     3.16.0-1-mainline #1
Jul 09 11:05:36 kea-tablet kernel: Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.04.0950 05/12/2014
Jul 09 11:05:36 kea-tablet kernel: Workqueue: events reg_todo [cfg80211]
Jul 09 11:05:36 kea-tablet kernel:  0000000000000000 000000007f3bea3f ffff8800da077d50 ffffffff8151c759
Jul 09 11:05:36 kea-tablet kernel:  0000000000000000 ffff8800da077d88 ffffffff8106e48d ffff8802029680c0
Jul 09 11:05:36 kea-tablet kernel:  ffff88020f9f0a60 ffff8802029680dc ffff88021fa98000 0ffff88021fa9800
Jul 09 11:05:36 kea-tablet kernel: Call Trace:
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8151c759>] dump_stack+0x4d/0x6f
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8106e48d>] warn_slowpath_common+0x7d/0xa0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8106e5ba>] warn_slowpath_null+0x1a/0x20
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffffa07df421>] reg_process_hint+0x2d1/0x460 [cfg80211]
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffffa07df629>] reg_todo+0x79/0x1a0 [cfg80211]
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8108afd8>] process_one_work+0x168/0x450
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8108b60b>] worker_thread+0x6b/0x550
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8108b5a0>] ? init_pwq.part.22+0x10/0x10
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81091d1a>] kthread+0xea/0x100
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff8152247c>] ret_from_fork+0x7c/0xb0
Jul 09 11:05:36 kea-tablet kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 09 11:05:36 kea-tablet kernel: ---[ end trace 67e549ffe0bb0add ]---
Jul 09 11:05:36 kea-tablet kernel: cfg80211: Calling CRDA for country: US

I'm not sure if this is related or not.  If it is, I wonder if it might indicate that there is some kind of concurrency issue with the device being accessed by multiple CPUs simultaneously?
Comment 55 Jakub Zeman 2014-07-09 21:05:38 UTC
I just want to comment all the posts here.

I'm using SF Pro with Linux more then half a Year and from my experience the problem must be somewhere in suspend.

Now I'm using kernel 3.12.X (currently 3.12.24) and I have disabled suspend in drivers/net/wireless/mwifiex/usb.c like this:

static struct usb_driver mwifiex_usb_driver = {
        .name = usbdriver_name,
        .probe = mwifiex_usb_probe,
        .disconnect = mwifiex_usb_disconnect,
        .id_table = mwifiex_usb_table,
};

And for sure I added return 0; at the beginning of functions mwifiex_usb_resume() and mwifiex_usb_suspend().

Since then the driver won't crash completely (I still have kernel crashes but the mlan0 interface never goes completely down). When I keep the suspend/resume functional it crashes almost immediately when I connect to wireless network.

So Bing please try to focus on suspend/resume functionality of the driver.
Comment 56 David López 2014-07-09 23:27:22 UTC
I'm not sure if it's related with this issue, but suspend doesn't work for me, the Surface Pro 1 wakes up after a few seconds. I use 64 bits archlinux, with kernel 3.15.4 and systemd 214. That's the dmesg when I try 'systemctl suspend'

[  370.723691] PM: Syncing filesystems ... done.
[  370.736980] PM: Preparing system for mem sleep
[  370.737305] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  370.738844] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  370.739940] PM: Entering mem sleep
[  370.740495] Suspending console(s) (use no_console_suspend to debug)
[  370.743488] usb 1-3: None of the WOWLAN triggers enabled
[  370.743607] ieee80211 phy0: deleting the crypto keys
[  370.743610] usb 1-3: PREP_CMD: host entering sleep state
[  370.743611] ieee80211 phy0: deleting the crypto keys
[  370.743612] usb 1-3: PREP_CMD: host entering sleep state
[  370.743613] ieee80211 phy0: deleting the crypto keys
[  370.743615] usb 1-3: PREP_CMD: host entering sleep state
[  370.743616] ieee80211 phy0: deleting the crypto keys
[  370.743617] usb 1-3: PREP_CMD: host entering sleep state
[  370.743618] ieee80211 phy0: deleting the crypto keys
[  370.743619] usb 1-3: PREP_CMD: host entering sleep state
[  370.743620] ieee80211 phy0: deleting the crypto keys
[  370.743634] cfg80211: Calling CRDA to update world regulatory domain
[  370.743733] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  370.744116] usb 1-3: usb_submit_urb failed
[  370.744936] sd 0:0:0:0: [sda] Stopping disk
[  371.297304] PM: suspend of devices complete after 556.456 msecs
[  371.310650] PM: late suspend of devices complete after 13.331 msecs
[  371.311313] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
[  371.311366] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
[  371.311416] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[  371.323850] PM: noirq suspend of devices complete after 13.190 msecs
[  371.324271] ACPI: Preparing to enter system sleep state S3
[  371.324439] PM: Saving platform NVS memory
[  371.332803] Disabling non-boot CPUs ...
[  371.332845] intel_pstate CPU 1 exiting
[  371.334037] kvm: disabling virtualization on CPU1
[  371.334296] smpboot: CPU 1 is now offline
[  371.334562] intel_pstate CPU 2 exiting
[  371.335721] kvm: disabling virtualization on CPU2
[  371.437184] smpboot: CPU 2 is now offline
[  371.437424] intel_pstate CPU 3 exiting
[  371.438548] kvm: disabling virtualization on CPU3
[  371.540578] smpboot: CPU 3 is now offline
[  371.542227] ACPI: Low-level resume complete
[  371.542279] PM: Restoring platform NVS memory
[  371.545074] Enabling non-boot CPUs ...
[  371.545133] x86: Booting SMP configuration:
[  371.545134] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  371.556823] kvm: enabling virtualization on CPU1
[  371.559101] Intel pstate controlling: cpu 1
[  371.559197] CPU1 is up
[  371.559227] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  371.570893] kvm: enabling virtualization on CPU2
[  371.573197] Intel pstate controlling: cpu 2
[  371.573279] CPU2 is up
[  371.573303] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  371.584921] kvm: enabling virtualization on CPU3
[  371.587111] Intel pstate controlling: cpu 3
[  371.587236] CPU3 is up
[  371.595422] ACPI: Waking up from system sleep state S3
[  372.028544] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
[  372.028574] ehci-pci 0000:00:1a.0: System wakeup disabled by ACPI
[  372.028582] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
[  372.028880] PM: noirq resume of devices complete after 13.500 msecs
[  372.029411] PM: early resume of devices complete after 0.486 msecs
[  372.029580] mei_me 0000:00:16.0: irq 44 for MSI/MSI-X
[  372.029870] snd_hda_intel 0000:00:1b.0: irq 45 for MSI/MSI-X
[  372.030191] sd 0:0:0:0: [sda] Starting disk
[  372.231935] usb 4-1.6: reset full-speed USB device number 4 using ehci-pci
[  372.324296] PM: resume of devices complete after 294.740 msecs
[  372.324578] PM: Finishing wakeup.
[  372.324580] Restarting tasks ... done.
[  372.328910] video LNXVIDEO:00: Restoring backlight state
[  372.330933] cfg80211: World regulatory domain updated:
[  372.330940] cfg80211:  DFS Master region: unset
[  372.330942] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  372.330947] cfg80211:   (2402000 KHz - 2472000 KHz @ 0 KHz), (N/A, 2000 mBm), (N/A)
[  372.330950] cfg80211:   (2457000 KHz - 2482000 KHz @ 0 KHz), (N/A, 2000 mBm), (N/A)
[  372.330953] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[  372.330956] cfg80211:   (5170000 KHz - 5250000 KHz @ 160000 KHz), (N/A, 2000 mBm), (N/A)
[  372.330959] cfg80211:   (5250000 KHz - 5330000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[  372.330962] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[  372.330965] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[  372.330969] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[  372.330989] cfg80211: Calling CRDA for country: ES
[  372.346398] cfg80211: Regulatory domain changed to country: ES
[  372.346402] cfg80211:  DFS Master region: ETSI
[  372.346404] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[  372.346407] cfg80211:   (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[  372.346409] cfg80211:   (5150000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[  372.346411] cfg80211:   (5250000 KHz - 5350000 KHz @ 80000 KHz), (N/A, 2000 mBm), (0 s)
[  372.346413] cfg80211:   (5470000 KHz - 5725000 KHz @ 80000 KHz), (N/A, 2698 mBm), (0 s)
[  372.346415] cfg80211:   (57240000 KHz - 65880000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A)
[  372.348450] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  372.348746] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20140214/psargs-359)
[  372.348752] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88011a8384b0), AE_NOT_FOUND (20140214/psparse-536)
[  372.349157] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20140214/psargs-359)
[  372.349162] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88011a8384b0), AE_NOT_FOUND (20140214/psparse-536)
[  372.349251] ata1.00: configured for UDMA/100
[  373.515682] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off

The Surface Pro 1 wakes up after some seconds.

I've tried to remove and reinsert xhci:

1- 'rmmod xhci_hcd'
2- 'modprobe xhci-hcd'
3- 'systemctl suspend'

but the tablet still wakes up after some seconds:

[  978.038630] PM: Syncing filesystems ... done.
[  978.048781] PM: Preparing system for mem sleep
[  978.049280] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  978.050871] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  978.051995] PM: Entering mem sleep
[  978.052612] Suspending console(s) (use no_console_suspend to debug)
[  978.052736] usb 1-3: None of the WOWLAN triggers enabled
[  978.053149] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  978.054358] sd 0:0:0:0: [sda] Stopping disk
[  978.606667] PM: suspend of devices complete after 553.675 msecs
[  978.620007] PM: late suspend of devices complete after 13.329 msecs
[  978.620385] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
[  978.620428] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
[  978.620462] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[  978.633269] PM: noirq suspend of devices complete after 13.254 msecs
[  978.633491] ACPI: Preparing to enter system sleep state S3
[  978.633658] PM: Saving platform NVS memory
[  978.651696] Disabling non-boot CPUs ...
[  978.651791] intel_pstate CPU 1 exiting
[  978.653245] kvm: disabling virtualization on CPU1
[  978.653526] smpboot: CPU 1 is now offline
[  978.654059] intel_pstate CPU 2 exiting
[  978.655462] kvm: disabling virtualization on CPU2
[  978.756601] smpboot: CPU 2 is now offline
[  978.757128] intel_pstate CPU 3 exiting
[  978.758422] kvm: disabling virtualization on CPU3
[  978.860034] smpboot: CPU 3 is now offline
[  978.862415] ACPI: Low-level resume complete
[  978.862472] PM: Restoring platform NVS memory
[  978.865282] Enabling non-boot CPUs ...
[  978.865341] x86: Booting SMP configuration:
[  978.865343] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  978.877042] kvm: enabling virtualization on CPU1
[  978.879322] Intel pstate controlling: cpu 1
[  978.879420] CPU1 is up
[  978.879450] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  978.891138] kvm: enabling virtualization on CPU2
[  978.893422] Intel pstate controlling: cpu 2
[  978.893504] CPU2 is up
[  978.893529] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  978.905146] kvm: enabling virtualization on CPU3
[  978.907337] Intel pstate controlling: cpu 3
[  978.907464] CPU3 is up
[  978.917674] ACPI: Waking up from system sleep state S3
[  979.351872] ehci-pci 0000:00:1a.0: System wakeup disabled by ACPI
[  979.351920] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
[  979.352076] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
[  979.352427] PM: noirq resume of devices complete after 13.811 msecs
[  979.353259] PM: early resume of devices complete after 0.787 msecs
[  979.353368] mei_me 0000:00:16.0: irq 44 for MSI/MSI-X
[  979.353972] snd_hda_intel 0000:00:1b.0: irq 45 for MSI/MSI-X
[  979.355142] sd 0:0:0:0: [sda] Starting disk
[  979.555174] usb 4-1.6: reset full-speed USB device number 4 using ehci-pci
[  979.647555] PM: resume of devices complete after 294.138 msecs
[  979.647909] PM: Finishing wakeup.
[  979.647911] Restarting tasks ... done.
[  979.653829] video LNXVIDEO:00: Restoring backlight state
[  979.671703] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  979.672063] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20140214/psargs-359)
[  979.672070] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88011a8384b0), AE_NOT_FOUND (20140214/psparse-536)
[  979.672572] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20140214/psargs-359)
[  979.672579] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88011a8384b0), AE_NOT_FOUND (20140214/psparse-536)
[  979.672709] ata1.00: configured for UDMA/100
[  980.792268] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off

------------

However I've found a weird solution. Doing once

1- 'rmmod xhci_hcd'
2- 'systemctl suspend' (before reinsert xhci)
3- Press the power button to wake up the Surface
4- 'modprobe xhci-hcd' (after suspend)
5- 'netctl start my-profile' (to connect to internet)

seems to solve all the suspend problems in the current session, now I can successfully suspend all the times that I want, and the Surface also automatic connects to my-profile when it wakes up. But it's necessary to do steps 1-5 once in every session or suspend doesn't work.

That's the dmesg output in the successful suspend:

[ 2116.305786] PM: Syncing filesystems ... done.
[ 2116.321530] PM: Preparing system for mem sleep
[ 2116.322051] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 2116.323621] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 2116.324944] PM: Entering mem sleep
[ 2116.325588] Suspending console(s) (use no_console_suspend to debug)
[ 2116.329218] usb 1-3: None of the WOWLAN triggers enabled
[ 2116.329307] usb 1-3: PREP_CMD: host entering sleep state
[ 2116.329311] ieee80211 phy2: deleting the crypto keys
[ 2116.329313] usb 1-3: PREP_CMD: host entering sleep state
[ 2116.329314] ieee80211 phy2: deleting the crypto keys
[ 2116.329316] usb 1-3: PREP_CMD: host entering sleep state
[ 2116.329318] ieee80211 phy2: deleting the crypto keys
[ 2116.329320] usb 1-3: PREP_CMD: host entering sleep state
[ 2116.329321] ieee80211 phy2: deleting the crypto keys
[ 2116.329322] usb 1-3: PREP_CMD: host entering sleep state
[ 2116.329324] ieee80211 phy2: deleting the crypto keys
[ 2116.329325] usb 1-3: PREP_CMD: host entering sleep state
[ 2116.329327] ieee80211 phy2: deleting the crypto keys
[ 2116.329373] cfg80211: Calling CRDA to update world regulatory domain
[ 2116.329544] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 2116.332690] sd 0:0:0:0: [sda] Stopping disk
[ 2116.889999] PM: suspend of devices complete after 564.009 msecs
[ 2116.903326] PM: late suspend of devices complete after 13.312 msecs
[ 2116.903832] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
[ 2116.903911] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
[ 2116.903957] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[ 2116.916594] PM: noirq suspend of devices complete after 13.256 msecs
[ 2116.916831] ACPI: Preparing to enter system sleep state S3
[ 2116.917001] PM: Saving platform NVS memory
[ 2116.935476] Disabling non-boot CPUs ...
[ 2116.935569] intel_pstate CPU 1 exiting
[ 2116.936974] kvm: disabling virtualization on CPU1
[ 2116.937227] smpboot: CPU 1 is now offline
[ 2116.937481] intel_pstate CPU 2 exiting
[ 2116.938711] kvm: disabling virtualization on CPU2
[ 2117.039925] smpboot: CPU 2 is now offline
[ 2117.040474] intel_pstate CPU 3 exiting
[ 2117.041807] kvm: disabling virtualization on CPU3
[ 2117.143304] smpboot: CPU 3 is now offline
[ 2117.145700] ACPI: Low-level resume complete
[ 2117.145753] PM: Restoring platform NVS memory
[ 2117.148561] Enabling non-boot CPUs ...
[ 2117.148621] x86: Booting SMP configuration:
[ 2117.148622] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 2117.160272] kvm: enabling virtualization on CPU1
[ 2117.162573] Intel pstate controlling: cpu 1
[ 2117.162681] CPU1 is up
[ 2117.162711] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 2117.174432] kvm: enabling virtualization on CPU2
[ 2117.176724] Intel pstate controlling: cpu 2
[ 2117.176809] CPU2 is up
[ 2117.176831] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 2117.188466] kvm: enabling virtualization on CPU3
[ 2117.190736] Intel pstate controlling: cpu 3
[ 2117.190858] CPU3 is up
[ 2117.199140] ACPI: Waking up from system sleep state S3
[ 2117.631759] ehci-pci 0000:00:1a.0: System wakeup disabled by ACPI
[ 2117.631764] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
[ 2117.631965] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
[ 2117.632057] PM: noirq resume of devices complete after 13.419 msecs
[ 2117.632596] PM: early resume of devices complete after 0.485 msecs
[ 2117.632736] mei_me 0000:00:16.0: irq 42 for MSI/MSI-X
[ 2117.632964] snd_hda_intel 0000:00:1b.0: irq 44 for MSI/MSI-X
[ 2117.638168] sd 0:0:0:0: [sda] Starting disk
[ 2117.838455] usb 4-1.6: reset full-speed USB device number 4 using ehci-pci
[ 2117.930753] PM: resume of devices complete after 298.011 msecs
[ 2117.931094] PM: Finishing wakeup.
[ 2117.931097] Restarting tasks ... done.
[ 2117.938351] video LNXVIDEO:00: Restoring backlight state
[ 2117.949832] cfg80211: World regulatory domain updated:
[ 2117.949836] cfg80211:  DFS Master region: unset
[ 2117.949838] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[ 2117.949841] cfg80211:   (2402000 KHz - 2472000 KHz @ 0 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.949843] cfg80211:   (2457000 KHz - 2482000 KHz @ 0 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.949845] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.949846] cfg80211:   (5170000 KHz - 5250000 KHz @ 160000 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.949848] cfg80211:   (5250000 KHz - 5330000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[ 2117.949850] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[ 2117.949852] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.949854] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[ 2117.949870] cfg80211: Calling CRDA for country: ES
[ 2117.953450] cfg80211: Regulatory domain changed to country: ES
[ 2117.953455] cfg80211:  DFS Master region: ETSI
[ 2117.953456] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[ 2117.953459] cfg80211:   (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.953461] cfg80211:   (5150000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[ 2117.953463] cfg80211:   (5250000 KHz - 5350000 KHz @ 80000 KHz), (N/A, 2000 mBm), (0 s)
[ 2117.953465] cfg80211:   (5470000 KHz - 5725000 KHz @ 80000 KHz), (N/A, 2698 mBm), (0 s)
[ 2117.953467] cfg80211:   (57240000 KHz - 65880000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A)
[ 2117.954992] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2117.955323] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20140214/psargs-359)
[ 2117.955330] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88011a8384b0), AE_NOT_FOUND (20140214/psparse-536)
[ 2117.955779] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20140214/psargs-359)
[ 2117.955786] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node ffff88011a8384b0), AE_NOT_FOUND (20140214/psparse-536)
[ 2117.955872] ata1.00: configured for UDMA/100
[ 2119.339006] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off

I wish this information helps.
Comment 57 Bing Zhao 2014-07-11 04:04:21 UTC
@Comment #49, we used to have an issue on usb autosuspend. We removed it in this patch:

adb07df mwifiex: do not advertise usb autosuspend support

After this patch I'm not aware of other suspend/resume issues from my side.


@Comment #51, did you see a command timeout happened earlier?
In kungfoobar+kbt's log (Comment #45), there was a cmd timeout like this:

[ 8551.408162] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1404334030.958681) = 0xa4, act = 0x0


@Comment #53, we can consider to add vendor specific code in btusb.c to disable usb autosuspend for usb8797 device.


@Comment #54, I got "cmd 0x107 timeout" with some other none-USB devices as well. Will update here once we find a fix.


@Comment #55, since you are using 3.12 kernel, did you cherry-pick this patch?

adb07df mwifiex: do not advertise usb autosuspend support

Normally when connecting to a wireless network the suspend/resume handlers shouldn't be called unless you have something that triggers usb autosuspend.


@Comment #56, thanks for the info! We used to have an issue with xhci support in old firmware. Are you using the latest firmware from git.marvell.com?
Comment 58 kungfoobar+kbt 2014-07-11 06:28:44 UTC
In all my reports I have this patch applied:

@@ -50,6 +50,7 @@ static struct usb_driver btusb_driver;
 #define BTUSB_ATH3012		0x80
 #define BTUSB_INTEL		0x100
 #define BTUSB_BCM_PATCHRAM	0x200
+#define BTUSB_MWIFIEX		0x400
 
 static const struct usb_device_id btusb_table[] = {
 	/* Generic Bluetooth USB device */
@@ -237,6 +238,9 @@ static const struct usb_device_id blacklist_table[] = {
 	{ USB_DEVICE(0x8087, 0x07dc), .driver_info = BTUSB_INTEL },
 	{ USB_DEVICE(0x8087, 0x0a2a), .driver_info = BTUSB_INTEL },
 
+	/* Marvell 8797 - mwifiex-usb composite wifi+bt device */
+	{ USB_DEVICE(0x1286, 0x2044), .driver_info = BTUSB_MWIFIEX },
+
 	{ }	/* Terminating entry */
 };
 
@@ -1641,6 +1645,11 @@ static int btusb_probe(struct usb_interface *intf,
 	if (id->driver_info & BTUSB_INTEL)
 		hdev->setup = btusb_setup_intel;
 
+	if (id->driver_info & BTUSB_MWIFIEX) {
+		printk(KERN_INFO KBUILD_MODNAME ": mwifiex bt detected, disabling usb autosuspend\n");
+		usb_disable_autosuspend(data->udev);
+	}
+
 	/* Interface numbers are hardcoded in the specification */
 	data->isoc = usb_ifnum_to_if(data->udev, 1);
 


I tried a kernel without it and I couldn't connect via WiFi or Bluetooth, with any internal or external dongle.

After adding it back, WiFi works but BT doesn't (I don't know if it's related).
Comment 59 kungfoobar+kbt 2014-07-11 06:30:23 UTC
Sorry I didn't copy the patch header or mention the file.

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 6250fc2..754e572 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
Comment 60 Bing Zhao 2014-07-12 04:08:09 UTC
@Comment #58, thanks for patch. 

It disables usb autosuspend for usb8797 Bluetooth, same as this patch for WLAN:
adb07df mwifiex: do not advertise usb autosuspend support

I guess it's worth sending it upstream.
Comment 61 Zwane Mwaikambo 2014-07-12 04:27:09 UTC
I can also confirm having to use a similar patch to get a system that can resume.
Comment 62 David López 2014-07-15 15:46:52 UTC
@Comment #57

I think I'm using the last version from git. I overwrote usb8797_uapsta.bin in /usr/lib/firmware/mrvl with 541.048 bytes size, md5sum: e7dbdfad0d030910ff35b3ca18af8bbd

downloaded from 

http://git.marvell.com/?p=mwifiex-firmware.git;a=blob;f=mrvl/usb8797_uapsta.bin;h=196379881310cba56e4fc5482d82e73008226cde;hb=e24897b869455e9aea6b5892e21d72585e136376

My dmesg returns: 

[    4.786943] usb 1-3: driver_version = mwifiex 1.0 (14.68.29.p26) 

Do I need to do anything more to get the last version?
Comment 63 Keith Amidon 2014-07-15 16:44:31 UTC
On my Surface Pro2 with the following setup I'm now getting much more consistent association with the wifi access points I use at home and at work:

1. Up-to-date Arch linux
2. The mainline linux 3.16rc5 kernel with the following patches:
   a. All the sensor date patches from bug #73321
   b. The initialization patches from comment #38
   a. The bluetooth autosuspend patch from comment #58 and comment #59
3. The updated firmware from the fourth comment in bug #68001
4. Systemd monitoring of the lid state disabled as described in the
   12th comment of bug #79381

I can suspend and resume and have wifi will reassociate after the resume.

It's hard to be sure but I think it is likely that the bluetooth autosuspend patch made the biggest difference in the reliability of my wifi connections.  I suspect pushing this patch plus the updated firmware upstream will result in things basically working for more people.

Unfortunately, I'm still seeing two problems.

Problem 1: Occasionally, I'll get connection drops which are recoverable.  These seem to happen when a large data transfer is in progress and interrupt the transfer.  These are associated with tracebacks like those in comment #36 or those in comment #54.  For me they often come in pairs with multiple CPUs reporting the same traceback at the same time.  Note that I am using Gnome 3.12 w/NetworkManager.  There are no other messages from the kernel about the mwifiex driver/device or the associated usb device before these occur in the most recent logs I've taken.

Problem 2: Eventually, the device will hang with the firmware in a bad state.  Usually ther will be a stack traces similar to comment #51 but varying somewhat around the time things hang.  A message about a mwifiex_cmd_timeout_func issue always seems to be present before running into trouble, something like:

   2998:Jul 15 08:32:43 kea-tablet kernel: usb 1-2: mwifiex_cmd_timeout_func: Timeout cmd id (1405438363.409535) = 0xa4, act = 0x0

One other interesting thing is that once this occurs, if I suspend and resume the resume does eventually succeed but it takes something like a minute, with the first 30 seconds or so showing no sign that the tablet is attempt to resume and then the display coming on but the tablet completely ignoring input (touch, mouse, or keyboard) for another 30 seconds.   I haven't tried multiple suspend/resume cycles to see if subsequent ones behave the same, but I suspect they do.

Is there any information that would helpful to gather when either of these situations occur?  I'm extremely motivated to get these two issues fixed as they will substantially improve the usefulness of the tablet for me and I feel like we're getting close to the solution.  I can reproduce the failures usually within 30-60 minutes at home or at work.
Comment 64 Keith Amidon 2014-07-15 16:56:54 UTC
I should also note that I updated my general device firmware to the latest version using Windows Update from the dual-boot instance of Windows 8.1 I kept on the tablet on something like July 10th.  I don't think this made any significant change in the wifi behavior, but it did seem to make the Type Cover and general USB much more reliable.  It is possible that there are some interactions between those USB improvements and the wifi driver that aren't obvious from the logs.
Comment 65 Bing Zhao 2014-07-17 19:03:36 UTC
We fixed a simular issue related to __cfg80211_connect_result WARN_ON.

http://marc.info/?l=linux-wireless&m=140562302630442&w=2

David, Keith: you can test this patch against the issue in Comment #36.
Comment 66 Keith Amidon 2014-07-18 05:23:28 UTC
I haven't had a lot of time to test yet, but with the patch linked in comment #65 I was able to transfer 1.4GB over the wireless on my home AP without having the firmware go into a bad state, which I've never been able to do before, so that is quite promsiing.  I also verifies that the wireless was able to reassociate after a suspend/resume cycle and that worked too.


I'll test some more tomorrow and post another update.

Bing: many thanks for your continued efforts getting to the bottom of this and updating the bug with pointers to possible solutions.
Comment 67 kungfoobar+kbt 2014-07-18 18:28:05 UTC
Tested patch in #65, connection crashed after several hours (during a resource intensive task, but probably unrelated). Dmesg and sysrq stuff: http://107.170.194.234/share/dmesg6.txt

I was also listening to music via bluetooth and the music kept going but very choppy. Tried to reconnect the headphones but timed out.
Comment 68 Bing Zhao 2014-07-18 18:41:29 UTC
@Comment 67: In the log there was a command timeout (0xa4) and then the system got stuck.

[10684.576882] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1405707062.599334) = 0xa4, act = 0x0

This is the same problem in Comment #63 (Problem 2) and Comment #51 for which we haven't found the cause yet.
Comment 69 kungfoobar+kbt 2014-07-18 21:17:50 UTC
I don't know if I found a consistent way of crashing it or I'm having very bad luck.

[ 1435.051713] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1405717896.639501) = 0xa4, act = 0x0
Comment 70 Bing Zhao 2014-07-18 22:44:03 UTC
Created attachment 143461 [details]
btusb.c disables usb autosuspend for Marvell devices

This patch is based on Comment #58 and it depends on the patch I posted on linux-bluetooth:
http://marc.info/?l=linux-bluetooth&m=140572005800553&w=2

I will upstream it if someone can test it.
Comment 71 kungfoobar+kbt 2014-07-20 14:28:17 UTC
I don't know if this is related, but now I have to execute this after booting:

sudo hciconfig hci0 reset

Before I can use bluetooth.
Comment 72 Keith Amidon 2014-07-20 15:05:30 UTC
I tried to test Bing's bluetooth patch but I think the patch from linux-bluetooth is intended for a later kernel version then 3.16rc5 since even after adjusting it to apply correctly it wouldbn't build.  If you can let me know which kernel it is intended for I may be able to try it out later today.

In other news, now that the wireless is generally much more reliable with the earlier patches referred to by this bug I spent some time trying to track down when it still locks up.  I tried both large transmit and receive traffic loads, things like copying large files, watching videos over the network, etc.  I found that I generally large amounts of receive traffic seemed fine but large amounts of transmit traffic seemed to cause problems.  In particular, using scp traffic to transfer a large file though a second SSH port forward (which I do occasionally in my normal work) seemed to trigger the problem quickly.

I think tried an experiment and lowered the MTU of the wireless link to 512 using the command 'ip link set mlan0 mtu 512'.  With this change I've been unable so far to cause the firmware to lock up, using the scp test or combining that with other workloads like watching videos over thet network, browsing, etc.  After setting the MTU back to 1500 I was able to cause a lockup on the second attempt to transfer the file using scp.

Based on all this it seems likely that there is a firmware problem when there is a queue of large packets to be transmitted.  I'm going to try using the tablet in normal use for the next couple of days with a reduced MTU and see if I have problems.
Comment 73 kungfoobar+kbt 2014-07-20 22:25:04 UTC
I think found out something that seems to make a crash much more likely: 35°C, fans (one of them a ceiling fan), uploading a file and 100% CPU load (e.g. compiling the kernel). A lower MTU didn't help.
Comment 74 Keith Amidon 2014-07-21 15:24:25 UTC
Unforutnately, I have to report that, while reducing the MTU does seem to again reduce wifi hangs for me, it doesn't completely resolve the situation.  My most recent hang (with the reduced MTU) had the following related log messages:

Jul 21 08:04:14 kea-tablet kernel: usb 1-2: cmd_wait_q terminated: 0
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: mwifiex_cmd_timeout_func: Timeout cmd id (1405955062.484849) = 0xa4, act = 0x0
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: num_data_h2c_failure = 0
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: num_cmd_h2c_failure = 0
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: is_cmd_timedout = 1
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: num_tx_timeout = 0
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_cmd_index = 3
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_cmd_id: a4 00 7f 00 16 00 a4 00 16 00
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_cmd_resp_index = 2
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_cmd_resp_id: a4 80 7f 80 16 80 7f 80 16 80
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_event_index = 1
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: last_event: 58 00 58 00 58 00 58 00 58 00
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: data_sent=1 cmd_sent=0
Jul 21 08:04:22 kea-tablet kernel: usb 1-2: ps_mode=1 ps_state=0
Jul 21 08:04:26 kea-tablet kernel: usb 1-2: cmd_wait_q terminated: 0
Jul 21 08:04:26 kea-tablet kernel: usb 1-2: PREP_CMD: FW is in bad state
Jul 21 08:04:26 kea-tablet kernel: usb 1-2: PREP_CMD: FW is in bad state
Jul 21 08:04:26 kea-tablet kernel: usb 1-2: failed to get signal information


I don't believe the cmd_wait_q terminated message has shown up before the mwifiex_cmd_timeout_func message before.
Comment 75 Bing Zhao 2014-07-22 04:03:07 UTC
@Comment #71: could you confirm that you applied patch from 
http://marc.info/?l=linux-bluetooth&m=140572005800553&w=2
and the patch in Attachment 143461 [details], and then get the problem that you have to do an hci0 reset? Could you send out some dmesg and the hcidump before you run hciconfig hci0 reset?

@Comment #72: the patch the from bluetooth-next will be integrated to 3.17 in upstream, but you can still apply it to 3.16rc5 for testing after applying the dependency patch first:
http://marc.info/?l=linux-bluetooth&m=140572005800553&w=2
(I will attached the patch to this bug).
Comment 76 Bing Zhao 2014-07-22 04:06:00 UTC
Created attachment 143831 [details]
The patch that is needed by patch in attachment 143461 [details]

same patch as http://marc.info/?l=linux-bluetooth&m=140572005800553&w=2
Comment 77 Keith Amidon 2014-07-22 06:05:10 UTC
To successfully build against 3.16rc6 with the bluetooth-next patch I also needed this patch:

http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/include/net/bluetooth/hci_core.h?id=24c457e27076beb2a85b4213642a6388eb88f240

I'll update this bug with any change in behavior after I get a chance to test a bit.
Comment 78 Bing Zhao 2014-07-22 06:49:44 UTC
@Comment #77, Keith is right, I was using wireless-testing (wireless-next) tree which has the patch pulled from bluetooth-next already.
Comment 79 Michael Zanga 2014-10-16 02:29:59 UTC
I'm having a strange issue with this on my surface pro 2. I feel like I'm missing something obvious, but I'm not sure what...
I am running archlinux, and had the above patches working using an older kernel I compiled on a different computer (a much faster one). This was done as a test a couple months ago, since I'm now studying abroad and only have my surface to work on. I saw that the patches above were in kernel version 3.17, so I wanted to wait until it made it to archlinux's repositories to avoid having to compile kernels on this. It has now, so I installed version 3.17 (and headers), and there were no problems. So I set it installing updates that had built up over the last few months, and upon restarting I'm getting the mwifiex_cmd_timeout_func again and no wireless interface. I'd say one of the updates broke something, but none of them touched the driver. There was a firmware update package, but I tried reverting that to the one used when it worked but there was no change. Also tried updating linux again (to 3.17.1) but that also didn't help.
Comment 80 kungfoobar+kbt 2014-10-16 12:35:52 UTC
Are the patches included in kernel 3.16.0 or in backports-3.16.2-1? I also want to avoid compiling the kernel but I couldn't make it work after updating to Ubuntu 14.10 and compiling and installing the backports. In both cases I get this:

[   10.064113] usb 2-2: USB disconnect, device number 3
[   10.064175] mwifiex_usb: mwifiex_usb_disconnect: card or card->adapter is NULL
[   10.752753] usb 2-2: new high-speed USB device number 8 using xhci_hcd
[   10.882590] usb 2-2: New USB device found, idVendor=1286, idProduct=2044
[   10.882594] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   10.882597] usb 2-2: Product: Bluetooth and Wireless LAN Composite Device
[   10.882598] usb 2-2: Manufacturer: Marvell
[   10.882600] usb 2-2: SerialNumber: 501AC5128D80
[   10.882752] usb 2-2: ep 0x4 - rounding interval to 128 microframes, ep desc says 255 microframes
[   15.886673] usb 2-2: can't set config #1, error -110

I'd like not having to choose between an unstable kernel and no wifi/bluetooth. I've been 4 months with unstable kernels and using btrfs most of the time. Oddly enough I lost data with ext4 (whole FS corrupt) and I lost none with btrfs (other than the crashes/freezes that made me reboot).
Comment 81 Leo Alekseyev 2014-12-13 00:27:27 UTC
Surface Pro 1 running Ubuntu 14.10 with kernel 3.16.0-25 is still exhibiting (1) problems with resume from suspend and (2) random wifi breakage with traceback similar to that of Comment #51.  @Comment #56: I was able to get suspend working find by disabling xhci, so I suspect suspend issues in that comment are unrelated.

[ 5645.029087] usb 3-3: mwifiex_cmd_timeout_func: Timeout cmd id (1418332290.426933) = 0xa4, act = 0x0
[ 5645.029098] usb 3-3: num_data_h2c_failure = 0
[ 5645.029118] usb 3-3: num_cmd_h2c_failure = 0
[ 5645.029120] usb 3-3: is_cmd_timedout = 1
[ 5645.029121] usb 3-3: num_tx_timeout = 0
[ 5645.029123] usb 3-3: last_cmd_index = 2
[ 5645.029125] usb 3-3: last_cmd_id: 7f 00 16 00 a4 00 16 00 a4 00
[ 5645.029126] usb 3-3: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[ 5645.029128] usb 3-3: last_cmd_resp_index = 1
[ 5645.029130] usb 3-3: last_cmd_resp_id: 7f 80 16 80 7f 80 16 80 a4 80
[ 5645.029131] usb 3-3: last_event_index = 3
[ 5645.029133] usb 3-3: last_event: 34 00 33 00 33 00 37 00 58 00
[ 5645.029135] usb 3-3: data_sent=0 cmd_sent=0
[ 5645.029136] usb 3-3: ps_mode=1 ps_state=0
[ 5645.029141] usb 3-3: cmd timeout
[ 5645.029149] usb 3-3: failed to get signal information
[ 5645.029240] usb 3-3: PREP_CMD: FW is in bad state
[ 5645.029242] usb 3-3: failed to get signal information
...snip...
[ 5679.038421] usb 3-3: PREP_CMD: FW is in bad state
[ 5679.038426] usb 3-3: scan failed: -1
..snip...
[ 5713.861059] ------------[ cut here ]------------
[ 5713.861093] WARNING: CPU: 1 PID: 0 at /build/buildd/linux-3.16.0/net/sched/sch_generic.c:264 dev_watchdog+0x258/0x270()
[ 5713.861095] NETDEV WATCHDOG: mlan0 (usb): transmit queue 1 timed out
[ 5713.861096] Modules linked in: hidp hid_sensor_gyro_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_als hid_sensor_magn_3d hid_sensor_incl_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common btusb mwifiex_usb mwifiex cfg80211 joydev intel_rapl x86_pkg_temp_thermal hid_multitouch intel_powerclamp kvm_intel hid_generic hid_sensor_hub kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media bnep rfcomm snd_seq_device
[ 5713.861125]  snd_timer bluetooth 6lowpan_iphc i915 snd mac_hid parport_pc ppdev tpm_infineon video lp parport drm_kms_helper mei_me coretemp drm i2c_algo_bit mei lpc_ich soundcore nls_iso8859_1 ahci libahci
[ 5713.861138] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-25-generic #33-Ubuntu
[ 5713.861139] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.06.0350 04/30/2014
[ 5713.861141]  0000000000000009 ffff88011f283d98 ffffffff8178015c ffff88011f283de0
[ 5713.861143]  ffff88011f283dd0 ffffffff8106fd7d 0000000000000001 ffff8800b0f65c40
[ 5713.861145]  0000000000000001 ffff88009f000000 0000000000000004 ffff88011f283e30
[ 5713.861147] Call Trace:
[ 5713.861148]  <IRQ>  [<ffffffff8178015c>] dump_stack+0x45/0x56
[ 5713.861157]  [<ffffffff8106fd7d>] warn_slowpath_common+0x7d/0xa0
[ 5713.861159]  [<ffffffff8106fdec>] warn_slowpath_fmt+0x4c/0x50
[ 5713.861163]  [<ffffffff816d9ac0>] ? tcp_init_xmit_timers+0x30/0x30
[ 5713.861166]  [<ffffffff816a2678>] dev_watchdog+0x258/0x270
[ 5713.861168]  [<ffffffff816a2420>] ? dev_graft_qdisc+0x80/0x80
[ 5713.861171]  [<ffffffff8107c9d6>] call_timer_fn+0x36/0x110
[ 5713.861173]  [<ffffffff816a2420>] ? dev_graft_qdisc+0x80/0x80
[ 5713.861175]  [<ffffffff8107e5d0>] run_timer_softirq+0x260/0x370
[ 5713.861177]  [<ffffffff810755e4>] __do_softirq+0x124/0x2e0
[ 5713.861179]  [<ffffffff8107599d>] irq_exit+0xfd/0x110
[ 5713.861182]  [<ffffffff8178b094>] smp_apic_timer_interrupt+0x44/0x50
[ 5713.861185]  [<ffffffff8178919d>] apic_timer_interrupt+0x6d/0x80
[ 5713.861185]  <EOI>  [<ffffffff816192fc>] ? cpuidle_enter_state+0x4c/0xc0
[ 5713.861192]  [<ffffffff81619457>] cpuidle_enter+0x17/0x20
[ 5713.861195]  [<ffffffff810b9b67>] cpu_startup_entry+0x347/0x480
[ 5713.861198]  [<ffffffff81045b00>] start_secondary+0x230/0x2c0
[ 5713.861200] ---[ end trace 81a4197c96cc0dba ]---
[ 5713.861204] usb 3-3: 4296320000 : Tx timeout(#1), bss_type-num = 0-0
[ 5717.051599] usb 3-3: PREP_CMD: FW is in bad state
Comment 82 David López 2014-12-13 01:03:11 UTC
I'm writing only to comment that unfortunately I don't notice any improvement with kernel 3.17. I'm using 64 bits archlinux, and tried 3.17.1, 3.17.2, 3.17.3, 3.17.4 and 3.17.6 kernels, and I have the same problems I suffered with older kernels

- Random wifi failure
- systemctl suspend doesn't work

Are the patches included in 3.17 kernels? Has someone solved the wifi problems in archlinux? Thanks
Comment 83 Reyad Attiyat 2014-12-18 23:01:40 UTC
I have noticed that the wireless driver, in newer kernels, is much more unstable on my Surface Pro 2. With the newest kernel, 3.18, there's a patch that adds a callback called mwifiex_usb_submit_rem_rx_urbs found here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/net/wireless/mwifiex?id=cf6a64fd603ae0f7391f7589b0f3568d4e79605c

This callbacks causes a kernel panic that halts the entire machine. It seems to submit urb's that have already been sent. Reverting the patch stops the kernel from panicing.

[  348.824741] usb 1-2: Incompatible network settings
[  421.384942] usb 1-2: Incompatible network settings
[  425.141964] usb 1-2: Incompatible network settings
[  513.017575] usb 1-2: Incompatible network settings
[  532.147466] usb 1-2: Incompatible network settings
[  558.224179] usb 1-2: Incompatible network settings
[  901.253038] kworker/dying (90) used greatest stack depth: 10672 bytes left
[ 1278.546316] ------------[ cut here ]------------
[ 1278.548684] WARNING: CPU: 3 PID: 864 at lib/list_debug.c:36 __list_add+0xcb/0xd0()
[ 1278.551037] list_add double add: new=ffff880118d0e4b0, prev=ffff880118d0e4b0, next=ffff880034c3eed8.
[ 1278.553413] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack mwifiex_usb mwifiex uinput cfg80211 x86_pkg_temp_thermal coretemp kvm_intel hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_incl_3d hid_sensor_accel_3d hid_sensor_rotation ebtable_nat hid_sensor_als hid_sensor_trigger ebtable_broute hid_sensor_iio_common bridge rfkill kvm industrialio_triggered_buffer kfifo_buf stp llc industrialio ebtable_filter ebtables iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek ip6table_nat snd_hda_codec_generic nf_conntrack_ipv6 snd_hda_codec_hdmi nf_defrag_ipv6 crc32_pclmul nf_nat_ipv6 crc32c_intel snd_hda_intel ip6table_mangle snd_hda_controller vfat ghash_clmulni_intel fat snd_hda_codec ip6table_security uvcvideo ip6table_raw snd_hwdep ip6table_filter ip6_tables snd_seq microcode
[ 1278.561375]  videobuf2_vmalloc iptable_nat nf_conntrack_ipv4 videobuf2_memops nf_defrag_ipv4 nf_nat_ipv4 nf_nat videobuf2_core snd_seq_device v4l2_common nf_conntrack videodev joydev snd_pcm iptable_mangle hid_sensor_hub iptable_security media snd_timer iptable_raw snd mei_me hid_multitouch mei lpc_ich i2c_i801 soundcore tpm_infineon tpm_tis tpm i2c_hid i2c_designware_platform i2c_designware_core binfmt_misc i915 i2c_algo_bit drm_kms_helper drm sd_mod video
[ 1278.569568] CPU: 3 PID: 864 Comm: kworker/u9:1 Tainted: G          I    3.18.0+ #135
[ 1278.572220] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
[ 1278.574915] Workqueue: MWIFIEX_RX_WORK_QUEUE mwifiex_rx_work_queue [mwifiex]
[ 1278.577422]  ffffffff81a93486 ffff8801191338e8 ffffffff817e4788 0000000000000000
[ 1278.579860]  ffff880119133938 ffff880119133928 ffffffff810a249a ffff880119133908
[ 1278.582291]  ffff880118d0e4b0 ffff880034c3eed8 ffff880118d0e4b0 ffff88011955b2d0
[ 1278.584692] Call Trace:
[ 1278.586989]  [<ffffffff817e4788>] dump_stack+0x4f/0x7b
[ 1278.589186]  [<ffffffff810a249a>] warn_slowpath_common+0x8a/0xc0
[ 1278.591416]  [<ffffffff810a2516>] warn_slowpath_fmt+0x46/0x50
[ 1278.593640]  [<ffffffff814123cb>] __list_add+0xcb/0xd0
[ 1278.595905]  [<ffffffff815a12da>] ? usb_hcd_link_urb_to_ep+0x2a/0xa0
[ 1278.598000]  [<ffffffff815a1327>] usb_hcd_link_urb_to_ep+0x77/0xa0
[ 1278.600084]  [<ffffffff815e2e2a>] prepare_transfer+0xaa/0x130
[ 1278.602151]  [<ffffffff815e4997>] xhci_queue_bulk_tx+0xb7/0x780
[ 1278.604196]  [<ffffffff815db66c>] ? xhci_urb_enqueue+0x4ac/0x620
[ 1278.606225]  [<ffffffff815db66c>] ? xhci_urb_enqueue+0x4ac/0x620
[ 1278.608103]  [<ffffffff815db712>] xhci_urb_enqueue+0x552/0x620
[ 1278.609961]  [<ffffffff815a3566>] usb_hcd_submit_urb+0x96/0xaa0
[ 1278.611825]  [<ffffffff8169263a>] ? __alloc_skb+0x7a/0x1f0
[ 1278.613695]  [<ffffffff81691a21>] ? __kmalloc_reserve.isra.24+0x31/0x90
[ 1278.615599]  [<ffffffff8169260c>] ? __alloc_skb+0x4c/0x1f0
[ 1278.617401]  [<ffffffff8169260c>] ? __alloc_skb+0x4c/0x1f0
[ 1278.619185]  [<ffffffff810289ea>] ? native_sched_clock+0x2a/0xb0
[ 1278.620949]  [<ffffffff8169264e>] ? __alloc_skb+0x8e/0x1f0
[ 1278.622697]  [<ffffffff815a51d2>] usb_submit_urb+0x372/0x5f0
[ 1278.624438]  [<ffffffff81698859>] ? __alloc_rx_skb+0x39/0x110
[ 1278.626143]  [<ffffffffa05e23b2>] mwifiex_usb_submit_rx_urb+0xb2/0x170 [mwifiex_usb]
[ 1278.627463]  [<ffffffffa05e2635>] mwifiex_usb_submit_rem_rx_urbs+0x45/0x50 [mwifiex_usb]
[ 1278.628605]  [<ffffffffa06233d9>] mwifiex_rx_work_queue+0xd9/0x110 [mwifiex]
[ 1278.629754]  [<ffffffff810c1edc>] process_one_work+0x20c/0x8b0
[ 1278.630891]  [<ffffffff810c1e3f>] ? process_one_work+0x16f/0x8b0
[ 1278.632033]  [<ffffffff810c267b>] ? worker_thread+0xfb/0x4b0
[ 1278.633193]  [<ffffffff810c25eb>] worker_thread+0x6b/0x4b0
[ 1278.634352]  [<ffffffff810c2580>] ? process_one_work+0x8b0/0x8b0
[ 1278.635509]  [<ffffffff810c856d>] kthread+0x10d/0x130
[ 1278.636647]  [<ffffffff8110008d>] ? trace_hardirqs_on+0xd/0x10
[ 1278.637785]  [<ffffffff810c8460>] ? kthread_create_on_node+0x230/0x230
[ 1278.638921]  [<ffffffff817eeb3c>] ret_from_fork+0x7c/0xb0
[ 1278.640044]  [<ffffffff810c8460>] ? kthread_create_on_node+0x230/0x230
[ 1278.641163] ---[ end trace 84b49540b05c006b ]---
[ 1278.648287] ------------[ cut here ]------------
[ 1278.650466] WARNING: CPU: 3 PID: 0 at lib/list_debug.c:33 __list_add+0xa0/0xd0()
[ 1278.652421] list_add corruption. prev->next should be next (ffff880034c3eed8), but was ffff880118d0e4b0. (prev=ffff880118d0e4b0).
[ 1278.654532] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack mwifiex_usb mwifiex uinput cfg80211 x86_pkg_temp_thermal coretemp kvm_intel hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_incl_3d hid_sensor_accel_3d hid_sensor_rotation ebtable_nat hid_sensor_als hid_sensor_trigger ebtable_broute hid_sensor_iio_common bridge rfkill kvm industrialio_triggered_buffer kfifo_buf stp llc industrialio ebtable_filter ebtables iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek ip6table_nat snd_hda_codec_generic nf_conntrack_ipv6 snd_hda_codec_hdmi nf_defrag_ipv6 crc32_pclmul nf_nat_ipv6 crc32c_intel snd_hda_intel ip6table_mangle snd_hda_controller vfat ghash_clmulni_intel fat snd_hda_codec ip6table_security uvcvideo ip6table_raw snd_hwdep ip6table_filter ip6_tables snd_seq microcode
[ 1278.659894]  videobuf2_vmalloc iptable_nat nf_conntrack_ipv4 videobuf2_memops nf_defrag_ipv4 nf_nat_ipv4 nf_nat videobuf2_core snd_seq_device v4l2_common nf_conntrack videodev joydev snd_pcm iptable_mangle hid_sensor_hub iptable_security media snd_timer iptable_raw snd mei_me hid_multitouch mei lpc_ich i2c_i801 soundcore tpm_infineon tpm_tis tpm i2c_hid i2c_designware_platform i2c_designware_core binfmt_misc i915 i2c_algo_bit drm_kms_helper drm sd_mod video
[ 1278.663870] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W I    3.18.0+ #135
[ 1278.665195] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
[ 1278.666567]  ffffffff81a93486 ffff88011b2038f8 ffffffff817e4788 0000000000000000
[ 1278.667931]  ffff88011b203948 ffff88011b203938 ffffffff810a249a ffff88011b203918
[ 1278.669351]  ffff880118d0d268 ffff880034c3eed8 ffff880118d0e4b0 ffff88011955b6d8
[ 1278.670695] Call Trace:
[ 1278.672042]  <IRQ>  [<ffffffff817e4788>] dump_stack+0x4f/0x7b
[ 1278.673374]  [<ffffffff810a249a>] warn_slowpath_common+0x8a/0xc0
[ 1278.674710]  [<ffffffff810a2516>] warn_slowpath_fmt+0x46/0x50
[ 1278.676029]  [<ffffffff814123a0>] __list_add+0xa0/0xd0
[ 1278.677292]  [<ffffffff815a12da>] ? usb_hcd_link_urb_to_ep+0x2a/0xa0
[ 1278.678537]  [<ffffffff815a1327>] usb_hcd_link_urb_to_ep+0x77/0xa0
[ 1278.679725]  [<ffffffff815e2e2a>] prepare_transfer+0xaa/0x130
[ 1278.680889]  [<ffffffff815e4997>] xhci_queue_bulk_tx+0xb7/0x780
[ 1278.681979]  [<ffffffff815db66c>] ? xhci_urb_enqueue+0x4ac/0x620
[ 1278.683029]  [<ffffffff815db66c>] ? xhci_urb_enqueue+0x4ac/0x620
[ 1278.684084]  [<ffffffff815db712>] xhci_urb_enqueue+0x552/0x620
[ 1278.685098]  [<ffffffff815a3566>] usb_hcd_submit_urb+0x96/0xaa0
[ 1278.686103]  [<ffffffff8169263a>] ? __alloc_skb+0x7a/0x1f0
[ 1278.687129]  [<ffffffff81691a21>] ? __kmalloc_reserve.isra.24+0x31/0x90
[ 1278.688132]  [<ffffffff8169260c>] ? __alloc_skb+0x4c/0x1f0
[ 1278.689132]  [<ffffffff8169260c>] ? __alloc_skb+0x4c/0x1f0
[ 1278.690150]  [<ffffffff8169264e>] ? __alloc_skb+0x8e/0x1f0
[ 1278.691123]  [<ffffffff815a51d2>] usb_submit_urb+0x372/0x5f0
[ 1278.692102]  [<ffffffff81698859>] ? __alloc_rx_skb+0x39/0x110
[ 1278.693111]  [<ffffffffa05e23b2>] mwifiex_usb_submit_rx_urb+0xb2/0x170 [mwifiex_usb]
[ 1278.694100]  [<ffffffffa05e2812>] mwifiex_usb_rx_complete+0x1d2/0x660 [mwifiex_usb]
[ 1278.695088]  [<ffffffff815a1c06>] ? usb_hcd_unmap_urb_for_dma+0x96/0x170
[ 1278.696107]  [<ffffffff815a1d93>] __usb_hcd_giveback_urb+0x83/0x140
[ 1278.697093]  [<ffffffff815a1f73>] usb_hcd_giveback_urb+0x43/0x130
[ 1278.698087]  [<ffffffff815e7e23>] xhci_irq+0xcb3/0x1e70
[ 1278.699101]  [<ffffffff815e8ff1>] xhci_msi_irq+0x11/0x20
[ 1278.700095]  [<ffffffff81117a5f>] handle_irq_event_percpu+0x6f/0x530
[ 1278.701088]  [<ffffffff81117f61>] handle_irq_event+0x41/0x70
[ 1278.702105]  [<ffffffff8111afbe>] handle_edge_irq+0x8e/0x140
[ 1278.703094]  [<ffffffff81020912>] handle_irq+0xb2/0x1a0
[ 1278.704087]  [<ffffffff810ca4fd>] ? atomic_notifier_call_chain+0x9d/0x110
[ 1278.705103]  [<ffffffff810ca465>] ? atomic_notifier_call_chain+0x5/0x110
[ 1278.706105]  [<ffffffff810def47>] ? irqtime_account_irq+0xe7/0x100
[ 1278.707098]  [<ffffffff817f1adb>] do_IRQ+0x5b/0x100
[ 1278.708112]  [<ffffffff817ef772>] common_interrupt+0x72/0x72
[ 1278.709101]  <EOI>  [<ffffffff81649feb>] ? cpuidle_enter_state+0x6b/0x300
[ 1278.710105]  [<ffffffff81649fe4>] ? cpuidle_enter_state+0x64/0x300
[ 1278.711131]  [<ffffffff8164a367>] cpuidle_enter+0x17/0x20
[ 1278.712127]  [<ffffffff810f4804>] cpu_startup_entry+0x504/0x620
[ 1278.713119]  [<ffffffff8105305b>] start_secondary+0x15b/0x190
[ 1278.714136] ---[ end trace 84b49540b05c006c ]---
[ 1365.178222] ------------[ cut here ]------------
[ 1365.178303] WARNING: CPU: 2 PID: 0 at lib/list_debug.c:62 __list_del_entry+0xdd/0xe0()
[ 1365.182790] list_del corruption. next->prev should be ffff880118d0eed8, but was ffff880118d0e4b0
[ 1365.187292] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack mwifiex_usb mwifiex uinput cfg80211 x86_pkg_temp_thermal coretemp kvm_intel hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_incl_3d hid_sensor_accel_3d hid_sensor_rotation ebtable_nat hid_sensor_als hid_sensor_trigger ebtable_broute hid_sensor_iio_common bridge rfkill kvm industrialio_triggered_buffer kfifo_buf stp llc industrialio ebtable_filter ebtables iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek ip6table_nat snd_hda_codec_generic nf_conntrack_ipv6 snd_hda_codec_hdmi nf_defrag_ipv6 crc32_pclmul nf_nat_ipv6 crc32c_intel snd_hda_intel ip6table_mangle snd_hda_controller vfat ghash_clmulni_intel fat snd_hda_codec ip6table_security uvcvideo ip6table_raw snd_hwdep ip6table_filter ip6_tables snd_seq microcode
[ 1365.211643]  videobuf2_vmalloc iptable_nat nf_conntrack_ipv4 videobuf2_memops nf_defrag_ipv4 nf_nat_ipv4 nf_nat videobuf2_core snd_seq_device v4l2_common nf_conntrack videodev joydev snd_pcm iptable_mangle hid_sensor_hub iptable_security media snd_timer iptable_raw snd mei_me hid_multitouch mei lpc_ich i2c_i801 soundcore tpm_infineon tpm_tis tpm i2c_hid i2c_designware_platform i2c_designware_core binfmt_misc i915 i2c_algo_bit drm_kms_helper drm sd_mod video
[ 1365.226606] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W I    3.18.0+ #135
[ 1365.230546] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
[ 1365.234484]  ffffffff81a93486 ffff88011b003cb8 ffffffff817e4788 0000000000000000
[ 1365.238366]  ffff88011b003d08 ffff88011b003cf8 ffffffff810a249a ffff88011b003cd8
[ 1365.242248]  ffff880118d0eeb8 ffff880118d0eed8 ffff880035f34048 ffff88003432c000
[ 1365.246073] Call Trace:
[ 1365.249857]  <IRQ>  [<ffffffff817e4788>] dump_stack+0x4f/0x7b
[ 1365.253733]  [<ffffffff810a249a>] warn_slowpath_common+0x8a/0xc0
[ 1365.257553]  [<ffffffff810a2516>] warn_slowpath_fmt+0x46/0x50
[ 1365.261407]  [<ffffffff814124ad>] __list_del_entry+0xdd/0xe0
[ 1365.265217]  [<ffffffff815a1377>] usb_hcd_unlink_urb_from_ep+0x27/0x40
[ 1365.269077]  [<ffffffff815e7d89>] xhci_irq+0xc19/0x1e70
[ 1365.272892]  [<ffffffff815e8ff1>] xhci_msi_irq+0x11/0x20
[ 1365.276696]  [<ffffffff81117a5f>] handle_irq_event_percpu+0x6f/0x530
[ 1365.280509]  [<ffffffff81117f61>] handle_irq_event+0x41/0x70
[ 1365.284347]  [<ffffffff8111afbe>] handle_edge_irq+0x8e/0x140
[ 1365.288139]  [<ffffffff81020912>] handle_irq+0xb2/0x1a0
[ 1365.291922]  [<ffffffff810ca4fd>] ? atomic_notifier_call_chain+0x9d/0x110
[ 1365.295786]  [<ffffffff810ca465>] ? atomic_notifier_call_chain+0x5/0x110
[ 1365.299593]  [<ffffffff810def47>] ? irqtime_account_irq+0xe7/0x100
[ 1365.303401]  [<ffffffff817f1adb>] do_IRQ+0x5b/0x100
[ 1365.307226]  [<ffffffff817ef772>] common_interrupt+0x72/0x72
[ 1365.311026]  <EOI>  [<ffffffff81649feb>] ? cpuidle_enter_state+0x6b/0x300
[ 1365.314900]  [<ffffffff81649fe4>] ? cpuidle_enter_state+0x64/0x300
[ 1365.318795]  [<ffffffff8164a367>] cpuidle_enter+0x17/0x20
[ 1365.322691]  [<ffffffff810f4804>] cpu_startup_entry+0x504/0x620
[ 1365.326547]  [<ffffffff8105305b>] start_secondary+0x15b/0x190
[ 1365.330442] ---[ end trace 84b49540b05c006d ]---
[ 1365.334621] ------------[ cut here ]------------
[ 1365.339190] WARNING: CPU: 2 PID: 0 at lib/list_debug.c:59 __list_del_entry+0xbd/0xe0()
[ 1365.343559] list_del corruption. prev->next should be ffff880118d0e4b0, but was ffff880118d0d268
[ 1365.348007] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack mwifiex_usb mwifiex uinput cfg80211 x86_pkg_temp_thermal coretemp kvm_intel hid_sensor_gyro_3d hid_sensor_magn_3d hid_sensor_incl_3d hid_sensor_accel_3d hid_sensor_rotation ebtable_nat hid_sensor_als hid_sensor_trigger ebtable_broute hid_sensor_iio_common bridge rfkill kvm industrialio_triggered_buffer kfifo_buf stp llc industrialio ebtable_filter ebtables iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek ip6table_nat snd_hda_codec_generic nf_conntrack_ipv6 snd_hda_codec_hdmi nf_defrag_ipv6 crc32_pclmul nf_nat_ipv6 crc32c_intel snd_hda_intel ip6table_mangle snd_hda_controller vfat ghash_clmulni_intel fat snd_hda_codec ip6table_security uvcvideo ip6table_raw snd_hwdep ip6table_filter ip6_tables snd_seq microcode
[ 1365.362765]  videobuf2_vmalloc iptable_nat nf_conntrack_ipv4 videobuf2_memops nf_defrag_ipv4 nf_nat_ipv4 nf_nat videobuf2_core snd_seq_device v4l2_common nf_conntrack videodev joydev snd_pcm iptable_mangle hid_sensor_hub iptable_security media snd_timer iptable_raw snd mei_me hid_multitouch mei lpc_ich i2c_i801 soundcore tpm_infineon tpm_tis tpm i2c_hid i2c_designware_platform i2c_designware_core binfmt_misc i915 i2c_algo_bit drm_kms_helper drm sd_mod video
[ 1365.378471] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W I    3.18.0+ #135
[ 1365.383709] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
[ 1365.389004]  ffffffff81a93486 ffff88011b003cb8 ffffffff817e4788 0000000000000000
[ 1365.394330]  ffff88011b003d08 ffff88011b003cf8 ffffffff810a249a ffff88011b003cd8
[ 1365.399775]  ffff880118d0e490 ffff880118d0e4b0 ffff880035f34048 ffff88003432c000
[ 1365.405346] Call Trace:
[ 1365.410624]  <IRQ>  [<ffffffff817e4788>] dump_stack+0x4f/0x7b
[ 1365.415940]  [<ffffffff810a249a>] warn_slowpath_common+0x8a/0xc0
[ 1365.421192]  [<ffffffff810a2516>] warn_slowpath_fmt+0x46/0x50
[ 1365.426312]  [<ffffffff8141248d>] __list_del_entry+0xbd/0xe0
[ 1365.431277]  [<ffffffff815a1377>] usb_hcd_unlink_urb_from_ep+0x27/0x40
[ 1365.436171]  [<ffffffff815e7d89>] xhci_irq+0xc19/0x1e70
[ 1365.440891]  [<ffffffff815e8ff1>] xhci_msi_irq+0x11/0x20
[ 1365.445452]  [<ffffffff81117a5f>] handle_irq_event_percpu+0x6f/0x530
[ 1365.449924]  [<ffffffff81117f61>] handle_irq_event+0x41/0x70
[ 1365.454213]  [<ffffffff8111afbe>] handle_edge_irq+0x8e/0x140
[ 1365.458450]  [<ffffffff81020912>] handle_irq+0xb2/0x1a0
[ 1365.462641]  [<ffffffff810ca4fd>] ? atomic_notifier_call_chain+0x9d/0x110
[ 1365.466867]  [<ffffffff810ca465>] ? atomic_notifier_call_chain+0x5/0x110
[ 1365.471077]  [<ffffffff810def47>] ? irqtime_account_irq+0xe7/0x100
[ 1365.475248]  [<ffffffff817f1adb>] do_IRQ+0x5b/0x100
[ 1365.479392]  [<ffffffff817ef772>] common_interrupt+0x72/0x72
[ 1365.483553]  <EOI>  [<ffffffff81649feb>] ? cpuidle_enter_state+0x6b/0x300
[ 1365.487690]  [<ffffffff81649fe4>] ? cpuidle_enter_state+0x64/0x300
[ 1365.491709]  [<ffffffff8164a367>] cpuidle_enter+0x17/0x20
[ 1365.495590]  [<ffffffff810f4804>] cpu_startup_entry+0x504/0x620
[ 1365.499329]  [<ffffffff8105305b>] start_secondary+0x15b/0x190
[ 1365.502904] ---[ end trace 84b49540b05c006e ]---
Comment 84 Jakub Zeman 2015-03-17 19:38:50 UTC
Hi all,

this week I upgraded to vanilla kernel 3.18.9 and the all the problems I had with mwifiex_usb in 3.14.xx kernel are gone (I'm using Surface Pro 1).

So I think that this bug can be closed.

Jakub
Comment 85 David López 2015-03-19 23:54:12 UTC
I'm afraid I can't be so optimistic.

Today archlinux updated the kernel from 3.18.6 to 3.19.2. Unfortunately, now my Surface Pro 1 always freezes when I open the browser (tested with firefox and chromium), or when I suspend it with 'systemctl suspend'. No error messages, the tablet freezes and I have to shutdown pressing the power button for several seconds. I've also noticed random freezes, for example launching onboard, or writing in gnome-terminal or an email with thunderbird.

I'm almost sure that this new problem is also related with the wifi card, I've tested to disable it with 'rmmod xhci_pci' and the freezes have disappeared.

Downgrading to 3.18.6 solve the freezes problem, but old problem remains

- Random wifi failure (Comment #37)
- systemctl suspend doesn't work (Comment #56)

I include the journal of a freeze. I power on at 00:19, open firefox (then the Surface froze), press the power button until if poweroff, and then restarted 5 minutes later.

http://pastebin.com/5sidB3yT

I'm not sure if this information could be of interest.
Comment 86 Michael Zanga 2015-03-20 02:58:24 UTC
Can confirm similar results on my Surface Pro 2. Seems to happen whenever there is sufficiently heavy usage of the wireless interface. Happens to me when trying to download updates, crashes before the package database can finish downloading.
Comment 87 doomsplayer 2015-04-14 15:05:04 UTC
this bug also happened on my surface pro3 with 3.19.3 kernel sometimes, when the cpu load is high.
Comment 88 Jakub Zeman 2015-04-18 11:38:21 UTC
(In reply to David López from comment #85)
> I'm afraid I can't be so optimistic.
> 
> Today archlinux updated the kernel from 3.18.6 to 3.19.2. Unfortunately, now
> my Surface Pro 1 always freezes when I open the browser (tested with firefox
> and chromium), or when I suspend it with 'systemctl suspend'. No error
> messages, the tablet freezes and I have to shutdown pressing the power
> button for several seconds. I've also noticed random freezes, for example
> launching onboard, or writing in gnome-terminal or an email with thunderbird.
> 
> I'm almost sure that this new problem is also related with the wifi card,
> I've tested to disable it with 'rmmod xhci_pci' and the freezes have
> disappeared.
> 
> Downgrading to 3.18.6 solve the freezes problem, but old problem remains
> 
> - Random wifi failure (Comment #37)
> - systemctl suspend doesn't work (Comment #56)
> 
> I include the journal of a freeze. I power on at 00:19, open firefox (then
> the Surface froze), press the power button until if poweroff, and then
> restarted 5 minutes later.
> 
> http://pastebin.com/5sidB3yT
> 
> I'm not sure if this information could be of interest.

Hi,

you are right. It's very easy to reproduce it even on Surface Pro 1.

This bug was for the first time opened one Year ago. Can anyone explain to me why it has been not fixed yet (especially when it's so easy to reproduce it)?

Jakub
Comment 89 David López 2015-04-19 08:36:02 UTC
Jakub, although the problem is easy to reproduce, the crash is usually random. However, I've found a way to reproduce the 'FW is in bad state' error following those steps 

- Open firefox
- Go to 'http://store.steampowered.com/browse/linux/'
- Search for 'torchlight'

I've tried about ten times and always find the error. No SPAM intended, but I think that there's something in that webpage that makes the wifi card hang. At least in my OS (archlinux).
Comment 90 Reyad Attiyat 2015-06-25 23:19:05 UTC
Hello,

I have been doing a lot of testing recently with the mwifiex driver on my Surface Pro 2 and have some patches to share. I noticed the firmware on the card would crash or not respond causing timeouts. This has all occurred randomly but especially under high traffic situations. The patches that I will attach to this bug seem to lower the rate of these firmware crashes or driver timeouts significantly.

My system is using the newest firmware updated using Windows Update on Windows 10 beta (tech preview) and the newest Linux kernel 4.1 with Fedora 22. The first patch allows the keyboard and mouse, typecover, to work correctly by adding a quirk to the hid subsystem. The second patch fixes a crash in the driver as it tries to add a usb urb to a list twice and causes list corruption. The third patch is a hack to ensure the size of the usb urb is not a multiple of the max urb packet size, which is 512 for usb3. The last two patches are changes that have fixed crashes both for sending and receiving data. Not sure exactly why the last two patches increase stability but they remove some padding and check for high rx pending limit on usb.

I'd really appreciate some more testing from people with the surface 1, 2 or 3. I have only tested these patches on my device and with one access point. I have been running a high internet load by having multiple tabs open to streaming content such as YouTube and Twitch. Also use bandwidth speed test and sites that use heavy ajax connections such as GMail to test all possible loads.

My results have been zero crashes when running a kernel that uses little memory as I have experienced a crash when the driver failed to allocate a rx skb because of low memory. In the kernel config menu I disabled a lot of the kernel debugging features that are found under Kernel Hacks menu, such as DMA debugging.

Thanks,
Reyad Attiyat
Comment 91 Reyad Attiyat 2015-06-25 23:24:13 UTC
Created attachment 180921 [details]
[PATCH 1] Fix surface type/touch cover
Comment 92 Reyad Attiyat 2015-06-25 23:25:31 UTC
Created attachment 180931 [details]
[PATCH 2] Fix double add error on
Comment 93 Reyad Attiyat 2015-06-25 23:26:11 UTC
Created attachment 180941 [details]
[PATCH 3] Mwifiex hack to fix broken urb zero packet
Comment 94 Reyad Attiyat 2015-06-25 23:26:49 UTC
Created attachment 180951 [details]
[PATCH 4] Set tx padding to 0
Comment 95 Reyad Attiyat 2015-06-25 23:27:18 UTC
Created attachment 180961 [details]
[PATCH 5] Enable rx workqueue when rx pending is high
Comment 96 Reyad Attiyat 2015-06-25 23:37:56 UTC
Please note this might not fix every crash on all devices and the first patch is only for Surface 2 typecover/touchcover.
Comment 97 Reyad Attiyat 2015-06-26 00:43:34 UTC
One new thing:
The error I do receive is dev_alloc_skb failed and unfortunately still does occur even with the low memory/no DMA debug kernel it just takes a lot longer, 6GB of data transfer.
[ 8973.241443] usb 2-2: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed with size: 4096

There is another bug 85071 that mentions this error but has no responses. I think this could be firefox/flash memory leak taking all the memory and unrelated to mwifiex. It would be nice if the driver could recover from this error gracefully.
Comment 98 David López 2015-06-26 12:28:38 UTC
Thanks for all your effort, Reyad

I'm afraid I'm not sure how should I apply your patchs. Could you help me? I'm on Archlinux in a Surface Pro 1, so I only need patchs 2-5

By the way, I've read that you're on kernel 4.1. Currently Archlinux is in kernel 4.0.6. My tablet freezes after several seconds in kernel 3.19.xx (Comment #85), and in kernel 4.0.xx I can't even connect to the internet (although the internet connection works out of the box for a USB wifi adpater I own, so I'm pretty sure that the problem is related with the Marvell 8797 card), so I'm forced to remain in kernel 3.18.xx. Have you tried kernels 3.19.xx and/or 4.0.xx? Have you suffered these issues?
Comment 99 Reyad Attiyat 2015-06-26 13:18:01 UTC
Hello Davud,

For compiling a kernel on arch you can follow this guide:
https://wiki.archlinux.org/index.php/Kernels/Compilation/Traditional

First install all dependency needed such as a compiler, linker, and such. Download the latest kernel 4.1 from kernel.org. After you have the kernel unzipped you can apply the patches by using the patch command, see here for more info: https://www.kernel.org/doc/Documentation/applying-patches.txt

After the kernel is downloaded and patches applied you must run the make commands to compile, link, and install the kennel and its modules. Also note you will need to copy your current kernel configuration using a command like:
$ zcat /proc/config.gz > .config
when in the kernel directory. If arch doesn't support this method there should be a kernel config in the boot directory you can copy. You might be asked to add some configuration when running make,if so the default options are all safe to choose.

You are correct to assume that you only need patches 2-5, as patch 1 is only for Surface Pro 2 on newer firmwares. Patch 2 deals with an issue that you mention, of a system freeze on newer kernel, and should fix that issue. Patch 3,4, and 5 deal with random crashes experienced by the driver and firmware and should allow for a lot more stability than older kernels did.

Good luck, if you need further help just ask on here or email me.

Thank you,
Reyad Attiyat
Comment 100 David López 2015-06-26 17:06:36 UTC
I'd never compiled a kernel, but with your help and Arch's wikis it has been very easy.

I'm now writing from 4.1 with patches 2-5. I've tried my own Comment #89 test and... it works!!! In addition, I feel that webpages load faster (is that possible?). On the other hand, suspend still doesn't work (Comment #56). However, the 'suspend' bug is a minor issue when comparing with the 'timeout' one. If your patches solved the timeout bug it would be a huge improvement for me.
 
I'll continue testing and let you know the results in a few days/weeks.

Let me thank you for your effort once more.
Comment 101 Reyad Attiyat 2015-06-26 18:17:42 UTC
Hey David,

Thanks for taking the time to test these patches, I'm glad they have improved your device stability. Not too sure about suspend, my device also randomly wakes up when suspended, but will look into the issue after this bug/patches are improved.

Also keep in mind that the errors you will see in dmesg such as:
[  240.339308] usb 2-2: Error: SKB len is multiple of 512 and URB_ZERO_PACKET is broken, resizing skb. SKB LEN: 512

are normal and get created by Patch 3 every time this skb resize hack is used. I'm working on eliminating this hack and improving the xhci, usb3 driver, to handle URB_ZERO_PACKET.

Some Known Issues:
-On my device there are a very high amount of dropped rx packets. This can be checked by using the command:
# cat /sys/kernel/debug/mwifiex/mlan0/info 

-After a while, or during periods of low memory, you might see the error:
usb 2-2: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
this causes the mwifiex driver to crash and might be due to a memory leak or be correlated to the high amount of dropped rx packets. Still investigating

-Also I have seen a tx timeout after a restart, restarted because the driver crashed due to low memory, but is resolved on another system restart. This one is very strange and almost seems to be because the driver/firmware is still messed up even after a restart.


If you get any tx timeouts please post them here with steps to reproduce (if any), tail output of dmesg, and the outputs of the two debugfs commands:
# cat /sys/kernel/debug/mwifiex/mlan0/info 
# cat /sys/kernel/debug/mwifiex/mlan0/debug

Thanks again,
Reyad Attiyat
Comment 102 Reyad Attiyat 2015-06-29 01:10:27 UTC
Created attachment 181221 [details]
[PATCH 2] mwifiex: Fix double add error
Comment 103 Reyad Attiyat 2015-06-29 01:11:59 UTC
Created attachment 181231 [details]
[PATCH 3] xhci: Add support for urb_zero_packet
Comment 104 Reyad Attiyat 2015-06-29 01:15:37 UTC
I have updated Patch 3 to add support in xhci for sending zero urb packets when needed.
Comment 105 David López 2015-07-07 18:14:05 UTC
Hi Reyad. I applied your new patches 2-3 and still no traces of timeouts. On the other hand, I've noticed something strange with my suspend error, sometimes it works and sometimes it doesn't. It needs further investigation.

I'm afraid I won't use my tablet a lot during the summer, but I'll let you know if I discover something interesting.
Comment 106 Hugo Ferrando Seage 2015-07-11 11:08:45 UTC
Thanks Reyad. I've successfully applied your patches on both Linux 4.1.1 and 4.1.2.

I can see some rx dropouts but no tx ones for now. However Wi-Fi seems to work reliably.
Comment 107 K W 2015-08-15 17:08:01 UTC
(In reply to Reyad Attiyat from comment #103)
> Created attachment 181231 [details]
> [PATCH 3] xhci: Add support for urb_zero_packet

Hey Reyad,
This may seem like a stupid question. But I am not sure how to download the patches off here. Is there supposed to be a download button?
Comment 108 Reyad Attiyat 2015-08-17 12:13:12 UTC
(In reply to K W from comment #107)
> (In reply to Reyad Attiyat from comment #103)
> > Created attachment 181231 [details]
> > [PATCH 3] xhci: Add support for urb_zero_packet
> 
> Hey Reyad,
> This may seem like a stupid question. But I am not sure how to download the
> patches off here. Is there supposed to be a download button?

Hello,

Most of the posted patches, found at top of page, can be found in the linux-next branch on kernel.org. To download/clone this repo just run the command:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

I just noticed I never submitted patch 5. Will due this today. If you need patch 5 applied you can download it from here:

https://bugzilla.kernel.org/attachment.cgi?id=180961

Patch 4 is not needed to fix this bug and I will remove it from this bug report.

Good luck,
Reyad Attiyat
Comment 109 Ted Sundström 2015-08-17 18:54:48 UTC
Applied the patches found here except for patch 5 (used patch 4), as well as the ones found in the commits over here: 

https://github.com/GusBricker/surfacepro2-kernel

onto a 4.1.0-3 kernel, and so far I don't have any problems at all except for waking up from suspension. Hotplugging the Type Cover 2, touchscreen, wifi... Everything works. The touchscreen won't work after waking up from suspension though.

The logs talks about broken URB_ZERO_PACKETs once every minute or so, but it doesn't seem to affect the performance.

Thanks a lot!
Comment 110 Ken Okabe 2015-08-31 22:08:09 UTC
Thanks for the patch. Reyad Attiyat 

I build the latest stable kernel 4.1.6 with the patch under Debian8.1 for deb packages.

https://github.com/kenokabe/linux-kernel-4.1.6-mwiflex-usb-patch
Comment 111 Ken Okabe 2015-08-31 22:30:15 UTC
https://github.com/kenokabe/linux-kernel-4.1.6-mwifiex-usb-patch

Incorrect repo name, link modified...
Comment 112 David López 2015-09-03 22:50:50 UTC
I've tested in 64bits archlinux kernels 4.1, 4.1.1, 4.1.2, 4.1.3 and 4.1.4 with patches 2-3-4-5, and also 4.1.6 only with patches 2-3-5, and all the 'timeouts' and the freezes had gone.

Reyad, are your patches included in recent 4.2 kernel? If they are not yet, do you know when will they be included in the stable kernel?
Comment 113 Björn Carlsson 2015-09-04 07:00:05 UTC
I sucessfully applied patches 2, 3 & 5 on 4.2 ArchLinux kernel.
Seems to work perfect!
Comment 114 Ken Okabe 2015-09-04 07:07:33 UTC
So far, I have kept using the new patched-kernel under Debian8.1 for a few days, the horrable unstable situation that requires reboot once encountered the disconnection, has gone.

Even if the wifi is disconnected for some reason, the re-connection works without reboot. 

Finally major progress that I don't have to use eternal USB-Wifi stick. Very glad.
Comment 115 Zwane Mwaikambo 2015-09-19 06:49:25 UTC
I haven't had a failure on 4.1.2 with patches 2,3,5 over the period of a week. Great job!
Comment 116 Zwane Mwaikambo 2015-09-20 05:49:51 UTC
(In reply to Zwane Mwaikambo from comment #115)
> I haven't had a failure on 4.1.2 with patches 2,3,5 over the period of a
> week. Great job!

Correction, the kernel is 4.1.3 with the above patches.
Comment 117 David López 2015-10-15 14:06:28 UTC
I've tested in 64bits archlinux kernels 4.2.1, 4.2.2 and 4.2.3. And all the 'timeouts' and the freezes had gone.

Reyad, do you know when will your patches be included in the stable kernel?
Comment 118 Reyad Attiyat 2015-10-15 14:15:28 UTC
Hey David,

I think 4.2 is the stable kernel and it will take a couple months for linux distributions to update their kernels to this newest version. I think archlinux should be updated to 4.2.3 but it will take a little while to make it into long term support kernel.


Thanks for testing,
Reyad Attiyat
Comment 119 David López 2015-10-15 14:20:49 UTC
Thanks for your answer, Reyad, but I'm not sure if I've understood you. Do you mean that your patches will be included in kernel 4.3? Or we'll still need to manually compile the kernel 4.3 to add your patches?
Comment 120 Reyad Attiyat 2015-10-15 14:38:04 UTC
Hey David,

The patches are in kernel 4.3-rc5 and 4.2.3 and should be in all future releases of linux. So no more manually patching and compiling!
Comment 121 David López 2015-10-15 14:49:38 UTC
Patches are already in 4.2.3? I haven't seem them in the changelog. I installed 4.2.3 a few days ago in archlinux, and I compiled myself the kernel with the patches!
Comment 122 Reyad Attiyat 2015-10-15 14:55:23 UTC
I have not tested the kernels myself but I see the commits are applied. Try archlinux kernel 4.2.3 or kernel 4.3-rc5 they should both work without any patching.
Comment 123 David López 2015-10-15 15:06:17 UTC
Maybe I'm missing something, but I don't think they are included in 4.2.3. I've installed the linux 4.2.3-1 from archlinux core repo and I've found a freeze in less than 15 seconds.
Comment 124 Reyad Attiyat 2015-10-15 15:12:31 UTC
I will do testing tonight on my device, sorry for the confusion. I do see the patch there but maybe some other issue is causing a freeze/lockup or arch isn't including the correct patch.
Comment 125 David López 2015-10-15 15:14:05 UTC
Thanks!
Comment 126 Ken Okabe 2015-10-15 16:03:10 UTC
Good info. Thanks.

I sort of confirmed by myself kernel 4.3-rc5 source-code is patched in this issue.

I updated my kernel to manjaro(arch tribe) linux 4.3rc4-1 (4.3.0-1-MANJARO Experimental), works good without the heavy bug.
Comment 127 Reyad Attiyat 2015-10-17 22:01:10 UTC
Hey David,

I have now realized that all of the patches are not in 4.2.3 . You will have to wait till 4.3 goes stable as all the patches are scheduled for that release and Ken has confirmed/tested this as well. I was looking at the wrong git commit log and mistakenly thought the patches made it into stable.

Sorry again for the confusion and delay.
Comment 128 David López 2015-10-19 19:33:45 UTC
Don't worry, Reyad. I'll wait until 4.3 is included in arch's core repo, and I'll let you know if it works well without manual compiling.
Comment 129 Ken Okabe 2015-10-27 21:01:32 UTC
I'd like to double-check that under the patched-done official 4.3 kernel, the connection is lost sometimes, and needs to disconnect then connect manually via GUI in order to re-establish the connection. This is without the previous crash or need-to-reboot-to-re-establish.
I am not sure this is because of my wifi-router, but I guess this is still due to the issue of the driver.
Anyone confirmed the same issue under the patched-done official 4.3 kernel?
Comment 130 Reyad Attiyat 2015-10-27 21:14:54 UTC
Hey Ken,

Could you post some logs. The dmesg would help and if there are no errors/warnings there you could try turning on debugging for mwifiex and trying to produce the crash. 

There is a know issue when memory is low but I will try and reproduce a disconnect or timeout on 4.3-rc7

Thank you,
Reyad Attiyat
Comment 131 Ken Okabe 2015-10-27 22:29:13 UTC
Hi Reyad,

Before long time, within an hour after I started investigation, the wifi disconnection happened.


$ sudo dmesg -c

basically return the log filled up with bunch of

```
usb 2-2: data: -ENOSR is returned

```


## DISCONNECTION LOG

When the wifi disconnection happened the log says as below:


```
[11119.607756] usb 2-2: data: -ENOSR is returned
[11119.607793] usb 2-2: data: -ENOSR is returned
[11119.607833] usb 2-2: data: -ENOSR is returned
[11119.612164] usb 2-2: data: -ENOSR is returned
[11119.612208] usb 2-2: data: -ENOSR is returned
[11119.612256] usb 2-2: data: -ENOSR is returned
[11119.613318] usb 2-2: info: successfully disconnected from 10:6f:3f:f6:c2:bc: reason code 6
[11119.616434] usb 2-2: info: trying to associate to 'BUFFALO-F6C2BC' bssid 10:6f:3f:f6:c2:bc
[11120.625524] usb 2-2: ASSOC_RESP: failed,	status code=2 err=0xfffc a_id=0x3fff
[11120.625529] usb 2-2: assoc failure: reason CONNECT_ERR_ASSOC_ERR_TIMEOUT
[11120.625531] usb 2-2: ASSOC_RESP: AUTH timeout
[11120.625570] usb 2-2: info: association to bssid 10:6f:3f:f6:c2:bc failed
[11120.626182] cfg80211: World regulatory domain updated:
[11120.626188] cfg80211:  DFS Master region: unset
[11120.626190] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[11120.626195] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[11120.626198] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[11120.626202] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[11120.626206] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[11120.626210] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
[11120.626213] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[11120.626216] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[11120.626218] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[11125.073283] usb 2-2: info: trying to associate to 'BUFFALO-F6C2BC' bssid 10:6f:3f:f6:c2:bc
[11125.112870] usb 2-2: info: associated to bssid 10:6f:3f:f6:c2:bc successfully
[11125.195318] usb 2-2: Rx of mgmt packet failed
[11126.035703] usb 2-2: Rx of mgmt packet failed
[11148.988753] usb 2-2: data: -ENOSR is returned
[11181.043683] usb 2-2: data: -ENOSR is returned
[11181.043689] usb 2-2: data: -ENOSR is returned
[11239.576499] usb 2-2: data: -ENOSR is returned
[11239.576514] usb 2-2: data: -ENOSR is returned
```

The log says
```
[11125.112870] usb 2-2: info: associated to bssid 10:6f:3f:f6:c2:bc successfully
```

however, the packet never come through properly unless I disconnect the network and reconnect manually with GUI.


## RECONNECTION LOG

Here is the log when I re-establish the network manually with GUI:


```
[11359.479644] usb 2-2: data: -ENOSR is returned
[11359.479650] usb 2-2: data: -ENOSR is returned
[11359.479726] usb 2-2: data: -ENOSR is returned
[11359.479732] usb 2-2: data: -ENOSR is returned
[11455.636236] usb 2-2: info: successfully disconnected from 10:6f:3f:f6:c2:bc: reason code 3
[11455.636275] usb 2-2: info: successfully disconnected from 00:00:00:00:00:00:	reason code 3
[11455.647144] cfg80211: World regulatory domain updated:
[11455.647149] cfg80211:  DFS Master region: unset
[11455.647151] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[11455.647155] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[11455.647158] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[11455.647160] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[11455.647163] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[11455.647166] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
[11455.647169] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[11455.647171] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[11455.647173] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[11459.684916] usb 2-2: info: trying to associate to 'BUFFALO-F6C2BC' bssid 10:6f:3f:f6:c2:bc
[11459.701722] usb 2-2: info: associated to bssid 10:6f:3f:f6:c2:bc successfully
[11459.815193] usb 2-2: Rx of mgmt packet failed
[11459.925631] usb 2-2: Rx of mgmt packet failed
[11460.677892] usb 2-2: Rx of mgmt packet failed
[11460.926293] usb 2-2: Rx of mgmt packet failed
[11465.326687] usb 2-2: data: -ENOSR is returned
[11466.078204] usb 2-2: data: -ENOSR is returned
[11466.877727] usb 2-2: Rx of mgmt packet failed
[11466.984530] usb 2-2: Rx of mgmt packet failed
```

I cannnot distinguish between the connection failure-log and success-log besides the disconection reason code is 6 or 3.

I'd like to hear your idea. Thanks.
Comment 132 Ken Okabe 2015-10-28 00:21:23 UTC
PS.

My previous posts are not precise.
To be precise, the wifi network connection is Not lost or disconnected.

For unknown reason, the only download packet becomes 0 k/s, and upload packet seems working.

Then, after this status, the download packet never come through, and after the manual disconnect, reconnect, the status is recovered as normal.
Comment 133 Reyad Attiyat 2015-11-11 08:03:54 UTC
Hey Ken,

Sorry about the delayed response my surface charger broke and needed to be replaced. I have reproduced the problem you have described. The RX (receiving) packet that is sent to the wireless card is being dropped. I will investigate this problem further this weekend, and hopefully have some patches for testing.

Thanks,
Reyad Attiyat
Comment 134 Ken Okabe 2015-11-11 21:22:57 UTC
Thanks Rryad, please do.
Comment 135 Reyad Attiyat 2015-11-15 20:09:04 UTC
Hey Ken,

Try applying patch 5 to kernel 4.3. It didn't make it into stable and seems to correct the issue.
Comment 136 David López 2015-11-15 20:36:39 UTC
Hi Reyad, 

Do you know if patch 2 has been included in 4.3? I think we don't have to worry about patch 3, I've noticed that it was included in 4.2.4

Kind Regards,
Comment 138 Ken Okabe 2015-12-02 02:49:17 UTC
Reyad,

Thanks.

So, patch 5 shall be included to next official kernel update?

One more thing I concern is that I observe tons of log via
journalctl

```
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned
Dec 01 21:26:50 manjaro kernel: usb 2-2: data: -ENOSR is returned

```

Is this because of this wifi driver version?

This obviously generates many unnecessary journal files under

/var/log/journal/

and eats up drive space so quickly, and also some machine resource.

Thanks.
Comment 139 Peter Hunt 2016-01-07 09:15:10 UTC
Patch 5 seems to no longer be required in 4.4-rc8. In fact, applying it creates many RX errors and causes wifi to be fairly unstable whereas without it, performance is as of 4.3.3 with the patch.
Comment 140 Ken Okabe 2016-01-07 12:33:18 UTC
Peter,

I happened to have updated to 4.4-rc8, and now confirmed the ENOSR log fill is no longer there. Good thing. Thanks for the post.
I still have not confirmed the Wifi-disconnection that require re-connection. Hope this is the final stable solution or linux kernel for this wifi issue.
Comment 141 David López 2016-02-07 15:18:38 UTC
I'm afraid I have bad news, I've recently updated to kernel 4.4.1 without any extra patch (apart from those already included in archlinux stable repo), and I've found a freeze in the wifi card while I was receiving streaming. 

Here's the dmesg:

http://pastebin.com/7xVUTu91

Anyone has suffered similar issues in kernel 4.4? I read Peter's and Ken's posts, so I didn't expect problems. Should I compile 4.4.1 kernel with patch 5?

PD: my 'suspend' issue (https://bugzilla.kernel.org/show_bug.cgi?id=69661#c56) remains unsolved in 4.4.1, but this is a minor bug and I really don't care about it.
Comment 142 Ken Okabe 2016-02-12 23:28:40 UTC
4.4.1-2-MANJARO (probably same kernel version provided by arch repo),
I have used for a while, but no outstanding issue so far.
Comment 143 David López 2016-02-14 22:31:55 UTC
Hi Ken. My kernel is also 4.4.1-2, I think it's the same as yours. 

I've found a new freeze, this time when I started (less than 1 minute).

http://pastebin.com/qP17p0Ut

I'm compiling the kernel with patch 5. I'll taste it and post my impression.
Comment 144 Ken Okabe 2016-02-14 22:48:34 UTC
David,

today, during watching youtube, I had a freeze for the first time.
I'm not sure the freeze is related to this issue, but I can say it never happened to the prior version of the 4.3 kernel.
Comment 145 Ken Okabe 2016-05-18 01:18:23 UTC
kernel 4.3 good
kernel 4.4 bad
kernel 4.5 bad

4.5.4-1-ARCH
Comment 146 Massimo Andreasi Bassi 2016-06-08 13:13:39 UTC
Same unstable situation here, running 4.4.0-22-generic on a Surface Pro 1 as a headless ubuntu server (media server).

I found the machine stuck with no networking:

```
# service networking status

● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
   Active: failed (Result: exit-code) since Tue 2016-06-07 10:04:03 UTC; 1 day 2h ago
     Docs: man:interfaces(5)
  Process: 805 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
  Process: 794 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle (code=exited, status=0/SUCCESS)
 Main PID: 805 (code=exited, status=1/FAILURE)

Jun 07 10:04:03 home wpa_supplicant[920]: Could not read interface wlx6045bdd95c92 flags: No such device
Jun 07 10:04:03 home wpa_supplicant[920]: WEXT: Could not set interface 'wlx6045bdd95c92' UP
Jun 07 10:04:03 home wpa_supplicant[920]: wlx6045bdd95c92: Failed to initialize driver interface
Jun 07 10:04:03 home ifup[805]: /etc/network/if-pre-up.d/wpasupplicant: 120: /etc/network/if-pre-up.d/wpasupplicant: cannot create /dev/stderr: No such device or address
Jun 07 10:04:03 home ifup[805]: run-parts: /etc/network/if-pre-up.d/wpasupplicant exited with return code 1
Jun 07 10:04:03 home ifup[805]: Failed to bring up wlx6045bdd95c92.
Jun 07 10:04:03 home systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
Jun 07 10:04:03 home systemd[1]: Failed to start Raise network interfaces.
Jun 07 10:04:03 home systemd[1]: networking.service: Unit entered failed state.
Jun 07 10:04:03 home systemd[1]: networking.service: Failed with result 'exit-code'.
```

The machine had been logging for hours:
```
Jun  8 12:13:45 home kernel: [94192.233507] usb 1-3: 4318439004 : Tx timeout(#248), bss_type-num = 0-0
Jun  8 12:13:55 home kernel: [94202.233974] usb 1-3: 4318441504 : Tx timeout(#249), bss_type-num = 0-0
```

Everything seems to have started after:
```
Jun  7 04:17:26 home kernel: [131556.286845] ------------[ cut here ]------------
Jun  7 04:17:26 home kernel: [131556.286853] WARNING: CPU: 0 PID: 7984 at /build/linux-FvcHlK/linux-4.4.0/net/sched/sch_generic.c:303 dev_watchdog+0x237/0x240()
Jun  7 04:17:26 home kernel: [131556.286855] NETDEV WATCHDOG: wlx6045bdd95c92 (usb): transmit queue 1 timed out
Jun  7 04:17:26 home kernel: [131556.286856] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs bnep btusb btrtl btbcm btintel bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm gpio_ich snd_seq_midi snd_seq_midi_event intel_rapl snd_rawmidi x86_pkg_temp_thermal nls_iso8859_1 intel_powerclamp coretemp kvm_intel snd_seq kvm snd_seq_device snd_timer irqbypass snd mei_me mei soundcore lpc_ich ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi mwifiex_usb mwifiex cfg80211 autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure uas usb_storage crct10dif_pclmul i915 crc32_pclmul video i2c_algo_bit drm_kms_helper aesni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd drm ahci libahci
Jun  7 04:17:26 home kernel: [131556.286906] CPU: 0 PID: 7984 Comm: Plex Media Scan Not tainted 4.4.0-22-generic #40-Ubuntu
Jun  7 04:17:26 home kernel: [131556.286907] Hardware name: Microsoft Corporation Surface with Windows 8 Pro/Surface with Windows 8 Pro, BIOS 1.07.0050 11/28/2014
Jun  7 04:17:26 home kernel: [131556.286908]  0000000000000286 00000000b68e62b1 ffff88011f203d98 ffffffff813e9c53
Jun  7 04:17:26 home kernel: [131556.286911]  ffff88011f203de0 ffffffff81d62a90 ffff88011f203dd0 ffffffff81080fb2
Jun  7 04:17:26 home kernel: [131556.286912]  0000000000000001 ffff8800d10f0500 0000000000000000 ffff8800d1805000
Jun  7 04:17:26 home kernel: [131556.286914] Call Trace:
Jun  7 04:17:26 home kernel: [131556.286916]  <IRQ>  [<ffffffff813e9c53>] dump_stack+0x63/0x90
Jun  7 04:17:26 home kernel: [131556.286923]  [<ffffffff81080fb2>] warn_slowpath_common+0x82/0xc0
Jun  7 04:17:26 home kernel: [131556.286925]  [<ffffffff8108104c>] warn_slowpath_fmt+0x5c/0x80
Jun  7 04:17:26 home kernel: [131556.286927]  [<ffffffff810b4b19>] ? update_curr+0x79/0x160
Jun  7 04:17:26 home kernel: [131556.286929]  [<ffffffff810ab02d>] ? ttwu_do_activate.constprop.90+0x5d/0x70
Jun  7 04:17:26 home kernel: [131556.286931]  [<ffffffff81741e07>] dev_watchdog+0x237/0x240
Jun  7 04:17:26 home kernel: [131556.286933]  [<ffffffff81741bd0>] ? qdisc_rcu_free+0x40/0x40
Jun  7 04:17:26 home kernel: [131556.286936]  [<ffffffff810ec3b5>] call_timer_fn+0x35/0x120
Jun  7 04:17:26 home kernel: [131556.286937]  [<ffffffff81741bd0>] ? qdisc_rcu_free+0x40/0x40
Jun  7 04:17:26 home kernel: [131556.286939]  [<ffffffff810ecd6a>] run_timer_softirq+0x23a/0x2f0
Jun  7 04:17:26 home kernel: [131556.286942]  [<ffffffff810859f1>] __do_softirq+0x101/0x290
Jun  7 04:17:26 home kernel: [131556.286944]  [<ffffffff81085cf3>] irq_exit+0xa3/0xb0
Jun  7 04:17:26 home kernel: [131556.286947]  [<ffffffff81827da2>] smp_apic_timer_interrupt+0x42/0x50
Jun  7 04:17:26 home kernel: [131556.286949]  [<ffffffff81826062>] apic_timer_interrupt+0x82/0x90
Jun  7 04:17:26 home kernel: [131556.286950]  <EOI> 
Jun  7 04:17:26 home kernel: [131556.286951] ---[ end trace c432bed7336f2a26 ]---
Jun  7 04:17:26 home kernel: [131556.286956] usb 1-3: 4327782752 : Tx timeout(#1), bss_type-num = 0-0
Jun  7 04:17:31 home kernel: [131561.294606] usb 1-3: 4327784004 : Tx timeout(#2), bss_type-num = 0-0
Jun  7 04:17:41 home kernel: [131571.293998] usb 1-3: 4327786504 : Tx timeout(#3), bss_type-num = 0-0
Jun  7 04:17:51 home kernel: [131581.293362] usb 1-3: 4327789004 : Tx timeout(#4), bss_type-num = 0-0
Jun  7 04:18:01 home kernel: [131591.292744] usb 1-3: 4327791504 : Tx timeout(#5), bss_type-num = 0-0
Jun  7 04:18:11 home kernel: [131601.292115] usb 1-3: 4327794004 : Tx timeout(#6), bss_type-num = 0-0
```

Also, during normal operation I see a lot of repeated reconnections after:
```
Jun  6 21:40:07 home kernel: [107717.892499] usb 1-3: Rx of mgmt packet failed
```

Any clue? Is there a version that's a bit more stable?

Note You need to log in before you can comment on or make changes to this bug.