Bug 64111 - mwifiex_usb USB8797 crash failed to get signal information
Summary: mwifiex_usb USB8797 crash failed to get signal information
Status: RESOLVED DUPLICATE of bug 69661
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-10-31 07:54 UTC by Reyad Attiyat
Modified: 2015-03-17 14:02 UTC (History)
9 users (show)

See Also:
Kernel Version: 3.13, 3.13-wl
Tree: Fedora
Regression: No


Attachments
dmesg output seg fault on unload (9.51 KB, text/plain)
2013-11-05 01:54 UTC, Reyad Attiyat
Details
dmesg output with removed autosuspend (68.83 KB, text/plain)
2013-11-29 12:00 UTC, perpe
Details
DMESG 3.11 Kernel xhci suspend (37.45 KB, text/plain)
2013-12-16 21:00 UTC, Reyad Attiyat
Details
DMESG 3.12 ehci (19.36 KB, text/plain)
2013-12-16 21:01 UTC, Reyad Attiyat
Details
DMESG 3.12.5 EHCI without MWIFIEX (122.75 KB, text/plain)
2013-12-18 02:57 UTC, Reyad Attiyat
Details
DMESG 3.12.5 XHCI without MWIFIEX (233.34 KB, text/plain)
2013-12-18 02:58 UTC, Reyad Attiyat
Details
Zwane's kernel 3.13-rc4 config (157.82 KB, text/plain)
2014-01-14 04:56 UTC, Zwane Mwaikambo
Details
moleculars xhci-enabled kernel config regarding comment #36 (139.91 KB, application/octet-stream)
2014-01-14 11:00 UTC, molecular
Details
attachment-25476-0.html (1.61 KB, text/html)
2014-01-31 17:56 UTC, Bing Zhao
Details
wireless-testing-dmesg-fail-signal (85.71 KB, application/octet-stream)
2014-01-31 17:56 UTC, Reyad Attiyat
Details
DMESG mwifiex crash WL Extra Debug (171.32 KB, text/plain)
2014-02-05 15:41 UTC, Reyad Attiyat
Details
DMESG mwifiex crash WL Extra Debug 2 (172.72 KB, text/plain)
2014-02-06 17:38 UTC, Reyad Attiyat
Details

Description Reyad Attiyat 2013-10-31 07:54:09 UTC
On the Microsoft Surface pro 2 the marvell wireless usb W8797 doesn't create an interface.

lsusb:
1286:2044 Marvell Semiconductor, Inc.

dmesg:
[   11.989538] usb 2-2: USB disconnect, device number 3
[   12.703796] usb 2-2: new high-speed USB device number 10 using xhci_hcd
[   12.742469] usb 2-2: New USB device found, idVendor=1286, idProduct=2044
[   12.742474] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   12.742478] usb 2-2: Product: Bluetooth and Wireless LAN Composite Device
[   12.742480] usb 2-2: Manufacturer: Marvell
[   12.742482] usb 2-2: SerialNumber: 2818785876D3
[   12.747749] usb 2-2: ep 0x4 - rounding interval to 128 microframes, ep desc says 255 microframes


[   22.761596] usb 2-2: WLAN FW is active
[   22.862239] usbcore: registered new interface driver btusb
[   32.552479] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1383204595.95925) = 0xa9, act = 0x0
[   32.552522] usb 2-2: num_data_h2c_failure = 0
[   32.552538] usb 2-2: num_cmd_h2c_failure = 0
[   32.552552] usb 2-2: num_cmd_timeout = 1
[   32.552566] usb 2-2: num_tx_timeout = 0
[   32.552580] usb 2-2: last_cmd_index = 1
[   32.552593] usb 2-2: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
[   32.552604] usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[   32.552622] usb 2-2: last_cmd_resp_index = 0
[   32.552633] usb 2-2: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
[   32.552645] usb 2-2: last_event_index = 0
[   32.552655] usb 2-2: last_event: 00 00 00 00 00 00 00 00 00 00
[   32.552667] usb 2-2: data_sent=0 cmd_sent=0
[   32.552680] usb 2-2: ps_mode=0 ps_state=0
[   32.553270] usb 2-2: cmd timeout

Is it pssoible this can work with the current driver or firmwaare?
Comment 1 Bing Zhao 2013-11-04 21:03:06 UTC
Driver gets command timeout on command 0xa9 which is the first command after firmware is downloaded.
Does the command timeout happen every time you reboot the Surface Pro 2 or happends occasionally?
Comment 2 Reyad Attiyat 2013-11-05 01:51:26 UTC
(In reply to Bing Zhao from comment #1)

> Does the command timeout happen every time you reboot the Surface Pro 2 or
> happends occasionally?

The timeout happens as soon as I boot/load the module. 
Does this timeout mean that the firmware is not accepted by the hardware? Could I try a newer firmware somehow. Is it possible to extract it from the windows driver or do you have to snoop the usb packets?

I'm really curious in to how to debug this wireless card further.

I upgraded to 3.12-rc7. I noticed I get a segmentation fault when trying to unload mwifiex_usb when it is in the above state. This might have to do with this error I saw in the debug output:

    mwifiex_usb: mwifiex_usb_disconnect: card or card->adapter is NULL


I will attach a dmesg with the backtrace.
Comment 3 Reyad Attiyat 2013-11-05 01:54:13 UTC
Created attachment 113401 [details]
dmesg output seg fault on unload
Comment 4 Bing Zhao 2013-11-05 04:25:05 UTC
The firmware is accepted by the hardware as you have got this message in your description.
[   22.761596] usb 2-2: WLAN FW is active

In attachment 113401 [details] (Comment 3) I don't see the mwifiex driver gets loaded. Instead, I saw this error and that's probably the reason.
[   48.869771] usb 2-2: can't set config #1, error -110

There seems to be a problem on the USB interface.
I noticed that you are using USB XHCI. Is it possible that you could try disabling CONFIG_USB_XHCI_HCD?

mwifiex_usb_disconnect message isn't a problem. I have that too.
Comment 5 Reyad Attiyat 2013-11-05 05:16:58 UTC
(In reply to Bing Zhao from comment #4)
> There seems to be a problem on the USB interface.
> I noticed that you are using USB XHCI. Is it possible that you could try
> disabling CONFIG_USB_XHCI_HCD

Disabling XHCI did work. Thank You.

My dmesg output now gets flooded with this when I use the wireless card:
usb 1-1.2: data: -ENOSR is returned

Is this related to the usb controller or this module.

dmesg output with xhci disabled
[    3.669470] usb 1-1.2: new high-speed USB device number 3 using ehci-pci
[    3.764560] usb 1-1.2: New USB device found, idVendor=1286, idProduct=2043
[    3.764565] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    3.764569] usb 1-1.2: Product: Marvell Wireless Device
[    3.764571] usb 1-1.2: Manufacturer: Marvell
[    3.764574] usb 1-1.2: SerialNumber: 0000000000000000
[   13.508639] usb 1-1.2: USB disconnect, device number 3
[   14.199811] usb 1-1.2: new high-speed USB device number 8 using ehci-pci
[   14.290891] usb 1-1.2: New USB device found, idVendor=1286, idProduct=2044
[   14.290896] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   14.290899] usb 1-1.2: Product: Bluetooth and Wireless LAN Composite Device
[   14.290902] usb 1-1.2: Manufacturer: Marvell
[   14.290904] usb 1-1.2: SerialNumber: 2818785876D3
[   14.305308] usb 1-1.2: WLAN FW is active
[   14.894582] usb 1-1.2: driver_version = mwifiex 1.0 (14.69.11.p179) 
[  683.341211] usb 1-1.2: data: -ENOSR is returned
[  861.333247] usb 1-1.2: data: -ENOSR is returned
[  862.498962] usb 1-1.2: data: -ENOSR is returned
[  862.499465] usb 1-1.2: data: -ENOSR is returned
[  862.606033] usb 1-1.2: data: -ENOSR is returned
[  862.606400] usb 1-1.2: data: -ENOSR is returned
Comment 6 Bing Zhao 2013-11-05 20:15:55 UTC
That's great! Thanks.
The "data: -ENOSR is returned" messages are printed when all USB URBs are busy on sending TX data. It usually happens when running heavier tranffic but shouldn't be problem though.
Do you actually get any issue with these messages?
Comment 7 jon 2013-11-13 01:48:05 UTC
Mr. Zhao,
I can reproduce the same symptoms and failures on several recent kernels on a surface pro 2.  Any thoughts on when/how the conflict with xhci might get resolved?

Thanks for the assistance.
Comment 8 Bing Zhao 2013-11-13 03:03:52 UTC
The current workaround is to disable CONFIG_USB_XHCI_HCD as mentioned in Comment #4. The ultimate solution would be getting a firmware update. I will post the new firmware to linux-firmware list as soon as it's available.
Comment 9 perpe 2013-11-28 16:41:51 UTC
The problem is the autosuspend of the driver.
Removing line 518 in usb.c (usb_disable_autosuspend(card->udev);) and line 579 (  .supports_autosuspend = 1,) solved my problems (also own a Surface Pro 2)
The Surface Pro 2 needs XHCI, without it suspend/resume doesn't work (USB doesn't wake up). After removing these 2 lines, you should enable XHCI again (I did it as a module), wifi still works.
Comment 10 perpe 2013-11-29 12:00:02 UTC
Created attachment 116701 [details]
dmesg output with removed autosuspend
Comment 11 perpe 2013-11-29 12:08:29 UTC
My quick fix isn't the right way to solve it, but it may help you to do it the right way. I have attached a dmesg output with my fix. On resume dmesg tells me that the module misses a reset_resume function and with phy device is lost and a new one with an incremental number is created (phy0->phy1)
Without my fix the problems were:
- wifi only works without xhci (but the Surface Pro 2 needs xhci for proper sleep/resume handling)
-on battery mode shut down hung (removing usb_disable_autosuspend(card->udev); solved this)
- on power supply remove/battery mode wifi was suspended (removing .supports_autosuspend = 1, solved this)
Comment 12 Bing Zhao 2013-12-02 23:20:00 UTC
Thanks for providing the information. We are trying to get a new firmware to fix the XHCI problem. I will notify here as soon as the new firmware is available. We will also check the autosuspend issues you reported.
Comment 13 molecular 2013-12-08 08:23:33 UTC
I registered so I can watch this thread.

I'm running with CONFIG_USB_XHCI_HCD disabled and am experiencing very annoying and weird problems (don't know if related) with powersave/sleep/wakeup.

Can't wait for the firmware update and will likely try the powersave disabling fix above once I figure out how to install modules correctly on arch (mkinicpio not loading my modules)
Comment 14 perpe 2013-12-11 18:55:34 UTC
Bing Zhao, 
the Surface Pro 2 uses the USB8797_PID_2 version, this is the one without firmware download. I don't know how a new firmware can solve the problems.
Comment 15 Bing Zhao 2013-12-13 00:45:36 UTC
USB8797_PID_1 is the one without firmware downloaded. After firmware is downloaded successfully, USB8797_PID_1 is disconnected and USB8797_PID_2 is created by firmware.

I saw a similar issue with xhci controller on an embedded platform where Bluetooth is over USB. That problem had been fixed by firmware change.
So I'm guessing that it could be the same problem here although this is WLAN over USB.

I have tested USB8797 on my Lenovo T430 laptop with a USB 3.0 port and XHCI enabled in kernel. The issue mentioned in bug description and comment 4 are not reproducible on this platform with the upstream firmware 14.69.11.p179. However I couldn't test suspend/resume because this T430 laptop resumes into a black screen with and without USB8797 card.

On my other laptop Dell E6410, there are only USB 2.0 ports. USB8797 also works fine as expected. The suspend/resume also works with and without the two changes mentioned in comment 9. I don't see any problem on running on battery either.

Perhaps this is something specific to Surface Pro 2. I don't have such a device though. Could you attach your kernel .config?
Comment 16 molecular 2013-12-15 09:54:50 UTC
I can't get the mwifiex_usb to work even with the patch by perpe from comment 9.

This is on a surface pro 2 using a kernel 3.12.5, with CONFIG_USB_XHCI_HCD=m and patch from commet 9 applied to drivers/net/wireless/mwifiex/usb.c.

Most notable excerpts from dmesg:

[   11.784275] usb 2-2: USB disconnect, device number 2
[   11.784341] mwifiex_usb: mwifiex_usb_disconnect: card or card->adapter is NULL

[   22.524784] usb 2-2: WLAN FW is active
[   22.544784] Bluetooth: Core ver 2.16
[   22.544802] NET: Registered protocol family 31
[   22.544804] Bluetooth: HCI device and connection manager initialized
[   22.544810] Bluetooth: HCI socket layer initialized
[   22.544812] Bluetooth: L2CAP socket layer initialized
[   22.544817] Bluetooth: SCO socket layer initialized
[   22.546271] usbcore: registered new interface driver btusb
[   32.522236] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1387100407.490678) = 0xa9, act = 0x0
[   32.522251] usb 2-2: num_data_h2c_failure = 0
[   32.522255] usb 2-2: num_cmd_h2c_failure = 0
[   32.522259] usb 2-2: num_cmd_timeout = 1
[   32.522262] usb 2-2: num_tx_timeout = 0
[   32.522265] usb 2-2: last_cmd_index = 1
[   32.522269] usb 2-2: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
[   32.522273] usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[   32.522276] usb 2-2: last_cmd_resp_index = 0
[   32.522280] usb 2-2: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
[   32.522283] usb 2-2: last_event_index = 0
[   32.522286] usb 2-2: last_event: 00 00 00 00 00 00 00 00 00 00
[   32.522290] usb 2-2: data_sent=0 cmd_sent=0
[   32.522293] usb 2-2: ps_mode=0 ps_state=0
[   32.522300] usb 2-2: cmd timeout

Is there anything I can do to help solve this?
Comment 17 Bing Zhao 2013-12-16 18:07:20 UTC
The xhci issue is there regardless the changes from comment 9.
I will order a Surface Pro 2 for debugging this issue.
Comment 18 Reyad Attiyat 2013-12-16 19:14:26 UTC
I have also tried perpe patch however I'm unable to get the Surface to suspend/hibernate even if I disable mwifiex_usb.
Comment 19 Bing Zhao 2013-12-16 19:44:36 UTC
@Reyad, when you say "disable mwifiex_usb", do you mean that the mwifiex_usb is disabled in kernel .config (or mwifiex.ko and mwifiex_usb.ko are not loaded at all)? If that is the case, perpe's patches won't take effect and there could be generic suspend/hibernate issues with Surface Pro 2.
Can anyone else confirm the same behavir (suspend/hibernate doesn't work even without usb8797 driver), perpe, molecular, jon? Thanks!
Comment 20 Reyad Attiyat 2013-12-16 20:59:39 UTC
@Bing I have tested both ehci and xhci with mwifiex_usb compiled as a module and then unloaded before testing suspend. Should I try removing them from .config? I have attached two dmesg.

The first is with vanilla kenrel 3.11 mwifiex unloaded and xhci loaded as a module. The device does suspend I think although I do get lot of messages about timeout and device not responding. 

The second is with vanilla kernel 3.12.5 mwifiex unloaded as well. This kernel is built without xhci and it might suspend but when it comes back no usb devices work.

I also get some message about the firmware being in a reset state
usb 1-1.2: PREP_CMD: FW in reset state

If you would like I can test other configurations or post more log output. I have even tested the newest xhci from Sarah Sharp xhci linux repo and it did not work. It did seem to get rid of the error about usb couldn't be configured but still get the same message about command time out like before.
Comment 21 Reyad Attiyat 2013-12-16 21:00:28 UTC
Created attachment 118691 [details]
DMESG 3.11 Kernel xhci suspend
Comment 22 Reyad Attiyat 2013-12-16 21:01:02 UTC
Created attachment 118701 [details]
DMESG 3.12 ehci
Comment 23 Bing Zhao 2013-12-16 23:20:36 UTC
Yes, please remove CONFIG_MWIFIEX* from .config and re-test suspend/hibernate and resume. This way USB8797 is irrelevant completely.
The "FW in reset state" messages are due to commands being blocked while unloading driver. It should allow these commands because you are not removing the device (by rfkill or something similar). But this shouldn't cause the problem here.
Comment 24 Reyad Attiyat 2013-12-18 02:56:19 UTC
I removed the driver from the kernel 3.12.5 build and tried to suspend the tablet. With EHCI I got the same result as when I used the kernel compiled with the mwifiex driver. The computer would go to a black screen and then wake it self up with no USB devices working. This should be a separate bug I'm guessing with EHCI or ACPI modules. 

With XHCI complied as a module the tablet did seem to suspend and did not wake up after a few seconds like the EHCI kernel would. The tablet also woke up correctly and USB devices worked. 

I captured the dmesg output for both. The XHCI kernel is the most verbose with some backtraces logged as well. These seem to be from XHCI driver it self.

Hope these help you. If you need anything else performed just let me know. I will try and test hibernation as well and report back.
Comment 25 Reyad Attiyat 2013-12-18 02:57:40 UTC
Created attachment 118881 [details]
DMESG 3.12.5 EHCI without MWIFIEX
Comment 26 Reyad Attiyat 2013-12-18 02:58:42 UTC
Created attachment 118891 [details]
DMESG 3.12.5 XHCI without MWIFIEX
Comment 27 Bing Zhao 2013-12-18 18:20:33 UTC
Thanks Reyad for running these tests.
The result matches perpe's comment #11 "wifi only works without xhci (but the Surface Pro 2 needs xhci for proper sleep/resume handling)".

There is nothing I need from you right now. I'm waiting for my Surface Pro 2 to arrive for further debug.
Comment 28 molecular 2013-12-22 10:18:32 UTC
Thanks Bing Zhao for jumping on this and even ordering a device. If you don't despise of it I would like to chip in a little Bitcoin donation to your purchase (give me an address if you agree)
Comment 29 Bing Zhao 2013-12-23 20:57:28 UTC
Thank you molecular for your generosity!
Don't worry about it as Marvell will purchase the device.
Comment 30 Bing Zhao 2014-01-09 02:24:06 UTC
Sorry for delay. OK, I got my Suface Pro 2 and installed Ubuntu 12.04 LTS 64-bit on it with latest mwifiex driver from wireless-testing.git tree. The kernel version is 3.13-rc7. The command timeout problems also happen with original usb8797 firmware image from linux-firmware.git tree.

I've uploaded an updated usb8797 firmware image (v14.68.29.p26) here:

git://git.marvell.com/mwifiex-firmware.git master

The command timeout seems resolved now with this new firmware.
I also tested suspend/resume briefly. It works fine for me except for one time I got system freeze after system resumed back. After power cycled my Suface Pro 2 I haven't had any issue since.

Please get the new firmware image and try on your system. Thanks!

If this version works for you, I'll post a pull-request to upstream.
Comment 31 Reyad Attiyat 2014-01-09 05:22:06 UTC
Just tested the new firmware on kernel 3.13-rc7 with Fedora 20 and can report the same results as you have. There is no issue starting the mwifiex driver when XHCI usb driver is enabled.

Thanks again Bing Zhao, and everyone else who helped, for all your time on this issue. You've made this device a lot more functional under linux.
Comment 32 joakim verona 2014-01-10 15:16:31 UTC
It works for me now also, with the same setup as Reyad.

Thanks!
Comment 33 Zwane Mwaikambo 2014-01-11 07:16:57 UTC
Thanks for the updated firmware! It works for me too using the 3.13.0-rc4 mwifiex_usb driver. However i do get the USB wlan device stop responding with autosuspend enabled. I have no modified the btusb and mwifiex_usb drivers to not advertise supports_autosuspend and it is currently more stable.

[ 2049.357261] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command
[ 2049.567264] usb 2-2: Device not responding to set address.
[ 2049.777019] usb 2-2: device not accepting address 8, error -71
[ 2049.896957] usb 2-2: new high-speed USB device number 9 using xhci_hcd
[ 2054.894199] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command
[ 2055.104179] usb 2-2: Device not responding to set address.
[ 2055.313949] usb 2-2: device not accepting address 9, error -71
[ 2055.433909] usb 2-2: new high-speed USB device number 10 using xhci_hcd
[ 2060.431100] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command
[ 2060.641045] usb 2-2: Device not responding to set address.
[ 2060.850838] usb 2-2: device not accepting address 10, error -71
[ 2060.970787] usb 2-2: new high-speed USB device number 11 using xhci_hcd
[ 2065.968026] xhci_hcd 0000:00:14.0: Timeout while waiting for address device command
[ 2066.178027] usb 2-2: Device not responding to set address.
[ 2066.387734] usb 2-2: device not accepting address 11, error -71
[ 2066.387776] hub 2-0:1.0: unable to enumerate USB device on port 2
[ 2157.978216] usbcore: deregistering interface driver usb8797
[ 2184.009739] usbcore: registered new interface driver usb8797
Comment 34 molecular 2014-01-11 08:26:06 UTC
(In reply to joakim verona from comment #32)
> It works for me now also, with the same setup as Reyad.
> 
> Thanks!

New firmware is looking good for me (on kernel 3.12.5)

Thank you so much Bing Zhao, I think you can push this.

here's dmesg of boot:

[   22.478953] usb 2-2: USB disconnect, device number 2
[   22.479023] mwifiex_usb: mwifiex_usb_disconnect: card or card->adapter is NULL
[   23.208529] usb 2-2: new high-speed USB device number 7 using xhci_hcd
[   23.223365] usb 2-2: ep 0x4 - rounding interval to 128 microframes, ep desc says 255 microframes
[   23.226678] usb 2-2: WLAN FW is active
[   23.246049] Bluetooth: Core ver 2.16
[   23.246066] NET: Registered protocol family 31
[   23.246067] Bluetooth: HCI device and connection manager initialized
[   23.246075] Bluetooth: HCI socket layer initialized
[   23.246077] Bluetooth: L2CAP socket layer initialized
[   23.246081] Bluetooth: SCO socket layer initialized
[   23.247479] usbcore: registered new interface driver btusb
[   23.475132] ieee80211 phy0: ignoring F/W country code EU
[   23.475399] usb 2-2: driver_version = mwifiex 1.0 (14.68.29.p26) 
[   25.870382] usb 2-1: new low-speed USB device number 8 using xhci_hcd
[   25.887816] usb 2-1: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
[   25.890060] input: PixArt USB Optical Mouse as /devices/pci0000:00/0000:00:14.0/usb2/2-1/2-1:1.0/input/input15
[   25.890199] hid-generic 0003:093A:2510.0006: input,hidraw4: USB HID v1.11 Mouse [PixArt USB Optical Mouse] on usb-0000:00:14.0-1/input0
[   30.816289] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj.
[   31.050863] fuse init (API version 7.22)

and here what happens on "echo mem > /sys/power/state", wait, press power button:

[  158.445370] PM: Syncing filesystems ... done.
[  158.459174] PM: Preparing system for mem sleep
[  158.459407] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  158.460711] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[  158.461541] PM: Entering mem sleep
[  158.461806] Suspending console(s) (use no_console_suspend to debug)
[  158.465499] usb 2-2: None of the WOWLAN triggers enabled
[  158.465732] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  158.465799] sd 0:0:0:0: [sda] Stopping disk
[  158.544909] mei_me 0000:00:16.0: suspend
[  168.472749] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1389428422.608831) = 0x28, act = 0x13
[  168.472751] usb 2-2: num_data_h2c_failure = 0
[  168.472752] usb 2-2: num_cmd_h2c_failure = 0
[  168.472753] usb 2-2: num_cmd_timeout = 1
[  168.472754] usb 2-2: num_tx_timeout = 0
[  168.472755] usb 2-2: last_cmd_index = 0
[  168.472756] usb 2-2: last_cmd_id: 28 00 16 00 24 00 28 00 e5 00
[  168.472757] usb 2-2: last_cmd_act: 13 00 00 00 00 1f 13 00 01 00
[  168.472759] usb 2-2: last_cmd_resp_index = 4
[  168.472760] usb 2-2: last_cmd_resp_id: 7f 80 16 80 24 80 28 80 e5 80
[  168.472761] usb 2-2: last_event_index = 3
[  168.472762] usb 2-2: last_event: 00 00 17 00 2b 00 33 00 00 00
[  168.472763] usb 2-2: data_sent=0 cmd_sent=0
[  168.472764] usb 2-2: ps_mode=1 ps_state=0
[  168.472768] usb 2-2: cmd timeout
[  168.472792] ieee80211 phy0: deleting the crypto keys
[  168.472796] usb 2-2: PREP_CMD: device in suspended state
[  168.472798] ieee80211 phy0: deleting the crypto keys
[  168.472799] usb 2-2: PREP_CMD: device in suspended state
[  168.472800] ieee80211 phy0: deleting the crypto keys
[  168.472801] usb 2-2: PREP_CMD: device in suspended state
[  168.472802] ieee80211 phy0: deleting the crypto keys
[  168.472803] usb 2-2: PREP_CMD: device in suspended state
[  168.472804] ieee80211 phy0: deleting the crypto keys
[  168.472828] cfg80211: Calling CRDA to update world regulatory domain
[  168.473025] PM: suspend of devices complete after 10016.336 msecs
[  168.473298] PM: late suspend of devices complete after 0.270 msecs
[  168.473573] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
[  168.499442] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[  168.513147] PM: noirq suspend of devices complete after 39.868 msecs
[  168.513515] ACPI: Preparing to enter system sleep state S3
[  168.513814] PM: Saving platform NVS memory
[  168.517798] Disabling non-boot CPUs ...
[  168.619321] smpboot: CPU 1 is now offline
[  168.722598] smpboot: CPU 2 is now offline
[  168.825873] smpboot: CPU 3 is now offline
[  168.827816] ACPI: Low-level resume complete
[  168.827881] PM: Restoring platform NVS memory
[  168.829241] Enabling non-boot CPUs ...
[  168.829290] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  168.844018] Intel pstate controlling: cpu 1
[  168.844084] CPU1 is up
[  168.844109] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  168.858816] Intel pstate controlling: cpu 2
[  168.858864] CPU2 is up
[  168.858883] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  168.873495] Intel pstate controlling: cpu 3
[  168.873541] CPU3 is up
[  168.879904] ACPI: Waking up from system sleep state S3
[  168.918926] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
[  168.958938] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
[  168.972489] PM: noirq resume of devices complete after 92.069 msecs
[  168.972851] PM: early resume of devices complete after 0.329 msecs
[  168.972909] i915 0000:00:02.0: setting latency timer to 64
[  168.972920] xhci_hcd 0000:00:14.0: setting latency timer to 64
[  168.973048] mei_me 0000:00:16.0: irq 58 for MSI/MSI-X
[  168.973204] ahci 0000:00:1f.2: setting latency timer to 64
[  168.973519] snd_hda_intel 0000:00:1b.0: irq 60 for MSI/MSI-X
[  168.973624] ehci-pci 0000:00:1d.0: setting latency timer to 64
[  168.973937] snd_hda_intel 0000:00:03.0: irq 61 for MSI/MSI-X
[  169.272251] usb 2-2: received data processing failed!
[  169.292050] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  169.294589] ata1.00: ACPI cmd ef/10:09:00:00:00:b0 (SET FEATURES) succeeded
[  169.295206] ata1.00: ACPI cmd ef/10:09:00:00:00:b0 (SET FEATURES) succeeded
[  169.295422] ata1.00: configured for UDMA/133
[  169.295492] sd 0:0:0:0: [sda] Starting disk
[  169.432185] usb 2-4: reset full-speed USB device number 4 using xhci_hcd
[  169.447835] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880212491a00
[  169.447836] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880212491a40
[  169.447851] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800da8bc480
[  169.605292] usb 2-6: reset high-speed USB device number 6 using xhci_hcd
[  169.632990] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880212746bc0
[  169.778655] usb 2-5: reset high-speed USB device number 5 using xhci_hcd
[  169.806463] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800da9c0880
[  169.871192] PM: resume of devices complete after 898.825 msecs
[  169.871448] PM: Finishing wakeup.
[  169.872563] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  169.872576] pci 0000:00:00.0: no hotplug settings from platform
[  169.872579] pci 0000:00:00.0: using default PCI settings
[  169.872586] i915 0000:00:02.0: no hotplug settings from platform
[  169.872588] i915 0000:00:02.0: using default PCI settings
[  169.872595] snd_hda_intel 0000:00:03.0: no hotplug settings from platform
[  169.872600] xhci_hcd 0000:00:14.0: no hotplug settings from platform
[  169.872602] xhci_hcd 0000:00:14.0: using default PCI settings
[  169.872612] mei_me 0000:00:16.0: no hotplug settings from platform
[  169.872614] mei_me 0000:00:16.0: using default PCI settings
[  169.872624] snd_hda_intel 0000:00:1b.0: no hotplug settings from platform
[  169.872628] ehci-pci 0000:00:1d.0: no hotplug settings from platform
[  169.872630] ehci-pci 0000:00:1d.0: using default PCI settings
[  169.872641] lpc_ich 0000:00:1f.0: no hotplug settings from platform
[  169.872643] lpc_ich 0000:00:1f.0: using default PCI settings
[  169.872655] ahci 0000:00:1f.2: no hotplug settings from platform
[  169.872658] ahci 0000:00:1f.2: using default PCI settings
[  169.872666] pci 0000:00:1f.3: no hotplug settings from platform
[  169.872668] pci 0000:00:1f.3: using default PCI settings
[  169.871449] Restarting tasks ... done.
[  169.873343] video LNXVIDEO:00: Restoring backlight state
[  169.873734] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[  169.873744] pci 0000:00:00.0: no hotplug settings from platform
[  169.873746] pci 0000:00:00.0: using default PCI settings
[  169.873753] i915 0000:00:02.0: no hotplug settings from platform
[  169.873755] i915 0000:00:02.0: using default PCI settings
[  169.873762] snd_hda_intel 0000:00:03.0: no hotplug settings from platform
[  169.873766] xhci_hcd 0000:00:14.0: no hotplug settings from platform
[  169.873768] xhci_hcd 0000:00:14.0: using default PCI settings
[  169.873777] mei_me 0000:00:16.0: no hotplug settings from platform
[  169.873779] mei_me 0000:00:16.0: using default PCI settings
[  169.873789] snd_hda_intel 0000:00:1b.0: no hotplug settings from platform
[  169.873793] ehci-pci 0000:00:1d.0: no hotplug settings from platform
[  169.873795] ehci-pci 0000:00:1d.0: using default PCI settings
[  169.873805] lpc_ich 0000:00:1f.0: no hotplug settings from platform
[  169.873807] lpc_ich 0000:00:1f.0: using default PCI settings
[  169.873819] ahci 0000:00:1f.2: no hotplug settings from platform
[  169.873821] ahci 0000:00:1f.2: using default PCI settings
[  169.873829] pci 0000:00:1f.3: no hotplug settings from platform
[  169.873832] pci 0000:00:1f.3: using default PCI settings
[  169.890443] cfg80211: World regulatory domain updated:
[  169.890447] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  169.890449] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[  169.890451] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[  169.890453] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
[  169.890455] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm)
[  169.890457] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm)
[  169.890459] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm)
[  169.890518] cfg80211: Calling CRDA for country: DE
[  169.896332] cfg80211: Regulatory domain changed to country: DE
[  169.896336] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  169.896337] cfg80211:   (2400000 KHz - 2483500 KHz @ 40000 KHz), (N/A, 2000 mBm)
[  169.896339] cfg80211:   (5150000 KHz - 5350000 KHz @ 80000 KHz), (N/A, 2000 mBm)
[  169.896340] cfg80211:   (5470000 KHz - 5725000 KHz @ 80000 KHz), (N/A, 2698 mBm)
[  169.896341] cfg80211:   (57240000 KHz - 65880000 KHz @ 2160000 KHz), (N/A, 4000 mBm)
[  170.234875] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

What else to test?
Comment 35 Zwane Mwaikambo 2014-01-13 05:02:21 UTC
I had the mwifiex hardware stop responding again, this time with the following output; Alternatively it could be an entirely different mwifiex_usb bug.

[ 5351.415748] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1389588459.840482) = 0xa4, act = 0x0
[ 5351.415752] usb 2-2: num_data_h2c_failure = 0
[ 5351.415753] usb 2-2: num_cmd_h2c_failure = 1
[ 5351.415755] usb 2-2: num_cmd_timeout = 3
[ 5351.415756] usb 2-2: num_tx_timeout = 0
[ 5351.415757] usb 2-2: last_cmd_index = 4
[ 5351.415758] usb 2-2: last_cmd_id: 16 00 a4 00 7f 00 16 00 a4 00
[ 5351.415760] usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[ 5351.415761] usb 2-2: last_cmd_resp_index = 1
[ 5351.415762] usb 2-2: last_cmd_resp_id: 7f 80 16 80 7f 80 16 80 a4 80
[ 5351.415763] usb 2-2: last_event_index = 0
[ 5351.415765] usb 2-2: last_event: 2b 00 2b 00 17 00 2b 00 17 00
[ 5351.415766] usb 2-2: data_sent=0 cmd_sent=0
[ 5351.415767] usb 2-2: ps_mode=1 ps_state=0
[ 5351.415774] usb 2-2: cmd timeout
[ 5351.415785] usb 2-2: failed to get signal information
[ 5361.430164] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1389588469.860464) = 0xa4, act = 0x0                                                                                       
[ 5361.430168] usb 2-2: num_data_h2c_failure = 0                                                                                                                                             
[ 5361.430170] usb 2-2: num_cmd_h2c_failure = 1                                                                                                                                              
[ 5361.430171] usb 2-2: num_cmd_timeout = 4                                                                                                                                                  
[ 5361.430173] usb 2-2: num_tx_timeout = 0                                                                                                                                                   
[ 5361.430174] usb 2-2: last_cmd_index = 0                                                                                                                                                   
[ 5361.430176] usb 2-2: last_cmd_id: a4 00 a4 00 7f 00 16 00 a4 00                                                                                                                           
[ 5361.430177] usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00                                                                                                                          
[ 5361.430178] usb 2-2: last_cmd_resp_index = 1                                                                                                                                              
[ 5361.430180] usb 2-2: last_cmd_resp_id: 7f 80 16 80 7f 80 16 80 a4 80                                                                                                                      
[ 5361.430181] usb 2-2: last_event_index = 0
[ 5361.430182] usb 2-2: last_event: 2b 00 2b 00 17 00 2b 00 17 00
[ 5361.430184] usb 2-2: data_sent=0 cmd_sent=1
[ 5361.430185] usb 2-2: ps_mode=1 ps_state=0
[ 5361.430190] usb 2-2: cmd timeout
[ 5361.430196] usb 2-2: failed to get signal information
[ 5361.430222] ------------[ cut here ]------------
[ 5361.430229] WARNING: CPU: 2 PID: 4939 at drivers/usb/core/urb.c:336 usb_submit_urb+0x46e/0x480()
[ 5361.430231] URB ffff8802133c7a80 submitted while active
[ 5361.430233] Modules linked in: hid_microsoft btusb(F) vmnet(OF) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OF) snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp hid_sensor_als coretemp hid_sensor_accel_3d crct10dif_pclmul hid_sensor_gyro_3d hid_sensor_magn_3d crc32_pclmul hid_sensor_trigger industrialio_triggered_buffer ghash_clmulni_intel kfifo_buf aesni_intel mwifiex_usb(F) industrialio hid_sensor_iio_common mwifiex(F) aes_x86_64 lrw i915 gf128mul glue_helper ablk_helper joydev cryptd hid_sensor_hub snd_hda_codec_realtek hid_multitouch hid_generic cfg80211 snd_hda_intel uvcvideo snd_hda_codec tpm_infineon videobuf2_vmalloc videobuf2_memops videobuf2_core rfcomm bnep snd_hwdep videodev bluetooth parport_pc ppdev snd_pcm snd_page_alloc drm_kms_helper snd_seq_midi snd_seq_midi_event snd_rawmidi drm snd_seq usbhid hid snd_seq_device snd_timer microcode snd lpc_ich mei_me video mei mac_hid tpm_tis soundcore i2c_algo_bit binfmt_misc lp parport nls_iso8859_1 xhci_hcd ahci libahci
[ 5361.430289] CPU: 2 PID: 4939 Comm: kworker/u9:0 Tainted: GF         IO 3.13.0-rc4-s2pro+ #6
[ 5361.430290] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
[ 5361.430296] Workqueue: MWIFIEX_WORK_QUEUE mwifiex_main_work_queue [mwifiex]
[ 5361.430298]  0000000000000009 ffff880207df9c40 ffffffff81708165 ffff880207df9c88
[ 5361.430301]  ffff880207df9c78 ffffffff8105de5c ffff8802133c7a80 ffff8801f222e000
[ 5361.430304]  0000000000000001 ffff8801f1f54800 ffff88020a4f09bc ffff880207df9cd8
[ 5361.430306] Call Trace:
[ 5361.430311]  [<ffffffff81708165>] dump_stack+0x45/0x56
[ 5361.430315]  [<ffffffff8105de5c>] warn_slowpath_common+0x8c/0xc0
[ 5361.430318]  [<ffffffff8105df4c>] warn_slowpath_fmt+0x4c/0x50
[ 5361.430321]  [<ffffffff81548e3e>] usb_submit_urb+0x46e/0x480
[ 5361.430324]  [<ffffffff8101b3c9>] ? read_tsc+0x9/0x20
[ 5361.430328]  [<ffffffffa0300176>] mwifiex_usb_host_to_card+0x116/0x280 [mwifiex_usb]
[ 5361.430334]  [<ffffffffa04c1e43>] mwifiex_exec_next_cmd+0x3d3/0x530 [mwifiex]
[ 5361.430338]  [<ffffffff81094ddf>] ? __dequeue_entity+0x2f/0x50
[ 5361.430343]  [<ffffffffa04bdb90>] mwifiex_main_process+0x500/0x540 [mwifiex]
[ 5361.430348]  [<ffffffffa04bdbf0>] mwifiex_main_work_queue+0x20/0x30 [mwifiex]
[ 5361.430352]  [<ffffffff810798c8>] process_one_work+0x178/0x430
[ 5361.430355]  [<ffffffff8107aed1>] worker_thread+0x121/0x3e0
[ 5361.430358]  [<ffffffff8107adb0>] ? manage_workers.isra.20+0x300/0x300
[ 5361.430361]  [<ffffffff81081739>] kthread+0xc9/0xe0
[ 5361.430365]  [<ffffffff81010000>] ? perf_trace_xen_mc_callback+0xe0/0xe0
[ 5361.430367]  [<ffffffff81081670>] ? flush_kthread_worker+0x80/0x80
[ 5361.430371]  [<ffffffff8171883c>] ret_from_fork+0x7c/0xb0
[ 5361.430374]  [<ffffffff81081670>] ? flush_kthread_worker+0x80/0x80
[ 5361.430376] ---[ end trace 19e97cfbe3854cf4 ]---
[ 5361.430378] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 5361.430380] usb 2-2: DNLD_CMD: host to card failed
[ 5361.430388] usb 2-2: failed to get signal information
[ 5361.430419] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 5361.430422] usb 2-2: DNLD_CMD: host to card failed
[ 5361.430428] usb 2-2: failed to get signal information
[ 5361.430517] usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
[ 5361.430519] usb 2-2: DNLD_CMD: host to card failed
[ 5361.430526] usb 2-2: failed to get signal information
Comment 36 molecular 2014-01-13 12:39:21 UTC
I'm seeing problems with the new firmware (after it worked seemingly flawlessly for 2 days). After about 3 hours of working with the machine, wifi stopped working.

There's a burst of about 7200 identical lines like this:

  Jan 13 12:37:37 dose kernel: usb 2-2: data: -ENOSR is returned

within timespan of about 2 seconds.

After that:

Jan 13 12:38:21 dose kernel: usb 2-2: data: -ENOSR is returned
Jan 13 12:38:53 dose kernel: usb 2-2: data: -ENOSR is returned
Jan 13 12:40:25 dose kernel: usb 2-2: data: -ENOSR is returned
Jan 13 12:40:25 dose kernel: usb 2-2: data: -ENOSR is returned
Jan 13 12:52:27 dose kernel: usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1389613947.580533) = 0xa4, act = 0x0
Jan 13 12:52:27 dose kernel: usb 2-2: num_data_h2c_failure = 0
Jan 13 12:52:27 dose kernel: usb 2-2: num_cmd_h2c_failure = 0
Jan 13 12:52:27 dose kernel: usb 2-2: num_cmd_timeout = 1
Jan 13 12:52:27 dose kernel: usb 2-2: num_tx_timeout = 0
Jan 13 12:52:27 dose kernel: usb 2-2: last_cmd_index = 4
Jan 13 12:52:27 dose kernel: usb 2-2: last_cmd_id: 16 00 a4 00 7f 00 16 00 a4 00
Jan 13 12:52:27 dose kernel: usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
Jan 13 12:52:27 dose kernel: usb 2-2: last_cmd_resp_index = 3
Jan 13 12:52:27 dose kernel: usb 2-2: last_cmd_resp_id: 16 80 a4 80 7f 80 16 80 7f 80
Jan 13 12:52:27 dose kernel: usb 2-2: last_event_index = 0
Jan 13 12:52:27 dose kernel: usb 2-2: last_event: 33 00 17 00 2b 00 33 00 33 00
Jan 13 12:52:27 dose kernel: usb 2-2: data_sent=0 cmd_sent=0
Jan 13 12:52:27 dose kernel: usb 2-2: ps_mode=1 ps_state=0
Jan 13 12:52:27 dose kernel: usb 2-2: cmd timeout
Jan 13 12:52:27 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:52:34 dose kernel: usb 2-2: data: -ENOSR is returned
Jan 13 12:52:38 dose kernel: usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1389613958.113865) = 0xa4, act = 0x0
Jan 13 12:52:38 dose kernel: usb 2-2: num_data_h2c_failure = 0
Jan 13 12:52:38 dose kernel: usb 2-2: num_cmd_h2c_failure = 0
Jan 13 12:52:38 dose kernel: usb 2-2: num_cmd_timeout = 2
Jan 13 12:52:38 dose kernel: usb 2-2: num_tx_timeout = 0
Jan 13 12:52:38 dose kernel: usb 2-2: last_cmd_index = 0
Jan 13 12:52:38 dose kernel: usb 2-2: last_cmd_id: a4 00 a4 00 7f 00 16 00 a4 00
Jan 13 12:52:38 dose kernel: usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
Jan 13 12:52:38 dose kernel: usb 2-2: last_cmd_resp_index = 3
Jan 13 12:52:38 dose kernel: usb 2-2: last_cmd_resp_id: 16 80 a4 80 7f 80 16 80 7f 80
Jan 13 12:52:38 dose kernel: usb 2-2: last_event_index = 0
Jan 13 12:52:38 dose kernel: usb 2-2: last_event: 33 00 17 00 2b 00 33 00 33 00
Jan 13 12:52:38 dose kernel: usb 2-2: data_sent=1 cmd_sent=1
Jan 13 12:52:38 dose kernel: usb 2-2: ps_mode=1 ps_state=0
Jan 13 12:52:38 dose kernel: usb 2-2: cmd timeout
Jan 13 12:52:38 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:52:38 dose kernel: ------------[ cut here ]------------
Jan 13 12:52:38 dose kernel: WARNING: CPU: 0 PID: 2631 at drivers/usb/core/urb.c:330 usb_submit_urb+0x45b/0x470 [usbcore]()
Jan 13 12:52:38 dose kernel: URB ffff880212c3f480 submitted while active
Jan 13 12:52:38 dose kernel: Modules linked in: fuse btusb bluetooth snd_hda_codec_hdmi x86_pkg_temp_thermal kvm_intel kvm coretemp tpm_infineon iTCO_wdt iTCO_vendor_support mwifiex_usb uvcvideo mwifiex videobuf2_vmalloc videobuf2_memops cfg80211 videobuf2_core rfkill videodev media evdev joydev microcode pcspkr i2c_i801 thermal snd_hda_codec_realtek i915 snd_hda_intel intel_agp tpm_tis intel_gtt snd_hda_codec drm_kms_helper tpm snd_hwdep tpm_bios battery snd_pcm drm snd_page_alloc snd_timer mei_me i2c_algo_bit i2c_core snd mei lpc_ich soundcore video ac processor button hid_lenovo_tpkbd hid_multitouch hid_generic hid_sensor_hub usbhid hid sd_mod crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ahci libahci aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd libata xhci_hcd
Jan 13 12:52:38 dose kernel:  ehci_pci ehci_hcd usbcore scsi_mod usb_common ext4 crc16 mbcache jbd2 dm_crypt dm_mod
Jan 13 12:52:38 dose kernel: CPU: 0 PID: 2631 Comm: kworker/0:0 Tainted: G          I  3.12.5-custom_xhci #1
Jan 13 12:52:38 dose kernel: Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
Jan 13 12:52:38 dose kernel: Workqueue: MWIFIEX_WORK_QUEUE mwifiex_main_work_queue [mwifiex]
Jan 13 12:52:38 dose kernel:  0000000000000009 ffff880200b23c48 ffffffff814eccab ffff880200b23c90
Jan 13 12:52:38 dose kernel:  ffff880200b23c80 ffffffff81062bcd ffff880212c3f480 ffff8802139e2000
Jan 13 12:52:38 dose kernel:  0000000000000001 ffff88021237b93c 0000000000008000 ffff880200b23ce0
Jan 13 12:52:38 dose kernel: Call Trace:
Jan 13 12:52:38 dose kernel:  [<ffffffff814eccab>] dump_stack+0x54/0x8d
Jan 13 12:52:38 dose kernel:  [<ffffffff81062bcd>] warn_slowpath_common+0x7d/0xa0
Jan 13 12:52:38 dose kernel:  [<ffffffff81062c3c>] warn_slowpath_fmt+0x4c/0x50
Jan 13 12:52:38 dose kernel:  [<ffffffffa0120cdb>] usb_submit_urb+0x45b/0x470 [usbcore]
Jan 13 12:52:38 dose kernel:  [<ffffffffa01b941c>] mwifiex_usb_host_to_card+0x10c/0x280 [mwifiex_usb]
Jan 13 12:52:38 dose kernel:  [<ffffffff810be0fe>] ? getnstimeofday+0xe/0x30
Jan 13 12:52:38 dose kernel:  [<ffffffffa03c8f63>] mwifiex_exec_next_cmd+0x3d3/0x530 [mwifiex]
Jan 13 12:52:38 dose kernel:  [<ffffffffa03c4c62>] mwifiex_main_process+0x4e2/0x520 [mwifiex]
Jan 13 12:52:38 dose kernel:  [<ffffffffa03c4cc0>] mwifiex_main_work_queue+0x20/0x30 [mwifiex]
Jan 13 12:52:38 dose kernel:  [<ffffffff8107dba7>] process_one_work+0x167/0x450
Jan 13 12:52:38 dose kernel:  [<ffffffff8107e5b1>] worker_thread+0x121/0x3a0
Jan 13 12:52:38 dose kernel:  [<ffffffff8107e490>] ? manage_workers.isra.23+0x2b0/0x2b0
Jan 13 12:52:38 dose kernel:  [<ffffffff81084fc0>] kthread+0xc0/0xd0
Jan 13 12:52:38 dose kernel:  [<ffffffff81084f00>] ? kthread_create_on_node+0x120/0x120
Jan 13 12:52:38 dose kernel:  [<ffffffff814fb77c>] ret_from_fork+0x7c/0xb0
Jan 13 12:52:38 dose kernel:  [<ffffffff81084f00>] ? kthread_create_on_node+0x120/0x120
Jan 13 12:52:38 dose kernel: ---[ end trace 0c32ee593cb0aad2 ]---
Jan 13 12:52:38 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:52:38 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:52:38 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:52:38 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:52:38 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:52:38 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:52:39 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:52:39 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:52:39 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:52:39 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:52:39 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:52:39 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:52:39 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:52:39 dose kernel: usb 2-2: getting data rate
Jan 13 12:52:40 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:52:40 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:52:40 dose kernel: usb 2-2: failed to get signal information

and another stacktrace shortly after:

Jan 13 12:53:22 dose kernel: ------------[ cut here ]------------
Jan 13 12:53:22 dose kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x279/0x290()
Jan 13 12:53:22 dose kernel: NETDEV WATCHDOG: mlan0 (usb): transmit queue 1 timed out
Jan 13 12:53:22 dose kernel: Modules linked in: fuse btusb bluetooth snd_hda_codec_hdmi x86_pkg_temp_thermal kvm_intel kvm coretemp tpm_infineon iTCO_wdt iTCO_vendor_support mwifiex_usb uvcvideo mwifiex videobuf2_vmalloc videobuf2_memops cfg80211 videobuf2_core rfkill videodev media evdev joydev microcode pcspkr i2c_i801 thermal snd_hda_codec_realtek i915 snd_hda_intel intel_agp tpm_tis intel_gtt snd_hda_codec drm_kms_helper tpm snd_hwdep tpm_bios battery snd_pcm drm snd_page_alloc snd_timer mei_me i2c_algo_bit i2c_core snd mei lpc_ich soundcore video ac processor button hid_lenovo_tpkbd hid_multitouch hid_generic hid_sensor_hub usbhid hid sd_mod crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel ahci libahci aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd libata xhci_hcd
Jan 13 12:53:22 dose kernel:  ehci_pci ehci_hcd usbcore scsi_mod usb_common ext4 crc16 mbcache jbd2 dm_crypt dm_mod
Jan 13 12:53:22 dose kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W I  3.12.5-custom_xhci #1
Jan 13 12:53:22 dose kernel: Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.03.0250 09/06/2013
Jan 13 12:53:22 dose kernel:  0000000000000009 ffff88021fb83d58 ffffffff814eccab ffff88021fb83da0
Jan 13 12:53:22 dose kernel:  ffff88021fb83d90 ffffffff81062bcd 0000000000000001 ffff88021340b000
Jan 13 12:53:22 dose kernel:  ffff880212356440 0000000000000004 0000000000000003 ffff88021fb83df0
Jan 13 12:53:22 dose kernel: Call Trace:
Jan 13 12:53:22 dose kernel:  <IRQ>  [<ffffffff814eccab>] dump_stack+0x54/0x8d
Jan 13 12:53:22 dose kernel:  [<ffffffff81062bcd>] warn_slowpath_common+0x7d/0xa0
Jan 13 12:53:22 dose kernel:  [<ffffffff81062c3c>] warn_slowpath_fmt+0x4c/0x50
Jan 13 12:53:22 dose kernel:  [<ffffffff810707de>] ? mod_timer_pinned+0xde/0x1b0
Jan 13 12:53:22 dose kernel:  [<ffffffff814212b9>] dev_watchdog+0x279/0x290
Jan 13 12:53:22 dose kernel:  [<ffffffff81421040>] ? dev_graft_qdisc+0x80/0x80
Jan 13 12:53:22 dose kernel:  [<ffffffff8106f6a6>] call_timer_fn+0x36/0x190
Jan 13 12:53:22 dose kernel:  [<ffffffff81421040>] ? dev_graft_qdisc+0x80/0x80
Jan 13 12:53:22 dose kernel:  [<ffffffff81070024>] run_timer_softirq+0x214/0x2d0
Jan 13 12:53:22 dose kernel:  [<ffffffff81067a3f>] __do_softirq+0xef/0x2b0
Jan 13 12:53:22 dose kernel:  [<ffffffff814fd25c>] call_softirq+0x1c/0x30
Jan 13 12:53:22 dose kernel:  [<ffffffff81016b75>] do_softirq+0x55/0x90
Jan 13 12:53:22 dose kernel:  [<ffffffff81067d56>] irq_exit+0x86/0xb0
Jan 13 12:53:22 dose kernel:  [<ffffffff814fdc05>] smp_apic_timer_interrupt+0x45/0x60
Jan 13 12:53:22 dose kernel:  [<ffffffff814fc59d>] apic_timer_interrupt+0x6d/0x80
Jan 13 12:53:22 dose kernel:  <EOI>  [<ffffffff813cb702>] ? cpuidle_enter_state+0x52/0xc0
Jan 13 12:53:22 dose kernel:  [<ffffffff813cb6f8>] ? cpuidle_enter_state+0x48/0xc0
Jan 13 12:53:22 dose kernel:  [<ffffffff813cb837>] cpuidle_idle_call+0xc7/0x280
Jan 13 12:53:22 dose kernel:  [<ffffffff8101e30e>] arch_cpu_idle+0xe/0x30
Jan 13 12:53:22 dose kernel:  [<ffffffff810b63e1>] cpu_startup_entry+0x271/0x2f0
Jan 13 12:53:22 dose kernel:  [<ffffffff81042028>] start_secondary+0x228/0x2d0
Jan 13 12:53:22 dose kernel: ---[ end trace 0c32ee593cb0aad3 ]---

and loads more of these:

Jan 13 12:54:55 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:54:55 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:54:55 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:54:55 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:54:56 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:54:56 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:54:56 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:54:56 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:54:56 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:54:56 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:54:57 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed
Jan 13 12:54:57 dose kernel: usb 2-2: DNLD_CMD: host to card failed
Jan 13 12:54:57 dose kernel: usb 2-2: failed to get signal information
Jan 13 12:54:57 dose kernel: usb 2-2: mwifiex_usb_host_to_card: usb_submit_urb failed

until I tried to reboot, which failed on shutdown (hang) as seen in this photo I took:

  http://i.imgur.com/8DWOK8H.png

I switched off the device and turned it back on. My USB keyboard wasn't working at this point: 

  http://i.imgur.com/8qpCBR2.png

Another power-cycle and everything was fine again:

  http://i.imgur.com/X74vSKB.png

(except those "usb 2-2: data: -ENOSR is returned" ocurring in bursts, but I consider these "warnings", not errors)
Comment 37 Bing Zhao 2014-01-14 03:54:05 UTC
@Zwane: how do you enable autosuspend in comment #33 and what are the steps to reproduce issue in comment #35? Could you send me your kernel .config file?

@molecular: Did you suspend/resume the device before you hit the issue in comment #36? Could you send over your kernel .config file?

I have the Surface Pro 2 now. Ideally I should be able to reproduce these problems and start debugging from there.
Comment 38 Zwane Mwaikambo 2014-01-14 04:56:11 UTC
Created attachment 121931 [details]
Zwane's kernel 3.13-rc4 config
Comment 39 Zwane Mwaikambo 2014-01-14 04:57:32 UTC
@Bing, i enabled autosuspend via powertop. The failure in comment #35 occurred after ~30 minutes of light web browsing.
Comment 40 Zwane Mwaikambo 2014-01-14 05:02:00 UTC
I found an interesting Ubuntu bug that may affect bug reports;

https://bugs.launchpad.net/ubuntu/+source/toshset/+bug/1219333

The Marvell wifi device just happens to be the same USB bus location as the 3G novatel model on Toshiba laptops.
Comment 41 molecular 2014-01-14 10:54:35 UTC
(In reply to Bing Zhao from comment #37)
> @molecular: Did you suspend/resume the device before you hit the issue in
> comment #36? Could you send over your kernel .config file?

I'm pretty sure I didn't suspend resume.

It might be notable that I was producing quite heavy traffic (copying files from my phone via ssh) before the problem ocurred. The scp was still running at the time the problem ocurred and then bailed with an error.

> I have the Surface Pro 2 now. Ideally I should be able to reproduce these
> problems and start debugging from there.

[nick@dose linux]$ uname -a
Linux dose 3.12.5-custom_xhci #1 SMP PREEMPT Thu Dec 12 13:44:41 CET 2013 x86_64 GNU/Linux

can put my kernel config here since it's too long (can I upload files here?)

my .config (from /proc/config.gz): http://pastebin.com/Nua4HjXP
Comment 42 molecular 2014-01-14 10:59:00 UTC
(In reply to molecular from comment #41)
> (In reply to Bing Zhao from comment #37)
> > @molecular: Did you suspend/resume the device before you hit the issue in
> > comment #36? Could you send over your kernel .config file?
> 
> I'm pretty sure I didn't suspend resume.
> 
> It might be notable that I was producing quite heavy traffic (copying files
> from my phone via ssh) before the problem ocurred. The scp was still running
> at the time the problem ocurred and then bailed with an error.

Let me also add that I had great trouble getting wifi back to working condition afterwards. It just didn't want to work any more at all even with original firmware and xhci-disabled kernel. I rebooted about 10 times, then it worked again. Strange, don't know what to make of that.

Sorry I didn't capture the errors at this point because I was fed up and had to get work done. Strangely rebooting many times did the trick, now using the xhci kernel and Bing Zhaos new firmware. No problems since I posted comment #36 and rebooted about 10 times.
Comment 43 molecular 2014-01-14 11:00:23 UTC
Created attachment 122021 [details]
moleculars xhci-enabled kernel config regarding comment #36
Comment 44 Bing Zhao 2014-01-19 04:14:31 UTC
@Zwane: Your kernel .config is almost same as mine. Just a couple lines different because yours is 3.13-rc4. For the autosuspend issue, were you testing with AC power or on battery? As perpe said in comment 11, when running on battery the Surface system suspends the usb8797. I can reproduce this behavior on my Surface Pro 2. Could you try perpe's workaround by removing ".supports_autosuspend = 1" line in mwifiex usb.c?

@molecular: are you able update the kernel to wireless-testing? My kernel is 3.13-rc7 (commit f3f9f6a). You .config is very different than mine even after I "make oldconfig" with 3.13rc7. I haven't got any problem so far with your .config on 3.13-rc7 though. Based on another bug report 68001, 3.13-rc7 seems more stable than 3.12.x.
Comment 45 Zwane Mwaikambo 2014-01-19 06:54:27 UTC
@Bing i was testing on battery and have removed autosuspend for both btusb and mwifiex_usb. That improves suspend/resume reliability, but the normal runtime failure is still there.
Comment 46 Reyad Attiyat 2014-01-31 17:54:32 UTC
I have experienced the same issue as molecular. I have tried kernel 3.13.0 and wireless-testing. The error seems to be the same each time:

[ 1055.519510] usb 2-2: mwifiex_cmd_timeout_func: Timeout cmd id (1391186967.75501) = 0xa4, act = 0x0
[ 1055.519528] usb 2-2: num_data_h2c_failure = 0
[ 1055.519532] usb 2-2: num_cmd_h2c_failure = 0
[ 1055.519536] usb 2-2: num_cmd_timeout = 1
[ 1055.519539] usb 2-2: num_tx_timeout = 0
[ 1055.519543] usb 2-2: last_cmd_index = 2
[ 1055.519547] usb 2-2: last_cmd_id: 7f 00 16 00 a4 00 16 00 a4 00
[ 1055.519550] usb 2-2: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
[ 1055.519554] usb 2-2: last_cmd_resp_index = 1
[ 1055.519558] usb 2-2: last_cmd_resp_id: 7f 80 16 80 7f 80 16 80 a4 80
[ 1055.519561] usb 2-2: last_event_index = 0
[ 1055.519565] usb 2-2: last_event: 1c 00 17 00 2b 00 1c 00 1c 00
[ 1055.519568] usb 2-2: data_sent=0 cmd_sent=0
[ 1055.519572] usb 2-2: ps_mode=1 ps_state=0
[ 1055.519582] usb 2-2: cmd timeout
[ 1055.519636] usb 2-2: failed to get signal information

Followed by two backtraces
WARNING: CPU: 0 PID: 1442 at net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()

WARNING: CPU: 3 PID: 691 at drivers/usb/core/urb.c:336 usb_submit_urb+0x45b/0x470()

I'll post my entire dmesg below. I have tested multiple access points and seen this crash on each. It is not reproducible easily. It seems to occur within the first 30mins of reboot/association. I have tried heavy traffic simulation with iperf, but still could not reliably reproduce this error. Other than this the driver works correctly and performs well.
Comment 47 Bing Zhao 2014-01-31 17:56:09 UTC
Created attachment 124011 [details]
attachment-25476-0.html

I'm currently out of office. E-mail response will be delayed.
Comment 48 Reyad Attiyat 2014-01-31 17:56:41 UTC
Created attachment 124021 [details]
wireless-testing-dmesg-fail-signal
Comment 49 Bing Zhao 2014-02-04 22:45:04 UTC
@Reyad, could you enable DYNAMIC_DEBUG and collect the debug logs? Thanks!

set CONFIG_DYNAMIC_DEBUG=y in kernel .config and re-compile.
after reboot, run these commands to turn on dynamic debug for mwifiex modules:

echo "module mwifiex +p" > /sys/kernel/debug/dynamic_debug/control
echo "module mwifiex_usb +p" > /sys/kernel/debug/dynamic_debug/control
Comment 50 Reyad Attiyat 2014-02-05 15:41:40 UTC
Created attachment 124711 [details]
DMESG mwifiex crash WL Extra Debug

@Bing, Here is the most recent crash on wireless-testing with the extra debug information enabled.

This occurred within 10minutes of booting the device, after streaming content from youtube.

I'll try and get a second sampling of the error with the extra debug information enabled.
Comment 51 Reyad Attiyat 2014-02-06 17:38:06 UTC
Created attachment 124891 [details]
DMESG mwifiex crash WL Extra Debug 2

@Bing

Here is one more dmesg output after a crash. This crash was generated the same way as the one in the above comment. The device was plugged in and was streaming music from YouTube.

I'll try and capture more samples of the crash with extra debugging. Let me know if you need me to post them.
Comment 52 Bing Zhao 2014-02-07 20:47:50 UTC
Thanks Reyad for collecting these logs. I don't need any test from you now as I reproduce the tx timeout using a different .config file. See Bug 69661.
How often does the timeout happen on your Surface Pro 2?
Comment 53 Reyad Attiyat 2014-02-08 22:26:07 UTC
Well it happens a lot, like couple times a week for sure. I'm gonna mark this as a dupe as bug 69661 has the same error, failed to get signal information. This bug report as reported orignally was resolved with the firmware update.

*** This bug has been marked as a duplicate of bug 69661 ***
Comment 54 molecular 2015-03-17 14:02:57 UTC
WOW!!!

I finally have FULL SUCCESS (mwifiex_usb works fully, xhci (usb3) works fully, suspend and resume works fully) with Bing Zhaos (I assume) firmware and 3.19.1 kernel

here's what I did (for anyone stuck here, which seems possible to me given that I got stuck with vanilla kernel):

use xhci enabled kernel (like 3.19.1 in my case, enable CONFIG_USB_XHCI_HCD in kernel .config) (at this point things looked bleak: no mwifiex functionality, xhci freeze on suspend/resume (no keyboard))

So I decided to revisit firmware:

#> git clone git://git.marvell.com/mwifiex-firmware.git
#> cp mwifiex-firmware/mrvl/usb* /lib/firmware/mrvl/
#> reboot

SUCCESS!

boy am I glad that I revisited this (accidentally bought a USB 3 ethernet dongle, so I _had_ to).

I'm a happy man... thanks to everyone, especially Bing Zhao!

Note You need to log in before you can comment on or make changes to this bug.