Bug 186661 - mwifiex_pcie not stable - Surface Pro4
Summary: mwifiex_pcie not stable - Surface Pro4
Status: ASSIGNED
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 178231
 
Reported: 2016-11-02 02:39 UTC by Zhang Rui
Modified: 2018-11-26 08:41 UTC

See Also:
Kernel Version: 4.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg after resume from freeze mode (89.33 KB, application/x-archive)
2016-11-02 02:39 UTC, Zhang Rui
dmesg after a fresh boot in 4.10-rc2 kernel (147.78 KB, text/plain)
2017-01-20 11:19 UTC, Zhang Rui
lspci (15.23 KB, text/plain)
2017-01-20 11:22 UTC, Zhang Rui
dmesg with the crash log (151.88 KB, text/plain)
2017-02-16 02:54 UTC, Zhang Rui
screenshot that contains the crash log during shutdown (3.03 MB, image/jpeg)
2017-02-16 05:09 UTC, Zhang Rui

Description Zhang Rui 2016-11-02 02:39:07 UTC
Created attachment 243521
dmesg after resume from freeze mode

After resuming from freeze mode, the wifi connection is lost and dmesg shows a lot of mwifiex_pcie driver errors:
[  229.126568] Restarting tasks ... done.
[  230.462523] [drm] RC6 on
[  231.582602] mwifiex_pcie 0000:02:00.0: Firmware wakeup failed
[  231.583938] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[  231.583947] mwifiex_pcie 0000:02:00.0: scan failed: -1

I then tried unloading and reloading the mwifiex_pcie driver, but the system froze and I had to reboot.
Comment 1 Len Brown 2016-11-07 23:22:26 UTC
Note: We can't compare freeze to mem/ACPI-S3 on this system, because the MS Surface Pro 4 does not support ACPI S3.
Comment 2 Zhang Rui 2017-01-20 11:18:17 UTC
The underlying problem seems to be that the mwifiex_pcie driver itself is not stable.

With the 4.4 kernel, wifi works for days and then becomes unusable, even if the system has never entered suspend-to-idle.
I have now updated the kernel to 4.10-rc2 and the problem is worse: wifi becomes unusable after just a couple of hours.

dmesg is full of the following error messages:
[17518.396212] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[17518.396215] mwifiex_pcie 0000:02:00.0: scan failed: -1
[17519.397678] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[17519.397686] mwifiex_pcie 0000:02:00.0: scan failed: -1
[17520.399299] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[17520.399304] mwifiex_pcie 0000:02:00.0: scan failed: -1
[17521.400932] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[17521.400943] mwifiex_pcie 0000:02:00.0: scan failed: -1
[17522.402873] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[17522.402877] mwifiex_pcie 0000:02:00.0: scan failed: -1
[17523.403870] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[17523.403874] mwifiex_pcie 0000:02:00.0: scan failed: -1
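
To quantify how often these errors repeat, the mwifiex_pcie lines can be tallied from a saved dmesg dump. The following is a minimal Python sketch, not part of the driver or of any kernel tool; the function name summarize_mwifiex and the regular expression are assumptions made for illustration, based only on the line format shown above.

```python
import re
from collections import Counter

# Matches kernel log lines such as
# "[17518.396212] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state"
# (pattern is an assumption derived from the excerpt above).
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+mwifiex_pcie\s+\S+:\s+(.*\S)")


def summarize_mwifiex(dmesg_text):
    """Return (message counts, first timestamp, last timestamp).

    Timestamps are seconds since boot as printed by the kernel;
    both are None when no mwifiex_pcie line is found.
    """
    counts = Counter()
    stamps = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            stamps.append(float(match.group(1)))
            counts[match.group(2)] += 1
    if not stamps:
        return counts, None, None
    return counts, stamps[0], stamps[-1]
```

Because the pattern is searched rather than anchored at the start of the line, it should also accept syslog-style lines that carry a date and host name before the bracketed timestamp.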
Comment 3 Zhang Rui 2017-01-20 11:19:56 UTC
Created attachment 252551
dmesg after a fresh boot in 4.10-rc2 kernel
Comment 4 Zhang Rui 2017-01-20 11:22:20 UTC
Created attachment 252561
lspci
Comment 5 Amitkumar Karwar 2017-01-23 07:23:16 UTC
Could you try the latest firmware?
http://git.marvell.com/?p=mwifiex-firmware.git;a=commit;h=05e2f3a4acf4174ec507a3464a374ecb1b4ec011
Comment 6 Zhang Rui 2017-02-16 02:49:47 UTC
I have not upgraded firmware before, so here is what I did.

I downloaded pcie8897_uapsta.bin from the git repo you linked and copied it to /lib/firmware/mrvl/, but the problem still exists.

dmesg shows:
[    7.024492] mwifiex_pcie 0000:02:00.0: WLAN FW is active
[    7.142161] mwifiex_pcie 0000:02:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p77) 
[    7.142165] mwifiex_pcie 0000:02:00.0: driver_version = mwifiex 1.0 (15.68.7.p77) 
[    7.142549] mwifiex_pcie 0000:02:00.0 wlp2s0: renamed from mlan0

So I guess the firmware was upgraded successfully, right?
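
One way to check this is to record the version string before and after replacing the file and compare the two. The sketch below pulls it out of the driver_version line; it assumes that the value in parentheses is the firmware release reported by the device, and parse_driver_version is a hypothetical helper name, not an existing tool.

```python
import re

# Pattern derived from the dmesg line above:
# "driver_version = mwifiex 1.0 (15.68.7.p77)"
VERSION_RE = re.compile(r"driver_version = mwifiex (\S+) \(([^)]+)\)")


def parse_driver_version(dmesg_text):
    """Return (driver_version, string_in_parentheses), or None if absent."""
    match = VERSION_RE.search(dmesg_text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

If the second element is identical before and after the upgrade, the new image may not have been loaded.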
Comment 7 Zhang Rui 2017-02-16 02:52:38 UTC
The wireless network usually breaks within a day.

When it breaks, I get this in dmesg:
[ 1143.248229] skbuff: skb_over_panic: text:ffffffffc0654f24 len:3583 put:1353 head:ffff93cef6f77000 data:ffff93cef6f770e4 tail:0xee3 end:0xec0 dev:<NULL>
[ 1143.248255] ------------[ cut here ]------------
[ 1143.248282] kernel BUG at net/core/skbuff.c:105!
[ 1143.248305] invalid opcode: 0000 [#1] SMP
[ 1143.248323] Modules linked in: rfcomm bnep btusb btrtl btbcm btintel bluetooth hid_sensor_als hid_sensor_rotation hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common hid_sensor_hub snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_hda_codec_hdmi snd_soc_core snd_hda_codec_realtek snd_hda_codec_generic snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_hda_codec intel_rapl snd_hda_core i2c_designware_platform x86_pkg_temp_thermal i2c_designware_core snd_hwdep intel_powerclamp coretemp snd_pcm nls_iso8859_1 mwifiex_pcie mwifiex snd_seq_midi snd_seq_midi_event kvm_intel kvm snd_rawmidi cfg80211 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_seq pcbc
[ 1143.248585]  joydev snd_seq_device aesni_intel input_leds snd_timer aes_x86_64 crypto_simd glue_helper snd cryptd idma64 soundcore virt_dma mei_me mei shpchp intel_pch_thermal intel_lpss_pci surfacepro3_button intel_lpss_acpi mac_hid soc_button_array intel_lpss acpi_pad parport_pc ppdev lp parport autofs4 hid_generic usbhid i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_hid hid pinctrl_sunrisepoint video fjes pinctrl_intel
[ 1143.248700] CPU: 2 PID: 1180 Comm: kworker/u9:2 Not tainted 4.10.0-rc2+ #2
[ 1143.248727] Hardware name: Microsoft Corporation Surface Pro 4/Surface Pro 4, BIOS 106.1281.768 08/01/2016
[ 1143.248758] Workqueue: MWIFIEX_WORK_QUEUE mwifiex_main_work_queue [mwifiex]
[ 1143.248777] task: ffff93cf1b191e00 task.stack: ffffaf8402bc4000
[ 1143.248795] RIP: 0010:skb_panic+0x64/0x70
[ 1143.248807] RSP: 0018:ffffaf8402bc7cc8 EFLAGS: 00010282
[ 1143.248821] RAX: 000000000000008b RBX: ffff93cef7be7f00 RCX: 0000000000000000
[ 1143.248840] RDX: 0000000000000000 RSI: ffff93cf2f50de48 RDI: ffff93cf2f50de48
[ 1143.248859] RBP: ffffaf8402bc7ce8 R08: 000000000003c9a5 R09: 00000000000008bd
[ 1143.248878] R10: ffff93cef7be7c00 R11: ffffffff8812c38d R12: 0000000000000549
[ 1143.248897] R13: ffff93cf19186c00 R14: ffff93cef6f7c880 R15: 0000000000000001
[ 1143.248917] FS:  0000000000000000(0000) GS:ffff93cf2f500000(0000) knlGS:0000000000000000
[ 1143.248938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1143.248954] CR2: 00007fead84f4000 CR3: 000000045b5af000 CR4: 00000000003406e0
[ 1143.248973] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1143.248992] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1143.249011] Call Trace:
[ 1143.249021]  skb_put+0x4d/0x50
[ 1143.249035]  mwifiex_11n_aggregate_pkt+0x1f4/0x670 [mwifiex]
[ 1143.249054]  mwifiex_wmm_process_tx+0x47e/0x940 [mwifiex]
[ 1143.249072]  ? __switch_to+0x23c/0x4f0
[ 1143.249086]  mwifiex_main_process+0x77b/0x8d0 [mwifiex]
[ 1143.249103]  mwifiex_main_work_queue+0x1f/0x30 [mwifiex]
[ 1143.249120]  process_one_work+0x16b/0x480
[ 1143.249132]  worker_thread+0x4b/0x500
[ 1143.249144]  kthread+0x101/0x140
[ 1143.249155]  ? process_one_work+0x480/0x480
[ 1143.249168]  ? kthread_create_on_node+0x60/0x60
[ 1143.250471]  ret_from_fork+0x25/0x30
[ 1143.251709] Code: cc 00 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 d0 78 d4 87 48 89 04 24 e8 fc 0f a4 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8 77 ba cb ff
[ 1143.252997] RIP: skb_panic+0x64/0x70 RSP: ffffaf8402bc7cc8
[ 1143.261088] ---[ end trace abec5c473459f72c ]---

When I then tried to reboot the laptop, the system hung with a similar log on the screen.

Both the full dmesg and a screenshot containing the crash log are attached.
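
The fields in the skb_over_panic line are enough to work out how far the write ran past the end of the buffer. Below is a small Python sketch of that arithmetic. It assumes, as is usual on 64-bit kernels, that tail and end are printed as offsets from head while head and data are addresses; parse_skb_panic and overrun_bytes are hypothetical helper names.

```python
import re

# Fields as they appear in the panic line: len/put are decimal,
# head/data/tail/end are hexadecimal (with or without a 0x prefix).
FIELD_RE = re.compile(r"(len|put|head|data|tail|end):(0x[0-9a-fA-F]+|[0-9a-fA-F]+)")


def parse_skb_panic(line):
    """Parse the numeric fields of an skb_over_panic line into a dict."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        base = 10 if key in ("len", "put") else 16
        fields[key] = int(value, base)
    return fields


def overrun_bytes(fields):
    """Bytes by which tail exceeds end after the failed skb_put()."""
    return fields["tail"] - fields["end"]


def tailroom_before_put(fields):
    """Free space at the end of the buffer before the failed skb_put()."""
    headroom = fields["data"] - fields["head"]
    tail_before = headroom + fields["len"] - fields["put"]
    return fields["end"] - tail_before
```

For the line above, data - head is 0xe4 = 228 bytes and 228 + 3583 = 3811 = 0xee3, which matches the printed tail. The buffer ends at 0xec0 = 3776, so only 1318 bytes were free when 1353 were requested, and the put overran the buffer by 35 bytes.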
Comment 8 Zhang Rui 2017-02-16 02:54:09 UTC
Created attachment 254778
dmesg with the crash log
Comment 9 Zhang Rui 2017-02-16 05:09:37 UTC
Created attachment 254779
screenshot that contains the crash log during shutdown
Comment 10 Amitkumar Karwar 2017-02-16 06:49:44 UTC
The issue reported in comments #8 and #9 was fixed recently. Please try the change below.
https://patchwork.kernel.org/patch/9510541/
Comment 11 Zhang Rui 2017-02-16 07:35:00 UTC
Cool.
I assume the patch can be applied on top of 4.10-rc8, right?
I will try the patch and get back to you ASAP.

Thanks for the quick response.
Comment 12 Zhang Rui 2017-02-18 07:13:27 UTC
I booted my kernel about 30 hours ago, and I now see the messages below, which started about 15 hours after boot.

Feb 18 11:09:11 rzhang1-surface kernel: [54024.368899] mwifiex_pcie 0000:02:00.0: Firmware wakeup failed

This is the first error message.

Feb 18 11:09:11 rzhang1-surface kernel: [54024.368973] mwifiex_pcie 0000:02:00.0: failed to get signal information
Feb 18 11:09:11 rzhang1-surface kernel: [54024.369302] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
Feb 18 11:09:11 rzhang1-surface kernel: [54024.369308] mwifiex_pcie 0000:02:00.0: failed to get signal information
Feb 18 11:09:14 rzhang1-surface kernel: [54027.704296] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
Feb 18 11:09:14 rzhang1-surface kernel: [54027.704302] mwifiex_pcie 0000:02:00.0: failed to get signal information
Feb 18 11:09:14 rzhang1-surface kernel: [54027.704354] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
Feb 18 11:09:14 rzhang1-surface kernel: [54027.704356] mwifiex_pcie 0000:02:00.0: failed to get signal information
Feb 18 11:09:20 rzhang1-surface kernel: [54033.708233] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
Feb 18 11:09:20 rzhang1-surface kernel: [54033.708239] mwifiex_pcie 0000:02:00.0: failed to get signal information
Feb 18 11:09:20 rzhang1-surface kernel: [54033.708290] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
Feb 18 11:09:20 rzhang1-surface kernel: [54033.708293] mwifiex_pcie 0000:02:00.0: failed to get signal information

After that, the messages following it repeat every few seconds. But so far the network is still working... :)
Comment 13 Ganapathi Bhat 2018-11-25 07:51:57 UTC
Hi Zhang Rui,

If you no longer see this issue, could you kindly close this bug?

Thanks,
Ganapathi
Comment 14 Zhang Rui 2018-11-26 08:41:53 UTC
The problem still existed the last time I checked, probably half a year ago.
I will test the latest upstream kernel and update this bug later.
