Bug 196043 - ath9k freezes suspend resume Ubuntu 17.04 with kernel 4.10 to 4.12
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 high
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-12 14:14 UTC by freeseek
Modified: 2017-07-18 21:05 UTC (History)
CC List: 0 users

See Also:
Kernel Version: 4.12.0-041200rc4-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg kernel 4.12.0-041200rc4-generic (86.50 KB, text/plain)
2017-06-12 14:14 UTC, freeseek
dmesg kernel 4.8.6-040806-generic (155.90 KB, text/plain)
2017-06-13 13:34 UTC, freeseek

Description freeseek 2017-06-12 14:14:03 UTC
Created attachment 256955
dmesg kernel 4.12.0-041200rc4-generic

Since upgrading from Ubuntu 16.10 to Ubuntu 17.04, my laptop often fails to recover properly after a suspend/resume cycle. Given the kernel trace, my best guess is that the ath9k driver is at fault, but it could be something else. The laptop is a Dell XPS L322X/0PJHXN version 9333 (2013) with 8GB of RAM and no swap, now running kernel 4.12.0-041200rc4-generic. Below is the kernel trace I get when the bug is triggered, just before the laptop suspends.

To reproduce the bug, I usually need to suspend the system about five times. Most of the time suspend/resume works just fine, but occasionally it leaves the system in an unrecoverable state. I am not sure whether a specific state triggers the issue; it seems random to me. Once the bug occurs, networking stops working and programs gradually start to fail, until a hard reboot (or Alt+SysRq+B) is needed.
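
The reproduction steps above could be automated roughly as follows. This is only a sketch and not part of my original testing: it assumes `rtcwake` (util-linux) and `systemctl` are available and run as root, that the timings are long enough for this machine, and that the failure always logs both `kthread_stop` and `ath9k_rng_stop`, as in the trace further down. All function names are made up for illustration.

```python
import subprocess
import time

# Strings taken from the trace in this report. Requiring both is an assumption
# about how the failure shows up in the kernel log.
MARKERS = ("kthread_stop", "ath9k_rng_stop")


def log_shows_bug(log_text):
    """Return True if the log text mentions every marker."""
    return all(marker in log_text for marker in MARKERS)


def suspend_once(wake_after=30, settle=15):
    """Arm an RTC wake-up alarm, request a suspend, then wait for things to settle.

    The two timings are guesses and may need tuning.
    """
    # "-m no" only programs the alarm; it does not suspend by itself.
    subprocess.run(["rtcwake", "-m", "no", "-s", str(wake_after)], check=True)
    subprocess.run(["systemctl", "suspend"], check=True)
    time.sleep(settle)


def read_kernel_log():
    """Return the current kernel ring buffer as text."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def cycles_until_bug(max_cycles, suspend=suspend_once, read_log=read_kernel_log):
    """Suspend up to max_cycles times; return the cycle that hit the bug, or None."""
    for cycle in range(1, max_cycles + 1):
        suspend()
        if log_shows_bug(read_log()):
            return cycle
    return None
```

Calling `cycles_until_bug(20)` as root would run up to 20 cycles. The sketch uses `systemctl suspend` rather than writing to /sys/power/state directly, on the assumption that the normal desktop suspend path is what makes NetworkManager take the interface down, which is where the trace originates.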

I filed a bug with Launchpad (https://bugs.launchpad.net/bugs/1697027). I am not sure when the problem was introduced, but the laptop never exhibited this issue with Ubuntu 16.10 and kernel 4.8, and I started to observe it after upgrading to Ubuntu 17.04 and kernel 4.10.

[  185.161176] ------------[ cut here ]------------
[  185.161182] WARNING: CPU: 0 PID: 984 at /home/kernel/COD/linux/kernel/kthread.c:71 kthread_stop+0xf1/0x100
[  185.161183] Modules linked in: uas usb_storage hid_generic usbhid hid cdc_ether usbnet r8152 mii rfcomm cmac bnep snd_hda_codec_hdmi snd_hda_codec_realtek dell_wmi snd_hda_codec_generic sparse_keymap snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_seq_midi snd_seq_midi_event coretemp kvm_intel dell_laptop snd_rawmidi kvm irqbypass dell_smbios crct10dif_pclmul snd_seq dcdbas uvcvideo crc32_pclmul ghash_clmulni_intel videobuf2_vmalloc arc4 videobuf2_memops snd_seq_device ath9k pcbc videobuf2_v4l2 snd_timer ath9k_common videobuf2_core aesni_intel ath9k_hw ath3k videodev aes_x86_64 crypto_simd media glue_helper snd btusb cryptd ath btrtl mac80211 intel_cstate btbcm soundcore btintel intel_rapl_perf cfg80211 bluetooth input_leds joydev mei_me
[  185.161228]  serio_raw ecdh_generic mei shpchp lpc_ich acpi_als kfifo_buf industrialio mac_hid binfmt_misc parport_pc ppdev lp parport ip_tables x_tables autofs4 i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect psmouse sysimgblt fb_sys_fops ahci drm libahci wmi video
[  185.161248] CPU: 0 PID: 984 Comm: NetworkManager Not tainted 4.12.0-041200rc4-generic #201706042031
[  185.161249] Hardware name: Dell Inc.          Dell System XPS L322X/0PJHXN, BIOS A09 05/15/2013
[  185.161251] task: ffff950170fdda00 task.stack: ffffa22c01538000
[  185.161253] RIP: 0010:kthread_stop+0xf1/0x100
[  185.161254] RSP: 0018:ffffa22c0153b5b0 EFLAGS: 00010246
[  185.161255] RAX: ffffffffa6257800 RBX: ffff950171b79560 RCX: 0000000000000000
[  185.161256] RDX: 0000000080000000 RSI: 000000007fffffff RDI: ffff9500ac9a9680
[  185.161257] RBP: ffffa22c0153b5c8 R08: 0000000000000000 R09: 0000000000000000
[  185.161258] R10: ffffa22c0153b648 R11: ffff9501768004b8 R12: ffff9500ac9a9680
[  185.161259] R13: ffff950171b79f70 R14: ffff950171b78780 R15: ffff9501749dc018
[  185.161260] FS:  00007f0d6bfd5540(0000) GS:ffff95017f200000(0000) knlGS:0000000000000000
[  185.161261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  185.161262] CR2: 00007fc190161a08 CR3: 0000000232906000 CR4: 00000000001406f0
[  185.161263] Call Trace:
[  185.161272]  ath9k_rng_stop+0x1a/0x20 [ath9k]
[  185.161277]  ath9k_stop+0x3b/0x1d0 [ath9k]
[  185.161299]  drv_stop+0x33/0xf0 [mac80211]
[  185.161317]  ieee80211_stop_device+0x43/0x50 [mac80211]
[  185.161334]  ieee80211_do_stop+0x4f2/0x810 [mac80211]
[  185.161337]  ? _raw_spin_unlock_bh+0x1e/0x20
[  185.161340]  ? dev_deactivate_many+0x205/0x240
[  185.161356]  ieee80211_stop+0x1a/0x20 [mac80211]
[  185.161359]  __dev_close_many+0x99/0x100
[  185.161361]  __dev_close+0x45/0x70
[  185.161363]  __dev_change_flags+0x9d/0x160
[  185.161365]  dev_change_flags+0x29/0x60
[  185.161368]  do_setlink+0x32e/0xca0
[  185.161370]  ? check_preempt_curr+0x79/0x90
[  185.161373]  ? attach_task+0x4c/0x60
[  185.161376]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.161378]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.161379]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.161381]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.161384]  ? nla_parse+0x35/0x140
[  185.161386]  rtnl_newlink+0x7d3/0x900
[  185.161391]  ? security_capget+0x60/0x70
[  185.161392]  ? ns_capable_common+0x68/0x80
[  185.161394]  ? ns_capable+0x13/0x20
[  185.161395]  rtnetlink_rcv_msg+0xee/0x220
[  185.161398]  ? __atime_needs_update+0x7f/0x1a0
[  185.161400]  ? rtnl_newlink+0x900/0x900
[  185.161402]  netlink_rcv_skb+0xe7/0x120
[  185.161404]  rtnetlink_rcv+0x28/0x30
[  185.161406]  netlink_unicast+0x18c/0x220
[  185.161408]  netlink_sendmsg+0x2ba/0x3b0
[  185.161410]  sock_sendmsg+0x38/0x50
[  185.161412]  ___sys_sendmsg+0x2d7/0x2f0
[  185.161414]  ? wake_up_q+0x80/0x80
[  185.161417]  ? __wake_up_common+0x4d/0x80
[  185.161419]  ? eventfd_write+0x113/0x260
[  185.161421]  ? wake_up_q+0x80/0x80
[  185.161422]  ? ep_poll+0x3c7/0x3e0
[  185.161425]  __sys_sendmsg+0x54/0x90
[  185.161426]  ? __sys_sendmsg+0x54/0x90
[  185.161429]  SyS_sendmsg+0x12/0x20
[  185.161430]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  185.161432] RIP: 0033:0x7f0d69bda460
[  185.161433] RSP: 002b:00007ffdf32fad80 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  185.161435] RAX: ffffffffffffffda RBX: 000055f9a48d86a0 RCX: 00007f0d69bda460
[  185.161436] RDX: 0000000000000000 RSI: 00007ffdf32fade0 RDI: 000000000000000c
[  185.161437] RBP: 000000000000000e R08: 0000000000000000 R09: 000055f9a47e3730
[  185.161437] R10: 000055f9a47e3730 R11: 0000000000000293 R12: 000055f9a48d8910
[  185.161438] R13: 000055f9a486b050 R14: 0000000000000004 R15: 00007f0d60b747bd
[  185.161440] Code: 7e 3e f7 00 48 85 db 74 18 48 8b 03 48 8b 7b 08 48 83 c3 18 44 89 ee ff d0 48 8b 03 48 85 c0 75 eb 5b 44 89 e8 41 5c 41 5d 5d c3 <0f> ff e9 30 ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 
[  185.161472] ---[ end trace b1c25865e2e9ee44 ]---
[  185.161478] BUG: unable to handle kernel paging request at 00007f49a0dc0e10
[  185.161525] IP: kthread_stop+0x30/0x100
[  185.161546] PGD 2347f5067 
[  185.161547] P4D 2347f5067 
[  185.161563] PUD 0 

[  185.161601] Oops: 0002 [#1] SMP
[  185.161619] Modules linked in: uas usb_storage hid_generic usbhid hid cdc_ether usbnet r8152 mii rfcomm cmac bnep snd_hda_codec_hdmi snd_hda_codec_realtek dell_wmi snd_hda_codec_generic sparse_keymap snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_seq_midi snd_seq_midi_event coretemp kvm_intel dell_laptop snd_rawmidi kvm irqbypass dell_smbios crct10dif_pclmul snd_seq dcdbas uvcvideo crc32_pclmul ghash_clmulni_intel videobuf2_vmalloc arc4 videobuf2_memops snd_seq_device ath9k pcbc videobuf2_v4l2 snd_timer ath9k_common videobuf2_core aesni_intel ath9k_hw ath3k videodev aes_x86_64 crypto_simd media glue_helper snd btusb cryptd ath btrtl mac80211 intel_cstate btbcm soundcore btintel intel_rapl_perf cfg80211 bluetooth input_leds joydev mei_me
[  185.161974]  serio_raw ecdh_generic mei shpchp lpc_ich acpi_als kfifo_buf industrialio mac_hid binfmt_misc parport_pc ppdev lp parport ip_tables x_tables autofs4 i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect psmouse sysimgblt fb_sys_fops ahci drm libahci wmi video
[  185.162102] CPU: 0 PID: 984 Comm: NetworkManager Tainted: G        W       4.12.0-041200rc4-generic #201706042031
[  185.162151] Hardware name: Dell Inc.          Dell System XPS L322X/0PJHXN, BIOS A09 05/15/2013
[  185.162194] task: ffff950170fdda00 task.stack: ffffa22c01538000
[  185.162225] RIP: 0010:kthread_stop+0x30/0x100
[  185.162249] RSP: 0018:ffffa22c0153b5b0 EFLAGS: 00010246
[  185.162276] RAX: ffffffffa6257800 RBX: 00007f49a0dc0e10 RCX: 0000000000000000
[  185.162312] RDX: 0000000080000000 RSI: 000000007fffffff RDI: ffff9500ac9a9680
[  185.162346] RBP: ffffa22c0153b5c8 R08: 0000000000000000 R09: 0000000000000000
[  185.162383] R10: ffffa22c0153b648 R11: ffff9501768004b8 R12: ffff9500ac9a9680
[  185.162419] R13: ffff950171b79f70 R14: ffff950171b78780 R15: ffff9501749dc018
[  185.162455] FS:  00007f0d6bfd5540(0000) GS:ffff95017f200000(0000) knlGS:0000000000000000
[  185.162497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  185.162526] CR2: 00007f49a0dc0e10 CR3: 0000000232906000 CR4: 00000000001406f0
[  185.162562] Call Trace:
[  185.162582]  ath9k_rng_stop+0x1a/0x20 [ath9k]
[  185.162609]  ath9k_stop+0x3b/0x1d0 [ath9k]
[  185.162646]  drv_stop+0x33/0xf0 [mac80211]
[  185.162687]  ieee80211_stop_device+0x43/0x50 [mac80211]
[  185.162730]  ieee80211_do_stop+0x4f2/0x810 [mac80211]
[  185.162759]  ? _raw_spin_unlock_bh+0x1e/0x20
[  185.162783]  ? dev_deactivate_many+0x205/0x240
[  185.162822]  ieee80211_stop+0x1a/0x20 [mac80211]
[  185.162848]  __dev_close_many+0x99/0x100
[  185.162870]  __dev_close+0x45/0x70
[  185.162891]  __dev_change_flags+0x9d/0x160
[  185.162915]  dev_change_flags+0x29/0x60
[  185.162938]  do_setlink+0x32e/0xca0
[  185.162959]  ? check_preempt_curr+0x79/0x90
[  185.162983]  ? attach_task+0x4c/0x60
[  185.163005]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.163032]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.163060]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.163090]  ? __update_load_avg_se.isra.35+0x15b/0x180
[  185.163119]  ? nla_parse+0x35/0x140
[  185.163141]  rtnl_newlink+0x7d3/0x900
[  185.163165]  ? security_capget+0x60/0x70
[  185.163187]  ? ns_capable_common+0x68/0x80
[  185.163209]  ? ns_capable+0x13/0x20
[  185.163229]  rtnetlink_rcv_msg+0xee/0x220
[  185.163253]  ? __atime_needs_update+0x7f/0x1a0
[  185.163278]  ? rtnl_newlink+0x900/0x900
[  185.163300]  netlink_rcv_skb+0xe7/0x120
[  185.163323]  rtnetlink_rcv+0x28/0x30
[  185.163345]  netlink_unicast+0x18c/0x220
[  185.163368]  netlink_sendmsg+0x2ba/0x3b0
[  185.163391]  sock_sendmsg+0x38/0x50
[  185.163413]  ___sys_sendmsg+0x2d7/0x2f0
[  185.163435]  ? wake_up_q+0x80/0x80
[  185.163456]  ? __wake_up_common+0x4d/0x80
[  185.163478]  ? eventfd_write+0x113/0x260
[  185.163500]  ? wake_up_q+0x80/0x80
[  185.163520]  ? ep_poll+0x3c7/0x3e0
[  185.163540]  __sys_sendmsg+0x54/0x90
[  185.163561]  ? __sys_sendmsg+0x54/0x90
[  185.165161]  SyS_sendmsg+0x12/0x20
[  185.166718]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  185.168315] RIP: 0033:0x7f0d69bda460
[  185.169930] RSP: 002b:00007ffdf32fad80 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[  185.171580] RAX: ffffffffffffffda RBX: 000055f9a48d86a0 RCX: 00007f0d69bda460
[  185.173251] RDX: 0000000000000000 RSI: 00007ffdf32fade0 RDI: 000000000000000c
[  185.174685] RBP: 000000000000000e R08: 0000000000000000 R09: 000055f9a47e3730
[  185.175772] R10: 000055f9a47e3730 R11: 0000000000000293 R12: 000055f9a48d8910
[  185.176810] R13: 000055f9a486b050 R14: 0000000000000004 R15: 00007f0d60b747bd
[  185.177807] Code: 55 48 89 e5 41 55 41 54 49 89 fc 53 0f 1f 44 00 00 f0 41 ff 44 24 18 41 f6 44 24 1e 20 0f 84 c9 00 00 00 49 8b 9c 24 48 09 00 00 <f0> 80 0b 02 4c 89 e7 e8 44 ff ff ff 4c 89 e7 e8 ec a9 00 00 48 
[  185.178973] RIP: kthread_stop+0x30/0x100 RSP: ffffa22c0153b5b0
[  185.179987] CR2: 00007f49a0dc0e10
[  185.188605] ---[ end trace b1c25865e2e9ee45 ]---
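
To make the call chain in a trace like the one above easier to read, the frames can be pulled out with a small script. This is a sketch for illustration only: the regular expression is written against the frame layout seen in this particular log and is an assumption for other kernel versions, and `reliable_frames` is a made-up helper name. Frames printed with a leading "?" are addresses found on the stack that the unwinder could not confirm, so they are skipped.

```python
import re

# One stack frame as printed in the trace above, for example:
#   "[  185.161272]  ath9k_rng_stop+0x1a/0x20 [ath9k]"
# An optional "? " prefix marks an unconfirmed frame, and the trailing
# "[module]" is absent for functions built into the kernel image.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(log_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.match(line)
        if match and not match.group("guess"):
            frames.append((match.group("func"), match.group("module")))
    return frames
```

Applied to the trace above, the innermost confirmed frames are `ath9k_rng_stop` and `ath9k_stop` in the ath9k module, followed by `drv_stop` in mac80211, which is why the driver looks like the most likely culprit.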
Comment 1 freeseek 2017-06-13 04:06:26 UTC
I did some testing and, while I cannot be 100% sure, I believe that the problem lies within the ath9k driver. If I remove the ath9k module, suspend and resume work every time and I cannot reproduce the bug. After bisecting across kernel releases (using builds from http://kernel.ubuntu.com/~kernel-ppa/mainline/), it seems very likely that the problem was introduced between releases 4.8.6 and 4.8.7. However, I have no idea what piece of code could be responsible. Only two files in the ath9k driver changed between these two releases:

drivers/net/wireless/ath/ath9k/ar9003_calib.c

drivers/net/wireless/ath/ath9k/hw.h

I tried recompiling the ath9k driver in kernel 4.8.7 with the changes to those two files reverted, but that does not fix the bug. My guess is that the bug already existed in the ath9k driver but was not triggered before release 4.8.7.
Comment 2 freeseek 2017-06-13 13:34:21 UTC
Created attachment 256979
dmesg kernel 4.8.6-040806-generic

Never mind, I was able to reproduce the bug with kernel 4.8.6. It is just much harder to reproduce with earlier kernels; that is, the bug is less likely to trigger.
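
An intermittent bug like this makes it easy to call a kernel "good" too early. As a rough guide, and assuming each suspend/resume cycle fails independently with a fixed probability (an assumption, not something I measured), the number of consecutive clean cycles needed before trusting a "good" verdict can be estimated as below. The function name and the example probabilities are illustrative only.

```python
import math


def clean_cycles_needed(p_fail, confidence=0.95):
    """Clean cycles required so that a buggy kernel would have failed at least
    once with the given confidence, if each cycle fails with probability p_fail.

    Solves (1 - p_fail) ** n <= 1 - confidence for the smallest integer n.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# "About five suspends to reproduce" is read here as roughly a 1-in-5 chance.
print(clean_cycles_needed(0.2))
# A made-up, much lower rate, standing in for a kernel where the bug is rarer.
print(clean_cycles_needed(0.02))
```

Under these assumptions, a 1-in-5 failure rate needs 14 clean cycles, while a 1-in-50 rate needs 149, which would explain how 4.8.6 could look fine at first.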
Comment 3 freeseek 2017-06-22 13:04:13 UTC
Many thanks to Miaoqing Pan for writing a patch for this bug:
https://patchwork.kernel.org/patch/9803211/
Comment 4 freeseek 2017-07-18 21:05:31 UTC
While the fix did not make it into kernel 4.12, it made it into kernel 4.13-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.13-rc1&id=07246c115801c27652700e3679bb58661ef7ed65
