Bug 185621 - ath10k slow resume - 2260 ms - Dell XPS13 - 9360
Summary: ath10k slow resume - 2260 ms - Dell XPS13 - 9360
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2016-10-30 03:20 UTC by Len Brown
Modified: 2017-07-18 14:10 UTC (History)
5 users (show)

See Also:
Kernel Version: 4.4.0-23-generic, 4.9.0-994-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci -vv output (10.76 KB, application/octet-stream)
2016-10-30 03:20 UTC, Len Brown
Details
analyze_suspend output on 4.4.0-23-generic (254.82 KB, text/html)
2016-10-30 03:24 UTC, Len Brown
Details
analyze_suspend output on 4.9.0-994-generic (252.26 KB, text/html)
2016-10-30 04:12 UTC, Len Brown
Details
analyze_suspend output for freeze on 4.9-rc3 (194.12 KB, text/html)
2016-10-30 04:43 UTC, Len Brown
Details
HTML output of `analyze_suspend.py` (373.41 KB, text/html)
2017-02-16 15:30 UTC, Paul Menzel
Details

Description Len Brown 2016-10-30 03:20:04 UTC
Created attachment 243241 [details]
lspci -vv output

The slowest resume device on the Dell XPS13 - 9630 is the 
3a:00.0 Network controller:
Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)

This laptop comes from dell pre-installed with Ubuntu 16.04
and so this is the "out of the box" customer experience.
Comment 1 Len Brown 2016-10-30 03:24:03 UTC
Created attachment 243251 [details]
analyze_suspend output on 4.4.0-23-generic

looks  like the fun begins with ath10k_pcie_cold_reset()
Comment 2 Len Brown 2016-10-30 04:12:56 UTC
Created attachment 243261 [details]
analyze_suspend output on 4.9.0-994-generic

Reproduced with recent upstream-based kernel, as distributed here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/2016-10-20/

this kernel calls itself 4.9.0-994-generic
I used this kernel because this machine has a new (KBL) graphics controller,
which I'm not sure is fully supported in Linus' tree.
Comment 3 Len Brown 2016-10-30 04:43:28 UTC
Created attachment 243271 [details]
analyze_suspend output for freeze on 4.9-rc3

ath10k issue appears to be the same with Linux-4.9-rc3
have reproduced both with suspend to S3, and here, freeze to idle path.
Comment 4 Kalle Valo 2016-11-03 20:42:27 UTC
I think the problem here is that starting the firmware takes too long. If Wake on Wireless LAN (WoWLAN) is disabled mac80211 stops the firmware during suspend and starts it again during resume.

(But if WoWLAN would be enabled mac80211 would actually leave the firmware running and resume should be faster.)

We should try to make starting the firmware faster but no idea how much we can speed it up. Other option is to modify mac80211 not to stop firmware but that will again increase power consumption.
Comment 5 Len Brown 2016-11-21 22:33:22 UTC
How to enable WoWLAN to verify this theory?
I don't see any BIOS SETUP options for this, is there an OS/driver option?

Is there an option to mac80211 to allow the firmware to not stop during suspend?  If yes, I'll be happy to measure the power before/after.

Also, in the case that the firmware does need to be started, the driver could also make this asynchronous.  The current synchronous design makes all of user-space wait while this driver sorts out its firmware.
Comment 6 Kalle Valo 2016-12-20 17:26:51 UTC
(In reply to Len Brown from comment #5)
> How to enable WoWLAN to verify this theory?
> I don't see any BIOS SETUP options for this, is there an OS/driver option?

I haven't tried WoWLAN myself but it should be possible to configure it with the iw tool. I found some documentation for wl18xx, which hopefully also applies to ath10k as they both use mac80211:

http://processors.wiki.ti.com/index.php/WL18xx_Adding_WoWLAN

> Is there an option to mac80211 to allow the firmware to not stop during
> suspend?  If yes, I'll be happy to measure the power before/after.

I'm not aware of any way except to make an ugly hack to mac80211.

> Also, in the case that the firmware does need to be started, the driver
> could also make this asynchronous.  The current synchronous design makes all
> of user-space wait while this driver sorts out its firmware.

I think mac80211 requires firmware bootup to be synchronous so that might be challenging to change.

There might be a way to improve this as one guy was able to make the firmware bootup faster in ath10k. He hasn't been able to post a proper patch yet as he is busy with something else at the moment but hopefully he does that soon.

Also ath10k devs (and wireless devs for that part) don't really follow bugzilla that closely. It would be easier, and get more people involved, if we could switch to ath10k and linux-wireless mailing lists:

ath10k@lists.infradead.org
linux-wireless@vger.kernel.org
Comment 7 Kalle Valo 2017-02-16 09:58:06 UTC
Ryan now submitted a patch which should speed up the firmware upload from 2 seconds to 0.5 seconds:

ath10k: improve the firmware download time for QCA6174
https://patchwork.kernel.org/patch/9575833/

Feedback very welcome. If a full tree is preferred the patch is in master-pending branch of my ath.git tree:

https://git.kernel.org/cgit/linux/kernel/git/kvalo/ath.git/log/?h=master-pending
Comment 8 Paul Menzel 2017-02-16 10:06:20 UTC
(In reply to Kalle Valo from comment #7)
> Ryan now submitted a patch which should speed up the firmware upload from 2
> seconds to 0.5 seconds:
> 
> ath10k: improve the firmware download time for QCA6174
> https://patchwork.kernel.org/patch/9575833/

Very nice. Thank you. It looks like there is a missing ; in there?

> Feedback very welcome. If a full tree is preferred the patch is in
> master-pending branch of my ath.git tree:
> 
> https://git.kernel.org/cgit/linux/kernel/git/kvalo/ath.git/log/?h=master-
> pending

It’d be awesome if that commit would be tagged to be backported to the stable releases.
Comment 9 Paul Menzel 2017-02-16 13:30:05 UTC
Merging the branch master-pending into the master branch of Linus’ tree, and building that, the WLAN does not work at all on the Dell XPS 13.

```
ath10k_pci 0000:3a:00.0: failed to fetch board data for bus=pci,vendor=168c,device=003e,subsystem-vendor=1a56,subsystem-device=1535,variant=RV_0520 from ath10k/QCA6174/hw3.0/board-2.bin
```

I’ll try to cherry-pick only the commits.
Comment 10 Paul Menzel 2017-02-16 15:30:17 UTC
Created attachment 254791 [details]
HTML output of `analyze_suspend.py`

I cherry picked on top of v4.10-rc8-3-g747ae0a96f1a.

```
63616c41d242 ath10k: improve the firmware download time for QCA6174
747ae0a96f1a Merge tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
```

With that resume time went down by 1.5 seconds.

```
ieee80211 @ phy0 {ieee80211} async_device (Total Suspend: 89.181 ms Total Resume: 763.594 ms)
```

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Comment 11 Kalle Valo 2017-02-20 14:39:29 UTC
(In reply to Paul Menzel from comment #8)
> (In reply to Kalle Valo from comment #7)
> > Ryan now submitted a patch which should speed up the firmware upload from 2
> > seconds to 0.5 seconds:
> > 
> > ath10k: improve the firmware download time for QCA6174
> > https://patchwork.kernel.org/patch/9575833/
> 
> Very nice. Thank you. It looks like there is a missing ; in there?

That was fixed in v2. I accidentally gave you a link to v1, sorry.

> > Feedback very welcome. If a full tree is preferred the patch is in
> > master-pending branch of my ath.git tree:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/kvalo/ath.git/log/?h=master-
> > pending
> 
> It’d be awesome if that commit would be tagged to be backported to the
> stable releases.

At least in my opinion this is more like a new feature, not a bug fix. So I'm not sure if it fulfills the stable patch rules.
Comment 12 Kalle Valo 2017-02-20 14:43:48 UTC
(In reply to Paul Menzel from comment #9)
> Merging the branch master-pending into the master branch of Linus’ tree, and
> building that, the WLAN does not work at all on the Dell XPS 13.
> 
> ```
> ath10k_pci 0000:3a:00.0: failed to fetch board data for
> bus=pci,vendor=168c,device=003e,subsystem-vendor=1a56,subsystem-device=1535,
> variant=RV_0520 from ath10k/QCA6174/hw3.0/board-2.bin
> ```
> 
> I’ll try to cherry-pick only the commits.

I believe that's a regression caused by this comment:

f2593cb1b291 ath10k: Search SMBIOS for OEM board file extension

A workaround would be to update to the latest board-2.bin:

https://github.com/kvalo/ath10k-firmware/commit/8d15818b0f9c7b09f743538ac2d3e1409779f52a

But we will work on getting this fixed properly so that the latest kernel will continue to work with the older board-2.bin. Thanks for reporting this!
Comment 13 Kalle Valo 2017-02-20 14:46:06 UTC
(In reply to Paul Menzel from comment #10)
> Created attachment 254791 [details]
> HTML output of `analyze_suspend.py`
> 
> I cherry picked on top of v4.10-rc8-3-g747ae0a96f1a.
> 
> ```
> 63616c41d242 ath10k: improve the firmware download time for QCA6174
> 747ae0a96f1a Merge tag 'media/v4.10-4' of
> git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
> ```
> 
> With that resume time went down by 1.5 seconds.
> 
> ```
> ieee80211 @ phy0 {ieee80211} async_device (Total Suspend: 89.181 ms Total
> Resume: 763.594 ms)
> ```
> 
> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>

Nice, thanks for testing this. I added your Tested-by to the commit.
Comment 14 Paul Menzel 2017-02-28 10:42:40 UTC
(In reply to Kalle Valo from comment #11)
> (In reply to Paul Menzel from comment #8)
> > (In reply to Kalle Valo from comment #7)

[…]

> > > Feedback very welcome. If a full tree is preferred the patch is in
> > > master-pending branch of my ath.git tree:
> > > 
> > > https://git.kernel.org/cgit/linux/kernel/git/kvalo/ath.git/log/?h=master-
> > > pending
> > 
> > It’d be awesome if that commit would be tagged to be backported to the
> > stable releases.
> 
> At least in my opinion this is more like a new feature, not a bug fix. So
> I'm not sure if it fulfills the stable patch rules.

From the document `Documentation/process/stable-kernel-rules.rst` [1] I would see the items below justifying backporting it.

1.  “It must fix a real bug that bothers people (not a, “This could be a problem...” type thing).” (It bothers all the laptop owners.)
2.  “Serious issues as reported by a user of a distribution kernel may also be considered if they fix a notable performance or interactivity issue. As these fixes are not as obvious and have a higher risk of a subtle regression they should only be submitted by a distribution kernel maintainer and include an addendum linking to a bugzilla entry if it exists and additional information on the user-visible impact.
New device IDs and quirks are also accepted.”

The following points reject it though.

1.  “oh, that’s not good” is subjective, but it’s probably *not* critical.
2.  “It cannot be bigger than 100 lines, with context.”

Is there a way to notify distributions to backport it?


[1] https://static.lwn.net/kerneldoc/process/stable-kernel-rules.html
Comment 15 Paul Menzel 2017-02-28 18:09:59 UTC
As I don’t see it yet in Linus’ branch, will that commit be included in Linux 4.11?
Comment 16 Kalle Valo 2017-04-20 10:42:13 UTC
(In reply to Paul Menzel from comment #14)
> (In reply to Kalle Valo from comment #11)
> > (In reply to Paul Menzel from comment #8)
> > > (In reply to Kalle Valo from comment #7)
> 
> […]
> 
> > > > Feedback very welcome. If a full tree is preferred the patch is in
> > > > master-pending branch of my ath.git tree:
> > > > 
> > > >
> https://git.kernel.org/cgit/linux/kernel/git/kvalo/ath.git/log/?h=master-
> > > > pending
> > > 
> > > It’d be awesome if that commit would be tagged to be backported to the
> > > stable releases.
> > 
> > At least in my opinion this is more like a new feature, not a bug fix. So
> > I'm not sure if it fulfills the stable patch rules.
> 
> From the document `Documentation/process/stable-kernel-rules.rst` [1] I
> would see the items below justifying backporting it.
> 
> 1.  “It must fix a real bug that bothers people (not a, “This could be a
> problem...” type thing).” (It bothers all the laptop owners.)
> 2.  “Serious issues as reported by a user of a distribution kernel may also
> be considered if they fix a notable performance or interactivity issue. As
> these fixes are not as obvious and have a higher risk of a subtle regression
> they should only be submitted by a distribution kernel maintainer and
> include an addendum linking to a bugzilla entry if it exists and additional
> information on the user-visible impact.
> New device IDs and quirks are also accepted.”

Ah, I haven't realised this.

> The following points reject it though.
> 
> 1.  “oh, that’s not good” is subjective, but it’s probably *not* critical.
> 2.  “It cannot be bigger than 100 lines, with context.”

Yeah, the first patch is around 400 lines so it is too big for stable.

> Is there a way to notify distributions to backport it?

I don't know. There's always the backports which people can use to run latest ath10k on older kernels but the project hasn't done a release for some time now so one has to get the latest version from git:

https://backports.wiki.kernel.org/index.php/Main_Page

> [1] https://static.lwn.net/kerneldoc/process/stable-kernel-rules.html

(In reply to Paul Menzel from comment #15)
> As I don’t see it yet in Linus’ branch, will that commit be included in
> Linux 4.11?

If all goes well the first release containing the patches will be 4.12-rc1.

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath-next&id=583a6629432ca95813a585a7117331ffe36fe939

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath-next&id=912b6e8850a51a09ec771aedf2b4428ac9b34e20
Comment 17 Kalle Valo 2017-04-27 08:54:27 UTC
Marking bug fixed.
Comment 18 Paul Menzel 2017-07-18 14:10:46 UTC
Just an update that with Linux 4.13-rc1 it really takes that long, that means 0.5 seconds, to download the firmware.

ath10k_bmi_fast_download [ath10k_core] (531.523 ms @ 184.159001)
•   ath10k_bmi_lz_stream_start [ath10k_core] (0.007 ms @ 184.159001)
•   ath10k_bmi_lz_data [ath10k_core] (531.359 ms @ 184.159008)
•   ath10k_bmi_lz_data [ath10k_core] (0.144 ms @ 184.690368)
•   ath10k_bmi_lz_stream_start [ath10k_core] (0.012 ms @ 184.690512)

I created bug #196413 [1] to improve the time further.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=196413

Note You need to log in before you can comment on or make changes to this bug.