Bug 218237 - performance regression in AX88179 USB ethernet adapter adds 400ms to resume
Summary: performance regression in AX88179 USB ethernet adapter adds 400ms to resume
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 178231
  Show dependency tree
 
Reported: 2023-12-06 21:01 UTC by Todd Brandt
Modified: 2024-03-25 09:17 UTC (History)
2 users (show)

See Also:
Kernel Version: 6.7.0-rc3
Subsystem:
Regression: Yes
Bisected commit-id: 0739af07d1d947af27c877f797cb82ceee702515


Attachments
thinkpad-x1_s2idle_6.7-rc2.html (667.90 KB, text/html)
2023-12-06 21:01 UTC, Todd Brandt
Details
thinkpad-x1_s2idle_6.7-rc3.html (678.24 KB, text/html)
2023-12-06 21:02 UTC, Todd Brandt
Details

Description Todd Brandt 2023-12-06 21:01:35 UTC
Created attachment 305552 [details]
thinkpad-x1_s2idle_6.7-rc2.html

Apparently a fix to the AX88179 usb ethernet dongle in 6.7-rc3 involves adding a full 400ms to S2idle and S3 resume. I understand that full functionality should come first before performance, but is it really necessary to add almost a half second to resume on all systems that use this device? Is there a better way of fixing this or is this device really that broken?

The commit and diff is here and I've attached the before and after effects in two sleepgraph timelines from a thinkpad x1 that uses the dongle.

commit 0739af07d1d947af27c877f797cb82ceee702515
Author: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Date:   Mon Nov 20 13:06:29 2023 +0100

    net: usb: ax88179_178a: fix failed operations during ax88179_reset


diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index aff39bf3161d..4ea0e155bb0d 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1583,11 +1583,11 @@ static int ax88179_reset(struct usbnet *dev)
 
        *tmp16 = AX_PHYPWR_RSTCTL_IPRL;
        ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_PHYPWR_RSTCTL, 2, 2, tmp16);
-       msleep(200);
+       msleep(500);
 
        *tmp = AX_CLK_SELECT_ACS | AX_CLK_SELECT_BCS;
        ax88179_write_cmd(dev, AX_ACCESS_MAC, AX_CLK_SELECT, 1, 1, tmp);
-       msleep(100);
+       msleep(200);
 
        /* Ethernet PHY Auto Detach*/
        ax88179_auto_detach(dev);
Comment 1 Todd Brandt 2023-12-06 21:02:21 UTC
Created attachment 305553 [details]
thinkpad-x1_s2idle_6.7-rc3.html

after timeline showing the additional 400ms of resume time.
Comment 2 José Ignacio Tornos Martínez 2023-12-07 08:30:03 UTC
Yes, it is necessary because of the commented issue in the commit.
At this moment there is no more information about how to do it in a better way. Indeed the manufacturer driver is using sleeps.
As you say full functionality should come first before performance, otherwise the device may not be initialized properly.
Comment 3 Len Brown 2023-12-09 03:05:36 UTC
How do reliability and speed compare if the driver is unloaded on suspend
and loaded on resume?
Comment 4 Len Brown 2023-12-09 03:07:03 UTC
If this situation persists, then we need to replace all of the AX**
devices from our lab -- because they'll mask useful testing of other devices...
Comment 5 José Ignacio Tornos Martínez 2023-12-09 17:35:35 UTC
Realize that reset operation is needed as well when the device is resumed, and in this way we avoid possible problems. 
I guess it is due to your scenario or the way you test, but I do not understand how it can mask useful testing of other devices if all the drivers/devices in the machine are resumed.
Comment 6 José Ignacio Tornos Martínez 2023-12-10 09:38:09 UTC
Anyway, let me try to confirm if reset operation is strictly necessary when this device is resumed.
Comment 7 Todd Brandt 2024-01-19 19:05:50 UTC
(In reply to José Ignacio Tornos Martínez from comment #5)
> Realize that reset operation is needed as well when the device is resumed,
> and in this way we avoid possible problems. 
> I guess it is due to your scenario or the way you test, but I do not
> understand how it can mask useful testing of other devices if all the
> drivers/devices in the machine are resumed.

Len just means it affects our ability to see at a glance other devices' effects on total resume time. Any machine with this dongle will have a constant extra 400ms in total resume that masks performance issues in any other devices that run in parallel.
Comment 8 José Ignacio Tornos Martínez 2024-03-25 09:17:57 UTC
Reset is strictly necessary and with no more information I think it is better to keep like this to avoid the problems mentioned.

Note You need to log in before you can comment on or make changes to this bug.