Bug 215747

Summary: Mouse gets sluggish after suspend/resume and power usage significant higher after resume
Product: Power Management Reporter: Nico Schottelius (nico-bugzilla.kernel.org)
Component: Hibernation/SuspendAssignee: Rafael J. Wysocki (rjw)
Status: RESOLVED DUPLICATE    
Severity: normal CC: linux, regressions
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.17.0 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg
dmesg-5.15.31
dmesg after which symptoms happen on 5.15.31 as well
/proc/interrupts before suspend & resume
/proc/interrupts after FAILED suspend & resume
/proc/interrupts AFTER successful suspend & resume
dmesg-5.15.32-post-failed-suspend
dmesg-5.15.32-post-success-suspend
dmesg-5.17.1-0-edge.pre-suspend
irq-watch-pre-suspend-5.17.1-0-edge
dmesg-5.17.1-0-edge.post-suspend
irq-watch-post-suspend-5.17.1-0-edge
dmesg with fan active suspend

Description Nico Schottelius 2022-03-26 19:27:06 UTC
Created attachment 300619 [details]
dmesg

TL;DR:

Suspending and resume makes the ELAN0670:00 trackpad sluggish (very hard to move the pointer) and the energy usage of the notebook is about 3 times higher than before suspend.

Background:


On a Lenovo X1 Nano the trackpad works fine until suspend/resume.The estimated battery runtime PRIOR to suspend/resume is 8h 43m. After suspend resume it drops to 2h 20m instantly.

There seems to be a firmware error in the iwlwifi card show in the attached dmesg, but I am not sure whether "that's enough" to cause both symptoms.

Kernel is from Alpine Linux, which is basically stock upstream afaics.
Comment 1 Nico Schottelius 2022-03-26 19:43:57 UTC
Created attachment 300620 [details]
dmesg-5.15.31
Comment 2 Nico Schottelius 2022-03-26 19:45:01 UTC
This problem does not seem to exist on 5.15.31. Suspend/resume and the mosue is still working as usual.

However the estimated runtime is significantly lower, from the start, at about 3h.
Comment 3 Artem S. Tashkinov 2022-03-27 13:13:26 UTC
Any processes using an abnormal amount of CPU after resume?

Does powertop report anything unusual?

Could you try installing Linux 5.17 and see if it's resolved?
Comment 4 Nico Schottelius 2022-03-28 06:38:16 UTC
First correction: the problem also exist on 5.15.31-0-lts.

Note though that the "TPPS/2 Elan TrackPoint" still works smooth, not hanging observed like I see it on the "ELAN0670:00 04F3:3150 Touchpad"

CPU wise I cannot identify any process running more than the usual few % of CPU. 

Top lines of powertop after suspend/resume:

Summary: 1367.1 wakeups/second,  0.0 GPU ops/seconds, 0.0 VFS ops/sec and 11.7% CPU use

Power est.              Usage       Events/s    Category       Description
  2.64 W    100.0%                      Device         Radio device: btusb
  1.33 W     50.0%                      Device         Display backlight
  1.21 W      5.0 ms/s     300.9        Timer          tick_sched_timer
  728 mW    328.6 µs/s     181.4        Interrupt      [27] idma64.0
  640 mW     15.3 ms/s     159.5        Process        [PID 4143] firefox
  264 mW      1.3 ms/s      40.8        Process        [PID 18595] /usr/bin/pulseaudio --start --
  260 mW      1.2 ms/s      64.8        Process        [PID 16564] /usr/lib/firefox/firefox -cont
  225 mW      1.0 ms/s      56.0        Process        [PID 4316] firefox
  208 mW     14.3 pkts/s                Device         Network interface: wlan0 (iwlwifi)
  203 mW      2.3 ms/s      50.5        Timer          hrtimer_wakeup


atop top lines:

ATOP - nb3             2022/03/28  08:36:13             -----------------             10s elapsed
PRC | sys    1.00s | user   2.34s  | #proc    244 | #tslpu    81 | #zombie    0  | no  procacct |
CPU | sys       8% | user     19%  | irq      26% | idle    747% | wait      0%  | curscal  49% |
cpu | sys       3% | user      0%  | irq      25% | idle     71% | cpu000 w  0%  | curscal  98% |
cpu | sys       1% | user      8%  | irq       0% | idle     91% | cpu007 w  0%  | curscal  42% |
cpu | sys       1% | user      3%  | irq       0% | idle     96% | cpu003 w  0%  | curscal   9% |
cpu | sys       1% | user      2%  | irq       0% | idle     97% | cpu006 w  0%  | curscal  58% |
cpu | sys       1% | user      2%  | irq       0% | idle     98% | cpu005 w  0%  | curscal  55% |
cpu | sys       1% | user      1%  | irq       0% | idle     97% | cpu002 w  0%  | curscal  26% |
cpu | sys       1% | user      1%  | irq       0% | idle     98% | cpu001 w  0%  | curscal  23% |
cpu | sys       1% | user      2%  | irq       0% | idle     98% | cpu004 w  0%  | curscal  83% |
CPL | avg1    1.14 | avg5    1.37  | avg15   1.41 | csw   142536 | intr  290838  | numcpu     8 |
MEM | tot    15.4G | free  588.6M  | cache   6.5G | buff  852.4M | slab    2.5G  | numnode    1 |
SWP | tot     0.0M | free    0.0M  | swcac   0.0M |              | vmcom   9.8G  | vmlim   7.7G |
LVM |         root | busy      1%  | read       3 | write      8 | MBw/s    0.1  | avio 7.91 ms |
DSK |      nvme0n1 | busy      1%  | read       3 | write      8 | MBw/s    0.1  | avio 7.91 ms |
NET | transport    | tcpi      68  | tcpo      78 | udpi       4 | udpo       8  | tcpao      2 |
NET | network      | ipi      129  | ipo      102 | ipfrw      0 | deliv    129  | icmpo     17 |
NET | wlan0     0% | pcki     127  | pcko     102 | sp  433 Mbps | si   41 Kbps  | so   23 Kbps |

  PID SYSCPU  USRCPU RDELAY   VGROW   RGROW RUID      ST EXC   THR  S CPUNR   CPU  CMD        1/3
 2544  0.04s   0.74s  0.00s  128.0K   76.0K root      --   -     1  S     7    8%  powertop
 4143  0.14s   0.52s  0.00s  900.0K    1.2M nico      --   -   232  S     5    7%  firefox
15785  0.21s   0.35s  0.00s      0B   -4.0K nico      --   -    33  S     4    6%  Isolated Web C
 4070  0.11s   0.13s  0.00s   -4.0K      0B nico      --   -    17  S     4    2%  Xorg
18595  0.15s   0.08s  0.00s      0B      0B nico      --   -     7  S     4    2%  pulseaudio
 4715  0.01s   0.16s  0.00s  204.0K    1.3M nico      --   -    29  S     0    2%  Isolated Web C
 4353  0.02s   0.11s  0.00s   -1.0M   -0.2M nico      --   -    29  S     6    1%  WebExtensions
 2479  0.09s   0.00s  0.01s      0B      0B root      --   -     1  D     0    1%  irq/118-ELAN06
 4089  0.02s   0.04s  0.00s      0B      0B nico      --   -     1  S     5    1%  i3
 3162  0.05s   0.01s  0.00s      0B      0B root      --   -     1  S     1    1%  acpid
 3201  0.03s   0.02s  0.00s      0B      0B nico      --   -     1  R     2    1%  atop
27438  0.01s   0.02s  0.00s  132.0K   24.0K nico      --   -    27  S     5    0%  Isolated Web C
 5791  0.01s   0.02s  0.00s      0B  -24.0K nico      --   -    27  S     3    0%  Isolated Web C
31445  0.01s   0.01s  0.00s    1.0M    1.1M nico      --   -    35  S     1    0%  Isolated Web C
29507  0.00s   0.02s  0.00s      0B      0B nico      --   -    27  S     2    0%  Isolated Web C
13266  0.00s   0.02s  0.00s      0B   76.0K nico      --   -    27  S     6    0%  Isolated Web C
  525  0.02s   0.00s  0.00s      0B      0B root      --   -     1  I     2    0%  kworker/u16:48
  487  0.02s   0.00s  0.00s      0B      0B root      --   -     1  I     1    0%  kworker/u16:7-
 4704  0.00s   0.01s  0.00s      0B   -0.1M nico      --   -    30  S     0    0%  Isolated Web C

I first detected the problem on 5.17.0 and I just downgraded afterwards to 5.15.31 to see if it persists, which is indeed the case.

I will attach the dmesg output *after* this issue also happens on 5.15.31.
Comment 5 Nico Schottelius 2022-03-28 06:38:49 UTC
Created attachment 300631 [details]
dmesg after which symptoms happen on 5.15.31 as well
Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-04-04 13:56:21 UTC
(In reply to Nico Schottelius from comment #4)
>
> I first detected the problem on 5.17.0 and I just downgraded afterwards to
> 5.15.31 to see if it persists, which is indeed the case.

But things worked fine in the past, e.g. on earlier 5.15.y releases or older series like 5.14 or 5.13?
Comment 7 Nico Schottelius 2022-04-05 11:14:31 UTC
Yes, I am pretty sure about that. It is a bit hard to tell when it exactly broke, as I am following the Alpine Linux edge releases for a while. My assumption is that it might still have been working in 5.13. 

I am also not 100% sure whether the energy consumption surge is related to it, however seeing that btusb is draining 2.6W is puzzling me a bit. 

Is there any command line I could enable for debugging issues around this?
Comment 8 Thomas Weißschuh 2022-04-09 05:04:40 UTC
Are you using s3 or s2idle?

If it is happening when using s3 it could be a duplicate of
https://bugzilla.kernel.org/show_bug.cgi?id=214775
(and https://gitlab.freedesktop.org/libinput/libinput/-/issues/694 https://gitlab.freedesktop.org/libinput/libinput/-/issues/694 and everything linked from there)

Can you check in /proc/interrupts the behavior of the device before and after suspend-resume?
Comment 9 Nico Schottelius 2022-04-09 08:40:19 UTC
I am using s3, because with s2_idle the fans stay on during suspend and the device gets hot.

I'll get the device and do a reboot in the next hours to check the irq behaviour.
Comment 10 Nico Schottelius 2022-04-09 08:54:38 UTC
Now doing a test on 5.15.32 (current lts version of alpine linux, will reboot to -edge which is 5.17.x afterwards):

* First suspend (pm-suspend call) fails
* Trackpad still works correct after failed suspend
* Second suspend works
* Trackpad jumps after correct suspend/resume
* IRQ storm seems to be present after 2nd suspend

I've attached all files that I recorded, the interrupts before 1st, after 1st and after 2nd suspend as well a related dmesg. Will now reboot and do the same on 5.17.
Comment 11 Nico Schottelius 2022-04-09 08:55:40 UTC
Created attachment 300728 [details]
/proc/interrupts before suspend & resume

I used ( while true; do date; cat /proc/interrupts; sleep 5; done ) | tee irq-watch-...
Comment 12 Nico Schottelius 2022-04-09 08:56:01 UTC
Created attachment 300729 [details]
/proc/interrupts after FAILED suspend & resume
Comment 13 Nico Schottelius 2022-04-09 08:56:23 UTC
Created attachment 300730 [details]
/proc/interrupts AFTER successful suspend & resume
Comment 14 Nico Schottelius 2022-04-09 08:57:02 UTC
Created attachment 300731 [details]
dmesg-5.15.32-post-failed-suspend
Comment 15 Nico Schottelius 2022-04-09 08:57:23 UTC
Created attachment 300732 [details]
dmesg-5.15.32-post-success-suspend
Comment 16 Nico Schottelius 2022-04-09 09:00:00 UTC
So far your theory that it is the same/similar issue seems to make sense, I assume this is the related IRQ:

  27:     332998          0          0          0          0      55796          0          0  IR-IO-APIC   27-fasteoi   idma64.0, i2c_designware.0


The notebook has a trackpad and a touchpad, as seen in xinput list:

[10:58] nb3:~% xinput list
⎡ Virtual core pointer                    	id=2	[master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer              	id=4	[slave  pointer  (2)]
⎜   ↳ ELAN0670:00 04F3:3150 Mouse             	id=11	[slave  pointer  (2)]
⎜   ↳ ELAN0670:00 04F3:3150 Touchpad          	id=12	[slave  pointer  (2)]
⎜   ↳ TPPS/2 Elan TrackPoint                  	id=16	[slave  pointer  (2)]

I am mentioning this, because the IRQ count on 

 118:          0          0          0          0          0          0          0       8834  IR-IO-APIC  118-fasteoi   ELAN0670:00


Seems to be rather stable.
Comment 17 Thomas Weißschuh 2022-04-09 09:05:13 UTC
Can you do the same for s2idle? If the problem goes away then it would be the same as the other issues.

Fixing this properly apparently will need a firmware update for the touchpad which is not sure to be done.
For me it was easier to fix my issues with s2idle...
Comment 18 Nico Schottelius 2022-04-09 09:05:55 UTC
Just rebooted to 5.17.1-0-edge and indeed, the IRQ storm seems to be visible (if I am interpreting the data correctly). Will attach log files in a second.
Comment 19 Nico Schottelius 2022-04-09 09:06:39 UTC
Created attachment 300733 [details]
dmesg-5.17.1-0-edge.pre-suspend
Comment 20 Nico Schottelius 2022-04-09 09:07:03 UTC
Created attachment 300734 [details]
irq-watch-pre-suspend-5.17.1-0-edge
Comment 21 Nico Schottelius 2022-04-09 09:07:30 UTC
Created attachment 300735 [details]
dmesg-5.17.1-0-edge.post-suspend
Comment 22 Nico Schottelius 2022-04-09 09:07:52 UTC
Created attachment 300736 [details]
irq-watch-post-suspend-5.17.1-0-edge
Comment 23 Nico Schottelius 2022-04-09 09:25:01 UTC
So two findings after switching to s2_idle:

* Sometimes, like with s3, the suspend of the system fails, even with 5.17.1 (suspend 1 = failed, suspend 2 = success, suspend 3 = failed, suspend 4 = success)
* The mouse jumps do not happen after the 2nd suspend and from what I can the see IRQ storm is not present

To be honest, this whole issue MIGHT have started with me switching to S3 some weeks/month ago, probably on a similar level to 5.13 being phased out.

The reason why I switched to S3 is that the notebook gets hot in a bag during the suspend. The led outside indicates it is suspended, however the fans continue to run and I assume the system never really shuts down.

I'll keep s2_idle for the moment on, but I assume that the suspend issues have not been solved.

There is another quirk on the Lenovo X1 nano and I am not sure if it is related, but mentioning it here in case it is helpful: The external monitor output only works if the monitor is connected before wakeup: i.e. I suspend the notebook, connect the monitor and then wake it up -> xrandr shows the output.

With directly plugging it in, it does not appear. Not sure if related, just mentioning in case it's helpful.
Comment 24 Nico Schottelius 2022-04-09 09:55:46 UTC
I just checked. With s2_idle even after 1 minute the notebook's fan is still spinning as if it is running.

I will attach the latest dmesg during which this is happening.
Comment 25 Nico Schottelius 2022-04-09 09:56:14 UTC
Created attachment 300737 [details]
dmesg with fan active suspend
Comment 26 Thomas Weißschuh 2022-04-09 10:31:17 UTC
For the issues with failing suspends and spinning fans it would probably be best to open dedicated reports, either here on the Lenovo forums.
Comment 27 Nico Schottelius 2022-04-09 21:18:44 UTC
You are right. I think this bug can be closed as a duplicate.

*** This bug has been marked as a duplicate of bug 214775 ***