Bug 215747
Summary: | Mouse gets sluggish after suspend/resume and power usage significant higher after resume | ||
---|---|---|---|
Product: | Power Management | Reporter: | Nico Schottelius (nico-bugzilla.kernel.org) |
Component: | Hibernation/Suspend | Assignee: | Rafael J. Wysocki (rjw) |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | CC: | linux, regressions |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.17.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg
dmesg-5.15.31 dmesg after which symptoms happen on 5.15.31 as well /proc/interrupts before suspend & resume /proc/interrupts after FAILED suspend & resume /proc/interrupts AFTER successful suspend & resume dmesg-5.15.32-post-failed-suspend dmesg-5.15.32-post-success-suspend dmesg-5.17.1-0-edge.pre-suspend irq-watch-pre-suspend-5.17.1-0-edge dmesg-5.17.1-0-edge.post-suspend irq-watch-post-suspend-5.17.1-0-edge dmesg with fan active suspend |
Created attachment 300620 [details]
dmesg-5.15.31
This problem does not seem to exist on 5.15.31. Suspend/resume and the mosue is still working as usual. However the estimated runtime is significantly lower, from the start, at about 3h. Any processes using an abnormal amount of CPU after resume? Does powertop report anything unusual? Could you try installing Linux 5.17 and see if it's resolved? First correction: the problem also exist on 5.15.31-0-lts. Note though that the "TPPS/2 Elan TrackPoint" still works smooth, not hanging observed like I see it on the "ELAN0670:00 04F3:3150 Touchpad" CPU wise I cannot identify any process running more than the usual few % of CPU. Top lines of powertop after suspend/resume: Summary: 1367.1 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 11.7% CPU use Power est. Usage Events/s Category Description 2.64 W 100.0% Device Radio device: btusb 1.33 W 50.0% Device Display backlight 1.21 W 5.0 ms/s 300.9 Timer tick_sched_timer 728 mW 328.6 µs/s 181.4 Interrupt [27] idma64.0 640 mW 15.3 ms/s 159.5 Process [PID 4143] firefox 264 mW 1.3 ms/s 40.8 Process [PID 18595] /usr/bin/pulseaudio --start -- 260 mW 1.2 ms/s 64.8 Process [PID 16564] /usr/lib/firefox/firefox -cont 225 mW 1.0 ms/s 56.0 Process [PID 4316] firefox 208 mW 14.3 pkts/s Device Network interface: wlan0 (iwlwifi) 203 mW 2.3 ms/s 50.5 Timer hrtimer_wakeup atop top lines: ATOP - nb3 2022/03/28 08:36:13 ----------------- 10s elapsed PRC | sys 1.00s | user 2.34s | #proc 244 | #tslpu 81 | #zombie 0 | no procacct | CPU | sys 8% | user 19% | irq 26% | idle 747% | wait 0% | curscal 49% | cpu | sys 3% | user 0% | irq 25% | idle 71% | cpu000 w 0% | curscal 98% | cpu | sys 1% | user 8% | irq 0% | idle 91% | cpu007 w 0% | curscal 42% | cpu | sys 1% | user 3% | irq 0% | idle 96% | cpu003 w 0% | curscal 9% | cpu | sys 1% | user 2% | irq 0% | idle 97% | cpu006 w 0% | curscal 58% | cpu | sys 1% | user 2% | irq 0% | idle 98% | cpu005 w 0% | curscal 55% | cpu | sys 1% | user 1% | irq 0% | idle 97% | cpu002 w 0% | curscal 26% | cpu | sys 1% | user 1% | irq 0% | idle 98% | cpu001 w 0% | curscal 23% | cpu | sys 1% | user 2% | irq 0% | idle 98% | cpu004 w 0% | curscal 83% | CPL | avg1 1.14 | avg5 1.37 | avg15 1.41 | csw 142536 | intr 290838 | numcpu 8 | MEM | tot 15.4G | free 588.6M | cache 6.5G | buff 852.4M | slab 2.5G | numnode 1 | SWP | tot 0.0M | free 0.0M | swcac 0.0M | | vmcom 9.8G | vmlim 7.7G | LVM | root | busy 1% | read 3 | write 8 | MBw/s 0.1 | avio 7.91 ms | DSK | nvme0n1 | busy 1% | read 3 | write 8 | MBw/s 0.1 | avio 7.91 ms | NET | transport | tcpi 68 | tcpo 78 | udpi 4 | udpo 8 | tcpao 2 | NET | network | ipi 129 | ipo 102 | ipfrw 0 | deliv 129 | icmpo 17 | NET | wlan0 0% | pcki 127 | pcko 102 | sp 433 Mbps | si 41 Kbps | so 23 Kbps | PID SYSCPU USRCPU RDELAY VGROW RGROW RUID ST EXC THR S CPUNR CPU CMD 1/3 2544 0.04s 0.74s 0.00s 128.0K 76.0K root -- - 1 S 7 8% powertop 4143 0.14s 0.52s 0.00s 900.0K 1.2M nico -- - 232 S 5 7% firefox 15785 0.21s 0.35s 0.00s 0B -4.0K nico -- - 33 S 4 6% Isolated Web C 4070 0.11s 0.13s 0.00s -4.0K 0B nico -- - 17 S 4 2% Xorg 18595 0.15s 0.08s 0.00s 0B 0B nico -- - 7 S 4 2% pulseaudio 4715 0.01s 0.16s 0.00s 204.0K 1.3M nico -- - 29 S 0 2% Isolated Web C 4353 0.02s 0.11s 0.00s -1.0M -0.2M nico -- - 29 S 6 1% WebExtensions 2479 0.09s 0.00s 0.01s 0B 0B root -- - 1 D 0 1% irq/118-ELAN06 4089 0.02s 0.04s 0.00s 0B 0B nico -- - 1 S 5 1% i3 3162 0.05s 0.01s 0.00s 0B 0B root -- - 1 S 1 1% acpid 3201 0.03s 0.02s 0.00s 0B 0B nico -- - 1 R 2 1% atop 27438 0.01s 0.02s 0.00s 132.0K 24.0K nico -- - 27 S 5 0% Isolated Web C 5791 0.01s 0.02s 0.00s 0B -24.0K nico -- - 27 S 3 0% Isolated Web C 31445 0.01s 0.01s 0.00s 1.0M 1.1M nico -- - 35 S 1 0% Isolated Web C 29507 0.00s 0.02s 0.00s 0B 0B nico -- - 27 S 2 0% Isolated Web C 13266 0.00s 0.02s 0.00s 0B 76.0K nico -- - 27 S 6 0% Isolated Web C 525 0.02s 0.00s 0.00s 0B 0B root -- - 1 I 2 0% kworker/u16:48 487 0.02s 0.00s 0.00s 0B 0B root -- - 1 I 1 0% kworker/u16:7- 4704 0.00s 0.01s 0.00s 0B -0.1M nico -- - 30 S 0 0% Isolated Web C I first detected the problem on 5.17.0 and I just downgraded afterwards to 5.15.31 to see if it persists, which is indeed the case. I will attach the dmesg output *after* this issue also happens on 5.15.31. Created attachment 300631 [details]
dmesg after which symptoms happen on 5.15.31 as well
(In reply to Nico Schottelius from comment #4) > > I first detected the problem on 5.17.0 and I just downgraded afterwards to > 5.15.31 to see if it persists, which is indeed the case. But things worked fine in the past, e.g. on earlier 5.15.y releases or older series like 5.14 or 5.13? Yes, I am pretty sure about that. It is a bit hard to tell when it exactly broke, as I am following the Alpine Linux edge releases for a while. My assumption is that it might still have been working in 5.13. I am also not 100% sure whether the energy consumption surge is related to it, however seeing that btusb is draining 2.6W is puzzling me a bit. Is there any command line I could enable for debugging issues around this? Are you using s3 or s2idle? If it is happening when using s3 it could be a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=214775 (and https://gitlab.freedesktop.org/libinput/libinput/-/issues/694 https://gitlab.freedesktop.org/libinput/libinput/-/issues/694 and everything linked from there) Can you check in /proc/interrupts the behavior of the device before and after suspend-resume? I am using s3, because with s2_idle the fans stay on during suspend and the device gets hot. I'll get the device and do a reboot in the next hours to check the irq behaviour. Now doing a test on 5.15.32 (current lts version of alpine linux, will reboot to -edge which is 5.17.x afterwards): * First suspend (pm-suspend call) fails * Trackpad still works correct after failed suspend * Second suspend works * Trackpad jumps after correct suspend/resume * IRQ storm seems to be present after 2nd suspend I've attached all files that I recorded, the interrupts before 1st, after 1st and after 2nd suspend as well a related dmesg. Will now reboot and do the same on 5.17. Created attachment 300728 [details]
/proc/interrupts before suspend & resume
I used ( while true; do date; cat /proc/interrupts; sleep 5; done ) | tee irq-watch-...
Created attachment 300729 [details]
/proc/interrupts after FAILED suspend & resume
Created attachment 300730 [details]
/proc/interrupts AFTER successful suspend & resume
Created attachment 300731 [details]
dmesg-5.15.32-post-failed-suspend
Created attachment 300732 [details]
dmesg-5.15.32-post-success-suspend
So far your theory that it is the same/similar issue seems to make sense, I assume this is the related IRQ: 27: 332998 0 0 0 0 55796 0 0 IR-IO-APIC 27-fasteoi idma64.0, i2c_designware.0 The notebook has a trackpad and a touchpad, as seen in xinput list: [10:58] nb3:~% xinput list ⎡ Virtual core pointer id=2 [master pointer (3)] ⎜ ↳ Virtual core XTEST pointer id=4 [slave pointer (2)] ⎜ ↳ ELAN0670:00 04F3:3150 Mouse id=11 [slave pointer (2)] ⎜ ↳ ELAN0670:00 04F3:3150 Touchpad id=12 [slave pointer (2)] ⎜ ↳ TPPS/2 Elan TrackPoint id=16 [slave pointer (2)] I am mentioning this, because the IRQ count on 118: 0 0 0 0 0 0 0 8834 IR-IO-APIC 118-fasteoi ELAN0670:00 Seems to be rather stable. Can you do the same for s2idle? If the problem goes away then it would be the same as the other issues. Fixing this properly apparently will need a firmware update for the touchpad which is not sure to be done. For me it was easier to fix my issues with s2idle... Just rebooted to 5.17.1-0-edge and indeed, the IRQ storm seems to be visible (if I am interpreting the data correctly). Will attach log files in a second. Created attachment 300733 [details]
dmesg-5.17.1-0-edge.pre-suspend
Created attachment 300734 [details]
irq-watch-pre-suspend-5.17.1-0-edge
Created attachment 300735 [details]
dmesg-5.17.1-0-edge.post-suspend
Created attachment 300736 [details]
irq-watch-post-suspend-5.17.1-0-edge
So two findings after switching to s2_idle: * Sometimes, like with s3, the suspend of the system fails, even with 5.17.1 (suspend 1 = failed, suspend 2 = success, suspend 3 = failed, suspend 4 = success) * The mouse jumps do not happen after the 2nd suspend and from what I can the see IRQ storm is not present To be honest, this whole issue MIGHT have started with me switching to S3 some weeks/month ago, probably on a similar level to 5.13 being phased out. The reason why I switched to S3 is that the notebook gets hot in a bag during the suspend. The led outside indicates it is suspended, however the fans continue to run and I assume the system never really shuts down. I'll keep s2_idle for the moment on, but I assume that the suspend issues have not been solved. There is another quirk on the Lenovo X1 nano and I am not sure if it is related, but mentioning it here in case it is helpful: The external monitor output only works if the monitor is connected before wakeup: i.e. I suspend the notebook, connect the monitor and then wake it up -> xrandr shows the output. With directly plugging it in, it does not appear. Not sure if related, just mentioning in case it's helpful. I just checked. With s2_idle even after 1 minute the notebook's fan is still spinning as if it is running. I will attach the latest dmesg during which this is happening. Created attachment 300737 [details]
dmesg with fan active suspend
For the issues with failing suspends and spinning fans it would probably be best to open dedicated reports, either here on the Lenovo forums. You are right. I think this bug can be closed as a duplicate. *** This bug has been marked as a duplicate of bug 214775 *** |
Created attachment 300619 [details] dmesg TL;DR: Suspending and resume makes the ELAN0670:00 trackpad sluggish (very hard to move the pointer) and the energy usage of the notebook is about 3 times higher than before suspend. Background: On a Lenovo X1 Nano the trackpad works fine until suspend/resume.The estimated battery runtime PRIOR to suspend/resume is 8h 43m. After suspend resume it drops to 2h 20m instantly. There seems to be a firmware error in the iwlwifi card show in the attached dmesg, but I am not sure whether "that's enough" to cause both symptoms. Kernel is from Alpine Linux, which is basically stock upstream afaics.