Bug 204367 - [macbookpro (12,1 early 2015)] cpu scaling doesn't work anymore after sleep/wakeup cycle
Summary: [macbookpro (12,1 early 2015)] cpu scaling doesn't work anymore after sleep/w...
Status: RESOLVED DUPLICATE of bug 203995
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: Srinivas Pandruvada
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-29 16:56 UTC by nsklaus
Modified: 2019-10-05 20:39 UTC (History)
6 users (show)

See Also:
Kernel Version: 5.0 to 5.3rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
journalctl -k, tlp-stat, journalctl -b (238.79 KB, text/plain)
2019-07-29 16:56 UTC, nsklaus
Details
Turbotstat Before Suspend. All working normally. (14.21 KB, text/plain)
2019-09-13 19:02 UTC, AO
Details
Turbotstat Before Suspend and After Suspend. CPU issue (24.64 KB, text/plain)
2019-09-13 19:07 UTC, AO
Details
Turbotstat After Suspend. CPU freq at max (limited to 3.40 GHz) (11.96 KB, text/plain)
2019-09-13 19:17 UTC, AO
Details
After resume, interrups report (11.64 KB, text/plain)
2019-09-14 13:53 UTC, AO
Details
Interrupts Graph before suspend, during, and after resume (700.31 KB, image/svg+xml)
2019-09-15 21:25 UTC, AO
Details
Suspend and Resume: interrupts with modules disabled. (678.61 KB, image/svg+xml)
2019-09-16 00:12 UTC, AO
Details
Interrupt graph while running 'rmmod i2c_hid' (612.04 KB, image/svg+xml)
2019-09-17 15:11 UTC, AO
Details
Interrupt graph while running kernel 5.3 (676.98 KB, image/svg+xml)
2019-09-17 16:19 UTC, AO
Details
Frequencies over time with modprobe [-r] i2c_hid (232.10 KB, image/png)
2019-09-18 03:45 UTC, AO
Details
Interrupts Graph before suspend, during, and after resume with systemd (701.80 KB, image/svg+xml)
2019-09-18 06:25 UTC, AO
Details

Description nsklaus 2019-07-29 16:56:47 UTC
Created attachment 284029 [details]
journalctl -k, tlp-stat,  journalctl -b

hello,

i have this power management issue on my laptop with cpu frequency scaling.
when i do an initial boot, everything works fine, i have tlp installed and my cpu frequency is correctly scaled as requested at minimum (800mhz) to prevent heating and keep longer battery. but, when i put the computer to sleep, after i wake it up, the cpu frequency scaling doesn't seem to work anymore. no matter what i try my cpu freq stays at 2.6ghz.

i added kernel argument:
GRUB_CMDLINE_LINUX_DEFAULT="text debug systemd.log_level=info" 

and then from terminal i did this just after a fresh boot:
echo 'file intel_pstate.c +p' | sudo dd of=/sys/kernel/debug/dynamic_debug/control

next, i did a sleep/wakeup cycle,
and finaly i took the output of:
journalctl -k --since='20 minutes ago'

and i attached it to this report.

my laptop is a macbookpro (12,1 early 2015) running ubuntu 19.04 as single OS.
so, basicaly to sum it up, regarding to cpu scaling, everything works fine, until i do a sleep/wakeup cycle, then afterwards the cpu freq just won't scale anymore
Comment 1 Chen Yu 2019-09-09 09:34:55 UTC
How about put all non boot CPU offline and online, then check their cpufreq?
Besides, do you set both the min, max to 5 before running S3?
Comment 2 AO 2019-09-11 14:51:05 UTC
I believe I have this same issue with the intel_pstate driver with a new HP Zbook x360 with the i9-8950HK Intel CPU. 


After some pause/resume states, I get two issues: 

1) CPU frequencies across all threads go to 100% even when system load is close to 0%.  
2) Some times the system will crash hard with a complete lockup (can't ssh to the machine remotely, current ssh sessions (doing a "tail -f /var/log/*log") hang ) as described in Bug# 200133. 

I'm seeing this in /var/log/kern.log around the time of the CPU going to max frequency.

Sep 10 19:22:09 REDACTED kernel: [    5.116718] intel_pstate: Intel P-state driver initializing
Sep 10 19:22:09 REDACTED kernel: [    5.116763] intel_pstate: Disabling energy efficiency optimization


Details: 


$cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 158
model name      : Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
stepping        : 10
microcode       : 0xb4
cpu MHz         : 2400.000
cache size      : 12288 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 6
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology n
onstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch 
cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsave
s dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips        : 5808.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

.... [others identical]

$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate

$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave


$ uname -a
Linux ao-x360 5.0.0-27-generic #28-Ubuntu SMP Tue Aug 20 19:53:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
Ubuntu 19.04 \n \l


$ cpufreq-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 4294.55 ms.
  hardware limits: 800 MHz - 4.80 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 4.80 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 4.80 GHz.

  ... [similar for all the other CPU threads]

All the CPU threads all show they are running at the maximum CPU allowed frequency. 

I can change the max allowed frequency to a lower setting with

  for i in {0..11}; do 
    sudo cpufreq-set -c $i --max "$DESIRED_MAX"GHz; 
  done


which lowers all to whatever I've set $DESIRED_MAX to, but they all stay pegged at 100% clock frequency continuously.

Suspend and resume does not solve the issue.

Only after reboots do CPU frequencies go back to normal behavior with each thread having independent and fluctuating frequencies. 

Happy to put in more information or debug if necessary. Would be nice to not have to reboot so frequently.
Comment 3 AO 2019-09-11 16:08:25 UTC
More info I was running BIOS="Q71 Ver 01.07.00 4/15/2019" with the above report/comment. 

I just checked and saw there's a newer HP BIOS available and installed it. NewBIOS="(Q71) Ver 01.08.00 Rev.A 08/14/2019"

Will update this ticket if the BIOS update does or does not result in a fix for this issue.
Comment 4 AO 2019-09-12 03:57:21 UTC
The BIOS update did not solve this issue. On restore: "intel_pstate: Disabling energy efficiency optimization" is reported

Some lines from /var/log/kern.log near intel_pstate and ACPI messages.

Sep 11 19:40:08 REDACTED kernel: [22319.285340] ACPI: Low-level resume complete
Sep 11 19:40:08 REDACTED kernel: [22319.285432] ACPI: EC: EC started
Sep 11 19:40:08 REDACTED kernel: [22319.285433] PM: Restoring platform NVS memory
Sep 11 19:40:08 REDACTED kernel: [22319.286327] Enabling non-boot CPUs ...
Sep 11 19:40:08 REDACTED kernel: [22319.286372] x86: Booting SMP configuration:
Sep 11 19:40:08 REDACTED kernel: [22319.286373] smpboot: Booting Node 0 Processor 1 APIC 0x2
Sep 11 19:40:08 REDACTED kernel: [22319.288313]  cache: parent cpu1 should not be sleeping
Sep 11 19:40:08 REDACTED kernel: [22319.288346] intel_pstate: Disabling energy efficiency optimization
Sep 11 19:40:08 REDACTED kernel: [22319.288502] CPU1 is up
Sep 11 19:40:08 REDACTED kernel: [22319.288788] smpboot: Booting Node 0 Processor 2 APIC 0x4
Sep 11 19:40:08 REDACTED kernel: [22319.289372]  cache: parent cpu2 should not be sleeping
Sep 11 19:40:08 REDACTED kernel: [22319.289534] CPU2 is up
Sep 11 19:40:08 REDACTED kernel: [22319.289554] smpboot: Booting Node 0 Processor 3 APIC 0x6
Sep 11 19:40:08 REDACTED kernel: [22319.290138]  cache: parent cpu3 should not be sleeping
Sep 11 19:40:08 REDACTED kernel: [22319.290310] CPU3 is up
...
Sep 11 19:40:08 REDACTED kernel: [22319.309924] ACPI: Waking up from system sleep state S3
Sep 11 19:40:08 REDACTED kernel: [22320.409692] ACPI: EC: interrupt unblocked
Sep 11 19:40:08 REDACTED kernel: [22320.410302] nvidia 0000:01:00.0: Enabling HDA controller
Sep 11 19:40:08 REDACTED kernel: [22320.748846] nvme nvme0: 8/0/0 default/read/poll queues
Sep 11 19:40:08 REDACTED kernel: [22320.848849] usb 1-14: reset full-speed USB device number 4 using xhci_hcd
Sep 11 19:40:08 REDACTED kernel: [22320.975491] ata4: SATA link down (SStatus 4 SControl 300)
Sep 11 19:40:08 REDACTED kernel: [22320.975522] ata3: SATA link down (SStatus 4 SControl 300)
Sep 11 19:40:08 REDACTED kernel: [22321.125262] usb 1-8: reset full-speed USB device number 3 using xhci_hcd
Sep 11 19:40:08 REDACTED kernel: [22321.142143] ish-hid {33AECD58-B679-4E54-9BD9-A04D34F0C226}: [hid-ish]: enum_devices_done OK, num_hid_devices=3
Sep 11 19:40:08 REDACTED kernel: [22321.401408] usb 1-7: reset high-speed USB device number 2 using xhci_hcd
Sep 11 19:40:08 REDACTED kernel: [22323.193031] ACPI: EC: event unblocked
Sep 11 19:40:08 REDACTED kernel: [22323.256586] acpi LNXPOWER:09: Turning OFF
Sep 11 19:40:08 REDACTED kernel: [22323.256643] acpi LNXPOWER:05: Turning OFF
Sep 11 19:40:08 REDACTED kernel: [22323.260798] OOM killer enabled.
Sep 11 19:40:08 REDACTED kernel: [22323.260798] Restarting tasks ... done.
Sep 11 19:40:09 REDACTED kernel: [22323.358096] PM: suspend exit
...
Comment 5 AO 2019-09-13 13:45:04 UTC
More investigation: 

$cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-5.0.0-27-generic root=REDACTED ro quiet splash vt.handoff=1


On boot these seem to be the intel drivers. 

$grep -a intel_ /var/log/kern.log

Sep 13 08:09:41 REDACTED kernel: [    4.988078] intel_idle: MWAIT substates: 0x11142120
Sep 13 08:09:41 REDACTED kernel: [    4.988079] intel_idle: v0.4.1 model 0x9E
Sep 13 08:09:41 REDACTED kernel: [    4.988569] intel_idle: lapic_timer_reliable_states 0xffffffff
Sep 13 08:09:41 REDACTED kernel: [    5.099840] intel_pstate: Intel P-state driver initializing
Sep 13 08:09:41 REDACTED kernel: [    5.099883] intel_pstate: Disabling energy efficiency optimization
Sep 13 08:09:41 REDACTED kernel: [    5.100412] intel_pstate: HWP enabled
Sep 13 08:09:41 REDACTED kernel: [    5.133952] intel_pmc_core:  initialized
Sep 13 08:09:41 REDACTED kernel: [   10.263525] intel_rapl: Found RAPL domain package
Sep 13 08:09:41 REDACTED kernel: [   10.263526] intel_rapl: Found RAPL domain core
Sep 13 08:09:41 REDACTED kernel: [   10.263527] intel_rapl: Found RAPL domain uncore
Sep 13 08:09:41 REDACTED kernel: [   10.263528] intel_rapl: Found RAPL domain dram
Comment 6 Srinivas Pandruvada 2019-09-13 15:21:55 UTC
Are you saying that you don't see "intel_pstate: Disabling energy efficiency optimizations" during boot process. Only appears during system resume?
Comment 7 AO 2019-09-13 17:05:24 UTC
I see "intel_pstate: Disabling energy efficiency optimization" appearing both during boot and system resume. 

Here's the last boot dmesg filtered by egrep:

$ dmesg | egrep 'intel_|ENERGY'
[    0.509321] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.509321] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[    4.988078] intel_idle: MWAIT substates: 0x11142120
[    4.988079] intel_idle: v0.4.1 model 0x9E
[    4.988569] intel_idle: lapic_timer_reliable_states 0xffffffff
[    5.099840] intel_pstate: Intel P-state driver initializing
[    5.099883] intel_pstate: Disabling energy efficiency optimization
[    5.100412] intel_pstate: HWP enabled
[    5.133952] intel_pmc_core:  initialized
[   10.263525] intel_rapl: Found RAPL domain package
[   10.263526] intel_rapl: Found RAPL domain core
[   10.263527] intel_rapl: Found RAPL domain uncore
[   10.263528] intel_rapl: Found RAPL domain dram


Comment #4 above is what appeared in /var/log/kern.log right before the CPU frequencies went and stayed at 100% max.
Comment 8 Srinivas Pandruvada 2019-09-13 18:20:23 UTC
Can you run turbostat and then do suspend/resume while the tool is running and collect the log till you observe the problem.
Comment 9 AO 2019-09-13 19:02:00 UTC
Created attachment 284955 [details]
Turbotstat Before Suspend. All working normally.

First of 3 files to upload: turbostat.1.log

1. Reboot. 
2. $sudo turbostat -d -o turbostat.1.log
3. Ctrl-C

Baseline file when all is working normally.
Comment 10 AO 2019-09-13 19:07:16 UTC
Created attachment 284957 [details]
Turbotstat Before Suspend and After Suspend. CPU issue

2nd of three files to upload

1. sudo turbostat -d -o turbostat.2.log
2. Close lid
3. Open lid

CPUs frequencies now all identical and all close to 100% (not quite).
Comment 11 AO 2019-09-13 19:17:52 UTC
Created attachment 284959 [details]
Turbotstat After Suspend. CPU freq at max (limited to 3.40 GHz)

3rd of 3 files to upload. 

I had forgotten that to prevent overheating when this happens I run 

for i in {0..11}; do
  sudo cpufreq-set -c $i --max 3.40GHz;
done

which really shows how CPU gets pegged to max. 

1. Limit max CPU freq as above
2. sudo turbostat -d -o turbostat.3.log
3. Ctrl-C
Comment 12 Srinivas Pandruvada 2019-09-13 19:43:09 UTC
This is happening because after resume you are getting too many interrupts (~2000 vs ~123000). So some device after resume just getting continuous interrupts.
I suggest look at 
cat /proc/interrupts before and after and check which IRQ is this.
Comment 13 AO 2019-09-14 13:53:18 UTC
Created attachment 284965 [details]
After resume, interrups report

I did a suspend and resume and did a $watch "cat /proc/interrupts" (see attached file) 

The interrupts which jumped and/or kept increasing like mad were

14,16,17,20,122,126,136

The largest initial jump was from 
  17:       IR-IO-APIC   17-fasteoi   i2c_designware.1, idma64.1, snd_hda_intel:card1


The ones which seemed to increase per unit time the fastest of the above were 126, 136, 16, 14
 136:        intel-gpio  258  SYNA307B:00
 126:        intel-gpio   77  WCOM48A0:00
  16:        IR-IO-APIC   16-fasteoi   i801_smbus, i2c_designware.0, idma64.0
  14:        IR-IO-APIC   14-fasteoi   INT3450:00


The ones which increased but not as quickly were 20 and 122
  20:       IR-IO-APIC   20-fasteoi   intel_ish_ipc
 122:        IR-PCI-MSI 327680-edge      xhci_hcd


I didn't do a "interrupts/time" chart but can if there's a need (and if you know a better way than using awk to sum the rows). 

--------------------------------------
In "top", ordered by CPU, SNA307B and WCOM48A are the top two processes. 
----
top - 08:34:19 up 26 min,  1 user,  load average: 2.04, 1.98, 1.65
Tasks: 280 total,   1 running, 279 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.6 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  64229.5 total,  60860.0 free,   1672.9 used,   1696.7 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used.  61751.3 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                 
  329 root     -51   0       0      0      0 S   7.0   0.0   1:34.69 irq/136-SYNA307                                                                                                                                                         
  327 root     -51   0       0      0      0 D   1.3   0.0   0:17.84 irq/126-WCOM48A                                                                                                                                                         
  460 root      19  -1  119652  75752  74292 S   1.0   0.1   0:13.79 systemd-journal                                                                                                                                                         
 1002 root      20   0    8260   4816   1692 S   1.0   0.0   0:10.76 haveged                                                                                                                                                                 
 1009 syslog    20   0  224292   5088   3556 S   0.7   0.0   0:04.06 rsyslogd
...
Comment 14 Srinivas Pandruvada 2019-09-14 14:50:43 UTC
They seems to be touchpad drivers. Can you blacklist intel-gpio via kernel command line and boot?
It is better to check rate of interrupts before and after resume. I think you will not see them as at the top in the "top" report. 

One more experiment to try if you see this issue by doing
#echo freeze > /sys/power/state

Then resume.
Comment 15 AO 2019-09-15 21:25:18 UTC
Created attachment 284985 [details]
Interrupts Graph before suspend, during, and after resume

I didn't find any programs which recorded changes in interrupts over time by each irq so I wrote one in bash (let me know if you think it useful to upload) which parses /proc/interrupts and then passes aggregated data to gnuplot (see attached .svg). 

The attached graph shows: 

* xaxis = seconds since program started. You see that I started the program, then did a suspend and resume (closed, waited, then opened the lid).
Suspend kicked in at about t=5 seconds. About 25 seconds later (about t=30) the system resumed reporting interrupts. 

* yaxis = # interrupts per IRQ since program started. This # reported is cumulative. 


IRQ 16 = "IR-IO-APIC 16-fasteoi i801_smbus,"
IRQ 17 = "IR-IO-APIC 17-fasteoi i2c_designware.1,"

Interrupts for IRQ 17 continue to increase rapidly when the system pegs CPU frequencies.


As you asked, I also tried 

#echo freeze > /sys/power/state

Since the system resumed normally I tested it a few times.  I resumed it in different ways.  On resume the system always acted normally. In this freeze test, Interrupts for IRQ 17 increased rapidly but only until about time=50 seconds and then stopped increasing rapidly. Interrupts for IRQ 16 did not skyrocket. I can upload that graph as an attachment if you think it worthwhile.
Comment 16 Srinivas Pandruvada 2019-09-15 21:56:10 UTC
So some i2c devices going crazy after resume. Can you try to blacklist i2c_designware and i801_smbus module?
Comment 17 Srinivas Pandruvada 2019-09-15 21:57:24 UTC
Also blacklist snd_hda_intel.
Comment 18 AO 2019-09-16 00:12:40 UTC
Created attachment 284987 [details]
Suspend and Resume: interrupts with modules disabled.

Added to /etc/modprobe.d/blacklist-aaatest.conf the following lines

blacklist i2c_designware
blacklist i801_smbus
blacklist snd_hda_intel


After a reboot. The laptop does not have sound. 
Suspend and resume still results in CPU frequencies going to max. 

IRQs 16 and 17 increase continuously. (see attached)
Comment 19 AO 2019-09-16 00:48:30 UTC
Also tried 

blacklist i2c_i801 

in the file, but it looks like it gets loaded anyway. 

$ lsmod | grep i2c
i2c_algo_bit           16384  1 i915
i2c_i801               32768  0
i2c_hid                28672  0
hid                   126976  8 i2c_hid,wacom,usbhid,hid_multitouch,hid_sensor_hub,intel_ishtp_hid,hid_generic,uhid


And with "blacklist i2c_i801" in /etc/modprobe.d/blacklist-aaatest.conf,  suspend and resume still has the CPU frequencies going to max.
Comment 20 Srinivas Pandruvada 2019-09-16 16:21:34 UTC
Added Jarkko to see if there is a way to silence i2c interrupts after resume.
Comment 21 Jarkko Nikula 2019-09-17 06:37:10 UTC
Couple thoughts:
"In "top", ordered by CPU, SNA307B and WCOM48A are the top two processes.".

Does interrupt storm go away if you rmmod i2c_hid? I see both of them generate also bunch of interrupts and IIRC each touchscreen/touchpad interrupt cause many I2C transactions and interrupts.

Are you able to try newest Linux v5.3 kernel to see does it have the same issue?
Comment 22 AO 2019-09-17 15:11:27 UTC
Created attachment 285031 [details]
Interrupt graph while running 'rmmod i2c_hid'

Yes, if you run "rmmod i2c_hid" then the interrupt storm immediately stops. In the attached
SVG image you see interrupts for IRQs 16 and 17 skyrocketing until T=4 seconds which is about when I ran the 
rmmod command. 


It also kills the touchpad and touchscreen leaving only keyboard for an interface.  

Unfortunately running "insmod i2c_hid" did not work (I think the message was something like "could not find i2c_hid")

I can try 5.3 with https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3/ and let you know what happens.
Comment 23 Srinivas Pandruvada 2019-09-17 16:14:29 UTC
After you rmmod i2c_hid, do you see the frequencies drop also? I don't see turbostate output after that.
Comment 24 AO 2019-09-17 16:19:18 UTC
Created attachment 285037 [details]
Interrupt graph while running kernel 5.3

Test with Kernel 5.3 installed with .deb files

( 
linux-headers-5.3.0-050300_5.3.0-050300.201909152230_all.deb
linux-headers-5.3.0-050300-generic_5.3.0-050300.201909152230_amd64.deb
linux-image-unsigned-5.3.0-050300-generic_5.3.0-050300.201909152230_amd64.deb
linux-modules-5.3.0-050300-generic_5.3.0-050300.201909152230_amd64.deb
)

after I did a suspend/resume the results were:

  * touchpad/touchscreen worked normally for only a few seconds, then was significantly laggy, then stopped responding entirely (keyboard worked though)
  * IRQ 17 interrupts normalized (see attached image)
  * IRQ 16 interrupts continued to increase without bound (see attached image)
  * CPU frequencies were intermittently at max and normal
  * full crash/system-lockup when trying to reboot

Messages on installation of kernel were
------------------------------
$ sudo dpkg -i *.deb
Selecting previously unselected package linux-headers-5.3.0-050300.
(Reading database ... 229726 files and directories currently installed.)
Preparing to unpack linux-headers-5.3.0-050300_5.3.0-050300.201909152230_all.deb ...
Unpacking linux-headers-5.3.0-050300 (5.3.0-050300.201909152230) ...
Selecting previously unselected package linux-headers-5.3.0-050300-generic.
Preparing to unpack linux-headers-5.3.0-050300-generic_5.3.0-050300.201909152230_amd64.deb ...
Unpacking linux-headers-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
Selecting previously unselected package linux-image-unsigned-5.3.0-050300-generic.
Preparing to unpack linux-image-unsigned-5.3.0-050300-generic_5.3.0-050300.201909152230_amd64.deb ...
Unpacking linux-image-unsigned-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
Selecting previously unselected package linux-modules-5.3.0-050300-generic.
Preparing to unpack linux-modules-5.3.0-050300-generic_5.3.0-050300.201909152230_amd64.deb ...
Unpacking linux-modules-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
Setting up linux-headers-5.3.0-050300 (5.3.0-050300.201909152230) ...
Setting up linux-headers-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
/etc/kernel/header_postinst.d/dkms:
 * dkms: running auto installation service for kernel 5.3.0-050300-generic

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
unset ARCH; env NV_VERBOSE=1 'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.3.0-050300-generic IGNORE_XEN_PRESEN
CE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.3.0-050300-generic/build LD=/usr/bin/ld.bfd modules.....(bad exit status
: 2)
ERROR (dkms apport): kernel package linux-headers-5.3.0-050300-generic is not supported
Error! Bad return status for module build on kernel: 5.3.0-050300-generic (x86_64)
Consult /var/lib/dkms/nvidia/418.56/build/make.log for more information.
   ...done.
Setting up linux-modules-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
Setting up linux-image-unsigned-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
I: /vmlinuz.old is now a symlink to boot/vmlinuz-5.0.0-27-generic
I: /initrd.img.old is now a symlink to boot/initrd.img-5.0.0-27-generic
I: /vmlinuz is now a symlink to boot/vmlinuz-5.3.0-050300-generic
I: /initrd.img is now a symlink to boot/initrd.img-5.3.0-050300-generic
Processing triggers for linux-image-unsigned-5.3.0-050300-generic (5.3.0-050300.201909152230) ...
/etc/kernel/postinst.d/dkms:
 * dkms: running auto installation service for kernel 5.3.0-050300-generic

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
unset ARCH; env NV_VERBOSE=1 'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.3.0-050300-generic IGNORE_XEN_PRESEN
CE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.3.0-050300-generic/build LD=/usr/bin/ld.bfd modules......(bad exit statu
s: 2)
ERROR (dkms apport): kernel package linux-headers-5.3.0-050300-generic is not supported
Error! Bad return status for module build on kernel: 5.3.0-050300-generic (x86_64)
Consult /var/lib/dkms/nvidia/418.56/build/make.log for more information.
   ...done.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.3.0-050300-generic
cryptsetup: WARNING: The initramfs image may not contain cryptsetup binaries 
    nor crypto modules. If that's on purpose, you may want to uninstall the 
    'cryptsetup-initramfs' package in order to disable the cryptsetup initramfs 
    integration and avoid this warning.
W: Possible missing firmware /lib/firmware/i915/icl_guc_32.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_guc_32.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_32.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_32.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_32.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_huc_ver8_4_3238.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_huc_ver03_01_2893.bin for module i915
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.3.0-050300-generic
Found initrd image: /boot/initrd.img-5.3.0-050300-generic
Found linux image: /boot/vmlinuz-5.0.0-27-generic
Found initrd image: /boot/initrd.img-5.0.0-27-generic
Found linux image: /boot/vmlinuz-5.0.0-25-generic
Found initrd image: /boot/initrd.img-5.0.0-25-generic
Adding boot menu entry for EFI firmware configuration
done
---------------------
Comment 25 AO 2019-09-17 16:21:35 UTC
> After you rmmod i2c_hid, do you see the frequencies drop also? I don't see
> turbostate output after that.

Yes - frequencies drop to normal. Do you want me to re-test the 'rmmod i2c_hid' command with a turbostate report?
Comment 26 Srinivas Pandruvada 2019-09-17 17:02:15 UTC
No, I just want to make sure that this is the issue.
Comment 27 AO 2019-09-18 03:45:27 UTC
Created attachment 285047 [details]
Frequencies over time with modprobe [-r] i2c_hid

I have a workaround: 

On reading the docs it appears modprobe is smarter than insmod and can handle more dependencies. 

So on resume I ran 

$ sudo modprobe -r  i2c_hid
 and then
$ sudo modprobe  i2c_hid

Immediately after "modprobe -r i2c_hid" the frequencies return to normal and IRQ interrupts stop increasing. On running "modprobe i2c_hid" the mouse and touchscreen start working again.
Comment 28 AO 2019-09-18 06:25:08 UTC
Created attachment 285049 [details]
Interrupts Graph before suspend, during, and after resume with systemd

To see if I could automate the modprobe commands on sleep and resume, I created 2 systemd unit files and 2 executables 
which disable i2c_hid on sleep and enable it on resume. 

The results were as follows
1) Resume is now almost instantaneous. Before it took 2-5 seconds and I'd have to type furiously to get the keyboard to be recognized for login. 
2) interrupts do not climb as quickly or as much
3) interrupts do not climb without stopping.  (see attached image)

The 4 files to disabling/enabling i2c_hid are as follows. 

::::FILE 1::::::::::
/etc/systemd/system/aaa.removei2c.onsuspend.service
::::::::::::::
[Unit]
Description=Disables i2c_ihd at sleep to fix module issue

[Service]
Type=oneshot
ExecStart=/usr/local/bin/disable_touchpad
TimeoutSec=0
StandardOutput=syslog

[Install]
WantedBy=sleep.target




::::::FILE 2::::::::
/etc/systemd/system/aaa.returni2c.onresume.service
::::::::::::::
[Unit]
Description=Re-Enables intel i2c mouse driver at resume from sleep
After=suspend.target

[Service]
User=root
Type=oneshot
ExecStart=/usr/local/bin/enable_touchpad
TimeoutSec=0
StandardOutput=syslog

[Install]
WantedBy=suspend.target


::::FILE 3::::::::::
/usr/local/bin/disable_touchpad
::::::::::::::
#!/bin/bash
#
logger "Disabling i2c_hid"
modprobe -r i2c_hid


::::FILE 4::::::::::
/usr/local/bin/enable_touchpad
::::::::::::::
#!/bin/bash
logger "Enabling i2c_hid"
modprobe i2c_hid
Comment 29 Jarkko Nikula 2019-09-18 08:13:04 UTC
Cc'ed Andy if he knows from pinctrl point of view what might explain that those touchpad and touchscreen GPIO interrupts starts producing spurious interrupts after resume.

Can this be the same issue than in linux-next commit 6cb0880f0822 ("pinctrl: intel: remap the pin number to gpio offset for irq enabled pin")?
Comment 30 Andy Shevchenko 2019-09-18 10:55:42 UTC
(In reply to Jarkko Nikula from comment #29)
> Can this be the same issue than in linux-next commit 6cb0880f0822 ("pinctrl:
> intel: remap the pin number to gpio offset for irq enabled pin")?

Yes, it's being discussed in bug #203995.
Comment 31 AO 2019-09-19 00:01:26 UTC
I'd like to test on my machine when that patch (or a similar one) makes its way to a Ubuntu kernel repository, but I'm not sure if I have found the right spot to track this fix through github or git.kernel.org 

So, apologies if this is an ignorant question and for not being able to figure this out, but it looks like a possible patch for this issue was submitted here: https://github.com/endlessm/linux/pull/494 ( remap the pin number to gpio offset for irq enabled pin )? 

but it appears that code didn't make it into a kernel merge? 

The closest thing I found was:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a0a5f7661e1c49de0c1f93ca435ba2880b88fc5c


Is that the code I should make sure is in a kernel when I next test to see if this solves that problem with my HP Zbook?
Comment 32 Chen Yu 2019-09-30 01:44:59 UTC
According to 203995 , the patch is queued for 5.4.

*** This bug has been marked as a duplicate of bug 203995 ***
Comment 33 AO 2019-10-05 20:39:17 UTC
I saw that the latest Ubuntu distro kernel update had a bunch of i2c fixes and HID fixes. 

https://launchpad.net/ubuntu/+source/linux/5.0.0-31.33

So I installed that kernel and tested while disabling the workaround (see above) that removed/added the i2c_hid module on sleep/resume.

The system on resuming from sleep now: 
 * No longer has an interrupt storm 
 * CPU frequencies behave normally
 * keyboard/mouse is accessible after resume without a long delay. 

It appears this kernel update has resolved the issue for this brand of HP Zbook. 

Thanks!

Note You need to log in before you can comment on or make changes to this bug.