Bug 101861 - kworker/0:2 uses 75 % of CPU core after USB Speakers are attached and USB3 is enabled
Summary: kworker/0:2 uses 75 % of CPU core after USB Speakers are attached and USB3 is...
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-23 12:20 UTC by Michal Nowak
Modified: 2017-04-12 06:41 UTC (History)
7 users (show)

See Also:
Kernel Version: kernel-4.1.3-200.fc22.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
ACPI interrupts counters (3.91 KB, text/plain)
2015-07-23 12:21 UTC, Michal Nowak
Details
dmesg (126.73 KB, text/plain)
2015-07-23 12:21 UTC, Michal Nowak
Details
DSDT dump (31.85 KB, application/octet-stream)
2015-07-23 12:22 UTC, Michal Nowak
Details
journalctl log (4.09 KB, text/x-log)
2015-07-23 12:27 UTC, Michal Nowak
Details
lspci (1.63 KB, text/plain)
2015-07-23 12:28 UTC, Michal Nowak
Details
lspci -vv (27.61 KB, text/plain)
2015-07-23 12:28 UTC, Michal Nowak
Details
lsusb (53.62 KB, text/plain)
2015-07-23 12:28 UTC, Michal Nowak
Details
lsusb logitech (5.79 KB, text/plain)
2015-07-23 12:29 UTC, Michal Nowak
Details
perf top (4.90 KB, text/plain)
2015-07-23 12:29 UTC, Michal Nowak
Details
ACPI dump on 4.3.3 kernel with USB 2 on (299.47 KB, text/plain)
2016-01-09 14:49 UTC, Michal Nowak
Details
ACPI dump on 4.3.3 kernel with USB 3 on (299.47 KB, text/plain)
2016-01-09 14:49 UTC, Michal Nowak
Details
Verbose log from usb-core debugging (6.02 KB, text/plain)
2016-01-12 20:30 UTC, Michal Nowak
Details
Stacks from kworker (88.27 KB, application/x-xz)
2016-01-15 18:57 UTC, Michal Nowak
Details
/sys/kernel/debug/tracing/set_event output (18.00 KB, application/x-xz)
2016-02-04 21:31 UTC, Michal Nowak
Details
Interrupts before suspend (3.91 KB, text/plain)
2016-05-24 19:03 UTC, Michal Nowak
Details
Interrupts after suspend (3.91 KB, text/plain)
2016-05-24 19:03 UTC, Michal Nowak
Details
Interrupts after keyboard was plugged in (after suspend->resume) (3.91 KB, text/plain)
2016-05-24 19:13 UTC, Michal Nowak
Details
trace_pipe (148.33 KB, text/plain)
2016-05-24 19:34 UTC, Michal Nowak
Details

Description Michal Nowak 2015-07-23 12:20:46 UTC
I run kernel-4.1.3-200.fc22.x86_64 on Toshiba Satellite L855 notebook with latest BIOS. When USB Speakers are attached kworker pops up and indefinitely uses one CPU on 75 %, indefinitely. It happens only when USB3 is enabled in BIOS, USB2 is OK.

USB Speaker: Logitech S-150 (lsusb claims it to be: Tenx Technology, Inc.).

If /sys/firmware/acpi/interrupts/gpe0D is disabled by

  echo disable > /sys/firmware/acpi/interrupts/gpe0D 

kworker disappears, if enabled pops up again.

Wrt, the logs: Jul 23 13:50:15 is the time I plugged the Speakers.
Comment 1 Michal Nowak 2015-07-23 12:21:26 UTC
Created attachment 183371 [details]
ACPI interrupts counters
Comment 2 Michal Nowak 2015-07-23 12:21:44 UTC
Created attachment 183381 [details]
dmesg
Comment 3 Michal Nowak 2015-07-23 12:22:03 UTC
Created attachment 183391 [details]
DSDT dump
Comment 4 Michal Nowak 2015-07-23 12:27:02 UTC
Created attachment 183401 [details]
journalctl log

Just the relevant parts, the original one contained to much personal stuff. Let me know if you need more.
Comment 5 Michal Nowak 2015-07-23 12:28:01 UTC
Created attachment 183411 [details]
lspci
Comment 6 Michal Nowak 2015-07-23 12:28:20 UTC
Created attachment 183421 [details]
lspci -vv
Comment 7 Michal Nowak 2015-07-23 12:28:37 UTC
Created attachment 183431 [details]
lsusb
Comment 8 Michal Nowak 2015-07-23 12:29:06 UTC
Created attachment 183441 [details]
lsusb logitech
Comment 9 Michal Nowak 2015-07-23 12:29:35 UTC
Created attachment 183451 [details]
perf top

Head of `perf top`.
Comment 10 Aaron Lu 2015-08-19 09:17:39 UTC
After boot, does gpe0D gets enabled? Check its status by:
$ cat /sys/firmware/acpi/interrupts/gpe0D

There are multiple devices make use of GPE 0D to wake system up:
EHC1, EHC2, XHC and HDEF. They have _PRW defined that uses GPE 0D, but that GPE should only be enabled when we are going to enter system sleep state I think.
Comment 11 Michal Nowak 2015-08-20 08:59:30 UTC
Yes, if gpe0D is disabled, it is enabled after a reboot. Let me know if there's any other info you need.
Comment 12 Zhang Rui 2015-12-28 07:33:28 UTC
please attach the full acpidump output by running "acpidump > acpidump.out".
Comment 13 Michal Nowak 2016-01-09 14:49:12 UTC
Created attachment 199091 [details]
ACPI dump on 4.3.3 kernel with USB 2 on
Comment 14 Michal Nowak 2016-01-09 14:49:40 UTC
Created attachment 199101 [details]
ACPI dump on 4.3.3 kernel with USB 3 on
Comment 15 Michal Nowak 2016-01-09 14:51:32 UTC
--- acpidump.4.3.3-300.fc23.x86_64.USB2.out	2016-01-09 15:21:51.144357799 +0100
+++ acpidump.4.3.3-300.fc23.x86_64.USB3.out	2016-01-09 15:26:43.471728604 +0100
@@ -3804,10 +3804,10 @@ SSDT @ 0x0000000000000000
   08D0: 00 00 0A FC 0A 80                                ......
 
 SSDT @ 0x0000000000000000
-  0000: 53 53 44 54 92 0A 00 00 01 C7 49 4E 53 59 44 45  SSDT......INSYDE
+  0000: 53 53 44 54 92 0A 00 00 01 D7 49 4E 53 59 44 45  SSDT......INSYDE
   0010: 43 52 20 43 52 42 20 20 00 30 00 00 41 43 50 49  CR CRB  .0..ACPI
   0020: 00 00 04 00 10 46 0C 5C 00 08 53 53 44 54 12 43  .....F.\..SSDT.C
-  0030: 05 0C 0D 43 50 55 30 49 53 54 20 00 0C 18 C0 43  ...CPU0IST ....C
+  0030: 05 0C 0D 43 50 55 30 49 53 54 20 00 0C 18 B0 43  ...CPU0IST ....C
   0040: AD 0C D6 08 00 00 0D 41 50 49 53 54 20 20 20 00  .......APIST   .
   0050: 0C 98 AA 5F AF 0C 03 03 00 00 0D 43 50 55 30 43  ..._.......CPU0C
   0060: 53 54 20 00 0C 18 90 5F AF 0C 0E 08 00 00 0D 41  ST ...._.......A
Comment 16 Aaron Lu 2016-01-11 01:58:16 UTC
I'm adding Alan and Mathias in case they have an idea what could have happened here.
Comment 17 Alan Stern 2016-01-11 16:08:25 UTC
Michal, please try enabling debugging for the usbcore module (echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control) and see what dmesg says.
Comment 18 Michal Nowak 2016-01-12 20:30:15 UTC
Created attachment 199461 [details]
Verbose log from usb-core debugging

The log starts when I plugged the USB speakers in and the kworker process started using CPU to it's usual 75 %.
Comment 19 Alan Stern 2016-01-12 21:42:36 UTC
I thought the activity might be caused by lots of suspends and resumes.  But all the suspends and resumes in the log are for some other device, so that's not the reason.

Can you find out what that kworker thread is running (for example by Alt-SysRq-W)?
Comment 20 Michal Nowak 2016-01-14 19:43:42 UTC
I see a lot of lines like:

cfs_rq[5]:/autogroup-81
  .exec_clock                    : 601.771342
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 587.650039
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -6993524.765452
  .nr_spread_over                : 0
  .nr_running                    : 0
  ...

But the respective kworker process (kworker/0:2 or kworker/0:0) is not there (but many other kworker processes are). Is there a more direct way how to "dump" the process e.g. by using it's PID?
Comment 21 Alan Stern 2016-01-14 22:12:24 UTC
Yeah, that's not very helpful, is it?  "cat /proc/PID/stack" (as root) may work out better.
Comment 22 Michal Nowak 2016-01-15 17:13:04 UTC
There's just:

[<ffffffffffffffff>] 0xffffffffffffffff

Sorry... The rest is out of curiosity:

6398 # cat sched
kworker/0:1 (6398, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :      43981907.237377
se.vruntime                                  :       8550696.763760
se.sum_exec_runtime                          :         75749.342807
se.statistics.sum_sleep_runtime              :       2605178.315222
se.statistics.wait_start                     :             0.000000
se.statistics.sleep_start                    :      43981907.256566
se.statistics.block_start                    :             0.000000
se.statistics.sleep_max                      :        105403.459829
se.statistics.block_max                      :            50.956173
se.statistics.exec_max                       :             0.721679
se.statistics.slice_max                      :             0.179643
se.statistics.wait_max                       :             6.901659
se.statistics.wait_sum                       :           136.706988
se.statistics.wait_count                     :              3661535
se.statistics.iowait_sum                     :             0.000000
se.statistics.iowait_count                   :                    0
se.nr_migrations                             :                    2
se.statistics.nr_migrations_cold             :                    0
se.statistics.nr_failed_migrations_affine    :                 1247
se.statistics.nr_failed_migrations_running   :                    0
se.statistics.nr_failed_migrations_hot       :                    0
se.statistics.nr_forced_migrations           :                    0
se.statistics.nr_wakeups                     :              3661618
se.statistics.nr_wakeups_sync                :                    0
se.statistics.nr_wakeups_migrate             :                    1
se.statistics.nr_wakeups_local               :              3661575
se.statistics.nr_wakeups_remote              :                   43
se.statistics.nr_wakeups_affine              :                    0
se.statistics.nr_wakeups_affine_attempts     :                    0
se.statistics.nr_wakeups_passive             :                    0
se.statistics.nr_wakeups_idle                :                    0
avg_atom                                     :             0.020687
avg_per_cpu                                  :         37874.671403
nr_switches                                  :              3661535
nr_voluntary_switches                        :              3661173
nr_involuntary_switches                      :                  362
se.load.weight                               :                 1024
se.avg.load_sum                              :             35158651
se.avg.util_sum                              :                34184
se.avg.load_avg                              :                  730
se.avg.util_avg                              :                  726
se.avg.last_update_time                      :       43981907263735
policy                                       :                    0
prio                                         :                  120
clock-delta                                  :                   29
numa_pages_migrated                          :                    0
numa_preferred_nid                           :                   -1
total_numa_faults                            :                    0
current_node=0, numa_group_id=0
numa_faults node=0 task_private=0 task_shared=0 group_private=0 group_shared=0


6398 # cat status
Name:	kworker/0:1
State:	S (sleeping)
Tgid:	6398
Ngid:	0
Pid:	6398
PPid:	2
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	64
Groups:	
NStgid:	6398
NSpid:	6398
NSpgid:	0
NSsid:	0
Threads:	1
SigQ:	0/23631
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	ffffffffffffffff
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	0000003fffffffff
CapEff:	0000003fffffffff
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000
Seccomp:	0
Cpus_allowed:	01
Cpus_allowed_list:	0
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	3660401
nonvoluntary_ctxt_switches:	362


6398 # cat task/6398/stack 
[<ffffffff810b8ff6>] worker_thread+0xc6/0x450
[<ffffffff810bede8>] kthread+0xd8/0xf0
[<ffffffff8178199f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff


6398 # cat task/6398/sched
kworker/0:1 (6398, #threads: 1)
-------------------------------------------------------------------
se.exec_start                                :      43981881.396282
se.vruntime                                  :       8550678.325554
se.sum_exec_runtime                          :         75730.904601
se.statistics.sum_sleep_runtime              :       2605171.057848
se.statistics.wait_start                     :             0.000000
se.statistics.sleep_start                    :      43981881.396282
se.statistics.block_start                    :             0.000000
se.statistics.sleep_max                      :        105403.459829
se.statistics.block_max                      :            50.956173
se.statistics.exec_max                       :             0.721679
se.statistics.slice_max                      :             0.179643
se.statistics.wait_max                       :             6.901659
se.statistics.wait_sum                       :           136.542284
se.statistics.wait_count                     :              3660706
se.statistics.iowait_sum                     :             0.000000
se.statistics.iowait_count                   :                    0
se.nr_migrations                             :                    2
se.statistics.nr_migrations_cold             :                    0
se.statistics.nr_failed_migrations_affine    :                 1246
se.statistics.nr_failed_migrations_running   :                    0
se.statistics.nr_failed_migrations_hot       :                    0
se.statistics.nr_forced_migrations           :                    0
se.statistics.nr_wakeups                     :              3660790
se.statistics.nr_wakeups_sync                :                    0
se.statistics.nr_wakeups_migrate             :                    1
se.statistics.nr_wakeups_local               :              3660747
se.statistics.nr_wakeups_remote              :                   43
se.statistics.nr_wakeups_affine              :                    0
se.statistics.nr_wakeups_affine_attempts     :                    0
se.statistics.nr_wakeups_passive             :                    0
se.statistics.nr_wakeups_idle                :                    0
avg_atom                                     :             0.020687
avg_per_cpu                                  :         37865.452300
nr_switches                                  :              3660706
nr_voluntary_switches                        :              3660344
nr_involuntary_switches                      :                  362
se.load.weight                               :                 1024
se.avg.load_sum                              :             35419884
se.avg.util_sum                              :                34529
se.avg.load_avg                              :                  734
se.avg.util_avg                              :                  732
se.avg.last_update_time                      :       43981881404069
policy                                       :                    0
prio                                         :                  120
clock-delta                                  :                   31
numa_pages_migrated                          :                    0
numa_preferred_nid                           :                   -1
total_numa_faults                            :                    0
current_node=0, numa_group_id=0
numa_faults node=0 task_private=0 task_shared=0 group_private=0 group_shared=0


6398 # cat task/6398/status 
Name:	kworker/0:1
State:	R (running)
Tgid:	6398
Ngid:	0
Pid:	6398
PPid:	2
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
FDSize:	64
Groups:	
NStgid:	6398
NSpid:	6398
NSpgid:	0
NSsid:	0
Threads:	1
SigQ:	0/23631
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	ffffffffffffffff
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	0000003fffffffff
CapEff:	0000003fffffffff
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000
Seccomp:	0
Cpus_allowed:	01
Cpus_allowed_list:	0
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	3660332
nonvoluntary_ctxt_switches:	362
Comment 23 Alan Stern 2016-01-15 18:24:39 UTC
The "[<ffffffffffffffff>] 0xffffffffffffffff" stuff is just the meaningless last line of the stack trace.  The earlier parts are more meaningful (although not much more in this case).

You may have to try this multiple times to get something useful -- the contents of /proc/PID/stack will vary as the process runs.  Since that thread is consuming 75% of the CPU time, it should be running a large fraction of the time.  See if you can catch it while it's doing something nontrivial.
Comment 24 Michal Nowak 2016-01-15 18:57:33 UTC
Created attachment 199741 [details]
Stacks from kworker

Ran

  while true; do cat /proc/28817/stack ; echo; done > out

for a minute or so.
Comment 25 Alan Stern 2016-01-15 22:08:52 UTC
The stack traces show lots of runtime resumes.  You can figure out which device is responsible by disabling runtime PM for each of the devices associated with gpe0D in turn.  You do that by writing "on" to the devices power/control file in /sys/.
Comment 26 Michal Nowak 2016-01-17 08:41:00 UTC
I ran:

  for i in `find /sys/ -name control | grep 'power/control'`; do echo on > $i; done

and it did nothing to the kworker process..
Comment 27 Alan Stern 2016-01-17 16:06:02 UTC
Did the kworker's stack trace still contain references to runtime_suspend or runtime_resume after you did that?

You can try tracing runtime PM events directly:

  echo 'rpm:*' >>/sys/kernel/debug/tracing/set_event

This will probably generate a lot of tracing output.  To turn it off:

  echo '!rpm:*' >>/sys/kernel/debug/tracing/set_event

While the tracing is on, you can read its output from

  /sys/kernel/debug/tracing/trace_pipe.
Comment 28 Michal Nowak 2016-01-31 19:04:18 UTC
I am unable to reproduce it for two weeks.. Will get to this BZ once the issue is back.
Comment 29 Aaron Lu 2016-02-01 02:38:25 UTC
May I know what happened that make the issue go away?
Comment 30 Michal Nowak 2016-02-01 05:42:56 UTC
I don't know of anything. I doubt it's fixed. I guess I am just unable to trigger it.
Comment 31 Michal Nowak 2016-02-04 21:31:34 UTC
Created attachment 202941 [details]
/sys/kernel/debug/tracing/set_event output

It's back. Restart -> suspend -> resume -> attached the speakers -> the bug.

The only strange thing about the setup where I was unable to trigger the bug was that it had uptime for 18 days. Log attached. Let me know if there's more I can do.
Comment 32 Aaron Lu 2016-05-13 03:02:48 UTC
Since the problem is back, can you answer Alan's question in comment #27?

Another thing to try is to disable the USB3 controller's wake GPE:
# cat /proc/acpi/wakeup
XHC? SX ...
# echo XHC? > /proc/acpi/wakeup
Comment 33 Alan Stern 2016-05-13 15:04:56 UTC
Aaron may be on the right track.  It's particularly noticeable in the tracing output that each wakeup event causes runtime resume requests for three PCI devices: 0000:00:1a.0, 0000:00:1b.0, and 0000:00:1d.0.  None of these is the xHCI controller!  They are all other devices that must share the same PME# wakeup signal.

It's also notable that the lspci -vv output shows this for the xHCI controller (0000:00:14.0):
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME+

The PME+ at the end of the Status: line means that the controller has activated its PME wakeup signal.  This is in spite of the fact that the controller's power state is D0 and the Flags: line says that PME can be turned on only in D3hot or D3cold.

It seems that the xHCI controller is sending a wakeup signal at times when it shouldn't.  This looks like a hardware or firmware bug.
Comment 34 Zhang Rui 2016-05-16 06:32:51 UTC
please attach the output of "grep . /sys/firmware/acpi/interrupts/*" both before and after suspend.
Comment 35 Aaron Lu 2016-05-16 09:11:50 UTC
If you attach another USB device, say USB keyboard to the port, will there be GPE storm?
Comment 36 Michal Nowak 2016-05-24 19:03:35 UTC
Created attachment 217251 [details]
Interrupts before suspend
Comment 37 Michal Nowak 2016-05-24 19:03:58 UTC
Created attachment 217261 [details]
Interrupts after suspend
Comment 38 Michal Nowak 2016-05-24 19:05:16 UTC
(In reply to Zhang Rui from comment #34)
> please attach the output of "grep . /sys/firmware/acpi/interrupts/*" both
> before and after suspend.

See above, the diff being:

~ # diff *_suspend
2c2
< /sys/firmware/acpi/interrupts/ff_gbl_lock:       0   disabled
---
> /sys/firmware/acpi/interrupts/ff_gbl_lock:       0   enabled
Comment 39 Michal Nowak 2016-05-24 19:13:03 UTC
Created attachment 217271 [details]
Interrupts after keyboard was plugged in (after suspend->resume)

(In reply to Aaron Lu from comment #35)
> If you attach another USB device, say USB keyboard to the port, will there
> be GPE storm?

Yes! It's the same as with those speakers, kworker/0:2 is up'n'running.
Comment 40 Michal Nowak 2016-05-24 19:16:40 UTC
(In reply to Michal Nowak from comment #39)
> Yes! It's the same as with those speakers, kworker/0:2 is up'n'running.

It's strange to me how's that I never noticed it with the keyboard before, only with those speakers. Maybe because I always connect the keyboard via an external USB2 HUB to the only USB port which does not have the port in blue color?
Comment 41 Michal Nowak 2016-05-24 19:34:31 UTC
Created attachment 217281 [details]
trace_pipe

(In reply to Alan Stern from comment #27)
> Did the kworker's stack trace still contain references to runtime_suspend or
> runtime_resume after you did that?

After I ran:

for i in `find /sys/ -name control | grep 'power/control'`; do echo on > $i; done

they do not contain runtime_{suspend,resume}.

If I run:

echo 'rpm:*' >>/sys/kernel/debug/tracing/set_event

And look at /sys/kernel/debug/tracing/trace_pipe I see a lot of lines like:

rpm_resume: 0000:00:1a.0 flags-0 cnt-2  dep-0  auto-0 p-0 irq-0 child-1
rpm_return_int: rpm_resume+0xbc/0x680:0000:00:1a.0 ret=1

See attached snip from trace_pipe.
Comment 42 Michal Nowak 2016-05-24 19:40:55 UTC
(In reply to Aaron Lu from comment #32)
> Since the problem is back, can you answer Alan's question in comment #27?
> 
> Another thing to try is to disable the USB3 controller's wake GPE:
> # cat /proc/acpi/wakeup
> XHC? SX ...
> # echo XHC? > /proc/acpi/wakeup

This changes XHC from "enabled" to "disabled":

  XHC       S4    *disabled  pci:0000:00:14.0

The interrupts in /sys/firmware/acpi/interrupts/* still grow:

/sys/firmware/acpi/interrupts/gpe0D:58970552   enabled
/sys/firmware/acpi/interrupts/gpe_all:58970567
/sys/firmware/acpi/interrupts/sci:58970567
Comment 43 Michal Nowak 2016-05-24 19:42:32 UTC
Hopefully I answered all your questions, let me know otherwise. Thanks for looking into this.
Comment 44 Michal Nowak 2016-05-24 20:20:19 UTC
(In reply to Michal Nowak from comment #40)
> Maybe because I always connect the keyboard via an
> external USB2 HUB to the only USB port which does not have the port in blue
> color?

Disregard this, please. Happens to me with every port with every USB device I have.
Comment 45 Alan Stern 2016-05-24 20:28:27 UTC
In response to comment #42: After disabling wakeup for xHCI, what does "lspci -vv -s 00:14.0" show?

Also, can you use your system without the xHCI controller?  What happens if you rmmod xhci-hcd or unbind it from the 0000:00:14.0 device?
Comment 46 Michal Nowak 2016-05-25 05:48:45 UTC
(In reply to Alan Stern from comment #45)
> In response to comment #42: After disabling wakeup for xHCI, what does
> "lspci -vv -s 00:14.0" show?

This is after fresh start, no GPE storm at that time triggered:

00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04) (prog-if 30 [XHCI])
	Subsystem: Toshiba America Info Systems Device ff1e
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 26
	Region 0: Memory at c1200000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
		Address: 00000000fee0c00c  Data: 41e1
	Kernel driver in use: xhci_hcd

The lspci output did not change after disabling XHCI.

> 
> Also, can you use your system without the xHCI controller?

On a day-to-day basis I use the system with USB3 disabled in BIOS (i.e. I run USB2 with ehci driver).

> What happens if you rmmod xhci-hcd or unbind it from the 0000:00:14.0 device?

I can't find xhci-hcd kernel module here on Fedora 23, perhaps it's build in? How do I unbind the driver from the 0000:00:14.0 device?
Comment 47 Alan Stern 2016-05-25 14:24:16 UTC
This line:

    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

indicates that PME is disabled and turned off.  Therefore there's no GPE storm.

So the real question now is, what do you get if you disable xHCI wakeup after one of those storms occurs?

You can unbind xhci-hcd by doing:

    echo 0000:00:14.0 >/sys/bus/pci/drivers/xhci_hcd/unbind
Comment 48 Michal Nowak 2016-05-25 16:36:31 UTC
~ # cat /proc/acpi/wakeup | grep XHC
XHC	  S4	*enabled   pci:0000:00:14.0
~ # echo XHC > /proc/acpi/wakeup
~ # cat /proc/acpi/wakeup | grep XHC
XHC	  S4	*disabled  pci:0000:00:14.0
~ # lspci -vv -s 00:14.0 
...
		Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME+
...

After

    echo 0000:00:14.0 >/sys/bus/pci/drivers/xhci_hcd/unbind

I can't use any USB device, but the storm is present.
Comment 49 Alan Stern 2016-05-25 18:55:30 UTC
I probably should have expected that.  The settings in /proc/acpi/wakeup take effect only when the system goes into suspend, not while it is running normally.

Just out of curiosity, if you boot with USB-3 enabled in the BIOS and the speakers already plugged in, does the GPE storm still occur?
Comment 50 Michal Nowak 2016-05-25 20:31:06 UTC
(In reply to Alan Stern from comment #49)
> Just out of curiosity, if you boot with USB-3 enabled in the BIOS and the
> speakers already plugged in, does the GPE storm still occur?

Nope. I have to suspend -> resume -> unplug device -> plug the device to trigger the storm.

Knowing all this, is there a possibility to fix the issue in HW/BIOS by a workaround in kernel?

E.g.

  echo disable > /sys/firmware/acpi/interrupts/gpe0D

always works but I dunno all the consequences.
Comment 51 Aaron Lu 2016-05-26 02:18:50 UTC
(In reply to Michal Nowak from comment #50)
> (In reply to Alan Stern from comment #49)
> > Just out of curiosity, if you boot with USB-3 enabled in the BIOS and the
> > speakers already plugged in, does the GPE storm still occur?
> 
> Nope. I have to suspend -> resume -> unplug device -> plug the device to
> trigger the storm.

This seems to indicate the GPE enable has something to do with the system resume, and the replug of the device triggered that GPE to occur.
I'm also adding Rafael to take a look at this.
Comment 52 Alan Stern 2016-07-07 17:23:20 UTC
Disabling gpe0D will prevent some wakeup methods from working.  It may also prevent some devices from automatically going into or coming out of runtime suspend.

If you don't care about that (say, if you always wake up your computer by pressing the power button and everything else seems to work okay), then go ahead and do it.

I don't know of any workaround to fix the problem, but maybe somebody else can find one.
Comment 53 Michal Nowak 2016-07-13 20:14:45 UTC
(In reply to Alan Stern from comment #52)
> If you don't care about that (say, if you always wake up your computer by
> pressing the power button and everything else seems to work okay), then go
> ahead and do it.

This is my case indeed.

> I don't know of any workaround to fix the problem, but maybe somebody else
> can find one.

Thank you. I am fine with disabling gpe0D, otherwise I can revert to USB 2.0. As it seems there's nothing to fix in kernel, I wouldn't object if this bug is closed - not sure what's the right approach.
Comment 54 Lv Zheng 2016-09-05 01:47:26 UTC
Need to investigate if GPE code has order issue or need to be re-designed for this issue.
Comment 56 Lv Zheng 2017-03-27 08:47:52 UTC
Let me say something about this issue:

The philosophy is:
In suspend/resume process, special GPE APIs are invoked:
Between acpi_hw_enable_all_wakeup_gpes() and acpi_hw_enable_all_runtime_gpes(), no acpi_disable_gpe()/acpi_enable_gpe() should be invoked.
Otherwise the GPE enable status can be messed up, causing post-resume GPE storms.
However I don't know if there is still such kind of issues in Linux.

After checking all files, there are these invocations:

1. OLPC: not related.
rch/x86/platform/olpc/olpc-xo15-sci.c:	acpi_disable_gpe(NULL, xo15_sci_gpe);
arch/x86/platform/olpc/olpc-xo15-sci.c:	acpi_enable_gpe(NULL, xo15_sci_gpe);
drivers/platform/x86/xo15-ebook.c:		acpi_enable_gpe(device->wakeup.gpe_device,

2. Apple: not related.
drivers/platform/x86/apple-gmux.c:		acpi_disable_gpe(NULL, gmux_data->gpe);
drivers/platform/x86/apple-gmux.c:		acpi_disable_gpe(NULL, gmux_data->gpe);
drivers/platform/x86/apple-gmux.c:		status = acpi_enable_gpe(NULL, gmux_data->gpe);

3. EC: boot only, not related.
drivers/acpi/ec.c:		acpi_disable_gpe(NULL, ec->gpe);
drivers/acpi/ec.c:		acpi_enable_gpe(NULL, ec->gpe);

4. sysfs: User driven, not related.
drivers/acpi/sysfs.c:			result = acpi_disable_gpe(handle, index);
drivers/acpi/sysfs.c:			result = acpi_enable_gpe(handle, index);

So the followings are the only possible problems:
drivers/acpi/device_pm.c:			acpi_disable_gpe(wakeup->gpe_device, wakeup->gpe_number);
drivers/acpi/device_pm.c:		res = acpi_enable_gpe(wakeup->gpe_device, wakeup->gpe_number);
drivers/acpi/wakeup.c:			acpi_enable_gpe(dev->wakeup.gpe_device,

I'll check them later.
Comment 57 Lv Zheng 2017-03-27 08:50:16 UTC
acpi_wakeup_device_init(): boot only, not related.
drivers/acpi/wakeup.c:			acpi_enable_gpe(dev->wakeup.gpe_device,
Comment 58 Lv Zheng 2017-03-27 09:03:27 UTC
drivers/acpi/device_pm.c:			acpi_disable_gpe(wakeup->gpe_device, wakeup->gpe_number);
drivers/acpi/device_pm.c:		res = acpi_enable_gpe(wakeup->gpe_device, wakeup->gpe_number);

They are invoked in acpi_device_wakeup().
Which is invoked by:
acpi_dev_suspend_late()/acpi_dev_resume_early(): .suspend_late/.resume_early callbacks.
acpi_dev_pm_attach()/acpi_dev_pm_detach(): driver probe, not related
acpi_pm_device_run_wake(): .run_wakeup, not related
acpi_pm_device_sleep_wake(): .sleep_wake, call by .suspend/.resume callbacks.
Comment 59 Lv Zheng 2017-03-27 09:10:33 UTC
Cannot see a possiblity that .suspend_late/.suspend/.resume/.resume_early is invoked during this period.
Possibly the code base has been changed and the bug has gone?
So let's close it and you can re-open it if it can still be reproduced.

Thanks
Lv
Comment 60 Michal Nowak 2017-03-27 16:37:10 UTC
Just tested with kernel 4.10.6 on openSUSE Leap 42.2 and it's still there.
Comment 61 Lv Zheng 2017-04-12 05:23:43 UTC
Hmm, so probably you still need to use "acpi_mask_gpe=0x0D" as a boot parameter for this machine.
I couldn't find any significant GPE issues after investigation.
Can we close it now?
Comment 62 Zhang Rui 2017-04-12 06:41:58 UTC
Bug closed as there is nothing we can do in kernel, plus there is a workaround available for this issue.
Please feel free to reopen it if you still have any questions.

Note You need to log in before you can comment on or make changes to this bug.