Bug 169241

Summary: AC to battery causes extreme load, many kworker threads in D state - 2008 Macbook
Product: ACPI Reporter: G Mo (spambox+kernelbug)
Component: Power-BatteryAssignee: Lv Zheng (lv.zheng)
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: lenb, rui.zhang, yu.c.chen
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.4.0, 4.6.3 Subsystem:
Regression: No Bisected commit-id:
Attachments: output of 'acpidump -o'
'cat /proc/kworker_pid/stack' of first kworker thread to appear after a plug event
'cat /proc/kworker_pid/stack' of an additional kworker thread to appear after a plug event

Description G Mo 2016-09-17 23:06:53 UTC
SYNOPSIS

Most of the time, disconnecting a 2008 Macbook from AC, or reconnecting it to AC, will cause hundreds of kworker threads to be spawned, driving up system load to levels where a GUI may become unusable. CPU and iowait is low. kworker threads are in D state. 'powertop' shows increased numbers of ACPI interrupts, and acpi_ec_event_processor and acpi_os_execute_deferred very high ms/s figures in the Usage column.

This is occurring on Lubuntu 16.04. The bug is present in the 4.4.0 distribution kernel and in kernel 4.6.3-yakkety from the Ubuntu kernel PPA. Booting with acpi=off renders critical portions of the system unusable.

See also: https://ubuntuforums.org/showthread.php?t=2335053&p=13544491


RELATED INFO

'top':
top - 21:29:22 up 1 day, 5:57, 2 users, load average: 223.67, 179.10, 166.01
Tasks: 448 total, 1 running, 447 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.3 us, 1.0 sy, 0.0 ni, 94.5 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4011084 total, 557060 free, 2009348 used, 1444676 buff/cache
KiB Swap: 2999804 total, 2607704 free, 392100 used. 1574160 avail Mem 


$ ps ax | grep kworker | wc -l
271


$ ps -e -o state,pid,cmd | grep ^D | grep kworker | wc -l
257


'powertop' before power source change:
Summary: 294.0 wakeups/second,  2.1 GPU ops/seconds, 0.0 VFS ops/sec and 6.7% CPU use
                Usage       Events/s    Category       Description
            100.0%                      Device         Audio codec hwC0D0: Realtek
              6.0 ms/s      54.1        Process        /usr/lib/chromium-browser/chromium-browser --enable-pinch
             38.0 ms/s      48.5        Process        /usr/lib/chromium-browser/chromium-browser --type=renderer --enable-pinch --enable-features=D
            362.1 µs/s      29.8        kWork          dbs_timer
              1.6 ms/s      27.3        Interrupt      [9] acpi


'powertop' two refreshes after power source change:

Summary: 300.8 wakeups/second,  3.8 GPU ops/seconds, 0.0 VFS ops/sec and 1084.8% CPU use
                Usage       Events/s    Category       Description
             9627 ms/s       9.9        kWork          acpi_ec_event_processor
             1138 ms/s      16.3        kWork          acpi_os_execute_deferred
             13.8 ms/s     236.5        Interrupt      [9] acpi
            100.0%                      Device         Audio codec hwC0D0: Realtek
Comment 1 Chen Yu 2016-09-18 03:42:20 UTC
Looks like the EC work item has been queued on ec_query_wq frequently, besides, the notifier has triggered acpi_os_execute_deferred rapidly, let me add @Lv in this discussion. Please attach your acpidump file.
Comment 2 Chen Yu 2016-09-18 03:45:46 UTC
And please find the pid of that kworker, and provide:
cat /proc/kworker_pid/stack
Comment 3 G Mo 2016-09-18 04:41:27 UTC
Created attachment 238531 [details]
output of 'acpidump -o'
Comment 4 G Mo 2016-09-18 04:42:18 UTC
Created attachment 238541 [details]
'cat /proc/kworker_pid/stack' of first kworker thread to appear after a plug event
Comment 5 G Mo 2016-09-18 04:42:50 UTC
Created attachment 238551 [details]
'cat /proc/kworker_pid/stack' of an additional kworker thread to appear after a plug event
Comment 6 G Mo 2016-09-18 04:44:34 UTC
I attached the output of 'acpidump -o' and a couple kworkers, the first one to appear after I triggered the bug, and one of the later ones to appear. I'm not sure if those were the right ones.

Interestingly, I found I cannot trigger the bug from the console or lightdm screen - the bug only happens when an LXDE session (this is Lubuntu) is running, or even more strangely, upon the start of an LXDE session if the power source has been toggled before that session is started.
Comment 7 Lv Zheng 2016-09-20 02:45:19 UTC
First we need to know which _Qxx has triggered this issue.
Could you enable CONFIG_ACPI_DEBUG (enable and re-build the kernel if it is not enabled currently) and boot the enabled kernel with the following boot parameter:
 dyndbg="file ec.c +p"

Thanks
Lv
Comment 8 Lv Zheng 2016-09-20 02:46:42 UTC
(In reply to Lv Zheng from comment #7)
> Could you enable CONFIG_ACPI_DEBUG (enable and re-build the kernel if it is
> not enabled currently) and boot the enabled kernel with the following boot
> parameter:
>  dyndbg="file ec.c +p"

Sorry for missing the next steps:
And please obtain the dmesg output after triggering the issue and upload the dmesg output here.
Hopefully we can see the query number in the log.
Comment 9 Lv Zheng 2016-10-25 07:55:21 UTC
Pinging...
Comment 10 G Mo 2016-10-25 21:40:23 UTC
Hi, sorry, I do intend to follow up on this. I'm having some trouble reproducing it and need to (re)build some kernels and test further on the Macbook. It's slow going.
Comment 11 Lv Zheng 2016-10-26 15:26:09 UTC
OK. Leaving this opened.
I'll ping you next month.
Hope it has already been fixed by the upstream kernel then. :)
Comment 12 Zhang Rui 2016-12-22 02:20:15 UTC
Hi, G Mo,
any updates?
Comment 13 Zhang Rui 2016-12-22 02:20:37 UTC
can you reproduce the problem in the latest upstream kernel?
Comment 14 Lv Zheng 2016-12-22 03:07:09 UTC
Closing due to no progress.
Feel free to re-open it when you can reproduce the issue.

Thanks and best regards
Lv