Bug 189291

Summary: Bisected: Low performance and lowest processor frequency for hp compaq 6715b - AMD Turion(tm) 64 X2
Product: Platform Specific/Hardware
Reporter: cthx (sntmail)
Component: x86-64
Assignee: Thomas Gleixner (tglx)
Status: NEEDINFO
Severity: normal
CC: alexey.kv, dnawrock+kernelorg, info, lenb, manuel, marcodv, rui.zhang, swelef
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.3 (?) - 4.8
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: cpuinfo
lspci
dmesg for 4.1.6 kernel
dmesg for 4.8.10 kernel
dmesg for 4.8.10 kernel with acpi=noirq
grep . /sys/devices/system/cpu/cpu0/cpufreq/* for 4.1.6 (ok)
grep . /sys/devices/system/cpu/cpu0/cpufreq/* for 4.8.12 (bug)
good and bad dmesg and /proc/interrupts
investigation.tar.bz2
investigation.tar.bz2 with the debug patch

Description cthx 2016-11-29 01:47:43 UTC
Created attachment 246061
cpuinfo

My laptop is very slow on current kernels. The CPU frequency is stuck at 800 MHz.
The latest kernel that works is 4.1.x (4.2 has a backlight problem).

Booting with acpi=noirq solves the problem but introduces other problems...
Comment 1 cthx 2016-11-29 01:48:12 UTC
Created attachment 246071
lspci
Comment 2 cthx 2016-11-29 01:48:52 UTC
Created attachment 246081
dmesg for 4.1.6 kernel
Comment 3 cthx 2016-11-29 01:49:46 UTC
Created attachment 246091
dmesg for 4.8.10 kernel
Comment 4 cthx 2016-11-29 01:50:37 UTC
Created attachment 246101
dmesg for 4.8.10 kernel with acpi=noirq
Comment 5 Len Brown 2016-12-05 23:30:12 UTC
acpi=noirq puts the system into PIC mode, while ACPI is able to put it in IOAPIC mode.  Unclear why this has any effect on the problem at hand.  Do you get the same result with simply "noapic"?

In both cases, powernow_k8 seems to load and print the same messages.

In the working and non-working cases, what do you see with

grep . /sys/devices/system/cpu/cpu0/cpufreq/*
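
For anyone scripting the comparison between a working and a non-working kernel, the same information can be collected programmatically. A minimal Python sketch, roughly equivalent to the grep command above; the function name read_cpufreq is made up for illustration, and attributes that cannot be read are simply skipped:

```python
from pathlib import Path


def read_cpufreq(cpufreq_dir="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Return {attribute_name: contents} for every readable file in cpufreq_dir."""
    values = {}
    for entry in sorted(Path(cpufreq_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some attributes are root-only or fail on read; skip them.
            continue
    return values
```

Calling read_cpufreq() under each kernel and diffing the two dictionaries should show which attributes (for example scaling_cur_freq or scaling_max_freq) differ.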

What is the latest working kernel, and what is the first failing kernel?  Can you git-bisect to what commit caused this to break?
Comment 6 cthx 2016-12-08 23:45:00 UTC
acpi=noapic doesn't help.

I think the regression was introduced in 4.2.x or 4.3.x (4.2.x kernels have a backlight issue on my laptop, so I use 4.1.x).
Comment 7 cthx 2016-12-08 23:47:36 UTC
Created attachment 247171
grep . /sys/devices/system/cpu/cpu0/cpufreq/* for 4.1.6 (ok)
Comment 8 cthx 2016-12-08 23:48:31 UTC
Created attachment 247181
grep . /sys/devices/system/cpu/cpu0/cpufreq/* for 4.8.12 (bug)
Comment 9 cthx 2016-12-09 00:13:08 UTC
By the way, the same bug seems to be described here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1598312
Comment 10 Zhang Rui 2017-01-09 07:45:06 UTC
(In reply to cthx from comment #6)
> acpi=noapic doesn't help this.
> 
> I think the regression was made in 4.2.x or 4.3.x (4.2.x kernels have a
> backlight issue on my laptop, so I use 4.1.x)

What do you mean by backlight issue?
If the machine is still available, please run git bisect to find out which commit introduced the problem.
Comment 11 Zhang Rui 2017-01-11 02:47:23 UTC
There is only one functional change in the powernow-k8 driver since 4.1:
commit 38c52e6343f0e28abc7daf15cbbcd7e450667202
Author: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Date:   Tue May 26 15:11:31 2015 +0200

    powernow-k8: Replace cpu_core_mask() with topology_core_cpumask()
    
    The former duplicates the functionality of the latter but is
    neither documented nor arch-independent.
    
    Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
    Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
    Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
    Cc: Benoit Cousson <bcousson@baylibre.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Guenter Roeck <linux@roeck-us.net>
    Cc: Jean Delvare <jdelvare@suse.de>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Oleg Drokin <oleg.drokin@intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Russell King <linux@arm.linux.org.uk>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/1432645896-12588-5-git-send-email-bgolaszewski@baylibre.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>


Can you please check whether "git revert 38c52e6343f0e28abc7daf15cbbcd7e450667202" fixes the problem?
Comment 12 Vladimir Marko 2017-01-15 00:52:56 UTC
If you followed the link in comment #9 (posted 2016-12-09), you would have found the results of the bisect for Ubuntu (posted 2016-11-30) and two possible patches (posted 2016-12-08).

The bisect for Ubuntu yields

3b806e2b94cad37a8809df7c86f7cfdcd3baa719 is the first bad commit
commit 3b806e2b94cad37a8809df7c86f7cfdcd3baa719
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Mon Sep 14 12:00:55 2015 +0200

    x86/ioapic: Force affinity setting in setup_ioapic_dest()

    BugLink: http://bugs.launchpad.net/bugs/1509886

    commit 4857c91f0d195f05908fff296ba1ec5fca87066c upstream.

    ...

The commit 4857c91f0d195f05908fff296ba1ec5fca87066c does not revert cleanly in the current kernel tree but it's rather trivial to do it manually. The two possible patches I posted on the Ubuntu bug were successfully tested to fix the issue.
Comment 13 Zhang Rui 2017-01-16 03:12:51 UTC
(In reply to Vladimir Marko from comment #12)

Cool. Thanks for your effort.
And we have two fixes that have been verified to fix the problem
https://launchpadlibrarian.net/297109286/revert.patch
and
https://launchpadlibrarian.net/297109378/alternative.patch

Thomas,
can you please take a look at this issue?
Comment 14 Thomas Gleixner 2017-01-19 16:40:17 UTC
On Mon, 16 Jan 2017, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=189291
> 
> Zhang Rui <rui.zhang@intel.com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>            Assignee|rui.zhang@intel.com         |tglx@linutronix.de
> 
> --- Comment #13 from Zhang Rui <rui.zhang@intel.com> ---
> Cool. Thanks for your effort.
> And we have two fixes that have been verified to fix the problem
> https://launchpadlibrarian.net/297109286/revert.patch
> and
> https://launchpadlibrarian.net/297109378/alternative.patch

Neither of those two is a fix. They paper over the problem.

What's worse, they reintroduce the regression which was fixed with this
commit.

The old code, i.e. the code before the conversion to the hierarchical irq
domain, wrote directly to the IO-APIC to set the destination, and that got
dropped during the conversion. The bisected commit restored the original
behaviour.

So now the question is why the revert or the other patch 'fixes' the problem.

Can I please get for both the unmodified kernel and the patched kernel the
following information:

  Boot with apic=verbose on the kernel command line

  Gather dmesg output (full boot log)

  Output from 'cat /proc/interrupts'

Thanks,

	tglx
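
To compare the two /proc/interrupts dumps requested above, something like the following could be used. It is only a sketch in Python: parse_interrupts and ioapic_lines are names invented here, and the parsing assumes the usual layout (a header row of CPU columns, then one "label: counts... description" row per interrupt).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts content into {label: (per_cpu_counts, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # Rows such as ERR: or MIS: carry fewer counters than there are CPUs.
        while tokens and len(counts) < ncpus and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (counts, " ".join(tokens))
    return table


def ioapic_lines(table):
    """Keep only the interrupts routed through the IO-APIC."""
    return {irq: row for irq, row in table.items() if "IO-APIC" in row[1]}
```

Feeding the good and the bad dump through parse_interrupts() and comparing the ioapic_lines() results would highlight pins whose counts or trigger type differ between the two kernels.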
Comment 15 Vladimir Marko 2017-01-21 16:09:49 UTC
Created attachment 252711
good and bad dmesg and /proc/interrupts

Adding archive with dmesg and /proc/interrupts for a good and bad kernel.

Bad: Built at commit 4c9eff7af69c61749b9eb09141f18f35edbf2210, with amd64
generic config from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc4/
except for using regular stack protector because
    Cannot use CONFIG_CC_STACKPROTECTOR_STRONG:
    -fstack-protector-strong not supported by compiler

Good: Built with the additional revert.patch from
    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1598312
Comment 16 Vladimir Marko 2017-01-23 00:35:59 UTC
The irq_set_affinity() call in setup_ioapic_dest() that the commit
    4857c91f0d195f05908fff296ba1ec5fca87066c
replaces was introduced in commit
    aa5cb97f14a2dd5aefabed6538c35ebc087d7c24

I built a kernel at aa5cb97f14a2dd5aefabed6538c35ebc087d7c24 with
4857c91f0d195f05908fff296ba1ec5fca87066c cherry-picked on top and it
worked fine. I conclude that the bug was introduced somewhere in
between and that 4857c91f0d195f05908fff296ba1ec5fca87066c exposed it.

I shall try and bisect again, cherry-picking
    4857c91f0d195f05908fff296ba1ec5fca87066c
at every iteration to expose the underlying bug.
That may take a while.
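
The idea behind this second bisect can be illustrated with a toy model (Python, not actual git tooling). The history is a list of commit labels, "latent" stands for the hypothetical commit that introduces the underlying bug and "exposing" for the later one that makes it visible; all names are invented for illustration. A plain bisect converges on the exposing commit, while applying the exposing change on top of every tested revision makes the search converge on the latent one.

```python
def first_bad(history, is_bad):
    """Binary search for the first bad commit; history[0] is good, history[-1] is bad."""
    good, bad = 0, len(history) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(history[: mid + 1]):
            bad = mid
        else:
            good = mid
    return history[bad]


def make_test(latent, exposing, cherry_pick=False):
    """The symptom shows only when both the latent and the exposing change are present."""
    def is_bad(applied):
        present = set(applied)
        if cherry_pick:
            present.add(exposing)  # exposing change applied on top at every step
        return latent in present and exposing in present
    return is_bad


history = ["c0", "c1", "latent", "c3", "c4", "exposing", "c6"]
plain = first_bad(history, make_test("latent", "exposing"))
picked = first_bad(history, make_test("latent", "exposing", cherry_pick=True))
print(plain, picked)  # prints "exposing latent"
```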
Comment 17 Vladimir Marko 2017-01-30 00:20:28 UTC
Bisecting with
    4857c91f0d195f05908fff296ba1ec5fca87066c
applied on top yields

0be275e3a5607b23f5132121bca22a10ee23aa99 is the first bad commit
commit 0be275e3a5607b23f5132121bca22a10ee23aa99
Author: Jiang Liu <jiang.liu@linux.intel.com>
Date:   Tue Apr 14 10:29:57 2015 +0800

    x86/irq: Use cached IOAPIC entry instead of reading from hardware
    
    Use cached IOAPIC entry instead of reading data from IOAPIC hardware
    registers to improve performance.
    
    Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
    Tested-by: Joerg Roedel <jroedel@suse.de>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    Cc: David Cohen <david.a.cohen@linux.intel.com>
    Cc: Sander Eikelenboom <linux@eikelenboom.it>
    Cc: David Vrabel <david.vrabel@citrix.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Joerg Roedel <joro@8bytes.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Yinghai Lu <yinghai@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dimitri Sivanich <sivanich@sgi.com>
    Cc: Grant Likely <grant.likely@linaro.org>
    Link: http://lkml.kernel.org/r/1428978610-28986-21-git-send-email-jiang.liu@linux.intel.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

:040000 040000 c3f631e46c7cee67fd8f1d900d307eabe53341b4 fcce52d36b3a12568a7b4e0033c5578e65bb37e6 M	arch
Comment 18 Vladimir Marko 2017-02-08 00:37:21 UTC
Created attachment 254511
investigation.tar.bz2

Data from investigation on top of the first bad commit.
Comment 19 Vladimir Marko 2017-02-08 00:53:33 UTC
Created attachment 254521
investigation.tar.bz2 with the debug patch
Comment 20 Vladimir Marko 2017-02-08 01:26:43 UTC
I've investigated the differences between the writes done before and after
    0be275e3a5607b23f5132121bca22a10ee23aa99 ("offending commit")
with the cherry-pick
    4857c91f0d195f05908fff296ba1ec5fca87066c ("exposing commit")
on top. Please see the data in the attachment added in comment #19.
(Comment #18 didn't contain the patch needed to interpret the data.)

You can "grep SWELEF" in the "good" dmesg output to see that the
"offending commit", despite saying that it is using "cached APIC entry",
actually writes very different data to the APIC entry at the beginning,
compared to the previous read-modify-write approach. Compare the "eu.w1"
values against the final "reg" value after the "->".

The funny thing is that the "delayed" dmesg (with the "exposing commit"
reverted, thus delaying the writes from setup_ioapic_dest()) does not
show any discrepancy for ioapic_set_affinity(). Some discrepancies remain
for io_apic_modify_irq(), but these are immaterial, as I checked by testing
with only the io_apic_modify_irq() #if set to the "reverted" state.

Any further interpretation of the data is beyond my abilities.
Please have a look.
Comment 21 Alexey Kunitskiy 2017-02-28 19:23:43 UTC
I have exactly the same problem on my 6715s. Thanks to the original bug reporter for bisecting.
If there is something I can test, I'll be more than happy to do it.
Comment 22 Len Brown 2017-03-07 00:16:37 UTC
Is this still a problem in Linux-4.10?
Comment 23 Alexey Kunitskiy 2017-03-07 06:50:47 UTC
(In reply to Len Brown from comment #22)
> Is this still a problem in Linux-4.10?

Yes. Linux-4.10.1
Comment 24 Vladimir Marko 2017-04-13 00:16:27 UTC
Using WARN_ON(.) I have determined that during initialization
  - ioapic entries are read from
    [<ffffffff810524e8>] __ioapic_read_entry+0x88/0xa0
    [<ffffffff81052532>] ioapic_read_entry+0x32/0x60
    [<ffffffff81d67f71>] enable_IO_APIC+0x63/0x10b
    [<ffffffff81d67001>] apic_bsp_setup+0x89/0xae
    [<ffffffff81d64cc4>] native_smp_prepare_cpus+0x2cf/0x30d
    [<ffffffff81d511a6>] kernel_init_freeable+0xa5/0x1ef
      kernel_init_freeable() init/main.c:995
      {inlined} do_pre_smp_initcalls() init/main.c:888
          do_one_initcall(*fn);
  - mp_irqdomain_activate() is called for apic=0, pin=0 (timer irq) from
    [<ffffffff81054aec>] mp_irqdomain_activate+0x9c/0xb0
    [<ffffffff810da501>] irq_domain_activate_irq+0x41/0x50
    [<ffffffff81d6887c>] setup_IO_APIC+0x687/0x830
    [<ffffffff81052c6d>] ? clear_IO_APIC+0x4d/0x70
    [<ffffffff81d6701a>] apic_bsp_setup+0xa2/0xae
    [<ffffffff81d64cc4>] native_smp_prepare_cpus+0x2cf/0x30d
    [<ffffffff81d511a6>] kernel_init_freeable+0xa5/0x1ef
      kernel_init_freeable() init/main.c:995
      {inlined} do_pre_smp_initcalls() init/main.c:888
          do_one_initcall(*fn);
  - ioapic_set_affinity() is called from
    [<ffffffff8105284c>] ioapic_set_affinity+0xdc/0xf0
    [<ffffffff81d68afd>] setup_ioapic_dest+0xd8/0xf6
    [<ffffffff81d64e39>] native_smp_cpus_done+0x10a/0x117
    [<ffffffff81d7633c>] smp_init+0x69/0x88
    [<ffffffff81d511dc>] kernel_init_freeable+0xdb/0x1ef
      kernel_init_freeable() init/main.c:999
          sched_init_smp();
  - the next mp_irqdomain_activate() is for apic=0, pin=21 and called from
    [<ffffffff81054aec>] mp_irqdomain_activate+0x9c/0xb0
    [<ffffffff810da501>] irq_domain_activate_irq+0x41/0x50
    [<ffffffff810d6f4c>] irq_startup+0x2c/0x80
    [<ffffffff810d5801>] __setup_irq+0x511/0x5a0
    [<ffffffff811dbbd2>] ? kmem_cache_alloc_trace+0x1e2/0x220
    [<ffffffff810d59cd>] ? request_threaded_irq+0xad/0x1b0
    [<ffffffff814409e4>] ? acpi_osi_handler+0xa9/0xa9
    [<ffffffff814409e4>] ? acpi_osi_handler+0xa9/0xa9
    [<ffffffff810d5a17>] request_threaded_irq+0xf7/0x1b0
    [<ffffffff814575be>] ? acpi_ev_sci_dispatch+0x65/0x65
    [<ffffffff81d99f47>] ? acpi_sleep_proc_init+0x2a/0x2a
    [<ffffffff81440ef5>] acpi_os_install_interrupt_handler+0x92/0xc6
    [<ffffffff8145762f>] acpi_ev_install_sci_handler+0x23/0x25
    [<ffffffff81454f72>] acpi_ev_install_xrupt_handlers+0x1c/0x6e
    [<ffffffff81d9b4ba>] acpi_enable_subsystem+0x8b/0x90
    [<ffffffff81d99fbc>] acpi_init+0x75/0x267
    [<ffffffff81d99f47>] ? acpi_sleep_proc_init+0x2a/0x2a
    [<ffffffff81002144>] do_one_initcall+0xd4/0x210
    [<ffffffff81098400>] ? parse_args+0x190/0x480
    [<ffffffff810bbe28>] ? __wake_up+0x48/0x60
    [<ffffffff81d51263>] kernel_init_freeable+0x162/0x1ef
      kernel_init_freeable() init/main.c:1001
      {inlined} do_basic_setup() init/main.c:880
      {inlined} do_initcalls() init/main.c:861
      {inlined} do_initcall_level(.) init/main.c:853
          do_one_initcall(*fn);

The timer irq (pin=0) seems to be special (legacy irq?) and its mp_irqdomain_activate() is called early through setup_IO_APIC() and check_timer(). However, the rest of the mp_irqdomain_activate() calls come later through acpi_init().

What does stand out is that ioapic_set_affinity() for all other irqs is called before acpi_init() and therefore it's likely that something is not yet initialized, so actually activating these irqs at that point is premature. And the combination of commits
    0be275e3a5607b23f5132121bca22a10ee23aa99
    4857c91f0d195f05908fff296ba1ec5fca87066c
actually causes activation of those irqs through __ioapic_write_entry() with eu.entry.mask=0 as already shown by my previous logs.

What I still do not understand is why those "cached" entries have mask=0. The commit message

    commit 0be275e3a5607b23f5132121bca22a10ee23aa99
    Author: Jiang Liu <jiang.liu@linux.intel.com>
    Date:   Tue Apr 14 10:29:57 2015 +0800

        x86/irq: Use cached IOAPIC entry instead of reading from hardware
    
        Use cached IOAPIC entry instead of reading data from IOAPIC hardware
        registers to improve performance.
        [...]

seems to imply that it's simply a performance optimization and it should not affect what we write to the IOAPIC entry. That's in direct conflict with the observable fact that without this commit the RMW operation would preserve mask=1 but the "cached" entry we write with this commit has mask=0.
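
To make the discrepancy concrete, here is a toy model in Python (not kernel code). It relies only on the documented IOAPIC redirection entry layout, where bit 16 is the mask bit and bits 56-63 hold the destination; the function names and the starting values are assumptions chosen to mirror the situation described above: the hardware entry is masked while the cached entry has mask=0.

```python
MASK_BIT = 1 << 16          # redirection entry: 1 = interrupt masked
DEST_SHIFT = 56             # destination field occupies bits 56-63
DEST_FIELD = 0xFF << DEST_SHIFT


def with_dest(entry, dest):
    """Return entry with its destination field replaced by dest."""
    return (entry & ~DEST_FIELD) | ((dest & 0xFF) << DEST_SHIFT)


def set_affinity_rmw(hw_entry, dest):
    """Read-modify-write: start from what the hardware currently holds."""
    return with_dest(hw_entry, dest)


def set_affinity_cached(cached_entry, dest):
    """Write the software copy back, whatever the hardware currently holds."""
    return with_dest(cached_entry, dest)


hw = 0x30 | MASK_BIT        # hardware entry: vector 0x30, still masked
cached = 0x30               # software copy: same vector, but mask=0

after_rmw = set_affinity_rmw(hw, dest=1)
after_cached = set_affinity_cached(cached, dest=1)

print(bool(after_rmw & MASK_BIT))     # True: the pin stays masked
print(bool(after_cached & MASK_BIT))  # False: the pin is now unmasked
```

Both paths set the same destination; only the cached path changes the mask bit as a side effect, which is the behaviour the logs show.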

I would recommend reverting 0be275e3a5607b23f5132121bca22a10ee23aa99 until this discrepancy is explained and rectified. Unfortunately, that revert has a conflict but reverting

    commit 1f934641294ca2e09016c689862378fbb15da4d4
    Author: Thomas Gleixner <tglx@linutronix.de>
    Date:   Tue Apr 14 10:29:58 2015 +0800

        x86/irq: Remove sis apic bug workaround
    
        The SiS apic bug workaround is now obsolete as we cache the register
        values for performance reasons.
        [...]

first would solve that.
Comment 25 Vladimir Marko 2017-04-22 19:51:57 UTC
The IO_APIC_route_entry initialized in mp_setup_entry() has mask=0 even though IRQs have not been enabled at that point in the kernel initialization. And since there is no other field where the kernel would remember whether IRQs have been enabled, the early "chip->irq_set_affinity(...)" call from setup_ioapic_dest() actually enables IRQs well before the necessary structures have been initialized.

I've also tried to move the setup_ioapic_dest() call to a different place during the initialization. When it's before acpi_bus_init() in acpi_init(), it's broken. If it's after that, everything is OK.

This bug can be fixed in several ways. In my order of preference:

1. Revert 1f934641294ca2e09016c689862378fbb15da4d4,
      and 0be275e3a5607b23f5132121bca22a10ee23aa99.
   The commit message of the latter does not match reality,
   the so-called "cached" entry is not what's been written,
   so returning to RMW operation is preferable.

2. Initialize the IO_APIC_route_entry with mask=1 and update
   this flag when interrupts are enabled/disabled.

3. Move setup_ioapic_dest() after acpi_bus_init() in acpi_init(),
   or to another appropriate place inside acpi_bus_init().

4. Revert 4857c91f0d195f05908fff296ba1ec5fca87066c.
   (This has already been refused by Thomas Gleixner.)
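
Option 2 could look roughly like the following. This is a Python toy model, not a patch: the class CachedEntry and its methods are hypothetical, and the real change would have to live in the IOAPIC code that owns IO_APIC_route_entry. The point is only that the cached copy starts masked and tracks enable/disable, so an early affinity update cannot unmask the pin.

```python
MASK_BIT = 1 << 16  # IOAPIC redirection entry mask bit (1 = masked)


class CachedEntry:
    """Software copy of one redirection entry that also tracks the mask state."""

    def __init__(self, vector):
        self.vector = vector
        self.dest = 0
        self.masked = True  # start masked until the IRQ is really enabled

    def unmask(self):
        self.masked = False

    def mask(self):
        self.masked = True

    def set_affinity(self, dest):
        self.dest = dest & 0xFF  # only the destination changes here

    def raw(self):
        """Value that would be written to the hardware entry."""
        value = self.vector | (self.dest << 56)
        return value | MASK_BIT if self.masked else value


entry = CachedEntry(vector=0x30)
entry.set_affinity(1)   # early call, e.g. from setup_ioapic_dest()
early = entry.raw()     # still masked
entry.unmask()          # later, when the IRQ is actually started
late = entry.raw()      # unmasked only now
```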

I can try and prepare a patch but the maintainer (Thomas Gleixner) needs to decide which approach to take. I'm not going to unnecessarily write multiple patches.
Comment 26 Alexey Kunitskiy 2017-04-23 18:48:05 UTC
I've compiled and tested approach #1 (reverting 1f934641294ca2e09016c689862378fbb15da4d4
and 0be275e3a5607b23f5132121bca22a10ee23aa99),
as Vladimir Marko suggested.

It has worked very well for me so far on 4.9.16.
Comment 27 Vladimir Marko 2017-05-01 23:28:05 UTC
I've built a 4.11 kernel with Ubuntu config-4.11.0-041100rc8-generic (and stack protector reduced to regular) and verified that the issue is still present. However, I do not see any way to edit the bug to change either the affected kernel version or the status.
Comment 28 Dariusz Nawrocki 2017-05-03 21:58:40 UTC
I have built a fresh kernel from the current master branch (commit 89c9fea3c8034cdb2fd745f551cde0b507fd6893). The bug still exists: the CPU frequency is limited to 800 MHz.

After I reverted the two commits above on my local master branch (1f934641294ca2e09016c689862378fbb15da4d4 and 0be275e3a5607b23f5132121bca22a10ee23aa99) and built the kernel again, the notebook works correctly. It can scale the frequency within the full range up to 1900 MHz.
Comment 29 Marek Dvořák 2017-06-15 09:55:16 UTC
Linux 4.10.0-22-generic #24-Ubuntu SMP Mon May 22 17:43:20 UTC 2017 x86_64 GNU/Linux

The same issue on the same notebook.

But I wanted to share a bit about kernel parameters. `acpi=noirq` works around the cpu scaling problem successfully, but the wifi broke (though I have tried only one boot, so I don't know how statistically significant that is). Then I tried `noapic` (it seems that comment #6 tried `acpi=noapic` instead) and both cpu scaling and wlan seem to work.

What's the status of the patch?
Comment 30 Vladimir Marko 2017-06-18 08:07:10 UTC
Awaiting reply from Thomas Gleixner (see comment #25) who seems to be ignoring this bug (last comment at 2017-01-19 16:40:17 UTC) and even a direct email (May 1, 2017). I'm wondering if it really takes public "Linus Nvidia rant" to make things move forward.
Comment 31 Marek Dvořák 2017-06-18 20:50:24 UTC
I don't know if these issues are related (they all seem to be ACPI related), but reboot/powerdown does not shut the notebook down properly. Do your proposed patches also solve this issue, or is it a separate one?
Comment 32 Vladimir Marko 2017-06-18 22:34:47 UTC
Yes, each of the proposed fixes solves the broken powerdown/reboot as well.
Comment 33 Manuel Bärenz 2018-02-22 09:28:15 UTC
Given that this is still an issue, and that the 4.1 kernel is affected by the meltdown/spectre vulnerability, what's the current way of properly resolving this?
Comment 34 Vladimir Marko 2018-02-22 22:53:14 UTC
Re comment #33: What does this bug have to do with the meltdown/spectre?
(This was still an issue when I tried 4.14.)
Comment 35 Manuel Bärenz 2018-02-23 09:52:50 UTC
(In reply to Vladimir Marko from comment #34)
> Re comment #33: What does this bug have to do with the meltdown/spectre?
> (This was still an issue when I tried 4.14.)

The bug itself doesn't have anything to do with meltdown/spectre. But one potential resolution (using the 4.1 kernel) conflicts with it.
Comment 36 Vladimir Marko 2018-04-02 19:40:09 UTC
I just tried 4.16 and everything works fine. Looking at the history of arch/x86/kernel/apic/io_apic.c, I would guess the fix was actually

    commit 90ad9e2d91067983f3328e21b306323877e5f48a
    Author: Thomas Gleixner <tglx@linutronix.de>
    Date:   Wed Sep 13 23:29:49 2017 +0200

        x86/io_apic: Reevaluate vector configuration on activate()
        [...]

which was already present in 4.15.
Comment 37 Vladimir Marko 2018-04-02 21:31:10 UTC
Reverting the commit 90ad9e2d91067983f3328e21b306323877e5f48a (see comment #36) did not reintroduce the bug. So it remains unknown where between 4.14 and 4.16 this bug was fixed and whether 4.15 is affected.
Comment 38 Manuel Bärenz 2018-04-04 07:31:37 UTC
I just tested 4.15 on NixOS 18.03. The fan is quiet and the frequency can be set to all possible values via "cpupower frequency-set -f". Weirdly, the ondemand governor cranks the frequency up to the maximum even when not under load, but that may be a separate bug, and it's no big deal.
Comment 39 Alexey Kunitskiy 2018-10-05 07:50:57 UTC
Tested on 4.14.73: still broken.
On 4.18.11 it works well for me, including frequency scaling.