Bug 84381

Summary: Thinkpad extra buttons (thinkpad_acpi) no longer functioning with 3.17.0rc4
Product: ACPI Reporter: Thomas Richter (thor)
Component: Config-InterruptsAssignee: Lan Tianyu (tianyu.lan)
Status: CLOSED CODE_FIX    
Severity: normal CC: dion, jiang.liu, lenb, public, rui.zhang
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 3.17.0rc4 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg of the R31 attached
acpidump attached
Kernel configuration for the IBM R31
Custom DSDT for the R31, disabling the S3 state.

Description Thomas Richter 2014-09-12 10:03:49 UTC
Starting with at least kernel 3.17.0rc4, the thinkpad extra buttons on an IBM R31 are  no longer functioning and remain unresponsive. On the boot-log, I receive the message

ACPI: SCI (ACPI GSI 9) not registered

The same configuration (same user space) worked flawless on a vanilla 3.15.0.
Comment 1 Lan Tianyu 2014-09-17 05:43:48 UTC
Could you provide the output of acpidump and dmesg?
How about the early version of v3.17-rc?
Comment 2 Thomas Richter 2014-09-24 15:41:47 UTC
Created attachment 151791 [details]
dmesg of the R31 attached
Comment 3 Thomas Richter 2014-09-24 15:42:05 UTC
Created attachment 151801 [details]
acpidump attached
Comment 4 Thomas Richter 2014-09-26 14:13:05 UTC
Also reproduced on 3.17.0rc1, thus the bug is at least older. Note that 3.15 worked just fine.

Any news on this?
Comment 5 Thomas Richter 2014-10-02 14:23:01 UTC
The same problem appears on the Fujitsu S6010 laptop. Starting on kernel 3.17, the fujitsu-extra buttons no longer work, though writing into /sys/class/firmware/fujitsu-laptop/brightness does work. For kernel 3.16, it is just the reverse. I can control the backlight with the fujitsu buttons, but writing into /sys/class/firmware/fujitsu-laptop/brightness does nothing, and the gnome brightness applet fails. Thus, I can either control the backlight with the ACPI buttons (kernel 3.16) or with /sys/class/firmware/fujitsu-laptop/brightness (kernel 3.17) but not both.

Could we please restore the setting to 3.15 kernel where *both* worked? 

Thanks.
Comment 6 Thomas Richter 2014-10-03 18:28:37 UTC
I found a patch for the S6010 laptop to make both the keyboard functions and the device working. For that, edit arch/x86/kernel/acpi/boot.c and update acpi_gsi_to_irq as follows:

static unsigned int gsi_to_irq(unsigned int gsi)
{
	unsigned int irq = gsi + NR_IRQS_LEGACY;
	unsigned int i;

	for (i = 0; i < NR_IRQS_LEGACY; i++) {
		if (isa_irq_to_gsi[i] == gsi) {
			return i;
		}
	}

	/* Provide an identity mapping of gsi == irq
	 * except on truly weird platforms that have
	 * non isa irqs in the first 16 gsis.
	 */
	if (gsi >= NR_IRQS_LEGACY)
		irq = gsi;
	else
		irq = gsi_top + gsi;

	return irq;
}
/* THOR: Experimental end */

int acpi_gsi_to_irq(u32 gsi, unsigned int *irqp)
{
	int irq = mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC | IOAPIC_MAP_CHECK);

	if (irq >= 0) {
		*irqp = irq;
		return 0;
	}

	/* THOR: Experimental start */
	irq = gsi_to_irq(gsi);
	if (irq >= 0) {
	  *irqp = irq;
	  return 0;
	}
	/* THOR Experimental end */

	return -1;
}

Whether this works on the R31 is yet to be determined.
Comment 7 Dmitry Nezhevenko 2014-10-07 07:31:59 UTC
Not sure that it's same but brightness buttons doesn't work on Lenovo t440p laptop with kernel 3.17. Reverting back to 3.16 fixes issue for me. Both kernels provides only /sys/class/backlight/intel_backlight and changing brightness manually works
Comment 8 Dmitry Nezhevenko 2014-10-08 08:59:05 UTC
My issue seems to be different. I've created #85861 for it (bisected).
Comment 9 Thomas Richter 2014-10-08 09:01:54 UTC
A well-formed patch for the issue:

Signed-off-by: Thomas Richter <thor@math.tu-berlin.de>
--- linux-3.17/arch/x86/kernel/acpi/boot.c.old  2014-10-07 15:43:44.000000000 +0200
+++ linux-3.17/arch/x86/kernel/acpi/boot.c      2014-10-07 16:04:33.000000000 +0200
@@ -609,9 +609,23 @@
        if (irq >= 0) {
                *irqp = irq;
                return 0;
-       }
+       } else {
+               unsigned int i;
+
+               for (i = 0; i < NR_IRQS_LEGACY; i++) {
+                       if (isa_irq_to_gsi[i] == gsi) {
+                               *irqp = i;
+                               return 0;
+                       }
+               }
+
+               if (gsi >= NR_IRQS_LEGACY)
+                       *irqp = gsi;
+               else
+                       *irqp = gsi_top + gsi;

-       return -1;
+               return 0;
+       }
 }
 EXPORT_SYMBOL_GPL(acpi_gsi_to_irq);
Comment 10 Jiang Liu 2014-10-14 08:18:57 UTC
Hi Thomas,
    Thanks for report and analyze the regression. I need more information to understand the root cause of the regression, could you please help to provide:
1) kernel configuration file. I want to make sure whether IOAPIC has been disabled.
2) There's no ACPI MADT(APIC) table in the acpidum.out. Is it an uniprocessor system?
3) I'm still checking what causes this warning: "ACPI: Override [DSDT-  CNOTE2], this is unsafe: tainting kernel".
[    0.000000] ACPI: RSDT 0x3F7E0000 00002C (v01 IBM    Cnote2   00003110 IBM  00000001)
[    0.000000] ACPI: FACP 0x3F7E0054 000074 (v01 IBM    Cnote2   00003110 IBM  00000001)
[    0.000000] ACPI: Override [DSDT-  CNOTE2], this is unsafe: tainting kernel
[    0.000000] Disabling lock debugging due to kernel taint
[    0.000000] ACPI: DSDT 0x3F7E00CC Logical table override, new table: 0xC1426B90
[    0.000000] ACPI: DSDT 0xC1426B90 004CBB (v01 IBM    CNOTE2   00003110 INTL 20100528)irq_vector_alloc_param.patch

Thanks!
Gerry
Comment 11 Jiang Liu 2014-10-14 08:39:02 UTC
Hi Thomas,
       Could you please help to test following patch?
Thanks!
Gerry
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index b436fc735aa4..eceba9d9e116 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -604,14 +604,19 @@ void __init acpi_pic_sci_set_trigger(unsigned int irq, u16 trigger)

 int acpi_gsi_to_irq(u32 gsi, unsigned int *irqp)
 {
-       int irq = mp_map_gsi_to_irq(gsi, IOAPIC_MAP_ALLOC | IOAPIC_MAP_CHECK);
+       int irq;

-       if (irq >= 0) {
+       if (acpi_irq_model == ACPI_IRQ_MODEL_PIC) {
+               *irqp = gsi;
+       } else {
+               irq = mp_map_gsi_to_irq(gsi,
+                                       IOAPIC_MAP_ALLOC | IOAPIC_MAP_CHECK);
+               if (irq < 0)
+                       return -1;
                *irqp = irq;
-               return 0;
        }

-       return -1;
+       return 0;
 }
 EXPORT_SYMBOL_GPL(acpi_gsi_to_irq);
Comment 12 Thomas Richter 2014-10-15 13:05:50 UTC
(In reply to Jiang Liu from comment #10)
> Hi Thomas,
>     Thanks for report and analyze the regression. I need more information to
> understand the root cause of the regression, could you please help to
> provide:
> 1) kernel configuration file. I want to make sure whether IOAPIC has been
> disabled.

I'm creating an attachment. Yes, IOAPIC is disabled. Actually, I wonder whether this system even has an APIC.

> 2) There's no ACPI MADT(APIC) table in the acpidum.out. Is it an
> uniprocessor system?

Yes, this is a uniprocessor system.


> 3) I'm still checking what causes this warning: "ACPI: Override [DSDT- 
> CNOTE2], this is unsafe: tainting kernel".
> [    0.000000] ACPI: RSDT 0x3F7E0000 00002C (v01 IBM    Cnote2   00003110
> IBM  00000001)
> [    0.000000] ACPI: FACP 0x3F7E0054 000074 (v01 IBM    Cnote2   00003110
> IBM  00000001)
> [    0.000000] ACPI: Override [DSDT-  CNOTE2], this is unsafe: tainting
> kernel
> [    0.000000] Disabling lock debugging due to kernel taint
> [    0.000000] ACPI: DSDT 0x3F7E00CC Logical table override, new table:
> 0xC1426B90
> [    0.000000] ACPI: DSDT 0xC1426B90 004CBB (v01 IBM    CNOTE2   00003110
> INTL 20100528)irq_vector_alloc_param.patch

It means that the system runs with a custom DSDT. I'm attaching it, but it only differs from the original DSDT in so far as it disables the S3 sleep state. Unfortunately, this system has a broken Bios which breaks resume out of S3 (suspend to RAM) - the system never reaches the kernel real-mode entry point and just hangs. The same problem (no thinklight, no ACPI-buttons) also appears with the regular DSDT, but then closing the lid causes the system to enter S3 state instead of S1 state, and hence crashes the machine.
Comment 13 Thomas Richter 2014-10-15 13:06:18 UTC
Created attachment 153851 [details]
Kernel configuration for the IBM R31
Comment 14 Thomas Richter 2014-10-15 13:06:54 UTC
Created attachment 153861 [details]
Custom DSDT for the R31, disabling the S3 state.
Comment 15 Thomas Richter 2014-10-15 13:34:35 UTC
(In reply to Jiang Liu from comment #11)

Hi Gerry,

>        Could you please help to test following patch?

/* snip */

Yes, this patch works too. At least, it works in the R31. I would expect that it also works on the S6010 given that the architecture of the system is pretty similar (except that the S6010 connects its internal display by an exotic DVO).

Thanks.
Comment 16 Jiang Liu 2014-10-20 15:45:43 UTC
Patch has been merged into tip tree.
Commit-ID:  961b6a7003acec4f9d70dabc1a253b783cb74272
Gitweb:     http://git.kernel.org/tip/961b6a7003acec4f9d70dabc1a253b783cb74272
Author:     Jiang Liu <jiang.liu@linux.intel.com>
AuthorDate: Mon, 20 Oct 2014 22:45:27 +0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Mon, 20 Oct 2014 17:23:00 +0200

x86: ACPI: Do not translate GSI number if IOAPIC is disabled

When IOAPIC is disabled, acpi_gsi_to_irq() should return gsi directly
instead of calling mp_map_gsi_to_irq() to translate gsi to IRQ by IOAPIC.
It fixes https://bugzilla.kernel.org/show_bug.cgi?id=84381.

This regression was introduced with commit 6b9fb7082409 "x86, ACPI,
irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number"

Reported-and-Tested-by: Thomas Richter <thor@math.tu-berlin.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Richter <thor@math.tu-berlin.de>
Cc: rui.zhang@intel.com
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1413816327-12850-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Comment 17 Matías de la Cruz 2014-10-23 00:55:40 UTC
Just updated to 3.17.1 (ArchLinux 3.17.1-1) on a T23 and I am receiving this notification during boot (right before fsck checks) and in the related system journal (systemd setup). Previous kernel (3.16.4) didn't show anything related as far as I can tell.

As far as I have tested though the Thinkpad extra buttons are working as expected in my case (brightness and sleep have been tested so far).

Therefore, noting #16 and the reference to a patch, is this notification still supposed to appear in the 3.17.1 kernel? Or am I talking too soon, despite #16 as mentioned and taking into account the bug status?
Comment 18 Jiang Liu 2014-10-23 05:20:05 UTC
Hi Matias,
     The patch is still in tip tree, and hasn't been merged into upstream tree yet.
Comment 19 Zhang Rui 2014-10-24 06:48:23 UTC
Mark the bug as Resolved as there is already a patch available.
And we will close it when the patch is shipped in Linus' tree.
Comment 20 Matías de la Cruz 2014-11-03 20:01:15 UTC
#18

Thanks for the hint.

It seems that I have finally found what regression I experiment with all the kernels (both 3.17.1 and 3.17.2) that feature the 'ACPI: SCI (ACPI GSI 9) not registered' notification. 
It is not related to the Thinkpad extra buttons that work correctly as mentioned but that any USB device connected does not get its related modules loaded.

As soon as I downgrade to a kernel not featuring the error (3.16.4) USB devices are detected as usual and the modules are loaded. It works too with the LTS kernel (3.14.23) even though I am not sure it is that much of a valid comparison with the up-to-date kernel.

Should I consider it related to this bug or, despite the same notification (nothing else shows up anywhere), report it as a new one?
Comment 21 Jiang Liu 2014-11-04 01:30:42 UTC
hi Matías,
    Could you please help to try 3.18-rc3? The fix has been merged into 3.18-rc3.
Comment 22 Matías de la Cruz 2014-11-04 22:45:01 UTC
(In reply to Jiang Liu from comment #21)
> hi Matías,
>     Could you please help to try 3.18-rc3? The fix has been merged into
> 3.18-rc3.

I am sorry to say that I will have to pass on your offer, at least with regard to the rc3 version. This computer is quite necessary to work 'as good (and stable) as possible' and isn't therefore available for this type of testing.

Still, running Archlinux and, as such, a cutting edge distribution (which might sound like a contradiction to what I have just stated even though IMHO it is not) I will surely report as soon as possible once the kernel is available in the related repositories. 

I hope this would still help with your (and my own if the fix is suppose to address the type of regression reported by me) 'testing needs' somehow.
Comment 23 Matías de la Cruz 2014-11-17 00:30:37 UTC
I have just updated to kernel 3.17.3 and as far as I can tell the ACPI warning does not longer appear and the USB stick I just tested after rebooting from LTS to the new kernel is detected correctly.

Has the fix been merged earlier than stated in #21 or something else was changed for this to happen?

I will test more thoroughly tomorrow to be sure everything works as it should instead of some sort of momentary placebo.
Comment 24 Matías de la Cruz 2014-11-17 04:03:27 UTC
Even though I haven't started further testing as of yet with regard to the USB issue I have already come across the first regression after the update (whether the patch is included or not in this kernel).
Neither hibernation nor suspend mode work as they should.
Hibernation starts but it stalls with the indicator left blinking. As for Suspension the process apparently works fine but when resuming from that state it stalls.

Every time I tested, after (hard) shutdown and rebooting, the journal is recovered but, unfortunately, it does not show anything useful as to what happened. It merely states 'Suspending system...' once the hibernation/suspension has been triggered/executed without offering any further clue.
Comment 25 Matías de la Cruz 2014-11-19 02:52:26 UTC
Tests have confirmed what I stated previously in #24, loading of related USB modules is back once a device of that type is connected (devices tested: USB storage sticks - no mouse/keyboard of this type available for testing) and both suspend/hibernate don't work anymore (crash - former when resuming and the latter when starting with the suspend icon (half moon) left blinking).
Now again I am wondering if I should consider this as part of the present bug report or something different. A hint on this would be much appreciated.

In the same way, if someone knows a way of gathering more evidence on the reasons for this to happen, please, let me know. Thanks in advance.
Comment 26 Jiang Liu 2014-11-19 02:59:37 UTC
Is there any chance to provide any log or panic messages? We need those information to identify possible causes due to we have no access to the hardware.
Comment 27 Matías de la Cruz 2014-11-19 04:50:59 UTC
(In reply to Jiang Liu from comment #26)
> Is there any chance to provide any log or panic messages? We need those
> information to identify possible causes due to we have no access to the
> hardware.

Unfortunately systemd journal doesn't show any further info than the one I posted in #24. If there were, I would have readily posted them.

The only thing that I can think of right now is that f.e. during the hibernation process, once it is triggered, the laptop makes the first 'sound' (just as during suspension) but does not make the second one after which the laptop ultimately powers off. Instead of that second sound the screen powers off and the suspend icon is left blinking as mentioned before.

Whereas during suspension the first 'sound' is made perfectly and the system suspends. Afterwards, the system resumes, the screen turns on and the console cursor appears stuck on a black screen background with no further element on it but the cursor.

It feels a bit dumb to me to explain you this but, in the light of lacking actual logs, maybe these details do offer you some clue on where the 3.17.3 kernel may fail during suspend/hibernate.
Comment 28 Matías de la Cruz 2014-12-20 21:46:55 UTC
Sorry for the late update on this but usual free software maintenance and this time of the year does not leave much time left.

On my behalf latest kernel 3.17.6 solved the suspend/hibernate problems I had been experiencing and the thinkpad extra buttons keep working as usual. Therefore for me this issue is solved.

Thanks again for your kind attention and happy holidays.