Bug 68291

Summary:	pci_enable_device crashes in io_apic_setup via acpi_register_gsi
Product:	Platform Specific/Hardware	Reporter:	Alan (alan)
Component:	x86-64	Assignee:	platform_x86_64 (platform_x86_64)
Status:	NEW ---
Severity:	normal	CC:	adamw, ashok.raj, bugzilla, jacek.m.danecki, jfw, lenb, mathias.nyman, mike, nrndda, yao.jin
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	next-20140107, 3.13-rc7	Subsystem:
Regression:	Yes	Bisected commit-id:
Attachments:	Full log of the boot patch to trigger crash on 3.14.rc4 patch to solve crash IRQ conflict workaround patch

Description Alan 2014-01-08 17:10:20 UTC

Created attachment 121281 [details]
Full log of the boot

On 3.11 Baytrail/T devices can be successfully booted with working graphics providing the mode is forced.

On -next this has regressed and the machine crashes and burns on boot

Modules linked in: i915(+) brcmfmac coretemp kvm_intel brcmutil cfg80211 kvm i2c_algo_
bit drm_kms_helper drm asix hid_multitouch usbnet mii microcode i2c_hid i2c_designware
_platform wmi video int3403_thermal acpi_pad i2c_designware_core hid_generic usbhid hi
d usb_storage mmc_block sdhci_acpi sdhci
CPU: 2 PID: 359 Comm: systemd-udevd Tainted: G        W    3.13.0-rc7-next-20140107 #2
Hardware name: ASUSTeK COMPUTER INC. T100TA/T100TA, BIOS T100TA.214 09/22/2013
task: ffff880034764530 ti: ffff880070c72000 task.ti: ffff880070c72000
RIP: 0010:[<ffffffff81042804>]  [<ffffffff81042804>] __add_pin_to_irq_node+0x34/0xb0
RSP: 0018:ffff880070c73918  EFLAGS: 00010206
RAX: 3c63970328003230 RBX: 0000000000000000 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 3c63970328003230
RBP: ffff880070c73938 R08: ffff88007271e900 R09: ffff880073000000
R10: 0000000000000012 R11: ffffffff81366d77 R12: 0000000000000010
R13: 00000000ffffffff R14: ffff88007271e818 R15: 0000000000000000
FS:  00007fadb7693880(0000) GS:ffff880078900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fadb4e49000 CR3: 0000000070c5c000 CR4: 00000000001007e0
Stack:
 0000000000000010 ffff88007271e818 00000000ffffffff ffff880070c739f8
 ffff880070c739a0 ffffffff81042ef4 ffff8800724af000 0000000000000001
 ffff880070c73970 ffffffff8138332f ffff880070c73990 ffffffff814ca86c
Call Trace:
 [<ffffffff81042ef4>] io_apic_setup_irq_pin+0x84/0x2f0
 [<ffffffff8138332f>] ? acpi_ut_remove_reference+0x2e/0x31
 [<ffffffff814ca86c>] ? dmi_matches+0x5c/0xf0
 [<ffffffff81044748>] io_apic_setup_irq_pin_once+0x88/0xd0
 [<ffffffff810447e4>] io_apic_set_pci_routing+0x34/0x70
 [<ffffffff8103c17f>] mp_register_gsi+0xaf/0x1c0
 [<ffffffff8103c29e>] acpi_register_gsi_ioapic+0xe/0x10
 [<ffffffff8103bfef>] acpi_register_gsi+0xf/0x20
 [<ffffffff81367013>] acpi_pci_irq_enable+0x120/0x206
 [<ffffffff813288af>] ? pci_bus_read_config_word+0x7f/0x90
 [<ffffffff814e0451>] pcibios_enable_device+0x31/0x40
 [<ffffffff8133037b>] do_pci_enable_device+0x3b/0x60
 [<ffffffff81331708>] pci_enable_device_flags+0xc8/0x120
 [<ffffffff813317b3>] pci_enable_device+0x13/0x20
 [<ffffffffa00d0f98>] drm_get_pci_dev+0x68/0x220 [drm]
 [<ffffffffa02906fb>] i915_pci_probe+0x3b/0x60 [i915]
 [<ffffffff81332455>] local_pci_probe+0x45/0xa0
 [<ffffffff813336f5>] ? pci_match_device+0xc5/0xd0
 [<ffffffff81333811>] pci_device_probe+0xd1/0x130
 [<ffffffff813e37e5>] driver_probe_device+0x125/0x3b0
 [<ffffffff813e3b43>] __driver_attach+0x93/0xa0
 [<ffffffff813e3ab0>] ? __device_attach+0x40/0x40
 [<ffffffff813e1753>] bus_for_each_dev+0x63/0xa0
 [<ffffffff813e319e>] driver_attach+0x1e/0x20
 [<ffffffff813e2d80>] bus_add_driver+0x180/0x250
 [<ffffffffa0206000>] ? 0xffffffffa0205fff
 [<ffffffff813e41c4>] driver_register+0x64/0xf0
 [<ffffffffa0206000>] ? 0xffffffffa0205fff
 [<ffffffff81331e0b>] __pci_register_driver+0x4b/0x50
 [<ffffffffa00d126a>] drm_pci_init+0x11a/0x130 [drm]
 [<ffffffffa0206000>] ? 0xffffffffa0205fff
 [<ffffffffa020606a>] i915_init+0x6a/0x6c [i915]
 [<ffffffff8100212a>] do_one_initcall+0xda/0x180
 [<ffffffff81050953>] ? set_memory_nx+0x43/0x50
 [<ffffffff810d1bb7>] load_module+0x1df7/0x25d0
 [<ffffffff810cd6d0>] ? store_uevent+0x40/0x40
 [<ffffffff810d2432>] SyS_init_module+0xa2/0xf0
 [<ffffffff815a18ed>] system_call_fastpath+0x1a/0x1f
Code: 41 56 49 89 fe 41 55 41 89 f5 41 54 41 89 cc 53 48 8b 07 89 d3 48 85 c0 75 11 eb
 2f 0f 1f 00 48 8b 78 08 48 85 ff 74 1f 48 89 f8 <39> 18 75 f0 44 39 60 04 75 ea 5b 41
 5c 41 5d 41 5e 31 c0 5d c3 
RIP  [<ffffffff81042804>] __add_pin_to_irq_node+0x34/0xb0
 RSP <ffff880070c73918>
---[ end trace 2b029fbbda559dd1 ]---

Comment 1 Alan 2014-01-08 17:10:58 UTC

The box is set up for debugging so I can test patches etc on it.

Comment 2 Len Brown 2014-01-08 21:47:29 UTC

Alan reports that 3.11 worked and 3.13-rc7 fails.
marking this report as a regression.

Comment 3 Len Brown 2014-01-10 06:47:47 UTC

Alan reports 3.12 works.
So the regression is between 3.12 and 3.13-rc7

Comment 4 Alan 2014-01-13 20:14:02 UTC

Much digging and odd bisect behaviour later which wouldn't give straight answers

The critical thing is the BYT pi mux stuff it seems

I see

byt_gpio: INT33FC:02: Failed to request IRQ 18 ACPI event handler
ACPI: \_SB_.PCIO0: notify handler is installed

Then a bit later the i915 explosion

CONFIG_PINCTRL_BAYTRAIL = y ; crashes
CONFIG_PINCTRL_BAYTRAIL = n ; all works

Can't tell easily if this is a pinmux bug or perhaps the Failed to request.. sets up a future fail on the same IRQ line ?

Comment 5 Mathias Nyman 2014-01-14 16:23:43 UTC

the baytrail pinctrl/gpio driver tries to set up 
interrupt handlers for gpio triggered ACPI events.

Just to narrow it down, does this help?:

diff --git a/drivers/pinctrl/pinctrl-baytrail.c b/drivers/pinctrl/pinctrl-baytrail.c
index 114f5ef..f4a7ecb 100644
--- a/drivers/pinctrl/pinctrl-baytrail.c
+++ b/drivers/pinctrl/pinctrl-baytrail.c
@@ -486,8 +486,6 @@ static int byt_gpio_probe(struct platform_device *pdev)
                irq_set_handler_data(hwirq, vg);
                irq_set_chained_handler(hwirq, byt_gpio_irq_handler);
 
-               /* Register interrupt handlers for gpio signaled acpi events */
-               acpi_gpiochip_request_interrupts(gc);
        }

This function requests interrupts for gpio pins that have ACPI GPIO_INT resources in ACPI DSDT table and an ACPI event handler that should be called when interrupt triggered.

Comment 6 Alan 2014-01-14 23:26:06 UTC

Yes that stops the crash

Comment 7 Adam Williamson 2014-01-16 22:00:44 UTC

Seems to resolve the crash here too...and now I'm back to a blank screen when modesetting kicks in, on the v8p.

Comment 8 Alan 2014-01-16 22:32:02 UTC

Thats the other bug.

Interestingly for that one if you remote login or script X to do an xrandr newmode, xrandr addmode and xrandr --output VGA1 --mode blah

you get video.

Comment 9 Adam Williamson 2014-01-16 22:43:43 UTC

Yeah, I thought it had been fixed, but I think what happened was actually that this bug showed up and broke KMS entirely.

My report for the blank-screen issue is https://bugs.freedesktop.org/show_bug.cgi?id=71977 - do you have one too?

Comment 10 Adam Williamson 2014-01-16 22:52:38 UTC

Alan: should I close my report for this issue - https://bugzilla.kernel.org/show_bug.cgi?id=67861 - as a dupe of this one?

Comment 11 Alan 2014-01-17 00:42:19 UTC

If the same "fix" works for both platforms then yeah - mark it a dup of this

Comment 12 Adam Williamson 2014-01-17 01:55:37 UTC

*** Bug 67861 has been marked as a duplicate of this bug. ***

Comment 13 Mathias Nyman 2014-02-28 17:18:41 UTC

This is triggered on on the Asus t100ta baytrail/T because of a irq descriptor
conflict.

There are two gpio triggered acpi events in this device, GPIO 6 and 18 (0x12).
These gpios are translated to irqs by calling gpio_to_irq()which in turn will call irq_create_mapping(vg->domain, offset);

irq_create_mapping will take care of allocating the irq descriptor, taking the first available number starting from the given value (6 in our case)

the first 16 (0-15) are already reserved by legacy ISA code, so it gets the first free irq descriptor which is number 16.

The i915 driver also uses irq 16, it loads later than gpio and crashes in probe.

I'm not familiar with how the inner parts of irq code works, or how i915 driver 
requests its interrupt, but I think that if descriptor number 16 is reserved then it should just get the next virtual irq number and requst that one?

Comment 14 Adam Williamson 2014-02-28 17:35:45 UTC

Huh. I don't recall what we did about this bug, but it clearly got fixed somehow at some point, because I don't have it with 3.14rc4 and I'm not carrying any patches for it.

Comment 15 Mathias Nyman 2014-03-03 08:39:46 UTC

(In reply to Adam Williamson from comment #14)
> Huh. I don't recall what we did about this bug, but it clearly got fixed
> somehow at some point, because I don't have it with 3.14rc4 and I'm not
> carrying any patches for it.

Issue never got fixed, The patch that enables GPIO driver to load
just got reverted. 

commit 2b844ba79f4a114bd228ad6fee040ffd99a0963d
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Fri Jan 17 14:23:29 2014 +0100

    Revert "ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs"

Comment 16 Adam Williamson 2014-03-03 17:01:33 UTC

Ah, right. But: I have that patch re-applied in my current builds, and they don't crash.

See https://github.com/AdamWill/baytrail-m/blob/master/kernel/MANIFEST:

* LPSS support, probably needed for sound to work
* https://lkml.org/lkml/2014/2/18/127
baytrail_acpi_pwm.patch
* https://lkml.org/lkml/2014/1/20/88
lpss_pwm.patch
* https://github.com/rjwysocki/linux-pm/commit/b2a51a6d0f96308251cfa41b793c43d0316e3b16
baytrail_gpio_lpss_ids.patch

I'm applying those three patches together, against 3.14rc4, and not hitting this issue.

Comment 17 Jin Yao 2014-03-04 07:55:53 UTC

Looks the IRQ value (16) is hardwired by T100 BIOS. 

Dump debug info in acpi_pci_irq_check_entry() before crash:
        0000:00:02[A] -> [16]

Dump part of ACPI DSDT table:
        Method (_PRT, 0, NotSerialized)  // _PRT: PCI Routing Table
        {
            If (PICM)
            {
                Return (AR00)
            }

            Return (PR00)
        }

        Name (AR00, Package (0x11)
        {
            Package (0x04)
            {
                0x0002FFFF, 
                Zero, 
                Zero, 
                0x10
            }, 
            ......
        }

That can explain why IRQ is used again after GPIO driver loading.

Comment 18 Mathias Nyman 2014-03-06 10:48:10 UTC

(In reply to Adam Williamson from comment #16)
> Ah, right. But: I have that patch re-applied in my current builds, and they
> don't crash.
> 
> See https://github.com/AdamWill/baytrail-m/blob/master/kernel/MANIFEST:
> 
> * LPSS support, probably needed for sound to work
> * https://lkml.org/lkml/2014/2/18/127
> baytrail_acpi_pwm.patch
> * https://lkml.org/lkml/2014/1/20/88
> lpss_pwm.patch
> *
> https://github.com/rjwysocki/linux-pm/commit/
> b2a51a6d0f96308251cfa41b793c43d0316e3b16
> baytrail_gpio_lpss_ids.patch
> 
> I'm applying those three patches together, against 3.14rc4, and not hitting
> this issue.

Oh, right, 3.14rc4 has some modifications already in it.

Short story:

You need one more patch to trigger this, which is not out anywhere,
I'll attach it here.

Long story:

3.14rc4 has some modifications to how gpio triggered acpi event handlers are registered, gpio driver no longer calls acpi_gpiochip_request_interrupts() itself, it's now done in gpiochip_add() automatically. Because pinctrl-baytrail.c calls gpiochip_add() before it sets up interrupts (creates the new irq domain etc.) the gpio triggered acpi events will not work on this version, (and not crash either) so a patch that moves  gpiochip_add() to after interrupt initialization is needed.    

The patches that changes the gpio triggered acpi event registration is here https://lkml.org/lkml/2013/11/26/104

the patch that needs to be applied to make it crash/work is attached

Comment 19 Mathias Nyman 2014-03-06 10:51:14 UTC

Created attachment 128281 [details]
patch  to trigger crash on 3.14.rc4

Comment 20 Jin Yao 2014-03-10 13:40:07 UTC

I think I got the root cause. Let's see 2 execution flows (based on 3.13 kernel):

Flow 1: GPIO
	acpi_gpiochip_request_interrupts() -> byt_gpio_to_irq() -> irq_create_mapping() -> 
		irq_alloc_desc_from()	/* IRQ16 is allocated */
		irq_domain_associate() ->
			domain->ops->map() /* Execute the callback byt_gpio_irq_map() */

			static int byt_gpio_irq_map(struct irq_domain *d, unsigned int virq,
						    irq_hw_number_t hw)
			{
				struct byt_gpio *vg = d->host_data;
				......			
				irq_set_chip_data(virq, vg);	
				......	
			}
			
			int irq_set_chip_data(unsigned int irq, void *data)
			{
				struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);
				......
				desc->irq_data.chip_data = data;
				......
			}

	Now the chip_data in irq_desc of IRQ16 is pointing to "byt_gpio".

Flow 2: i915
	acpi_pci_irq_enable()	/* From ACPI, we know the gsi=16 */
		-> acpi_register_gsi() -> mp_register_gsi() -> io_apic_set_pci_routing() ->
		io_apic_setup_irq_pin_once() -> io_apic_setup_irq_pin()
		
		/* irq = 16 here */
		io_apic_setup_irq_pin(unsigned int irq, int node, struct io_apic_irq_attr *attr)
		{
		/* A */	struct irq_cfg *cfg = alloc_irq_and_cfg_at(irq, node);
			int ret;
		
			if (!cfg)
				return -EINVAL;
			ret = __add_pin_to_irq_node(cfg, node, attr->ioapic, attr->ioapic_pin);
			if (!ret)
				setup_ioapic_irq(irq, cfg, attr);
			return ret;
		}

	The problem is at line A, actually the returned cfg is the chip_data in irq_desc of IRQ16, it's pointing to "byt_gpio" in flow 1.

Comment 21 Mathias Nyman 2014-03-10 16:23:54 UTC

Created attachment 128821 [details]
patch to solve crash

great catch, so io_apic code assumes all irqs descriptors belong to a io_apic irq chip, and that all chip_data is of type irq_cfg.

This patch checks that the interrupt belongs to ioapic before touching chip data.

With this patch device boots, but graphics probing fails because the legacy interrupt it wants is reserved.

Comment 22 Ashok Raj 2014-03-11 17:30:54 UTC

Guess this is a better way instead of panic.. Do you think some debug printk's would benefit? Simply because 

If this happens it appears this is a bug in ACPI DSL in the platform, or Linux. In this case it appears its a Linux bug? 

looking at the dsl for GPIO below, the IRQ number is 0x31... why would linux allocate 0x10 instead? 


        Device (GPO0)
        {
            Name (_ADR, Zero)  // _ADR: Address
            Name (_HID, "INT33FC")  // _HID: Hardware ID
            Name (_CID, "INT33FC")  // _CID: Compatible ID
            Name (_DDN, "ValleyView General Purpose Input/Output (GPIO) controller")  // _DDN: DOS Device Name
            Name (_UID, One)  // _UID: Unique ID
            Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
            {
                Name (RBUF, ResourceTemplate ()
                {
                    Memory32Fixed (ReadWrite,
                        0xFED0C000,         // Address Base
                        0x00001000,         // Address Length
                        )
                    Interrupt (ResourceConsumer, Level, ActiveLow, Shared, ,, )
                    {
                        0x00000031,
                    }
                })
                Return (RBUF)
            }

Comment 23 Mathias Nyman 2014-03-12 09:46:32 UTC

(In reply to Ashok Raj from comment #22)

> looking at the dsl for GPIO below, the IRQ number is 0x31... why would linux
> allocate 0x10 instead? 
> 
>                     Interrupt (ResourceConsumer, Level, ActiveLow, Shared,
> ,, )
>                     {
>                         0x00000031,
>                     }

This is the interrupt resource the gpio controller uses, this is not the issue in this case.

the GPIO controller works as an interrupt controller, the gpio controller is connected to only one interrupt, and driver only listens to that one (0x31). but the driver provides lots of interrupts to the system, one for each gpio line if needed. So when interrupt 0x31 is called the gpio interrupt handler checks which pin was asserted and calls the handler for that specific interrupt.

For example when sd card detection is connected to GPIO pin 38, the sd card driver asks the gpio driver what interrupt that corresponds to by calling gpio_to_irq().  gpio_to_irq() in the gpio driver maps gpio 38 to a irq descriptor by finding a free interrupt descriptor from a global bitmap of all descriptor. Once a free one is found it is returned to SD card driver. for example descriptor 40

SD card driver registers a handler for irq 40. When a sd card is inserted gpio line 38 is set high, the gpio controller fires an interrupt on (0x31), gpio driver interrupt handler is called, it checks which gpio was asserted (was line 38, it finds which interrupts gpio 38 is mapped to (40) and calls the interrupt handler for 40 (the interrupt handler registered by SD card)

Issue here is that on x86 the ioapic used to be the only one providing interrupts (well, int these ranges atleast I think) so ioapic code assumes that if a intrrupt descriptor is occupied in the global bitmap it belongs to an ioapic, and it accesses data that is only valid for ioapics which causes failures.

In this specific case we collide on interrupt descriptor 16. And it turns out the graphic driver does not even need interrupt.

So this needs to be fixed in two places:
1. make sure io_apic code does not access data in interrupt descriptors that belong to other "interrupt controllers", eg gpio than io_apic.

2. fix graphics driver / bios to not request the not needed irq 16 (graphics driver use msi interrupts)

-Mathias

Comment 24 Jin Yao 2014-03-13 04:12:41 UTC

Another question confuses me for a while why the GPIO controller registered IRQ can't be visible in /proc/interrupts? They should be 48/49/50 on ASUS T100. 

Following are the log of /proc/interrupts on my T100:

            CPU0       CPU1       CPU2       CPU3       
   0:         26          0          0          0   IO-APIC-edge      timer
   4:          2          0          0          0   IO-APIC-edge    
  16:          0          0          0          0  BYT-GPIO    6-demux     GPIO-signaled-ACPI-event
  18:          0          0          0          0  BYT-GPIO   18-demux     GPIO-signaled-ACPI-event
  32:          0          0          0          0   IO-APIC-fasteoi   80860F41:00
  33:          0          0          0          0   IO-APIC-fasteoi   80860F41:01
  34:          0          0          0          0   IO-APIC-fasteoi   80860F41:02
  35:          0          0          0          0   IO-APIC-fasteoi   80860F41:03
  36:          0          0          0          0   IO-APIC-fasteoi   80860F41:04
  37:       1033          0          0          0   IO-APIC-fasteoi   80860F41:05
  38:          0          0          0          0   IO-APIC-fasteoi   80860F41:06
  40:          0          0          0          0  BYT-GPIO   38-demux     sd_cd
  44:       2824          0          0          0   IO-APIC-fasteoi   mmc0
  46:        732          0          0          0   IO-APIC-fasteoi   mmc1
  47:          0          0          0          0   IO-APIC-fasteoi   mmc2
  69:          1          0          0          0   IO-APIC-fasteoi   ATML1000:00
 103:      26387          0          0          0   PCI-MSI-edge      xhci_hcd
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC:       3110       1788       1278       1685   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 IWI:        204        215        119        288   IRQ work interrupts
 RTR:          0          0          0          0   APIC ICR read retries
 RES:       2802       3989       2722       4779   Rescheduling interrupts
 CAL:        723        587        917       4521   Function call interrupts
 TLB:         36         68         18         43   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          1          1          1          1   Machine check polls
 ERR:          0
 MIS:          0

Comment 25 Jin Yao 2014-03-13 04:49:06 UTC

My understanding for Mathias's explanation is, when a sd card is inserted, interrupt 49 is fired, but interrupt 40 will not be fired. 

Interrupt 49 is belong to GPIO controller. In interrupt 49 handler it directly calls the interrupt 40 registered handler.

For IRQ16, my understanding it's different than interrupt 40. It has an individual interrupt handler thread which will execute the predefined ACPI handler.

For example, if pin6 on GPIO is triggered, interrupt 16 is fired. The interrupt handler thread of IRQ16 will execute the ACPI handler.

Correct me if my understanding is wrong.

Comment 26 Mathias Nyman 2014-03-13 12:16:17 UTC

(In reply to Jin Yao from comment #25)
> 
> For IRQ16, my understanding it's different than interrupt 40. It has an
> individual interrupt handler thread which will execute the predefined ACPI
> handler.
> 
> For example, if pin6 on GPIO is triggered, interrupt 16 is fired. The
> interrupt handler thread of IRQ16 will execute the ACPI handler.
> 
> Correct me if my understanding is wrong.

Same still goes for the gpio triggered ACPI events as for the sd card. Early when gpio driver calls gpiochip_add() we register interrupt handlers for gpio triggered acpi events.

registration simplified:

gpiochip_add()   // pinctrl-baytraul.c
 acpi_gpiochip_add();   // gpiolib.c
  acpi_gpiochip_request_interrupts()  // gpiolib-acpi.c
  // search for gpio interrupts and acpi handlers from ACPI tables.
  // finds pin 6, pin 6 will get mapped to irq 16 in here: 
  irq = chip->to_irq(chip, pin);  // calls baytrail gpio_to_irq() which maps.
  devm_request_threaded_irq(..,irq,.., acpi_gpio_irq_handler); //set handler for interrupt descriptor 16. (irq = 16, pin = 6))

When gpio line 6 is set high, interrupt (0x31) fires, gpio driver interrupt handler is called -> finds line 6 was high -> line 6 maps to interrupt descriptor 16, calls the interrupt handler set for interrupt 16 -> acpi_gpio_irq_handler()

static irqreturn_t acpi_gpio_irq_handler(int irq, void *data)
{
	acpi_handle handle = data;

        acpi_evaluate_object(handle, NULL, NULL, NULL);

        return IRQ_HANDLED;
}

Ready.

Comment 27 Mathias Nyman 2014-03-13 12:22:41 UTC

(In reply to Jin Yao from comment #24)
> Another question confuses me for a while why the GPIO controller registered
> IRQ can't be visible in /proc/interrupts? They should be 48/49/50 on ASUS
> T100. 

I'm not sure, I remember something about proc/interrupts not showing chained handlers.

irq_set_chained_handler(hwirq, byt_gpio_irq_handler);

Comment 28 Ashok Raj 2014-03-13 18:46:04 UTC

I just can't understand what the point in creating a irq domain, irq controller, but yet collide with a global name space for irq. 

since the gpio handler is the one manually calling the others, the irq's serviced by GPIO is just one required to get things going. 

pci seems to have the smarts to not request if msi was enabled, but i suspect its in a path which hits before the msi is enabled for graphics.

a simpler chance to avoid (again a hack) would be maybe have a base for the GPIO, and you request irq_create_mapping(vdomain, base+offset)

the real way to fix should be to have the irqs allocated from the interrupt controller domain, and then manage it that way.

Comment 29 Jin Yao 2014-03-14 01:45:37 UTC

Anoher issue I found is devm_request_threaded_irq() returning with failure. 

genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 16
byt_gpio INT33FC:02: Failed to request IRQ 16 ACPI event handler
Return value of devm_request_threaded_irq() is -22.

Following is my modification, then it works.

/* Assume BIOS sets the triggering, so no flags */
ret = devm_request_threaded_irq(chip->dev, irq, NULL, handler,
-                               0, "GPIO-signaled-ACPI-event",
+                               IRQF_ONESHOT, "GPIO-signaled-ACPI-event",
                                data);

Comment 30 Mathias Nyman 2014-03-14 08:25:11 UTC

(In reply to Jin Yao from comment #29)
> Anoher issue I found is devm_request_threaded_irq() returning with failure. 
> 
> genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 16
> byt_gpio INT33FC:02: Failed to request IRQ 16 ACPI event handler
> Return value of devm_request_threaded_irq() is -22.
> 
> Following is my modification, then it works.
> 
> /* Assume BIOS sets the triggering, so no flags */
> ret = devm_request_threaded_irq(chip->dev, irq, NULL, handler,
> -                               0, "GPIO-signaled-ACPI-event",
> +                               IRQF_ONESHOT, "GPIO-signaled-ACPI-event",
>                                 data);

This is a known issue being resolved by Mika W. in a patchseries:

http://marc.info/?l=linux-kernel&m=139445622116860&w=2

I also use the "IRQF_ONESHOT" flag while working on this issue until
patches are accepted.

Comment 31 Mathias Nyman 2014-03-14 08:30:09 UTC

(In reply to Ashok Raj from comment #28)
> I just can't understand what the point in creating a irq domain, irq
> controller, but yet collide with a global name space for irq. 

here's some explanation:
https://www.kernel.org/doc/Documentation/IRQ-domain.txt

Comment 32 Jin Yao 2014-03-18 05:54:50 UTC

I think the issue is at the "__irq_alloc_descs()".

__irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
		  struct module *owner)
{
	int start, ret;

	if (!cnt)
		return -EINVAL;

	if (irq >= 0) {
		if (from > irq)
			return -EINVAL;
		from = irq;
	}

	mutex_lock(&sparse_irq_lock);

	start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
					   from, cnt, 0);
	ret = -EEXIST;

	/* Jin Yao: irq = 16, start = 17 */
/* A */ if (irq >=0 && start != irq)
		goto err;

	if (start + cnt > nr_irqs) {
		ret = irq_expand_nr_irqs(start + cnt);
		if (ret)
			goto err;
	}

	bitmap_set(allocated_irqs, start, cnt);
	mutex_unlock(&sparse_irq_lock);
	return alloc_descs(start, cnt, node, owner);

err:
	mutex_unlock(&sparse_irq_lock);
	return ret;
}

At the case of i915 driver probing, the irq is 16 and bitmap_find_next_zero_area() returns 17. That's correct, because 16 is occupied by GPIO pin. The problem is at the line A, the checking is failed and goto "err:" directly.

Line A should be buggy code for this case, but I'm not sure if other code may be impacted if I just removed the line A here.

Comment 33 Jin Yao 2014-03-18 06:46:39 UTC

If the fix at the area of IRQ or graphics is not easy, can gpio take the responsiblity? I mean, since the irq16 is just a virtual interrupt, can it be replaced by GPIO internal variable ? 

For example, when gpio line 6 is set high, interrupt (0x31) fires, gpio driver interrupt handler is called -> finds line 6 was high -> line 6 maps to internal variable A -> calls the handler registered for internal variable A -> acpi_gpio_irq_handler()

Comment 34 Alan 2014-03-18 11:39:48 UTC

Given we now know this is low level irq naughtiness in the x86 tree I've moved this to an x86 bug so that the x86 maintainers also notice and hopefully comment.

Comment 35 Ashok Raj 2014-03-25 16:49:22 UTC

"low level irq naughtiness" :-) nicely put!

The whole irqdomain seems falling short of keeping these allocators local to the domain without clobbering with global irq. couple of ugly fixes possible, but long term i think irqdomain needs to be enhanced to do the right work.

Comment 36 Adam Williamson 2014-04-14 19:35:43 UTC

So https://lkml.org/lkml/2014/4/13/190 has been submitted as a workaround for this (without a note in the bug...), but it completely and utterly fails to apply against current 3.15, for me:

5 out of 5 hunks FAILED -- saving rejects to file drivers/pinctrl/pinctrl-baytrail.c.rej

was it written against / intended for 3.14? The email doesn't seem to make it clear.

Thanks!

Comment 37 John Wells 2014-04-15 08:24:12 UTC

Created attachment 132331 [details]
IRQ conflict workaround patch

IRQ conflict workaround patch by Jin Yao

Comment 38 John Wells 2014-04-15 08:25:36 UTC

Comment on attachment 132331 [details]
IRQ conflict workaround patch

Hey Adam

I couldn't get the patch to apply, so I applied it manually. Here's the diff against my 3.15 tree. See if it helps.

Comment 39 Adam Williamson 2014-04-16 18:14:49 UTC

7 out of 8 hunks fail, when applied against the current Fedora kernel tree, identified as 3.15.0-0.rc1.git2.1 , "Linux v3.15-rc1-49-g10ec34fcb100". That's whether I apply it before or after the patch from https://bugzilla.kernel.org/show_bug.cgi?id=67921 . No other patches in the Fedora kernel build seem to touch pinctrl-baytrail.c .

Comment 40 Adam Williamson 2014-04-17 01:55:39 UTC

OK, so I just rediffed the https://lkml.org/lkml/2014/4/13/190 version of the patch. I'm now building a kernel with both that patch and the fix for https://bugzilla.kernel.org/show_bug.cgi?id=67921 from Doug Johnson, with the bit of Doug's patch that flat out disabled IRQ allocation (to avoid *this* bug) taken out - the idea being to see if we can get SDIO devices showing up and avoid the IRQ allocation conflict problem, all at once. I'll see how that goes.

Comment 41 Jin Yao 2014-04-20 10:12:30 UTC

Could you try the patch https://lkml.org/lkml/2014/4/16/278?

Comment 42 Adam Williamson 2014-04-20 16:14:31 UTC

As noted elsewhere, I built a kernel as described in c#40 - with a rediffed version of the patch sent to ML, plus Doug's fix for 67921 - and it seems to 'work', as in SDIO devices are enumerated and the tablet doesn't crash. However, the touchscreen doesn't work, and Benjamin Tissoires believes this patch is the cause.

I'm now building a kernel without this patch, and with the version of Doug's fix that disables IRQ setup in pinctrl-baytrail.c , to see how that behaves.

I don't see how the version of the patch linked in c#41 is significantly different from the one I tested; is there some important difference?

Comment 43 Adam Williamson 2014-04-23 01:28:17 UTC

It now appears to be Doug Johnson's patch from https://bugzilla.kernel.org/show_bug.cgi?id=67921 which causes the touchscreen issue, not this one. I will confirm this soon, but we think that's the case.

Comment 44 Adam Williamson 2014-04-23 05:15:00 UTC

Well, hum.

First, I built a 3.15rc2 kernel that's stock Fedora Rawhide, except for disabling the backported hid-rmi driver which is also known to cause touchscreen issues. With that one, touchscreen works fine.

Then I added v3 of Doug Johnson's patch - http://dougvj.net/baytrail_gpio_quirk_v3.patch . With that kernel, touchscreen also works fine.

Then I added the patch for this bug, the version linked from c#41 (so the build has *both* the c#41 patch *and* Doug's v3 patch)...and the touchscreen stopped working again.

So, it looks to me like this patch really can break the touchscreen.

Comment 45 Mathias Nyman 2014-04-24 15:18:13 UTC

Proper solution to x86 io_apic code suggested by Thomas Gleixner

http://marc.info/?l=linux-kernel&m=139835044008576&w=2

It moves the region where the dynamic interrupt are started from to
an area after the hardwired GSI interrupts. This way the interrupt descriptors should not conflict

with this fix no workaround/hack should be needed to pinctrl-baytrail

Code untested.

Comment 46 Michael Shigorin 2014-11-10 17:24:33 UTC

3.18-rc4 with CONFIG_PINCTRL_BAYTRAIL=y works for me on hardware that used to break with this option enabled and work with it disabled based on 3.15 or so (Aava Inari 8 tablet in 64-bit mode).  Anyone?