Bug 197953 - Interrupt storm on ACER Chromebook CYAN
Summary: Interrupt storm on ACER Chromebook CYAN
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64 (show other bugs)
Hardware: x86-64 Linux
: P1 high
Assignee: Andy Shevchenko
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-21 21:59 UTC by Guenter Roeck
Modified: 2018-01-09 00:04 UTC (History)
6 users (show)

See Also:
Kernel Version: 4.14
Tree: Mainline
Regression: No


Attachments
acpidump -o tables.dat (105.71 KB, text/plain)
2017-11-21 21:59 UTC, Guenter Roeck
Details
dmesg (89.49 KB, text/plain)
2017-11-21 22:00 UTC, Guenter Roeck
Details
dmidecode (3.93 KB, text/plain)
2017-11-21 22:03 UTC, Guenter Roeck
Details
grep -H 15 /sys/bus/acpi/devices/*/status (1.27 KB, text/plain)
2017-11-21 22:04 UTC, Guenter Roeck
Details
lspci -vv -nk (12.50 KB, text/plain)
2017-11-21 22:06 UTC, Guenter Roeck
Details
/sys/kernel/debug/pinctrl/*/pins (9.28 KB, text/plain)
2017-11-22 17:12 UTC, Guenter Roeck
Details
/sys/kernel/debug/irq_domain_mapping (23.88 KB, text/plain)
2017-11-22 17:33 UTC, Guenter Roeck
Details
Another debug patch to test (2.07 KB, patch)
2017-11-22 19:21 UTC, Andy Shevchenko
Details | Diff
Updated dmesg after applying #16 (76.39 KB, text/plain)
2017-11-22 19:48 UTC, Guenter Roeck
Details
Another dmesg, taken after initialization is complete (163.59 KB, text/plain)
2017-11-22 19:52 UTC, Guenter Roeck
Details
Fix for wrong offset in /proc/interrupts (2.81 KB, patch)
2017-11-24 19:54 UTC, Andy Shevchenko
Details | Diff
Mask non-interrupt pins (2.15 KB, patch)
2017-11-27 20:01 UTC, Dmitry Torokhov
Details | Diff
Mask all interrupts the community is able to generate (not GPEs) (655 bytes, patch)
2017-11-28 19:59 UTC, Mika Westerberg
Details | Diff
Adjust interrupt line mapping (v2) (6.38 KB, patch)
2017-11-29 15:59 UTC, Andy Shevchenko
Details | Diff
Quirk masking all interrupts in Intel_Strago based systems (988 bytes, patch)
2017-11-30 11:43 UTC, Mika Westerberg
Details | Diff
Adjust interrupt line mapping (v3) (6.51 KB, patch)
2017-12-01 16:32 UTC, Andy Shevchenko
Details | Diff
260979: Adjust interrupt line mapping (v4) (6.52 KB, patch)
2017-12-01 16:38 UTC, Andy Shevchenko
Details | Diff
gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation (13.36 KB, patch)
2017-12-01 19:27 UTC, Andy Shevchenko
Details | Diff
Adjust interrupt line mapping (v5) (6.77 KB, patch)
2017-12-01 19:28 UTC, Andy Shevchenko
Details | Diff

Description Guenter Roeck 2017-11-21 21:59:48 UTC
Created attachment 260765 [details]
acpidump -o tables.dat

I get an interrupt storm when running an image based on 4.14 on cyan (more specifically it is a prototype of chromeos-4.14, ie 4.14 with Chrome OS patches on top of it).

[   60.089234] ->handle_irq():  ffffffffbe2f803f, 
[   60.090455] 0xffffffffbf2af380
[   60.090510] handle_bad_irq+0x0/0x2e5
[   60.090522] ->irq_data.chip(): ffffffffbf2af380, 
[   60.090553]    IRQ_NOPROBE set
[   60.090584] ->handle_irq():  ffffffffbe2f803f, 
[   60.090590] handle_bad_irq+0x0/0x2e5
[   60.090596] ->irq_data.chip(): ffffffffbf2af380, 
[   60.090602] 0xffffffffbf2af380
[   60.090608] ->action():           (null)
[   60.090779] handle_bad_irq+0x0/0x2e5

and so on. The output is that messy because there are just too many bad interrupts.

/proc/interrupts shows:

115:      78302      77589      77478      76348  chv-gpio    0
 135:          0          1          0          0  chv-gpio   20  TS3A227E
 171:     196025     196802     196579     197818  chv-gpio    0
 180:          0          0          0          0  chv-gpio    9  ACPI:Event
 182:          3          1          1          5  chv-gpio   11  i8042
 183:          0          0          0          0  chv-gpio   12  ELAN0000:00
 184:          0          0          0          0  chv-gpio   13  ELAN0001:00
 300:          0          0          0          0  chv-gpio   46  max98090_interrupt
 304:          0          0          0          0  chv-gpio   50  INT33BB:01 cd

Notice interrupts 115 and 171; those are the "bad" ones.

Reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/pinctrl/intel/pinctrl-cherryview.c?id=845e405e5e6c9dc9ed10306a4b5bfeaefebc2e84 
fixes the interrupt storm problem, but then the keyboard and touchpad do not work.
Comment 1 Guenter Roeck 2017-11-21 22:00:47 UTC
Created attachment 260767 [details]
dmesg
Comment 2 Guenter Roeck 2017-11-21 22:02:21 UTC
Comment on attachment 260767 [details]
dmesg

dmesg output taken after making the following change to avoid excessive logging.

--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -30,7 +30,11 @@ void handle_bad_irq(struct irq_desc *desc)
 {
 	unsigned int irq = irq_desc_get_irq(desc);
 
+#if 1
+	WARN_ONCE(1, "irq %d is bad\n", irq);
+#else
 	print_irq_desc(irq, desc);
+#endif
 	kstat_incr_irqs_this_cpu(desc);
 	ack_bad_irq(irq);
 }
Comment 3 Guenter Roeck 2017-11-21 22:02:22 UTC
Comment on attachment 260767 [details]
dmesg

dmesg output taken after making the following change to avoid excessive logging.

--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -30,7 +30,11 @@ void handle_bad_irq(struct irq_desc *desc)
 {
 	unsigned int irq = irq_desc_get_irq(desc);
 
+#if 1
+	WARN_ONCE(1, "irq %d is bad\n", irq);
+#else
 	print_irq_desc(irq, desc);
+#endif
 	kstat_incr_irqs_this_cpu(desc);
 	ack_bad_irq(irq);
 }
Comment 4 Guenter Roeck 2017-11-21 22:03:42 UTC
Created attachment 260769 [details]
dmidecode
Comment 5 Guenter Roeck 2017-11-21 22:04:59 UTC
Created attachment 260771 [details]
grep -H 15 /sys/bus/acpi/devices/*/status
Comment 6 Guenter Roeck 2017-11-21 22:06:32 UTC
Created attachment 260773 [details]
lspci -vv -nk
Comment 7 Andy Shevchenko 2017-11-22 15:22:55 UTC
On the first glance ACPI tables are broken in different ways. Though I need to dig documentation to understand if it's so or not.

While continue looking at DSDT, I have got one question to you (or maybe two): 
- is this device have latest and greatest BIOS (coreboot) flashed?
- is the device itself a prototype or actual device on market just with new software on top?
Comment 8 Mika Westerberg 2017-11-22 16:42:09 UTC
To me it looks like there are two pins that are misconfigured by the firmware in a way that they have a) interrupt triggering enabled and b) the line is active all the time.

Just to be sure, can you also attach content of /sys/kernel/debug/pinctrl/*/pins to the bug?
Comment 9 Guenter Roeck 2017-11-22 17:10:45 UTC
A similar problem is also seen on Celes (Samsung Chromebook 3), though there is it not an interrupt storm but stops after a while.

(GOOGLE Celes, BIOS Google_Celes.7287.92.104 08/20/2017)

[    0.662722] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.662728] ->handle_irq():  ffffffff87eb4091, 
[    0.662736] handle_bad_irq+0x0/0x1e8
[    0.662739] ->irq_data.chip(): ffffffff88aab200, 
[    0.662742] 0xffffffff88aab200
[    0.662746] ->action():           (null)
[    0.662748]    IRQ_NOPROBE set
[    0.662751] unexpected IRQ trap at vector ab
[    0.662875] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.662879] ->handle_irq():  ffffffff87eb4091, 
[    0.662887] handle_bad_irq+0x0/0x1e8
[    0.662890] ->irq_data.chip(): ffffffff88aab200, 
[    0.662893] 0xffffffff88aab200
[    0.662896] ->action():           (null)
[    0.662899]    IRQ_NOPROBE set
[    0.662902] unexpected IRQ trap at vector ab
[    0.662948] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.662951] ->handle_irq():  ffffffff87eb4091, 
[    0.662954] handle_bad_irq+0x0/0x1e8
[    0.662957] ->irq_data.chip(): ffffffff88aab200, 
[    0.662959] 0xffffffff88aab200
[    0.662962] ->action():           (null)
[    0.662964]    IRQ_NOPROBE set
[    0.662966] unexpected IRQ trap at vector ab
[    0.662942] i8042: Warning: Keylock active
[    0.662990] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.662993] ->handle_irq():  ffffffff87eb4091, 
[    0.662996] handle_bad_irq+0x0/0x1e8
[    0.662999] ->irq_data.chip(): ffffffff88aab200, 
[    0.663000] 0xffffffff88aab200
[    0.663003] ->action():           (null)
[    0.663005]    IRQ_NOPROBE set
[    0.663008] unexpected IRQ trap at vector ab
[    0.663027] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.663031] ->handle_irq():  ffffffff87eb4091, 
[    0.663034] handle_bad_irq+0x0/0x1e8
[    0.663037] ->irq_data.chip(): ffffffff88aab200, 
[    0.663039] 0xffffffff88aab200
[    0.663042] ->action():           (null)
[    0.663044]    IRQ_NOPROBE set
[    0.663047] unexpected IRQ trap at vector ab
[    0.663063] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.663065] ->handle_irq():  ffffffff87eb4091, 
[    0.663068] handle_bad_irq+0x0/0x1e8
[    0.663071] ->irq_data.chip(): ffffffff88aab200, 
[    0.663073] 0xffffffff88aab200
[    0.663076] ->action():           (null)
[    0.663078]    IRQ_NOPROBE set
[    0.663080] unexpected IRQ trap at vector ab
[    0.663128] irq 171, desc: ffff92727a3d0400, depth: 1, count: 0, unhandled: 0
[    0.663131] ->handle_irq():  ffffffff87eb4091, 
[    0.663134] handle_bad_irq+0x0/0x1e8
[    0.663137] ->irq_data.chip(): ffffffff88aab200, 
[    0.663139] 0xffffffff88aab200
[    0.663142] ->action():           (null)
[    0.663144]    IRQ_NOPROBE set
[    0.663147] unexpected IRQ trap at vector ab
Comment 10 Guenter Roeck 2017-11-22 17:12:25 UTC
Created attachment 260793 [details]
/sys/kernel/debug/pinctrl/*/pins
Comment 11 Andy Shevchenko 2017-11-22 17:13:33 UTC
We need also /sys/kernel/debug/irq_domain_mapping when IRQ domain debug option is set in kernel configuration.
If you can get from both affected machines, it would be great.
Comment 12 Guenter Roeck 2017-11-22 17:28:06 UTC
BIOS: I don't know; I'll need to check.
The HW itself is a prototype, but HW-wise it should be close if not the same as the actually shipping device (Acer Chromebook R11). The kernel fails to boot with release units in the lab. Also, the BIOS is known to have an unfixed problem with fixed interrupt numbers (see https://bugs.chromium.org/p/chromium/issues/detail?id=704198; as far as I know the fix did not find its way into a release BIOS). I have no idea if the fix posted with that bug fixes this problem.

I'll try to repeat the test with a release unit and post the result here.

[Note that #9 was with a release unit]
Comment 13 Guenter Roeck 2017-11-22 17:33:30 UTC
Created attachment 260795 [details]
/sys/kernel/debug/irq_domain_mapping
Comment 14 Andy Shevchenko 2017-11-22 18:09:31 UTC
Can you apply below patch (it's mangled, though I think it's could be copied'n'pasted to the ->probe() function manually) and put the result here?

--- a/drivers/pinctrl/intel/pinctrl-cherryview.c
+++ b/drivers/pinctrl/intel/pinctrl-cherryview.c
@@ -1605,6 +1605,9 @@ static int chv_gpio_probe(struct chv_pinctrl *pctrl, int irq)
                offset += range->npins;
        }
 
+dev_info(pctrl->dev, "INTSTAT: %x", readl(pctrl->regs + CHV_INTSTAT));
+dev_info(pctrl->dev, "INTMASK: %x", readl(pctrl->regs + CHV_INTMASK));
+
        /* Do not add GPIOs that can only generate GPEs to the IRQ domain */
        for (i = 0; i < pctrl->community->npins; i++) {
                const struct pinctrl_pin_desc *desc;
Comment 15 Guenter Roeck 2017-11-22 18:19:33 UTC
Here you are:

[    1.620833] cherryview-pinctrl INT33FF:00: INTSTAT: 2
[    1.620856] cherryview-pinctrl INT33FF:00: INTMASK: e
[    1.661506] cherryview-pinctrl INT33FF:01: INTSTAT: 21
[    1.661531] cherryview-pinctrl INT33FF:01: INTMASK: 12f
[    1.706562] cherryview-pinctrl INT33FF:02: INTSTAT: 0
[    1.706586] cherryview-pinctrl INT33FF:02: INTMASK: 0
[    1.736480] cherryview-pinctrl INT33FF:03: INTSTAT: 0
[    1.736504] cherryview-pinctrl INT33FF:03: INTMASK: 1
Comment 16 Andy Shevchenko 2017-11-22 19:21:43 UTC
Created attachment 260797 [details]
Another debug patch to test

Please, apply this one and share results (from dmesg).

P.S. Very interesting bug, indeed.
Comment 17 Guenter Roeck 2017-11-22 19:47:15 UTC
With the patch in #16, the bad interrupt message is gone, but I get an endless sequence of:

[   51.705063] chv_gpio_irq_handler: 201673 callbacks suppressed
[   51.705082] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705158] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705223] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705283] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705367] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705434] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705489] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705545] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705629] cherryview-pinctrl INT33FF:01: 20 12f
[   51.705685] cherryview-pinctrl INT33FF:01: 20 12f

Keyboard and touchpad both work.

Relevant data from /proc/interrupts is now:

 135:          0          0          1          0  chv-gpio   20  TS3A227E
 180:          0          0          0          0  chv-gpio    9  ACPI:Event
 182:          1          1          5          3  chv-gpio   11  i8042
 183:          0          0          0          0  chv-gpio   12  ELAN0000:00
 184:          0          0          0          0  chv-gpio   13  ELAN0001:00
 300:          0          0          0          0  chv-gpio   46  max98090_interrupt
 304:          0          0          0          0  chv-gpio   50  INT33BB:01 cd

meaning the "0" entries are gone.
Comment 18 Guenter Roeck 2017-11-22 19:48:01 UTC
Created attachment 260799 [details]
Updated dmesg after applying #16
Comment 19 Guenter Roeck 2017-11-22 19:51:22 UTC
After some more initialization:

[  501.795021] chv_gpio_irq_handler: 479851 callbacks suppressed
[  501.795046] cherryview-pinctrl INT33FF:01: 20 12f
[  501.795055] cherryview-pinctrl INT33FF:00: 2 e
[  501.795087] cherryview-pinctrl INT33FF:00: 2 e
[  501.795135] cherryview-pinctrl INT33FF:01: 20 12f
[  501.795147] cherryview-pinctrl INT33FF:00: 2 e
[  501.795175] cherryview-pinctrl INT33FF:00: 2 e
[  501.795194] cherryview-pinctrl INT33FF:01: 20 12f
[  501.795226] cherryview-pinctrl INT33FF:01: 20 12f
[  501.795231] cherryview-pinctrl INT33FF:00: 2 e
[  501.795258] cherryview-pinctrl INT33FF:00: 2 e
[  506.796027] chv_gpio_irq_handler: 482149 callbacks suppressed

I'll attach another dmesg.
Comment 20 Guenter Roeck 2017-11-22 19:52:21 UTC
Created attachment 260801 [details]
Another dmesg, taken after initialization is complete
Comment 21 Andy Shevchenko 2017-11-22 20:10:59 UTC
One bug that I found in the driver kinda fixed by the patch.

The issue we have now is due to coreboot default settings:

=== From Linux OS ===
pin 6 (GPIO_DFX_8) GPIO 0x58008200 0x05c00001

Note very first 5 in the line which means interrupt line chosen, which becomes BIT(5) in INTSTAT and in INTMASK, i.e. 0x20 with unmasked IRQ 0x12f & 0x20 == 0x20:

"Interrupt Mask (Interrupt_Mask): Mask bits for the 16 interrupts. Interrupt masked when 0."

=== From Coreboot sources ===
GPI(trig_edge_low, L5, NA, non_maskable, en_rx_data, NA, NA),

It looks like some frequency is on DFX_8 pin.
Comment 22 Andy Shevchenko 2017-11-22 20:15:20 UTC
[  501.795231] cherryview-pinctrl INT33FF:00: 2 e

Same story with HDA, I suppose. 

pin 37 (MF_HDA_DOCKENB) GPIO 0x24c08201 0x05c00003

GPI(trig_edge_both, L2, P_1K_H, non_maskable, en_edge_detect, NA, NA),
Comment 23 Andy Shevchenko 2017-11-22 20:30:51 UTC
Pierre, you might be interested in the ALSA related issues here:
1. Potential dead lock.
2. HDA has two GPIO pins dedicated for interrupts, but it looks like driver only requests one by some reason.

Sorry, if not. Feel free to remove from Cc in that case.
Comment 24 Guenter Roeck 2017-11-22 21:44:05 UTC
Follow-up to #7, #12: I have confirmed that the problem is seen on production units in the lab with latest firmware.

For #21 .. 23: Andy - are you going to send a patch with a fix for the driver problem ? Also, is there anything we can do to work around the problem, such as disabling an interrupt if an interrupt is received but there is no registered handler ?

Thanks,
Guenter
Comment 25 Guenter Roeck 2017-11-22 22:01:23 UTC
Commit bcb48cca23ec9 ("pinctrl: cherryview: Do not mask all interrupts in probe")   may explain why the problem is not seen with older kernels.
Comment 26 Mika Westerberg 2017-11-23 11:14:15 UTC
From the dmesg output, if I read it right we have this:

cherryview-pinctrl INT33FF:01: 20 12f

I guess first one is status and second is mask. Now this seems to indicate the reason is intr_line 5. Looking at your pins file for INT33FF:01 there is this:

pin 6 (GPIO_DFX_8) GPIO 0x58008200 0x05c00001

the 5 in the first dword is the intr_line. This pin seems to be triggering from falling edges. We can't see it here, though. One idea is that you could request this GPIO from the driver through sysfs and see if it stops interrupting:

# cat /sys/kernel/debug/pinctrl/INT33FF\:01/gpio-ranges 
GPIO ranges handled:
0: INT33FF:01 GPIOS [397 - 405] PINS [0 - 8]
9: INT33FF:01 GPIOS [406 - 418] PINS [15 - 27]
22: INT33FF:01 GPIOS [419 - 430] PINS [30 - 41]
34: INT33FF:01 GPIOS [431 - 442] PINS [45 - 56]
46: INT33FF:01 GPIOS [443 - 455] PINS [60 - 72]
# echo $((397+6)) > /sys/class/gpio/export 
[   46.731362] cherryview-pinctrl INT33FF:01: request pin 6 (GPIO_DFX_8) for INT33FF:01:403
Comment 27 Andy Shevchenko 2017-11-24 19:54:44 UTC
Created attachment 260829 [details]
Fix for wrong offset in /proc/interrupts

Guenter, please apply the fix, it should reveal the real problematic pins.
Comment 28 Dmitry Torokhov 2017-11-24 22:56:55 UTC
I was playing with the driver as well and I see that INT33FF:00 always reports pending 0x02 in interrupt routine. Trying to get mapping for the pin gives 0 and it never gets ACKed properly. If I roughly do:

if (!irq) {
        // reset corresponding INTSTAT bit
} else {
        generic_handle_irq(irq);
}

then it starts working much better.
Comment 29 Guenter Roeck 2017-11-25 04:35:36 UTC
I'll test the patch from #27 Monday - I don't have access to the system right now.
Comment 30 Mika Westerberg 2017-11-27 15:54:50 UTC
(In reply to Dmitry Torokhov from comment #28)
> I was playing with the driver as well and I see that INT33FF:00 always
> reports pending 0x02 in interrupt routine. Trying to get mapping for the pin
> gives 0 and it never gets ACKed properly. If I roughly do:
> 
> if (!irq) {
>         // reset corresponding INTSTAT bit
> } else {
>         generic_handle_irq(irq);
> }
> 
> then it starts working much better.

Unfortunately interrupt 0 is also valid so if anything is using that, it will suddenly stop working.

I think the best option is to identify the misconfigured pins (with the help of Andy's latest patc) and then update coreboot to make sure they don't generate interrupts (disable interrupt trigger flags for example from padcfg1).
Comment 31 Pierre Bossart 2017-11-27 17:31:02 UTC
IIRC, I saw the same interrupt storm issue when running Gallium on a Cyan device in legacy mode (which I understand as not using Coreboot), I could run maybe 5mn before the system ran out of disk space due to the system logs...
Comment 32 Guenter Roeck 2017-11-27 17:39:29 UTC
With the patch from #27 applied, plus debug log messages, I get the following:

[   45.198033] chv_gpio_irq_handler: 532644 callbacks suppressed
[   45.198036] cherryview-pinctrl INT33FF:01: irq 48: mapped to irq 0, offset -1
[   45.198048] cherryview-pinctrl INT33FF:00: irq 49: mapped to irq 0, offset -1
[   45.198067] cherryview-pinctrl INT33FF:01: irq 48: 20 12f
[   45.198070] cherryview-pinctrl INT33FF:00: irq 49: 2 6
[   45.198076] cherryview-pinctrl INT33FF:00: irq 49: mapped to irq 0, offset -1
[   45.198094] cherryview-pinctrl INT33FF:01: irq 48: mapped to irq 0, offset -1
[   45.198101] cherryview-pinctrl INT33FF:00: irq 49: 2 6
[   45.198108] cherryview-pinctrl INT33FF:00: irq 49: mapped to irq 0, offset -1
[   45.198127] cherryview-pinctrl INT33FF:00: irq 49: 2 6
[   45.198138] cherryview-pinctrl INT33FF:00: irq 49: mapped to irq 0, offset -1
[   45.198145] cherryview-pinctrl INT33FF:01: irq 48: 20 12f
[   45.198156] cherryview-pinctrl INT33FF:01: irq 48: mapped to irq 0, offset -1
[   45.198164] cherryview-pinctrl INT33FF:00: irq 49: 2 6
[   45.198169] cherryview-pinctrl INT33FF:00: irq 49: mapped to irq 0, offset -1
[   45.198188] cherryview-pinctrl INT33FF:00: irq 49: 2 6
[   45.198197] cherryview-pinctrl INT33FF:00: irq 49: mapped to irq 0, offset -1
[   45.198212] cherryview-pinctrl INT33FF:01: irq 48: 20 12f
[   45.198221] cherryview-pinctrl INT33FF:01: irq 48: mapped to irq 0, offset -1

Log messages generated with:

 	pending = readl(pctrl->regs + CHV_INTSTAT);
+	dev_info_ratelimited(pctrl->dev, "irq %d: %lx %x\n",
+		irq_desc_get_irq(desc), pending, readl(pctrl->regs + CHV_INTMASK));

The offending interrupts are no longer listed in /proc/interrupts, there is no log output unless I generate it myself.
Note that the patch passes the negative offset to irq_find_mapping() which doesn't accept negative values as parameter; this seems wrong.
Comment 33 Guenter Roeck 2017-11-27 18:01:40 UTC
The following code snippet solves (works around) the problem for me.

 	for_each_set_bit(intr_line, &pending, pctrl->community->nirqs) {
 		unsigned irq, offset;
 
 		offset = pctrl->intr_lines[intr_line];
+		if (offset == CHV_INVALID_HWIRQ) {
+			u32 value;
+
+			dev_warn_ratelimited(pctrl->dev,
+					     "Disabling chained interrupt line %d\n",
+					     intr_line);
+
+			value = readl(pctrl->regs + CHV_INTMASK);
+			value &= ~BIT(intr_line);
+			chv_writel(value, pctrl->regs + CHV_INTMASK);
+			chv_writel(BIT(intr_line), pctrl->regs + CHV_INTSTAT);
+			continue;
+		}
+

Resulting log output is:

[    0.187869] cherryview-pinctrl INT33FF:01: Disabling chained interrupt line 5
[    0.847305] cherryview-pinctrl INT33FF:01: Disabling chained interrupt line 0
[   19.164825] cherryview-pinctrl INT33FF:00: Disabling chained interrupt line 1

Would that be acceptable as add-on patch (maybe with a stronger error message) ?

Guenter
Comment 34 Mika Westerberg 2017-11-27 18:26:52 UTC
Hmm, the patch from Andy should expose the offending interrupts in /proc/interrupts.

Note, we can't just go an disable anything because the same thing controls direct interrupts to the IO-APIC.

I still think this is configuration issue and the coreboot you are running misconfigures pins to trigger interrupts for pins that are floating or so. That's why we need to identify those pins and then implement a fix to the correct place which I think is still coreboot, not the driver.

Did you already try to request GPIO 403 from sysfs (without any patches)? Does it change anything?
Comment 35 Guenter Roeck 2017-11-27 18:59:05 UTC
Snippet from /proc/interrupt with Andy's patch applied:

135:          1          0          0          0  chv-gpio   20  TS3A227E
 180:          0          0          0          0  chv-gpio    9  ACPI:Event
 182:          1          2          4          3  chv-gpio   11  i8042
 183:          0          0          0          0  chv-gpio   12  ELAN0000:00
 184:          0          0          0          0  chv-gpio   13  ELAN0001:00
 300:          0          0          0          0  chv-gpio   46  max98090_interrupt
 304:          0          0          0          0  chv-gpio   50  INT33BB:01 cd

Unfortunately I don't see the 'bad' interrupts.

Requesting gpio pin 403 with or without Andy's patch applied does not make a difference.

I understand that this is a (coreboot ?) configuration issue. I'd just like to have a workaround in place to at least have a working image. In that sense, Andy's patch makes the situation even worse: It generates an endless amount of interrupts with no notice, warning, or any other output whatsoever.
Comment 36 Mika Westerberg 2017-11-27 19:05:32 UTC
I think the easiest workaround is to add:

chv_writel(0, pctrl->regs + CHV_INTMASK);

to chv_gpio_probe() right before CHV_INTSTAT is cleared. That should mask all interrupts.
Comment 37 Guenter Roeck 2017-11-27 19:16:01 UTC
#36: Maybe, but that would revert bcb48cca23ec9 ("pinctrl: cherryview: Do not mask all interrupts in probe"). Presumably you had a reason for applying that patch, which makes me hesitate to revert it.

On another note, what happens if the BIOS/coreboot enables interrupts but Linux doesn't load/support the necessary driver(s) ? Even if the coreboot problem is fixed, wouldn't we end up with the same problem ?
Comment 38 Dmitry Torokhov 2017-11-27 20:01:47 UTC
Created attachment 260905 [details]
Mask non-interrupt pins

So it looks like the pin that gives me trouble is pin 1 in SouthWest community, which is FST_SPI_D0. It is configured as a native function, not interrupt, but still is enabled in INTMASK register. If I mask all pins that are not configured as interrupts, then everything seems happy here.
Comment 39 Dmitry Torokhov 2017-11-27 20:05:15 UTC
Logs with my patch:

[    0.087998] cherryview-pinctrl INT33FF:00: chv_gpio_probe: pin 34 is not configured as interrupt, resetting interrupt mask
[    0.088024] cherryview-pinctrl INT33FF:00: chv_gpio_probe: pin 37 is not configured as interrupt, resetting interrupt mask
[    0.088072] cherryview-pinctrl INT33FF:00: chv_gpio_probe: pin 95 is not configured as interrupt, resetting interrupt mask
[    0.088713] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 0 is not configured as interrupt, resetting interrupt mask
[    0.088749] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 15 is not configured as interrupt, resetting interrupt mask
[    0.089921] cherryview-pinctrl INT33FF:03: chv_gpio_probe: pin 0 is not configured as interrupt, resetting interrupt mask
Comment 40 Guenter Roeck 2017-11-27 20:58:56 UTC
Patch in #39 works for me. Log output on my system:

[    0.172618] cherryview-pinctrl INT33FF:00: chv_gpio_probe: pin 34 is not configured as interrupt, resetting interrupt mask
[    0.172631] cherryview-pinctrl INT33FF:00: chv_gpio_probe: pin 37 is not configured as interrupt, resetting interrupt mask
[    0.172681] cherryview-pinctrl INT33FF:00: chv_gpio_probe: pin 95 is not configured as interrupt, resetting interrupt mask
[    0.177315] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 0 is not configured as interrupt, resetting interrupt mask
[    0.177344] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 6 is not configured as interrupt, resetting interrupt mask
[    0.177363] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 15 is not configured as interrupt, resetting interrupt mask
[    0.177381] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 18 is not configured as interrupt, resetting interrupt mask
[    0.177391] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 19 is not configured as interrupt, resetting interrupt mask
[    0.177406] cherryview-pinctrl INT33FF:01: chv_gpio_probe: pin 21 is not configured as interrupt, resetting interrupt mask
[    0.185332] cherryview-pinctrl INT33FF:03: chv_gpio_probe: pin 0 is not configured as interrupt, resetting interrupt mask

and no more spurious interrupts. This is with "pinctrl: cherryview: Adjust interrupt line mapping" applied as well.

That won't solve the other problem I mentioned, though: If Linux doesn't support all required drivers for which interrupts are enabled by the BIOS or coreboot, we might be back to having an interrupt storm.

This leads to a follow-up question: Per commit bcb48cca23ec9, interrupts can not be masked because other entities, specifically "SCI and other direct interrupts", may need to have those interrupts to be enabled. If so, why don't the respective drivers request and enable those interrupts ?
Comment 41 Pierre Bossart 2017-11-28 01:20:44 UTC
(In reply to Guenter Roeck from comment #40)
> Patch in #39 works for me. Log output on my system:

> 
> and no more spurious interrupts. This is with "pinctrl: cherryview: Adjust
> interrupt line mapping" applied as well.

I can't see this patch in this bugzilla or the mainline, can you provide a pointer?
I need to run some audio tests on Cyan and would like to apply all suggested fixes.
Thanks!
Comment 42 Guenter Roeck 2017-11-28 02:11:24 UTC
#41: The patch is titled " Fix for wrong offset in /proc/interrupts " in attachments.
Comment 43 Dmitry Torokhov 2017-11-28 04:20:49 UTC
Guenter meant "Mask non-interrupt pins".
Comment 44 Mika Westerberg 2017-11-28 07:48:17 UTC
(In reply to Guenter Roeck from comment #40)
> That won't solve the other problem I mentioned, though: If Linux doesn't
> support all required drivers for which interrupts are enabled by the BIOS or
> coreboot, we might be back to having an interrupt storm.

The way it works (if using direct interrupts) is that the GPIO hardware needs to have the interrupt enabled in INTMASK and the it needs to be routed to the IO-APIC (there is a bit somewhere that BIOS/coreboot sets). Once properly done, the interrupt is routed to the IO-APIC where it is masked by default. The GPIO driver never gets interrupted by those. This means that we don't get spurious interrupts at all and drivers using the IO-APIC interrupt then enables the interrupt when needed using standard APIs.

At least that's how I remember it worked.

So it is important that the coreboot/BIOS configures all pins and masks properly to prevent any spurious interrupts.

> This leads to a follow-up question: Per commit bcb48cca23ec9, interrupts can
> not be masked because other entities, specifically "SCI and other direct
> interrupts", may need to have those interrupts to be enabled. If so, why
> don't the respective drivers request and enable those interrupts ?

It depends on the system. Your system may not use SCI at all (there is "acpi" entry in /proc/interrupts). SCI for one is requested and enabled using standard APIs.
Comment 45 Mika Westerberg 2017-11-28 12:02:24 UTC
(In reply to Mika Westerberg from comment #44)
> (In reply to Guenter Roeck from comment #40)
> > That won't solve the other problem I mentioned, though: If Linux doesn't
> > support all required drivers for which interrupts are enabled by the BIOS
> or
> > coreboot, we might be back to having an interrupt storm.
> 
> The way it works (if using direct interrupts) is that the GPIO hardware
> needs to have the interrupt enabled in INTMASK and the it needs to be routed
> to the IO-APIC (there is a bit somewhere that BIOS/coreboot sets). Once
> properly done, the interrupt is routed to the IO-APIC where it is masked by
> default. The GPIO driver never gets interrupted by those. This means that we
> don't get spurious interrupts at all and drivers using the IO-APIC interrupt
> then enables the interrupt when needed using standard APIs.
> 
> At least that's how I remember it worked.

It seems my memory about this is bit rusty :/ Checked from the documentation and if I understand it correctly the GPIO driver always gets interrupt when any of the pins are configured to trigger an interrupt with the exception of GPEs (there is another register GPIO_ROUT in PCU/PMC that controls whether GPE or SMI is generated).

So for things to work properly the BIOS/coreboot should pass GPIOs to drivers using ACPI Gpio* resources and by default make sure they are masked.
Comment 46 Guenter Roeck 2017-11-28 18:00:05 UTC
So is Dmitry's patch an acceptable workaround ?
Comment 47 Mika Westerberg 2017-11-28 18:34:57 UTC
(In reply to Guenter Roeck from comment #46)
> So is Dmitry's patch an acceptable workaround ?

You mean for the mainline Linux? No, I think coreboot should be fixed instead.
Comment 48 Dmitry Torokhov 2017-11-28 19:04:25 UTC
This is ridiculous. We essentially broke shipped hardware (Cyan used to work fine) and now require BIOS update to address a regression? While new BIOS is a possibility I have no idea how you expect to communicate this to users who do not run ChromeOS and thus will not get updated firmware automatically.
Comment 49 Dmitry Torokhov 2017-11-28 19:18:56 UTC
That said, the patch above is broken; I did not actually read interrupt config before checking and masking it :/
Comment 50 Mika Westerberg 2017-11-28 19:26:11 UTC
(In reply to Dmitry Torokhov from comment #48)
> This is ridiculous. We essentially broke shipped hardware (Cyan used to work
> fine) and now require BIOS update to address a regression? While new BIOS is
> a possibility I have no idea how you expect to communicate this to users who
> do not run ChromeOS and thus will not get updated firmware automatically.

What? Please point me to the regression we caused by a kernel change and we will fix it. Which commit broke this device we are talking about here?

My understanding was that this is *new* hardware based on CYAN design and the GPIO has never worked any better there. Please correct me if I'm mistaken.
Comment 51 Dmitry Torokhov 2017-11-28 19:29:25 UTC
No, we are talking about bog standard Acer Chromebook R11 (Cyan) with mainline kernel. The patch:

ommit bcb48cca23ec9852739e4a464307fa29515bbe48
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date:   Mon Aug 22 14:42:52 2016 +0300

    pinctrl: cherryview: Do not mask all interrupts in probe

breaks it badly.
Comment 52 Andy Shevchenko 2017-11-28 19:33:47 UTC
I'm in the middle of reading one document we never saw before (I will discuss my findings with Mika tomorrow) and I can just summarize that Mika's patch *reveals* the issue in Coreboot. In similar way as previously with that absolute IRQ numbering scheme for keyboard.
Comment 53 Dmitry Torokhov 2017-11-28 19:42:42 UTC
It does not matter what the patch reveals, we still have to deal with it. The hardware has been in users hands for couple of years now.
Comment 54 Andy Shevchenko 2017-11-28 19:44:47 UTC
Something is telling me we will anticipate more DMI quirks for those broken platforms.
Comment 55 Mika Westerberg 2017-11-28 19:59:53 UTC
Created attachment 260915 [details]
Mask all interrupts the community is able to generate (not GPEs)

Can you try the attached (untested) patch? Here we mask all interrupts the community can generate but do not touch GPEs. That should still keep GPEs running and at the same time prevent the interrupt storm. Pretty much similar to Dmitry's but more brutal :)
Comment 56 Dmitry Torokhov 2017-11-28 20:07:12 UTC
Andy, you were actually right, it was PCIE_CLKREQ3B/AUDIO_CODEC_IRQ pin that was set up as interrupt and was not masked by coreboot for some reason. I think the best way forward is to mask all interrupts Strago boards with 1.0 BIOS version and hopefully we'll sort it out in updated coreboot at some time.
Comment 57 Andy Shevchenko 2017-11-29 15:59:43 UTC
Created attachment 260931 [details]
Adjust interrupt line mapping (v2)

This should eventually unveil the culprit pins.
Guenter, can you please test this one (w/o any mask patch applied!)?
Comment 58 Guenter Roeck 2017-11-29 20:08:49 UTC
With added debug info, the patch from #57 results in:

[  171.732040] chv_gpio_irq_handler: 507297 callbacks suppressed
[  171.732052] cherryview-pinctrl INT33FF:00: 2 e
[  171.732058] cherryview-pinctrl INT33FF:01: intr_line=5: No irq mapping found
[  171.732098] cherryview-pinctrl INT33FF:01: 20 12f
[  171.732111] cherryview-pinctrl INT33FF:01: intr_line=5: No irq mapping found
[  171.732125] cherryview-pinctrl INT33FF:00: intr_line=1: No irq mapping found
[  171.732153] cherryview-pinctrl INT33FF:00: 2 e
[  171.732168] cherryview-pinctrl INT33FF:01: 20 12f
[  171.732180] cherryview-pinctrl INT33FF:01: intr_line=5: No irq mapping found
[  171.732201] cherryview-pinctrl INT33FF:00: intr_line=1: No irq mapping found
[  171.732225] cherryview-pinctrl INT33FF:01: 20 12f
[  171.732237] cherryview-pinctrl INT33FF:00: 2 e
[  171.732248] cherryview-pinctrl INT33FF:00: intr_line=1: No irq mapping found
[  171.732267] cherryview-pinctrl INT33FF:01: intr_line=5: No irq mapping found
[  171.732278] cherryview-pinctrl INT33FF:00: 2 e
[  171.732289] cherryview-pinctrl INT33FF:00: intr_line=1: No irq mapping found
[  171.732318] cherryview-pinctrl INT33FF:00: 2 e
[  171.732340] cherryview-pinctrl INT33FF:01: 20 12f
[  171.732345] cherryview-pinctrl INT33FF:00: intr_line=1: No irq mapping found
[  171.732371] cherryview-pinctrl INT33FF:01: intr_line=5: No irq mapping found
[  171.732376] cherryview-pinctrl INT33FF:00: 2 e

which probably match the two pins mentioned by Dmitry in #56. 

The patch in #55 should be BIT(pctrl->community->nirqs). Everything looks good with both #55 (fixed) and #57 applied.
Comment 59 Andy Shevchenko 2017-11-30 01:37:59 UTC
Pins should be visible through/prof/interrupts. Can you share what is there now?
Comment 60 Mika Westerberg 2017-11-30 11:43:03 UTC
Created attachment 260945 [details]
Quirk masking all interrupts in Intel_Strago based systems

This is an updated patch. Now we limit interrupt masking to Intel_Strago based systems.

Can you test that it still works?
Comment 61 Guenter Roeck 2017-11-30 16:43:37 UTC
#59: No, the pins are not there. This is what I have:

 135:          0          0          0          0  chv-gpio   20  TS3A227E
 180:          0          0          0          0  chv-gpio    9  ACPI:Event
 182:          3          2          2          2  chv-gpio   11  i8042
 183:          0          0          0          0  chv-gpio   12  ELAN0000:00
 184:          0          0          0          0  chv-gpio   13  ELAN0001:00
 300:          0          0          0          0  chv-gpio   46  max98090_interrupt
 304:          0          0          0          0  chv-gpio   50  INT33BB:01 cd
Comment 62 Guenter Roeck 2017-11-30 17:18:50 UTC
The patch in #60 doesn't work as-is.

need_valid_mask is true if there is no dmi match, so the patch should be "if (!need_valid_mask) { ... }". With that change made, it works.
Comment 63 Mika Westerberg 2017-12-01 10:33:56 UTC
(In reply to Guenter Roeck from comment #62)
> The patch in #60 doesn't work as-is.
> 
> need_valid_mask is true if there is no dmi match, so the patch should be "if
> (!need_valid_mask) { ... }". With that change made, it works.

Right, sorry about that.

I just sent a fixed version upstream.
Comment 64 Andy Shevchenko 2017-12-01 16:32:01 UTC
Created attachment 260979 [details]
Adjust interrupt line mapping (v3)

v3 of the patch, please, provide /proc/interrupts when it's storming.

P.S. When we fill the cache we need to actually map BIOS configured interrupt at ->probe() stage. Otherwise the condition in the IRQ handler just false for unconfigured ones. I guess I didn't miss again something.
Comment 65 Andy Shevchenko 2017-12-01 16:38:34 UTC
Created attachment 260981 [details]
260979: Adjust interrupt line mapping (v4)

Typo fixed, sorry. See comment #64 for details.
Comment 66 Guenter Roeck 2017-12-01 18:08:03 UTC
This is with the patch from #65 applied:

 135:          0          0          0          1  chv-gpio   20  TS3A227E
 180:          0          0          0          0  chv-gpio    9  ACPI:Event
 182:          2          3          4          1  chv-gpio   11  i8042
 183:          0          0          0          0  chv-gpio   12  ELAN0000:00
 184:          0          0          0          0  chv-gpio   13  ELAN0001:00
 300:          0          0          0          0  chv-gpio   46  max98090_interrupt
 304:          0          0          0          0  chv-gpio   50  INT33BB:01 cd

With added debug messages:

[  101.710013] chv_gpio_irq_handler: 524332 callbacks suppressed
[  101.710027] cherryview-pinctrl INT33FF:01: No interrupt handler for intr_line=5
[  101.710044] cherryview-pinctrl INT33FF:00: No interrupt handler for intr_line=1
[  101.710079] cherryview-pinctrl INT33FF:00: No interrupt handler for intr_line=1
[  101.710111] cherryview-pinctrl INT33FF:01: No interrupt handler for intr_line=5

Somehow those unassigned interrupts hide pretty well. Should I add some debug logging somewhere ?
Comment 67 Andy Shevchenko 2017-12-01 18:17:01 UTC
My gosh, in the line in probe it should be

 ctx->irq_line = value ? intsel : CHV_INVALID_HWIRQ;

(intsel instead of i)

Can you give a try?

Also, make sense to add these debug prints into handler (it seems you have this in your debug patch):

dev_info_ratelimited(pctrl->dev, "%s: intr_line %u\n", __func__, intr_line);

                for (offset = 0; offset < pctrl->community->npins; offset++) {
                        ctx = &pctrl->saved_pin_context[offset];
                        if (ctx->irq_line == intr_line) {
                                irq = irq_find_mapping(gc->irq.domain, offset);

dev_info_ratelimited(pctrl->dev, "%s: irq %u\n", __func__, irq);
Comment 68 Guenter Roeck 2017-12-01 18:57:55 UTC
This looks better:

168:     989950     982234     990177     982452  chv-gpio   53
 177:    1552615    1560150    1552186    1560079  chv-gpio    6
 180:          0          0          0          0  chv-gpio    9  ACPI:Event
 182:          4          3          6          5  chv-gpio   11  i8042
 183:          0          0          0          0  chv-gpio   12  ELAN0000:00
 184:          0          0          0          0  chv-gpio   13  ELAN0001:00
 300:          0          0          0          0  chv-gpio   46  max98090_interrupt
 304:          0          0          0          0  chv-gpio   50  INT33BB:01 cd

I had to ratelimit print_irq_desc() in handle_bad_irq() to be able to boot.
Comment 69 Andy Shevchenko 2017-12-01 19:11:42 UTC
Yes, now it looks as expected, eventually.

Pin 95 is #53 in the array. Though, Mika had sent a patch to eliminate that additional level of mapping and in new code it will be #95 in the /proc/interrupts. Because of his patch I need to rebase mine on top of his and send upstream.

For final testing I'm going to attach both patches here.
Comment 70 Andy Shevchenko 2017-12-01 19:27:55 UTC
Created attachment 260983 [details]
gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation

Mika's patch (in upstream, though no available via any public git so far)
Comment 71 Andy Shevchenko 2017-12-01 19:28:49 UTC
Created attachment 260985 [details]
Adjust interrupt line mapping (v5)

v5 of the patch. Dependent to "gpio / ACPI: Drop unnecessary ACPI GPIO to Linux GPIO translation".
Comment 72 Andy Shevchenko 2017-12-01 19:30:27 UTC
Guenter, thanks for you patience and this long run of tests!

I would really appreciate if you can perform a final test against attached patches.
Comment 73 Guenter Roeck 2017-12-01 21:44:48 UTC
@Andy: My pleasure. Thanks for working on the fix!

With both patches from #71 and #72 applied:

 149:          0          0          0          0  chv-gpio   34  TS3A227E
 168:    3984415    3924842    3924250    3983649  chv-gpio   53
 182:          0          0          0          0  chv-gpio   67  i8042
 219:    4475311    4535099    4535657    4476168  chv-gpio    6
 224:    4440692    4500476    4501026    4441557  chv-gpio   11
 225:          0          0          1          1  chv-gpio   12
 226:          2          2          3          3  chv-gpio   13
 228:          0          0          0          0  chv-gpio   15  ACPI:Event
 231:          0          0          0          0  chv-gpio   18  ELAN0000:00
 232:          0          0          0          0  chv-gpio   19  ELAN0001:00
 390:          0          0          0          0  chv-gpio   77  max98090_interrupt
 394:          0          0          0          0  chv-gpio   81  INT33BB:01 cd

So we now have five unhandled interrupts, three of which are stuck.

Again, I need

--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -30,7 +30,8 @@ void handle_bad_irq(struct irq_desc *desc)
 {
 	unsigned int irq = irq_desc_get_irq(desc);
 
-	print_irq_desc(irq, desc);
+	if (printk_ratelimit())
+		print_irq_desc(irq, desc);
 	kstat_incr_irqs_this_cpu(desc);
 	ack_bad_irq(irq);
 }

to reduce the flood of messages as seen in #1.

Interestingly, I don't see all extra interrupts if I also apply "pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems". In that case, the output of /proc/interrupts is:

 149:          0          0          0          0  chv-gpio   34  TS3A227E
 182:          0          0          0          0  chv-gpio   67  i8042
 225:         22         25         26         24  chv-gpio   12
 228:          0          0          0          0  chv-gpio   15  ACPI:Event
 231:         22         25         26         24  chv-gpio   18  ELAN0000:00
 232:          0          0          0          0  chv-gpio   19  ELAN0001:00
 390:          0          0          0          0  chv-gpio   77  max98090_interrupt
 394:          0          0          0          0  chv-gpio   81  INT33BB:01 cd
Comment 74 Guenter Roeck 2017-12-01 23:32:07 UTC
Above comment should have been "With both patches from #70 and #71 applied". Sorry for the confusion.
Comment 75 Guenter Roeck 2017-12-01 23:40:49 UTC
I just noticed that there is something new:

[  102.830210] handle_bad_irq: 12 callbacks suppressed
[  102.830225] irq 225, desc: ffff880159bab5c8, depth: 1, count: 0, unhandled: 0
[  102.830261] ->handle_irq():  ffffffff9df093ae, 
[  102.830279] handle_bad_irq+0x0/0x2f9
[  102.830302] ->irq_data.chip(): ffffffff9eec6d00, 
[  102.830315] chv_gpio_irqchip+0x0/0x110
[  102.830339] ->action():           (null)
[  102.830354]    IRQ_NOPROBE set
[  102.830370] unexpected IRQ trap at vector e1

This is with "pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems" applied in addition to the other two patches.
Comment 76 Andy Shevchenko 2017-12-02 10:08:19 UTC
v4 works (with intsel vs. i fix) with older code, v5 needs more to do since another translation gone there are gaps now and one of way to solve is to reintroduce it via struct saved_pin_context (add a pin number to it and one more condition to the handler). Another way is convert it to a radix tree.

Unfortunately I'll not able to look at this till week after next. It might be Mika will have time to see what we can do here.
Comment 77 Pierre Bossart 2018-01-05 20:30:40 UTC
Out of curiosity, has this been fixed upstream? I have the two patches from November 28 in my tree, wondering if I can replace them with something newer, ideally a cherry-pick from upstream so that this fix goes away when I update the kernel.
Thanks!
Comment 78 Andy Shevchenko 2018-01-05 20:33:45 UTC
Yes.

d2b3c353595a pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems

11bca0a83f83 genirq: Guard handle_bad_irq log messages

One issue still left (not show stopper though) with wrong GPIO number in the /proc/interrupts when storm occurs.
Comment 79 Adam S Levy 2018-01-06 04:02:49 UTC
The keyboard on my Acer Chromebook cb3-431 still does not work with kernel 4.14.11. Has this been fixed upstream and will it be in a later release? Otherwise should I compile the latest kernel and submit a bug report?

Thank you
Comment 80 Pierre Bossart 2018-01-06 15:43:19 UTC
The second patch 11bca0a83f83 ("genirq: Guard handle_bad_irq log message") is not in stable as of 4.14.11, the first one is there - committed by GKH on 12/29. Are both required?
Comment 81 Guenter Roeck 2018-01-06 17:28:20 UTC
The second patch is just a backup and should not be required. I'll have to re-test  to check if I can reproduce the keyboard issue with chromeos-4.14. Maybe there is yet another problem.
Comment 82 Pierre Bossart 2018-01-09 00:04:10 UTC
Since I don't know what I am doing, I tested with all three patches and the keyboard works fine.

d2b3c353595a pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
11bca0a83f83 genirq: Guard handle_bad_irq log messages

and [PATCH v5] pinctrl: cherryview: Adjust interrupt line mapping (from the attachment). 

I'll try if the first one alone works fine.

Note You need to log in before you can comment on or make changes to this bug.