Bug 211957

Summary: IRQ storm triggered by I2C touchpad on Tigerlake H
Product: Drivers
Reporter: Kai-Heng Feng (kai.heng.feng)
Component: I2C
Assignee: Jarkko Nikula (jarkko.nikula)
Status: RESOLVED INVALID
Severity: normal
CC: acelan, andy.shevchenko, jarkko.nikula, kaichuan.hsieh, mika.westerberg
Priority: P1
Hardware: All
OS: Linux
Kernel Version: mainline, linux-next
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg
lspci -vvnn
/proc/interrupts
acpidump
dmesg with debugs enabled
i2c designware controller keeps receiving interrupt
A test patch for change irq flag
acpidump of TPD0 resource template
Full dsdt.dsl

Description Kai-Heng Feng 2021-02-26 12:06:41 UTC
As soon as i2c-hid probes successfully, an IRQ storm starts from INT34C6:00.

This is unlikely to be a bug in the touchpad, because the very same touchpad works fine on a CML platform.
Comment 1 Kai-Heng Feng 2021-02-26 12:18:52 UTC
Created attachment 295475 [details]
dmesg
Comment 2 Kai-Heng Feng 2021-02-26 12:19:19 UTC
Created attachment 295477 [details]
lspci -vvnn
Comment 3 Kai-Heng Feng 2021-02-26 12:19:37 UTC
Created attachment 295479 [details]
/proc/interrupts
Comment 4 Andy Shevchenko 2021-02-26 14:44:47 UTC
Thanks for the report. I have a few questions here:
 - Since it's a Dell machine, have you installed the latest firmware on it?
 - Are you able to also share the ACPI tables, `acpidump -o dell-$MODEL.dat` (replace $MODEL with the actual one)?
 - Can you enable pin control and I²C HID debug? (`i2c_hid.debug=1` on the kernel command line for the latter and CONFIG_DEBUG_PINCTRL=y in the kernel configuration for the former)

From what I see, the controller with the IRQ storm is the second one (00:15.1), which has DELL0A66:00 without any sign of a registered IRQ handler.

Also, check if bug #207189 has anything to do with your case (I think not, but just to be sure).
Comment 5 Andy Shevchenko 2021-02-26 14:50:50 UTC
(In reply to Andy Shevchenko from comment #4)
> From what I see, the controller with the IRQ storm is the second one
> (00:15.1), which has DELL0A66:00 without any sign of a registered IRQ handler.

Okay, I found the line in /proc/interrupts. I am also wondering what `apic=debug` would add, and perhaps GPIO debug with CONFIG_DEBUG_GPIO=y.
Comment 6 Kai-Heng Feng 2021-02-26 16:33:33 UTC
Created attachment 295487 [details]
acpidump

The platform is not on the market yet, so I can't disclose its model name.
Comment 7 Kai-Heng Feng 2021-02-26 16:34:00 UTC
Created attachment 295489 [details]
dmesg with debugs enabled
Comment 8 Kai-Heng Feng 2021-02-26 16:35:18 UTC
(In reply to Andy Shevchenko from comment #4)
> Thanks for the report. I have a few questions here:
>  - Since it's a Dell machine, have you installed the latest firmware on it?
Yes, the firmware is the latest.

>  - Are you able to also share the ACPI tables, `acpidump -o dell-$MODEL.dat`
> (replace $MODEL with the actual one)?
>  - Can you enable pin control and I²C HID debug? (`i2c_hid.debug=1` on the
> kernel command line for the latter and CONFIG_DEBUG_PINCTRL=y in the
> kernel configuration for the former)
> 
> From what I see, the controller with the IRQ storm is the second one
> (00:15.1), which has DELL0A66:00 without any sign of a registered IRQ handler.
> 
> Also, check if bug #207189 has anything to do with your case (I think not,
> but just to be sure).

No, the platform doesn't have an Nvidia GPU.
Comment 9 Andy Shevchenko 2021-03-01 14:04:11 UTC
I don't see any smoking gun in the logs. The HID doesn't flood the logs, so there is no wrong communication with the touchpad. The Interrupt resource of the controller and its configuration seem sane. Is it possible that another peripheral is connected to the same bus and has no driver or registered handler?

Can you run `i2cdetect -y 1` (choose the right bus number) and check what is connected there?

It might be a firmware issue as well. I would recommend asking Dell whether they have any insights.

Also, we may try to debug the interrupt handler of the I²C controller by adding something like `dev_info_ratelimited(...);` there to print the IRQ status. I believe the driver recognizes the interrupt as one that doesn't belong to the I²C controller.
Comment 10 Kai-Heng Feng 2021-03-02 03:20:09 UTC
$ sudo i2cdetect -y 1
Warning: Can't use SMBus Quick Write command, will skip some addresses
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                                                 
10:                                                 
20:                                                 
30: -- -- -- -- -- -- -- --                         
40:                                                 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60:                                                 
70:
Comment 11 Kai-Heng Feng 2021-03-02 07:33:08 UTC
The IRQs are indeed from i2c-hid; however, the report that is read is 0, which causes an early return in i2c_hid_get_input().

Asking vendors to provide more insights.
Comment 12 KaiChuan-Hsieh 2021-03-05 03:52:47 UTC
Created attachment 295663 [details]
i2c designware controller keeps receiving interrupt

I enabled the i2c_designware_master dyndbg log, and I can see that it keeps receiving interrupts even though there is no interaction with the touchpad.

There is only one slave device, the touchpad, attached to the controller. May I know whether the interrupt is caused by messages transferred from the touchpad?

The touchpad vendor said that the touchpad keeps sending messages because the host keeps sending read commands to it. The data from the touchpad's perspective is dumped below. I didn't see the I2C command in the i2c_hid log; is it sent by the DesignWare controller directly?

Time [s],Packet ID,Address,Data,Read/Write,ACK/NAK
2.982085300000000,0,0x58,0x20,Write,ACK
2.982190100000000,0,0x58,0x00,Write,ACK
2.982406500000000,1,0x59,0x1E,Read,ACK
2.982504300000000,1,0x59,0x00,Read,ACK
2.982601800000000,1,0x59,0x00,Read,ACK
2.982715600000000,1,0x59,0x01,Read,ACK
2.982812900000000,1,0x59,0x39,Read,ACK
2.982910500000000,1,0x59,0x01,Read,ACK
2.983008900000000,1,0x59,0x02,Read,ACK
2.983107700000000,1,0x59,0x00,Read,ACK
2.983205200000000,1,0x59,0x03,Read,ACK
2.983302800000000,1,0x59,0x00,Read,ACK
2.983400500000000,1,0x59,0x0C,Read,ACK
2.983499000000000,1,0x59,0x00,Read,ACK
2.983596900000000,1,0x59,0x04,Read,ACK
2.983695800000000,1,0x59,0x00,Read,ACK
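
The address column in the capture alternates between 0x58 and 0x59, which looks like the 8-bit on-wire address byte (7-bit address plus R/W bit) rather than the 7-bit address itself. A minimal Python sketch, assuming that reading and assuming the data bytes pair up as 16-bit little-endian words the way HID over I2C fields do, decodes the capture. The helper names are made up for illustration.

```python
# (address byte, data byte) pairs copied from the capture above.
CAPTURE = [
    (0x58, 0x20), (0x58, 0x00),
    (0x59, 0x1E), (0x59, 0x00), (0x59, 0x00), (0x59, 0x01),
    (0x59, 0x39), (0x59, 0x01), (0x59, 0x02), (0x59, 0x00),
    (0x59, 0x03), (0x59, 0x00), (0x59, 0x0C), (0x59, 0x00),
    (0x59, 0x04), (0x59, 0x00),
]


def split_address_byte(byte):
    """Split an on-wire address byte into (7-bit address, direction)."""
    return byte >> 1, "read" if byte & 1 else "write"


def le16_words(data):
    """Group a byte list into little-endian 16-bit words (assumed layout)."""
    return [lo | (hi << 8) for lo, hi in zip(data[0::2], data[1::2])]


# Separate the bytes the host wrote from the bytes it read back.
written = [data for addr, data in CAPTURE if not addr & 1]
read_back = [data for addr, data in CAPTURE if addr & 1]

for addr_byte in (0x58, 0x59):
    addr7, direction = split_address_byte(addr_byte)
    print(f"{addr_byte:#04x} -> 7-bit address {addr7:#04x}, {direction}")

print("written words:", [f"{w:#06x}" for w in le16_words(written)])
print("read words:   ", [f"{w:#06x}" for w in le16_words(read_back)])
```

Under these assumptions both address bytes map to the same 7-bit address, with the low bit giving the transfer direction.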
Comment 13 Kai-Heng Feng 2021-03-05 04:15:38 UTC
The i2c-hid driver keeps reading because the IRQ isn't de-asserted...
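
Why a line that never de-asserts turns into a storm can be shown with a toy model. This is a sketch in Python, not kernel code: it assumes a level-triggered interrupt, and `count_handler_runs` and the two device models are hypothetical names.

```python
def count_handler_runs(still_asserted_after_read, max_runs=1000):
    """Count handler invocations for one event on a level-triggered line.

    still_asserted_after_read(n) is a hypothetical device model: it returns
    True if the host still sees the line asserted after the n-th read.
    """
    runs = 0
    asserted = True  # the device raised the interrupt once
    while asserted and runs < max_runs:
        runs += 1  # the handler runs and reads one input report
        asserted = still_asserted_after_read(runs)
    return runs


def well_behaved(n):
    # The line is released once the single pending report has been read.
    return False


def stuck(n):
    # The host never sees the line released.
    return True


print(count_handler_runs(well_behaved))
print(count_handler_runs(stuck))  # only stops because of max_runs
```

In the second case the loop ends only because of the artificial `max_runs` cap, which stands in for the interrupt count that keeps growing.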
Comment 14 KaiChuan-Hsieh 2021-03-05 04:28:10 UTC
Should the IRQ be de-asserted by the touchpad or by the host?
Comment 15 Kai-Heng Feng 2021-03-05 04:31:40 UTC
HID Over I2C Protocol Specification, 6.1.3 Retrieval of Input Reports:
"If the DEVICE has no more Input Reports to send, it de-asserts the interrupt line."
Comment 16 Kai-Heng Feng 2021-03-05 04:32:44 UTC
However it still _could_ be an issue on intel-pinctrl, since the same touchpad works fine on older platforms.
Comment 17 Andy Shevchenko 2021-03-05 13:42:29 UTC
(In reply to Kai-Heng Feng from comment #16)
> However it still _could_ be an issue on intel-pinctrl, since the same
> touchpad works fine on older platforms.

It could very well be an IRQ line misconfiguration (edge vs. level, etc).
Comment 18 KaiChuan-Hsieh 2021-03-05 14:10:10 UTC
Created attachment 295675 [details]
A test patch for change irq flag

(In reply to Andy Shevchenko from comment #17)
> (In reply to Kai-Heng Feng from comment #16)
> > However it still _could_ be an issue on intel-pinctrl, since the same
> > touchpad works fine on older platforms.
> 
> It could very well be an IRQ line misconfiguration (edge vs. level, etc).

The vendor replied that the touchpad uses a level-triggered, active-low interrupt.
However, when I modify i2c_hid as in the attached patch, the IRQ storm still happens. Do you have any suggestions for configuring the IRQ line correctly?

By the way, the touchpad vendor has measured the interrupt pin's signal. They say it stays high when there is no interaction with the touchpad, yet the interrupt count of i2c-designware.1 keeps increasing.
Comment 19 Andy Shevchenko 2021-03-05 14:36:37 UTC
(In reply to KaiChuan-Hsieh from comment #18)
> Created attachment 295675 [details]
> A test patch for change irq flag

It is not correct. The IRQ we are talking about
a) doesn't have anything to do with pin control (it's IOxAPIC);
b) the modification in I²C HID driver obviously has no relation to the controller's IRQ handler.

> (In reply to Andy Shevchenko from comment #17)
> > (In reply to Kai-Heng Feng from comment #16)
> > > However it still _could_ be an issue on intel-pinctrl, since the same
> > > touchpad works fine on older platforms.
> > 
> > It could very well be an IRQ line misconfiguration (edge vs. level, etc).
> 
> The vendor replied that the touchpad uses a level-triggered, active-low
> interrupt. However, when I modify i2c_hid as in the attached patch, the IRQ
> storm still happens. Do you have any suggestions for configuring the IRQ
> line correctly?

The question is why the I²C controller gets the interrupt flood.

> By the way, the touchpad vendor has measured the interrupt pin's signal. They
> say it stays high when there is no interaction with the touchpad, yet the
> interrupt count of i2c-designware.1 keeps increasing.

I moved it to Jarkko and the I²C subsystem, but it might as well be something in the HID driver related to the touchpad firmware (no clue here).
Comment 20 KaiChuan-Hsieh 2021-03-05 14:45:55 UTC
The only clue we have is that touchpads with the same firmware and hardware behave differently on TGL-H platforms manufactured by different ODMs: one shows no IRQ storm, the other does.

I have already confirmed that the touchpad connects to the same pin (by board name) in both designs, but I wonder whether there are any instructions they could follow to check the BIOS implementation of the IRQ line configuration.

Any suggestions would be helpful.

Thanks,
Comment 21 KaiChuan-Hsieh 2021-03-08 11:18:35 UTC
Hello,

I checked the problem platform: its touchpad device IRQ in /proc/interrupts has type INT34C6:00, while the touchpad device IRQ on the working platform has type IR-IO-APIC.

The full name is like:

Fail: INT34C6:00  291         DELL0A69:00
Pass: IR-IO-APIC  96-fasteoi  DELL0A68:00

May I know what makes the interrupt show up like this? I don't know if INT34C6:00 is a valid interrupt type. Will it cause the host not to use a level trigger to wake up?

Thanks,
Comment 22 Andy Shevchenko 2021-03-08 12:08:46 UTC
(In reply to KaiChuan-Hsieh from comment #21)

> Fail: INT34C6:00  291         DELL0A69:00
> Pass: IR-IO-APIC  96-fasteoi  DELL0A68:00
> 
> May I know what makes the interrupt show up like this? I don't know if
> INT34C6:00 is a valid interrupt type. Will it cause the host not to use a
> level trigger to wake up?


The former one (Fail) is GPIO, while the latter is IOxAPIC. I just realized that the BIOS may have a bug (no validation?) when it uses GpioInt() instead of Interrupt() in the DSDT. It may simply be that the pin numbering is wrong in the ACPI tables. Note that ACPI expects a GPIO # that differs from the actual pin number (thanks to Microsoft :-).
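
The distinction between the two entries can be made mechanically. The following Python sketch parses only the abbreviated entries quoted from comment #21, not full /proc/interrupts rows. The classification rule is a heuristic I am assuming for illustration, and `parse_entry` is a made-up helper.

```python
import re

# Abbreviated entries exactly as quoted in comment #21. A real
# /proc/interrupts row also carries the Linux IRQ number and per-CPU counts.
ENTRIES = {
    "Fail": "INT34C6:00  291         DELL0A69:00",
    "Pass": "IR-IO-APIC  96-fasteoi  DELL0A68:00",
}

# Heuristic (assumption): an ACPI device instance such as "INT34C6:00" in the
# chip column means the line is routed through that device, here the GPIO
# controller, rather than through the IO-APIC.
ACPI_INSTANCE = re.compile(r"[A-Z0-9]{7,8}:[0-9A-F]{2}")


def parse_entry(text):
    """Split 'chip hwirq[-flow] action' and classify the interrupt chip."""
    chip, hwirq_field, action = text.split()
    hwirq = int(re.match(r"\d+", hwirq_field).group())
    if "IO-APIC" in chip:
        route = "IO-APIC"
    elif ACPI_INSTANCE.fullmatch(chip):
        route = "ACPI device (GPIO controller in this thread)"
    else:
        route = "unknown"
    return {"chip": chip, "hwirq": hwirq, "action": action, "route": route}


for label, text in ENTRIES.items():
    print(label, parse_entry(text))
```

The number in the second column of the failing entry is what matters for the GpioInt() case, since it identifies the GPIO the firmware asked for.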
Comment 23 KaiChuan-Hsieh 2021-03-08 14:49:58 UTC
Created attachment 295729 [details]
acpidump of TPD0 resource template

Hello,

The TPD0 resource from the acpidump is attached. The ODM confirmed that the hardware is connected to the pins with the board names shown below:

DATA: PCH: GPP_C18/I2C1_SDA
CLK: PCH: GPP_C19/I2C1_SCL
INT: PCH: GPP_E3/CPU_GP0

It seems to connect to GP0, which is 0x0000 in the resource pin list. Can you help point out what might be wrong?

Thanks a lot,
Comment 24 KaiChuan-Hsieh 2021-03-08 14:51:30 UTC
Created attachment 295731 [details]
Full dsdt.dsl

Attaching the full dsdt.dsl.
Comment 25 Andy Shevchenko 2021-03-08 15:16:43 UTC
(In reply to KaiChuan-Hsieh from comment #23)
> Created attachment 295729 [details]
> acpidump of TPD0 resource template
> 
> Hello,
> 
> The TPD0 resource from the acpidump is attached. The ODM confirmed that the
> hardware is connected to the pins with the board names shown below:
> 
> DATA: PCH: GPP_C18/I2C1_SDA
> CLK: PCH: GPP_C19/I2C1_SCL
> INT: PCH: GPP_E3/CPU_GP0
> 
> It seems to connect to GP0, which is 0x0000 in the resource pin list. Can
> you help point out what might be wrong?

Ha!
Seems somebody confused E and F (yes, the capital letters look quite similar).

291 is GPP_F3! According to the above, GPP_E3 must be 259. Seems like a BIOS bug.
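
The two numbers differ by exactly 32, which is what a one-letter slip in the group name would produce if neighbouring groups sit a fixed distance apart. The Python sketch below uses only the two values given here. The fixed stride is an assumption inferred from them, not a real pin table, and `shifted_number` is a hypothetical helper.

```python
import re

# The only two data points given in the thread.
KNOWN = {"GPP_E3": 259, "GPP_F3": 291}

# ASSUMPTION for illustration: neighbouring groups sit a fixed distance apart.
# The value is inferred from the two numbers above, not from a datasheet.
GROUP_STRIDE = KNOWN["GPP_F3"] - KNOWN["GPP_E3"]


def shifted_number(pin, number, wrong_group):
    """Number obtained if `pin` is looked up under `wrong_group` by mistake."""
    group, _index = re.fullmatch(r"GPP_([A-Z])(\d+)", pin).groups()
    return number + (ord(wrong_group) - ord(group)) * GROUP_STRIDE


print(GROUP_STRIDE)
print(shifted_number("GPP_E3", KNOWN["GPP_E3"], "F"))
```

The authoritative mapping lives in the platform's pin control driver and the firmware, so this only illustrates the size of the offset.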
Comment 26 Andy Shevchenko 2021-03-08 15:26:05 UTC
(In reply to KaiChuan-Hsieh from comment #23)

> It seems to connect to GP0, which is 0x0000 in the resource pin list. Can
> you help point out what might be wrong?

There is GNUM() that is called on top of GPDI. GPDI is provided (filled) by BIOS.
Comment 27 KaiChuan-Hsieh 2021-03-08 15:44:11 UTC
Thanks for your reply. May I ask how you knew that the current setting is 291 from the dsdt.dsl?
Comment 28 Andy Shevchenko 2021-03-08 16:01:50 UTC
(In reply to KaiChuan-Hsieh from comment #27)
> Thanks for your reply. May I ask how you knew that the current setting is 291
> from the dsdt.dsl?

No, I can't know that from the DSDT. Plain Linux logs and files are the key (see the output of /proc/interrupts which you cited in comment #21).
Comment 29 KaiChuan-Hsieh 2021-03-08 16:15:32 UTC
Ah, I see. Thanks for your explanation. I'll ask the ODM to check.
Comment 30 KaiChuan-Hsieh 2021-03-10 03:33:16 UTC
(In reply to Andy Shevchenko from comment #28)
> (In reply to KaiChuan-Hsieh from comment #27)
> > Thanks for your reply. May I ask how you knew that the current setting is
> > 291 from the dsdt.dsl?
> 
> No, I can't know that from the DSDT. Plain Linux logs and files are the key
> (see the output of /proc/interrupts which you cited in comment #21).

Hello Andy,

Thanks for your help. The ODM confirmed that the touchpad interrupt works normally after setting PchI2cTouchPadIrqMode = 1. I'll close the bug.

Thanks,