Bug 207189 - i2c-designware triggers interrupts storm
Summary: i2c-designware triggers interrupts storm
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C
Hardware: Intel Linux
Importance: P2 blocking
Assignee: Jarkko Nikula
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-10 04:20 UTC by Pengyu Ma
Modified: 2023-03-26 14:37 UTC

See Also:
Kernel Version: 5.0+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg and interrupts (1.25 MB, application/gzip)
2020-04-10 04:20 UTC, Pengyu Ma

Description Pengyu Ma 2020-04-10 04:20:16 UTC
Created attachment 288323
dmesg and interrupts

On ThinkPad P73/P53, set "display -> discrete mode" in the BIOS.
Then boot any kernel version with the elan_i2c and nvidia_drm drivers loaded together.

i2c-designware then triggers a continuous interrupt storm with the error:
"i2c_dw_isr: i2c_designware i2c_designware.0: enabled=0xffffffff stat=0xffffffff"

The interrupts repeat at a rate of 13000/sec, and the system is too busy to handle them.
CPU usage is 

Even after rmmod of the elan_i2c and i2c_i801 modules, the IRQs don't stop.

This issue only happens in discrete GPU mode; in hybrid GPU mode i2c works fine.

The kernel and NVIDIA proprietary driver versions don't matter; the issue can be reproduced with any of them.
Comment 1 Andy Shevchenko 2020-04-10 18:41:36 UTC
How i2c_i801 is related?
Isn't ELAN touchpad in I2C HID mode? If so, how elan_i2c is related?
Comment 2 Pengyu Ma 2020-04-12 14:54:56 UTC
The Elan touchpad can work in PS/2 or I2C mode.
i2c_i801 shouldn't be related, but when i2c_i801 is blacklisted, elan_i2c is not loaded.
The Elan touchpad works in PS/2 mode without i2c_i801 loaded.

When the issue happens, the Elan touchpad is driven by the elan_i2c driver.
That is not the i2c-hid driver; it is a separate driver in drivers/input/mouse/elan_i2c*.
If elan_i2c is loaded first, the interrupt storm begins when nvidia_drm is loaded later.
Comment 3 Jarkko Nikula 2020-04-16 07:41:19 UTC
Pure speculation, but could this be some sort of IO-APIC misconfiguration?

I see the spamming interrupt line is shared between i801_smbus and i2c_designware.0 but looks like neither of these drivers is handling it.

The debug print from drivers/i2c/busses/i2c-designware-master.c: i2c_dw_isr()

"i2c_dw_isr: i2c_designware i2c_designware.0: enabled=0xffffffff stat=0xffffffff"

shows the i2c-designware runtime PM state is off and it should not generate interrupts. So something else is keeping the shared IRQ line active.
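
As a rough illustration of the reasoning above, the following Python sketch checks whether an i2c_dw_isr debug line reports both registers as all-ones, which is the pattern interpreted here as the controller being powered down. The function name and the regular expression are hypothetical helpers for log triage, not part of the driver.

```python
import re

ALL_ONES = 0xFFFFFFFF

# Matches the "enabled=0x... stat=0x..." pair printed by the debug message.
_PATTERN = re.compile(r"enabled=(0x[0-9a-fA-F]+)\s+stat=(0x[0-9a-fA-F]+)")


def registers_read_all_ones(line):
    """Return True if both values are 0xffffffff, False if not,
    and None if the line is not an i2c_dw_isr debug line."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    enabled = int(match.group(1), 16)
    stat = int(match.group(2), 16)
    # All-ones on both reads suggests the device did not answer the read,
    # as expected when it is runtime suspended (assumption for triage only).
    return enabled == ALL_ONES and stat == ALL_ONES
```

A True result for every logged line would be consistent with the interrupt coming from another device on the shared line rather than from the controller itself.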

Does the issue occur if the elan_i2c and nvidia_drm drivers are blacklisted? I guess the interrupt still comes, but it is worth checking.
Comment 4 Pengyu Ma 2020-04-17 00:58:29 UTC
If either elan_i2c or nvidia_drm is blacklisted, the other works fine.

The interrupts do not occur if the Elan touchpad is working in PS/2 mode.
With nouveau, the interrupts do not occur with elan_i2c either.
Comment 5 Andy Shevchenko 2020-12-09 16:05:50 UTC
Is it possible to test on v5.10-rc7 (as of today) and on linux-next (the latest available on the date of the test) and confirm that the issue is still there?

Why am I asking? Because at least the GPIO ACPI part has gained a lot of changes regarding debounce and bias configuration (via ACPI), which might affect the behaviour of touchpads (at least confirmed on some AMD machines).
Comment 6 JoLi 2022-02-26 15:46:43 UTC
I know that this thread is a bit old by now,
but I have an Acer Aspire E5-575G, also with an ELAN touchpad and an NVIDIA GPU.
When I look at "watch cat /proc/interrupts", I see that one CPU thread receives a LOT of interrupts (thousands per second). The rightmost column lists idma64.0, i2c_designware.0, i801_smbus.
Interestingly, this occurs with both modes of my touchpad (Basic/Advanced mode in the BIOS).
I have kernel 5.16.11-zen1-1-zen, but this also occurred on older kernels.
The error occurs on both vanilla Arch and Garuda Linux for me.
It leads to high CPU usage on that CPU thread (about 80% when idle).
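
For anyone who wants a number instead of eyeballing "watch cat /proc/interrupts", here is a minimal Python sketch that computes per-IRQ rates from two snapshots of that file. The function names are made up for this example, and the parsing assumes the usual layout (a CPU header line, then one "label: counts... description" line per interrupt).

```python
import time


def parse_interrupts(text):
    """Return {irq_label: (total_count_over_all_cpus, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Some rows (for example ERR) carry fewer counters than CPUs.
        for token in fields[:ncpu]:
            if not token.isdigit():
                break
            counts.append(int(token))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def irq_rates(before, after, seconds):
    """Return {irq_label: interrupts_per_second} between two snapshots."""
    first = parse_interrupts(before)
    second = parse_interrupts(after)
    return {irq: (second[irq][0] - first[irq][0]) / seconds
            for irq in second if irq in first}


def sample_live(interval=1.0, path="/proc/interrupts"):
    """Take two snapshots of the live file, 'interval' seconds apart."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval)
    with open(path) as handle:
        after = handle.read()
    return irq_rates(before, after, interval)
```

Calling sample_live() on the affected machine and sorting the result by value should put the storming line at the top.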
Comment 7 JoLi 2022-02-26 15:59:03 UTC
(In reply to JoLi from comment #6)
Never mind: I disabled i2c_designware, but the error still occurs.
Comment 8 Jarkko Nikula 2022-02-28 08:25:54 UTC
Hmm... I still have no better idea than 2 years ago, but I was reminded that i801_smbus got two fixes for interrupt storms a few months ago, and both are included in v5.16.

9b5bf5878138 ("i2c: i801: Restore INTREN on unload")
03a976c9afb5 ("i2c: i801: Fix interrupt storm from SMB_ALERT signal")

So it doesn't look like they explain the interrupt storm here (the i2c_designware.0 and i801_smbus interrupts are shared here), but could you still check whether "rmmod i2c_i801" makes any difference?
Comment 9 Andy Shevchenko 2023-01-13 09:50:05 UTC
I'm about to close this bug since nobody appears to be testing and confirming the state.
So I'll leave it in the NEEDINFO state for a while (a day or two) and then close it.

P.S. The current release is now v6.1.
Comment 10 William 2023-03-26 14:37:12 UTC
Same here. It happens on kernels 5.14, 6.1 and 6.2. I tested booting with irqpoll and tried rmmod i2c_i801. I get about 18k interrupts per second on IRQ 16 with the system idle. Touching the touchpad raises that number to over 100k interrupts, and after a few seconds the kernel throws a stack trace and disables the IRQ.

I noticed that IRQ 16 is shared with the NVMe controller.

# lspci -v  | grep -B 3 "IRQ 16"

00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
	Subsystem: Acer Incorporated [ALI] Device 115c
	Flags: bus master, fast devsel, latency 0, IRQ 16
--

00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
	Subsystem: Acer Incorporated [ALI] Device 115c
	Flags: medium devsel, IRQ 16
--

02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963 (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD
	Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
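
The same grouping can be done for every IRQ at once. Below is a small Python sketch that reads "lspci -v" text and groups devices by the IRQ shown on their Flags line. The helper name is hypothetical, and the parsing assumes the layout shown in the output above (a device line starting with the bus address, followed by indented detail lines).

```python
import re
from collections import defaultdict

# Device lines start with an address such as "00:15.0" (optional domain prefix).
_DEVICE = re.compile(r"^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\s")
_IRQ = re.compile(r"\bIRQ (\d+)")


def devices_by_irq(lspci_text):
    """Return {irq_number: [device lines]} parsed from 'lspci -v' output."""
    grouped = defaultdict(list)
    current = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if _DEVICE.match(line):
            current = stripped
        elif stripped == "--":
            # grep context separator: forget the previous device.
            current = None
        elif current and stripped.startswith("Flags:"):
            match = _IRQ.search(stripped)
            if match:
                grouped[int(match.group(1))].append(current)
    return dict(grouped)
```

Any key whose list has more than one entry is a shared line, such as IRQ 16 in this report.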

---

[ 1530.412231] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 1530.412236] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E      6.1.4-1.el9.elrepo.x86_64 #1
[ 1530.412238] Hardware name: Acer Aspire F5-573G/Captain_SK  , BIOS V1.27 05/26/2017
[ 1530.412239] Call Trace:
[ 1530.412241]  <IRQ>
[ 1530.412244]  dump_stack_lvl+0x45/0x5e
[ 1530.412248]  __report_bad_irq+0x35/0xa7
[ 1530.412250]  note_interrupt.cold+0xb/0x61
[ 1530.412253]  handle_irq_event+0x6e/0x70
[ 1530.412256]  handle_fasteoi_irq+0x90/0x1e0
[ 1530.412258]  __common_interrupt+0x69/0x110
[ 1530.412260]  common_interrupt+0xb3/0xd0
[ 1530.412263]  </IRQ>
[ 1530.412263]  <TASK>
[ 1530.412264]  asm_common_interrupt+0x22/0x40
[ 1530.412267] RIP: 0010:cpuidle_enter_state+0xde/0x410
[ 1530.412270] Code: 00 00 31 ff e8 53 ba 8a ff 45 84 ff 74 16 9c 58 0f 1f 40 00 f6 c4 02 0f 85 11 03 00 00 31 ff e8 48 7b 91 ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 6e 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d
[ 1530.412272] RSP: 0018:ffffadf140113e80 EFLAGS: 00000246
[ 1530.412273] RAX: ffff8bd09ed80000 RBX: ffff8bd09edbca00 RCX: 000000000000001f
[ 1530.412275] RDX: 0000000000000003 RSI: ffffffff9db816fb RDI: ffffffff9db5c8c5
[ 1530.412275] RBP: 0000000000000004 R08: 0000016453ad58ce R09: 0000000000000018
[ 1530.412276] R10: 0000000000001c92 R11: 000000000000088b R12: ffffffff9e6b2b00
[ 1530.412277] R13: 0000016453ad58ce R14: 0000000000000004 R15: 0000000000000000
[ 1530.412280]  cpuidle_enter+0x29/0x40
[ 1530.412283]  cpuidle_idle_call+0x140/0x1d0
[ 1530.412286]  do_idle+0x7e/0xe0
[ 1530.412288]  cpu_startup_entry+0x19/0x20
[ 1530.412290]  start_secondary+0x112/0x130
[ 1530.412293]  secondary_startup_64_no_verify+0xe5/0xeb
[ 1530.412296]  </TASK>
[ 1530.412297] handlers:
[ 1530.412297] [<00000000a66cd0f8>] idma64_irq [idma64]
[ 1530.412304] [<000000004f4a84bd>] i2c_dw_isr [i2c_designware_core]
[ 1530.412309] Disabling IRQ #16

--

My dummy workaround is a script that monitors dmesg and reloads the i2c_designware module.
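
The script itself is not attached, so the following is only a guess at what such a watcher could look like, written in Python. It scans kernel log lines for the "Disabling IRQ #N" message and reloads a module with modprobe. The function names are invented for this sketch, and the exact module name to reload is an assumption that has to be supplied by the caller.

```python
import re
import subprocess

_DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def disabled_irqs(dmesg_lines):
    """Return the IRQ numbers the kernel reported as disabled."""
    found = []
    for line in dmesg_lines:
        match = _DISABLED.search(line)
        if match:
            found.append(int(match.group(1)))
    return found


def reload_module(module, run=subprocess.run):
    """Unload and load 'module' again (needs root on a real system)."""
    run(["modprobe", "-r", module], check=True)
    run(["modprobe", module], check=True)


def watch(module, irq, lines, run=subprocess.run):
    """Reload 'module' each time 'irq' is reported disabled in 'lines'.

    'lines' can be any iterable of log lines, for example the stdout of
    a 'dmesg --follow' process. Returns the number of reloads performed.
    """
    reloads = 0
    for line in lines:
        if irq in disabled_irqs([line]):
            reload_module(module, run)
            reloads += 1
    return reloads
```

On a live system the lines could come from subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True).stdout. This only papers over the storm each time the kernel gives up on the IRQ; it does not address the cause.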
