Bug 177311

Summary: crazy interrupt rate on i801_smbus
Product: Drivers
Reporter: Conrad Kostecki (ck+kernelbugzilla)
Component: Hardware Monitoring
Assignee: Jean Delvare (jdelvare)
Status: CLOSED CODE_FIX
Severity: normal
CC: andy.shevchenko, jarkko.nikula, linux, mika.westerberg, stephane.poignant
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.8.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: cat /proc/interrupts
dmesg output
Debug patch for the i2c-i801 interrupts
Experimental patch disabling SMB_ALERT signal
2nd version of patch disabling SMB_ALERT signal

Description Conrad Kostecki 2016-10-12 20:59:43 UTC
I've noticed that enabling CONFIG_SENSORS_JC42, either as a module or built into
my kernel, causes a very high rate of interrupts on i801_smbus: about
6000-8000 per second according to /proc/interrupts. After 20 minutes, about
5 million interrupts had been generated on i801_smbus.
 
When I unload the jc42 module, the interrupts do not stop until I do a
complete reboot.
 
Mainboard: Supermicro A1SRM-2758F
Kernel: Gentoo-Sources 4.8.1 (Happens also with Vanilla 4.8.1 and older kernel
versions)
 
dmesg:
[    8.319900] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[    8.321864] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    8.326098] ismt_smbus 0000:00:13.0: enabling device (0140 -> 0142)
 
lspci:
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)

When the module is loaded, I am also getting these errors:
[   73.934901] ismt_smbus 0000:00:13.0: completion wait timed out
[   74.974970] ismt_smbus 0000:00:13.0: completion wait timed out
[   76.014949] ismt_smbus 0000:00:13.0: completion wait timed out
[   77.054903] ismt_smbus 0000:00:13.0: completion wait timed out
[   78.094961] ismt_smbus 0000:00:13.0: completion wait timed out
[   79.134982] ismt_smbus 0000:00:13.0: completion wait timed out
[   80.175116] ismt_smbus 0000:00:13.0: completion wait timed out
[   81.215057] ismt_smbus 0000:00:13.0: completion wait timed out
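
For anyone who wants to reproduce the rate measurement described above, here is a minimal sketch of how the per-second figure can be derived from two snapshots of /proc/interrupts. The helper names `irq_count` and `rate_per_second` are made up for illustration, and the parsing assumes the usual layout (IRQ label, one counter per CPU, then the controller and handler names); a shared IRQ line listing several handlers would need extra handling.

```python
def irq_count(interrupts_text, name):
    """Sum the per-CPU counters of every /proc/interrupts line whose last field is `name`."""
    total = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or fields[-1] != name:
            continue
        # fields[0] is the IRQ label (e.g. "18:"); the numeric fields
        # that follow are the per-CPU counters.
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def rate_per_second(count_before, count_after, seconds):
    """Average interrupt rate between two snapshots taken `seconds` apart."""
    return (count_after - count_before) / seconds
```

To use it, read /proc/interrupts twice a known number of seconds apart, pass both texts through `irq_count(text, "i801_smbus")`, and feed the two totals to `rate_per_second`.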
Comment 1 Conrad Kostecki 2016-10-12 21:00:53 UTC
The jc42 module seems to work, as lm_sensors finds the sensors after loading it:

Galactica ~ # sensors
jc42-i2c-1-19
Adapter: SMBus I801 adapter at e000
temp1:        +30.8°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-1-1a
Adapter: SMBus I801 adapter at e000
temp1:        +29.5°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-1-18
Adapter: SMBus I801 adapter at e000
temp1:        +27.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-1-1b
Adapter: SMBus I801 adapter at e000
temp1:        +28.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)
Comment 2 Guenter Roeck 2016-11-28 15:19:42 UTC
You need to set the temperature limits correctly. Without limits, the chips will persistently generate alarms which is the likely cause of the interrupts.

That won't solve the completion interrupt timeouts, though. That may be another problem.
Comment 3 Conrad Kostecki 2016-11-28 16:10:21 UTC
(In reply to Guenter Roeck from comment #2)
> You need to set the temperature limits correctly. Without limits, the chips
> will persistently generate alarms which is the likely cause of the
> interrupts.
> 
> That won't solve the completion interrupt timeouts, though. That may be
> another problem.

Hi!
Thanks for your answer. I gave it a try and set those limits, so sensors no longer shows any ALARM. That does not seem to be the cause, because after setting them, the interrupts are still generated massively.

jc42-i2c-1-1b
Adapter: SMBus I801 adapter at e000
RAM:          +30.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

jc42-i2c-1-19
Adapter: SMBus I801 adapter at e000
RAM:          +32.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)
jc42-i2c-1-1a
Adapter: SMBus I801 adapter at e000
RAM:          +31.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

jc42-i2c-1-18
Adapter: SMBus I801 adapter at e000
RAM:          +28.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

Cheers
Conrad
Comment 4 Guenter Roeck 2016-11-28 17:40:22 UTC
Weird, especially since the chips should not generate interrupts in the first place unless they are explicitly enabled (which the driver doesn't do, or at least shouldn't do). My wild guess is that taking the chips out of shutdown mode for some reason enables the interrupt.

Can you send the output of "i2cdump -y -f 1 0x18 w" ? Also, do the interrupts stop when you unload the driver ?

Thanks,
Guenter
Comment 5 Guenter Roeck 2016-11-28 17:41:34 UTC
Please forget the question about the unload, as you already answered it.
Comment 6 Conrad Kostecki 2016-11-28 17:43:33 UTC
(In reply to Guenter Roeck from comment #4)
> Weird, especially since the chips should not generate interrupts in the
> first place unless it is explicitly enabled (which the driver doesn't do, or
> at least shouldn't do). My wild guess is that taking the chips out of
> shutdown mode for some reasons enables the interrupt.
> 
> Can you send the output of "i2cdump -y -f 1 0x18 w" ?

Here we go:

╭─root@Galactica ~
╰─➤  i2cdump -y -f 1 0x18 w
     0,8  1,9  2,a  3,b  4,c  5,d  6,e  7,f
00: ef00 0000 0005 0000 0005 c801 1f00 0182
08: 0000 0000 0000 0000 0000 0000 0000 0000
10: 0000 0000 0000 0000 0000 0000 0000 0000
18: 0000 0000 0000 0000 0000 0000 0000 0000
20: 0000 0000 0000 0000 0000 0000 0000 0000
28: 0000 0000 0000 0000 0000 0000 0000 0000
30: 0000 0000 0000 0000 0000 0000 0000 0000
38: 0000 0000 0000 0000 0000 0000 0000 0000
40: 0000 0000 0000 0000 0000 0000 0000 0000
48: 0000 0000 0000 0000 0000 0000 0000 0000
50: 0000 0000 0000 0000 0000 0000 0000 0000
58: 0000 0000 0000 0000 0000 0000 0000 0000
60: 0000 0000 0000 0000 0000 0000 0000 0000
68: 0000 0000 0000 0000 0000 0000 0000 0000
70: 0000 0000 0000 0000 0000 0000 0000 0000
78: 0000 0000 0000 0000 0000 0000 0000 0000
80: 0000 0000 0000 0000 0000 0000 0000 0000
88: 0000 0000 0000 0000 0000 0000 0000 0000
90: 0000 0000 0000 0000 0000 0000 0000 0000
98: 0000 0000 0000 0000 0000 0000 0000 0000
a0: 0000 0000 0000 0000 0000 0000 0000 0000
a8: 0000 0000 0000 0000 0000 0000 0000 0000
b0: 0000 0000 0000 0000 0000 0000 0000 0000
b8: 0000 0000 0000 0000 0000 0000 0000 0000
c0: 0000 0000 0000 0000 0000 0000 0000 0000
c8: 0000 0000 0000 0000 0000 0000 0000 0000
d0: 0000 0000 0000 0000 0000 0000 0000 0000
d8: 0000 0000 0000 0000 0000 0000 0000 0000
e0: 0000 0000 0000 0000 0000 0000 0000 0000
e8: 0000 0000 0000 0000 0000 0000 0000 0000
f0: 0000 0000 0000 0000 0000 0000 0000 0000
f8: 0000 0000 0000 0000 0000 0000 0000 0000

>Also, do the interrupts stop when you unload the driver ?

No, they only stop when I do a complete server reboot.
Comment 7 Conrad Kostecki 2016-11-28 17:46:49 UTC
Ah, forgot to add. Loading the old "eeprom"-module causes the same problem with the interrupts, see [1]. Maybe this is somehow connected?

[1] https://bugzilla.kernel.org/show_bug.cgi?id=177291
Comment 8 Guenter Roeck 2016-11-28 17:59:07 UTC
This is an Atmel AT30TS00. Per configuration register, events are disabled, and there is no event pending, meaning it should not really be the JC42s generating the interrupts.
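
The reading above can be reproduced from the first row of the i2cdump output with a small sketch. It assumes that i2cdump's word mode shows each 16-bit register with its two bytes swapped relative to the JC42 register value, and it uses bit positions for the configuration register (event output enable = bit 3, event status = bit 4, shutdown = bit 8) as I understand them from the JEDEC JC-42.4 register layout; please check them against the AT30TS00 datasheet. All function names are illustrative.

```python
# First row of "i2cdump -y -f 1 0x18 w" from comment 6 (registers 0x00-0x07).
DUMP_ROW = [0xEF00, 0x0000, 0x0005, 0x0000, 0x0005, 0xC801, 0x1F00, 0x0182]

# Assumed configuration register bits (JC-42.4 layout, to be verified).
CFG_EVENT_OUTPUT_ENABLE = 1 << 3
CFG_EVENT_STATUS = 1 << 4
CFG_SHUTDOWN = 1 << 8


def swap16(word):
    """Swap the two bytes of a 16-bit value."""
    return ((word & 0xFF) << 8) | (word >> 8)


def decode_temperature(reg):
    """Decode a 13-bit two's complement value in units of 1/16 degC."""
    raw = reg & 0x1FFF
    if raw & 0x1000:
        raw -= 0x2000
    return raw / 16.0


def config_flags(config):
    """Return the three configuration flags of interest as booleans."""
    return {
        "event_output_enabled": bool(config & CFG_EVENT_OUTPUT_ENABLE),
        "event_pending": bool(config & CFG_EVENT_STATUS),
        "shutdown": bool(config & CFG_SHUTDOWN),
    }


regs = [swap16(w) for w in DUMP_ROW]
config = regs[1]               # configuration register
upper_limit = decode_temperature(regs[2])
temperature = decode_temperature(regs[5])
manufacturer_id, device_id = regs[6], regs[7]
```

With the dump from comment 6, the configuration register is 0x0000 (no flag set), the upper limit decodes to 80.0 and the temperature to 28.5, and the ID registers come out as 0x001f and 0x8201.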

Another question: If you only load the i801 module after boot (ie prevent the jc42 module from loading, eg by blacklisting it, but still load the i801 module), do you still get the interrupts ?

Thanks,
Guenter
Comment 9 Conrad Kostecki 2016-11-28 18:01:22 UTC
(In reply to Guenter Roeck from comment #8)
> Another question: If you only load the i801 module after boot (ie prevent
> the jc42 module from loading, eg by blacklisting it, but still load the i801
> module), do you still get the interrupts ?

That's my current situation ;-) jc42 is only a module, which is currently not being loaded at system startup and i801 is compiled into my kernel. In such case, zero interrupts are generated on i801_smbus.

Cheers
Conrad
Comment 10 Guenter Roeck 2016-11-28 18:07:07 UTC
#7 suggests a problem with the i801 driver and its interrupt handling. #9 contradicts that a bit, though.

Maybe the C2000 has problems with interrupts, or implements it differently than handled by the driver. This may be triggered by an actual access on the bus. You could try to confirm it by running the i2cdump command after booting without the jc42 module loaded (i2cdetect -y 1 should show no reserved addresses) and see if the interrupts start happening.

Thanks,
Guenter
Comment 11 Conrad Kostecki 2016-11-28 18:23:06 UTC
(In reply to Guenter Roeck from comment #10)
> #7 suggests a problem with the i801 driver and its interrupt handling. #9
> contradicts that a bit, though.
> 
> Maybe the C2000 has problems with interrupts, or implements it differently
> than handled by the driver. This may be triggered by an actual access on the
> bus. You could try to confirm it by running the i2cdump command after
> booting without the jc42 module loaded (i2cdetect -y 1 should show no
> reserved addresses) and see if the interrupts start happening.
> 
> Thanks,
> Guenter

You nailed it ;-) Right after executing "i2cdump -y -f 1 0x18 w", the interrupts start massively. But jc42 wasn't loaded.

Cheers
Conrad
Comment 12 Conrad Kostecki 2016-11-28 18:32:34 UTC
Sorry, but I don't know what you mean by "reserved" here.

Before/after executing i2cdump (output is the same):

╭─root@Galactica ~
╰─➤  i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- 18 19 1a 1b -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2e --
30: 30 31 32 33 -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- 49 -- -- -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- 69 -- -- 6c -- -- --
70: -- -- -- -- -- -- -- --

A simple "i2cdetect -y 1" also triggers the interrupts.
Comment 13 Guenter Roeck 2016-11-28 18:44:12 UTC
With "reserved" I meant "a driver for a chip is loaded". After you load the jc42 driver (or the eeprom driver), you'll see that some of the addresses show up as "UU".

Anyway, I think the conclusion is that the i801 driver has problems with interrupt support on your hardware, as I suspected in #10. Issue #177291 is really the same problem. Jean maintains that driver as well, so he should be able to help.
Comment 14 Conrad Kostecki 2016-11-28 18:50:43 UTC
(In reply to Guenter Roeck from comment #13)
> With "reserved" I meant "a driver for a chip is loaded". After you load the
> jc42 driver (or the eeprom driver), you'll see that some of the addresses
> show up as "UU".

Ah I see. Yes, after loading jc42, I can see "UU".

╭─root@Galactica ~
╰─➤  i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- UU UU UU UU -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2e --
30: 30 31 32 33 -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- 49 -- -- -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- 69 -- -- 6c -- -- --
70: -- -- -- -- -- -- -- --

> Anyway, I think the conclusion is that the i801 driver has problems with
> interrupt support on your hardware, as I suspected in #10. Issue #177291 is
> really the same problem. Jean maintains that driver as well, so he should be
> able to help.

Should I close #177291 as a duplicate, as it's my ticket?
Thanks for your support. I hope Jean has an idea :)
Comment 15 Jean Delvare 2016-11-29 08:32:39 UTC
Thanks Guenter for stepping in. I always suspected the problem was with the SMBus controller (i2c-i801 driver) and I intended to comment about it long ago but then forgot, sorry about that :-(
Comment 16 Jean Delvare 2016-11-29 08:41:00 UTC
Conrad, I need detailed information about the SMBus PCI devices and the IRQs on your machine. Please attach the output of:

$ /sbin/lspci -nn | grep SMBus

$ /sbin/lspci -xxx -s <device>
(for each device listed above)

$ cat /proc/interrupts

Also look for any message related to i2c, SMBus, i801 or the PCI devices above in the kernel logs.
Comment 17 Conrad Kostecki 2016-11-29 08:57:43 UTC
Hello Jean!

(In reply to Jean Delvare from comment #16)
> $ /sbin/lspci -nn | grep SMBus

00:13.0 System peripheral [0880]: Intel Corporation Atom processor C2000 SMBus 2.0 [8086:1f15] (rev 02)
00:1f.3 SMBus [0c05]: Intel Corporation Atom processor C2000 PCU SMBus [8086:1f3c] (rev 02)
 
> $ /sbin/lspci -xxx -s <device>
> (for each device listed above)

╭─root@Galactica /home/kostecki  
╰─➤  lspci -xxx -s 00:13.0
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0 (rev 02)
00: 86 80 15 1f 46 05 10 00 02 00 80 08 00 00 00 00
10: 04 40 f1 ff 0f 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01
90: 0c f0 ef fe 00 00 00 00 a6 41 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 01 00 10 00 10 80
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

╭─root@Galactica /home/kostecki  
╰─➤  lspci -xxx -s 00:1f.3
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)
00: 86 80 3c 1f 43 01 98 02 02 00 05 0c 00 00 00 00
10: 00 00 50 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00
40: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 03 04 04 00 00 00 08 08 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 0f 02 01 03 03 03 00


> $ cat /proc/interrupts
 
See attachment.

> Also look for any message related to i2c, SMBus, i801 or the PCI devices
> above in the kernel logs.

╭─root@Galactica /
╰─➤  dmesg|grep -i smbus

[    7.968653] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[    7.970338] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    7.974068] ismt_smbus 0000:00:13.0: enabling device (0140 -> 0142)
[  974.471917] ismt_smbus 0000:00:13.0: completion wait timed out
[  975.512022] ismt_smbus 0000:00:13.0: completion wait timed out
[  976.552097] ismt_smbus 0000:00:13.0: completion wait timed out
[  977.592124] ismt_smbus 0000:00:13.0: completion wait timed out
[  978.632168] ismt_smbus 0000:00:13.0: completion wait timed out
[  979.682207] ismt_smbus 0000:00:13.0: completion wait timed out
[  980.712251] ismt_smbus 0000:00:13.0: completion wait timed out
[  981.752310] ismt_smbus 0000:00:13.0: completion wait timed out

The timeout messages are only shown, when I do load jc42.
I am also attaching my whole dmesg.

Cheers
Conrad
Comment 18 Conrad Kostecki 2016-11-29 08:58:23 UTC
Created attachment 246221 [details]
cat /proc/interrupts
Comment 19 Conrad Kostecki 2016-11-29 08:58:35 UTC
Created attachment 246231 [details]
dmesg output
Comment 20 Jean Delvare 2016-11-29 10:45:16 UTC
Can you blacklist ismt-msi, reboot and see if it makes any difference?
Comment 21 Conrad Kostecki 2016-11-29 11:28:39 UTC
(In reply to Jean Delvare from comment #20)
> Can you blacklist ismt-msi, reboot and see if it makes any difference?

No, it didn't change anything. I've compiled a new kernel without ismt-msi (CONFIG_I2C_ISMT=n), and the interrupt count still goes very high after loading jc42.
Comment 22 Jean Delvare 2016-11-29 11:56:23 UTC
OK, thanks. I have added Intel folks to Cc. I can't find the register descriptions for the Atom C2000 SMBus function, so there's not so much I can do.

Conrad, support for the SMBus in this CPU family was added several years ago to the i2c-i801 driver, so I am wondering why this bug is only reported now.

Is this new hardware for you? Or have you had it for some time, and it was working fine so far, and broke with a kernel or OS update?
Comment 23 Jarkko Nikula 2016-11-29 12:42:30 UTC
I found some datasheet through Avoton C2750
http://ark.intel.com/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-GHz
->
https://www-ssl.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-c2000-microserver-datasheet.pdf

I guess both C2758 and C2750 are compatible as they are listed in C2000 Product Family for Communications.
Comment 24 Conrad Kostecki 2016-11-29 13:07:42 UTC
(In reply to Jean Delvare from comment #22)
> Is this new hardware for you? Or you have it for some time, and it was
> working fine so far, and broke with a kernel or OS update?

Yes, this is new hardware. I bought it a few weeks before starting this ticket. So I can't tell, if it was working before.

(In reply to Jarkko Nikula from comment #23)
> I found some datasheet through Avoton C2750
> http://ark.intel.com/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-
> GHz
> ->
> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/datasheets/
> atom-c2000-microserver-datasheet.pdf
> 
> I guess both C2758 and C2750 are compatible as they are listed in C2000
> Product Family for Communications.

The C2750 has turbo boost; the C2758 has a QuickAssist accelerator instead of turbo boost. (I don't know if this makes a difference for the registers.)
Comment 25 Jean Delvare 2016-11-29 18:58:21 UTC
Jarkko, I found the same document, however it doesn't appear to contain register definitions, or I am blind.
Comment 26 Conrad Kostecki 2016-11-29 22:32:33 UTC
(In reply to Jean Delvare from comment #25)
> Jarkko, I found the same document, however it doesn't appear to contain
> register definitions, or I am blind.

Maybe chapters 15.8 and 18.5? Sorry if that's wrong; I don't know if that's what you are looking for.
Comment 27 Guenter Roeck 2016-11-29 23:27:16 UTC
Problem is that only the register addresses are provided, not the register definitions. Sure, there is a status register, and we know its address, but we don't know how the bits are defined and if they are defined exactly like in other Intel CPUs.

With the C2000 being a different micro-architecture than the "mainline" Intel CPUs, there is a real possibility that the register definitions are different.
Comment 28 Jarkko Nikula 2016-11-30 07:36:06 UTC
Sorry, I looked at it too quickly. Indeed, the definitions are missing. I'll ask http://ark.intel.com/ whether a more detailed datasheet is available.
Comment 29 Jean Delvare 2016-11-30 08:07:57 UTC
Conrad, until we sort it out, you may be able to work around the problem by passing option disable_features=0x10 to the i2c-i801 driver.
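
For reference, disable_features is a bitmask. The sketch below decodes a value into the features it turns off. Only the 0x10 bit (no interrupts) is confirmed by this thread; the other entries are how I recall the module parameter description in the i2c-i801 driver, so treat them as assumptions and compare against the output of "modinfo i2c_i801" on your kernel. The function name `describe_disable_features` is made up for illustration.

```python
# Assumed meaning of the disable_features bits (only 0x10 is confirmed here).
DISABLE_FEATURES = {
    0x01: "disable SMBus PEC",
    0x02: "disable the block buffer",
    0x08: "disable the I2C block read functionality",
    0x10: "don't use interrupts",
    0x20: "disable SMBus Host Notify (newer kernels)",
}


def describe_disable_features(mask):
    """List the effects of each bit set in a disable_features value."""
    effects = [text for bit, text in sorted(DISABLE_FEATURES.items()) if mask & bit]
    unknown = mask & ~sum(DISABLE_FEATURES)
    if unknown:
        effects.append("unknown bits: 0x%02x" % unknown)
    return effects
```

To make the workaround persistent, the usual places are an "options i2c_i801 disable_features=0x10" line in a file under /etc/modprobe.d/ when the driver is a module, or "i2c_i801.disable_features=0x10" on the kernel command line when it is built in.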
Comment 30 Conrad Kostecki 2016-11-30 08:53:50 UTC
(In reply to Jean Delvare from comment #29)
> Conrad, until we sort it out, you may be able to work around the problem by
> passing option disable_features=0x10 to the i2c-i801 driver.

Hey Jean,
seems to help as a workaround after disabling the interrupts for i2c-i801.

[    7.950079] i801_smbus 0000:00:1f.3: Interrupt disabled by user
[    7.951624] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[    7.953270] i801_smbus 0000:00:1f.3: SMBus using polling

Cheers
Conrad
Comment 31 Conrad Kostecki 2017-03-23 17:12:02 UTC
*** Bug 177291 has been marked as a duplicate of this bug. ***
Comment 32 Conrad Kostecki 2017-03-23 17:13:48 UTC
Any news for me? :)
Comment 33 Jean Delvare 2017-03-28 09:37:42 UTC
Jarkko, were you able to get your hands on a datasheet? It doesn't need to be public, if you can check the register definitions for us.
Comment 34 Jarkko Nikula 2017-03-28 10:29:48 UTC
I got one contact info back in December but no response. Maybe busy before holidays and I forgot to ping again. I'll ask again.
Comment 35 Conrad Kostecki 2017-05-07 22:08:40 UTC
(In reply to Jarkko Nikula from comment #34)
> I got one contact info back in December but no response. Maybe busy before
> holidays and I forgot to ping again. I'll ask again.

Did you get any reply?
Comment 36 Jarkko Nikula 2017-05-08 08:20:35 UTC
Only an out-of-office reply back in March, but I have pinged again now.
Comment 37 Conrad Kostecki 2017-06-10 14:47:28 UTC
(In reply to Jarkko Nikula from comment #36)
> Just only out of office reply back in March but pinged again now.

And now? ;-)
Comment 38 Andy Shevchenko 2021-02-26 14:41:32 UTC
Hmm... Seems this one gets somehow abandoned. Jarkko, any news on this? Same question to Conrad, do you have any luck with v5.11 based kernels (or closer to latest)?
Comment 39 Conrad Kostecki 2021-02-26 16:36:38 UTC
(In reply to Andy Shevchenko from comment #38)
> Hmm... Seems this one gets somehow abandoned. Jarkko, any news on this? Same
> question to Conrad, do you have any luck with v5.11 based kernels (or closer
> to latest)?

Nope. No news. Problem still exists with latest kernel.
Comment 40 Jarkko Nikula 2021-03-01 13:16:37 UTC
Unfortunately I don't have any updates on this.
Comment 41 Andy Shevchenko 2021-06-08 08:32:57 UTC
This bug gave me the idea to try MSI on i801, but it appears that none of the platforms have MSI capability on this device. Not sure if this is useful information, but I think it's better to share it anyway.
Comment 42 stephane.poignant 2021-10-09 11:22:50 UTC
Not sure that's completely related, but would assume at least partially.
I have two mini-servers, one with a Supermicro A2SDi-8C-HLN4F (Atom C3758), and the other one with an older Supermicro A1SRM-2758F (Atom C2758F).

I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye (5.10.46-5). No issue on the C3758, but I was faced with a severe performance regression on the C2758F.

When running 5.10 on the C2758F, /proc/interrupts shows about 100k interrupts per second for 'IO-APIC 18-fasteoi i801_smbus', and overall performance suffers a lot (e.g. iperf between two KVM virtual machines bridged together is 93% slower with 5.10 than with 4.19).

So far I was getting around the issue by blocklisting i2c_i801. After I found this, I tried adding the disable_features=0x10 option, and that worked too.

I'm not using jc42 at all, sensors thresholds are set to correct values by the distro tools.

# i2cdetect -l

# sensors
nvme-pci-0400
Adapter: PCI adapter
Composite:    +30.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +30.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 1:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 2:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 3:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 4:       +47.0°C  (high = +98.0°C, crit = +98.0°C)
Core 5:       +46.0°C  (high = +98.0°C, crit = +98.0°C)
Core 6:       +47.0°C  (high = +98.0°C, crit = +98.0°C)
Core 7:       +47.0°C  (high = +98.0°C, crit = +98.0°C)

# dmesg | egrep -i '(smbus|i801)'
[    2.226240] ismt_smbus 0000:00:13.0: enabling device (0000 -> 0002)
[    2.229927] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[    2.230089] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[    2.230136] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt

~# lspci -nn | grep SMBus
00:13.0 System peripheral [0880]: Intel Corporation Atom processor C2000 SMBus 2.0 [8086:1f15] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation Atom processor C2000 PCU SMBus [8086:1f3c] (rev 03)

# lspci -xxx -s 00:13.0
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0 (rev 03)
00: 86 80 15 1f 06 04 10 00 03 00 80 08 00 00 00 00
10: 04 70 31 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01
90: 04 00 e4 fe 00 00 00 00 21 40 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 01 00 10 00 10 80
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# lspci -xxx -s 00:1f.3
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 03)
00: 86 80 3c 1f 03 00 98 02 03 00 05 0c 00 00 00 00
10: 00 40 31 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00
40: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 03 04 04 00 00 00 08 08 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 0f 02 01 03 03 03 00
Comment 43 Conrad Kostecki 2021-10-09 13:10:35 UTC
Yes, this is the same problem here. But Intel doesn't seem to be interested :-(
Comment 44 Jarkko Nikula 2021-10-11 13:07:56 UTC
(In reply to stephane.poignant from comment #42)
> I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
> (5.10.46-5). No issue on the C3758, but i was faced with severe performance
> regression on the C2758F.
> 
Interesting, so was the 4.19 working on the C2758F without interrupt storm?
Comment 45 stephane.poignant 2021-10-11 19:37:23 UTC
(In reply to Jarkko Nikula from comment #44)
> (In reply to stephane.poignant from comment #42)
> > I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
> > (5.10.46-5). No issue on the C3758, but i was faced with severe performance
> > regression on the C2758F.
> > 
> Interesting, so was the 4.19 working on the C2758F without interrupt storm?

I haven't checked /proc/interrupts when running 4.19, so I cannot tell for sure that the interrupts were not there. The performance regression was not there for sure. I can check this in a couple of weeks (the server is at a remote location with no OOBM network).

Dmesg when running 4.19 shows it had interrupts enabled:

[    0.000000] Linux version 4.19.0-17-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.194-3 (2021-07-18)
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-17-amd64 root=/dev/mapper/vg1--hrbpsrv01-h--hrbpsrv01 ro quiet rd.luks.options=discard
...
[    1.434097] Run /init as init process
[    1.782787] dca service started, version 1.12.1
[    1.783203] ismt_smbus 0000:00:13.0: enabling device (0000 -> 0002)
[    1.796694] cryptd: max_cpu_qlen set to 1000
[    1.801177] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[    1.801317] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[    1.801356] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    1.805199] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[    1.805202] igb: Copyright (c) 2007-2014 Intel Corporation.
[    1.805246] igb 0000:00:14.0: enabling device (0000 -> 0002)
[    1.816722] SSE version of gcm_enc/dec engaged.
...
Comment 46 Conrad Kostecki 2021-10-11 20:26:40 UTC
The problem does persist in kernel 4.19 and other versions. It only depends on whether some other driver triggers the interrupts. If so, the count goes very high. So it's possible that no driver was using those interrupts in your 4.19 setup and, as a consequence, the bug did not trigger.

@Jarkko Nikula: Since you are still replying, could you please try again and further to get the needed docs, as requested by Jean Delvare?
Comment 47 Jarkko Nikula 2021-10-13 11:37:43 UTC
@Conrad Kostecki: Yeah, I agree with you that it's unlikely the problem was absent in 4.19, as it was present way before it.

I was in contact with our sales support and they told me the Atom C2758 with the F suffix is custom to Supermicro. Unfortunately they didn't find an explicit specification for the SMBus controller on it, but they told me it's based on the same 22 nm Silvermont architecture as Bay Trail. I suppose the SMBus IO should be compatible.

Unfortunately public datasheets for Bay Trail seem scarce too, but I was able to find something when searching for datasheets for the Bay Trail E3825 used in the MinnowBoard Max. The following document seems to be available to registered ark.intel.com users or through search engines:

"Intel Atom ® Processor E3800 Product Family" with Document Number: 538136 and Chapter 33 "PCU – System Management Bus (SMBus)"
Comment 48 Jarkko Nikula 2021-10-13 11:39:05 UTC
Created attachment 299193 [details]
Debug patch for the i2c-i801 interrupts
Comment 49 Jarkko Nikula 2021-10-13 11:44:31 UTC
Could you try the attached patch and report which interrupt statuses it prints during an interrupt storm? The debug print is rate limited, so it shouldn't flood dmesg.

You need to have CONFIG_DYNAMIC_DEBUG=y in your kernel config and either enable the debug print at runtime as follows:

mount none /sys/kernel/debug -t debugfs
echo -n "func i801_isr +p" >/sys/kernel/debug/dynamic_debug/control

or by appending this to your kernel command line:
i2c_i801.dyndbg="func i801_isr +p"
Comment 50 Conrad Kostecki 2021-10-13 22:18:43 UTC
Here is the output:

pcicst 0x298, SMBHSTSTS 0x60
[  359.205884] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  359.205918] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210031] i801_isr: 375367 callbacks suppressed
[  364.210043] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210085] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210126] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210142] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210178] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210217] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210234] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210253] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210292] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210329] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220035] i801_isr: 380909 callbacks suppressed
[  369.220047] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220069] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220109] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220146] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220185] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220222] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220262] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220278] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220317] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220333] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230078] i801_isr: 393736 callbacks suppressed
[  374.230109] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230151] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230191] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230210] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230248] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230283] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230297] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230332] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230345] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230358] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240037] i801_isr: 382705 callbacks suppressed
[  379.240068] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240090] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240110] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240130] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240150] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240186] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240205] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240242] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240281] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240297] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250032] i801_isr: 387109 callbacks suppressed
[  384.250043] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250065] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250104] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250141] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250181] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250197] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250216] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250255] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250292] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250311] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
 18:          0          0          0   26596692          0          0          0          0   IO-APIC  18-fasteoi   i801_smbus
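The "callbacks suppressed" counts in the log above give a rough estimate of the storm rate. The sketch below assumes the debug print uses the kernel's default rate limit (a 5 second window with a burst of 10 messages, which matches the 10 lines printed per window here); the function name `storm_rates` is made up for illustration.

```python
import re

# Assumed kernel defaults for printk rate limiting: 5 s window, 10 messages.
RATELIMIT_INTERVAL_S = 5.0
RATELIMIT_BURST = 10

# "callbacks suppressed" lines copied from the dmesg output above.
SAMPLE = """\
[  364.210031] i801_isr: 375367 callbacks suppressed
[  369.220035] i801_isr: 380909 callbacks suppressed
[  374.230078] i801_isr: 393736 callbacks suppressed
[  379.240037] i801_isr: 382705 callbacks suppressed
[  384.250032] i801_isr: 387109 callbacks suppressed
"""


def storm_rates(log, interval=RATELIMIT_INTERVAL_S, burst=RATELIMIT_BURST):
    """Estimate ISR calls per second for each rate-limit window in the log.

    Each window contributes the suppressed count plus the messages that
    were actually printed (at most `burst`).
    """
    counts = [int(m.group(1))
              for m in re.finditer(r"(\d+) callbacks suppressed", log)]
    return [(count + burst) / interval for count in counts]


if __name__ == "__main__":
    for rate in storm_rates(SAMPLE):
        print(f"{rate:,.0f} interrupts/s")
```

Under those assumptions, every window works out to somewhere between 75,000 and 79,000 interrupts per second.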
Comment 51 Jarkko Nikula 2021-10-14 10:58:00 UTC
Thanks. Those debug prints confirm the interrupt is really coming from the SMBus controller (bit 3 is set in PCI status) and the SMB alert bit is set.
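For readers who want to check the two values themselves, here is a small decoding sketch. The SMBHSTSTS bit names follow the definitions in the i2c-i801 driver and the PCI status names follow the standard PCI status register layout; the helper `decode` and the shortened name tables are only illustrative, and bits without an entry in a table are printed as `bitN`.

```python
# PCI status register bits relevant here (standard PCI layout).
PCI_STATUS_BITS = {
    3: "INTERRUPT_STATUS",
    4: "CAP_LIST",
    7: "FAST_BACK_TO_BACK",
}

# SMBus host status register bits, named as in the i2c-i801 driver.
SMBHSTSTS_BITS = {
    0: "HOST_BUSY",
    1: "INTR",
    2: "DEV_ERR",
    3: "BUS_ERR",
    4: "FAILED",
    5: "SMBALERT_STS",
    6: "INUSE_STS",
    7: "BYTE_DONE",
}


def decode(value, names, width=16):
    """Return the names of all bits set in `value`, lowest bit first."""
    return [names.get(bit, f"bit{bit}")
            for bit in range(width)
            if value & (1 << bit)]


if __name__ == "__main__":
    print("pcicst    0x298:", decode(0x298, PCI_STATUS_BITS))
    print("SMBHSTSTS 0x60: ", decode(0x60, SMBHSTSTS_BITS, width=8))
```

In `pcicst 0x298` bit 3 (interrupt status) is set, and in `SMBHSTSTS 0x60` bits 5 and 6 are set, that is SMBALERT_STS and INUSE_STS.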
Comment 52 Jarkko Nikula 2021-10-14 10:58:55 UTC
Created attachment 299201 [details]
Experimental patch disabling SMB_ALERT signal
Comment 53 Jarkko Nikula 2021-10-14 11:03:47 UTC
@Conrad Kostecki: Could you check whether the attached experimental patch, which disables SMB_ALERT, helps here?
Comment 54 stephane.poignant 2021-10-14 20:10:57 UTC
Thanks for the follow-up. I will test the patch on my setup as well by next week.
Comment 55 Conrad Kostecki 2021-10-14 20:53:07 UTC
I just tested the patch and can confirm, it works. After applying patch, interrupts dropped nearly to zero on i801_smbus.
Comment 56 Andy Shevchenko 2021-10-14 21:00:04 UTC
(In reply to Conrad Kostecki from comment #55)
> I just tested the patch and can confirm, it works. After applying patch,
> interrupts dropped nearly to zero on i801_smbus.

According to the specification, a host that implements ALERT should issue a special byte read command to find out which device wants to send something. If a proper implementation doesn't fix this, it might be a pin configuration issue (such as a pull-down sitting on the respective pin) or a PCB or firmware (BIOS) issue.
It would be nice to understand, if it can be done without much effort, what exactly is causing ALERT to be asserted.
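The "special byte read" is a read from the SMBus Alert Response Address (0x0C): the alerting device with the lowest address wins arbitration, answers with its own address, and releases the alert line. The following is a toy model of that handshake only, not driver code; the class and function names and the `line_stuck_low` flag (standing in for a board-level pull-down) are assumptions made for illustration.

```python
ALERT_RESPONSE_ADDRESS = 0x0C  # reserved SMBus address used for the ARA read


class AlertingDevice:
    """Hypothetical slave device that may pull the shared alert line low."""

    def __init__(self, address, alerting=False):
        self.address = address
        self.alerting = alerting


def alert_line_asserted(devices, line_stuck_low=False):
    """The alert line is wired-AND: any alerting device (or a fault) pulls it low."""
    return line_stuck_low or any(d.alerting for d in devices)


def read_alert_response(devices):
    """Model a byte read from the ARA.

    The lowest-address alerting device wins arbitration, returns its
    address and stops alerting. Returns None when no device responds.
    """
    alerting = [d for d in devices if d.alerting]
    if not alerting:
        return None
    winner = min(alerting, key=lambda d: d.address)
    winner.alerting = False
    return winner.address


def service_alerts(devices, line_stuck_low=False):
    """Poll the ARA until the line is released or nobody answers."""
    handled = []
    while alert_line_asserted(devices, line_stuck_low):
        address = read_alert_response(devices)
        if address is None:
            # Line is low but no device claims it: points to a pin, PCB or
            # firmware problem rather than a device that needs servicing.
            break
        handled.append(address)
    return handled


if __name__ == "__main__":
    sensors = [AlertingDevice(0x1A, True), AlertingDevice(0x18, True)]
    print([hex(a) for a in service_alerts(sensors)])
    print(service_alerts([], line_stuck_low=True))
```

The second call models the case raised above: the line is asserted, but the ARA read finds no device to acknowledge, so proper alert handling alone would not release the line.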
Comment 57 Jarkko Nikula 2021-10-15 08:04:32 UTC
I was also wondering whether there should be proper acknowledging for SMB_ALERT, but since the driver currently doesn't support it, I wanted to see whether simply disabling it helps.

Fortunately, I was able to reproduce the issue locally on another platform where SMB_ALERT was connected to a resistor, so I could pull the signal down with a wire. The interrupt storm begins when SMB_ALERT goes low for a moment and continues afterwards.

I'll test a bit more and make a proper patch. One thing I'm wondering is whether the driver should restore the original disable status on driver removal, as is done for host notify in i801_disable_host_notify().
Comment 58 Jarkko Nikula 2021-10-15 14:12:40 UTC
Created attachment 299217 [details]
2nd version of patch disabling SMB_ALERT signal

I moved the SMB_ALERT signal disabling to i801_enable_host_notify(), since the SMBSLVCMD register is available on ICH3 and later. It also keeps the original value from before driver load.
Comment 59 Andy Shevchenko 2021-10-15 14:27:07 UTC
(In reply to Jarkko Nikula from comment #58)
> 2nd version of patch disabling SMB_ALERT signal

Side remark: Looking into this code, shouldn't you first clean current notifications and only after that enable IRQ?
Comment 60 Conrad Kostecki 2021-10-15 22:39:15 UTC
Patch v2 works for me. Interrupts still are fine and do not go crazy.
Comment 61 stephane.poignant 2021-10-16 00:41:12 UTC
I can confirm that I am getting the same results with the two patches on my setup with the Debian kernels.
The debug patch produces the same messages, and with the SMB_ALERT disable patch no interrupts are triggered any more.

Also, when booting into the previous kernel I was using (linux-image-4.19.0-17-amd64 4.19.194-3), the module loads with the default config but I am not getting any interrupts. So for my particular setup the issue only appeared after upgrading from Debian kernel 4.19 to 5.10.

Will test the second version of the patch ASAP and provide you with the results.


## Kernel 4.19

# uname -a
Linux hrbpsrv01.intra.lan 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0          0          0          0   IO-APIC  18-fasteoi   i801_smbus

# dmesg
...
[ 6652.023634] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[ 6652.023689] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
...


## Debian linux-image-5.10.0-9-amd64 (5.10.70-1) + Debug patch

# uname -a
Linux hrbpsrv01.intra.lan 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0    7358862          0          0   IO-APIC  18-fasteoi   i801_smbus
(increase at about 100k interrupts/sec)

# dmesg
...
[  516.429120] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[  516.429140] i801_smbus 0000:00:1f.3: An interrupt is pending!
[  516.429161] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[  516.429933] i2c i2c-1: 4/4 memory slots populated (from DMI)
[  516.430337] at24 1-0050: supply vcc not found, using dummy regulator
[  516.431043] at24 1-0050: 256 byte spd EEPROM, read-only
[  516.431078] i2c i2c-1: Successfully instantiated SPD at 0x50
[  516.431455] at24 1-0051: supply vcc not found, using dummy regulator
[  516.432148] at24 1-0051: 256 byte spd EEPROM, read-only
[  516.432174] i2c i2c-1: Successfully instantiated SPD at 0x51
[  516.432576] at24 1-0052: supply vcc not found, using dummy regulator
[  516.433284] at24 1-0052: 256 byte spd EEPROM, read-only
[  516.433325] i2c i2c-1: Successfully instantiated SPD at 0x52
[  516.433748] at24 1-0053: supply vcc not found, using dummy regulator
[  516.434454] at24 1-0053: 256 byte spd EEPROM, read-only
[  516.434497] i2c i2c-1: Successfully instantiated SPD at 0x53
[  525.513104] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513133] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513161] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513185] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513209] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513234] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513258] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513281] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513316] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513352] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514207] i801_isr: 297603 callbacks suppressed
[  530.514221] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514259] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514299] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514331] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514366] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514391] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514425] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514457] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514482] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514507] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518261] i801_isr: 320308 callbacks suppressed
[  535.518273] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518311] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518337] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518362] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518386] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518415] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518442] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518467] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518491] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518516] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
...


## Kernel 5.10 + Disable SMB_ALERT patch

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0   10567596          0          0   IO-APIC  18-fasteoi   i801_smbus
(no longer increase)

# dmesg
...
[  664.110013] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[  664.110065] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[  664.111975] i2c i2c-1: 4/4 memory slots populated (from DMI)
[  664.112460] at24 1-0050: supply vcc not found, using dummy regulator
[  664.113195] at24 1-0050: 256 byte spd EEPROM, read-only
[  664.113240] i2c i2c-1: Successfully instantiated SPD at 0x50
[  664.113657] at24 1-0051: supply vcc not found, using dummy regulator
[  664.114374] at24 1-0051: 256 byte spd EEPROM, read-only
[  664.114412] i2c i2c-1: Successfully instantiated SPD at 0x51
[  664.114823] at24 1-0052: supply vcc not found, using dummy regulator
[  664.116794] at24 1-0052: 256 byte spd EEPROM, read-only
[  664.116838] i2c i2c-1: Successfully instantiated SPD at 0x52
[  664.117288] at24 1-0053: supply vcc not found, using dummy regulator
[  664.118042] at24 1-0053: 256 byte spd EEPROM, read-only
[  664.118092] i2c i2c-1: Successfully instantiated SPD at 0x53
Comment 62 stephane.poignant 2021-10-16 15:20:42 UTC
Patch V2 works for me too.

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0          8          0          0   IO-APIC  18-fasteoi   i801_smbus
Comment 63 Jarkko Nikula 2021-10-18 14:07:09 UTC
(In reply to Andy Shevchenko from comment #59)
> (In reply to Jarkko Nikula from comment #58)
> > 2nd version of patch disabling SMB_ALERT signal
> 
> Side remark: Looking into this code, shouldn't you first clean current
> notifications and only after that enable IRQ?

That's a good question, and it made me debug further. In fact, disabling doesn't disable detection: SMBALERT_STS will still be set and cause a short burst of interrupts at driver load and unload time if the SMB_ALERT signal was asserted. It looks like it's better to add basic acknowledging for it to i801_isr().

I'm not sure whether clearing pending interrupts at probe time would cause any regression, but acknowledging SMBALERT_STS in i801_isr() makes sure the status doesn't stay set forever if it occurs after probe.
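To illustrate why acknowledging matters, here is a deliberately simplified model, not the driver's code. It assumes a level-triggered interrupt that stays asserted while SMBALERT_STS is set, and a status register whose bits are cleared by writing 1 to them; the class `StatusRegister`, the function `count_isr_calls` and the `max_irqs` cap are invented for the example.

```python
SMBHSTSTS_SMBALERT_STS = 0x20  # bit 5 of the host status register


class StatusRegister:
    """Toy 8-bit status register with write-1-to-clear semantics."""

    def __init__(self):
        self.value = 0

    def hw_set(self, bits):
        # Hardware latches a status bit, e.g. when SMB_ALERT goes low.
        self.value |= bits & 0xFF

    def write(self, bits):
        # Software acknowledges by writing 1 to the bits it wants cleared.
        self.value &= ~bits & 0xFF


def count_isr_calls(acknowledge, max_irqs=1000):
    """Count how often the ISR runs for a single latched alert.

    The interrupt is modelled as level-triggered: the handler is called
    again for as long as SMBALERT_STS stays set. `max_irqs` only exists
    to stop the simulated storm.
    """
    reg = StatusRegister()
    reg.hw_set(SMBHSTSTS_SMBALERT_STS)
    calls = 0
    while reg.value & SMBHSTSTS_SMBALERT_STS and calls < max_irqs:
        calls += 1
        if acknowledge:
            reg.write(SMBHSTSTS_SMBALERT_STS)
    return calls


if __name__ == "__main__":
    print("without acknowledge:", count_isr_calls(False))
    print("with acknowledge:   ", count_isr_calls(True))
```

Without the acknowledge the handler is re-entered until the artificial cap is hit, which mirrors the storm seen in the logs; with it, a single alert costs a single interrupt.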
Comment 64 Andy Shevchenko 2021-10-18 15:18:18 UTC
(In reply to Jarkko Nikula from comment #63)
> (In reply to Andy Shevchenko from comment #59)
> > (In reply to Jarkko Nikula from comment #58)
> > > 2nd version of patch disabling SMB_ALERT signal
> > 
> > Side remark: Looking into this code, shouldn't you first clean current
> > notifications and only after that enable IRQ?
> 
> That's a good question, and it made me debug further. In fact, disabling
> doesn't disable detection: SMBALERT_STS will still be set and cause a short
> burst of interrupts at driver load and unload time if the SMB_ALERT signal
> was asserted. It looks like it's better to add basic acknowledging for it
> to i801_isr().
> 
> I'm not sure whether clearing pending interrupts at probe time would cause
> any regression, but acknowledging SMBALERT_STS in i801_isr() makes sure the
> status doesn't stay set forever if it occurs after probe.

It also makes sense to test it with DEBUG_SHIRQ enabled (yes, I know that more than half of the drivers in the Linux kernel will either crash or behave badly with it; not many developers know about this debugging feature).
Comment 65 Jean Delvare 2021-12-03 08:31:27 UTC
This bug is believed to be fixed in kernel v5.16 by the following 2 commits:

commit 03a976c9afb5e3c4f8260c6c08a27d723b279c92
Author: Jarkko Nikula
Date:   Wed Nov 17 11:45:09 2021 +0200

    i2c: i801: Fix interrupt storm from SMB_ALERT signal

commit 9b5bf5878138293fb5b14a48a7a17b6ede6bea25
Author: Jean Delvare
Date:   Tue Nov 9 16:02:57 2021 +0100

    i2c: i801: Restore INTREN on unload
Comment 66 Conrad Kostecki 2022-01-14 17:30:33 UTC
Upgraded to kernel 5.16 today; no more IRQ noise. Thank you!