Bug 213345 - i801_smbus: Timeout waiting for interrupt, driver can't access SMBus
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C
Hardware: All Linux
Importance: P1 low
Assignee: Drivers/I2C virtual user
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-05 17:32 UTC by Johannes Penßel
Modified: 2025-05-07 13:06 UTC

See Also:
Kernel Version: 5.10.27-5.14-rc3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (76.95 KB, text/plain)
2021-06-05 17:32 UTC, Johannes Penßel
DSDT.dsl (1.78 MB, text/x-csrc)
2021-08-22 11:10 UTC, Johannes Penßel
dmidecode output (22.09 KB, text/plain)
2024-05-04 02:22 UTC, ruirui.yang

Description Johannes Penßel 2021-06-05 17:32:15 UTC
Created attachment 297185
dmesg

Problem: On boot, i801_smbus reports these errors:
[    3.221208] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    3.221247] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt

[    3.423566] i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
[    3.423569] i801_smbus 0000:00:1f.4: Transaction timeout
[    3.425564] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[    3.425603] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!

Tested on Arch Linux / Gentoo, kernel 5.10.27, 5.12.x
HW: Lenovo Ideapad 5 15ITL05 i5-1135G7, latest firmware

Attempted fixes so far: disabling all other drivers accessing SMBus, using different SMBus access methods

lspci -k reports the driver loading correctly:
00:1f.4 SMBus: Intel Corporation Tiger Lake-LP SMBus Controller (rev 20)
        Subsystem: Lenovo Tiger Lake-LP SMBus Controller
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801

lm_sensors seems to have issues detecting SMBus:
Found unknown SMBus adapter 8086:a0a3 at 0000:00:1f.4.

i2cdetect -F output:
Functionalities implemented by /dev/i2c-12:
I2C                              no
SMBus Quick Command              yes
SMBus Send Byte                  yes
SMBus Receive Byte               yes
SMBus Write Byte                 yes
SMBus Read Byte                  yes
SMBus Write Word                 yes
SMBus Read Word                  yes
SMBus Process Call               no
SMBus Block Write                yes
SMBus Block Read                 yes
SMBus Block Process Call         yes
SMBus PEC                        yes
I2C Block Write                  yes
I2C Block Read                   yes
Comment 1 Johannes Penßel 2021-06-05 17:37:45 UTC
Note: accessing the SMBus device via i2cdetect causes the flood of "i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!" error messages seen in dmesg.
Comment 2 Johannes Penßel 2021-08-22 11:10:26 UTC
Created attachment 298413
DSDT.dsl

DSDT.dsl file for Lenovo IdeaPad 5 15ITL05 extracted from the latest FHCN57WW firmware.
Comment 3 Johannes Penßel 2021-08-22 11:12:01 UTC
https://lore.kernel.org/linux-i2c/CAJCQCtTB+KW596A1Q+Ds6u9uvUrqeOSmer6qKv7g+xRYijGS3A@mail.gmail.com/

This report describes the same issue on a ThinkPad X1 Carbon 7th gen running Linux 5.14-rc3. Maybe this is a Lenovo-specific problem?
Comment 4 Mateusz Jończyk 2022-01-09 18:48:45 UTC
Hello,

Did you try the "i2c-i801.disable_features=0x10" kernel command-line option, which stops the driver from using interrupts?

In the dmesg you have:
[    3.221208] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    3.221247] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[    3.222484] i2c i2c-12: 2/2 memory slots populated (from DMI)
[...]
[    3.423566] i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
[    3.423569] i801_smbus 0000:00:1f.4: Transaction timeout
[    3.425564] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[    3.425603] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!

It is possible that the SMBus hangs while trying to access SPD EEPROM chips located on RAM modules. So blacklisting the "eeprom" and "at24" modules may help here.

Why do you specifically want to access SMBus / the I2C bus? Is there a piece of hardware on the laptop that does not work?
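
For reference, the disable_features value is a bitmask. The sketch below decodes it using the bit meanings from the parameter description in drivers/i2c/busses/i2c-i801.c as I understand them; treat the table as an assumption and check it against `modinfo i2c_i801` on your kernel. The helper name decode_disable_features is made up for illustration.

```python
from typing import List

# Bit meanings as listed in the i2c-i801 "disable_features" parameter
# description (assumption: verify with `modinfo i2c_i801` on your kernel).
DISABLE_FEATURES = {
    0x01: "SMBus PEC",
    0x02: "block buffer",
    0x08: "I2C block read",
    0x10: "interrupts",
    0x20: "SMBus Host Notify",
}


def decode_disable_features(mask: int) -> List[str]:
    """Return the names of the driver features that a mask turns off."""
    return [name for bit, name in DISABLE_FEATURES.items() if mask & bit]


# The value suggested in this comment.
print(decode_disable_features(0x10))
```

So 0x10 only switches the driver from interrupts to polling; the other features stay enabled.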
Comment 5 Mateusz Jończyk 2022-01-09 18:59:52 UTC
Also try blacklisting the "i2c-i801" module and loading it manually ("modprobe i2c-i801 disable_features=0x10") after the laptop has finished booting. The problems may be caused by something else accessing the I2C bus during boot.
Comment 6 Luis Ortega 2024-01-21 15:28:46 UTC
Hello. So far I have tried both booting with the cmdline option and blacklisting the module and then loading it with modprobe. This is what shows up in dmesg:

[  104.545429] i801_smbus 0000:00:1f.4: Interrupt disabled by user
[  104.545691] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[  104.545714] i801_smbus 0000:00:1f.4: SMBus using polling
[  104.546865] i2c i2c-14: 2/2 memory slots populated (from DMI)
[  104.549764] iTCO_vendor_support: vendor-support=0
[  104.747810] i801_smbus 0000:00:1f.4: Transaction timeout
[  104.749912] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[  104.749970] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  104.760558] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
[  104.760629] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

I'll try blacklisting those other modules next
Comment 7 Luis Ortega 2024-01-21 15:36:31 UTC
Blacklisting them didn't change anything, and I noticed that those modules aren't even loaded by default (they're not in the lsmod output on a normal boot).
Also, I forgot to say that I have the exact same laptop, but with an i7-1165G7 instead.
Comment 8 ruirui.yang 2024-04-19 08:22:57 UTC
I see a similar issue on a ThinkPad X1 gen 9 with the latest 6.9.0-rc4+.
Git bisect reports the first bad commit as 13e3a512a29001c ("i2c: smbus: Support up to 8 SPD EEPROMs").

modprobe without param:
[ 1290.401393] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 1290.401486] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[ 1290.403340] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403383] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403410] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403437] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403465] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403492] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403519] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1290.403546] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!

with param 
[ 1314.568785] i801_smbus 0000:00:1f.4: Interrupt disabled by user
[ 1314.568837] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 1314.568894] i801_smbus 0000:00:1f.4: SMBus using polling
[ 1314.570230] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570257] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570283] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570310] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570336] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570362] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570389] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 1314.570415] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
Comment 9 Mateusz Jończyk 2024-04-25 19:36:10 UTC
Hello,

> I see a similar issue on a ThinkPad X1 gen 9 with the latest 6.9.0-rc4+.
> Git bisect reports the first bad commit as 13e3a512a29001c ("i2c: smbus:
> Support up to 8 SPD EEPROMs").

Please report it to linux-i2c@vger.kernel.org and CC the email addresses from the description of this commit.

I suspect that
commit 13e3a512a29001c ("i2c: smbus: Support up to 8 SPD EEPROMs")
only triggered a problem that was present earlier. Does running i2cdetect on a kernel without this problem (like Linux 6.8) trigger this bug?

Greetings,
Mateusz
Comment 10 ruirui.yang 2024-05-01 07:25:36 UTC
Thanks for reporting this to the mailing list. I tried i2cdetect on an older kernel and did not see any errors. I'm not familiar with i2c, though, so I'm not sure which options you want me to try. Anyway, here is some output.
# i2cdetect -l
i2c-0	i2c       	Synopsys DesignWare I2C adapter 	I2C adapter
i2c-1	smbus     	SMBus I801 adapter at efa0      	SMBus adapter
i2c-2	i2c       	i915 gmbus dpa                  	I2C adapter
i2c-3	i2c       	i915 gmbus dpb                  	I2C adapter
i2c-4	i2c       	i915 gmbus dpc                  	I2C adapter
i2c-5	i2c       	i915 gmbus tc1                  	I2C adapter
i2c-6	i2c       	i915 gmbus tc2                  	I2C adapter
i2c-7	i2c       	i915 gmbus tc3                  	I2C adapter
i2c-8	i2c       	i915 gmbus tc4                  	I2C adapter
i2c-9	i2c       	i915 gmbus tc5                  	I2C adapter
i2c-10	i2c       	i915 gmbus tc6                  	I2C adapter
i2c-11	i2c       	AUX A/DDI A/PHY A               	I2C adapter
i2c-12	i2c       	AUX USBC1/DDI TC1/PHY TC1       	I2C adapter
i2c-13	i2c       	AUX USBC2/DDI TC2/PHY TC2       	I2C adapter
i2c-14	i2c       	AUX USBC3/DDI TC3/PHY TC3       	I2C adapter
i2c-15	i2c       	AUX USBC4/DDI TC4/PHY TC4       	I2C adapter

# i2cdetect -y 11
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- 37 -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: 50 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --

# i2cdetect -F 11
Functionalities implemented by /dev/i2c-11:
I2C                              yes
SMBus Quick Command              yes
SMBus Send Byte                  yes
SMBus Receive Byte               yes
SMBus Write Byte                 yes
SMBus Read Byte                  yes
SMBus Write Word                 yes
SMBus Read Word                  yes
SMBus Process Call               yes
SMBus Block Write                yes
SMBus Block Read                 yes
SMBus Block Process Call         yes
SMBus PEC                        yes
I2C Block Write                  yes
I2C Block Read                   yes

# uname -r
6.7.9-200.fc39.x86_64
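
Note that bus 11 in the listing above is a display AUX channel, while the i801 controller is the adapter named "SMBus I801 adapter". A small sketch for picking the right bus number out of `i2cdetect -l` output, assuming the line format shown above (the function name find_i801_bus is hypothetical):

```python
import re
from typing import Optional


def find_i801_bus(listing: str) -> Optional[int]:
    """Return the bus number of the first I801 adapter in `i2cdetect -l` output."""
    for line in listing.splitlines():
        match = re.match(r"i2c-(\d+)\s", line)
        if match and "SMBus I801 adapter" in line:
            return int(match.group(1))
    return None


# Three lines copied from the listing above (fields are tab-separated).
SAMPLE = (
    "i2c-0\ti2c       \tSynopsys DesignWare I2C adapter \tI2C adapter\n"
    "i2c-1\tsmbus     \tSMBus I801 adapter at efa0      \tSMBus adapter\n"
    "i2c-11\ti2c       \tAUX A/DDI A/PHY A               \tI2C adapter\n"
)

bus = find_i801_bus(SAMPLE)
print(f"i801 adapter is /dev/i2c-{bus}")
```

The resulting number is the one to pass to `i2cdetect -y` or `i2cdetect -F`.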
Comment 11 Mateusz Jończyk 2024-05-01 08:18:19 UTC
Please check

i2cdetect 0
i2cdetect 1
dmidecode

How many RAM modules does this computer have?
Comment 12 Heiner Kallweit 2024-05-01 09:49:19 UTC
(In reply to ruirui.yang from comment #8)
> I see a similar issue on a ThinkPad X1 gen 9 with the latest 6.9.0-rc4+.
> Git bisect reports the first bad commit as 13e3a512a29001c ("i2c: smbus:
> Support up to 8 SPD EEPROMs").

The mentioned commit showed up in 6.8 only, but the problem reports date back to at least 5.12. So I don't think this commit is the culprit.
Please bisect between last known good kernel and 5.12.
Comment 13 ruirui.yang 2024-05-03 13:59:55 UTC
6.6 kernel works fine.

# i2cdetect 0
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0.
I will probe address range 0x08-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- --                         
# i2cdetect 1
Warning: Can't use SMBus Quick Write command, will skip some addresses
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-1.
I will probe address range 0x08-0x77.
Continue? [Y/n] Y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                                                 
10:                                                 
20:                                                 
30: -- -- -- -- -- -- -- --                         
40:                                                 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60:                                                 
70:
Comment 14 ruirui.yang 2024-05-04 02:22:56 UTC
Created attachment 306261
dmidecode output
Comment 15 ruirui.yang 2024-05-04 02:33:02 UTC
The 5.12 kernel works fine for me, so I think this might be a different problem although the symptom is similar. I do not think I can bisect between 5.12 and 6.x since everything works fine there; the first bad commit is clear to me from my original bisect.
Comment 16 Heiner Kallweit 2024-05-04 08:10:01 UTC
According to the dmidecode output you have 8 memory slots, each populated with a 4GB DIMM. Is this correct?
Then with previous kernel versions you should have seen the warning "Systems with more than 4 memory slots not supported yet".
Comment 17 ruirui.yang 2024-05-04 08:21:45 UTC
It is hard to know. The laptop is a ThinkPad X1 gen 9, and according to the article below the memory is soldered to the motherboard. The article says that the memory works in quad-channel mode, so I am not sure whether it is just 4 channels. I have never seen the warning, but I can run a quick test with the 6.6 kernel.
https://laptopmedia.com/highlights/inside-lenovo-thinkpad-x1-carbon-9th-gen-disassembly-and-upgrade-options/
Comment 18 ruirui.yang 2024-05-04 08:23:40 UTC
Yes! I see the warnings with 6.6 kernel:
[    1.591594] i2c i2c-0: 8/8 memory slots populated (from DMI)
[    1.593214] i2c i2c-0: Systems with more than 4 memory slots not supported yet, not instantiating SPD
Comment 19 Mateusz Jończyk 2024-05-04 08:33:25 UTC
> --- Comment #13 from ruirui.yang@linux.dev ---
> 6.6 kernel works fine.
>
> # i2cdetect 0
> [...]
> # i2cdetect 1
> [...]

Are there any warning messages in dmesg generated while you run i2cdetect?
Comment 20 ruirui.yang 2024-05-04 08:36:34 UTC
Yes, the following is printed:
[  777.551328] i801_smbus 0000:00:1f.4: Transaction timeout
[  777.553555] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[  777.553674] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.553752] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.553824] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.553895] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.553964] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554036] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554108] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554187] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554264] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554336] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554408] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554481] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554552] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554623] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554695] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554767] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554838] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554908] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.554980] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555051] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555122] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555224] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555302] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555387] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555463] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555539] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555614] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555689] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555765] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555836] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555908] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.555980] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556050] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556121] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556193] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556264] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556336] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556408] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556478] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556552] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556624] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556696] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556766] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556837] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556909] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.556977] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557049] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557119] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557189] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557259] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557331] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557402] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557471] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557543] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557614] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557687] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557759] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557830] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557899] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.557970] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558041] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558112] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558183] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558254] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558325] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558396] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558467] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558536] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558585] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558623] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558658] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558696] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558731] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558766] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558801] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558837] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558872] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558907] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558942] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.558977] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559012] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559047] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559082] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559117] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559152] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559196] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559233] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559271] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559306] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559342] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559378] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559417] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559455] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559490] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559525] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559560] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559596] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559631] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559666] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559701] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559736] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559771] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559806] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559844] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559879] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559914] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559949] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.559984] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.560019] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.560055] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.560090] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
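
A flood like this is easier to read when identical messages are counted. A minimal sketch, assuming the usual "[seconds.micros] message" dmesg line format; summarize_dmesg is an illustrative name, not an existing tool:

```python
import re
from collections import Counter

# Leading "[  777.551328] " style timestamp printed by dmesg.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def summarize_dmesg(text: str) -> Counter:
    """Count kernel messages after stripping the leading timestamp."""
    return Counter(
        TIMESTAMP.sub("", line) for line in text.splitlines() if line.strip()
    )


# The first four lines of the log above.
SAMPLE = """\
[  777.551328] i801_smbus 0000:00:1f.4: Transaction timeout
[  777.553555] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[  777.553674] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  777.553752] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
"""

for message, count in summarize_dmesg(SAMPLE).most_common():
    print(f"{count:4d}  {message}")
```

Feeding it the full `dmesg` output would show one timeout, one failed termination, and then the repeated busy message.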
Comment 21 ruirui.yang 2024-05-04 08:40:35 UTC
Note: comment #20 is a reply to comment #19. I will use "reply" to quote the question in future.
Comment 22 Heiner Kallweit 2024-05-04 10:08:59 UTC
It seems that, for whatever reason, the SMBHSTSTS_HOST_BUSY bit is constantly set.
It should be cleared automatically by the host after each SMBus operation.

Could you please test the following?
This is not a proper fix; it only tests whether manually resetting the bit on i801 driver load helps.


diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index a7c0c8710..a521bd4a3 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -1683,6 +1683,8 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		outb_p(inb_p(SMBAUXCTL(priv)) &
 		       ~(SMBAUXCTL_CRC | SMBAUXCTL_E32B), SMBAUXCTL(priv));
 
+	outb_p(SMBHSTSTS_HOST_BUSY, SMBHSTSTS(priv));
+
 	/* Default timeout in interrupt mode: 200 ms */
 	priv->adapter.timeout = HZ / 5;
 
-- 
2.45.0
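
To interpret a raw status value while testing, the SMBHSTSTS bits can be decoded as sketched below. The bit assignments are the SMBHSTSTS_* constants as I read them in i2c-i801.c; treat them as an assumption and verify against the driver source. The example value 0x41 is made up.

```python
from typing import List

# SMBHSTSTS_* bit names from i2c-i801.c (assumption: check the driver source).
SMBHSTSTS_BITS = {
    0x01: "HOST_BUSY",
    0x02: "INTR",
    0x04: "DEV_ERR",
    0x08: "BUS_ERR",
    0x10: "FAILED",
    0x20: "SMBALERT_STS",
    0x40: "INUSE_STS",
    0x80: "BYTE_DONE",
}


def decode_smbhststs(value: int) -> List[str]:
    """Return the names of the bits set in a host status byte."""
    return [name for bit, name in SMBHSTSTS_BITS.items() if value & bit]


status = 0x41  # hypothetical reading, not taken from this machine
print(f"0x{status:02x}: {', '.join(decode_smbhststs(status))}")
```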
Comment 23 ruirui.yang 2024-05-04 11:03:04 UTC
(In reply to Heiner Kallweit from comment #22)
> It seems that, for whatever reason, the SMBHSTSTS_HOST_BUSY bit is constantly set.
> It should be cleared automatically by the host after each SMBus operation.
> 
> Could you please test the following?
> This is not a proper fix; it only tests whether manually resetting the bit on
> i801 driver load helps.

Hi, with the changes, I still got the SMBus busy messages below:
[~]$ uname -r
6.9.0-rc6+
[~]$ dmesg|grep i801
[    1.563819] i801_smbus 0000:00:1f.4: enabling device (0000 -> 0003)
[    1.565938] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    1.569900] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[    1.776346] i801_smbus 0000:00:1f.4: Transaction timeout
[    1.780426] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[    1.782933] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[    1.785568] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[    1.787947] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[    1.790513] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[    1.792779] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[    1.794917] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[    1.797100] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
Comment 24 Heiner Kallweit 2024-05-04 21:39:17 UTC
OK, so it can start a transaction, but the transaction doesn't finish.
It looks like the BIOS might have switched off some clock needed for SMBus.
The first (failed) SMBus access attempt happens when the ee1004 driver is probed.
Before the referenced change this driver wasn't instantiated (due to >4 memory slots).

Having said that, I think the SMBus wasn't usable before either. You could check this under 6.6 with i2cdetect, and also check whether you can access the SPD EEPROMs from userspace.

A simple workaround would be to blacklist ee1004 and/or i801.
Comment 25 Heiner Kallweit 2024-05-04 22:01:27 UTC
The following post, more than 3 years old and about a Gen 8 model, seems to indicate that SMBus access on this machine type has been problematic for years.

https://forums.lenovo.com/topic/findpost/27/5048595/5181539

The bisect just points to the change that revealed the problem.
Comment 26 Luca Carlon 2025-05-04 11:06:57 UTC
Hello! I'm experiencing an issue that seems similar, on a brand new Lenovo P16 Gen2 with an Intel i9.

The main symptom is that I am unable to hibernate/suspend the machine, because the spd5118 driver fails to suspend and aborts the operation. As discussed here https://lore.kernel.org/lkml/dmx2x5sziux7ubk5fcas2nmj4lt3vpalr5gc7qmmwq2megmp24@24vmehdkle3x/ the problem may be somehow related to the i2c communication.

This is what I see after the boot:

# dmesg -w | grep smbus
[    5.416242] i801_smbus 0000:00:1f.4: enabling device (0000 -> 0003)
[    5.416572] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    5.416607] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt

If I try to list the busses I get:

# i2cdetect -l
i2c-0   i2c             Synopsys DesignWare I2C adapter         I2C adapter
i2c-1   smbus           SMBus I801 adapter at efa0              SMBus adapter
i2c-2   i2c             i915 gmbus dpa                          I2C adapter
i2c-3   i2c             i915 gmbus dpb                          I2C adapter
i2c-4   i2c             i915 gmbus dpc                          I2C adapter
i2c-5   i2c             i915 gmbus dpd                          I2C adapter
i2c-6   i2c             i915 gmbus tc1                          I2C adapter
i2c-7   i2c             AUX A/DDI A/PHY A                       I2C adapter
i2c-8   i2c             AUX C/DDI C/PHY C                       I2C adapter
i2c-9   i2c             AUX D/DDI D/PHY D                       I2C adapter

If I query the SMBus:

# i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: UU UU -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

I guess those two are the sensors in my two DDR5 modules. If I try to query those sensors I get:

spd5118-i2c-1-51
Adapter: SMBus I801 adapter at efa0
ERROR: Can't get value of subfeature temp1_lcrit_alarm: Can't read
ERROR: Can't get value of subfeature temp1_min_alarm: Can't read
ERROR: Can't get value of subfeature temp1_max_alarm: Can't read
ERROR: Can't get value of subfeature temp1_crit_alarm: Can't read
ERROR: Can't get value of subfeature temp1_min: Can't read
ERROR: Can't get value of subfeature temp1_max: Can't read
ERROR: Can't get value of subfeature temp1_lcrit: Can't read
ERROR: Can't get value of subfeature temp1_crit: Can't read
temp1:            N/A  (low  =  +0.0°C, high =  +0.0°C)
                       (crit low =  +0.0°C, crit =  +0.0°C)
[...]
spd5118-i2c-1-50
Adapter: SMBus I801 adapter at efa0
ERROR: Can't get value of subfeature temp1_lcrit_alarm: Can't read
ERROR: Can't get value of subfeature temp1_min_alarm: Can't read
ERROR: Can't get value of subfeature temp1_max_alarm: Can't read
ERROR: Can't get value of subfeature temp1_crit_alarm: Can't read
ERROR: Can't get value of subfeature temp1_min: Can't read
ERROR: Can't get value of subfeature temp1_max: Can't read
ERROR: Can't get value of subfeature temp1_lcrit: Can't read
ERROR: Can't get value of subfeature temp1_crit: Can't read
temp1:            N/A  (low  =  +0.0°C, high =  +0.0°C)
                       (crit low =  +0.0°C, crit =  +0.0°C)

At this point, dmesg immediately reports the same errors:

[  788.313930] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.313997] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.314038] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.314075] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.314119] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.314159] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.314196] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.314236] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333155] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333182] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333221] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333243] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333268] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333290] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333312] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333335] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[  788.333361] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!

Blacklisting spd5118 fixes my main problem, but that is more a workaround than a solution.
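
The two UU cells in the i2cdetect grid above mark addresses that i2cdetect skipped because a kernel driver already claims them, which matches the spd5118-i2c-1-50 and spd5118-i2c-1-51 sensors. A sketch that extracts such addresses, assuming the fixed-width grid layout printed by `i2cdetect -y` (claimed_addresses is a hypothetical helper):

```python
from typing import List


def claimed_addresses(grid: str) -> List[int]:
    """Return the addresses shown as 'UU' (in use by a driver) in an i2cdetect grid."""
    found = []
    for line in grid.splitlines():
        # Data rows look like "50: UU UU -- ..."; the column header does not.
        if len(line) < 3 or line[2] != ":":
            continue
        try:
            base = int(line[:2], 16)
        except ValueError:
            continue
        # Each cell is two characters wide, starting at column 4, 3 apart.
        for col in range(16):
            if line[4 + 3 * col : 6 + 3 * col] == "UU":
                found.append(base + col)
    return found


# Header and two rows copied from the grid above.
GRID = "\n".join([
    "     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f",
    "40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --",
    "50: UU UU -- -- -- -- -- -- -- -- -- -- -- -- -- --",
])

print([hex(addr) for addr in claimed_addresses(GRID)])
```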
Comment 27 Armin Wolf 2025-05-04 18:54:19 UTC
Can you share the output of "acpidump" on your device?
Comment 28 Luca Carlon 2025-05-04 20:50:51 UTC
I see that it created more than 2MB of data. For safety reasons, would it be possible to restrict the amount of data I need to upload publicly? Thank you.
Comment 29 Armin Wolf 2025-05-04 21:12:02 UTC
Those 2MB of data contain the ACPI firmware tables. I would prefer that you upload the whole dataset.

Do not worry, ACPI firmware tables contain no private information.
Comment 30 Armin Wolf 2025-05-05 00:26:55 UTC
I think I have an idea:

Maybe commit ba9ad2af7019 ("i2c: i801: Fix I2C Block Read on 8-Series/C220 and later") is the root cause of the issue. I strongly suspect that the i2c controller begins to malfunction as soon as the spd5118 driver begins to use i2c block read commands. This happens when reading sensor values, so that would explain why the first probe succeeded.

Basically everything works until someone actually tries to read the temperature sensor value. This confuses the i2c controller until the next power cycle.

Do you know how to build a custom kernel? If not, I can try to provide you with a .deb package containing a kernel without the commit mentioned above.
Comment 31 Guenter Roeck 2025-05-05 03:52:06 UTC
The spd5118 driver uses regmap, and regmap doesn't seem to execute an I2C_SMBUS_I2C_BLOCK_DATA command. That is the command fixed by commit ba9ad2af7019. Am I missing something?
Comment 32 Armin Wolf 2025-05-05 05:04:49 UTC
The regmap code selects regmap_i2c_smbus_i2c_block as the regmap backend, which uses i2c_smbus_read/write_i2c_block_data().

However this would mean that the driver uses block transfers from the start, so they indeed cannot be the cause for the i2c issue.

Anyway, in this case we need to take a look at the ACPI tables. Maybe some firmware components are messing with the i2c controller?
Comment 33 Guenter Roeck 2025-05-05 13:41:05 UTC
It is odd that the spd5118 driver instantiates, because that means that some I2C transactions must work and the lockup happens later.
Maybe it would help to enable tracing on SMBus commands to find out which command actually triggers the bus lockup.
In either case, it might make sense to try resetting the i2c controller (set bit 3 of SMBHSTCFG) if it gets stuck.
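
For reference, a minimal Python sketch of what "set bit 3 of SMBHSTCFG" means at the byte level. It only computes values and does not touch hardware. The bit names are assumptions recalled from i2c-i801.c and Intel PCH datasheets and should be checked against the datasheet for the chipset in question; this thread itself only mentions the reset bit and SPD Write Disable. The helper names are hypothetical.

```python
# Assumed layout of the host configuration byte (SMBHSTCFG / HOSTC).
# Only bit 3 (reset) and SPD Write Disable are named in this thread;
# the remaining names are from memory and must be verified.
HOSTC_BITS = {
    0: "HST_EN",
    1: "SMB_SMI_EN",
    2: "I2C_EN",
    3: "SSRESET",
    4: "SPD_WD",
}


def decode_hostc(value):
    """Return the names of the bits that are set in a host configuration byte."""
    return [name for bit, name in sorted(HOSTC_BITS.items()) if value & (1 << bit)]


def with_soft_reset(value):
    """Return the byte that would request a soft reset (bit 3 set, rest unchanged)."""
    return (value | (1 << 3)) & 0xFF


if __name__ == "__main__":
    current = 0x11  # example value only: host enabled, SPD Write Disable set
    print(decode_hostc(current))
    print(hex(with_soft_reset(current)))
```

Whether such a reset actually recovers a wedged controller on this hardware is untested here.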
Comment 34 Guenter Roeck 2025-05-05 13:51:20 UTC
There should be some kernel log messages "DDR5 temperature sensor: vendor  ..." when the spd5118 driver is loaded. What do those messages look like? It would probably not help much, but it should at least tell us the chip vendors and maybe give an indication of whether those chips work in other systems. So far I have seen "0x06:0x32 revision 1.6" and "0x00:0xb3 revision 2.2", and both are known to work.
Comment 35 Luca Carlon 2025-05-05 16:22:36 UTC
(In reply to Armin Wolf from comment #29)
> Those 2MB of data contain the ACPI firmware tables. I would prefer if you
> upload the the whole dataset.
> 
> Do not worry, ACPI firmware tables contain no private information.

I think this is what you wanted: https://drive.google.com/file/d/1L9TPS4o4KCeRmNePOgQ3QUdJbVY8PKG7/view?usp=sharing. Sorry if I'm sharing it this way but I prefer to be able to remove it once not useful anymore. Please let me know if I created it correctly.

> Do you know how to build a custom kernel? If no then i can try to provide you
> a .deb package containing a kernel without the commit mentioned above.

Yes, I know how to build a custom kernel, but unfortunately it seems Endeavour OS is using systemd-boot, and I do not clearly understand how to install my custom build. It will probably take me some time to figure out. A deb package is probably not useful here.
Comment 36 Luca Carlon 2025-05-05 17:37:19 UTC
(In reply to Guenter Roeck from comment #34)
> There should be some kernel log messages "DDR5 temperature sensor: vendor
> ..." when the spd5118 driver is loaded. How does those messages look like ?
> It would probably not help much, but it should at least tell us the chip
> vendors and maybe give an indication if those chips work in other systems.
> So far I have seen "0x06:0x32 revision 1.6" and "0x00:0xb3 revision 2.2",
> and both are known to work.

You are right:

# dmesg | grep spd
[    5.412402] spd5118 1-0050: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6
[    5.417398] spd5118 1-0051: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6

Is that supposed to be written every time the driver is loaded? Cause I tried to rmmod/modprobe it, but those logs do not appear again. It is also not logged if I blacklist it and try to modprobe after the boot (maybe something messed with the bus already?).
Comment 37 Guenter Roeck 2025-05-05 18:23:14 UTC
The message is (or should be) seen whenever the driver is instantiated. You don't see it again because the driver won't re-instantiate after the first time due to the bus lockup. 0x06:0x32 is the Montage Technology chip which works just fine in my (and various other) systems, so the problem is unlikely to be caused by chip behavior.

Do you get an error message when you run modprobe and the message is not seen ? modprobe should bail out with "Device or resource busy" or a similar message.

"It is also not logged if I blacklist it and try to modprobe after the boot" - that is really interesting. Maybe you are right, and the bus locks up due to some other transaction. I have no idea what that might be, though.

If you boot with the driver blacklisted and run "i2cdetect -y 1" immediately, what do you get ? If you do that, do you see "SMBus is busy, can't use it!" in the kernel log ?

Also, if a driver for some other chip is instantiated on the same i2c bus, "grep . /sys/class/i2c-dev/i2c-1/device/*/name" might tell us what that is. That might give us another hint.
Comment 38 Luca Carlon 2025-05-05 22:20:35 UTC
> Do you get an error message when you run modprobe and the message is not seen
> ? modprobe should bail out with "Device or resource busy" or a similar
> message.

I tried to boot with the driver blacklisted, then I modprobed spd5118: no error message from modprobe and no error message in dmesg.

> If you boot with the driver blacklisted and run "i2cdetect -y 1" immediately,
> what do you get ? If you do that, do you see "SMBus is busy, can't use it!"
> in the kernel log ?

This is the result:

# i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

No log in dmesg.

> Also, if a driver for some other chip is instantiated on the same i2c bus,
> "grep . /sys/class/i2c-dev/i2c-1/device/*/name" might tell us what that is.
> That might give us another hint.

This is what I get:

$ grep . /sys/class/i2c-dev/i2c-1/device/*/name
/sys/class/i2c-dev/i2c-1/device/1-0050/name:spd5118
/sys/class/i2c-dev/i2c-1/device/1-0051/name:spd5118

Is it normal that I'm seeing this when the driver is blacklisted?
Comment 39 Guenter Roeck 2025-05-06 03:06:48 UTC
I have no idea why i2cdetect fails to report any devices, even more so since there is no error log in dmesg.

The entries in /sys/class/i2c-dev/i2c-1/device/ are as expected; they are generated by the i801 code which tries to instantiate the chips.

The only other idea I have would be to enable i2c bus tracing and see what it reports.
Comment 40 Armin Wolf 2025-05-06 06:19:54 UTC
(In reply to Luca Carlon from comment #35)
> (In reply to Armin Wolf from comment #29)
> > Those 2MB of data contain the ACPI firmware tables. I would prefer if you
> > upload the the whole dataset.
> > 
> > Do not worry, ACPI firmware tables contain no private information.
> 
> I think this is what you wanted:
> https://drive.google.com/file/d/1L9TPS4o4KCeRmNePOgQ3QUdJbVY8PKG7/
> view?usp=sharing. Sorry if I'm sharing it this way but I prefer to be able
> to remove it once not useful anymore. Please let me know if I created it
> correctly.
> 
> > Do you know how to build a custom kernel? If no then i can try to provide
> you
> > a .deb package containing a kernel without the commit mentioned above.
> 
> Yes, I know how to build a custom kernel, but unfortunately it seems
> Endeavour OS is using systemd-boot, and I do not clearly understand how to
> install my custom build. It will probably take me some time to figure out. A
> deb package is not probably useful here.

I took a look at the ACPI table and it seems that the BIOS leaves the i2c controller alone.

This means that the spd5118 driver alone is somehow causing those issues. Thus I too suggest that you enable i2c bus tracing so that we can get more information about the i2c operation triggering this mess.
Comment 41 Luca Carlon 2025-05-06 16:35:03 UTC
I found this for my current kernel:

$ zcat /proc/config.gz | grep I2C_DEBUG
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set

So I guess I'll need to build a debug kernel first, is this correct?
Comment 42 Guenter Roeck 2025-05-06 17:26:18 UTC
No, that should not be necessary. As root, try:

# cd /sys/kernel/debug/tracing
# echo "adapter_nr==1" > events/smbus/filter
# echo 1 > events/smbus/enable
# echo 1 > tracing_on
# modprobe spd5118
# sensors
# cat trace
# echo 0 > tracing_on
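
The same steps can be scripted. The following is a rough Python sketch, assuming tracefs is mounted at /sys/kernel/debug/tracing and the script runs as root; the function names are hypothetical, and the files written are exactly the ones used in the commands above.

```python
from pathlib import Path

DEFAULT_TRACING_DIR = "/sys/kernel/debug/tracing"


def enable_smbus_tracing(adapter_nr, tracing_dir=DEFAULT_TRACING_DIR):
    """Restrict smbus trace events to one adapter and switch tracing on."""
    base = Path(tracing_dir)
    (base / "events/smbus/filter").write_text(f"adapter_nr=={adapter_nr}\n")
    (base / "events/smbus/enable").write_text("1\n")
    (base / "tracing_on").write_text("1\n")


def read_trace_and_stop(tracing_dir=DEFAULT_TRACING_DIR):
    """Return the current trace buffer contents, then switch tracing off."""
    base = Path(tracing_dir)
    text = (base / "trace").read_text()
    (base / "tracing_on").write_text("0\n")
    return text
```

Typical use would be to call enable_smbus_tracing(1), run "modprobe spd5118" and "sensors" by hand, and then print the result of read_trace_and_stop().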
Comment 43 Luca Carlon 2025-05-06 17:42:10 UTC
Ah yes, now I see something, thanks. Not very useful I guess:

# tracer: nop
#
# entries-in-buffer/entries-written: 16/16   #P:32
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
        modprobe-3540    [000] .....   356.148355: smbus_read: i2c-1 a=050 f=0000 c=0 WORD_DATA
        modprobe-3540    [000] .....   356.352449: smbus_result: i2c-1 a=050 f=0000 c=0 WORD_DATA rd res=-110
        modprobe-3540    [000] .....   356.352523: smbus_read: i2c-1 a=051 f=0000 c=0 WORD_DATA
        modprobe-3540    [000] .....   356.560456: smbus_result: i2c-1 a=051 f=0000 c=0 WORD_DATA rd res=-110
        modprobe-3540    [000] .....   356.560586: smbus_read: i2c-1 a=052 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   356.769460: smbus_result: i2c-1 a=052 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   356.769465: smbus_read: i2c-1 a=053 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   356.977454: smbus_result: i2c-1 a=053 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   356.977457: smbus_read: i2c-1 a=054 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.184467: smbus_result: i2c-1 a=054 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   357.184476: smbus_read: i2c-1 a=055 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.392448: smbus_result: i2c-1 a=055 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   357.392454: smbus_read: i2c-1 a=056 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.600469: smbus_result: i2c-1 a=056 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   357.600476: smbus_read: i2c-1 a=057 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.809453: smbus_result: i2c-1 a=057 f=0000 c=0 BYTE rd res=-110

I should probably somehow enable tracing at the beginning of the boot, to log the entire communication.
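
For reading such a trace: every result above is res=-110, which on Linux is -ETIMEDOUT, and each one arrives roughly 200 ms after the matching smbus_read. A small Python sketch that pairs reads with results and reports this is shown below. The regular expressions are assumptions based only on the line format visible in this trace, and the function name is hypothetical.

```python
import re

# A few Linux errno values that are relevant for SMBus transfers.
LINUX_ERRNO = {5: "EIO", 6: "ENXIO", 16: "EBUSY", 110: "ETIMEDOUT"}

# Matches e.g. "356.148355: smbus_read: i2c-1 a=050 f=0000 c=0 WORD_DATA"
LINE_RE = re.compile(
    r"(\d+\.\d+): smbus_(\w+): i2c-(\d+) a=([0-9a-f]+) f=\w+ c=(\w+) (\w+)"
)
RES_RE = re.compile(r"res=(-?\d+)")


def summarize(trace_text):
    """Pair each smbus_read/write with its smbus_result and report the outcome."""
    pending = {}  # (bus, address) -> timestamp of the request
    rows = []
    for line in trace_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # header or unrelated line
        ts, kind, bus, addr, _cmd, proto = m.groups()
        key = (int(bus), int(addr, 16))
        if kind in ("read", "write"):
            pending[key] = float(ts)
        elif kind == "result":
            res_match = RES_RE.search(line)
            if not res_match:
                continue
            res = int(res_match.group(1))
            start = pending.pop(key, None)
            duration = None if start is None else (float(ts) - start) * 1000.0
            error = "OK" if res >= 0 else LINUX_ERRNO.get(-res, str(res))
            rows.append({"bus": key[0], "addr": key[1], "protocol": proto,
                         "error": error, "duration_ms": duration})
    return rows


if __name__ == "__main__":
    import sys
    for row in summarize(sys.stdin.read()):
        print(row)
```

It can be fed the trace on standard input, for example by piping the output of "cat trace" into the script.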
Comment 44 Guenter Roeck 2025-05-06 17:55:52 UTC
Yes, in that situation the bus is already locked up.
Comment 45 Guenter Roeck 2025-05-06 18:10:35 UTC
It is odd though that it tries a byte read on addresses 0x52..0x57. The driver should not do that - it only uses word accesses for register 0x00.
Comment 46 Guenter Roeck 2025-05-06 18:19:16 UTC
I think i2c_default_probe() is doing the byte accesses. That makes me wonder if the bus locks up when trying to read from those addresses. The first pass would detect the connected chips at address 0x50 and 0x51, subsequent passes would encounter the bus in locked state.
Comment 47 Luca Carlon 2025-05-06 18:40:17 UTC
Is this data retrieved from the bus?

# dmesg | grep spd
[    5.412402] spd5118 1-0050: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6
[    5.417398] spd5118 1-0051: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6

cause that data arrives only when the driver is not blacklisted. If I blacklist it and modprobe it after the boot, that is not received.
Comment 48 Guenter Roeck 2025-05-06 20:08:25 UTC
Yes, the data is retrieved from the bus. My current theory is:

- The i801 driver probes the address range from 0x50..0x57 to check if a device
  is connected
- Whenever it finds one, the respective driver's probe function is called.
  That function reads the chip vendor and revision from the chip
  and instantiates the device
- The i801 driver keeps probing the remaining addresses. While probing those,
  the controller locks up.
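
As a toy illustration of this theory only (not of the real i801 or spd5118 code), the following Python sketch models a controller that wedges the first time an address with nothing connected is read. The class, the function, and the lock-up rule itself are hypothetical; the rule is precisely the unverified assumption being discussed.

```python
ETIMEDOUT = 110  # Linux errno value, seen as res=-110 in the trace


class ToyController:
    """Toy SMBus controller that wedges after the first read of an empty address."""

    def __init__(self, present):
        self.present = set(present)
        self.locked = False

    def read(self, addr):
        if self.locked:
            return -ETIMEDOUT  # every later transaction times out
        if addr in self.present:
            return 0  # a device answered
        # Assumption under discussion: probing an empty address locks up the bus.
        self.locked = True
        return -ETIMEDOUT


def probe_pass(ctrl, addresses=range(0x50, 0x58)):
    """Return the addresses that answered during one scan of 0x50..0x57."""
    return [addr for addr in addresses if ctrl.read(addr) == 0]


if __name__ == "__main__":
    ctrl = ToyController({0x50, 0x51})
    print([hex(a) for a in probe_pass(ctrl)])  # first pass
    print([hex(a) for a in probe_pass(ctrl)])  # second pass, after the lock-up
```

In this model the first pass finds 0x50 and 0x51 and every later pass finds nothing, which is the pattern the theory would predict: devices instantiated at boot, then timeouts on all addresses.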

You should see some "Successfully instantiated SPD at ..." messages in the boot log. I assume you have CONFIG_SENSORS_SPD5118_DETECT enabled in your configuration - that would trigger those extra address checks. If it is enabled, disabling that configuration option might be a workaround. If the bus indeed locks up due to the attempted read operations on addresses 0x52..0x57, that would however only be a temporary workaround. The real fix would have to be implemented in the i801 driver - for example by resetting the controller if it is stuck.

In this context, keep in mind that the spd5118 driver will be blacklisted at some point in the near future because write protecting the 0x50..0x57 address range is not compatible with the operation of spd5118 devices.
Comment 49 Heiner Kallweit 2025-05-06 20:33:21 UTC
"The real fix would have to be implemented in the i801 driver - for example by resetting the controller if it is stuck."

i801 does this already, it sets the SMBHSTCNT_KILL bit in case of a transaction timeout. However such a soft reset may have its limits, depending on how hard the chip has locked up. And I'd rather call this a workaround.

IMO the root cause is the probing mechanism. An address should only be accessed if we know there's a RAM module. Similarly, for DDR4 the ee1004 driver instantiates jc42 at an address only if SPD indicates the presence of a temp sensor.
Comment 50 Guenter Roeck 2025-05-06 20:51:22 UTC
The i801 controller also has a soft_reset bit which isn't supported.

"IMO the root cause is the probing mechanism." - sure, that is why there is a SENSORS_SPD5118_DETECT configuration option. The comment regarding the jc42 driver isn't really accurate because the i2c subsystem _will_ scan the entire 0x18..0x1f address range when the driver is loaded. The only difference is that the i2c core will use the I2C_FUNC_SMBUS_QUICK command if supported for that scan and not attempt a byte read.

Maybe we should change the comment associated with SENSORS_SPD5118_DETECT to "if unsure, say N". I don't think that will really help, though.
Comment 51 Luca Carlon 2025-05-06 21:09:18 UTC
> You should see some "Successfully instantiated SPD at ..." messages in the
> boot log.

Yes:

# dmesg | grep -i spd
[    5.333756] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    5.343238] i2c i2c-1: Successfully instantiated SPD at 0x50
[    5.343655] i2c i2c-1: Successfully instantiated SPD at 0x51

> I assume you have CONFIG_SENSORS_SPD5118_DETECT enabled in your configuration
> - that would trigger those extra address checks.

Yes, correct:

$ zcat /proc/config.gz | grep CONFIG_SENSORS_SPD5118_DETECT         
CONFIG_SENSORS_SPD5118_DETECT=y

> If it is enabled, disabling that configuration option might be a workaround.
> If the bus indeed locks up due to the attempted read operations on addresses
> 0x52..0x57, that would however only be a temporary workaround. The real fix
> would have to be implemented in the i801 driver - for example by resetting
> the controller if it is stuck.

The workaround I found seems to work well so far: I blacklist the spd5118 driver. I was hoping a real fix could be found: a user should not be forced to blacklist a driver to suspend the machine.

Thank you for your help.
Comment 52 Armin Wolf 2025-05-06 21:18:04 UTC
Looking at this patch series [1], I begin to wonder whether the i801 driver should even instantiate any SPD chips when the write disable bit is set. This would prevent the bus lockup during probe, and possibly other issues should this problem also exist on DDR4 platforms.

[1] https://lore.kernel.org/linux-i2c/20250430-for-upstream-i801-spd5118-no-instantiate-v2-0-2f54d91ae2c7@canonical.com/
Comment 53 Guenter Roeck 2025-05-06 22:53:12 UTC
DDR4 chips don't have the problem: The temperature sensors are at a different address range (0x18..0x1f) and are not affected by the write protection, and the eeprom driver is implemented read-only. Also, unlike DDR5/SPD5118, DDR4 EEPROMs do not require write operations into the 0x50..0x5f address range for page selection.
Comment 54 Heiner Kallweit 2025-05-07 05:37:26 UTC
(In reply to Guenter Roeck from comment #50)
> The i801 controller also has a soft_reset bit which isn't supported.
> 
I think you're referring to SSRESET in HOSTC in the PCI config space.
Right, this isn't supported yet.

> "IMO the root cause is the probing mechanism." - sure, that is why there is
> a SENSORS_SPD5118_DETECT configuration option. The comment regarding the
> jc42 driver  isn't really accurate because the i2c subsystem _will_ scan the
> entire 0x18..0x1f address range when the driver is loaded. Only difference
> is that the i2c core will use the I2C_FUNC_SMBUS_QUICK command if supported
> for that scan and not attempt a byte read.
> 
> Maye we should change the comment associated with SENSORS_SPD5118_DETECT to
> "if unsure, say N". I don't think that will really help, though.

I was a little fast with writing, and I agree with the remarks.
I2C probing is considered a legacy mechanism by Wolfram. The problem here highlights why. Can't we remove the probing from spd5118 and only instantiate it at an address if DMI/SPD indicate presence?
Comment 55 Guenter Roeck 2025-05-07 13:06:44 UTC
> I was a little fast with writing, and I agree with the remarks.
> I2C probing is considered a legacy mechanism by Wolfram. The problem here
> highlights why. Can't we remove the probing from spd5118 and only
> instantiate it at an address if DMI/SPD indicate presence?

That is why the configuration option exists. Not all systems support DMI, and not all controllers call the SPD probe functions. On devicetree systems, due to the dynamic nature of memory insertion, it is unlikely that the DIMMs are listed in devicetree.

On top of that, even the SPD probe function probes the addresses: If DIMMs are not inserted in order and there are holes in the address assignments, we would be back to probing addresses with nothing connected to them (because that is what the SPD detection code does). DMI only reports the number of inserted modules, not their addresses.
