Created attachment 297185 [details] dmesg Problem: On boot, i801_smbus reports these errors: [ 3.221208] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 3.221247] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt [ 3.423566] i801_smbus 0000:00:1f.4: Timeout waiting for interrupt! [ 3.423569] i801_smbus 0000:00:1f.4: Transaction timeout [ 3.425564] i801_smbus 0000:00:1f.4: Failed terminating the transaction [ 3.425603] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! Tested on Arch Linux / Gentoo, kernel 5.10.27, 5.12.x HW: Lenovo Ideapad 5 15ITL05 i5-1135G7, latest firmware Attempted fixes so far: disabling all other drivers accessing SMBus, using different SMBus access methods lspci -k reports the driver loading correctly: 00:1f.4 SMBus: Intel Corporation Tiger Lake-LP SMBus Controller (rev 20) Subsystem: Lenovo Tiger Lake-LP SMBus Controller Kernel driver in use: i801_smbus Kernel modules: i2c_i801 lm_sensors seems to have issues detecting SMBus: Found unknown SMBus adapter 8086:a0a3 at 0000:00:1f.4. i2cdetect -F output: Functionalities implemented by /dev/i2c-12: I2C no SMBus Quick Command yes SMBus Send Byte yes SMBus Receive Byte yes SMBus Write Byte yes SMBus Read Byte yes SMBus Write Word yes SMBus Read Word yes SMBus Process Call no SMBus Block Write yes SMBus Block Read yes SMBus Block Process Call yes SMBus PEC yes I2C Block Write yes I2C Block Read yes
Note: accessing the SMBus device via i2cdetect causes the flood of "i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!" error messages seen in dmesg.
Created attachment 298413 [details] DSDT.dsl DSDT.dsl file for Lenovo IdeaPad 5 15ITL05 extracted from the latest FHCN57WW firmware.
https://lore.kernel.org/linux-i2c/CAJCQCtTB+KW596A1Q+Ds6u9uvUrqeOSmer6qKv7g+xRYijGS3A@mail.gmail.com/ This report describes the same issue on a Thinkpad Carbon X1 7th gen running Linux 5.14-rc3. Maybe this is a Lenovo-specific problem?
Hello,

Did you try using the "i2c-i801.disable_features=0x10" Linux kernel cmdline option to disable the use of interrupts in the driver?

In the dmesg you have:
[ 3.221208] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 3.221247] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[ 3.222484] i2c i2c-12: 2/2 memory slots populated (from DMI)
[...]
[ 3.423566] i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
[ 3.423569] i801_smbus 0000:00:1f.4: Transaction timeout
[ 3.425564] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[ 3.425603] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!

It is possible that the SMBus hangs while trying to access the SPD EEPROM chips located on the RAM modules, so blacklisting the "eeprom" and "at24" modules may help here.

Why do you specifically want to access SMBus / the I2C bus? Is there a piece of hardware on the laptop that does not work?
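For readers wondering what the 0x10 in disable_features stands for, here is a minimal sketch that decodes the mask. The bit meanings are taken from the kernel's i2c-i801 documentation as I remember it, so treat the table as an assumption and check it against the kernel version in use; the function name is made up for illustration.

```python
# Bit meanings of the i2c-i801 "disable_features" module parameter.
# Assumption: values as listed in the kernel's i2c-i801 documentation;
# verify against the kernel you are actually running.
DISABLE_FEATURES = {
    0x01: "SMBus PEC",
    0x02: "block buffer",
    0x08: "I2C block read",
    0x10: "interrupts",
    0x20: "SMBus Host Notify",
}


def decode_disable_features(mask):
    """Return the names of the features switched off by the given mask."""
    return [name for bit, name in DISABLE_FEATURES.items() if mask & bit]


if __name__ == "__main__":
    # 0x10 is the value suggested in this thread: fall back to polling.
    print(decode_disable_features(0x10))
```

With 0x10 only the interrupt bit is set, which matches the "Interrupt disabled by user" / "SMBus using polling" messages the driver prints when the option is active.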
Also try blacklisting the "i2c-i801" module and loading it manually ("modprobe i2c-i801 disable_features=0x10") after the laptop has finished booting. The problems may be caused by something else accessing the i2c bus during boot.
Hello. I did the "boot with the cmdline option" thing and "blacklist and then modprobe" thing so far, and I've seen this pop up on dmesg:
[ 104.545429] i801_smbus 0000:00:1f.4: Interrupt disabled by user
[ 104.545691] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 104.545714] i801_smbus 0000:00:1f.4: SMBus using polling
[ 104.546865] i2c i2c-14: 2/2 memory slots populated (from DMI)
[ 104.549764] iTCO_vendor_support: vendor-support=0
[ 104.747810] i801_smbus 0000:00:1f.4: Transaction timeout
[ 104.749912] i801_smbus 0000:00:1f.4: Failed terminating the transaction
[ 104.749970] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
[ 104.760558] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
[ 104.760629] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
I'll try blacklisting those other modules next.
It didn't do anything, and I noticed that those modules don't even get loaded by default (they're not in the lsmod output on a normal boot). Also, I forgot to say: I have the exact same laptop, but with an i7-1165G7 instead.
I got similar issue on thinkpad X1 gen9 with latest 6.9.0-rc4+ Git bisect the first bad commit is "13e3a512a29001c i2c: smbus: Support up to 8 SPD EEPROMs " modprobe without param: [ 1290.401393] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 1290.401486] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt [ 1290.403340] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403383] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403410] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403437] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403465] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403492] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403519] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1290.403546] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! with param [ 1314.568785] i801_smbus 0000:00:1f.4: Interrupt disabled by user [ 1314.568837] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 1314.568894] i801_smbus 0000:00:1f.4: SMBus using polling [ 1314.570230] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570257] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570283] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570310] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570336] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570362] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570389] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1314.570415] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
Hello,

> I got similar issue on thinkpad X1 gen9 with latest 6.9.0-rc4+
> Git bisect the first bad commit is "13e3a512a29001c i2c: smbus: Support up
> to 8 SPD EEPROMs
> "

Please report it to linux-i2c@vger.kernel.org while CCing the email addresses from the description of this commit. I suspect that commit 13e3a512a29001c ("i2c: smbus: Support up to 8 SPD EEPROMs") only triggered a problem that was present earlier. Does running i2cdetect on a kernel without this problem (like Linux 6.8) trigger this bug?

Greetings,
Mateusz
Thanks for reporting this to the mailing list. I tried i2cdetect on the old kernel and did not see errors. I'm not familiar with i2c, though, so I'm not sure which option you want me to try. Anyway, here is some output.

# i2cdetect -l
i2c-0   i2c     Synopsys DesignWare I2C adapter         I2C adapter
i2c-1   smbus   SMBus I801 adapter at efa0              SMBus adapter
i2c-2   i2c     i915 gmbus dpa                          I2C adapter
i2c-3   i2c     i915 gmbus dpb                          I2C adapter
i2c-4   i2c     i915 gmbus dpc                          I2C adapter
i2c-5   i2c     i915 gmbus tc1                          I2C adapter
i2c-6   i2c     i915 gmbus tc2                          I2C adapter
i2c-7   i2c     i915 gmbus tc3                          I2C adapter
i2c-8   i2c     i915 gmbus tc4                          I2C adapter
i2c-9   i2c     i915 gmbus tc5                          I2C adapter
i2c-10  i2c     i915 gmbus tc6                          I2C adapter
i2c-11  i2c     AUX A/DDI A/PHY A                       I2C adapter
i2c-12  i2c     AUX USBC1/DDI TC1/PHY TC1               I2C adapter
i2c-13  i2c     AUX USBC2/DDI TC2/PHY TC2               I2C adapter
i2c-14  i2c     AUX USBC3/DDI TC3/PHY TC3               I2C adapter
i2c-15  i2c     AUX USBC4/DDI TC4/PHY TC4               I2C adapter

# i2cdetect -y 11
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- 37 -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: 50 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --

# i2cdetect -F 11
Functionalities implemented by /dev/i2c-11:
I2C                              yes
SMBus Quick Command              yes
SMBus Send Byte                  yes
SMBus Receive Byte               yes
SMBus Write Byte                 yes
SMBus Read Byte                  yes
SMBus Write Word                 yes
SMBus Read Word                  yes
SMBus Process Call               yes
SMBus Block Write                yes
SMBus Block Read                 yes
SMBus Block Process Call         yes
SMBus PEC                        yes
I2C Block Write                  yes
I2C Block Read                   yes

# uname -r
6.7.9-200.fc39.x86_64
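Since several i2cdetect grids are pasted in this thread, a small helper for reading them may be useful. This is only a sketch: it assumes the usual i2cdetect layout where each cell is three characters wide after the "NN:" row label, so it needs output whose column alignment is intact (a paste that lost its leading spaces will give wrong addresses). The function name is hypothetical.

```python
import re


def parse_i2cdetect(output):
    """Return (found, busy) address lists from an i2cdetect grid.

    found: addresses that answered the probe (printed as a hex number)
    busy:  addresses printed as "UU", i.e. claimed by a kernel driver
    Assumes three-character-wide cells after the "NN:" row label.
    """
    found, busy = [], []
    for line in output.splitlines():
        row = re.match(r"^([0-7]0):(.*)$", line)
        if not row:
            continue  # header line or unrelated text
        base = int(row.group(1), 16)
        for tok in re.finditer(r"\S+", row.group(2)):
            # The column index follows from the token's position in the row.
            addr = base + tok.start() // 3
            if tok.group() == "UU":
                busy.append(addr)
            elif tok.group() != "--":
                found.append(addr)
    return found, busy


if __name__ == "__main__":
    demo = "30: -- -- -- -- -- -- -- 37 -- -- -- -- -- -- -- --\n"
    print(parse_i2cdetect(demo))
```

Note that the grid above comes from bus 11 (an i915 AUX channel), not from the I801 SMBus adapter, which is i2c-1 on this machine.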
Please check
i2cdetect 0
i2cdetect 1
dmidecode
How many RAM modules does this computer have?
(In reply to ruirui.yang from comment #8)
> I got similar issue on thinkpad X1 gen9 with latest 6.9.0-rc4+
> Git bisect the first bad commit is "13e3a512a29001c i2c: smbus: Support up
> to 8 SPD EEPROMs
> "
>

The mentioned commit showed up in 6.8 only, but the problem reports date back to at least 5.12. So I don't think this commit is the culprit. Please bisect between the last known good kernel and 5.12.
6.6 kernel works fine. # i2cdetect 0 WARNING! This program can confuse your I2C bus, cause data loss and worse! I will probe file /dev/i2c-0. I will probe address range 0x08-0x77. Continue? [Y/n] Y 0 1 2 3 4 5 6 7 8 9 a b c d e f 00: -- -- -- -- -- -- -- -- 10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 70: -- -- -- -- -- -- -- -- # i2cdetect 1 Warning: Can't use SMBus Quick Write command, will skip some addresses WARNING! This program can confuse your I2C bus, cause data loss and worse! I will probe file /dev/i2c-1. I will probe address range 0x08-0x77. Continue? [Y/n] Y 0 1 2 3 4 5 6 7 8 9 a b c d e f 00: 10: 20: 30: -- -- -- -- -- -- -- -- 40: 50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 60: 70:
Created attachment 306261 [details] dmidecode output
The 5.12 kernel works fine for me, so I think this might be a different problem, although the symptom is similar. I do not think I can bisect between 5.12 and 6.x since everything works fine there; the first bad commit is clear to me from my original bisect.
According to the dmidecode output you have 8 memory slots, each populated with a 4GB DIMM. Is this correct? Then with previous kernel versions you should have seen the warning "Systems with more than 4 memory slots not supported yet".
It is hard to know. The laptop is a ThinkPad X1 Gen 9, and according to the article below the memory is soldered to the motherboard. The article also says that the memory works in quad-channel mode, so I am not sure whether it is just 4 channels. I have never seen the warning, but I can run a quick test with the 6.6 kernel. https://laptopmedia.com/highlights/inside-lenovo-thinkpad-x1-carbon-9th-gen-disassembly-and-upgrade-options/
Yes! I see the warnings with 6.6 kernel:
[ 1.591594] i2c i2c-0: 8/8 memory slots populated (from DMI)
[ 1.593214] i2c i2c-0: Systems with more than 4 memory slots not supported yet, not instantiating SPD
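This explains why the bisect lands on the "Support up to 8 SPD EEPROMs" commit: before it, a system reporting 8 slots was skipped entirely, so nothing touched the SPD devices at boot. The snippet below is a deliberately simplified model of that gate, not the kernel code; the function name and the max_supported parameter are made up for illustration.

```python
def should_instantiate_spd(slot_count, max_supported):
    """Simplified model of the SPD gate: probe only if the slot count fits.

    slot_count:    memory slots reported by DMI (8 on this ThinkPad)
    max_supported: 4 before the bisected commit, 8 after it (per the thread)
    """
    return slot_count <= max_supported


if __name__ == "__main__":
    print("6.6-style limit:", should_instantiate_spd(8, 4))
    print("after the commit:", should_instantiate_spd(8, 8))
```

Under this model the older kernel never starts an SMBus transaction for SPD on this machine, which fits the 6.6 log above.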
> --- Comment #13 from ruirui.yang@linux.dev --- > 6.6 kernel works fine. > > # i2cdetect 0 > [...] > # i2cdetect 1 > [...] Are there any warning messages in dmesg generated while you run i2cdetect?
Yes, there are something printed below: [ 777.551328] i801_smbus 0000:00:1f.4: Transaction timeout [ 777.553555] i801_smbus 0000:00:1f.4: Failed terminating the transaction [ 777.553674] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.553752] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.553824] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.553895] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.553964] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554036] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554108] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554187] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554264] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554336] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554408] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554481] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554552] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554623] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554695] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554767] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554838] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554908] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.554980] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555051] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555122] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555224] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555302] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555387] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555463] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555539] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! 
[ 777.555614] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555689] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555765] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555836] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555908] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.555980] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556050] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556121] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556193] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556264] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556336] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556408] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556478] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556552] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556624] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556696] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556766] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556837] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556909] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.556977] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557049] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557119] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557189] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557259] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557331] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557402] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557471] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557543] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557614] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! 
[ 777.557687] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557759] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557830] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557899] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.557970] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558041] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558112] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558183] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558254] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558325] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558396] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558467] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558536] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558585] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558623] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558658] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558696] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558731] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558766] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558801] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558837] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558872] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558907] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558942] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.558977] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559012] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559047] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559082] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559117] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! 
[ 777.559152] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559196] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559233] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559271] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559306] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559342] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559378] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559417] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559455] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559490] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559525] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559560] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559596] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559631] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559666] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559701] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559736] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559771] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559806] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559844] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559879] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559914] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559949] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.559984] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.560019] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.560055] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 777.560090] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
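A flood like the one above is easier to read once it is summarised. The following sketch counts each distinct i801_smbus message in a dmesg dump and records when it first appeared; the function name is hypothetical, and the regular expression only assumes the "[ timestamp] i801_smbus device: message" shape seen in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
# [  777.551328] i801_smbus 0000:00:1f.4: Transaction timeout
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+i801_smbus\s+(\S+):\s+(.*)")


def summarize_i801(dmesg_text):
    """Return (counts, first_seen) for the i801_smbus messages in dmesg_text."""
    counts = Counter()
    first_seen = {}
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        timestamp, _device, message = m.groups()
        message = message.strip()
        counts[message] += 1
        # Remember only the first timestamp at which each message appeared.
        first_seen.setdefault(message, float(timestamp))
    return counts, first_seen


if __name__ == "__main__":
    import sys
    counts, first_seen = summarize_i801(sys.stdin.read())
    for message, n in counts.most_common():
        print(f"{n:5d}x  first at {first_seen[message]:.6f}  {message}")
```

Run as "dmesg | python3 summarize.py" (the script name is arbitrary). For the log above it would show one "Transaction timeout" and one "Failed terminating the transaction" first, followed by the repeated "SMBus is busy" lines.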
Hmm, comment #20 is a reply to comment #19, will use "reply" to quote the questions later.
It seems that, for whatever reason, the SMBHSTSTS_HOST_BUSY bit is constantly set. It should be cleared automatically by the host after each SMBus operation.

Could you please test the following? It's not a proper fix, it's just to test whether manually resetting the bit on i801 driver load helps.

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index a7c0c8710..a521bd4a3 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -1683,6 +1683,8 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		outb_p(inb_p(SMBAUXCTL(priv)) &
 		       ~(SMBAUXCTL_CRC | SMBAUXCTL_E32B), SMBAUXCTL(priv));
 
+	outb_p(SMBHSTSTS_HOST_BUSY, SMBHSTSTS(priv));
+
 	/* Default timeout in interrupt mode: 200 ms */
 	priv->adapter.timeout = HZ / 5;
 
-- 
2.45.0
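To make the bit being discussed concrete, here is a small decoder for the host status register. The bit names and positions follow the SMBHSTSTS_* definitions in i2c-i801.c as far as I recall them, so treat the table as an assumption and compare it with the driver source; the function name is made up.

```python
# Host status register bits, named after the SMBHSTSTS_* macros in i2c-i801.c.
# Assumption: positions as defined in the driver; verify against the source.
SMBHSTSTS_BITS = [
    (0x01, "HOST_BUSY"),
    (0x02, "INTR"),
    (0x04, "DEV_ERR"),
    (0x08, "BUS_ERR"),
    (0x10, "FAILED"),
    (0x20, "SMBALERT_STS"),
    (0x40, "INUSE_STS"),
    (0x80, "BYTE_DONE"),
]


def decode_hststs(status):
    """Return the names of the bits set in a raw host status byte."""
    return [name for bit, name in SMBHSTSTS_BITS if status & bit]


def is_busy(status):
    """True if HOST_BUSY is set, the condition behind "SMBus is busy"."""
    return bool(status & 0x01)


if __name__ == "__main__":
    print(decode_hststs(0x01), is_busy(0x01))
```

A status byte with only bit 0 set decodes to HOST_BUSY alone, which is the stuck state described in this comment.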
(In reply to Heiner Kallweit from comment #22) > Seems that for whatever reason bit SMBHSTSTS_HOST_BUSY is constantly set. > It should be automatically cleared by the host after each SMBUS operation. > > Could you please test the following? > It's no proper fix, it's just to test whether manually resetting the bit on > i801 driver load helps. Hi, with the changes, I still got the SMBus busy messages below: [~]$ uname -r 6.9.0-rc6+ [~]$ dmesg|grep i801 [ 1.563819] i801_smbus 0000:00:1f.4: enabling device (0000 -> 0003) [ 1.565938] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 1.569900] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt [ 1.776346] i801_smbus 0000:00:1f.4: Transaction timeout [ 1.780426] i801_smbus 0000:00:1f.4: Failed terminating the transaction [ 1.782933] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1.785568] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1.787947] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1.790513] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1.792779] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1.794917] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 1.797100] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it!
OK, so it can start a transaction, but the transaction doesn't finish. It looks like the BIOS might have switched off some clock needed for SMBus. The first (failed) SMBus access attempt happens when the ee1004 driver is probed. Before the referenced change this driver wasn't instantiated (due to >4 memory slots). Having said that, I think the SMBus wasn't usable before either. You could check under 6.6 with i2cdetect, and see whether you can access the SPD EEPROMs from userspace. A simple workaround would be to blacklist ee1004 and/or i801.
The following post for a Gen8 model, more than 3 years old, seems to indicate that SMBus access on this machine type has been problematic for years. https://forums.lenovo.com/topic/findpost/27/5048595/5181539 The bisect just points to the change that revealed the problem.
Hello! I'm also experiencing an issue that may seem similar. I'm also on a brand new Lenovo P16 Gen2 with Intel i9. The main symptom is that I was unable to hibernate/suspend the machine. The main problem is that driver spd5118 fails to suspend and aborts the operation. As discussed here https://lore.kernel.org/lkml/dmx2x5sziux7ubk5fcas2nmj4lt3vpalr5gc7qmmwq2megmp24@24vmehdkle3x/ the problem may be somehow related to the i2c communication. This is what I see after the boot: # dmesg -w | grep smbus [ 5.416242] i801_smbus 0000:00:1f.4: enabling device (0000 -> 0003) [ 5.416572] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 5.416607] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt If I try to list the busses I get: # i2cdetect -l i2c-0 i2c Synopsys DesignWare I2C adapter I2C adapter i2c-1 smbus SMBus I801 adapter at efa0 SMBus adapter i2c-2 i2c i915 gmbus dpa I2C adapter i2c-3 i2c i915 gmbus dpb I2C adapter i2c-4 i2c i915 gmbus dpc I2C adapter i2c-5 i2c i915 gmbus dpd I2C adapter i2c-6 i2c i915 gmbus tc1 I2C adapter i2c-7 i2c AUX A/DDI A/PHY A I2C adapter i2c-8 i2c AUX C/DDI C/PHY C I2C adapter i2c-9 i2c AUX D/DDI D/PHY D I2C adapter If I query the SMBus: # i2cdetect -y 1 0 1 2 3 4 5 6 7 8 9 a b c d e f 00: -- -- -- -- -- -- -- -- 10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 50: UU UU -- -- -- -- -- -- -- -- -- -- -- -- -- -- 60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 70: -- -- -- -- -- -- -- -- I guess those two are the sensors in my two DDR5 modules. 
If I try to query those sensors I get: spd5118-i2c-1-51 Adapter: SMBus I801 adapter at efa0 ERROR: Can't get value of subfeature temp1_lcrit_alarm: Can't read ERROR: Can't get value of subfeature temp1_min_alarm: Can't read ERROR: Can't get value of subfeature temp1_max_alarm: Can't read ERROR: Can't get value of subfeature temp1_crit_alarm: Can't read ERROR: Can't get value of subfeature temp1_min: Can't read ERROR: Can't get value of subfeature temp1_max: Can't read ERROR: Can't get value of subfeature temp1_lcrit: Can't read ERROR: Can't get value of subfeature temp1_crit: Can't read temp1: N/A (low = +0.0°C, high = +0.0°C) (crit low = +0.0°C, crit = +0.0°C) [...] spd5118-i2c-1-50 Adapter: SMBus I801 adapter at efa0 ERROR: Can't get value of subfeature temp1_lcrit_alarm: Can't read ERROR: Can't get value of subfeature temp1_min_alarm: Can't read ERROR: Can't get value of subfeature temp1_max_alarm: Can't read ERROR: Can't get value of subfeature temp1_crit_alarm: Can't read ERROR: Can't get value of subfeature temp1_min: Can't read ERROR: Can't get value of subfeature temp1_max: Can't read ERROR: Can't get value of subfeature temp1_lcrit: Can't read ERROR: Can't get value of subfeature temp1_crit: Can't read temp1: N/A (low = +0.0°C, high = +0.0°C) (crit low = +0.0°C, crit = +0.0°C) At this point, dmesg immediately reports the same errors: [ 788.313930] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.313997] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.314038] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.314075] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.314119] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.314159] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.314196] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.314236] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333155] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! 
[ 788.333182] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333221] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333243] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333268] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333290] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333312] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333335] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! [ 788.333361] i801_smbus 0000:00:1f.4: SMBus is busy, can't use it! Blacklisting spd5118 fixes my main problem, but that is more a workaround than a solution.
Can you share the output of "acpidump" on your device?
I see that it created more than 2MB of data. For safety reasons, would it be possible to restrict the amount of data I need to upload publicly? Thank you.
Those 2MB of data contain the ACPI firmware tables. I would prefer it if you uploaded the whole dataset. Do not worry, ACPI firmware tables contain no private information.
I think I have an idea: maybe commit ba9ad2af7019 ("i2c: i801: Fix I2C Block Read on 8-Series/C220 and later") is the root cause of the issue. I strongly suspect that the i2c controller begins to malfunction as soon as the spd5118 driver begins to use i2c block read commands. This happens when reading sensor values, so that would explain why the first probe succeeded. Basically everything works until someone actually tries to read the temperature sensor value. This confuses the i2c controller until the next power cycle. Do you know how to build a custom kernel? If not, I can try to provide you with a .deb package containing a kernel without the commit mentioned above.
The spd5118 driver uses regmap, and regmap doesn't seem to execute a I2C_SMBUS_I2C_BLOCK_DATA command. That is the command fixed with commit ba9ad2af7019. Am I missing something ?
The regmap code selects regmap_i2c_smbus_i2c_block as the regmap backend, which uses i2c_smbus_read/write_i2c_block_data(). However this would mean that the driver uses block transfers from the start, so they indeed cannot be the cause for the i2c issue. Anyway, in this case we need to take a look at the ACPI tables. Maybe some firmware components are messing with the i2c controller?
It is odd that the spd5118 driver instantiates, because that means that some I2C transactions must work and the lockup happens later. Maybe it would help to enable tracing on SMBus commands to find out which command actually triggers the bus lockup. In either case, it might make sense to try resetting the i2c controller (set bit 3 of SMBHSTCFG) if it gets stuck.
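For reference, the register arithmetic behind "set bit 3 of SMBHSTCFG" is sketched below. The other bit positions follow the SMBHSTCFG_* macros in i2c-i801.c as far as I recall; the SSRESET name for bit 3 is an assumption taken from Intel PCH datasheets, since the comment only gives the bit number. This only computes values; it does not touch hardware.

```python
# Bits of the SMBus host configuration register (SMBHSTCFG).
# Assumption: HST_EN, SMB_SMI_EN, I2C_EN and SPD_WD as in i2c-i801.c;
# SSRESET is the datasheet-style name for bit 3, not a name from this thread.
SMBHSTCFG_HST_EN = 1 << 0
SMBHSTCFG_SMB_SMI_EN = 1 << 1
SMBHSTCFG_I2C_EN = 1 << 2
SMBHSTCFG_SSRESET = 1 << 3
SMBHSTCFG_SPD_WD = 1 << 4


def with_soft_reset(hostcfg):
    """Return the config byte with bit 3 set, leaving all other bits alone."""
    return hostcfg | SMBHSTCFG_SSRESET


def spd_write_disabled(hostcfg):
    """True if the SPD write-disable bit is set ("SPD Write Disable is set")."""
    return bool(hostcfg & SMBHSTCFG_SPD_WD)


if __name__ == "__main__":
    cfg = SMBHSTCFG_HST_EN | SMBHSTCFG_SPD_WD  # example value, not read from hardware
    print(hex(with_soft_reset(cfg)), spd_write_disabled(cfg))
```

Whether a soft reset actually recovers the controller on these laptops is exactly the open question here; the sketch only shows which bit would be written.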
There should be some kernel log messages "DDR5 temperature sensor: vendor ..." when the spd5118 driver is loaded. What do those messages look like? It would probably not help much, but it should at least tell us the chip vendors and maybe give an indication of whether those chips work in other systems. So far I have seen "0x06:0x32 revision 1.6" and "0x00:0xb3 revision 2.2", and both are known to work.
(In reply to Armin Wolf from comment #29) > Those 2MB of data contain the ACPI firmware tables. I would prefer if you > upload the the whole dataset. > > Do not worry, ACPI firmware tables contain no private information. I think this is what you wanted: https://drive.google.com/file/d/1L9TPS4o4KCeRmNePOgQ3QUdJbVY8PKG7/view?usp=sharing. Sorry if I'm sharing it this way but I prefer to be able to remove it once not useful anymore. Please let me know if I created it correctly. > Do you know how to build a custom kernel? If no then i can try to provide you > a .deb package containing a kernel without the commit mentioned above. Yes, I know how to build a custom kernel, but unfortunately it seems Endeavour OS is using systemd-boot, and I do not clearly understand how to install my custom build. It will probably take me some time to figure out. A deb package is not probably useful here.
(In reply to Guenter Roeck from comment #34) > There should be some kernel log messages "DDR5 temperature sensor: vendor > ..." when the spd5118 driver is loaded. How does those messages look like ? > It would probably not help much, but it should at least tell us the chip > vendors and maybe give an indication if those chips work in other systems. > So far I have seen "0x06:0x32 revision 1.6" and "0x00:0xb3 revision 2.2", > and both are known to work. You are right: # dmesg | grep spd [ 5.412402] spd5118 1-0050: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6 [ 5.417398] spd5118 1-0051: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6 Is that supposed to be written every time the driver is loaded? Cause I tried to rmmod/modprobe it, but those logs do not appear again. It is also not logged if I blacklist it and try to modprobe after the boot (maybe something messed with the bus already?).
The message is (or should be) seen whenever the driver is instantiated. You don't see it again because the driver won't re-instantiate after the first time due to the bus lockup. 0x06:0x32 is the Montage Technology chip which works just fine in my (and various other) systems, so the problem is unlikely to be caused by chip behavior. Do you get an error message when you run modprobe and the message is not seen ? modprobe should bail out with "Device or resource busy" or a similar message. "It is also not logged if I blacklist it and try to modprobe after the boot" - that is really interesting. Maybe you are right, and the bus locks up due to some other transaction. I have no idea what that might be, though. If you boot with the driver blacklisted and run "i2cdetect -y 1" immediately, what do you get ? If you do that, do you see "SMBus is busy, can't use it!" in the kernel log ? Also, if a driver for some other chip is instantiated on the same i2c bus, "grep . /sys/class/i2c-dev/i2c-1/device/*/name" might tell us what that is. That might give us another hint.
> Do you get an error message when you run modprobe and the message is not seen > ? modprobe should bail out with "Device or resource busy" or a similar > message. I tried to boot with the driver blacklisted, then I modprobed spd5118: no error message from modprobe and no error message in dmesg. > If you boot with the driver blacklisted and run "i2cdetect -y 1" immediately, > what do you get ? If you do that, do you see "SMBus is busy, can't use it!" > in the kernel log ? This is the result: # i2cdetect -y 1 0 1 2 3 4 5 6 7 8 9 a b c d e f 00: -- -- -- -- -- -- -- -- 10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 70: -- -- -- -- -- -- -- -- No log in dmesg. > Also, if a driver for some other chip is instantiated on the same i2c bus, > "grep . /sys/class/i2c-dev/i2c-1/device/*/name" might tell us what that is. > That might give us another hint. This is what I get: $ grep . /sys/class/i2c-dev/i2c-1/device/*/name /sys/class/i2c-dev/i2c-1/device/1-0050/name:spd5118 /sys/class/i2c-dev/i2c-1/device/1-0051/name:spd5118 Is it normal that I'm seeing this when the driver is blacklisted?
I have no idea why i2cdetect fails to report any devices, even more so since there is no error log in dmesg. The entries in /sys/class/i2c-dev/i2c-1/device/ are as expected; they are generated by the i801 code which tries to instantiate the chips. The only other idea I have would be to enable i2c bus tracing and see what it reports.
(In reply to Luca Carlon from comment #35)
> (In reply to Armin Wolf from comment #29)
> > Those 2MB of data contain the ACPI firmware tables. I would prefer if you
> > upload the whole dataset.
> >
> > Do not worry, ACPI firmware tables contain no private information.
>
> I think this is what you wanted:
> https://drive.google.com/file/d/1L9TPS4o4KCeRmNePOgQ3QUdJbVY8PKG7/view?usp=sharing.
> Sorry if I'm sharing it this way but I prefer to be able
> to remove it once not useful anymore. Please let me know if I created it
> correctly.
>
> > Do you know how to build a custom kernel? If no then i can try to provide you
> > a .deb package containing a kernel without the commit mentioned above.
>
> Yes, I know how to build a custom kernel, but unfortunately it seems
> Endeavour OS is using systemd-boot, and I do not clearly understand how to
> install my custom build. It will probably take me some time to figure out. A
> deb package is not probably useful here.

I took a look at the ACPI table and it seems that the BIOS leaves the i2c controller alone. This means that somehow the spd5118 driver is causing those issues on its own. Thus I too suggest that you enable i2c bus tracing so that we can get more information about the i2c operation triggering this mess.
I found this for my current kernel:

$ zcat /proc/config.gz | grep I2C_DEBUG
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set

So I guess I'll need to build a debug kernel first, is this correct?
No, that should not be necessary. As root, try:

# cd /sys/kernel/debug/tracing
# echo "adapter_nr==1" > events/smbus/filter
# echo 1 > events/smbus/enable
# echo 1 > tracing_on
# modprobe spd5118
# sensors
# cat trace
# echo 0 > tracing_on
Ah yes, now I see something, thanks. Not very useful I guess:

# tracer: nop
#
# entries-in-buffer/entries-written: 16/16   #P:32
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
        modprobe-3540    [000] .....   356.148355: smbus_read: i2c-1 a=050 f=0000 c=0 WORD_DATA
        modprobe-3540    [000] .....   356.352449: smbus_result: i2c-1 a=050 f=0000 c=0 WORD_DATA rd res=-110
        modprobe-3540    [000] .....   356.352523: smbus_read: i2c-1 a=051 f=0000 c=0 WORD_DATA
        modprobe-3540    [000] .....   356.560456: smbus_result: i2c-1 a=051 f=0000 c=0 WORD_DATA rd res=-110
        modprobe-3540    [000] .....   356.560586: smbus_read: i2c-1 a=052 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   356.769460: smbus_result: i2c-1 a=052 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   356.769465: smbus_read: i2c-1 a=053 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   356.977454: smbus_result: i2c-1 a=053 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   356.977457: smbus_read: i2c-1 a=054 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.184467: smbus_result: i2c-1 a=054 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   357.184476: smbus_read: i2c-1 a=055 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.392448: smbus_result: i2c-1 a=055 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   357.392454: smbus_read: i2c-1 a=056 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.600469: smbus_result: i2c-1 a=056 f=0000 c=0 BYTE rd res=-110
        modprobe-3540    [000] .....   357.600476: smbus_read: i2c-1 a=057 f=0000 c=0 BYTE
        modprobe-3540    [000] .....   357.809453: smbus_result: i2c-1 a=057 f=0000 c=0 BYTE rd res=-110

I should probably somehow enable tracing at the beginning of the boot, to log the entire communication.
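For anyone reading the trace above: res=-110 is -ETIMEDOUT in Linux errno numbering, and each read/result pair is roughly 0.2 s apart, which is consistent with every transaction running into a timeout. Below is a minimal Python sketch that pairs the events and reports this. The regular expressions, the small errno table, and the helper name summarize() are my own, written against the line format shown above; they are not part of any kernel tooling, and only the first two transactions are included as sample input.

```python
import re

# Linux errno numbers relevant here (hard-coded so the sketch does not
# depend on the host platform's errno table).
LINUX_ERRNO = {6: "ENXIO", 16: "EBUSY", 110: "ETIMEDOUT"}

# Sample input: the first two transactions copied from the trace above.
TRACE = """\
modprobe-3540 [000] ..... 356.148355: smbus_read: i2c-1 a=050 f=0000 c=0 WORD_DATA
modprobe-3540 [000] ..... 356.352449: smbus_result: i2c-1 a=050 f=0000 c=0 WORD_DATA rd res=-110
modprobe-3540 [000] ..... 356.352523: smbus_read: i2c-1 a=051 f=0000 c=0 WORD_DATA
modprobe-3540 [000] ..... 356.560456: smbus_result: i2c-1 a=051 f=0000 c=0 WORD_DATA rd res=-110
"""

LINE = re.compile(r"(\d+\.\d+): (smbus_\w+): i2c-(\d+) a=([0-9a-f]+)")
RES = re.compile(r"res=(-?\d+)")


def summarize(trace):
    """Pair each smbus_read/smbus_write with its smbus_result.

    Returns a list of (address, seconds, status) tuples, where status is
    "OK" or the symbolic errno name (or the raw number if unknown).
    """
    started = {}
    out = []
    for line in trace.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        ts = float(m.group(1))
        event = m.group(2)
        addr = int(m.group(4), 16)
        if event in ("smbus_read", "smbus_write"):
            started[addr] = ts
        elif event == "smbus_result":
            res = RES.search(line)
            code = int(res.group(1)) if res else 0
            status = "OK" if code >= 0 else LINUX_ERRNO.get(-code, str(code))
            out.append((addr, round(ts - started.pop(addr, ts), 3), status))
    return out


for addr, secs, status in summarize(TRACE):
    print(f"0x{addr:02x}: {secs:.3f} s, {status}")
```

Feeding the full output of "cat trace" to summarize() instead of the excerpt should work the same way, as long as the line format matches.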
Yes, in that situation the bus is already locked up.
It is odd though that it tries a byte read on addresses 0x52..0x57. The driver should not do that - it only uses word accesses for register 0x00.
I think i2c_default_probe() is doing the byte accesses. That makes me wonder if the bus locks up when trying to read from those addresses. The first pass would detect the connected chips at address 0x50 and 0x51, subsequent passes would encounter the bus in locked state.
Is this data retrieved from the bus?

# dmesg | grep spd
[ 5.412402] spd5118 1-0050: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6
[ 5.417398] spd5118 1-0051: DDR5 temperature sensor: vendor 0x06:0x32 revision 1.6

I ask because that data arrives only when the driver is not blacklisted. If I blacklist it and modprobe it after the boot, it is not received.
Yes, the data is retrieved from the bus. My current theory is:

- The i801 driver probes the address range 0x50..0x57 to check if a device is connected.
- Whenever it finds one, the respective driver's probe function is called. That function reads the chip vendor and revision from the chip and instantiates the device.
- The i801 driver keeps probing the remaining addresses. While probing those, the controller locks up.

You should see some "Successfully instantiated SPD at ..." messages in the boot log.

I assume you have CONFIG_SENSORS_SPD5118_DETECT enabled in your configuration - that would trigger those extra address checks. If it is enabled, disabling that configuration option might be a workaround. If the bus indeed locks up due to the attempted read operations on addresses 0x52..0x57, that would however only be a temporary workaround. The real fix would have to be implemented in the i801 driver - for example by resetting the controller if it is stuck.

In this context, keep in mind that the spd5118 driver will be blacklisted at some point in the near future because write protecting the 0x50..0x57 address range is not compatible with the operation of spd5118 devices.
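To make the theory concrete, here is a toy Python model of it. This is purely illustrative: the ToyBus class, the boot_probe() helper, and above all the rule "a read from an unpopulated address wedges the controller" are assumptions that encode the hypothesis, not observed i801 behavior.

```python
ETIMEDOUT = 110  # Linux errno value, as seen in the trace (res=-110)


class ToyBus:
    """Toy SMBus model. Assumption: the first read from an address with
    nothing connected locks the controller up, after which every
    transaction times out."""

    def __init__(self, present):
        self.present = set(present)
        self.locked = False

    def read(self, addr):
        if self.locked:
            return -ETIMEDOUT
        if addr not in self.present:
            self.locked = True  # hypothesised lockup
            return -ETIMEDOUT
        return 0


def boot_probe(bus, addrs):
    """Probe each address in order and return those that answered."""
    return [a for a in addrs if bus.read(a) == 0]


bus = ToyBus(present={0x50, 0x51})
found = boot_probe(bus, range(0x50, 0x58))
print("instantiated at boot:", [hex(a) for a in found])
print("bus locked after probing:", bus.locked)
print("later read from 0x50:", bus.read(0x50))
```

Under that assumption the two populated addresses are instantiated during boot, and a later modprobe sees timeouts even on 0x50 and 0x51, which is the pattern in the trace posted earlier. It does not prove the theory, it only shows that the theory and the trace are compatible.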
"The real fix would have to be implemented in the i801 driver - for example by resetting the controller if it is stuck." i801 does this already, it sets the SMBHSTCNT_KILL bit in case of a transaction timeout. However such a soft reset may have its limits, depending on how hard the chip has locked up. And I'd rather call this a workaround. IMO the root cause is the probing mechanism. An address should only be accessed if we know there's a RAM module. Like for DDR4 driver ee1004 instantiates jc42 at an address only if SPD indicates presence of a temp sensor.
The i801 controller also has a soft_reset bit which isn't supported.

"IMO the root cause is the probing mechanism." - sure, that is why there is a SENSORS_SPD5118_DETECT configuration option. The comment regarding the jc42 driver isn't really accurate because the i2c subsystem _will_ scan the entire 0x18..0x1f address range when the driver is loaded. The only difference is that the i2c core will use the I2C_FUNC_SMBUS_QUICK command if supported for that scan and not attempt a byte read.

Maybe we should change the comment associated with SENSORS_SPD5118_DETECT to "if unsure, say N". I don't think that will really help, though.
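For reference, the difference between the two address ranges can be sketched as follows. This is a Python paraphrase, written from memory, of how i2c_default_probe() picks the transaction type; it is not a copy of the kernel source, the x86 special case for address 0x73 is left out, and the function name probe_method() and its parameters are made up for the illustration.

```python
def probe_method(addr, has_quick=True, has_read_byte=True):
    """Return the SMBus transaction a default probe would use for addr.

    Paraphrased logic (an assumption, check i2c-core-base.c before
    relying on it): addresses 0x30..0x37 and 0x50..0x5f are treated as
    EEPROM-like and are never probed with a quick command.
    """
    eeprom_like = (addr & ~0x07) == 0x30 or (addr & ~0x0F) == 0x50
    if not eeprom_like and has_quick:
        return "QUICK"
    if has_read_byte:
        return "BYTE"
    return None  # no suitable probing method


for addr in (0x18, 0x1F, 0x50, 0x52, 0x57):
    print(f"0x{addr:02x}: {probe_method(addr)}")
```

If that paraphrase is right, the jc42 range 0x18..0x1f is scanned with quick commands while 0x50..0x57 gets byte reads, which matches the BYTE reads on 0x52..0x57 in the trace.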
> You should see some "Successfully instantiated SPD at ..." messages in the
> boot log.

Yes:

# dmesg | grep -i spd
[ 5.333756] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 5.343238] i2c i2c-1: Successfully instantiated SPD at 0x50
[ 5.343655] i2c i2c-1: Successfully instantiated SPD at 0x51

> I assume you have CONFIG_SENSORS_SPD5118_DETECT enabled in your configuration
> - that would trigger those extra address checks.

Yes, correct:

$ zcat /proc/config.gz | grep CONFIG_SENSORS_SPD5118_DETECT
CONFIG_SENSORS_SPD5118_DETECT=y

> If it is enabled, disabling that configuration option might be a workaround.
> If the bus indeed locks up due to the attempted read operations on addresses
> 0x52..0x57, that would however only be a temporary workaround. The real fix
> would have to be implemented in the i801 driver - for example by resetting
> the controller if it is stuck.

The workaround I found seems to work well so far: I blacklist the spd5118 driver. I was hoping a real fix could be found: a user should not be forced to blacklist a driver to suspend the machine.

Thank you for your help.
Looking at this patch series [1], I begin to wonder whether the i801 driver should even instantiate any SPD chips when the write disable bit is set. This would prevent the bus lockup during probe, and possibly other issues, should this problem also exist on DDR4 platforms.

[1] https://lore.kernel.org/linux-i2c/20250430-for-upstream-i801-spd5118-no-instantiate-v2-0-2f54d91ae2c7@canonical.com/
DDR4 chips don't have the problem: The temperature sensors are at a different address range (0x18..0x1f) and are not affected by the write protection, and the eeprom driver is implemented read-only. Also, unlike DDR5/SPD5118, DDR4 EEPROMs do not require write operations into the 0x50..0x5f address range for page selection.
(In reply to Guenter Roeck from comment #50)
> The i801 controller also has a soft_reset bit which isn't supported.

I think you're referring to SSRESET in HOSTC in the PCI config space. Right, this isn't supported yet.

> "IMO the root cause is the probing mechanism." - sure, that is why there is
> a SENSORS_SPD5118_DETECT configuration option. The comment regarding the
> jc42 driver isn't really accurate because the i2c subsystem _will_ scan the
> entire 0x18..0x1f address range when the driver is loaded. Only difference
> is that the i2c core will use the I2C_FUNC_SMBUS_QUICK command if supported
> for that scan and not attempt a byte read.
>
> Maybe we should change the comment associated with SENSORS_SPD5118_DETECT to
> "if unsure, say N". I don't think that will really help, though.

I was a little fast with writing, and I agree with the remarks.
I2C probing is considered a legacy mechanism by Wolfram. The problem here highlights why. Can't we remove the probing from spd5118 and only instantiate it at an address if DMI/SPD indicate presence?
> I was a little fast with writing, and I agree with the remarks.
> I2C probing is considered a legacy mechanism by Wolfram. The problem here
> highlights why. Can't we remove the probing from spd5118 and only
> instantiate it at an address if DMI/SPD indicate presence?

That is why the configuration option exists. Not all systems support DMI, and not all controllers call the SPD probe functions. On devicetree systems, due to the dynamic nature of memory insertion, it is unlikely that the DIMMs are listed in devicetree.

On top of that, even the SPD probe function probes the addresses: If DIMMs are not inserted in order and there are holes in the address assignments, we would be back to address probing of addresses with nothing connected to it (because that is what the SPD detection code does). DMI only reports the number of inserted modules, not their addresses.
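The last point can be illustrated with a few lines of Python. The sketch assumes eight candidate SPD addresses (0x50..0x57, the range discussed above); SPD_ADDRS and candidate_layouts() are names invented for the example.

```python
from itertools import combinations

# Assumed candidate SPD addresses, taken from the range discussed above.
SPD_ADDRS = range(0x50, 0x58)


def candidate_layouts(module_count):
    """All address sets that are consistent with a given module count."""
    return list(combinations(SPD_ADDRS, module_count))


layouts = candidate_layouts(2)
print(len(layouts), "possible layouts for 2 modules")
print("in order:  ", [hex(a) for a in layouts[0]])
print("with a hole:", [hex(a) for a in (0x50, 0x52)], (0x50, 0x52) in layouts)
```

With two modules reported there are 28 address pairs that fit, including ones with holes such as 0x50 and 0x52, so the count alone cannot tell the code which addresses are safe to touch.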