Bug 118811
Summary: | Bisected: 30 secs boot delay + confusing UEFI on reboot | ||
---|---|---|---|
Product: | Drivers | Reporter: | Thomas (v10lator) |
Component: | I2C | Assignee: | Drivers/I2C virtual user (drivers-i2c) |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | CC: | bastienphilbert, bugfinding+linux, jdelvare |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.5, 4.6, next | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
i2c Patch
dmesg of linux-4.7-rc5 Test Patch |
Description
Thomas
2016-05-24 06:15:08 UTC
Not sure if that helps but I forgot:
> $ lspci -s 00:14.0 -nnv
> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus
> Controller [1002:4385] (rev 42)
> Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus
> Controller [1002:4385]
> Flags: 66MHz, medium devsel
> Kernel driver in use: piix4_smbus
I can confirm this issue with the same board, Asus Sabertooth 990FX R2.0 Upon reboot it appears to want to overclock, as it complains about an overclocking failure. I suspect the change to the i2c drivers somehow triggers something within the pathway Asus uses for its overclocking tools. I'm not sure it really wants to overclock. This is how I see it: The change in initialization process somehow messes up values, but not the ones on CMOS. Then, when you reboot, these messed up values prevent some hardware (i2c maybe?) from being initialized by UEFI, so it reboots a few times in a row (you won't notice that except the long waiting time), resets everything to default and presents you the overclocking message. Now why do I think the overclocking message is just a default message for failed boot? Cause when you enter the UEFI menu to check the settings (like it tells you to do) and look at Advanced Mode -> Ai Tweaker every value is what it should be - which shouldn't be the case if there was some overclocking going on. P.S. This is just what I think that might happen. I'm in no way an expert. Thomas, you might very well be right. Has been fixed by commit 52795f6fdeb8a2b98373108ac2838c674bb2cbc4. (In reply to bastienphilbert from comment #5) > Has been fixed by commit 52795f6fdeb8a2b98373108ac2838c674bb2cbc4. I updated my kernel to 4.5.7-hardened-r3 (this is a hardened gentoo kernel: https://packages.gentoo.org/packages/sys-kernel/hardened-sources ) which seems to already include the commit you mentioned but that didn't fix this issue for me. (In reply to Thomas from comment #6) > but that didn't fix this issue for me. I looked a bit deeper into the patch (commit 52795f6fdeb8a2b98373108ac2838c674bb2cbc4) and it says "So restore the original names for all legacy (pre-SB800) devices." but this is a SB800 device: lspci -nn -s 00:14.0 -vvvv 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller [1002:4385] (rev 42) Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller [1002:4385] Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel driver in use: piix4_smbus (compare that with http://lxr.free-electrons.com/source/include/linux/pci_ids.h?v=3.19#L380 and the patch itself (what does set is_sb800 to true?). On SB800 the commit says it changes a "implementation detail the user doesn't need to know" only, so I guess this can't work at all. As I have some free time atm I decided to try to track this down some more and it worked! First off I'm no expert in low level programming nor I2C nor AMD hardware so my workaround might have unwanted side effects like killing your kitty. Anyway, here's a one-liner that fixes the issue: --- linux-4.5.7-hardened-r3.old/drivers/i2c/busses/i2c-piix4.c 2016-06-27 13:18:05.354987218 +0200 +++ linux-4.5.7-hardened-r3/drivers/i2c/busses/i2c-piix4.c 2016-06-27 13:05:19.269893243 +0200 @@ -642,8 +642,7 @@ adap->owner = THIS_MODULE; adap->class = I2C_CLASS_HWMON | I2C_CLASS_SPD; - adap->algo = sb800_main ? &piix4_smbus_algorithm_sb800 - : &smbus_algorithm; + adap->algo = &smbus_algorithm; adapdata = kzalloc(sizeof(*adapdata), GFP_KERNEL); if (adapdata == NULL) { Tracking this down some more the difference seems to be between http://lxr.free-electrons.com/source/drivers/i2c/busses/i2c-piix4.c#L583 (bugged) and http://lxr.free-electrons.com/source/drivers/i2c/busses/i2c-piix4.c#L485 (works) but that's for a experienced kernel dev to figure out. One more thing: If you have a look at the bad commit you'll notice this: - adap->algo = &smbus_algorithm; + adap->algo = sb800_main ? &piix4_smbus_algorithm_sb800 + : &smbus_algorithm; but Jean Delvare didn't tell about this in the commit log. I wonder what this was originally supposed to fix? If you read the commit carefully that does need to be there as the bisected patch is trying to stop a race condition by setting the parent algorithm before the card is initialized. However seems that the call in this buses probe method calls it with false rather then true for the parent card and this may be causing a issue of setting up the wrong algorithm for the parent card, i.e. using the smbus algorithm designed for children cards. I am uploading a patch to see if this is actually the issue. Created attachment 221291 [details]
i2c Patch
(In reply to bastienphilbert from comment #11) > Created attachment 221291 [details] > i2c Patch Doesn't seem to help. Ok that was a stupid idea after having a closer look at the code for your driver. Seems that the piix_smbus_algorithm was not supported alternative ports and was fixed by this commit,https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6befa3fde65fe437f588da490c07a114393ce229. Can you test a newer kernel with that patch applied i.e. mainline in order to see if your bug is fixed by that commit. (In reply to bastienphilbert from comment #13) > Ok that was a stupid idea after having a closer look at the code for your > driver. > Seems that the piix_smbus_algorithm was not supported alternative ports and > was fixed by this > commit,https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/ > commit/?id=6befa3fde65fe437f588da490c07a114393ce229. Can you test a newer > kernel with that patch applied i.e. mainline in order to see if your bug is > fixed by that commit. I applied that patch to my kernel but it didn't help. Do you still want me to test a newer kernel? Test the lastest rc candidate and report back with a dmesg. Seems a few things may be going wrong either we are writing to incorrect registers or the port numbers are not being set correctly. Created attachment 221461 [details]
dmesg of linux-4.7-rc5
Latest rc shows the same issue but seems to be less verbose (no more "i2c i2c-12: Failed! (01)" messages).
Seem if this either gets you a different output or fixes the issue. Created attachment 221471 [details]
Test Patch
The "bad patch" is adding support for the multiplexed SMBus channels. The code is not broken per se, however it turns out that on some boards the extra SMBus channels are not wired properly. So as soon as you try to access the SMBus channels in question (for example to probe if slave devices are present) long delays occur. Unfortunately there is no way to know if the extra SMBus channels work or not. I have no immediate solution for this problem. Reverting the patch in question would again deny access to the extra SMBus channels on boards where they actually work. The current list of issues with my views on them was published on the linux-i2c list some times ago: http://marc.info/?l=linux-i2c&m=145744643613736&w=2 *** This bug has been marked as a duplicate of bug 112811 *** |