Bug 118811 - Bisected: 30 secs boot delay + confusing UEFI on reboot
Summary: Bisected: 30 secs boot delay + confusing UEFI on reboot
Status: RESOLVED DUPLICATE of bug 112811
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Drivers/I2C virtual user
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-24 06:15 UTC by Thomas
Modified: 2016-07-01 06:19 UTC (History)
3 users (show)

See Also:
Kernel Version: 4.5, 4.6, next
Subsystem:
Regression: No
Bisected commit-id:


Attachments
i2c Patch (890 bytes, patch)
2016-06-27 15:39 UTC, [account disabled by administrator]
Details | Diff
dmesg of linux-4.7-rc5 (74.95 KB, text/plain)
2016-06-29 09:30 UTC, Thomas
Details
Test Patch (795 bytes, patch)
2016-06-29 16:26 UTC, [account disabled by administrator]
Details | Diff

Description Thomas 2016-05-24 06:15:08 UTC
So first of the symptoms of the bug:
	
The system hangs for 30 seconds while booting:
```
> [    5.110391] sd 9:0:0:0: [sdc] Attached SCSI removable disk
> [   35.829145] i2c i2c-11: Failed reset at end of transaction (01)
> [   35.829180] i2c i2c-11: Failed! (01)
> [   35.829208] i2c i2c-11: Failed! (01)
> [   35.829236] i2c i2c-11: Failed! (01)
> [   35.829263] i2c i2c-11: Failed! (01)
> [   35.829291] i2c i2c-11: Failed! (01)
> [   35.829319] i2c i2c-11: Failed! (01)
> [   35.829346] i2c i2c-11: Failed! (01)
> [   35.829420] i2c i2c-12: Failed! (01)
> [   35.829448] i2c i2c-12: Failed! (01)
> [   35.829476] i2c i2c-12: Failed! (01)
> [   35.829504] i2c i2c-12: Failed! (01)
> [   35.829532] i2c i2c-12: Failed! (01)
> [   35.829559] i2c i2c-12: Failed! (01)
> [   35.829587] i2c i2c-12: Failed! (01)
> [   35.829614] i2c i2c-12: Failed! (01)
> [   35.829637] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at
> 0xb20

Then, when you reboot, everything seems okay until the display turns off: It never turns on again. Well, that's not 100% true: If you wait long enough UEFI will scream that it couldn't boot and you should check the settings (so better press the reset button before that).

As the title says I bisected this and here's the result:

> 83c60158ebf16417af28b338bc0380cf17f2b9f9 is the first bad commit
> commit 83c60158ebf16417af28b338bc0380cf17f2b9f9
> Author: Jean Delvare <jdelvare@suse.de>
> Date:   Mon Jan 25 12:17:07 2016 +0100
> 
>     i2c: piix4: Fully initialize SB800 before it is registered
>     
>     This closes a race window where I2C device drivers attempt to access
>     I2C buses which aren't fully initialized yet.
>     
>     Signed-off-by: Jean Delvare <jdelvare@suse.de>
>     Tested-by: Christian Fetzer <fetzer.ch@gmail.com>
>     Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
> 
> :040000 040000 19df69fe03ba0bfaab9e6a60fabe9ab387efb0bb
> 807abbfb65fe1b2bd06469a2e39e953675e6a975 M  drivers

The motherboard in use is a ASUS SABERTOOTH 990FX R2.0

This bug is still present in next-20160524
Comment 1 Thomas 2016-05-24 06:20:48 UTC
Not sure if that helps but I forgot:

> $ lspci -s 00:14.0 -nnv
> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus
> Controller [1002:4385] (rev 42)
>         Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus
>         Controller [1002:4385]
>         Flags: 66MHz, medium devsel
>         Kernel driver in use: piix4_smbus
Comment 2 bugfinding+linux 2016-05-27 08:15:39 UTC
I can confirm this issue with the same board, Asus Sabertooth 990FX R2.0

Upon reboot it appears to want to overclock, as it complains about an overclocking failure. I suspect the change to the i2c drivers somehow triggers something within the pathway Asus uses for its overclocking tools.
Comment 3 Thomas 2016-05-27 08:27:56 UTC
I'm not sure it really wants to overclock. This is how I see it:
The change in initialization process somehow messes up values, but not the ones on CMOS. Then, when you reboot, these messed up values prevent some hardware (i2c maybe?) from being initialized by UEFI, so it reboots a few times in a row (you won't notice that except the long waiting time), resets everything to default and presents you the overclocking message.

Now why do I think the overclocking message is just a default message for failed boot? Cause when you enter the UEFI menu to check the settings (like it tells you to do) and look at Advanced Mode -> Ai Tweaker every value is what it should be - which shouldn't be the case if there was some overclocking going on.

P.S. This is just what I think that might happen. I'm in no way an expert.
Comment 4 bugfinding+linux 2016-05-27 08:43:01 UTC
Thomas, you might very well be right.
Comment 5 [account disabled by administrator] 2016-06-26 22:00:05 UTC
Has been fixed by commit 52795f6fdeb8a2b98373108ac2838c674bb2cbc4.
Comment 6 Thomas 2016-06-27 10:21:00 UTC
(In reply to bastienphilbert from comment #5)
> Has been fixed by commit 52795f6fdeb8a2b98373108ac2838c674bb2cbc4.

I updated my kernel to 4.5.7-hardened-r3 (this is a hardened gentoo kernel: https://packages.gentoo.org/packages/sys-kernel/hardened-sources ) which seems to already include the commit you mentioned but that didn't fix this issue for me.
Comment 7 Thomas 2016-06-27 10:28:54 UTC
(In reply to Thomas from comment #6)
> but that didn't fix this issue for me.

I looked a bit deeper into the patch (commit 52795f6fdeb8a2b98373108ac2838c674bb2cbc4) and it says "So restore the
original names for all legacy (pre-SB800) devices." but this is a SB800 device:

lspci -nn -s 00:14.0 -vvvv
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller [1002:4385] (rev 42)
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller [1002:4385]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Kernel driver in use: piix4_smbus

(compare that with http://lxr.free-electrons.com/source/include/linux/pci_ids.h?v=3.19#L380 and the patch itself (what does set is_sb800 to true?).

On SB800 the commit says it changes a "implementation detail the
user doesn't need to know" only, so I guess this can't work at all.
Comment 8 Thomas 2016-06-27 11:23:17 UTC
As I have some free time atm I decided to try to track this down some more and it worked!

First off I'm no expert in low level programming nor I2C nor AMD hardware so my workaround might have unwanted side effects like killing your kitty. Anyway, here's a one-liner that fixes the issue:

--- linux-4.5.7-hardened-r3.old/drivers/i2c/busses/i2c-piix4.c  2016-06-27 13:18:05.354987218 +0200
+++ linux-4.5.7-hardened-r3/drivers/i2c/busses/i2c-piix4.c      2016-06-27 13:05:19.269893243 +0200
@@ -642,8 +642,7 @@
 
        adap->owner = THIS_MODULE;
        adap->class = I2C_CLASS_HWMON | I2C_CLASS_SPD;
-       adap->algo = sb800_main ? &piix4_smbus_algorithm_sb800
-                               : &smbus_algorithm;
+       adap->algo = &smbus_algorithm;
 
        adapdata = kzalloc(sizeof(*adapdata), GFP_KERNEL);
        if (adapdata == NULL) {

Tracking this down some more the difference seems to be between http://lxr.free-electrons.com/source/drivers/i2c/busses/i2c-piix4.c#L583 (bugged) and http://lxr.free-electrons.com/source/drivers/i2c/busses/i2c-piix4.c#L485 (works) but that's for a experienced kernel dev to figure out.
Comment 9 Thomas 2016-06-27 11:55:33 UTC
One more thing: If you have a look at the bad commit you'll notice this:

-	adap->algo = &smbus_algorithm;
+	adap->algo = sb800_main ? &piix4_smbus_algorithm_sb800
+				: &smbus_algorithm;

but Jean Delvare didn't tell about this in the commit log. I wonder what this was originally supposed to fix?
Comment 10 [account disabled by administrator] 2016-06-27 15:38:34 UTC
If you read the commit carefully that does need to be there as the bisected patch is trying to stop a race condition by setting the parent algorithm before the card is initialized. However seems that the call in this buses probe method calls it with false rather then true for the parent card and this may be causing a issue of setting up the wrong algorithm for the parent card, i.e. using the smbus algorithm designed for children cards. I am uploading a patch to see if this is actually the issue.
Comment 11 [account disabled by administrator] 2016-06-27 15:39:03 UTC
Created attachment 221291 [details]
i2c Patch
Comment 12 Thomas 2016-06-27 16:50:43 UTC
(In reply to bastienphilbert from comment #11)
> Created attachment 221291 [details]
> i2c Patch

Doesn't seem to help.
Comment 13 [account disabled by administrator] 2016-06-27 19:06:14 UTC
Ok that was a stupid idea after having a closer look at the code for your driver.
Seems that the piix_smbus_algorithm was not supported alternative ports and was fixed by this commit,https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6befa3fde65fe437f588da490c07a114393ce229. Can you test a newer kernel with that patch applied i.e. mainline in order to see if your bug is fixed by that commit.
Comment 14 Thomas 2016-06-28 09:12:51 UTC
(In reply to bastienphilbert from comment #13)
> Ok that was a stupid idea after having a closer look at the code for your
> driver.
> Seems that the piix_smbus_algorithm was not supported alternative ports and
> was fixed by this
> commit,https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/
> commit/?id=6befa3fde65fe437f588da490c07a114393ce229. Can you test a newer
> kernel with that patch applied i.e. mainline in order to see if your bug is
> fixed by that commit.

I applied that patch to my kernel but it didn't help. Do you still want me to test a newer kernel?
Comment 15 [account disabled by administrator] 2016-06-28 15:32:12 UTC
Test the lastest rc candidate and report back with a dmesg. Seems a few things may be going wrong either we are writing to incorrect registers or the port numbers are not being set correctly.
Comment 16 Thomas 2016-06-29 09:30:02 UTC
Created attachment 221461 [details]
dmesg of linux-4.7-rc5

Latest rc shows the same issue but seems to be less verbose (no more "i2c i2c-12: Failed! (01)" messages).
Comment 17 [account disabled by administrator] 2016-06-29 16:24:38 UTC
Seem if this either gets you a different output or fixes the issue.
Comment 18 [account disabled by administrator] 2016-06-29 16:26:25 UTC
Created attachment 221471 [details]
Test Patch
Comment 19 Jean Delvare 2016-07-01 06:19:28 UTC
The "bad patch" is adding support for the multiplexed SMBus channels. The code is not broken per se, however it turns out that on some boards the extra SMBus channels are not wired properly. So as soon as you try to access the SMBus channels in question (for example to probe if slave devices are present) long delays occur.

Unfortunately there is no way to know if the extra SMBus channels work or not. I have no immediate solution for this problem. Reverting the patch in question would again deny access to the extra SMBus channels on boards where they actually work.

The current list of issues with my views on them was published on the linux-i2c list some times ago:
http://marc.info/?l=linux-i2c&m=145744643613736&w=2

*** This bug has been marked as a duplicate of bug 112811 ***

Note You need to log in before you can comment on or make changes to this bug.