Bug 112811 - udevd hangs on boot for 40-50 seconds
Summary: udevd hangs on boot for 40-50 seconds
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C (show other bugs)
Hardware: IA-32 Linux
: P1 normal
Assignee: Jean Delvare
URL:
Keywords:
: 118811 200713 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-02-21 19:02 UTC by Stuart Foster
Modified: 2020-08-06 16:05 UTC (History)
3 users (show)

See Also:
Kernel Version: Linux 4.5-RC5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Bisect .config , Linux 4.5-RC5 dmesg and lspci -v (190.00 KB, application/octet-stream)
2016-02-21 19:02 UTC, Stuart Foster
Details
dmesg output i2c debug turned on (101.07 KB, text/plain)
2016-02-22 23:27 UTC, Stuart Foster
Details
dmesg from test i2c-piix4.ko (94.67 KB, text/plain)
2016-02-25 14:48 UTC, Stuart Foster
Details
dmesg with i2c-piix4 DEBUG turned on (100.57 KB, text/plain)
2016-03-02 14:07 UTC, Stuart Foster
Details
dmesg after soak (100.79 KB, text/plain)
2016-03-02 16:16 UTC, Stuart Foster
Details
Binary DMI dump from ASUS SABERTOOTH 990FX R2.0 (2.48 KB, application/octet-stream)
2016-07-01 15:49 UTC, Thomas
Details
Binary DMI dump from ASUS M5A97 PRO (2.49 KB, application/octet-stream)
2016-07-01 22:06 UTC, Stuart Foster
Details
dmesg with i2c debug, ASUS K53U (104.30 KB, text/plain)
2018-08-21 10:47 UTC, mirh
Details

Description Stuart Foster 2016-02-21 19:02:21 UTC
Created attachment 204241 [details]
Bisect .config , Linux 4.5-RC5 dmesg and lspci -v

While testing Linux 4.5.0-rc5 I noticed that was hanging for a significant period on booting. On bisecting I found the following commit was causing my issue:


83c60158ebf16417af28b338bc0380cf17f2b9f9 is the first bad commit
commit 83c60158ebf16417af28b338bc0380cf17f2b9f9
Author: Jean Delvare <jdelvare@suse.de>
Date:   Mon Jan 25 12:17:07 2016 +0100

    i2c: piix4: Fully initialize SB800 before it is registered
    
    This closes a race window where I2C device drivers attempt to access
    I2C buses which aren't fully initialized yet.
    
    Signed-off-by: Jean Delvare <jdelvare@suse.de>
    Tested-by: Christian Fetzer <fetzer.ch@gmail.com>
    Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

:040000 040000 19df69fe03ba0bfaab9e6a60fabe9ab387efb0bb 807abbfb65fe1b2bd06469a2e39e953675e6a975 M	drivers

Any suggestions ?
Comment 1 Stuart Foster 2016-02-22 23:27:49 UTC
Created attachment 204361 [details]
dmesg output i2c debug turned on

Here with some additional information, also:

root@Andromeda:~# i2cdetect -l
i2c-0	i2c       	Radeon i2c bit bus 0x90         	I2C adapter
i2c-1	i2c       	Radeon i2c bit bus 0x91         	I2C adapter
i2c-2	i2c       	Radeon i2c bit bus 0x92         	I2C adapter
i2c-3	i2c       	Radeon i2c bit bus 0x93         	I2C adapter
i2c-4	i2c       	Radeon i2c bit bus 0x94         	I2C adapter
i2c-5	i2c       	Radeon i2c bit bus 0x95         	I2C adapter
i2c-6	i2c       	Radeon i2c bit bus 0x96         	I2C adapter
i2c-7	i2c       	Radeon i2c bit bus 0x97         	I2C adapter
i2c-8	smbus     	SMBus PIIX4 adapter port 0 at 0b00	SMBus adapter
i2c-9	smbus     	SMBus PIIX4 adapter port 2 at 0b00	SMBus adapter
i2c-10	smbus     	SMBus PIIX4 adapter port 3 at 0b00	SMBus adapter
i2c-11	smbus     	SMBus PIIX4 adapter port 4 at 0b00	SMBus adapter
i2c-12	smbus     	SMBus PIIX4 adapter port 1 at 0b20	SMBus adapter

If I build the i2c_piix4 code into the kernel the boot hangs before udevd is run, which in my opinion eliminates udevd as the source of my problem and further points the problem back to the i2c piix4 fix identified previously.

Thanks.
Comment 2 Jean Delvare 2016-02-23 11:08:54 UTC
Thanks for reporting.

Which chip is this happening on?

$ /sbin/lspci -nn | grep SMBus

should tell.
Comment 3 Jean Delvare 2016-02-23 11:10:22 UTC
Sorry, you can ignore the previous comment, just found the lspci in the first attachment.
Comment 4 Jean Delvare 2016-02-23 12:28:14 UTC
The bisected patch does the right thing, the problem is that it uncovers a problem that was hidden by the bug it fixes. Looking at the logs, it turns out that 2 of the multiplexed SMBus channels do not work properly. You have an i2c device driver loaded (eeprom) which probes 8 I2C addresses on each SMBus port. There is a 2-3 second timeout happening each time the SMBus controller doesn't respond. 2.5 * 2 * 8 = 40 seconds.

As a temporary workaround, if you do not load the eeprom module, you should avoid the boot time delay.

I'll see if I can find a way to find our which SMBus channels are disabled, so that they aren't registered at all.

As a side note, I see you have CONFIG_GPIO_SX150X=y in your configuration file, which you most probably do not need.
Comment 5 Stuart Foster 2016-02-23 13:17:16 UTC
Thanks for the update, I have taken note of the CONFIG_GPIO_SX150X=y comment and the workaround for the eeprom module should I need it.
Comment 6 Jean Delvare 2016-02-25 13:58:39 UTC
I am still trying to gain access to a system with a similar SMBus controller.

I have modified the driver to check for pin usage. If you feel adventurous, feel tree to give it a try:
http://jdelvare.nerim.net/devel/lm-sensors/drivers/i2c-piix4/

Instructions at:
http://jdelvare.nerim.net/devel/lm-sensors/drivers/INSTALL

Remember it is totally untested code (although I don't expect anything serious to happen besides the driver not working.) This may or may not solve your problem. In both cases I would be interested in the corresponding kernel logs.

It is also possible that the pins are configured for SMBus usage (as this is the default for all but SMBus port 4) but the hardware manufacturer left the pins floating because they don't need the extra SMBus ports. If this is the case, my changes won't help, and I'm afraid I don't have any idea how to detect that :(
Comment 7 Stuart Foster 2016-02-25 14:48:07 UTC
Created attachment 206021 [details]
dmesg from test i2c-piix4.ko

Using the test i2c-piix4.ko resulted in the same level of delay in system start up.
Comment 8 Stuart Foster 2016-02-27 14:39:25 UTC
If we cannot detect the ports that are causing the problem at run time could we set boot parameters to ignore specific ports ? :-)
Comment 9 Jean Delvare 2016-03-02 09:52:17 UTC
My changes didn't work:

piix4_smbus 0000:00:14.0: ACPI MMIO space enabled, memory-mapped at 0x00000000

Obviously this address makes no sense. So either my code is wrong or the address is not initialized by your BIOS. But it is odd because the datasheets says it does have a (non-zero) default value. I just noticed that the sp5100_tco driver must deal with the same kind of configuration check for this chipset and the code there is much more complex than what I came up with. I'll investigate.

With the current code, as it can't read the pin configuration, the driver falls back to the default pin configuration, which is ports 0, 2, 3 enabled and port 4 disabled. You can indeed see that port 4 is no longer instantiated by the driver, which is probably good as it didn't look like you had anything there. I'm not 100% sure though, as apparently the many errors caused the SMBus controller to go wild and I'm not sure it is still behaving properly. Out of curiosity, are you still able to access the devices on port 0 (e.g. eeprom) after the driver is done with probing the phantom ports? I'm not sure the error recovery paths are working, they are barely tested.

Anyway, port 4 is not where the timeouts causing the boot delay were occurring anyway. The delays come from ports 2 and 3 which are still enabled, so no improvement.

Indeed if we can't get a reliable way to detect which ports should be enabled or disabled, we may have to disable multiplexing altogether when we have no reason to believe is is configured properly. Or at least disable I2C/SMBus slave device auto-detection by default on the extra ports. And we could add a module parameter to manually override the configuration, as you suggested. But I would prefer if we can get things to just work by default... So I'll try harder.
Comment 10 Jean Delvare 2016-03-02 09:52:46 UTC
Note: The new log is less verbose because debugging isn't enabled. If you uncomment the following line in the standalone driver's Makefile:

#EXTRA_CFLAGS	+= -DDEBUG

you'll get more information from the failures.
Comment 11 Jean Delvare 2016-03-02 10:31:28 UTC
From the Bolton RRG datasheet:

SCL0/GPIO43 is configured as SCL0 when (PMIO_2E) Smbus0Sel[1:0] are set to 00b; otherwise, it is configured as GPIO43
SDA0/GPIO47 is configured as SDA0 when (PMIO_2E) Smbus0Sel[1:0] are set to 00b; otherwise, it is configured as GPIO47

And same for all other SMBus ports. So at least on the Bolton there is no way to detect which pins were intended for SMBus usage and which weren't. Don't know if that holds for the SB800 as well, or Mullins/Carrizo either.
Comment 12 Stuart Foster 2016-03-02 14:07:09 UTC
Created attachment 206631 [details]
dmesg with i2c-piix4 DEBUG turned on

This is the dmesg with i2c-piix4 DEBUG turned on (using linux-4.5-rc6). This build does in fact boot quicker (but if we are dealing with floating ports who knows what the current states are). The machine has been turned off till just now so I will let it soak for a while and test again. I did check the eeprom contents via the /sys entries and they do appear to be working on this build.
Comment 13 Jean Delvare 2016-03-02 14:10:56 UTC
Good, I w
Comment 14 Jean Delvare 2016-03-02 14:13:28 UTC
Good, I was worried that maybe the repeated failures would confuse the controller to a point where it would no longer work even on the buses which actually exist.
Comment 15 Stuart Foster 2016-03-02 16:16:22 UTC
Created attachment 206641 [details]
dmesg after soak

Retesting after ~1 hour shows the boot delay is back to ~50 seconds. Looks like floating ports are present ?
Comment 16 Stuart Foster 2016-03-02 23:12:27 UTC
Jean

In response to your e-mail:

"Hi Stuart,

I am travelling and unable to update my standalone driver repository at
this time. Could you please give a try to the attached driver and report
the address reported for ACPI MMIO space, hopefully no longer
0x00000000? I stole the code from the sp5100_tco driver.

Thanks,"

I tested the revised 2c-piix4.c code and the reseult was as follows:

[    5.254583] piix4_smbus 0000:00:14.0: ACPI MMIO space enabled, memory-mapped at 0xfed80000
Comment 17 Jean Delvare 2016-03-03 11:51:01 UTC
Good, at least reading the address is fixed. Did it change anything regarding which ports are instantiated? Do you still get ports 0, 2, 3 and 1 instantiated and port 4 skipped?
Comment 18 Stuart Foster 2016-03-03 13:27:57 UTC
Yes:

i2cdetect -l
i2c-0	i2c       	Radeon i2c bit bus 0x90         	I2C adapter
i2c-1	i2c       	Radeon i2c bit bus 0x91         	I2C adapter
i2c-2	i2c       	Radeon i2c bit bus 0x92         	I2C adapter
i2c-3	i2c       	Radeon i2c bit bus 0x93         	I2C adapter
i2c-4	i2c       	Radeon i2c bit bus 0x94         	I2C adapter
i2c-5	i2c       	Radeon i2c bit bus 0x95         	I2C adapter
i2c-6	i2c       	Radeon i2c bit bus 0x96         	I2C adapter
i2c-7	i2c       	Radeon i2c bit bus 0x97         	I2C adapter
i2c-8	smbus     	SMBus PIIX4 adapter port 0 at 0b00	SMBus adapter
i2c-9	smbus     	SMBus PIIX4 adapter port 2 at 0b00	SMBus adapter
i2c-10	smbus     	SMBus PIIX4 adapter port 3 at 0b00	SMBus adapter
i2c-11	smbus     	SMBus PIIX4 adapter port 1 at 0b20	SMBus adapter
Comment 19 Stuart Foster 2016-03-23 19:01:07 UTC
Jean,

It's been a while, have you had any further thoughts on this issue ?

Thanks

Stuart
Comment 20 Jean Delvare 2016-07-01 06:19:28 UTC
*** Bug 118811 has been marked as a duplicate of this bug. ***
Comment 21 Thomas 2016-07-01 15:49:59 UTC
Created attachment 221661 [details]
Binary DMI dump from ASUS SABERTOOTH 990FX R2.0

(In reply to Jean Delvare from bug 118811 comment #19)
> The current list of issues with my views on them was published on the
> linux-i2c list some times ago:
> http://marc.info/?l=linux-i2c&m=145744643613736&w=2

"Yet another possibility is a DMI-based list of boards with
known mappings. Not sure what is best, maybe we need a mix of all that,
opinions welcome."

I guess a list could work.

"please provide the DMI data for your board, in case we need
it later."

Attached is the output of dmidecode --dump-bin - hope it helps.
Comment 22 Stuart Foster 2016-07-01 22:06:03 UTC
Created attachment 221691 [details]
Binary DMI dump from ASUS M5A97 PRO

If it will help herewith the DMI dump for my M5A97 PRO.
Comment 23 Stuart Foster 2017-02-14 08:45:07 UTC
Is there any further work being carried out to resolve this issue ?
Comment 24 Elrondo 2017-05-12 05:46:06 UTC
Think the bug I have is the same as you:
https://bugzilla.kernel.org/show_bug.cgi?id=192121
Comment 25 Stuart Foster 2017-05-12 07:25:52 UTC
I used the work around in comment 4 to solve my startup delay, your issue may be different:

"As a temporary workaround, if you do not load the eeprom module, you should avoid the boot time delay."
Comment 26 mirh 2018-08-20 15:47:33 UTC
*** Bug 200713 has been marked as a duplicate of this bug. ***
Comment 27 mirh 2018-08-21 10:47:08 UTC
Created attachment 278001 [details]
dmesg with i2c debug, ASUS K53U

And here's mine with every I2C_DEBUG thing enabled. 
It's already a miracle I could boot to get it, after plenty of restarts.
Comment 28 mirh 2020-08-06 16:05:21 UTC
I tested 4.19.133 (latest attempt had been with 4.19.0) and everything seems alright. In fact piix4 mentions don't even pop up anymore. 

After quite a lot of time spent bisecting (put aside work happened to port selection logic in the i2c-piix4 driver too) the commits that really made a difference were the backports of 4abb951b73ff0a8a979113ef185651aa3c8da19b and 50ee7529ec4500c88f8664560770a7a1b65db72b.

Note You need to log in before you can comment on or make changes to this bug.