I'm seeing a regression in rc5 on the very first boot that I haven't seen in a week of 4.19.0-0.rc4.git3.1.fc30.x86_64 (git ae596de1a0c8).
[ 51.172922] f29h.local kernel: thunderbolt 0000:03:00.0: timeout reading config space 1 from 0x1
[ 51.172925] f29h.local kernel: ------------[ cut here ]------------
[ 51.172927] f29h.local kernel: thunderbolt 0000:03:00.0: 0:3: non switch port without a PHY
[ 51.172954] f29h.local kernel: WARNING: CPU: 2 PID: 2036 at drivers/thunderbolt/switch.c:594 tb_switch_add+0x69b/0x780 [thunderbolt]
Created attachment 278745 [details] dmesg only
Created attachment 278747 [details]
journal full
Including this in case something user-space related is triggering this, like fwupd or boltd, but I don't see any of their messages close to this warning.
Thanks for the report! I marked this as a regression.
I don't see any PCI- or Thunderbolt-related changes between v4.19-rc4 and v4.19-rc5. That *is* the interval you've identified, isn't it (-rc4 works and -rc5 fails)?
Can you please post this to linux-kernel and linux-pci (and include the URL of this bugzilla)? Also cc these folks (from MAINTAINERS for Thunderbolt):
M: Andreas Noever <andreas.noever@gmail.com>
M: Michael Jamet <michael.jamet@intel.com>
M: Mika Westerberg <mika.westerberg@linux.intel.com>
M: Yehezkel Bernat <YehezkelShB@gmail.com>
[ 52.642045] f29h.local kernel: thunderbolt 0000:03:00.0: timeout writing config space 2 to 0x2c
[ 73.122341] f29h.local kernel: thunderbolt 0000:03:00.0: timeout reading config space 1 from 0x2
[ 73.122357] f29h.local kernel: thunderbolt 0000:03:00.0: 0: tb_eeprom_read_rom failed
The tb_eeprom_read_rom failed message does sometimes happen with 4.19.0-0.rc4.git3.1.fc30.x86_64, but not on every boot, whereas with rc5 it has happened on every boot so far. I don't see a pattern.
Created attachment 278749 [details]
dmesg only copy 2
This is the same configuration, just a reboot with rc5, and the call trace doesn't happen. So it's a transient problem, which means it's plausible that it can happen on rc4 and I just haven't hit it yet.
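Since the warning is intermittent, one way to compare boots is to scan each saved dmesg for the Thunderbolt messages quoted in this report. The following is only a minimal sketch, not something anyone in this thread ran; the function name `scan_thunderbolt_errors` and the list of signatures are assumptions, taken from the log lines quoted above.

```python
import re

# Message fragments copied from the log lines quoted in this report.
# Assumption: these are the lines worth flagging; this is not an
# exhaustive list of Thunderbolt driver errors.
SIGNATURES = (
    "timeout reading config space",
    "timeout writing config space",
    "non switch port without a PHY",
    "tb_eeprom_read_rom failed",
    "tb_switch_add",
)

# Matches the "[ 51.172922]" timestamp at the start of a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def scan_thunderbolt_errors(log_text):
    """Return (timestamp, signature, message) for each matching line."""
    hits = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # not a timestamped log line
        stamp, message = float(match.group(1)), match.group(2)
        for signature in SIGNATURES:
            if signature in message:
                hits.append((stamp, signature, message))
                break  # record each line at most once
    return hits
```

Running this over the dmesg of each boot (for example, the two attachments here) would show which boots hit only the EEPROM read failure and which also hit the tb_switch_add warning.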
I suspect this is the same issue we had previously with fwupd powering down the controller while the driver is in the middle of initialization. Chris, can you check the version of fwupd (fwupdmgr --version)? I also assume you do not have any TBT devices connected, right?
I suspect this is a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=199631.
Newer fwupd was configured to keep the lock open for a period of time. So if this is a newer fwupd then it does sound like it could be bolt doing it.
[ 29.673264] f29h.local boltd[2032]: power: force_power support: yes
[ 29.673343] f29h.local boltd[2032]: power: setting force_power to ON
You can certainly experiment with blacklisting the thunderbolt and thunderbolt_power plugins in fwupd and with disabling bolt. Please confirm your fwupd version too.
fwupd-1.1.2-1.fc29.x86_64
bolt-0.4-2.fc29.x86_64
There are two USB-C/Thunderbolt 3 connectors on this laptop, and each has one device connected:
- USB-C to USB-A (USB 3.0) adapter with an old USB 1.1 keyboard attached
- USB-C to HDMI, HDMI to DVI, NEC Multisync PA241W
I'm not sure if the display connection is Thunderbolt or DisplayPort, as it's passed through the USB-C to HDMI connection.
Also, this bug report is the first instance of the tb_switch_add warning I've seen since bug 199631 back in May.
This is the same laptop as in bug 199631. Newer manufacturer firmware has been applied since that bug report:
[ 0.000000] DMI: HP HP Spectre Notebook/81A0, BIOS F.41 06/15/2018
But it still has the same Thunderbolt controller firmware:
# cat /sys/bus/thunderbolt/devices/0-0/nvm_version
16.0
Re-reading bug 199631, in that case nothing was connected at the time. In this case (bug 201227), I had two devices connected, listed in comment 9. Otherwise they appear to be dups. And while it taints the kernel, it appears to be non-fatal.
Created attachment 278769 [details] Add debugging to Intel WMI Thunderbolt module
Hi Chris,
The Thunderbolt firmware is typically not part of the system BIOS.
Yes, the warning you see is not fatal; it merely means that the controller got powered off during driver initialization and the driver does what it can to bail out. However, it would be good to understand why you are seeing this issue. Could you apply the patch I attached and, when the situation happens, attach the full dmesg here? That should show when the controller is force powered.
In addition, can you attach an acpidump here as well? I would like to check how the ACPI force power method is implemented.
Created attachment 278777 [details] HP spectre acpidump
(In reply to Mika Westerberg from comment #13)
> I wonder if you could apply the patch I attached and when the situation
> happens attach full dmesg here. That should show when the controller is
> force powered.
It seems to be really rare, since it last happened in May. I can run it with the patch for maybe a week, but generally I'm running Fedora kernels, not building my own.
OK, I checked the acpidump and the force power method looks pretty standard. It could be just a bug in the TBT firmware, because yours is quite old (v16). Another explanation could be that something (fwupd, bolt) powers the controller down early.
It would be good if you could run with the patch for some time (a week is fine) and, if you see the warning again, attach the full dmesg. Maybe we can spot what happens.
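To help spot whether a force power change precedes the timeouts, the two kinds of messages can be lined up by timestamp. This is a rough sketch under stated assumptions: `force_power_before_timeouts` is a hypothetical helper, the input is in chronological order, and the only power message it recognizes is the boltd "setting force_power to" line quoted earlier in this thread. The messages added by the debugging patch are not shown here, so the pattern would need adjusting to match them.

```python
import re

# "[ 29.673343] rest of line" as in the dmesg/journal excerpts in this report.
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Assumption: only the boltd message quoted in this thread is recognized.
FORCE_POWER = re.compile(r"setting force_power to (\w+)")


def force_power_before_timeouts(log_text):
    """Pair each thunderbolt timeout with the last force_power change before it.

    Returns a list of (timeout_time, state, seconds_since_change).
    state and seconds_since_change are None when no change was logged earlier.
    """
    last_state, last_time = None, None
    report = []
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        power = FORCE_POWER.search(message)
        if power:
            last_state, last_time = power.group(1), stamp
        elif "thunderbolt" in message and "timeout" in message:
            gap = None if last_time is None else round(stamp - last_time, 6)
            report.append((stamp, last_state, gap))
    return report
```

A timeout that follows shortly after a change to OFF would point toward user space powering the controller down, while a long gap after ON would be less conclusive.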