When the operating system boots with the default ASPM policy (POLICY_DEFAULT), the current code queries the enable/disable state of the ASPM registers to determine the policy. For example, a BIOS could set the power-saving state to performance and clear all ASPM control registers. A balanced ASPM policy could enable L0s and disable L1. A power-conscious BIOS could enable both L0s and L1, trading latency and performance for power savings. After hotplug removal, the pcie_aspm_exit_link_state() function clears the ASPM registers. An insertion following the removal then reads the policy incorrectly as ASPM disabled, even though ASPM was enabled at boot. This happens because the same function is used to reconfigure ASPM regardless of the power-on state.
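To make the sequence concrete, here is a minimal user-space sketch of the failure, not kernel code. The `toy_*` names and `struct toy_link` are hypothetical and exist only for illustration. The only detail taken from the PCIe specification is that the ASPM Control field occupies bits 1:0 of the Link Control register. The "remember the boot value" helper shows one possible remedy under that assumption and is not necessarily how an actual fix would be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASPM Control field of the PCIe Link Control register (bits 1:0). */
#define ASPM_CTL_L0S  0x1u
#define ASPM_CTL_L1   0x2u
#define ASPM_CTL_MASK (ASPM_CTL_L0S | ASPM_CTL_L1)

/* Hypothetical stand-in for a link; this is not a kernel structure. */
struct toy_link {
    uint16_t lnkctl;        /* simulated Link Control register */
    uint16_t boot_default;  /* ASPM bits captured once at boot */
    bool     boot_saved;    /* true once boot_default is valid */
};

/* Behaviour described in the report: the "default" policy is whatever
 * the register happens to hold at the time it is read. */
static unsigned toy_default_from_register(const struct toy_link *link)
{
    assert(link != NULL);
    return link->lnkctl & ASPM_CTL_MASK;
}

/* Assumed remedy: capture the firmware-programmed bits once at boot. */
static void toy_save_boot_default(struct toy_link *link)
{
    assert(link != NULL);
    link->boot_default = link->lnkctl & ASPM_CTL_MASK;
    link->boot_saved = true;
}

/* Models the effect of hotplug removal: the ASPM bits are cleared. */
static void toy_hot_remove(struct toy_link *link)
{
    assert(link != NULL);
    link->lnkctl &= (uint16_t)~ASPM_CTL_MASK;
}

/* Prefer the saved boot value; fall back to the live register. */
static unsigned toy_default_from_boot(const struct toy_link *link)
{
    assert(link != NULL);
    return link->boot_saved ? link->boot_default
                            : toy_default_from_register(link);
}
```

With a link that firmware booted with L0s enabled, calling `toy_hot_remove()` and then `toy_default_from_register()` reports ASPM as disabled, which matches the symptom above, while `toy_default_from_boot()` still reports L0s.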
I understand the bug here and I agree we need to fix this, but for completeness and the benefit of future readers, could you please attach a complete "lspci -vv" output before the hotplug removal, another one from after the hotplug insertion, and the complete dmesg log? I wonder if there's anything we could log in dmesg that would be useful here. It might be more noise than it's worth, but then again, ASPM problems seem pretty common.
Created attachment 255275 [details] boot log with 4.9
I have network connectivity issues on 4.10 that are preventing me from collecting logs. I attached the logs from 4.9 instead, as the issue is present there as well.
Created attachment 255277 [details] lspci_after_insertion_with_01_01_00_0
Created attachment 255279 [details] lspci_before_remove_l0s_enabled_on_01_01_00_00