Bug 205587
Summary: | AMD Ryzen5/7 XHCI controllers fail to runtime resume | ||
---|---|---|---|
Product: | Power Management | Reporter: | Daniel Drake (drake) |
Component: | Run-Time-PM | Assignee: | Rafael J. Wysocki (rjw) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
UX434DA lspci output
UX434DA acpidump UX434DA dmesg X512DK dmesg X512DK lspci |
Description
Daniel Drake
2019-11-20 09:15:44 UTC
Created attachment 285981 [details]
UX434DA lspci output
Created attachment 285985 [details]
UX434DA acpidump
The workaround being explored is increasing the pci_dev_d3_sleep() delay from 10ms to 20ms. That completely avoids the problem. Trying to understand this better and look for alternatives to quirking the delay on these XHCI controllers, testing with pci.git pci/pm HEAD 395f121e6199 which includes bae26849372b ("PCI/PM: Move pci_dev_wait() definition earlier") 395f121e6199 ("PCI/PM: Wait for device to become ready after power-on") When runtime suspending the device: 1. pci_raw_set_power_state() transitions from D0 to D3hot 2. acpi_device_set_power() calls _PS3 to transition to D3hot, then calls acpi_power_transition(D3cold) to remove device power When runtime resuming the device: 1. acpi_device_set_power() calls acpi_power_transition(D0) to power on the device, then calls _PS0 to transition to D0. 2. pci_update_current_state() is called, reads pmcsr and sets dev->current_state=D3hot. 3. pci_dev_wait() is called but as COMMAND=0x100403 it does nothing 4. pci_raw_set_power_state(D0) is called. pmcsr is 0x103. pci_dev_d3_sleep() delays 10ms. now pmcsr has value 0x3. This causes the log message "refused to change power state" and USB is unusable from this point. The 2 ways to detect that the failed resume condition has been met are: 1. PCI_COMMAND reg. This starts with value 0x403 right before pmcsr is written to transition to D0. After the 10ms delay, it still has value 0x403. This is an indication that the failure condition is going to be met. After the extra 10ms elay, PCI_COMMAND has value 0, which indicates that things are OK to continue. 2. PCI_PM_CTRL reg. This starts with value 0x103. Then 0 is written and pci_dev_d3_sleep() delays 10ms. At this point it has value 0x3, which is an indication that the failure condition is going to be met, and Linux already logs "refused to change power state" here. After an additional 10ms delay, it has value 0. So, as an alternative to the device quirk, we could examine either of those registers to detect the condition, and if detected, either delay another 10ms or completely retry the pmcsr D0 transition again (which will also have the required effect of delaying another 10ms). There has been no observed evidence that this condition can be detected as CRS. At all existing points in the code where pci_dev_wait() is called, PCI_COMMAND does not have the required ~0 value that would cause a delay. Same for PCI_VENDOR_ID. I also checked reading PCI_VENDOR_ID before the D0 transition, and during the D0 transition, and after, it is always value 0x1022 i.e. does not provide an indication that this failure is met. Created attachment 286077 [details]
UX434DA dmesg
Created attachment 286081 [details]
X512DK dmesg
Created attachment 286083 [details]
X512DK lspci
Checking on the D3cold vs D3hot aspect of things, with ACPI dbug messages from device_pm.c I can see that Linux is turning off the relevant power domains when going into D3cold. However, I'm rather convinced that this is not actually fully cutting power to the PCI device. Experimenting, I add a 200ms delay and then a pmscr register read to the end of pci_set_power_state() , after pci_platform_power_transition() has been called. If the power was truly cut and we're in D3cold then I would expect this to fail. However the register read succeeds and returns the same value 0x103. So on these platforms the device really suspends into D3hot and then we have trouble doing the D3hot to D0 transition, where an extra delay is needed. Fixed by PCI: increase D3 delay for AMD Ryzen5/7 XHCI controllers |