Bug 219611

Summary: Read of pcie_bw sysfs file on AMD GPU blocks for 1 second
Product: Drivers
Reporter: Russell Haley (yumpusamongus+kernelbugzilla)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED ANSWERED
Severity: normal
Priority: P3
Hardware: Intel
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description Russell Haley 2024-12-18 14:49:58 UTC
Several userspace resource monitors have been tripped up by this:

https://github.com/Syllo/nvtop/issues/139  

https://github.com/Syllo/nvtop/issues/208  

https://github.com/aristocratos/btop/issues/793  

https://gitlab.com/mission-center-devs/mission-center/-/issues/309

The behavior is highly unusual and would require special treatment of just that file in userspace.
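
To illustrate the kind of special treatment a monitor would need, here is a minimal sketch in Python: the blocking read is moved to a worker thread and the UI only ever sees the most recent cached value. The class name `CachedSysfsReader`, the helper `read_pcie_bw`, and the path `/sys/class/drm/card0/device/pcie_bw` are assumptions for illustration (the card index varies per system); none of the linked projects is claimed to work this way.

```python
import threading
import time


class CachedSysfsReader:
    """Poll a slow file on a worker thread; serve the latest value from a cache."""

    def __init__(self, read_fn, interval=0.0):
        self._read_fn = read_fn        # callable that performs the blocking read
        self._interval = interval      # extra pause between reads, in seconds
        self._lock = threading.Lock()
        self._latest = None            # None until the first read completes
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        while not self._stop.is_set():
            try:
                value = self._read_fn()   # may block for about a second
            except OSError:
                value = None              # file missing or unreadable
            with self._lock:
                self._latest = value
            self._stop.wait(self._interval)

    def latest(self):
        """Return the most recent value without blocking on the file."""
        with self._lock:
            return self._latest

    def stop(self):
        self._stop.set()
        self._thread.join()


def read_pcie_bw(path="/sys/class/drm/card0/device/pcie_bw"):
    # Hypothetical default path; adjust the card index for the actual system.
    with open(path) as f:
        return f.read().strip()


# Typical use in a monitor (not executed here):
#   reader = CachedSysfsReader(read_pcie_bw)
#   reader.start()
#   ... call reader.latest() from the UI refresh loop ...
#   reader.stop()
```

The cost of this workaround is an extra thread per GPU and a value that can be up to one read interval stale, all for a single file.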

The docs say "The amdgpu driver provides a sysfs API for estimating how much data has been received and sent by the GPU in the last second through PCIe". Specifically, the LAST second, not the second starting when read() was called.
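
For reference, a minimal sketch of how a monitor might interpret the file's contents. It assumes the file holds three whitespace-separated integers (received message count, sent message count, and the maximum payload size in bytes), which is how the amdgpu documentation appears to describe it; the names `PcieBw` and `parse_pcie_bw` are made up for this example. Since only the maximum payload size is known, the result is an upper bound rather than an exact byte count.

```python
from typing import NamedTuple


class PcieBw(NamedTuple):
    received: int  # messages received during the sampling window
    sent: int      # messages sent during the sampling window
    mps: int       # maximum payload size of a PCIe packet, in bytes

    def max_bytes(self):
        """Upper bound on bytes moved, assuming every message carried a full payload."""
        return (self.received + self.sent) * self.mps


def parse_pcie_bw(text):
    """Parse the assumed '<received> <sent> <mps>' layout."""
    received, sent, mps = (int(field) for field in text.split())
    return PcieBw(received, sent, mps)
```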

The culprit, as far as I can tell, is the msleep here: https://elixir.bootlin.com/linux/v6.12.4/source/drivers/gpu/drm/amd/amdgpu/soc15.c#L756 (the same code is copy-pasted in 4 places).

I am not familiar with the intricacies of AMD GPUs, but what would be the cost of having those counters enabled all the time, and reporting the number of messages in some recent second? Or even better, ripping this out and exposing the cumulative message counts directly, so userspace can choose whichever sample rate it wants?
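
As a sketch of what userspace could do if cumulative counts were exposed: take two timestamped samples at whatever interval suits the tool and divide the difference by the elapsed time. No such interface exists in the driver today as far as this report knows; the function name `rates_from_counters` and the sample layout are hypothetical, and counter wraparound is not handled.

```python
def rates_from_counters(prev, curr, mps):
    """Estimate (rx, tx) upper-bound bandwidth in bytes per second.

    prev and curr are (timestamp_seconds, received_msgs, sent_msgs) samples
    of hypothetical monotonically increasing counters; mps is the maximum
    payload size in bytes.
    """
    dt = curr[0] - prev[0]
    if dt <= 0:
        raise ValueError("samples must be taken at increasing timestamps")
    rx_msgs_per_s = (curr[1] - prev[1]) / dt
    tx_msgs_per_s = (curr[2] - prev[2]) / dt
    return rx_msgs_per_s * mps, tx_msgs_per_s * mps
```

With this shape the read itself never needs to sleep: a tool refreshing twice a second and one refreshing every ten seconds would use the same file and simply see different `dt` values.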
Comment 1 Artem S. Tashkinov 2024-12-18 18:55:32 UTC
Please report here instead:

https://gitlab.freedesktop.org/drm/amd/-/issues