Bug 219611 - Read of pcie_bw sysfs file on AMD GPU blocks for 1 second
Summary: Read of pcie_bw sysfs file on AMD GPU blocks for 1 second
Status: RESOLVED ANSWERED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: Intel Linux
Importance: P3 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-18 14:49 UTC by Russell Haley
Modified: 2024-12-18 18:55 UTC
CC List: 0 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:



Description Russell Haley 2024-12-18 14:49:58 UTC
Multiple cases of userspace resource monitors getting tripped up by this:

https://github.com/Syllo/nvtop/issues/139  

https://github.com/Syllo/nvtop/issues/208  

https://github.com/aristocratos/btop/issues/793  

https://gitlab.com/mission-center-devs/mission-center/-/issues/309

The behavior is highly unusual and would require special treatment of just that file in userspace.
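As a sketch of the special treatment a monitor currently needs, the slow read can be moved off the UI loop: a background thread reads the file and caches the latest contents, and the display code only reads the cache. The class and function names here are hypothetical, not taken from any of the linked projects.

```python
import threading
import time
from typing import Callable, Optional


class BackgroundSampler:
    """Calls a slow reader in a daemon thread and caches its latest result.

    The monitor's refresh loop calls latest(), which never blocks on the reader.
    """

    def __init__(self, reader: Callable[[], str], interval: float = 1.0):
        self._reader = reader
        self._interval = interval
        self._lock = threading.Lock()
        self._latest: Optional[str] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Signals the worker and waits for it; this can block until an
        # in-flight read finishes (about 1 s for pcie_bw).
        self._stop.set()
        self._thread.join()

    def latest(self) -> Optional[str]:
        # Returns None until the first read has completed.
        with self._lock:
            return self._latest

    def _run(self) -> None:
        while not self._stop.is_set():
            value = self._reader()  # may block (about 1 s for pcie_bw)
            with self._lock:
                self._latest = value
            # Pause before the next sample, waking early if stop() is called.
            self._stop.wait(self._interval)


def read_sysfs(path: str) -> str:
    """Read a small text attribute and strip the trailing newline."""
    with open(path) as f:
        return f.read().strip()
```

Usage, assuming the GPU is card0 (the index varies between systems): construct `BackgroundSampler(lambda: read_sysfs("/sys/class/drm/card0/device/pcie_bw"), interval=0.0)`, call `start()`, and poll `latest()` from the refresh loop. An interval of 0.0 is enough here because each read already takes about a second.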

The docs say "The amdgpu driver provides a sysfs API for estimating how much data has been received and sent by the GPU in the last second through PCIe". Specifically, the LAST second, not the second starting when read() was called.
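For reference, a minimal parser for the file's contents, assuming the format described in the amdgpu documentation: three whitespace-separated integers giving the received message count, the sent message count, and the maximum PCIe payload size in bytes. Multiplying the message total by the payload size yields an upper bound on bytes moved, not a measured value. The names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcieBwSample:
    received: int  # messages received during the ~1 s the read() slept
    sent: int      # messages sent during the same window
    mps: int       # maximum PCIe payload size, in bytes

    def upper_bound_bytes_per_sec(self) -> int:
        # Assumes every message carried a full max-size payload, so this
        # overestimates real traffic; actual packet sizes are not reported.
        return (self.received + self.sent) * self.mps


def parse_pcie_bw(text: str) -> PcieBwSample:
    """Parse 'received sent mps' as read from the pcie_bw file."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    received, sent, mps = (int(x) for x in fields)
    return PcieBwSample(received, sent, mps)
```

Note that the counts describe the second spent inside read(), which is exactly the mismatch with the documented "last second" wording.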

The culprit, as far as I can tell, is the msleep here: https://elixir.bootlin.com/linux/v6.12.4/source/drivers/gpu/drm/amd/amdgpu/soc15.c#L756 (the same code is copy-pasted in 4 places).
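One way to confirm the blocking behavior from userspace is to time a plain read of the file. This is a minimal sketch; the function name is illustrative and it makes no assumptions about the file's contents.

```python
import time


def timed_read(path: str) -> tuple[str, float]:
    """Read a whole file and return (contents, elapsed seconds)."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock jumps
    with open(path) as f:
        data = f.read()
    return data, time.monotonic() - start
```

Called as `timed_read("/sys/class/drm/card0/device/pcie_bw")` (card0 is an assumption), an elapsed time close to 1 s would be consistent with the msleep in the read path, while an ordinary sysfs attribute returns almost immediately.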

I am not familiar with the intricacies of AMD GPUs, but what would be the cost of having those counters enabled all the time, and reporting the number of messages in some recent second? Or even better, ripping this out and exposing the running (cumulative) message counts directly, so userspace can choose whichever sample rate it wants?
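To illustrate the second suggestion: with free-running cumulative counters, userspace could sample at any interval and compute rates itself, with no sleeping inside the kernel. This sketch assumes a hypothetical interface that exposes such counters; no existing sysfs file is implied. The modular subtraction tolerates a single counter wraparound between samples, with the counter width also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    timestamp: float  # seconds, taken from a monotonic clock
    received: int     # hypothetical cumulative count of received messages
    sent: int         # hypothetical cumulative count of sent messages


def rates(prev: CounterSample, cur: CounterSample,
          width_bits: int = 64) -> tuple[float, float]:
    """Messages per second (received, sent) between two samples."""
    dt = cur.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    modulus = 1 << width_bits
    # Modular difference stays correct if the counter wrapped once.
    d_recv = (cur.received - prev.received) % modulus
    d_sent = (cur.sent - prev.sent) % modulus
    return d_recv / dt, d_sent / dt
```

A monitor refreshing every 250 ms and one refreshing every 5 s could then share the same counters, each getting an average over its own interval.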
Comment 1 Artem S. Tashkinov 2024-12-18 18:55:32 UTC
Please report here instead:

https://gitlab.freedesktop.org/drm/amd/-/issues
