Hello, I have a Radeon HD4650 PCIe card and I use the radeon driver. With mainline kernel 4.15-rc8, some 1080p@60fps videos on YouTube play with slight lags in fullscreen mode, but with kernel 4.14.13-1 everything is OK: 1080p@60fps streaming videos play without lags (a perfect 60 fps). So I suspect a bug in the radeon driver shipped with kernel 4.15-rc8. My OS is Arch Linux 64-bit. An example of a 1080p@60fps video on YouTube: https://www.youtube.com/watch?v=CcUXxM_rbAM
Created attachment 273695 [details] glxinfo
glxinfo
Created attachment 273697 [details] dmesg kernel 4.14.13 good
dmesg from kernel 4.14.13; with this version there are no lags with YouTube 1080p@60fps in fullscreen
Can you bisect?
(In reply to Alex Deucher from comment #3)
> Can you bisect?

I will try to bisect
I did the bisect, the first bad commit is:

commit cf2623d951c1c52923a776e01cf2e2afc9d042a0
Author: Rex Zhu <Rex.Zhu@amd.com>
Date:   Mon Sep 4 18:11:52 2017 +0800

    drm/amd/powerplay: refine powerplay code for RV

    use function points instand of function table.

    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Created attachment 273743 [details] bisect log
here is the bisect log
I think something went wrong with your bisect. That commit touched a completely different driver that's not even loaded on your system.
(In reply to Alex Deucher from comment #7)
> I think something went wrong with your bisect. That commit touched a
> completely different driver that's not even loaded on your system.

yes you are right, something went wrong with the bisect,

because I revet commit cf2623d9 then the bug is still here
(In reply to Barto from comment #8)
> yes you are right, something went wrong with the bisect,
>
> because I revet commit cf2623d9 then the bug is still here

I meant "if I revert commit cf2623d9 then the bug is still here"
I think I found the real "first bad commit". It is:

commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König <christian.koenig@amd.com>
Date:   Thu Jul 6 09:59:43 2017 +0200

    drm/ttm: add transparent huge page support for DMA allocations v2

    Try to allocate huge pages when it makes sense.

    v2: fix comment and use ifdef

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

The commit just before it (d188bfa5532ce5b426681d8530ff1a9683eea0ad) does not have the bug, so I conclude that the first bad commit, the one that introduces the bug, is 648bc3574716400acc06f99915815f80d9563783. I ran a new bisect, which confirms the same thing. What do you think, Alex?

I tried to revert 648bc3574716400acc06f99915815f80d9563783, but git refuses because of a conflict: it seems that a later commit edited one or several of the files previously modified by commit 648bc3574716400acc06f99915815f80d9563783.
Created attachment 273747 [details] new git bisect log
here is a new bisect log, which confirms that commit 648bc3574716400acc06f99915815f80d9563783 is the first bad commit
Created attachment 273757 [details] workaround by reverting changes made by the bad commit
I created a patch as a very simple workaround: it reverts the changes made by commit 648bc3574716400acc06f99915815f80d9563783 related to drm/ttm and radeon_ttm.c. The idea is to restore the drm/ttm behaviour of kernel 4.14.13. 8 files are patched:

drivers/gpu/drm/ttm/ttm_bo_util.c
drivers/gpu/drm/ttm/ttm_memory.c
drivers/gpu/drm/ttm/ttm_page_alloc.c
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
drivers/gpu/drm/radeon/radeon_ttm.c
include/drm/ttm/ttm_debug.h
include/drm/ttm/ttm_memory.h
include/drm/ttm/ttm_page_alloc.h

I hope a definitive solution will be found by Alex or Christian König
The specs of my PC:

cpu: Intel Core 2 Quad Q9650 (3 GHz)
ram: 8 GB DDR2
gpu: AMD Radeon HD4650 PCIe
motherboard: Gigabyte GA-P35-DS3L (socket 775, Intel P35 chipset)
window manager: Plasma 5 (KDE)
OS: Arch Linux 64-bit
web browser: Firefox 57.0.4 (YouTube in HTML5 mode, no Flash plugin)

The output of "cat /proc/meminfo" when the bug occurs:

MemTotal: 8171844 kB
MemFree: 6188792 kB
MemAvailable: 6979356 kB
Buffers: 156940 kB
Cached: 780372 kB
SwapCached: 0 kB
Active: 1020160 kB
Inactive: 742148 kB
Active(anon): 647812 kB
Inactive(anon): 126472 kB
Active(file): 372348 kB
Inactive(file): 615676 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 5242876 kB
SwapFree: 5242876 kB
Dirty: 21768 kB
Writeback: 0 kB
AnonPages: 812840 kB
Mapped: 423408 kB
Shmem: 127600 kB
Slab: 73336 kB
SReclaimable: 46020 kB
SUnreclaim: 27316 kB
KernelStack: 6256 kB
PageTables: 29324 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9328796 kB
Committed_AS: 2785064 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 145408 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 187264 kB
DirectMap2M: 8200192 kB
Well it is expected that this patch causes slightly more overhead during memory allocation in exchange for better throughput. But that should never result in lags in youtube videos, at least not if the browser doesn't do something very very stupid.

(In reply to Barto from comment #13)
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB

Ok well that is strange, the patch shouldn't have any effect when huge pages are disabled.
Hello Christian,

I tried to disable "transparent huge page" with the kernel parameter "transparent_hugepage=never", but the bug is still here

Here is the output of "cat /proc/meminfo" when the kernel parameter "transparent_hugepage=never" is used and I watch a youtube hd video@60 fps in fullscreen mode:

MemTotal: 8171844 kB
MemFree: 6201872 kB
MemAvailable: 7196352 kB
Buffers: 175860 kB
Cached: 828424 kB
SwapCached: 0 kB
Active: 962712 kB
Inactive: 770948 kB
Active(anon): 572560 kB
Inactive(anon): 117736 kB
Active(file): 390152 kB
Inactive(file): 653212 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 5242876 kB
SwapFree: 5242876 kB
Dirty: 6592 kB
Writeback: 0 kB
AnonPages: 729364 kB
Mapped: 446376 kB
Shmem: 118880 kB
Slab: 80408 kB
SReclaimable: 52736 kB
SUnreclaim: 27672 kB
KernelStack: 5956 kB
PageTables: 31036 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9328796 kB
Committed_AS: 2722200 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 201600 kB
DirectMap2M: 8185856 kB

The same experiment but without the kernel parameter "transparent_hugepage=never"; the bug is still here:

MemTotal: 8171844 kB
MemFree: 6068060 kB
MemAvailable: 6948308 kB
Buffers: 181860 kB
Cached: 851652 kB
SwapCached: 0 kB
Active: 1107604 kB
Inactive: 752108 kB
Active(anon): 663292 kB
Inactive(anon): 123784 kB
Active(file): 444312 kB
Inactive(file): 628324 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 5242876 kB
SwapFree: 5242876 kB
Dirty: 11148 kB
Writeback: 0 kB
AnonPages: 814052 kB
Mapped: 457708 kB
Shmem: 124920 kB
Slab: 84420 kB
SReclaimable: 56168 kB
SUnreclaim: 28252 kB
KernelStack: 5936 kB
PageTables: 31100 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9328796 kB
Committed_AS: 2813580 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 161792 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 199552 kB
DirectMap2M: 8187904 kB

What is certain is that commit 648bc357471 "drm/ttm: add transparent huge page support for DMA allocations v2" triggers the bug with my PC configuration: when I watch a youtube video 1080p@60fps in fullscreen mode there is a performance issue, the playback is not 100% fluid (I can notice small lags in travelling (panning) shots).

Some additional information:

- I use plasma 5 (kde) as window manager; the vsync option is set to "automatic" in the plasma 5 options, and the "openGL 2.0 acceleration" option is used for the plasma 5 compositor,

- mesa version: 17.3.2-2

$ glxinfo | grep "OpenGL version"
OpenGL version string: 3.0 Mesa 17.3.2

- the content of /etc/X11/xorg.conf.d/20-radeon.conf:

Section "Device"
    Identifier "Radeon"
    Driver "modesetting"
    Option "TearFree" "off"
    Option "DRI" "3"
    Option "AccelMethod" "glamor"
EndSection
(In reply to Barto from comment #15)
> Hello Christian,
>
> I tried to disable "transparent huge page" with the kernel parameter
> "transparent_hugepage=never", but the bug is still here

Sounds like you misunderstood me:

> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB

Huge pages are always disabled on your system. It would be interesting to see what happens when you try to enable them, not disable them.
What is the content of /sys/kernel/debug/dri/0/ttm_dma_page_pool while you are playing a youtube video? And can you try to compile the kernel with CONFIG_SWIOTLB disabled?
Here is the output of "cat /sys/kernel/debug/dri/0/ttm_dma_page_pool" with kernel 4.15rc8 when I play a youtube video hd@60 fps in fullscreen mode:

pool        refills  pages freed  inuse  available  name
wc              801            0   2834        370  radeon 0000:01:00.0
wchuge         1079         4307      6          3  radeon 0000:01:00.0
cached        10663        42097    552          3  radeon 0000:01:00.0
cachedhuge        6           19      0          5  radeon 0000:01:00.0

I will try to compile kernel 4.15rc8 with CONFIG_SWIOTLB disabled
(In reply to Christian König from comment #16)
> Huge pages are always disabled on your system. It would be interesting to
> see what happens when you try to enable them, not disable them.

I tried with the kernel parameter "transparent_hugepage=always"; the bug is still here. Here is the output of "cat /proc/meminfo"; it seems that huge pages are still not used by my system despite the kernel parameter:

MemTotal: 8171844 kB
MemFree: 6052052 kB
MemAvailable: 6938968 kB
Buffers: 179304 kB
Cached: 854248 kB
SwapCached: 0 kB
Active: 1076432 kB
Inactive: 803524 kB
Active(anon): 676640 kB
Inactive(anon): 122984 kB
Active(file): 399792 kB
Inactive(file): 680540 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 5242876 kB
SwapFree: 5242876 kB
Dirty: 3016 kB
Writeback: 0 kB
AnonPages: 828348 kB
Mapped: 447036 kB
Shmem: 124128 kB
Slab: 82000 kB
SReclaimable: 53952 kB
SUnreclaim: 28048 kB
KernelStack: 6368 kB
PageTables: 32316 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9328796 kB
Committed_AS: 2841448 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 106496 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 205696 kB
DirectMap2M: 8181760 kB

but what means the line "AnonHugePages" ?
(In reply to Barto from comment #19)
> AnonHugePages: 106496 kB
> ShmemHugePages: 0 kB
...
> but what means the line "AnonHugePages" ?

Ah crap you are right, AnonHugePages are the anonymous huge pages. I was looking at the wrong value.

So huge page support is indeed enabled on your CPU, and it doesn't seem to matter if we enable or disable it.
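The two counters that were mixed up above can be read apart mechanically. What follows is only a minimal sketch, assuming the /proc/meminfo layout pasted in this report: `HugePages_Total` counts the explicitly reserved hugetlbfs pool, while `AnonHugePages` is the amount of anonymous memory currently backed by transparent huge pages. The function names and the `SAMPLE` string are illustrative, not part of any existing tool.

```python
def parse_meminfo(text):
    """Return {field: integer value} for /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # not a "Key: value" line
        fields[key.strip()] = int(rest.split()[0])
    return fields


def huge_page_summary(fields):
    """Separate the hugetlbfs pool from transparent huge page (THP) usage."""
    anon_thp_kb = fields.get("AnonHugePages", 0)
    return {
        "hugetlbfs_pool_pages": fields.get("HugePages_Total", 0),
        "thp_anon_kb": anon_thp_kb,
        "thp_in_use": anon_thp_kb > 0,
    }


# Values copied from the transparent_hugepage=always dump in this report.
SAMPLE = """\
AnonHugePages:    106496 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    print(huge_page_summary(parse_meminfo(SAMPLE)))
```

On a live system the same functions can be fed the contents of /proc/meminfo; the current THP mode itself is exposed in /sys/kernel/mm/transparent_hugepage/enabled.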
(In reply to Barto from comment #18)
> pool        refills  pages freed  inuse  available  name
> wc              801            0   2834        370  radeon 0000:01:00.0
> wchuge         1079         4307      6          3  radeon 0000:01:00.0
> cached        10663        42097    552          3  radeon 0000:01:00.0
> cachedhuge        6           19      0          5  radeon 0000:01:00.0
>
> I will try to compile kernel 4.15rc8 with CONFIG_SWIOTLB disabled

Can you also give me those numbers before, while and after you played a video on youtube?

Especially what are the number of refills and pages freed before, while and after you play the video?

It starts to look more like this isn't an issue in the kernel, but rather that the userspace stack or application is constantly allocating and freeing memory.

Best approach would be to gather the "while you play the video" numbers from a separate system over ssh, because when you put the firefox window into the background it can change the results.
(In reply to Christian König from comment #21)
> Best approach would be to gather the "while you play the video" numbers from
> a separate system over ssh, because when you put the firefox window into the
> background it can change the results.

I already use a special technique to get the info while playing a video: a bash script with an "infinite while loop", which lets me put the video in fullscreen mode while the script generates the log files:

    #!/bin/bash
    # Take one snapshot of /proc/meminfo per second until interrupted (Ctrl-C).
    j=0
    while true
    do
        sleep 1
        echo "$j"
        cat /proc/meminfo > "log_$j.txt"
        j=$((j + 1))
    done

I tried to rebuild kernel 4.15rc8 with the option "CONFIG_SWIOTLB" disabled; the bug is still here
So here is the output of "cat /sys/kernel/debug/dri/0/ttm_dma_page_pool" when no youtube video is playing (firefox is not running), with kernel 4.15rc8:

pool        refills  pages freed  inuse  available  name
wc             1682            0   1344       5384  radeon 0000:01:00.0
wchuge         1130         4514      0          6  radeon 0000:01:00.0
cached       106272       424533    552          3  radeon 0000:01:00.0
cachedhuge        6           18      0          6  radeon 0000:01:00.0

And now with firefox running, youtube video hd@60 fps in fullscreen mode (output produced by my bash script running in the background):

pool        refills  pages freed  inuse  available  name
wc             1682            0   2089       4639  radeon 0000:01:00.0
wchuge         3012        12042      3          3  radeon 0000:01:00.0
cached       113562       453693    552          3  radeon 0000:01:00.0
cachedhuge        7           24      0          4  radeon 0000:01:00.0

I made another experiment, using different web browsers:

- If I use chromium 63.0.3239.132 (a web browser based on chrome) then there is no bug: playback is 100% fluid, no lags, it's perfect,

- If I use opera 50.0 then the bug is there, as with firefox 57: I get lags. The only difference is that I also get a vsync problem (tearing).

So it seems that your commit breaks something in applications like web browsers if they use a particular technique for rendering streaming video (related to memory management?). Chromium seems to be the only web browser that is not affected by your commit.
I forgot the third case. Here is the output after a youtube video has been played in firefox 57 and firefox has been closed:

# cat /sys/kernel/debug/dri/0/ttm_dma_page_pool
pool        refills  pages freed  inuse  available  name
wc             1682            0   4016       2712  radeon 0000:01:00.0
wchuge         3403        13608      0          4  radeon 0000:01:00.0
cached       131859       526872    560          4  radeon 0000:01:00.0
cachedhuge        9           32      0          4  radeon 0000:01:00.0

The numbers seem stable
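To compare the "before" and "after" snapshots without reading the columns by eye, the pool table can be parsed and subtracted. This is a minimal sketch, assuming the column order shown in this report (pool, refills, pages freed, inuse, available, name); `parse_pool_table` and `pool_delta` are illustrative names, and the two sample strings reuse the idle and after-playback numbers posted above.

```python
def parse_pool_table(text):
    """Parse ttm_dma_page_pool output into {pool: {counter: int}}."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows: <pool> <refills> <pages freed> <inuse> <available> <name...>
        if len(parts) < 5 or not parts[1].isdigit():
            continue  # skips the header row and blank lines
        refills, freed, inuse, available = (int(p) for p in parts[1:5])
        pools[parts[0]] = {
            "refills": refills,
            "freed": freed,
            "inuse": inuse,
            "available": available,
        }
    return pools


def pool_delta(before, after):
    """Growth of the cumulative counters between two snapshots."""
    return {
        pool: {key: after[pool][key] - before[pool][key] for key in ("refills", "freed")}
        for pool in after
        if pool in before
    }


BEFORE = """\
pool refills pages freed inuse available name
wc 1682 0 1344 5384 radeon 0000:01:00.0
wchuge 1130 4514 0 6 radeon 0000:01:00.0
cached 106272 424533 552 3 radeon 0000:01:00.0
cachedhuge 6 18 0 6 radeon 0000:01:00.0
"""

AFTER = """\
pool refills pages freed inuse available name
wc 1682 0 4016 2712 radeon 0000:01:00.0
wchuge 3403 13608 0 4 radeon 0000:01:00.0
cached 131859 526872 560 4 radeon 0000:01:00.0
cachedhuge 9 32 0 4 radeon 0000:01:00.0
"""

if __name__ == "__main__":
    for pool, counters in pool_delta(parse_pool_table(BEFORE), parse_pool_table(AFTER)).items():
        print(pool, counters)
```

Only "refills" and "pages freed" are subtracted because they are cumulative counters; "inuse" and "available" are instantaneous values and are better compared directly.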
Someone made a suggestion to me: go to "about:support" in firefox to check some options. There I can read this:

HW_COMPOSITING
    blocked by default: Acceleration blocked by platform
OPENGL_COMPOSITING
    unavailable by default: Hardware compositing is disabled
WEBRENDER
    opt-in by default: WebRender is an opt-in feature
    unavailable by runtime: Build doesn't include WebRender

This person also told me to check whether the property "layers.acceleration.force-enabled" is set to true in firefox (in the "about:config" section). I checked and it was set to false by default, so I set it to true and restarted firefox, and now there is no lag in firefox for youtube video 1080p@60 fps with kernel 4.15rc8

but this advice seems to be a "cheat mode/workaround", it doesn't explain why your commit triggers this problem with firefox 57 when "layers.acceleration.force-enabled" option is disabled in firefox ( which is the default value ),

with kernel 4.14.14 (and when I revert your commit) youtube video plays without lags, even if "layers.acceleration.force-enabled" is disabled
(In reply to Barto from comment #25)
> but this advice seems to be a "cheat mode/workaround", it doesn't explain
> why your commit triggers this problem with firefox 57 when
> "layers.acceleration.force-enabled" option is disabled in firefox ( which is
> the default value ),

Well, actually it explains perfectly what is going wrong here :)

My huge page patches make memory accesses faster for the price of making allocating memory more costly. E.g. by using 2M pages instead of 4K you improve some hardware path by the factor of 512.

Now what I see when I look at your numbers is that user space allocated and freed (13608−4514)*2M = 18.1GB of memory while playing youtube videos!.

This means that either the application or the driver stack is doing something very very stupid. Instead of using buffers round robin they are allocating them, using them once and then freeing them again.

As a band aid I will try to fix our algorithm when pages are freed again, but in general the driver stack or application should be fixed to not do that.

Probably best if you open up a bug report on http://bugs.freedesktop.org/ so that somebody can investigate what userspace is doing here.
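The arithmetic in this comment can be checked directly. This is a small sketch, assuming (as the comment does) that every entry in the wchuge "pages freed" column is one 2 MiB huge page; the constant and variable names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

SMALL_PAGE = 4 * KIB  # regular x86-64 page
HUGE_PAGE = 2 * MIB   # huge page size reported as Hugepagesize in /proc/meminfo

# How many 4K pages one 2M page replaces.
pages_per_huge_page = HUGE_PAGE // SMALL_PAGE

# wchuge "pages freed" with firefox closed, before and after playback
# (numbers posted earlier in this report).
freed_before = 4514
freed_after = 13608

huge_pages_cycled = freed_after - freed_before
bytes_cycled = huge_pages_cycled * HUGE_PAGE

print(pages_per_huge_page)            # 512
print(huge_pages_cycled)              # 9094
print(bytes_cycled // MIB)            # 18188
print(round(bytes_cycled / GIB, 2))   # 17.76
```

The 18188 MiB result is the "18.1GB" order of magnitude quoted above, or roughly 17.8 GiB when divided by 1024 instead of 1000.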
(In reply to Christian König from comment #26)
> Now what I see when I look at your numbers is that user space allocated and
> freed (13608−4514)*2M = 18.1GB of memory while playing youtube videos!.
>
> This means that either the application or the driver stack is doing
> something very very stupid. Instead of using buffers round robin they are
> allocating them, using them once and then freeing them again.
>

I made a new test, to be sure there is no mistake (I disabled the option "layers.acceleration.force-enabled" in firefox).

Before playing the video:

pool        refills  pages freed  inuse  available  name
wc              697            0   1344       1444  radeon 0000:01:00.0
wchuge           26           98      0          6  radeon 0000:01:00.0
cached         4166        15978    680          6  radeon 0000:01:00.0
cachedhuge        3            9      0          3  radeon 0000:01:00.0

While playing the video (position 32 seconds):

pool        refills  pages freed  inuse  available  name
wc              783            0   2089       1043  radeon 0000:01:00.0
wchuge          989         3947      3          6  radeon 0000:01:00.0
cached         8573        33737    552          3  radeon 0000:01:00.0
cachedhuge        5           14      0          6  radeon 0000:01:00.0

After playing 38 seconds of the video (I clicked "pause" in youtube at 38 seconds):

pool        refills  pages freed  inuse  available  name
wc              783            0   2285        847  radeon 0000:01:00.0
wchuge         1153         4607      2          3  radeon 0000:01:00.0
cached         8675        34143    552          5  radeon 0000:01:00.0
cachedhuge        5           14      0          6  radeon 0000:01:00.0
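The two snapshots taken during playback give a rough churn rate for the wchuge pool. This is only a sketch under two stated assumptions: the 32 s and 38 s playback positions correspond to about 6 seconds of wall-clock time, and each "pages freed" entry is one 2 MiB huge page. `churn_rate` is an illustrative helper name.

```python
HUGE_PAGE_MIB = 2  # assumption: one wchuge entry is one 2 MiB page


def churn_rate(freed_start, freed_end, seconds):
    """Return (huge pages freed per second, MiB freed per second)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    pages_per_s = (freed_end - freed_start) / seconds
    return pages_per_s, pages_per_s * HUGE_PAGE_MIB


# wchuge "pages freed" at the 32 s and 38 s marks of the new test.
pages_per_s, mib_per_s = churn_rate(3947, 4607, 38 - 32)
print(pages_per_s, mib_per_s)  # 110.0 220.0
```

Under those assumptions the pool frees about 110 huge pages per second, which at 60 frames per second is a little under two huge pages per displayed frame.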
I made another test, using a player (VLC) and a video file (1080p, 60 fps):

- video file 1080p@60 fps, GPU acceleration disabled (XV output), kernel 4.15rc8: I can notice a slight degradation, but it is less visible than in firefox; it takes a trained eye to notice the performance issue,

- video file 1080p@60 fps, GPU acceleration enabled (VDPAU), kernel 4.15rc8: all is OK, a perfect 60 fps, 100% smooth playback,

- video file 1080p@60 fps, GPU acceleration disabled (XV output), kernel 4.14.14: all is OK.

The key here is the video resolution and framerate: things get difficult for my CPU when we reach 1080p resolution AND a 60 fps framerate, so any weak or non-optimized algorithm in the kernel (or video driver) will likely trigger slight lags/frame drops. If I use the "cheat mode" (GPU acceleration with VDPAU) then all is OK: no lags with a 1080p/60 fps video on kernel 4.15rc8.

It would be interesting to create a benchmark (a simple program in C) in order to measure precisely (with a number) the performance of your drm/ttm algorithm. It would make it easy to compare kernel 4.14.14 (old algorithm) with kernel 4.15rc8
For this you need to call the driver IOCTL to create a buffer object directly.

Best is probably to use the Mesa code as a reference and work backward from that, see here https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/winsys/radeon/drm/radeon_drm_bo.c function radeon_create_bo.

drmCommandWriteRead(), struct drm_radeon_gem_create and all the defines you need are provided by libdrm.

Domain should be RADEON_DOMAIN_GTT in this case, flags can be zero or RADEON_GEM_GTT_WC. Size and alignment should be obvious.
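One possible shape for such a benchmark, as a sketch only. The ioctl itself is not reproduced here: `allocate` and `release` are hypothetical callables that a real benchmark would have to implement on top of the buffer-object creation path described in this comment (for example through a small C helper linked against libdrm). The harness only measures how long repeated allocate/free cycles take, so the same program could be run under kernel 4.14.14 and kernel 4.15rc8 and the numbers compared.

```python
import statistics
import time


def time_alloc_free(allocate, release, size, rounds=100):
    """Time `rounds` allocate/release cycles; return per-cycle durations in seconds."""
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        handle = allocate(size)
        release(handle)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce the raw timings to a few comparable numbers."""
    return {
        "rounds": len(durations),
        "mean_s": statistics.fmean(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Placeholder backend: plain host memory, NOT a GTT buffer object.
    # Swap in callables that create and destroy a real buffer object.
    timings = time_alloc_free(bytearray, lambda buf: None, 2 * 1024 * 1024)
    print(summarize(timings))
```

Reporting the median and the maximum alongside the mean matters here, because the complaint is about occasional lags rather than average throughput.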
(In reply to Christian König from comment #26)
> As a band aid I will try to fix our algorithm when pages are freed again,
> but in general the driver stack or application should be fixed to not do
> that.
>
> Probably best if you open up a bug report on http://bugs.freedesktop.org/ so
> that somebody can investigate what userspace is doing here.

Have you made progress on this ttm algorithm?

I filmed my screen with a smartphone in order to capture the problem: https://www.youtube.com/watch?v=YqtleU5YBlA

First I filmed with kernel 4.14.14, then with kernel 4.15.2. On the second attempt we can see that the vertical scroll of the credits is not as smooth as with kernel 4.14.14.

Configuration used:

- radeon hd4650 pci-e
- CPU intel core 2 quad Q9650
- archlinux 64 bits
- kernel 4.14.14 and 4.15.2-2
- firefox

The youtube video used for the test: https://youtu.be/AXL4r30VgSE?t=19m2s

In fact the performance loss can occur even at 720p@30fps in some situations, for example during the vertical scroll of the credits: there are noticeable lags in the scrolling, though the rest of the video seems smooth.

It would be interesting to have a kernel boot parameter to restore the previous ttm algorithm