Created attachment 277107 Config file for 4.17.3.

Hi there,

I'm experiencing out-of-memory issues while running Cities: Skylines with the amdgpu driver. Starting a new game causes a complete system freeze on any kernel running the amdgpu driver, but not on a rather old kernel using the amdgpu-pro driver. The memory in question is system main memory, not GPU memory.

System details:
- Ubuntu Mate 16.04 with a custom-built 4.17.3 kernel (config attached)
- AMD FX-8350
- 32 GB RAM
- Radeon RX470

Sample main memory usage (RAM usage after 1 minute):
- Kernel 4.4 with amdgpu-pro driver: 2.4 GB
- Kernel 4.17.3 with amdgpu driver: 13 GB
- Kernel 4.16.18 with amdgpu driver: 13 GB
- Kernel 4.18.0-rc2 with amdgpu driver: 13 GB

I get similar results running Stardew Valley (a factor-of-two difference, clearly measurable).

The attached config file is for the 4.17.3 kernel. The other kernels were built using this config file, accepting the default suggestions for any unconfigured parameter.

Greetings,
Felix
Please attach your dmesg output and Xorg log if using X.
Also, please attach the output of free and of top after pressing shift-M, both captured while RAM usage is high.
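One way to capture a ranking similar to top's memory sort (Shift+M) from a script is to read the VmRSS line of each /proc/<pid>/status. This is a minimal sketch, assuming a Linux /proc layout; the helper names top_by_rss and parse_vmrss_kb are hypothetical, and the script complements rather than replaces the requested free and top output.

```python
import os
import re

# /proc/<pid>/status reports VmRSS in kB, e.g. "VmRSS:\t  123456 kB".
VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)
NAME_RE = re.compile(r"^Name:\s+(.*)$", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None."""
    match = VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def top_by_rss(limit=5, proc="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    if not os.path.isdir(proc):
        return []  # not a Linux-style /proc; nothing to scan
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc, entry, "status"), errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        rss = parse_vmrss_kb(text)
        if rss is None:
            continue  # kernel threads have no VmRSS line
        name = NAME_RE.search(text)
        rows.append((rss, int(entry), name.group(1) if name else "?"))
    rows.sort(reverse=True)  # largest resident set first
    return rows[:limit]


if __name__ == "__main__":
    for rss, pid, name in top_by_rss():
        print(f"{pid:>7} {rss / 1024:10.1f} MiB  {name}")
```

Running it once on each kernel while the game is in the high-memory state gives two directly comparable lists.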
Yeah, I'll post the requested information today once I get home.
Created attachment 277121 dmesg Kernel 4.17.3
Created attachment 277123 dmesg on Kernel 4.4.0
Created attachment 277125 Xorg Log on 4.17.3
Created attachment 277127 Xorg log on 4.4.0
Created attachment 277129 Free and stats of the two Kernels

Contains the output of free and of /proc/$ID/stat and /proc/$ID/statm for the two kernel versions.
Created attachment 277131 Output of top of the problematic process on the two Kernels

Truncated top output for the problematic process on the two kernels.
I uploaded all the requested files. Interestingly, the top and statm output for the process shows comparable values except for the data + stack field (see the stats file). Virtual, resident and shared memory are comparable. If you need any further data, don't hesitate to ask. Thank you!
You could also try to compile your kernel with kmemleak enabled.
I'm rebuilding the kernel and checking a possible memory leak with kmemleak.
I'm having some problems setting up kmemleak at the moment; I'll test and check tomorrow.
Another possibility would be narrowing down where between 4.4 and 4.16 this started happening, and eventually bisecting.
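Bisecting is a binary search for the first bad version; git bisect automates it over individual commits. A minimal sketch of the same idea over release tags, assuming the first tag is known good, the last is known bad, and the regression stays once introduced; first_bad is a hypothetical helper and is_bad stands in for the manual "build, boot, run the game, check free" step.

```python
def first_bad(versions, is_bad):
    """Binary search for the first version where is_bad(version) is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and that the
    regression never disappears once introduced (good ... good bad ... bad).
    """
    if len(versions) < 2:
        raise ValueError("need at least a known-good and a known-bad version")
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # regression already present: look earlier
        else:
            lo = mid  # still good: look later
    return versions[hi]
```

With the 13 tags from v4.4 to v4.16, this needs at most four test kernels to find the first bad release; git bisect between that release and its predecessor would then narrow it down to a commit.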
Created attachment 277135 kmemleak output of Cities.x64

I was finally able to create a kmemleak output and cropped it to the part relevant to the affected program. I hope this is helpful.
(In reply to phoenix from comment #15)
> I hope this is helpful.

Unfortunately not really; the only thing in there is a known issue with the IOVA cache. Can you try to bisect as Michel suggested?
Sure, I'm going to investigate the different kernel versions, but that will take me some time (I have to do this in my spare time). I'll post my progress and findings when available.
Does the memory usage go back down when you quit the game? Or when you restart X? Or never?
Memory usage drops immediately once the game quits; no X restart is necessary.
In that case, the output of running the game in valgrind --leak-check=full might be interesting.
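If the game can be run under memcheck (for example valgrind --leak-check=full --log-file=vg.log <game binary>), the LEAK SUMMARY at the end of the log can be extracted for comparison. This is a minimal sketch, assuming the summary line format memcheck prints; leak_summary is a hypothetical helper.

```python
import re

# Matches e.g. "==1234==    definitely lost: 1,024 bytes in 2 blocks"
LEAK_RE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks?"
)


def leak_summary(log_text):
    """Return {category: (bytes, blocks)} from a valgrind memcheck log."""
    summary = {}
    for category, nbytes, nblocks in LEAK_RE.findall(log_text):
        # valgrind prints thousands separators; strip them before converting
        summary[category] = (int(nbytes.replace(",", "")), int(nblocks.replace(",", "")))
    return summary
```

Only the summary lines are matched; the per-record "... are definitely lost in loss record ..." lines that --leak-check=full adds are ignored.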
Yep, I'll have a look this evening. Maybe I can reproduce the issue with another program as well, to rule out a problem specific to a single userland program.
Apparently it's not that easy to attach valgrind to a Steam game, so I'm taking the suggested approach of testing different kernel versions. Interestingly, I could observe similar behaviour in Stardew Valley but not in Kerbal Space Program, as the following statm output shows:

## /proc/$ID/statm for Stardew Valley (similar problem, see the data segment)
# statm for 4917 on 4.17.3
978381 424915 23927 849 0 449695 0
# statm for 4370 on 4.4.0
979917 418188 23774 849 0 874146 0

## /proc/$ID/statm for Kerbal Space Program (problem does not occur)
# statm of 5419 on 4.4.0
532753 381415 19974 7863 0 446822 0
# statm of on 4.17.3
529142 389210 19754 7863 0 441862 0

I'm investigating different kernel versions, and maybe I'll be able to write a simple OpenGL program that triggers the problem.
(In reply to phoenix from comment #22)
>
> ## /proc/$ID/statm for Stardew Valley (Similar problem see the data segment)
> # statm for 4917 on 4.17.3
> 978381 424915 23927 849 0 449695 0
> # statm for 4370 on 4.4.0
> 979917 418188 23774 849 0 874146 0

Did you swap these numbers? The only significant difference is the data size (second-to-last number), but the 4.4 number is bigger by ~400MB.
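/proc/<pid>/statm has seven fields, all counted in pages rather than bytes: size, resident, shared, text, lib, data (data + stack) and dt; per proc(5), lib and dt are always 0 since Linux 2.6. A minimal sketch for comparing two snapshots field by field, assuming 4 KiB pages unless a page size is passed in (os.sysconf("SC_PAGE_SIZE") reports the real value); the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86-64

FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(line):
    """Map the seven /proc/<pid>/statm fields (all in pages) to their names."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def diff_pages(a, b):
    """Per-field difference b - a, in pages."""
    return {k: b[k] - a[k] for k in FIELDS}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB for the given page size."""
    return pages * page_size / (1024 * 1024)
```

Fed the two Stardew Valley lines quoted above, diff_pages shows the data field differing by 424451 pages; multiplying by the actual page size gives the difference in bytes.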
Hi Michel, weirdly, no. I just double-checked them, and in Stardew Valley the 4.4 number really is the one that is 400 MB bigger. For now I'll focus on testing the different kernel versions, so that we're not working on two things at the same time.
Created attachment 277153 Memory usage measurements for different programs using Kernel 4.9.111 and 4.15.0-24

OK, I've tested the issue using kernel 4.15.0-24-generic (shipped with Ubuntu Mate) with the amdgpu driver, and a 4.9.111 kernel with the amdgpu-pro driver (17.40). Sadly, building the amdgpu-pro driver for kernel linux-4.14.53 failed, so I couldn't test that one.

The issue also occurs on the 4.15.0-24-generic kernel, while the 4.9.111 kernel needs significantly less main memory running Cities: Skylines. I also found that neither the mstat output nor the /proc files show significant differences in the processes between the kernel versions. So for now, the only accessible metric for measuring memory usage is the output of 'free'.

In addition, I observed the same memory issue (but without a system freeze) in Civilization: Beyond Earth on the kernel versions mentioned above. That program is a more suitable test case than a rather low-resource program like Stardew Valley.

The attached text file MemUsage.txt contains my current measurements.

Attaching valgrind to a Steam game is rather non-trivial; do you still think it would give us meaningful insights? I can work it out, but I fear it would soon exceed the time I have available. Still, I can give it a shot.
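free derives its numbers from /proc/meminfo, so the same system-wide measurement can be sampled directly. This is a minimal sketch that approximates used memory as MemTotal - MemAvailable; that formula is an assumption and may not match exactly what a given free version labels "used". MemAvailable exists since Linux 3.14, and the helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])  # HugePages_* fields are counts, not kB
    return info


def used_kb(info):
    """Approximate 'used' memory as MemTotal - MemAvailable (an assumption)."""
    return info["MemTotal"] - info["MemAvailable"]


def read_used_mib(path="/proc/meminfo"):
    """Read the current approximation of used memory, in MiB."""
    with open(path) as f:
        return used_kb(parse_meminfo(f.read())) / 1024
```

Calling read_used_mib before starting the game and again after one minute, on each kernel, gives numbers comparable with the free readings above.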
(In reply to phoenix from comment #25)
> Ok, I've tested the issue using Kernel 4.15.0-24-generic (Shipped with
> Ubuntu Mate) using the amdgpu driver and a 4.9.111 Kernel using the
> amdgpu-pro driver (17.40).

BTW, ideally you should only test with the kernel's own amdgpu driver, not with amdgpu-pro, because the latter uses its own copies of core DRM and even some core kernel code, and has other modifications compared to the stock driver.
(In reply to Michel Dänzer from comment #26)
> BTW, ideally you should only test with the kernel's own amdgpu driver, not
> with amdgpu-pro, because the latter uses its own copies of core DRM and even
> some core kernel code, and has other modifications compared to the stock
> driver.

To be even more precise: I'm not sure this is actually a kernel problem rather than something caused by a mix-up between the amdgpu-pro driver and the upstream driver. So testing on a clean install could yield more insight.
Hi Michel, hi Christian, that makes sense; I'll test it on a clean environment. Sorry, I should have done that in the first place :-/
I'm a bit busy at the moment; I hope I'll find time on the weekend to investigate further!
Finally had time to investigate. The bug doesn't appear on a fresh install of Ubuntu 16.04 using the 4.17.3 kernel with the configuration posted above. So apparently Christian was right, and it was a weird mix-up between the amdgpu-pro and the upstream driver. I'm marking the bug as Resolved -> Obsolete, because it was indeed just a leftover relic of an ancient installation :-) I should have checked on a fresh install in the first place.

Anyway, thank you very much for the support and help, and I wish you a pleasant Sunday (or a good start to the week).

Greetings,
Felix