problem:
I experience major memory leaks with my mobile radeon 5850 using the latest 2.6.38 kernel. The problem occurs with both the open source driver (I've only tried with KMS) and the catalyst driver. I have experienced this since I purchased my laptop in late December. For me, the leaked memory has reached 3GB of my 6GB many times after a few days of use (I work with large .pdfs).

how to reproduce:
The easiest way for me to quickly reproduce the leak is to open huge .pdf files with okular and scroll through the files. Here is an example:
http://developer.amd.com/gpu_assets/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf

After doing the above, I logged out to a terminal and killed all processes (except for init and kernel processes). Below is the free -m output after doing this. There are 3828-1949-128 = 1751 MB of memory that are still in use for no reason. I have to reboot to get it back. I've gotten this number up to 3GB before; the only limit is my system's memory (6GB).

             total       used       free     shared    buffers     cached
Mem:          5920       3828       2092          0        128       1949
-/+ buffers/cache:       1749       4170
Swap:         7998          0       7998

This is the only related bug that I found, but it's closed:
https://bugzilla.redhat.com/show_bug.cgi?id=642045

my setup:
latest Arch Linux software
KDE 4.6.1 with compositing (I haven't tested if this is related)
mobile radeon 5850

$ uname -a
Linux J 2.6.38-ARCH #1 SMP PREEMPT Tue Mar 15 09:36:10 CET 2011 x86_64 Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz GenuineIntel GNU/Linux

thanks
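The subtraction in the report above (used minus buffers minus cached) can be sketched in Python as follows. This is only an illustration: the function name `used_excluding_cache` is made up here, and it assumes the six-column `Mem:` row that the `free` in this report prints. The sample text is the `free -m` output from the report.

```python
def used_excluding_cache(free_output: str) -> int:
    """Return used - buffers - cached from the 'Mem:' row of `free` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Assumed column order: total used free shared buffers cached
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            return used - buffers - cached
    raise ValueError("no 'Mem:' row found")


sample = """\
             total       used       free     shared    buffers     cached
Mem:          5920       3828       2092          0        128       1949
-/+ buffers/cache:       1749       4170
Swap:         7998          0       7998
"""

print(used_excluding_cache(sample))  # prints 1751
```

The result is 2 MB away from the 1749 that `free` prints in its `-/+ buffers/cache` row, presumably because `free` works from kB values and rounds each printed number separately.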
It's a little unclear - is this a regression? If so, is it a 2.6.37 -> 2.6.38 regression? Thanks.
It's not a regression. I also experienced the problem with 2.6.37 and 2.6.36. I haven't tested any other kernel versions; I got my laptop in December.
1df6a2ebd75067aefbdf07482bf8e3d0584e04ee was the fix for the original problem, which was in 2.6.37. Unless something we've done since is causing it or another race to happen. So logging out of X doesn't get the memory back? can you attach the contents of /sys/kernel/debug/dri/0/radeon_vram_mm and radeon_gtt_mm? (after mounting debugfs).
The problem occurred in 2.6.37 also. Logging out of X doesn't get the memory back. I'm also not sure if just scrolling a lot causes the problem. It seems to be easier for me to consume memory by opening many .pdf files.

Here is some output after logging out of X (I still had about ~30MB from root processes):

$ free -m
             total       used       free     shared    buffers     cached
Mem:          5920       2668       3252          0         96       1088
-/+ buffers/cache:       1484       4436
Swap:         7998          0       7998

$ cat /sys/kernel/debug/dri/0/radeon_vram_mm
0x00000000-0x00000040: 0x00000040: used
0x00000040-0x00000140: 0x00000100: used
0x00000140-0x00000141: 0x00000001: used
0x00000141-0x0000092a: 0x000007e9: used
0x0000092a-0x00040000: 0x0003f6d6: free
total: 262144, used 2346 free 259798

$ cat /sys/kernel/debug/dri/0/radeon_gtt_mm
0x00000000-0x00000001: 0x00000001: used
0x00000001-0x00000011: 0x00000010: used
0x00000011-0x00000111: 0x00000100: used
0x00000111-0x00000211: 0x00000100: used
0x00000211-0x00020000: 0x0001fdef: free
total: 131072, used 529 free 130543
how about cat /sys/kernel/debug/dri/0/ttm_page_pool
how about cat /sys/kernel/debug/dri/0/ttm_page_pool and /sys/class/drm/ttm/memory_accounting/*/*
Created attachment 51232: debug info
The attached "debug info" above is output after logging out of X (I still had about ~30MB from root processes). It includes what Dave asked for. FYI, I haven't rebooted since my last post.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          5920       4279       1641          0        212       2547
-/+ buffers/cache:       1519       4401
Swap:         7998          0       7998
I'm having trouble reconciling this with the graphics driver.

a) You claim it happens with fglrx as well. Nothing is shared between the two drivers, so it can't be a common GPU bug.
b) Nothing is leaking anywhere obvious according to that debug info.

One last test would be to unload the GPU stack and see what happens. Go to runlevel 3, or anywhere X isn't running, then run:

echo 0 > /sys/class/vtconsole/vtcon1/bind
rmmod radeon ttm drm_kms_helper drm

and see if the memory is still gone.

Maybe you can watch /proc/slabinfo to see where the memory might be going.

It might also just be page cache from the pdf loading. Does echo 3 > /proc/sys/vm/drop_caches help?
echo 3 > /proc/sys/vm/drop_caches cleared all of the memory. Thank you! See output below (ran after exiting X). I didn't try unloading the gpu stack. I can try it if you like.

I also saved /proc/slabinfo before and after I dropped the caches. If I'm interpreting the file correctly, I only notice large drops in ext4 filesystem stuff. Is there a way to tell which slabs are considered buffers or cache?

# free -m
             total       used       free     shared    buffers     cached
Mem:          5920       4318       1602          0        113       2465
-/+ buffers/cache:       1740       4180
Swap:         7998          0       7998

# echo 3 > /proc/sys/vm/drop_caches

# free -m
             total       used       free     shared    buffers     cached
Mem:          5920        225       5695          0          3         12
-/+ buffers/cache:        209       5711
Swap:         7998          0       7998
Created attachment 51392: slabinfo before echo 3 > /proc/sys/vm/drop_caches
Created attachment 51402: slabinfo after echo 3 > /proc/sys/vm/drop_caches
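To compare two /proc/slabinfo snapshots like the ones attached, a small Python sketch along these lines may help. It assumes the usual slabinfo 2.x column order (name, active_objs, num_objs, objsize, ...) and estimates each cache's size as num_objs * objsize, which ignores per-slab overhead. The function names and the sample numbers are made up for illustration; they are not taken from the attachments.

```python
def parse_slabinfo(text: str) -> dict:
    """Map cache name -> approximate bytes (num_objs * objsize)."""
    sizes = {}
    for line in text.splitlines():
        # Skip the version line, the column header, and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        # Assumed columns: name active_objs num_objs objsize ...
        sizes[fields[0]] = int(fields[2]) * int(fields[3])
    return sizes


def biggest_drops(before: str, after: str, top: int = 5) -> list:
    """Return up to `top` (name, bytes_freed) pairs, largest drop first."""
    b, a = parse_slabinfo(before), parse_slabinfo(after)
    drops = [(name, size - a.get(name, 0)) for name, size in b.items()]
    drops = [d for d in drops if d[1] > 0]
    return sorted(drops, key=lambda d: d[1], reverse=True)[:top]


# Made-up sample snapshots, not the real attachment contents.
before = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
ext4_inode_cache 1000 1000 800 20 4 : tunables 0 0 0 : slabdata 50 50 0
dentry 2000 2000 200 20 1 : tunables 0 0 0 : slabdata 100 100 0
kmalloc-64 500 500 64 64 1 : tunables 0 0 0 : slabdata 8 8 0
"""
after = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
ext4_inode_cache 100 100 800 20 4 : tunables 0 0 0 : slabdata 5 5 0
dentry 500 500 200 20 1 : tunables 0 0 0 : slabdata 25 25 0
kmalloc-64 500 500 64 64 1 : tunables 0 0 0 : slabdata 8 8 0
"""

for name, freed in biggest_drops(before, after):
    print(name, freed)
```

With the two attachments saved locally, the same functions could be fed the contents of those files instead of the sample strings.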
So is this something that should be fixed or is everything working as intended? To me, it doesn't seem correct. Unused caches shouldn't be stored in my "used" memory. Is there anything else that I should try to help? Can anyone else reproduce this? I just open large pdfs and scroll through them.
There is no leaking here. A quick google search turns up http://sourcefrog.net/weblog/software/linux-kernel/free-mem.html.
I'm sorry. I wasn't clear. There is NOT a memory leak. Below is some of my 'free -m' output from above. My question is, why does the 'used -/+ buffers/cache' decrease when I 'echo 3 > /proc/sys/vm/drop_caches'?

# free -m
             total       used       free     shared    buffers     cached
Mem:          5920       4318       1602          0        113       2465
-/+ buffers/cache:       1740       4180
Swap:         7998          0       7998

# echo 3 > /proc/sys/vm/drop_caches

# free -m
             total       used       free     shared    buffers     cached
Mem:          5920        225       5695          0          3         12
-/+ buffers/cache:        209       5711
Swap:         7998          0       7998
I think it's clear that no memory is "leaked", so I guess I'll mark this "bug" as invalid. I still don't understand why 'echo 3 > /proc/sys/vm/drop_caches' caused the 'used -/+ buffers/cache' field to decrease by 1500MB (see my previous comment) since I thought this field doesn't include cached memory. But that's probably just a sign of my ignorance.
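One possible reading of these numbers, not confirmed by anyone in this thread: writing 3 to /proc/sys/vm/drop_caches frees reclaimable slab objects (dentries and inodes) as well as the page cache, while the 'free' of this era computes its '-/+ buffers/cache' row as used minus buffers minus cached only, so reclaimable slab would still be counted as used there. The Python sketch below assumes the standard /proc/meminfo field names (MemTotal, MemFree, Buffers, Cached, SReclaimable). The function names and the sample values are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field: value}, values in kB where given."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def used_breakdown(info: dict) -> dict:
    """Split 'used - buffers - cached' into reclaimable slab and the rest."""
    used = info["MemTotal"] - info["MemFree"]
    minus_cache = used - info["Buffers"] - info["Cached"]
    reclaimable = info.get("SReclaimable", 0)
    return {
        "used_minus_buffers_cache_kb": minus_cache,
        "reclaimable_slab_kb": reclaimable,
        "remainder_kb": minus_cache - reclaimable,
    }


# Made-up sample values, not measurements from this system.
sample_meminfo = """\
MemTotal:        6062080 kB
MemFree:         1640448 kB
Buffers:          115712 kB
Cached:          2524160 kB
Slab:            1600000 kB
SReclaimable:    1500000 kB
SUnreclaim:       100000 kB
"""

print(used_breakdown(parse_meminfo(sample_meminfo)))
```

If SReclaimable on the affected system is close to the 1500MB that disappeared, that would support this reading; if it is small, the explanation lies elsewhere.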