Bug 219427

Summary: Memory leak of pinned pages in "low memory conditions"
Product: Memory Management Reporter: vlovich
Component: Page Allocator Assignee: Andrew Morton (akpm)
Status: NEW ---    
Severity: normal CC: bizyaev, edwin0cheng
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:

Description vlovich 2024-10-25 20:08:14 UTC
By default, llama.cpp uses CUDA's allocator, which creates pinned pages. On the latest official 6.11 kernel, each invocation leaks memory permanently: `free -m` reports more and more memory used even though no active process is using it. Similarly, `nr_foll_pin_acquired` and `nr_foll_pin_released` in `/proc/vmstat` are horribly imbalanced. The llama.cpp discussion is at https://github.com/ggerganov/llama.cpp/issues/9988 and the issue was also reported to NVIDIA at
https://forums.developer.nvidia.com/t/memory-leak-on-kernel-6-11-0-when-using-cudamallochost/308691
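
For anyone wanting to check the pin counters on their own machine, here is a minimal sketch of how the imbalance could be measured. It assumes the kernel exposes both `nr_foll_pin_acquired` and its counterpart `nr_foll_pin_released` in `/proc/vmstat`; the helper names `parse_vmstat` and `pin_imbalance` are made up for illustration. A nonzero difference is expected while some process still holds pins, so the interesting signal is a difference that keeps growing after every llama.cpp run has exited.

```python
from pathlib import Path


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style "name value" lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a plain "name integer" pair.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def pin_imbalance(counters: dict[str, int]) -> int:
    """Return pins acquired minus pins released (raises KeyError if absent)."""
    return counters["nr_foll_pin_acquired"] - counters["nr_foll_pin_released"]


if __name__ == "__main__":
    counters = parse_vmstat(Path("/proc/vmstat").read_text())
    print("outstanding pins:", pin_imbalance(counters))
```

Running this before and after an invocation, with no CUDA process left alive, should show whether the outstanding count returns to its earlier value.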

I see a patch proposed in https://lore.kernel.org/lkml/87y12ibbew.fsf@nvdebian.thelocal/T/#ma3aebfc4d8aa152d2c0439bedf0a4862d2510185 but it doesn't seem to have been applied to either the 6.12 RCs or mainline, so I wanted to create a bug to make sure this is tracked.
Comment 1 vlovich 2024-10-25 21:31:25 UTC
One thing in the explanation doesn't make sense. I have 64 GiB of RAM, and even on a freshly booted machine the memory usage is only 4 GiB. The maximum allocated by llama.cpp is ~16 GiB. So it's a bit strange to be hitting the issue if, as the explanation says, it occurs only in the "low memory conditions" case.
Comment 2 vlovich 2024-10-25 23:10:34 UTC
I've confirmed the patch fixes the issue on 6.11. I can't get 6.12 to boot for some reason, so I haven't been able to double-check there.