Bug 219427

Summary:	Memory leak of pinned pages in "low memory conditions"
Product:	Memory Management	Reporter:	vlovich
Component:	Page Allocator	Assignee:	Andrew Morton (akpm)
Status:	NEW ---
Severity:	normal	CC:	bizyaev, edwin0cheng
Priority:	P3
Hardware:	All
OS:	Linux
Kernel Version:		Subsystem:
Regression:	No	Bisected commit-id:

Description vlovich 2024-10-25 20:08:14 UTC

Running llama.cpp with its default settings makes it use CUDA's allocator, which creates pinned pages. On the latest official 6.11 kernel this causes permanent memory leaks: after each invocation, `free -m` reports more and more memory used even though no active process is actually using that memory. Similarly, `nr_foll_pin_acquired` and `nr_foll_pin_released` in `/proc/vmstat` are horribly imbalanced. llama.cpp discussion: https://github.com/ggerganov/llama.cpp/issues/9988. Also reported to NVIDIA:
https://forums.developer.nvidia.com/t/memory-leak-on-kernel-6-11-0-when-using-cudamallochost/308691
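One way to watch the imbalance mentioned above is to compare the two counters before and after a run. The following is a minimal Python sketch, assuming that `nr_foll_pin_acquired - nr_foll_pin_released` roughly approximates the number of FOLL_PIN references still outstanding. The function names are hypothetical, and other drivers may legitimately hold pins, so only growth that persists after the CUDA process has exited is a meaningful signal.

```python
from pathlib import Path


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style "name value" lines into a dict (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <integer>" lines.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def outstanding_foll_pins(counters: dict[str, int]) -> int:
    """Pins acquired minus pins released (assumed to approximate outstanding pins)."""
    return counters["nr_foll_pin_acquired"] - counters["nr_foll_pin_released"]


if __name__ == "__main__":
    vmstat = Path("/proc/vmstat")
    if vmstat.exists():
        counters = parse_vmstat(vmstat.read_text())
        # Older kernels may lack these counters, so check before using them.
        if all(k in counters for k in ("nr_foll_pin_acquired", "nr_foll_pin_released")):
            print("outstanding FOLL_PIN references:", outstanding_foll_pins(counters))
```

Running it once before starting llama.cpp and again after the process has exited should, on an unaffected kernel, show the difference returning to roughly its starting value.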

I see a patch proposed in https://lore.kernel.org/lkml/87y12ibbew.fsf@nvdebian.thelocal/T/#ma3aebfc4d8aa152d2c0439bedf0a4862d2510185, but it doesn't seem to have been applied in the 6.12 RCs or in mainline, so I wanted to file a bug to make sure this is tracked.

Comment 1 vlovich 2024-10-25 21:31:25 UTC

The only part of the explanation that doesn't make sense to me: I have 64 GiB of RAM, and even on a freshly booted machine memory usage is only 4 GiB. The maximum allocated by llama.cpp is ~16 GiB. So it's a bit strange to be hitting an issue whose explanation says it only occurs under "low memory conditions".

Comment 2 vlovich 2024-10-25 23:10:34 UTC

I've confirmed the patch fixes the issue on 6.11. I can't quite get 6.12 to boot for some reason, so I haven't been able to double-check there.