I have several KVM virtual servers at a number of hosting providers. On just one of them, the unreclaimable slab memory grows linearly over time until it hits a maximum (when the server's memory is 100% allocated). All of this memory is allocated in kmalloc-64 slabs.

After enabling slab debugging with slub_debug=U, /sys/kernel/slab/kmalloc-64/alloc_calls shows that most of the allocations come from kvm_async_pf_task_wake.

This looks very similar to this blog post: https://darkimmortal.com/debian-10-kernel-slab-memory-leak/. Also see my post on ServerFault: https://serverfault.com/questions/1020241/debugging-kmalloc-64-slab-allocations-memory-leak

Any suggestions on how to debug this? It seems like it could be a kernel bug.
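To put a number on which call site dominates, the alloc_calls file can be parsed and its per-site counts compared. The following is a minimal sketch in C. It assumes each line starts with an object count followed by `symbol+offset/size` (the layout slub_debug=U output usually has); `parse_alloc_line` and `top_allocator` are hypothetical helper names, not kernel or library APIs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one line of /sys/kernel/slab/<cache>/alloc_calls.
 * Assumed layout: "<count> <symbol>+<off>/<size> age=... pid=... cpus=..."
 * On success, stores the count and the bare symbol name and returns 1. */
static int parse_alloc_line(const char *line, long *count, char *sym, size_t symlen)
{
    char *end;
    long n = strtol(line, &end, 10);    /* strtol skips leading blanks */
    if (end == line || n < 0)
        return 0;                       /* no leading count on this line */
    while (*end == ' ' || *end == '\t')
        end++;
    size_t len = strcspn(end, "+ \t\n");  /* symbol ends at '+' or whitespace */
    if (len == 0 || len >= symlen)
        return 0;
    memcpy(sym, end, len);
    sym[len] = '\0';
    *count = n;
    return 1;
}

/* Scan an alloc_calls file and report the call site with the largest count.
 * Returns 0 if the file cannot be opened or contains no parsable lines. */
static int top_allocator(const char *path, char *sym, size_t symlen, long *count)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    char line[512], cur[128];
    long n, best = -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_alloc_line(line, &n, cur, sizeof cur) && n > best) {
            best = n;
            snprintf(sym, symlen, "%s", cur);
        }
    }
    fclose(f);
    if (best < 0)
        return 0;
    *count = best;
    return 1;
}
```

Calling `top_allocator("/sys/kernel/slab/kmalloc-64/alloc_calls", sym, sizeof sym, &n)` as root at intervals and logging the result would show whether the kvm_async_pf_task_wake count keeps rising in step with the slab growth.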
I forgot to mention that this VPS is running Debian Bullseye (testing):

root@lux01:~# uname -a
Linux lux01 5.6.0-2-cloud-amd64 #1 SMP Debian 5.6.14-1 (2020-05-23) x86_64 GNU/Linux
Could you apply the patch below to the guest kernel and try again?

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d6f22a3..93d267e 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -156,6 +156,7 @@ static void apf_task_wake_one(struct kvm_task_sleep_node *n)
 	hlist_del_init(&n->link);
 	if (swq_has_sleeper(&n->wq))
 		swake_up_one(&n->wq);
+	kfree(n);
 }
@Wanpeng Li - I'll try that out and let you know how it goes.
That patch won't work: most APFs have a node that comes from the stack, so calling kfree() on it is invalid. The leak must arise when you enter this branch of kvm_async_pf_task_wake:

/*
 * async PF was not yet handled.
 * Add dummy entry for the token.
 */
n = kzalloc(sizeof(*n), GFP_ATOMIC);

but that dummy entry should then be freed here, in kvm_async_pf_task_wait:

if (e) {
	/* dummy entry exist -> wake up was delivered ahead of PF */
	hlist_del(&e->link);
	raw_spin_unlock(&b->lock);
	kfree(e);
	return false;
}
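The interplay between the two functions can be illustrated with a small userspace model. This is a simplified sketch, not the kernel code: a single unlocked list replaces the per-bucket hash table, and `sleep_node`, `task_wait`, `task_wake`, `find` and `dummies_outstanding` are hypothetical names. It shows why the wake path must not free the node it finds (it may live on the waiter's stack), and that a dummy allocated by a wake is freed only when the matching wait arrives.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model of the guest's async-PF sleep table (hypothetical names).
 * Not kernel code: one list instead of per-bucket hlists, and no locking. */
struct sleep_node {
    unsigned token;
    bool dummy;                 /* true: heap placeholder created by a wake */
    bool woken;
    struct sleep_node *next;
};

static struct sleep_node *table;    /* pending nodes, in any order */
static long dummies_outstanding;    /* heap placeholders not yet freed */

/* Return the link pointing at the node for `token`, or at the list's NULL end. */
static struct sleep_node **find(unsigned token)
{
    struct sleep_node **pp = &table;
    while (*pp && (*pp)->token != token)
        pp = &(*pp)->next;
    return pp;
}

/* "Page not present": the faulting task registers a node for the token.
 * In the real guest this node lives on the task's stack. Returns false if
 * the wake-up was delivered ahead of the fault (a dummy was waiting). */
static bool task_wait(struct sleep_node *stack_node, unsigned token)
{
    struct sleep_node **pp = find(token);
    if (*pp) {                          /* dummy exists: wake came first */
        struct sleep_node *e = *pp;
        *pp = e->next;
        free(e);                        /* the only place a dummy is freed */
        dummies_outstanding--;
        return false;
    }
    stack_node->token = token;
    stack_node->dummy = false;
    stack_node->woken = false;
    stack_node->next = table;
    table = stack_node;
    return true;
}

/* "Page ready": wake the sleeper for the token, or leave a dummy behind
 * if the fault for that token has not been handled yet. */
static void task_wake(unsigned token)
{
    struct sleep_node **pp = find(token);
    if (*pp) {
        struct sleep_node *n = *pp;
        if (n->dummy)
            return;                     /* duplicate wake: keep the placeholder */
        *pp = n->next;
        n->woken = true;                /* must NOT free: n may be on a stack */
        return;
    }
    struct sleep_node *n = calloc(1, sizeof *n);
    if (!n)
        return;
    n->token = token;
    n->dummy = true;
    n->next = table;
    table = n;
    dummies_outstanding++;
}
```

In this model, a stream of wakes for tokens that never see a matching wait makes `dummies_outstanding` grow without bound, which has the same shape as the linear kmalloc-64 growth described in the question.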
Yeah, after applying that patch I got a kernel panic on the kfree() call. A blog post I read (https://darkimmortal.com/debian-10-kernel-slab-memory-leak/) mentioned adding "no-kvmapf" to the kernel command line as a workaround for this issue. Are there any major issues that would result from doing that? Right now this leak fills the server's memory completely after a few days of uptime.
No particular issue, apart from increased latency if the host is overcommitted. It will be fixed in 5.8, but it's a host-side bug, so you need to inform your hosting provider or add no-kvmapf yourself.
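After adding no-kvmapf to the guest's kernel command line (on Debian, typically via GRUB_CMDLINE_LINUX in /etc/default/grub followed by update-grub), it is worth confirming that the running kernel actually booted with it. A minimal sketch in C, assuming only the standard /proc/cmdline interface; `cmdline_has_param` and `running_kernel_has_param` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `param` appears as a whole space-separated word in a
 * kernel command line string (so "xno-kvmapfy" does not count). */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    if (plen == 0)
        return false;
    const char *p = cmdline;
    while ((p = strstr(p, param)) != NULL) {
        bool starts = (p == cmdline) || p[-1] == ' ';
        char after = p[plen];
        bool ends = after == '\0' || after == ' ' || after == '\n';
        if (starts && ends)
            return true;
        p += plen;                      /* keep searching past this match */
    }
    return false;
}

/* Check the running kernel's /proc/cmdline.
 * Returns 1 if present, 0 if absent, -1 if the file cannot be read. */
static int running_kernel_has_param(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cmdline_has_param(buf, param) ? 1 : 0;
}
```

A result of 1 from `running_kernel_has_param("no-kvmapf")` after the reboot means the workaround is in effect; the kmalloc-64 growth should then stop if async PF was the source.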