Bug 9991

Summary: i386: PAE kernel memory leak on SMP systems
Product: Platform Specific/Hardware Reporter: Bart Van Assche (bvanassche)
Component: i386 Assignee: platform_i386
Status: CLOSED CODE_FIX    
Severity: high    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.22.18 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Kernel config (.config)
dmesg output
bzip2-compressed ODS with graph of memory usage
Minimized kernel config.
Memory usage graph for minimized kernel config.
Memory usage graph for minimized kernel config.
Memory usage graph for minimized kernel config.
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
patch to display quicklists in /proc/meminfo

Description Bart Van Assche 2008-02-14 06:11:24 UTC
Latest working kernel version: (not known).
Earliest failing kernel version: fails with both 2.6.22.12 and 2.6.22.18.
Distribution: Ubuntu 7.10 server.
Hardware Environment: server with 16GB RAM and 8 CPUs (Intel Xeon E5320).
Software Environment: Ubuntu 7.10 server.
Problem Description:

The available memory steadily decreases when polling for the VirtualBox status. This problem occurs with a 32-bit kernel when CONFIG_HIGHMEM64G and CONFIG_PAE are enabled. This problem does not occur when CONFIG_HIGHMEM64G is disabled. This problem does not occur on the same hardware with a 64-bit kernel.

Steps to reproduce:

1. Make sure that the VirtualBox software is installed:

apt-get -y install virtualbox

2. Make sure that the VirtualBox kernel driver is NOT loaded:

rmmod vboxdrv

3. Start the following command to observe the available memory (do NOT trust free or top -- these tools do not take reclaimable slab memory into account):

while true; do /usr/bin/awk </proc/meminfo '/^MemTotal/{t=$2}/^MemFree/{f=$2}/^Buffers:/{b=$2}/^Cached:/{c=$2}/^SReclaimable:/{sr=$2}''END{print "Total: " t " KB, in use: " t-f-b-c-sr ", free: " f+b+c+sr " KB."}'; sleep 60; done

4. In a second shell, run the following command:

for ((i=0;i<1000;i++)); do LD_LIBRARY_PATH=/usr/lib/virtualbox /usr/lib/virtualbox/VBoxManage list vms >/dev/null; done
Comment 1 Bart Van Assche 2008-02-14 06:14:28 UTC
Created attachment 14836 [details]
Kernel config (.config)
Comment 2 Bart Van Assche 2008-02-14 06:16:10 UTC
Created attachment 14837 [details]
dmesg output
Comment 3 Bart Van Assche 2008-02-14 06:44:41 UTC
Additional information: this behavior only occurs on systems with more than 4 GB RAM.
Comment 4 Bart Van Assche 2008-02-14 07:03:25 UTC
Created attachment 14839 [details]
bzip2-compressed ODS with graph of memory usage
Comment 5 Thomas Gleixner 2008-02-14 17:57:46 UTC
> 2. Make sure that the VirtualBox kernel driver is NOT loaded:
> 
> rmmod vboxdrv

Does the same problem happen, when you never loaded vboxdrv ?

If yes, is the problem still there with later kernel versions (2.6.23,
2.6.24) ?

Thanks,

	tglx
Comment 6 H. Peter Anvin 2008-02-14 18:08:02 UTC
I would guess this is the PUD leak bug.  I don't see any -stable release that fixes it, which surprises me.  Please try 2.6.24, it has at least the dominant bug fixed.
Comment 7 Bart Van Assche 2008-02-14 23:35:20 UTC
Via a web search for the keywords PAE, memory and leak I found the following patch: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg256181.html
Apparently this patch is not yet included in the most recent kernel (2.6.24.2). Should we apply that patch manually to the 2.6.24.2 kernel ?
Comment 8 Bart Van Assche 2008-02-18 07:29:19 UTC
Update: same result (leakage of about 1 MB/s) with vanilla 2.6.24.2 kernel and vboxdrv kernel driver never loaded. Applying the aforementioned patch to the 2.6.24.2 kernel failed.
Comment 9 Bart Van Assche 2008-02-19 00:48:36 UTC
Correction: leakage rate in the above comment is wrong, should read about 50 KB/s.
Update:
* same problem with a vanilla 2.6.24.2 kernel in 32-bit mode and CONFIG_HIGHMEM4G: leaks about 50 KB/s (no PAE, no 64G, vboxdrv kernel module never loaded).
* less severe leakage with vanilla 2.6.24.2 kernel in 32-bit mode with high memory disabled: leaks about 8 KB/s (CONFIG_NOHIGHMEM=y).
Comment 10 Bart Van Assche 2008-02-19 02:47:30 UTC
Update: the same test did not trigger a memory leak with the vanilla 2.6.24.2 kernel in 32-bit mode, high memory disabled and SMP disabled (booted with kernel parameter maxcpus=1).
Comment 11 Bart Van Assche 2008-02-21 07:54:57 UTC
Update: the issue also occurs when booting a PAE-kernel with parameter mem=1G. Memory usage increases up to a certain limit. The upper limit is proportional to the total amount of memory installed. Appended new graphs and minimized kernel .config.
Comment 12 Bart Van Assche 2008-02-21 07:57:07 UTC
Created attachment 14933 [details]
Minimized kernel config.
Comment 13 Bart Van Assche 2008-02-21 07:57:28 UTC
Created attachment 14934 [details]
Memory usage graph for minimized kernel config.
Comment 14 Bart Van Assche 2008-02-22 00:37:07 UTC
Created attachment 14949 [details]
Memory usage graph for minimized kernel config.
Comment 15 Bart Van Assche 2008-02-22 07:14:54 UTC
Created attachment 14951 [details]
Memory usage graph for minimized kernel config.
Comment 16 H. Peter Anvin 2008-03-06 00:03:08 UTC
What would be extremely useful would be to probe the sizes of quicklist 0 for all CPUs.  The past problem was that a huge number of pages get stuffed onto the quicklist for one or more CPUs, whereas at least one of them are zero.

This is how to do that with SystemTap:
http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/

It can also be done with gdb on /proc/kcore (gdb vmlinux /proc/kcore) and tracking down the per_cpu areas.
Comment 17 H. Peter Anvin 2008-03-06 00:06:16 UTC
[OK, I didn't look all that closely at the article.  I'm not 100% sure the example is that great.]
Comment 18 Bart Van Assche 2008-03-06 05:40:58 UTC
(In reply to comment #16)
> What would be extremely useful would be to probe the sizes of quicklist 0 for
> all CPUs.  The past problem was that a huge number of pages get stuffed onto
> the quicklist for one or more CPUs, whereas at least one of them are zero.
> 
> This is how to do that with SystemTap:
>
> http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/
> 
> It can also be done with gdb on /proc/kcore (gdb vmlinux /proc/kcore) and
> tracking down the per_cpu areas.
 
I now have systemtap set up. Note: when I try to run the above example, I get the following error message:
$ stap quicklist-trim.stp
semantic error: failed to retrieve location attribute for local 'q' (dieoffset: 0x539c2b): identifier '$q' at /home/vanasscb/quicklist-trim.stp:5:10
Pass 2: analysis failed.  Try again with more '-v' (verbose) options.

This means that systemtap could not find debug info for the local variable quicklist_trim::q in the vmlinux kernel image. I am not sure whether this is a systemtap or a vmlinux issue.
Comment 19 Bart Van Assche 2008-03-06 05:42:13 UTC
Update: the number of iterations mentioned in #0 is too low to reproduce the issue. The following command does trigger the PAE memory leak:

while true; do LD_LIBRARY_PATH=/usr/lib/virtualbox /usr/lib/virtualbox/VBoxManage list vms >/dev/null; done
Comment 20 Ingo Molnar 2008-03-06 05:49:31 UTC
> Update: the number of iterations mentioned in #0 is too low to reproduce the
> issue. The following command does trigger the PAE memory leak:
> 
> while true; do LD_LIBRARY_PATH=/usr/lib/virtualbox
> /usr/lib/virtualbox/VBoxManage list vms >/dev/null; done

hm, what type of activities does VBoxManage do which other apps don't?

	Ingo
Comment 21 Bart Van Assche 2008-03-06 05:58:26 UTC
I don't think this leak is caused by any specific VBoxManage activity, but I discovered this issue by running VirtualBox software. The VBoxManage executable forks a few processes, waits for their completion, and then stops. Should I append the strace -f output for the above command ?
Comment 22 Ingo Molnar 2008-03-06 06:06:40 UTC
> I don't think this leak is caused by any specific VBoxManage activity, 
> but I discovered this issue by running VirtualBox software. The 
> VBoxManage executable forks a few processes, waits for their 
> completion, and then stops. Should I append the strace -f output for 
> the above command ?

strace -f output would be useful too.

What makes me suspicious about VBox is that it uses kernel 
modifications. (now you unloaded it - but maybe it loads/unloads a 
module itself?) Or maybe it uses some other uncommon kernel facility.

Can you see the leak via other tasks as well? Such as running this 
infinite loop of 'ls' commands started by bash:

   while :; do /bin/bash -c ls > /dev/null; done

	Ingo
Comment 23 Bart Van Assche 2008-03-06 06:41:33 UTC
Created attachment 15161 [details]
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
Comment 24 Bart Van Assche 2008-03-06 06:53:07 UTC
(In reply to comment #22)
> What makes me suspicious about VBox is that it uses kernel 
> modifications. (now you unloaded it - but maybe it loads/unloads a 
> module itself?) Or maybe it uses some other uncommon kernel facility.

All tests were performed without loading the vboxdrv kernel module (moved it out of the /lib/modules/* hierarchy such that it was not found during boot). Furthermore, the VBoxManage process does not even try to access /dev/vboxdrv -- it reads in a few XML files and prints a summary.
Comment 25 H. Peter Anvin 2008-03-06 07:22:59 UTC
Do we have any way of reproducing this with something less cumbersome than Virtualbox?
Comment 26 Ingo Molnar 2008-03-06 07:28:42 UTC
> Created an attachment (id=15161)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=15161&action=view)
> LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage
> list vms >/dev/null

thanks. Here are all the non-library files it opens:

 /dev/null
 /dev/urandom
 /etc/gre.conf
 /etc/gre.d/
 /etc/nsswitch.conf
 /etc/passwd
 /proc/bus/usb/devices
 /root/.gre.config
 /root/.VirtualBox/compreg.dat
 /root/.VirtualBox/VirtualBox.xml
 /root/.VirtualBox/xpti.dat
 /tmp/.vbox-root-ipc/lock
 /usr/share/locale/locale.alias

the one that is somewhat unusual is /proc/bus/usb/devices. Does this 
loop:

   while :; do cat /proc/bus/usb/devices; done

show the leak as well perhaps?

	Ingo
Comment 27 Bart Van Assche 2008-03-06 08:35:10 UTC
Updates:
- Leak happens also with 'while true; do cat /proc/bus/usb/devices; done' (leaks between 100 KB/s and 1000 KB/s on my system, rate is variable). Note: usbfs was not mounted during the tests with the VBoxManage command, so if there is only a single cause, the leak can't be caused by usbfs.
- The shell command I mentioned in #0 for calculating free memory did not take into account SwapCached. The command below does:
  while true
  do
    echo -n "$(/bin/date +%s)  "
    /usr/bin/awk </proc/meminfo '/^MemTotal/{t=$2}/^MemFree/{f=$2}/^Buffers:/{b=$2}/^Cached:/{c=$2}/^SwapCached:/{sc=$2}/^SReclaimable:/{sr=$2}'\
'END{print "Total: " t " KB, in use: " t-f-b-c-sc-sr ", free: " f+b+c+sc+sr " KB."}'
    sleep 10
  done
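
A minimal sketch, assuming the output of the loop above has been saved to a file, of how a leak rate in KB/s (such as the figures quoted in this report) could be derived from it. The function name leak_rate is hypothetical and not part of any tool mentioned here; it only compares the first and the last sample, so it reports an average rate and hides any variation in between.

```shell
# leak_rate LOGFILE: print the average growth of the "in use" value in KB/s
# between the first and the last sample in LOGFILE.
# Expected line format (as printed by the loop above):
#   <epoch seconds>  Total: <t> KB, in use: <u>, free: <f> KB.
leak_rate() {
    awk '
        # Only consider lines that start with a numeric timestamp.
        NF >= 7 && $1 ~ /^[0-9]+$/ {
            # $7 is the in-use value with a trailing comma; adding 0
            # makes awk keep only the leading number.
            if (!seen) { t0 = $1; u0 = $7 + 0; seen = 1 }
            t1 = $1; u1 = $7 + 0
        }
        END {
            if (!seen || t1 == t0) { print "not enough samples"; exit 1 }
            printf "%.1f KB/s\n", (u1 - u0) / (t1 - t0)
        }
    ' "$1"
}
```

For example, after redirecting the loop output to a file (say meminfo.log, a name chosen here for illustration), running leak_rate meminfo.log prints the average rate over the whole capture.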
Comment 28 Ingo Molnar 2008-03-06 08:59:55 UTC
> Updates:
> - Leak happens also with 'while true; do cat /proc/bus/usb/devices; 
> done' (leaks between 100 KB/s and 1000 KB/s on my system, rate is 
> variable). Note: usbfs was not mounted during the tests with the 
> VBoxManage command, so if there is only a single cause, the leak can't 
> be caused by usbfs.

ok, that's good progress - if it's really lost memory and not some 
natural shift away from pagecache (which is not a real leak) then this 
is a _massive_ leak, and in that case i very much think it's related to 
/proc or /proc/bus/usb/devices.

note that despite usbfs not mounted, your strace indicates an active USB 
subsystem:

[pid 16086] read(16, "\nT:  Bus=04 Lev=00 Prnt=00 Port="..., 1024) = 1024
[pid 16086] read(16, ".6.24.3 uhci_hcd\nS:  Product=UHC"..., 1024) = 1024
[pid 16086] read(16, "hub\nE:  Ad=81(I) Atr=03(Int.) Mx"..., 1024) = 518

does the leak occur if you cat something more common in /proc, say:

  while :; do cat /proc/cpuinfo >/dev/null; done

and does a simple loop of shells which i suggested before show the leak:

  while :; do bash -c /bin/ls >/dev/null; done

?

	Ingo
Comment 29 Bart Van Assche 2008-03-06 23:40:20 UTC
Created attachment 15171 [details]
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null

Collected the output now in single-user mode instead of multi-user mode.
Comment 30 Bart Van Assche 2008-03-06 23:44:56 UTC
(In reply to comment #28)
> note that despite usbfs not mounted, your strace indicates an active USB 
> subsystem:
> 
> [pid 16086] read(16, "\nT:  Bus=04 Lev=00 Prnt=00 Port="..., 1024) = 1024
> [pid 16086] read(16, ".6.24.3 uhci_hcd\nS:  Product=UHC"..., 1024) = 1024
> [pid 16086] read(16, "hub\nE:  Ad=81(I) Atr=03(Int.) Mx"..., 1024) = 518

That's because I made a mistake: I ran all tests in single-user mode, except the strace command for collecting the VBoxManage output. I have since rerun the strace command in single-user mode and replaced the attachment. From the single-user mode strace output:

[pid  7702] open("/proc/bus/usb/devices", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)

> does the leak occur if you cat something more common in /proc, say:
>   while :; do cat /proc/cpuinfo >/dev/null; done

I can trigger the leak via /proc/bus/usb/devices, /proc/cpuinfo and /proc/cmdline (these are the only ones I tried).

> and does a simple loop of shells which i suggested before show the leak:
>   while :; do bash -c /bin/ls >/dev/null; done

The above command did not trigger the leak.
Comment 31 Thomas Gleixner 2008-03-08 12:23:36 UTC
Created attachment 15180 [details]
patch to display quicklists in /proc/meminfo

Ok, I figured out what's going on. It's not a memory leak, it's an accounting problem. x86 32 bit uses quicklists. quicklists keep freed pages in the quicklists up to a limit which depends on the size of available memory.

Can you please apply the test patch and check whether the leak goes away when you add the new entry to your awk script. It looks like:
QuickLists:      21376 kB
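
A rough sketch of what the extended awk script could look like, assuming the test patch is applied and the new entry is spelled exactly as in the sample line above. The function name mem_in_use is made up for illustration. On a kernel without the patch the entry is missing, the awk variable stays empty and counts as 0, so the result is the same as before.

```shell
# mem_in_use [FILE]: print the memory in use (KB), not counting reclaimable
# caches or pages parked on the quicklists. FILE defaults to /proc/meminfo.
mem_in_use() {
    awk '
        /^MemTotal:/     { t  = $2 }
        /^MemFree:/      { f  = $2 }
        /^Buffers:/      { b  = $2 }
        /^Cached:/       { c  = $2 }
        /^SwapCached:/   { sc = $2 }
        /^SReclaimable:/ { sr = $2 }
        /^QuickLists:/   { q  = $2 }   # entry added by the test patch
        END { print "in use: " (t - f - b - c - sc - sr - q) " KB" }
    ' "${1:-/proc/meminfo}"
}
```

If the apparent leak is only pages held on the quicklists, the value printed by mem_in_use should stay roughly constant while the test loop runs.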
Comment 32 Bart Van Assche 2008-03-18 01:24:04 UTC
This issue has been discussed further on the LKML: see also http://lkml.org/lkml/2008/3/9/19
Comment 33 Bart Van Assche 2008-03-26 00:41:34 UTC
Retested with kernel version 2.6.25-rc6-00333-ga4083c9-dirty: memory usage is constant with this kernel when running the above test, which means the leak is fixed in this kernel.
Comment 34 Thomas Gleixner 2008-03-26 13:31:48 UTC
Thanks for testing. I'm closing the bug.

Thanks,
       tglx