Bug 9991
Summary: | i386: PAE kernel memory leak on SMP systems | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Bart Van Assche (bvanassche) |
Component: | i386 | Assignee: | platform_i386 |
Status: | CLOSED CODE_FIX | ||
Severity: | high | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22.18 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
Kernel config (.config)
dmesg output
bzip2-compressed ODS with graph of memory usage
Minimized kernel config.
Memory usage graph for minimized kernel config.
Memory usage graph for minimized kernel config.
Memory usage graph for minimized kernel config.
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
patch to display quicklists in /proc/meminfo
Description
Bart Van Assche
2008-02-14 06:11:24 UTC
Created attachment 14836 [details]
Kernel config (.config)
Created attachment 14837 [details]
dmesg output
Additional information: this behavior only occurs on systems with more than 4 GB RAM.
Created attachment 14839 [details]
bzip2-compressed ODS with graph of memory usage
> 2. Make sure that the VirtualBox kernel driver is NOT loaded:
>
> rmmod vboxdrv
Does the same problem happen, when you never loaded vboxdrv ?
If yes, is the problem still there with later kernel versions (2.6.23,
2.6.24) ?
Thanks,
tglx
I would guess this is the PUD leak bug. I don't see any -stable release that fixes it, which surprises me. Please try 2.6.24, it has at least the dominant bug fixed.
Via a web search for the keywords PAE, memory and leak I found the following patch:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg256181.html
Apparently this patch is not yet included in the most recent kernel (2.6.24.2). Should we apply that patch manually to the 2.6.24.2 kernel ?
Update: same result (leakage of about 1 MB/s) with vanilla 2.6.24.2 kernel and vboxdrv kernel driver never loaded. Applying the aforementioned patch to the 2.6.24.2 kernel failed.
Correction: leakage rate in the above comment is wrong, should read about 50 KB/s.
Update:
* same problem with a vanilla 2.6.24.2 kernel in 32-bit mode and CONFIG_HIGHMEM4G: leaks about 50 KB/s (no PAE, no 64G, vboxdrv kernel module never loaded).
* less severe leakage with vanilla 2.6.24.2 kernel in 32-bit mode with high memory disabled: leaks about 8 KB/s (CONFIG_NOHIGHMEM=y).
Update: the same test did not trigger a memory leak with the vanilla 2.6.24.2 kernel in 32-bit mode, high memory disabled and SMP disabled (booted with kernel parameter maxcpus=1).
Update: the issue also occurs when booting a PAE-kernel with parameter mem=1G.
Memory usage increases up to a certain limit. The upper limit is proportional to the total amount of memory installed. Appended new graphs and minimized kernel .config.
Created attachment 14933 [details]
Minimized kernel config.
Created attachment 14934 [details]
Memory usage graph for minimized kernel config.
Created attachment 14949 [details]
Memory usage graph for minimized kernel config.
Created attachment 14951 [details]
Memory usage graph for minimized kernel config.
What would be extremely useful would be to probe the sizes of quicklist 0 for all CPUs. The past problem was that a huge number of pages get stuffed onto the quicklist for one or more CPUs, whereas at least one of them is zero.
This is how to do that with SystemTap:
http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/
It can also be done with gdb on /proc/kcore (gdb vmlinux /proc/kcore) and tracking down the per_cpu areas.
[OK, I didn't look all that closely at the article. I'm not 100% sure the example is that great.]
By this time I got systemtap set up. If you can tell me how I
(In reply to comment #16)
> What would be extremely useful would be to probe the sizes of quicklist 0 for
> all CPUs. The past problem was that a huge number of pages get stuffed onto
> the quicklist for one or more CPUs, whereas at least one of them is zero.
>
> This is how to do that with SystemTap:
>
> http://zombieprocess.wordpress.com/2008/01/03/sample-real-world-use-of-systemtap/
>
> It can also be done with gdb on /proc/kcore (gdb vmlinux /proc/kcore) and
> tracking down the per_cpu areas.
By this time I got systemtap set up. Note: when I try to run the above example, I get the following error message:
$ stap quicklist-trim.stp
semantic error: failed to retrieve location attribute for local 'q' (dieoffset: 0x539c2b): identifier '$q' at /home/vanasscb/quicklist-trim.stp:5:10
Pass 2: analysis failed. Try again with more '-v' (verbose) options.
This means that systemtap could not find debug info for the local variable quicklist_trim::q in the vmlinux kernel image. I am not sure whether this is a systemtap or a vmlinux issue.
Update: the number of iterations mentioned in #0 is too low to reproduce the issue. The following command does trigger the PAE memory leak:
while true; do LD_LIBRARY_PATH=/usr/lib/virtualbox /usr/lib/virtualbox/VBoxManage list vms >/dev/null; done
> Update: the number of iterations mentioned in #0 is too low to reproduce the
> issue. The following command does trigger the PAE memory leak:
>
> while true; do LD_LIBRARY_PATH=/usr/lib/virtualbox
> /usr/lib/virtualbox/VBoxManage list vms >/dev/null; done
hm, what type of activities does VBoxManage do which other apps dont?
Ingo
I don't think this leak is caused by any specific VBoxManage activity, but I discovered this issue by running VirtualBox software. The VBoxManage executable forks a few processes, waits for their completion, and then stops. Should I append the strace -f output for the above command ?
> I don't think this leak is caused by any specific VBoxManage activity,
> but I discovered this issue by running VirtualBox software. The
> VBoxManage executable forks a few processes, waits for their
> completion, and then stops. Should I append the strace -f output for
> the above command ?
strace -f output would be useful too.
What makes me suspicious about VBox is that it uses kernel
modifications. (now you unloaded it - but maybe it loads/unloads a
module itself?) Or maybe it uses some other uncommon kernel facility.
Can you see the leak via other tasks as well? Such as running this
infinite loop of 'ls' commands started by bash:
while :; do /bin/bash -c ls > /dev/null; done
Ingo
Created attachment 15161 [details]
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
(In reply to comment #22)
> What makes me suspicious about VBox is that it uses kernel
> modifications. (now you unloaded it - but maybe it loads/unloads a
> module itself?) Or maybe it uses some other uncommon kernel facility.
All tests were performed without loading the vboxdrv kernel module (moved it out of the /lib/modules/* hierarchy such that it was not found during boot). Furthermore, the VBoxManage process does not even try to access /dev/vboxdrv -- it reads in a few XML files and prints a summary.
Do we have any way of reproducing this with something less cumbersome than Virtualbox?
> Created an attachment (id=15161)
> --> (http://bugzilla.kernel.org/attachment.cgi?id=15161&action=view)
> LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage
> list vms >/dev/null
thanks. Here are all the non-library files it opens:
/dev/null
/dev/urandom
/etc/gre.conf
/etc/gre.d/
/etc/nsswitch.conf
/etc/passwd
/proc/bus/usb/devices
/root/.gre.config
/root/.VirtualBox/compreg.dat
/root/.VirtualBox/VirtualBox.xml
/root/.VirtualBox/xpti.dat
/tmp/.vbox-root-ipc/lock
/usr/share/locale/locale.alias
the one that is somewhat unusual is /proc/bus/usb/devices. Does this
loop:
while :; do cat /proc/bus/usb/devices; done
show the leak as well perhaps?
Ingo
Updates:
- Leak happens also with 'while true; do cat /proc/bus/usb/devices; done' (leaks between 100 KB/s and 1000 KB/s on my system, rate is variable). Note: usbfs was not mounted during the tests with the VBoxManage command, so if there is only a single cause, the leak can't be caused by usbfs.
- The shell command I mentioned in #0 for calculating free memory did not take into account SwapCached. The command below does:
while true
do
  echo -n "$(/bin/date +%s) "
  /usr/bin/awk </proc/meminfo '/^MemTotal/{t=$2}/^MemFree/{f=$2}/^Buffers:/{b=$2}/^Cached:/{c=$2}/^SwapCached:/{sc=$2}/^SReclaimable:/{sr=$2}'\
'END{print "Total: " t " KB, in use: " t-f-b-c-sc-sr ", free: " f+b+c+sc+sr " KB."}'
  sleep 10
done
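The leak rates quoted in this report (KB/s) can be derived from the log that the loop above prints. The following is only a rough sketch, assuming each log line has the form "<epoch seconds> Total: T KB, in use: U, free: F KB." as produced by the awk print statement; the function names parse_samples and leak_rate_kb_per_s are hypothetical, and the rate is a plain first-to-last difference rather than a fitted slope.

```python
import re

# Assumed log format, taken from the awk print statement in the comment above:
#   <epoch> Total: <t> KB, in use: <u>, free: <f> KB.
SAMPLE = re.compile(
    r"^(\d+) Total: (\d+) KB, in use: (-?\d+), free: (-?\d+) KB\.$"
)


def parse_samples(lines):
    """Return (timestamp, in_use_kb) pairs, skipping lines that do not match."""
    samples = []
    for line in lines:
        match = SAMPLE.match(line.strip())
        if match:
            samples.append((int(match.group(1)), int(match.group(3))))
    return samples


def leak_rate_kb_per_s(samples):
    """Average growth of 'in use' between the first and the last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_first, used_first), (t_last, used_last) = samples[0], samples[-1]
    if t_last == t_first:
        raise ValueError("samples span zero seconds")
    return (used_last - used_first) / (t_last - t_first)
```

A positive result that stays positive over a long run suggests growth; a rate that drops to zero once a plateau is reached matches the "increases up to a certain limit" behavior described earlier in this report.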
> Updates:
> - Leak happens also with 'while true; do cat /proc/bus/usb/devices;
> done' (leaks between 100 KB/s and 1000 KB/s on my system, rate is
> variable). Note: usbfs was not mounted during the tests with the
> VBoxManage command, so if there is only a single cause, the leak can't
> be caused by usbfs.
ok, that's good progress - if it's really lost memory and not some
natural shift away from pagecache (which is not a real leak) then this
is a _massive_ leak, and in that case i very much think it's related to
/proc or /proc/bus/usb/devices.
note that despite usbfs not mounted, your strace indicates an active USB
subsystem:
[pid 16086] read(16, "\nT: Bus=04 Lev=00 Prnt=00 Port="..., 1024) = 1024
[pid 16086] read(16, ".6.24.3 uhci_hcd\nS: Product=UHC"..., 1024) = 1024
[pid 16086] read(16, "hub\nE: Ad=81(I) Atr=03(Int.) Mx"..., 1024) = 518
does the leak occur if you cat something more common in /proc, say:
while :; do cat /proc/cpuinfo >/dev/null; done
and does a simple loop of shells which i suggested before show the leak:
while :; do bash -c /bin/ls >/dev/null; done
?
Ingo
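For anyone who wants to repeat the /proc read test with a fixed iteration count instead of an endless shell loop, here is a minimal sketch. It is not part of the original report: the helper names read_repeatedly and meminfo_field_kb are hypothetical, and looking at MemFree alone is a crude first check, since it ignores page cache and the other reclaimable fields discussed in this thread.

```python
def read_repeatedly(path, iterations):
    """Open, read and close `path` `iterations` times; return total bytes read."""
    total = 0
    for _ in range(iterations):
        with open(path, "rb") as handle:
            total += len(handle.read())
    return total


def meminfo_field_kb(field, meminfo_path="/proc/meminfo"):
    """Return the value (in kB) of one field, e.g. "MemFree", from a meminfo file."""
    with open(meminfo_path) as handle:
        for line in handle:
            name, _, rest = line.partition(":")
            if name.strip() == field:
                return int(rest.split()[0])
    raise KeyError(field)
```

One possible use: record meminfo_field_kb("MemFree"), call read_repeatedly("/proc/cpuinfo", 100000), then read MemFree again and compare the two values.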
Created attachment 15171 [details]
LD_LIBRARY_PATH=/usr/lib/virtualbox strace -f /usr/lib/virtualbox/VBoxManage list vms >/dev/null
Collected the output now in single-user mode instead of multi-user mode.
(In reply to comment #28)
> note that despite usbfs not mounted, your strace indicates an active USB
> subsystem:
>
> [pid 16086] read(16, "\nT: Bus=04 Lev=00 Prnt=00 Port="..., 1024) = 1024
> [pid 16086] read(16, ".6.24.3 uhci_hcd\nS: Product=UHC"..., 1024) = 1024
> [pid 16086] read(16, "hub\nE: Ad=81(I) Atr=03(Int.) Mx"..., 1024) = 518
That's because I made a mistake: I ran all tests in single-user mode, except the strace command for collecting the VBoxManage output. By this time I have rerun the strace command in single-user mode and I have replaced the attachment. From the single-user mode strace output:
[pid 7702] open("/proc/bus/usb/devices", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
> does the leak occur if you cat something more common in /proc, say:
> while :; do cat /proc/cpuinfo >/dev/null; done
I can trigger the leak via /proc/bus/usb/devices, /proc/cpuinfo and /proc/cmdline (these are the only ones I tried).
> and does a simple loop of shells which i suggested before show the leak:
> while :; do bash -c /bin/ls >/dev/null; done
The above command did not trigger the leak.
Created attachment 15180 [details]
patch to display quicklists in /proc/meminfo
Ok, I figured out what's going on. It's not a memory leak, it's an accounting problem. x86 32 bit uses quicklists. Quicklists keep freed pages cached, up to a limit which depends on the size of available memory, and those pages are not reported as free in /proc/meminfo.
Can you please apply the test patch and check whether the leak goes away when you add the new entry to your awk script. It looks like:
QuickLists: 21376 kB
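To illustrate the check being asked for here, the sketch below recomputes "in use" memory the same way as the reporter's awk script, with the QuickLists entry subtracted as well. It is an illustration only: the QuickLists field exists only on kernels that carry the test patch from this report, so the code treats a missing field as 0, and the names parse_meminfo and in_use_kb are hypothetical.

```python
# Fields the awk script in this report treats as "not really in use",
# plus the QuickLists entry added by the test patch.
NOT_IN_USE = (
    "MemFree", "Buffers", "Cached", "SwapCached", "SReclaimable", "QuickLists",
)


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def in_use_kb(fields):
    """MemTotal minus everything counted as free, cached or on a quicklist."""
    return fields["MemTotal"] - sum(fields.get(name, 0) for name in NOT_IN_USE)
```

If the apparent leak is only quicklist accounting, the value returned by in_use_kb should stay roughly flat during the test loop once QuickLists is included, while the figure computed without it keeps climbing toward its plateau.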
This issue has been discussed further on the LKML: see also http://lkml.org/lkml/2008/3/9/19
Retested with kernel version 2.6.25-rc6-00333-ga4083c9-dirty: memory usage is constant with this kernel when running the above test, which means the leak is fixed in this kernel.
Thanks for testing. I'm closing the bug.
Thanks,
tglx