Bug 151621

Summary: umount takes a lot of time and CPU while freeing pagecache pages
Product: File System
Reporter: elvis has left the building (icanrealizeum+bugzillakernelorg)
Component: btrfs
Assignee: Josef Bacik (josef)
Severity: normal
CC: icanrealizeum+bugzillakernelorg
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: <=4.7.0-g0cbbc42
Subsystem:
Regression: No
Bisected commit-id:
Attachments: this script will run on reboot/shutdown(by systemd) if placed in /usr/lib/systemd/system-shutdown/debug.sh
I'm including the old .config and the diff to now.

Description elvis has left the building 2016-08-06 16:58:43 UTC
On a 16GB RAM laptop, the number of pagecache pages can reach about 2.3 million. During reboot/shutdown, umount (the umount2 syscall, more specifically deactivate_super(mnt->mnt.mnt_sb); in the cleanup_mnt function in fs/namespace.c) takes about 6 seconds per 100k pagecache pages. It appears to be freeing them: sysrq+m shows the pagecache count decreasing and free memory increasing as time goes by during "Unmounting /oldroot".

(I tried posting on linux-btrfs mailing list but some current issues with it prevented the mail getting through)

Here's a screenshot of the stacktrace during "Unmounting /oldroot", which appears to be hung but is actually doing work: I remember it using 100% CPU (i.e. 1 core) for however many minutes it takes until the pagecache page count drops to about 771.


Note that there are 0 dirty pages(due to me doing "sync" prior to that), so this isn't about flushing stuff to disk.

This issue has likely been present for 6+ months. (I can't really remember when I first saw it appear as a regression.)

Let me know if I can give more info or test any patches - more than happy to :)
Thank you and have a great day!

PS: your current pagecache pages?
echo m|sudo tee /proc/sysrq-trigger >/dev/null ; dmesg|grep -F "pagecache pages"|tail -1

mine: [ 1713.668919] 299366 total pagecache pages
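
For anyone scripting this check, the count can be pulled out of the dmesg line with a small helper. This is only a sketch: `pagecache_count` is a name made up here, and it assumes the line format shown above, where the count sits three fields before the end of the line.

```shell
# Hypothetical helper (not part of any tool): print the page count from
# every "total pagecache pages" line read on stdin.
pagecache_count() {
    awk '/total pagecache pages/ { print $(NF-3) }'
}

# Usage with the line quoted above:
echo '[ 1713.668919] 299366 total pagecache pages' | pagecache_count
```

On a live system the input would come from dmesg after the sysrq trigger above, e.g. `dmesg | pagecache_count | tail -1`.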
Comment 1 elvis has left the building 2016-08-06 20:10:41 UTC
Found a manual way to free pagecache pages while system is running, thanks to this article: https://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system/87909#87909

time echo 1 | sudo tee /proc/sys/vm/drop_caches >/dev/null

it goes down to about 100k:
[ 1046.678217] 108752 total pagecache pages

So that means reboot/shutdown will take about 6 seconds now.
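
That estimate is just the page count scaled by the roughly 6 seconds per 100k pages observed during "Unmounting /oldroot". A rough sketch of the arithmetic, where `umount_eta` is a hypothetical helper name and the 6-second default is the observation from this report, not a kernel constant:

```shell
# Hypothetical helper: estimated unmount time in seconds.
#   $1 = pagecache pages still cached
#   $2 = seconds per 100k pages (assumption: defaults to the ~6 s seen here)
umount_eta() {
    awk -v pages="$1" -v per100k="${2:-6}" \
        'BEGIN { printf "%.1f\n", pages * per100k / 100000 }'
}

umount_eta 108752     # count left after drop_caches
umount_eta 2300000    # the ~2.3 million pages from the description
```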
Comment 2 elvis has left the building 2016-08-06 21:03:09 UTC
Looks like it's way faster to free pagecache pages while the system is running (52k/sec) than during "Unmounting /oldroot" at reboot/shutdown (approx. 16k/sec).

I can get 2mil pagecache pages by recompiling kernel.

$ freepagecachepages 
[sudo] password for z: 
Sync-ing first:

real	0m6.472s
user	0m0.001s
sys	0m0.284s
Freeing (some) pagecache pages:

real	0m35.826s
user	0m0.009s
sys	0m35.638s
pagecache pages, Before: [ 3975.849231] 2001202 total pagecache pages
pagecache pages, After : [ 4018.580085] 127237 total pagecache pages

and tee is the one using 100% CPU(aka 1 core) during this:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
  829 root      20   0    6124    696    632 R  99.7  0.0   0:20.33 tee
Comment 3 elvis has left the building 2016-08-06 22:03:38 UTC
Created attachment 227831
this script will run on reboot/shutdown(by systemd) if placed in /usr/lib/systemd/system-shutdown/debug.sh

I've placed the workaround above in this script that systemd runs at reboot.
This gets me from
[ 1031.580394] 2020719 total pagecache pages
[ 1070.743181] 1589 total pagecache pages
in 39sec
and then there's some extra 10 seconds of waiting during Unmounting /oldroot

That's like 51640 pagecache pages freed per second.
Seems better than what deactivate_super function is capable of doing.

I'll test next to see how fast deactivate_super can do it, to be sure it's actually faster.
(Recompiling the kernel WITH ccache is how I get the 2 million - I forgot to mention that.)
Comment 4 elvis has left the building 2016-08-06 22:31:03 UTC
Ok, allowing deactivate_super to do it(see the two XXX lines in debug.sh above):

[ 1678.711758] 2033917 total pagecache pages
771 (seen with my own eyes, not possible to log this)
took 91 seconds
(from 1681 to 1772)

That means deactivate_super frees 22342 pagecache pages per second, so practically half the speed it could?!
For whatever reason. (I wouldn't know why.)

Regarding my previous comment, I wonder if I should:
To free pagecache, dentries and inodes:
# echo 3 > /proc/sys/vm/drop_caches
instead of echo 1 which I was doing - maybe it'd shave those 10 extra seconds.

eh, can't hurt.
Comment 5 elvis has left the building 2016-08-06 22:51:57 UTC
that echo 3 was good! there were no extra seconds wasted by deactivate_super after manually freeing pagecache pages via:
# echo 3 > /proc/sys/vm/drop_caches
however this took 49.4 seconds
[  704.528833] 2051485 total pagecache pages
[  753.949448] 1587 total pagecache pages
which means: 41479/sec
so, 10k/sec less speed.
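
The per-second figure comes from dividing the drop in page count by the difference between the two dmesg timestamps. A minimal sketch of that calculation, assuming the dmesg line format shown above; `pagecache_rate` is a hypothetical helper name, not an existing command:

```shell
# Hypothetical helper: pages freed per second between two dmesg lines.
#   $1 = "before" line, $2 = "after" line, both in the format shown above
pagecache_rate() {
    printf '%s\n%s\n' "$1" "$2" |
        tr -d '[]' |    # strip the brackets around the timestamp
        awk '{ t[NR] = $1; n[NR] = $2 }   # timestamp, page count
             END { printf "%.0f\n", (n[1] - n[2]) / (t[2] - t[1]) }'
}

pagecache_rate '[  704.528833] 2051485 total pagecache pages' \
               '[  753.949448] 1587 total pagecache pages'
```

With the two lines from this comment it reproduces the 41479/sec figure.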

Still good though! basically half the time it took before :)

So that's all for now. Any ideas, let me know, please! Thanks!
Comment 6 elvis has left the building 2016-08-29 16:08:28 UTC
Created attachment 231171
I'm including the old .config and the diff to now.

Hey, so great news... I've tweaked my old kernel .config and I've now achieved normal speeds!

previously:    55,824 pagecache pages per second
now       : 1,150,415 pagecache pages per second