Possible duplicates: bug 65101 (although I don't have BFQ), bug 74761 (although there is no btrfs-level RAID here). On my laptop, there are two SSDs (/dev/sda and /dev/sdb) in an ISL RAID (driven by mdraid as RAID0). On one partition of the RAID, there is a LUKS volume. That volume, for historical reasons, is used as an LVM PV, and some of the logical volumes inside are formatted as btrfs. This includes the 12GiB logical volume used for /usr/portage shared between multiple Gentoo installations on this laptop. That particular filesystem was created with mixed data+metadata, and currently btrfs fi df /usr/portage shows: System, single: total=4.00MiB, used=4.00KiB Data+Metadata, single: total=12.00GiB, used=11.71GiB GlobalReserve, single: total=240,00MiB, used=0.00 The mount options in /etc/fstab are: compress=zlib,ssd,autodefrag,inode_cache After some multiprocess activity on /usr/portage (as demonstrated by "emerge -av --newuse --deep --update @world"), some kernel thread named "kworker/u16:2" (the number after the colon is different each time) creates huge (70 MB/s read and 70 MB/s write) SSD activity, and uses 15% of the CPU. It does not stop when I stop the "emerge" command, and stops only when I unmount the filesystem. I am afraid that this can physically destroy the SSDs if left unattended. After "echo t > /proc/sysrq-trigger", I have: [ 925.042814] kworker/u16:2 D ffffffff816ee2d0 0 2256 2 0x00000000 [ 925.042819] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [ 925.042821] ffff88024f20b890 0000000000000046 ffff880084f9c260 ffff88024f20bfd8 [ 925.042824] 0000000000014bc0 0000000000014bc0 ffff8802541c8000 ffff880084f9c260 [ 925.042827] ffff88024f20b878 ffffffff810a90eb 0000000000000046 0000000000000004 [ 925.042830] Call Trace: [ 925.042833] [<ffffffff810a90eb>] ? __lock_acquire+0x37b/0x2020 [ 925.042835] [<ffffffff816ee2d0>] ? bit_wait+0x40/0x40 [ 925.042838] [<ffffffff810a884a>] ? mark_held_locks+0x6a/0x90 [ 925.042841] [<ffffffff810d2945>] ? ktime_get+0x105/0x140 [ 925.042844] [<ffffffff810a896d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 925.042847] [<ffffffff8101db59>] ? read_tsc+0x9/0x10 [ 925.042849] [<ffffffff816ee2d0>] ? bit_wait+0x40/0x40 [ 925.042852] [<ffffffff816eda94>] schedule+0x24/0x70 [ 925.042854] [<ffffffff816edd6a>] io_schedule+0x8a/0xd0 [ 925.042857] [<ffffffff816ee2f6>] bit_wait_io+0x26/0x40 [ 925.042859] [<ffffffff816ee0bd>] __wait_on_bit+0x5d/0x90 [ 925.042862] [<ffffffff81140f7f>] wait_on_page_bit+0xbf/0xd0 [ 925.042865] [<ffffffff8109ed30>] ? autoremove_wake_function+0x40/0x40 [ 925.042867] [<ffffffff811410f2>] filemap_fdatawait_range+0x102/0x1a0 [ 925.042871] [<ffffffff812f6187>] btrfs_wait_ordered_range+0x67/0x130 [ 925.042875] [<ffffffff8131ea1d>] __btrfs_write_out_cache+0x35d/0x420 [ 925.042878] [<ffffffff8131edf4>] btrfs_write_out_cache+0x94/0x100 [ 925.042882] [<ffffffff812cc8d3>] btrfs_write_dirty_block_groups+0x503/0x6e0 [ 925.042885] [<ffffffff812da834>] commit_cowonly_roots+0x174/0x230 [ 925.042887] [<ffffffff812dc874>] btrfs_commit_transaction+0x3f4/0x950 [ 925.042890] [<ffffffff812c467d>] flush_space+0x8d/0x4c0 [ 925.042893] [<ffffffff816f2b1c>] ? _raw_spin_unlock+0x2c/0x50 [ 925.042896] [<ffffffff812c4bdc>] btrfs_async_reclaim_metadata_space+0x12c/0x1b0 [ 925.042898] [<ffffffff8107badf>] process_one_work+0x19f/0x4a0 [ 925.042900] [<ffffffff8107ba74>] ? process_one_work+0x134/0x4a0 [ 925.042903] [<ffffffff8107c293>] worker_thread+0x113/0x480 [ 925.042905] [<ffffffff810a8a3d>] ? trace_hardirqs_on+0xd/0x10 [ 925.042908] [<ffffffff8107c180>] ? init_pwq+0xb0/0xb0 [ 925.042910] [<ffffffff81080d2f>] kthread+0xdf/0x100 [ 925.042913] [<ffffffff81080c50>] ? kthread_create_on_node+0x1f0/0x1f0 [ 925.042916] [<ffffffff816f383c>] ret_from_fork+0x7c/0xb0 [ 925.042919] [<ffffffff81080c50>] ? kthread_create_on_node+0x1f0/0x1f0 I have created a disk image with dd. You can download 12 1G-sized parts of it here, including all the data: http://82.193.153.141/btrfs/ The write-hog is readily reproducible by loop-mounting that image and copying some data inside there: mount -t btrfs -o compress=zlib,ssd,autodefrag,inode_cache /dev/loop0 /mnt/btrfs cp inkscape-0.48.5.tar.bz2 /mnt/btrfs/
TL;DR: Had a very similar if not the same problem with 3.15.8 (GrSecurity patch applied); vanilla 3.18.0-rc5 fixes the problem with intensive writes persisting a long time after accessing files; but accessing the file system with `rsync` does still unproportional massive writes. (In reply to Alexander E. Patrakov from comment #0) > of the logical volumes inside are formatted as btrfs. This includes the > 12GiB logical volume used for /usr/portage shared between multiple Gentoo > installations on this laptop. Same here: Gentoo package tree mounted at /usr/portage with btrfs, but actually only around 600 MiB in size. I thought, it couldn't hurt to test the file system there. Turns out, I was wrong. :) > That particular filesystem was created with mixed data+metadata, and > currently btrfs fi df /usr/portage shows: At my end it shows # btrfs fi df /usr/portage System, single: total=4.00MiB, used=4.00KiB Data+Metadata, single: total=596.00MiB, used=572.62MiB unknown, single: total=12.00MiB, used=0.00 > The mount options in /etc/fstab are: mount options are: "defaults" > After some multiprocess activity on /usr/portage (as demonstrated by "emerge > -av --newuse --deep --update @world"), some kernel thread named similar scenario: Doing "cave sync" -- an alternative to "emerge sync" > "kworker/u16:2" (the number after the colon is different each time) creates > huge (70 MB/s read and 70 MB/s write) SSD activity, and uses 15% of the CPU. Can confirm (was between 30 and 70 MB/s). > It does not stop when I stop the "emerge" command, and stops only when I > unmount the filesystem. Can confirm that, too. > I am afraid that this can physically destroy the > SSDs if left unattended. Actually, I can confirm that it's a pretty sure bet, it will trash your SSD. Here my two log entries of my `smartd` | Oct 30 14:22:15 laptop03 smartd[2192]: Device: /dev/sda [SAT], SMART | Prefailure Attribute: 177 Wear_Leveling_Count changed from 97 to 96 | [...] | Nov 13 11:58:25 laptop03 smartd[2191]: Device: /dev/sda [SAT], SMART | Prefailure Attribute: 177 Wear_Leveling_Count changed from 79 to 78 My SSD degraded by more that 15 of 100 points in only two weeks, after being used for over a year previously. 15 points could be compared 3 to 6 years of usage, assuming the number represents degradation linearly; which it probably does. I am an idiot for not realizing the problem earlier, but I started to investigate the problem two weeks after I saw it happen for the first time. When I saw it during this two weeks, I just rebooted the system after I could not figure out quickly why kworker was accessing my disk so much and deferred investigation to a later time. I think it took the biggest hits, when a daily maintenance script was running during the night which triggered the bug. I cannot remember that I restarted the system that often, so the intensive writing might have stopped after some time. > I have created a disk image with dd. You can download 12 1G-sized parts of > it here, including all the data: I have an image which is about 600 MB uncompressed and around 120 MB compressed with `xz`. You can get it compressed here for the next 30 days: http://www.filedropper.com/usrportagebtrfs sha512sum: 8e364c68a78dab0fada41f7e90d0acaa74cd660561103d1b39fea3c6040c291ec7fe5e6368de5e891cfc6e1d0bd4b659b45ff23fddefa9f1941a8a96412f9e79 Using the latest release candidate 3.18.0-rc5 the intensive writes after accessing the file system disappear and no unmount is required anymore. But there are still intensive writes during the rsync process; sample output of `iotop -o` during `rsync`: Total DISK READ : 5.63 M/s | Total DISK WRITE : 108.85 M/s Actual DISK READ: 5.61 M/s | Actual DISK WRITE: 108.93 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 945320 be/4 root 5.41 M/s 3.75 K/s 0.00 % 4.08 % rsync --recursive --links --safe~rg/gentoo-portage/ /usr/portage/ 944977 be/4 root 176.28 K/s 76.52 M/s 0.00 % 0.28 % [kworker/u16:4] 945311 be/4 root 26.25 K/s 24.05 M/s 0.00 % 0.05 % [kworker/u16:5] 944982 be/4 root 22.50 K/s 5.97 M/s 0.00 % 0.04 % [kworker/u16:10] 945321 be/4 root 0.00 B/s 1361.46 K/s 0.00 % 0.02 % rsync --recursive --links --safe~rg/gentoo-portage/ /usr/portage/ 943285 be/4 root 7.50 K/s 975.15 K/s 0.00 % 0.01 % [kworker/u16:0] 945309 be/4 root 0.00 B/s 26.25 K/s 0.00 % 0.00 % [kworker/u16:1] The rsync process should not need to write that much data on a pretty recent Gentoo package tree. The executed rsync command is: rsync --recursive --links --safe-links --perms --times --compress --force --whole-file --delet--delete-delay --stats --timeout=180 --filter 'merge /usr/share/paludis/traditional.exclude' --exclude=/.cache --progress rsync://rsync.de.gentoo.org/gentoo-portage/ /usr/portage/ You can replace rsync://rsync.de.gentoo.org/gentoo-portage/ by rsync://rsync.us.gentoo.org/gentoo-portage/ if you are not in Europe, but in the US; everywhere else you might wanna check on http://www.gentoo.org/main/en/mirrors-rsync.xml I copied the data from the corrupted btrfs file system to a fresh one and the massive writes during access disappear, too. So it seems there is actually some corruption in the metadata of the file system, which triggers the massive writes. The corruption must have been introduced by either 3.16.3, 3.15.10 or 3.15.8 (all three from the Gentoo sys-kernel/hardened-sources package). I don't have the logs to retrace which one was running when the problem occured for the first time -- they were already recycled.
I have tried to download the new image in order to rehost it. However, it is truncated to 112 MB and the SHA512SUM doesn't match. Please reupload.
I just downloaded the file and compared it to a fresh `dd | xz` I just made. They have the same SHA512 and size. Obviously, the SHA512 I submitted was wrong. And it could well be about 112 MiB as well as ~ 120 MB -- I rounded a bit liberally. :) file size: 117804208 Bytes (~ 112.35 MiB) sha512sum: 152f470b341f989e40060c902fd6c360cac33bc47afbe93f226f0f3ca2fabccd55fc8a81d7646c15f1fabed0ee1529553529f514c6fa5f72be77018b46a55de0 Seems like I screwed up the SHA512, sorry for that. I hope you can confirm this SHA512 on your side. Thanks for the mirror.
Mirrored as http://82.193.153.141/btrfs/burger-usr.portage.btrfs.xz Note: later on this week I will be replacing the router with a different model, but after that, typical availability guarantees will apply.
I found myself in the same situation, with 60Mb/sec activity for hours on f21 (kernel 3.17.3-300.fc21.x86_64) with a Samsung SSD 840 PRO. My solution was to reboot with 3.12. At this point there was no IO activy, but I had issues with IO, related to full disk (I was ~75% in both Data and Metadata). I ran a "btrfs balance start /" (without dusage, which didn't help), and finally rebooted under 3.17, problem is gone for the time being.
today, same issue was in my laptop (kernel: 3.17.4-1-ARCH). many hours. then... I was run "btrfs balance start /" and this problem is gone (without reboot).
Fixed in 3.18, but not in any of the stable kernels. The fix is: commit 25ce459c1af138f95a3fd318461193397ebb825b Author: Liu Bo <bo.li.liu@oracle.com> Date: Wed Sep 10 12:58:50 2014 +0800 Btrfs: fix loop writing of async reclaim One of my tests shows that when we really don't have space to reclaim via flush_space and also run out of space, this async reclaim work loops on adding itself into the workqueue and keeps writing something to disk according to iostat's results, and these writes mainly comes from commit_transaction which writes super_block. This's unacceptable as it can be bad to disks, especially memeory storages. This adds a check to avoid the above situation. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> I will now poke the stable maintainers about it.
memeory/memory