"btrfs send" sometimes produces a dump stream that "btrfs receive" is unable to apply, leading to errors like:

    ERROR: rename nixpkgs/pkgs/applications/version-management/subversion-1.2.x/default.nix -> o2979-81788-0 failed. No such file or directory

This seems to be triggered when large portions of a filesystem tree get deleted and replaced with files/dirs with the same names (jumping around in a git tree). I captured a faulty dump that shows this issue.

What I did to trigger the error:
- create a new subvolume
- clone a git repo in there (I used https://github.com/NixOS/nixpkgs)
- snapshot, send, receive on another machine
- git checkout a very old revision
- git checkout HEAD (so now we end up with the same files/content, but different inodes, as git has replaced everything with itself)
- snapshot, send (to a file, https://bluescreen303.nl/btrfs-send-error.dump.xz)
- receive on the other machine, which gives the above error

Judging from the dump (trying to read it with a hex editor), it seems the file "nixpkgs/pkgs/applications/version-management/subversion-1.2.x/default.nix" is treated as missing from the parent subvolume, but it is there. It seems an earlier instruction in the stream deletes or moves it but still expects it to exist afterwards.
I tried to attach the dump, but I get an nginx error "request entity too large". It is 2.2 MB, so it should be fine (the page states the limit is 5 MB).
Ok, so this is what I did:

    #!/bin/bash
    mkfs.btrfs -f /dev/sdb
    mount /dev/sdb /mnt/btrfs-test
    btrfs subvol create /mnt/btrfs-test/send
    btrfs subvol create /mnt/btrfs-test/receive
    cd /mnt/btrfs-test/send
    git clone https://github.com/NixOS/nixpkgs.git
    cd nixpkgs
    btrfs subvol snap -r /mnt/btrfs-test/send/ /mnt/btrfs-test/snap1
    btrfs send -f /mnt/btrfs-test/send1.dump /mnt/btrfs-test/snap1/
    mkfs.btrfs -f /dev/sdc
    mount /dev/sdc /mnt/scratch
    btrfs receive -f /mnt/btrfs-test/send1.dump /mnt/scratch/
    git checkout 3538f7c54933f5a6b4641fb72bba5ffc87df1479
    git checkout master
    btrfs subvol snap -r /mnt/btrfs-test/send /mnt/btrfs-test/snap2
    btrfs send -f /mnt/btrfs-test/send2.dump /mnt/btrfs-test/snap2
    btrfs receive -f /mnt/btrfs-test/send2.dump /mnt/scratch/

and it is working fine for me on btrfs-next. Is there something not right with how I've tried to mimic your reproducer? Let me know if there's a mistake and I can adjust it to match your reproducer better. I also need to know your mount options; it may be something specific to them.
You don't use an incremental send the second time:

    btrfs send -f /mnt/btrfs-test/send2.dump /mnt/btrfs-test/snap2 -p /mnt/btrfs-test/snap1/

Also, this issue does not trigger every time, only once every 2 or 3 tries.

Mount options: compress=lzo,ssd,space_cache,inode_cache (ssd is auto-detected)
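Because the failure only shows up once every 2 or 3 tries, it may help to repeat the incremental round trip in a loop. Below is a minimal bash sketch, assuming the paths and the old nixpkgs revision from the reproducer above and that snap1 has already been received on the other side. `run`, `incremental_round` and `DRY_RUN` are hypothetical names, not part of btrfs-progs; by default the sketch only prints the commands it would run.

```shell
#!/bin/bash
# Repeat the incremental round trip from the reproducer (hypothetical helper).
# Paths and OLD_REV are taken from the reproducer; DRY_RUN=1 (the default)
# prints each command instead of executing it.

SRC=/mnt/btrfs-test/send   # source subvolume, contains the nixpkgs clone
TOP=/mnt/btrfs-test        # where the snapshots and dump files live
DEST=/mnt/scratch          # receiving filesystem
OLD_REV=3538f7c54933f5a6b4641fb72bba5ffc87df1479

run() {
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# incremental_round N: churn the git tree, snapshot it as snapN and apply it
# on DEST as an incremental stream with snap(N-1) as the parent.
# Stops at the first failing command and returns its status.
incremental_round() {
    local n=$1
    local prev=$((n - 1))
    run git -C "$SRC/nixpkgs" checkout -q "$OLD_REV" &&
    run git -C "$SRC/nixpkgs" checkout -q master &&
    run btrfs subvol snap -r "$SRC" "$TOP/snap$n" &&
    run btrfs send -f "$TOP/send$n.dump" -p "$TOP/snap$prev" "$TOP/snap$n" &&
    run btrfs receive -f "$TOP/send$n.dump" "$DEST"
}

# Example (real run, stop at the first failure):
#   for n in 2 3 4 5 6; do DRY_RUN=0 incremental_round "$n" || break; done
```

The `&&` chain keeps a failed snapshot or send from being followed by a receive of a stale dump file, so the loop stops right at the round that broke.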
And I'm on a fairly recent btrfs-progs: commit 650e656a8b9c1fbe4ec5cd8c48ae285b8abd3b69.
Ok I've reproduced, I'll see if I can figure out what's going wrong.
Fixed with the patch "[PATCH] Btrfs: check our parent dir when doing a compare send". Please re-open this bug if the patch does not fix your problem.
It seems to work. Thanks!
I'm sorry to report that there is probably still an edge case. Unfortunately, I haven't found a way to reliably hit the issue. I've been doing automatic send/receive backups every hour for 10 days, for 5 volumes, and have hit a problem only once so far. I will try to find a way to reproduce it, but perhaps you can think of an edge case you missed just by looking at the current code? The only possible clue I have: in the past I only (or mostly) got errors where a rename didn't work, but now I got:

    ERROR: unlink o131536-162708-0 failed. No such file or directory

Maybe the codepath for unlinking differs a bit from the one for rename?
I posted another send patch yesterday; could you add that to your kernel and see if it helps? These are the 3 send patches I've sent recently:

- Btrfs: check our parent dir when doing a compare send
- Btrfs: skip subvol entries when checking if we've created a dir already
- Btrfs: fix send issues related to inode number reuse

Make sure you have those 3 applied and let me know if you are still having issues. If you are, then btrfs receive -vv output would be helpful so I can see what is happening, as well as a general overview of your file system and how you are using send/receive, so I can figure out what is triggering this problem.
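One way to check which of the three fixes a kernel tree already contains is to compare commit subjects. A minimal sketch, assuming the patches were applied with `git am` (so each commit subject is the patch title without the "[PATCH]" prefix); `missing_patches` is a hypothetical helper, and a backport whose subject was reworded would be reported as missing.

```shell
#!/bin/bash
# The three send fixes listed above, by commit subject.
REQUIRED_PATCHES=(
    "Btrfs: check our parent dir when doing a compare send"
    "Btrfs: skip subvol entries when checking if we've created a dir already"
    "Btrfs: fix send issues related to inode number reuse"
)

# missing_patches: read one commit subject per line on stdin and print every
# required subject that does not appear as an exact line.
# Returns 1 if at least one is missing, 0 otherwise.
missing_patches() {
    local applied subject status=0
    applied=$(cat)
    for subject in "${REQUIRED_PATCHES[@]}"; do
        if ! grep -qxF -- "$subject" <<< "$applied"; then
            printf '%s\n' "$subject"
            status=1
        fi
    done
    return "$status"
}

# Example, run inside the kernel source tree:
#   git log --format=%s v3.10.7..HEAD | missing_patches
```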
Ah, sorry for the noise then. I only had the first one. I will apply the other 2 and reopen if I still have issues. Thanks.
FYI, on a pristine 3.10.7 kernel, I had to cherry-pick 924794c93690ff7d2909fd32e9d88282c700e224 ("Btrfs: cleanup unused arguments in send.c") from kernel/git/josef/btrfs-next.git before being able to apply the third patch.
Ah, thanks. After getting the failure I did not want to find out which other patches I needed, so I just rebased the entire tree (plus the third patch) on top of 3.10.7 :)
Ok, reopening.

I'm on 3.11 now, merged with btrfs/master @ 7a2c824f861d4b7be2ce1c92d2de3ff69cbd7aac (2013-08-30). I have btrfs receive -vv output as well, but it's quite large, so I am only including the tail (-n 30).

Filesystem overview: the btrfs root @ /mnt/btrfs contains the subvolumes
- /nixos-current
- /nix-store
- /home-current
- /VMs

These are mounted on /, /nix/store, /home and /home/mathijs/VMs respectively.

Mount options: compress=lzo,space_cache,inode_cache(,rw,relatime,ssd)

Every day I create a snapshot of each of them in (/mnt/btrfs)/_snaps_/[name]-[timestamp]. Then I use:

    btrfs send -v -p /mnt/btrfs/_snaps_/[name]-[previous-timestamp] /mnt/btrfs/_snaps_/[name]-[most-recent-timestamp] | ssh [remote] "btrfs receive -vv /backup/location > /tmp/last-receive.log 2>&1"

Usually this goes fine, but it still fails every now and then. Almost always the errors occur in frequently changing directories like git source trees, or (in this case) browser cache directories. I've seen this happen with Chromium in the past as well, but this time it's Conkeror (which uses Firefox's XULRunner).

Looking at the receive -vv output:
- line 14 creates an empty directory
- line 15 moves the 'Cache' directory out of the way
- line 16 has the empty directory (from line 14) replace 'Cache'
- line 29 wants to remove stuff from deep inside 'Cache' (which is now empty, so this fails)
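The daily routine above can be sketched in bash as follows. This is a minimal sketch, assuming snapshots are named [name]-[timestamp] where the timestamp starts with a digit and sorts lexically in chronological order (for example 20130830-0300); `last_two_snaps`, `send_cmd`, `REMOTE` and `BACKUP_DIR` are hypothetical placeholders. It prints the pipeline for review rather than running it.

```shell
#!/bin/bash
# Sketch of the daily incremental backup described above (hypothetical helpers).
# Assumes snapshots are named <name>-<timestamp>, where the timestamp starts
# with a digit and sorts lexically in chronological order.
# REMOTE and BACKUP_DIR are placeholders.

SNAPS=/mnt/btrfs/_snaps_
REMOTE=backup-host
BACKUP_DIR=/backup/location

# last_two_snaps DIR NAME: print the previous and the most recent snapshot of
# NAME found in DIR, oldest first. The "-[0-9]*" pattern keeps "home" from
# matching "home-current-...".
last_two_snaps() {
    local dir=$1 name=$2
    find "$dir" -mindepth 1 -maxdepth 1 -name "$name-[0-9]*" | sort | tail -n 2
}

# send_cmd NAME: print (not run) the send | ssh receive pipeline for NAME.
# The previous snapshot must already exist on the receiving side.
send_cmd() {
    local name=$1
    local -a snaps
    mapfile -t snaps < <(last_two_snaps "$SNAPS" "$name")
    if (( ${#snaps[@]} < 2 )); then
        echo "need at least two snapshots of $name in $SNAPS" >&2
        return 1
    fi
    printf 'btrfs send -v -p %s %s | ssh %s "btrfs receive -vv %s > /tmp/last-receive.log 2>&1"\n' \
        "${snaps[0]}" "${snaps[1]}" "$REMOTE" "$BACKUP_DIR"
}
```

Relying on lexical order keeps the helper independent of file timestamps; if the snapshot names use a format that does not sort chronologically, the pair it picks will be wrong.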
Created attachment 107611: btrfs receive -vv output
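The line-by-line reading of the receive -vv output above (an ancestor directory renamed away or replaced before the failing operation) can be partly automated. A minimal sketch, assuming the verbose log prints operations as "rename A -> B", "unlink P" and "rmdir P", errors as "ERROR: <op> <path> failed. ...", and that paths contain no spaces; `suspect_ops` is a hypothetical helper, and its output is only a starting point for reading the log by hand.

```shell
#!/bin/bash
# suspect_ops LOG (hypothetical helper): find the path named in the first
# "ERROR:" line of a `btrfs receive -vv` log and print, with log line numbers,
# the earlier rename/unlink/rmdir operations that moved, replaced or removed
# one of its ancestor directories.
# Assumes the log format described above and paths without spaces.
suspect_ops() {
    awk '
        /^ERROR: / { target = $3; exit }     # stop at the first error
        { line[NR] = $0 }                    # remember everything before it
        END {
            if (target == "") exit 1         # no error in this log
            for (i = 1; i in line; i++) {
                n = split(line[i], f, " ")
                if (f[1] != "rename" && f[1] != "unlink" && f[1] != "rmdir")
                    continue
                # is the source path an ancestor of the failing path?
                hit = (index(target, f[2] "/") == 1)
                # for renames, also check the destination ("A -> B")
                if (f[1] == "rename" && n >= 4 && index(target, f[4] "/") == 1)
                    hit = 1
                if (hit)
                    printf "%d: %s\n", i, line[i]
            }
        }
    ' "$1"
}

# Example:
#   suspect_ops /tmp/last-receive.log
```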
I hit the problem once again. I'm on 3.12.0 now (sending end). The receiving end is still on 3.11, but from what I understand, the sending end is responsible for the order of operations, right? Attaching the btrfs receive log.
Created attachment 116941: btrfs receive -vv output (sent from kernel 3.12.0)
Do I need to keep posting receive log outputs? I'm on 3.12.6 and keep getting these on a weekly basis. The main culprits seem to be Chromium and Firefox profile/cache data. Am I really the only one using this feature on a volume containing ~?
I also have both / and /home on btrfs, running a linux-git kernel, and I've been having the same problem. Incremental send results in errors such as this:

    ERROR: unlink var/db/paludis/repositories/alip/packages/dev-util/ketchup/ketchup-0.9.8-r1.exheres-0 failed. No such file or directory
Still having this problem with linux-3.14
I think I have hit the same issue. If you run the commands below, they error out with

    ERROR: rename dir/dir -> dir failed. No such file or directory

every time:

    truncate -s 100M raw_a
    truncate -s 100M raw_b
    losetup /dev/loop0 ./raw_a
    losetup /dev/loop1 ./raw_b
    mkfs.btrfs /dev/loop0
    mkfs.btrfs /dev/loop1
    mkdir mnt_a
    mkdir mnt_b
    mount -t btrfs /dev/loop0 mnt_a
    mount -t btrfs /dev/loop1 mnt_b
    btrfs subvolume create mnt_a/subvol
    mkdir mnt_a/snaps
    mkdir mnt_a/subvol/dir
    touch mnt_a/subvol/dir/dir_file1
    mkdir mnt_a/subvol/dir2
    touch mnt_a/subvol/dir2/dir2_file1
    btrfs subvolume snapshot -r mnt_a/subvol mnt_a/snaps/snap1
    mv mnt_a/subvol/dir mnt_a/subvol/dir.1
    mkdir mnt_a/subvol/dir
    mv mnt_a/subvol/dir.1 mnt_a/subvol/dir/dir
    mv mnt_a/subvol/dir2 mnt_a/subvol/dir/
    btrfs subvolume snapshot -r mnt_a/subvol mnt_a/snaps/snap2
    mkdir mnt_b/snaps
    btrfs send mnt_a/snaps/snap2 | btrfs receive mnt_b/snaps
    btrfs send -p mnt_a/snaps/snap2 mnt_a/snaps/snap1 | btrfs receive mnt_b/snaps
Created attachment 144131: Testcase to reproduce failed rename
Alex: I believe you are using the wrong syntax. You are supposed to send snap1 first and snap2 second; in that case snap1 is the parent of snap2:

    btrfs send mnt_a/snaps/snap1 | btrfs receive mnt_b/snaps
    btrfs send -p mnt_a/snaps/snap1 mnt_a/snaps/snap2 | btrfs receive mnt_b/snaps

I have not tried whether that makes a difference, though. I must say I haven't had issues since 3.15 myself, but I switched to weekly backups instead of nightly, so I cannot say I have tested things thoroughly yet.
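For a longer series of snapshots, the oldest-first ordering suggested above generalizes to: one full send, then each snapshot sent with its predecessor as parent. A minimal sketch that only prints the commands; `send_plan` is a hypothetical helper and assumes the snapshots are passed oldest first.

```shell
#!/bin/bash
# send_plan DEST SNAP...: print send | receive commands that transfer the given
# read-only snapshots (oldest first) to DEST, using each snapshot as the parent
# of the next one. Only the first transfer is a full send.
send_plan() {
    local dest=$1
    shift
    local prev="" snap
    for snap in "$@"; do
        if [[ -z "$prev" ]]; then
            printf 'btrfs send %s | btrfs receive %s\n' "$snap" "$dest"
        else
            printf 'btrfs send -p %s %s | btrfs receive %s\n' "$prev" "$snap" "$dest"
        fi
        prev=$snap
    done
}

# Example: reproduces the two commands above.
#   send_plan mnt_b/snaps mnt_a/snaps/snap1 mnt_a/snaps/snap2
```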
It shouldn't matter which one is the parent snapshot, as far as I am aware; you can transfer older snapshots with a newer parent. The parent is just a reference snapshot that exists on both ends of the link. I just did this to migrate to a new backup drive: I transferred the latest snapshot first and then filled in the earlier ones. However, I was unable to transfer snapshots from before a certain one where this sort of change was made.
Of course it matters. The nature of send/receive is that the transaction log is being replayed from a certain "version" up to the version that's being sent. This log cannot be run backwards (the "operation" that deletes a file does not contain the full contents of the file), so it will matter. I tried your example and indeed got an error. Then I switched the parameters as I suggested and everything worked fine. I'm using kernel 3.15.5 and btrfs tools 3.14.2 at the moment and haven't had an issue for some time, probably since 3.15. I'm closing this bug now, as my original issue seems fixed.