I got the following warning and switch to read-only during rsync activity on btrfs. The fs consists of 8 disks, RAID10 data, RAID1 metadata: root@fs0:~# btrfs fi df /mnt/b Data, RAID10: total=2.01TB, used=2.00TB System, RAID1: total=32.00MB, used=188.00KB Metadata, RAID1: total=12.00GB, used=10.53GB root@fs0:~# btrfs fi show /dev/sdg Label: 'BTR0' uuid: a319a7c8-cdef-407d-a215-140920d5af60 Total devices 8 FS bytes used 2.01TB devid 3 size 2.73TB used 530.00GB path /dev/sdg devid 7 size 2.73TB used 504.00GB path /dev/sdb devid 9 size 2.73TB used 504.00GB path /dev/sdd devid 6 size 2.73TB used 504.00GB path /dev/sda devid 8 size 2.73TB used 504.00GB path /dev/sdc devid 2 size 2.73TB used 531.03GB path /dev/sdf devid 5 size 2.73TB used 525.00GB path /dev/sde devid 4 size 2.73TB used 530.03GB path /dev/sdh Btrfs v0.20-rc1-324-g650e656 disks /dev/sd[a-d] were added just a few hours before and 'btrfs fi balance' has been run. [723039.753247] btrfs: relocating block group 1634365079552 flags 65 [723067.913330] btrfs: found 38314 extents [723168.305215] btrfs: found 38314 extents [723169.709185] btrfs: relocating block group 1632217595904 flags 65 [723205.366252] btrfs: found 23256 extents [723241.447709] ------------[ cut here ]------------ [723241.452487] WARNING: CPU: 2 PID: 10899 at fs/btrfs/extent-tree.c:871 btrfs_lookup_extent_info+0x287/0x302 [btrfs]() [723241.463259] Modules linked in: raid1 nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc xfs loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul iTCO_wdt iTCO_vendor_support glue_helper psmouse lpc_ich microcode i2c_i801 serio_raw pcspkr mfd_core i2c_core evdev ipmi_si ipmi_msghandler video acpi_cpufreq button processor btrfs xor raid6_pq crc32c libcrc32c usb_storage sg sd_mod hid_generic usbhid hid ahci libahci ehci_pci mpt2sas ehci_hcd raid_class libata e1000e fan usbcore ptp scsi_transport_sas usb_common pps_core thermal thermal_sys scsi_mod [723241.523504] CPU: 2 PID: 10899 Comm: btrfs Not tainted 3.12.0 #33 [723241.529650] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012 [723241.537837] 0000000000000000 0000000000000009 ffffffff81389e0b 0000000000000000 [723241.545562] ffffffff8103574a 0000000000000000 ffffffffa01dbf29 0000000000000000 [723241.553171] ffff8803d3588e20 0000000000000000 0000000000000000 0000000000000000 [723241.560780] Call Trace: [723241.563346] [<ffffffff81389e0b>] ? dump_stack+0x41/0x51 [723241.568772] [<ffffffff8103574a>] ? warn_slowpath_common+0x78/0x90 [723241.575091] [<ffffffffa01dbf29>] ? btrfs_lookup_extent_info+0x287/0x302 [btrfs] [723241.582754] [<ffffffffa01dbf29>] ? btrfs_lookup_extent_info+0x287/0x302 [btrfs] [723241.590283] [<ffffffffa01d16cc>] ? update_ref_for_cow+0x5d/0x280 [btrfs] [723241.597321] [<ffffffffa020952e>] ? write_extent_buffer+0x8e/0x106 [btrfs] [723241.604319] [<ffffffffa01d1a89>] ? __btrfs_cow_block+0x19a/0x382 [btrfs] [723241.611244] [<ffffffffa01d1dab>] ? btrfs_cow_block+0xe7/0x13b [btrfs] [723241.617903] [<ffffffffa01d493a>] ? btrfs_search_slot+0x248/0x682 [btrfs] [723241.624814] [<ffffffff81057c8d>] ? should_resched+0x5/0x23 [723241.630504] [<ffffffffa0228a51>] ? do_relocation+0xff/0x3f5 [btrfs] [723241.636971] [<ffffffff81057c8d>] ? should_resched+0x5/0x23 [723241.642673] [<ffffffffa0229e18>] ? relocate_tree_blocks+0x309/0x4b6 [btrfs] [723241.649844] [<ffffffffa01d119c>] ? btrfs_release_path+0x38/0x53 [btrfs] [723241.656673] [<ffffffffa022b33c>] ? relocate_block_group+0x237/0x4eb [btrfs] [723241.663847] [<ffffffffa022b733>] ? btrfs_relocate_block_group+0x143/0x26a [btrfs] [723241.671578] [<ffffffffa020bff9>] ? btrfs_relocate_chunk.isra.59+0x59/0x5fb [btrfs] [723241.679524] [<ffffffffa0218ffa>] ? btrfs_set_lock_blocking_rw+0xc9/0xdb [btrfs] [723241.687046] [<ffffffffa01cf51c>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs] [723241.694128] [<ffffffffa02013eb>] ? btrfs_get_token_64+0x76/0xc6 [btrfs] [723241.700947] [<ffffffffa0203e2b>] ? release_extent_buffer+0x90/0x97 [btrfs] [723241.708025] [<ffffffffa020ea7f>] ? btrfs_balance+0x980/0xb44 [btrfs] [723241.714586] [<ffffffffa02150d8>] ? btrfs_ioctl_balance+0x21a/0x296 [btrfs] [723241.721749] [<ffffffffa0217906>] ? btrfs_ioctl+0xf3d/0x1eab [btrfs] [723241.728221] [<ffffffff810df524>] ? handle_mm_fault+0x22c/0x8aa [723241.734268] [<ffffffff81111ab9>] ? path_openat+0x233/0x4c5 [723241.739957] [<ffffffff81390c33>] ? __do_page_fault+0x349/0x3d7 [723241.746006] [<ffffffff810e28b9>] ? vma_gap_callbacks_propagate+0x13/0x2c [723241.752920] [<ffffffff810e33f5>] ? vma_link+0x6e/0x8c [723241.758173] [<ffffffff81113186>] ? vfs_ioctl+0x1e/0x31 [723241.763517] [<ffffffff8111395c>] ? do_vfs_ioctl+0x3ad/0x3ef [723241.769290] [<ffffffff8105cbbf>] ? vtime_account_user+0x35/0x40 [723241.775413] [<ffffffff811139ec>] ? SyS_ioctl+0x4e/0x7e [723241.780754] [<ffffffff8139301b>] ? tracesys+0xdd/0xe2 [723241.786011] ---[ end trace 5763625844d51f70 ]--- [723241.790742] BTRFS error (device sdg) in update_ref_for_cow:863: errno=-30 Readonly filesystem [723241.799414] BTRFS info (device sdg): forced readonly [723241.804493] ------------[ cut here ]------------
Could you please run a btrfs check on your file-system with recent btrfs-progs? The ones displayed in your bug report are somewhat old (6 months being old by btrfs standard). I would advise you to use progs not older than November. If your distribution do not provide recent ones, you can follow those instructions: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories#Official_repository Then please report back the output of "btrfs check /dev/sda" (the fs has to be unmounted).
btrfsck output: checking extents Extent back ref already exists for 1579109920768 parent 1576533598208 root 0 Extent back ref already exists for 1581562929152 parent 1576533598208 root 0 Extent back ref already exists for 4679099793408 parent 1576533598208 root 0 Extent back ref already exists for 4679099813888 parent 1576533598208 root 0 Extent back ref already exists for 1579207237632 parent 1576533598208 root 0 Extent back ref already exists for 1579207303168 parent 1576533598208 root 0 Extent back ref already exists for 1579161665536 parent 1576533598208 root 0 Extent back ref already exists for 1579700260864 parent 1576533598208 root 0 [~249k similar lines omitted] ref mismatch on [1576349827072 4096] extent item 1, found 2 Incorrect global backref count on 1576349827072 found 1 wanted 2 backpointer mismatch on [1576349827072 4096] ref mismatch on [1576349941760 4096] extent item 1, found 2 Incorrect global backref count on 1576349941760 found 1 wanted 2 backpointer mismatch on [1576349941760 4096] ref mismatch on [1576349974528 4096] extent item 1, found 2 Incorrect global backref count on 1576349974528 found 1 wanted 2 backpointer mismatch on [1576349974528 4096] ref mismatch on [1576350085120 4096] extent item 1, found 2 Incorrect global backref count on 1576350085120 found 1 wanted 2 backpointer mismatch on [1576350085120 4096] ref mismatch on [1576350281728 4096] extent item 1, found 2 Incorrect global backref count on 1576350281728 found 1 wanted 2 backpointer mismatch on [1576350281728 4096] [~751k similar lines omitted] checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/sdg UUID: a319a7c8-cdef-407d-a215-140920d5af60 found 300811738651 bytes used err is 0 total csum bytes: 2169886676 total tree bytes: 11531677696 total fs tree bytes: 8164470784 total extent tree bytes: 899383296 btree space waste bytes: 2464195755 file data blocks allocated: 2397512151040 referenced 2397510746112 Btrfs v0.20-rc1-596-ge9ac73b Should I try the --repair option?
update_ref_for_cow does not expect to get 0 from btrfs_lookup_extent_info and turns the filesystem to read-only. Commit 74be9510876a66ad9826613ac8a526d26f9e7f01 "Btrfs: optimize btrfs_lookup_extent_info()" seems to touch the code
Response to Filipe's question on the ML: yes, the fs had skinny extents enabled. I've run 'btrfsck --repair' on that fs and it doesn't seem that it has helped much: after two days I tried to check it again and now the btrfsck output is almost 10M lines, some of the new ones are like ref mismatch on [1576349474816 4096] extent item 1, found 2 Backref 1576349474816 parent 1573 root 1573 not found in extent tree Backref 1576349474816 parent 1583175602176 not referenced back 0x777bebf0 backpointer mismatch on [1576349474816 4096] but most of them (~8M) are like Backref 1585929220096 root 1573 owner 4778267 offset 106496 num_refs 0 not found in extent tree Incorrect local backref count on 1585929220096 root 1573 owner 4778267 offset 106496 found 1 wanted 0 back 0xa29dafb0 Incorrect local backref count on 1585929220096 parent 4248626778112 owner 0 offset 0 found 0 wanted 1 back 0x216cd9d0 Backref disk bytenr does not match extent record, bytenr=1585929220096, ref bytenr=64 backpointer mismatch on [1585929220096 4096] and the otput ends with Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/sdb UUID: a319a7c8-cdef-407d-a215-140920d5af60 free space inode generation (0) did not match free space cache generation (40846) free space inode generation (0) did not match free space cache generation (40507) free space inode generation (0) did not match free space cache generation (40846) free space inode generation (0) did not match free space cache generation (37774) free space inode generation (0) did not match free space cache generation (40507) free space inode generation (0) did not match free space cache generation (40507) free space inode generation (0) did not match free space cache generation (40847) free space inode generation (0) did not match free space cache generation (40657) free space inode generation (0) did not match free space cache generation (40869) free space inode generation (0) did not match free space cache generation (40921) free space inode generation (0) did not match free space cache generation (40876) free space inode generation (0) did not match free space cache generation (40859) free space inode generation (0) did not match free space cache generation (40879) free space inode generation (0) did not match free space cache generation (40685) free space inode generation (0) did not match free space cache generation (40685) free space inode generation (0) did not match free space cache generation (40885) free space inode generation (0) did not match free space cache generation (40900) free space inode generation (0) did not match free space cache generation (40507) free space inode generation (0) did not match free space cache generation (38953) free space inode generation (0) did not match free space cache generation (40921) found 616910851356 bytes used err is 0 total csum bytes: 2189697104 total tree bytes: 11586048000 total fs tree bytes: 8170622976 total extent tree bytes: 920498176 btree space waste bytes: 2473197330 file data blocks allocated: 2388866162688 referenced 2388864757760 Btrfs v3.12
I've seen that fsck prior to 3.12 reports errors like Incorrect local backref count on 1585929220096 root 1573 owner 4778267 offset 106496 found 1 wanted 0 back 0xa29dafb0 that do not exist, but I see you've used the integration version or 3.12 proper. In connection with the skinny extents I don't think the patch identified in comment 3 is the the root cause, or at lest not the only one.
I'm able to reproduce something like the first btrfsck messages on a fresh filesystem on stock 3.12 kernel and 3.12 btrfs-progs: mkfs.btrfs -f -d raid1 -m raid1 -L BTR1 -O skinny-metadata /dev/sd[mno] mount LABEL=BTR1 /mnt/b1 -o noatime,nodiratime rsync -ax --specials --del --inplace /mnt/x1/home /mnt/b1 # day of fs activity on /mnt/x1/home, BTR1 untouched cd /mnt/b1 btrfs sub snaps -r home home_1 cd .. rsync -ax --specials --del --inplace /mnt/x1/home /mnt/b1 umount /mnt/b1 btrfsck /dev/sdm checking extents Extent back ref already exists for 1093533696 parent 0 root 569 Extent back ref already exists for 1093566464 parent 0 root 569 Extent back ref already exists for 1093599232 parent 0 root 569 [~43k lines omitted] ref mismatch on [29392896 16384] extent item 1, found 2 Incorrect global backref count on 29392896 found 1 wanted 2 backpointer mismatch on [29392896 16384] ref mismatch on [29458432 16384] extent item 1, found 2 Incorrect global backref count on 29458432 found 1 wanted 2 backpointer mismatch on [29458432 16384] [~170k lines omitted] Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/sdn UUID: 2ec4b5a1-a2c4-4687-9015-cf5b519297f3 found 220988564208 bytes used err is 0 total csum bytes: 1748031412 total tree bytes: 8836808704 total fs tree bytes: 6147653632 total extent tree bytes: 671137792 btree space waste bytes: 1125248357 file data blocks allocated: 1803477946368 referenced 1803477946368 Btrfs v3.12
My external hardrive with RAID1 configuration has this problem occurred twice. I've re-create the file system with RAID1 and after a few weeks. The problem "Errors found in extent allocation tree or chunk allocation" occur again.
Still reproducible on 3.13-rc7 on fresh fs with -d raid10 -m raid1 (procedure as above, only with a few more data modification, shapshot cycles). Got even a new kind of message from btrfsck, see lines after "checking fs roots"). Unable to reproduce it on -d raid1 fs. btrfsck output: checking extents Extent back ref already exists for 1041367040 parent 0 root 258 [~150k lines omitted] Extent back ref already exists for 444234022912 parent 0 root 258 Extent back ref already exists for 444234366976 parent 0 root 258 Extent back ref already exists for 444234874880 parent 0 root 258 ref mismatch on [29753344 16384] extent item 1, found 2 Incorrect global backref count on 29753344 found 1 wanted 2 backpointer mismatch on [29753344 16384] ref mismatch on [38338560 16384] extent item 1, found 2 Incorrect global backref count on 38338560 found 1 wanted 2 backpointer mismatch on [38338560 16384] [~1M lines omitted] Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots root 258 inode 849332 errors 100, file extent discount root 258 inode 3410411 errors 100, file extent discount root 258 inode 3410412 errors 100, file extent discount root 258 inode 3410413 errors 100, file extent discount root 258 inode 3410414 errors 100, file extent discount [~3k lines omitted] Checking filesystem on /dev/sda UUID: c43757c9-3063-4b80-89ba-64e1bff63519 found 77997139921 bytes used err is 1 total csum bytes: 737692240 total tree bytes: 5144248320 total fs tree bytes: 3975954432 total extent tree bytes: 317571072 btree space waste bytes: 691743734 file data blocks allocated: 863021318144 referenced 863021318144 Btrfs v3.12
These errors reported by btrfsck are somewhat transient: they often disappear after another mount + fs activity + umount. While copying a time series of snapshots from a different fs btrfs sub cre $DST/home for i in home_* ; do rsync -ax --specials --del --inplace $i/ $DST/home btrfs sub snap -r $DST/home $DST/$i umount $DST btrfsck $DEV mount $DST done out of 50 snapshots, btrfsck had output errors after 10 of them, it looks random (runs 11 12 19 20 22 32 37 42 48 50). The number of btrfsck errors reported in one run varied wildly, from 130 lines to >300k lines. kernel 3.13, btrfs-progs 3.12, fs 4*4TB -d raid10 -m raid1 -O skinny-metadata
I just reproduced the bug in 3.17. /dev/sdp was added to 4 disk "-m raid1 -d raid10 -O skinny-metadata" fs, then "btrfs balance start /mnt/b3" was run. There was some activity on the fs after the "balance start", but the fs was idle for several hours (except for the running balance) before hitting the BUG. The fs was created a week ago on 3.16.3 and and was clean according to "btrfs check" run just before adding the drive. After the BUG, it was impossible to umount, but after reboot, it checked clean as well. root@fs0:~# btrfs fi df /mnt/b3 Data, RAID10: total=2.34TiB, used=2.34TiB Data, single: total=8.00MiB, used=0.00 System, RAID1: total=8.00MiB, used=288.00KiB System, single: total=4.00MiB, used=0.00 Metadata, RAID1: total=15.00GiB, used=13.81GiB Metadata, single: total=8.00MiB, used=0.00 unknown, single: total=512.00MiB, used=15.88MiB root@fs0:~# btrfs fi show /mnt/b3 Label: 'BTR3' uuid: f181dd81-c219-44fc-b113-3a8cfd0d3295 Total devices 5 FS bytes used 2.35TiB devid 1 size 2.73TiB used 1.05TiB path /dev/sde devid 2 size 2.73TiB used 1.05TiB path /dev/sdf devid 3 size 2.73TiB used 1.05TiB path /dev/sdg devid 4 size 2.73TiB used 1.05TiB path /dev/sdh devid 5 size 3.64TiB used 523.00GiB path /dev/sdp Btrfs v3.16 [88026.168587] WARNING: CPU: 0 PID: 6486 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]() [88026.178969] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_help er iTCO_wdt iTCO_vendor_support cryptd lrw gf128mul glue_helper lpc_ich i2c_i801 psmouse i2c_core mfd_core pcspkr serio_raw evdev microcode ipmi_si battery ipmi_msghandler video tpm_tis tpm button acpi_cpufreq processor ie31200_edac edac_c ore btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid e1000e mpt2sas raid_class ptp crc32c_intel pps_core scsi_transport_sas thermal ahci libahci libata ehci_pci ehci_hcd fan thermal_sys usbcore scsi_mod usb_common [88026.242729] CPU: 0 PID: 6486 Comm: btrfs Not tainted 3.17.0 #10 [88026.248675] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012 [88026.256375] 0000000000000000 0000000000000009 ffffffff813a6a46 0000000000000000 [88026.263889] ffffffff8103b591 0000000000000008 ffffffffc01decc4 ffff880077352140 [88026.271398] ffff8800773520b0 0000000000000000 0000000000000000 ffff88035e829000 [88026.278919] Call Trace: [88026.281396] [<ffffffff813a6a46>] ? dump_stack+0x41/0x51 [88026.286729] [<ffffffff8103b591>] ? warn_slowpath_common+0x78/0x90 [88026.292939] [<ffffffffc01decc4>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs] [88026.300380] [<ffffffffc01decc4>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs] [88026.307815] [<ffffffffc01df38f>] ? walk_down_proc+0xaf/0x1e3 [btrfs] [88026.314283] [<ffffffff8110bc2a>] ? kmem_cache_alloc+0x91/0x104 [88026.320234] [<ffffffffc01e15b8>] ? walk_down_tree+0x40/0xa9 [btrfs] [88026.326627] [<ffffffffc01e3f3e>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs] [88026.333622] [<ffffffffc0231297>] ? merge_reloc_roots+0xf0/0x1ca [btrfs] [88026.340363] [<ffffffffc0231972>] ? relocate_block_group+0x445/0x4bd [btrfs] [88026.347442] [<ffffffffc0231b39>] ? btrfs_relocate_block_group+0x14f/0x267 [btrfs] [88026.355076] [<ffffffffc02106b7>] ? btrfs_relocate_chunk.isra.58+0x58/0x5e2 [btrfs] [88026.362775] [<ffffffffc01d1786>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs] [88026.369779] [<ffffffffc0205695>] ? btrfs_get_token_64+0x76/0xc6 [btrfs] [88026.376512] [<ffffffffc020c90b>] ? release_extent_buffer+0x9d/0xa4 [btrfs] [88026.383510] [<ffffffffc0213186>] ? btrfs_balance+0x9b0/0xb9d [btrfs] [88026.389977] [<ffffffffc021a646>] ? btrfs_ioctl_balance+0x21a/0x297 [btrfs] [88026.396977] [<ffffffffc021d462>] ? btrfs_ioctl+0x10f4/0x20a5 [btrfs] [88026.403443] [<ffffffff81121b9e>] ? path_openat+0x233/0x4c5 [88026.409039] [<ffffffff81030620>] ? __do_page_fault+0x339/0x3df [88026.414984] [<ffffffff810f2b9c>] ? __vma_link_rb+0x58/0x73 [88026.420581] [<ffffffff810f2c22>] ? vma_link+0x6b/0x8a [88026.425744] [<ffffffff811237f8>] ? do_vfs_ioctl+0x3ec/0x435 [88026.431428] [<ffffffff8105b9e0>] ? vtime_account_user+0x35/0x40 [88026.437460] [<ffffffff8112388a>] ? SyS_ioctl+0x49/0x77 [88026.442710] [<ffffffff813aafac>] ? tracesys+0x7e/0xe2 [88026.447865] [<ffffffff813ab00b>] ? tracesys+0xdd/0xe2 [88026.453021] ---[ end trace 389e0049cf5821d8 ]--- [88026.457665] ------------[ cut here ]------------ [88026.462316] kernel BUG at fs/btrfs/extent-tree.c:7727! [88026.467471] invalid opcode: 0000 [#1] SMP [88026.471622] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper iTCO_wdt iTCO_vendor_support cryptd lrw gf128mul glue_helper lpc_ich i2c_i801 psmouse i2c_core mfd_core pcspkr serio_raw evdev microcode ipmi_si battery ipmi_msghandler video tpm_tis tpm button acpi_cpufreq processor ie31200_edac edac_core btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid e1000e mpt2sas raid_class ptp crc32c_intel pps_core scsi_transport_sas thermal ahci libahci libata ehci_pci ehci_hcd fan thermal_sys usbcore scsi_mod usb_common [88026.535334] CPU: 0 PID: 6486 Comm: btrfs Tainted: G W 3.17.0 #10 [88026.542402] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012 [88026.550088] task: ffff88040d71f050 ti: ffff8800041a0000 task.ti: ffff8800041a0000 [88026.557600] RIP: 0010:[<ffffffffc01df3a5>] [<ffffffffc01df3a5>] walk_down_proc+0xc5/0x1e3 [btrfs] [88026.566621] RSP: 0018:ffff8800041a3908 EFLAGS: 00010246 [88026.571948] RAX: 0000000000000000 RBX: ffff8801f3045ba0 RCX: 000000000000004d [88026.579105] RDX: 0000000000000002 RSI: ffff8800773520b0 RDI: ffff88040a34c800 [88026.586263] RBP: ffff880077352140 R08: 0000000000000000 R09: ffffffff814eee00 [88026.593420] R10: ffffffff814eee00 R11: ffffffff814eee00 R12: 0000000000000003 [88026.600577] R13: ffff88035e829000 R14: ffff8803fc4bd700 R15: 0000000000000003 [88026.607733] FS: 00007fa15da57880(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000 [88026.615853] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [88026.621614] CR2: 00007fff2d7ccf48 CR3: 00000001f79de000 CR4: 00000000001407f0 [88026.628772] Stack: [88026.630806] ffff8803fc4bd758 ffffffff8110bc2a 0000000000000000 ffff8801b40beac0 [88026.638320] 0000000000000188 ffff880077352140 0000000000000003 ffff8803fc4bd700 [88026.645832] ffff8801b40beac0 ffff8801b40beac0 ffff88035e829000 ffffffffc01e15b8 [88026.653345] Call Trace: [88026.655814] [<ffffffff8110bc2a>] ? kmem_cache_alloc+0x91/0x104 [88026.661756] [<ffffffffc01e15b8>] ? walk_down_tree+0x40/0xa9 [btrfs] [88026.673781] [<ffffffffc01e3f3e>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs] [88026.680768] [<ffffffffc0231297>] ? merge_reloc_roots+0xf0/0x1ca [btrfs] [88026.687500] [<ffffffffc0231972>] ? relocate_block_group+0x445/0x4bd [btrfs] [88026.694579] [<ffffffffc0231b39>] ? btrfs_relocate_block_group+0x14f/0x267 [btrfs] [88026.702187] [<ffffffffc02106b7>] ? btrfs_relocate_chunk.isra.58+0x58/0x5e2 [btrfs] [88026.709878] [<ffffffffc01d1786>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs] [88026.716864] [<ffffffffc0205695>] ? btrfs_get_token_64+0x76/0xc6 [btrfs] [88026.723587] [<ffffffffc020c90b>] ? release_extent_buffer+0x9d/0xa4 [btrfs] [88026.730573] [<ffffffffc0213186>] ? btrfs_balance+0x9b0/0xb9d [btrfs] [88026.737043] [<ffffffffc021a646>] ? btrfs_ioctl_balance+0x21a/0x297 [btrfs] [88026.744035] [<ffffffffc021d462>] ? btrfs_ioctl+0x10f4/0x20a5 [btrfs] [88026.750493] [<ffffffff81121b9e>] ? path_openat+0x233/0x4c5 [88026.756082] [<ffffffff81030620>] ? __do_page_fault+0x339/0x3df [88026.762025] [<ffffffff810f2b9c>] ? __vma_link_rb+0x58/0x73 [88026.767613] [<ffffffff810f2c22>] ? vma_link+0x6b/0x8a [88026.772769] [<ffffffff811237f8>] ? do_vfs_ioctl+0x3ec/0x435 [88026.778446] [<ffffffff8105b9e0>] ? vtime_account_user+0x35/0x40 [88026.784476] [<ffffffff8112388a>] ? SyS_ioctl+0x49/0x77 [88026.789718] [<ffffffff813aafac>] ? tracesys+0x7e/0xe2 [88026.794873] [<ffffffff813ab00b>] ? tracesys+0xdd/0xe2 [88026.800029] Code: 00 00 00 4c 89 f9 4c 89 ee 48 89 04 24 48 8b 13 e8 a3 f6 ff ff 83 f8 f4 75 02 0f 0b 85 c0 0f 85 16 01 00 00 4b 83 3c fe 00 75 02 <0f> 0b 41 83 be 94 00 00 00 01 49 63 c4 75 41 49 83 3c c6 01 0f [88026.820258] RIP [<ffffffffc01df3a5>] walk_down_proc+0xc5/0x1e3 [btrfs] [88026.826924] RSP <ffff8800041a3908> [88026.830865] ---[ end trace 389e0049cf5821d9 ]---
Created attachment 153051 [details] dmesg of kernel warning / bug during balance after deleting a device I think I'm hitting the same bug with my raid5 array. I have one disk that causes read errors in a 4-disk array and want to replace it with a new disk, following the guide for "Replacing a drive that hasn't failed yet on a running raid5 array" at http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html . Essentially what I did was adding the new drive to the array, then deleting the drive that causes read errors. The delete operation apparently causes a balance operation (but you cannot use "btrfs balance status", the "btrfs device delete" command is just blocking). It already ran for several hours and crashed the machine hard during the night (no dmesg :-( ). After I restarted it, I got the dmesg attached, which seems to be the same issue as described here. Kernel is 3.17-rc5+, which is an unmodified cmason's "integration" branch fetched on 2014-10-08, latest commit a43bb39b5c710e79e921fb76341bacc418cfde78 # btrfs fi show Label: none uuid: 9f3a48b7-1b88-44f0-a387-f3712fc2c0b6 Total devices 5 FS bytes used 4.43TiB devid 1 size 3.64TiB used 1.50TiB path /dev/mapper/wdred4tb2 devid 2 size 0.00 used 1.20TiB path /dev/dm-4 devid 3 size 3.64TiB used 1.50TiB path /dev/dm-5 devid 4 size 3.64TiB used 1.50TiB path /dev/dm-6 devid 5 size 3.64TiB used 307.69GiB path /dev/dm-2 Device 2 is the failing one that shall be deleted, device 5 the new one. The command I used was (including output): # btrfs device delete /dev/mapper/wdred4tb3 . Segmentation fault
Created attachment 153071 [details] output of btrfsck after bug and reboot As requested on irc by cmason, I did a btrfsck on the raid5-fs after the bug occured again tonight. The output is attached. Note that there were read errors during operation, that might explain some of the errors. I will attach the dmesg I captured during the run next. btrfs-progs version as reported by "btrfs --version" is v3.16.2.
Created attachment 153081 [details] dmesg captured during the btrfsck run
I'm unable to reproduce the BUG on a fs without "-O skinny-metadata": created fs on the same four disks, copied exactly the same data there, added another disk, run balance, removed disk, balance, again added disk, balance -- no errors so far. On a "-O skinny-metadata" fs, the balance always crashed.
I just got similar bug at the mount time. I had the usual four disk "-d raid10 m raid1 -O skinny-metadata" fs, copied the data, added another drive, run balance, got the usual BUG, rebooted to 3.18-rc1 and the fs did not mount, see the messages below. btrfsck refuses to check it ("Couldn't open file system", v3.16.2). [ 49.778006] WARNING: CPU: 1 PID: 1914 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]() [ 49.788437] Modules linked in: xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul iTCO_wdt glue_helper iTCO_vendor_support i2c_i801 psmouse pcspkr lpc_ich serio_raw mfd_core microcode evdev i2c_core ipmi_si battery ipmi_msghandler video tpm_tis tpm ie31200_edac edac_core button acpi_cpufreq processor btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid ahci libahci thermal crc32c_intel libata mpt2sas raid_class scsi_transport_sas ehci_pci ehci_hcd e1000e ptp pps_core fan thermal_sys scsi_mod usbcore usb_common [ 49.849834] CPU: 1 PID: 1914 Comm: mount Not tainted 3.18.0-rc1 #12 [ 49.856522] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012 [ 49.864260] 0000000000000000 0000000000000009 ffffffff813aeaf1 0000000000000000 [ 49.872029] ffffffff8103b70a 0000000000000008 ffffffffc032206f ffff88040abb80b0 [ 49.879792] ffff8800d9600380 0000000000000000 0000000000000000 ffff880406a8b000 [ 49.887559] Call Trace: [ 49.890085] [<ffffffff813aeaf1>] ? dump_stack+0x41/0x51 [ 49.895473] [<ffffffff8103b70a>] ? warn_slowpath_common+0x78/0x90 [ 49.901739] [<ffffffffc032206f>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs] [ 49.909235] [<ffffffffc032206f>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs] [ 49.916729] [<ffffffffc0322708>] ? walk_down_proc+0xaf/0x1e3 [btrfs] [ 49.923243] [<ffffffff8110f9d0>] ? kmem_cache_alloc+0x90/0x103 [ 49.929250] [<ffffffffc03248e2>] ? walk_down_tree+0x40/0xa9 [btrfs] [ 49.935690] [<ffffffffc0327280>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs] [ 49.942740] [<ffffffffc03755d2>] ? merge_reloc_roots+0xf0/0x1c4 [btrfs] [ 49.949532] [<ffffffffc03761f7>] ? btrfs_recover_relocation+0x271/0x30e [btrfs] [ 49.962820] [<ffffffffc03342c2>] ? open_ctree+0x19f9/0x1d5c [btrfs] [ 49.969261] [<ffffffffc03149ef>] ? btrfs_mount+0x44d/0x5b4 [btrfs] [ 49.975605] [<ffffffff810e885d>] ? pcpu_next_unpop+0x36/0x43 [ 49.981425] [<ffffffff810e9b2e>] ? pcpu_alloc+0x439/0x4f5 [ 49.986981] [<ffffffff8111cdad>] ? mount_fs+0x61/0x143 [ 49.992285] [<ffffffff811301ea>] ? vfs_kern_mount+0x62/0xe0 [ 49.998019] [<ffffffff811325a8>] ? do_mount+0x6bd/0x84e [ 50.003408] [<ffffffff810e5e9d>] ? memdup_user+0x31/0x56 [ 50.008885] [<ffffffff81132970>] ? SyS_mount+0x6f/0x9d [ 50.014188] [<ffffffff813b2df0>] ? tracesys_phase2+0xd4/0xd9 [ 50.020007] ---[ end trace d38db2664afe0a63 ]--- [ 50.024709] ------------[ cut here ]------------ [ 50.029398] kernel BUG at fs/btrfs/extent-tree.c:7745! [ 50.034604] invalid opcode: 0000 [#1] SMP [ 50.038909] Modules linked in: xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul iTCO_wdt glue_helper iTCO_vendor_support i2c_i801 psmouse pcspkr lpc_ich serio_raw mfd_core microcode evdev i2c_core ipmi_si battery ipmi_msghandler video tpm_tis tpm ie31200_edac edac_core button acpi_cpufreq processor btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid ahci libahci thermal crc32c_intel libata mpt2sas raid_class scsi_transport_sas ehci_pci ehci_hcd e1000e ptp pps_core fan thermal_sys scsi_mod usbcore usb_common [ 50.100230] CPU: 1 PID: 1914 Comm: mount Tainted: G W 3.18.0-rc1 #12 [ 50.108054] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012 [ 50.115793] task: ffff8804070bab70 ti: ffff88040eb04000 task.ti: ffff88040eb04000 [ 50.123356] RIP: 0010:[<ffffffffc032271e>] [<ffffffffc032271e>] walk_down_proc+0xc5/0x1e3 [btrfs] [ 50.132479] RSP: 0018:ffff88040eb07a68 EFLAGS: 00010246 [ 50.137860] RAX: 0000000000000000 RBX: ffff8803fc3bb010 RCX: 000000000000003c [ 50.145060] RDX: 000000000000003d RSI: ffff8800d9600380 RDI: ffffe8ffffc8a3b0 [ 50.152261] RBP: ffff88040abb80b0 R08: 0000000000000000 R09: ffff88041fd92e40 [ 50.159462] R10: ffffffff817b9800 R11: ffff88041fd92e40 R12: 0000000000000003 [ 50.166661] R13: ffff880406a8b000 R14: ffff880037042100 R15: 0000000000000003 [ 50.173864] FS: 00007f6b5af8a840(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000 [ 50.182032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 50.187848] CR2: 00007f84d58ee480 CR3: 0000000408e32000 CR4: 00000000001407e0 [ 50.195047] Stack: [ 50.197126] ffff880037042158 ffffffff8110f9d0 0000000000000201 ffff88040aa34520 [ 50.204900] ffffffff817b9800 ffff88040abb80b0 0000000000000003 ffff880037042100 [ 50.212669] ffff88040aa34520 ffff88040aa34520 ffff880406a8b000 ffffffffc03248e2 [ 50.220435] Call Trace: [ 50.222950] [<ffffffff8110f9d0>] ? kmem_cache_alloc+0x90/0x103 [ 50.228948] [<ffffffffc03248e2>] ? walk_down_tree+0x40/0xa9 [btrfs] [ 50.235384] [<ffffffffc0327280>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs] [ 50.242436] [<ffffffffc03755d2>] ? merge_reloc_roots+0xf0/0x1c4 [btrfs] [ 50.249226] [<ffffffffc03761f7>] ? btrfs_recover_relocation+0x271/0x30e [btrfs] [ 50.256721] [<ffffffffc03342c2>] ? open_ctree+0x19f9/0x1d5c [btrfs] [ 50.263151] [<ffffffffc03149ef>] ? btrfs_mount+0x44d/0x5b4 [btrfs] [ 50.269488] [<ffffffff810e885d>] ? pcpu_next_unpop+0x36/0x43 [ 50.275302] [<ffffffff810e9b2e>] ? pcpu_alloc+0x439/0x4f5 [ 50.280857] [<ffffffff8111cdad>] ? mount_fs+0x61/0x143 [ 50.286148] [<ffffffff811301ea>] ? vfs_kern_mount+0x62/0xe0 [ 50.291875] [<ffffffff811325a8>] ? do_mount+0x6bd/0x84e [ 50.297255] [<ffffffff810e5e9d>] ? memdup_user+0x31/0x56 [ 50.302724] [<ffffffff81132970>] ? SyS_mount+0x6f/0x9d [ 50.308019] [<ffffffff813b2df0>] ? tracesys_phase2+0xd4/0xd9 [ 50.313829] Code: 00 00 00 4c 89 f9 4c 89 ee 48 89 04 24 48 8b 13 e8 d5 f6 ff ff 83 f8 f4 75 02 0f 0b 85 c0 0f 85 16 01 00 00 4b 83 3c fe 00 75 02 <0f> 0b 41 83 be 94 00 00 00 01 49 63 c4 75 41 49 83 3c c6 01 0f [ 50.337268] RIP [<ffffffffc032271e>] walk_down_proc+0xc5/0x1e3 [btrfs] [ 50.344035] RSP <ffff88040eb07a68> [ 50.347605] ---[ end trace d38db2664afe0a64 ]---
Found a way to reproduce the bug by using the kernel tree as a source of changing data to snapshot, so that anyone should be able to reproduce it. Using one tree wasn't reliable to trigger the bug, but with four copies as in the script below it always ended with segfault of btrfs balance and BUG in logs (tried four times). #!/bin/bash SRC=/mnt/b0/linux # git clone of Linus' kernel tree is here MNT=/mnt/b3 mkfs.btrfs -f -L BTR3 -d raid10 -m raid1 -O skinny-metadata /dev/sd[e-h] mount -L BTR3 $MNT cd $MNT mkdir 1 2 3 4 for i in ? ; do btrfs sub cre $MNT/$i/linux cd $SRC for v in `git tag -l v2.6.2\* v3\* | sort -V` ; do echo $i/$v git checkout $v cd $MNT/$i rsync -ax --del --inplace --no-W --exclude .gi* \ --exclude .mail* $SRC/ linux btrfs sub snap -r linux $v cd $SRC done done btrfs dev add /dev/sdp /mnt/b3 btrfs bal start /mnt/b3