Bug 64961 - WARNING: at fs/btrfs/extent-tree.c:871 btrfs_lookup_extent_info+0x287/0x302 [btrfs]()
Summary: WARNING: at fs/btrfs/extent-tree.c:871 btrfs_lookup_extent_info+0x287/0x302 [...
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-14 14:53 UTC by Petr Janecek
Modified: 2014-10-26 13:28 UTC (History)
6 users (show)

See Also:
Kernel Version: 3.12
Tree: Mainline
Regression: No


Attachments
dmesg of kernel warning / bug during balance after deleting a device (89.78 KB, text/x-log)
2014-10-09 20:46 UTC, Luzipher
Details
output of btrfsck after bug and reboot (1.62 KB, text/x-log)
2014-10-10 07:14 UTC, Luzipher
Details
dmesg captured during the btrfsck run (78.00 KB, text/x-log)
2014-10-10 07:15 UTC, Luzipher
Details

Description Petr Janecek 2013-11-14 14:53:12 UTC
I got the following warning and switch to read-only during rsync activity
on btrfs. The fs consists of 8 disks, RAID10 data, RAID1 metadata:

root@fs0:~# btrfs fi df /mnt/b
Data, RAID10: total=2.01TB, used=2.00TB
System, RAID1: total=32.00MB, used=188.00KB
Metadata, RAID1: total=12.00GB, used=10.53GB
root@fs0:~# btrfs fi show /dev/sdg
Label: 'BTR0'  uuid: a319a7c8-cdef-407d-a215-140920d5af60
	Total devices 8 FS bytes used 2.01TB
	devid    3 size 2.73TB used 530.00GB path /dev/sdg
	devid    7 size 2.73TB used 504.00GB path /dev/sdb
	devid    9 size 2.73TB used 504.00GB path /dev/sdd
	devid    6 size 2.73TB used 504.00GB path /dev/sda
	devid    8 size 2.73TB used 504.00GB path /dev/sdc
	devid    2 size 2.73TB used 531.03GB path /dev/sdf
	devid    5 size 2.73TB used 525.00GB path /dev/sde
	devid    4 size 2.73TB used 530.03GB path /dev/sdh

Btrfs v0.20-rc1-324-g650e656

disks /dev/sd[a-d] were added just a few hours before and 'btrfs fi balance'
has been run.

[723039.753247] btrfs: relocating block group 1634365079552 flags 65
[723067.913330] btrfs: found 38314 extents
[723168.305215] btrfs: found 38314 extents
[723169.709185] btrfs: relocating block group 1632217595904 flags 65
[723205.366252] btrfs: found 23256 extents
[723241.447709] ------------[ cut here ]------------
[723241.452487] WARNING: CPU: 2 PID: 10899 at fs/btrfs/extent-tree.c:871 btrfs_lookup_extent_info+0x287/0x302 [btrfs]()
[723241.463259] Modules linked in: raid1 nfsd auth_rpcgss oid_registry nfs_acl lockd sunrpc xfs loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul iTCO_wdt iTCO_vendor_support glue_helper psmouse lpc_ich microcode i2c_i801 serio_raw pcspkr mfd_core i2c_core evdev ipmi_si ipmi_msghandler video acpi_cpufreq button processor btrfs xor raid6_pq crc32c libcrc32c usb_storage sg sd_mod hid_generic usbhid hid ahci libahci ehci_pci mpt2sas ehci_hcd raid_class libata e1000e fan usbcore ptp scsi_transport_sas usb_common pps_core thermal thermal_sys scsi_mod
[723241.523504] CPU: 2 PID: 10899 Comm: btrfs Not tainted 3.12.0 #33
[723241.529650] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
[723241.537837]  0000000000000000 0000000000000009 ffffffff81389e0b 0000000000000000
[723241.545562]  ffffffff8103574a 0000000000000000 ffffffffa01dbf29 0000000000000000
[723241.553171]  ffff8803d3588e20 0000000000000000 0000000000000000 0000000000000000
[723241.560780] Call Trace:
[723241.563346]  [<ffffffff81389e0b>] ? dump_stack+0x41/0x51
[723241.568772]  [<ffffffff8103574a>] ? warn_slowpath_common+0x78/0x90
[723241.575091]  [<ffffffffa01dbf29>] ? btrfs_lookup_extent_info+0x287/0x302 [btrfs]
[723241.582754]  [<ffffffffa01dbf29>] ? btrfs_lookup_extent_info+0x287/0x302 [btrfs]
[723241.590283]  [<ffffffffa01d16cc>] ? update_ref_for_cow+0x5d/0x280 [btrfs]
[723241.597321]  [<ffffffffa020952e>] ? write_extent_buffer+0x8e/0x106 [btrfs]
[723241.604319]  [<ffffffffa01d1a89>] ? __btrfs_cow_block+0x19a/0x382 [btrfs]
[723241.611244]  [<ffffffffa01d1dab>] ? btrfs_cow_block+0xe7/0x13b [btrfs]
[723241.617903]  [<ffffffffa01d493a>] ? btrfs_search_slot+0x248/0x682 [btrfs]
[723241.624814]  [<ffffffff81057c8d>] ? should_resched+0x5/0x23
[723241.630504]  [<ffffffffa0228a51>] ? do_relocation+0xff/0x3f5 [btrfs]
[723241.636971]  [<ffffffff81057c8d>] ? should_resched+0x5/0x23
[723241.642673]  [<ffffffffa0229e18>] ? relocate_tree_blocks+0x309/0x4b6 [btrfs]
[723241.649844]  [<ffffffffa01d119c>] ? btrfs_release_path+0x38/0x53 [btrfs]
[723241.656673]  [<ffffffffa022b33c>] ? relocate_block_group+0x237/0x4eb [btrfs]
[723241.663847]  [<ffffffffa022b733>] ? btrfs_relocate_block_group+0x143/0x26a [btrfs]
[723241.671578]  [<ffffffffa020bff9>] ? btrfs_relocate_chunk.isra.59+0x59/0x5fb [btrfs]
[723241.679524]  [<ffffffffa0218ffa>] ? btrfs_set_lock_blocking_rw+0xc9/0xdb [btrfs]
[723241.687046]  [<ffffffffa01cf51c>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[723241.694128]  [<ffffffffa02013eb>] ? btrfs_get_token_64+0x76/0xc6 [btrfs]
[723241.700947]  [<ffffffffa0203e2b>] ? release_extent_buffer+0x90/0x97 [btrfs]
[723241.708025]  [<ffffffffa020ea7f>] ? btrfs_balance+0x980/0xb44 [btrfs]
[723241.714586]  [<ffffffffa02150d8>] ? btrfs_ioctl_balance+0x21a/0x296 [btrfs]
[723241.721749]  [<ffffffffa0217906>] ? btrfs_ioctl+0xf3d/0x1eab [btrfs]
[723241.728221]  [<ffffffff810df524>] ? handle_mm_fault+0x22c/0x8aa
[723241.734268]  [<ffffffff81111ab9>] ? path_openat+0x233/0x4c5
[723241.739957]  [<ffffffff81390c33>] ? __do_page_fault+0x349/0x3d7
[723241.746006]  [<ffffffff810e28b9>] ? vma_gap_callbacks_propagate+0x13/0x2c
[723241.752920]  [<ffffffff810e33f5>] ? vma_link+0x6e/0x8c
[723241.758173]  [<ffffffff81113186>] ? vfs_ioctl+0x1e/0x31
[723241.763517]  [<ffffffff8111395c>] ? do_vfs_ioctl+0x3ad/0x3ef
[723241.769290]  [<ffffffff8105cbbf>] ? vtime_account_user+0x35/0x40
[723241.775413]  [<ffffffff811139ec>] ? SyS_ioctl+0x4e/0x7e
[723241.780754]  [<ffffffff8139301b>] ? tracesys+0xdd/0xe2
[723241.786011] ---[ end trace 5763625844d51f70 ]---
[723241.790742] BTRFS error (device sdg) in update_ref_for_cow:863: errno=-30 Readonly filesystem
[723241.799414] BTRFS info (device sdg): forced readonly
[723241.804493] ------------[ cut here ]------------
Comment 1 Xavier Bassery 2013-11-22 14:22:57 UTC
Could you please run a btrfs check on your file-system with recent btrfs-progs?
The ones displayed in your bug report are somewhat old (6 months being old by btrfs standard).
I would advise you to use progs not older than November. If your distribution do not provide recent ones, you can follow those instructions: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories#Official_repository

Then please report back the output of "btrfs check /dev/sda" (the fs has to be unmounted).
Comment 2 Petr Janecek 2013-11-28 17:20:27 UTC
btrfsck output:

checking extents
Extent back ref already exists for 1579109920768 parent 1576533598208 root 0 
Extent back ref already exists for 1581562929152 parent 1576533598208 root 0 
Extent back ref already exists for 4679099793408 parent 1576533598208 root 0 
Extent back ref already exists for 4679099813888 parent 1576533598208 root 0 
Extent back ref already exists for 1579207237632 parent 1576533598208 root 0 
Extent back ref already exists for 1579207303168 parent 1576533598208 root 0 
Extent back ref already exists for 1579161665536 parent 1576533598208 root 0 
Extent back ref already exists for 1579700260864 parent 1576533598208 root 0 

[~249k similar lines omitted]

ref mismatch on [1576349827072 4096] extent item 1, found 2
Incorrect global backref count on 1576349827072 found 1 wanted 2
backpointer mismatch on [1576349827072 4096]
ref mismatch on [1576349941760 4096] extent item 1, found 2
Incorrect global backref count on 1576349941760 found 1 wanted 2
backpointer mismatch on [1576349941760 4096]
ref mismatch on [1576349974528 4096] extent item 1, found 2
Incorrect global backref count on 1576349974528 found 1 wanted 2
backpointer mismatch on [1576349974528 4096]
ref mismatch on [1576350085120 4096] extent item 1, found 2
Incorrect global backref count on 1576350085120 found 1 wanted 2
backpointer mismatch on [1576350085120 4096]
ref mismatch on [1576350281728 4096] extent item 1, found 2
Incorrect global backref count on 1576350281728 found 1 wanted 2
backpointer mismatch on [1576350281728 4096]

[~751k similar lines omitted]

checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdg
UUID: a319a7c8-cdef-407d-a215-140920d5af60
found 300811738651 bytes used err is 0
total csum bytes: 2169886676
total tree bytes: 11531677696
total fs tree bytes: 8164470784
total extent tree bytes: 899383296
btree space waste bytes: 2464195755
file data blocks allocated: 2397512151040
 referenced 2397510746112
Btrfs v0.20-rc1-596-ge9ac73b

Should I try the --repair option?
Comment 3 David Sterba 2013-11-28 17:57:02 UTC
update_ref_for_cow does not expect to get 0 from btrfs_lookup_extent_info and turns the filesystem to read-only.

Commit 74be9510876a66ad9826613ac8a526d26f9e7f01 "Btrfs: optimize btrfs_lookup_extent_info()" seems to touch the code
Comment 4 Petr Janecek 2013-11-30 10:16:59 UTC
Response to Filipe's question on the ML: yes, the fs had skinny extents enabled.

I've run 'btrfsck --repair' on that fs and it doesn't seem that it has
helped much: after two days I tried to check it again and now the btrfsck
output is almost 10M lines, some of the new ones are like

ref mismatch on [1576349474816 4096] extent item 1, found 2
Backref 1576349474816 parent 1573 root 1573 not found in extent tree
Backref 1576349474816 parent 1583175602176 not referenced back 0x777bebf0
backpointer mismatch on [1576349474816 4096]

but most of them (~8M) are like

Backref 1585929220096 root 1573 owner 4778267 offset 106496 num_refs 0 not found in extent tree
Incorrect local backref count on 1585929220096 root 1573 owner 4778267 offset 106496 found 1 wanted 0 back 0xa29dafb0
Incorrect local backref count on 1585929220096 parent 4248626778112 owner 0 offset 0 found 0 wanted 1 back 0x216cd9d0
Backref disk bytenr does not match extent record, bytenr=1585929220096, ref bytenr=64
backpointer mismatch on [1585929220096 4096]

and the otput ends with

Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdb
UUID: a319a7c8-cdef-407d-a215-140920d5af60
free space inode generation (0) did not match free space cache generation (40846)
free space inode generation (0) did not match free space cache generation (40507)
free space inode generation (0) did not match free space cache generation (40846)
free space inode generation (0) did not match free space cache generation (37774)
free space inode generation (0) did not match free space cache generation (40507)
free space inode generation (0) did not match free space cache generation (40507)
free space inode generation (0) did not match free space cache generation (40847)
free space inode generation (0) did not match free space cache generation (40657)
free space inode generation (0) did not match free space cache generation (40869)
free space inode generation (0) did not match free space cache generation (40921)
free space inode generation (0) did not match free space cache generation (40876)
free space inode generation (0) did not match free space cache generation (40859)
free space inode generation (0) did not match free space cache generation (40879)
free space inode generation (0) did not match free space cache generation (40685)
free space inode generation (0) did not match free space cache generation (40685)
free space inode generation (0) did not match free space cache generation (40885)
free space inode generation (0) did not match free space cache generation (40900)
free space inode generation (0) did not match free space cache generation (40507)
free space inode generation (0) did not match free space cache generation (38953)
free space inode generation (0) did not match free space cache generation (40921)
found 616910851356 bytes used err is 0
total csum bytes: 2189697104
total tree bytes: 11586048000
total fs tree bytes: 8170622976
total extent tree bytes: 920498176
btree space waste bytes: 2473197330
file data blocks allocated: 2388866162688
 referenced 2388864757760
Btrfs v3.12
Comment 5 David Sterba 2013-12-03 15:36:17 UTC
I've seen that fsck prior to 3.12 reports errors like

Incorrect local backref count on 1585929220096 root 1573 owner 4778267 offset 106496 found 1 wanted 0 back 0xa29dafb0

that do not exist, but I see you've used the integration version or 3.12 proper. In connection with the skinny extents I don't think the patch identified in comment 3 is the the root cause, or at lest not the only one.
Comment 6 Petr Janecek 2013-12-03 22:15:17 UTC
I'm able to reproduce something like the first btrfsck messages on a fresh
filesystem on stock 3.12 kernel and 3.12 btrfs-progs:

mkfs.btrfs -f -d raid1 -m raid1 -L BTR1 -O skinny-metadata /dev/sd[mno]
mount LABEL=BTR1 /mnt/b1 -o noatime,nodiratime
rsync -ax --specials --del --inplace /mnt/x1/home /mnt/b1
# day of fs activity on /mnt/x1/home, BTR1 untouched
cd /mnt/b1
btrfs sub snaps -r home home_1
cd ..
rsync -ax --specials --del --inplace /mnt/x1/home /mnt/b1
umount /mnt/b1
btrfsck /dev/sdm

checking extents
Extent back ref already exists for 1093533696 parent 0 root 569 
Extent back ref already exists for 1093566464 parent 0 root 569 
Extent back ref already exists for 1093599232 parent 0 root 569 
[~43k lines omitted]
ref mismatch on [29392896 16384] extent item 1, found 2
Incorrect global backref count on 29392896 found 1 wanted 2
backpointer mismatch on [29392896 16384]
ref mismatch on [29458432 16384] extent item 1, found 2
Incorrect global backref count on 29458432 found 1 wanted 2
backpointer mismatch on [29458432 16384]
[~170k lines omitted]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdn
UUID: 2ec4b5a1-a2c4-4687-9015-cf5b519297f3
found 220988564208 bytes used err is 0
total csum bytes: 1748031412
total tree bytes: 8836808704
total fs tree bytes: 6147653632
total extent tree bytes: 671137792
btree space waste bytes: 1125248357
file data blocks allocated: 1803477946368
 referenced 1803477946368
Btrfs v3.12
Comment 7 yueqili 2013-12-24 15:14:13 UTC
My external hardrive with RAID1 configuration has this problem occurred twice. I've re-create the file system with RAID1 and after a few weeks. The problem
"Errors found in extent allocation tree or chunk allocation" occur again.
Comment 8 Petr Janecek 2014-01-13 00:32:00 UTC
Still reproducible on 3.13-rc7 on fresh fs with -d raid10 -m raid1
(procedure as above, only with a few more data modification, shapshot
cycles). Got even a new kind of message from btrfsck, see lines after
"checking fs roots"). Unable to reproduce it on -d raid1 fs.

btrfsck output:
checking extents
Extent back ref already exists for 1041367040 parent 0 root 258 
[~150k lines omitted]
Extent back ref already exists for 444234022912 parent 0 root 258 
Extent back ref already exists for 444234366976 parent 0 root 258 
Extent back ref already exists for 444234874880 parent 0 root 258 
ref mismatch on [29753344 16384] extent item 1, found 2
Incorrect global backref count on 29753344 found 1 wanted 2
backpointer mismatch on [29753344 16384]
ref mismatch on [38338560 16384] extent item 1, found 2
Incorrect global backref count on 38338560 found 1 wanted 2
backpointer mismatch on [38338560 16384]
[~1M lines omitted]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
root 258 inode 849332 errors 100, file extent discount
root 258 inode 3410411 errors 100, file extent discount
root 258 inode 3410412 errors 100, file extent discount
root 258 inode 3410413 errors 100, file extent discount
root 258 inode 3410414 errors 100, file extent discount
[~3k lines omitted]
Checking filesystem on /dev/sda
UUID: c43757c9-3063-4b80-89ba-64e1bff63519
found 77997139921 bytes used err is 1
total csum bytes: 737692240
total tree bytes: 5144248320
total fs tree bytes: 3975954432
total extent tree bytes: 317571072
btree space waste bytes: 691743734
file data blocks allocated: 863021318144
 referenced 863021318144
Btrfs v3.12
Comment 9 Petr Janecek 2014-01-22 03:28:17 UTC
These errors reported by btrfsck are somewhat transient: they often
disappear after another mount + fs activity + umount. While copying
a time series of snapshots from a different fs

btrfs sub cre $DST/home
for i in home_* ; do
        rsync -ax --specials --del --inplace $i/ $DST/home
        btrfs sub snap -r $DST/home $DST/$i

        umount $DST
        btrfsck $DEV
        mount $DST
done

out of 50 snapshots, btrfsck had output errors after 10 of them, it looks
random (runs 11 12 19 20 22 32 37 42 48 50). The number of btrfsck errors
reported in one run varied wildly, from 130 lines to >300k lines.

kernel 3.13, btrfs-progs 3.12, fs 4*4TB -d raid10 -m raid1 -O skinny-metadata
Comment 10 Petr Janecek 2014-10-08 13:00:41 UTC
I just reproduced the bug in 3.17. /dev/sdp was added to 4 disk
"-m raid1 -d raid10 -O skinny-metadata" fs, then
"btrfs balance start /mnt/b3" was run. There was some activity on
the fs after the "balance start", but the fs was idle for several
hours (except for the running balance) before hitting the BUG.

The fs was created a week ago on 3.16.3 and and was clean according
to "btrfs check" run just before adding the drive. After the BUG,
it was impossible to umount, but after reboot, it checked clean
as well.

root@fs0:~# btrfs fi df /mnt/b3
Data, RAID10: total=2.34TiB, used=2.34TiB
Data, single: total=8.00MiB, used=0.00
System, RAID1: total=8.00MiB, used=288.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=15.00GiB, used=13.81GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=15.88MiB
root@fs0:~# btrfs fi show /mnt/b3
Label: 'BTR3'  uuid: f181dd81-c219-44fc-b113-3a8cfd0d3295
	Total devices 5 FS bytes used 2.35TiB
	devid    1 size 2.73TiB used 1.05TiB path /dev/sde
	devid    2 size 2.73TiB used 1.05TiB path /dev/sdf
	devid    3 size 2.73TiB used 1.05TiB path /dev/sdg
	devid    4 size 2.73TiB used 1.05TiB path /dev/sdh
	devid    5 size 3.64TiB used 523.00GiB path /dev/sdp

Btrfs v3.16


[88026.168587] WARNING: CPU: 0 PID: 6486 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]()
[88026.178969] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_help
er iTCO_wdt iTCO_vendor_support cryptd lrw gf128mul glue_helper lpc_ich i2c_i801 psmouse i2c_core mfd_core pcspkr serio_raw evdev microcode ipmi_si battery ipmi_msghandler video tpm_tis tpm button acpi_cpufreq processor ie31200_edac edac_c
ore btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid e1000e mpt2sas raid_class ptp crc32c_intel pps_core scsi_transport_sas thermal ahci libahci libata ehci_pci ehci_hcd fan thermal_sys usbcore scsi_mod usb_common
[88026.242729] CPU: 0 PID: 6486 Comm: btrfs Not tainted 3.17.0 #10
[88026.248675] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
[88026.256375]  0000000000000000 0000000000000009 ffffffff813a6a46 0000000000000000
[88026.263889]  ffffffff8103b591 0000000000000008 ffffffffc01decc4 ffff880077352140
[88026.271398]  ffff8800773520b0 0000000000000000 0000000000000000 ffff88035e829000
[88026.278919] Call Trace:
[88026.281396]  [<ffffffff813a6a46>] ? dump_stack+0x41/0x51
[88026.286729]  [<ffffffff8103b591>] ? warn_slowpath_common+0x78/0x90
[88026.292939]  [<ffffffffc01decc4>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs]
[88026.300380]  [<ffffffffc01decc4>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs]
[88026.307815]  [<ffffffffc01df38f>] ? walk_down_proc+0xaf/0x1e3 [btrfs]
[88026.314283]  [<ffffffff8110bc2a>] ? kmem_cache_alloc+0x91/0x104
[88026.320234]  [<ffffffffc01e15b8>] ? walk_down_tree+0x40/0xa9 [btrfs]
[88026.326627]  [<ffffffffc01e3f3e>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs]
[88026.333622]  [<ffffffffc0231297>] ? merge_reloc_roots+0xf0/0x1ca [btrfs]
[88026.340363]  [<ffffffffc0231972>] ? relocate_block_group+0x445/0x4bd [btrfs]
[88026.347442]  [<ffffffffc0231b39>] ? btrfs_relocate_block_group+0x14f/0x267 [btrfs]
[88026.355076]  [<ffffffffc02106b7>] ? btrfs_relocate_chunk.isra.58+0x58/0x5e2 [btrfs]
[88026.362775]  [<ffffffffc01d1786>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[88026.369779]  [<ffffffffc0205695>] ? btrfs_get_token_64+0x76/0xc6 [btrfs]
[88026.376512]  [<ffffffffc020c90b>] ? release_extent_buffer+0x9d/0xa4 [btrfs]
[88026.383510]  [<ffffffffc0213186>] ? btrfs_balance+0x9b0/0xb9d [btrfs]
[88026.389977]  [<ffffffffc021a646>] ? btrfs_ioctl_balance+0x21a/0x297 [btrfs]
[88026.396977]  [<ffffffffc021d462>] ? btrfs_ioctl+0x10f4/0x20a5 [btrfs]
[88026.403443]  [<ffffffff81121b9e>] ? path_openat+0x233/0x4c5
[88026.409039]  [<ffffffff81030620>] ? __do_page_fault+0x339/0x3df
[88026.414984]  [<ffffffff810f2b9c>] ? __vma_link_rb+0x58/0x73
[88026.420581]  [<ffffffff810f2c22>] ? vma_link+0x6b/0x8a
[88026.425744]  [<ffffffff811237f8>] ? do_vfs_ioctl+0x3ec/0x435
[88026.431428]  [<ffffffff8105b9e0>] ? vtime_account_user+0x35/0x40
[88026.437460]  [<ffffffff8112388a>] ? SyS_ioctl+0x49/0x77
[88026.442710]  [<ffffffff813aafac>] ? tracesys+0x7e/0xe2
[88026.447865]  [<ffffffff813ab00b>] ? tracesys+0xdd/0xe2
[88026.453021] ---[ end trace 389e0049cf5821d8 ]---
[88026.457665] ------------[ cut here ]------------
[88026.462316] kernel BUG at fs/btrfs/extent-tree.c:7727!
[88026.467471] invalid opcode: 0000 [#1] SMP 
[88026.471622] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper iTCO_wdt iTCO_vendor_support cryptd lrw gf128mul glue_helper lpc_ich i2c_i801 psmouse i2c_core mfd_core pcspkr serio_raw evdev microcode ipmi_si battery ipmi_msghandler video tpm_tis tpm button acpi_cpufreq processor ie31200_edac edac_core btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid e1000e mpt2sas raid_class ptp crc32c_intel pps_core scsi_transport_sas thermal ahci libahci libata ehci_pci ehci_hcd fan thermal_sys usbcore scsi_mod usb_common
[88026.535334] CPU: 0 PID: 6486 Comm: btrfs Tainted: G        W      3.17.0 #10
[88026.542402] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
[88026.550088] task: ffff88040d71f050 ti: ffff8800041a0000 task.ti: ffff8800041a0000
[88026.557600] RIP: 0010:[<ffffffffc01df3a5>]  [<ffffffffc01df3a5>] walk_down_proc+0xc5/0x1e3 [btrfs]
[88026.566621] RSP: 0018:ffff8800041a3908  EFLAGS: 00010246
[88026.571948] RAX: 0000000000000000 RBX: ffff8801f3045ba0 RCX: 000000000000004d
[88026.579105] RDX: 0000000000000002 RSI: ffff8800773520b0 RDI: ffff88040a34c800
[88026.586263] RBP: ffff880077352140 R08: 0000000000000000 R09: ffffffff814eee00
[88026.593420] R10: ffffffff814eee00 R11: ffffffff814eee00 R12: 0000000000000003
[88026.600577] R13: ffff88035e829000 R14: ffff8803fc4bd700 R15: 0000000000000003
[88026.607733] FS:  00007fa15da57880(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
[88026.615853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[88026.621614] CR2: 00007fff2d7ccf48 CR3: 00000001f79de000 CR4: 00000000001407f0
[88026.628772] Stack:
[88026.630806]  ffff8803fc4bd758 ffffffff8110bc2a 0000000000000000 ffff8801b40beac0
[88026.638320]  0000000000000188 ffff880077352140 0000000000000003 ffff8803fc4bd700
[88026.645832]  ffff8801b40beac0 ffff8801b40beac0 ffff88035e829000 ffffffffc01e15b8
[88026.653345] Call Trace:
[88026.655814]  [<ffffffff8110bc2a>] ? kmem_cache_alloc+0x91/0x104
[88026.661756]  [<ffffffffc01e15b8>] ? walk_down_tree+0x40/0xa9 [btrfs]
[88026.673781]  [<ffffffffc01e3f3e>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs]
[88026.680768]  [<ffffffffc0231297>] ? merge_reloc_roots+0xf0/0x1ca [btrfs]
[88026.687500]  [<ffffffffc0231972>] ? relocate_block_group+0x445/0x4bd [btrfs]
[88026.694579]  [<ffffffffc0231b39>] ? btrfs_relocate_block_group+0x14f/0x267 [btrfs]
[88026.702187]  [<ffffffffc02106b7>] ? btrfs_relocate_chunk.isra.58+0x58/0x5e2 [btrfs]
[88026.709878]  [<ffffffffc01d1786>] ? btrfs_item_key_to_cpu+0x12/0x30 [btrfs]
[88026.716864]  [<ffffffffc0205695>] ? btrfs_get_token_64+0x76/0xc6 [btrfs]
[88026.723587]  [<ffffffffc020c90b>] ? release_extent_buffer+0x9d/0xa4 [btrfs]
[88026.730573]  [<ffffffffc0213186>] ? btrfs_balance+0x9b0/0xb9d [btrfs]
[88026.737043]  [<ffffffffc021a646>] ? btrfs_ioctl_balance+0x21a/0x297 [btrfs]
[88026.744035]  [<ffffffffc021d462>] ? btrfs_ioctl+0x10f4/0x20a5 [btrfs]
[88026.750493]  [<ffffffff81121b9e>] ? path_openat+0x233/0x4c5
[88026.756082]  [<ffffffff81030620>] ? __do_page_fault+0x339/0x3df
[88026.762025]  [<ffffffff810f2b9c>] ? __vma_link_rb+0x58/0x73
[88026.767613]  [<ffffffff810f2c22>] ? vma_link+0x6b/0x8a
[88026.772769]  [<ffffffff811237f8>] ? do_vfs_ioctl+0x3ec/0x435
[88026.778446]  [<ffffffff8105b9e0>] ? vtime_account_user+0x35/0x40
[88026.784476]  [<ffffffff8112388a>] ? SyS_ioctl+0x49/0x77
[88026.789718]  [<ffffffff813aafac>] ? tracesys+0x7e/0xe2
[88026.794873]  [<ffffffff813ab00b>] ? tracesys+0xdd/0xe2
[88026.800029] Code: 00 00 00 4c 89 f9 4c 89 ee 48 89 04 24 48 8b 13 e8 a3 f6 ff ff 83 f8 f4 75 02 0f 0b 85 c0 0f 85 16 01 00 00 4b 83 3c fe 00 75 02 <0f> 0b 41 83 be 94 00 00 00 01 49 63 c4 75 41 49 83 3c c6 01 0f 
[88026.820258] RIP  [<ffffffffc01df3a5>] walk_down_proc+0xc5/0x1e3 [btrfs]
[88026.826924]  RSP <ffff8800041a3908>
[88026.830865] ---[ end trace 389e0049cf5821d9 ]---
Comment 11 Luzipher 2014-10-09 20:46:03 UTC
Created attachment 153051 [details]
dmesg of kernel warning / bug during balance after deleting a device

I think I'm hitting the same bug with my raid5 array. I have one disk that causes read errors in a 4-disk array and want to replace it with a new disk, following the guide for "Replacing a drive that hasn't failed yet on a running raid5 array" at http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html .
Essentially what I did was adding the new drive to the array, then deleting the drive that causes read errors. The delete operation apparently causes a balance operation (but you cannot use "btrfs balance status", the "btrfs device delete" command is just blocking). It already ran for several hours and crashed the machine hard during the night (no dmesg :-( ). After I restarted it, I got the dmesg attached, which seems to be the same issue as described here.

Kernel is 3.17-rc5+, which is an unmodified cmason's "integration" branch fetched on 2014-10-08, latest commit a43bb39b5c710e79e921fb76341bacc418cfde78

# btrfs fi show
Label: none  uuid: 9f3a48b7-1b88-44f0-a387-f3712fc2c0b6
        Total devices 5 FS bytes used 4.43TiB
        devid    1 size 3.64TiB used 1.50TiB path /dev/mapper/wdred4tb2
        devid    2 size 0.00 used 1.20TiB path /dev/dm-4
        devid    3 size 3.64TiB used 1.50TiB path /dev/dm-5
        devid    4 size 3.64TiB used 1.50TiB path /dev/dm-6
        devid    5 size 3.64TiB used 307.69GiB path /dev/dm-2

Device 2 is the failing one that shall be deleted, device 5 the new one.

The command I used was (including output):

# btrfs device delete /dev/mapper/wdred4tb3 .
Segmentation fault
Comment 12 Luzipher 2014-10-10 07:14:23 UTC
Created attachment 153071 [details]
output of btrfsck after bug and reboot

As requested on irc by cmason, I did a btrfsck on the raid5-fs after the bug occured again tonight. The output is attached.
Note that there were read errors during operation, that might explain some of the errors. I will attach the dmesg I captured during the run next.

btrfs-progs version as reported by "btrfs --version" is v3.16.2.
Comment 13 Luzipher 2014-10-10 07:15:10 UTC
Created attachment 153081 [details]
dmesg captured during the btrfsck run
Comment 14 Petr Janecek 2014-10-16 04:27:58 UTC
I'm unable to reproduce the BUG on a fs without "-O skinny-metadata":
created fs on the same four disks, copied exactly the same data there,
added another disk, run balance, removed disk, balance, again added disk,
balance -- no errors so far. On a "-O skinny-metadata" fs, the balance 
always crashed.
Comment 15 Petr Janecek 2014-10-20 13:21:37 UTC
I just got similar bug at the mount time. I had the usual four disk "-d raid10 m raid1 -O skinny-metadata" fs, copied the data, added another drive, run balance, got the usual BUG, rebooted to 3.18-rc1 and the fs did not mount, see the messages below. btrfsck refuses to check it ("Couldn't open file system", v3.16.2).

[   49.778006] WARNING: CPU: 1 PID: 1914 at fs/btrfs/extent-tree.c:876 btrfs_lookup_extent_info+0x292/0x30a [btrfs]()
[   49.788437] Modules linked in: xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul iTCO_wdt glue_helper iTCO_vendor_support i2c_i801 psmouse pcspkr lpc_ich serio_raw mfd_core microcode evdev i2c_core ipmi_si battery ipmi_msghandler video tpm_tis tpm ie31200_edac edac_core button acpi_cpufreq processor btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid ahci libahci thermal crc32c_intel libata mpt2sas raid_class scsi_transport_sas ehci_pci ehci_hcd e1000e ptp pps_core fan thermal_sys scsi_mod usbcore usb_common
[   49.849834] CPU: 1 PID: 1914 Comm: mount Not tainted 3.18.0-rc1 #12
[   49.856522] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
[   49.864260]  0000000000000000 0000000000000009 ffffffff813aeaf1 0000000000000000
[   49.872029]  ffffffff8103b70a 0000000000000008 ffffffffc032206f ffff88040abb80b0
[   49.879792]  ffff8800d9600380 0000000000000000 0000000000000000 ffff880406a8b000
[   49.887559] Call Trace:
[   49.890085]  [<ffffffff813aeaf1>] ? dump_stack+0x41/0x51
[   49.895473]  [<ffffffff8103b70a>] ? warn_slowpath_common+0x78/0x90
[   49.901739]  [<ffffffffc032206f>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs]
[   49.909235]  [<ffffffffc032206f>] ? btrfs_lookup_extent_info+0x292/0x30a [btrfs]
[   49.916729]  [<ffffffffc0322708>] ? walk_down_proc+0xaf/0x1e3 [btrfs]
[   49.923243]  [<ffffffff8110f9d0>] ? kmem_cache_alloc+0x90/0x103
[   49.929250]  [<ffffffffc03248e2>] ? walk_down_tree+0x40/0xa9 [btrfs]
[   49.935690]  [<ffffffffc0327280>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs]
[   49.942740]  [<ffffffffc03755d2>] ? merge_reloc_roots+0xf0/0x1c4 [btrfs]
[   49.949532]  [<ffffffffc03761f7>] ? btrfs_recover_relocation+0x271/0x30e [btrfs]
[   49.962820]  [<ffffffffc03342c2>] ? open_ctree+0x19f9/0x1d5c [btrfs]
[   49.969261]  [<ffffffffc03149ef>] ? btrfs_mount+0x44d/0x5b4 [btrfs]
[   49.975605]  [<ffffffff810e885d>] ? pcpu_next_unpop+0x36/0x43
[   49.981425]  [<ffffffff810e9b2e>] ? pcpu_alloc+0x439/0x4f5
[   49.986981]  [<ffffffff8111cdad>] ? mount_fs+0x61/0x143
[   49.992285]  [<ffffffff811301ea>] ? vfs_kern_mount+0x62/0xe0
[   49.998019]  [<ffffffff811325a8>] ? do_mount+0x6bd/0x84e
[   50.003408]  [<ffffffff810e5e9d>] ? memdup_user+0x31/0x56
[   50.008885]  [<ffffffff81132970>] ? SyS_mount+0x6f/0x9d
[   50.014188]  [<ffffffff813b2df0>] ? tracesys_phase2+0xd4/0xd9
[   50.020007] ---[ end trace d38db2664afe0a63 ]---
[   50.024709] ------------[ cut here ]------------
[   50.029398] kernel BUG at fs/btrfs/extent-tree.c:7745!
[   50.034604] invalid opcode: 0000 [#1] SMP 
[   50.038909] Modules linked in: xfs libcrc32c loop raid10 md_mod x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul iTCO_wdt glue_helper iTCO_vendor_support i2c_i801 psmouse pcspkr lpc_ich serio_raw mfd_core microcode evdev i2c_core ipmi_si battery ipmi_msghandler video tpm_tis tpm ie31200_edac edac_core button acpi_cpufreq processor btrfs xor raid6_pq sg sd_mod uas usb_storage hid_generic usbhid hid ahci libahci thermal crc32c_intel libata mpt2sas raid_class scsi_transport_sas ehci_pci ehci_hcd e1000e ptp pps_core fan thermal_sys scsi_mod usbcore usb_common
[   50.100230] CPU: 1 PID: 1914 Comm: mount Tainted: G        W      3.18.0-rc1 #12
[   50.108054] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
[   50.115793] task: ffff8804070bab70 ti: ffff88040eb04000 task.ti: ffff88040eb04000
[   50.123356] RIP: 0010:[<ffffffffc032271e>]  [<ffffffffc032271e>] walk_down_proc+0xc5/0x1e3 [btrfs]
[   50.132479] RSP: 0018:ffff88040eb07a68  EFLAGS: 00010246
[   50.137860] RAX: 0000000000000000 RBX: ffff8803fc3bb010 RCX: 000000000000003c
[   50.145060] RDX: 000000000000003d RSI: ffff8800d9600380 RDI: ffffe8ffffc8a3b0
[   50.152261] RBP: ffff88040abb80b0 R08: 0000000000000000 R09: ffff88041fd92e40
[   50.159462] R10: ffffffff817b9800 R11: ffff88041fd92e40 R12: 0000000000000003
[   50.166661] R13: ffff880406a8b000 R14: ffff880037042100 R15: 0000000000000003
[   50.173864] FS:  00007f6b5af8a840(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
[   50.182032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.187848] CR2: 00007f84d58ee480 CR3: 0000000408e32000 CR4: 00000000001407e0
[   50.195047] Stack:
[   50.197126]  ffff880037042158 ffffffff8110f9d0 0000000000000201 ffff88040aa34520
[   50.204900]  ffffffff817b9800 ffff88040abb80b0 0000000000000003 ffff880037042100
[   50.212669]  ffff88040aa34520 ffff88040aa34520 ffff880406a8b000 ffffffffc03248e2
[   50.220435] Call Trace:
[   50.222950]  [<ffffffff8110f9d0>] ? kmem_cache_alloc+0x90/0x103
[   50.228948]  [<ffffffffc03248e2>] ? walk_down_tree+0x40/0xa9 [btrfs]
[   50.235384]  [<ffffffffc0327280>] ? btrfs_drop_snapshot+0x2c4/0x656 [btrfs]
[   50.242436]  [<ffffffffc03755d2>] ? merge_reloc_roots+0xf0/0x1c4 [btrfs]
[   50.249226]  [<ffffffffc03761f7>] ? btrfs_recover_relocation+0x271/0x30e [btrfs]
[   50.256721]  [<ffffffffc03342c2>] ? open_ctree+0x19f9/0x1d5c [btrfs]
[   50.263151]  [<ffffffffc03149ef>] ? btrfs_mount+0x44d/0x5b4 [btrfs]
[   50.269488]  [<ffffffff810e885d>] ? pcpu_next_unpop+0x36/0x43
[   50.275302]  [<ffffffff810e9b2e>] ? pcpu_alloc+0x439/0x4f5
[   50.280857]  [<ffffffff8111cdad>] ? mount_fs+0x61/0x143
[   50.286148]  [<ffffffff811301ea>] ? vfs_kern_mount+0x62/0xe0
[   50.291875]  [<ffffffff811325a8>] ? do_mount+0x6bd/0x84e
[   50.297255]  [<ffffffff810e5e9d>] ? memdup_user+0x31/0x56
[   50.302724]  [<ffffffff81132970>] ? SyS_mount+0x6f/0x9d
[   50.308019]  [<ffffffff813b2df0>] ? tracesys_phase2+0xd4/0xd9
[   50.313829] Code: 00 00 00 4c 89 f9 4c 89 ee 48 89 04 24 48 8b 13 e8 d5 f6 ff ff 83 f8 f4 75 02 0f 0b 85 c0 0f 85 16 01 00 00 4b 83 3c fe 00 75 02 <0f> 0b 41 83 be 94 00 00 00 01 49 63 c4 75 41 49 83 3c c6 01 0f 
[   50.337268] RIP  [<ffffffffc032271e>] walk_down_proc+0xc5/0x1e3 [btrfs]
[   50.344035]  RSP <ffff88040eb07a68>
[   50.347605] ---[ end trace d38db2664afe0a64 ]---
Comment 16 Petr Janecek 2014-10-26 13:28:44 UTC
Found a way to reproduce the bug by using the kernel tree as
a source of changing data to snapshot, so that anyone should
be able to reproduce it.

Using one tree wasn't reliable to trigger the bug, but with
four copies as in the script below it always ended with segfault
of btrfs balance and BUG in logs (tried four times).


#!/bin/bash

SRC=/mnt/b0/linux    # git clone of Linus' kernel tree is here
MNT=/mnt/b3

mkfs.btrfs -f -L BTR3 -d raid10 -m raid1 -O skinny-metadata /dev/sd[e-h]
mount -L BTR3 $MNT

cd $MNT
mkdir 1 2 3 4
for i in ? ; do
  	btrfs sub cre $MNT/$i/linux
	cd $SRC
	for v in `git tag -l v2.6.2\* v3\* | sort -V` ; do
		echo $i/$v
		git checkout $v
		cd $MNT/$i
		rsync -ax --del --inplace --no-W --exclude .gi* \
			--exclude .mail* $SRC/ linux
		btrfs sub snap -r linux $v
		cd $SRC
	done
done

btrfs dev add /dev/sdp /mnt/b3
btrfs bal start /mnt/b3

Note You need to log in before you can comment on or make changes to this bug.