Created attachment 292315
dmesg output

This is about a system running Ubuntu 20.04, uname -a says:

Linux ida.rooot.de 5.4.0-45-generic #49-Ubuntu SMP Wed Aug 26 13:38:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

That's a kernel package that was released 31 Aug.

I can reproduce the bug by running

btrfs balance start /home -musage=70

After a few seconds, the command terminates with a segfault, dmesg output is attached. The core lines in dmesg seem to be:

[32387.616249] kernel BUG at fs/btrfs/relocation.c:437!
[32387.616271] invalid opcode: 0000 [#1] SMP PTI
[32387.616180] BTRFS error (device sdc2): couldn't find block (4853431877632) (level 1) in tree (19918) with key (986 96 124)

There is nothing about an IO error in dmesg, so to me as a layman this doesn't look like physical damage. Yet when I repeat the command, it always names the same block.
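To confirm that repeated runs really name the same block, the relevant fields can be pulled out of the dmesg lines mechanically. The following is only a minimal sketch: the regular expression is modelled on the single message quoted above, and the function names (parse_missing_block, same_block_every_time) are hypothetical helpers, not part of btrfs-progs.

```python
import re

# Pattern modelled on the one "couldn't find block" message quoted in this
# report; other kernel versions may word the message differently.
MISSING_BLOCK_RE = re.compile(
    r"BTRFS error \(device (?P<device>[^)]+)\): couldn't find block "
    r"\((?P<bytenr>\d+)\) \(level (?P<level>\d+)\) in tree \((?P<tree>\d+)\)"
)


def parse_missing_block(line):
    """Return device, block number, level and tree id, or None if no match."""
    match = MISSING_BLOCK_RE.search(line)
    if match is None:
        return None
    return {
        "device": match["device"],
        "bytenr": int(match["bytenr"]),
        "level": int(match["level"]),
        "tree": int(match["tree"]),
    }


def same_block_every_time(lines):
    """True if all matching lines name exactly one distinct block number."""
    parsed = (parse_missing_block(line) for line in lines)
    blocks = {entry["bytenr"] for entry in parsed if entry is not None}
    return len(blocks) == 1
```

Feeding it the output of dmesg after each failed balance run would show whether the block number changes between attempts.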
Here are the filesystem stats:

root@ida:~# btrfs fi show /
Label: 'BTRFS_RAID'  uuid: b80344d6-ae49-4557-bd4d-641d0afcda3e
        Total devices 4 FS bytes used 410.91GiB
        devid    2 size 460.94GiB used 214.03GiB path /dev/sdc2
        devid    3 size 460.94GiB used 214.00GiB path /dev/sdd2
        devid    5 size 460.90GiB used 214.03GiB path /dev/sdb2
        devid    6 size 460.94GiB used 202.06GiB path /dev/sda2

btrfs-progs package version is 5.4.1-2
btrfs fi usage /
Overall:
    Device size:                   1.80TiB
    Device allocated:            844.12GiB
    Device unallocated:          999.59GiB
    Device missing:                  0.00B
    Used:                        821.95GiB
    Free (estimated):            508.97GiB      (min: 508.97GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:415.00GiB, Used:405.82GiB (97.79%)
   /dev/sdc2     212.00GiB
   /dev/sdd2     212.00GiB
   /dev/sdb2     209.00GiB
   /dev/sda2     197.00GiB

Metadata,RAID1: Size:7.00GiB, Used:5.15GiB (73.62%)
   /dev/sdc2       2.00GiB
   /dev/sdd2       2.00GiB
   /dev/sdb2       5.00GiB
   /dev/sda2       5.00GiB

System,RAID1: Size:64.00MiB, Used:96.00KiB (0.15%)
   /dev/sdc2      32.00MiB
   /dev/sdb2      32.00MiB
   /dev/sda2      64.00MiB

Unallocated:
   /dev/sdc2     246.91GiB
   /dev/sdd2     246.94GiB
   /dev/sdb2     246.87GiB
   /dev/sda2     258.88GiB
Just wanted to say I've run into the exact same issue.

[1133936.776355] BTRFS error (device sdh): couldn't find block (235170781286400) (level 1) in tree (7) with key (7869 99 12 786708)
[1133936.776549] ------------[ cut here ]------------
[1133936.776552] kernel BUG at fs/btrfs/relocation.c:437!
[1133936.776612] invalid opcode: 0000 [#1] SMP PTI
[1133936.776657] CPU: 2 PID: 1259511 Comm: btrfs Tainted: G W 5.4.0-52-generic #57-Ubuntu
[1133936.776716] Hardware name: LENOVO ThinkServer TS440/ThinkServer TS440, BIOS FBKTD0AUS 06/12/2018
[1133936.776818] RIP: 0010:remove_backref_node+0x139/0x140 [btrfs]
[1133936.776859] Code: 41 5f 5d c3 49 8b 96 d0 00 00 00 48 8b 75 c8 49 89 86 d0 00 00 00 48 89 73 50 48 89 53 58 48 89 02 80 4b 71 02 e9 38 ff ff ff <0f> 0b c3 0f 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54
[1133936.776974] RSP: 0018:ffffaab2411df928 EFLAGS: 00010206
[1133936.777011] RAX: ffff946499a3a100 RBX: ffff9467c6425680 RCX: ffff9467c64253c0
[1133936.777057] RDX: 0000000000a11c29 RSI: cc071d12e7391895 RDI: 000000000002f0a0
[1133936.777103] RBP: ffffaab2411df960 R08: ffff94687ea978c8 R09: 0000000000010101
[1133936.777149] R10: ffff94686fa5e830 R11: 0000000000000001 R12: ffff9467c6425380
[1133936.777196] R13: dead000000000100 R14: ffff9463f21fb020 R15: dead000000000122
[1133936.777243] FS: 00007f3f603818c0(0000) GS:ffff94687ea80000(0000) knlGS:0000000000000000
[1133936.777296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1133936.777334] CR2: 00007f4ff9481fb8 CR3: 0000000186acc001 CR4: 00000000001606e0
[1133936.777381] Call Trace:
[1133936.777436] build_backref_tree+0x234/0x10e0 [btrfs]
[1133936.777502] relocate_tree_blocks+0x121/0x630 [btrfs]
[1133936.777540] ? _cond_resched+0x19/0x30
[1133936.777596] ? tree_insert+0x55/0x60 [btrfs]
[1133936.777654] relocate_block_group+0x224/0x5f0 [btrfs]
[1133936.777723] btrfs_relocate_block_group+0x15e/0x300 [btrfs]
[1133936.777808] btrfs_relocate_chunk+0x2a/0x90 [btrfs]
[1133936.777872] __btrfs_balance+0x409/0xa50 [btrfs]
[1133936.777932] btrfs_balance+0x386/0x550 [btrfs]
[1133936.777992] btrfs_ioctl_balance+0x2c1/0x380 [btrfs]
[1133936.778056] btrfs_ioctl+0x836/0x20d0 [btrfs]
[1133936.778090] ? do_anonymous_page+0x2e6/0x650
[1133936.778122] ? __handle_mm_fault+0x760/0x7a0
[1133936.778155] do_vfs_ioctl+0x407/0x670
[1133936.780465] ? do_vfs_ioctl+0x407/0x670
[1133936.780465] ? do_vfs_ioctl+0x407/0x670
[1133936.782753] ? do_user_addr_fault+0x216/0x450
[1133936.785046] ksys_ioctl+0x67/0x90
[1133936.787319] __x64_sys_ioctl+0x1a/0x20
[1133936.789533] do_syscall_64+0x57/0x190
[1133936.792137] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1133936.794978] RIP: 0033:0x7f3f6049b50b
[1133936.797794] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
[1133936.803711] RSP: 002b:00007ffcabbef918 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[1133936.807004] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f3f6049b50b
[1133936.810523] RDX: 00007ffcabbef9a0 RSI: 00000000c4009420 RDI: 0000000000000003
[1133936.814016] RBP: 0000000000000003 R08: 0000558a165ce6b0 R09: 000000000000007c
[1133936.817566] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffcabbf18e1
[1133936.821145] R13: 00007ffcabbef9a0 R14: 0000000000000000 R15: 0000000000000000
[1133936.824599] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid wireguard ip6_udp_tunnel udp_tunnel veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc nfsv3 aufs rpcsec_gss_krb5 nfsv4 nfs fscache overlay nls_iso8859_1 mei_hdcp intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd i2c_algo_bit fb_sys_fops cryptd glue_helper syscopyarea rapl sysfillrect mei_me wmi_bmof intel_cstate input_leds sysimgblt mei ie31200_edac mac_hid sch_fq_codel nfsd auth_rpcgss lp nfs_acl lockd parport grace drm sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c hid_generic usbhid hid ahci crc32_pclmul libahci i2c_i801 e1000e lpc_ich megaraid_sas wmi video
[1133936.851100] ---[ end trace 44e625a057f42c07 ]---
[1133937.009968] RIP: 0010:remove_backref_node+0x139/0x140 [btrfs]
[1133937.009971] Code: 41 5f 5d c3 49 8b 96 d0 00 00 00 48 8b 75 c8 49 89 86 d0 00 00 00 48 89 73 50 48 89 53 58 48 89 02 80 4b 71 02 e9 38 ff ff ff <0f> 0b c3 0f 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54
[1133937.013183] RSP: 0018:ffffaab2411df928 EFLAGS: 00010206
[1133937.014160] RAX: ffff946499a3a100 RBX: ffff9467c6425680 RCX: ffff9467c64253c0
[1133937.015184] RDX: 0000000000a11c29 RSI: cc071d12e7391895 RDI: 000000000002f0a0
[1133937.016247] RBP: ffffaab2411df960 R08: ffff94687ea978c8 R09: 0000000000010101
[1133937.017262] R10: ffff94686fa5e830 R11: 0000000000000001 R12: ffff9467c6425380
[1133937.018248] R13: dead000000000100 R14: ffff9463f21fb020 R15: dead000000000122
[1133937.019254] FS: 00007f3f603818c0(0000) GS:ffff94687ea80000(0000) knlGS:0000000000000000
[1133937.020344] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1133937.021293] CR2: 00007f4ff9481fb8 CR3: 0000000186acc001 CR4: 00000000001606e0
(In reply to t.d.richmond from comment #3)
> Just wanted to say I've run into the exact same issue.

If it is really the same issue (I haven't checked), I can tell you that my way of solving it was fairly easy: an offline btrfs check --repair with the newest available version of btrfs-progs. Evidently, btrfs check can detect and fix issues that btrfs scrub can't. But again, you need to run it offline, with the filesystem unmounted. And before running btrfs check with the repair option, I guess it makes sense to run a read-only check first and let the btrfs developers on the mailing list or in the #btrfs IRC channel have a look at the output.
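The order suggested above (unmount, read-only check, then repair only after someone has reviewed the output) can be sketched as follows. This is a rough illustration, not a recommended tool: it assumes the repair step means btrfs check --repair, the helper names are hypothetical, and /dev/sdX2 is a placeholder device. It only builds the command lines and does not run anything.

```python
def is_mounted(device, mounts_text):
    """Return True if `device` is listed as a source in /proc/mounts-style text.

    Limitation: for a multi-device btrfs filesystem, /proc/mounts normally
    lists only one member device, so a False result for another member does
    not prove that the filesystem is unmounted.
    """
    for entry in mounts_text.splitlines():
        fields = entry.split()
        if fields and fields[0] == device:
            return True
    return False


def check_command(device, repair=False):
    """Build the argument list for a read-only check or a repair run."""
    mode = "--repair" if repair else "--readonly"
    return ["btrfs", "check", mode, device]


def plan_check(device, mounts_text, repair=False):
    """Refuse to plan a check for a device that appears to be mounted."""
    if is_mounted(device, mounts_text):
        raise RuntimeError(f"{device} appears to be mounted; unmount it first")
    return check_command(device, repair=repair)
```

A caller would read /proc/mounts, call plan_check with repair=False first, and pass repair=True only after the read-only output has been looked at.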
Unfortunately I had just finished doing an offline repair at the advice of the mailing group. I do think it fixed a lot of the issues I was having, but I'm still getting these segfaults while trying to rebalance.
(In reply to t.d.richmond from comment #5)
> Unfortunately I had just finished doing an offline repair at the advice of
> the mailing group. I do think it fixed a lot of the issues I was having, but
> I'm still getting these segfaults while trying to rebalance.

Which version of btrfs-progs did you use for the offline repair? I was told in no uncertain terms that I must use the latest one. What kernel version are you using?
btrfs-progs 5.9
Kernel 5.4.0-52-generic