After removing a failed disk in a 12-disk RAID 10 BTRFS pool, I tried to run btrfs device delete missing. Unfortunately, this failed to complete and left the following in my log:

[49421.533240] ------------[ cut here ]------------
[49421.533248] WARNING: CPU: 3 PID: 5055 at fs/btrfs/delayed-ref.c:475 update_existing_ref.isra.2+0x1b5/0x1e0 [btrfs]()
[49421.533248] Modules linked in: macvlan snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel aesni_intel iTCO_wdt snd_hda_controller aes_x86_64 lrw mxm_wmi iTCO_vendor_support gf128mul glue_helper i915 ablk_helper cryptd tg3 snd_hda_codec snd_hwdep snd_pcm ptp mousedev evdev serio_raw drm_kms_helper joydev pcspkr snd_timer pps_core snd libphy mei_me mac_hid lpc_ich i2c_i801 soundcore drm mei intel_gtt i2c_algo_bit i2c_core nuvoton_cir tpm_tis battery rc_core wmi tpm video button ie31200_edac edac_core shpchp processor sch_fq_codel fuse ip_tables x_tables btrfs xor raid6_pq hid_generic sd_mod usbhid hid atkbd libps2 crc32c_intel ahci libahci ehci_pci
[49421.533266] xhci_pci ehci_hcd xhci_hcd libata mpt2sas raid_class scsi_transport_sas usbcore usb_common scsi_mod i8042 serio
[49421.533270] CPU: 3 PID: 5055 Comm: btrfs Tainted: G W 4.0.7-2-ARCH #1
[49421.533271] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
[49421.533271] 0000000000000000 00000000e1956673 ffff88011cf6f658 ffffffff81574ec3
[49421.533272] 0000000000000000 0000000000000000 ffff88011cf6f698 ffffffff81074e7a
[49421.533273] 0000000000000000 ffff88012617abb0 ffff88019157f428 ffff880215592000
[49421.533275] Call Trace:
[49421.533276] [<ffffffff81574ec3>] dump_stack+0x4c/0x6e
[49421.533278] [<ffffffff81074e7a>] warn_slowpath_common+0x8a/0xc0
[49421.533279] [<ffffffff81074faa>] warn_slowpath_null+0x1a/0x20
[49421.533284] [<ffffffffa02cb8a5>] update_existing_ref.isra.2+0x1b5/0x1e0 [btrfs]
[49421.533288] [<ffffffffa02cb9bc>] add_delayed_tree_ref+0xec/0x1b0 [btrfs]
[49421.533293] [<ffffffffa02cc7fe>] btrfs_add_delayed_tree_ref+0x10e/0x180 [btrfs]
[49421.533298] [<ffffffffa0266730>] btrfs_free_extent+0xe0/0x140 [btrfs]
[49421.533302] [<ffffffffa02529f6>] ? btrfs_release_path+0x46/0xb0 [btrfs]
[49421.533307] [<ffffffffa0266abe>] do_walk_down+0x32e/0x9d0 [btrfs]
[49421.533311] [<ffffffffa0264ad2>] ? walk_down_proc+0x312/0x330 [btrfs]
[49421.533316] [<ffffffffa026722d>] walk_down_tree+0xcd/0x110 [btrfs]
[49421.533320] [<ffffffffa026ae3f>] btrfs_drop_snapshot+0x3ff/0x8a0 [btrfs]
[49421.533325] [<ffffffffa02d3c39>] merge_reloc_roots+0xe9/0x280 [btrfs]
[49421.533330] [<ffffffffa02d403e>] relocate_block_group+0x26e/0x720 [btrfs]
[49421.533335] [<ffffffffa02d46c6>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[49421.533340] [<ffffffffa02a797e>] btrfs_relocate_chunk.isra.20+0x3e/0xc0 [btrfs]
[49421.533345] [<ffffffffa02a91d4>] btrfs_balance+0xa04/0xf90 [btrfs]
[49421.533350] [<ffffffffa02b1a59>] btrfs_ioctl_balance+0x169/0x3d0 [btrfs]
[49421.533355] [<ffffffffa02b74e0>] btrfs_ioctl+0x580/0x2950 [btrfs]
[49421.533357] [<ffffffff8116c790>] ? ftrace_raw_output_mm_lru_activate+0x70/0x70
[49421.533358] [<ffffffff8116d9d9>] ? __lru_cache_add+0x79/0x90
[49421.533359] [<ffffffff8116ddbb>] ? lru_cache_add_active_or_unevictable+0x2b/0xb0
[49421.533361] [<ffffffff81190158>] ? handle_mm_fault+0xd88/0x17c0
[49421.533362] [<ffffffff81193c0f>] ? __vma_link_rb+0x6f/0x90
[49421.533363] [<ffffffff81193cf1>] ? vma_link+0xc1/0xd0
[49421.533364] [<ffffffff811ebd06>] do_vfs_ioctl+0x2c6/0x4d0
[49421.533366] [<ffffffff81062cfd>] ? __do_page_fault+0x18d/0x4b0
[49421.533367] [<ffffffff811ebf91>] SyS_ioctl+0x81/0xa0
[49421.533369] [<ffffffff8157a7c9>] system_call_fastpath+0x12/0x17
[49421.533369] ---[ end trace 29449123ffb36fe8 ]---
[49422.498816] ------------[ cut here ]------------
[49422.498834] kernel BUG at fs/btrfs/extent-tree.c:2248!
[49422.498847] invalid opcode: 0000 [#1] PREEMPT SMP
[49422.498862] Modules linked in: macvlan snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel aesni_intel iTCO_wdt snd_hda_controller aes_x86_64 lrw mxm_wmi iTCO_vendor_support gf128mul glue_helper i915 ablk_helper cryptd tg3 snd_hda_codec snd_hwdep snd_pcm ptp mousedev evdev serio_raw drm_kms_helper joydev pcspkr snd_timer pps_core snd libphy mei_me mac_hid lpc_ich i2c_i801 soundcore drm mei intel_gtt i2c_algo_bit i2c_core nuvoton_cir tpm_tis battery rc_core wmi tpm video button ie31200_edac edac_core shpchp processor sch_fq_codel fuse ip_tables x_tables btrfs xor raid6_pq hid_generic sd_mod usbhid hid atkbd libps2 crc32c_intel ahci libahci ehci_pci
[49422.499088] xhci_pci ehci_hcd xhci_hcd libata mpt2sas raid_class scsi_transport_sas usbcore usb_common scsi_mod i8042 serio
[49422.499121] CPU: 3 PID: 5055 Comm: btrfs Tainted: G W 4.0.7-2-ARCH #1
[49422.499138] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
[49422.499159] task: ffff8800c83a3cc0 ti: ffff88011cf6c000 task.ti: ffff88011cf6c000
[49422.499175] RIP: 0010:[<ffffffffa026834a>] [<ffffffffa026834a>] __btrfs_run_delayed_refs+0x10da/0x12c0 [btrfs]
[49422.499205] RSP: 0018:ffff88011cf6f7d8 EFLAGS: 00010202
[49422.499217] RAX: 0000000000004000 RBX: 000015a12fd84000 RCX: ffff8801261afc60
[49422.499233] RDX: 0000000000000000 RSI: ffff88014919fc50 RDI: ffff88014919fc48
[49422.499248] RBP: ffff88011cf6f908 R08: ffff88012616c109 R09: 0000000000000001
[49422.499264] R10: ffffea0006451dc0 R11: 00000000000029f0 R12: ffff88014919fbe0
[49422.499279] R13: 0000000000000000 R14: ffff8800befc8580 R15: ffff880215592000
[49422.499295] FS: 00007f3353fd08c0(0000) GS:ffff88021f380000(0000) knlGS:0000000000000000
[49422.499313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49422.499326] CR2: 00007f0b5e06fe4a CR3: 0000000165b86000 CR4: 00000000001407e0
[49422.499341] Stack:
[49422.499347] 0000000000000001 0000000000000000 ffff880100000001 0000000000000001
[49422.499366] 0000000000000000 ffffffff812e8fbd 000015a126604000 0000000000004000
[49422.499386] ffff880215592000 ffff8800466ad000 ffff88011cf6f848 ffffffffa02600da
[49422.499405] Call Trace:
[49422.499415] [<ffffffff812e8fbd>] ? __percpu_counter_add+0x5d/0xa0
[49422.499433] [<ffffffffa02600da>] ? add_pinned_bytes+0x4a/0x60 [btrfs]
[49422.499452] [<ffffffffa0268cd7>] ? walk_up_proc+0xd7/0x500 [btrfs]
[49422.499470] [<ffffffffa026c893>] btrfs_run_delayed_refs.part.35+0x73/0x270 [btrfs]
[49422.499491] [<ffffffffa026caa5>] btrfs_run_delayed_refs+0x15/0x30 [btrfs]
[49422.499511] [<ffffffffa027d218>] btrfs_should_end_transaction+0x58/0x60 [btrfs]
[49422.499531] [<ffffffffa026ae95>] btrfs_drop_snapshot+0x455/0x8a0 [btrfs]
[49422.499551] [<ffffffffa02d3c39>] merge_reloc_roots+0xe9/0x280 [btrfs]
[49422.499570] [<ffffffffa02d403e>] relocate_block_group+0x26e/0x720 [btrfs]
[49422.499589] [<ffffffffa02d46c6>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[49422.499610] [<ffffffffa02a797e>] btrfs_relocate_chunk.isra.20+0x3e/0xc0 [btrfs]
[49422.499631] [<ffffffffa02a91d4>] btrfs_balance+0xa04/0xf90 [btrfs]
[49422.499650] [<ffffffffa02b1a59>] btrfs_ioctl_balance+0x169/0x3d0 [btrfs]
[49422.499669] [<ffffffffa02b74e0>] btrfs_ioctl+0x580/0x2950 [btrfs]
[49422.499684] [<ffffffff8116c790>] ? ftrace_raw_output_mm_lru_activate+0x70/0x70
[49422.499701] [<ffffffff8116d9d9>] ? __lru_cache_add+0x79/0x90
[49422.499714] [<ffffffff8116ddbb>] ? lru_cache_add_active_or_unevictable+0x2b/0xb0
[49422.499731] [<ffffffff81190158>] ? handle_mm_fault+0xd88/0x17c0
[49422.499746] [<ffffffff81193c0f>] ? __vma_link_rb+0x6f/0x90
[49422.499759] [<ffffffff81193cf1>] ? vma_link+0xc1/0xd0
[49422.499772] [<ffffffff811ebd06>] do_vfs_ioctl+0x2c6/0x4d0
[49422.499785] [<ffffffff81062cfd>] ? __do_page_fault+0x18d/0x4b0
[49422.499799] [<ffffffff811ebf91>] SyS_ioctl+0x81/0xa0
[49422.499812] [<ffffffff8157a7c9>] system_call_fastpath+0x12/0x17
[49422.499825] Code: a5 48 ff ff ff 4c 8b ad 40 ff ff ff 44 8b 95 38 ff ff ff 65 ff 0d 1f 35 da 5f 74 05 e9 10 f9 ff ff e8 ec 52 06 e1 e9 06 f9 ff ff <0f> 0b 3c b6 0f 84 5c fc ff ff 3c b8 0f 85 c4 f6 ff ff 48 8b 3d
[49422.499930] RIP [<ffffffffa026834a>] __btrfs_run_delayed_refs+0x10da/0x12c0 [btrfs]
[49422.499952] RSP <ffff88011cf6f7d8>
[49422.505196] ---[ end trace 29449123ffb36fe9 ]---
[49450.519456] ------------[ cut here ]------------
[49450.519489] kernel BUG at fs/btrfs/extent-tree.c:2248!
[49450.519518] invalid opcode: 0000 [#2] PREEMPT SMP
[49450.519553] Modules linked in: macvlan snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel aesni_intel iTCO_wdt snd_hda_controller aes_x86_64 lrw mxm_wmi iTCO_vendor_support gf128mul glue_helper i915 ablk_helper cryptd tg3 snd_hda_codec snd_hwdep snd_pcm ptp mousedev evdev serio_raw drm_kms_helper joydev pcspkr snd_timer pps_core snd libphy mei_me mac_hid lpc_ich i2c_i801 soundcore drm mei intel_gtt i2c_algo_bit i2c_core nuvoton_cir tpm_tis battery rc_core wmi tpm video button ie31200_edac edac_core shpchp processor sch_fq_codel fuse ip_tables x_tables btrfs xor raid6_pq hid_generic sd_mod usbhid hid atkbd libps2 crc32c_intel ahci libahci ehci_pci
[49450.520095] xhci_pci ehci_hcd xhci_hcd libata mpt2sas raid_class scsi_transport_sas usbcore usb_common scsi_mod i8042 serio
[49450.520173] CPU: 2 PID: 2918 Comm: btrfs-transacti Tainted: G D W 4.0.7-2-ARCH #1
[49450.520218] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
[49450.520270] task: ffff880037b4e540 ti: ffff880126a14000 task.ti: ffff880126a14000
[49450.520310] RIP: 0010:[<ffffffffa026834a>] [<ffffffffa026834a>] __btrfs_run_delayed_refs+0x10da/0x12c0 [btrfs]
[49450.520378] RSP: 0018:ffff880126a17c08 EFLAGS: 00010202
[49450.520407] RAX: 0000000000004000 RBX: 000015a12fd84000 RCX: ffff8801260f0370
[49450.520445] RDX: 0000000000000000 RSI: ffff88013b6af6b0 RDI: ffff88013b6af6a8
[49450.520483] RBP: ffff880126a17d38 R08: ffff880206c0e9a1 R09: 0000000000000001
[49450.523195] R10: ffffea0007e38140 R11: 00000000000029f0 R12: ffff88013b6af640
[49450.525907] R13: 0000000000000000 R14: ffff880034fe4420 R15: ffff8802154d4210
[49450.528623] FS: 0000000000000000(0000) GS:ffff88021f300000(0000) knlGS:0000000000000000
[49450.531174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49450.533616] CR2: 000000000226a724 CR3: 000000000180b000 CR4: 00000000001407e0
[49450.536004] Stack:
[49450.538310] 0000000000000001 0000000000000000 ffff880100000001 0000000000000001
[49450.540601] 0000000000000000 ffff880037b4e5a8 ffff88021f293eb0 ffff880037b4e5a8
[49450.542999] ffff88021f293eb0 0000000000000001 ffff880126a17ca8 ffffffff810abedb
[49450.545381] Call Trace:
[49450.547739] [<ffffffff810abedb>] ? dequeue_entity+0x13b/0x5c0
[49450.550082] [<ffffffff810145a8>] ? __switch_to+0xe8/0x610
[49450.552424] [<ffffffffa026c893>] btrfs_run_delayed_refs.part.35+0x73/0x270 [btrfs]
[49450.554795] [<ffffffffa026caa5>] btrfs_run_delayed_refs+0x15/0x30 [btrfs]
[49450.557144] [<ffffffffa027d882>] btrfs_commit_transaction+0x52/0xc10 [btrfs]
[49450.559503] [<ffffffffa027e4d5>] ? start_transaction+0x95/0x5a0 [btrfs]
[49450.561840] [<ffffffffa02793a5>] transaction_kthread+0x1d5/0x240 [btrfs]
[49450.564185] [<ffffffffa02791d0>] ? btrfs_cleanup_transaction+0x5a0/0x5a0 [btrfs]
[49450.566520] [<ffffffff810934b8>] kthread+0xd8/0xf0
[49450.568809] [<ffffffff810933e0>] ? kthread_worker_fn+0x170/0x170
[49450.571093] [<ffffffff8157a718>] ret_from_fork+0x58/0x90
[49450.573378] [<ffffffff810933e0>] ? kthread_worker_fn+0x170/0x170
[49450.575638] Code: a5 48 ff ff ff 4c 8b ad 40 ff ff ff 44 8b 95 38 ff ff ff 65 ff 0d 1f 35 da 5f 74 05 e9 10 f9 ff ff e8 ec 52 06 e1 e9 06 f9 ff ff <0f> 0b 3c b6 0f 84 5c fc ff ff 3c b8 0f 85 c4 f6 ff ff 48 8b 3d
[49450.578235] RIP [<ffffffffa026834a>] __btrfs_run_delayed_refs+0x10da/0x12c0 [btrfs]
[49450.580710] RSP <ffff880126a17c08>
[49450.583207] ---[ end trace 29449123ffb36fea ]---

Several dozen instances of the first warning precede this in the log.

I have also tried btrfs scrub, which completed with no errors, and btrfs check, which reported no errors until checking cgroups; some minutes after that point it stopped reading the disk and consumed 100% of a CPU core. btrfs balance segfaulted and produced the same log output.

Other information: the first time I tried to remove the failed device, I was on an older kernel (4.0.4, I believe) and had a simultaneous failure of the system disk (also btrfs, but not part of the same pool) that hung the machine.

Is there anything else I can do to troubleshoot either this problem or the array?
I was finally able to delete the missing device by deleting all snapshots and waiting for btrfs-cleaner to finish. It is unclear to me whether the issue was caused by the snapshots themselves, or whether the bad data just happened to be contained in the snapshots.
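For anyone who wants to try the same workaround, the steps can be sketched roughly as below. This is only a sketch under assumptions: the function names and the mount point argument are hypothetical, it assumes the top-level subvolume is what is mounted (so the paths printed by btrfs subvolume list are relative to the mount point), and it assumes a btrfs-progs version that provides btrfs subvolume sync. It deletes every snapshot on the filesystem, so do not run it unless losing them is acceptable.

```shell
#!/bin/sh
# Sketch only: function names and the mount point argument are hypothetical.

# Read `btrfs subvolume list -s` output on stdin and print only the
# snapshot path, i.e. everything after the first " path " on each line.
snapshot_paths() {
    awk '{ i = index($0, " path "); if (i) print substr($0, i + 6) }'
}

# DESTRUCTIVE: deletes every snapshot under the given mount point, waits
# for the deleted subvolumes to be cleaned up, then retries device removal.
remove_missing_after_snapshot_cleanup() {
    mnt=$1
    # -s restricts the listing to snapshots.
    btrfs subvolume list -s "$mnt" | snapshot_paths |
    while IFS= read -r snap; do
        btrfs subvolume delete "$mnt/$snap"
    done
    # Block until all deleted subvolumes have been completely removed.
    btrfs subvolume sync "$mnt"
    # Retry the operation that originally failed.
    btrfs device delete missing "$mnt"
}
```

Once the functions are loaded into a shell, the call would look like remove_missing_after_snapshot_cleanup /mnt/pool, where /mnt/pool is a placeholder for wherever the degraded pool is mounted. Nothing runs until that function is called explicitly.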