With an ongoing device remove of 1 hard disk from a RAID10 filesystem, a nightly cron task doing a diff snapshot send | receive combination triggered the following bug:

[55989.296005] ------------[ cut here ]------------
[55989.296412] kernel BUG at fs/btrfs/send.c:5390!
[55989.296823] invalid opcode: 0000 [#1] PREEMPT SMP
[55989.297235] Modules linked in: arc4 ecb md4 md5 nls_utf8 cifs fscache binfmt_misc vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) fuse usb_storage af_packet snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp coretemp snd_hda_intel snd_hda_controller kvm_intel snd_hda_codec kvm snd_hda_core eeepc_wmi crc32_pclmul asus_wmi snd_pcm crc32c_intel iTCO_wdt sparse_keymap rfkill ghash_clmulni_intel snd_hwdep bcache iTCO_vendor_support aesni_intel ablk_helper snd_seq cryptd snd_timer lrw r8169 gf128mul mii glue_helper snd_seq_device mei_me pcspkr aes_x86_64 serio_raw xhci_pci i2c_i801 mei xhci_hcd snd wmi shpchp lpc_ich soundcore mfd_core battery sg dm_mod autofs4 btrfs raid6_pq xor i915 drm_kms_helper drm i2c_algo_bit video fan thermal button processor
[55989.299974] thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh [last unloaded: vboxdrv]
[55989.300971] CPU: 4 PID: 8109 Comm: btrfs Tainted: G O 4.1.6-11-desktop #14
[55989.301467] Hardware name: ASUS All Series/H87M-PRO, BIOS 2101 07/21/2014
[55989.301960] task: ffff880206eee110 ti: ffff880086444000 task.ti: ffff880086444000
[55989.302468] RIP: 0010:[<ffffffffa02cc46c>] [<ffffffffa02cc46c>] changed_cb+0xaec/0xaf0 [btrfs]
[55989.303078] RSP: 0018:ffff880086447ad8 EFLAGS: 00010287
[55989.303671] RAX: 000000000000454a RBX: ffff8803ebb01400 RCX: 0000000000000060
[55989.304268] RDX: ffff880086447bf5 RSI: 000000000000454a RDI: 0000000000000286
[55989.304801] RBP: ffff880086447b88 R08: 0000000000000000 R09: 0000000000000000
[55989.305317] R10: 0000000000000013 R11: 0000000000000004 R12: 0000000000000002
[55989.305829] R13: ffff8801033b3b60 R14: ffff880086447bf5 R15: 0000000000000acd
[55989.306341] FS: 00007f3086fe78c0(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
[55989.306862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55989.307384] CR2: 00007fe4d561d000 CR3: 000000005952e000 CR4: 00000000001406e0
[55989.307949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55989.308480] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[55989.309006] Stack:
[55989.309527] ffff8803b2eea000 0000000000000065 ffff880086447b18 ffff88038ee4f770
[55989.310065] 0000000000000000 ffff880000000000 0000160000000000 ffff88038ee4f770
[55989.310607] ffff880086447b88 0000000000000000 ffff880086447b88 ffffffffa0274a84
[55989.311147] Call Trace:
[55989.311687] [<ffffffffa0274a84>] ? btrfs_get_token_32+0x54/0xe0 [btrfs]
[55989.312241] [<ffffffffa027ffd9>] ? memcmp_extent_buffer+0xb9/0x110 [btrfs]
[55989.312793] [<ffffffffa023a193>] btrfs_compare_trees+0x603/0x730 [btrfs]
[55989.313346] [<ffffffff811c44ec>] ? vfs_write+0x14c/0x1b0
[55989.313905] [<ffffffffa02cb980>] ? process_extent+0x13b0/0x13b0 [btrfs]
[55989.314468] [<ffffffffa02cd24b>] btrfs_ioctl_send+0xddb/0x10d0 [btrfs]
[55989.315025] [<ffffffff8115cb25>] ? __alloc_pages_nodemask+0x165/0x940
[55989.315595] [<ffffffffa029340e>] btrfs_ioctl+0x29e/0x2a30 [btrfs]
[55989.316267] [<ffffffff810928e0>] ? __enqueue_entity+0x70/0x80
[55989.316965] [<ffffffff8109ab77>] ? enqueue_entity+0x4b7/0xde0
[55989.317537] [<ffffffff81349f90>] ? find_next_bit+0x20/0x30
[55989.318381] [<ffffffff8109b934>] ? enqueue_task_fair+0x494/0x7c0
[55989.319374] [<ffffffff8100c859>] ? sched_clock+0x9/0x10
[55989.320098] [<ffffffff81091825>] ? sched_clock_cpu+0x95/0xe0
[55989.320689] [<ffffffff811d74a8>] do_vfs_ioctl+0x2f8/0x510
[55989.321275] [<ffffffff8108e2d5>] ? wake_up_new_task+0x125/0x1d0
[55989.321866] [<ffffffff811e106d>] ? __fget+0x6d/0xa0
[55989.322463] [<ffffffff811d7741>] SyS_ioctl+0x81/0xa0
[55989.323049] [<ffffffff81677872>] system_call_fastpath+0x16/0x75
[55989.323633] Code: fd ff ff 41 83 45 40 01 e9 5f fd ff ff 41 8b 57 40 e9 a1 fc ff ff 4c 89 ff 89 45 88 e8 4e 63 f6 ff 8b 45 88 e9 a6 f5 ff ff 0f 0b <0f> 0b 0f 0b 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc bf 15
[55989.324861] RIP [<ffffffffa02cc46c>] changed_cb+0xaec/0xaf0 [btrfs]
[55989.325458] RSP <ffff880086447ad8>
[55989.328089] ---[ end trace 9142c6f90337ebd3 ]---

After the last reboot at around 2015-09-09 9am UTC:

# btrfs device remove /dev/sdd /local/net

The task is still ongoing and progressing now, 10 hours after the bug occurred.

The distro base is openSUSE 13.1 with updates from the openSUSE update servers and some even newer tooling/objects w.r.t. filesystems: kernel 4.1.6 and btrfs-progs v4.2+20150903.
Some update/background info: I keep nightly read-only backup snapshots for 33 days on the main fs and 100 days on a distant backup-only fs. The device remove operation did complete, but I had to do a forced/hard reboot after it.

Originally, the fs was RAID5, but I wanted larger hard disks and bcache. The balance from RAID5 to RAID10 was very painful and slow, but RAID5 allocation had gotten very slow and was on relatively old disks. I have done several device swaps since this bug (and hit some other bug), but I did not lose data.

As an experiment, I repeated the send | receive backup operation with the particular snapshot, but no kernel bug was hit. Of course, it was not possible to run the same remove/balance operation at the same time.

I have now switched to kernel 4.1.8 and btrfs-progs v4.2+20150922, and the 4-disk + bcache system runs fine.