Bug 104371 - kernel BUG at fs/btrfs/send.c:5390!
Summary: kernel BUG at fs/btrfs/send.c:5390!
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Josef Bacik
Reported: 2015-09-10 10:34 UTC by Henk Slager
Modified: 2016-03-20 10:01 UTC
Kernel Version: 4.1.6
Tree: Mainline
Regression: No


Description Henk Slager 2015-09-10 10:34:25 UTC
While a device remove of 1 hard disk from a RAID10 filesystem was ongoing, a nightly cron task doing an incremental snapshot send | receive triggered the following bug:

[55989.296005] ------------[ cut here ]------------
[55989.296412] kernel BUG at fs/btrfs/send.c:5390!
[55989.296823] invalid opcode: 0000 [#1] PREEMPT SMP 
[55989.297235] Modules linked in: arc4 ecb md4 md5 nls_utf8 cifs fscache binfmt_misc vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) fuse usb_storage af_packet snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp coretemp snd_hda_intel snd_hda_controller kvm_intel snd_hda_codec kvm snd_hda_core eeepc_wmi crc32_pclmul asus_wmi snd_pcm crc32c_intel iTCO_wdt sparse_keymap rfkill ghash_clmulni_intel snd_hwdep bcache iTCO_vendor_support aesni_intel ablk_helper snd_seq cryptd snd_timer lrw r8169 gf128mul mii glue_helper snd_seq_device mei_me pcspkr aes_x86_64 serio_raw xhci_pci i2c_i801 mei xhci_hcd snd wmi shpchp lpc_ich soundcore mfd_core battery sg dm_mod autofs4 btrfs raid6_pq xor i915 drm_kms_helper drm i2c_algo_bit video fan thermal button processor
[55989.299974]  thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh [last unloaded: vboxdrv]
[55989.300971] CPU: 4 PID: 8109 Comm: btrfs Tainted: G           O    4.1.6-11-desktop #14
[55989.301467] Hardware name: ASUS All Series/H87M-PRO, BIOS 2101 07/21/2014
[55989.301960] task: ffff880206eee110 ti: ffff880086444000 task.ti: ffff880086444000
[55989.302468] RIP: 0010:[<ffffffffa02cc46c>]  [<ffffffffa02cc46c>] changed_cb+0xaec/0xaf0 [btrfs]
[55989.303078] RSP: 0018:ffff880086447ad8  EFLAGS: 00010287
[55989.303671] RAX: 000000000000454a RBX: ffff8803ebb01400 RCX: 0000000000000060
[55989.304268] RDX: ffff880086447bf5 RSI: 000000000000454a RDI: 0000000000000286
[55989.304801] RBP: ffff880086447b88 R08: 0000000000000000 R09: 0000000000000000
[55989.305317] R10: 0000000000000013 R11: 0000000000000004 R12: 0000000000000002
[55989.305829] R13: ffff8801033b3b60 R14: ffff880086447bf5 R15: 0000000000000acd
[55989.306341] FS:  00007f3086fe78c0(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
[55989.306862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55989.307384] CR2: 00007fe4d561d000 CR3: 000000005952e000 CR4: 00000000001406e0
[55989.307949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[55989.308480] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[55989.309006] Stack:
[55989.309527]  ffff8803b2eea000 0000000000000065 ffff880086447b18 ffff88038ee4f770
[55989.310065]  0000000000000000 ffff880000000000 0000160000000000 ffff88038ee4f770
[55989.310607]  ffff880086447b88 0000000000000000 ffff880086447b88 ffffffffa0274a84
[55989.311147] Call Trace:
[55989.311687]  [<ffffffffa0274a84>] ? btrfs_get_token_32+0x54/0xe0 [btrfs]
[55989.312241]  [<ffffffffa027ffd9>] ? memcmp_extent_buffer+0xb9/0x110 [btrfs]
[55989.312793]  [<ffffffffa023a193>] btrfs_compare_trees+0x603/0x730 [btrfs]
[55989.313346]  [<ffffffff811c44ec>] ? vfs_write+0x14c/0x1b0
[55989.313905]  [<ffffffffa02cb980>] ? process_extent+0x13b0/0x13b0 [btrfs]
[55989.314468]  [<ffffffffa02cd24b>] btrfs_ioctl_send+0xddb/0x10d0 [btrfs]
[55989.315025]  [<ffffffff8115cb25>] ? __alloc_pages_nodemask+0x165/0x940
[55989.315595]  [<ffffffffa029340e>] btrfs_ioctl+0x29e/0x2a30 [btrfs]
[55989.316267]  [<ffffffff810928e0>] ? __enqueue_entity+0x70/0x80
[55989.316965]  [<ffffffff8109ab77>] ? enqueue_entity+0x4b7/0xde0
[55989.317537]  [<ffffffff81349f90>] ? find_next_bit+0x20/0x30
[55989.318381]  [<ffffffff8109b934>] ? enqueue_task_fair+0x494/0x7c0
[55989.319374]  [<ffffffff8100c859>] ? sched_clock+0x9/0x10
[55989.320098]  [<ffffffff81091825>] ? sched_clock_cpu+0x95/0xe0
[55989.320689]  [<ffffffff811d74a8>] do_vfs_ioctl+0x2f8/0x510
[55989.321275]  [<ffffffff8108e2d5>] ? wake_up_new_task+0x125/0x1d0
[55989.321866]  [<ffffffff811e106d>] ? __fget+0x6d/0xa0
[55989.322463]  [<ffffffff811d7741>] SyS_ioctl+0x81/0xa0
[55989.323049]  [<ffffffff81677872>] system_call_fastpath+0x16/0x75
[55989.323633] Code: fd ff ff 41 83 45 40 01 e9 5f fd ff ff 41 8b 57 40 e9 a1 fc ff ff 4c 89 ff 89 45 88 e8 4e 63 f6 ff 8b 45 88 e9 a6 f5 ff ff 0f 0b <0f> 0b 0f 0b 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc bf 15 
[55989.324861] RIP  [<ffffffffa02cc46c>] changed_cb+0xaec/0xaf0 [btrfs]
[55989.325458]  RSP <ffff880086447ad8>
[55989.328089] ---[ end trace 9142c6f90337ebd3 ]---



After the last reboot, at around 2015-09-09 9am UTC, I started:

# btrfs device remove /dev/sdd /local/net

The task is still ongoing and progressing now, 10 hours after the bug occurred.
The distro base is openSUSE 13.1 with updates from the openSUSE update servers, plus some even newer filesystem-related tooling/objects.

Kernel 4.1.6 and btrfs-progs v4.2+20150903
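
For context, a nightly incremental send | receive job of the kind described above might look roughly like the sketch below. This is only an illustration: the subvolume, the snapshot directory, the backup mount point, and the snapshot names are hypothetical placeholders, not taken from the affected system, and the function nightly_backup_cmds is made up for this example. It prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: every path and snapshot name below is a hypothetical
# placeholder, not taken from the reporter's system.
SRC_SUBVOL="/local/net/data"        # assumed subvolume on the main filesystem
SNAP_DIR="/local/net/snapshots"     # assumed directory holding read-only snapshots
DEST_DIR="/backup/snapshots"        # assumed mount point of the backup-only filesystem

# Print the commands for one nightly run instead of executing them,
# so the sketch can be inspected safely.
# $1 = name of the previous snapshot, $2 = name of the new snapshot.
nightly_backup_cmds() {
    prev=$1
    new=$2
    # 1. Take a new read-only snapshot (btrfs send requires read-only snapshots).
    printf 'btrfs subvolume snapshot -r %s %s/%s\n' "$SRC_SUBVOL" "$SNAP_DIR" "$new"
    # 2. Send only the difference relative to the previous snapshot (-p = parent).
    printf 'btrfs send -p %s/%s %s/%s | btrfs receive %s\n' \
        "$SNAP_DIR" "$prev" "$SNAP_DIR" "$new" "$DEST_DIR"
}

nightly_backup_cmds 2015-09-09 2015-09-10
```

The crash in the trace above is in the incremental (parent-based) send path, reached through btrfs_compare_trees and changed_cb, which is the path the `btrfs send -p` form exercises.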
Comment 1 Henk Slager 2015-09-24 13:35:28 UTC
Some update/background info: I keep nightly read-only backup snapshots for 33 days on the main fs and for 100 days on a distant backup-only fs.

The device remove operation did complete, but I had to do a forced/hard reboot after it. Originally, the fs was RAID5, but I wanted larger hard disks and bcache. The balance from RAID5 to RAID10 was very painful and slow, but the RAID5 allocation had gotten very slow and was on relatively old disks. I have swapped devices several times since this bug (and after hitting some other bug), but I did not lose data.

As an experiment, I repeated the send | receive backup operation with that particular snapshot, but the kernel bug did not recur. Of course, it was not possible to run the same remove/balance operation at the same time.

I have now switched to kernel 4.1.8 and btrfs-progs v4.2+20150922, and the 4 disks + bcache system runs fine.
