This was bug 962014 ( https://bugzilla.redhat.com/show_bug.cgi?id=962014 ) reported at RedHat's Bugzilla. I thought moving it to the kernel-bugzilla would make it easier to keep track of it. When manually defragmenting Firefox's places file (~80mb, about 1500 extents) I hit a kernel-bug with 3.9.4 as well as with 3.10.0.rc4: [ 39.762340] [<ffffffffa01644ce>] btrfs_search_old_slot+0x56e/0x8d0 [btrfs] [ 39.762384] [<ffffffffa01dacd1>] __resolve_indirect_refs+0x141/0x5d0 [btrfs] [ 39.762428] [<ffffffffa019df51>] ? mark_extent_buffer_accessed+0x51/0x60 [btrfs] [ 39.762471] [<ffffffffa017cf7c>] ? btree_read_extent_buffer_pages.constprop.56+0xcc/0x120 [btrfs] [ 39.762522] [<ffffffffa01a4d1c>] ? read_extent_buffer+0xbc/0x110 [btrfs] [ 39.762563] [<ffffffffa01db51c>] find_parent_nodes+0x3bc/0xf90 [btrfs] [ 39.763922] [<ffffffffa0163b4f>] ? btrfs_search_slot+0x55f/0x970 [btrfs] [ 39.765285] [<ffffffffa01a3c0f>] ? free_extent_buffer+0x4f/0xa0 [btrfs] [ 39.766646] [<ffffffffa01dc189>] btrfs_find_all_roots+0x99/0x100 [btrfs] [ 39.768004] [<ffffffffa0188b50>] ? can_nocow_odirect+0x2f0/0x2f0 [btrfs] [ 39.769370] [<ffffffffa01dcadf>] iterate_extent_inodes+0x14f/0x320 [btrfs] [ 39.770748] [<ffffffffa0188b50>] ? can_nocow_odirect+0x2f0/0x2f0 [btrfs] [ 39.772126] [<ffffffffa01dcd37>] iterate_inodes_from_logical+0x87/0xa0 [btrfs] [ 39.773502] [<ffffffffa0186a0b>] record_extent_backrefs+0x7b/0xd0 [btrfs] [ 39.774874] [<ffffffffa0191af6>] btrfs_finish_ordered_io+0x156/0x960 [btrfs] [ 39.776246] [<ffffffffa01924e5>] finish_ordered_fn+0x15/0x20 [btrfs] [ 39.777623] [<ffffffffa01b10cb>] worker_loop+0x12b/0x500 [btrfs] [ 39.779002] [<ffffffffa01b0fa0>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [ 39.780377] [<ffffffff81080aa0>] kthread+0xc0/0xd0 [ 39.781747] [<ffffffff810809e0>] ? insert_kthread_work+0x40/0x40 [ 39.783122] [<ffffffff8164f7ac>] ret_from_fork+0x7c/0xb0 [ 39.784455] [<ffffffff810809e0>] ? insert_kthread_work+0x40/0x40
Created attachment 103951 [details] full dmesg
Created attachment 104731 [details] dmesg of 3.1rc5
would it help if I somehow could provide an image of the fs which shows the mentioned behaviour?
A "me too". After this, the CPU gets "stuck" and I have to reboot. [360491.932226] ------------[ cut here ]------------ [360491.932261] kernel BUG at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144! [360491.932312] invalid opcode: 0000 [#1] PREEMPT SMP [360491.932344] Modules linked in: xfs nilfs2 jfs usb_storage nls_iso8859_1 nls_cp437 vfat fat mmc_block nfsv4 auth_rpcgss nfs fscache lockd sunrpc tun snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device fuse xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant iTCO_wdt iTCO_vendor_support mperf coretemp kvm_intel kvm snd_hda_intel snd_hda_codec microcode snd_hwdep snd_pcm thinkpad_acpi snd_timer joydev pcspkr sr_mod snd tpm_tis iwlwifi cdrom e1000e wmi tpm battery ac tpm_bios cfg80211 sdhci_pci soundcore ptp sdhci snd_page_alloc i2c_i801 rfkill pps_core mmc_core lpc_ich mfd_core tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor libcrc32c sha256_generic dm_crypt dm_mod crc32_pclmul ghash_clmulni_intel crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul thermal i915 drm_kms_helper drm i2c_algo_bit video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix [360491.933095] CPU 3 [360491.933110] Pid: 22166, comm: btrfs-endio-wri Not tainted 3.9.6-1.g8ead728-desktop #1 LENOVO 4239CTO/4239CTO [360491.933161] RIP: 0010:[<ffffffffa022075b>] [<ffffffffa022075b>] __tree_mod_log_rewind+0x23b/0x240 [btrfs] [360491.933225] RSP: 0018:ffff88014cad7888 EFLAGS: 00010297 [360491.933253] RAX: 0000000000000000 RBX: ffff880065705b40 RCX: ffff88014cad7828 [360491.933289] RDX: 000000000247cc54 RSI: ffff88014949e821 RDI: ffff8801bf916640 [360491.933326] RBP: ffff880073048490 R08: 0000000000001000 R09: ffff88014cad7838 [360491.933362] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c0556640 [360491.933398] R13: 000000000001731e R14: 000000000000003d R15: ffff880158b76340 [360491.933435] FS: 0000000000000000(0000) GS:ffff88021e2c0000(0000) knlGS:0000000000000000 [360491.933476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [360491.933506] CR2: 00007f0a360017f8 CR3: 000000015be03000 CR4: 00000000000407e0 [360491.933543] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [360491.933579] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [360491.933617] Process btrfs-endio-wri (pid: 22166, threadinfo ffff88014cad6000, task ffff8801a5f8a6c0) [360491.933662] Stack: [360491.933674] ffff88011b573800 ffff88010126e7f0 0000000000000001 0000160000000000 [360491.933719] ffff880073048490 ffffffffa0228d45 6db6db6db6db6db7 ffff8801c0556640 [360491.933763] ffff88020e764000 ffff880000000000 000000000001731e ffff880197cb8158 [360491.933807] Call Trace: [360491.933848] [<ffffffffa0228d45>] btrfs_search_old_slot+0x635/0x950 [btrfs] [360491.933909] [<ffffffffa02a1ec6>] __resolve_indirect_refs+0x156/0x640 [btrfs] [360491.934044] [<ffffffffa02a2e0c>] find_parent_nodes+0x95c/0x1050 [btrfs] [360491.934176] [<ffffffffa02a3592>] btrfs_find_all_roots+0x92/0x100 [btrfs] [360491.934307] [<ffffffffa02a401e>] iterate_extent_inodes+0x16e/0x370 [btrfs] [360491.934440] [<ffffffffa02a42b8>] iterate_inodes_from_logical+0x98/0xc0 [btrfs] [360491.934572] [<ffffffffa024c1c8>] record_extent_backrefs+0x68/0xe0 [btrfs] [360491.934652] [<ffffffffa0256d80>] btrfs_finish_ordered_io+0x150/0x990 [btrfs] [360491.934739] [<ffffffffa0276ef3>] worker_loop+0x153/0x560 [btrfs] [360491.934833] [<ffffffff810697c3>] kthread+0xb3/0xc0 [360491.934864] [<ffffffff815dc6bc>] ret_from_fork+0x7c/0xb0 [360491.934896] DWARF2 unwinder stuck at ret_from_fork+0x7c/0xb0 [360491.934925] [360491.934934] Leftover inexact backtrace: [360491.934934] [360491.934965] [<ffffffff81069710>] ? kthread_create_on_node+0x120/0x120 [360491.934999] Code: c1 48 63 43 58 48 89 c2 48 c1 e2 05 48 8d 54 10 65 48 63 43 2c 48 89 c6 48 c1 e6 05 48 8d 74 30 65 e8 3a af 04 00 e9 b3 fe ff ff <0f> 0b 0f 0b 90 41 57 41 56 41 55 41 54 55 48 89 fd 53 48 83 ec [360491.935188] RIP [<ffffffffa022075b>] __tree_mod_log_rewind+0x23b/0x240 [btrfs] [360491.935233] RSP <ffff88014cad7888> [360491.946047] ---[ end trace 1475a0830dcadf9c ]--- [360491.946051] note: btrfs-endio-wri[22166] exited with preempt_count 1
Created attachment 106051 [details] fix ulist_add_merge crash
Comment on attachment 106051 [details] fix ulist_add_merge crash Could you please try this patch?
Sure, thanks a lot. I'll give it a try after work.
Hi, just "me too" on Debian 3.9 kernel, had to reboot about one hour after starting defrag, because server became totally unresponsive on network and console. dmesg: http://pastebin.com/Ndxypkxa I will try patch asap. And will inform you about result. I'm not sure if I will be able to reproduce it, because I think, that it was caused by some random problem, when running many concurrent tasks. Ondrej
Can you guys try this patch [PATCH] Btrfs: hold the tree mod lock in __tree_mod_log_rewind and verify it fixes the problem?
I managed to try and - for me on 3.9.8 - it does not appear to resolve the issue.
For me it did improve things (3.10.0.rc7) - I can now successfully execute "btrfs fi defrag places.sqlite && sync" without immediate troubles. However I have experienced two hard lock-ups when using the autodefrag option (no caps-lock led, no mouse-pointer) - I'll use this kernel a few hours and report back when I can find the root-cause of the lock-ups.
I can now confirm that defrag works with the patch, just autodefrag leads to lock-ups quite quickly. Is "[PATCH] Btrfs: only do the tree_mod_log_free_eb if this is our last ref" expected to help in this case?
I mis-applied the patch and - so far - defrag does not seem to lock up the machine.
I spoke too soon. After (trying to) defrag a few files.... it hung up again.
Looks like I jumped the gun, there's a second case that needs to be accounted for, can you also apply Btrfs: only do the tree_mod_log_free_eb if this is our last ref and re-test.
With the second patch applied, I'm able to defrag files. I did get a failure-to-unmount at shutdown, however, and this came up when defragmenting a fairly large file (5.4GB, over 10,000 extents): 2013-07-02T10:22:58.757314-05:00 linux-3y3i kernel: [ 437.616176] ------------[ cut here ]------------ 2013-07-02T10:22:58.757336-05:00 linux-3y3i kernel: [ 437.616202] WARNING: at fs/btrfs/inode.c:2157 record_one_backref+0x3c1/0x430 [btrfs]() 2013-07-02T10:22:58.757338-05:00 linux-3y3i kernel: [ 437.616203] Hardware name: 4239CTO 2013-07-02T10:22:58.757339-05:00 linux-3y3i kernel: [ 437.616204] Modules linked in: fuse xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables arc4 iwldvm mac80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_usb_audio mperf snd_pcm coretemp kvm_intel kvm microcode snd_timer snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device pcspkr thinkpad_acpi snd joydev i2c_i801 iwlwifi sr_mod cdrom lpc_ich sdhci_pci mfd_core sdhci cfg80211 e1000e mmc_core rfkill snd_page_alloc ptp pps_core tpm_tis tpm tpm_bios ac battery soundcore wmi tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor libcrc32c sha256_generic dm_crypt dm_mod ghash_clmulni_intel crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul i915 drm_kms_helper thermal drm i2c_algo_bit video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix 2013-07-02T10:22:58.757341-05:00 linux-3y3i kernel: [ 437.616239] Pid: 725, comm: btrfs-endio-wri Not tainted 3.9.8-1.gfccf19c-desktop #3 2013-07-02T10:22:58.757342-05:00 linux-3y3i kernel: [ 437.616240] Call Trace: 2013-07-02T10:22:58.757343-05:00 linux-3y3i kernel: [ 437.616248] [<ffffffff81004728>] dump_trace+0x88/0x300 2013-07-02T10:22:58.757343-05:00 linux-3y3i kernel: [ 437.616253] [<ffffffff815c8e9b>] dump_stack+0x69/0x6f 2013-07-02T10:22:58.757344-05:00 linux-3y3i kernel: [ 437.616258] [<ffffffff81046069>] warn_slowpath_common+0x79/0xc0 2013-07-02T10:22:58.757345-05:00 linux-3y3i kernel: [ 437.616267] [<ffffffffa025edc1>] record_one_backref+0x3c1/0x430 [btrfs] 2013-07-02T10:22:58.757346-05:00 linux-3y3i kernel: [ 437.616303] [<ffffffffa02b40e5>] iterate_extent_inodes+0x1e5/0x370 [btrfs] 2013-07-02T10:22:58.757347-05:00 linux-3y3i kernel: [ 437.616366] [<ffffffffa02b4308>] iterate_inodes_from_logical+0x98/0xc0 [btrfs] 2013-07-02T10:22:58.757347-05:00 linux-3y3i kernel: [ 437.616425] [<ffffffffa025c218>] record_extent_backrefs+0x68/0xe0 [btrfs] 2013-07-02T10:22:58.757348-05:00 linux-3y3i kernel: [ 437.616453] [<ffffffffa0266dd0>] btrfs_finish_ordered_io+0x150/0x990 [btrfs] 2013-07-02T10:22:58.757353-05:00 linux-3y3i kernel: [ 437.616504] [<ffffffffa0286f43>] worker_loop+0x153/0x560 [btrfs] 2013-07-02T10:22:58.758292-05:00 linux-3y3i kernel: [ 437.616561] [<ffffffff810696c3>] kthread+0xb3/0xc0 2013-07-02T10:22:58.758294-05:00 linux-3y3i kernel: [ 437.616566] [<ffffffff815dca7c>] ret_from_fork+0x7c/0xb0 2013-07-02T10:22:58.758295-05:00 linux-3y3i kernel: [ 437.616570] DWARF2 unwinder stuck at ret_from_fork+0x7c/0xb0 2013-07-02T10:22:58.758296-05:00 linux-3y3i kernel: [ 437.616571] 2013-07-02T10:22:58.758297-05:00 linux-3y3i kernel: [ 437.616572] Leftover inexact backtrace: 2013-07-02T10:22:58.758298-05:00 linux-3y3i kernel: [ 437.616572] 2013-07-02T10:22:58.758299-05:00 linux-3y3i kernel: [ 437.616576] [<ffffffff81069610>] ? kthread_create_on_node+0x120/0x120 2013-07-02T10:22:58.758300-05:00 linux-3y3i kernel: [ 437.616578] ---[ end trace 33ef856bcbe1968d ]--- Also, I/O was *stopped* during this time, but eventually recovered.
Created attachment 107423 [details] BTRFS warning at fs/btrfs/inode.c:2206 record_one_backref+0x371/0x3f0()
I see about the same warning with 3.11 kernel quite often even without any heavy IO load (see the attachment https://bugzilla.kernel.org/attachment.cgi?id=107423). Although I can't observe any IO freezes. Let me know what kind of additional info is needed.
It isn't fixed in 3.11, it's fixed in btrfs-next.
sorry, I can no longer test - because I accidentially destroyed my btrfs-partition, while my backups are badly broken. Currently struggling with btrfs restore SIGSEV when I try to get least some data back (bug 61081).
(In reply to Josef Bacik from comment #19) > It isn't fixed in 3.11, it's fixed in btrfs-next. The patch mentioned in comment #15 is present in 3.11, that supposedly fixes the crash. The warning in record_one_backref still seems to happen with 3.11.1 (per bug 60524), but will be fixed in 3.11.2 by patch "Btrfs: reset ret in record_one_backref"
The patches are in 3.11.6