Description of problem: ran a NFS-trace replay tool to reproduce other issues with BTRFS, when this happend. Only occured once, haven't been successful in trying to reproduce it. Additional info: reporter: libreport-2.1.4 WARNING: at kernel/lockdep.c:708 __lock_acquire+0x1a50/0x1b30() Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables rfcomm ip6table_filter ip6_tables bnep mmc_block iTCO_wdt iTCO_vendor_support dell_wmi sparse_keymap ppdev snd_hda_codec_hdmi snd_hda_codec_idt mperf coretemp kvm_intel kvm dell_laptop dcdbas microcode arc4 cdc_mbim uvcvideo iwldvm videobuf2_vmalloc videobuf2_memops videobuf2_core mac80211 cdc_ncm videodev usbnet media mii cdc_wdm cdc_acm iwlwifi btusb bluetooth snd_hda_intel cfg80211 sdhci_pci sdhci rfkill i2c_i801 lpc_ich mmc_core snd_hda_codec mfd_core snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer snd soundcore ptp pps_core wmi parport_pc parport binfmt_misc uinput dm_crypt btrfs libcrc32c xor zlib_deflate crc32_pclmul crc32c_intel raid6_pq ghash_clmulni_intel i915 i2c_algo_bit drm_kms_helper drm i2c_core video CPU: 2 PID: 2718 Comm: btrfs-cleaner Not tainted 3.10.0-0.rc2.git3.1.fc20.x86_64 #1 Hardware name: Dell Inc. Latitude E6320/050F4T, BIOS A08 10/18/2011 0000000000000009 ffff8801c288ba38 ffffffff817163eb ffff8801c288ba70 ffffffff810676d1 0000000000000002 ffff88021caca5a0 ffffffff8261ad80 ffff8800938dbd90 0000000000000000 ffff8801c288ba80 ffffffff810677aa Call Trace: [<ffffffff817163eb>] dump_stack+0x19/0x1b [<ffffffff810676d1>] warn_slowpath_common+0x61/0x80 [<ffffffff810677aa>] warn_slowpath_null+0x1a/0x20 [<ffffffff810db3e0>] __lock_acquire+0x1a50/0x1b30 [<ffffffff8101fd69>] ? sched_clock+0x9/0x10 [<ffffffff810acb15>] ? sched_clock_cpu+0xb5/0x100 [<ffffffff810d9c85>] ? __lock_acquire+0x2f5/0x1b30 [<ffffffff8101fd69>] ? sched_clock+0x9/0x10 [<ffffffff810acb15>] ? sched_clock_cpu+0xb5/0x100 [<ffffffff810dbcb2>] lock_acquire+0xa2/0x1f0 [<ffffffffa01dc654>] ? btrfs_tree_lock+0x104/0x270 [btrfs] [<ffffffff8171e536>] _raw_write_lock+0x46/0x80 [<ffffffffa01dc654>] ? btrfs_tree_lock+0x104/0x270 [btrfs] [<ffffffffa01dc654>] btrfs_tree_lock+0x104/0x270 [btrfs] [<ffffffffa01c644a>] ? find_extent_buffer+0xda/0x1f0 [btrfs] [<ffffffffa0190500>] do_walk_down+0xf0/0x530 [btrfs] [<ffffffff8101fd69>] ? sched_clock+0x9/0x10 [<ffffffff810acb15>] ? sched_clock_cpu+0xb5/0x100 [<ffffffffa0190a06>] walk_down_tree+0xc6/0x100 [btrfs] [<ffffffffa0193183>] btrfs_drop_snapshot+0x3d3/0x690 [btrfs] [<ffffffffa01a7564>] btrfs_clean_one_deleted_snapshot+0xd4/0x160 [btrfs] [<ffffffffa019d780>] cleaner_kthread+0x140/0x180 [btrfs] [<ffffffffa019d640>] ? check_leaf.isra.39+0x310/0x310 [btrfs] [<ffffffff81095ced>] kthread+0xed/0x100 [<ffffffff81095c00>] ? insert_kthread_work+0x80/0x80 [<ffffffff81727d6c>] ret_from_fork+0x7c/0xb0 [<ffffffff81095c00>] ? insert_kthread_work+0x80/0x80
I've found this lockdep warnings in my logs from the time I worked on it, the kernel is 3.8+ probably with some -next patches (around 1 Mar 15:36:38). [ 4740.477372] btrfs: relocating block group 23114809344 flags 1 [ 4741.240448] btrfs_clean_one_deleted_snapshot: btrfs: cleaner removing 484 [... snipped like 30 similar messages for brevity] [ 4786.583116] btrfs_clean_one_deleted_snapshot: btrfs: cleaner removing 752 [ 4789.868173] ------------[ cut here ]------------ [ 4789.872123] WARNING: at kernel/lockdep.c:702 __lock_acquire+0x1b7d/0x1f20() [ 4789.872123] Hardware name: Santa Rosa platform [ 4789.872123] Modules linked in: btrfs aoe dm_crypt loop [last unloaded: btrfs] [ 4789.872123] Pid: 18578, comm: btrfs-cleaner Not tainted 3.8.0-default+ #267 [ 4789.872123] Call Trace: [ 4789.872123] [<ffffffff8104c77f>] warn_slowpath_common+0x7f/0xc0 [ 4789.872123] [<ffffffff8104c7da>] warn_slowpath_null+0x1a/0x20 [ 4789.872123] [<ffffffff810abd5d>] __lock_acquire+0x1b7d/0x1f20 [ 4789.872123] [<ffffffff81009ed5>] ? native_sched_clock+0x15/0x80 [ 4789.872123] [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0 [ 4789.872123] [<ffffffff810aa535>] ? __lock_acquire+0x355/0x1f20 [ 4789.872123] [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0 [ 4789.872123] [<ffffffffa016e4e1>] ? btrfs_tree_lock+0x131/0x290 [btrfs] [ 4789.872123] [<ffffffff810ac794>] lock_acquire+0x94/0x130 [ 4789.872123] [<ffffffffa016e4e1>] ? btrfs_tree_lock+0x131/0x290 [btrfs] [ 4789.872123] [<ffffffff81009ed5>] ? native_sched_clock+0x15/0x80 [ 4789.872123] [<ffffffff81959706>] _raw_write_lock+0x46/0x80 [ 4789.872123] [<ffffffffa016e4e1>] ? btrfs_tree_lock+0x131/0x290 [btrfs] [ 4789.872123] [<ffffffff81087d1f>] ? local_clock+0x6f/0x80 [ 4789.872123] [<ffffffffa016e4e1>] btrfs_tree_lock+0x131/0x290 [btrfs] [ 4789.872123] [<ffffffffa015845e>] ? find_extent_buffer+0xae/0x110 [btrfs] [ 4789.872123] [<ffffffffa01583b0>] ? alloc_extent_buffer+0x4d0/0x4d0 [btrfs] [ 4789.872123] [<ffffffffa0121571>] do_walk_down+0xe1/0x540 [btrfs] [ 4789.872123] [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0 [ 4789.872123] [<ffffffff810a69ad>] ? trace_hardirqs_off+0xd/0x10 [ 4789.872123] [<ffffffffa01205b4>] ? btrfs_block_rsv_check+0x74/0x90 [btrfs] [ 4789.872123] [<ffffffffa0121aa8>] walk_down_tree+0xd8/0x110 [btrfs] [ 4789.872123] [<ffffffffa0124b20>] btrfs_drop_snapshot+0x380/0x640 [btrfs] [ 4789.872123] [<ffffffffa0138dc5>] btrfs_clean_one_deleted_snapshot+0x125/0x1a0 [btrfs] [ 4789.872123] [<ffffffffa012ec22>] cleaner_kthread+0xb2/0x180 [btrfs] [ 4789.872123] [<ffffffffa012eb70>] ? btree_readpage+0x30/0x30 [btrfs] [ 4789.872123] [<ffffffff81073e8e>] kthread+0xde/0xf0 [ 4789.872123] [<ffffffff81073db0>] ? flush_kthread_worker+0x1e0/0x1e0 [ 4789.872123] [<ffffffff81962dec>] ret_from_fork+0x7c/0xb0 [ 4789.872123] [<ffffffff81073db0>] ? flush_kthread_worker+0x1e0/0x1e0 [ 4789.872123] ---[ end trace 930320f35566d00f ]--- [ 4792.158795] btrfs_clean_one_deleted_snapshot: btrfs: cleaner removing 372 [... etc] And another one with 3.8-rc7+ tag, without any furhter details.
Created attachment 103571 [details] possible fix This is my educated guess, can you run with this and see if the warning goes away?
When btrfs_find_create_tree_block does the *_create_* part through alloc_extent_buffer, the lockdep class is not set and triggers the warning. In other cases when the tree block is created the class is set. There's another similar instance in walk_down_log_tree where find_create is not followed by lockdep class update. I think the patch is correct and non-intrusive anyway, so add it to next and we'll see, the bug is not easy to reproduce. My guess is that when cleaning starts and no blocks are yet cached, it may show up.
Fixed with [PATCH] Btrfs: set lockdep class before locking new extent buffer walk_down_log_tree doesn't lock the eb after creating it, it reads it first which gets the lockdep class set properly, so it is good to go. Please re-open if you reproduce with this patch in place.