Bug 59061 - Warning caused by btrfs_tree_lock with lock-debugging enabled
Summary: Warning caused by btrfs_tree_lock with lock-debugging enabled
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-31 10:12 UTC by Clemens Eisserer
Modified: 2013-07-05 21:08 UTC (History)
1 user (show)

See Also:
Kernel Version: 3.10.0-0.rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
possible fix (470 bytes, patch)
2013-06-05 15:49 UTC, Josef Bacik
Details | Diff

Description Clemens Eisserer 2013-05-31 10:12:24 UTC
Description of problem:
ran a NFS-trace replay tool to reproduce other issues with BTRFS, when this happend. Only occured once, haven't been successful in trying to reproduce it.

Additional info:
reporter:       libreport-2.1.4
WARNING: at kernel/lockdep.c:708 __lock_acquire+0x1a50/0x1b30()
Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables rfcomm ip6table_filter ip6_tables bnep mmc_block iTCO_wdt iTCO_vendor_support dell_wmi sparse_keymap ppdev snd_hda_codec_hdmi snd_hda_codec_idt mperf coretemp kvm_intel kvm dell_laptop dcdbas microcode arc4 cdc_mbim uvcvideo iwldvm videobuf2_vmalloc videobuf2_memops videobuf2_core mac80211 cdc_ncm videodev usbnet media mii cdc_wdm cdc_acm iwlwifi btusb bluetooth snd_hda_intel cfg80211 sdhci_pci sdhci rfkill i2c_i801 lpc_ich mmc_core snd_hda_codec mfd_core snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer snd soundcore ptp pps_core wmi parport_pc parport binfmt_misc uinput dm_crypt btrfs libcrc32c xor zlib_deflate crc32_pclmul crc32c_intel raid6_pq ghash_clmulni_intel i915 i2c_algo_bit drm_kms_helper drm i2c_core video
CPU: 2 PID: 2718 Comm: btrfs-cleaner Not tainted 3.10.0-0.rc2.git3.1.fc20.x86_64 #1
Hardware name: Dell Inc. Latitude E6320/050F4T, BIOS A08 10/18/2011
 0000000000000009 ffff8801c288ba38 ffffffff817163eb ffff8801c288ba70
 ffffffff810676d1 0000000000000002 ffff88021caca5a0 ffffffff8261ad80
 ffff8800938dbd90 0000000000000000 ffff8801c288ba80 ffffffff810677aa
Call Trace:
 [<ffffffff817163eb>] dump_stack+0x19/0x1b
 [<ffffffff810676d1>] warn_slowpath_common+0x61/0x80
 [<ffffffff810677aa>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810db3e0>] __lock_acquire+0x1a50/0x1b30
 [<ffffffff8101fd69>] ? sched_clock+0x9/0x10
 [<ffffffff810acb15>] ? sched_clock_cpu+0xb5/0x100
 [<ffffffff810d9c85>] ? __lock_acquire+0x2f5/0x1b30
 [<ffffffff8101fd69>] ? sched_clock+0x9/0x10
 [<ffffffff810acb15>] ? sched_clock_cpu+0xb5/0x100
 [<ffffffff810dbcb2>] lock_acquire+0xa2/0x1f0
 [<ffffffffa01dc654>] ? btrfs_tree_lock+0x104/0x270 [btrfs]
 [<ffffffff8171e536>] _raw_write_lock+0x46/0x80
 [<ffffffffa01dc654>] ? btrfs_tree_lock+0x104/0x270 [btrfs]
 [<ffffffffa01dc654>] btrfs_tree_lock+0x104/0x270 [btrfs]
 [<ffffffffa01c644a>] ? find_extent_buffer+0xda/0x1f0 [btrfs]
 [<ffffffffa0190500>] do_walk_down+0xf0/0x530 [btrfs]
 [<ffffffff8101fd69>] ? sched_clock+0x9/0x10
 [<ffffffff810acb15>] ? sched_clock_cpu+0xb5/0x100
 [<ffffffffa0190a06>] walk_down_tree+0xc6/0x100 [btrfs]
 [<ffffffffa0193183>] btrfs_drop_snapshot+0x3d3/0x690 [btrfs]
 [<ffffffffa01a7564>] btrfs_clean_one_deleted_snapshot+0xd4/0x160 [btrfs]
 [<ffffffffa019d780>] cleaner_kthread+0x140/0x180 [btrfs]
 [<ffffffffa019d640>] ? check_leaf.isra.39+0x310/0x310 [btrfs]
 [<ffffffff81095ced>] kthread+0xed/0x100
 [<ffffffff81095c00>] ? insert_kthread_work+0x80/0x80
 [<ffffffff81727d6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81095c00>] ? insert_kthread_work+0x80/0x80
Comment 1 David Sterba 2013-06-03 11:52:02 UTC
I've found this lockdep warnings in my logs from the time I worked on it, the kernel is 3.8+ probably with some -next patches (around  1 Mar 15:36:38).

[ 4740.477372] btrfs: relocating block group 23114809344 flags 1
[ 4741.240448] btrfs_clean_one_deleted_snapshot: btrfs: cleaner removing 484
[... snipped like 30 similar messages for brevity]
[ 4786.583116] btrfs_clean_one_deleted_snapshot: btrfs: cleaner removing 752
[ 4789.868173] ------------[ cut here ]------------
[ 4789.872123] WARNING: at kernel/lockdep.c:702 __lock_acquire+0x1b7d/0x1f20()
[ 4789.872123] Hardware name: Santa Rosa platform
[ 4789.872123] Modules linked in: btrfs aoe dm_crypt loop [last unloaded: btrfs]
[ 4789.872123] Pid: 18578, comm: btrfs-cleaner Not tainted 3.8.0-default+ #267
[ 4789.872123] Call Trace:
[ 4789.872123]  [<ffffffff8104c77f>] warn_slowpath_common+0x7f/0xc0
[ 4789.872123]  [<ffffffff8104c7da>] warn_slowpath_null+0x1a/0x20
[ 4789.872123]  [<ffffffff810abd5d>] __lock_acquire+0x1b7d/0x1f20
[ 4789.872123]  [<ffffffff81009ed5>] ? native_sched_clock+0x15/0x80
[ 4789.872123]  [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0
[ 4789.872123]  [<ffffffff810aa535>] ? __lock_acquire+0x355/0x1f20
[ 4789.872123]  [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0
[ 4789.872123]  [<ffffffffa016e4e1>] ? btrfs_tree_lock+0x131/0x290 [btrfs]
[ 4789.872123]  [<ffffffff810ac794>] lock_acquire+0x94/0x130
[ 4789.872123]  [<ffffffffa016e4e1>] ? btrfs_tree_lock+0x131/0x290 [btrfs]
[ 4789.872123]  [<ffffffff81009ed5>] ? native_sched_clock+0x15/0x80
[ 4789.872123]  [<ffffffff81959706>] _raw_write_lock+0x46/0x80
[ 4789.872123]  [<ffffffffa016e4e1>] ? btrfs_tree_lock+0x131/0x290 [btrfs]
[ 4789.872123]  [<ffffffff81087d1f>] ? local_clock+0x6f/0x80
[ 4789.872123]  [<ffffffffa016e4e1>] btrfs_tree_lock+0x131/0x290 [btrfs]
[ 4789.872123]  [<ffffffffa015845e>] ? find_extent_buffer+0xae/0x110 [btrfs]
[ 4789.872123]  [<ffffffffa01583b0>] ? alloc_extent_buffer+0x4d0/0x4d0 [btrfs]
[ 4789.872123]  [<ffffffffa0121571>] do_walk_down+0xe1/0x540 [btrfs]
[ 4789.872123]  [<ffffffff810a6909>] ? trace_hardirqs_off_caller+0x29/0xc0
[ 4789.872123]  [<ffffffff810a69ad>] ? trace_hardirqs_off+0xd/0x10
[ 4789.872123]  [<ffffffffa01205b4>] ? btrfs_block_rsv_check+0x74/0x90 [btrfs]
[ 4789.872123]  [<ffffffffa0121aa8>] walk_down_tree+0xd8/0x110 [btrfs]
[ 4789.872123]  [<ffffffffa0124b20>] btrfs_drop_snapshot+0x380/0x640 [btrfs]
[ 4789.872123]  [<ffffffffa0138dc5>] btrfs_clean_one_deleted_snapshot+0x125/0x1a0 [btrfs]
[ 4789.872123]  [<ffffffffa012ec22>] cleaner_kthread+0xb2/0x180 [btrfs]
[ 4789.872123]  [<ffffffffa012eb70>] ? btree_readpage+0x30/0x30 [btrfs]
[ 4789.872123]  [<ffffffff81073e8e>] kthread+0xde/0xf0
[ 4789.872123]  [<ffffffff81073db0>] ? flush_kthread_worker+0x1e0/0x1e0
[ 4789.872123]  [<ffffffff81962dec>] ret_from_fork+0x7c/0xb0
[ 4789.872123]  [<ffffffff81073db0>] ? flush_kthread_worker+0x1e0/0x1e0
[ 4789.872123] ---[ end trace 930320f35566d00f ]---
[ 4792.158795] btrfs_clean_one_deleted_snapshot: btrfs: cleaner removing 372
[... etc]

And another one with 3.8-rc7+ tag, without any furhter details.
Comment 2 Josef Bacik 2013-06-05 15:49:16 UTC
Created attachment 103571 [details]
possible fix

This is my educated guess, can you run with this and see if the warning goes away?
Comment 3 David Sterba 2013-07-03 12:42:45 UTC
When btrfs_find_create_tree_block does the *_create_* part through alloc_extent_buffer, the lockdep class is not set and triggers the warning. In other cases when the tree block is created the class is set.

There's another similar instance in walk_down_log_tree where find_create is not followed by lockdep class update.

I think the patch is correct and non-intrusive anyway, so add it to next and we'll see, the bug is not easy to reproduce. My guess is that when cleaning starts and no blocks are yet cached, it may show up.
Comment 4 Josef Bacik 2013-07-05 21:08:47 UTC
Fixed with

[PATCH] Btrfs: set lockdep class before locking new extent buffer

walk_down_log_tree doesn't lock the eb after creating it, it reads it first which gets the lockdep class set properly, so it is good to go.  Please re-open if you reproduce with this patch in place.

Note You need to log in before you can comment on or make changes to this bug.