Bug 15142

Summary: sysfs-related lockdep warning in __blkdev_get
Product: IO/Storage Reporter: Matti Aarnio (matti.aarnio--kernel-bugzilla)
Component: MDAssignee: io_md
Status: CLOSED CODE_FIX    
Severity: normal CC: kernel, neilb, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.33-0.18.rc4.git7.fc13.x86_64 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14885    
Attachments: sysfs: differentiate between locking links and non-links for sysfs from Neil Brown, remade for -rc8

Description Matti Aarnio 2010-01-25 13:38:57 UTC
During boot a 3+1 disk RAID5 array with one out-of-state disk is being assembled.  Kernel RAID5 notices component statuses, and does correct thing, but with lock trouble being indicated.

The troubled component device in my RAID5 array is correctly kicked out of the
array and system boots up, so this is not a _fatal_ thing in itself.

The array gets fixed over next about 12 hours and no further lock troubles
appear.

Previous boot (untroubled) had some 2.6.32 series kernel.    

This was first entered as Fedora bug:
   https://bugzilla.redhat.com/show_bug.cgi?id=558230
but it most likely is direct "upstream issue"

Kernel dmesg report generated during boot:


dracut: Autoassembling MD Raid
md: md0 stopped.
md: bind<sdc1>
md: bind<sdb1>
md: bind<sdd1>
md: bind<sda1>
md: kicking non-fresh sdd1 from array!

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.33-0.18.rc4.git7.fc13.x86_64 #1
-------------------------------------------------------
mdadm/474 is trying to acquire lock:
 (s_active){++++.+}, at: [<ffffffff81175fde>] sysfs_addrm_finish+0x36/0x55

but task is already holding lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff811451c4>]
bd_release_from_disk+0x3a/0xec

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&bdev->bd_mutex){+.+.+.}:
       [<ffffffff8107e402>] __lock_acquire+0xb71/0xd19
       [<ffffffff8107e686>] lock_acquire+0xdc/0x102
       [<ffffffff814757ee>] __mutex_lock_common+0x4b/0x392
       [<ffffffff81475bf9>] mutex_lock_nested+0x3e/0x43
       [<ffffffff81144570>] __blkdev_get+0x91/0x395
       [<ffffffff81144884>] blkdev_get+0x10/0x12
       [<ffffffff81144dc5>] open_by_devnum+0x2e/0x3f
       [<ffffffff813856eb>] lock_rdev+0x39/0xe4
       [<ffffffff81385881>] md_import_device+0xeb/0x2aa
       [<ffffffff81389fd9>] add_new_disk+0x71/0x441
       [<ffffffff8138c127>] md_ioctl+0xa01/0xf49
       [<ffffffff8121afd5>] __blkdev_driver_ioctl+0x39/0xa3
       [<ffffffff8121b970>] blkdev_ioctl+0x67d/0x6b1
       [<ffffffff81143b90>] block_ioctl+0x37/0x3b
       [<ffffffff8112af18>] vfs_ioctl+0x32/0xa6
       [<ffffffff8112b498>] do_vfs_ioctl+0x490/0x4d6
       [<ffffffff8112b534>] sys_ioctl+0x56/0x79
       [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b

-> #1 (&new->reconfig_mutex){+.+.+.}:
       [<ffffffff8107e402>] __lock_acquire+0xb71/0xd19
       [<ffffffff8107e686>] lock_acquire+0xdc/0x102
       [<ffffffff814757ee>] __mutex_lock_common+0x4b/0x392
       [<ffffffff81475b73>] mutex_lock_interruptible_nested+0x3e/0x43
       [<ffffffff813844ee>] mddev_lock+0x17/0x19
       [<ffffffff813847c5>] md_attr_show+0x32/0x5d
       [<ffffffff81174e64>] sysfs_read_file+0xbd/0x17f
       [<ffffffff8111e4d1>] vfs_read+0xab/0x108
       [<ffffffff8111e5ee>] sys_read+0x4a/0x6e
       [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b

-> #0 (s_active){++++.+}:
       [<ffffffff8107e2ac>] __lock_acquire+0xa1b/0xd19
       [<ffffffff8107e686>] lock_acquire+0xdc/0x102
       [<ffffffff811759eb>] sysfs_deactivate+0x9a/0x103
       [<ffffffff81175fde>] sysfs_addrm_finish+0x36/0x55
       [<ffffffff8117434c>] sysfs_hash_and_remove+0x53/0x6a
       [<ffffffff81176567>] sysfs_remove_link+0x21/0x23
       [<ffffffff81143f28>] del_symlink+0x1b/0x1d
       [<ffffffff8114520b>] bd_release_from_disk+0x81/0xec
       [<ffffffff813840c8>] unbind_rdev_from_array+0x67/0x154
       [<ffffffff81386327>] kick_rdev_from_array+0x16/0x23
       [<ffffffff8138946c>] do_md_run+0x1a1/0x873
       [<ffffffff8138c408>] md_ioctl+0xce2/0xf49
       [<ffffffff8121afd5>] __blkdev_driver_ioctl+0x39/0xa3
       [<ffffffff8121b970>] blkdev_ioctl+0x67d/0x6b1
       [<ffffffff81143b90>] block_ioctl+0x37/0x3b
       [<ffffffff8112af18>] vfs_ioctl+0x32/0xa6
       [<ffffffff8112b498>] do_vfs_ioctl+0x490/0x4d6
       [<ffffffff8112b534>] sys_ioctl+0x56/0x79
       [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

2 locks held by mdadm/474:
 #0:  (&new->reconfig_mutex){+.+.+.}, at: [<ffffffff813844ee>]
mddev_lock+0x17/0x19
 #1:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff811451c4>]
bd_release_from_disk+0x3a/0xec

stack backtrace:
Pid: 474, comm: mdadm Not tainted 2.6.33-0.18.rc4.git7.fc13.x86_64 #1
Call Trace:
 [<ffffffff8107d46f>] print_circular_bug+0xa8/0xb6
 [<ffffffff8107e2ac>] __lock_acquire+0xa1b/0xd19
 [<ffffffff8107e686>] lock_acquire+0xdc/0x102
 [<ffffffff81175fde>] ? sysfs_addrm_finish+0x36/0x55
 [<ffffffff8107c15e>] ? lockdep_init_map+0x9e/0x113
 [<ffffffff811759eb>] sysfs_deactivate+0x9a/0x103
 [<ffffffff81175fde>] ? sysfs_addrm_finish+0x36/0x55
 [<ffffffff8147560d>] ? __mutex_unlock_slowpath+0x120/0x132
 [<ffffffff81175fde>] sysfs_addrm_finish+0x36/0x55
 [<ffffffff8117434c>] sysfs_hash_and_remove+0x53/0x6a
 [<ffffffff81176567>] sysfs_remove_link+0x21/0x23
 [<ffffffff81143f28>] del_symlink+0x1b/0x1d
 [<ffffffff8114520b>] bd_release_from_disk+0x81/0xec
 [<ffffffff813840c8>] unbind_rdev_from_array+0x67/0x154
 [<ffffffff814743e1>] ? printk+0x41/0x48
 [<ffffffff81386327>] kick_rdev_from_array+0x16/0x23
 [<ffffffff8138946c>] do_md_run+0x1a1/0x873
 [<ffffffff8138c408>] md_ioctl+0xce2/0xf49
 [<ffffffff81010385>] ? native_sched_clock+0x2d/0x5f
 [<ffffffff81070ffc>] ? cpu_clock+0x43/0x5e
 [<ffffffff8121afd5>] __blkdev_driver_ioctl+0x39/0xa3
 [<ffffffff8107b913>] ? lock_release_holdtime+0x2c/0xdb
 [<ffffffff8121b970>] blkdev_ioctl+0x67d/0x6b1
 [<ffffffff81143b90>] block_ioctl+0x37/0x3b
 [<ffffffff8112af18>] vfs_ioctl+0x32/0xa6
 [<ffffffff8112b498>] do_vfs_ioctl+0x490/0x4d6
 [<ffffffff8112b534>] sys_ioctl+0x56/0x79
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
md: unbind<sdd1>
md: export_rdev(sdd1)
Comment 1 Neil Brown 2010-01-28 01:17:42 UTC
This trace seems to suggest that there is a lock called 's_active' in
sysfs.
However the only 's_active' in sysfs is an atomic_t.
It did used to be a rwsem, but that was back in 2.6.22.

Can you check what patches that kernel has which are not in mainline?
And find out what s_active might be?

NeilBrown
Comment 2 Rafael J. Wysocki 2010-02-02 11:33:47 UTC
I think that may be an upshot of commit 846f99749ab68bbc7f75c74fec305de675b1a1bf.
Comment 3 Neil Brown 2010-02-10 00:13:21 UTC
Thanks.  I've spent a while looking at this dependency loop and other
related one.

This particular one I would like to *not* fix in md.
The loop is reported because while writing to an attribute file in sysfs,
I might delete a symlink in sysfs.  This is not really a dependency loop.
If the lockdep mechanism were taught do differentiate between files and
symlinks in sysfs it would not detect a loop here.

I will suggest an appropriate modification as below.
There are other dependency issues in md which 's_active' triggers
that are genuine.  I have a patch ready which address those.



Author: NeilBrown <neilb@suse.de>
Date:   Wed Feb 10 09:43:45 2010 +1100

    sysfs: differentiate  between locking links and non-links for sysfs
    
    symlinks and non-symlink is sysfs are very different.
    A symlink can never be locked (active) while an attribute
    modification routine is running.  So removing symlink from an
    attribute 'store' routine should be permitted without any lockdep
    warnings.
    
    So split the lockdep context for 's_active' in two, one for symlinks
    and other for everything else.
    
    Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 699f371..e6eeaf6 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -354,7 +354,10 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)
 
        atomic_set(&sd->s_count, 1);
        atomic_set(&sd->s_active, 0);
-       sysfs_dirent_init_lockdep(sd);
+       if (type & SYSFS_KOBJ_LINK)
+               sysfs_dirent_init_lockdep(sd, "link");
+       else
+               sysfs_dirent_init_lockdep(sd, "non_link");
 
        sd->s_name = name;
        sd->s_mode = mode;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index cdd9377..a83a02b 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,11 +89,11 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
 }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define sysfs_dirent_init_lockdep(sd)                          \
+#define sysfs_dirent_init_lockdep(sd, type)                    \
 do {                                                           \
        static struct lock_class_key __key;                     \
                                                                \
-       lockdep_init_map(&sd->dep_map, "s_active", &__key, 0);  \
+       lockdep_init_map(&sd->dep_map, "s_active_" type, &__key, 0);    \
 } while(0)
 #else
 #define sysfs_dirent_init_lockdep(sd) do {} while(0)
lines 1-51/51 (END)
Comment 4 Christian Kujau 2010-02-17 10:49:31 UTC
I was able to reproduce the lockdep warning reliably by just issuing mdadm --stop to a raid0 array, see http://nerdbynature.de/bits/2.6.33-rc8/mdadm/ for details.

With Neil's patch applied (to latest -git from Feb 16) the lockdep warning goes away.
Comment 5 Christian Kujau 2010-02-17 10:52:31 UTC
Created attachment 25085 [details]
sysfs: differentiate  between locking links and non-links for sysfs from Neil Brown, remade for -rc8
Comment 6 Rafael J. Wysocki 2010-03-21 12:38:13 UTC
Handled-By : NeilBrown <neilb@suse.de>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=25085
Comment 7 Neil Brown 2010-03-21 23:53:04 UTC
This was fixed upstream by a somewhat different approach.
I think commit a2db6842873c8e5a70652f278d469128cb52db70
is probably the key part of the upstream fix.
Comment 8 Christian Kujau 2010-03-22 00:20:49 UTC
On Sun, 21 Mar 2010 at 12:38, bugzilla-daemon@bugzilla.kernel.org wrote:
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|                            |PATCH_ALREADY_AVAILABLE

Indeed, a patch is available, but I fail to see if it was included into 
mainline.

Neil: fs/sysfs/dir.c and sysfs/sysfs.h look all different now, did the 
      patch make it into the kernel some other way?

Thanks,
Christian.