Bug 11551

Summary: Semi-repeatable hard lockup on 2.6.27-rc6
Product: Platform Specific/Hardware
Reporter: Rafael J. Wysocki (rjw)
Component: i386
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED UNREPRODUCIBLE
Severity: normal
CC: steven
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11167

Description Rafael J. Wysocki 2008-09-12 11:20:45 UTC
Subject    : Semi-repeatable hard lockup on 2.6.27-rc6
Submitter  : "Steven Noonan" <steven@uplinklabs.net>
Date       : 2008-09-10 18:07
References : http://marc.info/?l=linux-kernel&m=122107007407994&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Steven Noonan 2008-09-12 19:16:22 UTC
Actually, this affects an x86 machine, not x86_64.
Comment 2 Steven Noonan 2008-09-12 19:36:13 UTC
Also, apologies to whoever posted this: http://www.gossamer-threads.com/lists/linux/kernel/972192#972192

For some reason, I'm not receiving some of the linux-kernel messages by email.

The machine is an HP dv5178us. Intel Core Duo 1.66GHz, 2GB RAM, 200GB hard drive. I'm running Linux 2.6.27-rc6 i686 on there.
Comment 3 Rafael J. Wysocki 2008-09-26 16:01:40 UTC
On Sunday, 21 of September 2008, Steven Noonan wrote:
> On Sun, Sep 21, 2008 at 11:54 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=11551
> > Subject         : Semi-repeatable hard lockup on 2.6.27-rc6
> > Submitter       : Steven Noonan <steven@uplinklabs.net>
> > Date            : 2008-09-10 18:07 (12 days old)
> > References      : http://marc.info/?l=linux-kernel&m=122107007407994&w=4
> >
> >
> 
> The machine with these symptoms was sent in for service on Friday. I
> suspect there may have been dodgy hardware involved on this one. I
> think this bug should be closed for the time being. Once I get the
> machine back, I'll reopen the bug if I can still reproduce it.
Comment 4 Steven Noonan 2008-10-09 21:13:49 UTC
So I got the machine back from repairs and booted the kernel I was running before sending it in. This showed up in dmesg. I don't know whether it's fixed yet, but it explains -perfectly- why the machine used to lock up at 'Waiting for udev events...':

[   47.087247] =======================================================
[   47.087806] [ INFO: possible circular locking dependency detected ]
[   47.088133] 2.6.27-rc6-tip-00275-g44c7698 #1
[   47.088417] -------------------------------------------------------
[   47.088744] udevd/1202 is trying to acquire lock:
[   47.088749]  (&inode->inotify_mutex){--..}, at: [<c01a8aa3>] inotify_inode_queue_event+0x3e/0xb6
[   47.088763] 
[   47.088765] but task is already holding lock:
[   47.088769]  (&mm->mmap_sem){----}, at: [<c017380b>] sys_munmap+0x20/0x3c
[   47.088779] 
[   47.088781] which lock already depends on the new lock.
[   47.088783] 
[   47.088785] 
[   47.088787] the existing dependency chain (in reverse order) is:
[   47.088790] 
[   47.088792] -> #3 (&mm->mmap_sem){----}:
[   47.088797]        [<c0147e56>] validate_chain+0x839/0xaf5
[   47.088806]        [<c014873a>] __lock_acquire+0x628/0x6c0
[   47.088813]        [<c014881a>] lock_acquire+0x48/0x64
[   47.088819]        [<c0170901>] might_fault+0x51/0x71
[   47.088825]        [<c02b1924>] copy_to_user+0x2d/0x41
[   47.088832]        [<c01a99f5>] inotify_read+0x108/0x173
[   47.088838]        [<c0184442>] vfs_read+0x8f/0x10b
[   47.088845]        [<c018475c>] sys_read+0x40/0x65
[   47.088851]        [<c0102f5d>] sysenter_do_call+0x12/0x35
[   47.088857]        [<ffffffff>] 0xffffffff
[   47.088869] 
[   47.088870] -> #2 (&dev->ev_mutex){--..}:
[   47.088875]        [<c0147e56>] validate_chain+0x839/0xaf5
[   47.088882]        [<c014873a>] __lock_acquire+0x628/0x6c0
[   47.088889]        [<c014881a>] lock_acquire+0x48/0x64
[   47.088895]        [<c0471cdf>] mutex_lock_nested+0xcd/0x24f
[   47.088903]        [<c01a9793>] inotify_dev_queue_event+0x25/0x12b
[   47.088910]        [<c01a8aec>] inotify_inode_queue_event+0x87/0xb6
[   47.088916]        [<c01a90c6>] inotify_dentry_parent_queue_event+0x69/0x83
[   47.088923]        [<c0184afd>] __fput+0x5c/0x140
[   47.088929]        [<c0184e46>] fput+0x1c/0x1e
[   47.088934]        [<c018266a>] filp_close+0x55/0x5f
[   47.088941]        [<c0183716>] sys_close+0x6d/0xa6
[   47.088947]        [<c0102f5d>] sysenter_do_call+0x12/0x35
[   47.088953]        [<ffffffff>] 0xffffffff
[   47.088959] 
[   47.088961] -> #1 (&ih->mutex){--..}:
[   47.088966]        [<c0147e56>] validate_chain+0x839/0xaf5
[   47.088974]        [<c014873a>] __lock_acquire+0x628/0x6c0
[   47.088980]        [<c014881a>] lock_acquire+0x48/0x64
[   47.088986]        [<c0471cdf>] mutex_lock_nested+0xcd/0x24f
[   47.088993]        [<c01a88d6>] inotify_find_update_watch+0x4d/0x8e
[   47.088999]        [<c01a9b23>] sys_inotify_add_watch+0xc3/0x164
[   47.089006]        [<c0102f5d>] sysenter_do_call+0x12/0x35
[   47.089011]        [<ffffffff>] 0xffffffff
[   47.089029] 
[   47.089031] -> #0 (&inode->inotify_mutex){--..}:
[   47.089036]        [<c0147b98>] validate_chain+0x57b/0xaf5
[   47.089043]        [<c014873a>] __lock_acquire+0x628/0x6c0
[   47.089049]        [<c014881a>] lock_acquire+0x48/0x64
[   47.089055]        [<c0471cdf>] mutex_lock_nested+0xcd/0x24f
[   47.089062]        [<c01a8aa3>] inotify_inode_queue_event+0x3e/0xb6
[   47.089068]        [<c01a90c6>] inotify_dentry_parent_queue_event+0x69/0x83
[   47.089075]        [<c0184afd>] __fput+0x5c/0x140
[   47.089081]        [<c0184e46>] fput+0x1c/0x1e
[   47.089087]        [<c0171f7d>] remove_vma+0x2d/0x4c
[   47.089093]        [<c017299c>] do_munmap+0x191/0x1ab
[   47.089099]        [<c0173818>] sys_munmap+0x2d/0x3c
[   47.089106]        [<c0102f5d>] sysenter_do_call+0x12/0x35
[   47.089111]        [<ffffffff>] 0xffffffff
[   47.089118] 
[   47.089119] other info that might help us debug this:
[   47.089121] 
[   47.089125] 1 lock held by udevd/1202:
[   47.089128]  #0:  (&mm->mmap_sem){----}, at: [<c017380b>] sys_munmap+0x20/0x3c
[   47.089137] 
[   47.089138] stack backtrace:
[   47.089143] Pid: 1202, comm: udevd Not tainted 2.6.27-rc6-tip-00275-g44c7698 #1
[   47.089148]  [<c0147612>] print_circular_bug_tail+0xa1/0xac
[   47.089157]  [<c0147b98>] validate_chain+0x57b/0xaf5
[   47.089164]  [<c0145473>] ? save_trace+0x37/0x88
[   47.089171]  [<c014873a>] __lock_acquire+0x628/0x6c0
[   47.089179]  [<c014881a>] lock_acquire+0x48/0x64
[   47.089185]  [<c01a8aa3>] ? inotify_inode_queue_event+0x3e/0xb6
[   47.089192]  [<c0471cdf>] mutex_lock_nested+0xcd/0x24f
[   47.089199]  [<c01a8aa3>] ? inotify_inode_queue_event+0x3e/0xb6
[   47.089206]  [<c01a8aa3>] ? inotify_inode_queue_event+0x3e/0xb6
[   47.089214]  [<c01a8aa3>] inotify_inode_queue_event+0x3e/0xb6
[   47.089220]  [<c01a90b0>] ? inotify_dentry_parent_queue_event+0x53/0x83
[   47.089228]  [<c01a90c6>] inotify_dentry_parent_queue_event+0x69/0x83
[   47.089235]  [<c0184afd>] __fput+0x5c/0x140
[   47.089241]  [<c0184e46>] fput+0x1c/0x1e
[   47.089246]  [<c0171f7d>] remove_vma+0x2d/0x4c
[   47.089253]  [<c017299c>] do_munmap+0x191/0x1ab
[   47.089259]  [<c0173818>] sys_munmap+0x2d/0x3c
[   47.089266]  [<c0102f5d>] sysenter_do_call+0x12/0x35
[   47.089273]  =======================
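The report lists the existing dependency chain in reverse order. Read forward, each earlier acquisition took the next lock while holding the previous one:

- inotify_find_update_watch takes &ih->mutex while holding &inode->inotify_mutex.
- inotify_dev_queue_event takes &dev->ev_mutex while holding &ih->mutex.
- inotify_read faults in copy_to_user and takes &mm->mmap_sem while holding &dev->ev_mutex.

The new acquisition (entry #0) has sys_munmap holding &mm->mmap_sem while __fput queues an inotify event and takes &inode->inotify_mutex. That closes the loop.

The sketch below is a toy model of that check. It records "acquired B while holding A" edges between the four lock classes named in the trace and refuses any edge that would create a cycle. It is not lockdep's implementation. The class name `LockOrderGraph` and its methods are hypothetical, chosen for illustration.

```python
# Toy model of the lock-ordering check implied by the lockdep report above.
# NOT lockdep's implementation: LockOrderGraph and its methods are hypothetical.
from collections import defaultdict


class LockOrderGraph:
    def __init__(self):
        # held lock -> set of locks that were acquired while it was held
        self.edges = defaultdict(set)

    def _reachable(self, start, target):
        """Iterative depth-first search: can `target` be reached from `start`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire_while_holding(self, held, new):
        """Record held -> new; return False (and skip it) if it would close a cycle."""
        if self._reachable(new, held):
            return False
        self.edges[held].add(new)
        return True


g = LockOrderGraph()
# Existing chain (entries #1..#3 of the trace), in forward order:
g.acquire_while_holding("&inode->inotify_mutex", "&ih->mutex")   # inotify_find_update_watch
g.acquire_while_holding("&ih->mutex", "&dev->ev_mutex")          # inotify_dev_queue_event
g.acquire_while_holding("&dev->ev_mutex", "&mm->mmap_sem")       # inotify_read -> copy_to_user fault
# Entry #0: sys_munmap holds mmap_sem, then __fput queues an inotify event.
print(g.acquire_while_holding("&mm->mmap_sem", "&inode->inotify_mutex"))  # False
```

The final call prints False: inotify_mutex already reaches mmap_sem through the chain, so the mmap_sem -> inotify_mutex edge would form a cycle. Two tasks taking these locks in opposite orders could then deadlock, which fits the hang at 'Waiting for udev events...'.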
Comment 5 Ingo Molnar 2008-10-11 03:51:34 UTC
> [   47.087247] =======================================================
> [   47.087806] [ INFO: possible circular locking dependency detected ]
> [   47.088133] 2.6.27-rc6-tip-00275-g44c7698 #1
> [   47.088417] -------------------------------------------------------
> [   47.088744] udevd/1202 is trying to acquire lock:
> [   47.088749]  (&inode->inotify_mutex){--..}, at: [<c01a8aa3>] inotify_inode_queue_event+0x3e/0xb6

This is a known issue and should be fixed in v2.6.27 and the latest tip/master.
Could you please try one of those and check whether the message goes away? Thanks,

	Ingo
Comment 6 Steven Noonan 2008-10-11 13:49:42 UTC
Ah, good. It's fixed then.

I've been running 2.6.27 on several branches, and it seems fine so far.