Bug 41092 - INFO: trying to register non-static key (sync_inodes_sb)
Summary: INFO: trying to register non-static key (sync_inodes_sb)
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: File System
Classification: Unclassified
Component: VFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_vfs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-13 22:07 UTC by Paul Bolle
Modified: 2012-08-30 10:31 UTC (History)
6 users (show)

See Also:
Kernel Version: 3.0.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Paul Bolle 2011-08-13 22:07:12 UTC
0) This showed up in /var/log/messages (that log indicates that this happened shortly before suspending to ram):

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Pid: 1561, comm: sync Not tainted 3.0.1.1-1.local0.fc14.x86_64 #1
Call Trace:
 [<ffffffff8108c166>] register_lock_class+0xfc/0x2ca
 [<ffffffff810101ed>] ? native_sched_clock+0x35/0x37
 [<ffffffff810101f8>] ? sched_clock+0x9/0xd
 [<ffffffff8108e463>] ? mark_lock+0x2d/0x1d8
 [<ffffffff8108e6f6>] __lock_acquire+0xe8/0xdd8
 [<ffffffff810f3623>] ? find_get_pages_tag+0xc6/0xf2
 [<ffffffff8100fcb1>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff810101ed>] ? native_sched_clock+0x35/0x37
 [<ffffffff810101f8>] ? sched_clock+0x9/0xd
 [<ffffffff8107fff5>] ? sched_clock_local+0x12/0x75
 [<ffffffff811638b5>] ? sync_inodes_sb+0x130/0x1f5
 [<ffffffff8108f8b2>] lock_acquire+0xb7/0xfb
 [<ffffffff811638b5>] ? sync_inodes_sb+0x130/0x1f5
 [<ffffffff8108c00f>] ? lock_release_holdtime+0x54/0x5b
 [<ffffffff814eedd3>] _raw_spin_lock+0x36/0x69
 [<ffffffff811638b5>] ? sync_inodes_sb+0x130/0x1f5
 [<ffffffff814ef75f>] ? _raw_spin_unlock+0x2b/0x2f
 [<ffffffff811638b5>] sync_inodes_sb+0x130/0x1f5
 [<ffffffff814ed3ac>] ? wait_for_common+0xbb/0x10a
 [<ffffffff81167215>] ? __sync_filesystem+0x7a/0x7a
 [<ffffffff811671e2>] __sync_filesystem+0x47/0x7a
 [<ffffffff8116722b>] sync_one_sb+0x16/0x18
 [<ffffffff81145c13>] iterate_supers+0x72/0xc7
 [<ffffffff81167160>] sync_filesystems+0x20/0x22
 [<ffffffff8116729e>] sys_sync+0x21/0x33
 [<ffffffff814f6242>] system_call_fastpath+0x16/0x1b

1) I've peeked around a bit in the vmlinux that corresponds with this kernel. I focused on sync_inodes_sb(). Note that 0x130 == 304.

$ gdb ~/Rpmbuild/BUILD/kernel-3.0.fc14/linux-3.0.x86_64/vmlinux
Reading symbols from /home/[...]/Rpmbuild/BUILD/kernel-3.0.fc14/linux-3.0.x86_64/vmlinux...done.
(gdb) disassemble /m sync_inodes_sb+304
Dump of assembler code for function sync_inodes_sb:
[...]
1146	
1147			spin_lock(&inode->i_lock);
1148			if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
   0xffffffff811638b5 <+304>:	testb  $0x38,0x70(%rbx)
   0xffffffff811638b9 <+308>:	jne    0xffffffff811638c5 <sync_inodes_sb+320>
   0xffffffff811638bb <+310>:	cmpq   $0x0,0x130(%r15)
   0xffffffff811638c3 <+318>:	jne    0xffffffff811638cf <sync_inodes_sb+330>
[...]
(gdb) info line 1147
Line 1147 of "fs/fs-writeback.c" is at address 0xffffffff811638b5 <sync_inodes_sb+304> but contains no code.
(gdb) info line 1147
Line 1147 of "fs/fs-writeback.c" is at address 0xffffffff811638b5 <sync_inodes_sb+304> but contains no code.
(gdb) list 1125,1159
1125	static void wait_sb_inodes(struct super_block *sb)
1126	{
1127		struct inode *inode, *old_inode = NULL;
[...]
1144		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
1145			struct address_space *mapping = inode->i_mapping;
1146	
1147			spin_lock(&inode->i_lock);
1148			if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
1149			    (mapping->nrpages == 0)) {
1150				spin_unlock(&inode->i_lock);
1151				continue;
1152			}
1153			__iget(inode);
1154			spin_unlock(&inode->i_lock);
[...]

wait_sb_inodes() appears to be inlined in sync_inodes_sb().

2) It seems that somehow lockdep is unhappy being fed inode->i_lock. I have no clue what's going on. (Is it that struct inode *inode is variable only valid in wait_sb_inodes and thus "non-static"? Is it that the spin_lock() call needs to be replaced with something with lockdep annotation. If so, how?)

3) Perhaps this is just a one time issue. I recorded it here (with a bit of analysis) in case some else runs into this too (or I manage to trigger this again).
Comment 1 Dave Chinner 2011-08-16 23:27:19 UTC
I cannot see how this occurs. I cannot see how we'd get an inode with an invalid lockdep class on the list at this point as the list is protected here by the inode_sb_list_lock() and the inodes are initialised before adding and removed before being freed.

As it is, I can't find my way through the maze of preprocessor and inline/macro indirection and config option ifdefs to work out exactly what code is being executed here. The locking code is an ugly tangled mass of spaghetti. AFAICT, we end up finally here:

static inline void __raw_spin_lock(raw_spinlock_t *lock)                         
{                                                                                
        preempt_disable();                                                       
        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);                            
        LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);             
}

passing a subclass of 0, which is what lockdep is complaining it can't find in the inode->i_lock. I really don't what sort of bug that implies - lockdep experts?

As it is, I can't see how I'm going to track this done down without a reproducable test case. Any ideas on what might have caused it?
Comment 2 Paul Bolle 2011-08-17 09:56:05 UTC
0) akpm@linux-foundation.org marked this as a regression. And rjw@sisk.pl 	made this block bug# 36912.

1) That might well be correct. But please note that the 3.0.1 kernel I triggered this issue with seems to be the first I ran on this machine that has CONFIG_LOCKDEP set. So on this machine this issue - if it's more than a one time goof - would never have showed up in the logs (lockdep.o depending on CONFIG_LOCKDEP).

(2) Uninteresting detail: I changed the way I built the kernel for this machine - which still tracks Fedora 14 - after it was made clear that v2.6.39.4 would be the last stable release for v2.6.39. I now sort of rebuilt a Fedora Rawhide kernel, but dropping Fedora's non-upstreamed patches.)
Comment 3 Paul Bolle 2011-08-19 08:23:23 UTC
(In reply to comment #1)
> Any ideas on what might have caused it?

0) This happened in a long session (a few days uptime). Looking through the (part of the) log messages for that session I see dozens of suspend and resume cycles. I also see one hibernate and thaw cycle too.

1) What surrounded this _seems_ to be:
- hibernate
- thaw
- two of the three "Poison overwritten" BUGs of bug #41102
- this message
- a suspend and resume cycle
- the third "Poison overwritten" BUG of bug #41102

2) So my _guess_ would be a hibernate and thaw cycle gone subtly wrong. On the other hand: hibernation problems tend to be much more serious, don't they? And this session even shut down cleanly so it wasn't in panic level problems or something similar.

3) As I said in comment #0:
> I recorded it here (with a bit of analysis) in case some else runs into this
> too (or I manage to trigger this again).
Comment 4 Bart Van Assche 2011-08-28 19:48:58 UTC
(In reply to comment #1)
> I cannot see how this occurs. [ ... ] Any ideas on what might have caused it?

Although I'm not sure that's what's going on here: lockdep also prints this complaint when the lockdep map contains uninitialized data. Maybe a pointer is invalid or a use-after-free has occurred.
Comment 5 Rafael J. Wysocki 2011-08-30 18:42:14 UTC
On Tuesday, August 30, 2011, Paul Bolle wrote:
> On Sun, 2011-08-28 at 21:01 +0200, Rafael J. Wysocki wrote:
> > Please verify if it still should
> > be listed and let the tracking team know (either way).
> 
> From comment #2:
> > [...] please note that the 3.0.1 kernel I
> > triggered this issue with seems to be the first I ran on this machine that
> has
> > CONFIG_LOCKDEP set. So on this machine this issue - if it's more than a one
> > time goof - would never have showed up in the logs [...].
> 
> At this moment I see little reason to track this bug entry as a
> regression.
Comment 6 Rafael J. Wysocki 2011-08-30 18:43:17 UTC
Dropping from the list of recent regressions.

Note You need to log in before you can comment on or make changes to this bug.