0) This showed up in /var/log/messages (that log indicates that this happened shortly before suspending to ram): INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Pid: 1561, comm: sync Not tainted 3.0.1.1-1.local0.fc14.x86_64 #1 Call Trace: [<ffffffff8108c166>] register_lock_class+0xfc/0x2ca [<ffffffff810101ed>] ? native_sched_clock+0x35/0x37 [<ffffffff810101f8>] ? sched_clock+0x9/0xd [<ffffffff8108e463>] ? mark_lock+0x2d/0x1d8 [<ffffffff8108e6f6>] __lock_acquire+0xe8/0xdd8 [<ffffffff810f3623>] ? find_get_pages_tag+0xc6/0xf2 [<ffffffff8100fcb1>] ? paravirt_read_tsc+0x9/0xd [<ffffffff810101ed>] ? native_sched_clock+0x35/0x37 [<ffffffff810101f8>] ? sched_clock+0x9/0xd [<ffffffff8107fff5>] ? sched_clock_local+0x12/0x75 [<ffffffff811638b5>] ? sync_inodes_sb+0x130/0x1f5 [<ffffffff8108f8b2>] lock_acquire+0xb7/0xfb [<ffffffff811638b5>] ? sync_inodes_sb+0x130/0x1f5 [<ffffffff8108c00f>] ? lock_release_holdtime+0x54/0x5b [<ffffffff814eedd3>] _raw_spin_lock+0x36/0x69 [<ffffffff811638b5>] ? sync_inodes_sb+0x130/0x1f5 [<ffffffff814ef75f>] ? _raw_spin_unlock+0x2b/0x2f [<ffffffff811638b5>] sync_inodes_sb+0x130/0x1f5 [<ffffffff814ed3ac>] ? wait_for_common+0xbb/0x10a [<ffffffff81167215>] ? __sync_filesystem+0x7a/0x7a [<ffffffff811671e2>] __sync_filesystem+0x47/0x7a [<ffffffff8116722b>] sync_one_sb+0x16/0x18 [<ffffffff81145c13>] iterate_supers+0x72/0xc7 [<ffffffff81167160>] sync_filesystems+0x20/0x22 [<ffffffff8116729e>] sys_sync+0x21/0x33 [<ffffffff814f6242>] system_call_fastpath+0x16/0x1b 1) I've peeked around a bit in the vmlinux that corresponds with this kernel. I focused on sync_inodes_sb(). Note that 0x130 == 304. $ gdb ~/Rpmbuild/BUILD/kernel-3.0.fc14/linux-3.0.x86_64/vmlinux Reading symbols from /home/[...]/Rpmbuild/BUILD/kernel-3.0.fc14/linux-3.0.x86_64/vmlinux...done. (gdb) disassemble /m sync_inodes_sb+304 Dump of assembler code for function sync_inodes_sb: [...] 1146 1147 spin_lock(&inode->i_lock); 1148 if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) || 0xffffffff811638b5 <+304>: testb $0x38,0x70(%rbx) 0xffffffff811638b9 <+308>: jne 0xffffffff811638c5 <sync_inodes_sb+320> 0xffffffff811638bb <+310>: cmpq $0x0,0x130(%r15) 0xffffffff811638c3 <+318>: jne 0xffffffff811638cf <sync_inodes_sb+330> [...] (gdb) info line 1147 Line 1147 of "fs/fs-writeback.c" is at address 0xffffffff811638b5 <sync_inodes_sb+304> but contains no code. (gdb) info line 1147 Line 1147 of "fs/fs-writeback.c" is at address 0xffffffff811638b5 <sync_inodes_sb+304> but contains no code. (gdb) list 1125,1159 1125 static void wait_sb_inodes(struct super_block *sb) 1126 { 1127 struct inode *inode, *old_inode = NULL; [...] 1144 list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { 1145 struct address_space *mapping = inode->i_mapping; 1146 1147 spin_lock(&inode->i_lock); 1148 if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) || 1149 (mapping->nrpages == 0)) { 1150 spin_unlock(&inode->i_lock); 1151 continue; 1152 } 1153 __iget(inode); 1154 spin_unlock(&inode->i_lock); [...] wait_sb_inodes() appears to be inlined in sync_inodes_sb(). 2) It seems that somehow lockdep is unhappy being fed inode->i_lock. I have no clue what's going on. (Is it that struct inode *inode is variable only valid in wait_sb_inodes and thus "non-static"? Is it that the spin_lock() call needs to be replaced with something with lockdep annotation. If so, how?) 3) Perhaps this is just a one time issue. I recorded it here (with a bit of analysis) in case some else runs into this too (or I manage to trigger this again).
I cannot see how this occurs. I cannot see how we'd get an inode with an invalid lockdep class on the list at this point as the list is protected here by the inode_sb_list_lock() and the inodes are initialised before adding and removed before being freed. As it is, I can't find my way through the maze of preprocessor and inline/macro indirection and config option ifdefs to work out exactly what code is being executed here. The locking code is an ugly tangled mass of spaghetti. AFAICT, we end up finally here: static inline void __raw_spin_lock(raw_spinlock_t *lock) { preempt_disable(); spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock); } passing a subclass of 0, which is what lockdep is complaining it can't find in the inode->i_lock. I really don't what sort of bug that implies - lockdep experts? As it is, I can't see how I'm going to track this done down without a reproducable test case. Any ideas on what might have caused it?
0) akpm@linux-foundation.org marked this as a regression. And rjw@sisk.pl made this block bug# 36912. 1) That might well be correct. But please note that the 3.0.1 kernel I triggered this issue with seems to be the first I ran on this machine that has CONFIG_LOCKDEP set. So on this machine this issue - if it's more than a one time goof - would never have showed up in the logs (lockdep.o depending on CONFIG_LOCKDEP). (2) Uninteresting detail: I changed the way I built the kernel for this machine - which still tracks Fedora 14 - after it was made clear that v2.6.39.4 would be the last stable release for v2.6.39. I now sort of rebuilt a Fedora Rawhide kernel, but dropping Fedora's non-upstreamed patches.)
(In reply to comment #1) > Any ideas on what might have caused it? 0) This happened in a long session (a few days uptime). Looking through the (part of the) log messages for that session I see dozens of suspend and resume cycles. I also see one hibernate and thaw cycle too. 1) What surrounded this _seems_ to be: - hibernate - thaw - two of the three "Poison overwritten" BUGs of bug #41102 - this message - a suspend and resume cycle - the third "Poison overwritten" BUG of bug #41102 2) So my _guess_ would be a hibernate and thaw cycle gone subtly wrong. On the other hand: hibernation problems tend to be much more serious, don't they? And this session even shut down cleanly so it wasn't in panic level problems or something similar. 3) As I said in comment #0: > I recorded it here (with a bit of analysis) in case some else runs into this > too (or I manage to trigger this again).
(In reply to comment #1) > I cannot see how this occurs. [ ... ] Any ideas on what might have caused it? Although I'm not sure that's what's going on here: lockdep also prints this complaint when the lockdep map contains uninitialized data. Maybe a pointer is invalid or a use-after-free has occurred.
On Tuesday, August 30, 2011, Paul Bolle wrote: > On Sun, 2011-08-28 at 21:01 +0200, Rafael J. Wysocki wrote: > > Please verify if it still should > > be listed and let the tracking team know (either way). > > From comment #2: > > [...] please note that the 3.0.1 kernel I > > triggered this issue with seems to be the first I ran on this machine that > has > > CONFIG_LOCKDEP set. So on this machine this issue - if it's more than a one > > time goof - would never have showed up in the logs [...]. > > At this moment I see little reason to track this bug entry as a > regression.
Dropping from the list of recent regressions.