Bug 14256

Summary: kernel BUG at fs/ext3/super.c:435
Product: File System Reporter: Rafael J. Wysocki (rjw)
Component: ext3Assignee: Jan Kara (jack)
Status: CLOSED CODE_FIX    
Severity: normal CC: axboe, carlos.velasco, dvhart, jack, mikpelinux, sandeen, tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 13615    
Attachments: Patch reporting lost inode references
Test program causing lost inode references
lsof before umount in a reboot without busy error
lsof before umount in a "bad" reboot with busy error

Description Rafael J. Wysocki 2009-09-29 21:48:31 UTC
Subject    : [2.6.31] kernel BUG at fs/ext3/super.c:435
Submitter  : Mikael Pettersson <mikpe@it.uu.se>
Date       : 2009-09-21 7:29
References : http://marc.info/?l=linux-kernel&m=125351816109264&w=4

This entry is being used for tracking a regression from 2.6.30.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2009-10-04 20:47:41 UTC
On Sunday 04 October 2009, Mikael Pettersson wrote:
> Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=3D14256
> > Subject             : kernel BUG at fs/ext3/super.c:435
> > Submitter   : Mikael Pettersson <mikpe@it.uu.se>
> > Date                : 2009-09-21 7:29 (11 days old)
> > References  : http://marc.info/?l=3Dlinux-kernel&m=3D125351816109264&w=3D4
> 
> The exact same bug (same cause, same symptom) just hit me again in
> 2.6.32-rc1.
Comment 2 Rafael J. Wysocki 2009-10-09 22:01:43 UTC
On Friday 09 October 2009, Mikael Pettersson wrote:
> Mikael Pettersson writes:
>  > Rafael J. Wysocki writes:
>  >  > On Sunday 04 October 2009, Mikael Pettersson wrote:
>  >  > > Rafael J. Wysocki wrote:
>  >  > > > The following bug entry is on the current list of known regressions
>  >  > > > introduced between 2.6.30 and 2.6.31.  Please verify if it still
>  should
>  >  > > > be listed and let me know (either way).
>  >  > > > 
>  >  > > > 
>  >  > > > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=3D14256
>  >  > > > Subject             : kernel BUG at fs/ext3/super.c:435
>  >  > > > Submitter   : Mikael Pettersson <mikpe@it.uu.se>
>  >  > > > Date                : 2009-09-21 7:29 (11 days old)
>  >  > > > References  :
>  http://marc.info/?l=3Dlinux-kernel&m=3D125351816109264&w=3D4
>  >  > > 
>  >  > > The exact same bug (same cause, same symptom) just hit me again in
>  2.6.32-rc1.
>  >  > 
>  >  > Thanks for the update.
>  >  > 
>  >  > Could you check the current Linus' tree, please?  There are some known
>  >  > regression fixes in there.
>  > 
>  > I tried simplified versions of the bug trigger on two machines
>  > running 2.6.32-rc1-git6, and neither triggered the kernel bug.
>  > 
>  > The original recipe involved doing a glibc rebuild, run its test
>  > suite, install it, and reboot. Today however machine 1 was already
>  > doing a rebuild so after the rebuild it did a reboot into the new
>  > kernel before the install. The second machine booted the new kernel
>  > directly to install the binary packages from the first machine.
>  > 
>  > I'll re-run the full bug trigger recipe on a third machine later next
>  > week (it must rebuild glibc itself anyway due to arch differences).
> 
> Not fixed in 2.6.32-rc3. A glibc rebuild + install triggered the
> exact same bug on the third machine.
Comment 3 Theodore Tso 2009-10-10 01:31:41 UTC
On Fri, Oct 09, 2009 at 10:01:44PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > Not fixed in 2.6.32-rc3. A glibc rebuild + install triggered the
> > exact same bug on the third machine.

Can you isolate this down to something simpler?  Will "make install"
followed by a reboot be sufficient to replicate the assertion failure?
If it does, will copying in a new version of libc.so.6 and then
rebooting be enough to trigger the assert failure?

Thanks,

	     	       	       	   	  - Ted
Comment 4 Mikael Pettersson 2009-10-12 13:48:39 UTC
Theodore Tso writes:
 > On Fri, Oct 09, 2009 at 10:01:44PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
 > > > Not fixed in 2.6.32-rc3. A glibc rebuild + install triggered the
 > > > exact same bug on the third machine.
 > 
 > Can you isolate this down to something simpler?  Will "make install"
 > followed by a reboot be sufficient to replicate the assertion failure?

I've done repeated glibc downgrades and upgrades, using pre-built binary
packages, followed by reboots, but they have so far not triggered the bug.

My guess is that something in the build process or the test suite run
triggers the bug.

FWIW, the systems are rpm-based derived from Fedora, so the rebuilds
are just "rpmbuild --rebuild glibc-<version>.src.rpm", and the installs
are "rpm -Uvh" on the binary rpms, with --force if I'm doing a downgrade.
Comment 5 Eric Sandeen 2009-10-12 16:19:54 UTC
I can give this a whirl on fedora.  Just to try to reproduce more exactly, which version of the glibc rpm package are you rebuilding?

I'm also curious, does simply rebuilding the package and rebooting lead to the same issue (w/o installing the result?)

Do you have rpm configured in any unique way or is the build happening in all the normal/standard places?

So you have /tmp as ext3 as a separate fs, any other interesting fs config?

Thanks,
-Eric
Comment 6 Jan Kara 2009-11-05 16:07:31 UTC
Eric, have you succeeded in reproducing this?

What seems to happen is that while we are in ext3_put_super(), there are still some inode references held and therefore the inodes (regular files) don't get deleted and we complain about them in ext3_put_super. The question is who holds the reference... I did a quick audit of the code and couldn't find iget() unmatched with iput() or a similar problem.

I guess I can create a debugging patch which might be able to show us who holds the reference. Will try to code it next week. Mikael, will you be able to run such debugging patch?
Comment 7 Eric Sandeen 2009-11-05 16:13:15 UTC
Jan, nope haven't, to be honest fell off my radar as I was looking at that ext4 corruption.  I've not tried to reproduce it yet.

Sorry,
-Eric
Comment 8 Jan Kara 2009-11-11 15:27:53 UTC
Created attachment 23749 [details]
Patch reporting lost inode references

This patch reports more information about inodes left in orphan list.
Comment 9 Jan Kara 2009-11-11 15:29:18 UTC
Mikael, could you please run a kernel with the above patch and post here resulting messages during umount and also attach here your System.map file from the tested kernel? Thanks.
Comment 10 Theodore Tso 2009-11-20 15:11:21 UTC
Hi Mikael,

Have you had a chance to try running a kernel with Jan's debugging patch?   

Is this a problem you can reproduce reliably, or was it a one-off?
Comment 11 Rafael J. Wysocki 2009-11-21 14:27:39 UTC
On Saturday 21 November 2009, Mikael Pettersson wrote:
> Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=3D14256
> > Subject             : kernel BUG at fs/ext3/super.c:435
> > Submitter   : Mikael Pettersson <mikpe@it.uu.se>
> > Date                : 2009-09-21 7:29 (57 days old)
> > References  : http://marc.info/?l=3Dlinux-kernel&m=3D125351816109264&w=3D4
> 
> Still occurs with kernel 2.6.32-rc8.
> 
> This time I only rebuilt the glibc src rpm, which also runs its
> test suite, but I did not install the new glibc binaries before
> the shutdown -r.
Comment 12 Jan Kara 2009-12-08 11:12:19 UTC
Ah, I've noticed that Mikael was not on CC of this bug so he didn't see my messages. Mikael, could you please try running the debugging patch from comment #8/#9? Thanks.
Comment 13 Mikael Pettersson 2009-12-08 11:47:41 UTC
(In reply to comment #12)
> Ah, I've noticed that Mikael was not on CC of this bug so he didn't see my
> messages. Mikael, could you please try running the debugging patch from
> comment
> #8/#9? Thanks.

I'll test the debug patch this weekend, right now I'm away from the machines that trigger the problem.

(Since I issued all my bug reports on LKML it never occurred to me to add a cc: to myself in this bugzilla entry. Sorry about that.)
Comment 14 Mikael Pettersson 2009-12-14 12:40:25 UTC
With Jan Kara'a debugging patch on top of 2.6.32 final I get the following after shutdown -r:

Unmounting file systems:  sb orphan head is 22
sb_info orphan list:
  inode sda2:22 at cc7c1618: mode 100600, nlink 0, next 13, i_count=2501 dentry=
NULL
  iget IPs:
 c009e590,
 c009dd08,
  inode sda2:13 at d1d932b8: mode 100600, nlink 0, next 12, i_count=2 dentry=
NULL
  iget IPs:
 c009e590,
 c009dd08,
  inode sda2:12 at d1de0618: mode 100600, nlink 0, next 0, i_count=1 dentry=
NULL
  iget IPs:
 c009e590,
 c009dd08,
kernel BUG at fs/ext3/super.c:450!

c009e590 is fs/fs-writeback.c:writeback_inodes_wb(), line 681.

c009dd08 is fs/fs-writeback.c:wait_sb_inodes(), line 1170.

I get this with both gcc-4.3.4 and gcc-4.4.2 compiled kernels.

On a second machine I got a similar result, except the three inodes had only one iget IP recorded, and it was the one in writeback_inodes_wb().
Comment 15 Jan Kara 2009-12-14 14:46:07 UTC
Thanks for the debugging. So the problematic files were regular files that lead pretty simple life - no references from unexpected places or so. What catches my eye is that inode numbers are surprisingly small (22, 12, 13; in your original report 17, 15). Since normal inode numbers start by 11 and that is usually taken by lost+found, numbers 12 and 13 are as small as they can get. I suppose this is a filesystem dedicated for the build, isn't it? And the files making problems are created as a first files on that filesystem in its root directory. Any idea what those files are?

Another thing: i_count=2501 for inode 22 is absurdly high. We'd have to leak references really heavily for that file.

I've checked the code in writeback_inodes_wb() and wait_sb_inodes() but reference counting there looks fine. So I have no clue what's going on right now. Anyway, adding Jens to CC just in case it rings a bell to him.
Comment 16 Mikael Pettersson 2009-12-15 09:08:45 UTC
The affected file system is /tmp. There's nothing special about it, except it being separate from / and /home. The builds are plain "rpmbuild --rebuild" so they do most accesses under /usr/src/ and /var/tmp/, both on /, but the glibc test suite will create temporary files on /tmp. I have no idea what those files are.
Comment 17 carlos.velasco 2009-12-29 13:18:03 UTC
I have reproduced the exact same problem in a new LFS system using 2.6.32.2 kernel.

The bug is definitely triggered but issuing a "make check" of eglibc before a reboot. You do not need to install it or just build it, whatever triggers this bug is related to something that eglibc makes during tests.

---

brw-rw---- 1 root disk 253, 3 Dec 29  2009 /dev/dm-3

brw-------  1 root root 253,  3 Dec 29  2009 LV0-tmp


Unmounting all other currently mounted file systems...
/etc/rc.d/rc6.d/S70mountfs: line 18: 22979 Segmentation fault      umount -a -d -r > /dev/null
[ FAIL ]


[  903.074244] sb orphan head is 13
[  903.088507] sb_info orphan list:
[  903.102590]   inode dm-3:13 at ffff880114948a10: mode 100600, nlink 0, next 12
[  903.131992]   inode dm-3:12 at ffff880114907340: mode 100600, nlink 0, next 0
[  903.149056] ------------[ cut here ]------------
[  903.149057] kernel BUG at fs/ext3/super.c:435!
[  903.149058] invalid opcode: 0000 [#1] SMP 
[  903.149060] last sysfs file: /sys/class/net/bonding_masters
[  903.149061] CPU 2 
[  903.149062] Modules linked in: bonding ipv6 bnx2 usbhid button hpilo
[  903.149067] Pid: 22979, comm: umount Not tainted 2.6.32.2 #1 ProLiant DL360 G6
[  903.149068] RIP: 0010:[<ffffffff8117d883>]  [<ffffffff8117d883>] ext3_put_super+0x233/0x280
[  903.149075] RSP: 0018:ffff88011cd6be28  EFLAGS: 00010202
[  903.149076] RAX: ffff8801149489c8 RBX: ffff88011d857e00 RCX: ffffffff815e0800
[  903.149077] RDX: 00000000007f3f13 RSI: 0000000000000046 RDI: ffffffff816dc130
[  903.149078] RBP: ffff88011cd6be58 R08: 0000000000000000 R09: 0000000000000001
[  903.149080] R10: 0000000000000000 R11: 000000000e200000 R12: ffff88011d3db400
[  903.149081] R13: ffff88011d857f98 R14: ffff88011d857f98 R15: ffff88011d0cc060
[  903.149083] FS:  00007f9f88456730(0000) GS:ffff88002c040000(0000) knlGS:0000000000000000
[  903.149084] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  903.149086] CR2: 00007f9f8845c000 CR3: 000000011dc84000 CR4: 00000000000006e0
[  903.149087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  903.149088] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  903.149090] Process umount (pid: 22979, threadinfo ffff88011cd6a000, task ffff88011d0cc060)
[  903.149091] Stack:
[  903.149092]  ffff880100000000 ffff88011d3db400 ffff88011d3db400 ffffffff814a4ac0
[  903.149094] <0> ffffffff8163d1b8 ffff88011d3db400 ffff88011cd6be78 ffffffff810f6cc1
[  903.149096] <0> ffff88011f29de40 0000000000000003 ffff88011cd6be98 ffffffff810f6d8c
[  903.149098] Call Trace:
[  903.149104]  [<ffffffff810f6cc1>] generic_shutdown_super+0x51/0xf0
[  903.149106]  [<ffffffff810f6d8c>] kill_block_super+0x2c/0x50
[  903.149109]  [<ffffffff810f81ea>] deactivate_super+0x6a/0xf0
[  903.149112]  [<ffffffff81113d6e>] mntput_no_expire+0x9e/0xf0
[  903.149114]  [<ffffffff8111518e>] sys_umount+0x5e/0x460
[  903.149117]  [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[  903.149118] Code: c7 b1 de 54 81 31 c0 e8 3c 4d 30 00 4c 8b b3 98 01 00 00 49 8b 06 4d 39 f5 0f 18 08 75 11 4c 3b ab 98 01 00 00 0f 84 c4 fe ff ff <0f> 0b eb fe 49 8d 46 80 49 8d 4e 48 48 8b b0 c8 01 00 00 44 8b 
[  903.149131] RIP  [<ffffffff8117d883>] ext3_put_super+0x233/0x280
[  903.149134]  RSP <ffff88011cd6be28>
[  903.159165] ---[ end trace 3ed92bd591190341 ]---
[  903.159166] ------------[ cut here ]------------
[  903.159169] WARNING: at kernel/exit.c:895 do_exit+0x94a/0x9e0()
[  903.159170] Hardware name: ProLiant DL360 G6
[  903.159171] Modules linked in: bonding ipv6 bnx2 usbhid button hpilo
[  903.159175] Pid: 22979, comm: umount Tainted: G      D    2.6.32.2 #1
[  903.159176] Call Trace:
[  903.159179]  [<ffffffff8104c218>] warn_slowpath_common+0x78/0xf0
[  903.159181]  [<ffffffff8104c29f>] warn_slowpath_null+0xf/0x20
[  903.159183]  [<ffffffff81050fca>] do_exit+0x94a/0x9e0
[  903.159187]  [<ffffffff8106de01>] ? down_trylock+0x31/0x50
[  903.159190]  [<ffffffff81010305>] oops_end+0x75/0xa0
[  903.159192]  [<ffffffff810108a1>] die+0xd1/0x110
[  903.159194]  [<ffffffff8100d0d4>] do_trap+0xc4/0x170
[  903.159196]  [<ffffffff8100d580>] do_invalid_op+0x90/0xb0
[  903.159198]  [<ffffffff8117d883>] ? ext3_put_super+0x233/0x280
[  903.159200]  [<ffffffff8100c815>] invalid_op+0x15/0x20
[  903.159203]  [<ffffffff8117d883>] ? ext3_put_super+0x233/0x280
[  903.159205]  [<ffffffff8117d8c7>] ? ext3_put_super+0x277/0x280
[  903.159207]  [<ffffffff810f6cc1>] generic_shutdown_super+0x51/0xf0
[  903.159209]  [<ffffffff810f6d8c>] kill_block_super+0x2c/0x50
[  903.159211]  [<ffffffff810f81ea>] deactivate_super+0x6a/0xf0
[  903.159213]  [<ffffffff81113d6e>] mntput_no_expire+0x9e/0xf0
[  903.159215]  [<ffffffff8111518e>] sys_umount+0x5e/0x460
[  903.159217]  [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[  903.159218] ---[ end trace 3ed92bd591190342 ]---
Comment 18 Jan Kara 2010-01-11 21:59:30 UTC
Guys, I'm having problems reproducing this. I've checked out 2.10 version of eglibc but when I run "make check" I get error on:
GCONV_PATH=/local/test1/build/iconvdata LC_ALL=C LOCPATH=/local/test1/build/localedata    /local/test1/build/elf/ld-linux-x86-64.so.2 --library-path /local/test1/build:/local/test1/build/math:/local/test1/build/elf:/local/test1/build/dlfcn:/local/test1/build/nss:/local/test1/build/nis:/local/test1/build/rt:/local/test1/build/resolv:/local/test1/build/crypt:/local/test1/build/nptl /local/test1/build/localedata/tst_mblen

probably because of:
mblen:de_DE.ISO-8859-1:1:1:2:F:the return value is different from an expected value

When I configure the package without TLS, I cannot get even that far... So could you either provide me with a simpler reproduction case or tell me how to make "make check" do what I need? Thanks...
Comment 19 carlos.velasco 2010-01-11 22:55:40 UTC
Jan,

You usually run glibc checks with -k (some errors are usual and recurring):
make -k check

You can follow CLFS/LFS guideline here using the pre-downloaded version of eglibc and patches:
http://cross-lfs.org/view/svn/x86_64/final-system/eglibc-64bit.html

Regards,
Carlos Velasco
Comment 20 Jan Kara 2010-01-12 16:03:29 UTC
OK, now "make -k check" have been doing something for a reasonable time but I don't see problems with orphan files you see. Only at the end there are a few ld-linux-... processes hanging so that I cannot umount the filesystem (I have to kill them to umount the filesystem) - do you also observe such processes?

Also if you can still reproduce the problem - could you please find which test does exactly cause the problem so that I can concentrate on it? Thanks.
Comment 21 carlos.velasco 2010-01-12 17:30:52 UTC
I don't see any ld-linux process after make -k check

bash-4.0# ps -ef | grep -i ld-linux
root     13640 18376  0 18:27 ?        00:00:00 grep -i ld-linux

bash-4.0# umount /tmp
[435397.580983] sb orphan head is 14
[435397.595803] sb_info orphan list:
[435397.611165]   inode dm-3:14 at ffff8800c7880d28: mode 100600, nlink 0, next 13
[435397.639671]   inode dm-3:13 at ffff8800df2566e0: mode 100600, nlink 0, next 12
[435397.672072]   inode dm-3:12 at ffff8800df2560b0: mode 100600, nlink 0, next 0
bash-4.0# [435397.672080] ------------[ cut here ]------------
[435397.672082] kernel BUG at fs/ext3/super.c:435!
[435397.672083] invalid opcode: 0000 [#1] SMP 
[435397.672085] last sysfs file: /sys/class/net/bonding_masters
[435397.672086] CPU 2 
[435397.672087] Modules linked in: bonding ipv6 usbhid bnx2 hpilo button
[435397.672092] Pid: 13641, comm: umount Not tainted 2.6.32.3 #1 ProLiant DL360 G6
[435397.672093] RIP: 0010:[<ffffffff8117da53>]  [<ffffffff8117da53>] ext3_put_super+0x233/0x280
[435397.672099] RSP: 0018:ffff88011cda5e28  EFLAGS: 00010206
[435397.672101] RAX: ffff8800c7880ce0 RBX: ffff88011cd54e00 RCX: 000000000001055f
[435397.672102] RDX: 0000000000004646 RSI: 0000000000000082 RDI: ffffffff815e6330
[435397.672104] RBP: ffff88011cda5e58 R08: 000000000001055e R09: ffffffff816dd240
[435397.672105] R10: 0000000000000013 R11: ffffffff816dd240 R12: ffff88011d13b400
[435397.672106] R13: ffff88011cd54f98 R14: ffff88011cd54f98 R15: ffff88011d3e1350
[435397.672108] FS:  00007f910e82b730(0000) GS:ffff88002c040000(0000) knlGS:0000000000000000
[435397.672110] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[435397.672111] CR2: 00007f910dedb32e CR3: 000000011cf65000 CR4: 00000000000006e0
[435397.672112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[435397.672114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[435397.672116] Process umount (pid: 13641, threadinfo ffff88011cda4000, task ffff88011d3e1350)
[435397.672117] Stack:
[435397.672118]  ffff880100000000 ffff88011d13b400 ffff88011d13b400 ffffffff814a4ac0
[435397.672120] <0> ffffffff8163d1b8 ffff88011d13b400 ffff88011cda5e78 ffffffff810f6cd1
[435397.672121] <0> ffff88011f016180 0000000000000003 ffff88011cda5e98 ffffffff810f6d9c
[435397.672124] Call Trace:
[435397.672130]  [<ffffffff810f6cd1>] generic_shutdown_super+0x51/0xf0
[435397.672132]  [<ffffffff810f6d9c>] kill_block_super+0x2c/0x50
[435397.672134]  [<ffffffff810f81fa>] deactivate_super+0x6a/0xf0
[435397.672137]  [<ffffffff81113dce>] mntput_no_expire+0x9e/0xf0
[435397.672140]  [<ffffffff811151ee>] sys_umount+0x5e/0x460
[435397.672143]  [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[435397.672144] Code: c7 41 df 54 81 31 c0 e8 b7 52 30 00 4c 8b b3 98 01 00 00 49 8b 06 4d 39 f5 0f 18 08 75 11 4c 3b ab 98 01 00 00 0f 84 c4 fe ff ff <0f> 0b eb fe 49 8d 46 80 49 8d 4e 48 48 8b b0 c8 01 00 00 44 8b 
[435397.672158] RIP  [<ffffffff8117da53>] ext3_put_super+0x233/0x280
[435397.672160]  RSP <ffff88011cda5e28>
[435397.682079] ---[ end trace 7a92197c0643be5c ]---
[435397.682080] ------------[ cut here ]------------
[435397.682083] WARNING: at kernel/exit.c:895 do_exit+0x94a/0x9e0()
[435397.682084] Hardware name: ProLiant DL360 G6
[435397.682085] Modules linked in: bonding ipv6 usbhid bnx2 hpilo button
[435397.682088] Pid: 13641, comm: umount Tainted: G      D    2.6.32.3 #1
[435397.682090] Call Trace:
[435397.682093]  [<ffffffff8104c208>] warn_slowpath_common+0x78/0xf0
[435397.682095]  [<ffffffff8104c28f>] warn_slowpath_null+0xf/0x20
[435397.682097]  [<ffffffff81050fba>] do_exit+0x94a/0x9e0
[435397.682100]  [<ffffffff8106ddf1>] ? down_trylock+0x31/0x50
[435397.682104]  [<ffffffff81010305>] oops_end+0x75/0xa0
[435397.682106]  [<ffffffff810108a1>] die+0xd1/0x110
[435397.682108]  [<ffffffff8100d0d4>] do_trap+0xc4/0x170
[435397.682110]  [<ffffffff8100d580>] do_invalid_op+0x90/0xb0
[435397.682112]  [<ffffffff8117da53>] ? ext3_put_super+0x233/0x280
[435397.682114]  [<ffffffff8106ddf1>] ? down_trylock+0x31/0x50
[435397.682117]  [<ffffffff8100c815>] invalid_op+0x15/0x20
[435397.682119]  [<ffffffff8117da53>] ? ext3_put_super+0x233/0x280
[435397.682121]  [<ffffffff8117da97>] ? ext3_put_super+0x277/0x280
[435397.682123]  [<ffffffff810f6cd1>] generic_shutdown_super+0x51/0xf0
[435397.682126]  [<ffffffff810f6d9c>] kill_block_super+0x2c/0x50
[435397.682128]  [<ffffffff810f81fa>] deactivate_super+0x6a/0xf0
[435397.682130]  [<ffffffff81113dce>] mntput_no_expire+0x9e/0xf0
[435397.682132]  [<ffffffff811151ee>] sys_umount+0x5e/0x460
[435397.682134]  [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[435397.682135] ---[ end trace 7a92197c0643be5d ]---


There are many many tests, difficult finding which one.
Comment 22 Jan Kara 2010-01-12 20:12:39 UTC
Hmm, strange. I've now downloaded eglibc 2.10.1 from the site you pointed to and followed instructions there. Now I don't have any ld-linux hung processes and everything seems to finish fine but still no luck in reproducing the problem.

About finding which test is causing this: I suppose you could easily disable tests from different directories and the when you find in which directory the problematic test is, you could do a binary search on tests. But I understand it's quite a bit of work...
Comment 23 carlos.velasco 2010-01-12 23:25:48 UTC
It is something in the nptl (threads) subdir tests.

eglibc-build/nptl# make subdir=nptl -C ../../eglibc-2.10.1/nptl objdir=/compile/eglibc-build tests
...
eglibc-build/nptl# umount /tmp 
Segmentation fault

Not sure about how to isolate the exact test from here now... will try to do my best.
Comment 24 carlos.velasco 2010-01-13 00:34:00 UTC
Found. It is the test "tst-mutexpi9".

eglibc-2.10.1/nptl# GCONV_PATH=/compile/eglibc-build/iconvdata LC_ALL=C     /compile/eglibc-build/elf/ld-linux-x86-64.so.2 --library-path /compile/eglibc-build:/compile/eglibc-build/math:/compile/eglibc-build/elf:/compile/eglibc-build/dlfcn:/compile/eglibc-build/nss:/compile/eglibc-build/nis:/compile/eglibc-build/rt:/compile/eglibc-build/resolv:/compile/eglibc-build/crypt:/compile/eglibc-build/nptl /compile/eglibc-build/nptl/tst-mutexpi9  > /compile/eglibc-build/nptl/tst-mutexpi9.out

eglibc-2.10.1/nptl# umount /tmp
Segmentation fault

eglibc-2.10.1/nptl# cat /compile/eglibc-build/nptl/tst-mutexpi9.out
going to fork now
Comment 25 Jan Kara 2010-01-18 22:41:46 UTC
Whohoo, reproduced! The trick was that this test is creating files in /tmp and I was doing all the compilation (and unmounting) in a different partition than /tmp is. Thanks for identifying the test case.

For others interested: What the test does is: It creates a temporary file, mmaps it, deletes it, creates pthread mutex in the mmap, forks, does some mutex locking, exits. During this process we apparently leak an inode reference, yay.
What is interesting is that when the mutex is created without priority inheritance logic, everything works fine so there's something racy about our pthread mutex handling in the kernel.

I'll now go and debug it more, I just wanted to share the news.
Comment 26 Jan Kara 2010-01-19 19:56:41 UTC
OK, I've spent some time looking into this. It's definitely a futex problem - futexes take reference to inode and apparently this test case takes some code path where we don't drop all the inode references we have acquired. I've tried to find that place by looking into the code but wasn't successful.

I think the problem is caused by futex rewrite that happened between 2.6.30 and 2.6.31 so I'm adding Darren to CC (who was author of most of the patches).
Darren, I'm able to reproduce the problem with 2.6.33-rc3 kernel so apparently the problem isn't fixed yet. I'll attach here a minimal test program that triggers the problem.
Comment 27 Jan Kara 2010-01-19 19:57:25 UTC
Created attachment 24639 [details]
Test program causing lost inode references
Comment 28 Darren Hart 2010-01-19 23:24:41 UTC
From the following in the test:

  mem = mmap (NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

this may be related to be the problem reported by Kosaki and fixed with his patch recently pulled into tip by Ingo and sent Linus and stable. Can you try with this patch from tip/core-fixes-for-linus?

commit 7485d0d3758e8e6491a5c9468114e74dc050785d
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date:   Tue Jan 5 16:32:43 2010 +0900

    futexes: Remove rw parameter from get_futex_key()
    
    Currently, futexes have two problem:
    
    A) The current futex code doesn't handle private file mappings properly.
    
    get_futex_key() uses PageAnon() to distinguish file and
    anon, which can cause the following bad scenario:
    
      1) thread-A call futex(private-mapping, FUTEX_WAIT), it
         sleeps on file mapping object.
      2) thread-B writes a variable and it makes it cow.
      3) thread-B calls futex(private-mapping, FUTEX_WAKE), it
         wakes up blocked thread on the anonymous page. (but it's nothing)
    
    B) Current futex code doesn't handle zero page properly.
    
    Read mode get_user_pages() can return zero page, but current
    futex code doesn't handle it at all. Then, zero page makes
    infinite loop internally.
    
    The solution is to use write mode get_user_page() always for
    page lookup. It prevents the lookup of both file page of private
    mappings and zero page.
    
    Performance concerns:
    
    Probaly very little, because glibc always initialize variables
    for futex before to call futex(). It means glibc users never see
    the overhead of this patch.
    
    Compatibility concerns:
    
    This patch has few compatibility issues. After this patch,
    FUTEX_WAIT require writable access to futex variables (read-only
    mappings makes EFAULT). But practically it's not a problem,
    glibc always initalizes variables for futexes explicitly - nobody
    uses read-only mappings.
    
    Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
    Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Acked-by: Peter Zijlstra <peterz@infradead.org>
    Acked-by: Darren Hart <dvhltc@us.ibm.com>
    Cc: <stable@kernel.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Ulrich Drepper <drepper@gmail.com>
    LKML-Reference: <20100105162633.45A2.A69D9226@jp.fujitsu.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
Comment 29 Jan Kara 2010-01-20 10:40:30 UTC
I've tested a kernel I've pulled today from Linus (which has the above commit) and it still has the issue. Note that the mapping created is shared so it's unlikely that the above patch would really fix it...

BTW: The problem is 100% reproducible for me (and for the guys reporting it as well) so I believe that you should see it as well with the test program.
Comment 30 Mikael Pettersson 2010-01-20 13:52:46 UTC
Using the standalone test case I can reproduce the bug with all kernels back to 2.6.29, but not with 2.6.28. I'm trying a bisect now.
Comment 31 Darren Hart 2010-01-20 18:11:16 UTC
Jan, I apologize, I missed the shared mapping. I wouldn't expect Kosaki's patch to fix this. Have you experimented with the SHARED, RECURSIVE, and PRIO_INHERIT flags of the mutexattr to see if this is the only configuration in which you see the problem? Understanding which of these flags are required to reproduce would help point at where to look in the code. Right now I suspect a futex key reference imbalance in the futex_lock_pi() code path. 2.6.29 includes changes from me on fault handling as well as key reference imbalance fixes in this area.

Mikael, please let me know what your git bisection reveals.

I'm trying to make time to do some testing/analysis myself, but I'm being kept very busy on other pressing issues right now, so I may not be able to respond as quickly as I'd like.
Comment 32 Darren Hart 2010-01-20 19:10:25 UTC
I see above that PRIO_INHERIT is required, and the test depends on SHARED. I suppose we could try with pthreads instead of fork to see if it's related to SHARED. For now I'll start examining the futex_lock_pi() path - particular around the signal induced wakeup.
Comment 33 Darren Hart 2010-01-20 19:29:53 UTC
Took a look at the 2.6.29 code, I believe it is possible to have an inode reference imbalance when a fault is taken. Unfortunately, both queue_lock() and get_futex_key() acquire references to the inode (I'd like to do away with queue_lock() as it masks reference usage and generally complicates the corner-case-heavy futex code). In the fault path queue_unlock() will release the first inode reference, but if on the first attempt (attempt == 0) the get_user() fails, we'll simply return without dropping the second reference. Some instrumentation could confirm this. I'll take a look at later sources which should have a significantly different fault path.
Comment 34 Mikael Pettersson 2010-01-20 19:54:37 UTC
My bisection has identified the following as the culprit:

commit 38d47c1b7075bd7ec3881141bb3629da58f88dab
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Fri Sep 26 19:32:20 2008 +0200

    futex: rely on get_user_pages() for shared futexes
    
    On the way of getting rid of the mmap_sem requirement for shared futexes,
    start by relying on get_user_pages().
Comment 35 Darren Hart 2010-01-20 20:09:08 UTC
The fault path inode count problem appears to be fixed in 2.6.33-rc3 where the fault logic is rewritten and and we properly account for both references taken by queue_lock as well as get_futex_key(). Since Jan in comment #26 said he can reproduce on 2.6.33-rc3, this isn't likely to be the path that caused the problem. I'll continue to look.
Comment 36 Mikael Pettersson 2010-01-23 22:18:38 UTC
I've posted a futex bug fix to lkml:
<http://marc.info/?l=linux-kernel&m=126428261230399&w=2>

It fixed the standalone test case. I'm now doing a full glibc rebuild and test suite run on a machine with the fixed kernel.
Comment 37 Mikael Pettersson 2010-01-24 11:28:05 UTC
(In reply to comment #36)
> I've posted a futex bug fix to lkml:
> <http://marc.info/?l=linux-kernel&m=126428261230399&w=2>
> 
> It fixed the standalone test case. I'm now doing a full glibc rebuild and
> test
> suite run on a machine with the fixed kernel.

Completed. The patch eliminated all orphan inodes from the glibc test suite.
Comment 38 carlos.velasco 2010-01-24 17:13:44 UTC
It works for me.
Comment 39 carlos.velasco 2010-01-24 18:17:58 UTC
Not sure about the patch.
With patch applied, after testing and installing glibc I see plenty of tmp files left in /tmp:

-rw-------  1 root root     0 Jan 24 18:56 tst-atime.HiK5Qd

-rw-------  1 root root     0 Jan 24 19:07 cc2mcjSe.o
-rw-------  1 root root     0 Jan 24 19:07 cc3s3qwc.c
-rw-------  1 root root  9766 Jan 24 19:07 ccNVp9dh.ld
-rw-------  1 root root     0 Jan 24 19:07 cckbLeAj.le

I don't remember to see them without patch applied.

Also, when rebooting I see this error:

Unmounting all other currently mounted file systems...
umount: /: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Very odd...
Comment 40 carlos.velasco 2010-01-24 18:20:15 UTC
After rebooting the temp files are gone...
Comment 41 carlos.velasco 2010-01-24 19:27:33 UTC
Tested with unpatched kernel and the temp files are there too. Can't test the error unmounting / because when /tmp is unmounted it gives oops. This may be another problem.
Comment 42 carlos.velasco 2010-01-25 01:37:33 UTC
Same problem using 2.6.33-rc5 with the patch.

Unmounting all other currently mounted file systems...
umount: /: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

I don't know why but this only happens with the patch applied.
Comment 43 Mikael Pettersson 2010-01-25 10:28:58 UTC
Please clarify exactly what you do to trigger the busy /.

Did you run the glibc test suite or the specific tst-mutex9 program?
If so, then without the futex_lock_pi() patch you wouldn't even get to a normal shutdown due to the oops, so the busy / may be normal and caused by something else.

Also, does the busy / prevent the shutdown from succeeding?
Comment 44 Jan Kara 2010-01-25 12:07:11 UTC
Carlos, EBUSY from umount usually means that some process has some file open on the filesystem (/ in your case). Are there any usual processes running before you run shutdown? Or maybe you could try just switching to single user mode and then use lsof to see processes which have files open on /...
Comment 45 carlos.velasco 2010-01-28 17:19:13 UTC
Sorry about delay, I made further tests.
I think the patch is good and this is another problem.

The oops happened during TEST of the eglibc. This has been solved by the patch.
This second issue is related to the INSTALL of the eglibc.

After "make install" and reboot I always see an umount busy message for /.
This busy message was not seen before because we stepped into the oops umounting /tmp, before umounting /.

I tried downgrading to linux kernel 2.6.28 but I couldn't, I receive an NMI interrupt and system hang before testing with this kernel.

So I have issued a "ps -ef" and a lsof just before the umount all.

Looking output of ps, there are no user processes apart of these:
root     30920     1  0 17:50 ttyS1    00:00:00 init [6]  
root     30921 30920  0 17:50 ttyS1    00:00:00 /bin/bash /etc/rc.d/init.d/rc 6
root     31085 30921  0 17:50 ttyS1    00:00:00 /bin/bash /etc/rc.d/rc6.d/S70mou

Umount command is:
"umount -a -d -r >/dev/null"

I ONLY see this error when installing eglibc. Rebooting the system without installing eglibc is successful without errors.

Looking into LSOF, I have been looking into differences between a "good reboot" and a "bad reboot - umount busy". I can only see that in the "bad reboot" the SIZE/OFF columns are empty for the new installed eglibc files.

Bad:
COMMAND    PID USER   FD      TYPE DEVICE SIZE/OFF   NODE NAME
init          1 root  DEL       REG  253,1          887056 /lib64/libc-2.10.1.so
init          1 root  DEL       REG  253,1          887055 /lib64/ld-2.10.1.so

Good:
COMMAND    PID USER   FD      TYPE DEVICE SIZE/OFF   NODE NAME
init         1 root  mem       REG  253,1  1752309 887013 /lib64/libc-2.10.1.so
init         1 root  mem       REG  253,1   149463 887007 /lib64/ld-2.10.1.so

Not sure how to proceed from here. Not sure if this is an ext3 issue also...
Comment 46 carlos.velasco 2010-01-28 17:23:55 UTC
Created attachment 24757 [details]
lsof before umount in a reboot without busy error
Comment 47 carlos.velasco 2010-01-28 17:24:43 UTC
Created attachment 24758 [details]
lsof before umount in a "bad" reboot with busy error
Comment 48 Jan Kara 2010-01-28 17:55:53 UTC
Thanks for debugging Carlos! Note, that in the 'bad' case, the files are deleted... When there are still open but deleted files, we cannot remount the filesystem read-only because that would make deleting the file at closing of the last file descriptor impossible. So what you see is an expected behavior...

Mikael, will you take care of pushing your fix to vanilla kernel and possibly also -stable releases? Thanks.
Comment 49 Jan Kara 2010-02-10 19:15:27 UTC
Rafael, the patch has been merged as 5ecb01cfdf96c5f465192bdb2a4fd4a61a24c6cc so I'm closing the bug.
Comment 50 Rafael J. Wysocki 2010-02-10 19:20:03 UTC
Thanks!