Summary: btrfs as root filesystem hangs on mount after d863b50ab01333659314c2034890cb76d9fdc3c7
Product: File System
Reporter: Asbjørn Sannes (ace)
Severity: normal
CC: akpm, florian, maciej.rutecki, paulmck, rjw
Attachments:
sysrq-t output when waiting for mount
Two boots with CONFIG_PROVE_RCU=y
Hung boot that continues after sysrq-t
Description Asbjørn Sannes 2011-02-26 15:59:38 UTC
I use btrfs as the root filesystem on one of my installs, and it hangs at boot when trying to mount the root filesystem. If I boot off an ext3 volume it works. Since commit d863b50ab01333659314c2034890cb76d9fdc3c7 (vfs: call rcu_barrier after ->kill_sb()) the boot hangs, and pressing SysRq-T does not produce a trace. The volume is located on an LVM volume. Reverting the commit on top of v2.6.38-rc6-git6 makes it boot.
Comment 1 Andrew Morton 2011-02-26 18:51:37 UTC
We do need to see that sysrq-t output, please. (Unless there's a better way of finding out where rcu_barrier() got stuck?) Please add ignore_loglevel to the kernel boot command line and try again.
Comment 2 Asbjørn Sannes 2011-02-26 19:59:51 UTC
Ah, I added ignore_loglevel :) I did not know about that one. I think this is the relevant part:

[ 75.534655] mount D ffff8802278d58d8 5824 1896 1 0x00000000
[ 75.534655] ffff8802254e5c28 0000000000000086 ffff8802254e5b88 ffff88022786dcc0
[ 75.534655] ffff880200000000 ffff8802254e5fd8 ffff8802278d5620 ffff8802254e5fd8
[ 75.534655] ffff8802278d58e0 ffff8802278d58d8 ffff8802254e4000 ffff8802254e4000
[ 75.534655] Call Trace:
[ 75.534655] [<ffffffff8159fb68>] schedule_timeout+0x22/0xbb
[ 75.534655] [<ffffffff8104c251>] ? flat_send_IPI_allbutself+0x51/0x54
[ 75.534655] [<ffffffff81048433>] ? native_send_call_func_ipi+0x4a/0x60
[ 75.534655] [<ffffffff815a0eb9>] ? _raw_spin_unlock_irqrestore+0x20/0x2b
[ 75.534655] [<ffffffff810970af>] ? smp_call_function_many+0x1ae/0x1cf
[ 75.534655] [<ffffffff8159f172>] wait_for_common+0x9e/0x10b
[ 75.534655] [<ffffffff81068f99>] ? default_wake_function+0x0/0xf
[ 75.534655] [<ffffffff810bb31a>] ? call_rcu+0x10/0x12
[ 75.534655] [<ffffffff810bb30a>] ? call_rcu+0x0/0x12
[ 75.534655] [<ffffffff8159f279>] wait_for_completion+0x18/0x1a
[ 75.534655] [<ffffffff810ba97a>] _rcu_barrier+0x94/0xa4
[ 75.534655] [<ffffffff810ba9a1>] rcu_barrier+0x17/0x19
[ 75.534655] [<ffffffff8111cd77>] deactivate_locked_super+0x26/0x46
[ 75.534655] [<ffffffff8111d7e7>] mount_bdev+0x148/0x182
[ 75.534655] [<ffffffff811a6350>] ? ext4_fill_super+0x0/0x226c
[ 75.534655] [<ffffffff811a2020>] ext4_mount+0x10/0x12
[ 75.534655] [<ffffffff8111cfc9>] vfs_kern_mount+0xb8/0x1d1
[ 75.534655] [<ffffffff8111d140>] do_kern_mount+0x48/0xd8
[ 75.534655] [<ffffffff81133ecd>] do_mount+0x729/0x791
[ 75.534655] [<ffffffff810f0028>] ? memdup_user+0x43/0x63
[ 75.534655] [<ffffffff810f0081>] ? strndup_user+0x39/0x4f
[ 75.534655] [<ffffffff81133fb8>] sys_mount+0x83/0xbf
[ 75.534655] [<ffffffff810309fb>] system_call_fastpath+0x16/0x1b

Anything else I can do? Oh, it seems this is not actually btrfs code, so I was completely off. I guess it tries ext4 before it tries btrfs, and ext3 before ext4? That would make sense to me.
Comment 3 Andrew Morton 2011-02-26 20:50:50 UTC
That's the waiting thread. We need to work out which thread it's waiting on. Please attach the full sysrq-t output (attach, don't paste: the word wrapping makes it unreadable). Thanks.
Comment 4 Asbjørn Sannes 2011-02-26 21:02:20 UTC
Created attachment 49332 [details] sysrq-t output when waiting for mount
Comment 5 Rafael J. Wysocki 2011-02-26 21:08:25 UTC
The first bad commit is:

commit d863b50ab01333659314c2034890cb76d9fdc3c7
Author: Boaz Harrosh <firstname.lastname@example.org>
Date: Thu Feb 10 15:01:20 2011 -0800

    vfs: call rcu_barrier after ->kill_sb()

    Tested-by: Tao Ma <email@example.com>
    Signed-off-by: Boaz Harrosh <firstname.lastname@example.org>
    Cc: Nick Piggin <email@example.com>
    Cc: Al Viro <firstname.lastname@example.org>
    Cc: Chris Mason <email@example.com>
    Signed-off-by: Andrew Morton <firstname.lastname@example.org>

First-Bad-Commit : d863b50ab01333659314c2034890cb76d9fdc3c7
Comment 6 Andrew Morton 2011-02-26 21:24:09 UTC
Beats me. There are of course 1000 different versions of the rcu code but it seems that you're running kernel/rcutree.c and its _rcu_barrier() appears to be stuck in wait_for_completion(&rcu_barrier_completion). Paul, is there any debugging we can turn on to identify the cause of this? Thanks.
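To illustrate how _rcu_barrier() can end up parked in wait_for_completion() indefinitely, here is a minimal userspace model in Python. It assumes the count-plus-completion scheme that, as far as I understand it, the 2.6.38-era kernel/rcutree.c uses: a counter that starts at 1 for the caller, one barrier callback queued per CPU via call_rcu(), each callback decrementing the counter, and the final decrement calling complete(). The names Completion, model_rcu_barrier, stalled_cpus and the timeout are hypothetical and exist only in this sketch; a "stalled" CPU simply never runs its queued callback, standing in for whatever keeps callbacks from being invoked on real hardware (the C1E finding in comment 13 is one candidate, not a confirmed cause).

```python
import threading

NCPUS = 4


class Completion:
    """Hypothetical stand-in for the kernel's struct completion."""

    def __init__(self):
        self._cv = threading.Condition()
        self._done = False

    def complete(self):
        with self._cv:
            self._done = True
            self._cv.notify_all()

    def wait(self, timeout):
        # Returns True if complete() was called before the timeout expired.
        # (The kernel's wait_for_completion() has no timeout at all.)
        with self._cv:
            return self._cv.wait_for(lambda: self._done, timeout=timeout)


def model_rcu_barrier(stalled_cpus=(), timeout=0.2):
    """Model of the rcu_barrier() wait; CPUs in stalled_cpus never run their callback."""
    completion = Completion()
    lock = threading.Lock()
    # 1 for the caller plus 1 per CPU. In the kernel each CPU does its own
    # increment via on_each_cpu() before the caller drops its reference;
    # starting at 1 + NCPUS gives the same ordering guarantee here.
    count = [1 + NCPUS]

    def barrier_callback():
        # Models the per-CPU barrier callback: the last decrement completes.
        with lock:
            count[0] -= 1
            last = count[0] == 0
        if last:
            completion.complete()

    def cpu(i):
        if i not in stalled_cpus:
            barrier_callback()  # this CPU eventually invokes its queued callback

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(NCPUS)]
    for t in threads:
        t.start()
    barrier_callback()  # caller drops its own reference
    finished = completion.wait(timeout)
    for t in threads:
        t.join()
    return finished


print(model_rcu_barrier())                  # True
print(model_rcu_barrier(stalled_cpus={2}))  # False, after the 0.2 s timeout
```

The timeout exists only so the model terminates. In the kernel there is none, so a single CPU that never invokes its barrier callback leaves the caller blocked, which is consistent with the wait_for_completion() frame under rcu_barrier() in the mount trace above.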
Comment 7 Paul E. McKenney 2011-02-26 23:50:52 UTC
CONFIG_PROVE_RCU might help. I wonder if that rcu_barrier() is in an RCU read-side critical section? No good can possibly come of that.
Comment 8 Asbjørn Sannes 2011-02-27 09:25:37 UTC
Created attachment 49462 [details] Two boots with CONFIG_PROVE_RCU=y. Interestingly enough, it continues booting after sysrq-t.
Comment 9 Paul E. McKenney 2011-02-28 05:29:28 UTC
Interesting indeed! I take it that without CONFIG_PROVE_RCU=y, it did not boot all the way up?
Comment 10 Asbjørn Sannes 2011-03-02 05:54:01 UTC
Created attachment 49852 [details] Hung boot that continues after sysrq-t. Yes, after I do sysrq-t it continues. I'm attaching a full dmesg from such a boot.
Comment 11 Asbjørn Sannes 2011-03-02 06:10:27 UTC
Could this be caused by a bad BIOS or the like? I played around with the BIOS a little: if I don't enable AHCI in the BIOS (enabling it results in an AMD AHCI BIOS being run) and use "native IDE" instead, everything seems to work fine. It is getting a bit ridiculous how many BIOS and/or kernel parameter changes I have to make these days to get Linux to boot :P Maybe this is another one of those bugs that only bites a few and really is not a regression for everyone else.
Comment 13 Asbjørn Sannes 2011-03-15 06:52:54 UTC
Hm, turning off C1E support in the BIOS makes it work, so I suppose this is really bug #15289. Previously I did not need to turn C1E off in the BIOS; it was enough to add acpi_skip_timer_override to the boot parameters (bug #15289). I tested 2.6.36 + commit d863b50ab01333659314c2034890cb76d9fdc3c7 and 2.6.37 + the same commit, and both hung as well. Maybe it is the same bug.