I use btrfs as the root filesystem on one of my installs, and it hangs at boot when trying to mount the root filesystem. If I boot off an ext3 volume, it works. Since commit d863b50ab01333659314c2034890cb76d9fdc3c7 (vfs: call rcu_barrier after ->kill_sb()), bootup hangs, and pressing SysRq-T does not produce a trace. The root filesystem is on an LVM volume. Reverting the commit on top of v2.6.38-rc6-git6 makes it boot.
We do need to see that sysrq-t output, please. (Unless there's a better way of finding out where rcu_barrier() got stuck?) Please add ignore_loglevel to the kernel boot command line and try again?
Ah, added ignore_loglevel :) Did not know about that one. I think this is the relevant part:

[ 75.534655] mount D ffff8802278d58d8 5824 1896 1 0x00000000
[ 75.534655] ffff8802254e5c28 0000000000000086 ffff8802254e5b88 ffff88022786dcc0
[ 75.534655] ffff880200000000 ffff8802254e5fd8 ffff8802278d5620 ffff8802254e5fd8
[ 75.534655] ffff8802278d58e0 ffff8802278d58d8 ffff8802254e4000 ffff8802254e4000
[ 75.534655] Call Trace:
[ 75.534655] [<ffffffff8159fb68>] schedule_timeout+0x22/0xbb
[ 75.534655] [<ffffffff8104c251>] ? flat_send_IPI_allbutself+0x51/0x54
[ 75.534655] [<ffffffff81048433>] ? native_send_call_func_ipi+0x4a/0x60
[ 75.534655] [<ffffffff815a0eb9>] ? _raw_spin_unlock_irqrestore+0x20/0x2b
[ 75.534655] [<ffffffff810970af>] ? smp_call_function_many+0x1ae/0x1cf
[ 75.534655] [<ffffffff8159f172>] wait_for_common+0x9e/0x10b
[ 75.534655] [<ffffffff81068f99>] ? default_wake_function+0x0/0xf
[ 75.534655] [<ffffffff810bb31a>] ? call_rcu+0x10/0x12
[ 75.534655] [<ffffffff810bb30a>] ? call_rcu+0x0/0x12
[ 75.534655] [<ffffffff8159f279>] wait_for_completion+0x18/0x1a
[ 75.534655] [<ffffffff810ba97a>] _rcu_barrier+0x94/0xa4
[ 75.534655] [<ffffffff810ba9a1>] rcu_barrier+0x17/0x19
[ 75.534655] [<ffffffff8111cd77>] deactivate_locked_super+0x26/0x46
[ 75.534655] [<ffffffff8111d7e7>] mount_bdev+0x148/0x182
[ 75.534655] [<ffffffff811a6350>] ? ext4_fill_super+0x0/0x226c
[ 75.534655] [<ffffffff811a2020>] ext4_mount+0x10/0x12
[ 75.534655] [<ffffffff8111cfc9>] vfs_kern_mount+0xb8/0x1d1
[ 75.534655] [<ffffffff8111d140>] do_kern_mount+0x48/0xd8
[ 75.534655] [<ffffffff81133ecd>] do_mount+0x729/0x791
[ 75.534655] [<ffffffff810f0028>] ? memdup_user+0x43/0x63
[ 75.534655] [<ffffffff810f0081>] ? strndup_user+0x39/0x4f
[ 75.534655] [<ffffffff81133fb8>] sys_mount+0x83/0xbf
[ 75.534655] [<ffffffff810309fb>] system_call_fastpath+0x16/0x1b

Anything else I can do? Oh, it seems this is not btrfs code at all, so I was completely off.
I guess it tries ext4 before it tries btrfs, and ext3 before ext4? That would make sense to me.
That's the waiting thread. We need to work out which thread it's waiting on. Please attach the full sysrq-t output (attach, don't paste: the word wrapping makes it unreadable). Thanks.
Created attachment 49332: sysrq-t output when waiting for mount
The first bad commit is:

commit d863b50ab01333659314c2034890cb76d9fdc3c7
Author: Boaz Harrosh <bharrosh@panasas.com>
Date:   Thu Feb 10 15:01:20 2011 -0800

    vfs: call rcu_barrier after ->kill_sb()

    Tested-by: Tao Ma <boyu.mt@taobao.com>
    Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    Cc: Nick Piggin <npiggin@kernel.dk>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Chris Mason <chris.mason@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

First-Bad-Commit : d863b50ab01333659314c2034890cb76d9fdc3c7
Beats me. There are of course a thousand different versions of the RCU code, but it seems that you're running kernel/rcutree.c, and its _rcu_barrier() appears to be stuck in wait_for_completion(&rcu_barrier_completion). Paul, is there any debugging we can turn on to identify the cause of this? Thanks.
CONFIG_PROVE_RCU might help. I wonder if that rcu_barrier() is in an RCU read-side critical section? No good can possibly come of that.
Created attachment 49462: Two boots with CONFIG_PROVE_RCU=y. Interestingly enough, after sysrq-t it continues booting.
Interesting indeed! I take it that without CONFIG_PROVE_RCU=y, it did not boot all the way up?
Created attachment 49852: Hung boot that continues after sysrq-t. Yes, after I do sysrq-t it continues. I'm attaching a full dmesg from such a boot.
Could this be affected by a bad BIOS or the like? I played around with the BIOS a little. If I don't enable AHCI in the BIOS (enabling it results in an AMD AHCI BIOS being run) and use "native IDE" instead, everything seems to work fine. It is getting a bit ridiculous how many BIOS settings and kernel parameters I have to change these days to get Linux to boot :P Maybe this is another one of those bugs that only bites a few people and really is not a regression for the rest.
Maybe bug #27842 is related?
Hm, turning off C1E support in the BIOS makes it work, so I suppose this is really bug #15289. Earlier I did not need to turn it off in the BIOS; it was enough to add acpi_skip_timer_override to the boot parameters (bug #15289). I tested 2.6.36 + commit d863b50ab01333659314c2034890cb76d9fdc3c7 and 2.6.37 + the same commit, and both hung as well. Maybe it is the same bug.
*** This bug has been marked as a duplicate of bug 15289 ***