Copying from: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/595489
Attempting to snapshot the root LV causes a deadlock in 2.6.35 when the root LV device is suspended to replace its table. The command lvcreate -n snap -s -L 1g lv/root hangs and cannot be killed, no further IO is possible, and the system must be hard-rebooted with magic SysRq. I narrowed the cause down to commit 6b0310fbf087ad6: "ext4: don't return to userspace after freezing the fs with a mutex held". After reverting this change, the problem goes away.
Created attachment 26933: Proposed patch
I've sent this patch upstream. I think it should solve the most egregious problem here, though I think other deadlocks remain; they look unrelated to the commit mentioned in the original comment.
I tested this patch last night and it seemed to fix the hang.
Good deal. FWIW there is at least one other deadlock possible; it seems that the flushing during freeze isn't pushing all transactions out (or something...) and writeback tries to do more post-freeze. This takes s_umount and gets stopped in jbd, and then thaw wants s_umount to unfreeze the fs. This results in a) an inconsistent snapshot, and b) a stuck unfreeze (or snapshot create, which does unfreeze post-snap). I'm working on that.
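For anyone trying to picture the second deadlock described above, here is a rough userspace model of the lock ordering, not kernel code. It assumes the sequence is exactly as guessed in the comment: writeback takes s_umount and then blocks in the journal because the fs is frozen, while thaw needs s_umount before it can unfreeze. The function name, the event objects, and the timeouts are all made up for illustration; the timeouts exist only so the demo terminates instead of hanging the way the real system does.

```python
import threading


def simulate_thaw_deadlock(timeout=0.1):
    """Toy model of the post-freeze writeback vs. thaw deadlock (hypothetical names)."""
    s_umount = threading.Lock()            # stands in for sb->s_umount
    journal_unfrozen = threading.Event()   # cleared == filesystem is frozen
    writeback_has_lock = threading.Event()
    result = {}

    def writeback():
        # Writeback takes s_umount, then tries to start a journal transaction.
        with s_umount:
            writeback_has_lock.set()
            # The journal is frozen, so this wait cannot succeed until thaw runs.
            # The wait is bounded here only so the model finishes.
            result["writeback_progressed"] = journal_unfrozen.wait(timeout * 5)

    worker = threading.Thread(target=writeback)
    worker.start()
    writeback_has_lock.wait()

    # Thaw must take s_umount before it is allowed to unfreeze the journal,
    # but writeback is holding it while waiting for that very unfreeze.
    got_lock = s_umount.acquire(timeout=timeout)
    result["thaw_got_s_umount"] = got_lock
    if got_lock:
        journal_unfrozen.set()
        s_umount.release()

    worker.join()
    return result


if __name__ == "__main__":
    print(simulate_thaw_deadlock())
```

Neither side makes progress in this model: thaw never gets the lock, and writeback never sees the journal unfrozen. If the real code path differs from the guess in the comment, the model is wrong in the same way.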
Just to be clear, the patch in comment #1 should resolve the actual regression. The blathering in comment #3 needs further scrutiny, but I think it is a symptom of a problem which has existed for a while. So let's not hold up the primary fix here.
Handled-By : Eric Sandeen <sandeen@redhat.com>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=26933
It seems this patch still has not been applied to Linus's tree. Any idea why?
@Eric: Any chance to push the patch upstream?
Fixed by commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f.