Summary: lvm snapshot causes deadlock in 2.6.35
Product: File System
Reporter: Phillip Susi (phill)
Component: ext4
Assignee: Eric Sandeen (sandeen)
Severity: normal
CC: maciej.rutecki, rjw, sandeen
Bug Depends on:
Description Phillip Susi 2010-06-23 16:55:38 UTC
Copying from: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/595489

Attempting to snapshot the root lv causes a deadlock in 2.6.35 when it suspends the root lv device to replace the table. The command

lvcreate -n snap -s -L 1g lv/root

hangs and cannot be killed, no further IO is possible, and the system must be hard rebooted with magic-sysrq. Narrowed down the cause to commit 6b0310fbf087ad6: "ext4: don't return to userspace after freezing the fs with a mutex held". After reverting this change, the problem goes away.
Comment 1 Eric Sandeen 2010-06-24 15:27:57 UTC
Created attachment 26933
Proposed patch

I've sent this patch upstream. I think it should solve the most egregious problem here, though I think other deadlocks remain (they look unrelated to the commit mentioned in the original comment).
Comment 2 Phillip Susi 2010-06-25 13:40:01 UTC
I tested this patch last night and it seemed to fix it.
Comment 3 Eric Sandeen 2010-06-25 14:41:07 UTC
Good deal. FWIW there is at least one other deadlock possible; it seems that the flushing during freeze isn't pushing all transactions out (or something...) and writeback tries to do more post-freeze. This takes s_umount and gets stopped in jbd, and then thaw wants s_umount to unfreeze the fs. This results in a) an inconsistent snapshot, and b) a stuck unfreeze (or snapshot create, which does unfreeze post-snap). I'm working on that.
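The lock cycle described in this comment can be sketched as a toy user-space model. This is only an illustration, not kernel code: s_umount is modeled as a plain lock, the frozen journal as an event that only thaw can set, and the function name and timeouts are made up for the example. The timeouts exist only so the simulation terminates; in the real case both sides would wait forever.

```python
import threading


def simulate_thaw_deadlock(timeout=0.5):
    """Toy model of the writeback/thaw lock cycle; all names are illustrative."""
    s_umount = threading.Lock()          # stand-in for the superblock's s_umount
    journal_thawed = threading.Event()   # set only when thaw succeeds
    writeback_holds_lock = threading.Event()
    result = {}

    def writeback():
        # Post-freeze writeback: takes s_umount, then stalls in the
        # (frozen) journal until the filesystem is thawed.
        with s_umount:
            writeback_holds_lock.set()
            result["writeback_finished"] = journal_thawed.wait(timeout)

    def thaw():
        # Thaw needs s_umount before it can unfreeze the journal,
        # but writeback is holding it while waiting for the thaw.
        writeback_holds_lock.wait()
        got = s_umount.acquire(timeout=timeout / 5)
        result["thaw_got_lock"] = got
        if got:
            journal_thawed.set()
            s_umount.release()

    threads = [threading.Thread(target=writeback),
               threading.Thread(target=thaw)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate_thaw_deadlock())
```

In this model neither side makes progress: thaw gives up waiting for the lock, and writeback gives up waiting for a thaw that never happens, which corresponds to the stuck unfreeze described above.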
Comment 4 Eric Sandeen 2010-06-28 19:58:03 UTC
Just to be clear, the patch in comment #1 should resolve the actual regression. The blathering in comment #3 needs further scrutiny, but it is a symptom of a problem which has existed for a while, I think. So let's not hold up the primary fix here.
Comment 5 Rafael J. Wysocki 2010-06-28 21:08:28 UTC
Handled-By : Eric Sandeen <firstname.lastname@example.org>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=26933
Comment 6 Phillip Susi 2010-07-13 17:17:25 UTC
It seems this patch still has not been applied to Linus's tree. Any idea why?
Comment 7 Rafael J. Wysocki 2010-08-01 13:38:56 UTC
@Eric: Any chance to push the patch upstream?
Comment 8 Rafael J. Wysocki 2010-08-29 22:57:16 UTC
Fixed by commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f.