Bug 10242 - rm command hangs
Summary: rm command hangs
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: LVM2/DM
Hardware: All Linux
Importance: P1 high
Assignee: Alasdair G Kergon
Duplicates: 10207
Depends on:
Blocks: 9832
Reported: 2008-03-14 05:47 UTC by Jean-Luc Coulon
Modified: 2008-03-28 15:23 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.25-rc5
Regression: Yes
Bisected commit-id:

Attachments:
task blocked, syslog for w to sysrq-trigger (33.67 KB, text/plain), 2008-03-14 08:32 UTC, Jean-Luc Coulon

Description Jean-Luc Coulon 2008-03-14 05:47:07 UTC
Latest working kernel version: 2.6.24
Earliest failing kernel version: 2.6.25-rc5-git4
Distribution: Debian/Sid
Hardware Environment: ASUS A8V, Athlon 64 X2 4200+
Software Environment: RAID1, cryptsetup (LUKS), LVM2, XFS
Problem Description: "Sometimes" an rm command stalls. The files are deleted, but the xterm or console freezes.
Running strace on the PID stalls as well, printing nothing after the "attached to pid" message.
After that, it is impossible to sync the filesystem (the command hangs) or to umount the device (busy).
I've never seen this problem with 2.6.24 (which doesn't mean it doesn't exist there). It may not have existed in 2.6.25-rc2 either, but I didn't use that version much.
I hit it once or twice a day on 2.6.25-rc5.
The rm process cannot be killed; I have to reboot to get rid of it. After the journal is replayed, the filesystem doesn't appear to be corrupted (xfs_check doesn't report any error).

Steps to reproduce: rm -rf /xxx/xxxx
(I mostly hit it while a script was cleaning up a tree after building a Debian package on my machine.)

The filesystem is XFS.
It sits on an LVM2 logical volume on top of a RAID1 array, encrypted with cryptsetup using LUKS.
Comment 1 Eric Sandeen 2008-03-14 07:28:01 UTC
Does it only happen on luks volumes?

Try "echo w > /proc/sysrq-trigger" to see which tasks are in the uninterruptible (blocked) state; the output goes to the kernel log. If possible, attach it here.

Comment 2 Jean-Luc Coulon 2008-03-14 08:32:58 UTC
Created attachment 15267
task blocked, syslog for w to sysrq-trigger

I've never had the problem on a non-LUKS volume, but the non-LUKS volumes see little write/delete activity (root filesystem, /usr).

I've also had the problem while running a C++ compilation, not only when doing an rm.

Comment 3 Alasdair G Kergon 2008-03-14 11:38:58 UTC
I assume this is the known dm-crypt regression; we're working on a patch.
Comment 4 Alasdair G Kergon 2008-03-14 11:42:09 UTC
(It is a reference-counting bug: in certain circumstances the dm-crypt layer holds onto an I/O forever and never reports it as completed.)
Comment 5 Rafael J. Wysocki 2008-03-14 13:02:12 UTC
Is it a duplicate of Bug #10207?
Comment 6 Adrian Bunk 2008-03-14 13:08:26 UTC
Most likely we won't know for sure whether it's the same as bug #10207 until there's a fix for which Jean-Luc can verify whether or not it fixes the problem for him?
Comment 7 Milan Broz 2008-03-15 03:00:42 UTC
> Most likely we won't know for sure whether it's the same as bug #10207 until
> there's a fix for which Jean-Luc can verify whether or not it fixes the
> problem for him?

Please try patch in http://lkml.org/lkml/2008/3/14/347
Comment 8 Jean-Luc Coulon 2008-03-16 07:18:50 UTC
I've tested the patch (on 2.6.25-rc5-git4).
I've stressed the system a bit and haven't seen the problem again so far.

Comment 9 Milan Broz 2008-03-17 12:26:32 UTC
Latest patch for dm-crypt in http://lkml.org/lkml/2008/3/17/214
(the same patch mentioned in bug 10207)
Comment 10 Alasdair G Kergon 2008-03-24 05:48:40 UTC
Please test the patch in comment 9 - I think that one's ready to submit.
Comment 11 Rafael J. Wysocki 2008-03-27 15:37:36 UTC
Patch: http://lkml.org/lkml/2008/3/27/293
Comment 12 Rafael J. Wysocki 2008-03-27 15:38:34 UTC
*** Bug 10207 has been marked as a duplicate of this bug. ***
Comment 13 Adrian Bunk 2008-03-28 15:23:43 UTC
Fixed by commit 3f1e9070f63b0eecadfa059959bf7c9dbe835962
