Latest working kernel version: N/A
Earliest failing kernel version: 2.6.25-rc6
Distribution: Debian/Suse
Hardware Environment: Dell XPS M1210 Laptop
  CPU: Dual Core, 2.0 GHz
  RAM: 2 GB
  HDD: SATA 60 GB + USB 120 GB Self-Powered
Software Environment: DM + DM-Crypt + LVM2
Problem Description:

> On my laptop, doing heavy C++ compilations in parallel with -j3 (this
> is a dual core) often generates the following trace:
>
> INFO: task g++:25119 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> g++           D c03916c0     0 25119  25118
>        ecfef200 00200086 c038ef18 c03916c0 c03916c0 f77b2230 f77b2378 c17fa6c0
>        00000000 f89817bc f7486280 000000ff f7a1eb20 00000000 00000000 00000000
>        c17fa6c0 00000000 d1401e9c c17f0408 c02a2aa1 d1401e94 c0147100 c02a2c93
> Call Trace:
>  [<f89817bc>] dm_table_unplug_all+0x1e/0x2e [dm_mod]
>  [<c02a2aa1>] io_schedule+0x1b/0x24
>  [<c0147100>] sync_page+0x33/0x36
>  [<c02a2c93>] __wait_on_bit+0x33/0x58
>  [<c01470cd>] sync_page+0x0/0x36
>  [<c01472fd>] wait_on_page_bit+0x59/0x60
>  [<c012cf7b>] wake_bit_function+0x0/0x3c
>  [<c014e20d>] truncate_inode_pages_range+0x238/0x29f
>  [<c014e27d>] truncate_inode_pages+0x9/0xc
>  [<f8c67187>] ext2_delete_inode+0x12/0x6e [ext2]
>  [<f8c67175>] ext2_delete_inode+0x0/0x6e [ext2]
>  [<c0170ea5>] generic_delete_inode+0x8f/0xf3
>  [<c0170819>] iput+0x60/0x62
>  [<c0169aa5>] do_unlinkat+0xb7/0xf9
>  [<c0113a3e>] do_page_fault+0x1fa/0x4dc
>  [<c0104822>] sysenter_past_esp+0x5f/0x85
>  =======================
>
> This is with 2.6.25-rc6 (SMP) and has been present, as far as I can
> remember, since the beginning of the 2.6.25-rc series. It is not
> always reproducible, but the trace is always the same.
>
> My filesystem is stored on an ext3 (rw,noatime) dm_crypt'd partition
> living in an LVM volume.
>
> % lsmod | grep dm | grep -v ' 0 *$'
> dm_crypt               14340  1
> crypto_blkcipher       18308  6 ecb,cbc,dm_crypt
> dm_mod                 53008  26 dm_crypt,dm_mirror,dm_snapshot
>
> Here is the code I have in dm-table.o:
>
> 00001042 <dm_table_unplug_all>:
>     1042:  56                    push   %esi
>     1043:  53                    push   %ebx
>     1044:  8b 98 a0 00 00 00     mov    0xa0(%eax),%ebx
>     104a:  8d b0 a0 00 00 00     lea    0xa0(%eax),%esi
>     1050:  eb 10                 jmp    1062 <dm_table_unplug_all+0x20>
>     1052:  8b 43 10              mov    0x10(%ebx),%eax
>     1055:  8b 40 5c              mov    0x5c(%eax),%eax
>     1058:  8b 40 34              mov    0x34(%eax),%eax
>     105b:  e8 fc ff ff ff        call   105c <dm_table_unplug_all+0x1a>
>     1060:  8b 1b                 mov    (%ebx),%ebx
>     1062:  8b 03                 mov    (%ebx),%eax
>     1064:  0f 1f 40 00           nopl   0x0(%eax)
>     1068:  39 f3                 cmp    %esi,%ebx
>     106a:  75 e6                 jne    1052 <dm_table_unplug_all+0x10>
>     106c:  5b                    pop    %ebx
>     106d:  5e                    pop    %esi
>     106e:  c3                    ret
>
> The symbol called at 105b is, after relocation, blk_unplug.
>
> Is there anything else I can do to help debugging this?
>
>   Sam

Steps to reproduce:

The bug does not reproduce consistently. When I last reported it, I was
not able to reproduce it on a different setup (an IBM xSeries server).
One point to note is that both of us who have seen it reproduced it on
laptops.

I'm filing this Bugzilla report because two users are now affected and
the reports are identical.

This was discussed earlier in this thread:
https://www.redhat.com/archives/dm-devel/2008-March/msg00014.html
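For anyone reading the quoted disassembly: it appears to be a plain walk over a circular linked list anchored at offset 0xa0 of the table argument, reaching a queue through three pointer loads per entry and calling blk_unplug on it. The return address after the call at 105b is 0x1060, which is dm_table_unplug_all+0x1e, the same offset that shows up in the call trace. The following is only a minimal userspace C model of that loop, written from the disassembly. Every structure and function name in it (table, table_dev, block_dev, disk, queue, unplug, table_unplug_all) is a hypothetical stand-in rather than the kernel's definition, and the field layout is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node, modeled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-ins for the kernel objects; layout is illustrative only. */
struct queue     { int unplug_count; };
struct disk      { struct queue *queue; };
struct block_dev { struct disk *disk; };
struct table_dev {
    struct list_head list;      /* links this device into the table */
    struct block_dev *bdev;
};
struct table     { struct list_head devices; };

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Stand-in for blk_unplug: only records that it was called. */
static void unplug(struct queue *q)
{
    q->unplug_count++;
}

/*
 * Walk every device in the table and unplug its queue.
 * Returns the number of devices visited.
 */
int table_unplug_all(struct table *t)
{
    int visited = 0;

    assert(t != NULL);
    for (struct list_head *pos = t->devices.next;
         pos != &t->devices;            /* back at the head: list exhausted */
         pos = pos->next) {
        /* Recover the containing entry from its embedded list node. */
        struct table_dev *dd = (struct table_dev *)
            ((char *)pos - offsetof(struct table_dev, list));

        unplug(dd->bdev->disk->queue);  /* three loads, then the call */
        visited++;
    }
    return visited;
}
```

The loop itself never sleeps in this model; the blocked task in the trace is waiting in io_schedule, so the dm_table_unplug_all entry may simply be the last thing the task did before it started waiting.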
There's a commit between rc6 and rc8 which I believe addresses this bug.

Neil, since you run RC kernels, would it be possible for you to test rc8?

commit 3f1e9070f63b0eecadfa059959bf7c9dbe835962
Author: Milan Broz <mbroz@redhat.com>
Date:   Fri Mar 28 14:16:07 2008 -0700

    dm crypt: fix ctx pending

    Fix regression in dm-crypt introduced in commit
    3a7f6c990ad04e6f576a159876c602d14d6f7fef ("dm crypt: use async crypto").

    If write requests need to be split into pieces, the code must not
    process them in parallel because the crypto context cannot be shared.
    So there can be parallel crypto operations on one part of the write,
    but only one write bio can be processed at a time.

    This is not optimal and the workqueue code needs to be optimized for
    parallel processing, but for now it solves the problem without
    affecting the performance of synchronous crypto operation (most of
    current dm-crypt users).

    http://bugzilla.kernel.org/show_bug.cgi?id=10242
    http://bugzilla.kernel.org/show_bug.cgi?id=10207

    Signed-off-by: Milan Broz <mbroz@redhat.com>
    Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
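To illustrate the constraint the commit message describes, here is a toy C model, not the dm-crypt code. It assumes a single per-write context with a pending counter and shows a split write being handled strictly one piece at a time: a new piece may claim the context only once every operation of the previous piece has completed. The names convert_ctx, ctx_begin, ctx_complete_one and write_in_pieces are invented for this sketch, and the asynchronous completions are simulated inline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-write context; not the real dm-crypt structure. */
struct convert_ctx {
    size_t offset;   /* start of the piece currently being processed */
    size_t length;   /* size of that piece */
    int pending;     /* operations still in flight for that piece */
};

/* Claim the context for a new piece. Fails with -1 if it is still in use. */
static int ctx_begin(struct convert_ctx *ctx, size_t offset, size_t length,
                     int ops)
{
    if (ctx->pending != 0)
        return -1;
    ctx->offset = offset;
    ctx->length = length;
    ctx->pending = ops;
    return 0;
}

/* One operation finished. Returns 1 when the whole piece is done. */
static int ctx_complete_one(struct convert_ctx *ctx)
{
    assert(ctx->pending > 0);
    return --ctx->pending == 0;
}

/*
 * Process a write of `total` bytes in pieces of at most `piece` bytes,
 * one piece at a time. Returns the number of pieces, or -1 if the
 * context was unexpectedly busy.
 */
int write_in_pieces(struct convert_ctx *ctx, size_t total, size_t piece,
                    int ops_per_piece)
{
    int pieces = 0;

    assert(piece > 0 && ops_per_piece > 0);
    for (size_t off = 0; off < total; off += piece) {
        size_t len = total - off < piece ? total - off : piece;

        if (ctx_begin(ctx, off, len, ops_per_piece) != 0)
            return -1;
        /* Simulated completions: drain this piece before the next one. */
        while (!ctx_complete_one(ctx))
            ;
        pieces++;
    }
    return pieces;
}
```

In this model, starting a second piece while the first still has operations pending is rejected by ctx_begin, which is the situation the commit message says must not happen with a shared context.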
Neil, did you get a chance to verify it with the latest rc?
Sorry, but I've got no idea what comment #2 is asking. I'm not aware that I was going to verify anything.... Confused.
You posted the same behavior on dm-devel some weeks back. I have seen this behavior too, which is why I opened this Bugzilla report. Since you mentioned that you use rc kernels, I wanted to know whether the behavior was still present after the above-mentioned patch was applied. I asked for your help in comment #1.
I think I know what caused the confusion. Samuel Tardieu has this problem and sent an email to the linux-raid mailing list. I forwarded it to dm-devel because that was a more appropriate mailing list. You misunderstood that email and thought I was experiencing the problem. But I wasn't. I've never used dm-crypt. You need to talk to Samuel.
Assuming this is fixed. If not, please reopen with an updated trace from a recent kernel.