Hi, I'm copying a directory structure from a vfat partition to another. Both vfat partitions are in (different) disk images, mounted through the loopback device. The image files are on an ext4 partition encrypted with dm-crypt. I suspect device mapper, especially dm-crypt, *may* have something to do with this, as I have encountered similar problems with it before. The disk works fine. ------------------------------------------------------------ [10323.101237] INFO: task cp:4126 blocked for more than 120 seconds. [10323.101242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [10323.101247] cp D 0000000000000000 0 4126 3017 0x00000000 [10323.101255] ffff880060b55658 0000000000000046 ffff88013d17a800 ffff880060b55fd8 [10323.101265] ffff88013e99bf00 ffff880060b55fd8 0000000000013600 0000000000004000 [10323.101273] 0000000000013600 0000000000013600 ffff88013fdb9f80 ffff88013e99bf00 [10323.101282] Call Trace: [10323.101293] [<ffffffff81064b6e>] ? ktime_get_ts+0xb2/0xbf [10323.101301] [<ffffffff811160a4>] ? sync_buffer+0x0/0x47 [10323.101308] [<ffffffff814572b6>] io_schedule+0x73/0xb5 [10323.101314] [<ffffffff811160e7>] sync_buffer+0x43/0x47 [10323.101321] [<ffffffff81457a28>] __wait_on_bit+0x48/0x7b [10323.101328] [<ffffffff81457ac9>] out_of_line_wait_on_bit+0x6e/0x79 [10323.101333] [<ffffffff811160a4>] ? sync_buffer+0x0/0x47 [10323.101340] [<ffffffff8105c25d>] ? wake_bit_function+0x0/0x33 [10323.101346] [<ffffffff81116024>] __wait_on_buffer+0x24/0x26 [10323.101352] [<ffffffff811185b2>] __bread+0xa4/0xba [10323.101361] [<ffffffffa01817d6>] fat_ent_bread+0x6f/0xb1 [fat] [10323.101370] [<ffffffffa0180e7c>] fat_alloc_clusters+0x202/0x473 [fat] [10323.101378] [<ffffffff8106cbf7>] ? trace_hardirqs_on+0xd/0xf [10323.101387] [<ffffffffa0183d1d>] fat_get_block+0x107/0x1de [fat] [10323.101393] [<ffffffff81116d9c>] __block_prepare_write+0x1bf/0x3a8 [10323.101400] [<ffffffff810c3a21>] ? ____pagevec_lru_add+0x145/0x15b [10323.101409] [<ffffffffa0183c16>] ? fat_get_block+0x0/0x1de [fat] [10323.101416] [<ffffffff810bb841>] ? grab_cache_page_write_begin+0x83/0xaf [10323.101422] [<ffffffff8111712a>] block_write_begin_newtrunc+0x82/0xbc [10323.101428] [<ffffffff8111743a>] cont_write_begin_newtrunc+0x2d6/0x2f7 [10323.101437] [<ffffffffa0183c16>] ? fat_get_block+0x0/0x1de [fat] [10323.101443] [<ffffffff81115c14>] ? generic_write_end+0x67/0x79 [10323.101451] [<ffffffffa0183f31>] fat_write_begin+0x44/0x68 [fat] [10323.101459] [<ffffffffa0183c16>] ? fat_get_block+0x0/0x1de [fat] [10323.101466] [<ffffffff810ba40c>] ? iov_iter_copy_from_user_atomic+0x81/0xf1 [10323.101472] [<ffffffff810ba69d>] generic_file_buffered_write+0x106/0x267 [10323.101479] [<ffffffff810467db>] ? current_fs_time+0x27/0x2e [10323.101485] [<ffffffff810bc5aa>] __generic_file_aio_write+0x350/0x385 [10323.101492] [<ffffffff810bc62b>] ? generic_file_aio_write+0x4c/0xad [10323.101498] [<ffffffff810bc63e>] generic_file_aio_write+0x5f/0xad [10323.101505] [<ffffffff810f1c20>] do_sync_write+0xcb/0x108 [10323.101511] [<ffffffff810f2639>] vfs_write+0xb2/0x177 [10323.101517] [<ffffffff810f27c1>] sys_write+0x4a/0x71 [10323.101524] [<ffffffff81002b1b>] system_call_fastpath+0x16/0x1b [10323.101528] INFO: lockdep is turned off. ------------------------------------------------------------ I got the same message 6 times while copying the roughly 2.6 GiB of data. Usually the copy process seemed to have paused in the middle of a big (tens of megabytes) file. Eventually it always continued, until suddenly pausing again. I think all 6 messages are from different pauses. This is how my system is setup: ------------------------------------------------------------ # losetup -a /dev/loop0: [fd02]:4195879 (/home/sliedes/scratch/freedos-fullcd.iso) /dev/loop1: [fd02]:4194311 (/home/sliedes/scratch/sda.freedos.img), offset 32256 /dev/loop2: [fd02]:4195170 (/home/sliedes/scratch/sda.new.img), offset 32256 ------------------------------------------------------------ The files are being copied from loop2 to loop1, with cp -R. ------------------------------------------------------------ # mount /dev/mapper/rootvg-root on / type ext4 (rw,noatime,user_xattr,errors=remount-ro) tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755) proc on /proc type proc (rw,noexec,nosuid,nodev) sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) udev on /dev type tmpfs (rw,mode=0755) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev) devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620) /dev/sda2 on /boot type ext4 (rw,noatime,user_xattr) fusectl on /sys/fs/fuse/connections type fusectl (rw) /dev/loop0 on /media/cdrom0 type iso9660 (rw) /dev/loop1 on /media/usbstick type vfat (rw,offset=32256) /dev/loop2 on /media/cdrom0 type vfat (rw,offset=32256) ------------------------------------------------------------ Here I have forgotten to unmount loop0 before mounting loop2 on the same mount point. That may be significant (but I doubt it is). I'm just reusing /media/usbstick and /media/cdrom0 here as mount points, the mounted filesystems are images. ------------------------------------------------------------ # dmsetup table sda3_crypt: 0 603094109 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:3 2056 rootvg-swap: 0 7806976 linear 253:0 384 rootvg-root: 0 292962304 linear 253:0 7807360 # vgdisplay --- Volume group --- VG Name rootvg System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 3 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 2 Max PV 0 Cur PV 1 Act PV 1 VG Size 287.57 GiB PE Size 4.00 MiB Total PE 73619 Alloc PE / Size 36715 / 143.42 GiB Free PE / Size 36904 / 144.16 GiB VG UUID sb9cn1-t3eP-Sg9g-OX2d-R6f0-w76S-7yWwX6 # lvdisplay --- Logical volume --- LV Name /dev/rootvg/swap VG Name rootvg LV UUID gHA7rN-g5eY-mf2Y-cPed-3W7f-Qxs1-achVMv LV Write Access read/write LV Status available # open 1 LV Size 3.72 GiB Current LE 953 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:1 --- Logical volume --- LV Name /dev/rootvg/root VG Name rootvg LV UUID TWtcld-fZMX-7VIP-RKq6-iE92-FIVB-vE7DoS LV Write Access read/write LV Status available # open 1 LV Size 139.70 GiB Current LE 35762 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:2 # pvdisplay --- Physical volume --- PV Name /dev/dm-0 VG Name rootvg PV Size 287.58 GiB / not usable 3.55 MiB Allocatable yes PE Size 4.00 MiB Total PE 73619 Free PE 36904 Allocated PE 36715 PV UUID dlyua1-1NHL-Whcx-h8wm-kM9U-Rfmv-WTOoUi ------------------------------------------------------------
Created attachment 27486 [details] A script to make file system writes to block after a while. This seems the bug that has plagued me for quite some time and now I got into compiling vanilla 2.6.35.2 AMD64 kernel and made a test case that induces this bug quite reliably. Basically it makes a crypted file system that has 2 files that are also crypted file systems through loopback device and device mapper and then reads one and writes another and after a while, if everything goes wrong, the "task x was blocked for more than 120 seconds" message with some backtraces appears in dmesg. This is also independent of used file systems but it seems that this needs an encrypted file system through loopback device on top of encrypted file system to crash somewhat reliably. Attached a script to bring this bug out. Just give a block device that can be overwritten (like an empty USB stick) as the only parameter to this script and wait for a while (and maybe do some other disk activity too).