Bug 16502 - "task cp:4126 blocked for more than 120 seconds" when copying files FAT->FAT; possibly dm-crypt related
Summary: "task cp:4126 blocked for more than 120 seconds" when copying files FAT->FAT;...
Status: RESOLVED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-03 19:32 UTC by Sami Liedes
Modified: 2012-08-09 15:26 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.35
Subsystem:
Regression: No
Bisected commit-id:


Attachments
A script to make file system writes to block after a while. (1.20 KB, application/octet-stream)
2010-08-17 14:04 UTC, Jussi Judin
Details

Description Sami Liedes 2010-08-03 19:32:44 UTC
Hi,

I'm copying a directory structure from a vfat partition to another. Both vfat partitions are in (different) disk images, mounted through the loopback device. The image files are on an ext4 partition encrypted with dm-crypt. I suspect device mapper, especially dm-crypt, *may* have something to do with this, as I have encountered similar problems with it before. The disk works fine.

------------------------------------------------------------
[10323.101237] INFO: task cp:4126 blocked for more than 120 seconds.
[10323.101242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10323.101247] cp            D 0000000000000000     0  4126   3017 0x00000000
[10323.101255]  ffff880060b55658 0000000000000046 ffff88013d17a800 ffff880060b55fd8
[10323.101265]  ffff88013e99bf00 ffff880060b55fd8 0000000000013600 0000000000004000
[10323.101273]  0000000000013600 0000000000013600 ffff88013fdb9f80 ffff88013e99bf00
[10323.101282] Call Trace:
[10323.101293]  [<ffffffff81064b6e>] ? ktime_get_ts+0xb2/0xbf
[10323.101301]  [<ffffffff811160a4>] ? sync_buffer+0x0/0x47
[10323.101308]  [<ffffffff814572b6>] io_schedule+0x73/0xb5
[10323.101314]  [<ffffffff811160e7>] sync_buffer+0x43/0x47
[10323.101321]  [<ffffffff81457a28>] __wait_on_bit+0x48/0x7b
[10323.101328]  [<ffffffff81457ac9>] out_of_line_wait_on_bit+0x6e/0x79
[10323.101333]  [<ffffffff811160a4>] ? sync_buffer+0x0/0x47
[10323.101340]  [<ffffffff8105c25d>] ? wake_bit_function+0x0/0x33
[10323.101346]  [<ffffffff81116024>] __wait_on_buffer+0x24/0x26
[10323.101352]  [<ffffffff811185b2>] __bread+0xa4/0xba
[10323.101361]  [<ffffffffa01817d6>] fat_ent_bread+0x6f/0xb1 [fat]
[10323.101370]  [<ffffffffa0180e7c>] fat_alloc_clusters+0x202/0x473 [fat]
[10323.101378]  [<ffffffff8106cbf7>] ? trace_hardirqs_on+0xd/0xf
[10323.101387]  [<ffffffffa0183d1d>] fat_get_block+0x107/0x1de [fat]
[10323.101393]  [<ffffffff81116d9c>] __block_prepare_write+0x1bf/0x3a8
[10323.101400]  [<ffffffff810c3a21>] ? ____pagevec_lru_add+0x145/0x15b
[10323.101409]  [<ffffffffa0183c16>] ? fat_get_block+0x0/0x1de [fat]
[10323.101416]  [<ffffffff810bb841>] ? grab_cache_page_write_begin+0x83/0xaf
[10323.101422]  [<ffffffff8111712a>] block_write_begin_newtrunc+0x82/0xbc
[10323.101428]  [<ffffffff8111743a>] cont_write_begin_newtrunc+0x2d6/0x2f7
[10323.101437]  [<ffffffffa0183c16>] ? fat_get_block+0x0/0x1de [fat]
[10323.101443]  [<ffffffff81115c14>] ? generic_write_end+0x67/0x79
[10323.101451]  [<ffffffffa0183f31>] fat_write_begin+0x44/0x68 [fat]
[10323.101459]  [<ffffffffa0183c16>] ? fat_get_block+0x0/0x1de [fat]
[10323.101466]  [<ffffffff810ba40c>] ? iov_iter_copy_from_user_atomic+0x81/0xf1
[10323.101472]  [<ffffffff810ba69d>] generic_file_buffered_write+0x106/0x267
[10323.101479]  [<ffffffff810467db>] ? current_fs_time+0x27/0x2e
[10323.101485]  [<ffffffff810bc5aa>] __generic_file_aio_write+0x350/0x385
[10323.101492]  [<ffffffff810bc62b>] ? generic_file_aio_write+0x4c/0xad
[10323.101498]  [<ffffffff810bc63e>] generic_file_aio_write+0x5f/0xad
[10323.101505]  [<ffffffff810f1c20>] do_sync_write+0xcb/0x108
[10323.101511]  [<ffffffff810f2639>] vfs_write+0xb2/0x177
[10323.101517]  [<ffffffff810f27c1>] sys_write+0x4a/0x71
[10323.101524]  [<ffffffff81002b1b>] system_call_fastpath+0x16/0x1b
[10323.101528] INFO: lockdep is turned off.
------------------------------------------------------------

I got the same message 6 times while copying the roughly 2.6 GiB of data. Usually the copy process seemed to have paused in the middle of a big (tens of megabytes) file. Eventually it always continued, until suddenly pausing again. I think all 6 messages are from different pauses.

This is how my system is setup:

------------------------------------------------------------
# losetup -a
/dev/loop0: [fd02]:4195879 (/home/sliedes/scratch/freedos-fullcd.iso)
/dev/loop1: [fd02]:4194311 (/home/sliedes/scratch/sda.freedos.img), offset 32256
/dev/loop2: [fd02]:4195170 (/home/sliedes/scratch/sda.new.img), offset 32256
------------------------------------------------------------

The files are being copied from loop2 to loop1, with cp -R.

------------------------------------------------------------
# mount
/dev/mapper/rootvg-root on / type ext4 (rw,noatime,user_xattr,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/sda2 on /boot type ext4 (rw,noatime,user_xattr)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/dev/loop0 on /media/cdrom0 type iso9660 (rw)
/dev/loop1 on /media/usbstick type vfat (rw,offset=32256)
/dev/loop2 on /media/cdrom0 type vfat (rw,offset=32256)
------------------------------------------------------------

Here I have forgotten to unmount loop0 before mounting loop2 on the same mount point. That may be significant (but I doubt it is). I'm just reusing /media/usbstick and /media/cdrom0 here as mount points, the mounted filesystems are images.

------------------------------------------------------------
# dmsetup table
sda3_crypt: 0 603094109 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:3 2056
rootvg-swap: 0 7806976 linear 253:0 384
rootvg-root: 0 292962304 linear 253:0 7807360

# vgdisplay 
  --- Volume group ---
  VG Name               rootvg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               287.57 GiB
  PE Size               4.00 MiB
  Total PE              73619
  Alloc PE / Size       36715 / 143.42 GiB
  Free  PE / Size       36904 / 144.16 GiB
  VG UUID               sb9cn1-t3eP-Sg9g-OX2d-R6f0-w76S-7yWwX6

# lvdisplay 
  --- Logical volume ---
  LV Name                /dev/rootvg/swap
  VG Name                rootvg
  LV UUID                gHA7rN-g5eY-mf2Y-cPed-3W7f-Qxs1-achVMv
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3.72 GiB
  Current LE             953
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
   
  --- Logical volume ---
  LV Name                /dev/rootvg/root
  VG Name                rootvg
  LV UUID                TWtcld-fZMX-7VIP-RKq6-iE92-FIVB-vE7DoS
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                139.70 GiB
  Current LE             35762
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/dm-0
  VG Name               rootvg
  PV Size               287.58 GiB / not usable 3.55 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              73619
  Free PE               36904
  Allocated PE          36715
  PV UUID               dlyua1-1NHL-Whcx-h8wm-kM9U-Rfmv-WTOoUi
------------------------------------------------------------
Comment 1 Jussi Judin 2010-08-17 14:04:59 UTC
Created attachment 27486 [details]
A script to make file system writes to block after a while.

This seems the bug that has plagued me for quite some time and now I got into compiling vanilla 2.6.35.2 AMD64 kernel and made a test case that induces this bug quite reliably. Basically it makes a crypted file system that has 2 files that are also crypted file systems through loopback device and device mapper and then reads one and writes another and after a while, if everything goes wrong, the "task x was blocked for more than 120 seconds" message with some backtraces appears in dmesg. This is also independent of used file systems but it seems that this needs an encrypted file system through loopback device on top of encrypted file system to crash somewhat reliably.

Attached a script to bring this bug out. Just give a block device that can be overwritten (like an empty USB stick) as the only parameter to this script and wait for a while (and maybe do some other disk activity too).

Note You need to log in before you can comment on or make changes to this bug.