Bug 9401

Summary: lvmove SATA->USB with dm-crypt breaks file system
Product: IO/Storage Reporter: Marti Raudsepp (marti)
Component: LVM2/DM Assignee: Alasdair G Kergon (agk)
Status: CLOSED CODE_FIX    
Severity: normal CC: agk, alan, bugzilla.kernel.org, christophe, email.bug, gmazyland, jan, jim, mvanderkolff, tim.w.connors, zing
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.33 Subsystem:
Regression: No Bisected commit-id:
Attachments: dm-crypt-bug.sh

Description Marti Raudsepp 2007-11-18 08:58:21 UTC
Most recent kernel where this bug did not occur:
  ??? not tested with earlier than 2.6.19

Distribution:
  Gentoo

Hardware Environment:
 * USB disk 120GB Western Digital, model 00UE-00KVT0 (according to udev),
   serial DEF10CD7F64C
 * SATA disk 200GB Seagate 7200.7, model ST3200822AS
 * Motherboard Asus A8N5X, nForce4 chipset

Software Environment:
 * Offending file system: reiserfs v3.6, mounted with noatime,barrier=flush
 * dm-crypt using aes-256 with cbc-essiv:sha256; using assembly-optimized AES
   on x86_64 (CONFIG_CRYPTO_AES_X86_64)
 * LVM utilities version: 2.02.17 (2006-12-14)
 * LVM library version: 1.02.12 (2006-10-13)
 * LVM driver version: 4.11.0
 * cryptsetup-luks 1.0.5 (user space interface to dm-crypt)


----
Problem Description:
----
I'm reporting this here because my bug reports half a year ago on the LKML didn't turn up any solutions. I'd like to stress here that this is a *FULLY REPRODUCIBLE* bug.

After pvmove'ing a dm-crypted LVM volume from a SATA disk to a USB disk,
reiserfs starts spewing I/O errors with "bio too big device dm-1 (256 > 240)"
in dmesg; fsck reports no corruptions, and problems only occur at certain
lengths in different files, such as 112k, 240k, 368k, and only with files that
existed before, and not newly written files.  When copying the partition back
to its original disk, everything works again.
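
To put the numbers in the error message in perspective, here is a minimal sketch that parses such a line and converts the sizes. It assumes the two numbers are counts of 512-byte sectors; the function names are made up for this illustration.

```shell
# Convert a count of 512-byte sectors to KiB (assumes 512-byte sectors).
sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Extract "device submitted limit" from "bio too big" lines on stdin.
parse_bio_too_big() {
    sed -n 's/.*bio too big device \([^ ]*\) (\([0-9]*\) > \([0-9]*\)).*/\1 \2 \3/p'
}

# Print one human-readable line per matching message.
explain_bio_too_big() {
    parse_bio_too_big | while read -r dev got limit; do
        echo "$dev: bio of $(sectors_to_kib "$got") KiB exceeds limit of $(sectors_to_kib "$limit") KiB"
    done
}

echo 'bio too big device dm-1 (256 > 240)' | explain_bio_too_big
```

On a live system the same function could be fed from dmesg, e.g. `dmesg | explain_bio_too_big`.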

The same issue applies to ext3 to a lesser extent: the error messages still appear in dmesg, but instead of breaking, ext3 is simply _insanely_ slow; much slower than the USB disk normally is. So this sounds like a workaround in ext3, rather than a bug in reiserfs.

None of these problems occur either when:
(a) dm-crypt is missing from the picture, or
(b) the file system was initialized on the USB disk in the first place.

The original bug reports on LKML contain much more detail:
 * http://article.gmane.org/gmane.linux.kernel/502020
 * http://article.gmane.org/gmane.linux.file-systems/13619
 * http://article.gmane.org/gmane.linux.kernel/508104


----
Steps to reproduce:
----
$VOLGROUP     - name of the LVM volume group containing both USB and SATA PVs
$SATA_PV_DISK - physical volume path of the non-USB disk, e.g. /dev/sda5
$USB_PV_DISK  - physical volume path of the USB disk, e.g. /dev/sdc1

mkdir /mnt/punchbag

lvcreate -n punchbag --extents=60 primary $SATA_PV_DISK
cryptsetup luksFormat /dev/mapper/$VOLGROUP-punchbag
cryptsetup luksOpen /dev/mapper/$VOLGROUP-punchbag crypt-punchbag
mkfs.reiserfs /dev/mapper/crypt-punchbag
mount /dev/mapper/crypt-punchbag /mnt/punchbag -o rw,noatime,barrier=flush

# write some stuff onto /mnt/punchbag
dd if=/dev/zero of=/mnt/punchbag/junk bs=1M count=10

# make sure that nothing is written onto the disk hereafter
mount /mnt/punchbag -o remount,ro 

pvmove -i2 -npunchbag $SATA_PV_DISK $USB_PV_DISK
sync

# drop caches: otherwise the newly-written file will already be cached
echo 3 > /proc/sys/vm/drop_caches

# witness the breakage
sha1sum /mnt/punchbag/*
Comment 1 Marti Raudsepp 2007-11-18 09:12:48 UTC
Created attachment 13598
dm-crypt-bug.sh

This is the script I wrote to trigger the bug.

By the way, the first 'lvcreate' line in the bug report should have contained $VOLGROUP instead of "primary"
Comment 2 Milan Broz 2007-11-20 12:31:27 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> This is the script I wrote to trigger the bug.

Thanks for the report and the script; I have finally reproduced this bug.

I will try to find what's happening there.

Milan
Comment 3 Milan Broz 2007-11-22 02:37:15 UTC
Well, the problem is not in dm-crypt; it is more generic: stacked block devices
and block queue restrictions.

Here are our simple stacked devices:

Physical volume (USB vs SATA)
	\_LV-volume (primary-punchbag)
		\_CRYPT volume (crypt-punchbag)

The pvmove operation changes the underlying physical volume; unfortunately, the
new one has different hw parameters (max_hw_sectors).

The mapping table for the LV volume is correctly reloaded and its block queue
parameters are properly set, but this does not happen for the CRYPT volume on top of it.

So the crypt volume still sends bigger bios than the underlying device allows.
Of course, this will not happen if we use the USB disk (the device with the
smallest max_sectors) in the first place.
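
One way to see the mismatch described above is to compare the queue limits that sysfs reports for each layer of the stack. The following is only a sketch: the device names in the usage line are examples, and SYSFS_ROOT is a hypothetical override that exists purely so the function can be tried without real devices.

```shell
# Print the request-size limits sysfs reports for each named block device.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
show_limits() {
    root=${SYSFS_ROOT:-/sys}
    for dev in "$@"; do
        q="$root/block/$dev/queue"
        if [ -r "$q/max_sectors_kb" ] && [ -r "$q/max_hw_sectors_kb" ]; then
            printf '%s max_sectors_kb=%s max_hw_sectors_kb=%s\n' \
                "$dev" "$(cat "$q/max_sectors_kb")" "$(cat "$q/max_hw_sectors_kb")"
        else
            printf '%s: no queue limits found\n' "$dev"
        fi
    done
}

# Example (device names are illustrative): show_limits sdc dm-0 dm-1
```

If the top-level device reports a larger limit than a device beneath it after the pvmove, that would be consistent with the stale limits described here.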

We can simply take dm-crypt out of the picture here: instead of cryptsetup,
use a simple linear mapping over the LV volume and you get the same error:

dmsetup create crypt-punchbag --table "0 `blockdev --getsize /dev/mapper/$VOLGROUP-punchbag` linear /dev/mapper/$VOLGROUP-punchbag 0"

Also, if the crypt device's mapping table is reloaded (to force the new
restrictions to apply), it works correctly:

echo 3 > /proc/sys/vm/drop_caches
dmsetup suspend crypt-punchbag
dmsetup reload crypt-punchbag --table "`dmsetup table --showkeys crypt-punchbag`"
dmsetup resume crypt-punchbag
sha1sum /mnt/punchbag/junk
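
The reload steps above could be wrapped in a small function, sketched here under a few assumptions. DMSETUP is a hypothetical hook for substituting a stub during a dry run; by default the real dmsetup is called, which needs root. Note that with --showkeys the key material passes through the dmsetup command line, just as in the commands above.

```shell
# Reload a device-mapper table in place so the device picks up the
# queue limits of whatever now sits underneath it.
# DMSETUP is a hypothetical override hook; it defaults to the real dmsetup.
reload_dm_table() {
    name=$1
    dmsetup_cmd=${DMSETUP:-dmsetup}

    # Capture the live table first (--showkeys is needed for crypt targets).
    table=$($dmsetup_cmd table --showkeys "$name") || return 1

    $dmsetup_cmd suspend "$name" || return 1
    if ! $dmsetup_cmd reload "$name" --table "$table"; then
        # Do not leave the device suspended if the reload fails.
        $dmsetup_cmd resume "$name"
        return 1
    fi
    $dmsetup_cmd resume "$name"
}

# Example: reload_dm_table crypt-punchbag
```

Unlike the raw commands, this resumes the device even when the reload step fails, so a typo does not leave the mapping suspended.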

The problem is now fully understood, but the solution is not so simple.
This problem can happen with any stacked block device mapping.

In the long term, one possible solution is to make block devices responsible
for splitting requests (currently the upper device must not send a bio that is too big).
Some patches already exist for this.

But maybe some *workaround* can be used so that at least stacked device-mapper
block devices work correctly in this situation.

Milan
--
mbroz@redhat.com
Comment 4 Marti Raudsepp 2007-11-24 07:16:56 UTC
(In reply to comment #3)
> Problem is now fully understood but solution is not so simple.
> This problem can happen with arbitrary block devices stacked mapping.
>
> For long-term - one possible solution is that block devices should be
> responsible for splitting requests (currently upper device should not send
> too big bio). Some patches already exist for this.

Disclaimer: I probably don't know what I'm talking about.

Has this been thoroughly discussed on the LKML? Can you point to the discussions? I'm not convinced that this should be the final solution in the long run, because in this case a 256-sector request would be split into 240 and 16 -- the second request is almost wasted.

Though I realize that it could be viable for solving spurious race conditions when max_hw_sectors changes after a request had been submitted but not yet serviced.
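
The 240 + 16 arithmetic above can be sketched as a naive splitter. This only illustrates the concern raised here; it is not how the kernel splits requests, and the function name is invented for the example.

```shell
# Naively split a request of N sectors into chunks no larger than LIMIT,
# printing one chunk size per line.
split_request() {
    remaining=$1
    limit=$2
    # Guard against a zero or negative limit, which would never terminate.
    [ "$limit" -gt 0 ] || return 1
    while [ "$remaining" -gt 0 ]; do
        if [ "$remaining" -gt "$limit" ]; then
            chunk=$limit
        else
            chunk=$remaining
        fi
        printf '%s\n' "$chunk"
        remaining=$((remaining - chunk))
    done
}

split_request 256 240
```

With these inputs the second chunk is only 16 sectors, which is the small trailing request the comment worries about.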
Comment 5 Alasdair G Kergon 2009-10-18 18:42:33 UTC
*** Bug 13252 has been marked as a duplicate of this bug. ***
Comment 6 Michael van der Kolff 2011-08-05 09:37:12 UTC
Has any progress been made on this?  I see this error when resyncing an LVM-on-RAID setup to a USB disk.

Step 1:
  Plug in a USB disk
Step 2:
  Give it a compatible partition table
Step 3:
  Add device to LVM-on-RAID setup (mdadm --manage /dev/md1 --add /dev/sdc6)

I see some bio errors.  This is on Ubuntu 10.04.3 using kernel 2.6.32-33.

Cheers,

Michael
Comment 7 Marti Raudsepp 2011-08-05 11:03:18 UTC
I get the impression that this will never be fixed. LVM was intended for mostly server uses. Non-server requirements (such as proper USB disk support, boot time reduction, etc) aren't a priority to LVM developers. Sad, but that's the way things are.

Desktop distros, including Fedora, have already dropped LVM from their default install.
Comment 8 Jan Kratochvil 2011-08-05 12:54:37 UTC
This problem also affects md raid1 devices.

I do not see how to set up a desktop machine without any RAID1/5/6 so that it works for the user unattended for some time with the unreliable drives we have today.
Comment 9 Tim Connors 2011-08-05 13:00:04 UTC
Yes, it does affect raid1.  In my case, my laptop only has 2 esata interfaces, so when I want to sync my raid backup disk to alternate with offsite storage, I have to make use of the USB output.
Comment 10 Alasdair G Kergon 2012-05-19 01:04:55 UTC
There's hope yet...

https://www.redhat.com/archives/dm-devel/2012-May/msg00200.html
Comment 11 bug email 2013-05-03 16:51:16 UTC
Looks like the fix is called "immutable bio vecs":
https://lkml.org/lkml/2012/10/15/398
Comment 12 Jim Paris 2015-02-23 06:01:14 UTC
Alasdair, you marked this bug as resolved.  Could you point to the fix?  I just ran into it with 3.14.15 after adding a USB disk to a RAID1 array + LVM, but 3.14 included Kent Overstreet's immutable biovecs patch.  Is there something else that's needed?
Comment 13 SJ 2015-10-18 16:32:17 UTC
I'm also getting this problem.

Currently I'm using kernel 4.1.10 on NixOS. I have a single SSD in my notebook that's set up as raid1. On top of that I have dm-crypt running.

Rationale: Attach external usb 3.0 drive, expand raid to 2 devices, make a hot sync, and detach again.

Here's some dmesg output, if that's of any use: https://paste.simplylinux.ch/view/d0d6899f