Bug 9401

| Summary: | pvmove SATA->USB with dm-crypt breaks file system | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Marti Raudsepp (marti) |
| Component: | LVM2/DM | Assignee: | Alasdair G Kergon (agk) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | agk, alan, bugzilla.kernel.org, christophe, email.bug, gmazyland, jan, jim, mvanderkolff, tim.w.connors, zing |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.33 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | dm-crypt-bug.sh | | |
Description
Marti Raudsepp 2007-11-18 08:58:21 UTC
Created attachment 13598 [details]
dm-crypt-bug.sh
This is the script I wrote to trigger the bug.
By the way, the first 'lvcreate' line in the bug report should have contained $VOLGROUP instead of "primary".
bugme-daemon@bugzilla.kernel.org wrote:

> This is the script I wrote to trigger the bug.

Thanks for the report and the script; I finally reproduced this bug. I will try to find out what is happening there.

Milan

The problem is not in dm-crypt but is more generic: stacking block devices and block queue restrictions. Here are our simple stacked devices:

```
Physical volume (USB vs SATA)
 \_ LV volume (primary-punchbag)
     \_ crypt volume (crypt-punchbag)
```

The pvmove operation changes the underlying physical volume, which unfortunately has different hardware parameters (max_hw_sectors). The mapping table for the LV volume is correctly reloaded and its block queue parameters are properly set, but this does not happen for the crypt volume on top of it. So the crypt volume still sends bigger bios than the underlying device allows. Of course, this will not happen if we use the USB device (the one with the smallest max_sectors) in the first place.

We can take dm-crypt out of the picture entirely: instead of cryptsetup, use a simple linear mapping over the LV volume and you get the same error:

```shell
dmsetup create crypt-punchbag --table "0 `blockdev --getsize /dev/mapper/$VOLGROUP-punchbag` linear /dev/mapper/$VOLGROUP-punchbag 0"
```

Also, if the crypt device's mapping table is reloaded (to force the new restrictions to apply), it works correctly:

```shell
echo 3 > /proc/sys/vm/drop_caches
dmsetup suspend crypt-punchbag
dmsetup reload crypt-punchbag --table "`dmsetup table --showkeys crypt-punchbag`"
dmsetup resume crypt-punchbag
sha1sum /mnt/punchbag/junk
```

The problem is now fully understood, but the solution is not so simple: it can occur with arbitrary stacked block device mappings.

For the long term, one possible solution is to make block devices responsible for splitting requests (currently the upper device must not send too big a bio); some patches for this already exist. But perhaps some *workaround* can be found so that at least stacked device-mapper block devices work correctly in this situation.
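The limit mismatch described above comes down to simple arithmetic: the crypt layer keeps building bios against the old device's queue limit, while the new device advertises a smaller max_hw_sectors. A minimal sketch of that predicate (the 1024- and 240-sector limits are illustrative assumptions, not values from the report):

```shell
# Sketch of the stacked-limit mismatch after pvmove.
# Returns success (0) if a bio of $1 sectors exceeds the lower
# device's max_hw_sectors limit of $2 sectors.
bio_exceeds_limit() {
    [ "$1" -gt "$2" ]
}

SATA_LIMIT=1024   # hypothetical max_hw_sectors of the old SATA PV
USB_LIMIT=240     # hypothetical max_hw_sectors of the new USB PV
BIO=256           # bio size the crypt layer still builds (old limit)

bio_exceeds_limit "$BIO" "$SATA_LIMIT" || echo "ok on SATA"
bio_exceeds_limit "$BIO" "$USB_LIMIT" && echo "too big for USB: bio error"
```

Reloading the crypt table, as shown above, refreshes the limit the crypt layer builds against, which is why the workaround makes the error disappear.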
Milan -- mbroz@redhat.com

(In reply to comment #3)

> Problem is now fully understood but solution is not so simple.
> This problem can happen with arbitrary block devices stacked mapping.
>
> For long-term - one possible solution is that block devices should be
> responsible for splitting requests (currently upper device should not send too
> big bio). Some patches already exist for this.

Disclaimer: I probably don't know what I'm talking about. Has this been thoroughly discussed on the LKML? Can you point to the discussions? I'm not convinced that this should be the final solution in the long run, because in this case a 256-block request would be split into 240 and 16, and the second request is almost wasted. Though I realize that it could be viable for solving spurious race conditions when max_hw_sectors changes after a request has been submitted but not yet serviced.

*** Bug 13252 has been marked as a duplicate of this bug. ***

Has any progress been made on this? I see this error when resyncing an LVM-on-RAID setup to a USB disk:

Step 1: Plug in a USB disk.
Step 2: Give it a compatible partition table.
Step 3: Add the device to the LVM-on-RAID setup (mdadm --manage /dev/md1 --add /dev/sdc6).

I see some bio errors. This is on Ubuntu 10.04.3 using kernel 2.6.32-33.

Cheers, Michael

I get the impression that this will never be fixed. LVM was intended mostly for server use. Non-server requirements (such as proper USB disk support, boot-time reduction, etc.) aren't a priority for the LVM developers. Sad, but that's the way things are. Desktop distros, including Fedora, have already dropped LVM from their default install.

This problem also affects md raid1 devices. I do not see how to set up a desktop machine without RAID1/5/6 so that it works unattended for some time, given the unreliable drives we have today.

Yes, it does affect raid1.
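The 240 + 16 split Marti objects to above can be made concrete with a little arithmetic. A minimal sketch, using the numbers from his comment (the helper itself is hypothetical, not kernel code):

```shell
# Sketch: splitting one request at a lower device's limit.
# split_request SIZE LIMIT prints the resulting chunk sizes in sectors.
split_request() {
    size=$1 limit=$2
    while [ "$size" -gt "$limit" ]; do
        echo "$limit"
        size=$((size - limit))
    done
    [ "$size" -gt 0 ] && echo "$size"
}

# A 256-sector request against a 240-sector limit splits into 240 + 16,
# leaving a nearly wasted 16-sector tail request.
split_request 256 240
```

This illustrates the inefficiency concern: splitting in the lower layers keeps the stack correct, but a limit just below the request size produces a tiny, almost useless second request.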
In my case, my laptop has only 2 eSATA interfaces, so when I want to sync my RAID backup disk to alternate with offsite storage, I have to use the USB port.

There's hope yet: https://www.redhat.com/archives/dm-devel/2012-May/msg00200.html

Looks like the fix is called "immutable bio vecs": https://lkml.org/lkml/2012/10/15/398

Alasdair, you marked this bug as resolved. Could you point to the fix? I just ran into it with 3.14.15 after adding a USB disk to a RAID1 array + LVM, but 3.14 included Kent Overstreet's immutable biovecs patch. Is there something else that's needed?

I'm also getting this problem. Currently I'm using kernel 4.1.10 on NixOS. I have a single SSD in my notebook that is set up as raid1, with dm-crypt running on top of it. Rationale: attach an external USB 3.0 drive, expand the raid to 2 devices, make a hot sync, and detach again. Here's some dmesg output if that's useful: https://paste.simplylinux.ch/view/d0d6899f
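For anyone still hitting this on an affected kernel, the reload workaround from comment 3 can be wrapped in a small helper. This is a sketch, not an endorsed fix: the device name is a placeholder, running it for real requires root and `dmsetup`, and `DRY_RUN=1` only prints the commands instead of executing them.

```shell
# Sketch: force a stacked dm device to recompute its queue limits by
# reloading its own table (the workaround from comment 3).
# Usage: refresh_limits <dm-device-name>   (name below is a placeholder)
refresh_limits() {
    dev=$1
    if [ -n "$DRY_RUN" ]; then
        # Print what would be run instead of touching any device.
        echo "dmsetup suspend $dev"
        echo "dmsetup reload $dev --table \"\$(dmsetup table --showkeys $dev)\""
        echo "dmsetup resume $dev"
        return 0
    fi
    dmsetup suspend "$dev"
    dmsetup reload "$dev" --table "$(dmsetup table --showkeys "$dev")"
    dmsetup resume "$dev"
}

DRY_RUN=1 refresh_limits crypt-punchbag
```

Note this only refreshes the top layer after the underlying device has changed; the real fix landed later in the block layer (the immutable biovecs work linked above).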