Setup: a USB stick with a LUKS+ext4 partition. It is supposed to be inserted and removed without manual unmounting beforehand, as the important data on it is read-only most of the time. It is mounted and unmounted by udev rules:

# cat /etc/udev/rules.d/99-sensitive_mount.rules
ACTION=="add", ENV{.ID_FS_TYPE_NEW}=="crypto_LUKS", ENV{ID_SERIAL}=="JetFlash_Transcend_8GB_082AA696KZPJGI1C-0:0", RUN+="/usr/local/sbin/sensitive_mount.sh"
ACTION=="remove", ENV{.ID_FS_TYPE_NEW}=="crypto_LUKS", ENV{ID_SERIAL}=="JetFlash_Transcend_8GB_082AA696KZPJGI1C-0:0", RUN+="/usr/local/sbin/sensitive_umount.sh"

# cat /usr/local/sbin/sensitive_mount.sh
#!/bin/bash
/usr/local/sbin/sensitive_umount.sh
cryptsetup luksOpen --key-file /etc/sensitive.key /dev/disk/by-uuid/840dfa3f-e449-4124-8ebb-dea5785214e8 sensitive && mount -o noatime,sync /dev/mapper/sensitive /j/sensitive

# cat /usr/local/sbin/sensitive_umount.sh
#!/bin/bash
umount -l /j/sensitive
cryptsetup close sensitive

It worked fine until I added GPG files to it. When gpg-agent is running, it holds this UNIX socket open:

# lsof | grep gnupg
gpg-agent 3744 j 3u unix 0xffff88014453fa80 0t0 13557 /j/.gnupg/S.gpg-agent type=STREAM

As a result, "cryptsetup close" keeps failing indefinitely, saying that the device is busy, and there is no way out.
dmesg output for voluntary removal of the USB stick when gpg-agent is not running:

$ cat luks_umount_fail.dmesg
[ 3671.056097] usb 2-1.4: USB disconnect, device number 3
[ 3671.080108] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 3671.080122] Buffer I/O error on dev dm-0, logical block 1, lost async page write
[ 3671.080128] Buffer I/O error on dev dm-0, logical block 465, lost async page write
[ 3671.080133] Buffer I/O error on dev dm-0, logical block 481, lost async page write
[ 3671.080137] Buffer I/O error on dev dm-0, logical block 483, lost async page write
[ 3671.080143] Buffer I/O error on dev dm-0, logical block 8679, lost async page write
[ 3671.080492] Buffer I/O error on dev dm-0, logical block 557056, lost sync page write
[ 3671.080505] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[ 3671.080509] Aborting journal on device dm-0-8.
[ 3671.080578] Buffer I/O error on dev dm-0, logical block 557056, lost sync page write
[ 3671.080586] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[ 3671.080610] EXT4-fs (dm-0): previous I/O error to superblock detected
[ 3671.080898] Buffer I/O error on dev dm-0, logical block 0, lost sync page write
[ 3671.081308] EXT4-fs error (device dm-0): ext4_put_super:800: Couldn't clean up the journal
[ 3671.081313] EXT4-fs (dm-0): Remounting filesystem read-only
[ 3671.081316] EXT4-fs (dm-0): previous I/O error to superblock detected
[ 3671.081380] Buffer I/O error on dev dm-0, logical block 0, lost sync page write

Attaching it back:

[ 3707.263232] usb 2-1.4: new high-speed USB device number 4 using ehci-pci
[ 3707.350263] usb 2-1.4: New USB device found, idVendor=8564, idProduct=1000
[ 3707.350270] usb 2-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3707.350273] usb 2-1.4: Product: Mass Storage Device
[ 3707.350276] usb 2-1.4: Manufacturer: JetFlash
[ 3707.350279] usb 2-1.4: SerialNumber: 082AA696KZPJGI1C
[ 3707.350657] usb-storage 2-1.4:1.0: USB Mass Storage device detected
[ 3707.351751] scsi host5: usb-storage 2-1.4:1.0
[ 3708.524492] scsi 5:0:0:0: Direct-Access JetFlash Transcend 8GB 1100 PQ: 0 ANSI: 4
[ 3708.525690] sd 5:0:0:0: Attached scsi generic sg1 type 0
[ 3708.526455] sd 5:0:0:0: [sdb] 15433728 512-byte logical blocks: (7.90 GB/7.35 GiB)
[ 3708.527197] sd 5:0:0:0: [sdb] Write Protect is off
[ 3708.527203] sd 5:0:0:0: [sdb] Mode Sense: 43 00 00 00
[ 3708.527944] sd 5:0:0:0: [sdb] No Caching mode page found
[ 3708.527948] sd 5:0:0:0: [sdb] Assuming drive cache: write through
[ 3708.532079] sdb: sdb1 sdb2
[ 3708.535450] sd 5:0:0:0: [sdb] Attached SCSI removable disk
[ 3711.853602] EXT4-fs (dm-0): recovery complete
[ 3711.857413] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)

Now it is removed with gpg-agent running:

[ 3768.584489] usb 2-1.4: USB disconnect, device number 4
[ 3773.421215] systemd-udevd[4409]: Process '/usr/local/sbin/sensitive_umount.sh' failed with exit code 5.
[ 3778.857712] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[ 3778.857725] Buffer I/O error on dev dm-0, logical block 1, lost async page write
[ 3778.857730] Buffer I/O error on dev dm-0, logical block 450, lost async page write
[ 3778.857735] Buffer I/O error on dev dm-0, logical block 465, lost async page write
[ 3778.857739] Buffer I/O error on dev dm-0, logical block 481, lost async page write
[ 3778.857743] Buffer I/O error on dev dm-0, logical block 482, lost async page write
[ 3778.857748] Buffer I/O error on dev dm-0, logical block 8679, lost async page write
[ 3813.807257] Buffer I/O error on dev dm-0, logical block 557056, lost sync page write
[ 3813.807292] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[ 3813.807297] Aborting journal on device dm-0-8.
[ 3813.808554] Buffer I/O error on dev dm-0, logical block 557056, lost sync page write
[ 3813.808563] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[ 3813.808588] EXT4-fs (dm-0): previous I/O error to superblock detected
[ 3813.808654] Buffer I/O error on dev dm-0, logical block 0, lost sync page write
[ 3813.808660] EXT4-fs error (device dm-0): ext4_put_super:800: Couldn't clean up the journal
[ 3813.808664] EXT4-fs (dm-0): Remounting filesystem read-only
[ 3813.808666] EXT4-fs (dm-0): previous I/O error to superblock detected
[ 3813.808718] Buffer I/O error on dev dm-0, logical block 0, lost sync page write

The important outcome of this bug is that the user cannot mount the disk under the same device-mapper name again until the machine is rebooted.

Any help? I am ready to build and test a patched kernel, and to tinker with the code. Thanks for any help.
Well, firstly you need to try to work out what is still holding the device and why it cannot be removed and whether there is a controlled way to release it or whether it could be considered a bug. Secondly, see if there is a workaround that at least moves it out of the way so the name can be reused. At the dm layer, there are tricks like 'dmsetup rename' and 'dmsetup remove' which can also take --force and --deferred.
If a running gpg-agent is the issue, do you need a mechanism for the udev rule to signal to that process that it must release the device first?
- Try having the udev rule kill the gpg-agent process first. That could be generalised to kill all processes using the device being removed. Tools like 'lsof' show you what is using it.
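A minimal sketch of that "kill everything using the device" idea, assuming 'fuser' from psmisc is installed. The names holders, release_mount, FUSER and GRACE are made up for illustration and are not part of any existing tool. Note that 'umount -l' only detaches the mount point lazily: the filesystem, and therefore the dm device under it, stays in use until the last open file is gone, so the holders have to be dealt with before unmounting. Whether 'fuser -m' reports a process that holds nothing but a bound UNIX socket may depend on the fuser version, so cross-check with 'lsof'.

```shell
#!/bin/bash
# Sketch only: helper functions a remove script could call BEFORE unmounting.
# FUSER, GRACE, holders and release_mount are hypothetical names.
FUSER=${FUSER:-fuser}

# Print one PID per line for every process using the filesystem mounted at $1.
# fuser writes the PIDs to stdout and everything else to stderr.
holders() {
    "$FUSER" -m "$1" 2>/dev/null | tr -s ' \t' '\n' | grep -E '^[0-9]+$'
}

# Ask the holders to exit, wait briefly, then force-kill any that remain.
release_mount() {
    mnt=$1
    for pid in $(holders "$mnt"); do
        kill -TERM "$pid" 2>/dev/null
    done
    sleep "${GRACE:-2}"
    for pid in $(holders "$mnt"); do
        kill -KILL "$pid" 2>/dev/null
    done
    return 0
}
```

In sensitive_umount.sh this would be called as 'release_mount /j/sensitive' ahead of the umount and 'cryptsetup close' lines. It is a blunt instrument: any process with a file open on the stick gets killed, not just gpg-agent.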
Thanks for the quick reply.

> If a running gpg-agent is the issue, do you need a mechanism for the udev
> rule to signal to that process that it must release the device first?

There would be a race condition anyway, I believe. Maybe not in this case, but in another application. I believe I can work around my personal issue by symlinking that UNIX socket to a location outside my encrypted stick.

> Well, firstly you need to try to work out what is still holding the device
> and why it cannot be removed

I think I'll increase the kernel log verbosity level and see what it gives. Otherwise, I have no idea how to debug that.
Yes, or perhaps something like

--no-use-standard-socket
    By enabling this option gpg-agent will listen on the socket named ‘S.gpg-agent’, located in the home directory, and not create a random socket below a temporary directory. Tools connecting to gpg-agent should first try to connect to the socket given in environment variable GPG_AGENT_INFO and then fall back to this socket. This option may not be used if the home directory is mounted as a remote file system. Note, that --use-standard-socket is the default on Windows systems.
or probably the opposite --use-standard-socket
I have GnuPG 2.1.9 and these options have no effect anymore. And there's no way to set socket location.

--use-standard-socket
--no-use-standard-socket
--use-standard-socket-p
    Since GnuPG 2.1 the standard socket is always used. These options have no more effect. The command gpg-agent --use-standard-socket-p will thus always return success.
OK, killing gpg-agent works for me, but this still smells like a bug in device mapper.
The kernel quite correctly won't let you remove a filesystem or device in a safe and controlled way while it is still in use - it does look like your problem can be resolved entirely in userspace. If you can't find a way to control where the file is being placed, try the symlink idea, or speak to the gpg-agent developers.
So you consider it perfectly correct to be unable to reuse the /dev/mapper name after that, and that anyone who wants to reuse it has to resort to rebooting the physical machine, because even "dmsetup remove" and "dmsetup rename" don't work?

# dmsetup remove sensitive
device-mapper: remove ioctl on sensitive failed: Device or resource busy
Command failed
[ERR] 15:01root@acer ~ # dmsetup remove --force sensitive
device-mapper: resume ioctl on sensitive failed: Invalid argument
device-mapper: remove ioctl on sensitive failed: Device or resource busy
Command failed
[ERR] 15:02root@acer ~ # dmsetup remove --force sensitive
device-mapper: resume ioctl on sensitive failed: Invalid argument
device-mapper: remove ioctl on sensitive failed: Device or resource busy
Command failed
[ERR] 15:02root@acer ~ # dmsetup remove --force --deferred sensitive
device-mapper: resume ioctl on sensitive failed: Invalid argument
[OK] 15:02root@acer ~ # ls /dev/mapper/
control sensitive
[OK] 15:02root@acer ~ # dmsetup remove --force --deferred sensitive
device-mapper: resume ioctl on sensitive failed: Invalid argument

Renaming:

# dmsetup rename sensitive sensitive_garbage
device-mapper: rename ioctl on sensitive failed: No such device or address
Command failed
Are there still any users of the device? What does 'open count' of 'dmsetup info' output say? If so, you need to eliminate them first - kill the process, unmount the filesystem etc. The deferred remove OR the rename (but not both!) might have worked, but you need to look at the state with 'dmsetup info' after each command to find out. ('dmsetup info -c' gives a handy summary. Also 'dmsetup table' can show if a forced remove was attempted.)
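As a rough sketch of that check, assuming the mapping is called 'sensitive' as in the scripts earlier in the thread: the helper names dm_open_count and dm_remove_if_idle and the DMSETUP variable are invented for illustration, while 'dmsetup info -c --noheadings -o open' is the column form of the same 'open count' that plain 'dmsetup info' prints. A mounted filesystem counts as an opener too, even a lazily unmounted one that no longer shows up in the mount table, which is why 'lsof' alone may not explain a non-zero count.

```shell
#!/bin/bash
# Sketch only: inspect the open count before trying to remove a mapping.
# DMSETUP, dm_open_count and dm_remove_if_idle are hypothetical names.
DMSETUP=${DMSETUP:-dmsetup}

# Print the open count of mapping $1; prints nothing if it does not exist.
dm_open_count() {
    "$DMSETUP" info -c --noheadings -o open "$1" 2>/dev/null | tr -d '[:space:]'
}

# Remove mapping $1 only when nothing holds it open; otherwise just report.
dm_remove_if_idle() {
    name=$1
    count=$(dm_open_count "$name")
    if [ -z "$count" ]; then
        echo "$name: no such mapping"
        return 0
    fi
    if [ "$count" -gt 0 ]; then
        echo "$name: still open by $count user(s), not removing" >&2
        return 1
    fi
    "$DMSETUP" remove "$name"
}
```

Running 'dm_remove_if_idle sensitive' as root after each attempted fix shows whether the count has actually dropped to zero, instead of stacking forced and deferred removes on a device that is still held open.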
Sorry for the delay, I will check in the next few days. The idea that some processes may still have files open is reasonable, and I will check that. Maybe that really is the reason why closing doesn't work.