After the recent DM changes My luks crypt devices hang on closing. I am running the latest mainline kernel as of 6/29/2015 compiled from a cloned git repo. If I revert to 4.1.0 the devices close correctly. This command triggers the bug: enigma jon # /opt/cryptsetup/sbin/cryptsetup --debug luksClose data # cryptsetup 1.7.0-git processing "/opt/cryptsetup/sbin/cryptsetup --debug luksClose data" # Running command close. # Locking memory. # Installing SIGINT/SIGTERM handler. # Unblocking interruption on signal. # Allocating crypt device context by device data. # Initialising device-mapper backend library. # dm version OF [16384] (*1) # dm versions OF [16384] (*1) # Detected dm-crypt version 1.14.0, dm-ioctl version 4.32.0. # Device-mapper backend running with UDEV support enabled. # dm status data OF [16384] (*1) # Releasing device-mapper backend. # Trying to open and read device /dev/sdc2 with direct-io. # Allocating crypt device /dev/sdc2 context. # Trying to open and read device /dev/sdc2 with direct-io. # Initialising device-mapper backend library. # dm table data OFW [16384] (*1) # Trying to open and read device /dev/sdc2 with direct-io. # Crypto backend (gcrypt 1.5.3, flawed whirlpool) initialized. # Detected kernel Linux 4.2.0-9-generic x86_64. # Reading LUKS header of size 1024 from device /dev/sdc2 # Key length 64, device size 3873472512 sectors, header size 4036 sectors. # Deactivating volume data. # dm status data OF [16384] (*1) # Udev cookie 0xd4d5533 (semid 720896) created # Udev cookie 0xd4d5533 (semid 720896) incremented to 1 # Udev cookie 0xd4d5533 (semid 720896) incremented to 2 # Udev cookie 0xd4d5533 (semid 720896) assigned to REMOVE task(2) with flags (0x0) # dm remove data OFT [16384] (*1) It seems to remove/close/relock the device, but never returns. I do not see any kernel panics, the task will just sit there forever causing shutdown/restarts to fail since most scripts try and close the devices on reboot/shutdown.
Created attachment 181331 [details] kernel cmdline
Created attachment 181341 [details] loaded modules
Created attachment 181351 [details] CPU info
Created attachment 181361 [details] dmesg
Created attachment 181371 [details] blocked tasks and trace
Created attachment 181391 [details] hung task timeout output
Hmm, can you try reverting commit 0f20972f7 ("dm: factor out a common cleanup_mapped_device()")? I think I need to stop putting "No functional change" in commit headers that aren't intended to have a functional change... ;)
I am compiling the kernel now with head at 1bc5e157ed2b4f5b206155fc772d860158acd201 and the above commit reverted. Will post results as soon as I can reboot into it. The issue doesn't exist in the 4.1.0 release and as I recall for a couple days into the 4.2 merge window it wasn't there as well.
I am seeing the same behavior with the above mentioned commit reverted. I have attached a new call trace.
Created attachment 181401 [details] sysrq show blocked state after revert
Created attachment 181411 [details] latest hung task notification
I have been testing the kernel before all the dm changes on the 25th and do not get the error before the 25th. after the changes on the 25th , the error occurs. there were a lot of commits that day so not sure which one is causing it yet.
I have stepped through every commit that touches the DM code and found the commit that breaks things: e4bc13adfd ("Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block") Until this commit things work as expected. Afterwards things no longer work. Time to see if disabling cgroup writeback support in the kernel config restores functionality.
disabling the config option CGROUP_WRITEBACK restores normal functionality. Something in that code is interfering with standard luks operations.
Reassigning to Tejun. Tejun, I can assist with this but it is clearly your area of expertise.
Heh, more like my area of failure. Yeah, definitely looks like a bug in cgroup writeback code. Looking into it.
Thanks for looking into it guys. I used to do a lot of coding for ircd-hybrid ages ago so have a lot of respect for you kernel developers.
Patches to fix the bug posted. http://lkml.kernel.org/g/20150702005253.GA26440@mtj.duckdns.org http://lkml.kernel.org/g/20150702005337.GB26440@mtj.duckdns.org Jon, it'd be great if you can confirm on the mailing list that the above two patches fix the reported problem. Thanks!
Hello Tejun, Thanks for your help! The expected behavior has returned and all is well. -Jon