Bug 100681
| Summary: | writeback changes merged in 4.2 cause closing of luks crypt devices to hang | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Jon Christopherson (jon) |
| Component: | LVM2/DM | Assignee: | Tejun Heo (tj) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | agk, jon, snitzer |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 4.2.0-9-generic #201506290030 SMP Mon Jun 29 00:33:15 CDT 2015 x86_64 x86_64 x86_64 GNU/Linux | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Attachments:
kernel cmdline
loaded modules
CPU info
dmesg
blocked tasks and trace
hung task timeout output
sysrq show blocked state after revert
latest hung task notification
Description
Jon Christopherson
2015-06-29 15:56:04 UTC
Created attachment 181331 [details]
kernel cmdline
Created attachment 181341 [details]
loaded modules
Created attachment 181351 [details]
CPU info
Created attachment 181361 [details]
dmesg
Created attachment 181371 [details]
blocked tasks and trace
Created attachment 181391 [details]
hung task timeout output
Hmm, can you try reverting commit 0f20972f7 ("dm: factor out a common cleanup_mapped_device()")? I think I need to stop putting "No functional change" in commit headers that aren't intended to have a functional change... ;)

I am compiling the kernel now with head at 1bc5e157ed2b4f5b206155fc772d860158acd201 and the above commit reverted. Will post results as soon as I can reboot into it. The issue doesn't exist in the 4.1.0 release and as I recall for a couple days into the 4.2 merge window it wasn't there as well.

I am seeing the same behavior with the above mentioned commit reverted. I have attached a new call trace.

Created attachment 181401 [details]
sysrq show blocked state after revert
Created attachment 181411 [details]
latest hung task notification
I have been testing the kernel before all the dm changes on the 25th and do not get the error before the 25th. After the changes on the 25th, the error occurs. There were a lot of commits that day, so I am not sure which one is causing it yet.

I have stepped through every commit that touches the DM code and found the commit that breaks things: e4bc13adfd ("Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block"). Until this commit things work as expected. Afterwards things no longer work. Time to see if disabling cgroup writeback support in the kernel config restores functionality.

Disabling the config option CGROUP_WRITEBACK restores normal functionality. Something in that code is interfering with standard luks operations.

Reassigning to Tejun. Tejun, I can assist with this but it is clearly your area of expertise.

Heh, more like my area of failure. Yeah, definitely looks like a bug in cgroup writeback code. Looking into it.

Thanks for looking into it guys. I used to do a lot of coding for ircd-hybrid ages ago so have a lot of respect for you kernel developers.

Patches to fix the bug posted.

http://lkml.kernel.org/g/20150702005253.GA26440@mtj.duckdns.org
http://lkml.kernel.org/g/20150702005337.GB26440@mtj.duckdns.org

Jon, it'd be great if you can confirm on the mailing list that the above two patches fix the reported problem. Thanks!

Hello Tejun, Thanks for your help! The expected behavior has returned and all is well. -Jon
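
For anyone checking whether a given kernel build has cgroup writeback enabled, the option discussed in this report appears in the kernel config as CONFIG_CGROUP_WRITEBACK. The following is only a rough sketch of such a check, not something taken from the thread: the helper names are hypothetical, and the config text is assumed to have been read from wherever the distribution keeps it (commonly /boot/config-<release>, which is itself an assumption about the system).

```python
import re


def kernel_config_value(config_text, option):
    """Return the value of a kernel config option, or None if it is not set.

    Hypothetical helper: expects standard kernel .config text, where a set
    option looks like 'CONFIG_FOO=y' and an unset one looks like
    '# CONFIG_FOO is not set'.
    """
    name = option if option.startswith("CONFIG_") else "CONFIG_" + option
    # Anchor at line start so commented-out '# CONFIG_FOO is not set' lines
    # are not mistaken for an assignment.
    pattern = re.compile(r"^" + re.escape(name) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip() if match else None


def cgroup_writeback_enabled(config_text):
    """True when CONFIG_CGROUP_WRITEBACK is built in ('y')."""
    return kernel_config_value(config_text, "CGROUP_WRITEBACK") == "y"


if __name__ == "__main__":
    # Illustrative config fragment, not taken from the reporter's system.
    sample = "CONFIG_BLK_CGROUP=y\n# CONFIG_CGROUP_WRITEBACK is not set\n"
    print(cgroup_writeback_enabled(sample))
```

Passing the config as text keeps the sketch independent of where the file lives; reading it from disk is left to the caller.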