Bug 87821 - luksSuspend causes 'sync' to block indefinitely when used on a mounted ext{2,3,4} filesystem
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
 
Reported: 2014-11-06 01:28 UTC by Michael Ensslin
Modified: 2016-03-20 11:25 UTC
Kernel Version: 3.16-3-amd64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output from a system where I tried extensively to reproduce the issue under different circumstances. (105.89 KB, application/octet-stream)
2014-11-06 01:28 UTC, Michael Ensslin

Description Michael Ensslin 2014-11-06 01:28:16 UTC
Created attachment 156841
dmesg output from a system where I tried extensively to reproduce the issue under different circumstances.

Apart from sync, suspend-to-memory is also affected.

The issue only occurs with partitions on my main drive (/dev/sda).
Partitions on /dev/sdb are not affected.

I have confirmed that btrfs and FAT are not affected.

I have confirmed that ext4 filesystems mounted with -onobarrier are not affected.
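
When testing the nobarrier case (e.g. `mount -o nobarrier /dev/mapper/crypt /tmp/m`), it may help to confirm that the option actually took effect before running the reproduction. The following is a minimal sketch, not part of the original test session: `has_mount_opt` is a hypothetical helper name, and it assumes the kernel lists `nobarrier` in the options column of /proc/mounts for the filesystem in question.

```shell
# Hypothetical helper: succeed if the mount point $1 carries option $2.
# The mounts table (in /proc/mounts format) is read from stdin.
has_mount_opt() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            # If the mount point is listed more than once, the last entry wins.
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be along the lines of `has_mount_opt /tmp/m nobarrier < /proc/mounts && echo "barriers disabled"`.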

The issue is discussed in this commit message: https://github.com/vianney/arch-luks-suspend/commit/f5e2c8f3844b820596324021c40b4593b1965e42. The conclusion there is that it was introduced by this commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=06a407f13daf9e48f0ef7189c7e54082b53940c7

A shell session that shows all sorts of cases that do and do not reproduce the issue can be seen here: https://bpaste.net/show/9820f57406c7

The following commands will reproduce the issue reliably:

cryptsetup luksFormat /dev/sda2
cryptsetup open /dev/sda2 crypt
mkfs.ext4 /dev/mapper/crypt
mount /dev/mapper/crypt /tmp/m
cryptsetup luksSuspend crypt
sync
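
Because the final `sync` never returns, running it in the foreground ties up the shell. The sketch below is one possible way to run it in the background and report whether it is still blocked after a grace period. `check_blocked` and `proc_state` are hypothetical helper names, not part of cryptsetup or coreutils, and the approach assumes a Linux /proc filesystem. A process in uninterruptible sleep (state D, as in the trace from my log) cannot be killed, which is why the helper only polls from the outside.

```shell
# Hypothetical helper: print the one-letter process state of PID $1 from
# /proc, or nothing if the process no longer exists.
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: run a command in the background and report whether it
# is still running after $1 seconds. The PID is left in the global $pid.
check_blocked() {
    grace=$1; shift
    "$@" &
    pid=$!
    elapsed=0
    while :; do
        state=$(proc_state "$pid")
        case $state in
            ''|Z)
                # Process is gone or a zombie: collect it and report success.
                wait "$pid"
                echo "finished"
                return 0
                ;;
        esac
        [ "$elapsed" -ge "$grace" ] && break
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "still running after ${grace}s (pid $pid, state $state)"
    return 1
}
```

With the device suspended as above, `check_blocked 10 sync` would be expected to print a "still running" line with state D on an affected system. Running `cryptsetup luksResume crypt` afterwards should let the queued I/O proceed so that the blocked sync can complete.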

During these tests, my dmesg showed hung-task timeout messages with stack traces. On one occasion I even caused a deadlock in the kernel that froze the Linux consoles and left only the X server working, but I have not been able to reproduce that.

An example stacktrace from my dmesg log is:

[ 2040.168356] INFO: task sync:1751 blocked for more than 120 seconds.
[ 2040.171652]       Tainted: G        W     3.16-3-amd64 #1
[ 2040.175000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2040.178418] sync            D ffff880036dc2668     0  1751   1604 0x00000000
[ 2040.178425]  ffff880036dc2210 0000000000000086 0000000000014240 ffff8800b3f7ffd8
[ 2040.178430]  0000000000014240 ffff880036dc2210 ffff8800b3f7fec0 ffff8800b3f7fe58
[ 2040.178434]  ffff8800b3f7feb8 ffff880036dc2210 ffffffff811d2680 0000000000000000
[ 2040.178436] Call Trace:
[ 2040.178455]  [<ffffffff811d2680>] ? do_fsync+0x70/0x70
[ 2040.178467]  [<ffffffff81508409>] ? schedule_timeout+0x229/0x2a0
[ 2040.178475]  [<ffffffff81508ad1>] ? __schedule+0x2b1/0x710
[ 2040.178483]  [<ffffffff811d2680>] ? do_fsync+0x70/0x70
[ 2040.178490]  [<ffffffff81509918>] ? wait_for_completion+0xa8/0x120
[ 2040.178504]  [<ffffffff81094a90>] ? wake_up_state+0x10/0x10
[ 2040.178511]  [<ffffffff81273ff0>] ? submit_bio_wait+0x50/0x60
[ 2040.178519]  [<ffffffff8127f495>] ? blkdev_issue_flush+0x55/0x80
[ 2040.178572]  [<ffffffffa0195298>] ? ext4_sync_fs+0xd8/0x140 [ext4]
[ 2040.178578]  [<ffffffff811a8afc>] ? iterate_supers+0xac/0x100
[ 2040.178583]  [<ffffffff811d27a2>] ? sys_sync+0x52/0x90
[ 2040.178589]  [<ffffffff8150c7ad>] ? system_call_fast_compare_end+0x10/0x15

Apart from those timeout stack traces, dmesg doesn't appear to contain anything interesting.
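
To pull just the hung-task reports out of a long dmesg log such as the attached one, something like the following could be used. This is a minimal sketch: `hung_tasks` is a hypothetical helper name, and it assumes the message format shown in the trace above ("INFO: task NAME:PID blocked for more than N seconds.").

```shell
# Hypothetical helper: read a dmesg log on stdin and print "NAME PID" for
# every hung-task report found in it.
hung_tasks() {
    # The greedy \(.*\) keeps task names that themselves contain a colon
    # intact; the PID is the last number before " blocked".
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p'
}
```

Usage would be `dmesg | hung_tasks`, or `hung_tasks < saved-dmesg.txt` for a saved log (the file name here is only a placeholder).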
Comment 1 Michael Ensslin 2014-11-06 01:45:28 UTC
Note: The issue can also be reproduced using a regular file on /dev/sda3 as a crypt container; this makes the fact that /dev/sdb isn't affected even more mysterious.
