Created attachment 278203 [details] dmesg kernel: vanilla 4.18.5 gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. More or less reproducible for me using next sequence: - on host: create LV of appropriate size (20g in my case) mkfs.xfs -m reflink=1 /dev/data/LV mount /dev/data/LV /mnt/ run kvm VM with qcow2 image (/mnt/disk) - inside vm: sysbench --test=fileio --file-total-size=9G prepare - on host: cp --reflink=always disk disk.b - inside vm: sysbench --test=fileio --file-total-size=9G --file-test-mode=seqwr --max-time=6000 --max-requests=0 --threads=16 run Some time after i/o on /dev/data/LV fall to zero and fs become completely unavailable and then I see a bunch of records: [ 2580.058205] INFO: task worker:6343 blocked for more than 120 seconds. [ 2580.064719] Not tainted 4.18.5 #1 [ 2580.068614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2580.076496] worker D 0 6343 1 0x00000000 [ 2580.082034] Call Trace: [ 2580.084532] ? __schedule+0x386/0xc50 [ 2580.088248] ? xlog_grant_head_wait+0xa3/0x3a0 [ 2580.092741] schedule+0x2f/0x90 [ 2580.095932] xlog_grant_head_wait+0x53/0x3a0 [ 2580.100256] xlog_grant_head_check+0xb3/0x160 [ 2580.104662] xfs_log_reserve+0x108/0x3f0 [ 2580.108682] xfs_trans_reserve+0x1b4/0x2b0 [ 2580.112948] xfs_trans_alloc+0xbe/0x220 [ 2580.116952] xfs_vn_update_time+0xcb/0x2b0 [ 2580.121220] ? current_time+0x4d/0x90 [ 2580.125047] file_update_time+0xe0/0x120 [ 2580.129139] xfs_file_aio_write_checks+0x14f/0x2d0 [ 2580.134099] xfs_file_dio_aio_write+0xcc/0x420 [ 2580.138715] xfs_file_write_iter+0x7b/0xa0 [ 2580.142978] do_iter_readv_writev+0x139/0x190 [ 2580.147502] do_iter_write+0x7f/0x1c0 [ 2580.151329] vfs_writev+0x98/0x110 [ 2580.154907] ? lock_acquire+0x8e/0x230 [ 2580.158823] ? __fget+0x5/0x200 [ 2580.162131] ? do_pwritev+0x9c/0xe0 [ 2580.165782] ? __fget_light+0x51/0x60 [ 2580.169614] do_pwritev+0x9c/0xe0 [ 2580.173095] do_syscall_64+0x5a/0x190 [ 2580.176922] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 2580.182138] RIP: 0033:0x7fe1937b784a [ 2580.185836] Code: Bad RIP value. [ 2580.189239] RSP: 002b:00007fe05e1f5850 EFLAGS: 00000246 ORIG_RAX: 0000000000000128 [ 2580.197058] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007fe1937b784a [ 2580.204361] RDX: 000000000000001e RSI: 0000564bf126f200 RDI: 0000000000000014 [ 2580.211660] RBP: 0000564bf126f200 R08: 0000000000000000 R09: 0000000000000000 [ 2580.218968] R10: 00000000dccf0000 R11: 0000000000000246 R12: 000000000000001e [ 2580.226265] R13: 00000000dccf0000 R14: 0000564bf13312a0 R15: 00007fe05e9f67a0 Full dmesg attached
Created attachment 278205 [details] config
On Thu, Aug 30, 2018 at 02:32:35PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=200981 > > kernel: vanilla 4.18.5 > gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0 > Copyright (C) 2017 Free Software Foundation, Inc. > This is free software; see the source for copying conditions. There is NO > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > More or less reproducible for me using next sequence: > > - on host: > create LV of appropriate size (20g in my case) > mkfs.xfs -m reflink=1 /dev/data/LV > mount /dev/data/LV /mnt/ > run kvm VM with qcow2 image (/mnt/disk) > > - inside vm: > sysbench --test=fileio --file-total-size=9G prepare > > - on host: > cp --reflink=always disk disk.b > > - inside vm: > sysbench --test=fileio --file-total-size=9G --file-test-mode=seqwr > --max-time=6000 --max-requests=0 --threads=16 run > > Some time after i/o on /dev/data/LV fall to zero and fs become completely > unavailable and then I see a bunch of records: The first error is this: [ 2212.046108] ================================================ [ 2212.051809] WARNING: lock held when returning to user space! [ 2212.057511] 4.18.5 #1 Not tainted [ 2212.060864] ------------------------------------------------ [ 2212.066564] worker/6123 is leaving the kernel with locks still held! [ 2212.072961] 1 lock held by worker/6123: [ 2212.076835] #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220 Which happens 5 minutes before the hung processes start being reported. Looks like something has gone wrong and an error path has leaked a transaction. Can you see if commit dcbd44f79986 ("xfs: fix transaction leak on remote attr set/remove failure") addresses the problem you are seeing? -Dave.
> Can you see if commit dcbd44f79986 ("xfs: fix transaction leak on remote attr set/remove failure") addresses the problem you are seeing? I've just tried but see no difference. Warning still here and vm become blocked some time after. ( It worth noting I've CONFIG_LOCKDEP turned on )
Ok, so the obvious fix doesn't help. Any idea why lockdep is not giving you the full stack trace of where the lock that was leaked to userspace was first accessed? Without that, we have no idea which of the 70-odd calls to xfs_trans_alloc() in the XFS code the lockdep warning refers to. At minimum, we need to know what syscall is triggering the warning so we know what code to look at. Also, can you reproduce this on older kernels? If not, could you bisect it? -Dave.
Created attachment 278345 [details] dmesg showing all locks held in the system
Created attachment 278347 [details] missing sb_end_intwrite: proof of concept
Ok, it looks like a warning is a result of missing sb_end_intwrite in xfs_reflink_allocate_cow, namely in a sequence xfs_trans_alloc -> goto retry -> goto convert. With proof-of-concept patch (attached) warning is gone but deadlock is not. Though now I have a list of held locks (dmesg attached). Did you mean taintad kernel will not show locks held?
(In reply to Alexander Y. Fomichev from comment #7) > Ok, it looks like a warning is a result of missing y in > xfs_reflink_allocate_cow, namely in a sequence xfs_trans_alloc -> goto retry > -> goto convert. Yup, I see the issue. Nice work finding this! > With proof-of-concept patch (attached) warning is gone but > deadlock is not. The code is not missing a sb_end_intwrite() call - it's missing a xfs_trans_cancel() call. That's why it still deadlocks even though you shut up the warning. I'll have a closer look in the morning to see if there's any other paths that can leak the transaction and write a patch to fix it. Thanks again for tracking it down, Alexander :) Cheers, Dave.
I confirm. With xfs_trans_cancel() lock is gone and works like a charm. Thanks.