Bug 214767 - xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: x86-64 Linux
Importance: P1 high
Assignee: FileSystem/XFS Default Virtual Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-20 07:36 UTC by Christian Theune
Modified: 2021-11-20 22:31 UTC
CC: 3 users

See Also:
Kernel Version: 5.10.70
Subsystem:
Regression: No
Bisected commit-id:


Attachments
log of a vm with only vdc1 (/tmp) stuck (173.51 KB, text/plain)
2021-10-21 13:20 UTC, Christian Theune

Description Christian Theune 2021-10-20 07:36:07 UTC
We have been seeing blocked tasks somewhat frequently (around one per day across about 500 VMs), with tracebacks always similar to this:

First we get:

[656898.010322] INFO: task kworker/u2:1:458736 blocked for more than 122 seconds.
[656898.011988]       Not tainted 5.10.70 #1-NixOS
[656898.012981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[656898.014679] task:kworker/u2:1    state:D stack:    0 pid:458736 ppid:     2 flags:0x00004000
[656898.016530] Workqueue: writeback wb_workfn (flush-253:0)
[656898.017698] Call Trace:
[656898.018310]  __schedule+0x271/0x860
[656898.019208]  schedule+0x46/0xb0
[656898.019944]  rwsem_down_read_slowpath+0x169/0x490
[656898.021008]  ? mempool_alloc+0x62/0x170
[656898.022032]  xfs_map_blocks+0xb9/0x400 [xfs]
[656898.022998]  iomap_do_writepage+0x163/0x850
[656898.023587]  ? __mod_memcg_lruvec_state+0x21/0xe0
[656898.024044]  write_cache_pages+0x186/0x3d0
[656898.024463]  ? iomap_migrate_page+0xc0/0xc0
[656898.024882]  ? submit_bio_noacct+0x3a9/0x420
[656898.025303]  iomap_writepages+0x1c/0x40
[656898.025712]  xfs_vm_writepages+0x64/0x90 [xfs]
[656898.026146]  do_writepages+0x34/0xc0
[656898.026513]  __writeback_single_inode+0x39/0x2a0
[656898.026969]  writeback_sb_inodes+0x200/0x470
[656898.027388]  __writeback_inodes_wb+0x4c/0xe0
[656898.027805]  wb_writeback+0x1d8/0x290
[656898.028161]  wb_workfn+0x29b/0x4d0
[656898.028526]  ? __switch_to_asm+0x42/0x70
[656898.028915]  ? __switch_to+0x7b/0x3e0
[656898.029285]  process_one_work+0x1df/0x370
[656898.029696]  worker_thread+0x50/0x400
[656898.030054]  ? process_one_work+0x370/0x370
[656898.030475]  kthread+0x11b/0x140
[656898.030798]  ? __kthread_bind_mask+0x60/0x60
[656898.031214]  ret_from_fork+0x22/0x30
[656898.031592] INFO: task nix-daemon:459204 blocked for more than 122 seconds.
[656898.032259]       Not tainted 5.10.70 #1-NixOS
[656898.032695] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[656898.033451] task:nix-daemon      state:D stack:    0 pid:459204 ppid:  1067 flags:0x00000000
[656898.034261] Call Trace:
[656898.034538]  __schedule+0x271/0x860
[656898.034881]  schedule+0x46/0xb0
[656898.035226]  xfs_log_commit_cil+0x6a4/0x800 [xfs]
[656898.035709]  ? wake_up_q+0xa0/0xa0
[656898.036073]  __xfs_trans_commit+0x9d/0x310 [xfs]
[656898.036561]  xfs_setattr_nonsize+0x342/0x520 [xfs]
[656898.037027]  notify_change+0x348/0x4c0
[656898.037407]  ? chmod_common+0xa1/0x150
[656898.037779]  chmod_common+0xa1/0x150
[656898.038139]  do_fchmodat+0x5a/0xb0
[656898.038491]  __x64_sys_chmod+0x17/0x20
[656898.038860]  do_syscall_64+0x33/0x40
[656898.039221]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[656898.039722] RIP: 0033:0x7f3d9b3a3b77
[656898.040072] RSP: 002b:00007ffc388e2b18 EFLAGS: 00000206 ORIG_RAX: 000000000000005a
[656898.040797] RAX: ffffffffffffffda RBX: 00007ffc388e2e10 RCX: 00007f3d9b3a3b77
[656898.041542] RDX: 0000000000008049 RSI: 0000000000008124 RDI: 00000000016a2400
[656898.042222] RBP: 00007ffc388e32a0 R08: 00007ffc388e4628 R09: 0000000000000000
[656898.042913] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc388e4620
[656898.043604] R13: 0000000000000000 R14: 00007ffc388e2df0 R15: 00007ffc388e2e10


and then a while later we see this:

[657020.886633] INFO: task qemu-ga:750 blocked for more than 122 seconds.
[657020.888346]       Not tainted 5.10.70 #1-NixOS
[657020.889482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.891072] task:qemu-ga         state:D stack:    0 pid:  750 ppid:     1 flags:0x00000080
[657020.891906] Call Trace:
[657020.892188]  __schedule+0x271/0x860
[657020.892547]  ? kvm_sched_clock_read+0xd/0x20
[657020.892983]  schedule+0x46/0xb0
[657020.893319]  rwsem_down_write_slowpath+0x218/0x480
[657020.893802]  thaw_super+0x12/0x20
[657020.894130]  __x64_sys_ioctl+0x62/0xb0
[657020.894491]  do_syscall_64+0x33/0x40
[657020.894860]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[657020.895396] RIP: 0033:0x7f69a55f2b17
[657020.895762] RSP: 002b:00007ffe5687b548 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[657020.896470] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f69a55f2b17
[657020.897144] RDX: 0000000000000000 RSI: 00000000c0045878 RDI: 0000000000000006
[657020.897815] RBP: 0000000000000000 R08: 00007f69a5477c08 R09: 0000000000000000
[657020.898479] R10: 0000000000000000 R11: 0000000000000246 R12: 000055ef74a15190
[657020.899155] R13: 00007ffe5687b5c0 R14: 00007ffe5687b560 R15: 0000000000000006
[657020.899850] INFO: task kworker/u2:1:458736 blocked for more than 245 seconds.
[657020.900513]       Not tainted 5.10.70 #1-NixOS
[657020.900942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.901672] task:kworker/u2:1    state:D stack:    0 pid:458736 ppid:     2 flags:0x00004000
[657020.902469] Workqueue: writeback wb_workfn (flush-253:0)
[657020.902976] Call Trace:
[657020.903220]  __schedule+0x271/0x860
[657020.903556]  schedule+0x46/0xb0
[657020.903870]  rwsem_down_read_slowpath+0x169/0x490
[657020.904321]  ? mempool_alloc+0x62/0x170
[657020.904996]  xfs_map_blocks+0xb9/0x400 [xfs]
[657020.905412]  iomap_do_writepage+0x163/0x850
[657020.905823]  ? __mod_memcg_lruvec_state+0x21/0xe0
[657020.906269]  write_cache_pages+0x186/0x3d0
[657020.906664]  ? iomap_migrate_page+0xc0/0xc0
[657020.907079]  ? submit_bio_noacct+0x3a9/0x420
[657020.907487]  iomap_writepages+0x1c/0x40
[657020.907942]  xfs_vm_writepages+0x64/0x90 [xfs]
[657020.908376]  do_writepages+0x34/0xc0
[657020.908760]  __writeback_single_inode+0x39/0x2a0
[657020.909222]  writeback_sb_inodes+0x200/0x470
[657020.909635]  __writeback_inodes_wb+0x4c/0xe0
[657020.910058]  wb_writeback+0x1d8/0x290
[657020.910410]  wb_workfn+0x29b/0x4d0
[657020.910756]  ? __switch_to_asm+0x42/0x70
[657020.911135]  ? __switch_to+0x7b/0x3e0
[657020.911489]  process_one_work+0x1df/0x370
[657020.911912]  worker_thread+0x50/0x400
[657020.912264]  ? process_one_work+0x370/0x370
[657020.912667]  kthread+0x11b/0x140
[657020.913008]  ? __kthread_bind_mask+0x60/0x60
[657020.913413]  ret_from_fork+0x22/0x30
[657020.913791] INFO: task nix-daemon:459204 blocked for more than 245 seconds.
[657020.914440]       Not tainted 5.10.70 #1-NixOS
[657020.914879] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[657020.915612] task:nix-daemon      state:D stack:    0 pid:459204 ppid:  1067 flags:0x00000000
[657020.916399] Call Trace:
[657020.916654]  __schedule+0x271/0x860
[657020.917006]  schedule+0x46/0xb0
[657020.917343]  xfs_log_commit_cil+0x6a4/0x800 [xfs]
[657020.917806]  ? wake_up_q+0xa0/0xa0
[657020.918162]  __xfs_trans_commit+0x9d/0x310 [xfs]
[657020.918633]  xfs_setattr_nonsize+0x342/0x520 [xfs]
[657020.919101]  notify_change+0x348/0x4c0
[657020.919461]  ? chmod_common+0xa1/0x150
[657020.919831]  chmod_common+0xa1/0x150
[657020.920180]  do_fchmodat+0x5a/0xb0
[657020.920509]  __x64_sys_chmod+0x17/0x20
[657020.920878]  do_syscall_64+0x33/0x40
[657020.921222]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[657020.921703] RIP: 0033:0x7f3d9b3a3b77
[657020.922055] RSP: 002b:00007ffc388e2b18 EFLAGS: 00000206 ORIG_RAX: 000000000000005a
[657020.922772] RAX: ffffffffffffffda RBX: 00007ffc388e2e10 RCX: 00007f3d9b3a3b77
[657020.923435] RDX: 0000000000008049 RSI: 0000000000008124 RDI: 00000000016a2400
[657020.924115] RBP: 00007ffc388e32a0 R08: 00007ffc388e4628 R09: 0000000000000000
[657020.924787] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc388e4620
[657020.925450] R13: 0000000000000000 R14: 00007ffc388e2df0 R15: 00007ffc388e2e10


IO is subsequently stalled on ALL disks (we have vda/vdb/vdc, with two of them running XFS and one used for swap).

I am pretty sure that this is not primarily an issue in the hypervisor (Qemu 4.1) or storage (Ceph Jewel), because I can warm reset the virtual machine using "system-reset" and the machine will boot properly.

This could be related to https://bugzilla.kernel.org/show_bug.cgi?id=207053 but it doesn't look quite right.
Comment 1 Christian Theune 2021-10-20 07:39:45 UTC
Also note that I'm not sure it's related to the thaw at all: the stuck thaw info comes exactly 122 seconds after the first blocked task report, which IMHO indicates that it's a subsequent error, not a cause.
Comment 2 Christian Theune 2021-10-20 16:16:12 UTC
Here's another excerpt that I dug out from a different VM with the same issue. AFAICT no thawing was happening around this time:

[848865.353541] INFO: task nix-daemon:1855245 blocked for more than 122 seconds.
[848865.355625]       Not tainted 5.10.70 #1-NixOS
[848865.356912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[848865.359117] task:nix-daemon      state:D stack:    0 pid:1855245 ppid:  2165 flags:0x00000000
[848865.360999] Call Trace:
[848865.361480]  __schedule+0x271/0x860
[848865.362069]  schedule+0x46/0xb0
[848865.362657]  xfs_log_commit_cil+0x6a4/0x800 [xfs]
[848865.363436]  ? wake_up_q+0xa0/0xa0
[848865.364026]  __xfs_trans_commit+0x9d/0x310 [xfs]
[848865.364818]  xfs_create+0x472/0x560 [xfs]
[848865.365517]  xfs_generic_create+0x247/0x320 [xfs]
[848865.366310]  ? xfs_lookup+0x55/0x100 [xfs]
[848865.366984]  path_openat+0xdd7/0x1070
[848865.367617]  do_filp_open+0x88/0x130
[848865.368199]  ? getname_flags.part.0+0x29/0x1a0
[848865.368925]  do_sys_openat2+0x97/0x150
[848865.369557]  __x64_sys_openat+0x54/0x90
[848865.370176]  do_syscall_64+0x33/0x40
[848865.370787]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[848865.371619] RIP: 0033:0x7f48b24c0ea8
[848865.372196] RSP: 002b:00007ffed91cee30 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[848865.373398] RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f48b24c0ea8
[848865.374530] RDX: 00000000000800c1 RSI: 00000000015c4750 RDI: 00000000ffffff9c
[848865.375661] RBP: 00000000015c4750 R08: 0000000000000000 R09: 0000000000000003
[848865.376790] R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffed91cf220
[848865.377920] R13: 00007ffed91ceee0 R14: 00007ffed91ceed0 R15: 00007ffed91d0c50

and I can see that it is not making any progress:

[848988.235884] INFO: task nix-daemon:1855245 blocked for more than 245 seconds.
[848988.238567]       Not tainted 5.10.70 #1-NixOS
[848988.240254] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[848988.242358] task:nix-daemon      state:D stack:    0 pid:1855245 ppid:  2165 flags:0x00000000
[848988.243756] Call Trace:
[848988.244183]  __schedule+0x271/0x860
[848988.244780]  schedule+0x46/0xb0
[848988.245366]  xfs_log_commit_cil+0x6a4/0x800 [xfs]
[848988.246153]  ? wake_up_q+0xa0/0xa0
[848988.246775]  __xfs_trans_commit+0x9d/0x310 [xfs]
[848988.247572]  xfs_create+0x472/0x560 [xfs]
[848988.248269]  xfs_generic_create+0x247/0x320 [xfs]
[848988.249084]  ? xfs_lookup+0x55/0x100 [xfs]
[848988.249781]  path_openat+0xdd7/0x1070
[848988.250388]  do_filp_open+0x88/0x130
[848988.251000]  ? getname_flags.part.0+0x29/0x1a0
[848988.251746]  do_sys_openat2+0x97/0x150
[848988.252366]  __x64_sys_openat+0x54/0x90
[848988.253015]  do_syscall_64+0x33/0x40
[848988.253610]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[848988.254445] RIP: 0033:0x7f48b24c0ea8
[848988.255071] RSP: 002b:00007ffed91cee30 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[848988.256292] RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f48b24c0ea8
[848988.257460] RDX: 00000000000800c1 RSI: 00000000015c4750 RDI: 00000000ffffff9c
[848988.258622] RBP: 00000000015c4750 R08: 0000000000000000 R09: 0000000000000003
[848988.259777] R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffed91cf220
[848988.260933] R13: 00007ffed91ceee0 R14: 00007ffed91ceed0 R15: 00007ffed91d0c50

Both stack traces are identical in every regard, which might be useful info.
Comment 3 Christian Theune 2021-10-20 16:31:07 UTC
I have another machine that managed to break free after 20 minutes; it performed around 3 thaws in that period:

Oct 20 18:05:22 pixometerstag04 kernel: INFO: task nix-daemon:1387736 blocked for more than 1228 seconds.
Oct 20 18:05:22 pixometerstag04 kernel:       Not tainted 5.10.70 #1-NixOS
Oct 20 18:05:22 pixometerstag04 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 18:05:22 pixometerstag04 kernel: task:nix-daemon      state:D stack:    0 pid:1387736 ppid:  1356 flags:0x00000000
Oct 20 18:05:22 pixometerstag04 kernel: Call Trace:
Oct 20 18:05:22 pixometerstag04 kernel:  __schedule+0x271/0x860
Oct 20 18:05:22 pixometerstag04 kernel:  schedule+0x46/0xb0
Oct 20 18:05:22 pixometerstag04 kernel:  xfs_log_commit_cil+0x6a4/0x800 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel:  ? wake_up_q+0xa0/0xa0
Oct 20 18:05:22 pixometerstag04 kernel:  __xfs_trans_commit+0x9d/0x310 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel:  xfs_create+0x472/0x560 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel:  xfs_generic_create+0x247/0x320 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel:  ? xfs_lookup+0x55/0x100 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel:  path_openat+0xdd7/0x1070
Oct 20 18:05:22 pixometerstag04 kernel:  do_filp_open+0x88/0x130
Oct 20 18:05:22 pixometerstag04 kernel:  ? getname_flags.part.0+0x29/0x1a0
Oct 20 18:05:22 pixometerstag04 kernel:  do_sys_openat2+0x97/0x150
Oct 20 18:05:22 pixometerstag04 kernel:  __x64_sys_openat+0x54/0x90
Oct 20 18:05:22 pixometerstag04 kernel:  do_syscall_64+0x33/0x40
Oct 20 18:05:22 pixometerstag04 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 20 18:05:22 pixometerstag04 kernel: RIP: 0033:0x7f3e3114dea8
Oct 20 18:05:22 pixometerstag04 kernel: RSP: 002b:00007ffd92f7df10 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
Oct 20 18:05:22 pixometerstag04 kernel: RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f3e3114dea8
Oct 20 18:05:22 pixometerstag04 kernel: RDX: 00000000000800c1 RSI: 000000000195cb50 RDI: 00000000ffffff9c
Oct 20 18:05:22 pixometerstag04 kernel: RBP: 000000000195cb50 R08: 0000000000000000 R09: 0000000000000003
Oct 20 18:05:22 pixometerstag04 kernel: R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffd92f7e300
Oct 20 18:05:22 pixometerstag04 kernel: R13: 00007ffd92f7dfc0 R14: 00007ffd92f7dfb0 R15: 00007ffd92f7fd30

However, the first thaw came between the 122 s and 245 s hang reports, and the last thaw came after the machine broke free.

Interestingly, the machine also logged a variety of things that worked while those hangs were being reported; maybe the hang only affected the /tmp filesystem and not /. Apparently PostgreSQL was still writing to the disk in that period, so I'd guess this only happened on /tmp.

This is square in the middle of the reported blocks:

Oct 20 17:52:00 pixometerstag04 postgres[1008]: user=,db= LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.004 s, sync=0.001 s, total=0.018 s; sync fi>
Comment 4 Christian Theune 2021-10-20 18:29:07 UTC
Another thing I noticed: it always hangs the nix-daemon (nixos.org), which does some dance around managing an overlay read-only mount that gets re-mounted rw for a short period while changing things in there. Here is what it looks like normally:

/dev/vda1 on / type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
/dev/vda1 on /nix/store type xfs (ro,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
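The dance amounts to roughly the following (paraphrased; the nix-daemon issues the mount syscalls directly, so this shell version is just an illustration):

mount -o remount,bind,rw /nix/store
# ... add/modify store paths ...
mount -o remount,bind,ro /nix/store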
Comment 5 Christian Theune 2021-10-20 18:37:09 UTC
I've also raised this with our friends from NixOS, in case they can shed some light on what the dance is that the nix-daemon performs here:
https://github.com/NixOS/nixpkgs/issues/142394
Comment 6 Eric Sandeen 2021-10-20 19:22:18 UTC
You might want to do an "echo w > /proc/sysrq-trigger" when this happens, to get all blocked tasks.
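For reference, both of these dump their output to dmesg:

echo w > /proc/sysrq-trigger   # backtraces of all blocked (D state) tasks
echo t > /proc/sysrq-trigger   # backtraces of all tasks, if more context helps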

The thaw seems a little suspicious, what all is going on there with freezing and thawing?
Comment 7 Christian Theune 2021-10-20 19:26:43 UTC
Thanks, I'll try to do something like that, likely by using a qemu monitor command to send in the trigger via keyboard the next time I catch this happening.

The freezing/thawing is part of our image backup infrastructure. Over the years we have established the practice of regularly calling 'thaw' gratuitously, because once in a blue moon a VM can get stuck indefinitely if an initial thaw doesn't go through. This has been reliable for at least 2-3 years by now, and I'm not sure whether it will turn out to be a red herring or not.
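For context, the flow looks roughly like this (illustrated with virsh and the QEMU guest agent commands; our tooling drives qemu-ga differently, so $VM and the exact invocations are placeholders):

virsh qemu-agent-command $VM '{"execute": "guest-fsfreeze-freeze"}'
# ... take the image/Ceph snapshot while the filesystems are frozen ...
virsh qemu-agent-command $VM '{"execute": "guest-fsfreeze-thaw"}'
# the gratuitous extra thaw, in case the first one didn't go through:
virsh qemu-agent-command $VM '{"execute": "guest-fsfreeze-thaw"}'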
Comment 8 Dave Chinner 2021-10-20 22:38:31 UTC
On Wed, Oct 20, 2021 at 07:36:07AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> [656898.034261] Call Trace:
> [656898.034538]  __schedule+0x271/0x860
> [656898.034881]  schedule+0x46/0xb0
> [656898.035226]  xfs_log_commit_cil+0x6a4/0x800 [xfs]

This looks like it is stuck on the hard throttle. Which means this
is the likely fix:

19f4e7cc8197 xfs: Fix CIL throttle hang when CIL space used going backwards

Cheers,

Dave.
Comment 9 Christian Theune 2021-10-21 04:21:30 UTC
Thanks Dave!

From what I can tell, this fix was merged into 5.14 with a largish bunch of other changes (via 9f7b640f001f). I checked that the fix you mentioned applies cleanly to the current 5.10 codebase (it does), but I'm not sure whether we have missed further backports for 5.10.

I wonder whether "we" (I don't think I'm expert enough to do this reliably) need to check for other fixes that didn't make it back to 5.10. Do you have a recommendation on how, or whom, to approach?

As I don't do this often: I would contact GKH and ask for a backport of the patch you mentioned (given that you consider it semantically appropriate to apply to 5.10), but maybe you have better advice.
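My understanding of the mechanics, per Documentation/process/stable-kernel-rules.rst (treat this as a rough sketch on my part), would be something like:

# check the fix applies to the stable tree (needs the upstream objects available):
git checkout linux-5.10.y
git cherry-pick -x 19f4e7cc8197
# then mail stable@vger.kernel.org asking for inclusion, referencing
# the upstream commit id and the target tree(s)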
Comment 10 Christian Theune 2021-10-21 13:17:00 UTC
Alright,

I managed to catch a machine that logged blocked tasks for only the /tmp partition. I logged in, ran a 'sync', and triggered the 'w' sysrq. I'm attaching the log: at first you see a number of blocked tasks, then a while later I log in and run the sync, and you can see the sync stuck as well, along with a large number of workers.
Comment 11 Christian Theune 2021-10-21 13:19:38 UTC
Going through the log I can see that, aside from 'sync' and the workers, there is a tar process that has been stuck since 13:20 and is not making any progress. I also cannot see any IO on the block device.
Comment 12 Christian Theune 2021-10-21 13:20:15 UTC
Created attachment 299295 [details]
log of a vm with only vdc1 (/tmp) stuck
Comment 13 Pedram Fard 2021-10-29 02:21:22 UTC
We have run into a similar issue. We use xfs_freeze to take backup snapshots of our filesystems, and occasionally the xfs_freeze/thaw processes turn into zombies in the process.
We used "echo w > /proc/sysrq-trigger" to get the stack traces:

[736826.666940] task:kswapd0         state:D stack:    0 pid:  576 ppid:     2 flags:0x00004000
[736826.674439] Call Trace:
[736826.677686]  __schedule+0x1f9/0x660
[736826.681593]  schedule+0x46/0xb0
[736826.685141]  percpu_rwsem_wait+0xa9/0x180
[736826.688867]  ? percpu_free_rwsem+0x30/0x30
[736826.692764]  __percpu_down_read+0x49/0x60
[736826.696549]  xfs_trans_alloc+0x15d/0x170
[736826.700367]  xfs_free_eofblocks+0x130/0x1e0
[736826.807546]  xfs_fs_destroy_inode+0xa8/0x1b0
[736826.811610]  destroy_inode+0x3b/0x70
[736826.815150]  dispose_list+0x48/0x60
[736826.818796]  prune_icache_sb+0x54/0x80
[736826.822476]  super_cache_scan+0x161/0x1e0
[736826.826341]  do_shrink_slab+0x145/0x240
[736826.830294]  shrink_slab_memcg+0xcd/0x1e0
[736826.834371]  shrink_node_memcgs+0x186/0x1c0
[736826.838491]  shrink_node+0x14f/0x570
[736826.842088]  balance_pgdat+0x232/0x510
[736826.845820]  kswapd+0xe2/0x170
[736826.849154]  ? balance_pgdat+0x510/0x510
[736826.852944]  kthread+0x11b/0x140
[736826.856326]  ? __kthread_bind_mask+0x60/0x60
[736826.860313]  ret_from_fork+0x22/0x30

[736826.863839] task:systemd-logind  state:D stack:    0 pid: 2073 ppid:     1 flags:0x00000084
[736826.871453] Call Trace:
[736826.874898]  __schedule+0x1f9/0x660
[736826.878898]  ? kvm_sched_clock_read+0xd/0x20
[736826.883265]  schedule+0x46/0xb0
[736826.887003]  rwsem_down_write_slowpath+0x234/0x4b0
[736826.891114]  ? __kmalloc+0x154/0x2b0
[736826.894794]  prealloc_shrinker+0x4c/0xf0
[736826.898526]  alloc_super+0x29a/0x2f0
[736826.902210]  ? set_anon_super+0x40/0x40
[736826.905867]  sget_fc+0x6c/0x240
[736826.909206]  ? shmem_create+0x20/0x20
[736826.912912]  get_tree_nodev+0x23/0xb0
[736826.916533]  ? bpf_lsm_capset+0x10/0x10
[736826.920310]  vfs_get_tree+0x25/0xb0
[736826.923848]  do_new_mount+0x152/0x1b0
[736826.927548]  __x64_sys_mount+0x103/0x140
[736826.931531]  do_syscall_64+0x33/0x40
[736826.935597]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736826.940323] RIP: 0033:0x7f49f8efcdba
[736826.944298] RSP: 002b:00007ffdb98f2658 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[736826.952238] RAX: ffffffffffffffda RBX: 000055f21a7113a0 RCX: 00007f49f8efcdba
[736826.959940] RDX: 000055f218672af0 RSI: 000055f21a721de0 RDI: 000055f218672af0
[736826.967721] RBP: 00007ffdb98f2690 R08: 000055f21a7a51e0 R09: 0000000000000000
[736826.975503] R10: 0000000000000006 R11: 0000000000000202 R12: 0000000000000000
[736826.983253] R13: 00007ffdb98f2740 R14: 00007ffdb98f2820 R15: 0000000000000000

[736826.990979] task:dockerd         state:D stack:    0 pid: 6639 ppid:     1 flags:0x00000080
[736826.999289] Call Trace:
[736827.002695]  __schedule+0x1f9/0x660
[736827.006708]  schedule+0x46/0xb0
[736827.010512]  rwsem_down_write_slowpath+0x234/0x4b0
[736827.015214]  ? __kmalloc+0x154/0x2b0
[736827.019291]  prealloc_shrinker+0x4c/0xf0
[736827.023495]  alloc_super+0x29a/0x2f0
[736827.027505]  sget+0x91/0x220
[736827.031089]  ? destroy_super_rcu+0x40/0x40
[736827.035268]  ? ovl_get_lowerstack+0x190/0x190 [overlay]
[736827.040172]  mount_nodev+0x26/0x90
[736827.044070]  legacy_get_tree+0x27/0x40
[736827.048169]  vfs_get_tree+0x25/0xb0
[736827.052088]  do_new_mount+0x152/0x1b0
[736827.056076]  __x64_sys_mount+0x103/0x140
[736827.060176]  do_syscall_64+0x33/0x40
[736827.064158]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736827.068950] RIP: 0033:0x55c5416cc2aa
[736827.072915] RSP: 002b:000000c0015b2768 EFLAGS: 00000216 ORIG_RAX: 00000000000000a5
[736827.080676] RAX: ffffffffffffffda RBX: 000000c00005e800 RCX: 000055c5416cc2aa
[736827.088375] RDX: 000000c000df5320 RSI: 000000c0012db140 RDI: 000000c000df52e8
[736827.095311] RBP: 000000c0015b2800 R08: 000000c0015a0600 R09: 0000000000000000
[736827.102300] R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000000133
[736827.109100] R13: 0000000000000132 R14: 0000000000000200 R15: 0000000000000055


[737030.525416] task:sh              state:D stack:    0 pid:27526 ppid: 27517 flags:0x00000080
[737030.532905] Call Trace:
[737030.536188]  __schedule+0x1f9/0x660
[737030.540060]  schedule+0x46/0xb0
[737030.543385]  percpu_rwsem_wait+0xa9/0x180
[737030.547220]  ? percpu_free_rwsem+0x30/0x30
[737030.551093]  __percpu_down_read+0x49/0x60
[737030.555220]  mnt_want_write+0x66/0x90
[737030.558966]  open_last_lookups+0x30d/0x3f0
[737030.562914]  ? path_init+0x2bd/0x3e0
[737030.681735]  path_openat+0x88/0x1d0
[737030.685600]  ? xfs_iunlock+0x94/0xe0
[737030.689575]  do_filp_open+0x88/0x130
[737030.693496]  ? __check_object_size.part.0+0x11f/0x140
[737030.698201]  do_sys_openat2+0x97/0x150
[737030.702232]  __x64_sys_openat+0x54/0x90
[737030.706301]  do_syscall_64+0x33/0x40
[737030.710248]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[737030.714945] RIP: 0033:0x7f4502c0d14e
[737030.718849] RSP: 002b:00007ffd066be7a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[737030.726628] RAX: ffffffffffffffda RBX: 00000000009ba360 RCX: 00007f4502c0d14e
[737030.734177] RDX: 0000000000000241 RSI: 00000000009bb8c0 RDI: ffffffffffffff9c
[737030.741759] RBP: 00007ffd066be8a0 R08: 0000000000000020 R09: 00000000009bb8c0
[737030.749452] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
[737030.757142] R13: 0000000000000001 R14: 0000000000000001 R15: 00000000009bad60
[737030.764836] task:bash            state:D stack:    0 pid:27580 ppid:     1 flags:0x00000084
[737030.773068] Call Trace:
[737030.776437]  __schedule+0x1f9/0x660
[737030.780354]  schedule+0x46/0xb0
[737030.784080]  percpu_rwsem_wait+0xa9/0x180
[737030.788264]  ? percpu_free_rwsem+0x30/0x30
[737030.792476]  __percpu_down_read+0x49/0x60
[737030.796642]  mnt_want_write+0x66/0x90
[737030.800660]  open_last_lookups+0x30d/0x3f0
[737030.804923]  ? path_init+0x2bd/0x3e0
[737030.808895]  path_openat+0x88/0x1d0
[737030.812800]  ? xfs_iunlock+0x94/0xe0
[737030.816745]  do_filp_open+0x88/0x130
[737030.820694]  ? getname_flags.part.0+0x29/0x1a0
[737030.825097]  ? __check_object_size.part.0+0x11f/0x140
[737030.829846]  ? audit_alloc_name+0x8c/0xe0
[737030.834010]  do_sys_openat2+0x97/0x150
[737030.838056]  __x64_sys_openat+0x54/0x90
[737030.842164]  do_syscall_64+0x33/0x40
[737030.846146]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[737030.850855] RIP: 0033:0x7f2072a9414e
[737030.854822] RSP: 002b:00007ffd855832e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[737030.862655] RAX: ffffffffffffffda RBX: 000000000178aab0 RCX: 00007f2072a9414e
[737030.870259] RDX: 0000000000000241 RSI: 000000000178af20 RDI: ffffffffffffff9c
[737030.877820] RBP: 00007ffd855833e0 R08: 0000000000000020 R09: 000000000178af20
[737030.885463] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000
[737030.893083] R13: 0000000000000003 R14: 0000000000000001 R15: 000000000178aa60

[736992.593491] task:k               state:D stack:    0 pid:11166 ppid: 29014 flags:0x00000184
[736992.600911] Call Trace:
[736992.604234]  __schedule+0x1f9/0x660
[736992.607842]  schedule+0x46/0xb0
[736992.611295]  percpu_rwsem_wait+0xa9/0x180
[736992.614999]  ? percpu_free_rwsem+0x30/0x30
[736992.618901]  __percpu_down_read+0x49/0x60
[736992.622614]  mnt_want_write+0x66/0x90
[736992.626323]  open_last_lookups+0x30d/0x3f0
[736992.630064]  ? path_init+0x2bd/0x3e0
[736992.633795]  path_openat+0x88/0x1d0
[736992.637360]  do_filp_open+0x88/0x130
[736992.640911]  ? getname_flags.part.0+0x29/0x1a0
[736992.645006]  ? __check_object_size.part.0+0x11f/0x140
[736992.649268]  ? audit_alloc_name+0x8c/0xe0
[736992.653173]  do_sys_openat2+0x97/0x150
[736992.656832]  __x64_sys_openat+0x54/0x90
[736992.660619]  do_syscall_64+0x33/0x40
[736992.664181]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736992.668543] RIP: 0033:0x7f81dc2d11ae
[736992.672208] RSP: 002b:00007ffcff615c00 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[736992.679509] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f81dc2d11ae
[736992.686481] RDX: 0000000000000042 RSI: 00007f81d757e6b8 RDI: 00000000ffffff9c
[736992.693723] RBP: 00007f81d7ba14c4 R08: 000000000000003a R09: 0000000000000000
[736992.700684] R10: 00000000000001b6 R11: 0000000000000246 R12: 00007f81d757e6b8
[736992.708136] R13: 00007f81d7ba14c4 R14: 00007f81dbfb45b8 R15: 00007f81d7f3f8fc


[736992.715824] task:k               state:D stack:    0 pid:11273 ppid: 29014 flags:0x00000184
[736992.724256] Call Trace:
[736992.727698]  __schedule+0x1f9/0x660
[736992.731616]  schedule+0x46/0xb0
[736992.735367]  percpu_rwsem_wait+0xa9/0x180
[736992.739512]  ? percpu_free_rwsem+0x30/0x30
[736992.743681]  __percpu_down_read+0x49/0x60
[736992.747837]  mnt_want_write+0x66/0x90
[736992.751896]  open_last_lookups+0x30d/0x3f0
[736992.756191]  ? path_init+0x2bd/0x3e0
[736992.760175]  path_openat+0x88/0x1d0
[736992.764055]  do_filp_open+0x88/0x130
[736992.768115]  ? getname_flags.part.0+0x29/0x1a0
[736992.772563]  ? __check_object_size.part.0+0x11f/0x140
[736992.777391]  ? audit_alloc_name+0x8c/0xe0
[736992.781658]  do_sys_openat2+0x97/0x150
[736992.785794]  __x64_sys_openat+0x54/0x90
[736992.789856]  do_syscall_64+0x33/0x40
[736992.793913]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736992.798746] RIP: 0033:0x7fb7d337f1ae
[736992.802878] RSP: 002b:00007ffd9b2dbda0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[736992.810824] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fb7d337f1ae
[736992.818496] RDX: 0000000000000042 RSI: 00007fb7d2e2aa88 RDI: 00000000ffffff9c
[736992.826359] RBP: 00007fb7cecdc7dc R08: 0000000000000008 R09: 0000000000000000
[736992.833753] R10: 00000000000001b6 R11: 0000000000000246 R12: 00007fb7d2e2aa88
[736992.840727] R13: 00007fb7d2e2aa7c R14: 00007fb7d3150920 R15: 0000000000000004


[736992.847580] task:tail            state:D stack:    0 pid:11608 ppid:  9603 flags:0x00004186
[736992.855146] Call Trace:
[736992.858238]  __schedule+0x1f9/0x660
[736992.861943]  schedule+0x46/0xb0
[736992.865387]  percpu_rwsem_wait+0xa9/0x180
[736992.869138]  ? percpu_free_rwsem+0x30/0x30
[736992.873134]  __percpu_down_read+0x49/0x60
[736992.876948]  xfs_trans_alloc+0x15d/0x170
[736992.880854]  xfs_free_eofblocks+0x130/0x1e0
[736992.885085]  xfs_release+0x13d/0x160
[736992.888807]  __fput+0x96/0x240
[736992.892138]  task_work_run+0x5f/0x90
[736992.895834]  do_exit+0x22c/0x3b0
[736992.899189]  ? timerqueue_del+0x1e/0x40
[736992.902967]  do_group_exit+0x33/0xa0
[736992.906469]  get_signal+0x15d/0x5b0
[736993.013638]  arch_do_signal+0x25/0xf0
[736993.017223]  exit_to_user_mode_loop+0x8d/0xc0
[736993.021148]  exit_to_user_mode_prepare+0x6a/0x70
[736993.025371]  syscall_exit_to_user_mode+0x22/0x140
[736993.029417]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736993.033852] RIP: 0033:0x7f907725e6f4
[736993.037391] RSP: 002b:00007ffcf7722f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000023
[736993.044403] RAX: fffffffffffffdfc RBX: 0000000000000001 RCX: 00007f907725e6f4
[736993.051384] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcf7722f40
[736993.058190] RBP: 0000000000000000 R08: 00007ffcf7723070 R09: 00007f9077312e80
[736993.065109] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf7722f40
[736993.071934] R13: 00007ffcf77256bf R14: 0000560396137440 R15: 0000000000000000


[736996.659414] task:process-exporte state:D stack:    0 pid:30396 ppid:  2204 flags:0x00004082
[736996.666720] Call Trace:
[736996.669864]  __schedule+0x1f9/0x660
[736996.673371]  schedule+0x46/0xb0
[736996.676730]  rwsem_down_write_slowpath+0x234/0x4b0
[736996.680991]  unregister_memcg_shrinker.isra.0+0x18/0x40
[736996.685377]  unregister_shrinker+0x7b/0x80
[736996.689295]  deactivate_locked_super+0x29/0xa0
[736996.693241]  cleanup_mnt+0x12d/0x190
[736996.696912]  task_work_run+0x5f/0x90
[736996.700444]  do_exit+0x22c/0x3b0
[736996.703996]  do_group_exit+0x33/0xa0
[736996.707581]  get_signal+0x15d/0x5b0
[736996.711191]  arch_do_signal+0x25/0xf0
[736996.714771]  exit_to_user_mode_loop+0x8d/0xc0
[736996.718759]  exit_to_user_mode_prepare+0x6a/0x70
[736996.722740]  syscall_exit_to_user_mode+0x22/0x140
[736996.726869]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[736996.835174] RIP: 0033:0x4a2c50
[736996.838687] RSP: 002b:000000c000213798 EFLAGS: 00000216 ORIG_RAX: 0000000000000000
[736996.845651] RAX: 0000000000000200 RBX: 000000c000032500 RCX: 00000000004a2c50
[736996.852636] RDX: 0000000000000200 RSI: 000000c0008ef600 RDI: 0000000000000005
[736996.859446] RBP: 000000c0002137e8 R08: 0000000000000000 R09: 0000000000000000
[736996.866347] R10: 0000000000000000 R11: 0000000000000216 R12: ffffffffffffffff
[736996.873124] R13: 000000000000000c R14: 000000000000000b R15: 0000000000000010
Comment 14 Pedram Fard 2021-10-29 02:22:46 UTC
Here is the result of "echo t > /proc/sysrq-trigger":

1 I root       578     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfsalloc]
1 I root       579     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs_mru_cache]
1 I root      1251     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-buf/nvme0n1]
1 I root      1252     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-conv/nvme0n]
1 I root      1253     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-cil/nvme0n1]
1 I root      1254     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-reclaim/nvm]
1 I root      1255     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-eofblocks/n]
1 I root      1256     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-log/nvme0n1]
1 S root      1257     2  0  80   0 -     0 -      Oct20 ?        00:02:06 [xfsaild/nvme0n1]
1 I root      1990     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-buf/dm-0]
1 I root      1991     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-conv/dm-0]
1 I root      1992     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-cil/dm-0]
1 I root      1993     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-reclaim/dm-]
1 I root      1994     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-eofblocks/d]
1 I root      1995     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-log/dm-0]
1 S root      1996     2  0  80   0 -     0 -      Oct20 ?        00:00:00 [xfsaild/dm-0]
1 I root      1999     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-buf/dm-1]
1 I root      2000     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-conv/dm-1]
1 I root      2001     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-cil/dm-1]
1 I root      2002     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-reclaim/dm-]
1 I root      2003     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-eofblocks/d]
1 I root      2004     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-log/dm-1]
1 S root      2005     2  0  80   0 -     0 -      Oct20 ?        00:01:53 [xfsaild/dm-1]
1 I root      2008     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-buf/dm-2]
1 I root      2009     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-conv/dm-2]
1 I root      2010     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-cil/dm-2]
1 I root      2011     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-reclaim/dm-]
1 I root      2012     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-eofblocks/d]
1 I root      2013     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-log/dm-2]
1 S root      2014     2  0  80   0 -     0 -      Oct20 ?        00:00:00 [xfsaild/dm-2]
0 S root      2226  1942  0  80   0 - 30496 -      17:41 ?        00:00:00 /bin/sh -f /usr/sbin/xfs_freeze -u /usr/local/appian
4 D root      2230  2226  0  80   0 - 29825 -      17:41 ?        00:00:00 /usr/sbin/xfs_io -F -r -p xfs_freeze -x -c thaw /usr/local/appian
1 I root      4068     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-buf/dm-3]
1 I root      4069     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-conv/dm-3]
1 I root      4070     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-cil/dm-3]
1 I root      4071     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-reclaim/dm-]
1 I root      4072     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-eofblocks/d]
1 I root      4073     2  0  60 -20 -     0 -      Oct20 ?        00:00:00 [xfs-log/dm-3]
1 S root      4074     2  0  80   0 -     0 -      Oct20 ?        00:02:01 [xfsaild/dm-3]
0 S root     10903 13809  0  80   0 - 29855 -      17:57 pts/0    00:00:00 grep --color=auto xfs
1 I root     11221     2  0  80   0 -     0 -      17:44 ?        00:00:00 [kworker/1:4-xfs]
4 D root     22209     1  0  80   0 - 29825 -      16:08 ?        00:00:00 /usr/sbin/xfs_io -F -r -p xfs_freeze -x -c thaw /usr/local/appian
1 I root     23187     2  0  80   0 -     0 -      17:37 ?        00:00:00 [kworker/1:3-xfs]
1 I root     30083     2  0  80   0 -     0 -      17:29 ?        00:00:00 [kworker/1:1-xfs]
Comment 15 Christian Theune 2021-11-09 05:19:05 UTC
@Pedram: not sure those are related. I'm 50/50 on whether my issue is freeze-related at all, and all my stack traces have xfs_log_commit_cil in them, which I don't see in yours; this could be a separate issue.
Comment 16 Christian Theune 2021-11-10 15:16:35 UTC
I started trying out the fix that Dave mentioned and am using it with 5.10.76 (it applied cleanly with a bit of fuzz).

@Dave: do you happen to know whether there's a helper that can stress test live systems in this regard?
Comment 17 Dave Chinner 2021-11-20 22:31:53 UTC
On Wed, Nov 10, 2021 at 03:16:35PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214767
> 
> --- Comment #16 from Christian Theune (ct@flyingcircus.io) ---
> I started trying out the fix that Dave and am using it with 5.10.76 (applied
> clean with a bit of fuzzing).
> 
> @Dave do you happen do know whether there's a helper that can stress test
> live
> systems in this regard?

fstests has some tests in the "freeze" group that stress
freeze/thaw. It also has lots of other tests that will tell you
if there's a regression in your backport, so it might be an idea to
run a full "auto" group pass rather than just the freeze group....

Cheers,

Dave.
