Several reporters note that after commit c1ccfcf1a9bf ("NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"), NFSD's callback work queue is blocked when one of the clients is unresponsive. We know that NFSD's callback_wq is single-threaded (ordered), and that there is only one WQ for all of the NFS server's clients. What blocks callback operations is the retry loop in nfsd4_run_cb_work(). It was added to ensure that CB_OFFLOAD operations are delivered reliably, but it causes head-of-queue blocking when any NFS client becomes unresponsive when a callback operation is pending. We've partially addressed this by giving each lease its own callback_wq. However it's clear that retrying callback operations from within the callback WQ is going to be problematic to some extent. The solution is to hoist the responsibility for retrying higher up into the individual implementations of the callback operations (CB_RECALL, CB_NOTIFY_LOCK, CB_OFFLOAD, and so on), since each of these operations has their own needs in terms of recourse when a callback operation cannot be sent.
A little code audit: 1 fs/nfsd/nfs4layouts.c 739 static const struct nfsd4_callback_ops nfsd4_cb_layout_ops = { 2 fs/nfsd/nfs4proc.c 1622 static const struct nfsd4_callback_ops nfsd4_cb_offload_ops = { 6 fs/nfsd/nfs4state.c 399 static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { 7 fs/nfsd/nfs4state.c 3079 static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = { 8 fs/nfsd/nfs4state.c 3084 static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops = { 9 fs/nfsd/nfs4state.c 5182 static const struct nfsd4_callback_ops nfsd4_cb_recall_ops = { We have these five callback operations to deal with. I think the ->release nfsd4_callback_ops method might be used to schedule retry -- it's invoked by nfsd41_destroy_cb(), which should be able to tell whether a reply has been received. Now I just need to figure out how to keep a record of needing to resend a callback.
The retry loop in this case is requeueing the work using queue_delayed_work. That shouldn't be blocking jobs with shorter delays that are sitting on the same queue. Are you certain that's the case? That sounds like a bug in the workqueue implementation if so.
Agreed, the queue_delayed_work() isn't working the way I expected, but it may behave differently with an ordered work queue than it does with a bog standard work queue instance. In any event, some CB operations can be "fire and forget" while others will want some recourse on failure-to-send, and at least CB_OFFLOAD needs to be as reliable as we can make it. Thus having specific retry handlers for each CB operation seems like the best long-term approach.
Commit c1ccfcf1a9bf ("NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down") was reverted from v6.9-rc to prevent an unresponsive client from backing up callbacks from all clients. In addition, I'm planning to prototype an implementation of OFFLOAD_STATUS for the Linux NFS client so the COPY operations don't hang if the CB_OFFLOAD gets lost.