Bug 218704

Summary: WQ_MEM_RECLAIM xprtiod:xprt_autoclose is flushing !WQ_MEM_RECLAIM events_highpri:rpcrdma_mr_refresh_worker
Product: File System
Reporter: Chuck Lever (cel)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: NEW
Severity: normal    
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description Chuck Lever 2024-04-10 16:32:53 UTC
Over the past year or so, I've heard several reports of this workqueue splat during transport disconnect:

workqueue: WQ_MEM_RECLAIM xprtiod:xprt_autoclose [sunrpc] is flushing !WQ_MEM_RECLAIM events_highpri:rpcrdma_mr_refresh_worker [rpcrdma]
WARNING: CPU: 1 PID: 20378 at kernel/workqueue.c:3728 check_flush_dependency+0x101/0x120

 ? check_flush_dependency+0x101/0x120
 ? report_bug+0x175/0x1a0
 ? handle_bug+0x44/0x90
 ? exc_invalid_op+0x1c/0x70
 ? asm_exc_invalid_op+0x1f/0x30
 ? __pfx_rpcrdma_mr_refresh_worker+0x10/0x10 [rpcrdma aefd3d1b298311368fa14fa93ae5fb3818c3aeac]
 ? check_flush_dependency+0x101/0x120
 __flush_work.isra.0+0x20a/0x290
 __cancel_work_sync+0x129/0x1c0
 cancel_work_sync+0x14/0x20
 rpcrdma_xprt_disconnect+0x229/0x3f0 [rpcrdma aefd3d1b298311368fa14fa93ae5fb3818c3aeac]
 xprt_rdma_close+0x16/0x40 [rpcrdma aefd3d1b298311368fa14fa93ae5fb3818c3aeac]
 xprt_autoclose+0x63/0x110 [sunrpc a04d701bce94b5a8fb541cafbe1a489d6b1ab5b3]
 process_one_work+0x19e/0x3f0
 worker_thread+0x340/0x510
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xf7/0x130
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x41/0x60
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

Though alarming, it's relatively harmless. xprtiod is a WQ_MEM_RECLAIM work queue because reconnecting to an NFS server might be necessary to handle a direct reclaim. The MR refresh worker uses a !WQ_MEM_RECLAIM work queue because the RDMA core does not yet implement MEM_RECLAIM safety.
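
To make the condition behind the warning concrete, here is a small userspace model of the check, as I understand it from kernel/workqueue.c. It is a sketch, not kernel code: the struct, the flag values, and the function name flush_would_warn() are all made up for illustration, and the separate PF_MEMALLOC check that the real check_flush_dependency() also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; the real ones are in include/linux/workqueue.h. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)
#define MODEL_WQ_LEGACY      (1u << 1)

/* Hypothetical stand-in for struct workqueue_struct. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Return true if a worker running on current_wq would trigger the
 * "WQ_MEM_RECLAIM ... is flushing !WQ_MEM_RECLAIM ..." warning when it
 * flushes or cancels work on target_wq. current_wq is NULL when the
 * caller is not a workqueue worker.
 */
static bool flush_would_warn(const struct model_wq *current_wq,
			     const struct model_wq *target_wq)
{
	assert(target_wq != NULL);

	/* Waiting on a reclaim-safe queue is always allowed. */
	if (target_wq->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* The caller is not a workqueue worker, so this check does not apply. */
	if (current_wq == NULL)
		return false;

	/* A non-legacy reclaim worker is waiting on a non-reclaim queue. */
	return (current_wq->flags & (MODEL_WQ_MEM_RECLAIM | MODEL_WQ_LEGACY))
		== MODEL_WQ_MEM_RECLAIM;
}
```

With xprtiod modeled as a MODEL_WQ_MEM_RECLAIM queue and events_highpri as a queue with no flags, flush_would_warn() returns true only when xprtiod is the caller and events_highpri is the target, which is the case the splat reports.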

To address this splat in the short term, the work of releasing hardware-related resources in rpcrdma_xprt_disconnect() can be deferred to a !WQ_MEM_RECLAIM work queue. We might accomplish that by moving the req, rep, and sendctx data structures to struct rpcrdma_ep, and then releasing that structure during connection tear-down via a normal work queue item or via an RCU-controlled delay.

Comment 1 Chuck Lever 2024-04-13 16:29:47 UTC
Because rpcrdma_ep_destroy() is called directly by rpcrdma_xprt_disconnect(), it's also a potential source of these splats (it invokes RDMA core API functions, which don't generally tolerate being called in a MEM_RECLAIM context).

Perhaps the first step, then, is to move ep_destroy to a deferred context.
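
A rough userspace model of that deferral, to show the shape of the change rather than a patch: the disconnect path only queues a work item on a non-reclaim queue and returns, and the endpoint is destroyed later when that queue runs. Every name here (sketch_wq, sketch_ep, sketch_xprt_disconnect(), and so on) is hypothetical; the real code would use the kernel workqueue API and would also need to keep the endpoint alive until the work item has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct work_struct. */
struct sketch_work {
	void (*func)(struct sketch_work *work);
	struct sketch_work *next;
};

/* Hypothetical stand-in for a workqueue: a simple FIFO of pending work. */
struct sketch_wq {
	bool mem_reclaim;
	struct sketch_work *head;
	struct sketch_work *tail;
};

/* Hypothetical stand-in for struct rpcrdma_ep. */
struct sketch_ep {
	struct sketch_work destroy_work;
	bool destroyed;
};

/* Append a work item to the queue without running it. */
static void sketch_queue_work(struct sketch_wq *wq, struct sketch_work *work)
{
	work->next = NULL;
	if (wq->tail)
		wq->tail->next = work;
	else
		wq->head = work;
	wq->tail = work;
}

/* Run everything pending on the queue, as that queue's worker would. */
static void sketch_run_pending(struct sketch_wq *wq)
{
	struct sketch_work *work;

	while ((work = wq->head) != NULL) {
		wq->head = work->next;
		if (!wq->head)
			wq->tail = NULL;
		work->func(work);
	}
}

/* Work function: recover the endpoint from its embedded work item. */
static void sketch_ep_destroy_worker(struct sketch_work *work)
{
	struct sketch_ep *ep = (struct sketch_ep *)
		((char *)work - offsetof(struct sketch_ep, destroy_work));

	ep->destroyed = true;	/* stands in for rpcrdma_ep_destroy() */
}

/* Disconnect path: schedule the teardown and return without waiting. */
static void sketch_xprt_disconnect(struct sketch_ep *ep,
				   struct sketch_wq *teardown_wq)
{
	/* The teardown must land on a !WQ_MEM_RECLAIM queue. */
	assert(!teardown_wq->mem_reclaim);

	ep->destroy_work.func = sketch_ep_destroy_worker;
	sketch_queue_work(teardown_wq, &ep->destroy_work);
}
```

The point of the model is that sketch_xprt_disconnect() never waits on the non-reclaim queue, so the reclaim-context caller has nothing to flush. After the call the endpoint is still intact; it is destroyed only once sketch_run_pending() processes the teardown queue.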