On recent v6.1.y, intermittently, NFSD threads wait forever for NFSv4 callback shutdown. The wait is in __flush_workqueue(). A server system reboot is necessary to recover. On newer kernels the symptoms are similar, but the indefinite wait is in the "destroy client" path, waiting for NFSv4 callback shutdown; the wait is on the wait_var_event() in nfsd41_cb_inflight_wait_complete(). In some cases, clients suspend (inactivity) and the server converts them to courteous clients. The NFSv4 callback shutdown workqueue item for that client appears to be stuck waiting in rpc_shutdown_client(). Let's collect data under this bug report.
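For readers following along: the wait_var_event() named above is part of a common kernel pattern, a per-client counter of in-flight callbacks that the shutdown path waits to drain. Below is a minimal illustrative sketch of that pattern only, not NFSD source; the struct and helper names are hypothetical, while wait_var_event()/wake_up_var() and the atomic helpers are real kernel APIs.

#include <linux/atomic.h>
#include <linux/wait_bit.h>

struct cb_client {
	atomic_t cb_inflight;	/* callbacks queued but not yet finished */
};

static void cb_inflight_begin(struct cb_client *clp)
{
	atomic_inc(&clp->cb_inflight);
}

static void cb_inflight_end(struct cb_client *clp)
{
	/* the last completion wakes any waiter in cb_inflight_wait() */
	if (atomic_dec_and_test(&clp->cb_inflight))
		wake_up_var(&clp->cb_inflight);
}

static void cb_inflight_wait(struct cb_client *clp)
{
	/* blocks forever if some callback never reaches cb_inflight_end() */
	wait_var_event(&clp->cb_inflight, !atomic_read(&clp->cb_inflight));
}

If any callback work item fails to drop the counter, the waiter above never wakes, which matches the symptom described in this report.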
Chuck Lever <chuck.lever@oracle.com> writes:

I've created a bugzilla.kernel.org report (https://bugzilla.kernel.org/show_bug.cgi?id=219710#c0) so we can collect data in a common place. New comments on that bug are reflected to linux-nfs@vger.kernel.org. Instead of sending attachments to me or the list, please attach your data to this bug.

(via https://msgid.link/70ba83c5-5104-4f8d-bd18-95dcc8c82551@oracle.com)
During server operation, you can capture trace events that might lead up to the hang with this command:

# trace-cmd record -e nfsd:nfsd_cb_\* -e sunrpc:svc_xprt_detach -p function -l nfsd4_destroy_session

These events are infrequent and tracing is lightweight, so this capture won't be a burden. When a server hang is observed, an "INFO: task ... blocked for more than 120 seconds" message typically appears in the server's system journal. Basic collection instructions are:

# rpcdebug -m rpc -c && echo t > /proc/sysrq-trigger

Then stop the trace, package trace.dat and the system journal, and attach them to this bug.
v6.1 symptoms: The following stack trace shows that the active callback workqueue item is stuck in rpc_shutdown_client(), waiting for RPC operations to complete. Because the callback workqueue is single-threaded (max_active = 1) and this work item never completes, it prevents subsequent flush_workqueue() calls on the callback workqueue from completing.

> [7257352.347503] task:kworker/u32:6 state:D stack:0 pid:173983 ppid:2 flags:0x00004000
> [7257352.347511] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> [7257352.347568] Call Trace:
> [7257352.347571] <TASK>
> [7257352.347577] __schedule+0x34d/0x9e0
> [7257352.347589] schedule+0x5a/0xd0
> [7257352.347597] schedule_timeout+0x94/0x150
> [7257352.347606] ? __bpf_trace_tick_stop+0x10/0x10
> [7257352.347616] rpc_shutdown_client+0xf2/0x150 [sunrpc]
> [7257352.347683] ? cpuusage_read+0x10/0x10
> [7257352.347694] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
> [7257352.347763] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
> [7257352.347812] process_one_work+0x1c7/0x380
> [7257352.347824] worker_thread+0x4d/0x380
> [7257352.347835] ? rescuer_thread+0x3a0/0x3a0
> [7257352.347843] kthread+0xda/0x100
> [7257352.347849] ? kthread_complete_and_exit+0x20/0x20
> [7257352.347859] ret_from_fork+0x22/0x30
> [7257352.347875] </TASK>

Based on the full stack trace output, there do not appear to be any outstanding RPC operations.
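To make the stall concrete: with max_active = 1, a single work item that never returns prevents every later flush_workqueue() on that workqueue from completing. The toy module below reproduces just that mechanism; it is not NFSD code, and every name in it is made up.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static struct workqueue_struct *demo_wq;

static void stuck_fn(struct work_struct *work)
{
	for (;;)		/* stand-in for a wait that never completes */
		ssleep(1);
}
static DECLARE_WORK(stuck_work, stuck_fn);

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo_cb", 0, 1);	/* max_active = 1 */
	if (!demo_wq)
		return -ENOMEM;
	queue_work(demo_wq, &stuck_work);
	flush_workqueue(demo_wq);	/* blocks forever behind stuck_fn() */
	return 0;			/* never reached */
}
module_init(demo_init);

MODULE_LICENSE("GPL");

This mirrors the stack above: nfsd4_run_cb_work() is the active item stuck in rpc_shutdown_client(), and every thread that later flushes the callback workqueue queues up behind it.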
origin/linux-6.1.y is missing this commit:

961b4b5e86bf ("NFSD: Reset cb_seq_status after NFS4ERR_DELAY")

which is related to the problem reported in this bug, though it might not be a fix. The nfsd-6.1.y branch below is based on 6.1.126 and has 961b4b5e86bf applied to it for easy testing:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-6.1.y
Created attachment 307511 [details] Recorded trace 20250120 (syslog + trace-cmd)

Sorry, no rpcdebug logs this time. I will try to record them at the next crash.
I looked at this with Chuck the other day. As far as wait_var_event() getting stuck, I think all that would take is for nfsd4_cb_sequence_done() to continually set cb_need_restart on every call. That would cause the callback to not be destroyed and to never call nfsd41_cb_inflight_end(). That happens in the need_restart: label in nfsd4_cb_sequence_done. These two cases goto that:

	if (!clp->cl_minorversion) {
		/*
		 * If the backchannel connection was shut down while this
		 * task was queued, we need to resubmit it after setting up
		 * a new backchannel connection.
		 *
		 * Note that if we lost our callback connection permanently
		 * the submission code will error out, so we don't need to
		 * handle that case here.
		 */
		if (RPC_SIGNALLED(task))
			goto need_restart;

		return true;
	}

	if (cb->cb_held_slot < 0)
		goto need_restart;

It doesn't seem likely that it somehow lost the slot, so my guess is that the RPC task is continually returning with RPC_SIGNALLED() set. Question for Baptiste -- what NFS versions are your clients using?
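A toy illustration of the loop described above (not NFSD source; every identifier here is hypothetical): if the completion path asks for a restart on every pass, the work item keeps requeueing itself and the inflight count is never dropped, so the wait_var_event() in the shutdown path never wakes.

#include <linux/types.h>
#include <linux/container_of.h>
#include <linux/workqueue.h>
#include <linux/atomic.h>
#include <linux/wait_bit.h>

struct demo_cb {
	struct work_struct	work;
	bool			need_restart;
	atomic_t		*inflight;	/* owned by the client */
};

static struct workqueue_struct *demo_cb_wq;

static void demo_run_cb_work(struct work_struct *work)
{
	struct demo_cb *cb = container_of(work, struct demo_cb, work);

	if (cb->need_restart) {		/* set again on every completion ... */
		queue_work(demo_cb_wq, &cb->work);	/* ... so we loop forever */
		return;			/* never reaches the release below */
	}

	/* normal teardown: drop the inflight count and wake the waiter */
	if (atomic_dec_and_test(cb->inflight))
		wake_up_var(cb->inflight);
}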
The trace captures I've reviewed suggest that a callback session is in use, so I would say the NFS minor version is 1 or higher. Perhaps it's not the RPC_SIGNALLED test above that is the problem, but the one later in nfsd4_cb_sequence_done().
Hello. My clients are all v4.2, using the Debian stable kernel 6.1.0-29-amd64 (= 6.1.123-1). If, after checking my server trace, you suspect one of my clients of doing "wrong things", give me its IP and I can send you back the client syslog.

In my case, one point seems important: the hang occurs almost exactly once per week, but not always on the same day of the week (depending on when I reboot the server). The greater the server uptime, the greater the probability of seeing some "unrecognized reply" messages in syslog. I always see one or two "unrecognized reply" messages around 120 seconds before the hang message.

So could it be something that happens in client or server weekly jobs? Or maybe some memory leak or cache corruption? Or something related to an expired Kerberos cache file? Or an expired NFS session? ...

It also seems that the number of nfsd_cb_recall_any callback messages increases with the server uptime. This seems to favor the memory leak hypothesis. Regards, Baptiste.
(In reply to Baptiste PELLEGRIN from comment #8)
> I always see one or two "unrecognized reply" messages around 120 seconds
> before the hang message.
> ...
> It also seems that the number of nfsd_cb_recall_any callback messages
> increases with the server uptime. This seems to favor the memory leak
> hypothesis.

The server generates a CB_RECALL_ANY message for each active client. If the number of active clients increases from zero at server boot time to a few dozen, that would also explain why you see more of these over time.

If your NFS server does not also have NFS mount points, a few client-side trace points can be enabled to capture more details about NFSv4 callback activity. "-e sunrpc:xprt_reserve" for example would help us match the XIDs in the callback operations to the messages you see in the server's system journal.
(In reply to Chuck Lever from comment #7)
> The trace captures I've reviewed suggest that a callback session is in use,
> so I would say the NFS minor version is 1 or higher. Perhaps it's not the
> RPC_SIGNALLED test above that is the problem, but the one later in
> nfsd4_cb_sequence_done().

Ok, good. Knowing that it's not v4.0 allows us to rule out some codepaths. There are a couple of other cases where we goto need_restart:

The NFS4ERR_BADSESSION case does this, and also if it doesn't get a reply at all (case 1).

There is also this that looks a little sketchy:

------------8<-------------------
	trace_nfsd_cb_free_slot(task, cb);
	nfsd41_cb_release_slot(cb);

	if (RPC_SIGNALLED(task))
		goto need_restart;
out:
	return ret;
retry_nowait:
	if (rpc_restart_call_prepare(task))
		ret = false;
	goto out;
need_restart:
	if (!test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags)) {
		trace_nfsd_cb_restart(clp, cb);
		task->tk_status = 0;
		cb->cb_need_restart = true;
	}
	return false;
------------8<-------------------

Probably not the same bug, but it looks like if RPC_SIGNALLED returns true, then it'll restart the RPC after releasing the slot. It seems like that could break the reply cache handling, as the restarted call could be on a different slot. I'll look at patching that, at least, though I'm not sure it's related to the hang.

More notes. The only way RPC_TASK_SIGNALLED gets set is:

nfsd4_process_cb_update()
  rpc_shutdown_client()
    rpc_killall_tasks()

That gets called if:

	if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
		nfsd4_process_cb_update(cb);

Which means that NFSD4_CLIENT_CB_UPDATE was probably set? NFSD4_CLIENT_CB_KILL seems less likely since that would nerf the cb_need_restart handling.
Tom Talpey <tom@talpey.com> replies to comment #10: On 1/21/2025 12:35 PM, Jeff Layton via Bugspray Bot wrote:
> Ok, good. Knowing that it's not v4.0 allows us to rule out some codepaths.
> There are a couple of other cases where we goto need_restart:
>
> The NFS4ERR_BADSESSION case does this, and also if it doesn't get a reply at
> all (case 1).

Note that one thread in Benoît's recent logs is stuck in nfsd4_bind_conn_to_session(), and three threads also in nfsd4_destroy_session(), so there is certainly some session/connection dance going on. Combining an invalid replay cache entry could easily make things worse.

There's also one thread in nfsd4_destroy_clientid(), which seems important, but odd. And finally, the laundromat is running. No shortage of races!

Tom.

(via https://msgid.link/cf8650cb-1d2e-4771-981a-d66d2c455637@talpey.com)
Chuck Lever <chuck.lever@oracle.com> replies to comment #11: On 1/21/25 2:38 PM, Tom Talpey wrote:
> Note that one thread in Benoît's recent logs is stuck in
> nfsd4_bind_conn_to_session(), and three threads also in
> nfsd4_destroy_session(), so there is certainly some
> session/connection dance going on. Combining an invalid
> replay cache entry could easily make things worse.

Yes, the client returns RETRY_UNCACHED_REP for some of the backchannel operations. NFSD never asserts cachethis in CB_SEQUENCE. I'm trying to understand why NFSD would skip incrementing its slot sequence number.

> There's also one thread in nfsd4_destroy_clientid(), which
> seems important, but odd. And finally, the laundromat is
> running. No shortage of races!

The hangs are all related here: they are waiting for flush_workqueue() on the callback workqueue. In v6.1, there is only one callback_wq and its max_active is 1. If the current work item hangs, then that workqueue stalls.

(via https://msgid.link/5243289d-12d3-403b-847d-491d9fe66af4@oracle.com)
(In reply to Chuck Lever from comment #9)
> The server generates a CB_RECALL_ANY message for each active client. If the
> number of active clients increases from zero at server boot time to a few
> dozen, that would also explain why you see more of these over time.

Does this mean that something can stay active overnight? In my case, no clients are running outside opening hours. They are all suspended or powered off.

> "-e sunrpc:xprt_reserve" for example would help us match the XIDs in the
> callback operations to the messages you see in the server's system journal.

I will try to help you as much as I can. It is not really a problem for me to run trace-cmd on all clients, as they are all managed with Puppet. I will try as soon as possible.

Can you confirm that I need to run "trace-cmd record -e sunrpc:xprt_reserve" on all my clients? No other flags?

Did you see that I added the "-e sunrpc:svc_xprt_detach -p function -l nfsd4_destroy_session" flags in my last recorded trace? Do I need to add new ones for the next crash?

I will also send you the dump of current kernel tasks next time. Regards, Baptiste.
Chuck Lever <chuck.lever@oracle.com> replies to comment #13: On 1/22/25 6:40 AM, Baptiste PELLEGRIN via Bugspray Bot wrote:
> Can you confirm that I need to run "trace-cmd record -e sunrpc:xprt_reserve"
> on all my clients? No other flags?

Don't change the clients. The additional command line option above goes on the existing server trace-cmd, but only if the NFS server system does not have any NFS mounts of its own. I'm trying to keep the amount of generated trace data to a minimum.

The point here is to record the XIDs of each of the backchannel operations. These XIDs will show up in the "unrecognized XID" messages on your server.

(via https://msgid.link/10abf17a-b996-4881-9cf1-675fb0f5f9e8@oracle.com)
Created attachment 307527 [details] Logs and traces from Jan-18 pt1

Here are the traces from two NFS crashes that occurred this past weekend. Both occurred in the AM (US time) on Jan 18, a few hours apart from one another. I followed the instructions I found on the various threads. There was no output from `rpcdebug -m rpc -c`; I'm not sure what I did wrong there. The syslog ought to contain the output of sysrq-trigger, however.

The output from trace-cmd captures several days' worth of events in either case, but not from system boot. I have cut the syslogs from about one hour before the incident until the server finished shutting down prior to reboot, and I have removed the output of other services.

Both servers are VMs on GCE running the 6.1.119 kernel from Debian bookworm (6.1.0-28), with ~60 GiB of memory and 16 CPUs.
Created attachment 307528 [details] Logs and traces from Jan-18 pt2 Part 2 of previous
(In reply to Chuck Lever from comment #0)
> On recent v6.1.y, intermittently, NFSD threads wait forever for NFSv4
> callback shutdown. The wait is in __flush_workqueue(). A server system
> reboot is necessary to recover.
>
> On newer kernels the symptoms are similar, but the indefinite wait is in the
> "destroy client" path, waiting for NFSv4 callback shutdown; the wait is on
> the wait_var_event() in nfsd41_cb_inflight_wait_complete().

Hi, I've had a similar problem recently, and I've done some analysis.

[ 6526.031343] INFO: task bash:846259 blocked for more than 606 seconds.
[ 6526.032060]       Not tainted 6.6.0-gfbf24d352c28-dirty #22
[ 6526.032635] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6526.033404] task:bash            state:D stack:0     pid:846259 ppid:838395 flags:0x0000020d
[ 6526.034226] Call trace:
[ 6526.034527]  __switch_to+0x218/0x3e0
[ 6526.034925]  __schedule+0x734/0x11a8
[ 6526.035323]  schedule+0xa8/0x200
[ 6526.035731]  nfsd4_shutdown_callback+0x24c/0x2f0
[ 6526.036228]  __destroy_client+0x414/0x680
[ 6526.036663]  nfs4_state_destroy_net+0x144/0x448
[ 6526.037152]  nfs4_state_shutdown_net+0x2c8/0x450
[ 6526.037640]  nfsd_shutdown_net+0x100/0x2e0
[ 6526.038078]  nfsd_last_thread+0x190/0x330
[ 6526.038518]  nfsd_svc+0x3cc/0x4a0
[ 6526.038892]  write_threads+0x15c/0x2f0
[ 6526.039301]  nfsctl_transaction_write+0x90/0xd0
[ 6526.039836]  vfs_write+0x110/0x688
[ 6526.040221]  ksys_write+0xd0/0x188
[ 6526.040607]  __arm64_sys_write+0x4c/0x68
[ 6526.041035]  invoke_syscall+0x68/0x198
[ 6526.041455]  el0_svc_common.constprop.0+0x11c/0x150
[ 6526.041967]  do_el0_svc+0x38/0x50
[ 6526.042353]  el0_svc+0x5c/0x240
[ 6526.042723]  el0t_64_sync_handler+0x100/0x130
[ 6526.043186]  el0t_64_sync+0x188/0x190
[ 6526.051007] INFO: task cat:846265 blocked for more than 606 seconds.

1) Check cl_cb_inflight

crash> nfs4_client.cl_cb_inflight ffff000012338f08
  cl_cb_inflight = {
    counter = 1
  },
crash>

2) No work is associated with nfsd

Only two work items are queued, neither related to NFSD.

crash> p callback_wq
callback_wq = $1 = (struct workqueue_struct *) 0xffff0000c30a1400
crash>
crash> workqueue_struct.cpu_pwq 0xffff0000c30a1400
  cpu_pwq = 0xccfe9cb5d8d0
crash> kmem -o
PER-CPU OFFSET VALUES:
  CPU 0: ffff2f015341c000
  CPU 1: ffff2f0153442000
  CPU 2: ffff2f0153468000
  CPU 3: ffff2f015348e000
crash>
// ffff2f015341c000 + ccfe9cb5d8d0 = FFFFFBFFEFF798D0
crash> rd FFFFFBFFEFF798D0
fffffbffeff798d0:  ffff0000d3488d00    ..H.....
crash>
// ffff2f0153442000 + ccfe9cb5d8d0 = FFFFFBFFEFF9F8D0
crash> rd FFFFFBFFEFF9F8D0
fffffbffeff9f8d0:  ffff0000d3488d00    ..H.....
crash>
// ffff2f0153468000 + ccfe9cb5d8d0 = FFFFFBFFEFFC58D0
crash> rd FFFFFBFFEFFC58D0
fffffbffeffc58d0:  ffff0000d3488d00    ..H.....
crash>
// ffff2f015348e000 + ccfe9cb5d8d0 = FFFFFBFFEFFEB8D0
crash> rd FFFFFBFFEFFEB8D0
fffffbffeffeb8d0:  ffff0000d3488d00    ..H.....
crash>
crash> pool_workqueue.pool ffff0000d3488d00
  pool = 0xffff0000c01b6800,
crash>
crash> worker_pool.worklist 0xffff0000c01b6800
  worklist = {
    next = 0xffff0000c906c4a8,
    prev = 0xffffd0ff8944fc68 <stats_flush_dwork+8>
  },
crash>
crash> list 0xffff0000c906c4a8
ffff0000c906c4a8
ffffd0ff8944fc68
ffff0000c01b6860
crash>
crash> work_struct.func ffff0000c906c4a0
  func = 0xffffd0ff84fae128 <wb_update_bandwidth_workfn>,
crash> work_struct.func 0xffffd0ff8944fc60
  func = 0xffffd0ff8510b258 <flush_memcg_stats_dwork>,
crash>

3) No running kworkers

I checked the vmcore with "foreach bt" and found that all kworkers look like the following.
PID: 62     TASK: ffff0000c31d0040  CPU: 1  COMMAND: "kworker/R-nfsio"
 #0 [ffff800080c27b80] __switch_to at ffffd0ff866297dc
 #1 [ffff800080c27bd0] __schedule at ffffd0ff8662a180
 #2 [ffff800080c27d00] schedule at ffffd0ff8662ac9c
 #3 [ffff800080c27d40] rescuer_thread at ffffd0ff84b418e4
 #4 [ffff800080c27e60] kthread at ffffd0ff84b52e14

PID: 94     TASK: ffff0000c74ba080  CPU: 0  COMMAND: "kworker/0:1H"
 #0 [ffff800080e07c00] __switch_to at ffffd0ff866297dc
 #1 [ffff800080e07c50] __schedule at ffffd0ff8662a180
 #2 [ffff800080e07d80] schedule at ffffd0ff8662ac9c
 #3 [ffff800080e07dc0] worker_thread at ffffd0ff84b40f94
 #4 [ffff800080e07e60] kthread at ffffd0ff84b52e14

4) Check works related to nfsd4_run_cb_work

crash> p nfsd4_run_cb_work
nfsd4_run_cb_work = $5 = {void (struct work_struct *)} 0xffffd0ff855691e0 <nfsd4_run_cb_work>
crash> search ffffd0ff855691e0
ffff000010474138: ffffd0ff855691e0
ffff0000104750f8: ffffd0ff855691e0
ffff0000104752f0: ffffd0ff855691e0
ffff0000104756e0: ffffd0ff855691e0
ffff000012338388: ffffd0ff855691e0
ffff000012339288: ffffd0ff855691e0
ffff00001233a908: ffffd0ff855691e0
ffff00001233b808: ffffd0ff855691e0
ffff0000c745d038: ffffd0ff855691e0
ffff0000c86499f8: ffffd0ff855691e0
ffff0000c8649b30: ffffd0ff855691e0
ffff0000c9ff8dc8: ffffd0ff855691e0
crash>

ffff000010474138 --> (work) ffff000010474120
ffff0000104750f8 --> (work) ffff0000104750e0
ffff0000104752f0 --> (work) ffff0000104752d8
ffff0000104756e0 --> (work) ffff0000104756c8
ffff000012338388 --> (work) ffff000012338370
ffff000012339288 --> (work) ffff000012339270
ffff00001233a908 --> (work) ffff00001233a8f0
ffff00001233b808 --> (work) ffff00001233b7f0
ffff0000c745d038 --> (work) ffff0000c745d020
ffff0000c86499f8 --> (work) ffff0000c86499e0
ffff0000c8649b30 --> (work) ffff0000c8649b18
ffff0000c9ff8dc8 --> (work) ffff0000c9ff8db0

crash> work_struct.data ffff0000104750e0
  data = {
    counter = 68719476704   // FFFFFFFE0, bits 0~5 are 0
  },
crash>
crash> work_struct.data ffff0000c9ff8db0
  data = {
    counter = 256   // 0x100
  },
crash>

I have added some debug information and am trying to reproduce it. Could you please share any additional information you have, or any suggestions? Thanks.
There is another scenario that could explain a hang here. From nfsd4_cb_sequence_done():

------------------8<---------------------
	case -NFS4ERR_BADSLOT:
		goto retry_nowait;
	case -NFS4ERR_SEQ_MISORDERED:
		if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
			session->se_cb_seq_nr[cb->cb_held_slot] = 1;
			goto retry_nowait;
		}
		break;
	default:
		nfsd4_mark_cb_fault(cb->cb_clp);
	}
	trace_nfsd_cb_free_slot(task, cb);
	nfsd41_cb_release_slot(cb);

	if (RPC_SIGNALLED(task))
		goto need_restart;
out:
	return ret;
retry_nowait:
	if (rpc_restart_call_prepare(task))
		ret = false;
	goto out;
------------------8<---------------------

Since it doesn't check RPC_SIGNALLED in the v4.1+ case until very late in the function, it's possible to get a BADSLOT or SEQ_MISORDERED error that causes the callback client to immediately resubmit the rpc_task to the RPC engine without resubmitting to the callback workqueue.

I think that we should assume that when RPC_SIGNALLED returns true the result is suspect, and that we should halt further processing of the CB_SEQUENCE response and restart the callback.
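For illustration only, here is a rough sketch of the reordering Jeff is suggesting: check RPC_SIGNALLED before interpreting the CB_SEQUENCE status, so a BADSLOT/SEQ_MISORDERED result can no longer push a signalled task straight back into the RPC engine. This is not an actual patch; the function name and the need_restart flag are hypothetical stand-ins, and see Chuck's caveat in the next comment about what RPC_SIGNALLED actually indicates.

#include <linux/types.h>
#include <linux/sunrpc/sched.h>

static bool demo_cb_sequence_done(struct rpc_task *task, bool *need_restart)
{
	/* Checked first, not after the status switch: a signalled task is
	 * requeued through the callback workqueue rather than restarted
	 * directly in the RPC engine. */
	if (RPC_SIGNALLED(task)) {
		*need_restart = true;
		return false;
	}

	/* ... the existing cb_seq_status handling would run only past this
	 * point ... */
	return true;
}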
Chuck Lever <chuck.lever@oracle.com> replies to comment #18: On 1/23/25 8:50 AM, Jeff Layton via Bugspray Bot wrote:
> Since it doesn't check RPC_SIGNALLED in the v4.1+ case until very late in the
> function, it's possible to get a BADSLOT or SEQ_MISORDERED error that causes
> the callback client to immediately resubmit the rpc_task to the RPC engine
> without resubmitting to the callback workqueue.
>
> I think that we should assume that when RPC_SIGNALLED returns true the
> result is suspect, and that we should halt further processing of the
> CB_SEQUENCE response and restart the callback.

When cb->cb_seq_status is set to any value other than 1, that means the client replied successfully. RPC_SIGNALLED has nothing to do with whether a reply is suspect; it means only that the rpc_clnt has been asked to terminate.

The potential loop you noticed is concerning, but I haven't seen evidence in the "echo t > sysrq-trigger" output that there is a running RPC such as you described here.

(via https://msgid.link/2f9248c9-77c9-4a99-82db-6cedfb3161b9@oracle.com)
Created attachment 307533 [details] Recorded trace 20250123

This time I have got a complete trace:

- trace-cmd with all flags enabled: trace-cmd record -e nfsd:nfsd_cb_\* -e sunrpc:svc_xprt_detach -e sunrpc:xprt_reserve -p function -l nfsd4_destroy_session
- Syslog with all kernel messages and a task dump just after the crash.

Hope this helps.
An RPC task has shown up in the "rpcdebug" output this time:

kernel: [262696.214194] -pid- flgs status -client- --rqstp- -timeout ---ops--
kernel: [262696.214199] 112 2281 0 a6ceb575 0 357 83bde67e nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq

Here is a CB_RECALL_ANY operation that is waiting to start. It's in the RPC client's delay_queue, so this looks like it came through the NFS4ERR_DELAY arm of nfsd4_cb_sequence_done(): the RPC was reset to the beginning of the FSM, and rpc_delay() was called.

There is a commit missing from origin/linux-6.1.y:

961b4b5e86bf ("NFSD: Reset cb_seq_status after NFS4ERR_DELAY")

Without this commit, the server will continue rescheduling this RPC forever. That could be what is blocking rpc_shutdown_client().

This commit is applied to the nfsd-6.1.y branch of:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

Can anyone on a v6.1-based kernel pull this branch and try it?
I patched the current Debian stable kernels:

Debian Linux 6.1.0-29-amd64 (= 6.1.123-1), based on Linux 6.1.120
Debian Linux 6.1.0-30-amd64 (= 6.1.124-1), based on Linux 6.1.124

with these two commits:

8626664c87ee NFSD: Replace dprintks in nfsd4_cb_sequence_done()
961b4b5e86bf NFSD: Reset cb_seq_status after NFS4ERR_DELAY

Now my two NFS servers are running these patched kernels. I need to wait one week to see if the crash still occurs.

Chuck, might these patches remove the "unrecognized reply" messages from my syslog?

I see you have improved the backchannel session logging. Do I need to change my trace-cmd record command?
(In reply to Baptiste PELLEGRIN from comment #22)
> Now my two NFS servers are running these patched kernels. I need to wait one
> week to see if the crash still occurs.

It's probable that a hang might still occur, but the symptoms might be closer to the hangs that happen on v6.12. If so, I count that as a step forward: one less bug.

> Chuck, might these patches remove the "unrecognized reply" messages from my
> syslog?

They don't.

> I see you have improved the backchannel session logging. Do I need to change
> my trace-cmd record command?

The trace-cmd command line I provided above uses a glob expression that should also enable the trace points added in 8626664c87ee ("NFSD: Replace dprintks in nfsd4_cb_sequence_done()").
Hi,

We had similar tracing running as we observed a similar issue on our 6.11 kernel. Recently, we upgraded to 6.12.11 and this has resulted in the warning below. I'm not 100% sure this is the same issue. If not, let me know and I'll open a different thread/bug.

Unfortunately the attachment is too large to upload as the trace.dat file is ~900MiB. I've uploaded it here: https://homes.esat.kuleuven.be/~rtheys/logs-6.12.11.tar.zst This file contains the warning, the trace.dat file, and the output of "echo t > /proc/sysrq-trigger".

The warning that was issued was:

[Wed Jan 29 10:11:17 2025] cb_status=-521 tk_status=-10036
[Wed Jan 29 10:11:17 2025] WARNING: CPU: 16 PID: 1670626 at fs/nfsd/nfs4callback.c:1339 nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] Modules linked in: dm_snapshot(E) nfsv4(E) dns_resolver(E) nfs(E) netfs(E) binfmt_misc(E) rpcsec_gss_krb5(E) rpcrdma(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) bonding(E) tls(E) rfkill(E) nft_ct(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nf_tables(E) nfnetlink(E) vfat(E) fat(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) dm_service_time(E) dm_multipath(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) mgag200(E) kvm(E) i2c_algo_bit(E) dell_pc(E) rapl(E) drm_shmem_helper(E) ipmi_ssif(E) intel_cstate(E) platform_profile(E) dell_smbios(E) dcdbas(E) dell_wmi_descriptor(E) intel_uncore(E) wmi_bmof(E) pcspkr(E) drm_kms_helper(E) i2c_i801(E) mei_me(E) mei(E) intel_pch_thermal(E) i2c_mux(E) lpc_ich(E) i2c_smbus(E) joydev(E)
[Wed Jan 29 10:11:17 2025]  acpi_power_meter(E) ipmi_si(E) acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) nfsd(E) nfs_acl(E) lockd(E) auth_rpcgss(E) grace(E) fuse(E) drm(E) sunrpc(E) ext4(E) mbcache(E) jbd2(E) lpfc(E) sd_mod(E) sg(E) nvmet_fc(E) nvmet(E) nvme_fc(E) ahci(E) crct10dif_pclmul(E) nvme_fabrics(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) polyval_clmulni(E) nvme_core(E) megaraid_sas(E) ixgbe(E) polyval_generic(E) ghash_clmulni_intel(E) mdio(E) libata(E) wdat_wdt(E) scsi_transport_fc(E) dca(E) wmi(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[Wed Jan 29 10:11:17 2025] CPU: 16 UID: 0 PID: 1670626 Comm: kworker/u193:0 Kdump: loaded Tainted: G E 6.12.11-1.el9.esat.x86_64 #1
[Wed Jan 29 10:11:17 2025] Tainted: [E]=UNSIGNED_MODULE
[Wed Jan 29 10:11:17 2025] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.20.1 09/13/2023
[Wed Jan 29 10:11:17 2025] Workqueue: rpciod rpc_async_schedule [sunrpc]
[Wed Jan 29 10:11:17 2025] RIP: 0010:nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] Code: 0f 1f 44 00 00 e9 1d ff ff ff 80 3d 1c a7 01 00 00 0f 85 d9 fe ff ff 48 c7 c7 e5 b2 06 c1 c6 05 08 a7 01 00 01 e8 1f 4f ef d4 <0f> 0b 8b 75 54 e9 bc fe ff ff 0f 0b 0f 1f 00 90 90 90 90 90 90 90
[Wed Jan 29 10:11:17 2025] RSP: 0018:ffffa469b58c7e08 EFLAGS: 00010282
[Wed Jan 29 10:11:17 2025] RAX: 0000000000000000 RBX: ffff8a8f13ef6400 RCX: 0000000000000000
[Wed Jan 29 10:11:17 2025] RDX: 0000000000000002 RSI: ffffffff97398443 RDI: 00000000ffffffff
[Wed Jan 29 10:11:17 2025] RBP: ffff8a8c574515b8 R08: 0000000000000000 R09: ffffa469b58c7cb0
[Wed Jan 29 10:11:17 2025] R10: ffffa469b58c7ca8 R11: ffffffff97fdf688 R12: ffff8a7548f73f60
[Wed Jan 29 10:11:17 2025] R13: ffff8a8f13ef6400 R14: 0000000004248060 R15: ffffffffc0d66a40
[Wed Jan 29 10:11:17 2025] FS: 0000000000000000(0000) GS:ffff8a8ae0a00000(0000) knlGS:0000000000000000
[Wed Jan 29 10:11:17 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Jan 29 10:11:17 2025] CR2: 00007f8a14576160 CR3: 000000274fa20004 CR4: 00000000007726f0
[Wed Jan 29 10:11:17 2025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Wed Jan 29 10:11:17 2025] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Wed Jan 29 10:11:17 2025] PKRU: 55555554
[Wed Jan 29 10:11:17 2025] Call Trace:
[Wed Jan 29 10:11:17 2025] <TASK>
[Wed Jan 29 10:11:17 2025] ? __warn+0x84/0x130
[Wed Jan 29 10:11:17 2025] ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] ? report_bug+0x1c3/0x1d0
[Wed Jan 29 10:11:17 2025] ? handle_bug+0x5b/0xa0
[Wed Jan 29 10:11:17 2025] ? exc_invalid_op+0x14/0x70
[Wed Jan 29 10:11:17 2025] ? asm_exc_invalid_op+0x16/0x20
[Wed Jan 29 10:11:17 2025] ? __pfx_rpc_exit_task+0x10/0x10 [sunrpc]
[Wed Jan 29 10:11:17 2025] ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] ? nfsd4_cb_done+0x171/0x180 [nfsd]
[Wed Jan 29 10:11:17 2025] rpc_exit_task+0x5b/0x170 [sunrpc]
[Wed Jan 29 10:11:17 2025] __rpc_execute+0x9f/0x370 [sunrpc]
[Wed Jan 29 10:11:17 2025] rpc_async_schedule+0x2b/0x40 [sunrpc]
[Wed Jan 29 10:11:17 2025] process_one_work+0x179/0x390
[Wed Jan 29 10:11:17 2025] worker_thread+0x239/0x340
[Wed Jan 29 10:11:17 2025] ? __pfx_worker_thread+0x10/0x10
[Wed Jan 29 10:11:17 2025] kthread+0xdb/0x110
[Wed Jan 29 10:11:17 2025] ? __pfx_kthread+0x10/0x10
[Wed Jan 29 10:11:17 2025] ret_from_fork+0x2d/0x50
[Wed Jan 29 10:11:17 2025] ? __pfx_kthread+0x10/0x10
[Wed Jan 29 10:11:17 2025] ret_from_fork_asm+0x1a/0x30
[Wed Jan 29 10:11:17 2025] </TASK>
[Wed Jan 29 10:11:17 2025] ---[ end trace 0000000000000000 ]---

It also seems to indicate an issue with nfsd4_cb_done in a workqueue. The "echo t > /proc/sysrq-trigger" was taken after stopping the trace-cmd. I hope the trace.dat file contains enough information to help find the root cause. It was started before the first client mounted something from this server.

Regards, Rik
> [Wed Jan 29 10:11:17 2025] cb_status=-521 tk_status=-10036

-521 = -EBADHANDLE (NFS4ERR_BADHANDLE)
-10036 = -NFS4ERR_BAD_XDR

I see several similar events in the trace.dat report. My guess is the client is generating BADHANDLE (nfs4_callback_getattr) but the server is having some trouble decoding that result.

My first impression is that XDR result decoders are supposed to return -EIO in this case. NFS4ERR_BAD_XDR is supposed to mean the /remote side/ was not able to decode a Call, not the local side couldn't decode the Reply.

RFC 8881 Section 20.1.3 states:

"If the filehandle specified is not one for which the client holds an OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is returned."

This appears to be a bug in the new CB_GETATTR implementation. It might or might not cause the callback workqueue to stall, but it should probably be filed as a separate bug.
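As a side note on the decoder convention described above: a local failure to parse a reply is usually surfaced as -EIO, while an NFS4ERR_* value is reserved for what the peer actually returned. A hedged sketch follows; this is not the real CB_GETATTR decoder, the function name and argument layout are hypothetical, and only xdr_stream_decode_u32() is a real sunrpc helper.

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/sunrpc/xdr.h>

static int demo_decode_cb_op_status(struct xdr_stream *xdr, u32 *status)
{
	/* could not parse the reply locally: report -EIO, not a wire status */
	if (xdr_stream_decode_u32(xdr, status) < 0)
		return -EIO;

	/* *status now carries the NFS4ERR_* value the client sent (for
	 * example NFS4ERR_BADHANDLE), for the caller to act on */
	return 0;
}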
Looking further back in trace.dat:

kworker/u194:9-1629754  [033] 309447.521022: nfsd_cb_free_slot:  task:00001fa3@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 new slot seqno=8100
kworker/u193:6-1630755  [004] 309468.053064: nfsd_cb_seq_status: task:00001fa4@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=0 seq_status=0
kworker/u193:6-1630755  [004] 309468.053066: nfsd_cb_free_slot:  task:00001fa4@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 new slot seqno=8101
kworker/u194:13-1634037 [013] 309549.519294: nfsd_cb_seq_status: task:00001fa5@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=-5 seq_status=1
kworker/u194:13-1634037 [037] 309558.734930: nfsd_cb_seq_status: task:00001fa6@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=-5 seq_status=1
kworker/u194:0-1662968  [047] 309567.950612: nfsd_cb_seq_status: task:00001fa7@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=-5 seq_status=1
kworker/u194:13-1634037 [043] 309577.167133: nfsd_cb_seq_status: task:00001fa8@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=-5 seq_status=1
kworker/u193:14-1658519 [008] 309586.381715: nfsd_cb_seq_status: task:00001fa9@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=-5 seq_status=1
kworker/u194:6-1659115  [045] 309588.599171: nfsd_cb_seq_status: task:00001faa@0000009a sessionid=67952a7d:4dd6ab1d:00000fb0:00000000 tk_status=-512 seq_status=1

(Not shown here) this is a CB_RECALL_ANY operation that was sent several times but timed out. The retry mechanism could have dropped this operation after the first timeout, or it could have noticed that the callback transport was marked FAULT. This situation appears to resolve itself.
(In reply to Chuck Lever from comment #25)
> This appears to be a bug in the new CB_GETATTR implementation. It might or
> might not cause the callback workqueue to stall, but it should probably be
> filed as a separate bug.

I've opened Bug #219737 for this.

Regards, Rik
Hello. Good news for 6.1 kernels. With these patches applied:

8626664c87ee NFSD: Replace dprintks in nfsd4_cb_sequence_done()
961b4b5e86bf NFSD: Reset cb_seq_status after NFS4ERR_DELAY

there have been no hangs for me after more than two weeks of server uptime, whereas previously the hangs occurred every week.

I still see some suspicious server load, possibly caused by sending CB_RECALL_ANY to shutdown/suspended/courtesy clients.

I see a lot of work on the list that tries to address these problems:

nfsd: eliminate special handling of NFS4ERR_SEQ_MISORDERED
nfsd: handle NFS4ERR_BADSLOT on CB_SEQUENCE better
nfsd: when CB_SEQUENCE gets ESERVERFAULT don't increment seq_nr
nfsd: only check RPC_SIGNALLED() when restarting rpc_task
nfsd: always release slot when requeueing callback
nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()
nfsd: prepare nfsd4_cb_sequence_done() for error handling rework
NFSD: Skip sending CB_RECALL_ANY when the backchannel isn't up
NFSD: fix hang in nfsd4_shutdown_callback

Thanks a lot! Baptiste.
Salvatore Bonaccorso <carnil@debian.org> replies to comment #28: Hi,

On Mon, Feb 10, 2025 at 12:05:32PM +0000, Baptiste PELLEGRIN via Bugspray Bot wrote:
> Good news for 6.1 kernels. With these patches applied:
>
> 8626664c87ee NFSD: Replace dprintks in nfsd4_cb_sequence_done()
> 961b4b5e86bf NFSD: Reset cb_seq_status after NFS4ERR_DELAY
>
> there have been no hangs for me after more than two weeks of server uptime,
> whereas previously the hangs occurred every week.

So I see the backport of 961b4b5e86bf ("NFSD: Reset cb_seq_status after NFS4ERR_DELAY") landed in the just-released 6.1.129 stable version.

Do we consider this to be sufficient to have stabilized the situation for this issue? (I do realize much other work has been done as well, which has partially flowed down to the stable series already.)

This reply is mainly in the context of https://bugs.debian.org/1071562

Regards, Salvatore

(via https://msgid.link/Z7iC42RF-7Qj2s4d@eldamar.lan)
Harald Dunkel <harald.dunkel@aixigo.com> replies to comment #29: Hi folks,

I don't like to bring bad news, but yesterday I had a problem with kernel 6.12.9 (twice), see attachment. AFAIK 6.12.9 is supposed to include 961b4b5e86bf56a2e4b567f81682defa5cba957e and 8626664c87eebb21a40d4924b2f244a1f56d8806.

Regards
Harri

(via https://msgid.link/PR3PR01MB69723701495481CCA7D55F0585C72@PR3PR01MB6972.eurprd01.prod.exchangelabs.com)
Created attachment 307692 [details] x2
Salvatore Bonaccorso <carnil@debian.org> replies to comment #30: Hi Harri,

On Fri, Feb 21, 2025 at 01:57:47PM +0000, Harald Dunkel wrote:
> I don't like to bring bad news, but yesterday I had a problem with
> kernel 6.12.9 (twice), see attachment. AFAIK 6.12.9 is supposed to
> include 961b4b5e86bf56a2e4b567f81682defa5cba957e and
> 8626664c87eebb21a40d4924b2f244a1f56d8806.

At first glance this looks like a different issue, as the backtrace does not mention nfsd4_destroy_session at all, but I will leave it to Chuck et al. as the experts to correct me if I'm asserting this wrongly.

Regards, Salvatore

(via https://msgid.link/Z7iOXnzzMKBYEQA3@eldamar.lan)
(In reply to Bugspray Bot from comment #31) > Created attachment 307692 [details] > x2 That warning is probably the bug that 4990d098433d fixes. That patch just went into -rc3 and hasn't trickled out to stable kernels yet. It is related to callbacks, but is unrelated to this BZ.
(In reply to Bugspray Bot from comment #30)
> So I see the backport of 961b4b5e86bf ("NFSD: Reset cb_seq_status after
> NFS4ERR_DELAY") landed in the just-released 6.1.129 stable version.
>
> Do we consider this to be sufficient to have stabilized the situation for
> this issue? (I do realize much other work has been done as well, which has
> partially flowed down to the stable series already.)

That seems to have the two patches that I would consider most likely to fix the hangs reported here:

036ac2778f7b28885814c6fbc07e156ad1624d03
961b4b5e86bf56a2e4b567f81682defa5cba957e
(In reply to Bugspray Bot from comment #29)
> So I see the backport of 961b4b5e86bf ("NFSD: Reset cb_seq_status after
> NFS4ERR_DELAY") landed in the just-released 6.1.129 stable version.
>
> Do we consider this to be sufficient to have stabilized the situation for
> this issue? (I do realize much other work has been done as well, which has
> partially flowed down to the stable series already.)
>
> This reply is mainly in the context of https://bugs.debian.org/1071562

Since we do not have a reproducer, it's difficult to ascertain with certainty that the issue has been addressed. Thus I feel it is up to the various original reporters to decide if they are happy with the solution above. I'm comfortable leaving this bug open for a while.