Bug 202435
Summary: | nfs4.1: RPC request reserved 84 but used 276 | ||
---|---|---|---|
Product: | File System | Reporter: | Donald Buczek (buczek) |
Component: | NFS | Assignee: | bfields |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | 284094739, carnil, jlayton, L.Bonnaud, pmenzel+bugzilla.kernel.org, trondmy |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.14.87 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Donald Buczek
2019-01-28 14:18:53 UTC
This is a server bug, not a client issue.

(In reply to Donald Buczek from comment #0)
> If you search the web for "RPC request reserved but used" you find a
> few other people reporting this bug, but I didn't find a solution yet.

That symptom could have multiple different causes, so I wouldn't assume all those reports are cases of the same bug.

Looking at the trace... A bare SEQUENCE call is getting a SEQUENCE+PUTFH+SETATTR+GETATTR reply, so it's a cached reply. That's probably confusing the logic that predicts how large a result will be. There were some patches in that area recently; I'll take a look.

Yes, this looks likely to be fixed by upstream commit 53da6a53e1d4 "nfsd4: catch some false session retries" (may also need the preceding 085def3ade52 "nfsd4: fix cached replies to solo SEQUENCE compounds"). Those are probably reasonable to backport to stable branches. Somebody could check whether they apply and fix the problem and, if so, mail them to stable@vger.kernel.org with a cc: to linux-nfs@vger.kernel.org and bfields@fieldses.org. I'm not volunteering to do that, but would happily ack them.

These two patches are already included in 4.19 LTS and 4.20 stable. They do apply to 4.14 LTS and all previous LTS versions shown on kernel.org.

The problem occurred only a few times across hundreds of NFS mounts using nfs4.1, and we don't know what triggered it. So I can't verify that the problem is fixed by switching a few servers to 4.14 with the two patches on top of it, and I can't reboot all servers to a new kernel.
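Since the message shows up only rarely, it may help to scan collected kernel logs for it and see how far each reply overran its reservation. A minimal sketch, assuming the log lines carry the text "RPC request reserved N but used M", optionally with a request id between "request" and "reserved" (the id variant and the function name `find_overruns` are assumptions for illustration, not part of any kernel tooling):

```python
import re

# Matches the kernel message with or without a numeric request id.
# The optional id is an assumption; stock kernels print no id here.
PATTERN = re.compile(r"RPC request (?:(\d+) )?reserved (\d+) but used (\d+)")


def find_overruns(lines):
    """Return (request_id_or_None, reserved, used) for each matching line."""
    hits = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            request_id = int(m.group(1)) if m.group(1) else None
            hits.append((request_id, int(m.group(2)), int(m.group(3))))
    return hits


# Example input: the message from the bug summary plus an unrelated line.
sample = [
    "RPC request reserved 84 but used 276",
    "nfsd: some unrelated log line",
]
for request_id, reserved, used in find_overruns(sample):
    print(f"id={request_id} reserved={reserved} used={used} overrun={used - reserved}")
```

Feeding it the saved output of `dmesg` or a syslog file from each server would give a rough count of occurrences before and after a kernel change, which is weaker than a reproducer but better than nothing.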
I've seen similar messages, and added a small patch to print out the XID of the request:

    [ 99.433198] RPC request 576820793 reserved 108 but used 324
    [ 119.840823] RPC request 2540410425 reserved 200 but used 324

I then sniffed traffic while running a testcase that seemed to get a lot of these and saw this:

    35112 2019-04-20 06:38:54.139728563 fd90:79d3:5065:f00d::da9 fd90:79d3:5065:f00d::444 NFS 294 V4 Call (Reply In 35156) COMMIT FH: 0x4a1e9ff2 Offset: 0 Len: 0
    35156 2019-04-20 06:38:54.176012154 fd90:79d3:5065:f00d::444 fd90:79d3:5065:f00d::da9 NFS 410 V4 Reply (Call In 35112) SETATTR

...and...

    43316 2019-04-20 06:39:14.583768975 fd90:79d3:5065:f00d::da9 fd90:79d3:5065:f00d::444 NFS 318 V4 Call (Reply In 43319) CLOSE StateID: 0xacd1
    43319 2019-04-20 06:39:14.584101924 fd90:79d3:5065:f00d::444 fd90:79d3:5065:f00d::da9 NFS 410 V4 Reply (Call In 43316) SETATTR

...in both cases, it looks like the server is sending the reply for a different compound (a SETATTR rather than a COMMIT or CLOSE reply).

Which kernel?

The two commits 53da6a53e1d4 "nfsd4: catch some false session retries" and 085def3ade52 "nfsd4: fix cached replies to solo SEQUENCE compounds", which fixed my problem, have been backported to v4.14 (v4.14.99) and are in all kernels v4.15 and later, so this bug could be closed.
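For anyone checking an older kernel for the same symptom, the call/reply mismatch seen in the capture can be spotted mechanically. A rough sketch, assuming one-line packet summaries in the format quoted in this report, where the first token after "(Reply In N)" or "(Call In N)" is the main operation name; the function name `find_mismatches` is hypothetical:

```python
import re

# Frame number, direction, peer frame number, and first operation token.
# Assumes the summary layout quoted in this report.
LINE = re.compile(
    r"^\s*(\d+)\s.*?\bV4 (Call|Reply) \((?:Reply|Call) In (\d+)\)\s+(\S+)"
)


def find_mismatches(lines):
    """Return (call_frame, call_op, reply_frame, reply_op) where ops differ."""
    calls = {}
    mismatches = []
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        frame, kind, peer = int(m.group(1)), m.group(2), int(m.group(3))
        op = m.group(4).rstrip(",")  # tolerate a comma-separated op list
        if kind == "Call":
            calls[frame] = op
        elif peer in calls and calls[peer] != op:
            # Reply names a different operation than the call it answers.
            mismatches.append((peer, calls[peer], frame, op))
    return mismatches


capture = [
    "35112 2019-04-20 06:38:54.139728563 fd90:79d3:5065:f00d::da9 "
    "fd90:79d3:5065:f00d::444 NFS 294 V4 Call (Reply In 35156) COMMIT "
    "FH: 0x4a1e9ff2 Offset: 0 Len: 0",
    "35156 2019-04-20 06:38:54.176012154 fd90:79d3:5065:f00d::444 "
    "fd90:79d3:5065:f00d::da9 NFS 410 V4 Reply (Call In 35112) SETATTR",
]
for call_frame, call_op, reply_frame, reply_op in find_mismatches(capture):
    print(f"call {call_frame} ({call_op}) answered by {reply_frame} ({reply_op})")
```

The sketch only compares the first operation token and expects each call to appear before its reply, so it is a filter for candidates to inspect by hand, not proof of a wrong cached reply.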