Bug 211471

Summary: Kodi/VLC unable to stream from NFS server after updating to 5.10.11
Product: File System
Reporter: mwyeatts
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: RESOLVED CODE_FIX
Severity: normal
CC: a.m.wollmarker, bfields, cam, chuck.lever, chucklever
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.10.11
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Android Kodi Crash

Description mwyeatts 2021-01-30 05:33:53 UTC
Created attachment 294995: Android Kodi Crash

After updating to kernel 5.10.11, streaming a video from my NFS share to Kodi/VLC on my Android TV causes either the applications to crash or the video to hang or show artifacts.

Downgrading to 5.10.10 seems to fix the issue and videos stream normally.

I can try to get more logs if necessary, but I'm not sure what would be useful.

Thank you.
Comment 1 bfields 2021-01-30 15:48:09 UTC
So, to make sure I understand your setup: Your NFS server is a Linux box running 5.10.10 or 5.10.11, your client is an Android system, and the attached backtrace is from the client.  Is that right?

Are there any kernel backtraces or other interesting-looking messages in the logs on the client or server?

What version of NFS is in use?
Comment 2 bfields 2021-01-30 15:56:03 UTC
The only NFS/RPC changes I see in that range are:

00ee972739fb2526d3936f1e7ccfc8c91d250c60 "SUNRPC: Handle TCP socket sends with kernel_sendpage() again"
de82ec8e5e8cba33f84ebef26478b636e94a90fb "nfsd: Fixes for nfsd4_encode_read_plus_data()"
6533681890902e3b59bbceaea311760b3791c28d "nfsd: Don't set eof on a truncated READ_PLUS"

Could also be a networking issue.  Or a video issue if it's actually the client you're upgrading (I'm assuming it's the server).
Comment 3 mwyeatts 2021-01-30 17:00:27 UTC
(In reply to bfields from comment #1)
> So, to make sure I understand your setup: Your NFS server is a Linux box
> running 5.10.10 or 5.10.11, your client is an Android system, and the
> attached backtrace is from the client.  Is that right?


That's right. I'm running Arch Linux on my laptop with 5.10.11 as the server and Kodi as a client on Android.

> Are there any kernel backtraces or other interesting-looking messages in the
> logs on the client or server?

Besides the crash on the client, I didn't see any other interesting messages. I looked in journalctl on the server and no errors were being logged. Let me know if there is somewhere else I should be looking.

> What version of NFS is in use?

On the server, I'm using nfs-utils 2.5.2

nfsstat -s says 
Server nfs v3

On the client/kodi, I think they are using libnfs which defaults to NFSv3
https://github.com/xbmc/xbmc/blob/master/tools/depends/target/libnfs/Makefile

> Or a video issue if it's actually the client you're upgrading (I'm assuming
> it's the server).

Right, the problems started when I updated the server from 5.10.10 to 5.10.11; the client remained the same.
Comment 4 bfields 2021-01-30 20:25:38 UTC
Of those three commits mentioned in comment 2, the two nfsd commits only affect NFSv4.  I guess you could try reverting

00ee972739fb2526d3936f1e7ccfc8c91d250c60 "SUNRPC: Handle TCP socket sends with kernel_sendpage() again"

and see if that makes a difference.

You could also try a git-bisect.  How easy is it to reproduce the problem?  Should be 7-8 tries to narrow the regression down to a single commit.
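
As a rough check on the "7-8 tries" estimate: each bisect round halves the remaining candidates, so the number of rounds grows with the base-2 logarithm of the number of commits in the range. The sketch below is illustrative only. `bisect_steps` is a hypothetical helper, and the commit counts are example values, not the actual number of commits between 5.10.10 and 5.10.11.

```python
def bisect_steps(num_commits: int) -> int:
    """Approximate number of build-and-test rounds a bisection needs
    to isolate one bad commit among num_commits candidates."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1,
    # computed with integers only (no floating-point rounding).
    return (num_commits - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical range sizes; 7-8 rounds corresponds to
    # somewhere between 65 and 256 candidate commits.
    for n in (65, 128, 200, 256):
        print(n, "commits ->", bisect_steps(n), "rounds")
```

In other words, an estimate of 7-8 rounds is consistent with a stable point release containing on the order of one to two hundred patches.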

libnfs isn't even mentioned in the attached backtrace.... It's a bit of a mystery how a change in server behavior would cause kodi to crash in some other code.  You can report to the Kodi and/or libnfs people and see if they have some idea.
Comment 5 Cameron Berkenpas 2021-01-31 21:20:51 UTC
I bisected this morning and found that 00ee972739fb2526d3936f1e7ccfc8c91d250c60 is the problematic commit for me. 

I've reverted 00ee972739fb2526d3936f1e7ccfc8c91d250c60 against 5.10.12 and kodi+NFS works fine for me now. I can confirm I had the same issue with 5.10.11 and 5.10.12 before reverting.

Wish I had found this bug before I spent all that time bisecting. :)
Comment 6 Cameron Berkenpas 2021-01-31 21:24:53 UTC
To follow up, it's probably worth mentioning that standard Linux NFS clients (Kubuntu 20.04 and 20.10 machines) didn't seem to have issues playing videos over NFS, which is probably how this was missed.

I'm using NFSv3 with Kodi, and NFSv4 on the Linux hosts. This morning, I tried mounting one of the volumes via NFSv3 on the Kubuntu 20.10 machine and there were no issues watching video (which is all I tried).
Comment 7 bfields 2021-01-31 22:07:52 UTC
OK, so sounds like there's a problem that's specific to the libnfs client used in Kodi?

The problematic patch was supposed to be effectively a revert of da1661b93bf4 "SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends", which first went into 5.7.  Did you previously use 5.6 or earlier kernels without problems?

Adding Chuck to see if he has an idea.
Comment 8 Chuck Lever 2021-01-31 22:12:22 UTC
This problem was already reported on linux-nfs@ on Friday.

I've diagnosed the issue: the code after 4a85a6a3320b ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again") does not handle non-page-aligned NFS READ requests properly. The Linux NFS client rarely emits that kind of READ, which is why I missed this issue. But Kodi sends these all the time. The NFS server returns corrupt READ data in this case (the file as stored on the server is not affected).
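
To illustrate what "non-page-aligned" means here, the following sketch splits a READ's byte range into per-page pieces. It is a conceptual illustration only, not the kernel's send path: the 4096-byte page size and the helper names `is_page_aligned` and `page_segments` are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def is_page_aligned(offset: int, page_size: int = PAGE_SIZE) -> bool:
    """True if a READ starting at this file offset begins on a page boundary."""
    return offset % page_size == 0


def page_segments(offset: int, count: int, page_size: int = PAGE_SIZE):
    """Split the byte range [offset, offset + count) into
    (page_index, offset_in_page, length) tuples."""
    segments = []
    pos = offset
    end = offset + count
    while pos < end:
        in_page = pos % page_size
        # Take bytes up to the end of the current page or the end of the range.
        length = min(page_size - in_page, end - pos)
        segments.append((pos // page_size, in_page, length))
        pos += length
    return segments


if __name__ == "__main__":
    # Aligned READ: every segment starts at in-page offset 0.
    print(page_segments(0, 8192))
    # Unaligned READ: the first segment starts 1000 bytes into its page.
    print(page_segments(1000, 8192))
```

An aligned READ produces segments that all start at in-page offset 0, while a READ at an arbitrary offset starts partway into its first page. That second shape is the kind of request described above as common from Kodi and rare from the Linux NFS client.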

I'm testing a fix now.
Comment 9 mwyeatts 2021-02-11 03:52:33 UTC
Works great in 5.10.15, thank you.