Bug 50841

Summary: can't read from CIFS share mounted with 'cache=none' (SMB response too short)
Product: File System
Reporter: Fadeeva Marina (astarta)
Component: CIFS
Assignee: Jeff Layton (jlayton)
Status: RESOLVED INVALID
Severity: normal
CC: alan, fs_cifs, jlayton, sprabhu
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.5.x/3.6.x
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: successful read of the same file using 3.4 kernel (share mounted with directio)

Description Fadeeva Marina 2012-11-21 09:49:30 UTC
This is a copy of my bug report to the Red Hat Bugzilla, but I got no response there.

Description of problem:

Can't read files from Samba shares mounted with the 'cache=none' option.

Version-Release number of selected component (if applicable):
All 3.5.x - 3.6.x vanilla kernels. Fedora 17.

How reproducible:
Always.

Steps to Reproduce:
1. Create an SMB share* (see Additional info for details).

2. Create a file on this share, for example using dd:
dd if=/dev/urandom of=/smb_dir_path/dd_file bs=1024 count=1024

3. Mount this share with the 'cache=none' option from Fedora 17, or from any system running kernel 3.5.0 or later:
mount -t cifs //host/share /mount_point/ -o cache=none

4. Try to read the file as follows (note the block size value):
dd if=/mount_point/dd_file of=/dev/null bs=131013 count=1
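Steps 2 to 4 can be scripted with a time limit so that a hung read is reported instead of blocking the shell forever. This is a minimal sketch: the function name and the 30-second default are hypothetical choices, it relies on the coreutils `timeout` command (which exits with status 124 when the limit is reached), and the file argument is assumed to be the file created in step 2, read through the cache=none mount.

```shell
# Hypothetical helper: read one 131013-byte block (the block size from step 4)
# and give up after a time limit instead of hanging.
# Prints "ok", "timeout", or "error" and returns the exit status it observed.
try_read_block() {
    file="$1"
    limit="${2:-30}"   # seconds; an arbitrary default
    timeout "$limit" dd if="$file" of=/dev/null bs=131013 count=1 2>/dev/null
    rc=$?
    case "$rc" in
        0)   echo "ok" ;;
        124) echo "timeout" ;;   # timeout(1) killed the hung dd
        *)   echo "error" ;;     # e.g. the file does not exist
    esac
    return "$rc"
}
```

For example, `try_read_block /mount_point/dd_file` would be expected to print "timeout" on an affected mount once the limit expires, and "ok" on an unaffected one.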

Actual results:
dd never finishes.
dmesg is constantly flooded with "CIFS VFS: SMB response too short" messages.

Expected results:
The read should succeed.

Additional info:
* SMB shares served by Windows are not affected (they can be read successfully when mounted with cache=none). SMB shares served by CentOS 5 with Samba 3.0.33 also seem to be unaffected.
The problem is 100% reproducible with Samba shares served by CentOS 6 (samba-3.5.4-68, kernel 2.6.32), the latest Fedora (16, 17), and the latest Ubuntu. So the problem may also depend on the version of Samba running on the server.

Mounting with 'cache=loose' works around the issue.

The problem does not exist with 3.4 and earlier kernels.

Important!
The problem was introduced by 'cifs: convert cifs_iovec_read to use async reads' patch (http://www.spinics.net/lists/linux-cifs/msg05802.html).
I've reverted this patch and everything is working fine again.

Attached you can find kernel logs captured with cifsFYI enabled: one from the failing read on a 3.6.2 kernel (messages.bad), and one from the same operation, which succeeds, on a 3.4.5 kernel (messages.good).
Comment 1 Fadeeva Marina 2012-11-21 09:50:01 UTC
Created attachment 86841 [details]
successful read of the same file using 3.4 kernel (share mounted with directio)
Comment 2 Jeff Layton 2012-11-22 11:32:05 UTC
Thanks for the bug report, I'll look at it as soon as I'm able. I don't see the "messages.bad" file attached here, but I'll probably need to reproduce this to track it down anyway.
Comment 3 Fadeeva Marina 2012-11-22 11:54:27 UTC
The attachment turned out to be too big :) I uploaded it to an external server; see:
www.rat.ru/astarta/messages.bad

Thank you for your attention!
Comment 4 Jeff Layton 2012-11-22 12:07:26 UTC
I was able to reproduce this and from looking at a capture, this appears to be a bug in samba. It's sending malformed frames in response to the read requests we're making. The SMB responses are 4 bytes of zeroes and that's it.

I'll report this to the samba folks, and mention that it's a regression from earlier versions of samba.
Comment 5 Jeff Layton 2012-11-22 12:13:36 UTC
Depending on your I/O sizes, you may be able to work around this problem temporarily by mounting with a lower rsize= value. Maybe try something like rsize=61440 or so? No idea if that'll help, but it may be worth experimenting with.
Comment 6 Fadeeva Marina 2012-11-22 12:38:14 UTC
Ok, I see. Thank you for your kind explanation!

One question, though: if the Samba server is somehow broken and responds with malformed frames, shouldn't the kernel work around this issue and avoid looping on the 'SMB response too short' message?
Earlier kernels (3.4 and older) worked in this case, even with such malformed frames.
Comment 7 Fadeeva Marina 2012-11-22 14:00:29 UTC
Also, it's not clear why reverting the 'cifs: convert cifs_iovec_read to use async reads' patch helps. Could the patch be wrong somehow?
Comment 8 Jeff Layton 2012-11-22 15:10:30 UTC
> Ok, I see. Thank you for your kind explanation!
> 
> One question, though: if the Samba server is somehow broken and responds with
> malformed frames, shouldn't the kernel work around this issue and avoid looping
> on the 'SMB response too short' message?
> Earlier kernels (3.4 and older) worked in this case, even with such malformed
> frames.

That would be nice, but it's difficult to handle in practice. All we see is that the server sent us this bogus reply. Because it's not even long enough to reach the MID (the multiplex ID in the SMB header that ties a response to its request), we can't identify it as the response to any particular request, so we have no way to return a clear error to the caller.
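To illustrate why such a reply can't be matched to a request, here is a simplified sketch (not the kernel's actual code) of the length check a client must pass before it can read the MID. It assumes SMB1 over TCP, where each message is preceded by a 4-byte prefix (one type byte, then a 3-byte length of the SMB message that follows) and the MID occupies bytes 30-31 of the 32-byte SMB header. The function name and the hex-string input format are hypothetical. The reply seen in the capture, 4 bytes of zeroes, declares a length of 0, so there is no header, and therefore no MID, to look at.

```shell
# Hypothetical, simplified check: given a received frame as a hex string,
# decide whether it is long enough to contain the SMB1 header's MID field.
SMB_HDR_LEN=32   # SMB1 header size in bytes; MID is at offsets 30-31

frame_has_mid() {
    hex="$1"
    # Not even a complete 4-byte length prefix (8 hex characters).
    if [ "${#hex}" -lt 8 ]; then echo "too-short"; return 1; fi
    # Length field = prefix bytes 1-3 = hex characters 3-8.
    len_hex=$(printf '%s' "$hex" | cut -c3-8)
    msg_len=$((0x$len_hex))
    # Message bytes actually received after the 4-byte prefix.
    have=$(( ${#hex} / 2 - 4 ))
    if [ "$msg_len" -ge "$SMB_HDR_LEN" ] && [ "$have" -ge "$SMB_HDR_LEN" ]; then
        echo "mid-readable"
        return 0
    fi
    echo "too-short"
    return 1
}
```

For example, `frame_has_mid 00000000` prints "too-short": with nothing after the prefix, there is no MID to look up, so the reply can't be tied to any outstanding read.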

> Also, it's not clear why reverting the 'cifs: convert cifs_iovec_read to use
> async reads' patch helps. Could the patch be wrong somehow?

No, I'm fairly certain the patch is fine. What it does is make the code use asynchronous and larger read requests on the wire. It's likely that the larger reads are what triggers the bug on the server, which is why I suggested artificially limiting the rsize in the code as a potential workaround.
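The effect of rsize on the wire can be sketched with a little arithmetic. Assuming the client splits an uncached read into requests of at most rsize bytes each (a simplification of what the async read code does), the 131013-byte dd read from the report is sent as a single large request when rsize exceeds the read size, but as three smaller requests with rsize=61440. The helper name is hypothetical.

```shell
# Hypothetical helper: print the sizes of the read requests that a read of
# LEN bytes would be split into if each request carries at most RSIZE bytes.
split_read() {
    len=$1
    rsize=$2
    out=""
    while [ "$len" -gt 0 ]; do
        if [ "$len" -gt "$rsize" ]; then chunk=$rsize; else chunk=$len; fi
        out="$out${out:+ }$chunk"   # space-separated list of request sizes
        len=$((len - chunk))
    done
    echo "$out"
}
```

For example, `split_read 131013 61440` prints `61440 61440 8133`, while `split_read 131013 1048576` prints `131013`. Applied to the mount command from the report, the suggested workaround would be `mount -t cifs //host/share /mount_point/ -o cache=none,rsize=61440`.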
Comment 9 Jeff Layton 2012-11-22 15:19:34 UTC
Bug reported to samba.org bugzilla here:

    https://bugzilla.samba.org/show_bug.cgi?id=9422

I'll plan to close this as NOTABUG once we have some acknowledgement from the samba folks on the problem.
Comment 10 Jeff Layton 2012-11-22 15:31:46 UTC
> Earlier kernels (3.4 and older) worked in this case, even with such malformed
> frames.

Oh, and I doubt earlier kernels ever saw these malformed frames -- this is almost certainly triggered by the fact that we issue larger read requests (which the server is claiming to support). They certainly were not any better equipped to deal with them and would almost certainly loop indefinitely like this if they saw them.
Comment 11 Jeff Layton 2012-11-23 02:08:46 UTC
Volker from the samba team has generated a patch that seems to fix this in samba. At this point, I'm going to close this as INVALID since it wasn't a kernel bug, but rather one in samba.
Comment 12 Fadeeva Marina 2012-11-23 08:45:13 UTC
Thank you for your help! It's greatly appreciated!