after mounting a CIFS share on a windows server 2003, r2 3790 SP2, and doing "nothing"/idling, after about 50-60 seconds accessing the mounted share is not possible anymore due to "permission denied" errors.
sniffing the traffic reveals that an "Echo Request" message is being sent, immediately followed by an "Echo Response, Error: STATUS_ACCESS_DENIED" message. after that one has to remount the share in order to access it again.
accessing the share and copying files seems to work fine immediately after mounting, but only until the "Echo Request" message.
2.6.37 was not affected. i've tested 2.6.38-rc2 and 2.6.38-rc3 and both suffer from the issue.
mount.cifs version: 1.14-3.5.6, samba-3.5.6
# CONFIG_CIFS_STATS is not set
# CONFIG_CIFS_DEBUG2 is not set
CONFIG_SMB_FS=m (module is not loaded)
Display Internal CIFS Data Structures for Debugging
CIFS Version 1.69
Features: dfs lanman posix spnego xattr acl
Active VFS Requests: 0
1) Name: xxxx Domain: xxxx Uses: 1 OS: Windows Server 2003 R2 3790 Service Pack 2
NOS: Windows Server 2003 R2 5.2 Capability: 0x1f3fd
SMB session status: 1 TCP status: 1
Local Users To Server: 1 SecMode: 0xf Req On Wire: 0
1) xxxx Mounts: 1 Type: NTFS DevInfo: 0x20 Attributes: 0x700ff
PathComponentMax: 255 Status: 0x1 type: DISK
i've now compiled the debug settings too if that helps, but not yet rebooted (and only remote access at the moment).
This sounds very much like a server bug, but I'll need to see what's happening on the wire to be sure.
Would you be able to get a wire capture of the traffic between client and server during the failed echo and subsequent access attempts? Instructions on how to do this are here:
i've sent you a capture file in private. 2.6.37 worked fine, but i'll try again tomorrow just to be sure the admins didn't also change something on the server.
Thanks for the capture and the bug report...
I think this is a bug in the handling of security signatures on the client. It looks like some of the packets during the session setup were gone from the capture however and I can't be sure how the mount was done. Are you able to disable security signatures on the mount and tell me if it works around the bug?
I'll have a look in a bit and we should be able to get this fixed up.
Created attachment 46182
patch -- enable signing flag in SMB header when server has it on
This patch passes basic smoke testing for me, and I think will fix the bug. Would you be able to test it and let me know if it does?
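A rough sketch of the idea in the patch title, for readers following along. This is not the attached patch: the function names are made up for illustration, and the bit values are the usual SMB SecurityMode and Flags2 constants, included here as an assumption rather than copied from the kernel source. It decodes the "SecMode: 0xf" value from the debug output above and sets the signing bit in Flags2 when the server has signing on.

```python
# Illustrative only: not the kernel patch. Function names are hypothetical.

# SecurityMode bits as reported in the SMB negotiate response.
SECMODE_USER = 0x01           # user-level security
SECMODE_PW_ENCRYPT = 0x02     # challenge/response passwords
SECMODE_SIGN_ENABLED = 0x04   # server supports signing
SECMODE_SIGN_REQUIRED = 0x08  # server requires signing

# Flags2 bit in the SMB header that marks a packet as signed.
SMBFLG2_SECURITY_SIGNATURE = 0x0004


def server_signs(sec_mode: int) -> bool:
    """True if the server has signing enabled or required."""
    return bool(sec_mode & (SECMODE_SIGN_ENABLED | SECMODE_SIGN_REQUIRED))


def request_flags2(base_flags2: int, sec_mode: int) -> int:
    """Return Flags2 for an outgoing request, with the signing bit set
    whenever the server has signing on."""
    if server_signs(sec_mode):
        return base_flags2 | SMBFLG2_SECURITY_SIGNATURE
    return base_flags2


if __name__ == "__main__":
    sec_mode = 0xF  # "SecMode: 0xf" from the DebugData output
    print(server_signs(sec_mode))
    # 0xC001 is just an example base value for Flags2.
    print(hex(request_flags2(0xC001, sec_mode)))
```

With SecMode 0xf both signing bits are set, so under these assumptions every request, including the echo, would carry the signature flag.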
thanks for the patch, i've applied it and recompiled the kernel, but i can only
test it tomorrow when i'm able to reboot.
in the meantime i've tried all "sec" parameters for mount.cifs, only ntlmi and
ntlmv2i allow mounting, plain ntlm and ntlmv2 result in:
CIFS VFS: Server requires packet signing to be enabled in
CIFS VFS: cifs_mount failed w/return code = -95
both ntlmi and ntlmv2i result in permission denied after the idle time.
btw, i also always get "CIFS VFS: Unexpected SMB signature" in dmesg which i
remember also from earlier kernels but never experienced any problems with this
patch makes sense and sounds like a must fix.
Let me know if any negative feedback on it.
patch is great :) everything works fine again as it should. after an echo request follows a correct echo response and no permission denied errors anymore
thanks very much for the fast fix!
hmm.. i guess i posted my last message too early :/ i've tested with one share before:
mount ... && ls /share && sleep 75 && ls /share
worked. sniffed correct responses.
now i've mounted all of my shares and not only one, left them alone for a while and then i couldn't access the folder anymore and everything hangs. it took about 1-2 minutes to umount.
i got this in dmesg:
CIFS VFS: No task to wake, unknown frame received! NumMids 0
Received Data is: : dump of 37 bytes of data at 0xe9042ac0
00000075 424d53ff 00000072 c0018000 u . . . ÿ S M B r . . . . . . À
00000000 00000000 00000000 13140000 . . . . . . . . . . . . . . . .
001e0000 0f000211 . . . . .
CIFS VFS: Cmd: 114 Err: 0x0 Flags: 0x80 Flgs2: 0xc001 Mid: 30 Pid: 4884
CIFS VFS: smb buf e9042ac0 len 121
CIFS VFS: Dump pending requests:
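The dumped words above can be decoded by hand to cross-check the "Cmd/Flags/Mid/Pid" line. A minimal sketch, assuming the words were printed on a little-endian machine and that the 4-byte length prefix is stored in host byte order in this buffer (which matches "len 121" = 117 + 4). The helper names are hypothetical.

```python
import struct

# 32-bit words exactly as printed in the dump (little-endian host assumed).
WORDS = [
    0x00000075, 0x424D53FF, 0x00000072, 0xC0018000,
    0x00000000, 0x00000000, 0x00000000, 0x13140000,
    0x001E0000, 0x0F000211,
]
DUMP_LEN = 37  # "dump of 37 bytes of data"


def words_to_bytes(words, length):
    """Turn the printed words back into the byte sequence in memory."""
    raw = b"".join(struct.pack("<I", w) for w in words)
    return raw[:length]


def parse_smb_header(frame: bytes) -> dict:
    """Decode the length prefix, the 32-byte SMB header and the word count."""
    # Assumption: the length prefix is in host (little-endian) order here.
    (length,) = struct.unpack("<I", frame[:4])
    (magic, command, status, flags, flags2, pid_high, signature,
     _reserved, tid, pid, uid, mid) = struct.unpack(
        "<4sBIBHH8sHHHHH", frame[4:36])
    return {
        "length": length, "magic": magic, "command": command,
        "status": status, "flags": flags, "flags2": flags2,
        "tid": tid, "pid": pid, "uid": uid, "mid": mid,
        "word_count": frame[36],
    }


if __name__ == "__main__":
    hdr = parse_smb_header(words_to_bytes(WORDS, DUMP_LEN))
    for key, value in hdr.items():
        print(key, value)
```

Decoded this way, the frame has command 0x72 (114, SMB negotiate) with the response bit 0x80 set in Flags, Flags2 0xc001, Mid 30 and Pid 4884, which agrees with the values the client logged.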
i'm trying to reproduce and sniff the session again.
1) the patch works for my posted "echo request" problem, hence this bug can be closed.
2) the reason why the connection got torn down and access was not possible anymore was that i'm also using virtualbox (v4.0.2) with NAT network adapter and a windows xp image. i've connected the shares in XP but not supplied any passwords. hence when windows starts it tries to mount but initially fails until one supplies the passwords in win explorer.
steps to reproduce:
1) mount shares in linux, idle
2) start windows xp in virtualbox with NAT, login - win tries to mount associated shares during login, but no passwords supplied => fails
3) idle on linux, don't access the shares for 60-70 seconds
4) echo request is sent, but no response from server anymore, as connection has been torn down.
to sum the network messages up:
linux has properly setup everything, echo request + echo response have been sent
windows starts: negotiation, session setup, failing trans2 requests for shares as no pwd supplied, logoff, tree disconnect
TCP teardown, FIN+ACK, ACK
linux sends echo request, no reply from server anymore. after some time TCP RST, then new TCP handshake, again echo requests, but no reply from server.
my workaround for this will just be using bridged network adapters :)
if you think this is a client-side CIFS problem, i can supply some pcap files in private, but i have to remove some packets again with the pwd hashes etc. which could take a while :)
Thanks for testing it. Patch posted to email@example.com. I expect that Steve will commit it soon.
As far as the second problem...is this something that worked in 2.6.37?
The linux and windows clients wouldn't be sharing a socket in this case, so I'm a little unclear as to why the server would stop responding on the linux client's socket just because the windows client tore down its connection?
Does the linux client eventually shut down the connection when the server stops responding to echoes?
Also, what sort of server is this? Most servers will close down the connection when the signatures are wrong. This one just sent back an error, so I'm a little curious...
i've rebooted into 2.6.37-gentoo (it is nearly mainline (only iwlwifi and bootsplash patches)) and i could _not_ reproduce the second problem in this kernel version.
so it only seems to affect 2.6.38-rcX. i'm wondering too as to why the different TCP connections with different source ports suddenly don't work anymore.
i've waited only for a few minutes (2-3 echo requests) and the client doesn't shut down within this timespan. unmounting the shares under linux takes a long time (even with umount -l).
my first post mentions "windows server 2003, r2 3790 SP2". what other information would you need?
To give you some background...
The timeout behavior has changed significantly in 2.6.38. We used to time out individual calls to the server, but that has always been very problematic. It's easily possible for the server to take several minutes on certain calls, and we'd usually time those out prematurely. That leads to data corruption issues and application errors.
The problem is that CIFS is not NFS and treating individual calls that way doesn't really work the same way. With 2.6.38, we've moved to a scheme to time out the socket as a whole instead of individual calls. In order to tell if the server is still responding, we do an SMB echo (sort of like a ping) to the server. The default is to attempt this 5 times before giving up and trying to reconnect the socket. So, 2-3 requests won't be enough to make it time out. This is tunable however via the "echo_retries" module parameter.
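To illustrate why 2-3 unanswered echoes are not enough, here is a toy model of the socket-level timeout described above. It is a sketch under stated assumptions, not the client code: it assumes one echo roughly every 60 seconds (based on the idle times reported in this bug) and that the socket is only considered dead after echo_retries intervals with nothing received. The class and constant names are hypothetical.

```python
from dataclasses import dataclass

# Assumed echo interval in seconds, taken from the ~60 s idle time
# reported in this bug; the real value may differ.
ECHO_INTERVAL = 60


@dataclass
class ServerState:
    echo_retries: int = 5        # default number of attempts mentioned above
    last_response: float = 0.0   # time the server was last heard from

    def on_response(self, now: float) -> None:
        """Any reply from the server resets the timer."""
        self.last_response = now

    def unresponsive(self, now: float) -> bool:
        """True once echo_retries intervals have passed with no reply."""
        return now - self.last_response > self.echo_retries * ECHO_INTERVAL


if __name__ == "__main__":
    server = ServerState()
    print(server.unresponsive(180))  # after about 3 echo intervals
    print(server.unresponsive(301))  # just past 5 echo intervals
```

Under these assumptions the client would keep waiting for about five minutes with the default setting, and lowering echo_retries shortens that proportionally.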
I agree that unmounting takes too long when the server is unresponsive, but fixing that is non-trivial. With the new timeout scheme though, we're in a far better position to eventually do so.
Now, all that said, none of that should change how the sockets get set up, only how and when the client decides to time out when the server is unresponsive.
What may be best is to open a new bug with this info since it sounds like a separate problem from the original one.
Fixed by commit e3f0dadb2b44746f6223ce4560406d19e02fb1cc .