Bug 216889

Summary: CIFS 1.0 mount succeeds but any attempt to read from / write to the share fails
Product: File System
Reporter: Davyd McColl (davydm)
Component: CIFS
Assignee: fs_cifs (fs_cifs)
Status: REOPENED
Severity: normal
CC: alex1970, pc, smfrench
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 6.0.16
Subsystem:
Regression: No
Bisected commit-id:
Attachments: trace.pcap, trace.log

Description Davyd McColl 2023-01-05 16:58:59 UTC
Symptoms: CIFS 1.0 mount command works - the share mounts and I can (sometimes) list contents. However, I'm unable to read or write files - any attempt to do so either stalls (looks like file buffers fill, reach saturation and aren't drained) or, in the case of `touch`, I get a segfault.

I first noticed this in a 6.1.0 kernel (and every subsequent 6.1.x), which I've masked on my gentoo system. 6.0.15 works fine - 6.0.16 doesn't.

I'm not sure if this is perhaps related to work around https://bugzilla.kernel.org/show_bug.cgi?id=215375 - the very reason I've moved on to a 6.x kernel was because that issue was fixed in earlier 6.x kernels and the patch from https://bugs.gentoo.org/821895 no longer applies cleanly.

I don't see anything interesting in logs - dmesg, /var/log/messages, /var/log/syslog. If there's more information I can furnish, please guide me.
Comment 1 Davyd McColl 2023-01-05 17:00:25 UTC
btw, `mount` shows the following for the share:

//mede8er/mede8er on /mnt/mede8er-smb type cifs (rw,nosuid,nodev,relatime,vers=1.0,sec=none,cache=strict,uid=1000,noforceuid,gid=1000,noforcegid,addr=192.168.50.105,iocharset=utf8,soft,unix,posixpaths,serverino,mapposix,nobrl,acl,rsize=61440,wsize=65536,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=5)

my fstab entry is

//mede8er/mede8er /mnt/mede8er-smb cifs noauto,guest,users,uid=daf,gid=daf,iocharset=utf8,vers=1.0,nobrl,guest,sec=none 0 0
Comment 2 Davyd McColl 2023-01-05 17:16:45 UTC
sorry, I didn't find this when I originally searched, but it's probably related: https://bugzilla.kernel.org/show_bug.cgi?id=216881

I've also just tested 6.0.17, same issue. So the problem definitely came in with 6.0.16, as far as I can tell.
Comment 3 Paulo Alcantara 2023-01-05 20:34:38 UTC
Unfortunately I couldn't reproduce it in v6.0.{15,16,17} from stable
tree[1] when mounting SMB1 share (plus UNIX extensions) against Samba
4.15.8.

The cifs commits that went in v6.0.16:

	e4538d07e959 cifs: fix uninitialised var in smb2_compound_op()
	7e02dc7d2ee8 cifs: fix use-after-free on the link name
	b0a1236fe0fd cifs: fix memory leaks in session setup
	4d99a4c67481 cifs: Fix xid leak in cifs_get_file_info_unix()
	983ec6379b9b cifs: fix double-fault crash during ntlmssp
	a13e51760703 cifs: fix oops during encryption
	2d046892a493 cifs: improve symlink handling for smb2+
	a0db9c98d0d2 cifs: replace kfree() with kfree_sensitive() for sensitive data

So, I'd guess you are missing this:

	9ee2afe5207b cifs: prevent copying past input buffer boundaries
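
One way to check whether a given tree already carries that fix is to search the history for the commit's subject line, since backports in the stable tree generally get a different commit id than the mainline commit. A minimal sketch, assuming a local git checkout; `has_fix` is a hypothetical helper name and the checkout path in the usage lines is a placeholder:

```shell
has_fix() {
    # $1: path to the kernel git checkout
    # $2: revision to search from (e.g. a tag such as v6.0.16)
    # $3: text to look for, matched as a fixed string anywhere in the
    #     commit message (the subject line is the usual choice)
    [ -n "$(git -C "$1" log --oneline --fixed-strings --grep="$3" "$2" -- 2>/dev/null)" ]
}

# Usage:
#   has_fix /path/to/linux v6.0.16 \
#       'cifs: prevent copying past input buffer boundaries' \
#       && echo present || echo missing
```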

Could you share the following

- SMB server and version
- Server settings (e.g. provide smb.conf for Samba)

Are you trying it from stable tree[1]?  If not, please try it instead.
Also, check if mainline kernel[2] works as well.

Provide verbose logs and network traces:

	# umount all existing shares
	dmesg --clear
	echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
	echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
	echo 1 > /proc/fs/cifs/cifsFYI
	echo 1 > /sys/module/dns_resolver/parameters/debug
	tcpdump -s 0 -w trace.pcap port 445 & pid=$!
	sleep 3
	mount.cifs //mede8er/mede8er /mnt/mede8er-smb -o ...
	# reproduce the crash under /mnt/mede8er-smb...
	sleep 3
	kill $pid
	dmesg > trace.log
	cat /proc/fs/cifs/dfscache >> trace.log

and then send trace.pcap and trace.log files.  If you weren't able to
provide trace.pcap due to crash, it's OK, provide at least the dmesg
log.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
[2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git
Comment 4 Davyd McColl 2023-01-06 09:24:57 UTC
Created attachment 303533 [details]
trace.pcap
Comment 5 Davyd McColl 2023-01-06 09:25:12 UTC
Created attachment 303534 [details]
trace.log
Comment 6 Davyd McColl 2023-01-06 09:30:33 UTC
Requested files attached. As for the source server, it's a Mede8er 600X3D with the latest (though quite old, I'll admit) firmware from here: https://www.mede8erforum.com/index.php/board,127.0.html

Whilst I've tried unpacking the firmware from there, I haven't had success with mounting or unsquashing the squashfs images contained therein (where I'd expect to find an ancient Samba binary, since this is a little Linux box with quite old software).

Please let me know if the provided traces are helpful enough to continue; I can perform a bisect against mainline again, as I did with my prior cifs1.0 report, but, as I'm sure you know, it takes quite a bit of time.
Comment 7 Paulo Alcantara 2023-01-06 15:57:44 UTC
Hi Davyd,

Thanks for the provided logs and trace.

In dmesg, we can see that we are passing a bogus address (%rdi) to
kstrdup() in cifs_new_fileinfo()

	...
	[  222.706348] RDX: 0000000000000040 RSI: 0000000000000cc0 RDI: 01d92125bea2ff00

and if we look at the network trace, particularly packet #38, that
%rdi value is coming from FILE_ALL_INFO::CreationTime in the response,
which is obviously wrong.
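
As a quick sanity check, the %rdi value can be decoded as an SMB timestamp (100 ns ticks since 1601-01-01 UTC). This is only a sketch, assuming a shell with 64-bit arithmetic and GNU date; the helper name `filetime_to_utc` is made up for illustration:

```shell
# Convert a Windows/SMB FILETIME (100 ns ticks since 1601-01-01 UTC)
# to a human-readable UTC date. Hypothetical helper, not part of any tool.
filetime_to_utc() {
    ticks=$(( $1 ))
    # 11644473600 = seconds between 1601-01-01 and 1970-01-01
    secs=$(( ticks / 10000000 - 11644473600 ))
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S'
}

# The RDI value from the dmesg line above
filetime_to_utc 0x01d92125bea2ff00
```

The result lands in early January 2023, which is consistent with the value being a file timestamp rather than a kernel pointer.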

So, the problem seems to be in cifs_open_file() where we assign @fi to
@buf which is wrong as @buf must hold a pointer to a
cifs_open_info_data structure passed by cifs_nt_open().

That being said, could you please try this patch[1] and see if it
fixes your issue?

Please make sure that you also have below fix in your kernel

	9ee2afe5207b ("cifs: prevent copying past input buffer boundaries")

[1] https://pc.cjr.nz/smb1.diff
Comment 8 Davyd McColl 2023-01-06 17:48:43 UTC
Applying 9ee2afe5207b from mainline and then your supplied smb1.diff, I can run the same sequence as you've asked for above and the touch command works fine. To double-check, I've also tried "echo bar >> /mnt/mede8er-smb/foo" and been able to re-read the file. So this looks like a win to me (:
Comment 9 Paulo Alcantara 2023-01-06 18:15:35 UTC
Thanks for quickly testing it.

Let me know if you find any other issues.  I'm still waiting for
feedback on [1], where the same patch should also help.

Once confirmed it works, I'll post the patch to mailing list and then
update bug accordingly.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216881
Comment 10 careca 2023-01-07 08:15:51 UTC
I can confirm severe issues on CIFS-mount (samba) after upgrade from 6.0.15 to 6.0.16.

In my case, I am not able to log in to GNOME. System stalls, and I cannot access any TTY.

Removing the CIFS mount entry from fstab allows me to log in successfully on 6.0.16.

I did not test mounting CIFS manually, but I expect similar behavior...

If it may be helpful, I can try delivering more data, please tell me how (step by step).

regards
Comment 11 Paulo Alcantara 2023-01-09 17:37:05 UTC
FYI, fixes[1] sent upstream already.

[1] https://lore.kernel.org/linux-cifs/20230107200134.4822-1-pc@cjr.nz/T/#t
Comment 12 Davyd McColl 2023-01-14 07:33:39 UTC
After some testing, I can confirm that this still works - but 6.0.16 transfers files about 1/3 as fast as 6.0.15. There's some other new regression in there.

For context, I'm transferring files across a wifi network that isn't the greatest, but alternating boots between 6.0.16 and 6.0.15, I find that I can get > 2mb/s on 6.0.15 and I get 400k-700k/s on 6.0.16.

I hadn't noticed because I sync files overnight, but had to manually sync today and was appalled at the slow rate, rebooted everything, and then did some testing between kernel versions, showing 6.0.16 with this patch to be the culprit.
Comment 13 Davyd McColl 2023-01-14 07:34:51 UTC
For context, I'm using my own software (https://github.com/fluffynuts/bitsplat/) to copy files across the network. In an effort to provide more consistent / accurate estimates of delivery time, this software performs fsyncs after each block.
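
A rough stand-in for this access pattern, for anyone reproducing without bitsplat, is GNU dd with `oflag=dsync`, which makes every block-sized write synchronous. This only approximates the behavior described above (synchronous writes rather than explicit fsync calls); the 1M block size and the helper name `synced_copy` are assumptions:

```shell
synced_copy() {
    # $1: source file, $2: destination file (e.g. on the CIFS mount)
    # oflag=dsync: each write must complete on the target before the next
    # bs=1M: assumed block size, not taken from bitsplat
    dd if="$1" of="$2" bs=1M oflag=dsync status=none
}

# Usage:
#   synced_copy ./some-file /mnt/mede8er-smb/some-file
```

Dropping `status=none` makes dd report the achieved throughput, which gives a simple number to compare between kernel versions.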
Comment 14 Paulo Alcantara 2023-01-14 16:57:21 UTC
What about latest 6.1.y and mainline kernels?

If the issue still persists, then provide the following information from 6.0.15, 6.1.y and mainline kernels by performing such file transfer:

  - verbose logs and network traces (as in comment #3)
  - output of /proc/fs/cifs/Stats
Comment 15 Steve French 2023-01-14 17:47:37 UTC
As Paulo noted, something as simple as comparing the contents of /proc/fs/cifs/Stats (the number of reads, writes, opens, closes, query infos etc.) with and without the patch can tell us if something unexpected changed.  It can also be helpful to see if any reconnects occurred.  Then looking at the logs can give us more information.
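
A minimal sketch of such a comparison, snapshotting the counters before and after one transfer and diffing them. The helper names are hypothetical, and `STATS_FILE` is overridable only so the sketch can be tried without a CIFS mount:

```shell
# Where the CIFS client exposes its counters
STATS_FILE=${STATS_FILE:-/proc/fs/cifs/Stats}

snapshot_stats() {
    # $1: file to save the snapshot to
    cat "$STATS_FILE" > "$1"
}

compare_stats() {
    # $1, $2: snapshots taken before and after the transfer
    # diff exits non-zero when the files differ, which is expected here
    diff -u "$1" "$2" || true
}

# Usage (with the share mounted):
#   snapshot_stats stats.before
#   ... run the file transfer ...
#   snapshot_stats stats.after
#   compare_stats stats.before stats.after
```

Running the same transfer on each kernel and comparing the resulting diffs should show whether one kernel issues noticeably more operations or reconnects than the other.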
Comment 16 Davyd McColl 2023-01-15 14:30:31 UTC
I'm happy to say that I'm not seeing the speed issues I mentioned after a recent reboot into 6.0.16 and a test sync. The device is only on slow wi-fi, and sometimes speeds do dip, but what convinced me that something was awry is that last time I rebooted 4 times (iirc) to observe and test 6.0.15 vs the patched 6.0.16. Perhaps it was just unlucky timing. I'll keep an eye on it though, and report back if I see the same behavior - but right now, I'm getting a solid 1.8-2.4mb/s to that player -_-
Comment 17 Davyd McColl 2023-01-29 06:36:57 UTC
It looks like this patch is now in 6.1.8? Certainly I can mount, read from & write to the share on my media player without this patch applied. If so, thanks for getting it in there (:
Comment 18 Paulo Alcantara 2023-01-31 17:29:56 UTC
(In reply to Davyd McColl from comment #17)
> It looks like this patch is now in 6.1.8? Certainly I can mount, read from &
> write to the share on my media player without this patch applied. If so,
> thanks for getting it in there (:

Yep, those patches were backported by stable team.