Sending of data using linux function send fails if size is too large. glibc function is: ssize_t send(int sockfd, const void *buf, size_t len, int flags); Type of len is size_t, argument is stored in a kernel structure msgheader which contains an iovec. This iovec contains a size_t (64bit) length field, but in the linux kernel in function tcp_sendmsg the following lines while (--iovlen >= 0) { int seglen = iov->iov_len; unsigned char __user *from = iov->iov_base; convert the len to int (32 bit). Thus sending of 5 GB of data results in 1 GB sent (no problem), but sending of 4 GB results in 0 bytes sent. Workaround in userspace is easy (e.g. instead of len use len < 0x8000000 ? len : 0x7fffffff) but the kernel should handle this correctly.
Created attachment 27681 [details] send.cpp - show effect as size_t will be casted to int by kernel, calling send with (2^31 < (int)datalen < 2^32-1 does nothing Please find testcase attached. I tested on ubuntu 64 bit with the following result: Connect to server: 127.0.0.1:9000 size_t max: 18446744073709551615 Try to send 10, sent 12 bytes. Try to send 4195632, sent 7766 bytes. Try to send 2147483647, sent 10000 bytes. Try to send 2147483648, sent 0 bytes. Try to send 5494538240, sent 10000 bytes. Connection closed To reproduce: sudo apt-get install g++ nc -l -k 9000 g++ -o send ./send.cpp ./send
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Mon, 16 Aug 2010 10:01:09 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16603 > > Summary: send of data > 4 GB fails on 64 bit systems > Product: Networking > Version: 2.5 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: bono@onlinehome.de > Regression: No > > > Sending of data using linux function send fails if size is too large. glibc > function is: > > ssize_t send(int sockfd, const void *buf, size_t len, int flags); > > Type of len is size_t, argument is stored in a kernel structure msgheader > which > contains an iovec. This iovec contains a size_t (64bit) length field, but in > the linux kernel in function tcp_sendmsg the following lines > > while (--iovlen >= 0) { > int seglen = iov->iov_len; > unsigned char __user *from = iov->iov_base; > > > convert the len to int (32 bit). > > Thus sending of 5 GB of data results in 1 GB sent (no problem), but sending > of > 4 GB results in 0 bytes sent. > > Workaround in userspace is easy (e.g. instead of len use len < 0x8000000 ? > len > : 0x7fffffff) but the kernel should handle this correctly. > whoops. Yes, I think seglen should be size_t. Do you know if making that change fixes the bug?
From: Andrew Morton <akpm@linux-foundation.org> Date: Mon, 27 Sep 2010 16:15:40 -0700 > Yes, I think seglen should be size_t. Do you know if making that > change fixes the bug? Oh boy, what a rats nest. Just scanning generically I see that net/core/iovec.c:verify_iovec() has similar issues, it should probably use "long" instead of "int" because it's trying to prevent the return value being interpreted as an error code. So if you fix that and make it return "long" this leads to another set of problems, even if you fix that TCP bit sys_sendmsg() holds the total length in an 'int' too. So more stuff to fix. I'll try to do the whole audit, but no promises ;)
Ok, I suspect the following is enough to fix this specific bug report, TCP sending. However, as I stated in my previous reply there are creepy crawlies all over the place. For example, all of the routines in net/core/iovec.c (memcpy_toiovec, memcpy_toiovecend, memcpy_fromiovec, memcpy_fromiovecend, csum_partial_copy_fromiovecend) that cast using min_t() are currently casting "down" to "unsigned int". They should probably case "up" to "size_t". Otherwise 4GB iov_len's on 64-bit will be truncated to zero just as they do in the TCP path being fixed here. If, alterntatively, we want to try and take the size_t values all the way down the code paths to the individual copies, that is a huge undertaking. It's huge because we end up getting to the architecture specific csum_copy_*() routines, which all take 'int' as the length and many are written in assembler and would need audits and potentially changes before we can make the type 'long' or 'size_t'. Anyways, this part here is simple enough and I'll push it to Linus and -stable. -------------------- tcp: Fix >4GB writes on 64-bit. Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by: Olaf Bonorden <bono@onlinehome.de> Reported-by: Daniel Büse <dbuese@gmx.de> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/include/linux/socket.h b/include/linux/socket.h index a2fada9..a8f56e1 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata, int offset, unsigned int len, __wsum *csump); -extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode); +extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode); extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len); extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata, int offset, int len); diff --git a/net/core/iovec.c b/net/core/iovec.c index 1cd98df..e6b133b 100644 --- a/net/core/iovec.c +++ b/net/core/iovec.c @@ -35,9 +35,10 @@ * in any case. */ -int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode) +long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode) { - int size, err, ct; + int size, ct; + long err; if (m->msg_namelen) { if (mode == VERIFY_READ) { diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 95d75d4..f115ea6 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -943,7 +943,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, sg = sk->sk_route_caps & NETIF_F_SG; while (--iovlen >= 0) { - int seglen = iov->iov_len; + size_t seglen = iov->iov_len; unsigned char __user *from = iov->iov_base; iov++;
On Mon, 27 Sep 2010 20:24:25 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > Ok, I suspect the following is enough to fix this specific bug > report, TCP sending. > > However, as I stated in my previous reply there are creepy > crawlies all over the place. > > For example, all of the routines in net/core/iovec.c (memcpy_toiovec, > memcpy_toiovecend, memcpy_fromiovec, memcpy_fromiovecend, > csum_partial_copy_fromiovecend) that cast using min_t() are currently > casting "down" to "unsigned int". > > They should probably case "up" to "size_t". eep. A blanket suckyfix might be, at the syscall level: if (size > 4g) { do_it_in_4g_hunks(); do_the_last_bit(); } unless that would break some networking syscall->framesize guarantees or something? And such a "fix" would make it hard to test the real fix! I'm surprised that this issue hasn't come up before.
From: Andrew Morton <akpm@linux-foundation.org> Date: Mon, 27 Sep 2010 20:56:24 -0700 > A blanket suckyfix might be, at the syscall level: > > if (size > 4g) { > do_it_in_4g_hunks(); > do_the_last_bit(); > } > > unless that would break some networking syscall->framesize guarantees > or something? There is not a single length passed in, but a vector of them, that's what the iovec conveys. Even if you could, socket send calls have atomicity guarentees. For a datagram socket, for example, a single send call corresponds to one packet on the network. BTW, this aspect of datagram sockets is a part of the reason we don't see many reports about this stuff. :-) Only stream protocols can really take such enormous lengths, and the most popular (TCP) does much of the iovec handling by hand. We just need to fix our junk, and the {min,max}_t(size_t, ...) change I suggested is likely the easiest path.
verified with kernel 2.6.37-020637.201101050908 Repeated test (see comment #1): Connect to server: 127.0.0.1:9000 size_t max: 18446744073709551615 Try to send 10, sent 12 bytes. Try to send 4195632, sent 7766 bytes. Try to send 2147483647, sent 10000 bytes. Try to send 2147483648, sent 10000 bytes. Try to send 5494538240, sent 10000 bytes. Connection closed