Bug 16603 - send of data > 4 GB fails on 64 bit systems
Summary: send of data > 4 GB fails on 64 bit systems
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4 (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-16 10:01 UTC by Olaf Bonorden
Modified: 2012-10-30 16:02 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.37
Subsystem:
Regression: No
Bisected commit-id:


Attachments
send.cpp - show effect (2.08 KB, text/x-c++src)
2010-08-22 05:51 UTC, dbuese
Details

Description Olaf Bonorden 2010-08-16 10:01:06 UTC
Sending of data using linux function send fails if size is too large. glibc function is:

ssize_t send(int sockfd, const void *buf, size_t len, int flags);

Type of len is size_t, argument is stored in a kernel structure msgheader which contains an iovec. This iovec contains a size_t (64bit) length field, but in the linux kernel in function tcp_sendmsg the following lines

       while (--iovlen >= 0) {
                int seglen = iov->iov_len;
                unsigned char __user *from = iov->iov_base;


convert the len to int (32 bit).

Thus sending of 5 GB of data results in 1 GB sent (no problem), but sending of 4 GB results in 0 bytes sent.

Workaround in userspace is easy (e.g. instead of len use len < 0x8000000 ? len : 0x7fffffff) but the kernel should handle this correctly.
Comment 1 dbuese 2010-08-22 05:51:55 UTC
Created attachment 27681 [details]
send.cpp - show effect

as size_t will be casted to int by kernel, calling send with (2^31 < (int)datalen < 2^32-1 does nothing
Please find testcase attached. I tested on ubuntu 64 bit with the following result:
Connect to server: 127.0.0.1:9000
size_t max: 18446744073709551615
Try to send 10, sent 12 bytes.
Try to send 4195632, sent 7766 bytes.
Try to send 2147483647, sent 10000 bytes.
Try to send 2147483648, sent 0 bytes.
Try to send 5494538240, sent 10000 bytes.
Connection closed

To reproduce:
sudo apt-get install g++
nc -l -k 9000
g++ -o send ./send.cpp
./send
Comment 2 Andrew Morton 2010-09-27 23:16:38 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 16 Aug 2010 10:01:09 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16603
> 
>            Summary: send of data > 4 GB fails on 64 bit systems
>            Product: Networking
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: bono@onlinehome.de
>         Regression: No
> 
> 
> Sending of data using linux function send fails if size is too large. glibc
> function is:
> 
> ssize_t send(int sockfd, const void *buf, size_t len, int flags);
> 
> Type of len is size_t, argument is stored in a kernel structure msgheader
> which
> contains an iovec. This iovec contains a size_t (64bit) length field, but in
> the linux kernel in function tcp_sendmsg the following lines
> 
>        while (--iovlen >= 0) {
>                 int seglen = iov->iov_len;
>                 unsigned char __user *from = iov->iov_base;
> 
> 
> convert the len to int (32 bit).
> 
> Thus sending of 5 GB of data results in 1 GB sent (no problem), but sending
> of
> 4 GB results in 0 bytes sent.
> 
> Workaround in userspace is easy (e.g. instead of len use len < 0x8000000 ?
> len
> : 0x7fffffff) but the kernel should handle this correctly.
> 

whoops.

Yes, I think seglen should be size_t.  Do you know if making that
change fixes the bug?
Comment 3 David S. Miller 2010-09-28 02:41:35 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 27 Sep 2010 16:15:40 -0700

> Yes, I think seglen should be size_t.  Do you know if making that
> change fixes the bug?

Oh boy, what a rats nest.

Just scanning generically I see that net/core/iovec.c:verify_iovec()
has similar issues, it should probably use "long" instead of "int"
because it's trying to prevent the return value being interpreted
as an error code.

So if you fix that and make it return "long" this leads to another
set of problems, even if you fix that TCP bit sys_sendmsg() holds
the total length in an 'int' too.  So more stuff to fix.

I'll try to do the whole audit, but no promises ;)
Comment 4 David S. Miller 2010-09-28 03:24:28 UTC
Ok, I suspect the following is enough to fix this specific bug
report, TCP sending.

However, as I stated in my previous reply there are creepy
crawlies all over the place.

For example, all of the routines in net/core/iovec.c (memcpy_toiovec,
memcpy_toiovecend, memcpy_fromiovec, memcpy_fromiovecend,
csum_partial_copy_fromiovecend) that cast using min_t() are currently
casting "down" to "unsigned int".

They should probably case "up" to "size_t".

Otherwise 4GB iov_len's on 64-bit will be truncated to zero just
as they do in the TCP path being fixed here.

If, alterntatively, we want to try and take the size_t values all
the way down the code paths to the individual copies, that is
a huge undertaking.

It's huge because we end up getting to the architecture specific
csum_copy_*() routines, which all take 'int' as the length and
many are written in assembler and would need audits and potentially
changes before we can make the type 'long' or 'size_t'.

Anyways, this part here is simple enough and I'll push it to
Linus and -stable.

--------------------
tcp: Fix >4GB writes on 64-bit.

Fixes kernel bugzilla #16603

tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write
zero bytes, for example.

There is also the problem higher up of how verify_iovec() works.  It
wants to prevent the total length from looking like an error return
value.

However it does this using 'int', but syscalls return 'long' (and
thus signed 64-bit on 64-bit machines).  So it could trigger
false-positives on 64-bit as written.  So fix it to use 'long'.

Reported-by: Olaf Bonorden <bono@onlinehome.de>
Reported-by: Daniel Büse <dbuese@gmx.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/socket.h b/include/linux/socket.h
index a2fada9..a8f56e1 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -322,7 +322,7 @@ extern int csum_partial_copy_fromiovecend(unsigned char *kdata,
 					  int offset, 
 					  unsigned int len, __wsum *csump);
 
-extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
+extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
 extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
 extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
 			     int offset, int len);
diff --git a/net/core/iovec.c b/net/core/iovec.c
index 1cd98df..e6b133b 100644
--- a/net/core/iovec.c
+++ b/net/core/iovec.c
@@ -35,9 +35,10 @@
  *	in any case.
  */
 
-int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
+long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
 {
-	int size, err, ct;
+	int size, ct;
+	long err;
 
 	if (m->msg_namelen) {
 		if (mode == VERIFY_READ) {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 95d75d4..f115ea6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -943,7 +943,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	sg = sk->sk_route_caps & NETIF_F_SG;
 
 	while (--iovlen >= 0) {
-		int seglen = iov->iov_len;
+		size_t seglen = iov->iov_len;
 		unsigned char __user *from = iov->iov_base;
 
 		iov++;
Comment 5 Andrew Morton 2010-09-28 03:55:51 UTC
On Mon, 27 Sep 2010 20:24:25 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> Ok, I suspect the following is enough to fix this specific bug
> report, TCP sending.
> 
> However, as I stated in my previous reply there are creepy
> crawlies all over the place.
> 
> For example, all of the routines in net/core/iovec.c (memcpy_toiovec,
> memcpy_toiovecend, memcpy_fromiovec, memcpy_fromiovecend,
> csum_partial_copy_fromiovecend) that cast using min_t() are currently
> casting "down" to "unsigned int".
> 
> They should probably case "up" to "size_t".

eep.

A blanket suckyfix might be, at the syscall level:

	if (size > 4g) {
		do_it_in_4g_hunks();
		do_the_last_bit();
	}

unless that would break some networking syscall->framesize guarantees
or something?

And such a "fix" would make it hard to test the real fix!


I'm surprised that this issue hasn't come up before.
Comment 6 David S. Miller 2010-09-28 04:07:21 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 27 Sep 2010 20:56:24 -0700

> A blanket suckyfix might be, at the syscall level:
> 
>       if (size > 4g) {
>               do_it_in_4g_hunks();
>               do_the_last_bit();
>       }
> 
> unless that would break some networking syscall->framesize guarantees
> or something?

There is not a single length passed in, but a vector of them, that's
what the iovec conveys.

Even if you could, socket send calls have atomicity guarentees.  For a
datagram socket, for example, a single send call corresponds to one
packet on the network.

BTW, this aspect of datagram sockets is a part of the reason we don't
see many reports about this stuff. :-)  Only stream protocols can
really take such enormous lengths, and the most popular (TCP) does
much of the iovec handling by hand.

We just need to fix our junk, and the {min,max}_t(size_t, ...) change
I suggested is likely the easiest path.
Comment 7 dbuese 2011-01-17 22:25:23 UTC
verified with kernel 2.6.37-020637.201101050908
Repeated test (see comment #1):
Connect to server: 127.0.0.1:9000
size_t max: 18446744073709551615
Try to send 10, sent 12 bytes.
Try to send 4195632, sent 7766 bytes.
Try to send 2147483647, sent 10000 bytes.
Try to send 2147483648, sent 10000 bytes.
Try to send 5494538240, sent 10000 bytes.
Connection closed

Note You need to log in before you can comment on or make changes to this bug.