Bug 8747 - MSG_ERRQUEUE messages do not pass to connected raw sockets
Summary: MSG_ERRQUEUE messages do not pass to connected raw sockets
Status: RESOLVED DUPLICATE of bug 10437
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV6
Hardware: All Linux
Importance: P1 normal
Assignee: Hideaki YOSHIFUJI
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-13 10:01 UTC by Dmitry Butskoy
Modified: 2008-04-11 07:51 UTC (History)
CC List: 0 users

See Also:
Kernel Version: 2.6.23
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Proposed patch which should fix this issue (449 bytes, patch)
2007-07-13 10:04 UTC, Dmitry Butskoy
An additional patch needed to fix the issue completely. (515 bytes, patch)
2008-04-03 06:13 UTC, Dmitry Butskoy

Description Dmitry Butskoy 2007-07-13 10:01:14 UTC
Problem Description:

This concerns the ability to receive MSG_ERRQUEUE messages on udp and raw sockets, both connected and unconnected.

There is a small typo in the net/ipv6/icmp.c code which prevents such messages from being delivered to the errqueue of the corresponding raw socket when the socket is CONNECTED. The typo swaps the local and remote addresses.

Consider the __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is looked up the usual way, the call is something like:

sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);

where "daddr" is the destination address of the incoming packet (IOW our local address) and "saddr" is the source address of the incoming packet (the remote end).

But when the raw socket is looked up for an icmp error report, in net/ipv6/icmp.c:icmpv6_notify(), daddr/saddr are obtained from the echoed fragment of the "bad" packet, i.e. "daddr" is the original destination address of that packet (the remote end) and "saddr" is our local address. Hence icmpv6_notify() must pass "saddr, daddr" as the arguments, not "daddr, saddr" ...
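
To illustrate the argument order, here is a simplified user-space model of the address checks. It is only a sketch: struct toy_raw_sock, toy_addr() and toy_match() are made-up names, and the real __raw_v6_lookup() also checks the protocol number and the interface.

```c
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the address state of a raw socket. */
struct toy_raw_sock {
    struct in6_addr local;   /* bound local address, all-zero if unbound */
    struct in6_addr remote;  /* connected peer, all-zero if unconnected */
};

/* Parse a textual IPv6 address; the result stays all-zero if parsing fails. */
static struct in6_addr toy_addr(const char *text)
{
    struct in6_addr a;

    memset(&a, 0, sizeof a);
    inet_pton(AF_INET6, text, &a);
    return a;
}

/*
 * Same argument order as the lookup call above: the local address comes
 * first, the remote address second. An unset socket address matches anything,
 * which is why an unconnected socket is not affected by swapped arguments.
 */
static bool toy_match(const struct toy_raw_sock *sk,
                      const struct in6_addr *loc_addr,
                      const struct in6_addr *rmt_addr)
{
    if (!IN6_IS_ADDR_UNSPECIFIED(&sk->remote) &&
        memcmp(&sk->remote, rmt_addr, sizeof *rmt_addr) != 0)
        return false;
    if (!IN6_IS_ADDR_UNSPECIFIED(&sk->local) &&
        memcmp(&sk->local, loc_addr, sizeof *loc_addr) != 0)
        return false;
    return true;
}
```

With the echoed header of our own packet, "saddr" is the local side, so toy_match(sk, saddr, daddr) matches a connected socket while toy_match(sk, daddr, saddr) does not.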


Steps to reproduce:

Create a raw socket, connect it to an address, and cause an error situation: e.g. set ttl=1 where the remote address is more than 1 hop away.
Set IPV6_RECVERR.
Then send something and wait for the error (e.g. poll() with POLLERR|POLLIN). You should receive a "time exceeded" icmp message (because of "ttl=1"), but the socket does not receive it.

If you do not connect your raw socket, you will receive the MSG_ERRQUEUE message successfully. (The reason is that for an unconnected socket there are no actual checks of the local/remote addresses.)
Comment 1 Dmitry Butskoy 2007-07-13 10:04:49 UTC
Created attachment 12027
Proposed patch which should fix this issue
Comment 2 Andrew Morton 2007-07-13 10:39:31 UTC
Subject: Re: [Bugme-new]  New: MSG_ERRQUEUE messages do not pass
 to connected raw sockets

On Fri, 13 Jul 2007 09:56:20 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8747
> 
>            Summary: MSG_ERRQUEUE messages do not pass to connected raw
>                     sockets
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.22
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV6
>         AssignedTo: yoshfuji@linux-ipv6.org
>         ReportedBy: dmitry@butskoy.name
> 
> 
> Problem Description:
> 
> It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp
> and raw sockets, both connected and unconnected.
> 
> There is a little typo in net/ipv6/icmp.c code, which prevents such messages
> to be delivered to the errqueue of the correspond raw socket, when the socket
> is CONNECTED. The typo is due to swap of local/remote addresses.
> 
> Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is
> looked up usual way, it is something like:
> 
> sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);
> 
> where "daddr" is a destination address of the incoming packet (IOW our local
> address), "saddr" is a source address of the incoming packet (the remote
> end).
> 
> But when the raw socket is looked up for some icmp error report, in
> net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed
> fragment of the "bad" packet, i.e. "daddr" is the original destination
> address of that packet, "saddr" is our local address. Hence, for
> icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr" ...
> 
> 
> Steps to reproduce:
> 
> Create some raw socket, connect it to an address, and cause some error
> situation: f.e. set ttl=1 where the remote address is more than 1 hop to
> reach.
> Set IPV6_RECVERR .
> Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN).
> You should receive "time exceeded" icmp message (because of "ttl=1"), but the
> socket do not receive it.
> 
> If you do not connect your raw socket, you will receive MSG_ERRQUEUE 
> successfully. (The reason is that for unconnected socket there are no actual
> checks for local/remote addresses).
> 
> 

This bugzilla report includes a patch, which is below.

Dmitry, please don't send patches via bugzilla: we very much prefer that
they be emailed directly as per Documentation/SubmittingPatches, thanks.

--- net/ipv6/icmp.c	2007-02-04 21:44:54.000000000 +0300
+++ net/ipv6/icmp.c.OK	2007-07-13 20:57:37.000000000 +0400
@@ -600,7 +600,7 @@
 
 	read_lock(&raw_v6_lock);
 	if ((sk = sk_head(&raw_v6_htable[hash])) != NULL) {
-		while((sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr,
+		while((sk = __raw_v6_lookup(sk, nexthdr, saddr, daddr,
 					    IP6CB(skb)->iif))) {
 			rawv6_err(sk, skb, NULL, type, code, inner_offset, info);
 			sk = sk_next(sk);
Comment 3 Andrew Morton 2007-07-13 11:29:10 UTC
Added to -mm as net-msg_errqueue-messages-do-not-pass-to-connected-raw-sockets.patch
Comment 4 Dmitry Butskoy 2008-04-02 08:53:04 UTC
The applied changes seem to be not enough.

Besides the daddr/saddr swapping (which is now fixed), the saddr and daddr themselves are wrong. It seems that instead of being obtained from the *echoed* ipv6 header in the icmp body, both addresses are obtained directly from the ipv6 header of the icmp packet itself.

How things should be done can be seen in the similar ipv4 code, as well as in __udp6_lib_err() ...
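
As a rough user-space illustration of where the addresses should come from (this is not the kernel code, and echoed_addrs() is a hypothetical helper that assumes neither ipv6 header is followed by extension headers):

```c
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <netinet/icmp6.h>

/*
 * pkt points at the outer ipv6 header of a received icmpv6 error message.
 * Layout assumed here: outer ipv6 header (40 bytes), icmpv6 header (8 bytes),
 * then the echoed ipv6 header of the packet that caused the error.
 * Returns 0 on success, -1 if the buffer is too short.
 */
static int echoed_addrs(const unsigned char *pkt, size_t len,
                        struct in6_addr *local, struct in6_addr *remote)
{
    const size_t inner_off = sizeof(struct ip6_hdr) + sizeof(struct icmp6_hdr);
    struct ip6_hdr inner;

    if (len < inner_off + sizeof inner)
        return -1;

    memcpy(&inner, pkt + inner_off, sizeof inner);
    *local = inner.ip6_src;   /* we sent the "bad" packet, so its source is us */
    *remote = inner.ip6_dst;  /* its destination is the peer the socket is connected to */
    return 0;
}
```

The source address of the outer header is typically the router that generated the error, which is why it cannot be compared with the connected peer of the socket.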


I SET THE "HIGH" PRIO FOR THIS, because:

I'm the author of a new implementation of traceroute(8) for Linux (http://traceroute.sourceforge.net). This implementation is now included in most distros (Fedora/RedHat, Gentoo, Ubuntu etc.).

Previously, I assumed that any kernel >= 2.6.22.2 (when the daddr/saddr swap issue was fixed) would be OK, and implemented the corresponding checks in my code. But since 2.6.22.2 things are still not OK! Hence my assumption that the issue "will be fixed soon" was wrong...

For proper checking, please run my traceroute as
"traceroute -6 -I" or "traceroute -6 -T". In both cases (icmp and tcp tracing) connected raw sockets are used (certainly for kernels >= 2.6.22.2).

Additionally, please remove the "Pretty useless feature?" comment from the net/ipv4/icmp.c:icmp_unreach() code, as it is no longer useless... :)


Now, for new kernels, including >= 2.6.23, I assume the issue is gone, but that is not true! Hence, people who do "traceroute -I" or "traceroute -T" (icmp and tcp) for ipv6, 
Comment 5 Dmitry Butskoy 2008-04-03 06:13:09 UTC
Created attachment 15593
An additional patch needed to fix the issue completely.
Comment 6 Dmitry Butskoy 2008-04-10 05:54:11 UTC

*** This bug has been marked as a duplicate of bug 10437 ***
