Bug 16529 (l2tpv3)

Summary: xennet driver crashes when using with pseudowire aka l2tpv3
Product: Networking Reporter: Thomas Heil (heil)
Component: OtherAssignee: Arnaldo Carvalho de Melo (acme)
Status: RESOLVED CODE_FIX    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35 Subsystem:
Regression: No Bisected commit-id:

Description Thomas Heil 2010-08-06 12:46:15 UTC
I tried to use use the new l2tpv3 implementation on two xen domU's but one of them is crashing when the first l2tpv3 packet is received.

As a great man mentioned: 
--
guessing that eth_type_trans() has tried to pull an ethernet header from
the skb and has run off the end, which suggests an issue with the skb
that was passed up. Does the ethernet driver do proper alignment of data
in its skbs? Perhaps it doesn't allocate as much headroom as other
drivers? Perhaps the L2TP code assumes things about the skb that aren't
valid..
--
Here is a link for the setup
http://www.pastebin.org/445975
and here a link with more details about the crash http://pastebin.org/449221

According the the hints from James Chapmann we tried generic x86 with a different nic driver like e1000 and this works without any problem. 

Iam using OpenWrt to create the system images. To speed up the process i would prepare Xen images, make them available or whatever is wished because pseudowire is already ready in OpenWrt Trunk so that every man also with "price sensitivy" hardware can use it.
Comment 1 Thomas Heil 2010-08-09 16:58:37 UTC
Is there something I can do to speedup the process?
Comment 2 Andrew Morton 2010-08-25 22:31:46 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 6 Aug 2010 12:46:18 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16529
> 
>            Summary: xennet driver crashes when using with pseudowire aka
>                     l2tpv3
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.35
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: heil@terminal-consulting.de
>         Regression: No
> 
> 
> I tried to use use the new l2tpv3 implementation on two xen domU's but one of
> them is crashing when the first l2tpv3 packet is received.
> 
> As a great man mentioned: 
> --
> guessing that eth_type_trans() has tried to pull an ethernet header from
> the skb and has run off the end, which suggests an issue with the skb
> that was passed up. Does the ethernet driver do proper alignment of data
> in its skbs? Perhaps it doesn't allocate as much headroom as other
> drivers? Perhaps the L2TP code assumes things about the skb that aren't
> valid..
> --
> Here is a link for the setup
> http://www.pastebin.org/445975
> and here a link with more details about the crash http://pastebin.org/449221
> 
> According the the hints from James Chapmann we tried generic x86 with a
> different nic driver like e1000 and this works without any problem. 
> 
> Iam using OpenWrt to create the system images. To speed up the process i
> would
> prepare Xen images, make them available or whatever is wished because
> pseudowire is already ready in OpenWrt Trunk so that every man also with
> "price
> sensitivy" hardware can use it.
> 

Seems to be a problem with xennet afacit?
Comment 3 Jeremy Fitzhardinge 2010-08-25 23:51:27 UTC
 On 08/25/2010 03:31 PM, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Fri, 6 Aug 2010 12:46:18 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=16529
>>
>>            Summary: xennet driver crashes when using with pseudowire aka
>>                     l2tpv3
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 2.6.35
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>         AssignedTo: acme@ghostprotocols.net
>>         ReportedBy: heil@terminal-consulting.de
>>         Regression: No
>>
>>
>> I tried to use use the new l2tpv3 implementation on two xen domU's but one
>> of
>> them is crashing when the first l2tpv3 packet is received.
>>
>> As a great man mentioned: 
>> --
>> guessing that eth_type_trans() has tried to pull an ethernet header from
>> the skb and has run off the end, which suggests an issue with the skb
>> that was passed up. Does the ethernet driver do proper alignment of data
>> in its skbs? Perhaps it doesn't allocate as much headroom as other
>> drivers? Perhaps the L2TP code assumes things about the skb that aren't
>> valid..
>> --
>> Here is a link for the setup
>> http://www.pastebin.org/445975
>> and here a link with more details about the crash http://pastebin.org/449221

Please attach these to the bug or something; pastebin is not working for
me at the moment.

>> According the the hints from James Chapmann we tried generic x86 with a
>> different nic driver like e1000 and this works without any problem. 
>>
>> Iam using OpenWrt to create the system images. To speed up the process i
>> would
>> prepare Xen images, make them available or whatever is wished because
>> pseudowire is already ready in OpenWrt Trunk so that every man also with
>> "price
>> sensitivy" hardware can use it.
>>
> Seems to be a problem with xennet afacit?

Possibly.  What's a l2tpv3 and what does it do differently from other
protocols?

    J
Comment 4 Eric Dumazet 2010-08-26 07:23:20 UTC
Le jeudi 26 août 2010 à 08:10 +0100, Ian Campbell a écrit :
> On Wed, 2010-08-25 at 23:56 +0100, Jeremy Fitzhardinge wrote:
> 
> > >> Here is a link for the setup
> > >> http://www.pastebin.org/445975
> > >> and here a link with more details about the crash
> http://pastebin.org/449221
> > 
> > Please attach these to the bug or something; pastebin is not working for
> > me at the moment.
> 
> It's really slow for me but got there eventually. I've attached the
> files.
> 
> It's apparently hitting this BUG_ON in __skb_pull:
>         
>         static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned
>         int len)
>         {
>                 skb->len -= len;
>                 BUG_ON(skb->len < skb->data_len);
>                 return skb->data += len;
>         }
> 
> Ian.
> 

Thanks a lot.

I'll submit a patch to fix l2tp_eth_dev_recv() 

Its illegal to call dev_forward_skb() with a too small payload.
Comment 5 Eric Dumazet 2010-08-26 08:03:51 UTC
Here is the patch, could you test it please ?

Thanks !

[PATCH] l2tp: test for malicious frames in l2tp_eth_dev_recv()

close https://bugzilla.kernel.org/show_bug.cgi?id=16529

Before calling dev_forward_skb(), we should make sure skb contains at
least an ethernet header, even if length included in upper layer said
so.

Reported-by: Thomas Heil <heil@terminal-consulting.de>
Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/l2tp/l2tp_core.c |    2 +-
 net/l2tp/l2tp_eth.c  |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 58c6c4c..0687c5c 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -132,7 +132,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
 		printk("\n");
 	}
 
-	if (data_len < ETH_HLEN)
+	if (skb->len < ETH_HLEN)
 		goto error;
 
 	secpath_reset(skb);
Comment 6 Anonymous Emailer 2010-08-26 08:26:19 UTC
Reply-To: Ian.Campbell@eu.citrix.com

On Wed, 2010-08-25 at 23:56 +0100, Jeremy Fitzhardinge wrote:

> >> Here is a link for the setup
> >> http://www.pastebin.org/445975
> >> and here a link with more details about the crash
> http://pastebin.org/449221
> 
> Please attach these to the bug or something; pastebin is not working for
> me at the moment.

It's really slow for me but got there eventually. I've attached the
files.

It's apparently hitting this BUG_ON in __skb_pull:
        
        static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
        {
                skb->len -= len;
                BUG_ON(skb->len < skb->data_len);
                return skb->data += len;
        }

Ian.
Comment 7 Eric Dumazet 2010-08-26 08:35:26 UTC
Le jeudi 26 août 2010 à 09:14 +0100, Ian Campbell a écrit :
> On Thu, 2010-08-26 at 09:03 +0100, Eric Dumazet wrote:
> > Here is the patch, could you test it please ?
> > 
> > Thanks !
> > 
> > [PATCH] l2tp: test for malicious frames in l2tp_eth_dev_recv()
> > 
> > close https://bugzilla.kernel.org/show_bug.cgi?id=16529
> > 
> > Before calling dev_forward_skb(), we should make sure skb contains at
> > least an ethernet header, even if length included in upper layer said
> > so.
> 
> Does this imply that there is some problem with xen-netfront setting
> skb->len or skb->data_len or something incorrectly? It's not clear where
> data_len has come from in this context.

data_len is a 16bit field provided in a prior encapsulation header,
provided by user (untrusted source)

Some buggy or malicious software sent an invalid frame,


< encapsulation [len=1000] >  < 'runt' eth frame (len<14) > 

Another fix would be to change l2tp_recv_dequeue_skb(), and check

L2TP_SKB_CB(skb)->length against skb->len, before calling 

(*session->recv_skb)(session, skb, length);


I prefer the one liner patch I sent you, as a minimum fix.
Comment 8 Anonymous Emailer 2010-08-26 08:56:25 UTC
Reply-To: Ian.Campbell@eu.citrix.com

On Thu, 2010-08-26 at 09:34 +0100, Eric Dumazet wrote:
> Le jeudi 26 août 2010 à 09:14 +0100, Ian Campbell a écrit :
> > On Thu, 2010-08-26 at 09:03 +0100, Eric Dumazet wrote:
> > > Here is the patch, could you test it please ?
> > > 
> > > Thanks !
> > > 
> > > [PATCH] l2tp: test for malicious frames in l2tp_eth_dev_recv()
> > > 
> > > close https://bugzilla.kernel.org/show_bug.cgi?id=16529
> > > 
> > > Before calling dev_forward_skb(), we should make sure skb contains at
> > > least an ethernet header, even if length included in upper layer said
> > > so.
> > 
> > Does this imply that there is some problem with xen-netfront setting
> > skb->len or skb->data_len or something incorrectly? It's not clear where
> > data_len has come from in this context.
> 
> data_len is a 16bit field provided in a prior encapsulation header,
> provided by user (untrusted source)
> 
> Some buggy or malicious software sent an invalid frame,
> 
> 
> < encapsulation [len=1000] >  < 'runt' eth frame (len<14) > 
> 
> Another fix would be to change l2tp_recv_dequeue_skb(), and check
> 
> L2TP_SKB_CB(skb)->length against skb->len, before calling 
> 
> (*session->recv_skb)(session, skb, length);
> 
> 
> I prefer the one liner patch I sent you, as a minimum fix.

Thanks, I just wanted to be sure we weren't papering over a potential
issue in xen-netfront.

Ian.
Comment 9 Anonymous Emailer 2010-08-26 09:29:42 UTC
Reply-To: Ian.Campbell@eu.citrix.com

On Thu, 2010-08-26 at 09:03 +0100, Eric Dumazet wrote:
> Here is the patch, could you test it please ?
> 
> Thanks !
> 
> [PATCH] l2tp: test for malicious frames in l2tp_eth_dev_recv()
> 
> close https://bugzilla.kernel.org/show_bug.cgi?id=16529
> 
> Before calling dev_forward_skb(), we should make sure skb contains at
> least an ethernet header, even if length included in upper layer said
> so.

Does this imply that there is some problem with xen-netfront setting
skb->len or skb->data_len or something incorrectly? It's not clear where
data_len has come from in this context.

Ian.

> 
> Reported-by: Thomas Heil <heil@terminal-consulting.de>
> Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/l2tp/l2tp_core.c |    2 +-
>  net/l2tp/l2tp_eth.c  |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> index 58c6c4c..0687c5c 100644
> --- a/net/l2tp/l2tp_eth.c
> +++ b/net/l2tp/l2tp_eth.c
> @@ -132,7 +132,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session
> *session, struct sk_buff *skb,
>               printk("\n");
>       }
>  
> -     if (data_len < ETH_HLEN)
> +     if (skb->len < ETH_HLEN)
>               goto error;
>  
>       secpath_reset(skb);
> 
>
Comment 10 Eric Dumazet 2010-08-26 09:45:18 UTC
Le jeudi 26 août 2010 à 10:03 +0200, Eric Dumazet a écrit :
> Here is the patch, could you test it please ?
> 
> Thanks !
> 
> [PATCH] l2tp: test for malicious frames in l2tp_eth_dev_recv()
> 
> close https://bugzilla.kernel.org/show_bug.cgi?id=16529
> 
> Before calling dev_forward_skb(), we should make sure skb contains at
> least an ethernet header, even if length included in upper layer said
> so.
> 
> Reported-by: Thomas Heil <heil@terminal-consulting.de>
> Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/l2tp/l2tp_core.c |    2 +-
>  net/l2tp/l2tp_eth.c  |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> index 58c6c4c..0687c5c 100644
> --- a/net/l2tp/l2tp_eth.c
> +++ b/net/l2tp/l2tp_eth.c
> @@ -132,7 +132,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session
> *session, struct sk_buff *skb,
>               printk("\n");
>       }
>  
> -     if (data_len < ETH_HLEN)
> +     if (skb->len < ETH_HLEN)
>               goto error;
>  
>       secpath_reset(skb);
> 

Hmm, reading this code again, I suspect a much better fix is to make
sure 'ethernet header' is in skb head, not in a fragment.

Maybe frame is valid but only L2TP encapsulation in skb->header at this
point.

Thanks !

[PATCH] l2tp: test for ethernet header in l2tp_eth_dev_recv()

close https://bugzilla.kernel.org/show_bug.cgi?id=16529

Before calling dev_forward_skb(), we should make sure skb head contains
at least an ethernet header, even if length included in upper layer said
so. Use pskb_may_pull() to make sure this ethernet header is present in
skb head.

Reported-by: Thomas Heil <heil@terminal-consulting.de>
Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/l2tp/l2tp_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 58c6c4c..1ae6976 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -132,7 +132,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
 		printk("\n");
 	}
 
-	if (data_len < ETH_HLEN)
+	if (!pskb_may_pull(skb, sizeof(ETH_HLEN)))
 		goto error;
 
 	secpath_reset(skb);
Comment 11 David S. Miller 2010-08-26 21:26:11 UTC
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 26 Aug 2010 11:44:35 +0200

> [PATCH] l2tp: test for ethernet header in l2tp_eth_dev_recv()
> 
> close https://bugzilla.kernel.org/show_bug.cgi?id=16529
> 
> Before calling dev_forward_skb(), we should make sure skb head contains
> at least an ethernet header, even if length included in upper layer said
> so. Use pskb_may_pull() to make sure this ethernet header is present in
> skb head.
> 
> Reported-by: Thomas Heil <heil@terminal-consulting.de>
> Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.
Comment 12 Thomas Heil 2010-08-27 00:44:01 UTC
Ive create a patch
-
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -132,7 +132,10 @@ static void l2tp_eth_dev_recv(struct l2t
                printk("\n");
        }
 
-       if (data_len < ETH_HLEN)
+  if (skb->len < ETH_HLEN)
+    goto error;
+
+       if (!pskb_may_pull(skb, sizeof(ETH_HLEN)))
                goto error;
 
        secpath_reset(skb);
-
But the result is
root@l-01:/# ------------[ cut here ]------------
kernel BUG at include/linux/skbuff.h:1185!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in: nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables l2tp_ip l2tp_eth l2tp_netlink l2tp_core tunnel6 tunnel4 tun ppp_async ppp_generic slhc crc_ccitt xen_netfront evtchn xenfs ipv6

Pid: 0, comm: swapper Tainted: G        W   2.6.35.3 #1 /
EIP: 0061:[<c11d09a8>] EFLAGS: 00010287 CPU: 0
EIP is at eth_type_trans+0x28/0xe0
EAX: 0000004c EBX: dfb23c00 ECX: dfb23c00 EDX: df9ed800
ESI: df9ed800 EDI: dfb23c28 EBP: df9ed800 ESP: c192fb4c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process swapper (pid: 0, ti=c192f000 task=c12ee9a0 task.ti=c12e5000)
Stack:
 dfb23c00 df9ed800 c11bfb45 df8f8400 dfb23c00 e089f4b3 00001b62 00000000
<0> 00000066 dfb23c28 dfb23c00 e089f340 dfb23c28 df8f8400 e08a2c51 c1301fdc
<0> 00000003 c1301fd8 00000000 00000001 00000200 c103647e 00000000 df8f8444
Call Trace:
 [<c11bfb45>] ? dev_forward_skb+0x95/0xc0
 [<e089f4b3>] ? 0xe089f4b3
 [<e089f340>] ? 0xe089f340
 [<e08a2c51>] ? l2tp_recv_common+0x681/0x740 [l2tp_core]
 [<c103647e>] ? __wake_up+0x3e/0x60
 [<e08a30e0>] ? l2tp_udp_recv_core+0x300/0x530 [l2tp_core]
 [<e08a3bab>] ? l2tp_udp_encap_recv+0x6b/0xc0 [l2tp_core]
 [<c1027302>] ? pvclock_clocksource_read+0xe2/0x150
 [<c12023da>] ? udp_queue_rcv_skb+0xba/0x2a0
 [<c1202bfe>] ? __udp4_lib_rcv+0x56e/0x7f0
 [<c1246bbd>] ? _raw_spin_lock_irqsave+0x1d/0x30
Comment 13 Thomas Heil 2010-09-15 11:47:11 UTC
Let me add some important information
 
The crash occours only, if the two xen domu's are running under the same dom0. So this bug seems more related to the xen dom0 / host part.

I also tried other xen domu's like gentoo with 2.6.35. The result is the same. Its not OpenWrt related. Iam running Centos 5.5 64Bit, 2.6.18-194.11.3.el5xen, Xen version 3.1.2-194.11.3.el5.

So I'll try to open a bug at xen