Bug 48701

Summary: Kernel OOPS when transmitting packets using OVS and DSA
Product: Networking Reporter: Barry G (barry)
Component: Other Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX    
Severity: normal CC: eric.dumazet, jesse, pshelar
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.6.1 Subsystem:
Regression: No Bisected commit-id:
Attachments: Kernel .config

Description Barry G 2012-10-11 16:13:39 UTC
When I compile the kernel (3.6.1 mainline) with Distributed Switch Architecture (DSA) and OVS (Open vSwitch) support, I can pass ping traffic fine, but larger packets cause an oops.

This is on a Freescale 83XX Processor:
# uname -a
Linux PPCTEST 3.6.1+ #1 PREEMPT Thu Oct 11 08:36:35 PDT 2012 ppc GNU/Linux
# Unable to handle kernel paging request for data at address 0x12000007
Faulting instruction address: 0xc008b5ec
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SEL B1305
Modules linked in:
NIP: c008b5ec LR: c008b5e0 CTR: c0041a68
REGS: dfffbb30 TRAP: 0300   Not tainted  (3.6.1+)
MSR: 00001032 <ME,IR,DR,RI>  CR: 22048024  XER: 00000000
DAR: 12000007, DSISR: 20000000
TASK = c0407368[0] 'swapper' THREAD: c0424000
GPR00: c008b5e0 dfffbbe0 c0407368 00100100 00000001 dfb91720 000000c3 c040ae80 
GPR08: 00000000 00000000 c0440d30 c07fd000 24048022 101810a0 c03a79b8 c03abacc 
GPR16: c042e474 00200000 dfffbc1c c034be84 c034be6c dfffbc18 00000000 00008100 
GPR24: 00000000 dfabc000 00000000 df9b8384 00009032 dfb91720 00100100 12000007 
NIP [c008b5ec] kfree+0x64/0xd4
LR [c008b5e0] kfree+0x58/0xd4
Call Trace:
[dfffbbe0] [c008b5e0] kfree+0x58/0xd4 (unreliable)
[dfffbbf8] [c0217300] skb_free_head+0x4c/0x5c
[dfffbc00] [c0217bbc] __kfree_skb+0x18/0xbc
[dfffbc10] [c0314c94] do_execute_actions+0x5f8/0x650
[dfffbc78] [c0316158] ovs_dp_process_received_packet+0xbc/0xe8
[dfffbd18] [c0319900] ovs_vport_receive+0x64/0x78
[dfffbd30] [c031a30c] netdev_frame_hook+0xa0/0xb8
[dfffbd48] [c021ff68] __netif_receive_skb+0x50c/0x6d0
[dfffbdb0] [c02f1e1c] dsa_rcv+0x228/0x24c
[dfffbde0] [c02200b8] __netif_receive_skb+0x65c/0x6d0
[dfffbe48] [c0221668] napi_skb_finish+0x38/0x88
[dfffbe58] [c01f5a14] gfar_clean_rx_ring+0x3a4/0x4b0
[dfffbeb8] [c01f87ac] gfar_poll+0x4c0/0x5dc
[dfffbf78] [c0225040] net_rx_action+0x74/0x180
[dfffbfa8] [c001d144] __do_softirq+0xac/0x13c
[dfffbff0] [c000bd20] call_do_softirq+0x14/0x24
[c0425e80] [c000500c] do_softirq+0x70/0xb0
[c0425ea0] [c001d220] irq_exit+0x4c/0x80
[c0425ea8] [c0005158] do_IRQ+0x10c/0x128
[c0425ed0] [c000dce0] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0x98/0xec
    LR = cpu_idle+0x98/0xec
[c0425f90] [c0007f08] cpu_idle+0x54/0xec (unreliable)
[c0425fa8] [c0003d58] rest_init+0x78/0xa0
[c0425fc0] [c03dc934] start_kernel+0x288/0x29c
[c0425ff0] [00003438] 0x3438
Instruction dump:
5529c9f4 7c09582e 7c695a14 70098000 41a20008 8063001c 83c30014 7cbd2b78 
480fec99 5463103a 7c63f214 83e30058 <813f0000> 801f0004 7f890040 41bc0010 
---[ end trace ffca2b34cb60156e ]---

Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 10 seconds..

This is 100% repeatable.  

I had to make some small kernel tweaks to get the DSA drivers to find the hardware, so it is possible I broke something, but that seems unlikely since my changes were not in this area at all.

Barry
Comment 1 Barry G 2012-10-11 16:14:19 UTC
Created attachment 82951 [details]
Kernel .config

This is the kernel config I used.
Comment 2 pshelar 2012-10-12 07:24:13 UTC
(In reply to comment #1)
> Created an attachment (id=82951) [details]
> Kernel .config
> 
> This is the kernel config I used.
I could not reproduce it on x86_64, and I do not have access to PPC or DSA hardware, so for now I am looking at the code. Meanwhile, can you try to reproduce it on PPC without DSA, so that we can narrow it down?
One more thing: are you using any OpenFlow rules for this network?
Comment 3 Barry G 2012-10-12 15:39:13 UTC

Thanks for the help.  

All tests were done using Open vSwitch 1.7.1 (except where the mainline kernel module is used, as noted).

For kernels older than 3.6, I am building the openvswitch module (datapath/linux/openvswitch.ko)
                                                                                                                                                                                      
Here is what I have learned so far:                                                                                                                                                   
Linux 3.0.28 works great (datapath_type=netdev or system (openvswitch.ko mod from 1.7.1))                                                                                             
Linux 3.2.31 works great (datapath_type=netdev or system (openvswitch.ko mod from 1.7.1))                                                                                             
Linux 3.6.1 works great (datapath_type=netdev)                                                                                                                                        
Linux 3.6.1 causes OOPS (datapath_type=system (CONFIG_OPENVSWITCH=y))                                                                                                                 
                                                                                                                                                                                      
I tried to compile the openvswitch.ko distributed with 1.7.1 for 3.6.1, but                                                                                                           
got the following error:                                                                                                                                                              
datapath.c:42:24: error: asm/system.h: No such file or directory                                                                                                                      
datapath.c:66:2: error: #error Kernels before 2.6.18 or after 3.3 are not supported by this version of Open vSwitch.                                                                  
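
The `#error` above is a compile-time kernel version check. A minimal user-space sketch of that kind of guard follows. `KERNEL_VERSION` uses the same encoding as the kernel's `<linux/version.h>`; the helper `ovs_tree_supports()` and its exact bounds are assumptions inferred from the error text, not the actual datapath.c source.

```c
#include <stdbool.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro in <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: returns true when a kernel version falls inside the
 * range named by the error message ("before 2.6.18 or after 3.3 are not
 * supported"). The upper bound is assumed to mean "any 3.3.x is accepted".
 */
static bool ovs_tree_supports(unsigned int version_code)
{
	return version_code >= KERNEL_VERSION(2, 6, 18) &&
	       version_code < KERNEL_VERSION(3, 4, 0);
}
```

Under these assumed bounds, 3.0.28 and 3.2.31 pass the check and 3.6.1 does not, which matches the build results listed above.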
                                                                                                                                                                                      
It doesn't appear to be the DSA drivers alone causing the oops, since
3.6.1 with datapath_type=netdev has never oopsed.
                                                                                                                                                                                      
Is there some compatibility issue between 1.7.1 (userspace) and the mainline 3.6.1 OVS mod?                                                                                           
                                                                                                                                                                                      
Unfortunately my box is a powerpc/marvell only box, so if I disable DSA                                                                                                               
I won't have networking ports :-(                                                                                                                                                     
                                                                                                                                                                                      
As far as the OOPS goes, it isn't instant.  The flows get added, packets                                                                                                              
are flowing through the device, and at some point 1 to 10 seconds later                                                                                                                
the OOPS appears.  Timing issue?                                                                                                                   
                                                                                                                                                                                      
I will try 3.3/other bisecting and see if I can learn anything.                                                                                                                       
                                                                                                                                                                                      
Thanks,                                                                                                                                                                               
                                                                                                                                                                                      
Barry                                                                                                                                                                                 
                                                                                                                                                                                      
Additional Info:                                                                                                                                                                      
                                                                                                                                                                                      
The flows that are instantiated by the flow controller (Floodlight) in this                                                                                                           
testbed are:                                                                                                                                                                          
# ovs-ofctl dump-flows br0                                                                                                                                                            
NXST_FLOW reply (xid=0x4):                                                                                                                                                            
 cookie=0x20000000000000, duration=37.938s, table=0, n_packets=38, n_bytes=3686, idle_timeout=5, idle_age=0, priority=0,in_port=6,vlan_tci=0x0000,dl_src=00:04:75:ad:65:74,dl_dst=00:30:a7:01:6d:e7 actions=output:15
 cookie=0x20000000000000, duration=38.197s, table=0, n_packets=75, n_bytes=7312, idle_timeout=5, idle_age=1, priority=0,in_port=22,vlan_tci=0x0000,dl_src=00:1c:23:1b:2a:13,dl_dst=00:04:75:ad:65:74 actions=output:6
 cookie=0x20000000000000, duration=38.186s, table=0, n_packets=37, n_bytes=3626, idle_timeout=5, idle_age=1, priority=0,in_port=6,vlan_tci=0x0000,dl_src=00:04:75:ad:65:74,dl_dst=00:30:a7:42:4a:47 actions=output:16
 cookie=0x20000000000000, duration=38.169s, table=0, n_packets=74, n_bytes=7252, idle_timeout=5, idle_age=1, priority=0,in_port=6,vlan_tci=0x0000,dl_src=00:04:75:ad:65:74,dl_dst=00:1c:23:1b:2a:13 actions=output:22
 cookie=0x20000000000000, duration=81.435s, table=0, n_packets=81, n_bytes=7938, idle_timeout=5, idle_age=1, priority=0,in_port=16,vlan_tci=0x0000,dl_src=00:30:a7:42:4a:47,dl_dst=00:1c:23:1b:2a:13 actions=output:22
 cookie=0x20000000000000, duration=37.946s, table=0, n_packets=39, n_bytes=3784, idle_timeout=5, idle_age=0, priority=0,in_port=15,vlan_tci=0x0000,dl_src=00:30:a7:01:6d:e7,dl_dst=00:04:75:ad:65:74 actions=output:6
 cookie=0x20000000000000, duration=81.427s, table=0, n_packets=80, n_bytes=7840, idle_timeout=5, idle_age=1, priority=0,in_port=22,vlan_tci=0x0000,dl_src=00:1c:23:1b:2a:13,dl_dst=00:30:a7:42:4a:47 actions=output:16
 cookie=0x20000000000000, duration=81.131s, table=0, n_packets=84, n_bytes=8194, idle_timeout=5, idle_age=1, priority=0,in_port=15,vlan_tci=0x0000,dl_src=00:30:a7:01:6d:e7,dl_dst=00:1c:23:1b:2a:13 actions=output:22
 cookie=0x20000000000000, duration=81.121s, table=0, n_packets=81, n_bytes=7900, idle_timeout=5, idle_age=1, priority=0,in_port=22,vlan_tci=0x0000,dl_src=00:1c:23:1b:2a:13,dl_dst=00:30:a7:01:6d:e7 actions=output:15
 cookie=0x20000000000000, duration=38.197s, table=0, n_packets=38, n_bytes=3724, idle_timeout=5, idle_age=1, priority=0,in_port=16,vlan_tci=0x0000,dl_src=00:30:a7:42:4a:47,dl_dst=00:04:75:ad:65:74 actions=output:6


# ovs-dpctl dump-flows br0
in_port(15),eth(src=00:30:a7:01:6d:e7,dst=00:04:75:ad:65:74),eth_type(0x0800),ipv4(src=192.168.1.42,dst=192.168.1.5,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0), packets:47, bytes:4606, used:0.041s, actions:6
in_port(22),eth(src=00:1c:23:1b:2a:13,dst=00:04:75:ad:65:74),eth_type(0x0800),ipv4(src=192.168.1.40,dst=192.168.1.5,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:47, bytes:4606, used:0.771s, actions:6
in_port(6),eth(src=00:04:75:ad:65:74,dst=00:1c:23:1b:2a:13),eth_type(0x0800),ipv4(src=192.168.1.5,dst=192.168.1.40,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:46, bytes:4508, used:0.661s, actions:22
in_port(16),eth(src=00:30:a7:42:4a:47,dst=00:1c:23:1b:2a:13),eth_type(0x0800),ipv4(src=192.168.1.41,dst=192.168.1.40,proto=1,tos=0,ttl=255,frag=no),icmp(type=0,code=0), packets:90, bytes:8820, used:0.490s, actions:22
in_port(15),eth(src=00:30:a7:01:6d:e7,dst=00:1c:23:1b:2a:13),eth_type(0x0800),ipv4(src=192.168.1.42,dst=192.168.1.40,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0), packets:90, bytes:8820, used:0.199s, actions:22
in_port(6),eth(src=00:04:75:ad:65:74,dst=00:30:a7:42:4a:47),eth_type(0x0800),ipv4(src=192.168.1.5,dst=192.168.1.41,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:47, bytes:4606, used:0.321s, actions:16
in_port(6),eth(src=00:04:75:ad:65:74,dst=00:30:a7:01:6d:e7),eth_type(0x0800),ipv4(src=192.168.1.5,dst=192.168.1.42,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:47, bytes:4606, used:0.042s, actions:15
in_port(22),eth(src=00:1c:23:1b:2a:13,dst=00:30:a7:42:4a:47),eth_type(0x0800),ipv4(src=192.168.1.40,dst=192.168.1.41,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:90, bytes:8820, used:0.490s, actions:16
in_port(22),eth(src=00:1c:23:1b:2a:13,dst=00:30:a7:01:6d:e7),eth_type(0x0800),ipv4(src=192.168.1.40,dst=192.168.1.42,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0), packets:90, bytes:8820, used:0.199s, actions:15
in_port(16),eth(src=00:30:a7:42:4a:47,dst=00:04:75:ad:65:74),eth_type(0x0800),ipv4(src=192.168.1.41,dst=192.168.1.5,proto=1,tos=0,ttl=255,frag=no),icmp(type=0,code=0), packets:47, bytes:4606, used:0.320s, actions:6
in_port(6),eth(src=00:04:75:ad:65:74,dst=00:1c:23:1b:2a:13),eth_type(0x0800),ipv4(src=192.168.1.5,dst=192.168.1.40,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0), packets:47, bytes:4606, used:0.771s, actions:22
in_port(22),eth(src=00:1c:23:1b:2a:13,dst=00:04:75:ad:65:74),eth_type(0x0800),ipv4(src=192.168.1.40,dst=192.168.1.5,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0), packets:46, bytes:4508, used:0.660s, actions:6
# 
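
The packets and bytes counters in the datapath flows above give a quick read on frame size. A small sketch of that arithmetic follows, assuming each flow has been put back on a single unwrapped line. The helper `flow_avg_frame_size()` is hypothetical and only looks for the `packets:` and `bytes:` fields; it is not part of any OVS tool.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pull the "packets:" and "bytes:" counters out of one
 * line of `ovs-dpctl dump-flows` output and return the mean frame size in
 * bytes, or -1 if a counter is missing or the packet count is zero.
 */
static long flow_avg_frame_size(const char *line)
{
	unsigned long packets = 0, bytes = 0;
	const char *p = strstr(line, "packets:");
	const char *b = strstr(line, "bytes:");

	if (!p || !b)
		return -1;
	if (sscanf(p, "packets:%lu", &packets) != 1 ||
	    sscanf(b, "bytes:%lu", &bytes) != 1 || packets == 0)
		return -1;
	return (long)(bytes / packets);
}
```

Every flow in the dump works out to 98 bytes per frame (for example 4606 / 47 and 8820 / 90), which is the size of a default ping: 14 bytes of Ethernet header, 20 of IPv4, 8 of ICMP and 56 of payload. So the dump reflects the small-packet traffic that passes; the larger packets that trigger the oops are not represented in it.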


Here is another OOPS from today:
# Unable to handle kernel paging request for data at address 0x00100158
Faulting instruction address: 0xc008b5e8
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SELTEST
Modules linked in:
NIP: c008b5e8 LR: c008b5e0 CTR: c0041a68
REGS: dfffbb98 TRAP: 0300   Not tainted  (3.6.1+)
MSR: 00001032 <ME,IR,DR,RI>  CR: 22044024  XER: 00000000
DAR: 00100158, DSISR: 20000000
TASK = c0407368[0] 'swapper' THREAD: c0424000
GPR00: c008b5e0 dfffbc48 c0407368 00100100 00000001 df0f4000 000000c3 c040ae80 
GPR08: 00000000 00000000 c0440d30 c07fd000 24044082 10083760 c03a79b8 c03abacc 
GPR16: c042e474 00200000 dfffbd60 c042e470 c0423840 c041da84 dfffa000 000086dd 
GPR24: 00008100 c0430000 df95dc00 c0c00498 00009032 df0f4000 00100100 dfb79e20 
NIP [c008b5e8] kfree+0x60/0xd4
LR [c008b5e0] kfree+0x58/0xd4
Call Trace:
[dfffbc48] [c008b5e0] kfree+0x58/0xd4 (unreliable)
[dfffbc60] [c0217300] skb_free_head+0x4c/0x5c
[dfffbc68] [c0217bbc] __kfree_skb+0x18/0xbc
[dfffbc78] [c0316138] ovs_dp_process_received_packet+0x9c/0xe8
[dfffbd18] [c0319900] ovs_vport_receive+0x64/0x78
[dfffbd30] [c031a30c] netdev_frame_hook+0xa0/0xb8
[dfffbd48] [c021ff68] __netif_receive_skb+0x50c/0x6d0
[dfffbdb0] [c02f1e1c] dsa_rcv+0x228/0x24c
[dfffbde0] [c02200b8] __netif_receive_skb+0x65c/0x6d0
[dfffbe48] [c0221668] napi_skb_finish+0x38/0x88
[dfffbe58] [c01f5a14] gfar_clean_rx_ring+0x3a4/0x4b0
[dfffbeb8] [c01f87ac] gfar_poll+0x4c0/0x5dc
[dfffbf78] [c0225040] net_rx_action+0x74/0x180
[dfffbfa8] [c001d144] __do_softirq+0xac/0x13c
[dfffbff0] [c000bd20] call_do_softirq+0x14/0x24
[c0425e80] [c000500c] do_softirq+0x70/0xb0
[c0425ea0] [c001d220] irq_exit+0x4c/0x80
[c0425ea8] [c0005158] do_IRQ+0x10c/0x128
[c0425ed0] [c000dce0] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0x98/0xec
    LR = cpu_idle+0x98/0xec
[c0425f90] [c0007f08] cpu_idle+0x54/0xec (unreliable)
[c0425fa8] [c0003d58] rest_init+0x78/0xa0
[c0425fc0] [c03dc934] start_kernel+0x288/0x29c
[c0425ff0] [00003438] 0x3438
Instruction dump:
3d234000 5529c9f4 7c09582e 7c695a14 70098000 41a20008 8063001c 83c30014 
7cbd2b78 480fec99 5463103a 7c63f214 <83e30058> 813f0000 801f0004 7f890040 
---[ end trace 14b944389ce8f931 ]---
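
The faulting word marked with `<...>` in each instruction dump can be checked against the reported DAR. The sketch below decodes a PowerPC D-form load and computes its effective address; the struct and function names are made up for illustration, and only the two words from the oopses above are considered.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form load such as `lwz rT, d(rA)`. */
struct ppc_dform {
	unsigned int opcode; /* primary opcode, 32 for lwz */
	unsigned int rt;     /* destination register */
	unsigned int ra;     /* base register */
	int16_t d;           /* signed 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct ppc_dform ppc_decode_dform(uint32_t insn)
{
	struct ppc_dform f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.d = (int16_t)(insn & 0xffff);
	return f;
}

/* Effective address of the load, given the value held in the base register. */
static uint32_t ppc_dform_ea(struct ppc_dform f, uint32_t ra_value)
{
	/* For rA = 0 the architecture uses the constant 0 instead of r0. */
	uint32_t base = f.ra ? ra_value : 0;

	return base + (uint32_t)(int32_t)f.d;
}
```

In the first oops, `813f0000` decodes to `lwz r9,0(r31)`, and GPR31 holds 12000007, which is the DAR. In the second, `83e30058` decodes to `lwz r31,88(r3)`, and GPR03 holds 00100100, giving 0x00100158, again the DAR. The value 0x00100100 matches the kernel's LIST_POISON1 pattern; one possible reading, and it is only a guess, is that kfree() is looking at a struct page that was never set up by the slab allocator.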
Comment 4 Barry G 2012-10-12 21:58:59 UTC
Hello,

Today I had a little bit of time.  I discovered that
3.3.8 worked fine with the in-kernel ovs module, but 3.6.1 
died.  I git bisect'd it down and this is the changeset that is breaking me:
$ git bisect bad
a1c7fff7e18f59e684e07b0f9a770561cd39f395 is the first bad commit
commit a1c7fff7e18f59e684e07b0f9a770561cd39f395
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu May 17 07:34:16 2012 +0000

    net: netdev_alloc_skb() use build_skb()
    
    netdev_alloc_skb() is used by networks driver in their RX path to
    allocate an skb to receive an incoming frame.
    
    With recent skb->head_frag infrastructure, it makes sense to change
    netdev_alloc_skb() to use build_skb() and a frag allocator.
    
    This permits a zero copy splice(socket->pipe), and better GRO or TCP
    coalescing.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 17938b1b46bc38aa126cc23b7a7647259297657d 1e29cf65869391eb13552c51e0cf288fc7085fec M      net
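
Both traces end in skb_free_head() calling kfree(). With this commit, RX buffers from netdev_alloc_skb() can come from a page-fragment allocator instead of kmalloc(), and skb->head_frag is what tells the free path which release routine to use. The user-space model below sketches that decision. All `mock_` names and both enums are hypothetical; it mirrors the shape of the check only and is not the kernel source.

```c
#include <stdbool.h>

/* Where the buffer behind skb->head came from (model only). */
enum head_origin { HEAD_FROM_KMALLOC, HEAD_FROM_PAGE_FRAG };

/* Which release routine the free path would pick (model only). */
enum head_release { RELEASE_KFREE, RELEASE_PUT_PAGE };

/* Hypothetical stand-in for the sk_buff state that matters here. */
struct mock_skb {
	enum head_origin origin;   /* ground truth, not stored in a real skb */
	unsigned int head_frag:1;  /* the flag the free path trusts */
};

/* Models the decision in skb_free_head(): the flag alone picks the path. */
static enum head_release mock_free_head_path(const struct mock_skb *skb)
{
	return skb->head_frag ? RELEASE_PUT_PAGE : RELEASE_KFREE;
}

/* True when the chosen release routine does not match the allocator. */
static bool mock_free_is_mismatched(const struct mock_skb *skb)
{
	enum head_release path = mock_free_head_path(skb);

	if (skb->origin == HEAD_FROM_PAGE_FRAG)
		return path != RELEASE_PUT_PAGE;
	return path != RELEASE_KFREE;
}
```

If a page-fragment head ever reaches the free path with head_frag clear, the model reports a mismatch: the buffer would be handed to kfree(), which is where both oopses fault. Whether that is what actually happens on this board is not established by the bisect alone.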
Comment 5 pshelar 2012-10-12 22:25:01 UTC
(In reply to comment #3)
> 
> Thanks for the help.  
> 
> All tests are done using OpenvSwitch 1.7.1 (unless mainline kernel module
> used
> as mentioned).
> 
> For kernels older than 3.6, I am building the openvswitch module
> (datapath/linux/openvswitch.ko)
> 
> Here is what I have learned so far:                                           
> Linux 3.0.28 works great (datapath_type=netdev or system (openvswitch.ko mod
> from 1.7.1))                                                                  
> Linux 3.2.31 works great (datapath_type=netdev or system (openvswitch.ko mod
> from 1.7.1))                                                                  
> Linux 3.6.1 works great (datapath_type=netdev)                                
> Linux 3.6.1 causes OOPS (datapath_type=system (CONFIG_OPENVSWITCH=y))         
> 
> I tried to compile the openvswitch.ko distributed with 1.7.1 for 3.6.1, but   
> got the following error:                                                      
> datapath.c:42:24: error: asm/system.h: No such file or directory              
> datapath.c:66:2: error: #error Kernels before 2.6.18 or after 3.3 are not
> supported by this version of Open vSwitch.                                    
> 
Right, 3.6 support has not yet been added to the out-of-tree OVS module.

> It doesn't appear to be the DSA drivers alone causing the oops since          
> 3.6.1 and a datapath_type=netdev has never OOPS'd.                            
> 
> Is there some compatibility issue between 1.7.1 (userspace) and the mainline
> 3.6.1 OVS mod?                                                                
>
There should not be any compatibility issue. 

> Unfortunately my box is a powerpc/marvell only box, so if I disable DSA       
> I won't have networking ports :-(                                             
> 
> As far as the OOPS goes, it isn't instant.  The flows get added, packets      
> are flowing through the device, and at some point 1 to 10 seconds later       
> the OOPS appears.  Timing issue?                                              
> 
> I will try 3.3/other bisecting and see if I can learn anything.               
> 
> Thanks,                                                                       
> 
> Barry                                                                         
> 
> Additional Info:                                                              

This is helpful, thanks.
Comment 6 pshelar 2012-10-12 23:26:14 UTC
(In reply to comment #4)
> Hello,
> 
> Today I had a little bit of time.  I discovered that
> 3.3.8 worked fine with the in-kernel ovs module, but 3.6.1 
> died.  I git bisect'd it down and this is the changeset that is breaking me:
> $ git bisect bad
> a1c7fff7e18f59e684e07b0f9a770561cd39f395 is the first bad commit
> commit a1c7fff7e18f59e684e07b0f9a770561cd39f395
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu May 17 07:34:16 2012 +0000
> 
>     net: netdev_alloc_skb() use build_skb()
> 
>     netdev_alloc_skb() is used by networks driver in their RX path to
>     allocate an skb to receive an incoming frame.
> 
>     With recent skb->head_frag infrastructure, it makes sense to change
>     netdev_alloc_skb() to use build_skb() and a frag allocator.
> 
>     This permits a zero copy splice(socket->pipe), and better GRO or TCP
>     coalescing.
> 
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> :040000 040000 17938b1b46bc38aa126cc23b7a7647259297657d
> 1e29cf65869391eb13552c51e0cf288fc7085fec M      net

OK, I do see packets with head_frag set in my setup, but no crash, so it looks like something else is also required to trigger it.
Which device offloads are enabled on the DSA devices? TSO, GRO, etc.?
Comment 7 Barry G 2012-10-12 23:33:24 UTC
The only setting that my device appears to support is GRO.  I get the crash with GRO enabled or disabled (at least on the node DSA devices).  Haven't tried messing with the device offload capabilities on the CPU port.  Will try that.
Comment 8 Eric Dumazet 2012-10-13 08:20:37 UTC
I wonder whether this is yet another interaction with skb recycling.

skb recycling was removed in commit acb600def2110.

Could you try the latest Linus tree?
Comment 9 Eric Dumazet 2012-10-13 09:12:21 UTC
Also, could you post the disassembly of the build_skb() function?

I wonder if the following :

memset(skb, 0, offsetof(struct sk_buff, tail));
...
skb->head_frag = frag_size != 0;

could be reordered so that the memset() is done after skb->head_frag = 1
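
The concern above can be shown with a small user-space model. It assumes, as in the snippet, that the flag lives inside the region cleared by the memset(). `struct mini_skb` and both init functions are hypothetical and much smaller than the real sk_buff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of sk_buff: the flag sits before `tail`. */
struct mini_skb {
	void *head;
	unsigned int head_frag:1;
	unsigned int tail;   /* memset() stops just before this member */
	unsigned int end;
};

/* Order used in the snippet above: clear first, then set the flag. */
static bool init_memset_then_flag(struct mini_skb *skb, unsigned int frag_size)
{
	memset(skb, 0, offsetof(struct mini_skb, tail));
	skb->head_frag = frag_size != 0;
	return skb->head_frag;
}

/* The suspected bad order: the flag is stored, then wiped by memset(). */
static bool init_flag_then_memset(struct mini_skb *skb, unsigned int frag_size)
{
	skb->head_frag = frag_size != 0;
	memset(skb, 0, offsetof(struct mini_skb, tail));
	return skb->head_frag;
}
```

With a non-zero frag_size the first function returns true and the second returns false, so an skb initialised in the second order would later be freed as if its head came from kmalloc(). The disassembly is what would show which order the compiler actually emitted.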
Comment 10 Barry G 2012-10-14 05:24:12 UTC
I am using gcc 4.2.4.  Here is the disassembly of build_skb() in a1c7fff7:

00001ddc <build_skb>:
    1ddc:       94 21 ff e8     stwu    r1,-24(r1)
    1de0:       7c 08 02 a6     mflr    r0
    1de4:       bf 81 00 08     stmw    r28,8(r1)
    1de8:       7c 9c 23 79     mr.     r28,r4
    1dec:       7c 7e 1b 78     mr      r30,r3
    1df0:       90 01 00 1c     stw     r0,28(r1)
    1df4:       41 82 00 0c     beq-    1e00 <build_skb+0x24>
    1df8:       7f 9d e3 78     mr      r29,r28
    1dfc:       48 00 00 0c     b       1e08 <build_skb+0x2c>
    1e00:       48 00 00 01     bl      1e00 <build_skb+0x24>
    1e04:       7c 7d 1b 78     mr      r29,r3
    1e08:       3d 20 00 00     lis     r9,0
    1e0c:       38 80 00 20     li      r4,32
    1e10:       80 69 00 00     lwz     r3,0(r9)
    1e14:       3b e0 00 00     li      r31,0
    1e18:       48 00 00 01     bl      1e18 <build_skb+0x3c>
    1e1c:       7c 60 1b 79     mr.     r0,r3
    1e20:       41 82 00 64     beq-    1e84 <build_skb+0xa8>
    1e24:       7c 1f 03 78     mr      r31,r0
    1e28:       38 80 00 00     li      r4,0
    1e2c:       38 a0 00 a0     li      r5,160
    1e30:       48 00 00 01     bl      1e30 <build_skb+0x54>
    1e34:       81 3f 00 88     lwz     r9,136(r31)
    1e38:       31 7c ff ff     addic   r11,r28,-1
    1e3c:       7c 0b e1 10     subfe   r0,r11,r28
    1e40:       3b 80 00 01     li      r28,1
    1e44:       50 09 d1 4a     rlwimi  r9,r0,26,5,5
    1e48:       38 1d 00 c0     addi    r0,r29,192
    1e4c:       90 1f 00 b0     stw     r0,176(r31)
    1e50:       3b bd ff 40     addi    r29,r29,-192
    1e54:       91 3f 00 88     stw     r9,136(r31)
    1e58:       93 9f 00 b4     stw     r28,180(r31)
    1e5c:       7f be ea 14     add     r29,r30,r29
    1e60:       93 df 00 a8     stw     r30,168(r31)
    1e64:       38 80 00 00     li      r4,0
    1e68:       93 df 00 ac     stw     r30,172(r31)
    1e6c:       7f a3 eb 78     mr      r3,r29
    1e70:       38 a0 00 24     li      r5,36
    1e74:       93 df 00 a0     stw     r30,160(r31)
    1e78:       93 bf 00 a4     stw     r29,164(r31)
    1e7c:       48 00 00 01     bl      1e7c <build_skb+0xa0>
    1e80:       93 9d 00 24     stw     r28,36(r29)
    1e84:       80 01 00 1c     lwz     r0,28(r1)
    1e88:       7f e3 fb 78     mr      r3,r31
    1e8c:       bb 81 00 08     lmw     r28,8(r1)
    1e90:       38 21 00 18     addi    r1,r1,24
    1e94:       7c 08 03 a6     mtlr    r0
    1e98:       4e 80 00 20     blr

Working on pulling down and building Linus's tree.

Thanks!
Comment 11 Barry G 2012-10-14 06:29:59 UTC
I built Linus's version at 3d6ee36df and I was unable to reproduce
the issue.  It does appear to be fixed in upstream.

Anything more we care about or should we write this off to skb recycling?

Thanks,

Barry
Comment 12 Eric Dumazet 2012-10-14 08:37:05 UTC
1) You could try Linus's tree right before commit acb600def2110,

or

2) revert acb600def2110 (i.e. bring skb recycling back) and check whether the bug triggers again,

or

3) backport acb600def2110 to 3.6.2 and check that the bug is fixed.

Thanks
Comment 13 Barry G 2012-10-14 17:34:06 UTC
I did (2) and it oopses right away again.

So, in summary, for anyone looking at this bug later: on powerpc, a bug introduced in a1c7fff7e18f59e684e07b0f9a770561cd39f395 caused oopses when handling skbs.  I could only ever hit it using Open vSwitch.  It was fixed by acb600def2110b1310466c0e485c0d26299898ae, which should make it into 3.7.

Thanks Eric and pshelar!