Hi there,

I recently found that packets tagged with a vlan id of 0 are no longer received on the main interface. There were no dmesg entries for the dropped packets. I attempted to set up a vlan 0 interface and configure it, but couldn't successfully route traffic to the device. I can recreate this on two of the three networking devices I have; my guess is that the third successfully handles hardware acceleration of vlan tags.

After a bisection, this appears to be related to commit bcc6d47903612c3861201cc3a866fb604f26b8b2, which seems to try to merge the non-hardware-accelerated and hardware-accelerated code paths for handling vlans. In the process, it appears vlan id 0 (802.1p) packets are no longer handled correctly.

Unfortunately I don't know the code paths well enough to figure out what's going wrong, but I'd be happy to provide more information, run tests or try out patches if it would help; just let me know. Thanks... 5:)

Mike 5:)
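For readers unfamiliar with the term, a "vlan id 0 (802.1p)" packet is a priority-tagged frame: it carries an 802.1Q tag (TPID 0x8100) whose 16-bit TCI holds a 3-bit priority, a 1-bit DEI and a 12-bit VID, with the VID set to 0 so that only the priority is meaningful. The following is a minimal sketch of how such a frame could be recognised from raw bytes. The function names are hypothetical and this is not the kernel's receive path, only an illustration of the tag layout.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return (pcp, dei, vid, inner_ethertype) for a tagged frame, else None.

    Hypothetical helper: assumes `frame` starts at the destination MAC.
    """
    if len(frame) < 18:  # 12 bytes of MACs + 4-byte tag + 2-byte EtherType
        return None
    tpid, tci, inner_ethertype = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q:
        return None  # untagged frame
    pcp = tci >> 13          # priority code point (the 802.1p bits)
    dei = (tci >> 12) & 0x1  # drop eligible indicator
    vid = tci & 0x0FFF       # VLAN identifier
    return pcp, dei, vid, inner_ethertype


def is_priority_tagged(frame: bytes) -> bool:
    """True when the frame carries an 802.1Q tag whose VID is 0."""
    tag = parse_vlan_tag(frame)
    return tag is not None and tag[2] == 0
```

A receiver that only accepts tagged frames for explicitly configured VIDs would have no entry for VID 0, which is consistent with the symptom described above.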
(Switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface.)

On Sun, 14 Aug 2011 12:48:16 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=41152
>
> Summary: kernel 3.0 and above fails to handle vlan id 0 (802.1p) packets properly without hardware acceleration
> Product: Networking
> Version: 2.5
> Kernel Version: 3.0
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: acme@ghostprotocols.net
> ReportedBy: mike.auty@gmail.com
> Regression: Yes
Wed, Aug 17, 2011 at 12:09:18AM CEST, akpm@linux-foundation.org wrote:
>On Sun, 14 Aug 2011 12:48:16 GMT
>bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=41152

Hi Mike. May I ask what NIC you are seeing the regression on? It may have something to do with dev->vlangrp and ndo_vlan_add/kill_vid. VID 0 was, until recently, only added by the latter ones. So if a driver only depended on dev->vlangrp, 0 was not there. This was changed recently by my "vlan cleanup" patches. It may work for you again on net-next today.

Jirka
On 17/08/11 06:37, Jiri Pirko wrote:
> Hi Mike. May I ask what NIC you are seeing the regression on?
> It may have something to do with dev->vlangrp and ndo_vlan_add/kill_vid.
> VID 0 was, until recently, only added by the latter ones. So if a driver
> only depended on dev->vlangrp, 0 was not there. This was changed recently
> by my "vlan cleanup" patches. It may work for you again on net-next today.

Hi there,

I'm finding it on the following two NICs:

02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)

05:01.0 Network controller [0280]: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller [14e4:4320] (rev 03)

and (on a different machine):

02:00.0 Network controller [0280]: Intel Corporation WiFi Link 6000 Series [8086:422c] (rev 35)

The only one I have had any success with is:

00:19.0 Ethernet controller [0200]: Intel Corporation 82577LM Gigabit Network Connection [8086:10ea] (rev 05)

which I assume is because it has actual hardware acceleration. I may be able to put net-next on the Intel Wifi box for testing at some point, but I don't know how soon I'll be able to do that. Please let me know whether that would be a worthwhile test, and if so I'll try and get it done. Thanks...

Mike 5:)
Wed, Aug 17, 2011 at 08:36:15AM CEST, mike.auty@gmail.com wrote:
>and (on a different machine):
>
>02:00.0 Network controller [0280]: Intel Corporation WiFi Link 6000
>Series [8086:422c] (rev 35)

I just obtained a very similar card (8086:422b). Going to look at it right away.

One more thing. What do you use to generate vlan0 tagged packets? I'm using pktgen with "vlan_id 0". Would you please check whether it behaves the same for you?
Wed, Aug 17, 2011 at 12:59:51PM CEST, jpirko@redhat.com wrote:
>One more thing. What do you use to generate vlan0 tagged packets? I'm
>using pktgen with "vlan_id 0". Would you please check whether it behaves
>the same for you?

I'm using the following pktgen script:
http://pastebin.com/E3f4R8XY

On the rx side, with the Intel wireless card, I use the following stap script to observe incoming packets:
http://pastebin.com/VeXLhauu

All is looking good there. What do you use to look at incoming packets?

Jirka
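For anyone without pktgen to hand, the kind of frame that "vlan_id 0" refers to can also be assembled by hand. The sketch below is not the pastebin script linked above; it is a hypothetical helper that only shows the byte layout of a priority-tagged Ethernet frame (TPID 0x8100, then a TCI with VID 0), and the MAC addresses and payload in the example are placeholders.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_priority_tagged_frame(dst_mac: bytes, src_mac: bytes, pcp: int,
                                ethertype: int, payload: bytes) -> bytes:
    """Build an Ethernet frame with an 802.1Q tag whose VID is 0.

    Hypothetical helper for illustration; it does not send anything.
    """
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must fit in 3 bits")
    tci = pcp << 13  # DEI = 0 and VID = 0, so only the priority bits are set
    header = dst_mac + src_mac + struct.pack("!HHH", TPID_8021Q, tci, ethertype)
    return header + payload


# Placeholder addresses and a zero-filled payload, for illustration only.
frame = build_priority_tagged_frame(
    dst_mac=b"\xff" * 6,
    src_mac=bytes.fromhex("020000000001"),
    pcp=0,
    ethertype=0x0800,  # IPv4
    payload=bytes(46),
)
```

Actually transmitting such a frame would need a raw socket or a tool such as pktgen, which is outside the scope of this sketch.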
On 17/08/11 11:59, Jiri Pirko wrote:
> I just obtained a very similar card (8086:422b). Going to look at it right
> away.
>
> One more thing. What do you use to generate vlan0 tagged packets? I'm
> using pktgen with "vlan_id 0". Would you please check whether it behaves
> the same for you?

Sorry, I haven't been using pktgen. I've got an actual device (a Samsung Android phone) which seems to tag all normal outbound packets with this type of vlan tag. I only discovered a month ago that I needed the 8021q module to be able to talk to it, and then suddenly it stopped working once I moved to the 3.0 kernel.

I might not have made it clear, but the packets are received (inasmuch as the packet is definitely sent, and it's seen by tools such as wireshark), but no reply is ever sent. I've attached packet logs from the 3.0.1 kernel and the 2.6.39.3 kernel. Oddly, the tagging only seems to be used on the first SYN,ACK packet, but again I don't know enough about the pipeline or what the Samsung kernel's doing to cause that.

I hope that's of some help? I may be able to get systemtap support rolled into my kernel tomorrow at some point, but if not then it will have to wait until the weekend. I don't know if that will provide useful information for debugging this, but I am happy to run whatever tests I can to figure this out...

Mike 5:)
Thu, Aug 18, 2011 at 12:48:41AM CEST, mike.auty@gmail.com wrote:
>I might not have made it clear, but the packets are received (in so much
>as the packet is definitely sent, and it's seen by tools such as
>wireshark), but no reply is ever sent.

Patch posted:
http://patchwork.ozlabs.org/patch/110535/

Sorry, I forgot to cc you, Mike. Thanks a lot for the report!

Jirka
On 18/08/11 17:37, Jiri Pirko wrote:
> Patch posted:
> http://patchwork.ozlabs.org/patch/110535/
>
> sorry I forgot to cc you Mike. Thanks a lot for report!

No problem, thanks very much for the speedy fix! I've applied the patch and can confirm it solves my problem. I look forward to seeing it hit the mainline... 5:)

Mike 5:)
Patch: http://patchwork.ozlabs.org/patch/110535/
Fixed by commit c5114cd59d2664f258b0d021d79b1532d94bdc2b.