Bug 10693 - sky2: driver-specific VLAN support is broken with "Marvell 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)"
Status: RESOLVED DUPLICATE of bug 9606
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-14 01:50 UTC by Aurélien Géron
Modified: 2008-09-16 11:46 UTC

See Also:
Kernel Version: 2.6.24.4
Subsystem:
Regression: ---
Bisected commit-id:



Description Aurélien Géron 2008-05-14 01:50:59 UTC
Note: this bug looks a lot like Bug #9606 but does not seem to be exactly the same because in my case everything works fine for some time, then randomly hangs.

Latest working kernel version: Unknown
Earliest failing kernel version: Unknown.
Distribution: Debian etch
Hardware Environment:
Intel Mobile CPU, and 2 Marvell Gigabit Ethernet controllers (eth0 and eth1).

lspci details:
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1e.2 Multimedia audio controller: Intel Corporation 82801G (ICH7 Family) AC'97 Audio Controller (rev 02)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)

Software Environment: bash

Problem Description:
I configured multiple VLANs on eth1 (vlan161 to vlan166 plus vlan170, using /etc/network/interfaces).  Everything works fine for some time (from a few minutes up to a few days), then for some unknown reason the sky2 driver suddenly hangs and restarts.  Unfortunately, VLAN support seems to be broken after the sky2 driver restarts.  I get the following dmesg output:

May 13 14:07:42 wibox kernel: sky2 eth1: hung mac 0:124 fifo 195 (115:110)
May 13 14:07:42 wibox kernel: sky2 eth1: receiver hang detected
May 13 14:07:42 wibox kernel: sky2 eth1: disabling interface
May 13 14:07:42 wibox kernel: sky2 eth1: enabling interface
May 13 14:07:44 wibox kernel: sky2 eth1: Link is up at 100 Mbps, full duplex, flow control rx
May 13 14:08:14 wibox kernel: sky2 eth1: rx length error: status 0x402300 length 64
May 13 14:08:14 wibox last message repeated 5 times
May 13 14:08:14 wibox kernel: sky2 eth1: rx length error: status 0x522100 length 82
May 13 14:08:14 wibox kernel: sky2 eth1: rx length error: status 0x402300 length 64
[...]

Steps to reproduce:
Configure some VLANs on a sky2-managed Gigabit Ethernet port, then get the sky2 driver to hang and restart automatically.  I don't know how to force the sky2 driver to hang; I just passed some traffic through the VLANs for some time, but there is probably a better way.

Thanks for your help.
Comment 1 Aurélien Géron 2008-05-19 04:34:02 UTC
Hi,

I tried to find the bug in the source code, and I think I may have found the answer, but I do not know how to fix the problem, yet.

Basically, it seems that everything is properly initialized, including the VLAN tags associated with each interface.  However, if for some reason the sky2 watchdog detects a hang, it restarts the interface but forgets to set the VLANs again.  From then on, all packets received are rejected because they are tagged and the sky2 driver excepts untagged packets (hence the "rx length" error message).

Therefore, after any hang, the watchdog does not restart the interface properly when VLAN tagging is used.

In sky2.c (line 2195), the error message "%s: rx length error: status %#x length %d\n" is displayed only if (line 2177) length != count, that is, if the actual length differs from the expected length.  The VLAN header bytes are taken into account (on line 2151) like this:
#ifdef SKY2_VLAN_TAG_USED
        /* Account for vlan tag */
        if (sky2->vlgrp && (status & GMR_FS_VLAN))
                count -= VLAN_HLEN;
#endif

In my error messages, the status is equal to 0x402300 or 0x522100, for example (see above), so I know that (status & GMR_FS_VLAN) is true (GMR_FS_VLAN is equal to 1<<13).  Since I get rx length errors, I believe that the count does not take the VLAN header bytes into account, and I think the only way this can happen is if sky2->vlgrp is NULL.
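
The bit test described above can be reproduced outside the kernel. The following is only a minimal sketch: GMR_FS_VLAN uses the value quoted in this comment, VLAN_HLEN is the 4-byte 802.1Q tag size, and the GMR_FS_LEN mask (frame length in bits 16 to 30 of the status word) is an assumption that should be checked against sky2.h. The helper names and the vlgrp_set flag (standing in for sky2->vlgrp != NULL) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GMR_FS_VLAN (1u << 13)       /* value quoted in the comment above */
#define GMR_FS_LEN  (0x7fffu << 16)  /* ASSUMPTION: length field in bits 16..30 */
#define VLAN_HLEN   4u               /* size of an 802.1Q tag in bytes */

/* True if the VLAN bit is set in a receive status word. */
static bool status_has_vlan(uint32_t status)
{
    return (status & GMR_FS_VLAN) != 0;
}

/*
 * Expected byte count, mirroring the quoted "Account for vlan tag" check.
 * vlgrp_set is a hypothetical stand-in for (sky2->vlgrp != NULL).
 */
static uint32_t expected_count(uint32_t status, bool vlgrp_set)
{
    uint32_t count = (status & GMR_FS_LEN) >> 16;

    if (vlgrp_set && status_has_vlan(status))
        count -= VLAN_HLEN;
    return count;
}

/* Both status words from the dmesg output carry the VLAN bit. */
static void check_logged_status_words(void)
{
    assert(status_has_vlan(0x402300u));
    assert(status_has_vlan(0x522100u));
}
```

Calling expected_count() with vlgrp_set both true and false for each logged status word, and comparing the results with the lengths printed in the log, is one way to check which branch could have produced the mismatch.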

Apparently, sky2->vlgrp gets set properly upon driver initialization, but it gets unset when the sky2 watchdog restarts the device.

sky2->vlgrp seems to be set only in function sky2_vlan_rx_register (on line 1155).

And function sky2_vlan_rx_register gets called only in sky2_init_netdev (on line 4011):
#ifdef SKY2_VLAN_TAG_USED
        /* The workaround for FE+ status conflicts with VLAN tag detection. */
        if (!(sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
              sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0)) {
                dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
                dev->vlan_rx_register = sky2_vlan_rx_register;
        }
#endif

So the solution might be to copy this code from sky2_init_netdev (which does not seem to be called when the sky2 watchdog restarts the device) into sky2_start (which is called both upon initialization of the device and when the watchdog restarts the device)?
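
To illustrate the idea only, here is a toy model of the restart path. It is not sky2 code: struct toy_port and every toy_* function are hypothetical, and the model simply assumes that some VLAN state is applied once at setup and cleared when the interface is taken down. It shows the difference between a bring-up path that re-applies that state and one that does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-port driver state; not struct sky2_port. */
struct toy_port {
    bool vlan_configured; /* VLANs were set up by the administrator */
    bool vlan_active;     /* state the receive path relies on */
};

/* Re-apply the VLAN state from the stored configuration. */
static void toy_apply_vlan(struct toy_port *p)
{
    p->vlan_active = p->vlan_configured;
}

/* One-time setup: the only place VLAN state is applied in the buggy model. */
static void toy_init(struct toy_port *p, bool use_vlans)
{
    p->vlan_configured = use_vlans;
    toy_apply_vlan(p);
}

/* Taking the interface down clears the active state (assumption). */
static void toy_down(struct toy_port *p)
{
    p->vlan_active = false;
}

/* Bring-up path; restore_vlan selects the proposed behaviour. */
static void toy_up(struct toy_port *p, bool restore_vlan)
{
    if (restore_vlan)
        toy_apply_vlan(p);
}

/* What a watchdog-triggered restart does in this model. */
static void toy_restart(struct toy_port *p, bool restore_vlan)
{
    toy_down(p);
    toy_up(p, restore_vlan);
}
```

In this model, toy_restart(&port, false) leaves vlan_active cleared after the first hang, while toy_restart(&port, true) restores it. Whether the real driver loses its state in the same way is exactly the open question of this report.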

Unfortunately, I cannot test this right now because my server is a remote machine and I cannot risk losing the connection to it.  I will try to get a similar server up and running for testing soon.

Does this seem like a reasonable explanation for this bug?
Comment 2 Aurélien Géron 2008-05-19 04:40:37 UTC
Sorry for the typos.  I meant "the sky2 driver *expects* untagged packets" (not "excepts") and "sky2_up" rather than "sky2_start".
Comment 3 Stephen Hemminger 2008-09-16 11:46:40 UTC

*** This bug has been marked as a duplicate of bug 9606 ***
