Latest working kernel version: 2.6.25
Earliest failing kernel version: 2.6.28-rc3
Distribution: Debian
Hardware Environment: Dell Vostro 200
Software Environment: Debian Lenny
Problem Description:
I initially reported this bug in the Debian BTS, see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497392

The e1000e kernel module does not initialize correctly, so the network cannot be brought up. It does not complain, but packets do not leave the machine correctly. The port's LED on the switch blinks quickly all the time, IPv6 RAs are received by the kernel, but no DHCP request packet reaches the router, nor any ICMP packet after manual configuration.

An rmmod / modprobe of the module *sometimes* makes it work correctly; the network interface is then brought up, DHCP responds, and communication can be established.

The problem first appeared in the Debian 2.6.26-3 kernel, but I reproduced it with stock 2.6.28-rc3, as requested by Bastian Blank.

Steps to reproduce:
Boot the computer.
When the module is inserted, nothing complains and the RAs are received, but IPv4 cannot be used on the interface. DHCP does not respond, and manual configuration does not allow packets to reach another computer. tcpdump on the router does not show any packet coming from the computer during DHCP or after manual configuration.
After a few (the exact number varies) rmmod/modprobe cycles, IPv4 traffic goes through the interface.
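The reload workaround in this report can be scripted. The following is only a sketch under stated assumptions: the router address (192.168.1.1), the 5-second settle delay, and the helper names are hypothetical and do not come from the report.

```sh
#!/bin/sh
# Sketch of the rmmod/modprobe workaround; all names and values are assumptions.

# retry_reload MAX CHECK ACTION
# Runs CHECK; while it fails, runs ACTION and checks again, up to MAX times.
# Prints the number of ACTION runs used; returns 0 on success, 1 otherwise.
retry_reload() {
    max=$1
    check=$2
    action=$3
    n=0
    until $check; do
        [ "$n" -lt "$max" ] || { echo "$n"; return 1; }
        $action
        n=$((n + 1))
    done
    echo "$n"
}

# Reload the driver and give the link a few seconds to settle (assumed delay).
reload_e1000e() { rmmod e1000e && modprobe e1000e; sleep 5; }

# The network counts as working if the router answers a ping (assumed address).
router_reachable() { ping -c 3 -W 2 "${ROUTER:-192.168.1.1}" >/dev/null 2>&1; }
```

Run as root, `retry_reload 10 router_reachable reload_e1000e` would reload the module at most ten times and print how many reloads were needed. The check and the action are passed in as command names so the loop can be tried out without touching the real driver.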
I have the same problem, also on a vostro 200, replugging the cable sometimes solves the problem:

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes

root@ubuntu:/home/tlb# ethtool -e eth0 length 256
Offset  Values
------  ------
0x0000  00 21 70 08 dd 4f 00 08 ff ff 12 10 ff ff ff ff
0x0010  ff ff ff ff c3 10 38 02 28 10 c0 10 86 80 00 00
0x0020  02 04 00 00 00 00 85 86 20 00 00 00 00 00 07 00
0x0030  84 06 40 2b 43 00 04 00 ad ba ad ba be 10 bf 10
0x0040  ad ba 4c 29 bd 10 ad ba 00 00 00 00 00 00 00 00
0x0050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060  00 01 00 40 32 12 07 40 ff ff ff ff ff ff ff ff
0x0070  ff ff ff ff ff ff ff ff ff ff ff ff ff ff d6 de
0x0080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0090  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00a0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

root@ubuntu:/home/tlb# cat /proc/interrupts
       CPU0   CPU1
  0:     30      1   IO-APIC-edge      timer
  1:      1      1   IO-APIC-edge      i8042
  8:     26     27   IO-APIC-edge      rtc0
  9:      0      0   IO-APIC-fasteoi   acpi
 12:      2      2   IO-APIC-edge      i8042
 16:      0      0   IO-APIC-fasteoi   uhci_hcd:usb1
 18:      1      2   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb8
 19:   9659   9705   IO-APIC-fasteoi   uhci_hcd:usb5, uhci_hcd:usb7, ata_piix, ata_piix
 21:    478    488   IO-APIC-fasteoi   uhci_hcd:usb3
 22:     88     89   IO-APIC-fasteoi   HDA Intel
 23:      0      0   IO-APIC-fasteoi   ehci_hcd:usb4, uhci_hcd:usb6
25: 1327 1272 PCI-MSI-edge eth0 NMI: 0 0 Non-maskable interrupts LOC: 11532 7848 Local timer interrupts RES: 1144 971 Rescheduling interrupts CAL: 83 51 Function call interrupts TLB: 927 820 TLB shootdowns SPU: 0 0 Spurious interrupts ERR: 0 MIS: 0 cat /var/log/dmesg x0] pci_bus 0000:03: resource 0 io: [0x3000-0x4fff] pci_bus 0000:03: resource 1 mem: [0xec000000-0xedffffff] pci_bus 0000:03: resource 2 mem: [0xe4000000-0xe40fffff] pci_bus 0000:03: resource 3 mem: [0x0-0x0] pci_bus 0000:04: resource 0 io: [0x5000-0x6fff] pci_bus 0000:04: resource 1 mem: [0xe8000000-0xe9ffffff] pci_bus 0000:04: resource 2 mem: [0xe4100000-0xe41fffff] pci_bus 0000:04: resource 3 mem: [0x0-0x0] pci_bus 0000:0c: resource 0 io: [0x7000-0x8fff] pci_bus 0000:0c: resource 1 mem: [0xea000000-0xebffffff] pci_bus 0000:0c: resource 2 mem: [0xe4200000-0xe42fffff] pci_bus 0000:0c: resource 3 mem: [0x0-0x0] pci_bus 0000:15: resource 0 io: [0x9000-0xcfff] pci_bus 0000:15: resource 1 mem: [0xe4300000-0xe7ffffff] pci_bus 0000:15: resource 2 mem: [0xe0000000-0xe3ffffff] pci_bus 0000:15: resource 3 io: [0x00-0xffff] pci_bus 0000:15: resource 4 mem: [0x000000-0xffffffff] pci_bus 0000:16: resource 0 io: [0x9000-0x90ff] pci_bus 0000:16: resource 1 io: [0x9400-0x94ff] pci_bus 0000:16: resource 2 mem: [0xe0000000-0xe3ffffff] pci_bus 0000:16: resource 3 mem: [0x88000000-0x8bffffff] NET: Registered protocol family 2 IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 131072 (order: 8, 1048576 bytes) TCP bind hash table entries: 65536 (order: 7, 524288 bytes) TCP: Hash tables configured (established 131072 bind 65536) TCP reno registered NET: Registered protocol family 1 checking if image is initramfs... it is Freeing initrd memory: 4732k freed Simple Boot Flag at 0x35 set to 0x1 Machine check exception polling timer started. 
Scanning for low memory corruption every 60 seconds audit: initializing netlink socket (disabled) type=2000 audit(1232711864.715:1): initialized highmem bounce pool size: 64 pages HugeTLB registered 4 MB page size, pre-allocated 0 pages VFS: Disk quotas dquot_6.5.2 Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) Created ptmx node in devpts ino 2 msgmni has been set to 1712 Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254) io scheduler noop registered io scheduler cfq registered (default) pci 0000:00:02.0: Boot video device Linux agpgart interface v0.103 agpgart-intel 0000:00:00.0: Intel 945GM Chipset agpgart-intel 0000:00:00.0: detected 7932K stolen memory agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000 brd: module loaded PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice cpuidle: using governor ladder cpuidle: using governor menu Using IPI No-Shortcut mode registered taskstats version 1 Freeing unused kernel memory: 292k freed fuse init (API version 7.11) ACPI: SSDT 7F6F1D26, 0240 (r1 PmRef Cpu0Ist 100 INTL 20050513) ACPI: SSDT 7F6F1FEB, 065A (r1 PmRef Cpu0Cst 100 INTL 20050513) ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3]) processor ACPI_CPU:00: registered as cooling_device0 ACPI: Processor [CPU0] (supports 8 throttling states) ACPI: SSDT 7F6F1C5E, 00C8 (r1 PmRef Cpu1Ist 100 INTL 20050513) ACPI: SSDT 7F6F1F66, 0085 (r1 PmRef Cpu1Cst 100 INTL 20050513) ACPI: CPU1 (power states: C1[C1] C2[C2] C3[C3]) processor ACPI_CPU:01: registered as cooling_device1 ACPI: Processor [CPU1] (supports 8 throttling states) thermal LNXTHERM:01: registered as thermal_zone0 ACPI: Thermal Zone [THM0] (33 C) thermal LNXTHERM:02: registered as thermal_zone1 ACPI: Thermal Zone [THM1] (36 C) IBM TrackPoint firmware: 0x0e, buttons: 3/3 input: TPPS/2 IBM TrackPoint as 
/devices/platform/i8042/serio1/input/input0 e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6 e1000e: Copyright (c) 1999-2008 Intel Corporation. e1000e 0000:02:00.0: Disabling L1 ASPM e1000e 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 e1000e 0000:02:00.0: setting latency timer to 64 0000:02:00.0: 0000:02:00.0: Failed to initialize MSI interrupts. Falling back to legacy interrupts. input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1 usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb uhci_hcd: USB Universal Host Controller Interface driver uhci_hcd 0000:00:1d.0: power state changed by ACPI to D0 uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 uhci_hcd 0000:00:1d.0: setting latency timer to 64 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 1 uhci_hcd 0000:00:1d.0: irq 16, io base 0x00001820 usb usb1: New USB device found, idVendor=1d6b, idProduct=0001 usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb1: Product: UHCI Host Controller usb usb1: Manufacturer: Linux 2.6.29-rc2-custom uhci_hcd usb usb1: SerialNumber: 0000:00:1d.0 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 2 ports detected uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17 uhci_hcd 0000:00:1d.1: setting latency timer to 64 uhci_hcd 0000:00:1d.1: UHCI Host Controller uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:1d.1: irq 17, io base 0x00001840 usb usb2: New USB device found, idVendor=1d6b, idProduct=0001 usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb2: Product: UHCI Host Controller usb usb2: Manufacturer: Linux 2.6.29-rc2-custom uhci_hcd usb usb2: SerialNumber: 0000:00:1d.1 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found 
hub 2-0:1.0: 2 ports detected ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver Warning! ehci_hcd should always be loaded before uhci_hcd and ohci_hcd, not after uhci_hcd 0000:00:1d.2: power state changed by ACPI to D0 uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 uhci_hcd 0000:00:1d.2: setting latency timer to 64 uhci_hcd 0000:00:1d.2: UHCI Host Controller uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1d.2: irq 18, io base 0x00001860 usb usb3: New USB device found, idVendor=1d6b, idProduct=0001 usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb3: Product: UHCI Host Controller usb usb3: Manufacturer: Linux 2.6.29-rc2-custom uhci_hcd usb usb3: SerialNumber: 0000:00:1d.2 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected ehci_hcd 0000:00:1d.7: power state changed by ACPI to D0 ehci_hcd 0000:00:1d.7: PCI INT D -> GSI 19 (level, low) -> IRQ 19 ehci_hcd 0000:00:1d.7: setting latency timer to 64 ehci_hcd 0000:00:1d.7: EHCI Host Controller ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 4 ehci_hcd 0000:00:1d.7: debug port 1 ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported ehci_hcd 0000:00:1d.7: irq 19, io mem 0xee444000 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00 usb usb4: New USB device found, idVendor=1d6b, idProduct=0002 usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb4: Product: EHCI Host Controller usb usb4: Manufacturer: Linux 2.6.29-rc2-custom ehci_hcd usb usb4: SerialNumber: 0000:00:1d.7 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 8 ports detected uhci_hcd 0000:00:1d.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19 uhci_hcd 0000:00:1d.3: setting latency timer to 64 uhci_hcd 0000:00:1d.3: UHCI Host Controller uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5 uhci_hcd 0000:00:1d.3: irq 19, io base 
0x00001880 usb usb5: New USB device found, idVendor=1d6b, idProduct=0001 usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb5: Product: UHCI Host Controller usb usb5: Manufacturer: Linux 2.6.29-rc2-custom uhci_hcd usb usb5: SerialNumber: 0000:00:1d.3 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected e1000e 0000:02:00.0: Warning: detected ASPM enabled in EEPROM SCSI subsystem initialized libata version 3.00 loaded. ahci 0000:00:1f.2: version 3.0 ahci 0000:00:1f.2: PCI INT B -> GSI 16 (level, low) -> IRQ 16 ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 1.5 Gbps 0x1 impl SATA mode ahci 0000:00:1f.2: flags: 64bit ncq pm led clo pio slum part ahci 0000:00:1f.2: setting latency timer to 64 scsi0 : ahci scsi1 : ahci scsi2 : ahci scsi3 : ahci ata1: SATA max UDMA/133 abar m1024@0xee444400 port 0xee444500 irq 16 ata2: DUMMY ata3: DUMMY ata4: DUMMY 0000:02:00.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:16:d3:3b:a6:18 0000:02:00.0: eth0: Intel(R) PRO/1000 Network Connection 0000:02:00.0: eth0: MAC: 2, PHY: 2, PBA No: 005302-003 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out usb 1-1: new full speed USB device using uhci_hcd and address 2 ata1.00: ATA-7: HTS721010G9SA00, MCZIC10V, max UDMA/100 ata1.00: 195371568 sectors, multi 16: LBA48 ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 succeeded ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out ata1.00: configured for UDMA/100 ata1.00: configured for UDMA/100 ata1: EH complete scsi 0:0:0:0: Direct-Access ATA HTS721010G9SA00 MCZI PQ: 0 ANSI: 5 pata_acpi 0000:00:1f.1: PCI INT C -> GSI 16 (level, low) -> IRQ 16 pata_acpi 0000:00:1f.1: setting latency 
timer to 64 pata_acpi 0000:00:1f.1: PCI INT C disabled scsi 0:0:0:0: Attached scsi generic sg0 type 0 ata_piix 0000:00:1f.1: version 2.12 ata_piix 0000:00:1f.1: PCI INT C -> GSI 16 (level, low) -> IRQ 16 ata_piix 0000:00:1f.1: setting latency timer to 64 scsi4 : ata_piix scsi5 : ata_piix ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1810 irq 14 ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1818 irq 15 Driver 'sd' needs updating - please use bus_type methods sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda:<6>usb 1-1: New USB device found, idVendor=056a, idProduct=00b2 usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 1-1: Product: PTZ-930 usb 1-1: Manufacturer: Tablet usb 1-1: configuration #1 chosen from 1 choice usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid usbhid: v2.6:USB HID core driver ata6: port disabled. ignoring. usb 3-1: new low speed USB device using uhci_hcd and address 2 sda1 sda2 sd 0:0:0:0: [sda] Attached SCSI disk usb 3-1: New USB device found, idVendor=413c, idProduct=2003 usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 3-1: Product: Dell USB Keyboard usb 3-1: Manufacturer: Dell usb 3-1: configuration #1 chosen from 1 choice input: Dell Dell USB Keyboard as /devices/pci0000:00/0000:00:1d.2/usb3/3-1/3-1:1.0/input/input2 generic-usb 0003:413C:2003.0001: input,hidraw0: USB HID v1.10 Keyboard [Dell Dell USB Keyboard] on usb-0000:00:1d.2-1/input0 PM: Starting manual resume from disk kjournald starting. 
Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. udevd version 124 started cfg80211: Calling CRDA to update world regulatory domain lib80211: common routines for IEEE802.11 drivers lib80211_crypt: registered algorithm 'NULL' intel_rng: FWH not detected Non-volatile memory driver v1.3 input: Power Button (FF) as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3 ACPI: Power Button (FF) [PWRF] input: Lid Switch as /devices/LNXSYSTM:00/device:00/PNP0C0D:00/input/input4 ACPI: Lid Switch [LID] input: Sleep Button (CM) as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input5 ACPI: Sleep Button (CM) [SLPB] acpi device:03: registered as cooling_device2 input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/device:02/input/input6 ACPI: Video Device [VID] (multi-head: yes rom: no post: no) ACPI: AC Adapter [AC] (on-line) ACPI: Battery Slot [BAT0] (battery present) sdhci: Secure Digital Host Controller Interface driver sdhci: Copyright(c) Pierre Ossman sdhci-pci 0000:15:00.2: SDHCI controller found [1180:0822] (rev 18) sdhci-pci 0000:15:00.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 Registered led device: mmc0 mmc0: SDHCI controller on PCI [0000:15:00.2] using PIO yenta_cardbus 0000:15:00.0: CardBus bridge found [17aa:201c] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26ks iwl3945: Copyright(c) 2003-2008 Intel Corporation iwl3945 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 iwl3945 0000:03:00.0: setting latency timer to 64 thinkpad_acpi: ThinkPad ACPI Extras v0.22 thinkpad_acpi: http://ibm-acpi.sf.net/ thinkpad_acpi: ThinkPad BIOS 7BETD2WW (2.13 ), EC 7BHT40WW-1.13 thinkpad_acpi: Lenovo ThinkPad X60s, model 170237G thinkpad_acpi: ACPI backlight control delay disabled thinkpad_acpi: radio switch found; radios are enabled thinkpad_acpi: This ThinkPad has standard ACPI backlight brightness control, supported by the ACPI video driver thinkpad_acpi: Disabling thinkpad-acpi brightness events by 
default... Registered led device: tpacpi::thinklight Registered led device: tpacpi::power Registered led device: tpacpi:orange:batt Registered led device: tpacpi:green:batt Registered led device: tpacpi::dock_active Registered led device: tpacpi::bay_active Registered led device: tpacpi::dock_batt Registered led device: tpacpi::unknown_led Registered led device: tpacpi::standby thinkpad_acpi: Standard ACPI backlight interface available, not loading native one. input: ThinkPad Extra Buttons as /devices/virtual/input/input7 iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels iwl3945: Detected Intel Wireless WiFi Link 3945ABG iwl3945 0000:03:00.0: PCI INT A disabled wmaster0 (iwl3945): not using net_device_ops yet phy0: Selected rate control algorithm 'iwl-3945-rs' yenta_cardbus 0000:15:00.0: ISA IRQ mask 0x0cb8, PCI irq 16 yenta_cardbus 0000:15:00.0: Socket status: 30000006 yenta_cardbus 0000:15:00.0: pcmcia: parent PCI bridge I/O window: 0x9000 - 0xcfff yenta_cardbus 0000:15:00.0: pcmcia: parent PCI bridge Memory window: 0xe4300000 - 0xe7ffffff yenta_cardbus 0000:15:00.0: pcmcia: parent PCI bridge Memory window: 0xe0000000 - 0xe3ffffff wlan0 (iwl3945): not using net_device_ops yet HDA Intel 0000:00:1b.0: PCI INT B -> GSI 17 (level, low) -> IRQ 17 hda_intel: probe_mask set to 0x1 for device 17aa:2010 HDA Intel 0000:00:1b.0: setting latency timer to 64 loop: module loaded rt2870sta: module is from the staging directory, the quality is unknown, you have been warned. rtusb init ---> usbcore: registered new interface driver rt2870 swap_cgroup: uses 1940 bytes of vmalloc for pointer array space and 1986560 bytes to hold mem_cgroup pointers on swap swap_cgroup can be disabled by noswapaccount boot option. Adding 1984016k swap on /dev/sda2. Priority:-1 extents:1 across:1984016k EXT3 FS on sda1, internal journal ip_tables: (C) 2000-2006 Netfilter Core Team
Exactly the same here, Debian Lenny, Dell Vostro 200 too. It started happening after I upgraded the kernel from 2.6.24-1 (Debian) to 2.6.26 (Debian). I've been stuck with 2.6.24 since then (didn't try 2.6.25). Found Bastien's report at bugs.debian, and followed the same suggestion (try vanilla kernel.org 2.6.28). Still the same.

dell:/home/jsveiga# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82562V-2 10/100 Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02)
02:00.0 Communication controller: Conexant HSF 56k Data/Fax Modem
Now here's an interesting fact: I'm beta testing Windows 7, and got the same problem there (sometimes network does not work, disabling/enabling the connection brings it back - not a DHCP issue, my IP is fixed). Works fine with Windows Vista. Maybe whatever changed from 2.6.25 to 2.6.26 also changed from Vista to Seven!
It looks like I have the same problem: Intel 82562V-2 network card in a Dell Vostro 200. When I run kernel 2.6.24 (driver version 0.2.0) the network card works fine 100% of the time. When I run kernel 2.6.27 (driver version 0.3.3.3-k6) chances are 50/50 that the network card gives trouble. Symptoms of the problem are: (1) even before there is a login prompt, there is considerable activity on the eth0 device; (2) getting an IP address through DHCP (normal mode of operation on my home LAN) is impossible; (3) manual configuration of eth0 is possible, but then the network is excruciatingly slow. I can send debugging output to everyone who wants to have it.
Looks like the problem was "born" between vanilla 2.6.25.19 and 2.6.26. You can copy ../drivers/net/e1000e/* from 2.6.25.19 to 2.6.26, cleanly compile the 2.6.26 kernel, and it seems to be OK (I just rebooted 10 times and got the network working every time). With the "complete" 2.6.26 (same .config), about half of the reboots end with the quickly flashing router LED and no network. I've seen some other reports on the web of this NIC giving trouble on Dell computers, even with Vista. Does something Intel introduced in the newer drivers not like Dell? Unfortunately the e1000e tree from 2.6.25.19 does not compile on 2.6.28.4 (which still has the problem). I'm not willing to try that with 2.6.27 because of the 2.6.27+e1000e history of bricking NICs. Developers, any test you'd like us to try? Thanks,
There were quite a few patches between those two kernel versions. Is there a chance you can do a git-bisect on drivers/net/e1000e?

git bisect start v2.6.26 v2.6.25 -- drivers/net/e1000e

We have not seen this behavior here.
Sorry, somehow I did not get the cc: when this was updated; only saw it now. I'm new to git; I'm on "git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6" so I can try the bisect process (I was trying to do the job with diff and vi). Will post the results as soon as I get them. Thanks (I was worried that nobody was looking at this)
Hello Jesse, below is the result of the bisect. To avoid false "goods" I used a script to reboot 8 times and ping the router, and only considered a commit "good" if all 8 boots succeeded. All the "bads" actually happened on the first reboot, so there's little chance of error here. The Dell Vostro 200 indeed uses ICH9, so it sounds right.

Summarizing the problem: During the boot process one can see the router/switch LED associated with the PC go off, then start blinking quite fast. ethtool says the link is detected, and ifconfig says the interface is up, with no errors in dmesg, but it is not possible to access the network. Sometimes disconnecting/reconnecting the ethernet cable brings it back to normal, sometimes not.

dell:/usr/src/git/linux-2.6# git bisect good
97ac8caee238d2a81c23661916f7acd3a22c85fe is first bad commit
commit 97ac8caee238d2a81c23661916f7acd3a22c85fe
Author: Bruce Allan <bruce.w.allan@intel.com>
Date:   Tue Apr 29 09:16:05 2008 -0700

    e1000e: Add support for BM PHYs on ICH9

    This patch adds support for the BM PHY, a new PHY model being used on ICH9-based implementations. This new PHY exposes issues in the ICH9 silicon when receiving jumbo frames large enough to use more than a certain part of the Rx FIFO, and this unfortunately breaks packet split jumbo receives. For this reason we re-introduce (for affected adapters only) the jumbo single-skb receive routine back so that people who do wish to use jumbo frames on these ich9 platforms can do so. Part of this problem has to do with CPU sleep states and to make sure that all the wake up timings are correctly we force them with the recently merged pm_qos infrastructure written by Mark Gross. (See http://lkml.org/lkml/2007/10/4/400). To make code read a bit easier we introduce a _IS_ICH flag so that we don't need to do mac type checks over the code.

    Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
    Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

:040000 040000 692d08ad725d7abb00e82a7eb012e60a4e5a8048 ff07a5d12182d2b3054dfd735372ff1a23ce0c3f M drivers

BR, Joao
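The reboot-and-ping script itself is not shown in the thread. A minimal sketch of the "good only after 8 clean boots, bad on the first failure" rule might look like the following; the log file format, the function names, and the ping parameters are assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical boot-time check: record one ping result per boot, then decide.

# record_boot LOGFILE ROUTER_IP
# Appends "ok" or "fail" to LOGFILE depending on whether the router answers.
record_boot() {
    if ping -c 3 -W 2 "$2" >/dev/null 2>&1; then
        echo ok >>"$1"
    else
        echo fail >>"$1"
    fi
}

# verdict LOGFILE REQUIRED
# Prints "bad" if any recorded boot failed, "good" once REQUIRED boots have
# all succeeded, and "pending" while more reboots are still needed.
verdict() {
    log=$1
    required=$2
    [ -f "$log" ] || { echo pending; return; }
    if grep -q '^fail$' "$log"; then
        echo bad
    elif [ "$(grep -c '^ok$' "$log")" -ge "$required" ]; then
        echo good
    else
        echo pending
    fi
}
```

A boot script could call `record_boot` once per boot and then `verdict "$log" 8`: reboot again on "pending", otherwise clear the log and feed the answer to `git bisect good` or `git bisect bad` by hand. Because the test needs a reboot between samples, it does not fit `git bisect run` directly.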
PS1: Not sure if this is important, but when I first got the problem, my BIOS was 1.0.11; I updated to 1.0.15 because it said: Fixes/Enhancements: 1.Updated Intel microcode. but it didn't help.
adding bruce, our ICH expert. Bruce do you see anything in the offending commit 97ac8caee that seems in error?
We unfortunately don't have one of these systems here. If it isn't too much trouble, can you run the ethregs utility and dump the output to a file before (when it's working) and after (when it's not)? It doesn't dump the PHY registers, which we may need to do with a driver change to the before and after drivers. I really appreciate the effort taken with the bisect! I'm a little surprised that this system is affected by that patch (and that it works before that patch).

ethregs can be downloaded from http://prdownloads.sf.net/e1000/ethregs-1.1.tar.gz
You'll probably need to install pciutils-devel to be able to build it; I haven't tried it before on Debian.
Comment#1 from Troels Liebe Bentsen looks suspicious because the EEPROM dump suggests it is an 82562-V device but the dmesg output suggests it is an 82573. As for comment#5 from Joao S Veiga, the e1000e bricking NICs issue only happens on 2.6.27-rcX and 2.6.27 kernels with ftrace enabled. The issue was fixed in 2.6.27.1. Offhand, the only thing I see that is possibly wrong in the commit reported by the git bisect (btw, thanks for doing that) is e1000e_disable_gig_wol_ich8lan() should check the PHY type and *not* write the PHY_CTRL register if it is an IFE PHY. This is part of the suspend/shutdown path so it might not be related to this bug however.
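The EEPROM observation can be checked mechanically. In the `ethtool -e` dump posted earlier in this thread, the bytes at offsets 0x1a to 0x1d read `c0 10 86 80`; read as little-endian 16-bit words that is 0x10c0 followed by 0x8086, Intel's PCI vendor ID. The sketch below extracts such a word from a dump. Treating offset 0x1a as the device ID word is an assumption based on where the vendor ID appears in this particular dump, and the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: pull a little-endian 16-bit word out of `ethtool -e` output.
# Usage: ethtool -e eth0 | eeprom_word 0x1a

eeprom_word() {
    awk -v want="$1" '
        # Convert a hex string such as "0x001a" to a number.
        function hex(s,    n, i) {
            s = tolower(s)
            sub(/^0x/, "", s)
            gsub(/[^0-9a-f]/, "", s)
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
            return n
        }
        # Data rows start with an offset; store every byte by absolute offset.
        /^0x[0-9a-fA-F]+/ {
            base = hex($1)
            for (i = 2; i <= NF; i++)
                bytes[base + i - 2] = tolower($i)
        }
        END {
            w = hex(want)
            printf "0x%s%s\n", bytes[w + 1], bytes[w]
        }
    '
}
```

The resulting word can then be compared with the device ID that `lspci -nn` reports for the adapter the dmesg output refers to; a mismatch would suggest that the dump and the log come from different hardware.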
It turns out while we don't have a Dell Vostro 200, I do have a similar ICH9 w/ IFE PHY platform and will try to repro this issue in-house.
Created attachment 20318 [details] ethregs dumps on Vostro 200 bisect good/bad
Sorry for the delay; I git-replayed to the bad state and collected the dumps. While booted on the bad state, I collected some dumps with the NIC not working, then after I managed to make it work (dis/re-connect nw cable, un/re-load e1000e) collected one more. Then I moved on with the bisect to the good state and collected one dump. (to compile ethregs in debian, the pciutils-devel is in the libpci-dev package) I'll be far from my Vostro 200 from feb/22 to mar/3, so unfortunately I won't be able to do more tests next week, but I'll be glad to do it as soon as I'm back.
I'm continuing to work this issue with Bruce Allan. I looked at the ethregs good & bad dumps, and they are essentially the same, with nothing obviously wrong in either one.

The earlier dmesg trace is indeed bogus. It may show a problem, but not one with the 82562V-2 at Bus:Device.Function 0:19.0. Could someone provide a dmesg trace taken after the problem first shows, on a system with the originally reported device? Maybe I'll find a clue in it.

Like Bruce, I don't have a Dell Vostro 200. I do have an ICH9 with the same 82562V-2 device at 00:19.0. I tried for a local repro using a 2.6.26 vanilla kernel under Fedora 9, so I'm likely using a significantly different kernel .config, and certainly different networking 'middleware', than the Debian Lenny installs that I see you (Joao and Bastien) are reporting. That may be relevant. I don't know yet, but I plan a Debian Lenny install next to find out.

With the exception of the MAC address and the PXE version, my EEPROM image is the same as reported. I'm following up separately on the PXE version, but doubt it's relevant.

It may also be that the important difference between our configurations is in the connection to the link partner. Can you please tell me what make/model of switch you are connected to, and whether you are using a straight-through cable or a crossover (if you know)?

Thanks
Dave
Hello, Thanks for your investigations. I'm connecting to a SMC FS5 switch with a straight through cable.
Thanks Bastien. I can't get hold of exactly that switch, but based on its specs, I tried the closest I could find, a Trendnet TE100 S5 10/100 switch that does auto MDI/MDIX (which accommodates both straight-through and crossover TX/RX cabling). On my system I am still not able to see the reported failure.

btw, Joao - what switch are you using?

I strongly suspect that MDI/MDIX behavior is relevant, especially because commit 97ac8ca that Joao identified in his bisect introduces (or tries to improve - I'm not sure yet) auto-MDI/MDIX for the IFE PHY in your platform. Of course, if we do have an interoperability problem with this switch type, it might still be an issue with the connected switch.

I can't be sure from the commit id alone that it's the auto-MDI/MDIX feature that's causing the problem, as the commit is 1100 lines of code and the MDI/MDIX part is only about 20 lines within it. But by testing with an updated e1000e driver that removes the MDI/MDIX code alone, we could find out. If you can build a kernel, and are prepared to help in that effort, please let me know, and I'll send you a patch to apply to your 2.6.26 kernel tree.

There's other stuff that you could do to help, which might add circumstantial evidence for the suspected MDI/MDIX failure. See if there's a firmware update for your switch, and if so apply it and retest (I actually went looking through the SMC site & FAQs for the SMC-FS5, but couldn't find anything interesting). I think it's unlikely to resolve the issue, but if a FW update does resolve it, then that's almost an admission of a problem on the switch vendor's part. Could you try another cable (a crossover if you were using a straight-through), and maybe try another switch?

I'll keep trying to get a repro here, but am not too optimistic that I'll succeed without the right switch. I still haven't tried Debian Lenny, but don't expect that to be relevant.
If you're up for a switch-swap, I think that would help us get a definite root cause, and be my best shot at providing a true fix or workaround. I can arrange to send you a (110V, 50Hz) replacement, so let me know if you're interested in that too, and we would then discuss the details offline from this bugzilla.
Created attachment 20389 [details] router's good tcpdump
As I live in France, I cannot use a 110V switch, and I don't have another one. I tried with a crossover cable; it didn't help. But since I had a crossover cable plugged in, I tried connecting it directly to the router, and the connection came up every time I plugged it in or rebooted. I attach the router's tcpdump, but I'm not sure it will be useful. I may try to build ethregs if it works on OpenBSD (as my router runs it). I can test a patch for the 2.6.26 kernel if you send one.
Hi Dave (sorry, as I mentioned before I'm away from home this week, and not often on the web). I'll get the dmesg from before/after as soonb as I'm back home; but I can answer the switch connection part: I'm using a Gi-Link (rtl8186 chipset), with a 2.4-kernel based firmware, connected to the Vostro 200 with a direct (not crossover) CAT5 cable, all wires correctly crimped (not only rx/tx pairs). Although I'd love to accept your offer for a switch, I'm in Brazil, so shipping costs would be too big. I have a 100Mbps switch (encore) and a 10Mbps hub (also encore IIRC) which I can test with as soon as I'm back home on Tuesday 3/3. I'll also be glad to test any patch or debug version of the module. BR, Joao
Created attachment 20408 [details] Removes e1000e ICH8 MDIX detect code To be applied to vanilla 2.6.26 kernel tree.
Thanks Bastien, Joao. I've added a small patch (revertICH8MDIX.patch) that reverts the MDI/MDIX section of the commit that's been identified as introducing the problem. Bastien, Joao, if one or both of you could test with this patch applied, that would be great. I think that there may be a problem with enabling MDI/MDIX detection, because I see at least one other place in the code [force_speed_duplex()] where we disable it for ICH8 100Mbps interfaces. I don't yet understand the relation between this new code (the code my patch removes) and force_speed_duplex, but knowing for sure whether it is (or isn't) the MDI/MDIX stuff causing the problem will tell me that my focus is right. Dave
Hello Dave, I'm back, and made the tests. First I tried the first bad commit from git bisect with an Encore 100Mbps switch (8 port NWay Switch, model ENH908-NWY), but it didn't make any difference. The first boot did come up with the NIC working, but it was a lucky one. I'm attaching dmesg outputs from booting with the Encore and the Gi-Link. (The 10Mbps hub I also meant to test has vanished, sorry.) Then I compiled a vanilla kernel.org 2.6.26 to make sure it didn't work, booted 5 times (all 5 got a nonworking NIC), then patched it with your revert patch, compiled, and got 5 good boots in a row. I've attached dmesgs for these two cases, and also the /proc/config.gz. Please let me know if more tests are needed. BR, Joao
Created attachment 20419 [details] Switched switch and revertICH8MDIX tests
Hello, it works well with the revertICH8MDIX patch applied, plugged on switch.
Thanks. I apologize for the delay in getting back to you. I am working on 2 fronts to move this along. 1) I have our interop lab helping with the testing, as they have access to a lot of switch types and are our best shot at seeing a local repro. 2) I am working on some code that will let us get the relevant phy configuration and status information from your setups, so that I can see what's going on. The current theory is that the auto-MDI/MDIX logic in our own phy is swapping TX/RX, but so is the auto-MDI/MDIX capability in the connected switch, and they both keep swapping. If you could try another switch, that'd still be a good data point. Also, if you could try the latest released standalone driver at sourceforge (http://sourceforge.net/projects/e1000/ click Download, then Browse All Packages, then click the download link for e1000e_stable), it'd be useful to know if the problem still exists there. I expect that it does.
Hello, thanks for the update. Using the module from sourceforge: e1000e: Intel(R) PRO/1000 Network Driver - 0.5.11.2-NAPI Symptoms are the same (fast blinking, no network access most of the time). Now, at the risk of losing any credibility I might have, here's a strange thing: After a thunderstorm earlier this week (which killed my PS3), my Gi-Link router is going south. Its WAN (ethernet) interface is behaving badly (I have to plug/unplug the cable several times to make it connect to the cable modem). BUT - I can no longer reproduce the fast-blinking/no connection problem on this Gi-Link with the Vostro 200!! I can use the 2.6.26 vanilla, the first bad bisect from 2.6.25 to 2.6.26, or the latest standalone driver, and the connection between the router and the Vostro 200 works all the time. I don't know if there's any reasonable explanation for this. ifconfig shows no errors/drops on either side of the Vostro/router link (it does show some errors on the WAN interface though). The only change I could expect from the lightning would be in the impedance of half-damaged front-end circuits. LUCKILY I still get the problem with the Encore switch (ENH908-NWY) I had tested before, which I used to confirm that the standalone driver still behaves the same. I also found that the problem can be reproduced with the KNOPPIX_V6.0.1CD-2009-02-08-EN.iso live CD/pendrive (Debian Lenny based, 2.6.28.4 kernel), so if you want a quick way of checking multiple PCs to select a testbed, using this on a pendrive could make it easier. BR, Joao
Joao, That's strange about the thunderstorm, but OK, you still see the problem with the Encore. I have now tried a bunch of auto MDI/MDIX 10/100 equipment, including the Encore, which I managed to get from Amazon. Here's my current list: Netgear RP614 v2, NetGear MR314, LinkSys BEFSR81, Encore ENH908-NWY. Using each of these, I still fail to see the problem, so far. Thanks also for testing with e1000e-0.5.11.2. I'll provide the debug code (which I referenced in my previous note) with that as a base; it'll be faster to work with the standalone driver. I should have that as a new comment and attachment in a day or two, and will provide instructions with it. Dave
Thank you Dave, I've brought a Netgear ProSafe Gigabit Switch GS108 from work for testing, and the problem does not appear with it (0.5.11.2-NAPI). I was never able to reproduce it again with the half-fried Gi-Link; I may buy a Linksys WRT54G as a replacement this weekend, so I'll have one more "partner" to try. Currently only the Encore switch is doing the fast blinking with the Vostro 200. I will also try to bring more switch models from work (including another Encore of the same model) to check if I can get more "bad" test cases to try with your debug code. Best regards, Joao S Veiga
I brought another Encore ENH908-NWY from work, and it also fastblinks (so it's not unique to my specific ENH908-NWY unit). I'll try to bring a Gbit 3Com switch and an Edimax tomorrow. I also noticed something interesting: I connected the switch before turning the Vostro 200 on, and when I turned the PC on I got the fast blinking BEFORE Linux booted, while still on the GRUB menu. At this point, the behavior is the same as after booting with the "bad" MDIX detect: unplugging/plugging the network cable sometimes brings a steady LED on, but most of the time the fast blinking. So whatever causes the "resonance" is the default behavior of the NIC, without the driver telling it what to do. I did mention I get the same thing with Windows 7 Beta 64-bit (but not with Vista 32); do they use different MDIX detection code too? BR, Joao
Created attachment 20696 [details] e1000e patch dumps phy registers to msg log on ethtool -d
Created attachment 20697 [details] small script to invoke multiple phy register dumps
Requires that the patch e1000e-0.5.18.3.dumpphy.patch is applied to the e1000e-0.5.18.3 base driver.
Thanks for continuing to work with me on this, Joao. Your new note is interesting, and I haven't had much time to ponder it yet. Somehow we'll have to explain how the MDI/MDIX commit can cause/resolve the issue. I have attached items that together should allow you to collect more data for me, so that I can see whatever the phy is seeing with the MDI/MDIX code running.
e1000e-0.5.18.3.dumpphy.patch: This is to be applied to the (new) e1000e-0.5.18.3 sourceforge driver, which will almost certainly still exhibit the problem we are discussing. Could you please use that driver with this patch for your continued testing? The patch simply adds a phy register dump capability to the "ethtool -d ethx" command, and will place the data in the system message log.
e1000e-0.5.18.3.dump.sh: This small bash script invokes 21 phy dumps 300ms apart, placing the result (via dmesg) in a file named "phyregisters". If you could run the script when the LAN port is exhibiting the flashing LED behavior under the new driver, and send me the file "phyregisters", I think it should allow me to see what's going on. Thanks again, Dave
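For readers without access to the attachment, the dump loop described above could look roughly like the sketch below. This is a hypothetical reconstruction, not the attached script: the function name `dump_phy_registers`, its arguments, and the default interface `eth0` are assumptions, and it only produces useful data when the dumpphy patch is applied so that `ethtool -d` also writes the phy registers to the kernel log.

```shell
# Hypothetical reconstruction, not the attached script. Assumes the dumpphy
# patch is applied, so that "ethtool -d" also logs the phy registers.
dump_phy_registers() {
    local iface="${1:-eth0}"         # interface name (assumed)
    local count="${2:-21}"           # number of dumps to trigger
    local delay="${3:-0.3}"          # seconds between dumps
    local out="${4:-phyregisters}"   # output file
    local i=0
    while [ "$i" -lt "$count" ]; do
        ethtool -d "$iface" > /dev/null   # triggers one dump into the kernel log
        sleep "$delay"
        i=$((i + 1))
    done
    dmesg > "$out"                        # save the kernel log holding the dumps
}
```

Run as root while the port LED is fastblinking, for example `dump_phy_registers eth0`, then attach the resulting "phyregisters" file.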
Thank you Dave,
Before going to the dump tests: I tested an old Edimax "Switch Fast Ethernet" (no model number) and a 3Com Gbit 3CGSU08 switch, and neither fastblinked. I could also get fastblinks again with my Gi-Link, but I have to un/plug the ethernet cable several times (about 10% fastblinks).
Tests (running vanilla kernel.org 2.6.26, attached config.gz):
1 - I first compiled the new driver module as it came from sourceforge, to make sure it still has the fastblinks.
2 - ifdown, unloaded old modules, loaded the new module:
    e1000e: Intel(R) PRO/1000 Network Driver - 0.5.18.3-NAPI
    e1000e: Copyright(c) 1999 - 2009 Intel Corporation.
    ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 20
    PCI: Setting latency timer of device 0000:00:19.0 to 64
    0000:00:19.0: : Failed to initialize MSI interrupts. Falling back to legacy interrupts.
    0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:1e:c9:1a:83:5d
    0000:00:19.0: eth0: Intel(R) PRO/10/100 Network Connection
    0000:00:19.0: eth0: MAC: 8, PHY: 7, PBA No: ffffff-0ff
3 - ifup, unplugged the CAT5e from the Gi-Link, plugged it into the Encore, and it fastblinks. Connected/disconnected the cable from the Encore 10 times, got 7 bads, 3 goods.
4 - applied the e1000e-0.5.18.3.dumpphy.patch, recompiled, ifdown, un/loaded the e1000e module, ifup. Result note: I had left the Encore in a working state, but when I unloaded the module it started fastblinking. When I ifup'ped, the LED turned off, then went back to fastblinking. I think this confirms that when "left alone" the NIC hardware also causes the fastblinks.
5 - ran the script, collected bad_phyregisters_1
6 - un/plugged the cable 5 times, until I got a working state, collected good_phyregisters_2
7 - un/plugged the cable twice, until I got fastblinking, collected bad_phyregisters_3
8 - unplugged from the Encore, un/plugged to the Gi-Link 8 times, until I got a bad state, collected bad_phyregisters_4
9 - I recompiled the kernel with MSI interrupts enabled, to avoid the warning when loading the module, just in case, but it did not change the fastblinking results.
    ACPI: PCI interrupt for device 0000:00:19.0 disabled
    e1000e: Intel(R) PRO/1000 Network Driver - 0.5.18.3-NAPI
    e1000e: Copyright(c) 1999 - 2009 Intel Corporation.
    ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 20
    PCI: Setting latency timer of device 0000:00:19.0 to 64
    0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:1e:c9:1a:83:5d
    0000:00:19.0: eth0: Intel(R) PRO/10/100 Network Connection
    0000:00:19.0: eth0: MAC: 8, PHY: 7, PBA No: ffffff-0ff
10 - collected bad_phyregisters_5 and good_phyregisters_6 with MSI interrupts enabled and the Encore switch, also attached the msi_config.gz.
Please let me know if more tests are needed. Best regards, Joao S Veiga
Created attachment 20714 [details] phy registers dump tests
Linksys WRT54G V8 firmware 8.00.5 does not fastblink. BR Joao
Thanks Joao for the register dumps. They show what's happening. First, for both the good and bad cases the first read result was always different from the others, but that seems to be because the registers around 0x13 are statistics registers that are cleared on read, so it is probably not important. The outstanding item is the setting of register 0x1C, which reads: bad case 0x90, good case 0xB0 (I see this value in my testing, even with the NWay switch). In both cases, the 82562V-2 indicates (bit 4) that the auto-detect function has successfully achieved resolution; the difference is bit 5, which the phy documentation says "Indicates the state of the MDI pair. 1 = MDI-X (cross over). 0 = MDI (straight through)." So in the failing case, our 82562V-2 has decided that the cable is X-over, which it isn't. Interestingly, all the failure cases show the same thing, and they don't show us changing back and forth. Maybe we did indeed detect the crossover configuration based on the link partner's first auto-MDI/MDI-X switch, and changed to the crossover configuration ourselves, which sent the link partner into a weird mode. I don't know the details of the auto-detect function though, so I have forwarded the data to our Silicon sustaining team for further analysis, and will update when they respond. In the meantime, it's good that you have the LinkSys.
I have this same bug on a Dell Inspiron 530, also with an Intel 82562V-2 NIC. I am running Mandriva 2010.0 with a 2.6.31 kernel, so 2.6.31 apparently still has this bug. Details at: https://qa.mandriva.com/show_bug.cgi?id=43493 I have a Ruby Tech 8 port 10/100 switch, a straight-through cable, and no other switch to test with at the moment. Best regards, Dominique
I switched to a Netgear GS-108T, and there is no more problem. For Dominique: I ran for a few months with Dave's patch applied to my Debian kernel, and it worked well.
Bug #14737 may be related, but I'm not sure.
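For anyone reading their own dumps, here is a minimal sketch that splits a register 0x1C value into the two bits discussed above. It applies the quoted bit definitions literally (bit 4 = auto-detect resolved, bit 5 = 1 for MDI-X, 0 for MDI); the function name `decode_phy_0x1c` is made up for illustration, and the other bits of the register are ignored. Note that, read literally, bit 5 is clear in 0x90 and set in 0xB0, so the mapping of these two values to the good and bad cases is worth checking against the attached dumps.

```shell
# Hypothetical helper: decodes only bits 4 and 5 of phy register 0x1C,
# using the bit meanings quoted from the phy documentation in this comment.
decode_phy_0x1c() {
    local val=$(( $1 ))
    local resolved=$(( (val >> 4) & 1 ))   # bit 4: auto-detect has resolved
    local mdix=$(( (val >> 5) & 1 ))       # bit 5: 1 = MDI-X, 0 = MDI
    echo "resolved=$resolved mdix=$mdix"
}

decode_phy_0x1c 0x90   # prints: resolved=1 mdix=0
decode_phy_0x1c 0xB0   # prints: resolved=1 mdix=1
```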
Created attachment 24749 [details] proposed parameter to control mdi-x
This is a proposed patch to allow the user to override MDI-X behavior at module load time. The patch is against 2.6.33-rc5; please test with your funny switch(es).
<apply patch>
make M=drivers/net/e1000e
insmod drivers/net/e1000e/e1000e.ko mdix=1,1
This will force MDI (straight through) mode on the phy; a value of 2 forces MDI-X mode.
Hi, one of my funny switches (actually a Gi-Link router) had a lightning problem and no longer misbehaves with the mdi-x, but I still have the Encore ENH908-NWY which triggers the problem, so I used it to test. I was having trouble booting with 2.6.33-rc5, so instead of deviating from the intended course to solve that, I applied your patch to the 2.6.29.1 I'm running now (it applied OK, and I installed the new module in /lib/modules):
# modinfo e1000e
filename: /lib/modules/2.6.29.1/kernel/drivers/net/e1000e/e1000e.ko
version: 0.3.3.3-k6
(...)
parm: mdix:MDI-X crossover control: 0 - auto (default), 1 - mdi only, 2 - mdix only (array of int)
Using the module with mdix=1 worked 100% of the time; mdix=2 made the fastblink appear 100% of the time. Without the parameter, it is random (it seems to depend on how it was set previously). If I booted into a fastblink state, reloading e1000e with mdix=1 always fixed it. Reloading with mdix=2 didn't.
Good: ifdown eth0; modprobe -r e1000e; modprobe e1000e mdix=1; ifup eth0
Fastblinks: ifdown eth0; modprobe -r e1000e; modprobe e1000e mdix=2; ifup eth0
(note: I used "mdix=1" or "mdix=2", not "mdix=1,1" - is that right?)
I couldn't, however, make it work directly from boot. I tried:
- adding "options e1000e mdix=1" to /etc/modprobe.d/e1000e (which makes modprobe e1000e work without the option on the command line, but not at boot time, it seems),
- adding "mdix=1" and "e1000e.mdix=1" to the kernel line (grub), but got random results,
- adding "e1000e mdix=1" to /etc/modules, also random results.
I also noticed the option does not show up in /sys/module/e1000e/parameters/, which makes it hard to tell whether the failures at boot happened because the option was not set or because it was set but did not work. When loading the module from the command line, it reports what MDI-X was set to, but I couldn't see that during boot.
Best regards, Joao S Veiga
As I gave my "funny switch" to my father-in-law, it will be hard for me to test it. I may try to borrow it next weekend, if you need more feedback, though.
Hello, I borrowed my old switch for testing. I applied the patch to Debian's 2.6.32-trunk, built and installed the module, and ran tests.
* mdix=1 leads to 100% connection
* mdix=2 leads to 100% failure
* mdix=0 leads to random results (it looks like it depends on timing, with more failures if I try to connect as soon as I install the module, but I cannot be sure)
Option assignment via /etc/modprobe.d works well after rebuilding the initrd (did you do that, Joao?). Thanks for your work. Best regards, Bastien Durel
Bastien, could you explain with an example how option assignment via /etc/modprobe.d works? Like Joao, I tried "mdix=1" on the kernel line in /boot/grub/menu.lst, but got random results. Thanks! Best regards, Nico Poppelier
Hello, Most Linux distros use initial RAM images to boot. These images are generated once, at kernel install time, and ship copies of /etc/modprobe.conf and /etc/modprobe.d/*, along with the modules that were running at the time (and maybe others). Their init script loads the embedded modules with parameters taken from the embedded modprobe config files. So by the time Linux switches its root to your root disk, e1000e is already loaded with its old, empty configuration. You have to rebuild the initrd after modifying modprobe files (by calling mkinitrd with proper arguments, or "update-initramfs -u" under Debian). Regards, Bastien Durel
In Fedora you can use dracut to rebuild the initrd. Latest update: I still haven't committed this patch to our out-of-tree driver, but I still believe it to be a good feature. I hope we will release the fix soon, but we need some time (hard to find) to complete the correct upstream fix, which is probably to enable ethtool to control the MDI-X state as well as report it.
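As an example of the steps above, here is a minimal sketch for a Debian-style system with initramfs-tools. It assumes the mdix module-parameter patch from this report is applied; `write_mdix_option` is a hypothetical helper, and the file name e1000e.conf is an assumption (newer module tools may only read files ending in .conf from /etc/modprobe.d).

```shell
# Sketch only: assumes the mdix module-parameter patch from this report.
# write_mdix_option is a hypothetical helper, not part of any package.
write_mdix_option() {
    local mode="$1"                     # 0 = auto, 1 = MDI only, 2 = MDI-X only
    local dir="${2:-/etc/modprobe.d}"   # directory read by modprobe
    case "$mode" in
        0|1|2) ;;
        *) echo "mdix must be 0, 1 or 2" >&2; return 1 ;;
    esac
    printf 'options e1000e mdix=%s\n' "$mode" > "$dir/e1000e.conf"
}

# As root on Debian:
#   write_mdix_option 1
#   update-initramfs -u   # regenerate the initrd so its copy of the config has the option
#   reboot
```

Without the rebuild step, the copy of the modprobe configuration inside the initrd still lacks the option, which would explain random results at boot.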
When could you commit a correct upstream fix? I just upgraded to Ubuntu 10.04 and am again building a patched kernel (2.6.32-22)! It would be nice to be able to use stock kernels instead of patched ones. The patch works of course, and I'm still grateful it is available. :-) Regards, Nico
Hello, Have you committed a patch? Any reports? Thanks, -- Bastien
I am also affected by this bug. It also exists in Ubuntu 10.10, so I assume the patch has not been committed yet. The issue is described in the Launchpad system at https://bugs.launchpad.net/ubuntu/+bug/408351
I upgraded to Ubuntu 11.04 (Natty Narwhal) yesterday, and the problem is still there. Narwhal uses kernel version 2.6.38. I can apply the patch presented here more than a year ago (see post of 2010-01-27) but would really appreciate a permanent solution!
I applied the patch linked to by Comment #41 From Jesse Brandeburg 2010-01-27 and added the option mdix=1 to the e1000e module. It corrected a long-standing problem booting the Vostro 200. I am connected to a Netgear 8 port 10/100 Mbps switch FS608 v3. When booting, the switch port LED sometimes blinked rapidly and the computer could not access the network. My workaround was to power cycle the Netgear switch repeatedly until it started working. Power cycling the switch causes the driver to reset and try again. This patch needs to be in the kernel. I am using Linux version 3.3.3-gentoo. The patch applied OK once I found the actual e1000e driver directory in my source tree. Please add the patch in Comment #41 to the official kernel sources.
The module parameter patch will not be accepted upstream, but I have finished the code for the driver changes and the associated changes to the ethtool application to allow this to work. These changes will go into our internal test and then be pushed upstream. There will be a new argument to ethtool -s ethx called mdix, to be used like this: ethtool -s ethx mdix <auto|on|off>
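A short usage sketch for the new argument, assuming an ethtool and driver that both include this support. `set_mdix` is a hypothetical wrapper added only to validate the argument, and `eth0` is an assumed interface name. In ethtool's convention, "on" forces MDI-X (crossover) and "off" forces MDI (straight through), so "off" should correspond to mdix=1 in the earlier module-parameter patch.

```shell
# set_mdix is a hypothetical wrapper around the real "ethtool -s ... mdix" option.
set_mdix() {
    local iface="$1" mode="$2"
    case "$mode" in
        auto|on|off) ;;
        *) echo "usage: set_mdix IFACE auto|on|off" >&2; return 2 ;;
    esac
    # on = force MDI-X (crossover), off = force MDI (straight through)
    ethtool -s "$iface" mdix "$mode"
}

# Example, as root:
#   set_mdix eth0 off   # force MDI, the setting that worked for the affected switches
#   ethtool eth0        # the settings summary includes the current MDI-X state
```

Unlike the module parameter, this setting is applied at runtime and is not read from the modprobe configuration, so it would need to be reapplied by the distribution's network scripts after each boot.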
A patch referencing this bug report has been merged in Linux v3.7-rc1:
commit 4e8186b68fb944ad9e7fd4080cd8bd8f10eb7cbd
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Thu Jul 26 02:31:14 2012 +0000
    e1000e: implement MDI/MDI-X control