Most recent kernel where this bug did not occur: N/A
Distribution: Fedora Core 5
Hardware Environment:
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
01:0a.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
01:0b.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
01:0b.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
05:00.0 VGA compatible controller: nVidia Corporation GeForce 7800 GTX (rev a1)
Software Environment:
Problem Description:
I am running a 2.6.16.1 kernel on a DFI LP NF4 SLI-DR Expert motherboard, which has an nVidia chipset with an onboard NIC.
The NIC works fine with the forcedeth driver and performance is good. The system is an x86_64 FC5 install on a dual-core Opteron 170 CPU with 2 GB of RAM installed (2x1 GB in dual channel).

When I suspend the machine using ACPI S3 or swsusp and resume it, the network is dead: I cannot receive any packets. ifdown / ifup does not help. Restarting the network using /sbin/service network restart does not get the network working either. Unloading and reloading the driver (modprobe -r forcedeth; modprobe forcedeth) has the same result: dead network. I have to reboot in order to get the network working again. I have a static IP, so this is not a DHCP issue.

This makes suspend useless on my box, because I have to reboot anyway to get my network working. What could be causing this? If there is any info I can provide to help fix this bug, tell me.

I also noticed this (I don't know if it is related, but I doubt it):
cat /proc/interrupts
           CPU0       CPU1
  0:     640968     628532    IO-APIC-edge   timer
  1:       4763       4745    IO-APIC-edge   i8042
  8:          0          0    IO-APIC-edge   rtc
  9:          0          0    IO-APIC-level  acpi
 14:       1552       1082    IO-APIC-edge   ide0
 15:      44443      44261    IO-APIC-edge   ide1
 16:      57625      44633    IO-APIC-level  libata
 17:     972904          0    IO-APIC-level  eth0

Steps to reproduce:
1) Suspend or hibernate.
2) Resume.
3) Try to do anything with the onboard NIC.

I mailed this to LKML but got no reply, so I filed it as a bug.
A note on the /proc/interrupts output in my report above: all IRQs of eth0 are handled only by CPU0, never by CPU1, even though irqbalance is running.
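For anyone who wants to check the same per-CPU distribution on their own box, here is a minimal sketch. The function name irq_cpu_counts is made up for illustration, and it assumes the device name is the last field on its /proc/interrupts line (so it will not match entries that share an IRQ, such as "ehci_hcd:usb1, HDA Intel").

```sh
#!/bin/sh
# irq_cpu_counts DEVICE [FILE]   (hypothetical helper, not an existing tool)
# Prints "IRQ count_cpu0 count_cpu1 ..." for each line of FILE whose last
# field is DEVICE. FILE defaults to /proc/interrupts.
irq_cpu_counts() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }      # header line: CPU0 CPU1 ...
        $NF == dev {
            irq = $1
            sub(/:$/, "", irq)           # "17:" -> "17"
            line = irq
            for (i = 2; i <= ncpu + 1; i++)
                line = line " " $i       # one counter per CPU column
            print line
        }
    ' "${2:-/proc/interrupts}"
}
```

Run against the table in the report, `irq_cpu_counts eth0` would show IRQ 17 with all of its interrupts counted on the first CPU.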
Created attachment 7895: Awfully experimental suspend/resume support for the forcedeth driver
If you are really bored and you can afford to crash your computer a few times, you can try the patch above. No warranty implied, really. Don't worry about irqbalance. It is a different topic. -- Ueimor
ok I will try it this weekend.
The patch works fine!! Thanks. Can this be included in the mainline kernel? suspend -> resume -> network works fine. What about the IRQ issue?
is this patch going to be included in 2.6.17 ?
How do I suspend the ethernet device? Using "suspend -f" will only suspend the console. Also, there is no "resume" command.
1) ifup the device.
2) Suspend the box (S3, S4 or software suspend).
3) Resume and see if you are still able to receive/send anything.
That was my question. How do I suspend the box into S3 or S5?
I don't know what S5 is, but for S3 (suspend to RAM), do as root:
echo mem > /sys/power/state
For S4 (suspend to disk):
echo disk > /sys/power/state
(on older kernels /proc/power/state)
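The commands above can be wrapped in a small guard. Reading /sys/power/state lists the sleep states the running kernel supports, so a script can check before writing. This is only a sketch: the function names are invented here, and the optional FILE argument exists purely so the logic can be tried against an ordinary file instead of the real sysfs node.

```sh
#!/bin/sh
# supports_sleep_state STATE [FILE]   (hypothetical helper)
# Succeeds if STATE (e.g. "mem" or "disk") is listed in FILE,
# which defaults to /sys/power/state.
supports_sleep_state() {
    state_file="${2:-/sys/power/state}"
    [ -r "$state_file" ] || return 1
    for s in $(cat "$state_file"); do
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

# enter_sleep_state STATE [FILE]   (hypothetical helper)
# Writes STATE to FILE only if it is advertised there.
# Writing to the real /sys/power/state needs root and suspends the machine.
enter_sleep_state() {
    if supports_sleep_state "$1" "$2"; then
        echo "$1" > "${2:-/sys/power/state}"
    else
        echo "sleep state '$1' is not supported" >&2
        return 1
    fi
}
```

As root, `enter_sleep_state mem` would then correspond to the S3 case and `enter_sleep_state disk` to the S4 case.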
I've recently got myself a new nforce 570 board (mcp55) with dual-gige onboard lan and also noticed that after S4 the network is down. After "ifconfig eth0 down && ifconfig eth0 up" it's coming back to life in my case (running kernel 2.6.18-rc1).

lspci:
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:01.2 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:06.0 PCI bridge: nVidia Corporation Unknown device 0370 (rev a2)
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:0a.0 PCI bridge: nVidia Corporation Unknown device 0376 (rev a2)
00:0b.0 PCI bridge: nVidia Corporation Unknown device 0374 (rev a2)
00:0c.0 PCI bridge: nVidia Corporation Unknown device 0374 (rev a2)
00:0d.0 PCI bridge: nVidia Corporation Unknown device 0378 (rev a2)
00:0e.0 PCI bridge: nVidia Corporation Unknown device 0375 (rev a2)
00:0f.0 PCI bridge: nVidia Corporation Unknown device 0377 (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:06.0 Mass storage controller: Promise Technology, Inc. PDC20268 (Ultra100 TX2) (rev 02)
01:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
01:08.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
01:08.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
01:0b.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
07:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]
07:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]

(Only one GigE showing up, because I've disabled the second one in BIOS)
I just tried the 'awfully experimental' patch from Romieu and found that it does not blow up and it even _works_ when I disable msi/msix. So for now I added "options forcedeth msi=0 msix=0" to /etc/modprobe.d/forcedeth.

With msi enabled I get
|Jul 15 12:29:37 melchior kernel: APIC error on CPU0: 00(40)
followed by a lot of
|Jul 15 12:29:37 melchior kernel: APIC error on CPU0: 40(40)
|Jul 15 12:30:07 melchior last message repeated 6122 times
|Jul 15 12:31:03 melchior last message repeated 11430 times

Also interesting:
|Jul 15 12:47:00 melchior kernel: pnp: Device 00:08 activated.
|Jul 15 12:47:00 melchior kernel: pnp: Failed to activate device 00:09.
even though I get this with the working case too.
|00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
|00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
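The msi/msix workaround described above amounts to a one-line modprobe options file. A minimal sketch follows, with two assumptions that are not from the comment: the helper name write_forcedeth_options is invented, and the file is given a .conf suffix because current modprobe only reads files in /etc/modprobe.d that end in .conf (the plain name used in the comment worked with the tools of that era).

```sh
#!/bin/sh
# write_forcedeth_options DIR   (hypothetical helper)
# Writes the options line quoted in the comment to DIR/forcedeth.conf
# and prints the resulting path. DIR would normally be /etc/modprobe.d,
# which requires root.
write_forcedeth_options() {
    conf="$1/forcedeth.conf"
    printf 'options forcedeth msi=0 msix=0\n' > "$conf"
    echo "$conf"
}
```

The options only take effect the next time the module is loaded, for example after `modprobe -r forcedeth; modprobe forcedeth` or a reboot.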
Nice to hear that it works (with minor issues). Can you post the output of cat /proc/interrupts, once with MSI enabled and once with it off? Also, is this a dual- or single-CPU system?
Single core (Orleans):
|processor       : 0
|vendor_id       : AuthenticAMD
|cpu family      : 15
|model           : 79
|model name      : AMD Athlon(tm) 64 Processor 3200+
|stepping        : 2
|cpu MHz         : 1000.000
|cache size      : 512 KB
|fpu             : yes
|fpu_exception   : yes
|cpuid level     : 1
|wp              : yes
|flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm svm cr8_legacy
|bogomips        : 2010.90
|TLB size        : 1024 4K pages
|clflush size    : 64
|cache_alignment : 64
|address sizes   : 40 bits physical, 48 bits virtual
|power management: ts fid vid ttp tm stc

/proc/interrupts, with msi
|           CPU0
|  0:      18536   IO-APIC-edge   timer
|  1:          8   IO-APIC-edge   i8042
|  7:          1   IO-APIC-edge   parport0
|  8:          0   IO-APIC-edge   rtc
|  9:          0   IO-APIC-level  acpi
| 14:        100   IO-APIC-edge   ide0
| 50:        381   IO-APIC-level  ehci_hcd:usb1, HDA Intel
| 58:        293   IO-APIC-level  ohci_hcd:usb2
| 66:          0   IO-APIC-level  EMU10K1
| 74:          1   IO-APIC-level  gige0
| 98:          0   PCI-MSI-X      eth0
|106:          0   PCI-MSI-X      eth0
|114:       5221   PCI-MSI-X      eth0
|122:        236   PCI-MSI-X      eth1
|130:        284   PCI-MSI-X      eth1
|138:       5224   PCI-MSI-X      eth1
|233:       2304   IO-APIC-level  ide2
|NMI:         40
|LOC:      18496
|ERR:          0
|MIS:          0

/proc/interrupts, without msi
|           CPU0
|  0:      84858   IO-APIC-edge   timer
|  1:         10   IO-APIC-edge   i8042
|  7:          1   IO-APIC-edge   parport0
|  8:          0   IO-APIC-edge   rtc
|  9:          0   IO-APIC-level  acpi
| 14:        100   IO-APIC-edge   ide0
| 50:        381   IO-APIC-level  ehci_hcd:usb1, HDA Intel
| 58:       4938   IO-APIC-level  ohci_hcd:usb2
| 66:          0   IO-APIC-level  EMU10K1
| 74:          1   IO-APIC-level  gige0
| 82:      24087   IO-APIC-level  eth0
| 90:      28469   IO-APIC-level  eth1
|233:      23167   IO-APIC-level  ide2, radeon@pci:0000:07:00.0
|NMI:         60
|LOC:      84821
|ERR:          0
|MIS:          0

gige0 is a realtek pci card from the old system with a udev renaming rule. eth0 and eth1 are the two onboard interfaces.
Which brings me to another issue I'll probably have to open a new bug for: With both interfaces enabled I'm unable to get a 1000Mbit link, only 100Mbit works and link detection takes ages (a few secs). If I disable the second interface 1000Mbit works (would have to check for the link detection time).

And on eth0 (no cable attached) ethtool says:
| Speed: Unknown! (65535)
| Duplex: Unknown! (255)
As opposed to gige0 (no cable attached):
| Speed: 100Mb/s
| Duplex: Half

Now, I was about to say forcing speed with ethtool doesn't work, but just found that "ethtool -s eth1 speed 1000 duplex full autoneg on" actually did get the speed up to 1000 just now. Weird.
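To compare the negotiated speed and duplex across interfaces without reading the full ethtool dump each time, something like the following could be used. It is a rough sketch: link_field is a made-up name, and it assumes the "Key: value" line layout shown in the ethtool excerpts above.

```sh
#!/bin/sh
# link_field KEY   (hypothetical helper)
# Reads `ethtool IFACE` output on stdin and prints the value of the first
# line whose key (the text before ": ") equals KEY, e.g. Speed or Duplex.
link_field() {
    awk -v key="$1" -F': *' '
        {
            name = $1
            sub(/^[ \t]+/, "", name)     # drop the leading indentation
        }
        name == key { print $2; exit }
    '
}
```

Typical use would be `ethtool eth1 | link_field Speed` before and after the `ethtool -s eth1 speed 1000 duplex full autoneg on` command mentioned above.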
any updates? anything I can do to fix this one?
Sorry, that should be: anything I can do to help fix this one?
I'm happy to report, that with 2.6.18-mm2 suspend to disk works without additional patches, even with MSI interrupts enabled (the mm2 announcement said something about MSI changes, so I figured I'd try both with msi disabled and enabled).
It would be better if the patch avoided a complete close/open cycle, which can fail. As:
1) I do not have a lot of time to poke in the guts of the driver without documentation and figure out the use of the registers (let alone seekreet ones :o) )
2) It would imply a new test cycle and more delay
3) The user experience without the patch sucks
4) The patch is not _that_ rotten
I see no reason to further delay the inclusion of the patch in the kernel. I'll do a proper submission of the patch to Jeff.

Thanks for your help and patience.

-- Ueimor
Fix has been included in mainline under id a189317fa0e9d425cd3a4c248b06f96d876cf7fd : http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a189317fa0e9d425cd3a4c248b06f96d876cf7fd It is available since 2.6.20-rc1. -- Ueimor
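A quick way to tell whether a given mainline kernel is new enough to carry the fix is to compare its release string against 2.6.20. The sketch below uses an invented function name and only looks at the numeric part of the version, so it says nothing reliable about vendor or -mm kernels, which may carry the fix earlier (2.6.18-mm2 was reported working above) or not at all.

```sh
#!/bin/sh
# kernel_has_forcedeth_fix [RELEASE]   (hypothetical helper)
# Succeeds if RELEASE (default: `uname -r`) is 2.6.20 or later, counting
# 2.6.20-rc1 as 2.6.20 since the fix was merged for that -rc.
kernel_has_forcedeth_fix() {
    rel="${1:-$(uname -r)}"
    base="${rel%%-*}"                 # drop "-rc1", "-mm2", distro suffixes
    old_ifs="$IFS"; IFS=.
    set -- $base                      # split "2.6.20" into 2 6 20
    IFS="$old_ifs"
    major="${1%%[!0-9]*}"; minor="${2%%[!0-9]*}"; patch="${3%%[!0-9]*}"
    : "${major:=0}" "${minor:=0}" "${patch:=0}"
    [ "$major" -gt 2 ] && return 0
    [ "$major" -lt 2 ] && return 1
    [ "$minor" -gt 6 ] && return 0
    [ "$minor" -lt 6 ] && return 1
    [ "$patch" -ge 20 ]
}
```

For example, `kernel_has_forcedeth_fix 2.6.16.1` fails, while `kernel_has_forcedeth_fix 2.6.20-rc1` succeeds.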