Distribution: Debian GNU/Linux 4.0

Hardware Environment:
Chipset: Intel GMCH: 915GM, ICH6M: 82801FBM
CPU: Intel Celeron M 1.50GHz
RAM: 1GB DDR2 DIMM 400/533 (Dual Channel)
BIOS: ACPI BIOS
Security: Atmel TPM 1.2
I/O: Winbond W83627EHF, Oxford OX16PCI954
Ethernet: Realtek PCI-E 8101E
Storage: Integrated SATA and IDE controller

Software Environment:
1) sftp
2) netcat
3) dhcp, tftpboot, pxelinux, nfs client/server

Problem Description:
1) The system randomly hangs during data transfer over the network. In some cases I can send or receive 10 or more gigabytes without any problems; in others the system hangs after transferring less than 1 MB.
2) When I boot the system over the network, the network stalls after a certain amount of data has been transferred, and then the message "NETDEV WATCHDOG: eth5: transmit timed out" appears. The network then works properly for some time before stalling again, and so on.
3) With kernel 2.6.18 I had a soft lockup in the rtl8169_rx_interrupt function.

I've also tried different revisions of the r8101 driver from Realtek. The problem is similar: the system either hangs, or an sftp transfer fails with the error message "Corrupted MAC on input".
I tried r8169.c from linux/kernel/git/torvalds/linux-2.6 as well, with no luck.

If needed, I can provide more detailed information.
Created attachment 15085: Kernel config
Created attachment 15086: dmesg output after system boot
Created attachment 15087: ifconfig output
Created attachment 15088: complete lspci output
Created attachment 15089: /proc/interrupts
Created attachment 15090: /proc/iomem
Created attachment 15091: /proc/ioports
The r8169 driver has undergone several changes between 2.6.23.9 and 2.6.24.
Can you give 2.6.24 a try (with/without MMCONFIG)?
Thanks.
--
Ueimor
Yes, sure. I'll give it a try on Monday and post a report.
I've compiled and installed the 2.6.24.3 kernel (with PCI Access set to "Any" and MSI turned on) and tried to boot from the network. The problem is the same:

nfs: server 192.168.100.1 not responding, still trying...
nfs: server 192.168.100.1 not responding, still trying...
...
nfs: server 192.168.100.1 not responding, still trying...
NETDEV WATCHDOG: eth0: transmit timeout
r8169: eth0: link up
nfs: server 192.168.100.1 OK
nfs: server 192.168.100.1 OK
...
nfs: server 192.168.100.1 OK
With the same 2.6.24.3 kernel (without nfs support), while transferring a file using sftp, I got:

int3: 0000 [#1]
Modules linked in:
Pid: 0, comm: swapper Not tainted (2.6.24-diamond #2)
EIP: 0060:[<c0468211>] EFLAGS 00000002 CPU: 0
EIP is at ignore_int+0x1/0x50
EAX: 0001f802 EBX: f7744000 ECX: c0285950 EDX: 0000f802
ESI: f76af7cc EDI: 00000000 EBP: f774407c ESP: c0467f1c
DS: 007b ES: 007b FS: 0000 GS: 0000 SS:0068
Processor swapper (pid: 0, ti=c0466000 task=c04332e0 task.ti=c0466000)
Stack: c01efbda 00000060 00010002 c02857db c02816c3 c0115614 0000000f f76c2ab0
       00000000 00000286 00000100 c04b5a80 c0467f64 f75b9340 00000000 00000000
       0000000e c013ae75 c043fef0 f75b9340 0000000e 0000000e c013be57 00000310
Call Trace:
 [<c01efbda>] ioread8+0x2a/0x30
 [<c02857db>] ata_bmdma_status+0xb/0x10
 [<c02816c3>] ata_interrupt+0x143/0x1c0
 [<c0115614>] activate_task+0x24/0x40
 [<c013ae75>] handle_IRQ_event+0x25/0x60
 [<c013be57>] handle_edge_irq+0x77/0xf0
 [<c0104c35>] do_IRQ+0x45/0x80
 [<c010322f>] common_interrupt+0x23/0x28
 [<c048007b>] asus_hides_smbus_hostbridge+0x20b/0x270
 [<c010162a>] default_idle+0x2a/0x40
 [<c0100edf>] cpu_idle+0x3f/0x60
 [<c0468aca>] start_kernel+0x1fa/0x280
 [<c0468360>] unknown_bootoption+0x0/0x1f0
 =======================
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <cc> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
EIP: [<c0468211>] ignore_int+0x1/0x50 SS:ESP 0068:c0467f1c
Kernel panic - not syncing: Fatal exception in interrupt
Created attachment 15157: Force renegotiation after resume

Please try the attached patch and send the output of mii-tool after resume.
--
Ueimor
Your problem #1 looks like a dupe of bug 6807.
Your problem #2 looks like a dupe of bug 10109.
(In reply to comment #13)
> Your problem #1 looks like a dupe of bug:6807
> Your problem #2 looks like a dupe of bug:10109

It is a dupe of bug 6807, but not of 10109. I don't have a PME event option in the BIOS.
(In reply to comment #12)
> Created an attachment (id=15157)
> Force renegotiation after resume
>
> Please try the attached patch and send the output of mii-tool after resume.
>
> --
> Ueimor

Sorry, I have had no time to apply the patch and check it. But it seems to me that the problem is in the rtl8169_rx_interrupt handling routine.
Maxim, can you give 2.6.27-rc a try? There are a few r8169-related changes in it that could fix your problems.
Thanks in advance.
--
Ueimor
No luck with 2.6.27-rc8-git4. The network becomes unusable after transferring ~300 MB, but at least the system did not hang. There are no error messages in dmesg, even with RTL8169_DEBUG turned on.
As an experiment, we added udelay(10) to all the I/O read and write functions in the 2.6.26.2 kernel, and surprisingly the network became more stable. I think we have had problems only once in about a week of intensive network use. Perhaps the hardware needs some time before or after register read/write operations? Unfortunately, I didn't find any information about this in the datasheet.
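For readers who want to picture the udelay(10) experiment, here is a minimal userspace sketch of the idea. It is not the patch that was actually tested: fake_regs, rtl_w8_delayed, rtl_r8_delayed and IO_DELAY_US are hypothetical names, udelay, readb and writeb are mock stand-ins for the kernel helpers so that the snippet builds outside the kernel, and placing the delay after each access (rather than before) is an assumption.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers (mocks, not the real ones). */
static unsigned long total_delay_us;   /* records how much delay was requested */

static void udelay(unsigned long usecs)
{
    total_delay_us += usecs;           /* the kernel version busy-waits instead */
}

static void writeb(uint8_t val, volatile void *addr)
{
    *(volatile uint8_t *)addr = val;
}

static uint8_t readb(const volatile void *addr)
{
    return *(const volatile uint8_t *)addr;
}

/* Hypothetical register window; a driver would use its mapped I/O region. */
static uint8_t fake_regs[256];

#define IO_DELAY_US 10                 /* the value used in the experiment */

/* 8-bit register write followed by a fixed delay (assumed placement). */
static void rtl_w8_delayed(void *ioaddr, unsigned int reg, uint8_t val)
{
    writeb(val, (uint8_t *)ioaddr + reg);
    udelay(IO_DELAY_US);
}

/* 8-bit register read followed by the same fixed delay. */
static uint8_t rtl_r8_delayed(void *ioaddr, unsigned int reg)
{
    uint8_t val = readb((uint8_t *)ioaddr + reg);
    udelay(IO_DELAY_US);
    return val;
}
```

Calling rtl_w8_delayed(fake_regs, 0x37, 0x0c) and then rtl_r8_delayed(fake_regs, 0x37) exercises one write and one read, each paying the fixed delay. The 16-bit and 32-bit accessors would follow the same pattern.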
We should close this as a dupe. IMHO, #10109 has nothing to do with PME at all, as I can trigger it without PME too. So both problems described here are already reported in other bugs.
Fixed in 2.6.30.
--
Ueimor