Bug 10134
Summary: | r8169 randomly hangs system | ||
---|---|---|---|
Product: | Drivers | Reporter: | Maxim Radugin (kilowatt) |
Component: | Network | Assignee: | Francois Romieu (romieu) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | eike-kernel |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.23.9 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Kernel config, dmesg output after system boot, ifconfig output, complete lspci output, /proc/interrupts, /proc/iomem, /proc/ioports, Force renegotiation after resume | ||
Description
Maxim Radugin
2008-02-29 07:14:51 UTC
Created attachment 15085 [details]
Kernel config
Created attachment 15086 [details]
dmesg output after system boot
Created attachment 15087 [details]
ifconfig output
Created attachment 15088 [details]
complete lspci output
Created attachment 15089 [details]
/proc/interrupts
Created attachment 15090 [details]
/proc/iomem
Created attachment 15091 [details]
/proc/ioports
The r8169 driver has undergone several changes between 2.6.23.9 and 2.6.24.
Can you give 2.6.24 a try (with/without MMCONFIG)?

Thanks.

--
Ueimor

Yes, sure, I'll give it a try on Monday and post a report.

I've compiled and installed the 2.6.24.3 kernel (with PCI Access set to "Any" and MSI turned on) and tried to boot from the network. The problem is the same:

nfs: server 192.168.100.1 not responding, still trying...
nfs: server 192.168.100.1 not responding, still trying...
...
nfs: server 192.168.100.1 not responding, still trying...
NETDEV WATCHDOG: eth0: transmit timeout
r8169: eth0: link up
nfs: server 192.168.100.1 OK
nfs: server 192.168.100.1 OK
...
nfs: server 192.168.100.1 OK

With the same 2.6.24.3 kernel (without NFS support), while transferring a file using sftp, I got:

int3: 0000 [#1]
Modules linked in:

Pid: 0, comm: swapper Not tainted (2.6.24-diamond #2)
EIP: 0060:[<c0468211>] EFLAGS 00000002 CPU: 0
EIP is at ignore_int+0x1/0x50
EAX: 0001f802 EBX: f7744000 ECX: c0285950 EDX: 0000f802
ESI: f76af7cc EDI: 00000000 EBP: f774407c ESP: c0467f1c
DS: 007b ES: 007b FS: 0000 GS: 0000 SS:0068
Processor swapper (pid: 0, ti=c0466000 task=c04332e0 task.ti=c0466000)
Stack: c01efbda 00000060 00010002 c02857db c02816c3 c0115614 0000000f f76c2ab0
       00000000 00000286 00000100 c04b5a80 c0467f64 f75b9340 00000000 00000000
       0000000e c013ae75 c043fef0 f75b9340 0000000e 0000000e c013be57 00000310
Call Trace:
 [<c01efbda>] ioread8+0x2a/0x30
 [<c02857db>] ata_bmdma_status+0xb/0x10
 [<c02816c3>] ata_interrupt+0x143/0x1c0
 [<c0115614>] activate_task+0x24/0x40
 [<c013ae75>] handle_IRQ_event+0x25/0x60
 [<c013be57>] handle_edge_irq+0x77/0xf0
 [<c0104c35>] do_IRQ+0x45/0x80
 [<c010322f>] common_interrupt+0x23/0x28
 [<c048007b>] asus_hides_smbus_hostbridge+0x20b/0x270
 [<c010162a>] default_idle+0x2a/0x40
 [<c0100edf>] cpu_idle+0x3f/0x60
 [<c0468aca>] start_kernel+0x1fa/0x280
 [<c0468360>] unknown_bootoption+0x0/0x1f0
=======================
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <cc> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
EIP: [<c0468211>] ignore_int+0x1/0x50 SS:ESP 0068:c0467f1c
Kernel panic - not syncing: Fatal exception in interrupt

Created attachment 15157 [details]
Force renegotiation after resume
Please try the attached patch and send the output of mii-tool after resume.
--
Ueimor
Your problem #1 looks like a dupe of bug:6807
Your problem #2 looks like a dupe of bug:10109

(In reply to comment #13)
> Your problem #1 looks like a dupe of bug:6807
> Your problem #2 looks like a dupe of bug:10109
>
It is a dupe of bug 6807, but not of 10109. I don't have a PME event option in the BIOS.

(In reply to comment #12)
> Created an attachment (id=15157) [details]
> Force renegotiation after resume
>
> Please try the attached patch and send the output of mii-tool after resume.
>
> --
> Ueimor
>
Sorry, I had no time to apply the patch and check it. But it seems to me that the problem is in the rtl8169_rx_interrupt handling routine.

Maxim, can you give 2.6.27-rc a try? There are a few r8169-related changes in it that could fix your problems.

Thanks in advance.

--
Ueimor

No luck with 2.6.27-rc8-git4. The network becomes unusable after transferring ~300 MB, but at least the system did not hang. There are no error messages in dmesg, even with RTL8169_DEBUG turned on.

As an experiment, we added udelay(10) to all the I/O read and write functions in the 2.6.26.2 kernel, and surprisingly the network became more stable. I think we have had problems only once in about a week of intensive network use. Perhaps some delay is required before or after register read/write operations? Unfortunately, I didn't find any information about this in the datasheet.

We should close this as a dupe. #10109 has IMHO nothing to do with PME at all, as I can trigger it without that, too. So both problems described here are already reported in other bugs.

Fixed in 2.6.30.

--
Ueimor
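
For readers who want to picture the udelay(10) experiment mentioned in the thread, here is a rough userspace sketch of the idea. It is not the patch the reporter used and it is not r8169 code. The register array, the `delayed_read8`/`delayed_write8` helpers, the `io_delay` function, and the register offset are all hypothetical stand-ins. In a real driver the delay would be the kernel's `udelay()` and the accesses would go through the driver's own MMIO accessors.

```c
/* Userspace sketch only. A plain array stands in for the NIC's register
 * window and a clock()-based busy-wait stands in for the kernel's udelay().
 * All names and sizes below are hypothetical, chosen for illustration. */
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define REG_SPACE_SIZE 256  /* hypothetical size of the register window */
#define IO_DELAY_US    10   /* the 10 us delay mentioned in the thread */

uint8_t fake_regs[REG_SPACE_SIZE];  /* stand-in for the mapped registers */
unsigned long io_delay_calls;       /* how many delays have been inserted */

/* Busy-wait for roughly `usecs` microseconds of CPU time. */
void io_delay(unsigned int usecs)
{
    clock_t ticks = (clock_t)(((double)usecs * CLOCKS_PER_SEC) / 1e6);
    clock_t start = clock();

    while (clock() - start < ticks)
        ;  /* spin, as a busy-wait delay would */
    io_delay_calls++;
}

/* Write one byte to a register, then wait before returning. */
void delayed_write8(unsigned int reg, uint8_t val)
{
    assert(reg < REG_SPACE_SIZE);
    fake_regs[reg] = val;
    io_delay(IO_DELAY_US);
}

/* Read one byte from a register, then wait before returning. */
uint8_t delayed_read8(unsigned int reg)
{
    uint8_t val;

    assert(reg < REG_SPACE_SIZE);
    val = fake_regs[reg];
    io_delay(IO_DELAY_US);
    return val;
}
```

Every access pays the fixed delay, so a change like this slows all register traffic. That fits with the thread treating it as an experiment rather than a fix: the delay probably only narrows a timing window, and the issue was later reported as fixed in 2.6.30.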