Bug 5507
Summary: | longhaul driver breaks 8139too | ||
---|---|---|---|
Product: | Power Management | Reporter: | Ulrich Holeschak (ulrich) |
Component: | cpufreq | Assignee: | cpufreq (cpufreq) |
Status: | CLOSED DOCUMENTED | ||
Severity: | normal | CC: | bunk, lenb, protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13.4 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Ulrich Holeschak
2005-10-27 13:27:52 UTC
For more information I have now enabled the complete logging with ethtool for the 8139too device. The result is as follows:

Nov 1 17:31:20 router kernel: eth2: Queued Tx packet size 160 to slot 1.
Nov 1 17:31:51 router kernel: eth2: Queued Tx packet size 160 to slot 2.
Nov 1 17:32:22 router kernel: eth2: Queued Tx packet size 160 to slot 3.
Nov 1 17:32:29 router kernel: NETDEV WATCHDOG: eth2: transmit timed out
Nov 1 17:32:29 router kernel: eth2: Transmit timeout, status 01 0000 0000 media 10.
Nov 1 17:32:29 router kernel: eth2: Tx queue start entry 8 dirty entry 4.
Nov 1 17:32:29 router kernel: eth2: Tx descriptor 0 is 00002000. (queue head)
Nov 1 17:32:29 router kernel: eth2: Tx descriptor 1 is 00002000.
Nov 1 17:32:29 router kernel: eth2: Tx descriptor 2 is 00002000.
Nov 1 17:32:29 router kernel: eth2: Tx descriptor 3 is 00002000.
Nov 1 17:32:29 router kernel: eth2: link up, 100Mbps, full-duplex, lpa 0x41E1
Nov 1 17:32:29 router kernel: eth2: Promiscuous mode enabled.
Nov 1 17:32:29 router kernel: eth2: Queued Tx packet size 255 to slot 0.
Nov 1 17:32:29 router kernel: eth2: Queued Tx packet size 249 to slot 1.
Nov 1 17:32:53 router kernel: eth2: Queued Tx packet size 160 to slot 2.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 402001, size 0040, cur 0000.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 42 to slot 3.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 604001, size 0060, cur 0044.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 104 to slot 0.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 404001, size 0040, cur 00a8.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 58 to slot 1.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 404001, size 0040, cur 00ec.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 824001, size 0082, cur 0130.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 2.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 58 to slot 3.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status e84001, size 00e8, cur 01b8.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 0.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 155 to slot 1.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 1454001, size 0145, cur 02a4.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 2.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 196 to slot 3.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status a24001, size 00a2, cur 03f0.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 0.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 93 to slot 1.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 404001, size 0040, cur 0498.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 944001, size 0094, cur 04dc.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 2.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 1514 to slot 3.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 1410 to slot 0.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 404001, size 0040, cur 0574.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 844001, size 0084, cur 05b8.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 1.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 138 to slot 2.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 634001, size 0063, cur 0640.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 93 to slot 3.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 964001, size 0096, cur 06a8.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 161 to slot 0.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 8f4001, size 008f, cur 0744.
Nov 1 17:32:54 router kernel: eth2: rtl8139_rx() status 8f4001, size 008f, cur 07d8.
Nov 1 17:32:54 router kernel: eth2: Queued Tx packet size 54 to slot 1.
Nov 1 17:33:24 router kernel: eth2: Queued Tx packet size 160 to slot 2.
Nov 1 17:33:55 router kernel: eth2: Queued Tx packet size 160 to slot 3.
Nov 1 17:34:26 router kernel: eth2: Queued Tx packet size 160 to slot 0.
Nov 1 17:34:57 router kernel: eth2: Queued Tx packet size 160 to slot 1.
Nov 1 17:35:05 router kernel: NETDEV WATCHDOG: eth2: transmit timed out
Nov 1 17:35:05 router kernel: eth2: Transmit timeout, status 01 0000 0000 media 10.
Nov 1 17:35:05 router kernel: eth2: Tx queue start entry 26 dirty entry 22.
Nov 1 17:35:05 router kernel: eth2: Tx descriptor 0 is 00002000.
Nov 1 17:35:05 router kernel: eth2: Tx descriptor 1 is 00002000.
Nov 1 17:35:05 router kernel: eth2: Tx descriptor 2 is 00002000. (queue head)
Nov 1 17:35:05 router kernel: eth2: Tx descriptor 3 is 00002000.
Nov 1 17:35:05 router kernel: eth2: link up, 100Mbps, full-duplex, lpa 0x41E1
Nov 1 17:35:05 router kernel: eth2: Promiscuous mode enabled.
Nov 1 17:35:28 router kernel: eth2: Queued Tx packet size 160 to slot 0.
Nov 1 17:35:34 router kernel: eth2: Queued Tx packet size 255 to slot 1.
Nov 1 17:35:34 router kernel: eth2: Queued Tx packet size 249 to slot 2.

From my point of view I would say that the device is not getting the transmit interrupt sometimes, but I may be completely wrong at this point ...

Now I have dug a bit deeper; this is what I have found so far:
- Basically the device is able to transmit and receive, but not always.
- Very often the device gets into a state where it is neither able to send nor receive. In this state the driver reports that the transmit telegram is queued, but you will never see it on the network. If something is received from the remote side, the interrupt handler is never called.
- If I ping from the local host, it is sometimes possible to get multiple transmit/receive telegrams through, but then the device blocks again. After the timeout and automatic reset, transmit/receive is possible again for a short time.
- If I ping from the remote side, normally every telegram gets lost.

I expect that this is not an interrupt problem, because for transmitting the interrupt is only needed at the end of transmission, but the telegram is never sent over the network. It must be something else, but what?

Now I had a bit more time and made more tests. In this case I used kernel 2.6.14, which still has the same problems. I used the old (2.6.11) source code for the mii and 8139too kernel modules (with some small modifications) in the new kernel 2.6.14. The problem is still the same. This means the problem is NOT in the 8139too driver; something else (maybe conceptual) has changed in the kernel.

Slowly we are getting closer ... I have switched eth0 and eth2 again; there are interrupts coming from the card and the interrupt handler is also called. So the interrupt seems not to be the problem. Then I rebooted with "apm=off", and everything works fine! The problem has something to do with APM, but what? I tried removing the APM code from the 8139too driver, but this did not help. At the moment I have processor frequency scaling active; this may be the reason for the problems. The interesting thing is that things work better if the CPU load on the system is high. Perhaps the CPU frequency scaling has no influence on the 8139too driver if the CPU load is high ...

Now I tried disabling the CPU frequency scaling (with APM still active) and everything works fine! So something has changed in the CPU frequency scaling modules that influences the 8139too driver. My cpufreq module is "longhaul".

Now I have found the reason why the 8139too driver fails. It's caused by overriding the PCI bus master mode in the longhaul driver!
I have removed the following blocks of code in the function do_powersave() of longhaul.c:

    /*
     * get current pci bus master state for all devices
     * and clear bus master bit
     */
    dev = NULL;
    i = 0;
    do {
        dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev);
        if (dev != NULL) {
            pci_read_config_word(dev, PCI_COMMAND, &pci_cmd);
            cmd_state[i++] = pci_cmd;
            pci_cmd &= ~PCI_COMMAND_MASTER;
            pci_write_config_word(dev, PCI_COMMAND, pci_cmd);
        }
    } while (dev != NULL);

    ...

    /* restore pci bus master state for all devices */
    dev = NULL;
    i = 0;
    do {
        dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev);
        if (dev != NULL) {
            pci_cmd = cmd_state[i++];
            pci_write_config_byte(dev, PCI_COMMAND, pci_cmd);
        }
    } while (dev != NULL);

and everything works fine! (At least with the 8139too driver.) I expect that some other driver needs this code, so what to do now? Is there perhaps a different way to achieve the same thing in longhaul.c?

As a note, the longhaul driver got marked as BROKEN in 2.6.16.2.

> From my point of view I would say that the device is not getting the transmit
> interrupt sometimes, but I may be completely wrong at this point ...
You are right. The processor is "offline" while the frequency is changed. All
interrupts except 0 are masked. PCI bus mastering must also be turned off. The
code you deleted is responsible for halting the PCI bus.
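
The save/clear/restore idea behind the deleted code can be tried outside the kernel. The following is only a rough userspace sketch under stated assumptions: a plain array stands in for the PCI config space, and the names `command_reg`, `halt_bus_mastering`, `resume_bus_mastering` and `count_masters` are made up for illustration and do not exist in longhaul.c. Only the value of `PCI_COMMAND_MASTER` (0x4) mirrors the kernel constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_MASTER 0x4 /* bus-master enable bit of the PCI command register */
#define NUM_DEVS 3             /* hypothetical number of devices on the mock bus */

/* Mock command registers standing in for real PCI config space (assumption). */
uint16_t command_reg[NUM_DEVS] = { 0x0007, 0x0003, 0x0107 };

/* Save every command register, then clear its bus-master bit. */
void halt_bus_mastering(uint16_t saved[NUM_DEVS])
{
    assert(saved != NULL);
    for (int i = 0; i < NUM_DEVS; i++) {
        saved[i] = command_reg[i];
        command_reg[i] &= (uint16_t)~PCI_COMMAND_MASTER;
    }
}

/* Write the saved values back, re-enabling bus mastering where it was on. */
void resume_bus_mastering(const uint16_t saved[NUM_DEVS])
{
    assert(saved != NULL);
    for (int i = 0; i < NUM_DEVS; i++)
        command_reg[i] = saved[i];
}

/* Count the mock devices that currently have bus mastering enabled. */
int count_masters(void)
{
    int n = 0;
    for (int i = 0; i < NUM_DEVS; i++)
        if (command_reg[i] & PCI_COMMAND_MASTER)
            n++;
    return n;
}
```

Between the two calls no mock device is a bus master, which is the window in which a card such as the 8139too cannot move packet data by DMA. In this sketch the restore step writes back the full 16-bit saved value.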
For "Processor type and features->Timer frequency" = 100Hz the "offline" time is
at most 1/50s, for 250Hz it is 1/125s and for 1000Hz 1/500s.
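
The three figures above all work out to two timer ticks (2/HZ). The small helper below reproduces them; it is only a sketch, the two-tick factor is an assumption read off the quoted numbers rather than taken from the driver source, and `max_offline_us` and `OFFLINE_TICKS` are hypothetical names.

```c
#include <assert.h>

/* Assumption: the quoted maxima (1/50s, 1/125s, 1/500s) equal two timer ticks. */
#define OFFLINE_TICKS 2UL

/* Worst-case "offline" window in microseconds for a given timer frequency in Hz. */
unsigned long max_offline_us(unsigned int hz)
{
    assert(hz > 0);
    return OFFLINE_TICKS * 1000000UL / hz;
}
```

With this reading, 100Hz gives 20000 microseconds, 250Hz gives 8000 and 1000Hz gives 2000, so a higher timer frequency shortens the time the bus is halted.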
Btw. Try to use ACPI, not APM.

Any update on this problem? Ulrich, have you tried the suggestion in #9? Thanks.

Yes, I have tried suggestion #9 with no change. With newer kernel versions I had no problem with the 8139too driver, but I got instabilities during hard disk access. I have measured the current consumption of my computer with and without the longhaul driver; it is nearly identical. So I did not see any reason for using the driver any more ...