Bug 7440
Summary: | (net 3c59x) suddenly receives no more packets | ||
---|---|---|---|
Product: | Drivers | Reporter: | Pierre Ynard (linkfanel) |
Component: | Network | Assignee: | Steffen Klassert (klassert) |
Status: | CLOSED OBSOLETE | ||
Severity: | normal | CC: | alan, petr, protasnb, stephen |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.18.1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | 6444 | ||
Bug Blocks: | |||
Attachments: | 3c59x module logs |
Description
Pierre Ynard
2006-10-31 18:17:56 UTC
Created attachment 9390
3c59x module logs
Kernel logs with full debug output from the 3c59x module.
Judging by host timeouts, I estimate that the bug was triggered around 19:15.
I have read through the driver code and done further debugging. My understanding is as follows. At the moment the device "crashes", the logs indicate that it receives 32 packets from the network, which is the size of the rx_ring for incoming packets. I can confirm that those 32 packets really are 32 distinct, valid packets, sent on the network at the same moment (i.e. between two interrupts). There are very likely more than 32 of them, which exceeds the size of the rx_ring. The overflowing packets appear to be eventually "ignored": the interrupt handler reads the 32 packets on the rx_ring and returns, and after that the device never raises another interrupt to signal the reception of new packets. This is only my understanding, but I have reasons to believe that my network does send more than 32 different packets at once before my driver can handle them (at least, that is what happens according to the driver).

I guess I could increase the size of the rx_ring to see whether it "fixes" the problem.

I tried changing RX_RING_SIZE from 32 to 256 packets, and it definitely solved the problem.

I've got the same problem with both the TX and RX rings. Which ring is affected depends on which direction carries more traffic. In my case, one interface receives traffic and the other forwards it: I get the message for the RX ring on the first interface and for the TX ring on the second. I've raised RX_RING_SIZE and TX_RING_SIZE to 256, and it seems to be OK. Do not forget max_interrupt_work! Raise it from 32 to about 1024 or 2048. This is another big problem with this driver :(

What is the status of this problem? Maybe the fix described in #3 and #4 needs to be submitted to LKML?

I tried decreasing RX_RING_SIZE and TX_RING_SIZE to 2 and added some printk calls. I ran some stress tests with iperf, and the TX/RX rings were full many times, but I had no such problems with the driver. I tested this with 2.6.24 and 3c905B/C cards.
It seems that these problems occur when the rings are full. Increasing the ring sizes of course makes it less likely that a ring fills up, but the problem is probably still there. Could somebody check whether the problems are still present in newer kernels, and send the kernel config if they are?

I added a dependency on bug #6444, since the symptoms are very similar. I also posted a patch there to support changing the ring sizes with ethtool. Details about the patch can be found on the page for bug #6444.

I've increased RX_RING_SIZE, TX_RING_SIZE and max_interrupt_work, and I still have to ifdown/ifup the device running this driver every so often. It doesn't reach the state where it won't accept or send any more packets, but it does hit a size limit: I can get most web pages or small files, but anything larger than a few KB fails to download, either timing out after the initial chunk or failing silently.

I understand that you'd close the ticket, since there is little point in keeping it open. But, for the record, I still need to patch the driver to increase the ring buffer size to mitigate this issue.