Bug 12432

Summary: sis900 transmit timeouts
Product: Networking Reporter: gionnico
Component: Other Assignee: Arnaldo Carvalho de Melo (acme)
Status: REJECTED INVALID    
Severity: normal CC: kernel
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.28 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: 2.6.28 vanilla kernel dmesg after trying to assign a ip though dhcpcd
diff .config.works .config.noworks

Description gionnico 2009-01-11 11:00:57 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28
Distribution: gentoo
Hardware Environment: i686, sis900 ethernet
Problem Description: I can't reach the machine anymore since installing vanilla 2.6.28. It worked in 2.6.27 with the same configuration.

Steps to reproduce:
You need a sis900 Ethernet controller. Even with an IP address set, the machine no longer responds on the network.
I tried assigning a dynamic IP with dhcpcd eth0 and I'll post the dmesg.
Comment 1 gionnico 2009-01-11 11:04:07 UTC
Created attachment 19747 [details]
2.6.28 vanilla kernel dmesg after trying to assign a ip though dhcpcd

The system has got a "Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 90)" ethernet controller.

As said, the IP had already been manually set, but I can't reach the machine.

ifconfig (same as ifconfig -a)

eth0      Link encap:Ethernet  HWaddr OK
          inet addr:192.168.OK  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:22 Base address:0xa000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
Comment 2 Daniel Drake 2009-01-11 11:23:02 UTC
Original downstream report:
https://bugs.gentoo.org/show_bug.cgi?id=253646

You can see the transmit timeout errors in the dmesg attachment above.
Comment 3 Anonymous Emailer 2009-01-11 11:40:31 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sun, 11 Jan 2009 11:00:58 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12432
> 
>            Summary: [2.6.28 regression] sis900 fast ethernet breakage
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.28
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: gionnico@email.it
> 
> 
> Latest working kernel version: 2.6.27
> Earliest failing kernel version: 2.6.28
> Distribution: gentoo
> Hardware Environment: i686, sis900 ethernet
> Problem Description: i can't reach the machine anymore, since i've installed
> 2.6.28 vanilla. it worked in 2.6.27 with the same configuration.
> 
> Steps to reproduce:
> You need a sis900 ethernet and even if you set an ip, the machine doesn't
> respond anymore to the net.
> I tried assigning a dynamic ip with dhcpcd eth0 and i'll post the dmesg.

It's a regression and I'm not seeing any likely-looking changes to that
driver in 2.6.28.
Comment 4 Daniel Drake 2009-01-19 05:07:44 UTC
gionnico, given the lack of response here (the bug is not obvious), it would be helpful if you could perform a bisection. This is quite time-consuming (you will have to test about 14 kernels) but it will tell us the exact commit which introduced the bug. If you have time/patience, the process is described here:

http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

Use v2.6.27 as good and v2.6.28 as bad.
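
For a sense of where "about 14 kernels" comes from: each tested kernel halves the range of suspect commits, so the number of builds grows with the base-2 logarithm of the commit count. Below is a minimal Python sketch of that halving, not of git itself. It assumes linear history and a round figure of 10,000 commits between the two tags (an assumption for illustration, not a measured count). The names bisect_first_bad and is_bad are hypothetical.

```python
def bisect_first_bad(commits, is_bad):
    """Binary-search an ordered list of commits for the first bad one.

    Assumes the last commit is known bad and that every commit after
    the first bad one is also bad (the same assumption bisection makes).
    Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one kernel build-and-boot per iteration
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is at mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[hi], tests


# Assumed figure: 10,000 candidate commits, numbered 0..9999.
ASSUMED_COMMITS = list(range(10_000))
# Hypothetical culprit position, chosen arbitrarily for the demo.
HYPOTHETICAL_CULPRIT = 7321

found, tests = bisect_first_bad(ASSUMED_COMMITS,
                                lambda c: c >= HYPOTHETICAL_CULPRIT)
print(found, tests)
```

With 10,000 candidates the loop never needs more than 14 tests, which is in line with the estimate above. Real kernel history has merges, so the actual count can differ by a step or two.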
Comment 5 gionnico 2009-01-19 10:29:16 UTC
Uff, you could have said that I should have used the 2.6.28 config.

I used the 2.6.27 config (accepting the defaults for all new options) and the problem did not appear.

I'll attach a diff of the two config files. If the changed options shouldn't break my device (so there really is a bug), I can run git bisect again with the failing config.
Comment 6 gionnico 2009-01-19 10:30:54 UTC
Created attachment 19889 [details]
diff .config.works .config.noworks
Comment 7 Daniel Drake 2009-01-19 10:44:27 UTC
Please only ever use unified diffs, and please attach both config files in full.
Actually, assuming I am reading it right, you have disabled CONFIG_PCI_QUIRKS in your 2.6.28 config. Why? I don't think you want to do this.
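
To show what a unified diff of two configs looks like (the usual way to get one is diff -u), here is a small Python sketch using the standard difflib module. The config fragments are illustrative only and are not the reporter's actual files. The helper name config_diff is hypothetical.

```python
import difflib


def config_diff(old_lines, new_lines,
                old_name=".config.works", new_name=".config.noworks"):
    """Return a unified diff of two kernel configs as a list of lines."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name,
                                     lineterm=""))


# Illustrative fragments, assumed for the example.
works = [
    "CONFIG_PCI=y",
    "CONFIG_PCI_QUIRKS=y",
    "CONFIG_SIS900=y",
]
noworks = [
    "CONFIG_PCI=y",
    "# CONFIG_PCI_QUIRKS is not set",  # how a disabled option appears
    "CONFIG_SIS900=y",
]

for line in config_diff(works, noworks):
    print(line)
```

In the output, the removed line starts with "-" and the added line with "+", with unchanged neighbours shown as context. That context is what makes a disabled option such as CONFIG_PCI_QUIRKS easy to spot.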
Comment 8 Tomas Mudrunka 2009-03-04 16:14:52 UTC
I've got a similar problem. I am getting a lot of these messages:
harvie@harvie-ntb ~ $ dmesg | grep timeout
eth0: Transmit timeout, status 00000004 00000000 
eth0: Transmit timeout, status 00000004 00000000 
...

Pings to the nearest router also have 100x higher latency over the sis900 in 100Mb Tx Full-Duplex (or any slower mode) than over an 802.11g link to the same network.

with this NIC:
harvie@harvie-ntb ~ $ lspci | grep 900
00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 91)

and with this kernel:
harvie@harvie-ntb ~ $ uname -a
Linux harvie-ntb 2.6.28-ARCH #1 SMP PREEMPT Sun Feb 22 11:03:50 UTC 2009 i686...

using the noapic option

I found more people with this problem:
http://www.linuxquestions.org/questions/linux-networking-3/netdev-watchdog-eth0-transmit-timed-out-46634/

Somebody also wrote a patch (I'm not sure if it works):
http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/4410.html

diff -puN drivers/net/sis900.c~sis900-tx_timeout-fix drivers/net/sis900.c
--- 25/drivers/net/sis900.c~sis900-tx_timeout-fix Mon Oct 20 16:10:04 2003
+++ 25-akpm/drivers/net/sis900.c Mon Oct 20 16:10:11 2003
@@ -1438,7 +1438,7 @@ static void sis900_tx_timeout(struct net
                         pci_unmap_single(sis_priv->pci_dev,
                                 sis_priv->tx_ring[i].bufptr, skb->len,
                                 PCI_DMA_TODEVICE);
-                        dev_kfree_skb(skb);
+                        dev_kfree_skb_irq(skb);
                         sis_priv->tx_skbuff[i] = 0;
                         sis_priv->tx_ring[i].cmdsts = 0;
                         sis_priv->tx_ring[i].bufptr = 0;

Hope this will help you.
Comment 9 gionnico 2009-03-05 04:59:17 UTC
(In reply to comment #7)
> Please only ever use unified diffs, and please attach both config files in
> full.
> Actually, assuming I am reading it right, you have disabled CONFIG_PCI_QUIRKS
> in your 2.6.28 config. Why? I don't think you want to do this.
> 

In my case, I just needed CONFIG_PCI_QUIRKS enabled.
Comment 10 Daniel Drake 2009-03-05 05:23:22 UTC
OK, closing then. Tomas, feel free to file another bug report for your issue.