I have a problem where I can lock up a number of machines by changing the link state on a sis190 Ethernet port. For example, during a data transfer such as FTP if I unplug the Ethernet cable and plug it back in, the Ethernet interface will stop responding and the machine will lock up after a minute or so. This behaviour is repeatable. I have the sis190 driver loaded as a module. I haven't found a kernel version where this doesn't happen. It happens with kernel 2.6.20.15, for example.
Reply-To: akpm@linux-foundation.org On Thu, 15 Nov 2007 07:30:53 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9386 > > Summary: sis190 network driver crash > Product: Drivers > Version: 2.5 > KernelVersion: 2.6.23.1 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: Network > AssignedTo: jgarzik@pobox.com > ReportedBy: chris@linuxepos.com > CC: romieu@fr.zoreil.com > > > I have a problem where I can lock up a number of machines by > changing the link state on a sis190 Ethernet port. For example, during > a data transfer such as FTP if I unplug the Ethernet cable and plug it > back in, the Ethernet interface will stop responding and the machine > will lock up after a minute or so. This behaviour is repeatable. I have the > sis190 driver loaded as a module. > > I haven't found a kernel version where this doesn't happen. It happens with > kernel 2.6.20.15, for example. >
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 15 Nov 2007 11:58:41 -0800

> On Thu, 15 Nov 2007 07:30:53 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=9386
...
> > I have a problem where I can lock up a number of machines by
> > changing the link state on a sis190 Ethernet port. For example, during
> > a data transfer such as FTP if I unplug the Ethernet cable and plug it
> > back in, the Ethernet interface will stop responding and the machine
> > will lock up after a minute or so. This behaviour is repeatable. I have the
> > sis190 driver loaded as a module.
> >
> > I haven't found a kernel version where this doesn't happen. It happens with
> > kernel 2.6.20.15, for example.

I wonder if somehow sis190_phy_task() is creating some kind of deadlock when handling the link down and up events.

It takes the RTNL semaphore in sis190_phy_task() but it doesn't call anything which I can see deadlocking on that. It does invoke the link-watch layer, indirectly via the various netif_carrier_{on,off}() calls it makes, but those should be OK since they just schedule workqueue things.

Perhaps what is contributing to the problem is that sis190_interrupt() still processes the RX and TX queues even when a link change event is signalled. Perhaps the chip doesn't like that.

Francois, I noticed two issues while reviewing the driver for this bug:

1) The interrupt handler does no SMP locking; the chip might not be happy with one thread (in phy_task) programming the MDIO whilst another thread does RX/TX ring processing, for example.

2) The timeout limit check in __mdio_cmd() is buggy; it should be 99 instead of 999.
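Issue 2 above is easier to see in isolation. What follows is a minimal userspace sketch, not the driver's code: `poll_until_done`, `fake_busy`, `struct fake_dev` and `MAX_POLLS` are hypothetical names, and the 100-iteration loop bound is an assumption inferred from the "99 instead of 999" remark. It illustrates that when a polling loop stops after 100 attempts, the counter can never exceed 100, so a failure test against 999 can never fire.

```c
#include <stdbool.h>

#define MAX_POLLS 100 /* assumed loop bound; not stated in the thread */

/* Hypothetical stand-in for "is the MDIO command still in progress?". */
typedef bool (*busy_fn)(void *ctx);

/* Poll until the operation completes or MAX_POLLS attempts were made.
 * Returns the number of polls that saw the device still busy. */
static unsigned int poll_until_done(busy_fn busy, void *ctx)
{
	unsigned int i;

	for (i = 0; i < MAX_POLLS; i++) {
		if (!busy(ctx))
			break;
	}
	return i;
}

/* Buggy check: i never exceeds MAX_POLLS (100), so this is always false. */
static bool failed_buggy(unsigned int i)
{
	return i > 999;
}

/* Fixed check: i == MAX_POLLS means every poll saw the busy condition. */
static bool failed_fixed(unsigned int i)
{
	return i > MAX_POLLS - 1;
}

/* Simulated device that reports busy for a given number of reads. */
struct fake_dev {
	unsigned int busy_reads_left;
};

static bool fake_busy(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->busy_reads_left == 0)
		return false;
	dev->busy_reads_left--;
	return true;
}
```

With a simulated device that never finishes, only the fixed check reports the failure; the buggy one silently treats the timed-out command as a success.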
David Miller <davem@davemloft.net> :
[...]
> I wonder if somehow sis190_phy_task() is creating some kind
> of deadlock when handling the link down and up events.

I should be able to test it during the weekend if nobody beats me to it. The sis190 stands headless in the kitchen. Given the current situation here, I do not have the oomph to turn the kitchen into a debug lab when I am back from work.
Created attachment 13591 [details] remove duplicate INIT_WORK
Created attachment 13592 [details] mdio operation failure is not correctly checked
Created attachment 13593 [details] scheduling while atomic *ouch*
Created attachment 13594 [details] add a debug message
Chris, can you try the patches above against 2.6.24-rc4 ? It does not negotiate the link correctly when the cable is removed during a transfer but it does not crash any more here. I'll try to fix it tomorrow. -- Ueimor
(In reply to comment #8) > Chris, can you try the patches above against 2.6.24-rc4 ? > > It does not negotiate the link correctly when the cable is removed during > a transfer but it does not crash any more here. I'll try to fix it tomorrow. > > -- > Ueimor > Hi, I have tried kernel 2.6.24-rc3 WITHOUT the patches and can confirm that the machine does still crash. However, when running 2.6.24-rc3 WITH the above 4 patches I can NOT get the machine to crash. The interface does still stop responding as noted above. Fantastic work guys.
I have some extra hacks which seem able to recover as well. Give me a day or two to polish them. I cannot guarantee that it will always recover fast though. -- Ueimor
Created attachment 13657 [details] remove duplicate INIT_WORK (#2)
Created attachment 13658 [details] mdio operation failure is not correctly checked (#2)
Created attachment 13659 [details] scheduling while atomic fix (#2)
Created attachment 13660 [details] remove needless MII reset (#2)
Created attachment 13661 [details] link management simplification (#2)
Chris, can you give the #2 list a try ? It replaces the previous series. The driver should not recover the link (yes, I'm late) but it should not crash either. An ifconfig down/up cycle should return the device to life. -- Ueimor
(In reply to comment #16) > Chris, can you give the #2 list a try ? It replaces the previous serie. > > The driver should not recover the link (yes, I'm late) but it should not > crash either. An ifconfig down/up cycle should return the device to life. > > -- > Ueimor > Hi Francois, As you asked I have patched kernel 2.6.24-rc3 with your new #2 patch set ( 5 patches in all). I can NOT get the machine to crash and as you say the link is NOT recovered. I can confirm that an ifconfig down/up does bring the link back to life. Hope this helps. Regards, Chris.
Chris: [...] > Hope this helps. Yes. Thanks for your testing. Can you give the 5 incoming patches a try on top of the preceding ones ? Their diff is close enough to the code which allows me to recover but I have not had time to test it today. YMMV. -- Ueimor
Created attachment 13684 [details] account for Tx errors (#2)
Created attachment 13685 [details] move the Tx timeout recovery task into user context (#2)
Created attachment 13686 [details] force Tx recovery (#2)
Created attachment 13687 [details] shorten timeouts (#2)
Created attachment 13688 [details] remove superfluous sis190_soft_reset (#2)
(In reply to comment #18) > Chris: > [...] > > Hope this helps. > > Yes. Thanks for your testing. > > Can you give the 5 incoming patches a try on top of the preceding ones ? > > Their diff is close enough from the code which allows me to recover but I > have > not had time to test it today. YMMV. > > -- > Ueimor > Hi Francois, OK, I patched the kernel with the 5 new patches on top of your #2 patch set. I can NOT crash the machine. However, during a data transfer, after a link state change the interface sometimes recovers and other times not. If I quickly disconnect and reconnect the link 2 or 3 times with, say, a half a second between disconnect and reconnect the interface doesn't recover even after a ifdown/up. Regards, Chris.
Created attachment 13700 [details] debug helper

Chris:
[...]
> I can NOT crash the machine. However, during a data transfer, after a link
> state change the interface sometimes recovers and other times not. If I
> quickly disconnect and reconnect the link 2 or 3 times with, say, a half a
> second between disconnect and reconnect the interface doesn't recover even
> after a ifdown/up.

I can not reproduce it here. The netdev watchdog can be slow but it always triggers if necessary :o/

Can you:
- apply the attached debug patch
- sysctl -w kernel.printk="8 8 8 8"
- ethtool -s ethX msglvl 65535 before ifconfig up
- start the ftp xfer
- gzip and send the log up to the point where the driver does not recover

--
Ueimor
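For readers wondering what "msglvl 65535" asks for: the value is a bitmask (0xffff), and setting every bit requests every class of driver message. The sketch below is illustrative only. The `MSG_*` names are hypothetical and mirror the NETIF_MSG_* bits of `<linux/netdevice.h>` as I remember them; treat the exact values as an assumption and check the header of the kernel in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Message-class bits, mirroring the NETIF_MSG_* values in
 * <linux/netdevice.h>. Reproduced from memory for illustration only. */
enum msg_class {
	MSG_DRV       = 0x0001,
	MSG_PROBE     = 0x0002,
	MSG_LINK      = 0x0004,
	MSG_TIMER     = 0x0008,
	MSG_IFDOWN    = 0x0010,
	MSG_IFUP      = 0x0020,
	MSG_RX_ERR    = 0x0040,
	MSG_TX_ERR    = 0x0080,
	MSG_TX_QUEUED = 0x0100,
	MSG_INTR      = 0x0200,
	MSG_TX_DONE   = 0x0400,
	MSG_RX_STATUS = 0x0800,
	MSG_PKTDATA   = 0x1000,
	MSG_HW        = 0x2000,
	MSG_WOL       = 0x4000,
};

/* True when the given message class is enabled in the mask. */
static bool msg_enabled(uint32_t msglvl, enum msg_class bit)
{
	return (msglvl & (uint32_t)bit) != 0;
}

/* Count how many of the classes listed above are enabled in the mask. */
static unsigned int count_known_classes(uint32_t msglvl)
{
	unsigned int n = 0;
	uint32_t bit;

	for (bit = MSG_DRV; bit <= MSG_WOL; bit <<= 1) {
		if (msglvl & bit)
			n++;
	}
	return n;
}
```

Under these assumed values, 65535 switches on all fifteen listed classes, including the link and Tx-error messages that matter for this bug, whereas a small mask such as 0x0007 would leave Tx errors silent.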
(In reply to comment #25)
> Created an attachment (id=13700) [details]
> debug helper
>
> Chris:
> [...]
> > I can NOT crash the machine. However, during a data transfer, after a link
> > state change the interface sometimes recovers and other times not. If I
> > quickly disconnect and reconnect the link 2 or 3 times with, say, a half a
> > second between disconnect and reconnect the interface doesn't recover even
> > after a ifdown/up.
>
> I can not reproduce it here. The netdev watchdog can be slow but it always
> triggers if necessary :o/
>
> Can you:
> - apply the attached debug patch
> - sysctl -w kernel.printk="8 8 8 8"
> - ethtool -s ethX msglvl 65535 before ifconfig up
> - start the ftp xfer
> - gzip and send the log up to the point where the driver does not recover
>
> --
> Ueimor

Hi Francois,

I have patched the kernel with your debug patch. I have tried now for one and a half hours to break the driver and I can't. The driver recovers no matter what I do. I've tried multiple transfers back and forth - up to 5 at the same time, pulling the plug very quickly at times. It seems like you have done a great job.

All I can presume is that I hadn't installed the module properly on the last compile - but I'm pretty sure that I had, however, I may well be mistaken.

I'll roll these patches out to the 10 machines out in the field and see what happens. One machine has a very broken router on a sis190 interface that keeps changing the link state very quickly about once a day - this is how I noticed the problem in the first place.

I'll report back very soon and keep trying to break the driver on my box here.

Regards, Chris.
Created attachment 13749 [details] remove duplicate INIT_WORK (#3)
Created attachment 13750 [details] mdio operation failure is not correctly checked (#3)
Created attachment 13751 [details] scheduling while atomic fix (#3)
Created attachment 13752 [details] remove needless MII reset (#3)
Created attachment 13753 [details] link management simplification (#3)
Created attachment 13754 [details] account for Tx errors (#3)
Created attachment 13755 [details] move the Tx timeout recovery task into user context (#3)
Created attachment 13756 [details] force Tx recovery (#3)
Created attachment 13757 [details] shorten timeouts (#3)
Created attachment 13758 [details] remove superfluous sis190_soft_reset (#3)
Chris, can you try the series #3 above ? The #2 series could deadlock under specific conditions when the device was closed. It would be nice if you could plug/unplug the cable during a transfer and ifconfig down the device while plugging/unplugging the cable. -- Ueimor
(In reply to comment #37)
> Chris, can you try the series #3 above ?
>
> The #2 series could deadlock under specific conditions when the device was
> closed. It would be nice if you could plug/unplug the cable during a transfer
> and ifconfig down the device while plugging/unplugging the cable.
>
> --
> Ueimor

Hi Francois,

OK, I patched the kernel with the #3 set, after removing all other patches. I can NOT get the machine to crash, however, I can lock up the interface - even an ifdown/up does not bring it back to life.

I have the following in the logs:

Nov 27 11:48:58 devel kernel: eth0: mii ext = 0000.
Nov 27 11:48:58 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 11:48:58 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
...
... after pulling the cable out a few times ....
...
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 377673c8.
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 37767028.
Nov 27 11:51:23 devel last message repeated 2 times
Nov 27 11:51:34 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:34 devel kernel: eth0: Tx timeout, status 00001a11 37767024.
Nov 27 11:51:40 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:40 devel kernel: eth0: Tx timeout, status 00001a11 37767014.
...
... At this point the interface doesn't come back to life even with ifup/down
...
Nov 27 12:01:09 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 12:01:13 devel kernel: eth0: mii ext = 0000.
Nov 27 12:01:13 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:01:13 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 12:02:18 devel dhcpd: receive_packet failed on eth0: Network is down
...
... I have to rmmod sis190/ modprobe sis190/ ifup eth0 to get it going again..
...
Nov 27 12:02:21 devel kernel: ACPI: PCI interrupt for device 0000:00:04.0 disabled
Nov 27 12:02:27 devel kernel: sis190 Gigabit Ethernet driver 1.2 loaded.
Nov 27 12:02:27 devel kernel: ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
Nov 27 12:02:27 devel kernel: 0000:00:04.0: Read MAC address from APC.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Realtek PHY RTL8201 transceiver at address 1.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Using transceiver at address 1 as default.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: SiS 190 PCI Fast Ethernet adapter at f8a0ec00 (IRQ: 10), 00:17:31:90:ba:e5
Nov 27 12:02:28 devel kernel: eth0: GMII mode.
Nov 27 12:02:28 devel kernel: eth0: Enabling Auto-negotiation.
Nov 27 12:02:33 devel kernel: eth0: mii ext = 0000.
Nov 27 12:02:33 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:02:33 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
....
....

I suspect that I should have had some more debugging patched in. Do you want me to try again with the debug patches that you sent?

Regards, Chris.
Chris: [...] > I suspect that I should have had some more debugging patched in. Do you want > me > to try again with the debug patches that you sent? Not immediately. Can you simply send me the complete dmesg above ? Your annotations are useful but I am curious to know what hides behind the three little dots. -- Ueimor
Created attachment 13770 [details] Hunt for the reset_task against rmmod/insmod difference

Chris, can you add the patch above on top of the #3 series ?

Just FYI, my sis190 looks like this when it recovers:

# ifconfig eth2
eth2      Link encap:Ethernet HWaddr 00:11:D8:17:FF:62
          inet addr:10.0.1.2 Bcast:10.255.255.255 Mask:255.0.0.0
          inet6 addr: fe80::211:d8ff:fe17:ff62/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:307246 errors:2 dropped:0 overruns:0 frame:2
                                   ^                            ^
          TX packets:155076 errors:0 dropped:64 overruns:0 carrier:0
                                             ^^
          collisions:0 txqueuelen:1000
          RX bytes:440946478 (420.5 MiB) TX bytes:10483464 (9.9 MiB)
          Interrupt:22 Base address:0xdead

--
Ueimor
Hi, I am trying to apply all the patches from the #3 series to help with testing, but it fails on hunk 3 of link management simplification (#3). This is because of the debug helper. Should I skip the debug helper? It is still a bit tedious to test this stuff :-)
Juan, the patches should be applied in this order:
- remove duplicate INIT_WORK (#3)
- mdio operation failure is not correctly checked (#3)
- scheduling while atomic fix (#3)
- remove needless MII reset (#3)
- link management simplification (#3)
- account for Tx errors (#3)
- move the Tx timeout recovery task into user context (#3)
- force Tx recovery (#3)
- shorten timeouts (#3)
- remove superfluous sis190_soft_reset (#3)
- Hunt for the reset_task against rmmod/insmod difference

--
Ueimor
(In reply to comment #39)
> Chris:
> [...]
> > I suspect that I should have had some more debugging patched in. Do you
> > want me to try again with the debug patches that you sent?
>
> Not immediately. Can you simply send me the complete dmesg above ?
> Your annotations are useful but I am curious to know what hides behind
> the three little dots.
>

Hi Francois,

There's not really anything of interest behind the dots, but here goes:

Nov 27 11:48:58 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 11:48:58 devel kernel: eth0: mii ext = 0000.
Nov 27 11:48:58 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 11:48:58 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 11:49:10 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 11:49:10 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 11:49:10 devel PAM-securetty[3469]: Couldn't open /etc/securetty
Nov 27 11:49:11 devel pam_winbind[3469]: write to socket failed!
Nov 27 11:49:11 devel pam_winbind[3469]: internal module error (retval = 3, user = `root')
Nov 27 11:49:11 devel pam_winbind[3469]: write to socket failed!
Nov 27 11:49:11 devel pam_winbind[3469]: internal module error (retval = 3, user = `root')
Nov 27 11:49:11 devel PAM_pwdb[3469]: (login) session opened for user root by LOGIN(uid=0)
Nov 27 11:49:20 devel ftpd[4893]: wu-ftpd - TLS settings: control allow, client_cert allow, data allow
Nov 27 11:49:23 devel pam_winbind[4893]: write to socket failed!
Nov 27 11:49:23 devel pam_winbind[4893]: internal module error (retval = 3, user = `root')
Nov 27 11:49:23 devel pam_winbind[4893]: write to socket failed!
Nov 27 11:49:23 devel pam_winbind[4893]: internal module error (retval = 3, user = `root')
Nov 27 11:49:23 devel PAM_unix[4893]: (ftp) session opened for user root by (uid=0)
Nov 27 11:49:23 devel ftpd: devel.linuxepos1.demon.co.uk: root[4893]: FTP LOGIN FROM devel.linuxepos1.demon.co.uk [192.168.0.1], root
Nov 27 11:49:31 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 11:49:31 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 11:49:31 devel PAM-securetty[3470]: Couldn't open /etc/securetty
Nov 27 11:49:32 devel pam_winbind[3470]: write to socket failed!
Nov 27 11:49:32 devel pam_winbind[3470]: internal module error (retval = 3, user = `root')
Nov 27 11:49:32 devel pam_winbind[3470]: write to socket failed!
Nov 27 11:49:32 devel pam_winbind[3470]: internal module error (retval = 3, user = `root')
Nov 27 11:49:32 devel PAM_pwdb[3470]: (login) session opened for user root by LOGIN(uid=0)
Nov 27 11:50:01 devel MailScanner: succeeded
Nov 27 11:50:01 devel last message repeated 2 times
Nov 27 11:50:12 devel PAM_unix[4893]: (ftp) session closed for user root
Nov 27 11:50:12 devel ftpd: devel.linuxepos1.demon.co.uk: root: QUIT[4893]: FTP session closed
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 377673c8.
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 37767028.
Nov 27 11:51:23 devel last message repeated 2 times
Nov 27 11:51:34 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:34 devel kernel: eth0: Tx timeout, status 00001a11 37767024.
Nov 27 11:51:40 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:40 devel kernel: eth0: Tx timeout, status 00001a11 37767014.
Nov 27 11:58:55 devel nmbd[3293]: [2007/11/27 11:58:55, 0] nmbd/nmbd_browsesync.c:find_domain_master_name_query_fail(351)
Nov 27 11:58:55 devel nmbd[3293]: find_domain_master_name_query_fail:
Nov 27 11:58:55 devel nmbd[3293]: Unable to find the Domain Master Browser name LINUXEPOS<1b> for the workgroup LINUXEPOS.
Nov 27 11:58:55 devel nmbd[3293]: Unable to sync browse lists in this workgroup.
Nov 27 12:00:02 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:00:02 devel last message repeated 2 times
Nov 27 12:00:02 devel MailScanner: succeeded
Nov 27 12:00:02 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:00:02 devel MailScanner: succeeded
Nov 27 12:00:02 devel MailScanner: succeeded
Nov 27 12:00:34 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:00:35 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:01:09 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 12:01:13 devel kernel: eth0: mii ext = 0000.
Nov 27 12:01:13 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:01:13 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 12:02:18 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 12:02:21 devel kernel: ACPI: PCI interrupt for device 0000:00:04.0 disabled
Nov 27 12:02:21 devel /sbin/hotplug: no runnable /etc/hotplug/drivers.agent is installed
Nov 27 12:02:21 devel /sbin/hotplug: no runnable /etc/hotplug/module.agent is installed
Nov 27 12:02:27 devel kernel: sis190 Gigabit Ethernet driver 1.2 loaded.
Nov 27 12:02:27 devel kernel: ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
Nov 27 12:02:27 devel kernel: 0000:00:04.0: Read MAC address from APC.
Nov 27 12:02:27 devel /sbin/hotplug: no runnable /etc/hotplug/module.agent is installed
Nov 27 12:02:27 devel /sbin/hotplug: no runnable /etc/hotplug/drivers.agent is installed
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Realtek PHY RTL8201 transceiver at address 1.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Using transceiver at address 1 as default.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: SiS 190 PCI Fast Ethernet adapter at f8a0ec00 (IRQ: 10), 00:17:31:90:ba:e5
Nov 27 12:02:28 devel kernel: eth0: GMII mode.
Nov 27 12:02:28 devel kernel: eth0: Enabling Auto-negotiation.
Nov 27 12:02:33 devel kernel: eth0: mii ext = 0000.
Nov 27 12:02:33 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:02:33 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 12:05:20 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 12:05:20 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 12:05:22 devel PAM-securetty[8382]: Couldn't open /etc/securetty
Nov 27 12:05:23 devel pam_winbind[8382]: write to socket failed!
Nov 27 12:05:23 devel pam_winbind[8382]: internal module error (retval = 3, user = `root')
Nov 27 12:05:23 devel pam_winbind[8382]: write to socket failed!
Nov 27 12:05:23 devel pam_winbind[8382]: internal module error (retval = 3, user = `root')
Nov 27 12:05:23 devel PAM_pwdb[8382]: (login) session opened for user root by (uid=0)
Nov 27 12:05:55 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 12:05:55 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 12:05:57 devel PAM-securetty[8412]: Couldn't open /etc/securetty
(In reply to comment #40)
> Created an attachment (id=13770) [details]
> Hunt for the reset_task against rmmod/insmod difference
>
> Chris, can you add the patch above on top of the #3 series ?
>
> Just FYI, my sis190 looks like this when it recovers:
>
> # ifconfig eth2
> eth2      Link encap:Ethernet HWaddr 00:11:D8:17:FF:62
>           inet addr:10.0.1.2 Bcast:10.255.255.255 Mask:255.0.0.0
>           inet6 addr: fe80::211:d8ff:fe17:ff62/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:307246 errors:2 dropped:0 overruns:0 frame:2
>                                    ^                            ^
>           TX packets:155076 errors:0 dropped:64 overruns:0 carrier:0
>                                              ^^
>           collisions:0 txqueuelen:1000
>           RX bytes:440946478 (420.5 MiB) TX bytes:10483464 (9.9 MiB)
>           Interrupt:22 Base address:0xdead
>
> --
> Ueimor

Hi Francois,

With this new patch on top of #3 the sis190 interface simply doesn't respond. If I try to ping another machine on the sis190 network there is no reply. I can ping the IP address of the sis190 interface, but that's all. If I remove the patch with -R and recompile the interface works again. I have repeated the patch/compile process 3 times now and there is no change.

Here's dmesg with the non responsive patch:

ACPI: PCI interrupt for device 0000:00:04.0 disabled
sis190 Gigabit Ethernet driver 1.2 loaded.
ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:04.0 to 64
0000:00:04.0: Read MAC address from APC.
0000:00:04.0: Realtek PHY RTL8201 transceiver at address 1.
0000:00:04.0: Using transceiver at address 1 as default.
0000:00:04.0: SiS 190 PCI Fast Ethernet adapter at dea0ec00 (IRQ: 10), 00:17:31:90:bc:35
eth0: GMII mode.
eth0: Enabling Auto-negotiation.
eth0: mii ext = 0000.
eth0: mii lpa = 40a1 adv = 01e1.
eth0: link on 100 Mbps Half Duplex mode.

Here's ifconfig for the interface:

eth0      Link encap:Ethernet HWaddr 00:17:31:90:BC:35
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:360 (360.0 b) TX bytes:0 (0.0 b)
          Interrupt:10 Base address:0xdead
My output said:

0000:00:04.0: Unknown PHY transceiver at address 1.
0000:00:04.0: Using transceiver at address 1 as default.
eth0:RGMII mode.

It looks like a similar problem to this one: http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0244.html
(In reply to comment #45) > My output said: > > 0000:00:04.0: Unknown PHY transceiver at address 1. > 0000:00:04.0: Using transceiver at address 1 as default. > > eth0:RGMII mode. > > it looks a similar problem to this one: > http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0244.html > Hi Juan, The problem you refer to is for a sis900 driver. This bug# is for the sis190 driver. Is yours a sis190 or sis900, I wonder? Regards, Chris.
Chris, I have a sis190 driver. The problem I was referring to was about an unknown PHY transceiver. In sis190.c the known PHYs are defined here:

} mii_chip_table[] = {
	{ "Broadcom PHY BCM5461", { 0x0020, 0x60c0 }, LAN, F_PHY_BCM5461 },
	{ "Broadcom PHY AC131", { 0x0143, 0xbc70 }, LAN, 0 },
	{ "Agere PHY ET1101B", { 0x0282, 0xf010 }, LAN, 0 },
	{ "Marvell PHY 88E1111", { 0x0141, 0x0cc0 }, LAN, F_PHY_88E1111 },
	{ "Realtek PHY RTL8201", { 0x0000, 0x8200 }, LAN, 0 },
	{ NULL, }
};

I wonder if this unknown PHY issue would get fixed by putting in the right values.
Chris :
[...]
> With this new patch on top of #3 the sis190 interface simply doesn't respond.

Ok, revert this patch and forget it for now.

I have set up a git tree which contains an updated version of ethtool at:
git://kernel.org/pub/scm/linux/kernel/git/romieu/ethtool.git

Please do a:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/romieu/ethtool.git
$ cd ethtool
$ git checkout -f sis19x
$ ./autogen.sh; ./configure; make

The build directory should contain an ethtool binary which supports dumping the registers of the sis19x ('ethtool -d ethX').

Can you send the content of the registers:
- when the adapter is up and running (two samples separated by a few ping packets)
- when the adapter is lost in wonderland (before and after the watchdog kicks in if possible)

--
Ueimor
Created attachment 13842 [details] Display more info when the PHY is unknown Juan, can you add the attached patch on top of the #3 series and report the resulting messages ? Thanks in advance. -- Ueimor
I guess that the important one is this: 0000:00:04.0: Unknown PHY transceiver at address 1 (001c:c912)
Looking for information about the PHY on the system, I opened the case. I knew that it had a SIS968 chipset. Its features page said:

Gigabit Ethernet MAC Controller
- 10/100/1000Mbps triple speed
- MII/RGMII standard interface to support external PHY

To my surprise, looking at the motherboard showed that it has an RTL8211BL chip on it. So I made this change to the driver:

  { "Realtek PHY RTL8201", { 0x0000, 0x8200 }, LAN, 0 },
+ { "Realtek PHY RTL8211BL",{ 0x001c, 0xc912 }, LAN, 0 },
  { NULL, }
 };

Is there any way to find out more info about this chip?
juanjo@apertus.es:
[...]
> { "Realtek PHY RTL8201", { 0x0000, 0x8200 }, LAN, 0 },
> + { "Realtek PHY RTL8211BL",{ 0x001c, 0xc912 }, LAN, 0 },
> { NULL, }
> };
>
> Is there any way to find out more info about this chip ?

The datasheet is available at http://www.realtek.com.tw/ -> Products -> Communications Network ICs -> PHYceivers 10/100/1000 Gigabit Ethernet -> 1 port

It seems rather classical though. I would expect the current code to handle it once the ID is added.

Do you still experience problems with the #3 series + the 968 cmos access code + your 8211 change ?
Oops. I never thought of looking at that page; I googled for RTL8211BL and could not find useful information in the first couple of pages... sorry for the hassle. My 8211 change did not help to recognize the PHY chip. I am trying without the #3 series, on top of 2.6.24-rc4. I cannot find why it is failing to identify this chip. Be a bit patient with me :-) Just one question: are 001c and c912 the 3rd and 4th of the basic MII PHY registers, as shown when checking with mii-diag?
Juan:
[...]
> just one question: 001c and c912 are the 3rd and 4th of Basic registers of
> MII PHY when checking with mii-diag

Yes, as outlined in 7.2.3 p.23 and 7.2.4 p.24 of RTL8211B(L)_DataSheet_1.4.pdf.

--
Ueimor
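As a side note on what those two words encode: in the standard MII register set (IEEE 802.3 clause 22), registers 2 and 3 hold the PHY identifier, and the second word packs a 6-bit model number in bits 9:4 and a 4-bit revision in bits 3:0. The sketch below only illustrates that layout; `decode_phy_id` and `struct phy_id_fields` are hypothetical names, not part of the driver.

```c
#include <stdint.h>

/* Fields recovered from the two PHY identifier registers
 * (MII registers 2 and 3). */
struct phy_id_fields {
	uint32_t id;           /* both registers combined: (reg2 << 16) | reg3 */
	unsigned int model;    /* reg3 bits 9:4 */
	unsigned int revision; /* reg3 bits 3:0 */
};

/* Split the identifier words as laid out by IEEE 802.3 clause 22. */
static struct phy_id_fields decode_phy_id(uint16_t physid1, uint16_t physid2)
{
	struct phy_id_fields f;

	f.id = ((uint32_t)physid1 << 16) | physid2;
	f.model = (physid2 >> 4) & 0x3f;
	f.revision = physid2 & 0x0f;
	return f;
}
```

For the values reported in this thread, `decode_phy_id(0x001c, 0xc912)` yields the combined identifier 0x001cc912, with the low four bits (here 2) being the revision field.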
Created attachment 13887 [details] Removing 0xfff0 allows to identify PHY chip I had to remove this condition to allow the transceiver to be identified.
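One possible reading of why the mask matters, offered as a sketch rather than a confirmed diagnosis: if the lookup compares the table entry against the second ID word masked with 0xfff0 (an assumption based on the attachment title; the driver source is not shown in this thread), then an entry stored as 0xc912 can never match, because the masked hardware value is 0xc910. The names `chip_entry`, `lookup` and both tables below are hypothetical, and storing 0xc910 in the table is an untested alternative that nobody in the thread proposed.

```c
#include <stddef.h>
#include <stdint.h>

struct chip_entry {
	const char *name;
	uint16_t id[2];
};

/* Table with the RTL8211BL entry exactly as posted earlier in the thread. */
static const struct chip_entry table_as_posted[] = {
	{ "Realtek PHY RTL8201",   { 0x0000, 0x8200 } },
	{ "Realtek PHY RTL8211BL", { 0x001c, 0xc912 } },
	{ NULL, { 0, 0 } },
};

/* Hypothetical variant: second word stored with the low nibble cleared. */
static const struct chip_entry table_masked_id[] = {
	{ "Realtek PHY RTL8201",   { 0x0000, 0x8200 } },
	{ "Realtek PHY RTL8211BL", { 0x001c, 0xc910 } },
	{ NULL, { 0, 0 } },
};

/* Return the name of the first entry whose IDs match the values read from
 * the PHY, after applying 'mask' to the second word; NULL if none match. */
static const char *lookup(const struct chip_entry *t,
			  uint16_t id0, uint16_t id1, uint16_t mask)
{
	for (; t->name; t++) {
		if (t->id[0] == id0 && t->id[1] == (uint16_t)(id1 & mask))
			return t->name;
	}
	return NULL;
}
```

Under that assumption, the posted entry is found only when the mask is dropped (0xffff), which matches what the attachment does, while the masked-ID table is found with the 0xfff0 mask left in place.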
OK, I modified sis190_init_phy() so it detects the transceiver properly, but that does not change the behaviour. There is one little detail that bugs me now. This is connected to a 10Base router, but it says: "eth0: RGMII mode." I thought that this should be "MII mode."
Juan: [...] > I thought that this should be "MII mode." Not necessarily. We could fail the autonegotiation for unrelated reasons. Can you send a patch of all your changes (on top of ...) as well as a complete dmesg and a 'mii-tool -vv' ? -- Ueimor
Created attachment 13899 [details] output of mii-diag -vvv
Created attachment 13900 [details] dmesg with all the patches from test #3 and personal changes.
Created attachment 13901 [details] ID for PHY RTL8221BL , Bridge search update from #9467 and attach13887 Here are the changes I made: some from bug 9467, others from your suggestions, and the third one is a proposal so that the PHY gets identified (I am not sure if this one breaks something else).
Created attachment 14026 [details] Ethtool output 1 This is the first run of your patched ethtool. At this point the interface is up and running fine.
Created attachment 14027 [details] Ethtool output 2 This is the output of ethtool after a bit of data has traversed the interface - at this point all is well with the interface.
Created attachment 14028 [details] Ethtool output 3 An ftp "put" was set running and the cable was unplugged a few times. This is the output of ethtool before the NETDEV WATCHDOG kicked in. At this point the interface is unresponsive.
Created attachment 14029 [details] Ethtool output 4 This is the output of ethtool after the NETDEV WATCHDOG kicked in. The interface is still unresponsive. I waited a few minutes and it was still unresponsive so I did an ifdown/up - still no joy. I eventually had to do an ifdown eth0; rmmod sis190; modprobe sis190; ifup eth0. The interface then burst into life. Apologies for the delay in getting back - I had to travel at short notice.
I've just tested the latest patches on the Ubuntu Feisty kernel (2.6.20), with some minor changes so the patches would work and compile. All works fine: the PC does not hang, and I can remove the cable and reconnect it and the connection works. Just one problem: when I try to rmmod sis190 the PC goes crazy. Good work Francois and all, this "damn" card is almost working great :)
Created attachment 14794 [details] sis190.c for 2.6.20 Kernel (Ubuntu Feisty) This sis190.c has the patches from Francois (#3 series) with minor changes to compile on the 2.6.20 kernel. It works, but on rmmod sis190 the PC goes nuts. I haven't looked into the reason yet, no time.
I am testing right now with kernel 2.6.25-rc6, and some of the #3 series patches can no longer be applied to the kernel. I am seeing errors when transferring files using the cifs module, but I do not know how to report back with useful information. Please provide some status and a bit of info on how to help :-)
Comment on attachment 13749 [details] remove duplicate INIT_WORK (#3) The patch is included in 2.6.24.
Comment on attachment 13751 [details] scheduling while atomic fix (#3) The patch is included in 2.6.24.
Comment on attachment 13750 [details] mdio operation failure is not correctly checked (#3) The patch is included in 2.6.24.
Is anything outstanding on this or can I close it ?