Bug 9386 - sis190 network driver crash
Summary: sis190 network driver crash
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 high
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-15 07:30 UTC by Christopher Moore
Modified: 2009-04-08 10:04 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.23.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
remove duplicate INIT_WORK (798 bytes, text/plain)
2007-11-17 13:34 UTC, Francois Romieu
mdio operation failure is not correctly checked (770 bytes, text/plain)
2007-11-17 13:35 UTC, Francois Romieu
scheduling while atomic *ouch* (959 bytes, text/plain)
2007-11-17 13:36 UTC, Francois Romieu
add a debug message (1011 bytes, text/plain)
2007-11-17 13:36 UTC, Francois Romieu
remove duplicate INIT_WORK (#2) (805 bytes, text/plain)
2007-11-20 15:09 UTC, Francois Romieu
mdio operation failure is not correctly checked (#2) (770 bytes, text/plain)
2007-11-20 15:10 UTC, Francois Romieu
scheduling while atomic fix (#2) (997 bytes, text/plain)
2007-11-20 15:11 UTC, Francois Romieu
remove needless MII reset (#2) (1.14 KB, text/plain)
2007-11-20 15:12 UTC, Francois Romieu
link management simplification (#2) (4.63 KB, text/plain)
2007-11-20 15:12 UTC, Francois Romieu
account for Tx errors (#2) (2.38 KB, text/plain)
2007-11-21 15:35 UTC, Francois Romieu
move the Tx timeout recovery task into user context (#2) (4.48 KB, text/plain)
2007-11-21 15:35 UTC, Francois Romieu
force Tx recovery (#2) (1.49 KB, text/plain)
2007-11-21 15:36 UTC, Francois Romieu
shorten timeouts (#2) (896 bytes, text/plain)
2007-11-21 15:36 UTC, Francois Romieu
remove superfluous sis190_soft_reset (#2) (838 bytes, text/plain)
2007-11-21 15:37 UTC, Francois Romieu
debug helper (810 bytes, text/plain)
2007-11-22 14:03 UTC, Francois Romieu
remove duplicate INIT_WORK (#3) (805 bytes, text/plain)
2007-11-25 14:33 UTC, Francois Romieu
mdio operation failure is not correctly checked (#3) (783 bytes, text/plain)
2007-11-25 14:34 UTC, Francois Romieu
scheduling while atomic fix (#3) (940 bytes, text/plain)
2007-11-25 14:34 UTC, Francois Romieu
remove needless MII reset (#3) (1.14 KB, text/plain)
2007-11-25 14:35 UTC, Francois Romieu
link management simplification (#3) (6.36 KB, text/plain)
2007-11-25 14:35 UTC, Francois Romieu
account for Tx errors (#3) (2.38 KB, text/plain)
2007-11-25 14:36 UTC, Francois Romieu
move the Tx timeout recovery task into user context (#3) (4.61 KB, text/plain)
2007-11-25 14:36 UTC, Francois Romieu
force Tx recovery (#3) (1.59 KB, text/x-patch)
2007-11-25 14:37 UTC, Francois Romieu
shorten timeouts (#3) (896 bytes, text/plain)
2007-11-25 14:37 UTC, Francois Romieu
remove superfluous sis190_soft_reset (#3) (838 bytes, text/plain)
2007-11-25 14:37 UTC, Francois Romieu
Hunt for the reset_task against rmmod/insmod difference (1.13 KB, text/plain)
2007-11-27 14:23 UTC, Francois Romieu
Display more info when the PHY is unknown (679 bytes, text/plain)
2007-12-03 14:20 UTC, Francois Romieu
Removing 0xfff0 allows to identify PHY chip (398 bytes, application/octet-stream)
2007-12-06 02:43 UTC, Juan Jose Pablos
output of mii-diag -vvv (1.38 KB, application/octet-stream)
2007-12-06 16:26 UTC, Juan Jose Pablos
dmesg with all the patches from test #3 and personal changes. (11.44 KB, application/octet-stream)
2007-12-06 16:30 UTC, Juan Jose Pablos
ID for PHY RTL8221BL , Bridge search update from #9467 and attach13887 (2.72 KB, text/x-patch)
2007-12-06 16:38 UTC, Juan Jose Pablos
Ethtool output 1 (1.21 KB, application/octet-stream)
2007-12-14 04:45 UTC, Christopher Moore
Ethtool output 2 (1.21 KB, text/plain)
2007-12-14 04:48 UTC, Christopher Moore
Ethtool output 3 (1.21 KB, text/plain)
2007-12-14 04:50 UTC, Christopher Moore
Ethtool output 4 (1.22 KB, text/plain)
2007-12-14 04:53 UTC, Christopher Moore
sis190.c for 2.6.20 Kernel (Ubuntu Feisty) (45.78 KB, text/plain)
2008-02-13 08:52 UTC, Marco Silva

Description Christopher Moore 2007-11-15 07:30:52 UTC
I have a problem where I can lock up a number of machines by 
changing the link state on a sis190 Ethernet port. For example, during 
a data transfer such as FTP if I unplug the Ethernet cable and plug it 
back in, the Ethernet interface will stop responding and the machine 
will lock up after a minute or so. This behaviour is repeatable. I have the sis190 driver loaded as a module.

I haven't found a kernel version where this doesn't happen. It happens with kernel 2.6.20.15, for example.
Comment 1 Anonymous Emailer 2007-11-15 11:59:23 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 15 Nov 2007 07:30:53 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9386
> 
>            Summary: sis190 network driver crash
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.23.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: chris@linuxepos.com
>                 CC: romieu@fr.zoreil.com
> 
> 
> I have a problem where I can lock up a number of machines by 
> changing the link state on a sis190 Ethernet port. For example, during 
> a data transfer such as FTP if I unplug the Ethernet cable and plug it 
> back in, the Ethernet interface will stop responding and the machine 
> will lock up after a minute or so. This behaviour is repeatable. I have the
> sis190 driver loaded as a module.
> 
> I haven't found a kernel version where this doesn't happen. It happens with
> kernel 2.6.20.15, for example.
> 
Comment 2 David S. Miller 2007-11-15 15:00:49 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 15 Nov 2007 11:58:41 -0800

> On Thu, 15 Nov 2007 07:30:53 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=9386
 ...
> > I have a problem where I can lock up a number of machines by 
> > changing the link state on a sis190 Ethernet port. For example, during 
> > a data transfer such as FTP if I unplug the Ethernet cable and plug it 
> > back in, the Ethernet interface will stop responding and the machine 
> > will lock up after a minute or so. This behaviour is repeatable. I have the
> > sis190 driver loaded as a module.
> > 
> > I haven't found a kernel version where this doesn't happen. It happens with
> > kernel 2.6.20.15, for example.

I wonder if somehow sis190_phy_task() is creating some kind
of deadlock when handling the link down and up events.

It takes the RTNL semaphore in sis190_phy_task() but it doesn't
call anything that could deadlock on it.

It does invoke the link-watch layer, indirectly via the
various netif_carrier_{on,off}() calls it makes but those
should be OK since they just schedule workqueue things.

Perhaps what is contributing to the problem is that
sis190_interrupt() still processes the RX and TX queues
even when a link change event is signalled.  Perhaps
the chip doesn't like that.

Francois, I noticed two issues while reviewing the driver for
this bug:

1) The interrupt handler does no SMP locking, the chip might
   not be happy with one thread (in phy_task) programming
   the MDIO whilst another thread does RX/TX ring processing,
   for example.

2) The timeout limit check in __mdio_cmd() is buggy; it should
   be 99 instead of 999.
Comment 3 Francois Romieu 2007-11-15 15:38:25 UTC
David Miller <davem@davemloft.net> :
[...]
> I wonder if somehow sis190_phy_task() is creating some kind
> of deadlock when handling the link down and up events.

I should be able to test it during the weekend if nobody beats me to it.

The sis190 stands headless in the kitchen. Given the current situation
here, I do not have the oomph to turn the kitchen into a debug lab when
I am back from work.
Comment 4 Francois Romieu 2007-11-17 13:34:23 UTC
Created attachment 13591 [details]
remove duplicate INIT_WORK
Comment 5 Francois Romieu 2007-11-17 13:35:42 UTC
Created attachment 13592 [details]
mdio operation failure is not correctly checked
Comment 6 Francois Romieu 2007-11-17 13:36:11 UTC
Created attachment 13593 [details]
scheduling while atomic *ouch*
Comment 7 Francois Romieu 2007-11-17 13:36:46 UTC
Created attachment 13594 [details]
add a debug message
Comment 8 Francois Romieu 2007-11-17 13:41:27 UTC
Chris, can you try the patches above against 2.6.24-rc4?

It does not negotiate the link correctly when the cable is removed during
a transfer, but it does not crash any more here. I'll try to fix it tomorrow.

-- 
Ueimor
Comment 9 Christopher Moore 2007-11-19 04:01:23 UTC
(In reply to comment #8)
> Chris, can you try the patches above against 2.6.24-rc4?
> 
> It does not negotiate the link correctly when the cable is removed during
> a transfer, but it does not crash any more here. I'll try to fix it tomorrow.
> 
> -- 
> Ueimor
> 
Hi,

I have tried kernel 2.6.24-rc3 WITHOUT the patches and can confirm that the machine does still crash. However, when running 2.6.24-rc3 WITH the above 4 patches I can NOT get the machine to crash. The interface does still stop responding as noted above. 

Fantastic work guys.
Comment 10 Francois Romieu 2007-11-19 15:30:36 UTC
I have some extra hacks which seem able to recover as well. Give me a day or two
to polish them. I cannot guarantee that it will always recover quickly, though.

-- 
Ueimor
Comment 11 Francois Romieu 2007-11-20 15:09:29 UTC
Created attachment 13657 [details]
remove duplicate INIT_WORK (#2)
Comment 12 Francois Romieu 2007-11-20 15:10:37 UTC
Created attachment 13658 [details]
mdio operation failure is not correctly checked (#2)
Comment 13 Francois Romieu 2007-11-20 15:11:20 UTC
Created attachment 13659 [details]
scheduling while atomic fix (#2)
Comment 14 Francois Romieu 2007-11-20 15:12:16 UTC
Created attachment 13660 [details]
remove needless MII reset (#2)
Comment 15 Francois Romieu 2007-11-20 15:12:57 UTC
Created attachment 13661 [details]
link management simplification (#2)
Comment 16 Francois Romieu 2007-11-20 15:18:23 UTC
Chris, can you give the #2 list a try? It replaces the previous series.

The driver will not recover the link yet (yes, I'm late), but it should not
crash either. An ifconfig down/up cycle should bring the device back to life.

-- 
Ueimor
Comment 17 Christopher Moore 2007-11-21 06:25:23 UTC
(In reply to comment #16)
> Chris, can you give the #2 list a try? It replaces the previous series.
> 
> The driver will not recover the link yet (yes, I'm late), but it should not
> crash either. An ifconfig down/up cycle should bring the device back to life.
> 
> -- 
> Ueimor
> 

Hi Francois,

As you asked, I have patched kernel 2.6.24-rc3 with your new #2 patch set (5 patches in all). I can NOT get the machine to crash and, as you say, the link is NOT recovered.
I can confirm that an ifconfig down/up does bring the link back to life.

Hope this helps.

Regards, Chris.
Comment 18 Francois Romieu 2007-11-21 15:33:58 UTC
Chris:
[...]
> Hope this helps.

Yes. Thanks for your testing.

Can you give the 5 incoming patches a try on top of the preceding ones?

Their diff is close enough to the code that allows me to recover, but I have
not had time to test it today. YMMV.

-- 
Ueimor
Comment 19 Francois Romieu 2007-11-21 15:35:01 UTC
Created attachment 13684 [details]
account for Tx errors (#2)
Comment 20 Francois Romieu 2007-11-21 15:35:47 UTC
Created attachment 13685 [details]
move the Tx timeout recovery task into user context (#2)
Comment 21 Francois Romieu 2007-11-21 15:36:21 UTC
Created attachment 13686 [details]
force Tx recovery (#2)
Comment 22 Francois Romieu 2007-11-21 15:36:52 UTC
Created attachment 13687 [details]
shorten timeouts (#2)
Comment 23 Francois Romieu 2007-11-21 15:37:21 UTC
Created attachment 13688 [details]
remove superfluous sis190_soft_reset (#2)
Comment 24 Christopher Moore 2007-11-22 05:06:39 UTC
(In reply to comment #18)
> Chris:
> [...]
> > Hope this helps.
> 
> Yes. Thanks for your testing.
> 
> Can you give the 5 incoming patches a try on top of the preceding ones?
> 
> Their diff is close enough to the code that allows me to recover, but I
> have not had time to test it today. YMMV.
> 
> -- 
> Ueimor
> 

Hi Francois,

OK, I patched the kernel with the 5 new patches on top of your #2 patch set.
I can NOT crash the machine. However, during a data transfer, after a link
state change the interface sometimes recovers and sometimes does not. If I quickly
disconnect and reconnect the link 2 or 3 times with, say, half a second
between disconnect and reconnect, the interface doesn't recover even after an
ifdown/up.

Regards, Chris.
Comment 25 Francois Romieu 2007-11-22 14:03:22 UTC
Created attachment 13700 [details]
debug helper

Chris:
[...]
> I can NOT crash the machine. However, during a data transfer, after a link
> state change the interface sometimes recovers and sometimes does not. If I
> quickly disconnect and reconnect the link 2 or 3 times with, say, half a second
> between disconnect and reconnect, the interface doesn't recover even after an
> ifdown/up.

I cannot reproduce it here. The netdev watchdog can be slow, but it always
triggers when necessary :o/

Can you:
- apply the attached debug patch
- sysctl -w kernel.printk="8 8 8 8"
- ethtool -s ethX msglvl 65535 before ifconfig up
- start the ftp xfer
- gzip and send the log up to the point where the driver does not recover

-- 
Ueimor
Comment 26 Christopher Moore 2007-11-22 18:33:17 UTC
(In reply to comment #25)
> Created an attachment (id=13700) [details]
> debug helper
> 
> Chris:
> [...]
> > I can NOT crash the machine. However, during a data transfer, after a link
> > state change the interface sometimes recovers and sometimes does not. If I
> > quickly disconnect and reconnect the link 2 or 3 times with, say, half a second
> > between disconnect and reconnect, the interface doesn't recover even after an
> > ifdown/up.
> 
> I cannot reproduce it here. The netdev watchdog can be slow, but it always
> triggers when necessary :o/
> 
> Can you:
> - apply the attached debug patch
> - sysctl -w kernel.printk="8 8 8 8"
> - ethtool -s ethX msglvl 65535 before ifconfig up
> - start the ftp xfer
> - gzip and send the log up to the point where the driver does not recover
> 
> -- 
> Ueimor
> 

Hi Francois,

I have patched the kernel with your debug patch. I have now tried for one and a half hours to break the driver and I can't. The driver recovers no matter what I do. I've tried multiple transfers back and forth, up to 5 at the same time, pulling the plug very quickly at times.
It seems like you have done a great job.
All I can presume is that I hadn't installed the module properly on the last compile. I'm pretty sure that I had, but I may well be mistaken.

I'll roll these patches out to the 10 machines out in the field and see what happens. One machine has a very broken router on a sis190 interface that keeps changing the link state very quickly about once a day - this is how I noticed the problem in the first place.

I'll report back very soon and keep trying to break the driver on my box here.

Regards, Chris.    
Comment 27 Francois Romieu 2007-11-25 14:33:37 UTC
Created attachment 13749 [details]
remove duplicate INIT_WORK (#3)
Comment 28 Francois Romieu 2007-11-25 14:34:20 UTC
Created attachment 13750 [details]
mdio operation failure is not correctly checked (#3)
Comment 29 Francois Romieu 2007-11-25 14:34:49 UTC
Created attachment 13751 [details]
scheduling while atomic fix (#3)
Comment 30 Francois Romieu 2007-11-25 14:35:24 UTC
Created attachment 13752 [details]
remove needless MII reset (#3)
Comment 31 Francois Romieu 2007-11-25 14:35:46 UTC
Created attachment 13753 [details]
link management simplification (#3)
Comment 32 Francois Romieu 2007-11-25 14:36:15 UTC
Created attachment 13754 [details]
account for Tx errors (#3)
Comment 33 Francois Romieu 2007-11-25 14:36:41 UTC
Created attachment 13755 [details]
move the Tx timeout recovery task into user context (#3)
Comment 34 Francois Romieu 2007-11-25 14:37:03 UTC
Created attachment 13756 [details]
force Tx recovery (#3)
Comment 35 Francois Romieu 2007-11-25 14:37:24 UTC
Created attachment 13757 [details]
shorten timeouts (#3)
Comment 36 Francois Romieu 2007-11-25 14:37:47 UTC
Created attachment 13758 [details]
remove superfluous sis190_soft_reset (#3)
Comment 37 Francois Romieu 2007-11-25 14:42:06 UTC
Chris, can you try series #3 above?

The #2 series could deadlock under specific conditions when the device was
closed. It would be nice if you could plug/unplug the cable during a transfer
and ifconfig down the device while plugging/unplugging the cable.

-- 
Ueimor
Comment 38 Christopher Moore 2007-11-27 04:16:55 UTC
(In reply to comment #37)
> Chris, can you try series #3 above?
> 
> The #2 series could deadlock under specific conditions when the device was
> closed. It would be nice if you could plug/unplug the cable during a transfer
> and ifconfig down the device while plugging/unplugging the cable.
> 
> -- 
> Ueimor
> 

Hi Francois,

OK, I patched the kernel with the #3 set, after removing all other patches.
I can NOT get the machine to crash; however, I can lock up the interface: even an ifdown/up does not bring it back to life. I have the following in the logs:

Nov 27 11:48:58 devel kernel: eth0: mii ext = 0000.
Nov 27 11:48:58 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 11:48:58 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
...
... after pulling the cable out a few times ....
...
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 377673c8.
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 37767028.
Nov 27 11:51:23 devel last message repeated 2 times
Nov 27 11:51:34 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:34 devel kernel: eth0: Tx timeout, status 00001a11 37767024.
Nov 27 11:51:40 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:40 devel kernel: eth0: Tx timeout, status 00001a11 37767014.
...
... At this point the interface doesn't come back to life even with ifup/down
...
Nov 27 12:01:09 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 12:01:13 devel kernel: eth0: mii ext = 0000.
Nov 27 12:01:13 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:01:13 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 12:02:18 devel dhcpd: receive_packet failed on eth0: Network is down

...
... I have to rmmod sis190/ modprobe sis190/ ifup eth0 to get it going again..
...

Nov 27 12:02:21 devel kernel: ACPI: PCI interrupt for device 0000:00:04.0 disabled
Nov 27 12:02:27 devel kernel: sis190 Gigabit Ethernet driver 1.2 loaded.
Nov 27 12:02:27 devel kernel: ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
Nov 27 12:02:27 devel kernel: 0000:00:04.0: Read MAC address from APC.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Realtek PHY RTL8201 transceiver at address 1.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Using transceiver at address 1 as default.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: SiS 190 PCI Fast Ethernet adapter at f8a0ec00 (IRQ: 10), 00:17:31:90:ba:e5
Nov 27 12:02:28 devel kernel: eth0: GMII mode.
Nov 27 12:02:28 devel kernel: eth0: Enabling Auto-negotiation.
Nov 27 12:02:33 devel kernel: eth0: mii ext = 0000.
Nov 27 12:02:33 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:02:33 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
....
....

I suspect that I should have had some more debugging patched in. Do you want me to try again with the debug patches that you sent?

Regards, Chris.
Comment 39 Francois Romieu 2007-11-27 13:25:04 UTC
Chris:
[...]
> I suspect that I should have had some more debugging patched in. Do you want
> me to try again with the debug patches that you sent?

Not immediately. Can you simply send me the complete dmesg above?
Your annotations are useful but I am curious to know what hides behind
the three little dots.

-- 
Ueimor
Comment 40 Francois Romieu 2007-11-27 14:23:21 UTC
Created attachment 13770 [details]
Hunt for the reset_task against rmmod/insmod difference

Chris, can you add the patch above on top of the #3 series?

Just FYI, my sis190 looks like this when it recovers:

# ifconfig eth2                                            
eth2      Link encap:Ethernet  HWaddr 00:11:D8:17:FF:62                         
          inet addr:10.0.1.2  Bcast:10.255.255.255  Mask:255.0.0.0              
          inet6 addr: fe80::211:d8ff:fe17:ff62/64 Scope:Link                    
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                    
          RX packets:307246 errors:2 dropped:0 overruns:0 frame:2               
                                   ^                            ^
          TX packets:155076 errors:0 dropped:64 overruns:0 carrier:0            
                                             ^^
          collisions:0 txqueuelen:1000                                          
          RX bytes:440946478 (420.5 MiB)  TX bytes:10483464 (9.9 MiB)           
          Interrupt:22 Base address:0xdead                                      

-- 
Ueimor
Comment 41 Juan Jose Pablos 2007-11-29 09:21:15 UTC
Hi,
I am trying to apply all the patches from the #3 series to help with testing, but it fails on hunk 3 of link management simplification (#3). It is because of the debug helper.

Should I skip the debug helper? It is still a bit tedious to test this stuff :-)
Comment 42 Francois Romieu 2007-11-29 12:37:48 UTC
Juan, the patches should be applied in this order:
- remove duplicate INIT_WORK (#3)
- mdio operation failure is not correctly checked (#3)
- scheduling while atomic fix (#3)
- remove needless MII reset (#3)
- link management simplification (#3)
- account for Tx errors (#3)
- move the Tx timeout recovery task into user context (#3)
- force Tx recovery (#3)
- shorten timeouts (#3)
- remove superfluous sis190_soft_reset (#3)
- Hunt for the reset_task against rmmod/insmod difference

-- 
Ueimor
Comment 43 Christopher Moore 2007-11-29 13:13:08 UTC
(In reply to comment #39)
> Chris:
> [...]
> > I suspect that I should have had some more debugging patched in. Do you
> > want me to try again with the debug patches that you sent?
> 
> Not immediately. Can you simply send me the complete dmesg above?
> Your annotations are useful but I am curious to know what hides behind
> the three little dots.
> 

Hi Francois,

There's not really anything of interest behind the dots, but here goes:

Nov 27 11:48:58 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 11:48:58 devel kernel: eth0: mii ext = 0000.
Nov 27 11:48:58 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 11:48:58 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 11:49:10 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 11:49:10 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 11:49:10 devel PAM-securetty[3469]: Couldn't open /etc/securetty
Nov 27 11:49:11 devel pam_winbind[3469]: write to socket failed!
Nov 27 11:49:11 devel pam_winbind[3469]: internal module error (retval = 3, user = `root')
Nov 27 11:49:11 devel pam_winbind[3469]: write to socket failed!
Nov 27 11:49:11 devel pam_winbind[3469]: internal module error (retval = 3, user = `root')
Nov 27 11:49:11 devel PAM_pwdb[3469]: (login) session opened for user root by LOGIN(uid=0)
Nov 27 11:49:20 devel ftpd[4893]: wu-ftpd - TLS settings: control allow, client_cert allow, data allow
Nov 27 11:49:23 devel pam_winbind[4893]: write to socket failed!
Nov 27 11:49:23 devel pam_winbind[4893]: internal module error (retval = 3, user = `root')
Nov 27 11:49:23 devel pam_winbind[4893]: write to socket failed!
Nov 27 11:49:23 devel pam_winbind[4893]: internal module error (retval = 3, user = `root')
Nov 27 11:49:23 devel PAM_unix[4893]: (ftp) session opened for user root by (uid=0)
Nov 27 11:49:23 devel ftpd: devel.linuxepos1.demon.co.uk: root[4893]: FTP LOGIN FROM devel.linuxepos1.demon.co.uk [192.168.0.1], root
Nov 27 11:49:31 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 11:49:31 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 11:49:31 devel PAM-securetty[3470]: Couldn't open /etc/securetty
Nov 27 11:49:32 devel pam_winbind[3470]: write to socket failed!
Nov 27 11:49:32 devel pam_winbind[3470]: internal module error (retval = 3, user = `root')
Nov 27 11:49:32 devel pam_winbind[3470]: write to socket failed!
Nov 27 11:49:32 devel pam_winbind[3470]: internal module error (retval = 3, user = `root')
Nov 27 11:49:32 devel PAM_pwdb[3470]: (login) session opened for user root by LOGIN(uid=0)
Nov 27 11:50:01 devel MailScanner:  succeeded
Nov 27 11:50:01 devel last message repeated 2 times
Nov 27 11:50:12 devel PAM_unix[4893]: (ftp) session closed for user root
Nov 27 11:50:12 devel ftpd: devel.linuxepos1.demon.co.uk: root: QUIT[4893]: FTP session closed
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 377673c8.
Nov 27 11:51:22 devel kernel: eth0: Tx timeout, status 00001a01 37767028.
Nov 27 11:51:23 devel last message repeated 2 times
Nov 27 11:51:34 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:34 devel kernel: eth0: Tx timeout, status 00001a11 37767024.
Nov 27 11:51:40 devel kernel: NETDEV WATCHDOG: eth0: transmit timed out
Nov 27 11:51:40 devel kernel: eth0: Tx timeout, status 00001a11 37767014.
Nov 27 11:58:55 devel nmbd[3293]: [2007/11/27 11:58:55, 0] nmbd/nmbd_browsesync.c:find_domain_master_name_query_fail(351)
Nov 27 11:58:55 devel nmbd[3293]:   find_domain_master_name_query_fail:
Nov 27 11:58:55 devel nmbd[3293]:   Unable to find the Domain Master Browser name LINUXEPOS<1b> for the workgroup LINUXEPOS.
Nov 27 11:58:55 devel nmbd[3293]:   Unable to sync browse lists in this workgroup.
Nov 27 12:00:02 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:00:02 devel last message repeated 2 times
Nov 27 12:00:02 devel MailScanner:  succeeded
Nov 27 12:00:02 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:00:02 devel MailScanner:  succeeded
Nov 27 12:00:02 devel MailScanner:  succeeded
Nov 27 12:00:34 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:00:35 devel /sbin/hotplug: no runnable /etc/hotplug/kernel.agent is installed
Nov 27 12:01:09 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 12:01:13 devel kernel: eth0: mii ext = 0000.
Nov 27 12:01:13 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:01:13 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 12:02:18 devel dhcpd: receive_packet failed on eth0: Network is down
Nov 27 12:02:21 devel kernel: ACPI: PCI interrupt for device 0000:00:04.0 disabled
Nov 27 12:02:21 devel /sbin/hotplug: no runnable /etc/hotplug/drivers.agent is installed
Nov 27 12:02:21 devel /sbin/hotplug: no runnable /etc/hotplug/module.agent is installed
Nov 27 12:02:27 devel kernel: sis190 Gigabit Ethernet driver 1.2 loaded.
Nov 27 12:02:27 devel kernel: ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
Nov 27 12:02:27 devel kernel: 0000:00:04.0: Read MAC address from APC.
Nov 27 12:02:27 devel /sbin/hotplug: no runnable /etc/hotplug/module.agent is installed
Nov 27 12:02:27 devel /sbin/hotplug: no runnable /etc/hotplug/drivers.agent is installed
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Realtek PHY RTL8201 transceiver at address 1.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: Using transceiver at address 1 as default.
Nov 27 12:02:28 devel kernel: 0000:00:04.0: SiS 190 PCI Fast Ethernet adapter at f8a0ec00 (IRQ: 10), 00:17:31:90:ba:e5
Nov 27 12:02:28 devel kernel: eth0: GMII mode.
Nov 27 12:02:28 devel kernel: eth0: Enabling Auto-negotiation.
Nov 27 12:02:33 devel kernel: eth0: mii ext = 0000.
Nov 27 12:02:33 devel kernel: eth0: mii lpa = 40a1 adv = 01e1.
Nov 27 12:02:33 devel kernel: eth0: link on 100 Mbps Half Duplex mode.
Nov 27 12:05:20 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 12:05:20 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 12:05:22 devel PAM-securetty[8382]: Couldn't open /etc/securetty
Nov 27 12:05:23 devel pam_winbind[8382]: write to socket failed!
Nov 27 12:05:23 devel pam_winbind[8382]: internal module error (retval = 3, user = `root')
Nov 27 12:05:23 devel pam_winbind[8382]: write to socket failed!
Nov 27 12:05:23 devel pam_winbind[8382]: internal module error (retval = 3, user = `root')
Nov 27 12:05:23 devel PAM_pwdb[8382]: (login) session opened for user root by (uid=0)
Nov 27 12:05:55 devel login: PAM unable to dlopen(/lib/security/pam_console.so)
Nov 27 12:05:55 devel login: PAM adding faulty module: /lib/security/pam_console.so
Nov 27 12:05:57 devel PAM-securetty[8412]: Couldn't open /etc/securetty
Comment 44 Christopher Moore 2007-11-30 02:06:48 UTC
(In reply to comment #40)
> Created an attachment (id=13770) [details]
> Hunt for the reset_task against rmmod/insmod difference
> 
> Chris, can you add the patch above on top of the #3 series?
> 
> Just FYI, my sis190 looks like this when it recovers:
> 
> # ifconfig eth2                                            
> eth2      Link encap:Ethernet  HWaddr 00:11:D8:17:FF:62                       
>           inet addr:10.0.1.2  Bcast:10.255.255.255  Mask:255.0.0.0            
>           inet6 addr: fe80::211:d8ff:fe17:ff62/64 Scope:Link                  
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                  
>           RX packets:307246 errors:2 dropped:0 overruns:0 frame:2             
>                                    ^                            ^
>           TX packets:155076 errors:0 dropped:64 overruns:0 carrier:0          
>                                              ^^
>           collisions:0 txqueuelen:1000                                        
>           RX bytes:440946478 (420.5 MiB)  TX bytes:10483464 (9.9 MiB)         
>           Interrupt:22 Base address:0xdead                                    
> 
> -- 
> Ueimor
> 

Hi Francois,

With this new patch on top of #3 the sis190 interface simply doesn't respond. If I try to ping another machine on the sis190 network there is no reply. I can ping the IP address of the sis190 interface, but that's all. If I remove the patch with -R and recompile the interface works again. I have repeated the patch/ compile process 3 times now and there is no change.
Here's dmesg with the non responsive patch:

ACPI: PCI interrupt for device 0000:00:04.0 disabled
sis190 Gigabit Ethernet driver 1.2 loaded.
ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
PCI: Setting latency timer of device 0000:00:04.0 to 64
0000:00:04.0: Read MAC address from APC.
0000:00:04.0: Realtek PHY RTL8201 transceiver at address 1.
0000:00:04.0: Using transceiver at address 1 as default.
0000:00:04.0: SiS 190 PCI Fast Ethernet adapter at dea0ec00 (IRQ: 10), 00:17:31:90:bc:35
eth0: GMII mode.
eth0: Enabling Auto-negotiation.
eth0: mii ext = 0000.
eth0: mii lpa = 40a1 adv = 01e1.
eth0: link on 100 Mbps Half Duplex mode.

Here's ifconfig for the interface:

eth0      Link encap:Ethernet  HWaddr 00:17:31:90:BC:35  
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:360 (360.0 b)  TX bytes:0 (0.0 b)
          Interrupt:10 Base address:0xdead 
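The "mii lpa" and "adv" values in the dmesg output above appear to be the raw link partner ability (MII register 5) and advertisement (MII register 4) words. ANDing them shows why the link resolved to 100 Mbps Half Duplex: 0x01e1 & 0x40a1 = 0x00a1, because the link partner advertises only the 10 and 100 Mbps half-duplex bits. The minimal C sketch below performs that resolution. It is not the sis190 driver's code; resolve_mode() and the ABIL_* names are hypothetical, the bit values are those of the ADVERTISE_*/LPA_* constants in <linux/mii.h>, and 100BASE-T4 and gigabit abilities are deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Clause 22 ability bits shared by MII_ADVERTISE (reg 4) and MII_LPA (reg 5).
 * The values match ADVERTISE_* / LPA_* in <linux/mii.h>. */
#define ABIL_10HALF  0x0020
#define ABIL_10FULL  0x0040
#define ABIL_100HALF 0x0080
#define ABIL_100FULL 0x0100

/* Return the highest-priority 10/100 mode advertised by both ends, or NULL
 * if they share none. 100BASE-T4 and gigabit abilities are ignored. */
static const char *resolve_mode(uint16_t adv, uint16_t lpa)
{
    /* Only abilities present on both sides can be negotiated. */
    uint16_t common = adv & lpa;

    if (common & ABIL_100FULL)
        return "100 Mbps Full Duplex";
    if (common & ABIL_100HALF)
        return "100 Mbps Half Duplex";
    if (common & ABIL_10FULL)
        return "10 Mbps Full Duplex";
    if (common & ABIL_10HALF)
        return "10 Mbps Half Duplex";
    return NULL;
}
```

With the values from the dmesg above, resolve_mode(0x01e1, 0x40a1) returns "100 Mbps Half Duplex", which agrees with the driver's "link on 100 Mbps Half Duplex mode." message.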
Comment 45 Juan Jose Pablos 2007-11-30 05:45:09 UTC
My output said:

0000:00:04.0: Unknown PHY transceiver at address 1.
0000:00:04.0: Using transceiver at address 1 as default.

eth0:RGMII mode.

It looks like a similar problem to this one:
http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0244.html
Comment 46 Christopher Moore 2007-11-30 06:43:45 UTC
(In reply to comment #45)
> My output said:
> 
> 0000:00:04.0: Unknown PHY transceiver at address 1.
> 0000:00:04.0: Using transceiver at address 1 as default.
> 
> eth0:RGMII mode.
> 
> It looks like a similar problem to this one:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0408.3/0244.html
> 

Hi Juan,

The problem you refer to is for a sis900 driver. This bug# is for the sis190 driver. Is yours a sis190 or sis900, I wonder?

Regards, Chris.
Comment 47 Juan Jose Pablos 2007-11-30 07:15:08 UTC
Chris,
I use the sis190 driver. The problem I was referring to was about an unknown PHY transceiver.
In sis190.c, the known PHYs are defined here:

} mii_chip_table[] = {
        { "Broadcom PHY BCM5461", { 0x0020, 0x60c0 }, LAN, F_PHY_BCM5461 },
        { "Broadcom PHY AC131",   { 0x0143, 0xbc70 }, LAN, 0 },
        { "Agere PHY ET1101B",    { 0x0282, 0xf010 }, LAN, 0 },
        { "Marvell PHY 88E1111",  { 0x0141, 0x0cc0 }, LAN, F_PHY_88E1111 },
        { "Realtek PHY RTL8201",  { 0x0000, 0x8200 }, LAN, 0 },
        { NULL, }
};

I wonder whether this unknown PHY issue would get fixed by putting the right values there.
Comment 48 Francois Romieu 2007-12-03 14:13:16 UTC
Chris :
[...]
> With this new patch on top of #3 the sis190 interface simply doesn't respond.

Ok, revert this patch and forget it for now.

I have set up a git tree which contains an updated version of ethtool at:
git://kernel.org/pub/scm/linux/kernel/git/romieu/ethtool.git

Please do a:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/romieu/ethtool.git
$ cd ethtool
$ git checkout -f sis19x
$ ./autogen.sh; ./configure; make

The build directory should contain an ethtool binary which supports dumping
the registers of the sis19x ('ethtool -d ethX'). Can you send the content
of the registers:
- when the adapter is up and running (two samples separated by a few ping
  packets)
- when the adapter is lost in wonderland (before and after the watchdog kicks
  in if possible)

-- 
Ueimor
Comment 49 Francois Romieu 2007-12-03 14:20:57 UTC
Created attachment 13842 [details]
Display more info when the PHY is unknown

Juan, can you add the attached patch on top of the #3 series and report
the resulting messages?

Thanks in advance.

-- 
Ueimor
Comment 50 Juan Jose Pablos 2007-12-03 15:59:56 UTC
I guess the important line is this one:

0000:00:04.0: Unknown PHY transceiver at address 1 (001c:c912)
Comment 51 Juan Jose Pablos 2007-12-04 00:16:57 UTC
While looking for information about the PHY on this system, I opened the case. I knew that it had a SiS968 chipset; its features page said:
Gigabit Ethernet MAC Controller
- 10/100/1000Mbps triple speed
- MII/RGMII standard interface to support external PHY
But to my surprise, when I looked at the motherboard, it had an RTL8211BL chip on it. So I made this change to the driver:

        { "Realtek PHY RTL8201",  { 0x0000, 0x8200 }, LAN, 0 },
+        { "Realtek PHY RTL8211BL",{ 0x001c, 0xc912 }, LAN, 0 },
        { NULL, }
};

Is there any way to find out more info about this chip?
Comment 52 Francois Romieu 2007-12-04 14:13:49 UTC
juanjo@apertus.es:
[...]
>         { "Realtek PHY RTL8201",  { 0x0000, 0x8200 }, LAN, 0 },
> +       { "Realtek PHY RTL8211BL",{ 0x001c, 0xc912 }, LAN, 0 },
>         { NULL, }
> };
> 
> Is there any way to find out more info about this chip?

The datasheet is available at http://www.realtek.com.tw/
-> Products
   -> Communications Network ICs
      -> PHYceivers 10/100/1000 Gigabit Ethernet
         -> 1 port

It seems rather classical, though. I would expect the current code to
handle it once the ID is added.

Do you still experience problems with the #3 series + the 968 CMOS
access code + your 8211 change?
Comment 53 Juan Jose Pablos 2007-12-04 16:52:05 UTC
Oops. I never thought of looking at that page; I googled for RTL8211BL and could not find useful information on the first couple of pages... sorry for the hassle.

My 8211 change did not help the driver recognize the PHY chip. I am now trying without the #3 series, on top of 2.6.24-rc4. I cannot find why it is failing to identify this chip. Be a bit patient with me :-)
Just one question: are 001c and c912 the values of the 3rd and 4th basic MII PHY registers, as shown by mii-diag?
Comment 54 Francois Romieu 2007-12-05 14:33:54 UTC
Juan:
[...]
> Just one question: are 001c and c912 the values of the 3rd and 4th basic
> MII PHY registers, as shown by mii-diag?

Yes, as outlined in 7.2.3 p.23 and 7.2.4 p.24 of RTL8211B(L)_DataSheet_1.4.pdf.

-- 
Ueimor
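For reference, counting from register 0, the 3rd and 4th MII registers are the two PHY identifier words (MII_PHYSID1 = 2 and MII_PHYSID2 = 3 in <linux/mii.h>), whose layout is fixed by IEEE 802.3 clause 22. The minimal C sketch below splits them into their fields; the decode_phy_id() helper and struct phy_ident are hypothetical illustrations, not part of sis190.c.

```c
#include <stdint.h>

/* Fields of a clause 22 PHY identifier (MII registers 2 and 3):
 *   reg 2 (PHYSID1): OUI bits 3..18
 *   reg 3 (PHYSID2): bits 15..10 = OUI bits 19..24,
 *                    bits  9..4  = manufacturer's model number,
 *                    bits  3..0  = revision number */
struct phy_ident {
    uint32_t phy_id;   /* PHYSID1 in the high half, PHYSID2 in the low half */
    unsigned model;    /* 6-bit model number */
    unsigned revision; /* 4-bit revision number */
};

static struct phy_ident decode_phy_id(uint16_t physid1, uint16_t physid2)
{
    struct phy_ident p;

    p.phy_id = ((uint32_t)physid1 << 16) | physid2;
    p.model = (physid2 >> 4) & 0x3f;
    p.revision = physid2 & 0x0f;
    return p;
}
```

For the PHY reported above (001c:c912) this gives model 0x11 and revision 2; the revision nibble is exactly the part a 0xfff0 mask discards.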
Comment 55 Juan Jose Pablos 2007-12-06 02:43:08 UTC
Created attachment 13887 [details]
Removing 0xfff0 allows the PHY chip to be identified

I had to remove this condition to allow the transceiver to be identified.
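Judging from the attachment title, the lookup in sis190_init_phy() appears to compare the second ID word after masking it with 0xfff0, which drops the 4-bit revision field. Every other ID in the table quoted in comment 47 already ends in a zero nibble (60c0, bc70, f010, 0cc0, 8200), which would explain why the entry proposed in comment 51 with c912 never matched, and removing the mask would make each entry match only one exact revision. The minimal sketch below assumes that matching rule, using a hypothetical cut-down table (phy_table) and a hypothetical lookup_phy() helper. It shows that storing the RTL8211BL ID as 001c:c910 lets the masked comparison succeed without removing the mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down version of the sis190 mii_chip_table: only the
 * name and the two PHY ID words are kept (no LAN type, no feature flags). */
struct phy_entry {
    const char *name;
    uint16_t id[2];
};

static const struct phy_entry phy_table[] = {
    { "Broadcom PHY BCM5461",  { 0x0020, 0x60c0 } },
    { "Realtek PHY RTL8201",   { 0x0000, 0x8200 } },
    /* Stored with the revision nibble cleared: an entry written as 0xc912
     * could never match, since 0xc912 & 0xfff0 == 0xc910. */
    { "Realtek PHY RTL8211BL", { 0x001c, 0xc910 } },
    { NULL,                    { 0, 0 } }    /* terminator */
};

/* Assumed matching rule: PHYSID1 must be equal; PHYSID2 is compared with
 * its 4-bit revision field masked off, so any revision of a model matches. */
static const struct phy_entry *lookup_phy(uint16_t id1, uint16_t id2)
{
    const struct phy_entry *p;

    for (p = phy_table; p->name != NULL; p++) {
        if (p->id[0] == id1 && p->id[1] == (id2 & 0xfff0))
            return p;
    }
    return NULL;
}
```

Here lookup_phy(0x001c, 0xc912) finds the RTL8211BL entry, and so would a later revision such as 0xc915, while the mask stays in place for the other PHYs.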
Comment 56 Juan Jose Pablos 2007-12-06 02:50:27 UTC
OK,
I modified sis190_init_phy() so it detects the transceiver properly, but that does not change the behaviour. There is one little detail that bugs me now: this is connected to a 10Base router, but it said "eth0: RGMII mode."
I thought that this should be "MII mode."
Comment 57 Francois Romieu 2007-12-06 15:31:52 UTC
Juan:
[...]
> I thought that this should be "MII mode."

Not necessarily. We could fail the autonegotiation for unrelated reasons.

Can you send a patch of all your changes (on top of ...) as well as a complete
dmesg and a 'mii-tool -vv'?

-- 
Ueimor
Comment 58 Juan Jose Pablos 2007-12-06 16:26:53 UTC
Created attachment 13899 [details]
output of mii-diag -vvv
Comment 59 Juan Jose Pablos 2007-12-06 16:30:03 UTC
Created attachment 13900 [details]
dmesg with all the patches from test #3 and personal changes.
Comment 60 Juan Jose Pablos 2007-12-06 16:38:10 UTC
Created attachment 13901 [details]
ID for PHY RTL8211BL, bridge search update from #9467 and attachment 13887

Here are the changes I made: some from bug 9467, others from your suggestions, and the third one is a proposal so the PHY gets identified (I am not sure whether this one breaks something else).
Comment 61 Christopher Moore 2007-12-14 04:45:31 UTC
Created attachment 14026 [details]
Ethtool output 1

This is the first run of your patched ethtool. At this point the interface is up and running fine.
Comment 62 Christopher Moore 2007-12-14 04:48:04 UTC
Created attachment 14027 [details]
Ethtool output 2

This is the output of ethtool after a bit of data has traversed the interface - at this point all is well with the interface.
Comment 63 Christopher Moore 2007-12-14 04:50:13 UTC
Created attachment 14028 [details]
Ethtool output 3

An ftp "put" was set running and the cable was unplugged a few times. This is the output of ethtool before the NETDEV WATCHDOG kicked in. At this point the interface is unresponsive.
Comment 64 Christopher Moore 2007-12-14 04:53:37 UTC
Created attachment 14029 [details]
Ethtool output 4

This is the output of ethtool after the NETDEV WATCHDOG kicked in. The interface is still unresponsive. I waited a few minutes and it was still unresponsive, so I did ifdown/ifup; still no joy. I eventually had to do an ifdown eth0; rmmod sis190; modprobe sis190; ifup eth0. The interface then burst into life.

Apologies for the delay in getting back; I had to travel at short notice.
Comment 65 Marco Silva 2008-02-13 08:50:10 UTC
I just tested the latest patches on the Ubuntu Feisty kernel (2.6.20), with some minor changes so the patches would work and compile.

All works fine, PC does not hang, I can remove the cable and re-connect and connection works.

Just have 1 problem, when I try to rmmod sis190 the pc goes crazy.

Good work Francois and all, this "damn" card is almost working great :)
Comment 66 Marco Silva 2008-02-13 08:52:51 UTC
Created attachment 14794 [details]
sis190.c for 2.6.20 Kernel (Ubuntu Feisty)

This sis190.c has the patches from Francois (#3 Series) with minor changes to compile in the 2.6.20 kernel. 

It works, but when I rmmod sis190 the PC goes nuts. I haven't looked into the reason yet; no time.
Comment 67 Juan Jose Pablos 2008-03-23 16:42:23 UTC
I am testing right now with kernel 2.6.25-rc6, and some of the #3 series patches cannot be applied to the kernel anymore. I am seeing errors when transferring files using the cifs module, but I do not know how to report back with useful information. Please provide some status and a bit of info on how to help :-)
Comment 68 Francois Romieu 2008-04-27 09:08:33 UTC
Comment on attachment 13749 [details]
remove duplicate INIT_WORK (#3)

The patch is included in 2.6.24.
Comment 69 Francois Romieu 2008-04-27 09:09:51 UTC
Comment on attachment 13751 [details]
scheduling while atomic fix (#3)

The patch is included in 2.6.24.
Comment 70 Francois Romieu 2008-04-27 09:10:29 UTC
Comment on attachment 13750 [details]
mdio operation failure is not correctly checked (#3)

The patch is included in 2.6.24.
Comment 71 Alan 2009-03-24 10:34:42 UTC
Is anything outstanding on this or can I close it ?
