Bug 26632 - rtl8169 slow to come up and not allowing telnetd-ssl negotiation
Summary: rtl8169 slow to come up and not allowing telnetd-ssl negotiation
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 high
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-13 09:53 UTC by Arthur Marsh
Modified: 2011-01-20 01:46 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.37-git9
Subsystem:
Regression: Yes
Bisected commit-id:



Description Arthur Marsh 2011-01-13 09:53:22 UTC
I have an ASUS M3A78 Pro motherboard (AMD64, RS780 chipset) with onboard Realtek 8169. Normally the ethernet connection comes up quickly, including with kernel 2.6.37-git7. With kernel 2.6.37-git9, I get the following in dmesg:

$ dmesg|grep eth0
[    0.878911] r8169 0000:02:00.0: eth0: RTL8168c/8111c at 0xffffc9000007a000, 00:23:54:5e:5d:d5, XID 1c4000c0 IRQ 40
[   11.156602] r8169 0000:02:00.0: eth0: link down
[   11.156639] r8169 0000:02:00.0: eth0: link down
[   11.157333] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   12.780948] r8169 0000:02:00.0: eth0: link up
[   12.783051] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

At this stage I can connect to the internet with HTTP/HTTPS but a telnet-ssl <address-of-this-machine> or telnet-ssl from another machine to this one fails at the telnet negotiation stage. telnet-ssl from this machine to another works.
Comment 1 Arthur Marsh 2011-01-13 13:59:50 UTC
This appears to be a more general problem with inbound telnetd-ssl sessions hanging when addressed via the IP address of the Ethernet card, but not affecting the localhost interface:

telnet 192.168.1.100
 Trying 192.168.1.100... (OK)
 Negotiations...............................
*************************
The Telnet server is not sending required responses.

?Telnet waiting for response to WILL TERMINAL-TYPE
?Telnet waiting for response to WILL NAWS
?Telnet waiting for response to WILL AUTHENTICATION
?Telnet waiting for response to WILL NEW-ENVIRONMENT
?Telnet waiting for response to WILL COM-PORT-CONTROL

You can continue to wait or you can cancel with Ctrl-C.
In case the Telnet server never responds as required,
you can try connecting to this host with TELNET /NOWAIT.
Use SET HINTS OFF to suppress further hints.
*************************
.......................................^CClosing 192.168.1.100...OK

With the local machine running kernel 2.6.37-git9, connecting via the machine's own IP address fails as shown above, but connecting via localhost succeeds:

C-Kermit>telnet localhost
 DNS Lookup...  Trying 127.0.0.1...  Reverse DNS Lookup... (OK)
Authenticating with SSL

Password:
Last login: Thu Jan 13 20:21:21 CST 2011 from 192.168.1.101 on pts/0
Linux victoria 2.6.37-git9 #2 SMP Thu Jan 13 21:16:17 CST 2011 i686

I've tried this on both machines here, a Pentium-II with a Realtek 8139 ethernet card and an AMD-64 machine with a Realtek 8169 ethernet card, and I observe the same symptoms on each.
Comment 2 Arthur Marsh 2011-01-15 14:18:12 UTC
I'm currently trying a git-bisect on Linus' kernel tree to identify when this bug first appeared.
Comment 3 Arthur Marsh 2011-01-15 23:06:34 UTC
The git-bisect returned:

0ab03c2b1478f2438d2c80204f7fef65b1bca9cf is the first bad commit
commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Fri Jan 7 03:15:05 2011 +0000

    netlink: test for all flags of the NLM_F_DUMP composite

    Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
    when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
    being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
    non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

    Substitute the condition to test for _all_ bits being set.

    Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
    Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 1a0717ab0c87787309c3c3af88d666b44f327f64 cba6279de85b7ebeaf21f19f1d93b59468fdd01d M      net

The problem was not limited to telnet: there were also warning messages at system shutdown about daemons not returning status.

I tried git cherry-pick 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf and verified that the resulting kernel had these problems, then git revert 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf and verified that the resulting kernel did *not* have problems.
Comment 4 Andrew Morton 2011-01-19 22:59:30 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 13 Jan 2011 09:53:26 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=26632
> 
>            Summary: rtl8169 slow to come up and not allowing telnetd-ssl
>                     negotiation
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.37-git9
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: arthur.marsh@internode.on.net
>                 CC: arthur.marsh@internode.on.net
>         Regression: Yes
> 
> 
> I have an ASUS M3A78 Pro motherboard (AMD64, RS780 chipset) with onboard
> Realtek 8169. Normally the ethernet connection comes up quickly, including
> with
> kernel 2.6.37-git7. With kernel 2.6.37-git9, I get the following in dmesg:
> 
> $ dmesg|grep eth0
> [    0.878911] r8169 0000:02:00.0: eth0: RTL8168c/8111c at
> 0xffffc9000007a000,
> 00:23:54:5e:5d:d5, XID 1c4000c0 IRQ 40
> [   11.156602] r8169 0000:02:00.0: eth0: link down
> [   11.156639] r8169 0000:02:00.0: eth0: link down
> [   11.157333] ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   12.780948] r8169 0000:02:00.0: eth0: link up
> [   12.783051] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> 
> At this stage I can connect to the internet with HTTP/HTTPS but a telnet-ssl
> <address-of-this-machine> or telnet-ssl from another machine to this one
> fails
> at the telnet negotiation stage. telnet-ssl from this machine to another
> works.
> 

This regression was bisected to

0ab03c2b1478f2438d2c80204f7fef65b1bca9cf is the first bad commit
commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Fri Jan 7 03:15:05 2011 +0000

    netlink: test for all flags of the NLM_F_DUMP composite
Comment 5 David S. Miller 2011-01-20 01:03:55 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 19 Jan 2011 14:59:04 -0800

> This regression was bisected to
> 
> 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf is the first bad commit
> commit 0ab03c2b1478f2438d2c80204f7fef65b1bca9cf
> Author: Jan Engelhardt <jengelh@medozas.de>
> Date:   Fri Jan 7 03:15:05 2011 +0000
> 
>     netlink: test for all flags of the NLM_F_DUMP composite

Which is being reverted.
Comment 6 Anonymous Emailer 2011-01-20 01:06:53 UTC
Reply-To: jengelh@medozas.de

On Wednesday 2011-01-19 23:59, Andrew Morton wrote:

>(switched to email.  Please respond via emailed reply-to-all, not via the
>bugzilla web interface).
>
>On Thu, 13 Jan 2011 09:53:26 GMT
>bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=26632
>> 
>>            Summary: rtl8169 slow to come up and not allowing telnetd-ssl
>>                     negotiation
>
>This regression was bisected to
>
>0ab03c2b1478f2438d2c80204f7fef65b1bca9cf is the first bad commit
>    netlink: test for all flags of the NLM_F_DUMP composite

Addressed by dave/net-next-2.6 b8f3ab4290f1e720166e888ea2a1d1d44c4d15dd
for now while it is being figured out what to really do.
