Bug 5025 - Excessive time to connect to cable provider using dhcpcd
Summary: Excessive time to connect to cable provider using dhcpcd
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network (show other bugs)
Hardware: i386 Linux
: P2 low
Assignee: Francois Romieu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-08 12:48 UTC by Frank Peters
Modified: 2005-09-09 11:30 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.13-rc6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Output of lspci -vx and dmsg for 2.6.13-rc6 and 2.6.11.7 (24.56 KB, text/plain)
2005-08-08 15:16 UTC, Frank Peters
Details
dhcpcd tcpdump for kernel-2.6.11.7 (4.28 KB, text/plain)
2005-08-08 22:07 UTC, Frank Peters
Details
dhcpcd tcpdump for kernel-2.6.13-rc6 (63.13 KB, text/plain)
2005-08-08 22:08 UTC, Frank Peters
Details
dhcpcd tcpdump for kernel-2.6.11.7 (4.28 KB, application/octet-stream)
2005-08-08 22:13 UTC, Frank Peters
Details
dhcpcd tcpdump for kernel-2.6.13rc6 (63.13 KB, application/octet-stream)
2005-08-08 22:14 UTC, Frank Peters
Details
strace of dhcpcd on kernel-2.6.13-rc6 (159.12 KB, text/plain)
2005-08-09 09:07 UTC, Frank Peters
Details
strace of dhcpcd on kernel-2.6.11.7 (8.72 KB, text/plain)
2005-08-09 09:08 UTC, Frank Peters
Details
modprobe 3c59x debug=3 kernel log (2.66 KB, text/plain)
2005-08-09 17:29 UTC, Frank Peters
Details

Description Frank Peters 2005-08-08 12:48:40 UTC
Most recent kernel where this bug did not occur: 2.6.11.7 (I don't know about
2.6.11 versions beyond 2.6.11.7.  I moved directly from 2.6.11.7 to 2.6.12.)    
Distribution: None -- self-compiled system
Hardware Environment: Asus P4B533 motherboard with 3Com Etherlink 3C900B NIC
Software Environment: Standard stock packages 
Problem Description: On all kernels prior to 2.6.12 (see above) a connection
could be established to my Internet cable provider in about 10 seconds.  Since
2.6.12, and currently with 2.6.13-rc6, the connection time is about four
minutes.  No changes whatsoever were made to my hardware or software.
Once the connection is finally established, the eth0 link functions normally.
Also, no errors/warnings or any other anomolous information is present in
any log files.
Steps to reproduce: After kernel boot-up, load eth0 module with 'modprobe 3c59x'
and then connect to cable provider using 'dhcpcd -t 240'.  It requires close
to four minutes to connect.
Comment 1 Francois Romieu 2005-08-08 13:47:20 UTC
Can you :
- send the output of a 'lspci -vx' under 2.6.13-rc6;
- send the dmesg from boot to the loading of the module under both 2.6.13-rc6
and 2.6.11.7
- try to revert :
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c8fad1829cc33e903500b41d989fa50ab196378;hp=1922163c8dfe717c089bdcc18ade4a65350a09c8

--
Ueimor
Comment 2 Andrew Morton 2005-08-08 14:40:36 UTC
bugme-daemon@kernel-bugs.osdl.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=5025
> 
>            Summary: Excessive time to connect to cable provider using dhcpcd
>     Kernel Version: 2.6.13-rc6
>             Status: NEW
>           Severity: low
>              Owner: greg@kroah.com
>          Submitter: frank.peters@comcast.net
> 
> 
> Most recent kernel where this bug did not occur: 2.6.11.7 (I don't know about
> 2.6.11 versions beyond 2.6.11.7.  I moved directly from 2.6.11.7 to 2.6.12.)    
> Distribution: None -- self-compiled system
> Hardware Environment: Asus P4B533 motherboard with 3Com Etherlink 3C900B NIC
> Software Environment: Standard stock packages 
> Problem Description: On all kernels prior to 2.6.12 (see above) a connection
> could be established to my Internet cable provider in about 10 seconds.  Since
> 2.6.12, and currently with 2.6.13-rc6, the connection time is about four
> minutes.  No changes whatsoever were made to my hardware or software.
> Once the connection is finally established, the eth0 link functions normally.
> Also, no errors/warnings or any other anomolous information is present in
> any log files.
> Steps to reproduce: After kernel boot-up, load eth0 module with 'modprobe 3c59x'
> and then connect to cable provider using 'dhcpcd -t 240'.  It requires close
> to four minutes to connect.
> 

IIRC, we established that this was not a device driver problem and we're
suspecting a generic networking problem.

Could someone please talk Frank through generating a suitable tcpdump trace?

Comment 3 Frank Peters 2005-08-08 15:03:58 UTC
On Mon, 8 Aug 2005 13:47:22 -0700
bugme-daemon@kernel-bugs.osdl.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=5025
>
> Can you :
> - send the output of a 'lspci -vx' under 2.6.13-rc6;
> - send the dmesg from boot to the loading of the module under both 2.6.13-rc6
> and 2.6.11.7
>

All this is attached as lspci-dmsg.txt.


>
> - try to revert :
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c8fad1829cc33e903500b41d989fa50ab196378;hp=1922163c8dfe717c089bdcc18ade4a65350a09c8
> 
> --

I'm not sure what this is about.  I'll check into it later.


> IIRC, we established that this was not a device driver problem and we're
> suspecting a generic networking problem.
> 
> Could someone please talk Frank through generating a suitable tcpdump trace?
> 

I'll start doing some research into tcpdump tracing and hopefull will
be able to apply to apply it soon.

Frank Peters
Comment 4 Francois Romieu 2005-08-08 15:11:46 UTC
> Could someone please talk Frank through generating a suitable tcpdump trace?

Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6:
tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz
Stop it when the link is up and attach both /tmp/quuz to the bug report.

Thanks.

--
Ueimor

Comment 5 Frank Peters 2005-08-08 15:16:37 UTC
Created attachment 5551 [details]
Output of lspci -vx and dmsg for 2.6.13-rc6 and 2.6.11.7
Comment 6 Frank Peters 2005-08-08 15:58:09 UTC
On Mon, 8 Aug 2005 15:11:51 -0700
bugme-daemon@kernel-bugs.osdl.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=5025
> 
> 
> Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6:
> tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz
> Stop it when the link is up and attach both /tmp/quuz to the bug report.
> 

That doesn't seem to be possible.  Running tcpdump before dhcpcd only
gives me a "Network is down" message.

I use a non-networked desktop system.  Only when I need to establish an
Internet connection do I invoke dhcpcd which sets up the eth0 interface.
Before then the eth0 interface does not exist.

Frank Peters

Comment 7 Frank Peters 2005-08-08 16:59:04 UTC
On Mon, 8 Aug 2005 15:11:51 -0700
bugme-daemon@kernel-bugs.osdl.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=5025
> 
> 
> Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6:
> tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz
> Stop it when the link is up and attach both /tmp/quuz to the bug report.
> 

Issuing tcpdump before dhcpcd does not work, but what if I use the
following bash script to execute tcpdump immediately after dhcpcd:

#! /bin/bash

dhcpcd -t 240 &
tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz-2.6.13-rc6

Then as soon as 'netstat -nr' shows that the link is established,
I would manually terminate tcpdump.

Would this give you the information that you need?

Frank Peters

Comment 8 Frank Peters 2005-08-08 22:05:07 UTC
On Mon, 8 Aug 2005 15:11:51 -0700
bugme-daemon@kernel-bugs.osdl.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=5025
> 
> 
> Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6:
> tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz
> Stop it when the link is up and attach both /tmp/quuz to the bug report.
> 

I found a way to use tcpdump even with no eth0 addresses already
established. Here is the method I used:

ifconfig eth0 up
tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz
dhcpcd -t 240

This allows listening on the eth0 interface without any address
assigned to it.  Tcpdump is then invoked, followed by dhcpcd.

If this is acceptable, I have attached files of this packet
capture method for both kernels 2.6.11.7 and 2.6.13-rc6.

The attached files are quuz-26117 and quuz-2613rc6.

Frank Peters

Comment 9 Frank Peters 2005-08-08 22:07:18 UTC
Created attachment 5552 [details]
dhcpcd tcpdump for kernel-2.6.11.7
Comment 10 Frank Peters 2005-08-08 22:08:51 UTC
Created attachment 5553 [details]
dhcpcd tcpdump for kernel-2.6.13-rc6
Comment 11 Frank Peters 2005-08-08 22:13:14 UTC
Created attachment 5554 [details]
dhcpcd tcpdump for kernel-2.6.11.7
Comment 12 Frank Peters 2005-08-08 22:14:26 UTC
Created attachment 5555 [details]
dhcpcd tcpdump for kernel-2.6.13rc6
Comment 13 Frank Peters 2005-08-08 22:28:38 UTC
http://bugzilla.kernel.org/show_bug.cgi?id=5025

I am terribly sorry.  I posted the same attachments twice as both
text/plain and application/octet stream.  There is no difference
between the two forms, however.  They both are produced by tcpdump
and both can be loaded into a program like ethereal.

Also, my attempts to obsolete the plain/text entries were not
accurate. 

This is my first use of bugzilla and I need to get used to
the options.

Frank Peters

Comment 14 Frank Peters 2005-08-08 22:49:29 UTC
http://bugzilla.kernel.org/show_bug.cgi?id=5025

Looking at the tcpdump files in Ethereal, a network analyzer,
I see only a long delay in the case of 2.6.13-rc6 before the
actual dhcp negotiation begins, whereas with 2.6.11.7 it begins
immediately.  With 2.6.13rc5, during the long delay there are no
packets of any kind except the ARP packets originating from the
Cisco hardware.

Using the debugging option with dhcpcd, the same long delay
with no activity is also evident when using 2.6.13-rc6.

If this were a general networking problem, wouldn't there be
some kind of packet exchange originating from the local machine
right from the beginning?

Frank Peters

Comment 15 Frank Peters 2005-08-09 09:07:22 UTC
Created attachment 5562 [details]
strace of dhcpcd on kernel-2.6.13-rc6
Comment 16 Frank Peters 2005-08-09 09:08:38 UTC
Created attachment 5563 [details]
strace of dhcpcd on kernel-2.6.11.7
Comment 17 Frank Peters 2005-08-09 09:09:56 UTC
http://bugzilla.kernel.org/show_bug.cgi?id=5025

Attached are the results of strace done with the the dhcpcd
command for both kernel-2.6.13-rc6 and kernel-2.6.11.7.

The files are:

strace-2613rc6

strace-26117

The only diference I can immediately see between the two traces
is a great deal of time spent invoking the gettimeofday() function
with kernel-2.6.13rc6.

I hope this can shed some light on the problem.

Frank Peters

Comment 18 Francois Romieu 2005-08-09 16:13:15 UTC
The symptom suggest that the early Tx (Rx ?) packets are lost somewhere.
I see no change in the driver which could explain it. However you enabled ACPI
in 2.6.13-rc6 whereas it is disabled in 2.6.11.7.

So... How does 2.6.13-rc behave if you disable ACPI and/or use the same boot
command line options ?

--
Ueimor
Comment 19 Frank Peters 2005-08-09 16:37:40 UTC
On Tue, 9 Aug 2005 16:13:16 -0700
bugme-daemon@kernel-bugs.osdl.org wrote:

> 
> So... How does 2.6.13-rc behave if you disable ACPI and/or use the same boot
> command line options ?
> 

The same behavior is seen with ACPI disabled.  If I boot 2.6.13-rc6
with the option "acpi=off" the same slow start-up with dhcpcd results.
ACPI doesn't seem to matter.

Frank Peters

Comment 20 Frank Peters 2005-08-09 17:29:05 UTC
Created attachment 5567 [details]
modprobe 3c59x debug=3 kernel log
Comment 21 Frank Peters 2005-08-09 17:35:30 UTC
On Wed, 10 Aug 2005 00:39:54 +0200
Tommy Christensen <tommy.christensen@tpack.net> wrote:

> 
> ... could you please try enabling some
> debug output from the network driver, and attach the output
> to the report?  (Only the 2.6.13-rc6 kernel is relevant here).
> 
> # modprobe 3c59x debug=3
> should do it.
> 

The kernel-2.6.13-rc6 log for this module loading 
followed immediately by dhcpcd is attached as 3c59x-debug.


> Another interesting experiment would be to add a delay before
> calling dhcpcd, but after ifconfig eth0 up.
> 
> If you do:
> # modprobe 3c59x
> # ifconfig eth0 up
> # sleep 120
> # dhcpcd -t 240
> 
> does the dhcpcd command complete as fast as it did on the
> 2.6.11.7 kernel?
> 

Success!  The dhcpcd command completes with no delay at all.
I just added the command 'echo "start"' to get a fix on
when the dhcpcd process would begin.  There was no noticable
delay whatsoever.

Interestingly, I noticed the same thing with kernel-2.6.11.7
while trying to get a tcpdump.  The dhcpcd process finished
without any delay after first issuing "ifconfig eth0 up" when
otherwise it would require about 10 seconds.  Maybe this problem
is not so unique to 2.6.12+.

Frank Peters
 

Comment 22 Frank Peters 2005-08-10 19:31:51 UTC
On Thu, 11 Aug 2005 02:08:05 +0200
Tommy Christensen <tommy.christensen@tpack.net> wrote:

> 
> Recent kernels (>= 2.6.12) won't send packets to a network
> interface unless the carrier is OK.
> 
> The odd thing here is that the driver waits for 60 seconds
> between each check for carrier. That seems rather long.
> Coupled with the exponential back-off of the dhcp process, this
> leads to the long delays you're experiencing.
> 
> On top of that, it seems that link actually *is* established,
> when the phy status register is checked at first. However, due
> to the link bit beeing latched low, the driver reports link down
> here. When the register is read again (60s later!) everything
> falls into place and packets can start flowing.
> 
> The patch below avoids this "false" link down condition by
> re-reading the phy status register. Could you give it a try?
> 

Success again!  The patch fixes the problem.  The dhcpcd
now establishes a connection within 10 seconds.

Many thanks to all concerned for your time and effort
on this rather low priority issue.

Frank Peters


[3c59x-carrier.patch  text/plain (622B)]
[3c59x] Read current link status from phy

The phy status register must be read twice in order to
get the actual link state.

Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net>


--- linux-2.6.13-rc6/drivers/net/3c59x.c-old	Sun Aug  7 20:18:56 2005
+++ linux-2.6.13-rc6/drivers/net/3c59x.c	Thu Aug 11 01:59:13 2005
@@ -1873,6 +1873,7 @@ vortex_timer(unsigned long data)
 		{
 			spin_lock_bh(&vp->lock);
 			mii_status = mdio_read(dev, vp->phys[0], 1);
+			mii_status = mdio_read(dev, vp->phys[0], 1);
 			ok = 1;
 			if (vortex_debug > 2)
 				printk(KERN_DEBUG "%s: MII transceiver has status %4.4x.\n",

Comment 23 Frank Peters 2005-09-09 06:47:12 UTC
For me, the problem is fixed.  Using the patch supplied by 
Tommy Christensen eliminates the problem.

What happens now?  Who is supposed to change the status of
this report?

Will the kernel driver eventually include this patch or
should I just keep manually modifying the code with each
new kernel release?

Frank Peters

Comment 24 Andrew Morton 2005-09-09 11:30:28 UTC
Tommy's patch is in Linus's post-2.6.13 tree.

Note You need to log in before you can comment on or make changes to this bug.