Bug 5025
Summary: | Excessive time to connect to cable provider using dhcpcd | ||
---|---|---|---|
Product: | Drivers | Reporter: | Frank Peters (frank.peters) |
Component: | Network | Assignee: | Francois Romieu (romieu) |
Status: | RESOLVED CODE_FIX | ||
Severity: | low | CC: | akpm, romieu |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13-rc6 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
Output of lspci -vx and dmsg for 2.6.13-rc6 and 2.6.11.7
dhcpcd tcpdump for kernel-2.6.11.7 dhcpcd tcpdump for kernel-2.6.13-rc6 dhcpcd tcpdump for kernel-2.6.11.7 dhcpcd tcpdump for kernel-2.6.13rc6 strace of dhcpcd on kernel-2.6.13-rc6 strace of dhcpcd on kernel-2.6.11.7 modprobe 3c59x debug=3 kernel log |
Description
Frank Peters
2005-08-08 12:48:40 UTC
Can you : - send the output of a 'lspci -vx' under 2.6.13-rc6; - send the dmesg from boot to the loading of the module under both 2.6.13-rc6 and 2.6.11.7 - try to revert : http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c8fad1829cc33e903500b41d989fa50ab196378;hp=1922163c8dfe717c089bdcc18ade4a65350a09c8 -- Ueimor bugme-daemon@kernel-bugs.osdl.org wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=5025 > > Summary: Excessive time to connect to cable provider using dhcpcd > Kernel Version: 2.6.13-rc6 > Status: NEW > Severity: low > Owner: greg@kroah.com > Submitter: frank.peters@comcast.net > > > Most recent kernel where this bug did not occur: 2.6.11.7 (I don't know about > 2.6.11 versions beyond 2.6.11.7. I moved directly from 2.6.11.7 to 2.6.12.) > Distribution: None -- self-compiled system > Hardware Environment: Asus P4B533 motherboard with 3Com Etherlink 3C900B NIC > Software Environment: Standard stock packages > Problem Description: On all kernels prior to 2.6.12 (see above) a connection > could be established to my Internet cable provider in about 10 seconds. Since > 2.6.12, and currently with 2.6.13-rc6, the connection time is about four > minutes. No changes whatsoever were made to my hardware or software. > Once the connection is finally established, the eth0 link functions normally. > Also, no errors/warnings or any other anomolous information is present in > any log files. > Steps to reproduce: After kernel boot-up, load eth0 module with 'modprobe 3c59x' > and then connect to cable provider using 'dhcpcd -t 240'. It requires close > to four minutes to connect. > IIRC, we established that this was not a device driver problem and we're suspecting a generic networking problem. Could someone please talk Frank through generating a suitable tcpdump trace? On Mon, 8 Aug 2005 13:47:22 -0700 bugme-daemon@kernel-bugs.osdl.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=5025 > > Can you : > - send the output of a 'lspci -vx' under 2.6.13-rc6; > - send the dmesg from boot to the loading of the module under both 2.6.13-rc6 > and 2.6.11.7 > All this is attached as lspci-dmsg.txt. > > - try to revert : > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c8fad1829cc33e903500b41d989fa50ab196378;hp=1922163c8dfe717c089bdcc18ade4a65350a09c8 > > -- I'm not sure what this is about. I'll check into it later. > IIRC, we established that this was not a device driver problem and we're > suspecting a generic networking problem. > > Could someone please talk Frank through generating a suitable tcpdump trace? > I'll start doing some research into tcpdump tracing and hopefull will be able to apply to apply it soon. Frank Peters > Could someone please talk Frank through generating a suitable tcpdump trace?
Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6:
tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz
Stop it when the link is up and attach both /tmp/quuz to the bug report.
Thanks.
--
Ueimor
Created attachment 5551 [details]
Output of lspci -vx and dmsg for 2.6.13-rc6 and 2.6.11.7
On Mon, 8 Aug 2005 15:11:51 -0700 bugme-daemon@kernel-bugs.osdl.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=5025 > > > Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6: > tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz > Stop it when the link is up and attach both /tmp/quuz to the bug report. > That doesn't seem to be possible. Running tcpdump before dhcpcd only gives me a "Network is down" message. I use a non-networked desktop system. Only when I need to establish an Internet connection do I invoke dhcpcd which sets up the eth0 interface. Before then the eth0 interface does not exist. Frank Peters On Mon, 8 Aug 2005 15:11:51 -0700 bugme-daemon@kernel-bugs.osdl.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=5025 > > > Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6: > tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz > Stop it when the link is up and attach both /tmp/quuz to the bug report. > Issuing tcpdump before dhcpcd does not work, but what if I use the following bash script to execute tcpdump immediately after dhcpcd: #! /bin/bash dhcpcd -t 240 & tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz-2.6.13-rc6 Then as soon as 'netstat -nr' shows that the link is established, I would manually terminate tcpdump. Would this give you the information that you need? Frank Peters On Mon, 8 Aug 2005 15:11:51 -0700 bugme-daemon@kernel-bugs.osdl.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=5025 > > > Frank, before the dhcpcd command, can you issue for both 2.6.11.7 and 2.6.13-rc6: > tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz > Stop it when the link is up and attach both /tmp/quuz to the bug report. > I found a way to use tcpdump even with no eth0 addresses already established. Here is the method I used: ifconfig eth0 up tcpdump -v -s 0 -i eth0 -l -n -U -w /tmp/quuz dhcpcd -t 240 This allows listening on the eth0 interface without any address assigned to it. Tcpdump is then invoked, followed by dhcpcd. If this is acceptable, I have attached files of this packet capture method for both kernels 2.6.11.7 and 2.6.13-rc6. The attached files are quuz-26117 and quuz-2613rc6. Frank Peters Created attachment 5552 [details]
dhcpcd tcpdump for kernel-2.6.11.7
Created attachment 5553 [details]
dhcpcd tcpdump for kernel-2.6.13-rc6
Created attachment 5554 [details]
dhcpcd tcpdump for kernel-2.6.11.7
Created attachment 5555 [details]
dhcpcd tcpdump for kernel-2.6.13rc6
http://bugzilla.kernel.org/show_bug.cgi?id=5025 I am terribly sorry. I posted the same attachments twice as both text/plain and application/octet stream. There is no difference between the two forms, however. They both are produced by tcpdump and both can be loaded into a program like ethereal. Also, my attempts to obsolete the plain/text entries were not accurate. This is my first use of bugzilla and I need to get used to the options. Frank Peters http://bugzilla.kernel.org/show_bug.cgi?id=5025 Looking at the tcpdump files in Ethereal, a network analyzer, I see only a long delay in the case of 2.6.13-rc6 before the actual dhcp negotiation begins, whereas with 2.6.11.7 it begins immediately. With 2.6.13rc5, during the long delay there are no packets of any kind except the ARP packets originating from the Cisco hardware. Using the debugging option with dhcpcd, the same long delay with no activity is also evident when using 2.6.13-rc6. If this were a general networking problem, wouldn't there be some kind of packet exchange originating from the local machine right from the beginning? Frank Peters Created attachment 5562 [details]
strace of dhcpcd on kernel-2.6.13-rc6
Created attachment 5563 [details]
strace of dhcpcd on kernel-2.6.11.7
http://bugzilla.kernel.org/show_bug.cgi?id=5025 Attached are the results of strace done with the the dhcpcd command for both kernel-2.6.13-rc6 and kernel-2.6.11.7. The files are: strace-2613rc6 strace-26117 The only diference I can immediately see between the two traces is a great deal of time spent invoking the gettimeofday() function with kernel-2.6.13rc6. I hope this can shed some light on the problem. Frank Peters The symptom suggest that the early Tx (Rx ?) packets are lost somewhere. I see no change in the driver which could explain it. However you enabled ACPI in 2.6.13-rc6 whereas it is disabled in 2.6.11.7. So... How does 2.6.13-rc behave if you disable ACPI and/or use the same boot command line options ? -- Ueimor On Tue, 9 Aug 2005 16:13:16 -0700 bugme-daemon@kernel-bugs.osdl.org wrote: > > So... How does 2.6.13-rc behave if you disable ACPI and/or use the same boot > command line options ? > The same behavior is seen with ACPI disabled. If I boot 2.6.13-rc6 with the option "acpi=off" the same slow start-up with dhcpcd results. ACPI doesn't seem to matter. Frank Peters Created attachment 5567 [details]
modprobe 3c59x debug=3 kernel log
On Wed, 10 Aug 2005 00:39:54 +0200 Tommy Christensen <tommy.christensen@tpack.net> wrote: > > ... could you please try enabling some > debug output from the network driver, and attach the output > to the report? (Only the 2.6.13-rc6 kernel is relevant here). > > # modprobe 3c59x debug=3 > should do it. > The kernel-2.6.13-rc6 log for this module loading followed immediately by dhcpcd is attached as 3c59x-debug. > Another interesting experiment would be to add a delay before > calling dhcpcd, but after ifconfig eth0 up. > > If you do: > # modprobe 3c59x > # ifconfig eth0 up > # sleep 120 > # dhcpcd -t 240 > > does the dhcpcd command complete as fast as it did on the > 2.6.11.7 kernel? > Success! The dhcpcd command completes with no delay at all. I just added the command 'echo "start"' to get a fix on when the dhcpcd process would begin. There was no noticable delay whatsoever. Interestingly, I noticed the same thing with kernel-2.6.11.7 while trying to get a tcpdump. The dhcpcd process finished without any delay after first issuing "ifconfig eth0 up" when otherwise it would require about 10 seconds. Maybe this problem is not so unique to 2.6.12+. Frank Peters On Thu, 11 Aug 2005 02:08:05 +0200 Tommy Christensen <tommy.christensen@tpack.net> wrote: > > Recent kernels (>= 2.6.12) won't send packets to a network > interface unless the carrier is OK. > > The odd thing here is that the driver waits for 60 seconds > between each check for carrier. That seems rather long. > Coupled with the exponential back-off of the dhcp process, this > leads to the long delays you're experiencing. > > On top of that, it seems that link actually *is* established, > when the phy status register is checked at first. However, due > to the link bit beeing latched low, the driver reports link down > here. When the register is read again (60s later!) everything > falls into place and packets can start flowing. > > The patch below avoids this "false" link down condition by > re-reading the phy status register. Could you give it a try? > Success again! The patch fixes the problem. The dhcpcd now establishes a connection within 10 seconds. Many thanks to all concerned for your time and effort on this rather low priority issue. Frank Peters [3c59x-carrier.patch text/plain (622B)] [3c59x] Read current link status from phy The phy status register must be read twice in order to get the actual link state. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> --- linux-2.6.13-rc6/drivers/net/3c59x.c-old Sun Aug 7 20:18:56 2005 +++ linux-2.6.13-rc6/drivers/net/3c59x.c Thu Aug 11 01:59:13 2005 @@ -1873,6 +1873,7 @@ vortex_timer(unsigned long data) { spin_lock_bh(&vp->lock); mii_status = mdio_read(dev, vp->phys[0], 1); + mii_status = mdio_read(dev, vp->phys[0], 1); ok = 1; if (vortex_debug > 2) printk(KERN_DEBUG "%s: MII transceiver has status %4.4x.\n", For me, the problem is fixed. Using the patch supplied by Tommy Christensen eliminates the problem. What happens now? Who is supposed to change the status of this report? Will the kernel driver eventually include this patch or should I just keep manually modifying the code with each new kernel release? Frank Peters Tommy's patch is in Linus's post-2.6.13 tree. |