Bug 16187
Summary: | Carrier detection failed in dhcpcd when link is up | ||
---|---|---|---|
Product: | Networking | Reporter: | Christian Casteyde (casteyde.christian) |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | CLOSED INVALID | ||
Severity: | normal | CC: | andreas.sturmlechner, bruce.w.allan, davem, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.35-rc2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 16055 | ||
Attachments: |
lspci output for 2.6.34 on my computer
git bisect log |
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Sat, 12 Jun 2010 15:15:31 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16187 > > Summary: Carrier detection failed in dhcpcd when link is up > Product: Networking > Version: 2.5 > Kernel Version: 2.6.35-rc2 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: casteyde.christian@free.fr > Regression: Yes > > > Created an attachment (id=26742) > --> (https://bugzilla.kernel.org/attachment.cgi?id=26742) > lspci output for 2.6.34 on my computer > > Kernel at least 2.6.35-rc2, 2.6.34 works fine Seems to be post-2.6.34 breakage in the tulip driver. > Athlon X2 3000 in 64bits mode > Slackware 13.1 > "Ethernet controller: ALi Corporation ULi 1689,1573 integrated ethernet. (rev > 40)" (from lspci) > > Since 2.6.35-rc2 (didn't checked -rc1), dhcpcd hangs fails to detect carrier > appearance at boot. > > The Slackware network script uses dhcpcd to bring DHCP interfaces up. At > boot, > it seems my network device doesn't have the link up immediatly. dhcpcd tries > to > bring it up, and wait for the carrier. The carrier indeed goes up, but dhcpcd > gets absolutly no notification of that, and therefore times out. > > logs says that: > Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net > 127.0.0.0 netmask 255.0.0.0 lo > Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10 eth0 > Jun 12 16:53:58 sirius dhcpcd: version 5.2.2 starting > Jun 12 16:53:58 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready > Jun 12 16:53:58 sirius dhcpcd: eth0: waiting for carrier > Jun 12 16:54:01 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full > duplex > Jun 12 16:54:01 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes > ready > then dhcpcd times out a little later (10 seconds as -t 10 is specified). > > If I use the -K option (no carrier detection), and does ifconfig eth0 up just > before issueing the dhcpcd command, dhcpcd doesn't wait for the carrier and > gets a lease correctly. > > Reversely, with the 2.6.34 kernel, dhcpcd indeed gets the link up > notification, > and gets the lease immediatly. In this case, the log says: > > Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net > 127.0.0.0 netmask 255.0.0.0 lo > Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10 eth0 > Jun 12 17:04:22 sirius dhcpcd: version 5.2.2 starting > Jun 12 17:04:22 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready > Jun 12 17:04:22 sirius dhcpcd: eth0: waiting for carrier > Jun 12 17:04:25 sirius dhcpcd: eth0: carrier acquired > Jun 12 17:04:25 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full > duplex > Jun 12 17:04:25 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes > ready > Jun 12 17:04:25 sirius dhcpcd: eth0: broadcasting for a lease > Jun 12 17:04:29 sirius dhcpcd: eth0: offered 192.168.1.3 from 192.168.1.1 > Jun 12 17:04:29 sirius dhcpcd: eth0: acknowledged 192.168.1.3 from > 192.168.1.1 > Jun 12 17:04:29 sirius dhcpcd: eth0: checking for 192.168.1.3 > Jun 12 17:04:34 sirius dhcpcd: eth0: leased 192.168.1.3 for 864000 seconds > Jun 12 17:04:34 sirius dhcpcd: forking to background > > As you can see, dhcpcd asks for a lease as soon as the kernel tells it the > link > is up (message "carrier acquired"). > > Therefore, I think 2.6.35-rc* notification of carrier is broken, and at least > it broke dhcpcd way of watching the link. > Exact, my kernel config is:
-*- Generic Media Independent Interface device support
[*] "Tulip" family network device support
<*> ULi M526x controller support
Le mardi 15 juin 2010 23:24:18, Andrew Morton a écrit :
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sat, 12 Jun 2010 15:15:31 GMT
>
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=16187
> >
> > Summary: Carrier detection failed in dhcpcd when link is up
> > Product: Networking
> > Version: 2.5
> >
> > Kernel Version: 2.6.35-rc2
> >
> > Platform: All
> >
> > OS/Version: Linux
> >
> > Tree: Mainline
> >
> > Status: NEW
> >
> > Severity: normal
> > Priority: P1
> >
> > Component: Other
> >
> > AssignedTo: acme@ghostprotocols.net
> > ReportedBy: casteyde.christian@free.fr
> > Regression: Yes
> >
> > Created an attachment (id=26742)
> >
> > --> (https://bugzilla.kernel.org/attachment.cgi?id=26742)
> >
> > lspci output for 2.6.34 on my computer
> >
> > Kernel at least 2.6.35-rc2, 2.6.34 works fine
>
> Seems to be post-2.6.34 breakage in the tulip driver.
>
> > Athlon X2 3000 in 64bits mode
> > Slackware 13.1
> > "Ethernet controller: ALi Corporation ULi 1689,1573 integrated ethernet.
> > (rev 40)" (from lspci)
> >
> > Since 2.6.35-rc2 (didn't checked -rc1), dhcpcd hangs fails to detect
> > carrier appearance at boot.
> >
> > The Slackware network script uses dhcpcd to bring DHCP interfaces up. At
> > boot, it seems my network device doesn't have the link up immediatly.
> > dhcpcd tries to bring it up, and wait for the carrier. The carrier
> > indeed goes up, but dhcpcd gets absolutly no notification of that, and
> > therefore times out.
> >
> > logs says that:
> > Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net
> > 127.0.0.0 netmask 255.0.0.0 lo
> > Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10
> > eth0 Jun 12 16:53:58 sirius dhcpcd: version 5.2.2 starting
> > Jun 12 16:53:58 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not
> > ready Jun 12 16:53:58 sirius dhcpcd: eth0: waiting for carrier
> > Jun 12 16:54:01 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full
> > duplex
> > Jun 12 16:54:01 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link
> > becomes ready
> > then dhcpcd times out a little later (10 seconds as -t 10 is specified).
> >
> > If I use the -K option (no carrier detection), and does ifconfig eth0 up
> > just before issueing the dhcpcd command, dhcpcd doesn't wait for the
> > carrier and gets a lease correctly.
> >
> > Reversely, with the 2.6.34 kernel, dhcpcd indeed gets the link up
> > notification, and gets the lease immediatly. In this case, the log says:
> >
> > Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net
> > 127.0.0.0 netmask 255.0.0.0 lo
> > Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10
> > eth0 Jun 12 17:04:22 sirius dhcpcd: version 5.2.2 starting
> > Jun 12 17:04:22 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not
> > ready Jun 12 17:04:22 sirius dhcpcd: eth0: waiting for carrier
> > Jun 12 17:04:25 sirius dhcpcd: eth0: carrier acquired
> > Jun 12 17:04:25 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full
> > duplex
> > Jun 12 17:04:25 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link
> > becomes ready
> > Jun 12 17:04:25 sirius dhcpcd: eth0: broadcasting for a lease
> > Jun 12 17:04:29 sirius dhcpcd: eth0: offered 192.168.1.3 from 192.168.1.1
> > Jun 12 17:04:29 sirius dhcpcd: eth0: acknowledged 192.168.1.3 from
> > 192.168.1.1 Jun 12 17:04:29 sirius dhcpcd: eth0: checking for
> > 192.168.1.3
> > Jun 12 17:04:34 sirius dhcpcd: eth0: leased 192.168.1.3 for 864000
> > seconds Jun 12 17:04:34 sirius dhcpcd: forking to background
> >
> > As you can see, dhcpcd asks for a lease as soon as the kernel tells it
> > the link is up (message "carrier acquired").
> >
> > Therefore, I think 2.6.35-rc* notification of carrier is broken, and at
> > least it broke dhcpcd way of watching the link.
On Tue, Jun 15, 2010 at 02:24:18PM -0700, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
I've resync to linus' tree (2.6.35-rc3) and reviewed the output of:
git diff v2.6.34 drivers/net/tulip/
I don't see anything that would affect how link state
changes get reported to user space.
I'm not inclined to believe this is a tulip "bug" unless
core netdev behavior changed and tulip is not longer
doing the right thing.
hth,
grant
On Thursday, June 17, 2010 11:17 PM, Grant Grundler wrote:
> On Tue, Jun 15, 2010 at 02:24:18PM -0700, Andrew Morton wrote:
>>
>> (switched to email. Please respond via emailed reply-to-all, not
>> via the bugzilla web interface).
>
> I've resync to linus' tree (2.6.35-rc3) and reviewed the output of:
> git diff v2.6.34 drivers/net/tulip/
>
> I don't see anything that would affect how link state
> changes get reported to user space.
>
> I'm not inclined to believe this is a tulip "bug" unless
> core netdev behavior changed and tulip is not longer
> doing the right thing.
>
> hth,
> grant
I don't believe this is a tulip specific bug - the same thing has been reported against e1000e and bnx2 (IIRC); I have not had the time to investigate further.
Bruce.
On Fri, 18 Jun 2010 08:04:36 -0700 "Allan, Bruce W" <bruce.w.allan@intel.com> wrote: > On Thursday, June 17, 2010 11:17 PM, Grant Grundler wrote: > > On Tue, Jun 15, 2010 at 02:24:18PM -0700, Andrew Morton wrote: > >> > >> (switched to email. Please respond via emailed reply-to-all, not > >> via the bugzilla web interface). > > > > I've resync to linus' tree (2.6.35-rc3) and reviewed the output of: > > git diff v2.6.34 drivers/net/tulip/ > > > > I don't see anything that would affect how link state > > changes get reported to user space. > > > > I'm not inclined to believe this is a tulip "bug" unless > > core netdev behavior changed and tulip is not longer > > doing the right thing. > > > > hth, > > grant > > I don't believe this is a tulip specific bug - the same thing has been > reported against e1000e and bnx2 (IIRC); I have not had the time to > investigate further. So it's affecting three drivers. One thing which changed in there recently is : commit b2db756449f63f98049587f7ede4a8e85e0c79b1 : Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> : AuthorDate: Sat Mar 20 16:11:12 2010 -0700 : Commit: David S. Miller <davem@davemloft.net> : CommitDate: Sat Mar 20 16:11:12 2010 -0700 : : ipv6: Reduce timer events for addrconf_verify(). So perhaps someone could test the simple reversion patch, below? I couldn't locate these e1000e and bnx2 bug reports so I couldn't cc the reporters :( I'm seeing several patches on netdev "use netif_carrier_off to prevent tx timeout". Is that related? net/ipv6/addrconf.c | 27 ++++----------------------- 1 file changed, 4 insertions(+), 23 deletions(-) diff -puN net/ipv6/addrconf.c~revert-2 net/ipv6/addrconf.c --- a/net/ipv6/addrconf.c~revert-2 +++ a/net/ipv6/addrconf.c @@ -100,10 +100,6 @@ #define INFINITY_LIFE_TIME 0xFFFFFFFF #define TIME_DELTA(a, b) ((unsigned long)((long)(a) - (long)(b))) -#define ADDRCONF_TIMER_FUZZ_MINUS (HZ > 50 ? HZ/50 : 1) -#define ADDRCONF_TIMER_FUZZ (HZ / 4) -#define ADDRCONF_TIMER_FUZZ_MAX (HZ) - #ifdef CONFIG_SYSCTL static void addrconf_sysctl_register(struct inet6_dev *idev); static void addrconf_sysctl_unregister(struct inet6_dev *idev); @@ -3159,15 +3155,15 @@ int ipv6_chk_home_addr(struct net *net, static void addrconf_verify(unsigned long foo) { - unsigned long now, next, next_sec, next_sched; struct inet6_ifaddr *ifp; struct hlist_node *node; + unsigned long now, next; int i; rcu_read_lock_bh(); spin_lock(&addrconf_verify_lock); now = jiffies; - next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY); + next = now + ADDR_CHECK_FREQUENCY; del_timer(&addr_chk_timer); @@ -3181,8 +3177,7 @@ restart: continue; spin_lock(&ifp->lock); - /* We try to batch several events at once. */ - age = (now - ifp->tstamp + ADDRCONF_TIMER_FUZZ_MINUS) / HZ; + age = (now - ifp->tstamp) / HZ; if (ifp->valid_lft != INFINITY_LIFE_TIME && age >= ifp->valid_lft) { @@ -3252,21 +3247,7 @@ restart: } } - next_sec = round_jiffies_up(next); - next_sched = next; - - /* If rounded timeout is accurate enough, accept it. */ - if (time_before(next_sec, next + ADDRCONF_TIMER_FUZZ)) - next_sched = next_sec; - - /* And minimum interval is ADDRCONF_TIMER_FUZZ_MAX. */ - if (time_before(next_sched, jiffies + ADDRCONF_TIMER_FUZZ_MAX)) - next_sched = jiffies + ADDRCONF_TIMER_FUZZ_MAX; - - ADBG((KERN_DEBUG "now = %lu, schedule = %lu, rounded schedule = %lu => %lu\n", - now, next, next_sec, next_sched)); - - addr_chk_timer.expires = next_sched; + addr_chk_timer.expires = time_before(next, jiffies + HZ) ? jiffies + HZ : next; add_timer(&addr_chk_timer); spin_unlock(&addrconf_verify_lock); rcu_read_unlock_bh(); _ Handled-By : Andrew Morton <akpm@linux-foundation.org> I can test since I can reproduce it However, I didn't managed to revert the patch: root@sirius:/usr/src/linux# patch -p1 -R < ../revert.patch patching file net/ipv6/addrconf.c Hunk #1 succeeded at 104 with fuzz 2 (offset 4 lines). Hunk #2 FAILED at 3163. Hunk #3 FAILED at 3185. Hunk #4 FAILED at 3256. 3 out of 4 hunks FAILED -- saving rejects to file net/ipv6/addrconf.c.rej I applied it on -rc1, -rc2 and -rc3 and all failed I bisected it: 10708f37ae729baba9b67bd134c3720709d4ae62 is the first bad commit commit 10708f37ae729baba9b67bd134c3720709d4ae62 Author: Jan Engelhardt <jengelh@medozas.de> Date: Thu Mar 11 09:57:29 2010 +0000 net: core: add IFLA_STATS64 support `ip -s link` shows interface counters truncated to 32 bit. This is because interface statistics are transported only in 32-bit quantity to userspace. This commit adds a new IFLA_STATS64 attribute that exports them in full 64 bit. References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net> :040000 040000 d3495c3a1f0c879f0635279e05ccffbcab6b6f1b e42d269b76c89d6d098b9f290b959e0320615756 M include :040000 040000 94c49d66f55109c55c7d832ed74205b0b40cb983 eab089628db95a473a1ced8d99c629b3b3b2ba91 M net Created attachment 26904 [details]
git bisect log
Good job, thanks! First-Bad-Commit : 10708f37ae729baba9b67bd134c3720709d4ae62 Update: Still present in 2.6.35-rc4. Bug is in dhcpd, it allocates a fixed 256 byte buffer for netlink messages. Bug is fixed in dhcpcd-5.2.7 |
Created attachment 26742 [details] lspci output for 2.6.34 on my computer Kernel at least 2.6.35-rc2, 2.6.34 works fine Athlon X2 3000 in 64bits mode Slackware 13.1 "Ethernet controller: ALi Corporation ULi 1689,1573 integrated ethernet. (rev 40)" (from lspci) Since 2.6.35-rc2 (didn't checked -rc1), dhcpcd hangs fails to detect carrier appearance at boot. The Slackware network script uses dhcpcd to bring DHCP interfaces up. At boot, it seems my network device doesn't have the link up immediatly. dhcpcd tries to bring it up, and wait for the carrier. The carrier indeed goes up, but dhcpcd gets absolutly no notification of that, and therefore times out. logs says that: Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo Jun 12 16:53:58 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10 eth0 Jun 12 16:53:58 sirius dhcpcd: version 5.2.2 starting Jun 12 16:53:58 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready Jun 12 16:53:58 sirius dhcpcd: eth0: waiting for carrier Jun 12 16:54:01 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full duplex Jun 12 16:54:01 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready then dhcpcd times out a little later (10 seconds as -t 10 is specified). If I use the -K option (no carrier detection), and does ifconfig eth0 up just before issueing the dhcpcd command, dhcpcd doesn't wait for the carrier and gets a lease correctly. Reversely, with the 2.6.34 kernel, dhcpcd indeed gets the link up notification, and gets the lease immediatly. In this case, the log says: Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo Jun 12 17:04:21 sirius logger: /etc/rc.d/rc.inet1: /sbin/dhcpcd -t 10 eth0 Jun 12 17:04:22 sirius dhcpcd: version 5.2.2 starting Jun 12 17:04:22 sirius kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready Jun 12 17:04:22 sirius dhcpcd: eth0: waiting for carrier Jun 12 17:04:25 sirius dhcpcd: eth0: carrier acquired Jun 12 17:04:25 sirius kernel: uli526x: eth0 NIC Link is Up 100 Mbps Full duplex Jun 12 17:04:25 sirius kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Jun 12 17:04:25 sirius dhcpcd: eth0: broadcasting for a lease Jun 12 17:04:29 sirius dhcpcd: eth0: offered 192.168.1.3 from 192.168.1.1 Jun 12 17:04:29 sirius dhcpcd: eth0: acknowledged 192.168.1.3 from 192.168.1.1 Jun 12 17:04:29 sirius dhcpcd: eth0: checking for 192.168.1.3 Jun 12 17:04:34 sirius dhcpcd: eth0: leased 192.168.1.3 for 864000 seconds Jun 12 17:04:34 sirius dhcpcd: forking to background As you can see, dhcpcd asks for a lease as soon as the kernel tells it the link is up (message "carrier acquired"). Therefore, I think 2.6.35-rc* notification of carrier is broken, and at least it broke dhcpcd way of watching the link.