Created attachment 24165 [details] Wireshark log while acquiring IP address

The IP address assigned by DHCP is dropped and the address 169.254.123.251 is set instead. This happens on a Clevo D4J model D410J with 2.6.32, but works correctly with 2.6.31.

Steps to reproduce:
1. Boot the system.
2. In an xterm window, execute the following command to watch the current IP address settings:
   while true; do clear; date; /sbin/ifconfig; sleep 1; done
3. Unplug the network cable for a few seconds: the IP address should be gone.
4. Plug the network cable back in.
5. After about 10 seconds the IP address 192.168.1.64 assigned by DHCP is visible.
6. After about 50 seconds (counting from plugging in the network cable) no IP address is assigned.
7. After about 65 seconds (counting from plugging in the network cable) the IP address 169.254.123.251 is assigned. This IP address does not work on this network.

I ran Wireshark during the whole process; see the attached log file.
Packet 25..28: DHCP address is fetched.
Packet 63: DHCP release: this should not happen.
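The once-per-second ifconfig loop above produces a lot of output to read by eye. As a minimal sketch (not part of the original report), the same observations can be reduced to the moments when the address actually changes. The function name `address_transitions` and the sample data are hypothetical; the times are the approximate ones given in the steps above.

```python
from typing import Iterable, List, Optional, Tuple

# (seconds since the cable was plugged in, IPv4 address or None if unset)
Sample = Tuple[float, Optional[str]]


def address_transitions(samples: Iterable[Sample]) -> List[Sample]:
    """Return only the samples at which the observed address changed."""
    transitions: List[Sample] = []
    previous: Optional[str] = None  # assume no address before the first sample
    for seconds, address in samples:
        if address != previous:
            transitions.append((seconds, address))
            previous = address
    return transitions


# Hypothetical once-per-second samples, using the approximate times reported.
observed = (
    [(t, None) for t in range(0, 10)]
    + [(t, "192.168.1.64") for t in range(10, 50)]
    + [(t, None) for t in range(50, 65)]
    + [(t, "169.254.123.251") for t in range(65, 70)]
)

for seconds, address in address_transitions(observed):
    print(f"{seconds:>3}s  {address or 'no address'}")
```

Feeding real samples (for example parsed from the ifconfig output) into the same function would give a compact timeline to compare between 2.6.31 and 2.6.32.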
Created attachment 24166 [details] dmesg from 2.6.32
Bug #14791 may be related to this one.
Created attachment 24217 [details] git bisect log

I "git bisect"ed the problem:

61cbe54d9479ad98283b2dda686deae4c34b2d59 is the first bad commit
commit 61cbe54d9479ad98283b2dda686deae4c34b2d59
Author: Mike Galbraith <efault@gmx.de>
Date:   Wed Sep 9 15:41:37 2009 +0200

    sched: Keep kthreads at default priority

    Removes kthread/workqueue priority boost, they increase worst-case
    desktop latencies.

    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

:040000 040000 d670b7245cb0fc1b422e44a8f2d29b3615099b04 6186e24908a8cdd5770ee465f738abe857ba5604 M kernel
Hi,

I bisected the problem, see http://bugzilla.kernel.org/show_bug.cgi?id=14794 for details. Any idea what is the relation between this problem and the found patch?

Regards,
Márton Németh

Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> A 2.6.31 -> 2.6.32 regression.
>
> [...]
On Thu, 17 Dec 2009 21:53:20 +0100 Németh Márton <nm127@freemail.hu> wrote:

> Hi,
>
> I bisected the problem, see http://bugzilla.kernel.org/show_bug.cgi?id=14794
> for details. Any idea what is the relation between this problem and the found
> patch?
>

Please don't update this report via the bugzilla interface. See "(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface)."

: 61cbe54d9479ad98283b2dda686deae4c34b2d59 is the first bad commit
: commit 61cbe54d9479ad98283b2dda686deae4c34b2d59
: Author: Mike Galbraith <efault@gmx.de>
: Date: Wed Sep 9 15:41:37 2009 +0200
:
: sched: Keep kthreads at default priority

Strange. Might be a timing thing but a) I doubt if any kernel threads are involved in maintaining a DHCP lease and b) even if they were, such a race wouldn't be this repeatable.

Maybe it was a bisection glitch. Did you try reverting just that patch, to see if it fixed things again?
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

A 2.6.31 -> 2.6.32 regression.

On Sun, 13 Dec 2009 08:59:48 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14794
>
>            Summary: IP address assigned by DHCP is dropped after ~40 seconds
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.32
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: nm127@freemail.hu
>         Regression: Yes
>
>
> Created an attachment (id=24165)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=24165)
> Wireshark log while acquiring IP address
>
> The IP address which was assigned by DHCP is dropped and the address
> 169.254.123.251 is set. This happens on Clevo D4J model D410J with 2.6.32,
> but works correctly with 2.6.31.
>
> Steps to reproduce:
> 1. Boot the system
> 2. In an xterm window execute the following command to see the actual IP
>    address settings:
>    while true; do clear; date; /sbin/ifconfig; sleep 1; done
> 3. Unplug the network cable for some seconds: the IP address should be gone
> 4. Plug the network cable.
> 5. After about 10 seconds the IP address 192.168.1.64 assigned by DHCP is
>    visible
> 6. After about 50 seconds (counting from the plug of the network cable) no IP
>    address is assigned
> 7. After about 65 seconds (counting from the plug of the network cable) the
>    IP address 169.254.123.251 is assigned. This IP address will not work on
>    this network.
>
> I run Wireshark during the whole process, see the attached log file.
> Packet 25..28: DHCP address is fetched.
> Packet 63: DHCP release: this should not happen
>
Andrew Morton wrote, On 12/17/2009 10:24 PM:

> On Thu, 17 Dec 2009 21:53:20 +0100
> Németh Márton <nm127@freemail.hu> wrote:
>
>> Hi,
>>
>> I bisected the problem, see http://bugzilla.kernel.org/show_bug.cgi?id=14794
>> for details. Any idea what is the relation between this problem and the
>> found patch?
>>
>
> Please don't update this report via the bugzilla interface. See
> "(switched to email. Please respond via emailed reply-to-all, not via
> the bugzilla web interface)."
>
> : 61cbe54d9479ad98283b2dda686deae4c34b2d59 is the first bad commit
> : commit 61cbe54d9479ad98283b2dda686deae4c34b2d59
> : Author: Mike Galbraith <efault@gmx.de>
> : Date: Wed Sep 9 15:41:37 2009 +0200
> :
> : sched: Keep kthreads at default priority
>
> Strange. Might be a timing thing but a) I doubt if any kernel threads
> are involved in maintaining a DHCP lease and b) even if they were, such
> a race wouldn't be this repeatable.
>

"Removes kthread/workqueue priority boost[...]" - if there could be workqueues involved - maybe something with link-watch?

Jarek P.
Reply-To: peterz@infradead.org

On Thu, 2009-12-17 at 21:53 +0100, Németh Márton wrote:
>
> I bisected the problem, see http://bugzilla.kernel.org/show_bug.cgi?id=14794
> for details. Any idea what is the relation between this problem and
> the found patch?

If you take 32 and revert just that one patch it works again?
Hi,

Jarek Poplawski wrote:
> Andrew Morton wrote, On 12/17/2009 10:24 PM:
>
>> On Thu, 17 Dec 2009 21:53:20 +0100
>> Németh Márton <nm127@freemail.hu> wrote:
>>
>>> Hi,
>>>
>>> I bisected the problem, see
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14794
>>> for details. Any idea what is the relation between this problem and the
>>> found patch?
>>>
>> : 61cbe54d9479ad98283b2dda686deae4c34b2d59 is the first bad commit
>> : commit 61cbe54d9479ad98283b2dda686deae4c34b2d59
>> : Author: Mike Galbraith <efault@gmx.de>
>> : Date: Wed Sep 9 15:41:37 2009 +0200
>> :
>> : sched: Keep kthreads at default priority
>>
>> Strange. Might be a timing thing but a) I doubt if any kernel threads
>> are involved in maintaining a DHCP lease and b) even if they were, such
>> a race wouldn't be this repeatable.
>>
>
> "Removes kthread/workqueue priority boost[...]" - if there could be
> workqueues involved - maybe something with link-watch?

Unfortunately reverting the commit 61cbe54d9479ad98283b2dda686deae4c34b2d59 on top of 2.6.32 does not solve the problem.

I use KDE, and the KNetworkManager icon is visible on the task bar. When I unplug the network cable the "disconnected" icon appears. After I plug the network cable in again, a rotating wheel appears. Here comes the difference: *when* the rotating wheel changes to the "connected" state. In the good case, when the IP address is kept, the "connected" icon appears right after the IP address assigned by DHCP appears in the ifconfig output. In the bad case the rotating wheel stays for about 60 seconds, while the assigned IP address is dropped and an IP address like 169.254.123.251 is assigned.

There is a workaround, too. If the IP address 169.254.123.251 was assigned and I execute "dhclient eth0" as root, an IP address will be assigned by DHCP and this address is not dropped anymore until I unplug and replug the network cable again.

My driver for the network card is 8139too. My network card is:

00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
        Subsystem: CLEVO/KAPOK Computer Device 4702
        Flags: bus master, medium devsel, latency 64, IRQ 19
        I/O ports at 1000 [size=256]
        Memory at d0004000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
        Kernel driver in use: 8139too
        Kernel modules: 8139too

Regards,
Márton Németh
Reply-To: rdreier@cisco.com

> Unfortunately reverting the commit 61cbe54d9479ad98283b2dda686deae4c34b2d59 on
> top of 2.6.32 does not solve the problem.

Is there any possibility that one of the steps of the bisection was wrong? Could you possibly have accidentally marked a "bad" kernel as "good"?

Addresses like 169.254.123.251 are RFC 3927 zeroconf link-local addresses. I think network manager will assign one of those if it thinks the DHCP negotiation failed.

It might be informative to compare the network manager and dhclient log output (maybe in /var/log/daemon.log?) in the good and bad cases.

- R.
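To make the address distinction above concrete: the IPv4 link-local range defined by RFC 3927 is 169.254.0.0/16, and Python's standard `ipaddress` module can tell the two addresses from the report apart. This is only an illustrative sketch; the helper name `classify_ipv4` is hypothetical.

```python
import ipaddress


def classify_ipv4(address: str) -> str:
    """Roughly classify an IPv4 address as link-local, private, or other."""
    ip = ipaddress.IPv4Address(address)
    # Check link-local first: the ipaddress module also counts
    # 169.254.0.0/16 as "private", so the order of the tests matters.
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "other"


for addr in ("192.168.1.64", "169.254.123.251"):
    print(addr, "->", classify_ipv4(addr))
```

A monitoring script could use such a check to flag the moment the interface falls back from the DHCP-assigned address to a link-local one.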
Reply-To: rdreier@cisco.com > Unfortunately reverting the commit 61cbe54d9479ad98283b2dda686deae4c34b2d59 on > top of 2.6.32 does not solve the problem. By the way, one thing you could double check is that 61cbe54d is definitely bad, and the 61cbe54d tree with that commit reverted is definitely good. - R.
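Why a single wrong verdict matters can be shown with a toy model of bisection. The sketch below is a plain binary search over a hypothetical linear history (`c0` to `c15`); real `git bisect` also handles merges and non-linear history, so this is an illustration of the failure mode and not a description of git's implementation. All names and data here are made up.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


commits = [f"c{i}" for i in range(16)]  # hypothetical linear history
truly_bad = set(commits[5:])            # the regression really starts at c5


def reliable(commit: str) -> bool:
    """Every test run gives the correct verdict."""
    return commit in truly_bad


def mistaken(commit: str) -> bool:
    """Same as reliable(), but the bad kernel c7 is wrongly judged good."""
    return commit != "c7" and commit in truly_bad


print(first_bad(commits, reliable))  # the real culprit
print(first_bad(commits, mistaken))  # an innocent later commit
```

With the mistaken verdict the search is steered past the real culprit and ends on an unrelated commit, and reverting that commit alone would not fix anything. Checking the reported commit and its parent directly, as suggested above, is a cheap way to catch this.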
On Fri, Dec 18, 2009 at 12:16:13AM -0800, Roland Dreier wrote:
>
> > Unfortunately reverting the commit
> > 61cbe54d9479ad98283b2dda686deae4c34b2d59 on
> > top of 2.6.32 does not solve the problem.
>
> Is there any possibility that one of the steps of the bisection was
> wrong? Could you possibly have accidentally marked a "bad" kernel as
> "good"?
>
> Addresses like 169.254.123.251 are RFC 3927 zeroconf link-local
> addresses. I think network manager will assign one of those if it
> thinks the DHCP negotiation failed.
>
> It might be informative to compare the network manager and dhclient log
> output (maybe in /var/log/daemon.log?) in the good and bad cases.

Btw, I'm not sure if it matters, but your dmesg doesn't mirror what you described. You mentioned one unplug only, while there are a few link down / link up events visible. So I wonder if a bad contact happened here. Then the dhcp client might hit some limit of tries. On the other hand, such an effect could be amplified by changes in 2.6.32 too (e.g. with different timing). Could you attach the 2.6.31 dmesg after this test?

Jarek P.
Jarek Poplawski wrote:
> Btw, I'm not sure if it matters, but your dmesg doesn't mirror what
> you described. You mentioned one unplug only, while there are a few
> link down / link up events visible. So I wonder if bad contact didn't

You are right, the dmesg I attached contains more plug and unplug events because I really plugged and unplugged the network cable several times to be sure that the IP address is always dropped. The wireshark log contains only one unplug and plug, however.

Regards,
Márton Németh
Németh Márton wrote:
> I use KDE, and the KNetworkManager icon is visible on the task bar. When
> I unplug the network cable the "disconnected" icon appears. After I plug the
> network cable again, a rotating wheel appears. Here comes the difference:
> *when* the rotating wheel changes to "connected" state. One case, when the IP
> address is kept the "connected" icon appears right after the IP address
> assigned by DHCP appears in the ifconfig output. In the wrong case this
> rotating wheel is still there for about 60 seconds while the assigned IP
> address is dropped and an IP address like 169.254.123.251 is assigned.

I would like to add that with 2.6.32 the IP address is dropped even if I exit KNetworkManager before unplugging the network cable.

I can reproduce the problem without touching the network cable. If I execute (I am running Debian 5.0, kernel 2.6.32):

# /etc/init.d/network-manager stop

then the network goes down. After this I execute:

# /etc/init.d/network-manager start

then the network comes up, the IP address assigned by DHCP appears for about a minute, then this IP address is dropped and an address starting with 169.254 is assigned.

Regards,
Márton Németh
Jarek Poplawski wrote:
> happen here. Then dhcp client might hit some limit of tries. On the
> other hand, such an effect could be amplified by changes in 2.6.32
> too (e.g. with different timing). Could you attach 2.6.31 dmesg after
> this test?

OK, here is the dmesg of 2.6.31. In this case I took care to unplug the network cable only once.

Regards,
Márton Németh
Created attachment 24221 [details] dmesg of 2.6.31
Roland Dreier wrote:
> > Unfortunately reverting the commit
> > 61cbe54d9479ad98283b2dda686deae4c34b2d59 on
> > top of 2.6.32 does not solve the problem.
>
> Is there any possibility that one of the steps of the bisection was
> wrong? Could you possibly have accidentally marked a "bad" kernel as
> "good"?

I am afraid that I might need to repeat the whole "git bisect" process because reverting the found patch has not solved the problem...

Regards,
Márton Németh
On Fri, Dec 18, 2009 at 05:49:01PM +0100, Németh Márton wrote:
> Németh Márton wrote:
> > I use KDE, and the KNetworkManager icon is visible on the task bar. When
> > I unplug the network cable the "disconnected" icon appears. After I plug
> > the network cable again, a rotating wheel appears. Here comes the
> > difference: *when* the rotating wheel changes to "connected" state. One
> > case, when the IP address is kept the "connected" icon appears right
> > after the IP address assigned by DHCP appears in the ifconfig output. In
> > the wrong case this rotating wheel is still there for about 60 seconds
> > while the assigned IP address is dropped and an IP address like
> > 169.254.123.251 is assigned.
>
> I would like to add that with 2.6.32 the IP address is dropped even if I exit
> the KNetworkManager before unplugging the network cable.
>
> I can reproduce the problem without touching the network cable. If I execute
> (I am running Debian 5.0, kernel 2.6.32):
>
> # /etc/init.d/network-manager stop
>
> then the network goes down. After this I execute:
>
> # /etc/init.d/network-manager start
>
> then the network comes up, the IP address assigned by DHCP appears for about
> a minute then this IP address is dropped and the address starting with
> 169.254 is assigned.

Hmm... currently I'm out of new (wrong ;-) ideas. It seems there is some longer break in (mostly multicast) traffic just before releasing the DHCP address, according to this wireshark dump. Maybe this network manager tries reloading to fix something? (Is there anything strange in the logs from this network manager, btw?)

You wrote earlier that you can get it working OK with dhclient, so I wonder if it's not some userspace (KNetworkManager) incompatibility with the new kernel (I mean if it works OK with basic dhcp tools started as root). Could you verify that more?

Regards,
Jarek P.
Jarek Poplawski wrote:
> [...]
> Hmm... currently I'm out of new (wrong ;-) ideas. It seems there is
> some longer break in (mostly multicast) traffic just before releasing
> the DHCP address, according to this wireshark dump. Maybe this network
> manager tries reloading to fix something? (Isn't there nothing strange
> in logs from this network manager, btw?)
>
> You wrote earlier that you can get it working OK with dhclient, so I
> wonder if it's not some userspace (KNetworkManager) incompatibility
> with the new kernel (I mean if it works OK with basic dhcp tools
> started as root). Could you verify that more?

I upgraded the Debian package "network-manager" from 0.6.6-3 to 0.7.2-2. The problem seems to be solved: the IP address is not dropped in 2.6.32 and in 2.6.31. The conclusion is for me that a user-space program caused the problem, thanks for the hint. Regards, Márton Németh
On Sat, Dec 19, 2009 at 11:26:30AM +0100, Németh Márton wrote: > I upgraded the Debian package "network-manager" from 0.6.6-3 to 0.7.2-2. > The problem seems to be solved: the IP address is not dropped in 2.6.32 > and in 2.6.31. The conclusion is for me that a user-space program caused > the problem, thanks for the hint. Yes, Debian often fixes our bugs on time! ;-) Thanks, Jarek P.
On Tuesday 29 December 2009, Jarek Poplawski wrote:
> Rafael J. Wysocki wrote, On 12/29/2009 04:26 PM:
>
> > ...
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14794
> > Subject   : IP address assigned by DHCP is dropped after ~40 seconds
> > Submitter : Márton Németh <nm127@freemail.hu>
> > Date      : 2009-12-13 08:59 (17 days old)
> >
>
> IMHO this bug might be considered as fixed by updating a userspace tool
> (KNetworkManager) - unless somebody wants to seek for the exact reason...
On Tuesday 29 December 2009, Mike Galbraith wrote:
> On Tue, 2009-12-29 at 16:28 +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14794
> > Subject   : IP address assigned by DHCP is dropped after ~40 seconds
> > Submitter : Márton Németh <nm127@freemail.hu>
> > Date      : 2009-12-13 08:59 (17 days old)
>
> This one was a userspace bug it seems.
>
> <quote>
> Comment #21 From Jarek Poplawski 2009-12-19 13:48:21 -------
> On Sat, Dec 19, 2009 at 11:26:30AM +0100, Németh Márton wrote:
> > I upgraded the Debian package "network-manager" from 0.6.6-3 to 0.7.2-2.
> > The problem seems to be solved: the IP address is not dropped in 2.6.32
> > and in 2.6.31. The conclusion is for me that a user-space program caused
> > the problem, thanks for the hint.
>
> Yes, Debian often fixes our bugs on time! ;-)