Bug 10071
Summary: | kernel hang in inet_init | ||
---|---|---|---|
Product: | Networking | Reporter: | Yannick Dirou (yannick.dirou) |
Component: | IPV4 | Assignee: | Jarek Poplawski (jarkao2) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | acpi-bugzilla, bunk, jarkao2 |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25 rc2 latest git | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 9832 | ||
Attachments: |
my config
dmesg 2.6.25-rc2 with acpi on dmesg 2.6.25-rc3 with acpi=ht add some debugging dmesg 2.6.25-rc3 with pnpacpi=off dmesg 2.6.25-rc3 with clocksource=jiffies dmesg 2.6.25-rc3 + debug patch printj.2.diff add some more debugging to af_inet |
Description
Yannick Dirou
2008-02-23 00:34:05 UTC
Created attachment 14956 [details]
my config
Reply-To: akpm@linux-foundation.org On Sat, 23 Feb 2008 00:34:06 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10071 > > Summary: kernel hang in inet_init > Product: Networking > Version: 2.5 > KernelVersion: 2.6.25 rc2 latest git > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: yannick.dirou@axetic.com > > > Distribution:ubuntu hardy > Hardware Environment:dell dimension 5150 > Problem Description: > kernel hang mostly but boot slowly sometimes > > Steps to reproduce: > boot > > when the computer manage to boot, inet_init last some time: > [ 1.725306] NET: Registered protocol family 2 > [ 6.825384] IP route cache hash table entries: 32768 (order: 5, 131072 > bytes) > [ 6.825803] TCP established hash table entries: 131072 (order: 8, 1048576 > bytes) > [ 6.826691] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) > [ 6.836160] TCP: Hash tables configured (established 131072 bind 65536) > [ 6.836255] TCP reno registered > [ 7.081569] initcall 0xc03b8920: inet_init+0x0/0x3aa() returned 0. > [ 7.081569] initcall 0xc03b8920 ran for 5106 msecs: inet_init+0x0/0x3aa() > > When booting with acpi=ht it works: > [0.124668] Calling initcall 0xc03b8920: inet_init+0x0/0x3aa() > [ 0.124867] NET: Registered protocol family 2 > [ 0.288067] IP route cache hash table entries: 32768 (order: 5, 131072 > bytes) > [ 0.288856] TCP established hash table entries: 131072 (order: 8, 1048576 > bytes) > [ 0.289739] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) > [ 0.299156] TCP: Hash tables configured (established 131072 bind 65536) > [ 0.299250] TCP reno registered > Feb 23 09:10:28 tiyann kernel: [ 0.162267] initcall 0xc03b8920: > inet_init+0x0/0x3aa() returned 0. > [ 0.162383] initcall 0xc03b8920 ran for 33 msecs: inet_init+0x0/0x3aa() > > there seem to be a weird interaction between acpi and inet_init > > any hint on how to provide better information > thanks > On 25-02-2008 23:00, Andrew Morton wrote: > On Sat, 23 Feb 2008 00:34:06 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=10071 ... >> KernelVersion: 2.6.25 rc2 latest git ... >> there seem to be a weird interaction between acpi and inet_init >> >> any hint on how to provide better information Yannick, here are a few questions: - is this bug just with this git, but OK before (or 2.6.25-rc3?)? - do all other drivers work properly and only net is a problem? - could you add to bugzilla full dmesg in both cases? Regards, Jarek P. (In reply to comment #3) > > Yannick, here are a few questions: > > - is this bug just with this git, but OK before (or 2.6.25-rc3?)? don't know for sure but i think it appeared between 2.6.24 an 2.6.25 same behavior with 2.6.25-rc3 by the way > - do all other drivers work properly and only net is a problem? If i manage to boot (i have to shutdown / reboot computer three of four time most of the time) everything works ok , including network > - could you add to bugzilla full dmesg in both cases? sure , will do Created attachment 15026 [details]
dmesg 2.6.25-rc2 with acpi on
Created attachment 15027 [details]
dmesg 2.6.25-rc3 with acpi=ht
Created attachment 15038 [details]
add some debugging
Yannick, could you try this patch with 2.6.25-rc3 and send dmesg ("no ht" only)? It only adds some simple debugging. And BTW, could you try if boot parameter clocksource=jiffies makes any difference? Thanks, Jarek P.
One more exercise: could you try at boot pnpacpi=off as well (even before patch)? Created attachment 15062 [details]
dmesg 2.6.25-rc3 with pnpacpi=off
inet_init still slow (or hang!)
Created attachment 15064 [details]
dmesg 2.6.25-rc3 with clocksource=jiffies
only 33 msecs for inet_init
but it breaks my keyboard with xorg
i had to boot with the single option
Created attachment 15066 [details]
dmesg 2.6.25-rc3 + debug patch printj.2.diff
Created attachment 15071 [details]
add some more debugging to af_inet
Yannick, here is one more simple patch. Should be applied without removing previous one. (If you have more time it would be interesting to try these
boot options too: clocksource=acpi_pm and maybe highres=off - but no need
to send dmesg). Thanks again, Jarek P.
Actually, one more - even more interesting idea: could you recompile your kernel after changing in menuconfig "RCU implementation type" to Classic RCU? (In reply to comment #12) > Created an attachment (id=15071) [details] > add some more debugging to af_inet > > Yannick, here is one more simple patch. Should be applied without removing > previous one. [ 1.316999] Calling initcall 0xc03b8ea0: inet_init+0x0/0x4da() [ 1.317223] NET: Registered protocol family 2 [ 1.317280] 4294668611 net/ipv4/af_inet.c 1387 [ 1.317339] 4294668611 net/ipv4/af_inet.c 1390 [ 1.317386] 4294668611 net/ipv4/af_inet.c 1393 [ 1.317432] 4294668611 net/ipv4/af_inet.c 1402 [ 1.317478] 4294668611 net/ipv4/af_inet.c 1407 [ 1.071904] 0 4294668617l [ 4.436399] 0 4294671982l [ 7.483941] 0 4294675030l [ 7.483941] 4294675030 net/ipv4/af_inet.c 1419 [ 7.483941] 4294675030 net/ipv4/af_inet.c 1426 [ 7.483941] 4294675030 net/ipv4/route.c 2998 [ 7.483941] 4294675030 net/ipv4/route.c 3009 [ 7.483941] 4294675030 net/ipv4/route.c 3016 [ 7.483941] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [ 7.483941] 4294675030 net/ipv4/route.c 3027 [ 7.484691] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [ 7.485578] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [ 7.495032] TCP: Hash tables configured (established 131072 bind 65536) [ 7.495128] TCP reno registered [ 7.755912] initcall 0xc03b8ea0: inet_init+0x0/0x4da() returned 0. [ 7.755912] initcall 0xc03b8ea0 ran for 6138 msecs: inet_init+0x0/0x4da() with clocksource=acpi_pm [ 1.256155] NET: Registered protocol family 2 [ 1.256201] 4294668550 net/ipv4/af_inet.c 1387 [ 1.256260] 4294668550 net/ipv4/af_inet.c 1390 [ 1.256306] 4294668550 net/ipv4/af_inet.c 1393 [ 1.256343] 4294668550 net/ipv4/af_inet.c 1402 [ 1.256389] 4294668550 net/ipv4/af_inet.c 1407 [ 1.007131] 0 4294668556l [ 4.371675] 0 4294671921l [ 7.480234] 0 4294675030l [ 7.480234] 4294675030 net/ipv4/af_inet.c 1419 [ 7.480234] 4294675030 net/ipv4/af_inet.c 1426 [ 7.480234] 4294675030 net/ipv4/route.c 2998 [ 7.480234] 4294675030 net/ipv4/route.c 3009 [ 7.480234] 4294675030 net/ipv4/route.c 3016 [ 7.480234] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [ 7.480234] 4294675030 net/ipv4/route.c 3027 [ 7.481145] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [ 7.482044] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [ 7.491739] TCP: Hash tables configured (established 131072 bind 65536) [ 7.491835] TCP reno registered fast boot with highres=off [ 1.038282] NET: Registered protocol family 2 [ 1.038329] 4294668592 net/ipv4/af_inet.c 1387 [ 1.038387] 4294668592 net/ipv4/af_inet.c 1390 [ 1.038433] 4294668592 net/ipv4/af_inet.c 1393 [ 1.038479] 4294668592 net/ipv4/af_inet.c 1402 [ 1.038524] 4294668592 net/ipv4/af_inet.c 1407 [ 1.044807] 0 4294668598l [ 1.050811] 0 4294668604l [ 1.056816] 0 4294668610l [ 1.056816] 4294668610 net/ipv4/af_inet.c 1419 [ 1.056816] 4294668610 net/ipv4/af_inet.c 1426 [ 1.056816] 4294668610 net/ipv4/route.c 2998 [ 1.056816] 4294668610 net/ipv4/route.c 3009 [ 1.056816] 4294668610 net/ipv4/route.c 3016 [ 1.056816] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [ 1.056816] 4294668610 net/ipv4/route.c 3027 [ 1.057544] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [ 1.058426] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [ 1.067825] TCP: Hash tables configured (established 131072 bind 65536) [ 1.067923] TCP reno registered bingo with classic rcu [ 0.770854] NET: Registered protocol family 2 [ 0.770868] 4294668327 net/ipv4/af_inet.c 1387 [ 0.770884] 4294668327 net/ipv4/af_inet.c 1390 [ 0.770888] 4294668327 net/ipv4/af_inet.c 1393 [ 0.770891] 4294668327 net/ipv4/af_inet.c 1402 [ 0.770893] 4294668327 net/ipv4/af_inet.c 1407 [ 0.773078] 0 4294668328l [ 0.774036] 0 4294668329l [ 0.775037] 0 4294668330l [ 0.775040] 4294668330 net/ipv4/af_inet.c 1419 [ 0.775077] 4294668330 net/ipv4/af_inet.c 1426 [ 0.775077] 4294668330 net/ipv4/route.c 2998 [ 0.775077] 4294668330 net/ipv4/route.c 3009 [ 0.775077] 4294668330 net/ipv4/route.c 3016 [ 0.775077] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [ 0.775077] 4294668330 net/ipv4/route.c 3027 [ 0.775492] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [ 0.776318] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [ 0.785670] TCP: Hash tables configured (established 131072 bind 65536) [ 0.785715] TCP reno registered On Thu, Feb 28, 2008 at 12:53:03PM -0800, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10071 ... > ------- Comment #17 from yannick.dirou@axetic.com 2008-02-28 12:53 ------- > bingo with classic rcu Congratulations Yannick! So, it seems mainly RCU vs. timer(?) problem. I send CC of this message to RCU author, linux-kernel and netdev lists. I'm not sure how much time you could/will to offer for this debugging. Anyway, if it's possible I think it would be interesting to check with CONFIG_PREEMPT_RCU = y again, but CONFIG_HOTPLUG_CPU = n ("Processor type and features --> Support for suspend on SMP and ...") in the meantime. Many thanks, Jarek P. > [ 0.770854] NET: Registered protocol family 2 > [ 0.770868] 4294668327 net/ipv4/af_inet.c 1387 > [ 0.770884] 4294668327 net/ipv4/af_inet.c 1390 > [ 0.770888] 4294668327 net/ipv4/af_inet.c 1393 > [ 0.770891] 4294668327 net/ipv4/af_inet.c 1402 > [ 0.770893] 4294668327 net/ipv4/af_inet.c 1407 > [ 0.773078] 0 4294668328l > [ 0.774036] 0 4294668329l > [ 0.775037] 0 4294668330l > [ 0.775040] 4294668330 net/ipv4/af_inet.c 1419 > [ 0.775077] 4294668330 net/ipv4/af_inet.c 1426 > [ 0.775077] 4294668330 net/ipv4/route.c 2998 > [ 0.775077] 4294668330 net/ipv4/route.c 3009 > [ 0.775077] 4294668330 net/ipv4/route.c 3016 > [ 0.775077] IP route cache hash table entries: 32768 (order: 5, 131072 > bytes) > [ 0.775077] 4294668330 net/ipv4/route.c 3027 > [ 0.775492] TCP established hash table entries: 131072 (order: 8, 1048576 > bytes) > [ 0.776318] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) > [ 0.785670] TCP: Hash tables configured (established 131072 bind 65536) > [ 0.785715] TCP reno registered > > > -- > Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. > You are the assignee for the bug, or are watching the assignee. Reply-To: paulmck@linux.vnet.ibm.com On Thu, Feb 28, 2008 at 10:34:57PM +0100, Jarek Poplawski wrote: > On Thu, Feb 28, 2008 at 12:53:03PM -0800, bugme-daemon@bugzilla.kernel.org > wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=10071 > ... > > ------- Comment #17 from yannick.dirou@axetic.com 2008-02-28 12:53 ------- > > bingo with classic rcu > > Congratulations Yannick! > > So, it seems mainly RCU vs. timer(?) problem. I send CC of this message > to RCU author, linux-kernel and netdev lists. Interesting! Any chance of a pointer to the original messages? A quick Google search and a look at LKML showed me nothing. :-/ Thanx, Paul > I'm not sure how much time you could/will to offer for this debugging. > Anyway, if it's possible I think it would be interesting to check with > CONFIG_PREEMPT_RCU = y again, but CONFIG_HOTPLUG_CPU = n ("Processor > type and features --> Support for suspend on SMP and ...") in the > meantime. > > Many thanks, > Jarek P. > > > > [ 0.770854] NET: Registered protocol family 2 > > [ 0.770868] 4294668327 net/ipv4/af_inet.c 1387 > > [ 0.770884] 4294668327 net/ipv4/af_inet.c 1390 > > [ 0.770888] 4294668327 net/ipv4/af_inet.c 1393 > > [ 0.770891] 4294668327 net/ipv4/af_inet.c 1402 > > [ 0.770893] 4294668327 net/ipv4/af_inet.c 1407 > > [ 0.773078] 0 4294668328l > > [ 0.774036] 0 4294668329l > > [ 0.775037] 0 4294668330l > > [ 0.775040] 4294668330 net/ipv4/af_inet.c 1419 > > [ 0.775077] 4294668330 net/ipv4/af_inet.c 1426 > > [ 0.775077] 4294668330 net/ipv4/route.c 2998 > > [ 0.775077] 4294668330 net/ipv4/route.c 3009 > > [ 0.775077] 4294668330 net/ipv4/route.c 3016 > > [ 0.775077] IP route cache hash table entries: 32768 (order: 5, 131072 > > bytes) > > [ 0.775077] 4294668330 net/ipv4/route.c 3027 > > [ 0.775492] TCP established hash table entries: 131072 (order: 8, > 1048576 > > bytes) > > [ 0.776318] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) > > [ 0.785670] TCP: Hash tables configured (established 131072 bind 65536) > > [ 0.785715] TCP reno registered > > > > > > -- > > Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email > > ------- You are receiving this mail because: ------- > > You are on the CC list for the bug, or are watching someone who is. > > You are the assignee for the bug, or are watching the assignee. Reply-To: paulmck@linux.vnet.ibm.com On Thu, Feb 28, 2008 at 04:58:23PM -0800, Paul E. McKenney wrote: > On Thu, Feb 28, 2008 at 10:34:57PM +0100, Jarek Poplawski wrote: > > On Thu, Feb 28, 2008 at 12:53:03PM -0800, bugme-daemon@bugzilla.kernel.org > wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=10071 > > ... > > > ------- Comment #17 from yannick.dirou@axetic.com 2008-02-28 12:53 > ------- > > > bingo with classic rcu > > > > Congratulations Yannick! > > > > So, it seems mainly RCU vs. timer(?) problem. I send CC of this message > > to RCU author, linux-kernel and netdev lists. > > Interesting! Any chance of a pointer to the original messages? > A quick Google search and a look at LKML showed me nothing. :-/ Thanks to Raphael for pointing me at the above bugzilla URL! Yannick, Would you be willing to try your test again after applying the patch in http://lkml.org/lkml/2008/2/2/11 and rebuilding your kernel? This patch does not appear to be in 2.6.25-rc3. :-( I am reposting this patch separately. Thanx, Paul (In reply to comment #20) > Reply-To: paulmck@linux.vnet.ibm.com > > Would you be willing to try your test again after applying the > patch in http://lkml.org/lkml/2008/2/2/11 and rebuilding your > kernel? > > This patch does not appear to be in 2.6.25-rc3. :-( > > I am reposting this patch separately. > > Thanx, Paul Congratulations Paul and thanks a lot to Jarek , It works, no more hang or slow boot : [ 1.061675] Calling initcall 0xc03b8e10: inet_init+0x0/0x3ba() [ 1.061895] NET: Registered protocol family 2 [ 1.339007] IP route cache hash table entries: 32768 (order: 5, 131072 bytes) [ 1.339470] TCP established hash table entries: 131072 (order: 8, 1048576 bytes) [ 1.340375] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes) [ 1.349809] TCP: Hash tables configured (established 131072 bind 65536) [ 1.349906] TCP reno registered [ 1.098514] initcall 0xc03b8e10: inet_init+0x0/0x3ba() returned 0. [ 1.098604] initcall 0xc03b8e10 ran for 32 msecs: inet_init+0x0/0x3ba() This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline. > ------- Comment #22 from rjw@sisk.pl 2008-02-29 04:35 -------
> This entry is being used for tracking a regression from 2.6.24. Please don't
> close it until the problem is fixed in the mainline.
Sorry! It's my bad: I didn't know the rules and asked Yannick to do this.
Jarek P.
I promise i won't do it again ! ;-) Regressions list annotation: Patch : http://lkml.org/lkml/2008/2/2/11 Handled-By : "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> fixed by commit 2232c2d8e0a6a31061dec311f3d1cf7624bc14f1 |