Bug 10071

Summary: kernel hang in inet_init
Product: Networking Reporter: Yannick Dirou (yannick.dirou)
Component: IPV4Assignee: Jarek Poplawski (jarkao2)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: acpi-bugzilla, bunk, jarkao2
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.25 rc2 latest git Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    
Attachments: my config
dmesg 2.6.25-rc2 with acpi on
dmesg 2.6.25-rc3 with acpi=ht
add some debugging
dmesg 2.6.25-rc3 with pnpacpi=off
dmesg 2.6.25-rc3 with clocksource=jiffies
dmesg 2.6.25-rc3 + debug patch printj.2.diff
add some more debugging to af_inet

Description Yannick Dirou 2008-02-23 00:34:05 UTC
Distribution:ubuntu hardy
Hardware Environment:dell dimension 5150
Problem Description:
kernel hang mostly but boot slowly sometimes

Steps to reproduce:
boot

when the computer manage to boot, inet_init last some time:
[    1.725306] NET: Registered protocol family 2
[    6.825384] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    6.825803] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    6.826691] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    6.836160] TCP: Hash tables configured (established 131072 bind 65536)
[    6.836255] TCP reno registered
[    7.081569] initcall 0xc03b8920: inet_init+0x0/0x3aa() returned 0.
[    7.081569] initcall 0xc03b8920 ran for 5106 msecs: inet_init+0x0/0x3aa()

When booting with acpi=ht it works:
[0.124668] Calling initcall 0xc03b8920: inet_init+0x0/0x3aa()
[    0.124867] NET: Registered protocol family 2
[    0.288067] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.288856] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.289739] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    0.299156] TCP: Hash tables configured (established 131072 bind 65536)
[    0.299250] TCP reno registered
Feb 23 09:10:28 tiyann kernel: [    0.162267] initcall 0xc03b8920: inet_init+0x0/0x3aa() returned 0.
[    0.162383] initcall 0xc03b8920 ran for 33 msecs: inet_init+0x0/0x3aa()

there seem to be a weird interaction between acpi and inet_init

any hint on how to provide better information
thanks
Comment 1 Yannick Dirou 2008-02-23 00:35:14 UTC
Created attachment 14956 [details]
my config
Comment 2 Anonymous Emailer 2008-02-25 14:02:01 UTC
Reply-To: akpm@linux-foundation.org

On Sat, 23 Feb 2008 00:34:06 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10071
> 
>            Summary: kernel hang in inet_init
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.25 rc2 latest git
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: yannick.dirou@axetic.com
> 
> 
> Distribution:ubuntu hardy
> Hardware Environment:dell dimension 5150
> Problem Description:
> kernel hang mostly but boot slowly sometimes
> 
> Steps to reproduce:
> boot
> 
> when the computer manage to boot, inet_init last some time:
> [    1.725306] NET: Registered protocol family 2
> [    6.825384] IP route cache hash table entries: 32768 (order: 5, 131072
> bytes)
> [    6.825803] TCP established hash table entries: 131072 (order: 8, 1048576
> bytes)
> [    6.826691] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
> [    6.836160] TCP: Hash tables configured (established 131072 bind 65536)
> [    6.836255] TCP reno registered
> [    7.081569] initcall 0xc03b8920: inet_init+0x0/0x3aa() returned 0.
> [    7.081569] initcall 0xc03b8920 ran for 5106 msecs: inet_init+0x0/0x3aa()
> 
> When booting with acpi=ht it works:
> [0.124668] Calling initcall 0xc03b8920: inet_init+0x0/0x3aa()
> [    0.124867] NET: Registered protocol family 2
> [    0.288067] IP route cache hash table entries: 32768 (order: 5, 131072
> bytes)
> [    0.288856] TCP established hash table entries: 131072 (order: 8, 1048576
> bytes)
> [    0.289739] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
> [    0.299156] TCP: Hash tables configured (established 131072 bind 65536)
> [    0.299250] TCP reno registered
> Feb 23 09:10:28 tiyann kernel: [    0.162267] initcall 0xc03b8920:
> inet_init+0x0/0x3aa() returned 0.
> [    0.162383] initcall 0xc03b8920 ran for 33 msecs: inet_init+0x0/0x3aa()
> 
> there seem to be a weird interaction between acpi and inet_init
> 
> any hint on how to provide better information
> thanks
> 
Comment 3 Jarek Poplawski 2008-02-27 05:49:26 UTC
On 25-02-2008 23:00, Andrew Morton wrote:
> On Sat, 23 Feb 2008 00:34:06 -0800 (PST) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=10071
...
>>      KernelVersion: 2.6.25 rc2 latest git
...
>> there seem to be a weird interaction between acpi and inet_init
>>
>> any hint on how to provide better information

Yannick, here are a few questions:

- is this bug just with this git, but OK before (or 2.6.25-rc3?)?
- do all other drivers work properly and only net is a problem?
- could you add to bugzilla full dmesg in both cases?

Regards,
Jarek P.
Comment 4 Yannick Dirou 2008-02-27 06:08:07 UTC
(In reply to comment #3)
> 
> Yannick, here are a few questions:
> 
> - is this bug just with this git, but OK before (or 2.6.25-rc3?)?
don't know for sure but i think it appeared between 2.6.24 an 2.6.25
same behavior with 2.6.25-rc3 by the way
> - do all other drivers work properly and only net is a problem?
If i manage to boot (i have to shutdown / reboot computer three of four time most of the time) everything works ok , including network
> - could you add to bugzilla full dmesg in both cases?
sure , will do
Comment 5 Yannick Dirou 2008-02-27 06:09:30 UTC
Created attachment 15026 [details]
dmesg 2.6.25-rc2 with acpi on
Comment 6 Yannick Dirou 2008-02-27 06:34:19 UTC
Created attachment 15027 [details]
dmesg 2.6.25-rc3 with acpi=ht
Comment 7 Jarek Poplawski 2008-02-27 16:24:01 UTC
Created attachment 15038 [details]
add some debugging

Yannick, could you try this patch with 2.6.25-rc3 and send dmesg ("no ht" only)? It only adds some simple debugging. And BTW, could you try if boot parameter clocksource=jiffies makes any difference? Thanks, Jarek P.
Comment 8 Jarek Poplawski 2008-02-28 04:35:11 UTC
One more exercise: could you try at boot pnpacpi=off as well (even before patch)?
Comment 9 Yannick Dirou 2008-02-28 09:22:21 UTC
Created attachment 15062 [details]
dmesg 2.6.25-rc3 with pnpacpi=off

inet_init still slow (or hang!)
Comment 10 Yannick Dirou 2008-02-28 09:50:33 UTC
Created attachment 15064 [details]
dmesg 2.6.25-rc3 with clocksource=jiffies

only 33 msecs for inet_init
but it breaks my keyboard with xorg
i had to boot with the single option
Comment 11 Yannick Dirou 2008-02-28 10:31:51 UTC
Created attachment 15066 [details]
dmesg 2.6.25-rc3 + debug patch printj.2.diff
Comment 12 Jarek Poplawski 2008-02-28 11:53:43 UTC
Created attachment 15071 [details]
add some more debugging to af_inet

Yannick, here is one more simple patch. Should be applied without removing previous one. (If you have more time it would be interesting to try these
boot options too: clocksource=acpi_pm and maybe highres=off - but no need
to send dmesg). Thanks again, Jarek P.
Comment 13 Jarek Poplawski 2008-02-28 12:18:46 UTC
Actually, one more - even more interesting idea: could you recompile your kernel after changing in menuconfig "RCU implementation type" to Classic RCU?
Comment 14 Yannick Dirou 2008-02-28 12:26:04 UTC
(In reply to comment #12)
> Created an attachment (id=15071) [details]
> add some more debugging to af_inet
> 
> Yannick, here is one more simple patch. Should be applied without removing
> previous one. 
[    1.316999] Calling initcall 0xc03b8ea0: inet_init+0x0/0x4da()
[    1.317223] NET: Registered protocol family 2
[    1.317280] 4294668611 net/ipv4/af_inet.c 1387
[    1.317339] 4294668611 net/ipv4/af_inet.c 1390
[    1.317386] 4294668611 net/ipv4/af_inet.c 1393
[    1.317432] 4294668611 net/ipv4/af_inet.c 1402
[    1.317478] 4294668611 net/ipv4/af_inet.c 1407
[    1.071904] 0 4294668617l
[    4.436399] 0 4294671982l
[    7.483941] 0 4294675030l
[    7.483941] 4294675030 net/ipv4/af_inet.c 1419
[    7.483941] 4294675030 net/ipv4/af_inet.c 1426
[    7.483941] 4294675030 net/ipv4/route.c 2998
[    7.483941] 4294675030 net/ipv4/route.c 3009
[    7.483941] 4294675030 net/ipv4/route.c 3016
[    7.483941] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    7.483941] 4294675030 net/ipv4/route.c 3027
[    7.484691] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    7.485578] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    7.495032] TCP: Hash tables configured (established 131072 bind 65536)
[    7.495128] TCP reno registered
[    7.755912] initcall 0xc03b8ea0: inet_init+0x0/0x4da() returned 0.
[    7.755912] initcall 0xc03b8ea0 ran for 6138 msecs: inet_init+0x0/0x4da()
Comment 15 Yannick Dirou 2008-02-28 12:31:25 UTC
with clocksource=acpi_pm

[    1.256155] NET: Registered protocol family 2
[    1.256201] 4294668550 net/ipv4/af_inet.c 1387
[    1.256260] 4294668550 net/ipv4/af_inet.c 1390
[    1.256306] 4294668550 net/ipv4/af_inet.c 1393
[    1.256343] 4294668550 net/ipv4/af_inet.c 1402
[    1.256389] 4294668550 net/ipv4/af_inet.c 1407
[    1.007131] 0 4294668556l
[    4.371675] 0 4294671921l
[    7.480234] 0 4294675030l
[    7.480234] 4294675030 net/ipv4/af_inet.c 1419
[    7.480234] 4294675030 net/ipv4/af_inet.c 1426
[    7.480234] 4294675030 net/ipv4/route.c 2998
[    7.480234] 4294675030 net/ipv4/route.c 3009
[    7.480234] 4294675030 net/ipv4/route.c 3016
[    7.480234] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    7.480234] 4294675030 net/ipv4/route.c 3027
[    7.481145] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    7.482044] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    7.491739] TCP: Hash tables configured (established 131072 bind 65536)
[    7.491835] TCP reno registered
Comment 16 Yannick Dirou 2008-02-28 12:35:39 UTC
fast boot with highres=off

[    1.038282] NET: Registered protocol family 2
[    1.038329] 4294668592 net/ipv4/af_inet.c 1387
[    1.038387] 4294668592 net/ipv4/af_inet.c 1390
[    1.038433] 4294668592 net/ipv4/af_inet.c 1393
[    1.038479] 4294668592 net/ipv4/af_inet.c 1402
[    1.038524] 4294668592 net/ipv4/af_inet.c 1407
[    1.044807] 0 4294668598l
[    1.050811] 0 4294668604l
[    1.056816] 0 4294668610l
[    1.056816] 4294668610 net/ipv4/af_inet.c 1419
[    1.056816] 4294668610 net/ipv4/af_inet.c 1426
[    1.056816] 4294668610 net/ipv4/route.c 2998
[    1.056816] 4294668610 net/ipv4/route.c 3009
[    1.056816] 4294668610 net/ipv4/route.c 3016
[    1.056816] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    1.056816] 4294668610 net/ipv4/route.c 3027
[    1.057544] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    1.058426] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    1.067825] TCP: Hash tables configured (established 131072 bind 65536)
[    1.067923] TCP reno registered
Comment 17 Yannick Dirou 2008-02-28 12:53:02 UTC
bingo with classic rcu
[    0.770854] NET: Registered protocol family 2
[    0.770868] 4294668327 net/ipv4/af_inet.c 1387
[    0.770884] 4294668327 net/ipv4/af_inet.c 1390
[    0.770888] 4294668327 net/ipv4/af_inet.c 1393
[    0.770891] 4294668327 net/ipv4/af_inet.c 1402
[    0.770893] 4294668327 net/ipv4/af_inet.c 1407
[    0.773078] 0 4294668328l
[    0.774036] 0 4294668329l
[    0.775037] 0 4294668330l
[    0.775040] 4294668330 net/ipv4/af_inet.c 1419
[    0.775077] 4294668330 net/ipv4/af_inet.c 1426
[    0.775077] 4294668330 net/ipv4/route.c 2998
[    0.775077] 4294668330 net/ipv4/route.c 3009
[    0.775077] 4294668330 net/ipv4/route.c 3016
[    0.775077] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.775077] 4294668330 net/ipv4/route.c 3027
[    0.775492] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.776318] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    0.785670] TCP: Hash tables configured (established 131072 bind 65536)
[    0.785715] TCP reno registered
Comment 18 Jarek Poplawski 2008-02-28 13:30:06 UTC
On Thu, Feb 28, 2008 at 12:53:03PM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10071
...
> ------- Comment #17 from yannick.dirou@axetic.com  2008-02-28 12:53 -------
> bingo with classic rcu

Congratulations Yannick!

So, it seems mainly RCU vs. timer(?) problem. I send CC of this message
to RCU author, linux-kernel and netdev lists.

I'm not sure how much time you could/will to offer for this debugging.
Anyway, if it's possible I think it would be interesting to check with
CONFIG_PREEMPT_RCU = y again, but CONFIG_HOTPLUG_CPU = n ("Processor
type and features --> Support for suspend on SMP and ...") in the
meantime.

Many thanks,
Jarek P.


> [    0.770854] NET: Registered protocol family 2
> [    0.770868] 4294668327 net/ipv4/af_inet.c 1387
> [    0.770884] 4294668327 net/ipv4/af_inet.c 1390
> [    0.770888] 4294668327 net/ipv4/af_inet.c 1393
> [    0.770891] 4294668327 net/ipv4/af_inet.c 1402
> [    0.770893] 4294668327 net/ipv4/af_inet.c 1407
> [    0.773078] 0 4294668328l
> [    0.774036] 0 4294668329l
> [    0.775037] 0 4294668330l
> [    0.775040] 4294668330 net/ipv4/af_inet.c 1419
> [    0.775077] 4294668330 net/ipv4/af_inet.c 1426
> [    0.775077] 4294668330 net/ipv4/route.c 2998
> [    0.775077] 4294668330 net/ipv4/route.c 3009
> [    0.775077] 4294668330 net/ipv4/route.c 3016
> [    0.775077] IP route cache hash table entries: 32768 (order: 5, 131072
> bytes)
> [    0.775077] 4294668330 net/ipv4/route.c 3027
> [    0.775492] TCP established hash table entries: 131072 (order: 8, 1048576
> bytes)
> [    0.776318] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
> [    0.785670] TCP: Hash tables configured (established 131072 bind 65536)
> [    0.785715] TCP reno registered
> 
> 
> -- 
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
> You are the assignee for the bug, or are watching the assignee.
Comment 19 Anonymous Emailer 2008-02-28 16:58:44 UTC
Reply-To: paulmck@linux.vnet.ibm.com

On Thu, Feb 28, 2008 at 10:34:57PM +0100, Jarek Poplawski wrote:
> On Thu, Feb 28, 2008 at 12:53:03PM -0800, bugme-daemon@bugzilla.kernel.org
> wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10071
> ...
> > ------- Comment #17 from yannick.dirou@axetic.com  2008-02-28 12:53 -------
> > bingo with classic rcu
> 
> Congratulations Yannick!
> 
> So, it seems mainly RCU vs. timer(?) problem. I send CC of this message
> to RCU author, linux-kernel and netdev lists.

Interesting!  Any chance of a pointer to the original messages?
A quick Google search and a look at LKML showed me nothing.  :-/

						Thanx, Paul

> I'm not sure how much time you could/will to offer for this debugging.
> Anyway, if it's possible I think it would be interesting to check with
> CONFIG_PREEMPT_RCU = y again, but CONFIG_HOTPLUG_CPU = n ("Processor
> type and features --> Support for suspend on SMP and ...") in the
> meantime.
> 
> Many thanks,
> Jarek P.
> 
> 
> > [    0.770854] NET: Registered protocol family 2
> > [    0.770868] 4294668327 net/ipv4/af_inet.c 1387
> > [    0.770884] 4294668327 net/ipv4/af_inet.c 1390
> > [    0.770888] 4294668327 net/ipv4/af_inet.c 1393
> > [    0.770891] 4294668327 net/ipv4/af_inet.c 1402
> > [    0.770893] 4294668327 net/ipv4/af_inet.c 1407
> > [    0.773078] 0 4294668328l
> > [    0.774036] 0 4294668329l
> > [    0.775037] 0 4294668330l
> > [    0.775040] 4294668330 net/ipv4/af_inet.c 1419
> > [    0.775077] 4294668330 net/ipv4/af_inet.c 1426
> > [    0.775077] 4294668330 net/ipv4/route.c 2998
> > [    0.775077] 4294668330 net/ipv4/route.c 3009
> > [    0.775077] 4294668330 net/ipv4/route.c 3016
> > [    0.775077] IP route cache hash table entries: 32768 (order: 5, 131072
> > bytes)
> > [    0.775077] 4294668330 net/ipv4/route.c 3027
> > [    0.775492] TCP established hash table entries: 131072 (order: 8,
> 1048576
> > bytes)
> > [    0.776318] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
> > [    0.785670] TCP: Hash tables configured (established 131072 bind 65536)
> > [    0.785715] TCP reno registered
> > 
> > 
> > -- 
> > Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> > ------- You are receiving this mail because: -------
> > You are on the CC list for the bug, or are watching someone who is.
> > You are the assignee for the bug, or are watching the assignee.
Comment 20 Anonymous Emailer 2008-02-28 20:32:56 UTC
Reply-To: paulmck@linux.vnet.ibm.com

On Thu, Feb 28, 2008 at 04:58:23PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 28, 2008 at 10:34:57PM +0100, Jarek Poplawski wrote:
> > On Thu, Feb 28, 2008 at 12:53:03PM -0800, bugme-daemon@bugzilla.kernel.org
> wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=10071
> > ...
> > > ------- Comment #17 from yannick.dirou@axetic.com  2008-02-28 12:53
> -------
> > > bingo with classic rcu
> > 
> > Congratulations Yannick!
> > 
> > So, it seems mainly RCU vs. timer(?) problem. I send CC of this message
> > to RCU author, linux-kernel and netdev lists.
> 
> Interesting!  Any chance of a pointer to the original messages?
> A quick Google search and a look at LKML showed me nothing.  :-/

Thanks to Raphael for pointing me at the above bugzilla URL!

Yannick,

Would you be willing to try your test again after applying the
patch in http://lkml.org/lkml/2008/2/2/11 and rebuilding your
kernel?

This patch does not appear to be in 2.6.25-rc3.  :-(

I am reposting this patch separately.

							Thanx, Paul
Comment 21 Yannick Dirou 2008-02-28 22:47:44 UTC
(In reply to comment #20)
> Reply-To: paulmck@linux.vnet.ibm.com
> 
> Would you be willing to try your test again after applying the
> patch in http://lkml.org/lkml/2008/2/2/11 and rebuilding your
> kernel?
> 
> This patch does not appear to be in 2.6.25-rc3.  :-(
> 
> I am reposting this patch separately.
> 
>                                                         Thanx, Paul

Congratulations Paul and thanks a lot to Jarek , It works, 
no more hang or slow boot :

[    1.061675] Calling initcall 0xc03b8e10: inet_init+0x0/0x3ba()
[    1.061895] NET: Registered protocol family 2
[    1.339007] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    1.339470] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    1.340375] TCP bind hash table entries: 65536 (order: 9, 2359296 bytes)
[    1.349809] TCP: Hash tables configured (established 131072 bind 65536)
[    1.349906] TCP reno registered
[    1.098514] initcall 0xc03b8e10: inet_init+0x0/0x3ba() returned 0.
[    1.098604] initcall 0xc03b8e10 ran for 32 msecs: inet_init+0x0/0x3ba()
Comment 22 Rafael J. Wysocki 2008-02-29 04:35:50 UTC
This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 23 Jarek Poplawski 2008-02-29 04:42:01 UTC
> ------- Comment #22 from rjw@sisk.pl  2008-02-29 04:35 -------
> This entry is being used for tracking a regression from 2.6.24.  Please don't
> close it until the problem is fixed in the mainline.

Sorry! It's my bad: I didn't know the rules and asked Yannick to do this.
Jarek P.
Comment 24 Yannick Dirou 2008-02-29 06:26:24 UTC
I promise i won't do it again !
;-)
Comment 25 Rafael J. Wysocki 2008-03-02 16:47:03 UTC
Regressions list annotation:
Patch : http://lkml.org/lkml/2008/2/2/11
Handled-By : "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Comment 26 Adrian Bunk 2008-03-09 17:38:42 UTC
fixed by commit 2232c2d8e0a6a31061dec311f3d1cf7624bc14f1