Bug 10326

Summary: inconsistent lock state in net_rx_action
Product: Networking Reporter: Marcus Better (marcus)
Component: OtherAssignee: Arnaldo Carvalho de Melo (acme)
Status: CLOSED CODE_FIX    
Severity: normal CC: rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.25-rc6 (git a4083c9271e0a697278e089f2c0b9a95363ada0a) Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    
Attachments: Kernel config
Output of lspci -v

Description Marcus Better 2008-03-25 13:21:22 UTC
Latest working kernel version: 2.6.24
Earliest failing kernel version: 2.6.25-rc6 (git a4083c9271e0a697278e089f2c0b9a95363ada0a)
Distribution: Debian testing/unstable
Hardware Environment: LG LE50 Express laptop, i386
Software Environment: Debian i386, X.org 7.3, KDE 4

Problem Description:

Got this during bootup, somewhere before launching X. It only happened once, I have rebooted a few times and didn't reproduce it.

Mar 25 19:48:46 better kernel: =================================
Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ]
Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4
Mar 25 19:48:46 better kernel: ---------------------------------
Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W} usage.
Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes:
Mar 25 19:48:46 better kernel:  (&napi->poll_lock){-+..}, at: [<c03db11c>] netpoll_poll+0xbc/0x390
Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at:
Mar 25 19:48:46 better kernel:   [<c024171a>] __lock_acquire+0x36a/0x1070
Mar 25 19:48:46 better kernel:   [<c024153f>] __lock_acquire+0x18f/0x1070
Mar 25 19:48:46 better kernel:   [<c024247f>] lock_acquire+0x5f/0x80
Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
Mar 25 19:48:46 better kernel:   [<c044a9e4>] _spin_lock+0x34/0x40
Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
Mar 25 19:48:46 better kernel:   [<c03cfaa0>] net_rx_action+0x60/0x1d0
Mar 25 19:48:46 better kernel:   [<c02254f2>] __do_softirq+0x62/0xc0
Mar 25 19:48:46 better kernel:   [<c0225595>] do_softirq+0x45/0x50
Mar 25 19:48:46 better kernel:   [<c0225797>] irq_exit+0x77/0x80
Mar 25 19:48:46 better kernel:   [<c0211a95>] smp_apic_timer_interrupt+0x45/0x80
Mar 25 19:48:46 better kernel:   [<c02058ad>] apic_timer_interrupt+0x29/0x38
Mar 25 19:48:46 better kernel:   [<c02058b7>] apic_timer_interrupt+0x33/0x38
Mar 25 19:48:46 better kernel:   [<c022007b>] __set_personality+0x2b/0x240
Mar 25 19:48:46 better kernel:   [<c0220e51>] release_console_sem+0x191/0x1e0
Mar 25 19:48:46 better kernel:   [<c05751aa>] init_netconsole+0x1da/0x1f0
Mar 25 19:48:46 better kernel:   [<c0558700>] kernel_init+0x90/0x270
Mar 25 19:48:46 better kernel:   [<c0204ecf>] restore_nocheck+0x12/0x15
Mar 25 19:48:46 better kernel:   [<c0240eec>] trace_hardirqs_on+0x9c/0x110
Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
Mar 25 19:48:46 better kernel:   [<c0205a23>] kernel_thread_helper+0x7/0x14
Mar 25 19:48:46 better kernel:   [<ffffffff>] 0xffffffff
Mar 25 19:48:46 better kernel: irq event stamp: 174638
Mar 25 19:48:46 better kernel: hardirqs last  enabled at (174637): [<c0204ecf>] restore_nocheck+0x12/0x15
Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): [<c044ac65>] _spin_lock_irqsave+0x15/0x60
Mar 25 19:48:46 better kernel: softirqs last  enabled at (174636): [<c0225595>] do_softirq+0x45/0x50
Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): [<c0225595>] do_softirq+0x45/0x50
Mar 25 19:48:46 better kernel: 
Mar 25 19:48:46 better kernel: other info that might help us debug this:
Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773:
Mar 25 19:48:46 better kernel:  #0:  (&tty->atomic_write_lock){--..}, at: [<c036c699>] tty_write_lock+0x19/0x50
Mar 25 19:48:46 better kernel:  #1:  (target_list_lock){--..}, at: [<c0399c0d>] write_msg+0x2d/0xd0
Mar 25 19:48:46 better kernel:  #2:  (&napi->poll_lock){-+..}, at: [<c03db11c>] netpoll_poll+0xbc/0x390
Mar 25 19:48:46 better kernel: 
Mar 25 19:48:46 better kernel: stack backtrace:
Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted 2.6.25-rc6-lg #4
Mar 25 19:48:46 better kernel:  [<c023ff6f>] print_usage_bug+0x15f/0x170
Mar 25 19:48:46 better kernel:  [<c0240bd1>] mark_lock+0x401/0x590
Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
Mar 25 19:48:46 better kernel:  [<c03db11c>] netpoll_poll+0xbc/0x390
Mar 25 19:48:46 better kernel:  [<c044adb2>] _spin_trylock+0x42/0x80
Mar 25 19:48:46 better kernel:  [<c03db0d3>] netpoll_poll+0x73/0x390
Mar 25 19:48:46 better kernel:  [<c03db51f>] netpoll_send_skb+0x12f/0x1d0
Mar 25 19:48:46 better kernel:  [<c0399c7c>] write_msg+0x9c/0xd0
Mar 25 19:48:46 better kernel:  [<c0220bd4>] __call_console_drivers+0x54/0x70
Mar 25 19:48:46 better kernel:  [<c0220d90>] release_console_sem+0xd0/0x1e0
Mar 25 19:48:46 better kernel:  [<c037a8d9>] do_con_write+0x189/0x1a80
Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
Mar 25 19:48:46 better kernel:  [<c044affc>] _spin_unlock_irqrestore+0x4c/0x60
Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
Mar 25 19:48:46 better kernel:  [<c044afe7>] _spin_unlock_irqrestore+0x37/0x60
Mar 25 19:48:46 better kernel:  [<c037c1f7>] con_put_char+0x27/0x30
Mar 25 19:48:46 better kernel:  [<c036e6ed>] opost+0xad/0x220
Mar 25 19:48:46 better kernel:  [<c036f08b>] write_chan+0x1ab/0x350
Mar 25 19:48:46 better kernel:  [<c021a790>] default_wake_function+0x0/0x10
Mar 25 19:48:46 better kernel:  [<c036c87b>] tty_write+0x13b/0x1d0
Mar 25 19:48:47 better kernel:  [<c036eee0>] write_chan+0x0/0x350
Mar 25 19:48:47 better kernel:  [<c02802e3>] do_loop_readv_writev+0x33/0x60
Mar 25 19:48:47 better kernel:  [<c0280d50>] do_readv_writev+0x170/0x180
Mar 25 19:48:47 better kernel:  [<c036c740>] tty_write+0x0/0x1d0
Mar 25 19:48:47 better kernel:  [<c0280d9c>] vfs_writev+0x3c/0x50
Mar 25 19:48:47 better kernel:  [<c0281261>] sys_writev+0x41/0x70
Mar 25 19:48:47 better kernel:  [<c0204de6>] sysenter_past_esp+0x5f/0xa5
Mar 25 19:48:47 better kernel:  =======================
Comment 1 Marcus Better 2008-03-25 13:22:29 UTC
Created attachment 15431 [details]
Kernel config
Comment 2 Marcus Better 2008-03-25 13:23:23 UTC
Created attachment 15432 [details]
Output of lspci -v
Comment 3 Anonymous Emailer 2008-03-25 13:43:57 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 25 Mar 2008 13:21:23 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10326
> 
>            Summary: inconsistent lock state in net_rx_action
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.25-rc6 (git
>                     a4083c9271e0a697278e089f2c0b9a95363ada0a)
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: marcus@better.se
> 
> 
> Latest working kernel version: 2.6.24
> Earliest failing kernel version: 2.6.25-rc6 (git
> a4083c9271e0a697278e089f2c0b9a95363ada0a)

A post-2.624 regression.

> Distribution: Debian testing/unstable
> Hardware Environment: LG LE50 Express laptop, i386
> Software Environment: Debian i386, X.org 7.3, KDE 4
> 
> Problem Description:
> 
> Got this during bootup, somewhere before launching X. It only happened once,
> I
> have rebooted a few times and didn't reproduce it.
> 
> Mar 25 19:48:46 better kernel: =================================
> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ]
> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4
> Mar 25 19:48:46 better kernel: ---------------------------------
> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W}
> usage.
> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes:
> Mar 25 19:48:46 better kernel:  (&napi->poll_lock){-+..}, at: [<c03db11c>]
> netpoll_poll+0xbc/0x390
> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at:
> Mar 25 19:48:46 better kernel:   [<c024171a>] __lock_acquire+0x36a/0x1070
> Mar 25 19:48:46 better kernel:   [<c024153f>] __lock_acquire+0x18f/0x1070
> Mar 25 19:48:46 better kernel:   [<c024247f>] lock_acquire+0x5f/0x80
> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
> Mar 25 19:48:46 better kernel:   [<c044a9e4>] _spin_lock+0x34/0x40
> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
> Mar 25 19:48:46 better kernel:   [<c03cfaa0>] net_rx_action+0x60/0x1d0
> Mar 25 19:48:46 better kernel:   [<c02254f2>] __do_softirq+0x62/0xc0
> Mar 25 19:48:46 better kernel:   [<c0225595>] do_softirq+0x45/0x50
> Mar 25 19:48:46 better kernel:   [<c0225797>] irq_exit+0x77/0x80
> Mar 25 19:48:46 better kernel:   [<c0211a95>]
> smp_apic_timer_interrupt+0x45/0x80
> Mar 25 19:48:46 better kernel:   [<c02058ad>] apic_timer_interrupt+0x29/0x38
> Mar 25 19:48:46 better kernel:   [<c02058b7>] apic_timer_interrupt+0x33/0x38
> Mar 25 19:48:46 better kernel:   [<c022007b>] __set_personality+0x2b/0x240
> Mar 25 19:48:46 better kernel:   [<c0220e51>] release_console_sem+0x191/0x1e0
> Mar 25 19:48:46 better kernel:   [<c05751aa>] init_netconsole+0x1da/0x1f0
> Mar 25 19:48:46 better kernel:   [<c0558700>] kernel_init+0x90/0x270
> Mar 25 19:48:46 better kernel:   [<c0204ecf>] restore_nocheck+0x12/0x15
> Mar 25 19:48:46 better kernel:   [<c0240eec>] trace_hardirqs_on+0x9c/0x110
> Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
> Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
> Mar 25 19:48:46 better kernel:   [<c0205a23>] kernel_thread_helper+0x7/0x14
> Mar 25 19:48:46 better kernel:   [<ffffffff>] 0xffffffff
> Mar 25 19:48:46 better kernel: irq event stamp: 174638
> Mar 25 19:48:46 better kernel: hardirqs last  enabled at (174637):
> [<c0204ecf>]
> restore_nocheck+0x12/0x15
> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638):
> [<c044ac65>]
> _spin_lock_irqsave+0x15/0x60
> Mar 25 19:48:46 better kernel: softirqs last  enabled at (174636):
> [<c0225595>]
> do_softirq+0x45/0x50
> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629):
> [<c0225595>]
> do_softirq+0x45/0x50
> Mar 25 19:48:46 better kernel: 
> Mar 25 19:48:46 better kernel: other info that might help us debug this:
> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773:
> Mar 25 19:48:46 better kernel:  #0:  (&tty->atomic_write_lock){--..}, at:
> [<c036c699>] tty_write_lock+0x19/0x50
> Mar 25 19:48:46 better kernel:  #1:  (target_list_lock){--..}, at:
> [<c0399c0d>]
> write_msg+0x2d/0xd0
> Mar 25 19:48:46 better kernel:  #2:  (&napi->poll_lock){-+..}, at:
> [<c03db11c>]
> netpoll_poll+0xbc/0x390
> Mar 25 19:48:46 better kernel: 
> Mar 25 19:48:46 better kernel: stack backtrace:
> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted
> 2.6.25-rc6-lg #4
> Mar 25 19:48:46 better kernel:  [<c023ff6f>] print_usage_bug+0x15f/0x170
> Mar 25 19:48:46 better kernel:  [<c0240bd1>] mark_lock+0x401/0x590
> Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
> Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
> Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
> Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
> Mar 25 19:48:46 better kernel:  [<c03db11c>] netpoll_poll+0xbc/0x390
> Mar 25 19:48:46 better kernel:  [<c044adb2>] _spin_trylock+0x42/0x80
> Mar 25 19:48:46 better kernel:  [<c03db0d3>] netpoll_poll+0x73/0x390
> Mar 25 19:48:46 better kernel:  [<c03db51f>] netpoll_send_skb+0x12f/0x1d0
> Mar 25 19:48:46 better kernel:  [<c0399c7c>] write_msg+0x9c/0xd0
> Mar 25 19:48:46 better kernel:  [<c0220bd4>] __call_console_drivers+0x54/0x70
> Mar 25 19:48:46 better kernel:  [<c0220d90>] release_console_sem+0xd0/0x1e0
> Mar 25 19:48:46 better kernel:  [<c037a8d9>] do_con_write+0x189/0x1a80
> Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
> Mar 25 19:48:46 better kernel:  [<c044affc>]
> _spin_unlock_irqrestore+0x4c/0x60
> Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
> Mar 25 19:48:46 better kernel:  [<c044afe7>]
> _spin_unlock_irqrestore+0x37/0x60
> Mar 25 19:48:46 better kernel:  [<c037c1f7>] con_put_char+0x27/0x30
> Mar 25 19:48:46 better kernel:  [<c036e6ed>] opost+0xad/0x220
> Mar 25 19:48:46 better kernel:  [<c036f08b>] write_chan+0x1ab/0x350
> Mar 25 19:48:46 better kernel:  [<c021a790>] default_wake_function+0x0/0x10
> Mar 25 19:48:46 better kernel:  [<c036c87b>] tty_write+0x13b/0x1d0
> Mar 25 19:48:47 better kernel:  [<c036eee0>] write_chan+0x0/0x350
> Mar 25 19:48:47 better kernel:  [<c02802e3>] do_loop_readv_writev+0x33/0x60
> Mar 25 19:48:47 better kernel:  [<c0280d50>] do_readv_writev+0x170/0x180
> Mar 25 19:48:47 better kernel:  [<c036c740>] tty_write+0x0/0x1d0
> Mar 25 19:48:47 better kernel:  [<c0280d9c>] vfs_writev+0x3c/0x50
> Mar 25 19:48:47 better kernel:  [<c0281261>] sys_writev+0x41/0x70
> Mar 25 19:48:47 better kernel:  [<c0204de6>] sysenter_past_esp+0x5f/0xa5
> Mar 25 19:48:47 better kernel:  =======================
> 
Comment 4 Jarek Poplawski 2008-03-26 16:03:33 UTC
Andrew Morton wrote, On 03/25/2008 09:43 PM:

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 25 Mar 2008 13:21:23 -0700 (PDT)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=10326
>>
>>            Summary: inconsistent lock state in net_rx_action
>>            Product: Networking
>>            Version: 2.5
>>      KernelVersion: 2.6.25-rc6 (git
>>                     a4083c9271e0a697278e089f2c0b9a95363ada0a)
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>         AssignedTo: acme@ghostprotocols.net
>>         ReportedBy: marcus@better.se
>>
>>
>> Latest working kernel version: 2.6.24
>> Earliest failing kernel version: 2.6.25-rc6 (git
>> a4083c9271e0a697278e089f2c0b9a95363ada0a)
> 
> A post-2.624 regression.
> 
>> Distribution: Debian testing/unstable
>> Hardware Environment: LG LE50 Express laptop, i386
>> Software Environment: Debian i386, X.org 7.3, KDE 4
>>
>> Problem Description:
>>
>> Got this during bootup, somewhere before launching X. It only happened once,
>> I
>> have rebooted a few times and didn't reproduce it.
>>
>> Mar 25 19:48:46 better kernel: =================================
>> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ]
>> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4
>> Mar 25 19:48:46 better kernel: ---------------------------------
>> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W}
>> usage.
>> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> Mar 25 19:48:46 better kernel:  (&napi->poll_lock){-+..}, at: [<c03db11c>]
>> netpoll_poll+0xbc/0x390
>> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at:
>> Mar 25 19:48:46 better kernel:   [<c024171a>] __lock_acquire+0x36a/0x1070
>> Mar 25 19:48:46 better kernel:   [<c024153f>] __lock_acquire+0x18f/0x1070
>> Mar 25 19:48:46 better kernel:   [<c024247f>] lock_acquire+0x5f/0x80
>> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
>> Mar 25 19:48:46 better kernel:   [<c044a9e4>] _spin_lock+0x34/0x40
>> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
>> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
>> Mar 25 19:48:46 better kernel:   [<c03cfaa0>] net_rx_action+0x60/0x1d0
>> Mar 25 19:48:46 better kernel:   [<c02254f2>] __do_softirq+0x62/0xc0
>> Mar 25 19:48:46 better kernel:   [<c0225595>] do_softirq+0x45/0x50
>> Mar 25 19:48:46 better kernel:   [<c0225797>] irq_exit+0x77/0x80
>> Mar 25 19:48:46 better kernel:   [<c0211a95>]
>> smp_apic_timer_interrupt+0x45/0x80
>> Mar 25 19:48:46 better kernel:   [<c02058ad>] apic_timer_interrupt+0x29/0x38
>> Mar 25 19:48:46 better kernel:   [<c02058b7>] apic_timer_interrupt+0x33/0x38
>> Mar 25 19:48:46 better kernel:   [<c022007b>] __set_personality+0x2b/0x240
>> Mar 25 19:48:46 better kernel:   [<c0220e51>]
>> release_console_sem+0x191/0x1e0
>> Mar 25 19:48:46 better kernel:   [<c05751aa>] init_netconsole+0x1da/0x1f0
>> Mar 25 19:48:46 better kernel:   [<c0558700>] kernel_init+0x90/0x270
>> Mar 25 19:48:46 better kernel:   [<c0204ecf>] restore_nocheck+0x12/0x15
>> Mar 25 19:48:46 better kernel:   [<c0240eec>] trace_hardirqs_on+0x9c/0x110
>> Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
>> Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
>> Mar 25 19:48:46 better kernel:   [<c0205a23>] kernel_thread_helper+0x7/0x14
>> Mar 25 19:48:46 better kernel:   [<ffffffff>] 0xffffffff
>> Mar 25 19:48:46 better kernel: irq event stamp: 174638
>> Mar 25 19:48:46 better kernel: hardirqs last  enabled at (174637):
>> [<c0204ecf>]
>> restore_nocheck+0x12/0x15
>> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638):
>> [<c044ac65>]
>> _spin_lock_irqsave+0x15/0x60
>> Mar 25 19:48:46 better kernel: softirqs last  enabled at (174636):
>> [<c0225595>]
>> do_softirq+0x45/0x50
>> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629):
>> [<c0225595>]
>> do_softirq+0x45/0x50
>> Mar 25 19:48:46 better kernel: 
>> Mar 25 19:48:46 better kernel: other info that might help us debug this:
>> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773:
>> Mar 25 19:48:46 better kernel:  #0:  (&tty->atomic_write_lock){--..}, at:
>> [<c036c699>] tty_write_lock+0x19/0x50
>> Mar 25 19:48:46 better kernel:  #1:  (target_list_lock){--..}, at:
>> [<c0399c0d>]
>> write_msg+0x2d/0xd0
>> Mar 25 19:48:46 better kernel:  #2:  (&napi->poll_lock){-+..}, at:
>> [<c03db11c>]
>> netpoll_poll+0xbc/0x390
>> Mar 25 19:48:46 better kernel: 
>> Mar 25 19:48:46 better kernel: stack backtrace:
>> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted
>> 2.6.25-rc6-lg #4
>> Mar 25 19:48:46 better kernel:  [<c023ff6f>] print_usage_bug+0x15f/0x170
>> Mar 25 19:48:46 better kernel:  [<c0240bd1>] mark_lock+0x401/0x590
>> Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
>> Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
>> Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
>> Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
>> Mar 25 19:48:46 better kernel:  [<c03db11c>] netpoll_poll+0xbc/0x390
>> Mar 25 19:48:46 better kernel:  [<c044adb2>] _spin_trylock+0x42/0x80
>> Mar 25 19:48:46 better kernel:  [<c03db0d3>] netpoll_poll+0x73/0x390
>> Mar 25 19:48:46 better kernel:  [<c03db51f>] netpoll_send_skb+0x12f/0x1d0
>> Mar 25 19:48:46 better kernel:  [<c0399c7c>] write_msg+0x9c/0xd0


netconsole's write_msg calls netpoll_send_udp (then netpoll_send_skb) always
under spin_lock_irqsave(&target_list_lock, flags), and here this lock (and 
afterwards napi->poll_lock) is seen taken with irqs enabled. So, IMHO, it
looks like some other irq (tracking?) imbalance and not networking bug.

Regards,
Jarek P.

PS: linux-kernel added to CC.

>> Mar 25 19:48:46 better kernel:  [<c0220bd4>]
>> __call_console_drivers+0x54/0x70
>> Mar 25 19:48:46 better kernel:  [<c0220d90>] release_console_sem+0xd0/0x1e0
>> Mar 25 19:48:46 better kernel:  [<c037a8d9>] do_con_write+0x189/0x1a80
>> Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
>> Mar 25 19:48:46 better kernel:  [<c044affc>]
>> _spin_unlock_irqrestore+0x4c/0x60
>> Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
>> Mar 25 19:48:46 better kernel:  [<c044afe7>]
>> _spin_unlock_irqrestore+0x37/0x60
>> Mar 25 19:48:46 better kernel:  [<c037c1f7>] con_put_char+0x27/0x30
>> Mar 25 19:48:46 better kernel:  [<c036e6ed>] opost+0xad/0x220
>> Mar 25 19:48:46 better kernel:  [<c036f08b>] write_chan+0x1ab/0x350
>> Mar 25 19:48:46 better kernel:  [<c021a790>] default_wake_function+0x0/0x10
>> Mar 25 19:48:46 better kernel:  [<c036c87b>] tty_write+0x13b/0x1d0
>> Mar 25 19:48:47 better kernel:  [<c036eee0>] write_chan+0x0/0x350
>> Mar 25 19:48:47 better kernel:  [<c02802e3>] do_loop_readv_writev+0x33/0x60
>> Mar 25 19:48:47 better kernel:  [<c0280d50>] do_readv_writev+0x170/0x180
>> Mar 25 19:48:47 better kernel:  [<c036c740>] tty_write+0x0/0x1d0
>> Mar 25 19:48:47 better kernel:  [<c0280d9c>] vfs_writev+0x3c/0x50
>> Mar 25 19:48:47 better kernel:  [<c0281261>] sys_writev+0x41/0x70
>> Mar 25 19:48:47 better kernel:  [<c0204de6>] sysenter_past_esp+0x5f/0xa5
>> Mar 25 19:48:47 better kernel:  =======================
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Comment 5 Anonymous Emailer 2008-03-26 17:14:11 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 27 Mar 2008 00:13:41 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote:

> Andrew Morton wrote, On 03/25/2008 09:43 PM:
> 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Tue, 25 Mar 2008 13:21:23 -0700 (PDT)
> > bugme-daemon@bugzilla.kernel.org wrote:
> > 
> >> http://bugzilla.kernel.org/show_bug.cgi?id=10326
> >>
> >>            Summary: inconsistent lock state in net_rx_action
> >>            Product: Networking
> >>            Version: 2.5
> >>      KernelVersion: 2.6.25-rc6 (git
> >>                     a4083c9271e0a697278e089f2c0b9a95363ada0a)
> >>           Platform: All
> >>         OS/Version: Linux
> >>               Tree: Mainline
> >>             Status: NEW
> >>           Severity: normal
> >>           Priority: P1
> >>          Component: Other
> >>         AssignedTo: acme@ghostprotocols.net
> >>         ReportedBy: marcus@better.se
> >>
> >>
> >> Latest working kernel version: 2.6.24
> >> Earliest failing kernel version: 2.6.25-rc6 (git
> >> a4083c9271e0a697278e089f2c0b9a95363ada0a)
> > 
> > A post-2.624 regression.
> > 
> >> Distribution: Debian testing/unstable
> >> Hardware Environment: LG LE50 Express laptop, i386
> >> Software Environment: Debian i386, X.org 7.3, KDE 4
> >>
> >> Problem Description:
> >>
> >> Got this during bootup, somewhere before launching X. It only happened
> once, I
> >> have rebooted a few times and didn't reproduce it.
> >>
> >> Mar 25 19:48:46 better kernel: =================================
> >> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ]
> >> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4
> >> Mar 25 19:48:46 better kernel: ---------------------------------
> >> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} ->
> {softirq-on-W}
> >> usage.
> >> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >> Mar 25 19:48:46 better kernel:  (&napi->poll_lock){-+..}, at: [<c03db11c>]
> >> netpoll_poll+0xbc/0x390
> >> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at:
> >> Mar 25 19:48:46 better kernel:   [<c024171a>] __lock_acquire+0x36a/0x1070
> >> Mar 25 19:48:46 better kernel:   [<c024153f>] __lock_acquire+0x18f/0x1070
> >> Mar 25 19:48:46 better kernel:   [<c024247f>] lock_acquire+0x5f/0x80
> >> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
> >> Mar 25 19:48:46 better kernel:   [<c044a9e4>] _spin_lock+0x34/0x40
> >> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
> >> Mar 25 19:48:46 better kernel:   [<c03cfafc>] net_rx_action+0xbc/0x1d0
> >> Mar 25 19:48:46 better kernel:   [<c03cfaa0>] net_rx_action+0x60/0x1d0
> >> Mar 25 19:48:46 better kernel:   [<c02254f2>] __do_softirq+0x62/0xc0
> >> Mar 25 19:48:46 better kernel:   [<c0225595>] do_softirq+0x45/0x50
> >> Mar 25 19:48:46 better kernel:   [<c0225797>] irq_exit+0x77/0x80
> >> Mar 25 19:48:46 better kernel:   [<c0211a95>]
> >> smp_apic_timer_interrupt+0x45/0x80
> >> Mar 25 19:48:46 better kernel:   [<c02058ad>]
> apic_timer_interrupt+0x29/0x38
> >> Mar 25 19:48:46 better kernel:   [<c02058b7>]
> apic_timer_interrupt+0x33/0x38
> >> Mar 25 19:48:46 better kernel:   [<c022007b>] __set_personality+0x2b/0x240
> >> Mar 25 19:48:46 better kernel:   [<c0220e51>]
> release_console_sem+0x191/0x1e0
> >> Mar 25 19:48:46 better kernel:   [<c05751aa>] init_netconsole+0x1da/0x1f0
> >> Mar 25 19:48:46 better kernel:   [<c0558700>] kernel_init+0x90/0x270
> >> Mar 25 19:48:46 better kernel:   [<c0204ecf>] restore_nocheck+0x12/0x15
> >> Mar 25 19:48:46 better kernel:   [<c0240eec>] trace_hardirqs_on+0x9c/0x110
> >> Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
> >> Mar 25 19:48:46 better kernel:   [<c0558670>] kernel_init+0x0/0x270
> >> Mar 25 19:48:46 better kernel:   [<c0205a23>]
> kernel_thread_helper+0x7/0x14
> >> Mar 25 19:48:46 better kernel:   [<ffffffff>] 0xffffffff
> >> Mar 25 19:48:46 better kernel: irq event stamp: 174638
> >> Mar 25 19:48:46 better kernel: hardirqs last  enabled at (174637):
> [<c0204ecf>]
> >> restore_nocheck+0x12/0x15
> >> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638):
> [<c044ac65>]
> >> _spin_lock_irqsave+0x15/0x60
> >> Mar 25 19:48:46 better kernel: softirqs last  enabled at (174636):
> [<c0225595>]
> >> do_softirq+0x45/0x50
> >> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629):
> [<c0225595>]
> >> do_softirq+0x45/0x50
> >> Mar 25 19:48:46 better kernel: 
> >> Mar 25 19:48:46 better kernel: other info that might help us debug this:
> >> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773:
> >> Mar 25 19:48:46 better kernel:  #0:  (&tty->atomic_write_lock){--..}, at:
> >> [<c036c699>] tty_write_lock+0x19/0x50
> >> Mar 25 19:48:46 better kernel:  #1:  (target_list_lock){--..}, at:
> [<c0399c0d>]
> >> write_msg+0x2d/0xd0
> >> Mar 25 19:48:46 better kernel:  #2:  (&napi->poll_lock){-+..}, at:
> [<c03db11c>]
> >> netpoll_poll+0xbc/0x390
> >> Mar 25 19:48:46 better kernel: 
> >> Mar 25 19:48:46 better kernel: stack backtrace:
> >> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted
> >> 2.6.25-rc6-lg #4
> >> Mar 25 19:48:46 better kernel:  [<c023ff6f>] print_usage_bug+0x15f/0x170
> >> Mar 25 19:48:46 better kernel:  [<c0240bd1>] mark_lock+0x401/0x590
> >> Mar 25 19:48:46 better kernel:  [<c0240d98>] mark_held_locks+0x38/0x70
> >> Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
> >> Mar 25 19:48:46 better kernel:  [<c0240eec>] trace_hardirqs_on+0x9c/0x110
> >> Mar 25 19:48:46 better kernel:  [<c0398a43>] sky2_poll+0x7a3/0xad0
> >> Mar 25 19:48:46 better kernel:  [<c03db11c>] netpoll_poll+0xbc/0x390
> >> Mar 25 19:48:46 better kernel:  [<c044adb2>] _spin_trylock+0x42/0x80
> >> Mar 25 19:48:46 better kernel:  [<c03db0d3>] netpoll_poll+0x73/0x390
> >> Mar 25 19:48:46 better kernel:  [<c03db51f>] netpoll_send_skb+0x12f/0x1d0
> >> Mar 25 19:48:46 better kernel:  [<c0399c7c>] write_msg+0x9c/0xd0
> 
> 
> netconsole's write_msg calls netpoll_send_udp (then netpoll_send_skb) always
> under spin_lock_irqsave(&target_list_lock, flags), and here this lock (and 
> afterwards napi->poll_lock) is seen taken with irqs enabled. So, IMHO, it
> looks like some other irq (tracking?) imbalance and not networking bug.
> 

No, it's not an irq_disable() thing, directly.

What lockdep is saying is that sky2_poll() is taking napi->poll_lock for
writing with softirqs enabled, but net_rx_action() takes the same lock from
within softirq context.

If sky2_poll() always takes napi->poll_lock under local_irq_disable() then
that would be a lockdep bug.
Comment 6 Jarek Poplawski 2008-03-27 01:52:12 UTC
On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote:
...
> > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326
...
> No, it's not an irq_disable() thing, directly.
> 
> What lockdep is saying is that sky2_poll() is taking napi->poll_lock for
> writing with softirqs enabled, but net_rx_action() takes the same lock from
> within softirq context.
> 
> If sky2_poll() always takes napi->poll_lock under local_irq_disable() then
> that would be a lockdep bug.

sky2_poll() doesn't take napi->poll_lock; this lock is taken by
netpoll_poll() before calling sky2_poll(). And before this hardirqs
are disabled in write_msg(). So, theoretically lockdep could be right
if sky2_poll() would enable irqs after this. (If it were done in
netpoll - lockdep should warn before or after sky2_poll() call.)
But I really can't see any such possibility in sky2_poll().

Regards,
Jarek P.
Comment 7 Jarek Poplawski 2008-03-27 02:04:22 UTC
On Thu, Mar 27, 2008 at 09:55:42AM +0100, Jarek Poplawski wrote:
...
> sky2_poll() doesn't take napi->poll_lock; this lock is taken by
> netpoll_poll() before calling sky2_poll(). And before this hardirqs
> are disabled in write_msg().

Actually, hardirqs are disabled again before netpoll_poll() (in
netpoll_send_skb()), so sky2_poll() should need enable_irqs 2x
to justify such a warning.

Jarek P.
Comment 8 Anonymous Emailer 2008-03-27 02:18:38 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 27 Mar 2008 09:55:42 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote:
> ...
> > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326
> ...
> > No, it's not an irq_disable() thing, directly.
> > 
> > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for
> > writing with softirqs enabled, but net_rx_action() takes the same lock from
> > within softirq context.
> > 
> > If sky2_poll() always takes napi->poll_lock under local_irq_disable() then
> > that would be a lockdep bug.
> 
> sky2_poll() doesn't take napi->poll_lock; this lock is taken by
> netpoll_poll() before calling sky2_poll(). And before this hardirqs
> are disabled in write_msg(). So, theoretically lockdep could be right
> if sky2_poll() would enable irqs after this. (If it were done in
> netpoll - lockdep should warn before or after sky2_poll() call.)
> But I really can't see any such possibility in sky2_poll().

I can't spot it from a five-minute read either.  gcc's autoinlining really
makes this sort of thing much harder than it used to be :(

Anyway, the accusation is that lockdep is busted, in that it doesn't realise that
local_irq_disable() blocks softirqs.

I bet the net code is wrong and we missed it ;)
Comment 9 Jarek Poplawski 2008-03-27 03:02:27 UTC
On Thu, Mar 27, 2008 at 02:18:12AM -0700, Andrew Morton wrote:
...
> Anyway, the accusation is that lockdep is busted, in that it doesn't realise
> that
> local_irq_disable() blocks softirqs.

Usually lockdep gets this right, so I've some doubts too. But here
it's not only about softirqs; this:

> >> Mar 25 19:48:46 better kernel:  #1:  (target_list_lock){--..}, at:
> [<c0399c0d>]

means lockdep saw hardirqs enabled while acquiring/holding this lock.

Jarek P.
Comment 10 Peter Zijlstra 2008-03-27 03:57:08 UTC
On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote:
> On Thu, 27 Mar 2008 09:55:42 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote:
> 
> > On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote:
> > ...
> > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326
> > ...
> > > No, it's not an irq_disable() thing, directly.
> > > 
> > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for
> > > writing with softirqs enabled, but net_rx_action() takes the same lock
> from
> > > within softirq context.
> > > 
> > > If sky2_poll() always takes napi->poll_lock under local_irq_disable()
> then
> > > that would be a lockdep bug.
> > 
> > sky2_poll() doesn't take napi->poll_lock; this lock is taken by
> > netpoll_poll() before calling sky2_poll(). And before this hardirqs
> > are disabled in write_msg(). So, theoretically lockdep could be right
> > if sky2_poll() would enable irqs after this. (If it were done in
> > netpoll - lockdep should warn before or after sky2_poll() call.)
> > But I really can't see any such possibility in sky2_poll().
> 
> I can't spot it from a five-minute read either.  gcc's autoinlining really
> makes this sort of thing much harder than it used to be :(
> 
> Anyway, the accusation is that lockdep is busted, in that it doesn't realise
> that
> local_irq_disable() blocks softirqs.
> 
> I bet the net code is wrong and we missed it ;)

How about this:

<irqs disabled>

  netpoll_poll()
    poll_napi()
      spin_trylock(&napi->poll_lock)
      poll_one_napi()
        napi->poll() := sky2_poll()
          napi_complete()
            local_irq_disable()
            local_irq_enable() <--- *BUG*

  <irq>
    irq_exit()
      do_softirq()
        net_rx_action()
          spin_lock(&napi->poll_lock) <--- Deadlock!

Because we still hold the lock....
Comment 11 David S. Miller 2008-03-27 04:04:07 UTC
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Thu, 27 Mar 2008 11:56:19 +0100

> How about this:
> 
> <irqs disabled>
> 
>   netpoll_poll()
>     poll_napi()
>       spin_trylock(&napi->poll_lock)
>       poll_one_napi()
>         napi->poll() := sky2_poll()
>           napi_complete()
>             local_irq_disable()
>             local_irq_enable() <--- *BUG*
> 
>   <irq>
>     irq_exit()
>       do_softirq()
>         net_rx_action()
>           spin_lock(&napi->poll_lock) <--- Deadlock!
> 
> Because we still hold the lock....

Yep, that will deadlock.

Can some test if using local_irq_{save,restore}() in
napi_complete() cures this lockdep warning?

Thanks.
Comment 12 Marcus Better 2008-03-27 04:59:19 UTC
David Miller wrote:
> Can some test if using local_irq_{save,restore}() in
> napi_complete() cures this lockdep warning?

I cannot reproduce the bug, so cannot help much.

Marcus
Comment 13 Jarek Poplawski 2008-03-27 05:17:22 UTC
On Thu, Mar 27, 2008 at 11:56:19AM +0100, Peter Zijlstra wrote:
> On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote:
...
> > I bet the net code is wrong and we missed it ;)

It looks like you are natural born winner! Congratulations!

> How about this:
> 
> <irqs disabled>
> 
>   netpoll_poll()
>     poll_napi()
>       spin_trylock(&napi->poll_lock)
>       poll_one_napi()
>         napi->poll() := sky2_poll()
>           napi_complete()
>             local_irq_disable()
>             local_irq_enable() <--- *BUG*

Yes! I missed it's unconditional here... Great catch!

On the other hand, still a question why lockdep doesn't see this
every day?

Jarek P.
Comment 14 Jarek Poplawski 2008-03-27 05:27:48 UTC
On Thu, Mar 27, 2008 at 12:59:02PM +0100, Marcus Better wrote:
> David Miller wrote:
>> Can some test if using local_irq_{save,restore}() in
>> napi_complete() cures this lockdep warning?

Isn't it now justified enough without lockdep's ack?

> I cannot reproduce the bug, so cannot help much.

Marcus, you've helped here very much and enough!

Thanks,
Jarek P.
Comment 15 Peter Zijlstra 2008-03-27 05:31:09 UTC
On Thu, 2008-03-27 at 13:22 +0100, Jarek Poplawski wrote:
> On Thu, Mar 27, 2008 at 11:56:19AM +0100, Peter Zijlstra wrote:
> > On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote:
> ....
> > > I bet the net code is wrong and we missed it ;)
> 
> It looks like you are natural born winner! Congratulations!
> 
> > How about this:
> > 
> > <irqs disabled>
> > 
> >   netpoll_poll()
> >     poll_napi()
> >       spin_trylock(&napi->poll_lock)
> >       poll_one_napi()
> >         napi->poll() := sky2_poll()
> >           napi_complete()
> >             local_irq_disable()
> >             local_irq_enable() <--- *BUG*
> 
> Yes! I missed it's unconditional here... Great catch!
> 
> On the other hand, still a question why lockdep doesn't see this
> every day?

My guess is that it is a race between polling the device and irq pushing
the packet. That is, normally the IRQ handler wins and netpoll doesn't
have anything to do and it doesn't traverse this code path.

(although I must admit to being a little out of my depth here)
Comment 16 Jarek Poplawski 2008-03-27 05:43:41 UTC
On Thu, Mar 27, 2008 at 01:30:14PM +0100, Peter Zijlstra wrote:
...
> My guess is that it is a race between polling the device and irq pushing
> the packet. That is, normally the IRQ handler wins and netpoll doesn't
> have anything to do and it doesn't traverse this code path.
> 
> (although I must admit to being a little out of my depth here)

Probably you're right but there were really a lot of testing and
stressing netconsole for misterious lockups for quite a long time
with this new napi, so it's hard to believe this was hidden so
well...

Jarek P.
Comment 17 David S. Miller 2008-03-27 14:44:37 UTC
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 27 Mar 2008 13:32:48 +0100

> On Thu, Mar 27, 2008 at 12:59:02PM +0100, Marcus Better wrote:
> > David Miller wrote:
> >> Can some test if using local_irq_{save,restore}() in
> >> napi_complete() cures this lockdep warning?
> 
> Isn't it now justified enough without lockdep's ack?

Absolutely, but I want to know that the problem is solved
and confirmed by a tester.
Comment 18 Rafael J. Wysocki 2008-03-27 14:57:50 UTC
On Thursday, 27 of March 2008, Jarek Poplawski wrote:
> On Thu, Mar 27, 2008 at 01:30:14PM +0100, Peter Zijlstra wrote:
> ...
> > My guess is that it is a race between polling the device and irq pushing
> > the packet. That is, normally the IRQ handler wins and netpoll doesn't
> > have anything to do and it doesn't traverse this code path.
> > 
> > (although I must admit to being a little out of my depth here)
> 
> Probably you're right but there were really a lot of testing and
> stressing netconsole for misterious lockups for quite a long time
> with this new napi, so it's hard to believe this was hidden so
> well...

I assume there will be a patch posted for this issue.  Correct?

Rafael
Comment 19 Jarek Poplawski 2008-03-27 15:18:36 UTC
On Thu, Mar 27, 2008 at 10:56:49PM +0100, Rafael J. Wysocki wrote:
...
> I assume there will be a patch posted for this issue.  Correct?

I guess Peter should know the answer!

Jarek P.
Comment 20 David S. Miller 2008-03-27 15:22:23 UTC
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 27 Mar 2008 23:22:48 +0100

> On Thu, Mar 27, 2008 at 10:56:49PM +0100, Rafael J. Wysocki wrote:
> ...
> > I assume there will be a patch posted for this issue.  Correct?
> 
> I guess Peter should know the answer!

Instead of going back and forth like this can someone post
just take it upon themselves to post such an obvious patch
already :-)

All of this "he'll do it, she'll do it" just wastes time.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a2f0032..fae6a7e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -383,9 +383,11 @@ static inline void __napi_complete(struct napi_struct *n)
 
 static inline void napi_complete(struct napi_struct *n)
 {
-	local_irq_disable();
+	unsigned long flags;
+
+	local_irq_save(flags);
 	__napi_complete(n);
-	local_irq_enable();
+	local_irq_restore(flags);
 }
 
 /**
Comment 21 Rafael J. Wysocki 2008-03-27 15:32:39 UTC
Regressions list annotation:
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=10326#c20
Comment 22 Jarek Poplawski 2008-03-27 15:35:04 UTC
On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote:
...
> Instead of going back and forth like this can someone post
> just take it upon themselves to post such an obvious patch
> already :-)
> 
> All of this "he'll do it, she'll do it" just wastes time.

Right! She or he should also remember to add Peter's lockup
scenario and his signed-off-by...

Jarek P.
Comment 23 David S. Miller 2008-03-27 17:59:41 UTC
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 27 Mar 2008 23:39:13 +0100

> On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote:
> ...
> > Instead of going back and forth like this can someone post
> > just take it upon themselves to post such an obvious patch
> > already :-)
> > 
> > All of this "he'll do it, she'll do it" just wastes time.
> 
> Right! She or he should also remember to add Peter's lockup
> scenario and his signed-off-by...

I didn't put in a Signed-off-by since Peter didn't give
one, but I did give him credit for the discovery and I
did list explicitly his given logic.
Comment 24 Peter Zijlstra 2008-03-28 03:12:58 UTC
On Thu, 2008-03-27 at 17:59 -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Thu, 27 Mar 2008 23:39:13 +0100
> 
> > On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote:
> > ...
> > > Instead of going back and forth like this can someone post
> > > just take it upon themselves to post such an obvious patch
> > > already :-)
> > > 
> > > All of this "he'll do it, she'll do it" just wastes time.
> > 
> > Right! She or he should also remember to add Peter's lockup
> > scenario and his signed-off-by...
> 
> I didn't put in a Signed-off-by since Peter didn't give
> one, but I did give him credit for the discovery and I
> did list explicitly his given logic.

Hehe, nr_emails > loc_changed situation alert!

Fwiw here's my ACK:

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>