Bug 10326
Summary: | inconsistent lock state in net_rx_action | ||
---|---|---|---|
Product: | Networking | Reporter: | Marcus Better (marcus) |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25-rc6 (git a4083c9271e0a697278e089f2c0b9a95363ada0a) | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 9832 | ||
Attachments: |
Kernel config
Output of lspci -v |
Description
Marcus Better
2008-03-25 13:21:22 UTC
Created attachment 15431 [details]
Kernel config
Created attachment 15432 [details]
Output of lspci -v
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 25 Mar 2008 13:21:23 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10326 > > Summary: inconsistent lock state in net_rx_action > Product: Networking > Version: 2.5 > KernelVersion: 2.6.25-rc6 (git > a4083c9271e0a697278e089f2c0b9a95363ada0a) > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: marcus@better.se > > > Latest working kernel version: 2.6.24 > Earliest failing kernel version: 2.6.25-rc6 (git > a4083c9271e0a697278e089f2c0b9a95363ada0a) A post-2.624 regression. > Distribution: Debian testing/unstable > Hardware Environment: LG LE50 Express laptop, i386 > Software Environment: Debian i386, X.org 7.3, KDE 4 > > Problem Description: > > Got this during bootup, somewhere before launching X. It only happened once, > I > have rebooted a few times and didn't reproduce it. > > Mar 25 19:48:46 better kernel: ================================= > Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] > Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 > Mar 25 19:48:46 better kernel: --------------------------------- > Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W} > usage. > Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: > Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] > netpoll_poll+0xbc/0x390 > Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: > Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 > Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 > Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 > Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 > Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 > Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 > Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 > Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 > Mar 25 19:48:46 better kernel: [<c0211a95>] > smp_apic_timer_interrupt+0x45/0x80 > Mar 25 19:48:46 better kernel: [<c02058ad>] apic_timer_interrupt+0x29/0x38 > Mar 25 19:48:46 better kernel: [<c02058b7>] apic_timer_interrupt+0x33/0x38 > Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 > Mar 25 19:48:46 better kernel: [<c0220e51>] release_console_sem+0x191/0x1e0 > Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 > Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 > Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 > Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > Mar 25 19:48:46 better kernel: [<c0205a23>] kernel_thread_helper+0x7/0x14 > Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff > Mar 25 19:48:46 better kernel: irq event stamp: 174638 > Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): > [<c0204ecf>] > restore_nocheck+0x12/0x15 > Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): > [<c044ac65>] > _spin_lock_irqsave+0x15/0x60 > Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): > [<c0225595>] > do_softirq+0x45/0x50 > Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): > [<c0225595>] > do_softirq+0x45/0x50 > Mar 25 19:48:46 better kernel: > Mar 25 19:48:46 better kernel: other info that might help us debug this: > Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: > Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: > [<c036c699>] tty_write_lock+0x19/0x50 > Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: > [<c0399c0d>] > write_msg+0x2d/0xd0 > Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: > [<c03db11c>] > netpoll_poll+0xbc/0x390 > Mar 25 19:48:46 better kernel: > Mar 25 19:48:46 better kernel: stack backtrace: > Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted > 2.6.25-rc6-lg #4 > Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 > Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 > Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 > Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 > Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 > Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 > Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 > Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 > Mar 25 19:48:46 better kernel: [<c0220bd4>] __call_console_drivers+0x54/0x70 > Mar 25 19:48:46 better kernel: [<c0220d90>] release_console_sem+0xd0/0x1e0 > Mar 25 19:48:46 better kernel: [<c037a8d9>] do_con_write+0x189/0x1a80 > Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 > Mar 25 19:48:46 better kernel: [<c044affc>] > _spin_unlock_irqrestore+0x4c/0x60 > Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > Mar 25 19:48:46 better kernel: [<c044afe7>] > _spin_unlock_irqrestore+0x37/0x60 > Mar 25 19:48:46 better kernel: [<c037c1f7>] con_put_char+0x27/0x30 > Mar 25 19:48:46 better kernel: [<c036e6ed>] opost+0xad/0x220 > Mar 25 19:48:46 better kernel: [<c036f08b>] write_chan+0x1ab/0x350 > Mar 25 19:48:46 better kernel: [<c021a790>] default_wake_function+0x0/0x10 > Mar 25 19:48:46 better kernel: [<c036c87b>] tty_write+0x13b/0x1d0 > Mar 25 19:48:47 better kernel: [<c036eee0>] write_chan+0x0/0x350 > Mar 25 19:48:47 better kernel: [<c02802e3>] do_loop_readv_writev+0x33/0x60 > Mar 25 19:48:47 better kernel: [<c0280d50>] do_readv_writev+0x170/0x180 > Mar 25 19:48:47 better kernel: [<c036c740>] tty_write+0x0/0x1d0 > Mar 25 19:48:47 better kernel: [<c0280d9c>] vfs_writev+0x3c/0x50 > Mar 25 19:48:47 better kernel: [<c0281261>] sys_writev+0x41/0x70 > Mar 25 19:48:47 better kernel: [<c0204de6>] sysenter_past_esp+0x5f/0xa5 > Mar 25 19:48:47 better kernel: ======================= > Andrew Morton wrote, On 03/25/2008 09:43 PM: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 25 Mar 2008 13:21:23 -0700 (PDT) > bugme-daemon@bugzilla.kernel.org wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 >> >> Summary: inconsistent lock state in net_rx_action >> Product: Networking >> Version: 2.5 >> KernelVersion: 2.6.25-rc6 (git >> a4083c9271e0a697278e089f2c0b9a95363ada0a) >> Platform: All >> OS/Version: Linux >> Tree: Mainline >> Status: NEW >> Severity: normal >> Priority: P1 >> Component: Other >> AssignedTo: acme@ghostprotocols.net >> ReportedBy: marcus@better.se >> >> >> Latest working kernel version: 2.6.24 >> Earliest failing kernel version: 2.6.25-rc6 (git >> a4083c9271e0a697278e089f2c0b9a95363ada0a) > > A post-2.624 regression. > >> Distribution: Debian testing/unstable >> Hardware Environment: LG LE50 Express laptop, i386 >> Software Environment: Debian i386, X.org 7.3, KDE 4 >> >> Problem Description: >> >> Got this during bootup, somewhere before launching X. It only happened once, >> I >> have rebooted a few times and didn't reproduce it. >> >> Mar 25 19:48:46 better kernel: ================================= >> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] >> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 >> Mar 25 19:48:46 better kernel: --------------------------------- >> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W} >> usage. >> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: >> Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] >> netpoll_poll+0xbc/0x390 >> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: >> Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 >> Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 >> Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 >> Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 >> Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 >> Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 >> Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 >> Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 >> Mar 25 19:48:46 better kernel: [<c0211a95>] >> smp_apic_timer_interrupt+0x45/0x80 >> Mar 25 19:48:46 better kernel: [<c02058ad>] apic_timer_interrupt+0x29/0x38 >> Mar 25 19:48:46 better kernel: [<c02058b7>] apic_timer_interrupt+0x33/0x38 >> Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 >> Mar 25 19:48:46 better kernel: [<c0220e51>] >> release_console_sem+0x191/0x1e0 >> Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 >> Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 >> Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 >> Mar 25 19:48:46 better kernel: [<c0205a23>] kernel_thread_helper+0x7/0x14 >> Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff >> Mar 25 19:48:46 better kernel: irq event stamp: 174638 >> Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): >> [<c0204ecf>] >> restore_nocheck+0x12/0x15 >> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): >> [<c044ac65>] >> _spin_lock_irqsave+0x15/0x60 >> Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): >> [<c0225595>] >> do_softirq+0x45/0x50 >> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): >> [<c0225595>] >> do_softirq+0x45/0x50 >> Mar 25 19:48:46 better kernel: >> Mar 25 19:48:46 better kernel: other info that might help us debug this: >> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: >> Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: >> [<c036c699>] tty_write_lock+0x19/0x50 >> Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: >> [<c0399c0d>] >> write_msg+0x2d/0xd0 >> Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: >> [<c03db11c>] >> netpoll_poll+0xbc/0x390 >> Mar 25 19:48:46 better kernel: >> Mar 25 19:48:46 better kernel: stack backtrace: >> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted >> 2.6.25-rc6-lg #4 >> Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 >> Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 >> Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 >> Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 >> Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 >> Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 >> Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 >> Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 netconsole's write_msg calls netpoll_send_udp (then netpoll_send_skb) always under spin_lock_irqsave(&target_list_lock, flags), and here this lock (and afterwards napi->poll_lock) is seen taken with irqs enabled. So, IMHO, it looks like some other irq (tracking?) imbalance and not networking bug. Regards, Jarek P. PS: linux-kernel added to CC. >> Mar 25 19:48:46 better kernel: [<c0220bd4>] >> __call_console_drivers+0x54/0x70 >> Mar 25 19:48:46 better kernel: [<c0220d90>] release_console_sem+0xd0/0x1e0 >> Mar 25 19:48:46 better kernel: [<c037a8d9>] do_con_write+0x189/0x1a80 >> Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 >> Mar 25 19:48:46 better kernel: [<c044affc>] >> _spin_unlock_irqrestore+0x4c/0x60 >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 >> Mar 25 19:48:46 better kernel: [<c044afe7>] >> _spin_unlock_irqrestore+0x37/0x60 >> Mar 25 19:48:46 better kernel: [<c037c1f7>] con_put_char+0x27/0x30 >> Mar 25 19:48:46 better kernel: [<c036e6ed>] opost+0xad/0x220 >> Mar 25 19:48:46 better kernel: [<c036f08b>] write_chan+0x1ab/0x350 >> Mar 25 19:48:46 better kernel: [<c021a790>] default_wake_function+0x0/0x10 >> Mar 25 19:48:46 better kernel: [<c036c87b>] tty_write+0x13b/0x1d0 >> Mar 25 19:48:47 better kernel: [<c036eee0>] write_chan+0x0/0x350 >> Mar 25 19:48:47 better kernel: [<c02802e3>] do_loop_readv_writev+0x33/0x60 >> Mar 25 19:48:47 better kernel: [<c0280d50>] do_readv_writev+0x170/0x180 >> Mar 25 19:48:47 better kernel: [<c036c740>] tty_write+0x0/0x1d0 >> Mar 25 19:48:47 better kernel: [<c0280d9c>] vfs_writev+0x3c/0x50 >> Mar 25 19:48:47 better kernel: [<c0281261>] sys_writev+0x41/0x70 >> Mar 25 19:48:47 better kernel: [<c0204de6>] sysenter_past_esp+0x5f/0xa5 >> Mar 25 19:48:47 better kernel: ======================= >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Reply-To: akpm@linux-foundation.org On Thu, 27 Mar 2008 00:13:41 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > Andrew Morton wrote, On 03/25/2008 09:43 PM: > > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Tue, 25 Mar 2008 13:21:23 -0700 (PDT) > > bugme-daemon@bugzilla.kernel.org wrote: > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 > >> > >> Summary: inconsistent lock state in net_rx_action > >> Product: Networking > >> Version: 2.5 > >> KernelVersion: 2.6.25-rc6 (git > >> a4083c9271e0a697278e089f2c0b9a95363ada0a) > >> Platform: All > >> OS/Version: Linux > >> Tree: Mainline > >> Status: NEW > >> Severity: normal > >> Priority: P1 > >> Component: Other > >> AssignedTo: acme@ghostprotocols.net > >> ReportedBy: marcus@better.se > >> > >> > >> Latest working kernel version: 2.6.24 > >> Earliest failing kernel version: 2.6.25-rc6 (git > >> a4083c9271e0a697278e089f2c0b9a95363ada0a) > > > > A post-2.624 regression. > > > >> Distribution: Debian testing/unstable > >> Hardware Environment: LG LE50 Express laptop, i386 > >> Software Environment: Debian i386, X.org 7.3, KDE 4 > >> > >> Problem Description: > >> > >> Got this during bootup, somewhere before launching X. It only happened > once, I > >> have rebooted a few times and didn't reproduce it. > >> > >> Mar 25 19:48:46 better kernel: ================================= > >> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] > >> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 > >> Mar 25 19:48:46 better kernel: --------------------------------- > >> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> > {softirq-on-W} > >> usage. > >> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: > >> Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] > >> netpoll_poll+0xbc/0x390 > >> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: > >> Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 > >> Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 > >> Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 > >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 > >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 > >> Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 > >> Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 > >> Mar 25 19:48:46 better kernel: [<c0211a95>] > >> smp_apic_timer_interrupt+0x45/0x80 > >> Mar 25 19:48:46 better kernel: [<c02058ad>] > apic_timer_interrupt+0x29/0x38 > >> Mar 25 19:48:46 better kernel: [<c02058b7>] > apic_timer_interrupt+0x33/0x38 > >> Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 > >> Mar 25 19:48:46 better kernel: [<c0220e51>] > release_console_sem+0x191/0x1e0 > >> Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 > >> Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 > >> Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 > >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > >> Mar 25 19:48:46 better kernel: [<c0205a23>] > kernel_thread_helper+0x7/0x14 > >> Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff > >> Mar 25 19:48:46 better kernel: irq event stamp: 174638 > >> Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): > [<c0204ecf>] > >> restore_nocheck+0x12/0x15 > >> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): > [<c044ac65>] > >> _spin_lock_irqsave+0x15/0x60 > >> Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): > [<c0225595>] > >> do_softirq+0x45/0x50 > >> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): > [<c0225595>] > >> do_softirq+0x45/0x50 > >> Mar 25 19:48:46 better kernel: > >> Mar 25 19:48:46 better kernel: other info that might help us debug this: > >> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: > >> Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: > >> [<c036c699>] tty_write_lock+0x19/0x50 > >> Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: > [<c0399c0d>] > >> write_msg+0x2d/0xd0 > >> Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: > [<c03db11c>] > >> netpoll_poll+0xbc/0x390 > >> Mar 25 19:48:46 better kernel: > >> Mar 25 19:48:46 better kernel: stack backtrace: > >> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted > >> 2.6.25-rc6-lg #4 > >> Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 > >> Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 > >> Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 > >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > >> Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 > >> Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 > >> Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 > >> Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 > > > netconsole's write_msg calls netpoll_send_udp (then netpoll_send_skb) always > under spin_lock_irqsave(&target_list_lock, flags), and here this lock (and > afterwards napi->poll_lock) is seen taken with irqs enabled. So, IMHO, it > looks like some other irq (tracking?) imbalance and not networking bug. > No, it's not an irq_disable() thing, directly. What lockdep is saying is that sky2_poll() is taking napi->poll_lock for writing with softirqs enabled, but net_rx_action() takes the same lock from within softirq context. If sky2_poll() always takes napi->poll_lock under local_irq_disable() then that would be a lockdep bug. On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote: ... > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 ... > No, it's not an irq_disable() thing, directly. > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for > writing with softirqs enabled, but net_rx_action() takes the same lock from > within softirq context. > > If sky2_poll() always takes napi->poll_lock under local_irq_disable() then > that would be a lockdep bug. sky2_poll() doesn't take napi->poll_lock; this lock is taken by netpoll_poll() before calling sky2_poll(). And before this hardirqs are disabled in write_msg(). So, theoretically lockdep could be right if sky2_poll() would enable irqs after this. (If it were done in netpoll - lockdep should warn before or after sky2_poll() call.) But I really can't see any such possibility in sky2_poll(). Regards, Jarek P. On Thu, Mar 27, 2008 at 09:55:42AM +0100, Jarek Poplawski wrote:
...
> sky2_poll() doesn't take napi->poll_lock; this lock is taken by
> netpoll_poll() before calling sky2_poll(). And before this hardirqs
> are disabled in write_msg().
Actually, hardirqs are disabled again before netpoll_poll() (in
netpoll_send_skb()), so sky2_poll() should need enable_irqs 2x
to justify such a warning.
Jarek P.
Reply-To: akpm@linux-foundation.org On Thu, 27 Mar 2008 09:55:42 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote: > ... > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 > ... > > No, it's not an irq_disable() thing, directly. > > > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for > > writing with softirqs enabled, but net_rx_action() takes the same lock from > > within softirq context. > > > > If sky2_poll() always takes napi->poll_lock under local_irq_disable() then > > that would be a lockdep bug. > > sky2_poll() doesn't take napi->poll_lock; this lock is taken by > netpoll_poll() before calling sky2_poll(). And before this hardirqs > are disabled in write_msg(). So, theoretically lockdep could be right > if sky2_poll() would enable irqs after this. (If it were done in > netpoll - lockdep should warn before or after sky2_poll() call.) > But I really can't see any such possibility in sky2_poll(). I can't spot it from a five-minute read either. gcc's autoinlining really makes this sort of thing much harder than it used to be :( Anyway, the accusation is that lockdep is busted, in that it doesn't realise that local_irq_disable() blocks softirqs. I bet the net code is wrong and we missed it ;) On Thu, Mar 27, 2008 at 02:18:12AM -0700, Andrew Morton wrote: ... > Anyway, the accusation is that lockdep is busted, in that it doesn't realise > that > local_irq_disable() blocks softirqs. Usually lockdep gets this right, so I've some doubts too. But here it's not only about softirqs; this: > >> Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: > [<c0399c0d>] means lockdep saw hardirqs enabled while acquiring/holding this lock. Jarek P. On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote:
> On Thu, 27 Mar 2008 09:55:42 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote:
>
> > On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote:
> > ...
> > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326
> > ...
> > > No, it's not an irq_disable() thing, directly.
> > >
> > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for
> > > writing with softirqs enabled, but net_rx_action() takes the same lock
> from
> > > within softirq context.
> > >
> > > If sky2_poll() always takes napi->poll_lock under local_irq_disable()
> then
> > > that would be a lockdep bug.
> >
> > sky2_poll() doesn't take napi->poll_lock; this lock is taken by
> > netpoll_poll() before calling sky2_poll(). And before this hardirqs
> > are disabled in write_msg(). So, theoretically lockdep could be right
> > if sky2_poll() would enable irqs after this. (If it were done in
> > netpoll - lockdep should warn before or after sky2_poll() call.)
> > But I really can't see any such possibility in sky2_poll().
>
> I can't spot it from a five-minute read either. gcc's autoinlining really
> makes this sort of thing much harder than it used to be :(
>
> Anyway, the accusation is that lockdep is busted, in that it doesn't realise
> that
> local_irq_disable() blocks softirqs.
>
> I bet the net code is wrong and we missed it ;)
How about this:
<irqs disabled>
netpoll_poll()
poll_napi()
spin_trylock(&napi->poll_lock)
poll_one_napi()
napi->poll() := sky2_poll()
napi_complete()
local_irq_disable()
local_irq_enable() <--- *BUG*
<irq>
irq_exit()
do_softirq()
net_rx_action()
spin_lock(&napi->poll_lock) <--- Deadlock!
Because we still hold the lock....
From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Thu, 27 Mar 2008 11:56:19 +0100 > How about this: > > <irqs disabled> > > netpoll_poll() > poll_napi() > spin_trylock(&napi->poll_lock) > poll_one_napi() > napi->poll() := sky2_poll() > napi_complete() > local_irq_disable() > local_irq_enable() <--- *BUG* > > <irq> > irq_exit() > do_softirq() > net_rx_action() > spin_lock(&napi->poll_lock) <--- Deadlock! > > Because we still hold the lock.... Yep, that will deadlock. Can some test if using local_irq_{save,restore}() in napi_complete() cures this lockdep warning? Thanks. David Miller wrote:
> Can some test if using local_irq_{save,restore}() in
> napi_complete() cures this lockdep warning?
I cannot reproduce the bug, so cannot help much.
Marcus
On Thu, Mar 27, 2008 at 11:56:19AM +0100, Peter Zijlstra wrote: > On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote: ... > > I bet the net code is wrong and we missed it ;) It looks like you are natural born winner! Congratulations! > How about this: > > <irqs disabled> > > netpoll_poll() > poll_napi() > spin_trylock(&napi->poll_lock) > poll_one_napi() > napi->poll() := sky2_poll() > napi_complete() > local_irq_disable() > local_irq_enable() <--- *BUG* Yes! I missed it's unconditional here... Great catch! On the other hand, still a question why lockdep doesn't see this every day? Jarek P. On Thu, Mar 27, 2008 at 12:59:02PM +0100, Marcus Better wrote: > David Miller wrote: >> Can some test if using local_irq_{save,restore}() in >> napi_complete() cures this lockdep warning? Isn't it now justified enough without lockdep's ack? > I cannot reproduce the bug, so cannot help much. Marcus, you've helped here very much and enough! Thanks, Jarek P. On Thu, 2008-03-27 at 13:22 +0100, Jarek Poplawski wrote:
> On Thu, Mar 27, 2008 at 11:56:19AM +0100, Peter Zijlstra wrote:
> > On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote:
> ....
> > > I bet the net code is wrong and we missed it ;)
>
> It looks like you are natural born winner! Congratulations!
>
> > How about this:
> >
> > <irqs disabled>
> >
> > netpoll_poll()
> > poll_napi()
> > spin_trylock(&napi->poll_lock)
> > poll_one_napi()
> > napi->poll() := sky2_poll()
> > napi_complete()
> > local_irq_disable()
> > local_irq_enable() <--- *BUG*
>
> Yes! I missed it's unconditional here... Great catch!
>
> On the other hand, still a question why lockdep doesn't see this
> every day?
My guess is that it is a race between polling the device and irq pushing
the packet. That is, normally the IRQ handler wins and netpoll doesn't
have anything to do and it doesn't traverse this code path.
(although I must admit to being a little out of my depth here)
On Thu, Mar 27, 2008 at 01:30:14PM +0100, Peter Zijlstra wrote:
...
> My guess is that it is a race between polling the device and irq pushing
> the packet. That is, normally the IRQ handler wins and netpoll doesn't
> have anything to do and it doesn't traverse this code path.
>
> (although I must admit to being a little out of my depth here)
Probably you're right but there were really a lot of testing and
stressing netconsole for misterious lockups for quite a long time
with this new napi, so it's hard to believe this was hidden so
well...
Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 27 Mar 2008 13:32:48 +0100 > On Thu, Mar 27, 2008 at 12:59:02PM +0100, Marcus Better wrote: > > David Miller wrote: > >> Can some test if using local_irq_{save,restore}() in > >> napi_complete() cures this lockdep warning? > > Isn't it now justified enough without lockdep's ack? Absolutely, but I want to know that the problem is solved and confirmed by a tester. On Thursday, 27 of March 2008, Jarek Poplawski wrote:
> On Thu, Mar 27, 2008 at 01:30:14PM +0100, Peter Zijlstra wrote:
> ...
> > My guess is that it is a race between polling the device and irq pushing
> > the packet. That is, normally the IRQ handler wins and netpoll doesn't
> > have anything to do and it doesn't traverse this code path.
> >
> > (although I must admit to being a little out of my depth here)
>
> Probably you're right but there were really a lot of testing and
> stressing netconsole for misterious lockups for quite a long time
> with this new napi, so it's hard to believe this was hidden so
> well...
I assume there will be a patch posted for this issue. Correct?
Rafael
On Thu, Mar 27, 2008 at 10:56:49PM +0100, Rafael J. Wysocki wrote:
...
> I assume there will be a patch posted for this issue. Correct?
I guess Peter should know the answer!
Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 27 Mar 2008 23:22:48 +0100 > On Thu, Mar 27, 2008 at 10:56:49PM +0100, Rafael J. Wysocki wrote: > ... > > I assume there will be a patch posted for this issue. Correct? > > I guess Peter should know the answer! Instead of going back and forth like this can someone post just take it upon themselves to post such an obvious patch already :-) All of this "he'll do it, she'll do it" just wastes time. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index a2f0032..fae6a7e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -383,9 +383,11 @@ static inline void __napi_complete(struct napi_struct *n) static inline void napi_complete(struct napi_struct *n) { - local_irq_disable(); + unsigned long flags; + + local_irq_save(flags); __napi_complete(n); - local_irq_enable(); + local_irq_restore(flags); } /** Regressions list annotation: Patch : http://bugzilla.kernel.org/show_bug.cgi?id=10326#c20 On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote:
...
> Instead of going back and forth like this can someone post
> just take it upon themselves to post such an obvious patch
> already :-)
>
> All of this "he'll do it, she'll do it" just wastes time.
Right! She or he should also remember to add Peter's lockup
scenario and his signed-off-by...
Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 27 Mar 2008 23:39:13 +0100 > On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote: > ... > > Instead of going back and forth like this can someone post > > just take it upon themselves to post such an obvious patch > > already :-) > > > > All of this "he'll do it, she'll do it" just wastes time. > > Right! She or he should also remember to add Peter's lockup > scenario and his signed-off-by... I didn't put in a Signed-off-by since Peter didn't give one, but I did give him credit for the discovery and I did list explicitly his given logic. On Thu, 2008-03-27 at 17:59 -0700, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Thu, 27 Mar 2008 23:39:13 +0100 > > > On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote: > > ... > > > Instead of going back and forth like this can someone post > > > just take it upon themselves to post such an obvious patch > > > already :-) > > > > > > All of this "he'll do it, she'll do it" just wastes time. > > > > Right! She or he should also remember to add Peter's lockup > > scenario and his signed-off-by... > > I didn't put in a Signed-off-by since Peter didn't give > one, but I did give him credit for the discovery and I > did list explicitly his given logic. Hehe, nr_emails > loc_changed situation alert! Fwiw here's my ACK: Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> |