Latest working kernel version: 2.6.24 Earliest failing kernel version: 2.6.25-rc6 (git a4083c9271e0a697278e089f2c0b9a95363ada0a) Distribution: Debian testing/unstable Hardware Environment: LG LE50 Express laptop, i386 Software Environment: Debian i386, X.org 7.3, KDE 4 Problem Description: Got this during bootup, somewhere before launching X. It only happened once, I have rebooted a few times and didn't reproduce it. Mar 25 19:48:46 better kernel: ================================= Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 Mar 25 19:48:46 better kernel: --------------------------------- Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W} usage. Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] netpoll_poll+0xbc/0x390 Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 Mar 25 19:48:46 better kernel: [<c0211a95>] smp_apic_timer_interrupt+0x45/0x80 Mar 25 19:48:46 better kernel: [<c02058ad>] apic_timer_interrupt+0x29/0x38 Mar 25 19:48:46 better kernel: [<c02058b7>] apic_timer_interrupt+0x33/0x38 Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 Mar 25 19:48:46 better kernel: [<c0220e51>] release_console_sem+0x191/0x1e0 Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 Mar 25 19:48:46 better kernel: [<c0205a23>] kernel_thread_helper+0x7/0x14 Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff Mar 25 19:48:46 better kernel: irq event stamp: 174638 Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): [<c0204ecf>] restore_nocheck+0x12/0x15 Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): [<c044ac65>] _spin_lock_irqsave+0x15/0x60 Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): [<c0225595>] do_softirq+0x45/0x50 Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): [<c0225595>] do_softirq+0x45/0x50 Mar 25 19:48:46 better kernel: Mar 25 19:48:46 better kernel: other info that might help us debug this: Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: [<c036c699>] tty_write_lock+0x19/0x50 Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: [<c0399c0d>] write_msg+0x2d/0xd0 Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: [<c03db11c>] netpoll_poll+0xbc/0x390 Mar 25 19:48:46 better kernel: Mar 25 19:48:46 better kernel: stack backtrace: Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted 2.6.25-rc6-lg #4 Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 Mar 25 19:48:46 better kernel: [<c0220bd4>] __call_console_drivers+0x54/0x70 Mar 25 19:48:46 better kernel: [<c0220d90>] release_console_sem+0xd0/0x1e0 Mar 25 19:48:46 better kernel: [<c037a8d9>] do_con_write+0x189/0x1a80 Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 Mar 25 19:48:46 better kernel: [<c044affc>] _spin_unlock_irqrestore+0x4c/0x60 Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 Mar 25 19:48:46 better kernel: [<c044afe7>] _spin_unlock_irqrestore+0x37/0x60 Mar 25 19:48:46 better kernel: [<c037c1f7>] con_put_char+0x27/0x30 Mar 25 19:48:46 better kernel: [<c036e6ed>] opost+0xad/0x220 Mar 25 19:48:46 better kernel: [<c036f08b>] write_chan+0x1ab/0x350 Mar 25 19:48:46 better kernel: [<c021a790>] default_wake_function+0x0/0x10 Mar 25 19:48:46 better kernel: [<c036c87b>] tty_write+0x13b/0x1d0 Mar 25 19:48:47 better kernel: [<c036eee0>] write_chan+0x0/0x350 Mar 25 19:48:47 better kernel: [<c02802e3>] do_loop_readv_writev+0x33/0x60 Mar 25 19:48:47 better kernel: [<c0280d50>] do_readv_writev+0x170/0x180 Mar 25 19:48:47 better kernel: [<c036c740>] tty_write+0x0/0x1d0 Mar 25 19:48:47 better kernel: [<c0280d9c>] vfs_writev+0x3c/0x50 Mar 25 19:48:47 better kernel: [<c0281261>] sys_writev+0x41/0x70 Mar 25 19:48:47 better kernel: [<c0204de6>] sysenter_past_esp+0x5f/0xa5 Mar 25 19:48:47 better kernel: =======================
Created attachment 15431 [details] Kernel config
Created attachment 15432 [details] Output of lspci -v
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 25 Mar 2008 13:21:23 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10326 > > Summary: inconsistent lock state in net_rx_action > Product: Networking > Version: 2.5 > KernelVersion: 2.6.25-rc6 (git > a4083c9271e0a697278e089f2c0b9a95363ada0a) > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: marcus@better.se > > > Latest working kernel version: 2.6.24 > Earliest failing kernel version: 2.6.25-rc6 (git > a4083c9271e0a697278e089f2c0b9a95363ada0a) A post-2.624 regression. > Distribution: Debian testing/unstable > Hardware Environment: LG LE50 Express laptop, i386 > Software Environment: Debian i386, X.org 7.3, KDE 4 > > Problem Description: > > Got this during bootup, somewhere before launching X. It only happened once, > I > have rebooted a few times and didn't reproduce it. > > Mar 25 19:48:46 better kernel: ================================= > Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] > Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 > Mar 25 19:48:46 better kernel: --------------------------------- > Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W} > usage. > Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: > Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] > netpoll_poll+0xbc/0x390 > Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: > Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 > Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 > Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 > Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 > Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 > Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 > Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 > Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 > Mar 25 19:48:46 better kernel: [<c0211a95>] > smp_apic_timer_interrupt+0x45/0x80 > Mar 25 19:48:46 better kernel: [<c02058ad>] apic_timer_interrupt+0x29/0x38 > Mar 25 19:48:46 better kernel: [<c02058b7>] apic_timer_interrupt+0x33/0x38 > Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 > Mar 25 19:48:46 better kernel: [<c0220e51>] release_console_sem+0x191/0x1e0 > Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 > Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 > Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 > Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > Mar 25 19:48:46 better kernel: [<c0205a23>] kernel_thread_helper+0x7/0x14 > Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff > Mar 25 19:48:46 better kernel: irq event stamp: 174638 > Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): > [<c0204ecf>] > restore_nocheck+0x12/0x15 > Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): > [<c044ac65>] > _spin_lock_irqsave+0x15/0x60 > Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): > [<c0225595>] > do_softirq+0x45/0x50 > Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): > [<c0225595>] > do_softirq+0x45/0x50 > Mar 25 19:48:46 better kernel: > Mar 25 19:48:46 better kernel: other info that might help us debug this: > Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: > Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: > [<c036c699>] tty_write_lock+0x19/0x50 > Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: > [<c0399c0d>] > write_msg+0x2d/0xd0 > Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: > [<c03db11c>] > netpoll_poll+0xbc/0x390 > Mar 25 19:48:46 better kernel: > Mar 25 19:48:46 better kernel: stack backtrace: > Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted > 2.6.25-rc6-lg #4 > Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 > Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 > Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 > Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 > Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 > Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 > Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 > Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 > Mar 25 19:48:46 better kernel: [<c0220bd4>] __call_console_drivers+0x54/0x70 > Mar 25 19:48:46 better kernel: [<c0220d90>] release_console_sem+0xd0/0x1e0 > Mar 25 19:48:46 better kernel: [<c037a8d9>] do_con_write+0x189/0x1a80 > Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 > Mar 25 19:48:46 better kernel: [<c044affc>] > _spin_unlock_irqrestore+0x4c/0x60 > Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > Mar 25 19:48:46 better kernel: [<c044afe7>] > _spin_unlock_irqrestore+0x37/0x60 > Mar 25 19:48:46 better kernel: [<c037c1f7>] con_put_char+0x27/0x30 > Mar 25 19:48:46 better kernel: [<c036e6ed>] opost+0xad/0x220 > Mar 25 19:48:46 better kernel: [<c036f08b>] write_chan+0x1ab/0x350 > Mar 25 19:48:46 better kernel: [<c021a790>] default_wake_function+0x0/0x10 > Mar 25 19:48:46 better kernel: [<c036c87b>] tty_write+0x13b/0x1d0 > Mar 25 19:48:47 better kernel: [<c036eee0>] write_chan+0x0/0x350 > Mar 25 19:48:47 better kernel: [<c02802e3>] do_loop_readv_writev+0x33/0x60 > Mar 25 19:48:47 better kernel: [<c0280d50>] do_readv_writev+0x170/0x180 > Mar 25 19:48:47 better kernel: [<c036c740>] tty_write+0x0/0x1d0 > Mar 25 19:48:47 better kernel: [<c0280d9c>] vfs_writev+0x3c/0x50 > Mar 25 19:48:47 better kernel: [<c0281261>] sys_writev+0x41/0x70 > Mar 25 19:48:47 better kernel: [<c0204de6>] sysenter_past_esp+0x5f/0xa5 > Mar 25 19:48:47 better kernel: ======================= >
Andrew Morton wrote, On 03/25/2008 09:43 PM: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 25 Mar 2008 13:21:23 -0700 (PDT) > bugme-daemon@bugzilla.kernel.org wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 >> >> Summary: inconsistent lock state in net_rx_action >> Product: Networking >> Version: 2.5 >> KernelVersion: 2.6.25-rc6 (git >> a4083c9271e0a697278e089f2c0b9a95363ada0a) >> Platform: All >> OS/Version: Linux >> Tree: Mainline >> Status: NEW >> Severity: normal >> Priority: P1 >> Component: Other >> AssignedTo: acme@ghostprotocols.net >> ReportedBy: marcus@better.se >> >> >> Latest working kernel version: 2.6.24 >> Earliest failing kernel version: 2.6.25-rc6 (git >> a4083c9271e0a697278e089f2c0b9a95363ada0a) > > A post-2.624 regression. > >> Distribution: Debian testing/unstable >> Hardware Environment: LG LE50 Express laptop, i386 >> Software Environment: Debian i386, X.org 7.3, KDE 4 >> >> Problem Description: >> >> Got this during bootup, somewhere before launching X. It only happened once, >> I >> have rebooted a few times and didn't reproduce it. >> >> Mar 25 19:48:46 better kernel: ================================= >> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] >> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 >> Mar 25 19:48:46 better kernel: --------------------------------- >> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> {softirq-on-W} >> usage. >> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: >> Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] >> netpoll_poll+0xbc/0x390 >> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: >> Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 >> Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 >> Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 >> Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 >> Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 >> Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 >> Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 >> Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 >> Mar 25 19:48:46 better kernel: [<c0211a95>] >> smp_apic_timer_interrupt+0x45/0x80 >> Mar 25 19:48:46 better kernel: [<c02058ad>] apic_timer_interrupt+0x29/0x38 >> Mar 25 19:48:46 better kernel: [<c02058b7>] apic_timer_interrupt+0x33/0x38 >> Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 >> Mar 25 19:48:46 better kernel: [<c0220e51>] >> release_console_sem+0x191/0x1e0 >> Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 >> Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 >> Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 >> Mar 25 19:48:46 better kernel: [<c0205a23>] kernel_thread_helper+0x7/0x14 >> Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff >> Mar 25 19:48:46 better kernel: irq event stamp: 174638 >> Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): >> [<c0204ecf>] >> restore_nocheck+0x12/0x15 >> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): >> [<c044ac65>] >> _spin_lock_irqsave+0x15/0x60 >> Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): >> [<c0225595>] >> do_softirq+0x45/0x50 >> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): >> [<c0225595>] >> do_softirq+0x45/0x50 >> Mar 25 19:48:46 better kernel: >> Mar 25 19:48:46 better kernel: other info that might help us debug this: >> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: >> Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: >> [<c036c699>] tty_write_lock+0x19/0x50 >> Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: >> [<c0399c0d>] >> write_msg+0x2d/0xd0 >> Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: >> [<c03db11c>] >> netpoll_poll+0xbc/0x390 >> Mar 25 19:48:46 better kernel: >> Mar 25 19:48:46 better kernel: stack backtrace: >> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted >> 2.6.25-rc6-lg #4 >> Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 >> Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 >> Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 >> Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 >> Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 >> Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 >> Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 >> Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 netconsole's write_msg calls netpoll_send_udp (then netpoll_send_skb) always under spin_lock_irqsave(&target_list_lock, flags), and here this lock (and afterwards napi->poll_lock) is seen taken with irqs enabled. So, IMHO, it looks like some other irq (tracking?) imbalance and not networking bug. Regards, Jarek P. PS: linux-kernel added to CC. >> Mar 25 19:48:46 better kernel: [<c0220bd4>] >> __call_console_drivers+0x54/0x70 >> Mar 25 19:48:46 better kernel: [<c0220d90>] release_console_sem+0xd0/0x1e0 >> Mar 25 19:48:46 better kernel: [<c037a8d9>] do_con_write+0x189/0x1a80 >> Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 >> Mar 25 19:48:46 better kernel: [<c044affc>] >> _spin_unlock_irqrestore+0x4c/0x60 >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 >> Mar 25 19:48:46 better kernel: [<c044afe7>] >> _spin_unlock_irqrestore+0x37/0x60 >> Mar 25 19:48:46 better kernel: [<c037c1f7>] con_put_char+0x27/0x30 >> Mar 25 19:48:46 better kernel: [<c036e6ed>] opost+0xad/0x220 >> Mar 25 19:48:46 better kernel: [<c036f08b>] write_chan+0x1ab/0x350 >> Mar 25 19:48:46 better kernel: [<c021a790>] default_wake_function+0x0/0x10 >> Mar 25 19:48:46 better kernel: [<c036c87b>] tty_write+0x13b/0x1d0 >> Mar 25 19:48:47 better kernel: [<c036eee0>] write_chan+0x0/0x350 >> Mar 25 19:48:47 better kernel: [<c02802e3>] do_loop_readv_writev+0x33/0x60 >> Mar 25 19:48:47 better kernel: [<c0280d50>] do_readv_writev+0x170/0x180 >> Mar 25 19:48:47 better kernel: [<c036c740>] tty_write+0x0/0x1d0 >> Mar 25 19:48:47 better kernel: [<c0280d9c>] vfs_writev+0x3c/0x50 >> Mar 25 19:48:47 better kernel: [<c0281261>] sys_writev+0x41/0x70 >> Mar 25 19:48:47 better kernel: [<c0204de6>] sysenter_past_esp+0x5f/0xa5 >> Mar 25 19:48:47 better kernel: ======================= >> > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Reply-To: akpm@linux-foundation.org On Thu, 27 Mar 2008 00:13:41 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > Andrew Morton wrote, On 03/25/2008 09:43 PM: > > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Tue, 25 Mar 2008 13:21:23 -0700 (PDT) > > bugme-daemon@bugzilla.kernel.org wrote: > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 > >> > >> Summary: inconsistent lock state in net_rx_action > >> Product: Networking > >> Version: 2.5 > >> KernelVersion: 2.6.25-rc6 (git > >> a4083c9271e0a697278e089f2c0b9a95363ada0a) > >> Platform: All > >> OS/Version: Linux > >> Tree: Mainline > >> Status: NEW > >> Severity: normal > >> Priority: P1 > >> Component: Other > >> AssignedTo: acme@ghostprotocols.net > >> ReportedBy: marcus@better.se > >> > >> > >> Latest working kernel version: 2.6.24 > >> Earliest failing kernel version: 2.6.25-rc6 (git > >> a4083c9271e0a697278e089f2c0b9a95363ada0a) > > > > A post-2.624 regression. > > > >> Distribution: Debian testing/unstable > >> Hardware Environment: LG LE50 Express laptop, i386 > >> Software Environment: Debian i386, X.org 7.3, KDE 4 > >> > >> Problem Description: > >> > >> Got this during bootup, somewhere before launching X. It only happened > once, I > >> have rebooted a few times and didn't reproduce it. > >> > >> Mar 25 19:48:46 better kernel: ================================= > >> Mar 25 19:48:46 better kernel: [ INFO: inconsistent lock state ] > >> Mar 25 19:48:46 better kernel: 2.6.25-rc6-lg #4 > >> Mar 25 19:48:46 better kernel: --------------------------------- > >> Mar 25 19:48:46 better kernel: inconsistent {in-softirq-W} -> > {softirq-on-W} > >> usage. > >> Mar 25 19:48:46 better kernel: syslogd/2773 [HC0[0]:SC0[0]:HE1:SE1] takes: > >> Mar 25 19:48:46 better kernel: (&napi->poll_lock){-+..}, at: [<c03db11c>] > >> netpoll_poll+0xbc/0x390 > >> Mar 25 19:48:46 better kernel: {in-softirq-W} state was registered at: > >> Mar 25 19:48:46 better kernel: [<c024171a>] __lock_acquire+0x36a/0x1070 > >> Mar 25 19:48:46 better kernel: [<c024153f>] __lock_acquire+0x18f/0x1070 > >> Mar 25 19:48:46 better kernel: [<c024247f>] lock_acquire+0x5f/0x80 > >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c044a9e4>] _spin_lock+0x34/0x40 > >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c03cfafc>] net_rx_action+0xbc/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c03cfaa0>] net_rx_action+0x60/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c02254f2>] __do_softirq+0x62/0xc0 > >> Mar 25 19:48:46 better kernel: [<c0225595>] do_softirq+0x45/0x50 > >> Mar 25 19:48:46 better kernel: [<c0225797>] irq_exit+0x77/0x80 > >> Mar 25 19:48:46 better kernel: [<c0211a95>] > >> smp_apic_timer_interrupt+0x45/0x80 > >> Mar 25 19:48:46 better kernel: [<c02058ad>] > apic_timer_interrupt+0x29/0x38 > >> Mar 25 19:48:46 better kernel: [<c02058b7>] > apic_timer_interrupt+0x33/0x38 > >> Mar 25 19:48:46 better kernel: [<c022007b>] __set_personality+0x2b/0x240 > >> Mar 25 19:48:46 better kernel: [<c0220e51>] > release_console_sem+0x191/0x1e0 > >> Mar 25 19:48:46 better kernel: [<c05751aa>] init_netconsole+0x1da/0x1f0 > >> Mar 25 19:48:46 better kernel: [<c0558700>] kernel_init+0x90/0x270 > >> Mar 25 19:48:46 better kernel: [<c0204ecf>] restore_nocheck+0x12/0x15 > >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > >> Mar 25 19:48:46 better kernel: [<c0558670>] kernel_init+0x0/0x270 > >> Mar 25 19:48:46 better kernel: [<c0205a23>] > kernel_thread_helper+0x7/0x14 > >> Mar 25 19:48:46 better kernel: [<ffffffff>] 0xffffffff > >> Mar 25 19:48:46 better kernel: irq event stamp: 174638 > >> Mar 25 19:48:46 better kernel: hardirqs last enabled at (174637): > [<c0204ecf>] > >> restore_nocheck+0x12/0x15 > >> Mar 25 19:48:46 better kernel: hardirqs last disabled at (174638): > [<c044ac65>] > >> _spin_lock_irqsave+0x15/0x60 > >> Mar 25 19:48:46 better kernel: softirqs last enabled at (174636): > [<c0225595>] > >> do_softirq+0x45/0x50 > >> Mar 25 19:48:46 better kernel: softirqs last disabled at (174629): > [<c0225595>] > >> do_softirq+0x45/0x50 > >> Mar 25 19:48:46 better kernel: > >> Mar 25 19:48:46 better kernel: other info that might help us debug this: > >> Mar 25 19:48:46 better kernel: 3 locks held by syslogd/2773: > >> Mar 25 19:48:46 better kernel: #0: (&tty->atomic_write_lock){--..}, at: > >> [<c036c699>] tty_write_lock+0x19/0x50 > >> Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: > [<c0399c0d>] > >> write_msg+0x2d/0xd0 > >> Mar 25 19:48:46 better kernel: #2: (&napi->poll_lock){-+..}, at: > [<c03db11c>] > >> netpoll_poll+0xbc/0x390 > >> Mar 25 19:48:46 better kernel: > >> Mar 25 19:48:46 better kernel: stack backtrace: > >> Mar 25 19:48:46 better kernel: Pid: 2773, comm: syslogd Not tainted > >> 2.6.25-rc6-lg #4 > >> Mar 25 19:48:46 better kernel: [<c023ff6f>] print_usage_bug+0x15f/0x170 > >> Mar 25 19:48:46 better kernel: [<c0240bd1>] mark_lock+0x401/0x590 > >> Mar 25 19:48:46 better kernel: [<c0240d98>] mark_held_locks+0x38/0x70 > >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > >> Mar 25 19:48:46 better kernel: [<c0240eec>] trace_hardirqs_on+0x9c/0x110 > >> Mar 25 19:48:46 better kernel: [<c0398a43>] sky2_poll+0x7a3/0xad0 > >> Mar 25 19:48:46 better kernel: [<c03db11c>] netpoll_poll+0xbc/0x390 > >> Mar 25 19:48:46 better kernel: [<c044adb2>] _spin_trylock+0x42/0x80 > >> Mar 25 19:48:46 better kernel: [<c03db0d3>] netpoll_poll+0x73/0x390 > >> Mar 25 19:48:46 better kernel: [<c03db51f>] netpoll_send_skb+0x12f/0x1d0 > >> Mar 25 19:48:46 better kernel: [<c0399c7c>] write_msg+0x9c/0xd0 > > > netconsole's write_msg calls netpoll_send_udp (then netpoll_send_skb) always > under spin_lock_irqsave(&target_list_lock, flags), and here this lock (and > afterwards napi->poll_lock) is seen taken with irqs enabled. So, IMHO, it > looks like some other irq (tracking?) imbalance and not networking bug. > No, it's not an irq_disable() thing, directly. What lockdep is saying is that sky2_poll() is taking napi->poll_lock for writing with softirqs enabled, but net_rx_action() takes the same lock from within softirq context. If sky2_poll() always takes napi->poll_lock under local_irq_disable() then that would be a lockdep bug.
On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote: ... > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 ... > No, it's not an irq_disable() thing, directly. > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for > writing with softirqs enabled, but net_rx_action() takes the same lock from > within softirq context. > > If sky2_poll() always takes napi->poll_lock under local_irq_disable() then > that would be a lockdep bug. sky2_poll() doesn't take napi->poll_lock; this lock is taken by netpoll_poll() before calling sky2_poll(). And before this hardirqs are disabled in write_msg(). So, theoretically lockdep could be right if sky2_poll() would enable irqs after this. (If it were done in netpoll - lockdep should warn before or after sky2_poll() call.) But I really can't see any such possibility in sky2_poll(). Regards, Jarek P.
On Thu, Mar 27, 2008 at 09:55:42AM +0100, Jarek Poplawski wrote: ... > sky2_poll() doesn't take napi->poll_lock; this lock is taken by > netpoll_poll() before calling sky2_poll(). And before this hardirqs > are disabled in write_msg(). Actually, hardirqs are disabled again before netpoll_poll() (in netpoll_send_skb()), so sky2_poll() should need enable_irqs 2x to justify such a warning. Jarek P.
Reply-To: akpm@linux-foundation.org On Thu, 27 Mar 2008 09:55:42 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote: > ... > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 > ... > > No, it's not an irq_disable() thing, directly. > > > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for > > writing with softirqs enabled, but net_rx_action() takes the same lock from > > within softirq context. > > > > If sky2_poll() always takes napi->poll_lock under local_irq_disable() then > > that would be a lockdep bug. > > sky2_poll() doesn't take napi->poll_lock; this lock is taken by > netpoll_poll() before calling sky2_poll(). And before this hardirqs > are disabled in write_msg(). So, theoretically lockdep could be right > if sky2_poll() would enable irqs after this. (If it were done in > netpoll - lockdep should warn before or after sky2_poll() call.) > But I really can't see any such possibility in sky2_poll(). I can't spot it from a five-minute read either. gcc's autoinlining really makes this sort of thing much harder than it used to be :( Anyway, the accusation is that lockdep is busted, in that it doesn't realise that local_irq_disable() blocks softirqs. I bet the net code is wrong and we missed it ;)
On Thu, Mar 27, 2008 at 02:18:12AM -0700, Andrew Morton wrote: ... > Anyway, the accusation is that lockdep is busted, in that it doesn't realise > that > local_irq_disable() blocks softirqs. Usually lockdep gets this right, so I've some doubts too. But here it's not only about softirqs; this: > >> Mar 25 19:48:46 better kernel: #1: (target_list_lock){--..}, at: > [<c0399c0d>] means lockdep saw hardirqs enabled while acquiring/holding this lock. Jarek P.
On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote: > On Thu, 27 Mar 2008 09:55:42 +0100 Jarek Poplawski <jarkao2@gmail.com> wrote: > > > On Wed, Mar 26, 2008 at 05:14:03PM -0700, Andrew Morton wrote: > > ... > > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=10326 > > ... > > > No, it's not an irq_disable() thing, directly. > > > > > > What lockdep is saying is that sky2_poll() is taking napi->poll_lock for > > > writing with softirqs enabled, but net_rx_action() takes the same lock > from > > > within softirq context. > > > > > > If sky2_poll() always takes napi->poll_lock under local_irq_disable() > then > > > that would be a lockdep bug. > > > > sky2_poll() doesn't take napi->poll_lock; this lock is taken by > > netpoll_poll() before calling sky2_poll(). And before this hardirqs > > are disabled in write_msg(). So, theoretically lockdep could be right > > if sky2_poll() would enable irqs after this. (If it were done in > > netpoll - lockdep should warn before or after sky2_poll() call.) > > But I really can't see any such possibility in sky2_poll(). > > I can't spot it from a five-minute read either. gcc's autoinlining really > makes this sort of thing much harder than it used to be :( > > Anyway, the accusation is that lockdep is busted, in that it doesn't realise > that > local_irq_disable() blocks softirqs. > > I bet the net code is wrong and we missed it ;) How about this: <irqs disabled> netpoll_poll() poll_napi() spin_trylock(&napi->poll_lock) poll_one_napi() napi->poll() := sky2_poll() napi_complete() local_irq_disable() local_irq_enable() <--- *BUG* <irq> irq_exit() do_softirq() net_rx_action() spin_lock(&napi->poll_lock) <--- Deadlock! Because we still hold the lock....
From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Thu, 27 Mar 2008 11:56:19 +0100 > How about this: > > <irqs disabled> > > netpoll_poll() > poll_napi() > spin_trylock(&napi->poll_lock) > poll_one_napi() > napi->poll() := sky2_poll() > napi_complete() > local_irq_disable() > local_irq_enable() <--- *BUG* > > <irq> > irq_exit() > do_softirq() > net_rx_action() > spin_lock(&napi->poll_lock) <--- Deadlock! > > Because we still hold the lock.... Yep, that will deadlock. Can some test if using local_irq_{save,restore}() in napi_complete() cures this lockdep warning? Thanks.
David Miller wrote: > Can some test if using local_irq_{save,restore}() in > napi_complete() cures this lockdep warning? I cannot reproduce the bug, so cannot help much. Marcus
On Thu, Mar 27, 2008 at 11:56:19AM +0100, Peter Zijlstra wrote: > On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote: ... > > I bet the net code is wrong and we missed it ;) It looks like you are natural born winner! Congratulations! > How about this: > > <irqs disabled> > > netpoll_poll() > poll_napi() > spin_trylock(&napi->poll_lock) > poll_one_napi() > napi->poll() := sky2_poll() > napi_complete() > local_irq_disable() > local_irq_enable() <--- *BUG* Yes! I missed it's unconditional here... Great catch! On the other hand, still a question why lockdep doesn't see this every day? Jarek P.
On Thu, Mar 27, 2008 at 12:59:02PM +0100, Marcus Better wrote: > David Miller wrote: >> Can some test if using local_irq_{save,restore}() in >> napi_complete() cures this lockdep warning? Isn't it now justified enough without lockdep's ack? > I cannot reproduce the bug, so cannot help much. Marcus, you've helped here very much and enough! Thanks, Jarek P.
On Thu, 2008-03-27 at 13:22 +0100, Jarek Poplawski wrote: > On Thu, Mar 27, 2008 at 11:56:19AM +0100, Peter Zijlstra wrote: > > On Thu, 2008-03-27 at 02:18 -0700, Andrew Morton wrote: > .... > > > I bet the net code is wrong and we missed it ;) > > It looks like you are natural born winner! Congratulations! > > > How about this: > > > > <irqs disabled> > > > > netpoll_poll() > > poll_napi() > > spin_trylock(&napi->poll_lock) > > poll_one_napi() > > napi->poll() := sky2_poll() > > napi_complete() > > local_irq_disable() > > local_irq_enable() <--- *BUG* > > Yes! I missed it's unconditional here... Great catch! > > On the other hand, still a question why lockdep doesn't see this > every day? My guess is that it is a race between polling the device and irq pushing the packet. That is, normally the IRQ handler wins and netpoll doesn't have anything to do and it doesn't traverse this code path. (although I must admit to being a little out of my depth here)
On Thu, Mar 27, 2008 at 01:30:14PM +0100, Peter Zijlstra wrote: ... > My guess is that it is a race between polling the device and irq pushing > the packet. That is, normally the IRQ handler wins and netpoll doesn't > have anything to do and it doesn't traverse this code path. > > (although I must admit to being a little out of my depth here) Probably you're right but there were really a lot of testing and stressing netconsole for misterious lockups for quite a long time with this new napi, so it's hard to believe this was hidden so well... Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 27 Mar 2008 13:32:48 +0100 > On Thu, Mar 27, 2008 at 12:59:02PM +0100, Marcus Better wrote: > > David Miller wrote: > >> Can some test if using local_irq_{save,restore}() in > >> napi_complete() cures this lockdep warning? > > Isn't it now justified enough without lockdep's ack? Absolutely, but I want to know that the problem is solved and confirmed by a tester.
On Thursday, 27 of March 2008, Jarek Poplawski wrote: > On Thu, Mar 27, 2008 at 01:30:14PM +0100, Peter Zijlstra wrote: > ... > > My guess is that it is a race between polling the device and irq pushing > > the packet. That is, normally the IRQ handler wins and netpoll doesn't > > have anything to do and it doesn't traverse this code path. > > > > (although I must admit to being a little out of my depth here) > > Probably you're right but there were really a lot of testing and > stressing netconsole for misterious lockups for quite a long time > with this new napi, so it's hard to believe this was hidden so > well... I assume there will be a patch posted for this issue. Correct? Rafael
On Thu, Mar 27, 2008 at 10:56:49PM +0100, Rafael J. Wysocki wrote: ... > I assume there will be a patch posted for this issue. Correct? I guess Peter should know the answer! Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 27 Mar 2008 23:22:48 +0100 > On Thu, Mar 27, 2008 at 10:56:49PM +0100, Rafael J. Wysocki wrote: > ... > > I assume there will be a patch posted for this issue. Correct? > > I guess Peter should know the answer! Instead of going back and forth like this can someone post just take it upon themselves to post such an obvious patch already :-) All of this "he'll do it, she'll do it" just wastes time. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index a2f0032..fae6a7e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -383,9 +383,11 @@ static inline void __napi_complete(struct napi_struct *n) static inline void napi_complete(struct napi_struct *n) { - local_irq_disable(); + unsigned long flags; + + local_irq_save(flags); __napi_complete(n); - local_irq_enable(); + local_irq_restore(flags); } /**
Regressions list annotation: Patch : http://bugzilla.kernel.org/show_bug.cgi?id=10326#c20
On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote: ... > Instead of going back and forth like this can someone post > just take it upon themselves to post such an obvious patch > already :-) > > All of this "he'll do it, she'll do it" just wastes time. Right! She or he should also remember to add Peter's lockup scenario and his signed-off-by... Jarek P.
From: Jarek Poplawski <jarkao2@gmail.com> Date: Thu, 27 Mar 2008 23:39:13 +0100 > On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote: > ... > > Instead of going back and forth like this can someone post > > just take it upon themselves to post such an obvious patch > > already :-) > > > > All of this "he'll do it, she'll do it" just wastes time. > > Right! She or he should also remember to add Peter's lockup > scenario and his signed-off-by... I didn't put in a Signed-off-by since Peter didn't give one, but I did give him credit for the discovery and I did list explicitly his given logic.
On Thu, 2008-03-27 at 17:59 -0700, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Thu, 27 Mar 2008 23:39:13 +0100 > > > On Thu, Mar 27, 2008 at 03:22:18PM -0700, David Miller wrote: > > ... > > > Instead of going back and forth like this can someone post > > > just take it upon themselves to post such an obvious patch > > > already :-) > > > > > > All of this "he'll do it, she'll do it" just wastes time. > > > > Right! She or he should also remember to add Peter's lockup > > scenario and his signed-off-by... > > I didn't put in a Signed-off-by since Peter didn't give > one, but I did give him credit for the discovery and I > did list explicitly his given logic. Hehe, nr_emails > loc_changed situation alert! Fwiw here's my ACK: Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=50fd4407b8bfbde7c1a0bfe4f24de7df37164342