Bug 8132 - pptp server lockup in ppp_asynctty_receive()
Summary: pptp server lockup in ppp_asynctty_receive()
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network (show other bugs)
Hardware: i386 Linux
: P2 high
Assignee: Paul Mackerras
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-05 14:26 UTC by Yuriy Shkandybin
Modified: 2008-09-23 09:44 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
config file of problem pptp server (26.09 KB, text/plain)
2007-03-09 00:36 UTC, Yuriy Shkandybin
Details

Description Yuriy Shkandybin 2007-03-05 14:26:23 UTC
Already several kernel releases i've expirienced different lockups of  vpn 
(pptp) server.
There is more then 200  ppp connections sometimes.
With kernel debug i was able to retrive next information:

First:
Showing all locks held in the system:
1 lock held by agetty/4486:
 #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
1 lock held by agetty/4487:
 #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
1 lock held by agetty/4488:
 #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
2 locks held by pptpctrl/4500:
 #0:  (&tty->atomic_write_lock){--..}, at: [<c01d0f8f>] tty_write+0x83/0x1d0
 #1:  (&ap->recv_lock){....}, at: [<c02241fe>] 
ppp_asynctty_receive+0x2e/0x710

=============================================
BUG: spinlock lockup on CPU#1, pppd/4504, df5048c4
 [<c01c42c6>] _raw_spin_lock+0x100/0x134
 [<c0223f03>] ppp_async_ioctl+0xa7/0x1d0
 [<c0221b7c>] ppp_ioctl+0xa5/0xbff
 [<c012f2a5>] down_read+0x29/0x3a
 [<c0223e5c>] ppp_async_ioctl+0x0/0x1d0
 [<c0221ba5>] ppp_ioctl+0xce/0xbff
 [<c029b78f>] _spin_unlock+0x14/0x1c
 [<c014a22a>] do_wp_page+0x256/0x4ba
 [<c014c1e9>] __handle_mm_fault+0x74e/0xa22
 [<c0167224>] do_ioctl+0x64/0x6d
 [<c016727d>] vfs_ioctl+0x50/0x273
 [<c01674d4>] sys_ioctl+0x34/0x50
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================
BUG: soft lockup detected on CPU#0!
 [<c013ce27>] softlockup_tick+0x8d/0xbc
 [<c0123784>] update_process_times+0x28/0x5e
 [<c010d9c4>] smp_apic_timer_interrupt+0x80/0x9c
 [<c0103903>] apic_timer_interrupt+0x33/0x38
 [<c01c2e9d>] delay_tsc+0x9/0x13
 [<c01c2ed3>] __delay+0x6/0x7
 [<c01c426f>] _raw_spin_lock+0xa9/0x134
 [<c01d0f8f>] tty_write+0x83/0x1d0
 [<c01cf418>] tty_ldisc_try+0x2f/0x33
 [<c029bc8c>] lock_kernel+0x19/0x24
 [<c01d1017>] tty_write+0x10b/0x1d0
 [<c01d35cf>] write_chan+0x0/0x320
 [<c015d502>] vfs_write+0x87/0xf0
 [<c01d0f0c>] tty_write+0x0/0x1d0
 [<c015daa9>] sys_write+0x41/0x6a
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================


Second)
<0>BUG: spinlock lockup on CPU#0, pppd/5209, de3e2884
 [<c01c42c6>] _raw_spin_lock+0x100/0x134
BUG: spinlock lockup on CPU#1, ip-down/7524, c0353300
 [<c01c42c6>] _raw_spin_lock+0x100/0x134
 [<c029bc8c>] lock_kernel+0x19/0x24
 [<c015f234>] chrdev_open+0x8a/0x16e
 [<c015f1aa>] chrdev_open+0x0/0x16e
 [<c015ba94>] __dentry_open+0xaf/0x1a0
 [<c015bc12>] nameidata_to_filp+0x31/0x3a
 [<c015bc54>] do_filp_open+0x39/0x40
 [<c029b78f>] _spin_unlock+0x14/0x1c
 [<c015b9d4>] get_unused_fd+0xaa/0xbb
 [<c015bc95>] do_sys_open+0x3a/0x6d
 [<c015bd03>] sys_open+0x1c/0x20
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================
 [<c0223f03>] ppp_async_ioctl+0xa7/0x1d0
 [<c0221b7c>] ppp_ioctl+0xa5/0xbff
 [<c012f2a5>] down_read+0x29/0x3a
 [<c0223e5c>] ppp_async_ioctl+0x0/0x1d0
 [<c0221ba5>] ppp_ioctl+0xce/0xbff
 [<c029b78f>] _spin_unlock+0x14/0x1c
 [<c014a22a>] do_wp_page+0x256/0x4ba
 [<c014c1e9>] __handle_mm_fault+0x74e/0xa22
 [<c0167224>] do_ioctl+0x64/0x6d
 [<c016727d>] vfs_ioctl+0x50/0x273
 [<c01674d4>] sys_ioctl+0x34/0x50
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================

Third)
BUG: soft lockup detected on CPU#0!
 [<c013ce27>] softlockup_tick+0x8d/0xbc
 [<c0123784>] update_process_times+0x28/0x5e
 [<c010d9c4>] smp_apic_timer_interrupt+0x80/0x9c
 [<c0103903>] apic_timer_interrupt+0x33/0x38
 [<c01c2e9d>] delay_tsc+0x9/0x13
 [<c01c2ed3>] __delay+0x6/0x7
 [<c01c426f>] _raw_spin_lock+0xa9/0x134
 [<c01cf418>] tty_ldisc_try+0x2f/0x33
 [<c029bc8c>] lock_kernel+0x19/0x24
 [<c01d1984>] tty_read+0x5a/0xbe
 [<c015d5f0>] vfs_read+0x85/0xee
 [<c01d192a>] tty_read+0x0/0xbe
 [<c015da3f>] sys_read+0x41/0x6a
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================
BUG: soft lockup detected on CPU#0!
 [<c013ce27>] softlockup_tick+0x8d/0xbc
 [<c0123784>] update_process_times+0x28/0x5e
 [<c010d9c4>] smp_apic_timer_interrupt+0x80/0x9c
 [<c0103903>] apic_timer_interrupt+0x33/0x38
 [<c01c00d8>] prio_tree_insert+0xe8/0x23b
 [<c01c4275>] _raw_spin_lock+0xaf/0x134
 [<c01cf418>] tty_ldisc_try+0x2f/0x33
 [<c029bc8c>] lock_kernel+0x19/0x24
 [<c01d1984>] tty_read+0x5a/0xbe
 [<c015d5f0>] vfs_read+0x85/0xee
 [<c01d192a>] tty_read+0x0/0xbe
 [<c015da3f>] sys_read+0x41/0x6a
 [<c0102dae>] sysenter_past_esp+0x5f/0x99


Next via SysRq:

Showing all locks held in the system:
1 lock held by agetty/5057:
 #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
1 lock held by agetty/5058:
 #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
1 lock held by agetty/5059:
 #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
2 locks held by pptpctrl/5071:
 #0:  (&tty->atomic_write_lock){--..}, at: [<c01d0f8f>] tty_write+0x83/0x1d0
 #1:  (&ap->recv_lock){....}, at: [<c02241fe>] 
ppp_asynctty_receive+0x2e/0x710


~#SysRq : Show Blocked State

                         free                        sibling
  task             PC    stack   pid father child younger older
pptpctrl      D C02A18E0     0  5071   4646  5074    5094  5064 (L-TLB)
       df3a3bd0 00000082 0029b837 c02a18e0 00000246 00000000 dd4f131c 
dd563cac
       def86030 c140864c 00000000 00000000 00000009 def86030 2ccaa8e5 
0000017d
       3b9aca00 def8613c c1407cc0 00000000 006a9953 00000000 de8f503c 
00000046
Call Trace:
 [<c029b5b0>] __down+0x8c/0xf4
 [<c0116209>] default_wake_function+0x0/0xc
 [<c029b397>] __down_failed+0x7/0xc
 [<c0223de4>] ppp_asynctty_close+0x28/0x98
 [<c0223e4f>] ppp_asynctty_close+0x93/0x98
 [<c0223e59>] ppp_asynctty_hangup+0x5/0x8
 [<c01d0c8a>] do_tty_hangup+0x119/0x391
 [<c01d0612>] release_dev+0x154/0x69b
 [<c0158a01>] poison_obj+0x20/0x3d
 [<c0158a01>] poison_obj+0x20/0x3d
 [<c0158c70>] cache_free_debugcheck+0xc7/0x1dd
 [<c016ddd5>] destroy_inode+0x20/0x33
 [<c016c88c>] dput+0xd9/0x145
 [<c01d0b68>] tty_release+0xf/0x18
 [<c015dd96>] __fput+0x3c/0xe7
 [<c015b906>] filp_close+0x3e/0x62
 [<c011c73b>] put_files_struct+0xa5/0xb8
 [<c011d808>] do_exit+0x124/0x7e2
 [<c01e1710>] vgacon_set_cursor_size+0x2d/0xeb
 [<c0104392>] do_trap+0x0/0xa9
 [<c0104bf5>] do_invalid_op+0x0/0xab
 [<c0104c97>] do_invalid_op+0xa2/0xab
 [<c024d1e4>] skb_under_panic+0x59/0x5d
 [<c0113ecc>] __wake_up+0x32/0x43
 [<c011af94>] release_console_sem+0x22d/0x239
 [<c011b4f1>] vprintk+0x1fb/0x37c
 [<c029bd4c>] error_code+0x7c/0x84
 [<c024d1e4>] skb_under_panic+0x59/0x5d
 [<c02248a2>] ppp_asynctty_receive+0x6d2/0x710
 [<c02248b0>] ppp_asynctty_receive+0x6e0/0x710
 [<c01d5a09>] pty_write+0x39/0x41
 [<c01d37e1>] write_chan+0x212/0x320
 [<c0116209>] default_wake_function+0x0/0xc
 [<c01d1028>] tty_write+0x11c/0x1d0
 [<c01d35cf>] write_chan+0x0/0x320
 [<c015d502>] vfs_write+0x87/0xf0
 [<c01d0f0c>] tty_write+0x0/0x1d0
 [<c015daa9>] sys_write+0x41/0x6a
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================

Dump registers
Pid: 13874, comm:             pptpctrl
EIP: 0060:[<c01c2e94>] CPU: 0
EIP is at delay_tsc+0x0/0x13
 EFLAGS: 00000246    Not tainted  (2.6.20-gentoo #10)
EAX: 00000001 EBX: c0353300 ECX: f52a298e EDX: c799bc55
ESI: 492d5b81 EDI: 00000000 EBP: 8eb4e7d4 DS: 007b ES: 007b GS: 00d8
CR0: 8005003b CR2: b7e598b8 CR3: 0e6f3000 CR4: 000006d0
 [<c01c2ed3>] __delay+0x6/0x7
 [<c01c426f>] _raw_spin_lock+0xa9/0x134
 [<c01cf418>] tty_ldisc_try+0x2f/0x33
 [<c029bc8c>] lock_kernel+0x19/0x24
 [<c01d1984>] tty_read+0x5a/0xbe
 [<c015d5f0>] vfs_read+0x85/0xee
 [<c01d192a>] tty_read+0x0/0xbe
 [<c015da3f>] sys_read+0x41/0x6a
 [<c0102dae>] sysenter_past_esp+0x5f/0x99
 =======================


Information about system:
 sh scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux vpn2 2.6.20-gentoo #10 SMP Mon Mar 5 11:07:33 MSK 2007 i686 Intel(R) 
Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux

Gnu C                  4.1.2
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
reiserfsprogs          3.6.19
PPP                    2.4.4
Linux C Library        > libc.2.5
Dynamic linker (ldd)   2.5
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.12
Sh-utils               6.7
udev                   104
Modules Loaded         8021q ipt_TCPMSS xt_tcpudp xt_pkttype iptable_filter 
ip_tables x_tables i2c_i801 i2c_core
Comment 1 Anonymous Emailer 2007-03-05 15:13:39 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 5 Mar 2007 14:26:30 -0800
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8132
> 
>            Summary: pptp server lockup in ppp_asynctty_receive()
>     Kernel Version:  2.6.20
>             Status: NEW
>           Severity: high
>              Owner: jgarzik@pobox.com
>          Submitter: jura@netams.com
>                 CC: jura@netams.com
> 
> 
> Already several kernel releases i've expirienced different lockups of  vpn 
> (pptp) server.
> There is more then 200  ppp connections sometimes.
> With kernel debug i was able to retrive next information:
> 
> First:
> Showing all locks held in the system:
> 1 lock held by agetty/4486:
>  #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
> 1 lock held by agetty/4487:
>  #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
> 1 lock held by agetty/4488:
>  #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
> 2 locks held by pptpctrl/4500:
>  #0:  (&tty->atomic_write_lock){--..}, at: [<c01d0f8f>] tty_write+0x83/0x1d0
>  #1:  (&ap->recv_lock){....}, at: [<c02241fe>] 
> ppp_asynctty_receive+0x2e/0x710
> 
> =============================================
> BUG: spinlock lockup on CPU#1, pppd/4504, df5048c4
>  [<c01c42c6>] _raw_spin_lock+0x100/0x134
>  [<c0223f03>] ppp_async_ioctl+0xa7/0x1d0
>  [<c0221b7c>] ppp_ioctl+0xa5/0xbff
>  [<c012f2a5>] down_read+0x29/0x3a
>  [<c0223e5c>] ppp_async_ioctl+0x0/0x1d0
>  [<c0221ba5>] ppp_ioctl+0xce/0xbff
>  [<c029b78f>] _spin_unlock+0x14/0x1c
>  [<c014a22a>] do_wp_page+0x256/0x4ba
>  [<c014c1e9>] __handle_mm_fault+0x74e/0xa22
>  [<c0167224>] do_ioctl+0x64/0x6d
>  [<c016727d>] vfs_ioctl+0x50/0x273
>  [<c01674d4>] sys_ioctl+0x34/0x50
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
> BUG: soft lockup detected on CPU#0!
>  [<c013ce27>] softlockup_tick+0x8d/0xbc
>  [<c0123784>] update_process_times+0x28/0x5e
>  [<c010d9c4>] smp_apic_timer_interrupt+0x80/0x9c
>  [<c0103903>] apic_timer_interrupt+0x33/0x38
>  [<c01c2e9d>] delay_tsc+0x9/0x13
>  [<c01c2ed3>] __delay+0x6/0x7
>  [<c01c426f>] _raw_spin_lock+0xa9/0x134
>  [<c01d0f8f>] tty_write+0x83/0x1d0
>  [<c01cf418>] tty_ldisc_try+0x2f/0x33
>  [<c029bc8c>] lock_kernel+0x19/0x24
>  [<c01d1017>] tty_write+0x10b/0x1d0
>  [<c01d35cf>] write_chan+0x0/0x320
>  [<c015d502>] vfs_write+0x87/0xf0
>  [<c01d0f0c>] tty_write+0x0/0x1d0
>  [<c015daa9>] sys_write+0x41/0x6a
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
> 
> 
> Second)
> <0>BUG: spinlock lockup on CPU#0, pppd/5209, de3e2884
>  [<c01c42c6>] _raw_spin_lock+0x100/0x134
> BUG: spinlock lockup on CPU#1, ip-down/7524, c0353300
>  [<c01c42c6>] _raw_spin_lock+0x100/0x134
>  [<c029bc8c>] lock_kernel+0x19/0x24
>  [<c015f234>] chrdev_open+0x8a/0x16e
>  [<c015f1aa>] chrdev_open+0x0/0x16e
>  [<c015ba94>] __dentry_open+0xaf/0x1a0
>  [<c015bc12>] nameidata_to_filp+0x31/0x3a
>  [<c015bc54>] do_filp_open+0x39/0x40
>  [<c029b78f>] _spin_unlock+0x14/0x1c
>  [<c015b9d4>] get_unused_fd+0xaa/0xbb
>  [<c015bc95>] do_sys_open+0x3a/0x6d
>  [<c015bd03>] sys_open+0x1c/0x20
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
>  [<c0223f03>] ppp_async_ioctl+0xa7/0x1d0
>  [<c0221b7c>] ppp_ioctl+0xa5/0xbff
>  [<c012f2a5>] down_read+0x29/0x3a
>  [<c0223e5c>] ppp_async_ioctl+0x0/0x1d0
>  [<c0221ba5>] ppp_ioctl+0xce/0xbff
>  [<c029b78f>] _spin_unlock+0x14/0x1c
>  [<c014a22a>] do_wp_page+0x256/0x4ba
>  [<c014c1e9>] __handle_mm_fault+0x74e/0xa22
>  [<c0167224>] do_ioctl+0x64/0x6d
>  [<c016727d>] vfs_ioctl+0x50/0x273
>  [<c01674d4>] sys_ioctl+0x34/0x50
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
> 
> Third)
> BUG: soft lockup detected on CPU#0!
>  [<c013ce27>] softlockup_tick+0x8d/0xbc
>  [<c0123784>] update_process_times+0x28/0x5e
>  [<c010d9c4>] smp_apic_timer_interrupt+0x80/0x9c
>  [<c0103903>] apic_timer_interrupt+0x33/0x38
>  [<c01c2e9d>] delay_tsc+0x9/0x13
>  [<c01c2ed3>] __delay+0x6/0x7
>  [<c01c426f>] _raw_spin_lock+0xa9/0x134
>  [<c01cf418>] tty_ldisc_try+0x2f/0x33
>  [<c029bc8c>] lock_kernel+0x19/0x24
>  [<c01d1984>] tty_read+0x5a/0xbe
>  [<c015d5f0>] vfs_read+0x85/0xee
>  [<c01d192a>] tty_read+0x0/0xbe
>  [<c015da3f>] sys_read+0x41/0x6a
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
> BUG: soft lockup detected on CPU#0!
>  [<c013ce27>] softlockup_tick+0x8d/0xbc
>  [<c0123784>] update_process_times+0x28/0x5e
>  [<c010d9c4>] smp_apic_timer_interrupt+0x80/0x9c
>  [<c0103903>] apic_timer_interrupt+0x33/0x38
>  [<c01c00d8>] prio_tree_insert+0xe8/0x23b
>  [<c01c4275>] _raw_spin_lock+0xaf/0x134
>  [<c01cf418>] tty_ldisc_try+0x2f/0x33
>  [<c029bc8c>] lock_kernel+0x19/0x24
>  [<c01d1984>] tty_read+0x5a/0xbe
>  [<c015d5f0>] vfs_read+0x85/0xee
>  [<c01d192a>] tty_read+0x0/0xbe
>  [<c015da3f>] sys_read+0x41/0x6a
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
> 
> 
> Next via SysRq:
> 
> Showing all locks held in the system:
> 1 lock held by agetty/5057:
>  #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
> 1 lock held by agetty/5058:
>  #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
> 1 lock held by agetty/5059:
>  #0:  (&tty->atomic_read_lock){--..}, at: [<c01d4e6f>] read_chan+0x41a/0x60b
> 2 locks held by pptpctrl/5071:
>  #0:  (&tty->atomic_write_lock){--..}, at: [<c01d0f8f>] tty_write+0x83/0x1d0
>  #1:  (&ap->recv_lock){....}, at: [<c02241fe>] 
> ppp_asynctty_receive+0x2e/0x710
> 
> 
> ~#SysRq : Show Blocked State
> 
>                          free                        sibling
>   task             PC    stack   pid father child younger older
> pptpctrl      D C02A18E0     0  5071   4646  5074    5094  5064 (L-TLB)
>        df3a3bd0 00000082 0029b837 c02a18e0 00000246 00000000 dd4f131c 
> dd563cac
>        def86030 c140864c 00000000 00000000 00000009 def86030 2ccaa8e5 
> 0000017d
>        3b9aca00 def8613c c1407cc0 00000000 006a9953 00000000 de8f503c 
> 00000046
> Call Trace:
>  [<c029b5b0>] __down+0x8c/0xf4
>  [<c0116209>] default_wake_function+0x0/0xc
>  [<c029b397>] __down_failed+0x7/0xc
>  [<c0223de4>] ppp_asynctty_close+0x28/0x98
>  [<c0223e4f>] ppp_asynctty_close+0x93/0x98
>  [<c0223e59>] ppp_asynctty_hangup+0x5/0x8
>  [<c01d0c8a>] do_tty_hangup+0x119/0x391
>  [<c01d0612>] release_dev+0x154/0x69b
>  [<c0158a01>] poison_obj+0x20/0x3d
>  [<c0158a01>] poison_obj+0x20/0x3d
>  [<c0158c70>] cache_free_debugcheck+0xc7/0x1dd
>  [<c016ddd5>] destroy_inode+0x20/0x33
>  [<c016c88c>] dput+0xd9/0x145
>  [<c01d0b68>] tty_release+0xf/0x18
>  [<c015dd96>] __fput+0x3c/0xe7
>  [<c015b906>] filp_close+0x3e/0x62
>  [<c011c73b>] put_files_struct+0xa5/0xb8
>  [<c011d808>] do_exit+0x124/0x7e2
>  [<c01e1710>] vgacon_set_cursor_size+0x2d/0xeb
>  [<c0104392>] do_trap+0x0/0xa9
>  [<c0104bf5>] do_invalid_op+0x0/0xab
>  [<c0104c97>] do_invalid_op+0xa2/0xab
>  [<c024d1e4>] skb_under_panic+0x59/0x5d
>  [<c0113ecc>] __wake_up+0x32/0x43
>  [<c011af94>] release_console_sem+0x22d/0x239
>  [<c011b4f1>] vprintk+0x1fb/0x37c
>  [<c029bd4c>] error_code+0x7c/0x84
>  [<c024d1e4>] skb_under_panic+0x59/0x5d
>  [<c02248a2>] ppp_asynctty_receive+0x6d2/0x710
>  [<c02248b0>] ppp_asynctty_receive+0x6e0/0x710
>  [<c01d5a09>] pty_write+0x39/0x41
>  [<c01d37e1>] write_chan+0x212/0x320
>  [<c0116209>] default_wake_function+0x0/0xc
>  [<c01d1028>] tty_write+0x11c/0x1d0
>  [<c01d35cf>] write_chan+0x0/0x320
>  [<c015d502>] vfs_write+0x87/0xf0
>  [<c01d0f0c>] tty_write+0x0/0x1d0
>  [<c015daa9>] sys_write+0x41/0x6a
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
> 
> Dump registers
> Pid: 13874, comm:             pptpctrl
> EIP: 0060:[<c01c2e94>] CPU: 0
> EIP is at delay_tsc+0x0/0x13
>  EFLAGS: 00000246    Not tainted  (2.6.20-gentoo #10)
> EAX: 00000001 EBX: c0353300 ECX: f52a298e EDX: c799bc55
> ESI: 492d5b81 EDI: 00000000 EBP: 8eb4e7d4 DS: 007b ES: 007b GS: 00d8
> CR0: 8005003b CR2: b7e598b8 CR3: 0e6f3000 CR4: 000006d0
>  [<c01c2ed3>] __delay+0x6/0x7
>  [<c01c426f>] _raw_spin_lock+0xa9/0x134
>  [<c01cf418>] tty_ldisc_try+0x2f/0x33
>  [<c029bc8c>] lock_kernel+0x19/0x24
>  [<c01d1984>] tty_read+0x5a/0xbe
>  [<c015d5f0>] vfs_read+0x85/0xee
>  [<c01d192a>] tty_read+0x0/0xbe
>  [<c015da3f>] sys_read+0x41/0x6a
>  [<c0102dae>] sysenter_past_esp+0x5f/0x99
>  =======================
> 
> 
> Information about system:
>  sh scripts/ver_linux
> If some fields are empty or look unusual you may have an old version.
> Compare to the current minimal requirements in Documentation/Changes.
> 
> Linux vpn2 2.6.20-gentoo #10 SMP Mon Mar 5 11:07:33 MSK 2007 i686 Intel(R) 
> Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
> 
> Gnu C                  4.1.2
> Gnu make               3.81
> binutils               2.17
> util-linux             2.12r
> mount                  2.12r
> module-init-tools      3.2.2
> e2fsprogs              1.39
> reiserfsprogs          3.6.19
> PPP                    2.4.4
> Linux C Library        > libc.2.5
> Dynamic linker (ldd)   2.5
> Procps                 3.2.7
> Net-tools              1.60
> Kbd                    1.12
> Sh-utils               6.7
> udev                   104
> Modules Loaded         8021q ipt_TCPMSS xt_tcpudp xt_pkttype iptable_filter 
> ip_tables x_tables i2c_i801 i2c_core
> 

This could be an interaction between the tty-layer locking changes and ppp?

Comment 2 Jarek Poplawski 2007-03-08 00:41:42 UTC
On 06-03-2007 00:13, Andrew Morton wrote:
> On Mon, 5 Mar 2007 14:26:30 -0800
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=8132
>>
>>            Summary: pptp server lockup in ppp_asynctty_receive()
>>     Kernel Version:  2.6.20
>>             Status: NEW
>>           Severity: high
>>              Owner: jgarzik@pobox.com
>>          Submitter: jura@netams.com
>>                 CC: jura@netams.com
>>
>>
>> Already several kernel releases i've expirienced different lockups of  vpn 
>> (pptp) server.
>> There is more then 200  ppp connections sometimes.
>> With kernel debug i was able to retrive next information:

Could this be done with:
CONFIG_PROVE_LOCKING = y
CONFIG_4KSTACKS = n 

and, if possible, to send the first full lockdep error
message with some lines before it? It would be nice to
try to be more exact when hand writing, too.

Regards,
Jarek P.

Comment 3 Yuriy Shkandybin 2007-03-09 00:36:24 UTC
Created attachment 10660 [details]
config file of problem pptp server
Comment 4 Yuriy Shkandybin 2007-03-09 00:42:20 UTC
> Could this be done with:
> CONFIG_PROVE_LOCKING = y
> CONFIG_4KSTACKS = n
>
> and, if possible, to send the first full lockdep error
> message with some lines before it? It would be nice to
> try to be more exact when hand writing, too.
>
> Regards,
> Jarek P.
>

I already have
CONFIG_PROVE_LOCKING=y
# CONFIG_4KSTACKS is not set
.config is at
http://bugzilla.kernel.org/attachment.cgi?id=10660&action=view
Also all information i've provied was recieved by serial console and it's 
not hand writing.

I've checked logs and right before lockup there is oops in syslog
Mar  5 21:50:44 vpn2 skb_under_panic: text:c02248a2 len:207 put:1 
head:db96e22c data:db96e22b tail:db96e2fa end:db96e82c dev:<NULL>
Mar  5 21:50:44 vpn2 ------------[ cut here ]------------
Mar  5 21:50:44 vpn2 kernel BUG at net/core/skbuff.c:112!
Mar  5 21:50:44 vpn2 invalid opcode: 0000 [#1]
Mar  5 21:50:44 vpn2 SMP
Mar  5 21:50:44 vpn2 Modules linked in: 8021q ipt_TCPMSS xt_tcpudp 
xt_pkttype iptable_filter ip_tables x_tables i2c_i801 i2c_core
Mar  5 21:50:44 vpn2 CPU:    1
Mar  5 21:50:44 vpn2 EIP:    0060:[<c024d1e4>]    Not tainted VLI
Mar  5 21:50:44 vpn2 EFLAGS: 00010092   (2.6.20-gentoo #10)
Mar  5 21:50:44 vpn2 EIP is at skb_under_panic+0x59/0x5d
Mar  5 21:50:44 vpn2 eax: 00000072   ebx: db96e22c   ecx: 00000001   edx: 
de20d4d0
Mar  5 21:50:44 vpn2 esi: 00000000   edi: db96e2fc   ebp: dc5ab2e8   esp: 
dcaf5e84
Mar  5 21:50:44 vpn2 ds: 007b   es: 007b   ss: 0068
Mar  5 21:50:44 vpn2 Process pptpctrl (pid: 5203, ti=dcaf4000 task=de20d4d0 
task.ti=dcaf4000)
Mar  5 21:50:44 vpn2 Stack: c0315e34 c02248a2 000000cf 00000001 db96e22c 
db96e22b db96e2fa db96e82c
Mar  5 21:50:44 vpn2 c0301eef 00000004 00000000 c02248b0 de20d4d0 de20da04 
de20d4d0 00000000
Mar  5 21:50:44 vpn2 db96e22b 000000cf de3e28f0 de3e2884 00000000 ddae82ac 
de3e2854 00000296
Mar  5 21:50:44 vpn2 Call Trace:
Mar  5 21:50:44 vpn2 [<c02248a2>] ppp_asynctty_receive+0x6d2/0x710
Mar  5 21:50:44 vpn2 [<c02248b0>] ppp_asynctty_receive+0x6e0/0x710
Mar  5 21:50:44 vpn2 [<c01d5a09>] pty_write+0x39/0x41
Mar  5 21:50:44 vpn2 [<c01d37e1>] write_chan+0x212/0x320
Mar  5 21:50:44 vpn2 [<c0116209>] default_wake_function+0x0/0xc
Mar  5 21:50:44 vpn2 [<c01d1028>] tty_write+0x11c/0x1d0
Mar  5 21:50:44 vpn2 [<c01d35cf>] write_chan+0x0/0x320
Mar  5 21:50:44 vpn2 [<c015d502>] vfs_write+0x87/0xf0
Mar  5 21:50:44 vpn2 [<c01d0f0c>] tty_write+0x0/0x1d0
Mar  5 21:50:44 vpn2 [<c015daa9>] sys_write+0x41/0x6a
Mar  5 21:50:44 vpn2 [<c0102dae>] sysenter_past_esp+0x5f/0x99
Mar  5 21:50:44 vpn2 =======================
Mar  5 21:50:44 vpn2 Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 5c 24 10 
89 54 24 0c 8b 40 60 89 44 24 08 89 4c 24 04 c7 04 24 34 5e 31 c0 e8 8e e4 
ec ff <0
f> 0b eb fe 56 53 83 ec 24 8b 70 14 bb ef 1e 30 c0 85 f6 0f 45
Mar  5 21:50:44 vpn2 EIP: [<c024d1e4>] skb_under_panic+0x59/0x5d SS:ESP 
0068:dcaf5e84

Another thing - when i`ve changed HZ from 100 too 300 there is no such 
lockups for few days.

Jura

Comment 5 Jarek Poplawski 2007-03-09 02:17:43 UTC
On Fri, Mar 09, 2007 at 11:40:04AM +0300, Yuriy N. Shkandybin wrote:
...
> I already have
> CONFIG_PROVE_LOCKING=y
> # CONFIG_4KSTACKS is not set
> .config is at
> http://bugzilla.kernel.org/attachment.cgi?id=10660&action=view
> Also all information i've provied was recieved by serial console and it's 
> not hand writing.

Maybe you cut too much (lines before a bug could be meaningful
- if there are some secret data - try to mask only them).
And there is no real lockdep diagnose, so I wasn't sure.
Dump registers at the end looks incomplete (no ESP info), too.
So, this all made me think - it's cut because it's from the screen.
Very sorry!!!

> I've checked logs and right before lockup there is oops in syslog
> Mar  5 21:50:44 vpn2 skb_under_panic: text:c02248a2 len:207 put:1 

As a matter of fact the first bug is the most interesting
and everything after could be only some aftereffect.
This skb_under_panic message could change all diagnose:
it's not necessarily a locking problem.
On the other hand this could be even harder to diagnose
than locking...

> head:db96e22c data:db96e22b tail:db96e2fa end:db96e82c dev:<NULL>
> Mar  5 21:50:44 vpn2 ------------[ cut here ]------------

Why?! We don't even know if there was some config pending here.

> Mar  5 21:50:44 vpn2 kernel BUG at net/core/skbuff.c:112!
...
> Another thing - when i`ve changed HZ from 100 too 300 there is no such 
> lockups for few days.

You wrote it had been since several kernel releases, so it
shouldn't be the case, but anyway - it would be interesting
to check HZ 100 but with clean and more current kernel version
e.g. 2.6.21-rc3.

Thanks & regards,
Jarek P.

Comment 6 Jarek Poplawski 2007-03-09 05:35:04 UTC
On Fri, Mar 09, 2007 at 11:40:04AM +0300, Yuriy N. Shkandybin wrote:
...
> .config is at
> http://bugzilla.kernel.org/attachment.cgi?id=10660&action=view
> Also all information i've provied was recieved by serial console and it's 
> not hand writing.
> 
> I've checked logs and right before lockup there is oops in syslog
> Mar  5 21:50:44 vpn2 skb_under_panic: text:c02248a2 len:207 put:1 
> head:db96e22c data:db96e22b tail:db96e2fa end:db96e82c dev:<NULL>

This looks like a real problem with skb and maybe with
dev->hard_header_len. I see you are using vlan module,
so maybe there is some interaction? I don't know ppp
enough, so I CC this message to the ppp list.
I'm not sure HZ change will cure this forever (maybe
some packets are going to the wrong dev?).

If you're willing to experiment, you can try to edit
"include/linux/ppp_defs.h" and change it like this:

#define PPP_HDRLEN	8
#define PPP_MRU		1496

and "include/linux/if_ppp.h":

#define PPP_MTU		1496

plus mru/mtu in your pppd config (and recompile).

But I hope ppp people will propose something better.

Cheers,
Jarek P.

> Mar  5 21:50:44 vpn2 ------------[ cut here ]------------
> Mar  5 21:50:44 vpn2 kernel BUG at net/core/skbuff.c:112!
> Mar  5 21:50:44 vpn2 invalid opcode: 0000 [#1]
> Mar  5 21:50:44 vpn2 SMP
> Mar  5 21:50:44 vpn2 Modules linked in: 8021q ipt_TCPMSS xt_tcpudp 
> xt_pkttype iptable_filter ip_tables x_tables i2c_i801 i2c_core
> Mar  5 21:50:44 vpn2 CPU:    1
> Mar  5 21:50:44 vpn2 EIP:    0060:[<c024d1e4>]    Not tainted VLI
> Mar  5 21:50:44 vpn2 EFLAGS: 00010092   (2.6.20-gentoo #10)
> Mar  5 21:50:44 vpn2 EIP is at skb_under_panic+0x59/0x5d
> Mar  5 21:50:44 vpn2 eax: 00000072   ebx: db96e22c   ecx: 00000001   edx: 
> de20d4d0
> Mar  5 21:50:44 vpn2 esi: 00000000   edi: db96e2fc   ebp: dc5ab2e8   esp: 
> dcaf5e84
> Mar  5 21:50:44 vpn2 ds: 007b   es: 007b   ss: 0068
> Mar  5 21:50:44 vpn2 Process pptpctrl (pid: 5203, ti=dcaf4000 task=de20d4d0 
> task.ti=dcaf4000)
> Mar  5 21:50:44 vpn2 Stack: c0315e34 c02248a2 000000cf 00000001 db96e22c 
> db96e22b db96e2fa db96e82c
> Mar  5 21:50:44 vpn2 c0301eef 00000004 00000000 c02248b0 de20d4d0 de20da04 
> de20d4d0 00000000
> Mar  5 21:50:44 vpn2 db96e22b 000000cf de3e28f0 de3e2884 00000000 ddae82ac 
> de3e2854 00000296
> Mar  5 21:50:44 vpn2 Call Trace:
> Mar  5 21:50:44 vpn2 [<c02248a2>] ppp_asynctty_receive+0x6d2/0x710
> Mar  5 21:50:44 vpn2 [<c02248b0>] ppp_asynctty_receive+0x6e0/0x710
> Mar  5 21:50:44 vpn2 [<c01d5a09>] pty_write+0x39/0x41
> Mar  5 21:50:44 vpn2 [<c01d37e1>] write_chan+0x212/0x320
> Mar  5 21:50:44 vpn2 [<c0116209>] default_wake_function+0x0/0xc
> Mar  5 21:50:44 vpn2 [<c01d1028>] tty_write+0x11c/0x1d0
> Mar  5 21:50:44 vpn2 [<c01d35cf>] write_chan+0x0/0x320
> Mar  5 21:50:44 vpn2 [<c015d502>] vfs_write+0x87/0xf0
> Mar  5 21:50:44 vpn2 [<c01d0f0c>] tty_write+0x0/0x1d0
> Mar  5 21:50:44 vpn2 [<c015daa9>] sys_write+0x41/0x6a
> Mar  5 21:50:44 vpn2 [<c0102dae>] sysenter_past_esp+0x5f/0x99
> Mar  5 21:50:44 vpn2 =======================
> Mar  5 21:50:44 vpn2 Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 5c 24 10 
> 89 54 24 0c 8b 40 60 89 44 24 08 89 4c 24 04 c7 04 24 34 5e 31 c0 e8 8e e4 
> ec ff <0
> f> 0b eb fe 56 53 83 ec 24 8b 70 14 bb ef 1e 30 c0 85 f6 0f 45
> Mar  5 21:50:44 vpn2 EIP: [<c024d1e4>] skb_under_panic+0x59/0x5d SS:ESP 
> 0068:dcaf5e84
> 
> Another thing - when i`ve changed HZ from 100 too 300 there is no such 
> lockups for few days.
> 
> Jura
> 

Comment 7 Yuriy Shkandybin 2007-03-19 00:49:39 UTC
I've changed kernel to rc4 and completely changed hardware.
Now this is

I've got new trace, but this is another problem as i can see and connected 
with pppoe

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc4 #1
-------------------------------------------------------
pppd/8926 is trying to acquire lock:
 (&vlan_netdev_xmit_lock_key){-...}, at: [<c0265486>] 
dev_queue_xmit+0x247/0x2f1

but task is already holding lock:
 (&pch->downl){-+..}, at: [<c0230c72>] ppp_channel_push+0x19/0x9a

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&pch->downl){-+..}:
       [<c013642b>] __lock_acquire+0xe62/0x1010
       [<c0136642>] lock_acquire+0x69/0x83
       [<c02afc13>] _spin_lock_bh+0x30/0x3d
       [<c022f715>] ppp_push+0x5a/0x9a
       [<c022fb40>] ppp_xmit_process+0x2e/0x511
       [<c0231a05>] ppp_write+0xb8/0xf2
       [<c015ec26>] vfs_write+0x7f/0xba
       [<c015f158>] sys_write+0x3d/0x64
       [<c01027de>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #2 (&ppp->wlock){-+..}:
       [<c013642b>] __lock_acquire+0xe62/0x1010
       [<c0136642>] lock_acquire+0x69/0x83
       [<c02afc13>] _spin_lock_bh+0x30/0x3d
       [<c022fb2b>] ppp_xmit_process+0x19/0x511
       [<c02318d3>] ppp_start_xmit+0x18a/0x204
       [<c0263a6f>] dev_hard_start_xmit+0x1f6/0x2c4
       [<c026ded3>] __qdisc_run+0x81/0x1bc
       [<c026549e>] dev_queue_xmit+0x25f/0x2f1
       [<c027c75f>] ip_output+0x1be/0x25f
       [<c02788ce>] ip_forward+0x159/0x22b
       [<c027745c>] ip_rcv+0x297/0x4dd
       [<c0263698>] netif_receive_skb+0x164/0x1f2
       [<c022199d>] e1000_clean_rx_irq+0x12a/0x4b7
       [<c02209bc>] e1000_clean+0x3ff/0x5dd
       [<c0265084>] net_rx_action+0x7d/0x12b
       [<c011e442>] __do_softirq+0x82/0xf2
       [<c011e509>] do_softirq+0x57/0x59
       [<c011e877>] irq_exit+0x7f/0x81
       [<c0105011>] do_IRQ+0x45/0x84
       [<c0103252>] common_interrupt+0x2e/0x34
       [<c0100b66>] mwait_idle+0x12/0x14
       [<c0100c60>] cpu_idle+0x6c/0x86
       [<c01001cd>] rest_init+0x23/0x36
       [<c0377d89>] start_kernel+0x3ca/0x461
       [<00000000>] 0x0
       [<ffffffff>] 0xffffffff

-> #1 (&dev->_xmit_lock){-+..}:
       [<c013642b>] __lock_acquire+0xe62/0x1010
       [<c0136642>] lock_acquire+0x69/0x83
       [<c02afc13>] _spin_lock_bh+0x30/0x3d
       [<c0266861>] dev_mc_add+0x34/0x16a
       [<c02ab5c7>] vlan_dev_set_multicast_list+0x88/0x25c
       [<c0266592>] __dev_mc_upload+0x22/0x24
       [<c0266914>] dev_mc_add+0xe7/0x16a
       [<c029f323>] igmp_group_added+0xe6/0xeb
       [<c029f50b>] ip_mc_inc_group+0x13f/0x210
       [<c029f5fa>] ip_mc_up+0x1e/0x61
       [<c029ab81>] inetdev_event+0x154/0x2c7
       [<c0125a46>] notifier_call_chain+0x2c/0x39
       [<c0125a7c>] raw_notifier_call_chain+0x8/0xa
       [<c026477a>] dev_open+0x6d/0x71
       [<c0263028>] dev_change_flags+0x51/0x101
       [<c029b7ca>] devinet_ioctl+0x4df/0x644
       [<c029bc03>] inet_ioctl+0x5c/0x6f
       [<c02596e0>] sock_ioctl+0x4f/0x1e8
       [<c0168c32>] do_ioctl+0x22/0x71
       [<c0168cd6>] vfs_ioctl+0x55/0x27e
       [<c0168f32>] sys_ioctl+0x33/0x51
       [<c01027de>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

-> #0 (&vlan_netdev_xmit_lock_key){-...}:
       [<c0136289>] __lock_acquire+0xcc0/0x1010
       [<c0136642>] lock_acquire+0x69/0x83
       [<c02afbd6>] _spin_lock+0x2b/0x38
       [<c0265486>] dev_queue_xmit+0x247/0x2f1
       [<c02334f6>] __pppoe_xmit+0x1a9/0x215
       [<c023356c>] pppoe_xmit+0xa/0xc
       [<c0230c9a>] ppp_channel_push+0x41/0x9a
       [<c0231a13>] ppp_write+0xc6/0xf2
       [<c015ec26>] vfs_write+0x7f/0xba
       [<c015f158>] sys_write+0x3d/0x64
       [<c01027de>] sysenter_past_esp+0x5f/0x99
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by pppd/8926:
 #0:  (&pch->downl){-+..}, at: [<c0230c72>] ppp_channel_push+0x19/0x9a

stack backtrace:
 [<c0103834>] show_trace_log_lvl+0x1a/0x30
 [<c0103f16>] show_trace+0x12/0x14
 [<c0103f9d>] dump_stack+0x16/0x18
 [<c01343cd>] print_circular_bug_tail+0x68/0x71
 [<c0136289>] __lock_acquire+0xcc0/0x1010
 [<c0136642>] lock_acquire+0x69/0x83
 [<c02afbd6>] _spin_lock+0x2b/0x38
 [<c0265486>] dev_queue_xmit+0x247/0x2f1
 [<c02334f6>] __pppoe_xmit+0x1a9/0x215
 [<c023356c>] pppoe_xmit+0xa/0xc
 [<c0230c9a>] ppp_channel_push+0x41/0x9a
 [<c0231a13>] ppp_write+0xc6/0xf2
 [<c015ec26>] vfs_write+0x7f/0xba
 [<c015f158>] sys_write+0x3d/0x64
 [<c01027de>] sysenter_past_esp+0x5f/0x99
 =======================
Clocksource tsc unstable (delta = 4686844667 ns)
Time: acpi_pm clocksource has been installed.

> On Fri, Mar 09, 2007 at 11:40:04AM +0300, Yuriy N. Shkandybin wrote:
> ...
>> .config is at
>> http://bugzilla.kernel.org/attachment.cgi?id=10660&action=view
>> Also all information i've provied was recieved by serial console and it's
>> not hand writing.
>>
>> I've checked logs and right before lockup there is oops in syslog
>> Mar  5 21:50:44 vpn2 skb_under_panic: text:c02248a2 len:207 put:1
>> head:db96e22c data:db96e22b tail:db96e2fa end:db96e82c dev:<NULL>
>
> This looks like a real problem with skb and maybe with
> dev->hard_header_len. I see you are using vlan module,
> so maybe there is some interaction? I don't know ppp
> enough, so I CC this message to the ppp list.
> I'm not sure HZ change will cure this forever (maybe
> some packets are going to the wrong dev?).
>
> If you're willing to experiment, you can try to edit
> "include/linux/ppp_defs.h" and change it like this:
>
> #define PPP_HDRLEN 8
> #define PPP_MRU 1496
>
> and "include/linux/if_ppp.h":
>
> #define PPP_MTU 1496
>
> plus mru/mtu in your pppd config (and recompile).
>
> But I hope ppp people will propose something better.
>
> Cheers,
> Jarek P.
>
>> Mar  5 21:50:44 vpn2 ------------[ cut here ]------------
>> Mar  5 21:50:44 vpn2 kernel BUG at net/core/skbuff.c:112!
>> Mar  5 21:50:44 vpn2 invalid opcode: 0000 [#1]
>> Mar  5 21:50:44 vpn2 SMP
>> Mar  5 21:50:44 vpn2 Modules linked in: 8021q ipt_TCPMSS xt_tcpudp
>> xt_pkttype iptable_filter ip_tables x_tables i2c_i801 i2c_core
>> Mar  5 21:50:44 vpn2 CPU:    1
>> Mar  5 21:50:44 vpn2 EIP:    0060:[<c024d1e4>]    Not tainted VLI
>> Mar  5 21:50:44 vpn2 EFLAGS: 00010092   (2.6.20-gentoo #10)
>> Mar  5 21:50:44 vpn2 EIP is at skb_under_panic+0x59/0x5d
>> Mar  5 21:50:44 vpn2 eax: 00000072   ebx: db96e22c   ecx: 00000001   edx:
>> de20d4d0
>> Mar  5 21:50:44 vpn2 esi: 00000000   edi: db96e2fc   ebp: dc5ab2e8   esp:
>> dcaf5e84
>> Mar  5 21:50:44 vpn2 ds: 007b   es: 007b   ss: 0068
>> Mar  5 21:50:44 vpn2 Process pptpctrl (pid: 5203, ti=dcaf4000 
>> task=de20d4d0
>> task.ti=dcaf4000)
>> Mar  5 21:50:44 vpn2 Stack: c0315e34 c02248a2 000000cf 00000001 db96e22c
>> db96e22b db96e2fa db96e82c
>> Mar  5 21:50:44 vpn2 c0301eef 00000004 00000000 c02248b0 de20d4d0 
>> de20da04
>> de20d4d0 00000000
>> Mar  5 21:50:44 vpn2 db96e22b 000000cf de3e28f0 de3e2884 00000000 
>> ddae82ac
>> de3e2854 00000296
>> Mar  5 21:50:44 vpn2 Call Trace:
>> Mar  5 21:50:44 vpn2 [<c02248a2>] ppp_asynctty_receive+0x6d2/0x710
>> Mar  5 21:50:44 vpn2 [<c02248b0>] ppp_asynctty_receive+0x6e0/0x710
>> Mar  5 21:50:44 vpn2 [<c01d5a09>] pty_write+0x39/0x41
>> Mar  5 21:50:44 vpn2 [<c01d37e1>] write_chan+0x212/0x320
>> Mar  5 21:50:44 vpn2 [<c0116209>] default_wake_function+0x0/0xc
>> Mar  5 21:50:44 vpn2 [<c01d1028>] tty_write+0x11c/0x1d0
>> Mar  5 21:50:44 vpn2 [<c01d35cf>] write_chan+0x0/0x320
>> Mar  5 21:50:44 vpn2 [<c015d502>] vfs_write+0x87/0xf0
>> Mar  5 21:50:44 vpn2 [<c01d0f0c>] tty_write+0x0/0x1d0
>> Mar  5 21:50:44 vpn2 [<c015daa9>] sys_write+0x41/0x6a
>> Mar  5 21:50:44 vpn2 [<c0102dae>] sysenter_past_esp+0x5f/0x99
>> Mar  5 21:50:44 vpn2 =======================
>> Mar  5 21:50:44 vpn2 Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 5c 24 
>> 10
>> 89 54 24 0c 8b 40 60 89 44 24 08 89 4c 24 04 c7 04 24 34 5e 31 c0 e8 8e 
>> e4
>> ec ff <0
>> f> 0b eb fe 56 53 83 ec 24 8b 70 14 bb ef 1e 30 c0 85 f6 0f 45
>> Mar  5 21:50:44 vpn2 EIP: [<c024d1e4>] skb_under_panic+0x59/0x5d SS:ESP
>> 0068:dcaf5e84
>>
>> Another thing - when i`ve changed HZ from 100 too 300 there is no such
>> lockups for few days.
>>
>> Jura
>>
> 

Comment 8 Jarek Poplawski 2007-03-28 01:15:18 UTC
On Mon, Mar 19, 2007 at 10:49:12AM +0300, Yuriy N. Shkandybin wrote:
> I've changed kernel to rc4 and completely changed hardware.
> Now this is
> 
> I've got new trace, but this is another problem as i can see and connected 
> with pppoe
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.21-rc4 #1
> -------------------------------------------------------
> pppd/8926 is trying to acquire lock:
> (&vlan_netdev_xmit_lock_key){-...}, at: [<c0265486>] 
> dev_queue_xmit+0x247/0x2f1
> 
> but task is already holding lock:
> (&pch->downl){-+..}, at: [<c0230c72>] ppp_channel_push+0x19/0x9a
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (&pch->downl){-+..}:
>       [<c013642b>] __lock_acquire+0xe62/0x1010
>       [<c0136642>] lock_acquire+0x69/0x83
>       [<c02afc13>] _spin_lock_bh+0x30/0x3d
>       [<c022f715>] ppp_push+0x5a/0x9a
>       [<c022fb40>] ppp_xmit_process+0x2e/0x511
>       [<c0231a05>] ppp_write+0xb8/0xf2
>       [<c015ec26>] vfs_write+0x7f/0xba
>       [<c015f158>] sys_write+0x3d/0x64
>       [<c01027de>] sysenter_past_esp+0x5f/0x99
>       [<ffffffff>] 0xffffffff
> 
> -> #2 (&ppp->wlock){-+..}:
>       [<c013642b>] __lock_acquire+0xe62/0x1010
>       [<c0136642>] lock_acquire+0x69/0x83
>       [<c02afc13>] _spin_lock_bh+0x30/0x3d
>       [<c022fb2b>] ppp_xmit_process+0x19/0x511
>       [<c02318d3>] ppp_start_xmit+0x18a/0x204
>       [<c0263a6f>] dev_hard_start_xmit+0x1f6/0x2c4
>       [<c026ded3>] __qdisc_run+0x81/0x1bc
>       [<c026549e>] dev_queue_xmit+0x25f/0x2f1
>       [<c027c75f>] ip_output+0x1be/0x25f
>       [<c02788ce>] ip_forward+0x159/0x22b
>       [<c027745c>] ip_rcv+0x297/0x4dd
>       [<c0263698>] netif_receive_skb+0x164/0x1f2
>       [<c022199d>] e1000_clean_rx_irq+0x12a/0x4b7
>       [<c02209bc>] e1000_clean+0x3ff/0x5dd
>       [<c0265084>] net_rx_action+0x7d/0x12b
>       [<c011e442>] __do_softirq+0x82/0xf2
>       [<c011e509>] do_softirq+0x57/0x59
>       [<c011e877>] irq_exit+0x7f/0x81
>       [<c0105011>] do_IRQ+0x45/0x84
>       [<c0103252>] common_interrupt+0x2e/0x34
>       [<c0100b66>] mwait_idle+0x12/0x14
>       [<c0100c60>] cpu_idle+0x6c/0x86
>       [<c01001cd>] rest_init+0x23/0x36
>       [<c0377d89>] start_kernel+0x3ca/0x461
>       [<00000000>] 0x0
>       [<ffffffff>] 0xffffffff
> 
> -> #1 (&dev->_xmit_lock){-+..}:
>       [<c013642b>] __lock_acquire+0xe62/0x1010
>       [<c0136642>] lock_acquire+0x69/0x83
>       [<c02afc13>] _spin_lock_bh+0x30/0x3d
>       [<c0266861>] dev_mc_add+0x34/0x16a
>       [<c02ab5c7>] vlan_dev_set_multicast_list+0x88/0x25c
>       [<c0266592>] __dev_mc_upload+0x22/0x24
>       [<c0266914>] dev_mc_add+0xe7/0x16a
>       [<c029f323>] igmp_group_added+0xe6/0xeb
>       [<c029f50b>] ip_mc_inc_group+0x13f/0x210
>       [<c029f5fa>] ip_mc_up+0x1e/0x61
>       [<c029ab81>] inetdev_event+0x154/0x2c7
>       [<c0125a46>] notifier_call_chain+0x2c/0x39
>       [<c0125a7c>] raw_notifier_call_chain+0x8/0xa
>       [<c026477a>] dev_open+0x6d/0x71
>       [<c0263028>] dev_change_flags+0x51/0x101
>       [<c029b7ca>] devinet_ioctl+0x4df/0x644
>       [<c029bc03>] inet_ioctl+0x5c/0x6f
>       [<c02596e0>] sock_ioctl+0x4f/0x1e8
>       [<c0168c32>] do_ioctl+0x22/0x71
>       [<c0168cd6>] vfs_ioctl+0x55/0x27e
>       [<c0168f32>] sys_ioctl+0x33/0x51
>       [<c01027de>] sysenter_past_esp+0x5f/0x99
>       [<ffffffff>] 0xffffffff
> 
> -> #0 (&vlan_netdev_xmit_lock_key){-...}:
>       [<c0136289>] __lock_acquire+0xcc0/0x1010
>       [<c0136642>] lock_acquire+0x69/0x83
>       [<c02afbd6>] _spin_lock+0x2b/0x38
>       [<c0265486>] dev_queue_xmit+0x247/0x2f1
>       [<c02334f6>] __pppoe_xmit+0x1a9/0x215
>       [<c023356c>] pppoe_xmit+0xa/0xc
>       [<c0230c9a>] ppp_channel_push+0x41/0x9a
>       [<c0231a13>] ppp_write+0xc6/0xf2
>       [<c015ec26>] vfs_write+0x7f/0xba
>       [<c015f158>] sys_write+0x3d/0x64
>       [<c01027de>] sysenter_past_esp+0x5f/0x99
>       [<ffffffff>] 0xffffffff
> 
> other info that might help us debug this:
> 
> 1 lock held by pppd/8926:
> #0:  (&pch->downl){-+..}, at: [<c0230c72>] ppp_channel_push+0x19/0x9a
> 
> stack backtrace:
> [<c0103834>] show_trace_log_lvl+0x1a/0x30
> [<c0103f16>] show_trace+0x12/0x14
> [<c0103f9d>] dump_stack+0x16/0x18
> [<c01343cd>] print_circular_bug_tail+0x68/0x71
> [<c0136289>] __lock_acquire+0xcc0/0x1010
> [<c0136642>] lock_acquire+0x69/0x83
> [<c02afbd6>] _spin_lock+0x2b/0x38
> [<c0265486>] dev_queue_xmit+0x247/0x2f1
> [<c02334f6>] __pppoe_xmit+0x1a9/0x215
> [<c023356c>] pppoe_xmit+0xa/0xc
> [<c0230c9a>] ppp_channel_push+0x41/0x9a
> [<c0231a13>] ppp_write+0xc6/0xf2
> [<c015ec26>] vfs_write+0x7f/0xba
> [<c015f158>] sys_write+0x3d/0x64
> [<c01027de>] sysenter_past_esp+0x5f/0x99
> =======================
> Clocksource tsc unstable (delta = 4686844667 ns)
> Time: acpi_pm clocksource has been installed.
...

lockdep has seen locks "-> #0" - "-> #3" taken in circular
order, but IMHO, lock "-> #3" (&pch->downl) taken after
"-> #2" (&ppp->wlock) differs from &pch->downl lock taken in
"-> #0" (before &vlan_netdev_xmit_lock_key) and lockdep
should be notified about this.

This patch proposal needs confirmation by some PPP expert
that channels processed in ppp_channel_push() differ from
channels processed in ppp_xmit_process(), so the same channel
cannot be processed by turns with both these functions.

I also hope Yuriy could test this patch (this should apply to
2.6.21-rc4 or later, CONFIG_PROVE_LOCKING = y), thanks.

Reported & tested by: "Yuriy N. Shkandybin" <jura@netams.com>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

(don't apply without above mentioned acks)
---

diff -Nurp 2.6.21-rc5-git1-/drivers/net/ppp_generic.c 2.6.21-rc5-git1/drivers/net/ppp_generic.c
--- 2.6.21-rc5-git1-/drivers/net/ppp_generic.c	2007-03-27 20:24:30.000000000 +0200
+++ 2.6.21-rc5-git1/drivers/net/ppp_generic.c	2007-03-27 20:34:44.000000000 +0200
@@ -1435,7 +1435,8 @@ ppp_channel_push(struct channel *pch)
 	struct sk_buff *skb;
 	struct ppp *ppp;
 
-	spin_lock_bh(&pch->downl);
+	local_bh_disable();
+	spin_lock_nested(&pch->downl, SINGLE_DEPTH_NESTING);
 	if (pch->chan != 0) {
 		while (!skb_queue_empty(&pch->file.xq)) {
 			skb = skb_dequeue(&pch->file.xq);
@@ -1449,7 +1450,8 @@ ppp_channel_push(struct channel *pch)
 		/* channel got deregistered */
 		skb_queue_purge(&pch->file.xq);
 	}
-	spin_unlock_bh(&pch->downl);
+	spin_unlock(&pch->downl);
+	local_bh_enable();
 	/* see if there is anything from the attached unit to be sent */
 	if (skb_queue_empty(&pch->file.xq)) {
 		read_lock_bh(&pch->upl);

Comment 9 Natalie Protasevich 2007-07-07 17:49:05 UTC
Yuri,
Did you get chance to test the patch?
Thanks.
Comment 10 David Juran 2007-07-08 23:55:33 UTC
Subject: Re:  pptp server lockup in ppp_asynctty_receive()

Yes, it works.
I've confirmed it to Jarek Poplawski.
Probably
"Reported & tested by: "Yuriy N. Shkandybin" <jura@netams.com>" means that.

Jura

----- Original Message ----- 
From: <bugme-daemon@bugzilla.kernel.org>
To: <jura@netams.com>
Sent: Sunday, July 08, 2007 4:44 AM
Subject: [Bug 8132] pptp server lockup in ppp_asynctty_receive()


> http://bugzilla.kernel.org/show_bug.cgi?id=8132
>
>
> protasnb@gmail.com changed:
>
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |protasnb@gmail.com
>      KernelVersion| 2.6.20                     |2.6.20
>
>
>
>
> ------- Comment #9 from protasnb@gmail.com  2007-07-07 17:49 -------
> Yuri,
> Did you get chance to test the patch?
> Thanks.
>
>
> -- 
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.
> You reported the bug, or are watching the reporter.
> 
Comment 11 Natalie Protasevich 2007-07-20 22:07:41 UTC
OK thanks, the bug can be closed now.

Note You need to log in before you can comment on or make changes to this bug.