Bug 43110 - tasks blocked for more than 120sec with heavy NFS traffic
Summary: tasks blocked for more than 120sec with heavy NFS traffic
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-16 19:36 UTC by warpme
Modified: 2012-10-30 19:21 UTC (History)
4 users (show)

See Also:
Kernel Version: 3.0.27
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description warpme 2012-04-16 19:36:26 UTC
Hi,

Forgive me if this report is wrongly classified - but looking on stack traces I can't judge better where root cause is. 
I have server (Archlinux based) running as MythTV backend and also as NFS server. 
Generally server seem to be rock solid stable (24/7 uptime measured in weeks; interrupted by maintenance or software upgrades). 
Recently I was copying 35G file from MAC desktop to server one NFS. After 5-10min since copy start, all server user task becomes blocked (telnet, HTTPserver, myth backend, etc). 
In this state NFS file copy was ongoing OK. When NFS transfer ends - all process resumed. Kernel log reported as below. 
I'm not sure is it kernel issue - but it was definitely correlated with NFS transfer (as soon as NFS copy finish - all server tasks resume).

Apr 15 21:04:50 [461880.890543] INFO: task kjournald:9163 blocked for more than 120 seconds.
Apr 15 21:04:50 [461880.890582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:04:50 [461880.890620] kjournald       D 00000001083ffddf     0  9163      2 0x00000000
Apr 15 21:04:50 [461880.890624]  ffff8800b6577c10 0000000000000046 00000001083ffddf ffffffff810ffd9a
Apr 15 21:04:50 [461880.890628]  ffff8801195a9590 ffff8800b6577fd8 ffff8800b6577fd8 ffff8800b6577fd8
Apr 15 21:04:50 [461880.890631]  ffff8800105323f0 ffff8801195a9590 ffff88011fffb700 0000000100000000
Apr 15 21:04:50 [461880.890634] Call Trace:
Apr 15 21:04:50 [461880.890640]  [<ffffffff810ffd9a>] ? free_pcppages_bulk+0x42a/0x470
Apr 15 21:04:50 [461880.890645]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:04:50 [461880.890648]  [<ffffffff811860e0>] ? __wait_on_buffer+0x30/0x30
Apr 15 21:04:50 [461880.890651]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:04:50 [461880.890653]  [<ffffffff813f408f>] io_schedule+0x8f/0xd0
Apr 15 21:04:50 [461880.890656]  [<ffffffff811860ee>] sleep_on_buffer+0xe/0x20
Apr 15 21:04:50 [461880.890658]  [<ffffffff813f483f>] __wait_on_bit+0x5f/0x90
Apr 15 21:04:50 [461880.890660]  [<ffffffff811860e0>] ? __wait_on_buffer+0x30/0x30
Apr 15 21:04:50 [461880.890663]  [<ffffffff813f48ec>] out_of_line_wait_on_bit+0x7c/0x90
Apr 15 21:04:50 [461880.890666]  [<ffffffff8107f840>] ? autoremove_wake_function+0x40/0x40
Apr 15 21:04:50 [461880.890669]  [<ffffffff811860de>] __wait_on_buffer+0x2e/0x30
Apr 15 21:04:50 [461880.890687]  [<ffffffffa00a5b2c>] journal_commit_transaction+0xcac/0x13c0 [jbd]
Apr 15 21:04:50 [461880.890690]  [<ffffffff8106cd66>] ? try_to_del_timer_sync+0x96/0x160
Apr 15 21:04:50 [461880.890696]  [<ffffffffa00a8a76>] kjournald+0xe6/0x250 [jbd]
Apr 15 21:04:50 [461880.890699]  [<ffffffff8107f800>] ? abort_exclusive_wait+0xb0/0xb0
Apr 15 21:04:50 [461880.890703]  [<ffffffffa00a8990>] ? commit_timeout+0x10/0x10 [jbd]
Apr 15 21:04:50 [461880.890706]  [<ffffffff8107eeac>] kthread+0x8c/0xa0
Apr 15 21:04:50 [461880.890709]  [<ffffffff813f8864>] kernel_thread_helper+0x4/0x10
Apr 15 21:04:50 [461880.890711]  [<ffffffff8107ee20>] ? kthread_worker_fn+0x190/0x190
Apr 15 21:04:50 [461880.890714]  [<ffffffff813f8860>] ? gs_change+0x13/0x13
Apr 15 21:04:50 [461880.890718] INFO: task mythbackend:15037 blocked for more than 120 seconds.
Apr 15 21:04:50 [461880.890766] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:04:50 [461880.890803] mythbackend     D 00000001083ff1a3     0 15037      1 0x00000000
Apr 15 21:04:50 [461880.890807]  ffff88002045de78 0000000000000086 ffff88002045deb0 ffff88002045de48
Apr 15 21:04:50 [461880.890810]  ffff8801090e5d70 ffff88002045dfd8 ffff88002045dfd8 ffff88002045dfd8
Apr 15 21:04:50 [461880.890813]  ffffffff8169b020 ffff8801090e5d70 ffff88001b7b3f08 ffff88001b7b3f10
Apr 15 21:04:50 [461880.890815] Call Trace:
Apr 15 21:04:50 [461880.890819]  [<ffffffff81103ac5>] ? pagevec_lookup_tag+0x25/0x40
Apr 15 21:04:50 [461880.890822]  [<ffffffff810f7ecf>] ? filemap_fdatawait_range+0x9f/0x1a0
Apr 15 21:04:50 [461880.890824]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:04:50 [461880.890827]  [<ffffffff813f5629>] __mutex_lock_slowpath+0x139/0x330
Apr 15 21:04:50 [461880.890830]  [<ffffffff813f5836>] mutex_lock+0x16/0x30
Apr 15 21:04:50 [461880.890832]  [<ffffffff81183fb2>] vfs_fsync_range+0x52/0xa0
Apr 15 21:04:50 [461880.890834]  [<ffffffff8118406c>] vfs_fsync+0x1c/0x20
Apr 15 21:04:50 [461880.890837]  [<ffffffff811843d6>] sys_fdatasync+0x36/0x50
Apr 15 21:04:50 [461880.890839]  [<ffffffff813f7742>] system_call_fastpath+0x16/0x1b
Apr 15 21:04:50 [461880.890843] INFO: task mythbackend:17562 blocked for more than 120 seconds.
Apr 15 21:04:50 [461880.890876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:04:50 [461880.890913] mythbackend     D 00000001083ff1a0     0 17562      1 0x00000000
Apr 15 21:04:50 [461880.890916]  ffff880016999e78 0000000000000086 ffff880016999eb0 ffff880016999e48
Apr 15 21:04:50 [461880.890919]  ffff88010d1df300 ffff880016999fd8 ffff880016999fd8 ffff880016999fd8
Apr 15 21:04:50 [461880.890921]  ffffffff8169b020 ffff88010d1df300 ffff880000000001 ffff88006172e0d0
Apr 15 21:04:50 [461880.890924] Call Trace:
Apr 15 21:04:50 [461880.890927]  [<ffffffff81103ac5>] ? pagevec_lookup_tag+0x25/0x40
Apr 15 21:04:50 [461880.890929]  [<ffffffff810f7ecf>] ? filemap_fdatawait_range+0x9f/0x1a0
Apr 15 21:04:50 [461880.890932]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:04:50 [461880.890934]  [<ffffffff813f5629>] __mutex_lock_slowpath+0x139/0x330
Apr 15 21:04:50 [461880.890937]  [<ffffffff813f5836>] mutex_lock+0x16/0x30
Apr 15 21:04:50 [461880.890939]  [<ffffffff81183fb2>] vfs_fsync_range+0x52/0xa0
Apr 15 21:04:50 [461880.890941]  [<ffffffff8118406c>] vfs_fsync+0x1c/0x20
Apr 15 21:04:50 [461880.890943]  [<ffffffff811843d6>] sys_fdatasync+0x36/0x50
Apr 15 21:04:50 [461880.890946]  [<ffffffff813f7742>] system_call_fastpath+0x16/0x1b
Apr 15 21:04:50 [461880.890949] INFO: task mythcommflag:17595 blocked for more than 120 seconds.
Apr 15 21:04:50 [461880.890983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:04:50 [461880.891019] mythcommflag    D 0000000000000002     0 17595  17594 0x00000000
Apr 15 21:04:50 [461880.891022]  ffff880012883738 0000000000000086 ffff8800264afbe8 000000003ded1f97
Apr 15 21:04:50 [461880.891025]  ffff88010d1ddd70 ffff880012883fd8 ffff880012883fd8 ffff880012883fd8
Apr 15 21:04:50 [461880.891028]  ffff8800681e5640 ffff88010d1ddd70 ffff880012883fd8 ffff88011fc57c08
Apr 15 21:04:50 [461880.891030] Call Trace:
Apr 15 21:04:50 [461880.891033]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:04:50 [461880.891036]  [<ffffffff81089edd>] ? ktime_get_ts+0xad/0xe0
Apr 15 21:04:50 [461880.891038]  [<ffffffff810f7bd0>] ? __lock_page+0x70/0x70
Apr 15 21:04:50 [461880.891040]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:04:50 [461880.891043]  [<ffffffff813f408f>] io_schedule+0x8f/0xd0
Apr 15 21:04:50 [461880.891045]  [<ffffffff810f7bde>] sleep_on_page+0xe/0x20
Apr 15 21:04:50 [461880.891047]  [<ffffffff813f483f>] __wait_on_bit+0x5f/0x90
Apr 15 21:04:50 [461880.891050]  [<ffffffff810f7dc8>] wait_on_page_bit+0x78/0x80
Apr 15 21:04:50 [461880.891052]  [<ffffffff8107f840>] ? autoremove_wake_function+0x40/0x40
Apr 15 21:04:50 [461880.891055]  [<ffffffff8110a1a5>] shrink_page_list+0x405/0x910
Apr 15 21:04:50 [461880.891058]  [<ffffffff811050d6>] ? __pagevec_release+0x26/0x40
Apr 15 21:04:50 [461880.891061]  [<ffffffff813f689f>] ? _raw_spin_unlock_irq+0xf/0x40
Apr 15 21:04:50 [461880.891063]  [<ffffffff8110ac71>] shrink_inactive_list+0x2e1/0x470
Apr 15 21:04:50 [461880.891065]  [<ffffffff8121dca7>] ? kobject_put+0x27/0x60
Apr 15 21:04:50 [461880.891068]  [<ffffffff812e7af7>] ? put_device+0x17/0x20
Apr 15 21:04:50 [461880.891070]  [<ffffffff8110b60c>] shrink_zone+0x4bc/0x660
Apr 15 21:04:50 [461880.891073]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:04:50 [461880.891075]  [<ffffffff8110bade>] do_try_to_free_pages+0xae/0x400
Apr 15 21:04:50 [461880.891078]  [<ffffffff8110c1a4>] try_to_free_pages+0xb4/0x1e0
Apr 15 21:04:50 [461880.891080]  [<ffffffff811005fc>] __alloc_pages_nodemask+0x52c/0x850
Apr 15 21:04:50 [461880.891084]  [<ffffffff81127f8d>] ? page_add_new_anon_rmap+0x9d/0xb0
Apr 15 21:04:50 [461880.891087]  [<ffffffff81139f7a>] alloc_pages_vma+0x9a/0x180
Apr 15 21:04:50 [461880.891090]  [<ffffffff8114b2d2>] do_huge_pmd_anonymous_page+0x152/0x380
Apr 15 21:04:50 [461880.891092]  [<ffffffff8111fbe6>] handle_mm_fault+0x256/0x350
Apr 15 21:04:50 [461880.891095]  [<ffffffff8103993a>] do_page_fault+0x14a/0x4d0
Apr 15 21:04:50 [461880.891098]  [<ffffffff8112405f>] ? do_brk+0x21f/0x320
Apr 15 21:04:50 [461880.891100]  [<ffffffff813f7185>] page_fault+0x25/0x30
Apr 15 21:06:50 [462000.890513] INFO: task kjournald:9163 blocked for more than 120 seconds.
Apr 15 21:06:50 [462000.890552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:06:50 [462000.890590] kjournald       D 00000001083ffddf     0  9163      2 0x00000000
Apr 15 21:06:50 [462000.890594]  ffff8800b6577c10 0000000000000046 00000001083ffddf ffffffff810ffd9a
Apr 15 21:06:50 [462000.890598]  ffff8801195a9590 ffff8800b6577fd8 ffff8800b6577fd8 ffff8800b6577fd8
Apr 15 21:06:50 [462000.890601]  ffff8800105323f0 ffff8801195a9590 ffff88011fffb700 0000000100000000
Apr 15 21:06:50 [462000.890604] Call Trace:
Apr 15 21:06:50 [462000.890610]  [<ffffffff810ffd9a>] ? free_pcppages_bulk+0x42a/0x470
Apr 15 21:06:50 [462000.890614]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:06:50 [462000.890618]  [<ffffffff811860e0>] ? __wait_on_buffer+0x30/0x30
Apr 15 21:06:50 [462000.890621]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:06:50 [462000.890623]  [<ffffffff813f408f>] io_schedule+0x8f/0xd0
Apr 15 21:06:50 [462000.890625]  [<ffffffff811860ee>] sleep_on_buffer+0xe/0x20
Apr 15 21:06:50 [462000.890628]  [<ffffffff813f483f>] __wait_on_bit+0x5f/0x90
Apr 15 21:06:50 [462000.890630]  [<ffffffff811860e0>] ? __wait_on_buffer+0x30/0x30
Apr 15 21:06:50 [462000.890633]  [<ffffffff813f48ec>] out_of_line_wait_on_bit+0x7c/0x90
Apr 15 21:06:50 [462000.890636]  [<ffffffff8107f840>] ? autoremove_wake_function+0x40/0x40
Apr 15 21:06:50 [462000.890638]  [<ffffffff811860de>] __wait_on_buffer+0x2e/0x30
Apr 15 21:06:50 [462000.890657]  [<ffffffffa00a5b2c>] journal_commit_transaction+0xcac/0x13c0 [jbd]
Apr 15 21:06:50 [462000.890660]  [<ffffffff8106cd66>] ? try_to_del_timer_sync+0x96/0x160
Apr 15 21:06:50 [462000.890666]  [<ffffffffa00a8a76>] kjournald+0xe6/0x250 [jbd]
Apr 15 21:06:50 [462000.890668]  [<ffffffff8107f800>] ? abort_exclusive_wait+0xb0/0xb0
Apr 15 21:06:50 [462000.890673]  [<ffffffffa00a8990>] ? commit_timeout+0x10/0x10 [jbd]
Apr 15 21:06:50 [462000.890676]  [<ffffffff8107eeac>] kthread+0x8c/0xa0
Apr 15 21:06:50 [462000.890678]  [<ffffffff813f8864>] kernel_thread_helper+0x4/0x10
Apr 15 21:06:50 [462000.890681]  [<ffffffff8107ee20>] ? kthread_worker_fn+0x190/0x190
Apr 15 21:06:50 [462000.890683]  [<ffffffff813f8860>] ? gs_change+0x13/0x13
Apr 15 21:06:50 [462000.890688] INFO: task mythbackend:15037 blocked for more than 120 seconds.
Apr 15 21:06:50 [462000.890721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:06:50 [462000.890758] mythbackend     D 00000001083ff1a3     0 15037      1 0x00000000
Apr 15 21:06:50 [462000.890761]  ffff88002045de78 0000000000000086 ffff88002045deb0 ffff88002045de48
Apr 15 21:06:50 [462000.890765]  ffff8801090e5d70 ffff88002045dfd8 ffff88002045dfd8 ffff88002045dfd8
Apr 15 21:06:50 [462000.890767]  ffffffff8169b020 ffff8801090e5d70 ffff88001b7b3f08 ffff88001b7b3f10
Apr 15 21:06:50 [462000.890770] Call Trace:
Apr 15 21:06:50 [462000.890773]  [<ffffffff81103ac5>] ? pagevec_lookup_tag+0x25/0x40
Apr 15 21:06:50 [462000.890777]  [<ffffffff810f7ecf>] ? filemap_fdatawait_range+0x9f/0x1a0
Apr 15 21:06:50 [462000.890779]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:06:50 [462000.890782]  [<ffffffff813f5629>] __mutex_lock_slowpath+0x139/0x330
Apr 15 21:06:50 [462000.890784]  [<ffffffff813f5836>] mutex_lock+0x16/0x30
Apr 15 21:06:50 [462000.890787]  [<ffffffff81183fb2>] vfs_fsync_range+0x52/0xa0
Apr 15 21:06:50 [462000.890789]  [<ffffffff8118406c>] vfs_fsync+0x1c/0x20
Apr 15 21:06:50 [462000.890791]  [<ffffffff811843d6>] sys_fdatasync+0x36/0x50
Apr 15 21:06:50 [462000.890794]  [<ffffffff813f7742>] system_call_fastpath+0x16/0x1b
Apr 15 21:06:50 [462000.890797] INFO: task mythbackend:17562 blocked for more than 120 seconds.
Apr 15 21:06:50 [462000.890830] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:06:50 [462000.890867] mythbackend     D 00000001083ff1a0     0 17562      1 0x00000000
Apr 15 21:06:50 [462000.890870]  ffff880016999e78 0000000000000086 ffff880016999eb0 ffff880016999e48
Apr 15 21:06:50 [462000.890873]  ffff88010d1df300 ffff880016999fd8 ffff880016999fd8 ffff880016999fd8
Apr 15 21:06:50 [462000.890876]  ffffffff8169b020 ffff88010d1df300 ffff880000000001 ffff88006172e0d0
Apr 15 21:06:50 [462000.890879] Call Trace:
Apr 15 21:06:50 [462000.890881]  [<ffffffff81103ac5>] ? pagevec_lookup_tag+0x25/0x40
Apr 15 21:06:50 [462000.890884]  [<ffffffff810f7ecf>] ? filemap_fdatawait_range+0x9f/0x1a0
Apr 15 21:06:50 [462000.890886]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:06:50 [462000.890889]  [<ffffffff813f5629>] __mutex_lock_slowpath+0x139/0x330
Apr 15 21:06:50 [462000.890891]  [<ffffffff813f5836>] mutex_lock+0x16/0x30
Apr 15 21:06:50 [462000.890893]  [<ffffffff81183fb2>] vfs_fsync_range+0x52/0xa0
Apr 15 21:06:50 [462000.890896]  [<ffffffff8118406c>] vfs_fsync+0x1c/0x20
Apr 15 21:06:50 [462000.890898]  [<ffffffff811843d6>] sys_fdatasync+0x36/0x50
Apr 15 21:06:50 [462000.890901]  [<ffffffff813f7742>] system_call_fastpath+0x16/0x1b
Apr 15 21:06:50 [462000.890904] INFO: task mythcommflag:17595 blocked for more than 120 seconds.
Apr 15 21:06:50 [462000.892048] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:06:50 [462000.893248] mythcommflag    D 0000000000000002     0 17595  17594 0x00000000
Apr 15 21:06:50 [462000.893254]  ffff880012883738 0000000000000086 ffff8800264afbe8 000000003ded1f97
Apr 15 21:06:50 [462000.893257]  ffff88010d1ddd70 ffff880012883fd8 ffff880012883fd8 ffff880012883fd8
Apr 15 21:06:50 [462000.893260]  ffff8800681e5640 ffff88010d1ddd70 ffff880012883fd8 ffff88011fc57c08
Apr 15 21:06:50 [462000.893263] Call Trace:
Apr 15 21:06:50 [462000.893266]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:06:50 [462000.893269]  [<ffffffff81089edd>] ? ktime_get_ts+0xad/0xe0
Apr 15 21:06:50 [462000.893272]  [<ffffffff810f7bd0>] ? __lock_page+0x70/0x70
Apr 15 21:06:50 [462000.893274]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:06:50 [462000.893276]  [<ffffffff813f408f>] io_schedule+0x8f/0xd0
Apr 15 21:06:50 [462000.893279]  [<ffffffff810f7bde>] sleep_on_page+0xe/0x20
Apr 15 21:06:50 [462000.893281]  [<ffffffff813f483f>] __wait_on_bit+0x5f/0x90
Apr 15 21:06:50 [462000.893284]  [<ffffffff810f7dc8>] wait_on_page_bit+0x78/0x80
Apr 15 21:06:50 [462000.893286]  [<ffffffff8107f840>] ? autoremove_wake_function+0x40/0x40
Apr 15 21:06:50 [462000.893289]  [<ffffffff8110a1a5>] shrink_page_list+0x405/0x910
Apr 15 21:06:50 [462000.893291]  [<ffffffff811050d6>] ? __pagevec_release+0x26/0x40
Apr 15 21:06:50 [462000.893294]  [<ffffffff813f689f>] ? _raw_spin_unlock_irq+0xf/0x40
Apr 15 21:06:50 [462000.893297]  [<ffffffff8110ac71>] shrink_inactive_list+0x2e1/0x470
Apr 15 21:06:50 [462000.893300]  [<ffffffff8121dca7>] ? kobject_put+0x27/0x60
Apr 15 21:06:50 [462000.893303]  [<ffffffff812e7af7>] ? put_device+0x17/0x20
Apr 15 21:06:50 [462000.893305]  [<ffffffff8110b60c>] shrink_zone+0x4bc/0x660
Apr 15 21:06:50 [462000.893307]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:06:50 [462000.893310]  [<ffffffff8110bade>] do_try_to_free_pages+0xae/0x400
Apr 15 21:06:50 [462000.893312]  [<ffffffff8110c1a4>] try_to_free_pages+0xb4/0x1e0
Apr 15 21:06:50 [462000.893315]  [<ffffffff811005fc>] __alloc_pages_nodemask+0x52c/0x850
Apr 15 21:06:50 [462000.893318]  [<ffffffff81127f8d>] ? page_add_new_anon_rmap+0x9d/0xb0
Apr 15 21:06:50 [462000.893321]  [<ffffffff81139f7a>] alloc_pages_vma+0x9a/0x180
Apr 15 21:06:50 [462000.893324]  [<ffffffff8114b2d2>] do_huge_pmd_anonymous_page+0x152/0x380
Apr 15 21:06:50 [462000.893326]  [<ffffffff8111fbe6>] handle_mm_fault+0x256/0x350
Apr 15 21:06:50 [462000.893329]  [<ffffffff8103993a>] do_page_fault+0x14a/0x4d0
Apr 15 21:06:50 [462000.893332]  [<ffffffff8112405f>] ? do_brk+0x21f/0x320
Apr 15 21:06:50 [462000.893334]  [<ffffffff813f7185>] page_fault+0x25/0x30
Apr 15 21:08:50 [462120.890526] INFO: task kjournald:9163 blocked for more than 120 seconds.
Apr 15 21:08:50 [462120.891707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:08:50 [462120.892890] kjournald       D 00000001083ffddf     0  9163      2 0x00000000
Apr 15 21:08:50 [462120.892894]  ffff8800b6577c10 0000000000000046 00000001083ffddf ffffffff810ffd9a
Apr 15 21:08:50 [462120.892898]  ffff8801195a9590 ffff8800b6577fd8 ffff8800b6577fd8 ffff8800b6577fd8
Apr 15 21:08:50 [462120.892901]  ffff8800105323f0 ffff8801195a9590 ffff88011fffb700 0000000100000000
Apr 15 21:08:50 [462120.892904] Call Trace:
Apr 15 21:08:50 [462120.892911]  [<ffffffff810ffd9a>] ? free_pcppages_bulk+0x42a/0x470
Apr 15 21:08:50 [462120.892915]  [<ffffffff81012f69>] ? read_tsc+0x9/0x20
Apr 15 21:08:50 [462120.892919]  [<ffffffff811860e0>] ? __wait_on_buffer+0x30/0x30
Apr 15 21:08:50 [462120.892922]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:08:50 [462120.892924]  [<ffffffff813f408f>] io_schedule+0x8f/0xd0
Apr 15 21:08:50 [462120.892927]  [<ffffffff811860ee>] sleep_on_buffer+0xe/0x20
Apr 15 21:08:50 [462120.892929]  [<ffffffff813f483f>] __wait_on_bit+0x5f/0x90
Apr 15 21:08:50 [462120.892932]  [<ffffffff811860e0>] ? __wait_on_buffer+0x30/0x30
Apr 15 21:08:50 [462120.892934]  [<ffffffff813f48ec>] out_of_line_wait_on_bit+0x7c/0x90
Apr 15 21:08:50 [462120.892938]  [<ffffffff8107f840>] ? autoremove_wake_function+0x40/0x40
Apr 15 21:08:50 [462120.892940]  [<ffffffff811860de>] __wait_on_buffer+0x2e/0x30
Apr 15 21:08:50 [462120.892961]  [<ffffffffa00a5b2c>] journal_commit_transaction+0xcac/0x13c0 [jbd]
Apr 15 21:08:50 [462120.892965]  [<ffffffff8106cd66>] ? try_to_del_timer_sync+0x96/0x160
Apr 15 21:08:50 [462120.892970]  [<ffffffffa00a8a76>] kjournald+0xe6/0x250 [jbd]
Apr 15 21:08:50 [462120.892973]  [<ffffffff8107f800>] ? abort_exclusive_wait+0xb0/0xb0
Apr 15 21:08:50 [462120.892978]  [<ffffffffa00a8990>] ? commit_timeout+0x10/0x10 [jbd]
Apr 15 21:08:50 [462120.892980]  [<ffffffff8107eeac>] kthread+0x8c/0xa0
Apr 15 21:08:50 [462120.892983]  [<ffffffff813f8864>] kernel_thread_helper+0x4/0x10
Apr 15 21:08:50 [462120.892986]  [<ffffffff8107ee20>] ? kthread_worker_fn+0x190/0x190
Apr 15 21:08:50 [462120.892988]  [<ffffffff813f8860>] ? gs_change+0x13/0x13
Apr 15 21:08:50 [462120.893000] INFO: task mythbackend:15037 blocked for more than 120 seconds.
Apr 15 21:08:50 [462120.894190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 21:08:50 [462120.895394] mythbackend     D 00000001083ff1a3     0 15037      1 0x00000000
Apr 15 21:08:50 [462120.895397]  ffff88002045de78 0000000000000086 ffff88002045deb0 ffff88002045de48
Apr 15 21:08:50 [462120.895400]  ffff8801090e5d70 ffff88002045dfd8 ffff88002045dfd8 ffff88002045dfd8
Apr 15 21:08:50 [462120.895403]  ffffffff8169b020 ffff8801090e5d70 ffff88001b7b3f08 ffff88001b7b3f10
Apr 15 21:08:50 [462120.895406] Call Trace:
Apr 15 21:08:50 [462120.895409]  [<ffffffff81103ac5>] ? pagevec_lookup_tag+0x25/0x40
Apr 15 21:08:50 [462120.895413]  [<ffffffff810f7ecf>] ? filemap_fdatawait_range+0x9f/0x1a0
Apr 15 21:08:50 [462120.895415]  [<ffffffff813f3fdf>] schedule+0x3f/0x60
Apr 15 21:08:50 [462120.895418]  [<ffffffff813f5629>] __mutex_lock_slowpath+0x139/0x330
Apr 15 21:08:50 [462120.895420]  [<ffffffff813f5836>] mutex_lock+0x16/0x30
Apr 15 21:08:50 [462120.895423]  [<ffffffff81183fb2>] vfs_fsync_range+0x52/0xa0
Apr 15 21:08:50 [462120.895425]  [<ffffffff8118406c>] vfs_fsync+0x1c/0x20
Apr 15 21:08:50 [462120.895427]  [<ffffffff811843d6>] sys_fdatasync+0x36/0x50
Apr 15 21:08:50 [462120.895430]  [<ffffffff813f7742>] system_call_fastpath+0x16/0x1b
Comment 1 bfields 2012-06-12 21:43:46 UTC
No idea.  Looks kjournald and a bunch of other processes are stuck waiting for IO.  What kind of disk is this?  Has this only happened this one time, or is it reproduceable?
Comment 2 flo 2012-06-12 23:44:56 UTC
I can reproduce the case:

I have set up automounter to mount /data/video via nfsv4 from my video server (otto) (bug is also hit with nfsv3 i think).

I try:

dd if=/dev/zero of=/data/video/testfile bs=1024k count=100

-> 22MB/s, works great

dd if=/dev/zero of=/data/video/testfile bs=1024k count=1000

-> 6MB/s, works great

dd if=/dev/zero of=/data/video/testfile bs=1024k count=10000

-> stuck after some time, some processes block.

I logged in via ssh and tried to kill dd, so to find the process i did:

ps -ef |grep dd

-> This command blocks!

So i did

strace -f ps -ef

...

stat("/proc/2528", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/2528/stat", O_RDONLY)       = 6
read(6, "2528 (plasma-desktop) D 1 2411 2"..., 1023) = 249
close(6)                                = 0
open("/proc/2528/status", O_RDONLY)     = 6
read(6, "Name:\tplasma-desktop\nState:\tD (d"..., 1023) = 910
close(6)                                = 0
open("/proc/2528/cmdline", O_RDONLY)    = 6
read(6,  <unfinished ...>

Last line comes from CTRL-C i think

So something in /proc is blocking?

ps -u myusername works btw. and killing dd makes the system usable within some seconds.

My guess: If the IO waiting for NFS to write is larger than memory (+swap?), then i hit the bug. System has 4GB memory and 4GB swap, the dd generates 10GB data.

More information:

Opensuse 12.1, all patches installed,
 uname -a
Linux mamba 3.1.10-1.9-desktop #1 SMP PREEMPT Thu Apr 5 18:48:38 UTC 2012 (4a97ec8) x86_64 x86_64 x86_64 GNU/Linux

NFS Server is a Opensuse 11.4, and

mount |grep video
otto:/video/ on /data/video type nfs4 (rw,relatime,vers=4,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.135.136,minorversion=0,local_lock=none,addr=192.168.135.104)
Comment 3 warpme 2012-06-16 08:24:22 UTC
Hi,

Unfortunately recently moved to 3.3.7. As soon as i will catch this issue under 3.3.7 - I'll post immediately...

-br
Comment 4 bfields 2012-10-30 16:18:56 UTC
Is this still happening?
Comment 5 warpme 2012-10-30 17:32:55 UTC
I'm not able to reproduce this problem with 3.3.8. But also I have to mention I switch from NFS to CIFS due problems with v.slow speed (1-3MB/s) for MAC->Linux NFS transfers. Strange as Linux->MAC works OK (70-90MB/s) :-(

br
Comment 6 bfields 2012-10-30 17:51:21 UTC
OK, marking this unreproduceable for now.

"problems with v.slow speed (1-3MB/s) for MAC->Linux NFS transfers"

That's writes from an OSX client to a Linux server?  Possibly worth opening another bug if you want to look into that.
Comment 7 warpme 2012-10-30 19:21:57 UTC
"That's writes from an OSX client to a Linux server?  Possibly worth opening
another bug if you want to look into that."

Oh thx for suggestion. I'll do this definitely !

Note You need to log in before you can comment on or make changes to this bug.