Bug 15552
Summary: | nfs is blocking processes | ||
---|---|---|---|
Product: | File System | Reporter: | Andreas Radke (andyrtr) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.33.1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | NFS: Avoid a deadlock in nfs_release_page |
Description
Andreas Radke
2010-03-17 05:38:56 UTC
The above just means that the thread is waiting for another write request to complete. To figure out why it is hanging, you need to look at the other threads, find the one that is actually responsible for doing the writeout, and see why that one is hanging.

That said, there is a known problem with a deadlock in shrink_page_list(), and I've got a patch queued up to fix it. Could you give it a try and see if it fixes your hang?

Created attachment 25567
NFS: Avoid a deadlock in nfs_release_page
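For anyone who wants to follow the advice about looking at the other threads, here is a minimal sketch of one way to list the tasks that are currently in uninterruptible sleep (state D). It only reads the standard /proc/<pid>/stat files; the function names parse_stat_line and blocked_tasks are made up for this example and are not part of any tool mentioned in this bug.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for tasks in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            # The task may have exited between listdir() and open().
            continue
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Once the blocked PIDs are known, the kernel stack of each one can usually be read as root from /proc/<pid>/stack (if the kernel was built with stack trace support), or all blocked tasks can be dumped to the kernel log with "echo w > /proc/sysrq-trigger".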
I may be seeing a similar error. When copying large amounts of data (hundreds of GBs) over NFS, the copying process (rsync, tar, etc.) gets into a D state, and then flush and kswapd enter a D state as well. Finally, any process writing to disk will block. This occurs with the Debian 2.6.33-1~experimental (2.6.33.1) kernel and the Debian 2.6.32-10 (2.6.32.9 equivalent) kernel. I applied the provided patch to the Debian 2.6.33.1 kernel and it did not solve the problem. Below is the call trace from the blocked process. This did not kill the process.

INFO: task rsync:17708 blocked for more than 600 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D 0000000000000002 0 17708 17707 0x00000000
ffff880c7cfb6200 0000000000000086 0000000000000000 ffff88044b409334
0000000000000286 000000000000f8e0 ffff88044b409fd8 0000000000015680
0000000000015680 ffff88067bc6e900 ffff88067bc6ebf0 0000000181066af9
Call Trace:
[<ffffffff8100f789>] ? read_tsc+0xa/0x20
[<ffffffff8109411e>] ? delayacct_end+0x74/0x7f
[<ffffffffa03c3c04>] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs]
[<ffffffff812ed1cb>] ? io_schedule+0x73/0xb7
[<ffffffffa03c3c0d>] ? nfs_wait_bit_uninterruptible+0x9/0xd [nfs]
[<ffffffff812ed6f6>] ? __wait_on_bit+0x41/0x70
[<ffffffff81183d3c>] ? __lookup_tag+0xad/0x11b
[<ffffffffa03c3c04>] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs]
[<ffffffff812ed790>] ? out_of_line_wait_on_bit+0x6b/0x77
[<ffffffff8105f300>] ? wake_bit_function+0x0/0x23
[<ffffffffa03c7b87>] ? nfs_sync_mapping_wait+0xfa/0x227 [nfs]
[<ffffffffa03c7d48>] ? nfs_wb_page+0x94/0xc3 [nfs]
[<ffffffffa03badfc>] ? nfs_release_page+0x3a/0x57 [nfs]
[<ffffffff810b9382>] ? shrink_page_list+0x48d/0x617
[<ffffffff8103a9a1>] ? pick_next_task_fair+0xca/0xd5
[<ffffffff810b843d>] ? isolate_pages_global+0x1a0/0x20f
[<ffffffff810b9ee3>] ? shrink_zone+0x710/0xae9
[<ffffffff810b52ba>] ? __alloc_pages_nodemask+0x2a7/0x5e1
[<ffffffff81279ab1>] ? tcp_established_options+0x2d/0xa9
[<ffffffff8103dcac>] ? find_busiest_group+0x3b0/0x875
[<ffffffff8100f789>] ? read_tsc+0xa/0x20
[<ffffffff810bb121>] ? do_try_to_free_pages+0x1ce/0x31e
[<ffffffff810bb396>] ? try_to_free_pages+0x72/0x78
[<ffffffff810b829d>] ? isolate_pages_global+0x0/0x20f
[<ffffffff810b53cb>] ? __alloc_pages_nodemask+0x3b8/0x5e1
[<ffffffff81053aaf>] ? lock_timer_base+0x26/0x4b
[<ffffffff810e1314>] ? new_slab+0x42/0x1ca
[<ffffffff810e168c>] ? __slab_alloc+0x1f0/0x3a2
[<ffffffff810b0a58>] ? mempool_alloc+0x55/0x106
[<ffffffff810b0a58>] ? mempool_alloc+0x55/0x106
[<ffffffff810e1d66>] ? kmem_cache_alloc+0x7f/0xf0
[<ffffffff810b0a58>] ? mempool_alloc+0x55/0x106
[<ffffffff8105f2d2>] ? autoremove_wake_function+0x0/0x2e
[<ffffffffa03c840e>] ? nfs_writedata_alloc+0x19/0x98 [nfs]
[<ffffffffa03c84a1>] ? nfs_flush_one+0x14/0xce [nfs]
[<ffffffffa03c3abc>] ? nfs_pageio_doio+0x2a/0x51 [nfs]
[<ffffffffa03c3ba8>] ? nfs_pageio_add_request+0xc5/0xd5 [nfs]
[<ffffffffa03c6e01>] ? nfs_do_writepage+0xe0/0x103 [nfs]
[<ffffffffa03c72e7>] ? nfs_writepages_callback+0xf/0x21 [nfs]
[<ffffffff810b5c38>] ? write_cache_pages+0x1be/0x2a4
[<ffffffffa03c72d8>] ? nfs_writepages_callback+0x0/0x21 [nfs]
[<ffffffffa03c7297>] ? nfs_writepages+0xde/0x11f [nfs]
[<ffffffffa03c848d>] ? nfs_flush_one+0x0/0xce [nfs]
[<ffffffffa03c8336>] ? nfs_write_mapping+0x55/0x8e [nfs]
[<ffffffffa03bb273>] ? nfs_do_fsync+0x1c/0x3c [nfs]
[<ffffffff810e820a>] ? filp_close+0x37/0x62
[<ffffffff810e82c9>] ? sys_close+0x94/0xcd
[<ffffffff81008ac2>] ? system_call_fastpath+0x16/0x1b

Yes. Looking again at the traces, I'm thinking this might be the same bug as Bugzilla entry 15578. Can you try the proposed patch at https://bugzilla.kernel.org/attachment.cgi?id=25612 and see if that fixes the hang?

I've installed the proposed patch and have not had any problems yet. It seems to take many GBs of NFS writing to cause the problem, so I can't yet say if it has fixed the issue.

My problem occurred in 2.6.33 and 2.6.33.1. I've tried the patch attached to this bug. So far it seems to solve my problem.
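
For readers trying to match their own hang against the rsync call trace quoted in this bug, the following is a rough sketch of how such a trace could be checked mechanically. It parses frames of the form "[<addr>] ? symbol+0xoff/0xsize [module]" and reports whether a module shows up both inside and outside shrink_page_list(), which is the pattern visible in that trace (NFS writeback allocates memory, falls into page reclaim, and reclaim calls back into nfs_release_page). The names FRAME_RE, parse_frames and reclaim_reentry are invented for this illustration, and the sample is a shortened excerpt of the trace above. Frames marked with "?" are ones the kernel could not confirm while unwinding, so treat the result as a hint, not proof.

```python
import re

# One frame looks like: [<ffffffffa03c7d48>] ? nfs_wb_page+0x94/0xc3 [nfs]
# The "?" marker and the trailing [module] are both optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[A-Za-z0-9_.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[A-Za-z0-9_]+)\])?"
)


def parse_frames(text):
    """Return the frames found in text, innermost first, as dicts."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for core kernel symbols
        })
    return frames


def reclaim_reentry(frames, module="nfs"):
    """True if module frames appear on both sides of shrink_page_list."""
    symbols = [f["symbol"] for f in frames]
    if "shrink_page_list" not in symbols:
        return False
    i = symbols.index("shrink_page_list")
    inner = any(f["module"] == module for f in frames[:i])
    outer = any(f["module"] == module for f in frames[i + 1:])
    return inner and outer


# Shortened excerpt of the trace from this bug, for demonstration only.
SAMPLE = """
[<ffffffffa03c7d48>] ? nfs_wb_page+0x94/0xc3 [nfs]
[<ffffffffa03badfc>] ? nfs_release_page+0x3a/0x57 [nfs]
[<ffffffff810b9382>] ? shrink_page_list+0x48d/0x617
[<ffffffff810bb396>] ? try_to_free_pages+0x72/0x78
[<ffffffffa03c840e>] ? nfs_writedata_alloc+0x19/0x98 [nfs]
[<ffffffffa03c7297>] ? nfs_writepages+0xde/0x11f [nfs]
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for f in frames:
        print(f["symbol"], f["module"] or "-")
    print("reclaim re-enters nfs:", reclaim_reentry(frames))
```

The regular expression uses finditer, so it also works on traces that have been pasted as a single long line.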