Bug 29062

Summary: soft lockup in nfs_commit_inode()?
Product: File System Reporter: Roman Kononov (roman)
Component: NFSAssignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.36, 2.6.37, 2.6.38 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: more call traces in 2.6.37
nfs: remove extraneous and problematic calls to nfs_clear_request
NFS: Fix a hang/infinite loop in nfs_wb_page()

Description Roman Kononov 2011-02-13 18:59:50 UTC
In 2.6.36.3, I can quite quickly (~1 hour) get the following under some workload:

Feb 12 00:17:12 10.10.10.102 kernel: BUG: soft lockup - CPU#3 stuck for 67s! [postmaster:27813] 
Feb 12 00:17:12 10.10.10.102 kernel: Modules linked in: dm_mod mlx4_ib mlx4_core sata_nv 3w_9xxx 
Feb 12 00:17:12 10.10.10.102 kernel: CPU 3  
Feb 12 00:17:12 10.10.10.102 kernel: Modules linked in: dm_mod mlx4_ib mlx4_core sata_nv 3w_9xxx 
Feb 12 00:17:12 10.10.10.102 kernel:  
Feb 12 00:17:12 10.10.10.102 kernel: Pid: 27813, comm: postmaster Not tainted 2.6.36.3.xdi #6 S2892/empty 
Feb 12 00:17:12 10.10.10.102 kernel: RIP: 0010:[<ffffffff81055033>]  [<ffffffff81055033>] bit_waitqueue+0xc3/0xd0 
Feb 12 00:17:12 10.10.10.102 kernel: RSP: 0018:ffff880039e7ba48  EFLAGS: 00000292 
Feb 12 00:17:12 10.10.10.102 kernel: RAX: ffff880001877b30 RBX: 0000000000000d12 RCX: 0000000000000034 
Feb 12 00:17:12 10.10.10.102 kernel: RDX: ffff880100000e00 RSI: 0000000000000000 RDI: 8038000000000000 
Feb 12 00:17:12 10.10.10.102 kernel: RBP: ffffffff8100354e R08: 023ee891c01c0000 R09: fda317fdfa087007 
Feb 12 00:17:12 10.10.10.102 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff880001877b30 
Feb 12 00:17:12 10.10.10.102 kernel: R13: fda317fdfa087007 R14: 0000000000000001 R15: 0000000000000000 
Feb 12 00:17:12 10.10.10.102 kernel: FS:  000000004087c940(0063) GS:ffff880001b80000(0000) knlGS:0000000000000000 
Feb 12 00:17:12 10.10.10.102 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
Feb 12 00:17:12 10.10.10.102 kernel: CR2: 000000000042b000 CR3: 000000003b87e000 CR4: 00000000000006e0 
Feb 12 00:17:12 10.10.10.102 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
Feb 12 00:17:12 10.10.10.102 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Feb 12 00:17:12 10.10.10.102 kernel: Process postmaster (pid: 27813, threadinfo ffff880039e7a000, task ffff880133f91880) 
Feb 12 00:17:12 10.10.10.102 kernel: Stack: 
Feb 12 00:17:12 10.10.10.102 kernel:  ffff88023ee891c0 ffffffff813b98e8 ffff88023ee891c0 0000000000000007 
Feb 12 00:17:12 10.10.10.102 kernel: <0> 0000000000000001 ffff880133f91880 ffffffff810550e0 ffff880039e7ba80 
Feb 12 00:17:12 10.10.10.102 kernel: <0> ffff880039e7ba80 ffffffffffffff10 ffffffff8100372e ffff88023ee89130 
Feb 12 00:17:12 10.10.10.102 kernel: Call Trace: 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b98e8>] ? out_of_line_wait_on_bit_lock+0x28/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810550e0>] ? wake_bit_function+0x0/0x40 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff8100372e>] ? reschedule_interrupt+0xe/0x20 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81133489>] ? nfs_commit_inode+0xb9/0x1c0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff811337a9>] ? nfs_wb_page+0x69/0xc0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81133ce1>] ? nfs_flush_incompatible+0x41/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81124c9f>] ? nfs_write_begin+0x8f/0x220 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81081638>] ? generic_file_buffered_write+0x118/0x280 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b8c63>] ? schedule+0x273/0x900 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81083419>] ? __generic_file_aio_write+0x229/0x420 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81035aa3>] ? load_balance+0x133/0x700 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81083677>] ? generic_file_aio_write+0x67/0xe0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81125ad1>] ? nfs_file_write+0xe1/0x230 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff8104a472>] ? get_signal_to_deliver+0x92/0x350 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c2abf>] ? do_sync_write+0xbf/0x100 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b8c63>] ? schedule+0x273/0x900 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c31f6>] ? vfs_write+0xc6/0x180 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c350e>] ? sys_write+0x4e/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81040276>] ? sys_gettimeofday+0x36/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81002c6b>] ? system_call_fastpath+0x16/0x1b 
Feb 12 00:17:12 10.10.10.102 kernel: Code: 00 fa 6c 81 48 8d 2c 29 b9 40 00 00 00 2b 8a 90 06 00 00 48 d3 ed 48 8d 44 6d 00 48 c1 e0 03 48 03 82 80 06 00 00 48 83 c4 08 5b <5d> c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 18 48 8d 47 08  
Feb 12 00:17:12 10.10.10.102 kernel: Call Trace: 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b98e8>] ? out_of_line_wait_on_bit_lock+0x28/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810550e0>] ? wake_bit_function+0x0/0x40 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff8100372e>] ? reschedule_interrupt+0xe/0x20 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81133489>] ? nfs_commit_inode+0xb9/0x1c0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff811337a9>] ? nfs_wb_page+0x69/0xc0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81133ce1>] ? nfs_flush_incompatible+0x41/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81124c9f>] ? nfs_write_begin+0x8f/0x220 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81081638>] ? generic_file_buffered_write+0x118/0x280 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b8c63>] ? schedule+0x273/0x900 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81083419>] ? __generic_file_aio_write+0x229/0x420 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81035aa3>] ? load_balance+0x133/0x700 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81083677>] ? generic_file_aio_write+0x67/0xe0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81125ad1>] ? nfs_file_write+0xe1/0x230 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff8104a472>] ? get_signal_to_deliver+0x92/0x350 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c2abf>] ? do_sync_write+0xbf/0x100 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b8c63>] ? schedule+0x273/0x900 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c31f6>] ? vfs_write+0xc6/0x180 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c350e>] ? sys_write+0x4e/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81040276>] ? sys_gettimeofday+0x36/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81002c6b>] ? system_call_fastpath+0x16/0x1b 

I cannot reproduce it in 2.6.35.11.
Comment 1 Roman Kononov 2011-02-13 21:26:58 UTC
Created attachment 47652 [details]
more call traces in 2.6.37

2.6.37 locks up as well
Comment 2 Roman Kononov 2011-02-14 01:28:02 UTC
2.6.36 locks up as well
Comment 3 Roman Kononov 2011-02-14 16:49:53 UTC
2.6.38-rc4 is likewise
Comment 4 Trond Myklebust 2011-02-15 00:00:29 UTC
Can you use 'git bisect' to figure out where the problem started between 2.6.35 and 2.6.36?
Comment 5 Roman Kononov 2011-02-20 02:44:11 UTC
$ git bisect log
# bad: [f6f94e2ab1b33f0082ac22d71f66385a60d8157f] Linux 2.6.36
# good: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect start 'v2.6.36' 'v2.6.35' '--' 'fs/nfs'
# skip: [452e93523d9433f83670e7b42cbe75319c208762] NFSv4: Clean up the process of renewing the NFSv4 lease
git bisect skip 452e93523d9433f83670e7b42cbe75319c208762
# skip: [a17c2153d2e271b0cbacae9bed83b0eaa41db7e1] SUNRPC: Move the bound cred to struct rpc_rqst
git bisect skip a17c2153d2e271b0cbacae9bed83b0eaa41db7e1
# bad: [df486a25900f4dba9cdc3886c4ac871951c6aef3] NFS: Fix the selection of security flavours in Kconfig
git bisect bad df486a25900f4dba9cdc3886c4ac871951c6aef3
# bad: [77041ed9b49a9e10f374bfa6e482d30ee7a3d46e] NFSv4: Ensure the lockowners are labelled using the fl_owner and/or fl_pid
git bisect bad 77041ed9b49a9e10f374bfa6e482d30ee7a3d46e
# skip: [c48f4f3541e67881c9eb7c46e052f5ece48ef530] NFSv41: Convert the various reboot recovery ops etc to minor version ops
git bisect skip c48f4f3541e67881c9eb7c46e052f5ece48ef530
# good: [d185a334c748b3ca9de1f3a293fd8a9cf68378ab] NFSv4.1: Simplify nfs41_sequence_done()
git bisect good d185a334c748b3ca9de1f3a293fd8a9cf68378ab
# skip: [a4432345352c2be157ed844603147ac2c82f209c] NFSv41: Deprecate nfs_client->cl_minorversion
git bisect skip a4432345352c2be157ed844603147ac2c82f209c
# good: [035168ab39f66e4946d493f9ee20d11e154f332a] NFSv4.1: Make nfs4_setup_sequence take a nfs_server argument
git bisect good 035168ab39f66e4946d493f9ee20d11e154f332a
# good: [d77d76ffb638bd013782138cca6d8f4918c5afd6] NFSv41: Clean up exclusive create
git bisect good d77d76ffb638bd013782138cca6d8f4918c5afd6
# good: [1f0e890dba5b0f543fea47732116b1c65d55614e] NFSv4: Clean up struct nfs4_state_owner
git bisect good 1f0e890dba5b0f543fea47732116b1c65d55614e
# skip: [daccbded7f153ec84a3baf3136052e41d0eab555] NFSv4: Clean up for lockowner XDR encoding
git bisect skip daccbded7f153ec84a3baf3136052e41d0eab555
# bad: [d3c7b7ccc199ee564177ee914c04771d6bc00295] NFSv4: Add support for the RELEASE_LOCKOWNER operation
git bisect bad d3c7b7ccc199ee564177ee914c04771d6bc00295
# bad: [f11ac8db5d07b6e99d41ff4aa39d878ee5cef1c5] NFSv4: Ensure that we track the NFSv4 lock state in read/write requests.
git bisect bad f11ac8db5d07b6e99d41ff4aa39d878ee5cef1c5

It says that "f11ac8db5d07b6e99d41ff4aa39d878ee5cef1c5 is the first bad commit". Does it make sense? I skipped some revisions because they caused some seemingly unrelated failures and I could neither prove nor disprove the lockup.
Comment 6 Trond Myklebust 2011-02-20 19:05:46 UTC
Thanks! That certainly helps a lot!

The patch in question is indeed known to trigger a problem in the writeback code. A fix was merged into kernel 2.6.37, but does not appear to have been sent to the stable kernel.

I'll attach the fix patch to this bug report, but you can find it as commit 2df485a774ba59c3f43bfe84107672c1d9b731a0 (nfs: remove extraneous and problematic calls to nfs_clear_request)
Comment 7 Trond Myklebust 2011-02-20 19:08:44 UTC
Created attachment 48482 [details]
nfs: remove extraneous and problematic calls to nfs_clear_request
Comment 8 Roman Kononov 2011-03-02 23:46:19 UTC
I am confused. Are you saying that the 2df485a7 patch should fix the lockup? Earlier, I tried 2.6.37, which has the patch, and it locked up.
Comment 9 Trond Myklebust 2011-03-21 19:49:24 UTC
Created attachment 51572 [details]
NFS: Fix a hang/infinite loop in nfs_wb_page()

When one of the two waits in nfs_commit_inode() is interrupted, it
returns a non-negative value, which causes nfs_wb_page() to think
that the operation was successful causing it to busy-loop rather
than exiting.
It also causes nfs_file_fsync() to incorrectly report the file as
being successfully committed to disk.

This patch fixes both problems by ensuring that we return an error
if the attempts to wait fail.
Comment 10 Roman Kononov 2011-03-23 02:23:23 UTC
In my setup, v2.6.37 with the patches has worked for 15 hours. Without the second patch it fails withing an hour. Thanks.
Comment 11 Trond Myklebust 2011-04-02 23:22:37 UTC
Closing. The patch has been submitted to mainline and stable@kernel.org.

Please reopen this report if the bug reoccurs in a future kernel...