Bug 9712

Summary: BUG in current NFS4 code makes the system unusable
Product: File System
Reporter: Puzin, Dimitri (bugs)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX    
Severity: blocking
CC: akpm, bfields, bunk, gentuu, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24-rc6-git10
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9243    
Attachments:
  14366: nfsd4: fix bad seqid on lock request incompatible with open mode
  14368: SysRq-t dump
  14371: NFSv4: Give the lock stateid its own sequence queue

Description Puzin, Dimitri 2008-01-07 23:41:20 UTC
Latest working kernel version: 2.6.24-rc6-git9
Earliest failing kernel version: 2.6.24-rc6-git10
Distribution: Debian GNU/Linux 4.0/Etch 
Hardware Environment: x86_64, diskless workstation
Software Environment: NFS4(krb5i) /home of users
Problem Description: 
I've noticed a bug introduced in -rc6-git10 by commit
e6e21970baff4845de74584e2efc8c964a55d574 (NFSv4: Fix open_to_lock_owner
sequenceid allocation...). Here, the users' /home is shared between machines via nfs4 mounts from the server. With this commit the system becomes unusable within
~10 minutes when running a file-hungry application such as Firefox. The application seems to
hang, as does every other process that accesses an nfs4 mount point. The behavior is
reproducible here on at least two x86_64 boxes. I haven't tried an x86 box
yet. Reverting this commit removes the described effect.
Comment 1 Andrew Morton 2008-01-07 23:52:39 UTC
argh.
Comment 2 Trond Myklebust 2008-01-08 11:06:09 UTC
Created attachment 14366
nfsd4: fix bad seqid on lock request incompatible with open mode
Comment 3 Trond Myklebust 2008-01-08 11:40:54 UTC
As I said in my email to you about this subject, I'm interested in obtaining 
the sysrq-T trace when the hang occurs.

That said, the attached patch fixes a known bug with sequence ids on the Linux
server and might make a difference.
Comment 4 Puzin, Dimitri 2008-01-08 13:50:23 UTC
Created attachment 14368
SysRq-t dump

The dump is attached. It is rather large. I think the offending part looks like this:

Jan  7 22:57:52 narcissus kernel: firefox-bin   S 0000000000000001     0  1714   1727
Jan  7 22:57:52 narcissus kernel: ffff810114c2dbd8 0000000000000046 ffff810114c2db78 ffffffff80256348
Jan  7 22:57:52 narcissus kernel: ffff810114c2a000 ffffffff805b5c8a ffff810114c2a000 ffff810122dd8000
Jan  7 22:57:52 narcissus kernel: ffff810114c2a218 0000000180256516 ffff810009857730 0000000000000292
Jan  7 22:57:52 narcissus kernel: Call Trace:
Jan  7 22:57:52 narcissus kernel: [<ffffffff80256348>] mark_held_locks+0x4a/0x6a
Jan  7 22:57:52 narcissus kernel: [<ffffffff805b5c8a>] _spin_unlock_irqrestore+0x3f/0x69
Jan  7 22:57:52 narcissus kernel: [<ffffffff805b5c97>] _spin_unlock_irqrestore+0x4c/0x69
Jan  7 22:57:52 narcissus kernel: [<ffffffff8059c81c>] rpc_wait_bit_interruptible+0x22/0x28
Jan  7 22:57:52 narcissus kernel: [<ffffffff805b38de>] __wait_on_bit+0x45/0x77
Jan  7 22:57:52 narcissus kernel: [<ffffffff8059c7fa>] rpc_wait_bit_interruptible+0x0/0x28
Jan  7 22:57:52 narcissus kernel: [<ffffffff8059c7fa>] rpc_wait_bit_interruptible+0x0/0x28
Jan  7 22:57:52 narcissus kernel: [<ffffffff805b397e>] out_of_line_wait_on_bit+0x6e/0x7b
Jan  7 22:57:52 narcissus kernel: [<ffffffff8024b194>] wake_bit_function+0x0/0x2a
Jan  7 22:57:52 narcissus kernel: [<ffffffff8059c7a0>] __rpc_wait_for_completion_task+0x3a/0x40
Jan  7 22:57:52 narcissus kernel: [<ffffffff80306987>] nfs4_wait_for_completion_rpc_task+0x2a/0x47
Jan  7 22:57:52 narcissus kernel: [<ffffffff80306c2b>] _nfs4_do_setlk+0x1a1/0x205
Jan  7 22:57:52 narcissus kernel: [<ffffffff803073f1>] nfs4_proc_lock+0x309/0x40b
Jan  7 22:57:52 narcissus kernel: [<ffffffff802f610d>] do_setlk+0x61/0xbb
Jan  7 22:57:52 narcissus kernel: [<ffffffff802f640d>] nfs_lock+0x1f3/0x204
Jan  7 22:57:52 narcissus kernel: [<ffffffff80293f22>] locks_alloc_lock+0x15/0x17
Jan  7 22:57:52 narcissus kernel: [<ffffffff802943b7>] vfs_lock_file+0x1e/0x2d
Jan  7 22:57:52 narcissus kernel: [<ffffffff80294ec5>] fcntl_setlk+0x123/0x246
Jan  7 22:57:52 narcissus kernel: [<ffffffff8028699c>] fget+0xc0/0x104
Jan  7 22:57:52 narcissus kernel: [<ffffffff802911c5>] sys_fcntl+0x2f9/0x37b
Jan  7 22:57:52 narcissus kernel: [<ffffffff8020ba2a>] tracesys+0xdc/0xe1
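
For readers following the trace: the process is blocked in the lock path that starts at sys_fcntl and fcntl_setlk, i.e. a POSIX byte-range lock request issued from user space. A minimal sketch of such a request is shown below. The function name take_write_lock and the path passed to it are hypothetical, and nothing in this report states which file or lock type Firefox actually used; on a local file system the call simply succeeds, so this is an illustration of the syscall, not a confirmed reproducer.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: request a non-blocking write lock on the whole file.
 * Returns 0 on success, -1 on any error.
 */
int take_write_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock fl = {
        .l_type = F_WRLCK,    /* exclusive (write) lock */
        .l_whence = SEEK_SET, /* offsets are relative to start of file */
        .l_start = 0,
        .l_len = 0,           /* 0 means "through end of file" */
    };

    /* F_SETLK is handled by sys_fcntl, which calls fcntl_setlk. */
    int rc = fcntl(fd, F_SETLK, &fl);

    close(fd); /* closing the descriptor also releases the lock */
    return rc == 0 ? 0 : -1;
}
```

On an affected NFSv4 client, a call of this kind against a file under the NFS mount would be the sort of request that ends up in nfs4_proc_lock, as in the trace.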
Comment 5 Trond Myklebust 2008-01-08 20:58:25 UTC
Created attachment 14371
NFSv4: Give the lock stateid its own sequence queue
Comment 6 Trond Myklebust 2008-01-08 21:02:49 UTC
One more patch. This one ought to fix the client regression. The current code
shares a sequence queue for open and lock requests, so that OPEN and LOCK
are always serialised. That works fine, except when we try to grab both an
open_seqid and a lock_seqid: only one of those two can be at the head of the
queue...
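
The problem described above can be sketched with a toy model. This is not the kernel code: struct seqid, struct seq_queue and every function name below are hypothetical stand-ins, and the only assumption is the rule stated in the comment, that a request may go out only while its sequence id sits at the head of its queue. A LOCK that needs both an open_seqid and a lock_seqid can then never proceed if both live in one queue, but can if each has its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 8

/* Hypothetical stand-in for one allocated sequence id. */
struct seqid {
    int id;
};

/* FIFO of outstanding sequence ids; only the head entry may be sent. */
struct seq_queue {
    struct seqid *slots[QUEUE_MAX];
    size_t len;
};

static void queue_add(struct seq_queue *q, struct seqid *s)
{
    assert(q->len < QUEUE_MAX);
    q->slots[q->len++] = s;
}

static bool at_head(const struct seq_queue *q, const struct seqid *s)
{
    return q->len > 0 && q->slots[0] == s;
}

/* The request needs both of its sequence ids at the head at the same time. */
static bool lock_can_proceed(const struct seq_queue *open_q,
                             const struct seqid *open_seqid,
                             const struct seq_queue *lock_q,
                             const struct seqid *lock_seqid)
{
    return at_head(open_q, open_seqid) && at_head(lock_q, lock_seqid);
}

/* One shared queue: the second id is stuck behind the first. */
bool shared_queue_can_proceed(void)
{
    struct seq_queue shared = {0};
    struct seqid open_seqid = { .id = 1 };
    struct seqid lock_seqid = { .id = 2 };

    queue_add(&shared, &open_seqid);
    queue_add(&shared, &lock_seqid);
    return lock_can_proceed(&shared, &open_seqid, &shared, &lock_seqid);
}

/* Separate queues: each id is at the head of its own queue. */
bool separate_queues_can_proceed(void)
{
    struct seq_queue open_q = {0};
    struct seq_queue lock_q = {0};
    struct seqid open_seqid = { .id = 1 };
    struct seqid lock_seqid = { .id = 2 };

    queue_add(&open_q, &open_seqid);
    queue_add(&lock_q, &lock_seqid);
    return lock_can_proceed(&open_q, &open_seqid, &lock_q, &lock_seqid);
}
```

In this model shared_queue_can_proceed() returns false regardless of the order in which the two ids are queued, which matches the hang; separate_queues_can_proceed() returns true, which is the idea behind giving the lock stateid its own sequence queue.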
Comment 7 Puzin, Dimitri 2008-01-09 09:46:39 UTC
The first patch (#14366) didn't resolve the problem here. Trying #14371 now.
Comment 8 Puzin, Dimitri 2008-01-09 10:13:06 UTC
It works with both fixes applied. Thanks a lot for the fast resolution! Do you want me to test the last patch against vanilla -rc7?
Comment 9 Trond Myklebust 2008-01-09 10:27:03 UTC
There shouldn't be any changes in -rc7 compared to what I understand you've been
testing, but if you have the time, then that would certainly be useful.
Thanks!
Comment 10 Puzin, Dimitri 2008-01-09 11:01:57 UTC
OK, I will prepare the following configuration:

Server: 2.6.24-rc7-git1 (x86_64) no patches 
Client: 2.6.24-rc7-git1 (x86_64) && patch #14371

I'll try to add another i386 client with the same patch level.
Comment 11 Puzin, Dimitri 2008-01-09 17:39:33 UTC
It seems to work fine here in the configuration mentioned above. I think this can be closed. Will both patches reach mainline before the .24 release?
Comment 12 Trond Myklebust 2008-01-10 16:58:16 UTC
I've pushed the second patch to Linus. The first should really be up to Bruce
Fields, since it is a server bug. AFAICS, he has queued it for 2.6.25...
Comment 14 bfields 2008-01-11 16:24:59 UTC
"The first should really be up to Bruce
Fields, since it is a server bug. AFAICS, he has queued it for 2.6.25..."

Yes.  I'm assuming it isn't urgent enough to be pushed to 2.6.24 at this point.