Bug 77031 - nfsd hangs on lru_put_end with list_del corruption
Summary: nfsd hangs on lru_put_end with list_del corruption
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-28 16:05 UTC by Tomas Kasparek
Modified: 2014-06-25 05:50 UTC
CC List: 2 users

See Also:
Kernel Version: 3.10.40
Subsystem:
Regression: No
Bisected commit-id:


Attachments
hangup screenshot (150.51 KB, image/jpeg), 2014-05-28 16:29 UTC, Tomas Kasparek
nfsd: don't halt scanning the DRC LRU list when there is an RC_INPROG entry (1.34 KB, patch), 2014-06-05 01:10 UTC, Jeff Layton
nfsd: don't halt scanning the DRC LRU list when there is an RC_INPROG entry (1.81 KB, patch), 2014-06-05 01:12 UTC, Jeff Layton
nfsd: don't halt scanning the DRC LRU list when there is an RC_INPROG entry (1.87 KB, patch), 2014-06-05 01:36 UTC, Jeff Layton
patched nfscache.c from 3.10.44 (15.15 KB, text/x-csrc), 2014-06-21 04:02 UTC, Tomas Kasparek
.config used for the kernel (80.01 KB, application/octet-stream), 2014-06-21 04:03 UTC, Tomas Kasparek

Description Tomas Kasparek 2014-05-28 16:05:59 UTC
system: Supermicro X9DRW-iF, 2x Xeon E5-2630, 16GB RAM
kernel: Linux version 3.10.40 (gcc version 4.6.4 (GCC) )

We have 6 storage servers with 3ware HW RAID (RAID10) serving scratch space on our cluster. Things got better with newer 3.10 releases, as some related problems were apparently fixed, but this one still persists.

Under high load our nfsd servers die with:

May 28 16:34:15 scratch02 kernel: ------------[ cut here ]------------
May 28 16:34:15 scratch02 kernel: WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
May 28 16:34:15 scratch02 kernel: list_del corruption, ffff88046069da90->next is LIST_POISON1 (dead000000100100)
May 28 16:34:15 scratch02 kernel: Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid corete
May 28 16:34:15 scratch02 kernel: CPU: 5 PID: 8657 Comm: nfsd Not tainted 3.10.40 #1
May 28 16:34:15 scratch02 kernel: Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0b 11/28/2013
May 28 16:34:15 scratch02 kernel: ffffffff81578ec7 ffff8802b2e29c48 ffffffff8136a29b ffff8802b2e29c88
May 28 16:34:15 scratch02 kernel: ffffffff8102be40 0000000000000000 ffff8802b2e29c98 ffff88046069da90
May 28 16:34:15 scratch02 kernel: ffff8802b2c12000 ffff88046069da80 00000000000000f8 ffff8802b2e29ce8
May 28 16:34:15 scratch02 kernel: Call Trace:
May 28 16:34:15 scratch02 kernel: [<ffffffff8136a29b>] dump_stack+0x19/0x1b
May 28 16:34:15 scratch02 kernel: [<ffffffff8102be40>] warn_slowpath_common+0x62/0x7b
May 28 16:34:15 scratch02 kernel: [<ffffffff8102beed>] warn_slowpath_fmt+0x41/0x43
May 28 16:34:15 scratch02 kernel: [<ffffffff811d052d>] __list_del_entry+0x8d/0x98
May 28 16:34:15 scratch02 kernel: [<ffffffffa0311521>] lru_put_end+0x1e/0x55 [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffffa0311e85>] nfsd_cache_update+0xca/0x104 [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffff810f9ce4>] ? dput+0x2e/0x168
May 28 16:34:15 scratch02 kernel: [<ffffffffa0308e2d>] nfsd_dispatch+0x142/0x155 [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffffa028b289>] svc_process_common+0x2e7/0x4bd [sunrpc]
May 28 16:34:15 scratch02 kernel: [<ffffffffa028b666>] svc_process+0x109/0x123 [sunrpc]
May 28 16:34:15 scratch02 kernel: [<ffffffffa03087f4>] nfsd+0xc1/0x11a [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffffa0308733>] ? nfsd_destroy+0xba/0xba [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffff81049d9b>] kthread+0xd0/0xd8
May 28 16:34:15 scratch02 kernel: [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
May 28 16:34:15 scratch02 kernel: [<ffffffff8136eaac>] ret_from_fork+0x7c/0xb0
May 28 16:34:15 scratch02 kernel: [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
May 28 16:34:15 scratch02 kernel: ---[ end trace d29a6ee0c23b26e3 ]---

May 28 16:34:15 scratch02 kernel: ------------[ cut here ]------------
May 28 16:34:15 scratch02 kernel: WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
May 28 16:34:15 scratch02 kernel: list_del corruption, ffff880438107b70->next is LIST_POISON1 (dead000000100100)
May 28 16:34:15 scratch02 kernel: Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid corete
May 28 16:34:15 scratch02 kernel: CPU: 2 PID: 8647 Comm: nfsd Tainted: G        W    3.10.40 #1
May 28 16:34:15 scratch02 kernel: Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0b 11/28/2013
May 28 16:34:15 scratch02 kernel: ffffffff81578ec7 ffff8803e8fd1c48 ffffffff8136a29b ffff8803e8fd1c88
May 28 16:34:15 scratch02 kernel: ffffffff8102be40 00000000000a0932 ffff8803e8fd1c98 ffff880438107b70
May 28 16:34:15 scratch02 kernel: ffff8802b3fa6000 ffff880438107b60 00000000000000f8 ffff8803e8fd1ce8
May 28 16:34:15 scratch02 kernel: Call Trace:
May 28 16:34:15 scratch02 kernel: [<ffffffff8136a29b>] dump_stack+0x19/0x1b
May 28 16:34:15 scratch02 kernel: [<ffffffff8102be40>] warn_slowpath_common+0x62/0x7b
May 28 16:34:15 scratch02 kernel: [<ffffffff8102beed>] warn_slowpath_fmt+0x41/0x43
May 28 16:34:15 scratch02 kernel: [<ffffffff811d0065>] ? do_raw_spin_lock+0xac/0xfe
May 28 16:34:15 scratch02 kernel: [<ffffffff811d052d>] __list_del_entry+0x8d/0x98
May 28 16:34:15 scratch02 kernel: [<ffffffffa0311521>] lru_put_end+0x1e/0x55 [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffffa0311e85>] nfsd_cache_update+0xca/0x104 [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffff810f9ce4>] ? dput+0x2e/0x168
May 28 16:34:15 scratch02 kernel: [<ffffffffa0308e2d>] nfsd_dispatch+0x142/0x155 [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffffa028b289>] svc_process_common+0x2e7/0x4bd [sunrpc]
May 28 16:34:15 scratch02 kernel: [<ffffffffa028b666>] svc_process+0x109/0x123 [sunrpc]
May 28 16:34:15 scratch02 kernel: [<ffffffffa03087f4>] nfsd+0xc1/0x11a [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffffa0308733>] ? nfsd_destroy+0xba/0xba [nfsd]
May 28 16:34:15 scratch02 kernel: [<ffffffff81049d9b>] kthread+0xd0/0xd8
May 28 16:34:15 scratch02 kernel: [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
May 28 16:34:15 scratch02 kernel: [<ffffffff8136eaac>] ret_from_fork+0x7c/0xb0
May 28 16:34:15 scratch02 kernel: [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
May 28 16:34:15 scratch02 kernel: ---[ end trace d29a6ee0c23b26e4 ]---

May 28 16:36:03 scratch02 kernel: ------------[ cut here ]------------
May 28 16:36:03 scratch02 kernel: WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98()
May 28 16:36:03 scratch02 kernel: list_del corruption. next->prev should be ffff88042dc1d940, but was ffff8804281c04e0
May 28 16:36:03 scratch02 kernel: Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid corete
May 28 16:36:03 scratch02 kernel: CPU: 0 PID: 8721 Comm: nfsd Tainted: G        W    3.10.40 #1
May 28 16:36:03 scratch02 kernel: Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0b 11/28/2013
May 28 16:36:03 scratch02 kernel: ffffffff81578ec7 ffff8802a74b5c28 ffffffff8136a29b ffff8802a74b5c68
May 28 16:36:03 scratch02 kernel: ffffffff8102be40 0000000000000000 ffff8802a74b5c78 ffff88042dc1d940
May 28 16:36:03 scratch02 kernel: 000000000001d503 ffff88042dc1d930 000000002309c644 ffff8802a74b5cc8
May 28 16:36:03 scratch02 kernel: Call Trace:
May 28 16:36:03 scratch02 kernel: [<ffffffff8136a29b>] dump_stack+0x19/0x1b
May 28 16:36:03 scratch02 kernel: [<ffffffff8102be40>] warn_slowpath_common+0x62/0x7b
May 28 16:36:03 scratch02 kernel: [<ffffffff8102beed>] warn_slowpath_fmt+0x41/0x43
May 28 16:36:03 scratch02 kernel: [<ffffffff811d052d>] __list_del_entry+0x8d/0x98
May 28 16:36:03 scratch02 kernel: [<ffffffffa0311521>] lru_put_end+0x1e/0x55 [nfsd]
May 28 16:36:03 scratch02 kernel: [<ffffffffa0311cd5>] nfsd_cache_lookup+0x533/0x619 [nfsd]
May 28 16:36:03 scratch02 kernel: [<ffffffffa0308d44>] nfsd_dispatch+0x59/0x155 [nfsd]
May 28 16:36:03 scratch02 kernel: [<ffffffffa028b289>] svc_process_common+0x2e7/0x4bd [sunrpc]
May 28 16:36:03 scratch02 kernel: [<ffffffffa028b666>] svc_process+0x109/0x123 [sunrpc]
May 28 16:36:03 scratch02 kernel: [<ffffffffa03087f4>] nfsd+0xc1/0x11a [nfsd]
May 28 16:36:03 scratch02 kernel: [<ffffffffa0308733>] ? nfsd_destroy+0xba/0xba [nfsd]
May 28 16:36:03 scratch02 kernel: [<ffffffff81049d9b>] kthread+0xd0/0xd8
May 28 16:36:03 scratch02 kernel: [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
May 28 16:36:03 scratch02 kernel: [<ffffffff8136eaac>] ret_from_fork+0x7c/0xb0
May 28 16:36:03 scratch02 kernel: [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
May 28 16:36:04 scratch02 kernel: ---[ end trace d29a6ee0c23b26e5 ]---
Comment 1 Tomas Kasparek 2014-05-28 16:29:29 UTC
Created attachment 137601
hangup screenshot

Most of the time we are only able to get a screen via NMI, so some lock probably stays held after the list corruption.
Comment 2 Tomas Kasparek 2014-06-04 05:49:20 UTC
Today another hang, where I was able to get some output (previously it hung 3 times with a black screen, not reacting to the keyboard):

BUG: unable to handle kernel paging request at 0000040000000300
IP: [<ffffffff811d04ee>] __list_del_entry+0x4e/0x98
PGD 0  
Oops: 0000 [#1] SMP
Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid coretemp psmouse sg evdev rtc_cmos 3w_sas mperf processor thermal_sys lpc_ich mfd_core ehci_pci ehci_hcd ixgbe mdio igb hwmon i2c_algo_bit i2c_core ptp pps_core button usbcore usb_common unix
CPU: 1 PID: 2228 Comm: nfsd Not tainted 3.10.40 #1
Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014
task: ffff88045c75b940 ti: ffff88045c390000 task.ti: ffff88045c390000
RIP: 0010:[<ffffffff811d04ee>]  [<ffffffff811d04ee>] __list_del_entry+0x4e/0x98
RSP: 0018:ffff88045c391cf8  EFLAGS: 00010207
RAX: dead000000200200 RBX: ffff88044ca47390 RCX: ffff88044ca47390
RDX: 0000040000000300 RSI: 0000a38601000200 RDI: ffff88044ca47390
RBP: ffff88045c391cf8 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffa0311e76 R11: 0000000000000000 R12: ffff88045c766000
R13: ffff88044ca47380 R14: 00000000000000f8 R15: ffff88042868101c
FS:  0000000000000000(0000) GS:ffff88047fc20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000040000000300 CR3: 000000000160b000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff88045c391d18 ffffffffa0311521 0000000000000000 0000000000000002
 ffff88045c391d58 ffffffffa0311e85 ffffffff810f9ce4 ffff88045c766000
 ffffffffa0330150 ffff880428681018 000000000000001c ffff880428681000
Call Trace:
 [<ffffffffa0311521>] lru_put_end+0x1e/0x55 [nfsd]
 [<ffffffffa0311e85>] nfsd_cache_update+0xca/0x104 [nfsd]
 [<ffffffff810f9ce4>] ? dput+0x2e/0x168
 [<ffffffffa0308e2d>] nfsd_dispatch+0x142/0x155 [nfsd]
 [<ffffffffa028b289>] svc_process_common+0x2e7/0x4bd [sunrpc]
 [<ffffffffa028b666>] svc_process+0x109/0x123 [sunrpc]
 [<ffffffffa03087f4>] nfsd+0xc1/0x11a [nfsd]
 [<ffffffffa0308733>] ? nfsd_destroy+0xba/0xba [nfsd]
 [<ffffffff81049d9b>] kthread+0xd0/0xd8
 [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff8136eaac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
Code: 29 8f 57 81 be 35 00 00 00 eb 51 48 b8 00 02 20 00 00 00 ad de 48 39 c2 75 11 49 89 d0 be 38 00 00 00 48 c7 c2 5d 8f 57 81 eb 31 <48> 8b 02 48 39 f8 74 11 49 89 c0 48 c7 c2 91 8f 57 81 be 3b 00
RIP  [<ffffffff811d04ee>] __list_del_entry+0x4e/0x98
 RSP <ffff88045c391cf8>
CR2: 0000040000000300
---[ end trace 4d74e24f9b4e8b51 ]---
Comment 3 bfields 2014-06-04 20:20:45 UTC
So nfsd_cache_lookup is passing lru_put_end an already-deleted entry.  I'm not sure how that happened.

The last bugfix I remember here is 781c2a5a5f75eacc04663aced0f0f1a648d4f308 "nfsd: when reusing an existing repcache entry, unhash it first", but it looks like that was backported to 3.10.25 as fbaa929d862503b59110081efb57a40213193a6d.
Comment 4 Jeff Layton 2014-06-04 21:04:05 UTC
Hmm...

In both stack traces, it's the lru_put_end call in nfsd_cache_update that's oopsing.

I suppose it's possible that you create a new entry and then go to call pc_func. Before that function returns, though, the entry makes its way all the way to the head of the list, and we end up calling nfsd_reply_cache_free_locked on it from one of the other call sites in nfsd_cache_lookup before nfsd_cache_update is called.

What might be best is to try applying commit a0ef5e19684f0447da9ff0654a12019c484f57ca and see if that fixes it.

That's what I get for trying to be too clever with the allocation and freeing of these things...
Comment 5 Jeff Layton 2014-06-05 01:10:05 UTC
Created attachment 138171
nfsd: don't halt scanning the DRC LRU list when there is an RC_INPROG entry

We might also want to change the pruner logic a bit too. If there are entries
that are eligible for expiration after an entry that's still RC_INPROG, then we
want to allow them to be pruned.

Bruce, maybe something like this patch?
Comment 6 Jeff Layton 2014-06-05 01:12:01 UTC
Created attachment 138181
nfsd: don't halt scanning the DRC LRU list when there is an RC_INPROG entry

...or this patch (which removes the now unused nfsd_cache_entry_expired function).
Comment 7 Jeff Layton 2014-06-05 01:36:29 UTC
Created attachment 138191
nfsd: don't halt scanning the DRC LRU list when there is an RC_INPROG entry

...or maybe this, which is closer to the original logic. We do want to purge entries that are before expiration, if they are !RC_INPROG and we're over the max_drc_entries limit.
Comment 8 Tomas Kasparek 2014-06-05 07:25:32 UTC
I can just confirm that after 3.10.25 with fbaa929d86250 it behaves better, but some hangs still occur. Will try the patch proposed by Jeff (attachment 138191) and let you know. (It will take some time; I'm on vacation next week.) Thanks.
Comment 9 Jeff Layton 2014-06-05 10:32:12 UTC
Note that the patch in comment #7 is not likely to help your hangs. It's just a proposal to make the cache pruner not be blocked by a single entry.

The patch that may help you is this one from mainline kernels:

commit a0ef5e19684f0447da9ff0654a12019c484f57ca
Author: Jeff Layton <jlayton@redhat.com>
Date:   Thu Dec 5 06:00:51 2013 -0500

    nfsd: don't try to reuse an expired DRC entry off the list
    
    Currently when we are processing a request, we try to scrape an expired
    or over-limit entry off the list in preference to allocating a new one
    from the slab.
    
    This is unnecessarily complicated. Just use the slab layer.
    
    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>


...it should apply cleanly, but let me know if it doesn't.
Comment 10 Tomas Kasparek 2014-06-05 13:50:56 UTC
It applied and built without any problem. Will let you know the results. Thanks.
Comment 11 Jeff Layton 2014-06-05 14:02:14 UTC
...and I posted the patch from comment #7 upstream. It's not directly related to this problem, but it may help keep the cache's memory management under better control.
Comment 12 Tomas Kasparek 2014-06-06 07:46:10 UTC
I asked our users to load the machine; here is the result:

Message from syslogd@scratch02 at Jun  6 00:55:04 ...
 kernel:BUG: spinlock lockup suspected on CPU#11, nfsd/1823

Message from syslogd@scratch02 at Jun  6 00:55:04 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9

Message from syslogd@scratch02 at Jun  6 00:55:07 ...
 kernel:BUG: spinlock lockup suspected on CPU#8, nfsd/1845

Message from syslogd@scratch02 at Jun  6 00:55:07 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9

Message from syslogd@scratch02 at Jun  6 00:55:13 ...
 kernel:BUG: spinlock lockup suspected on CPU#0, nfsd/1943

Message from syslogd@scratch02 at Jun  6 00:55:13 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9

Message from syslogd@scratch02 at Jun  6 00:55:14 ...
 kernel:BUG: spinlock lockup suspected on CPU#1, nfsd/1840

Message from syslogd@scratch02 at Jun  6 00:55:14 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9

Message from syslogd@scratch02 at Jun  6 00:55:22 ...
 kernel:BUG: spinlock lockup suspected on CPU#6, nfsd/1905

Message from syslogd@scratch02 at Jun  6 00:55:22 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9


Message from syslogd@scratch02 at Jun  6 00:55:37 ...
 kernel:BUG: spinlock lockup suspected on CPU#10, nfsd/1874

Message from syslogd@scratch02 at Jun  6 00:55:37 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9

Message from syslogd@scratch02 at Jun  6 00:55:37 ...
 kernel:BUG: spinlock lockup suspected on CPU#7, nfsd/1884

Message from syslogd@scratch02 at Jun  6 00:55:37 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9

Message from syslogd@scratch02 at Jun  6 00:57:02 ...
 kernel:BUG: spinlock lockup suspected on CPU#3, kworker/3:0/16105

Message from syslogd@scratch02 at Jun  6 00:57:02 ...
 kernel: lock: cache_lock+0x0/0xffffffffffffb2f0 [nfsd], .magic: dead4ead, .owner: nfsd/1891, .owner_cpu: 9


Still, I see a difference: previously it died quickly, all at once; this time it took about 4 hours until it gave up for good. So I expect the patch solves some problem, but something is still wrong. Apart from these troubles I see high memory (slab) pressure during high load, so the patch from comment #7 may help too, but I don't want to mix several changes for now. Will try to get better output, but that may take some time.

Thanks.
Comment 13 Jeff Layton 2014-06-06 18:47:14 UTC
Ok, thanks for trying it out. I'm stumped then. It might be interesting to see the stack traces from these hung threads.
Comment 14 Tomas Kasparek 2014-06-16 13:34:12 UTC
After my vacation I got this from 3.10.40. Right now, after a reboot, it has been working so far with 3.10.42-rc2. I will have to study the changelogs for the full 3.10.42 and 3.10.43 to see if there is anything that may help:

------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
list_del corruption, ffff88042eee9780->next is LIST_POISON1 (dead000000100100)
Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid coretemp psmouse sg evdev rtc_cmos mperf processor thermal_sys 3w_sas lpc_ich mfd_core ehci_pci ehci_hcd ixgbe mdio igb hwmon i2c_algo_bit i2c_core ptp pps_core button usbcore usb_common unix
CPU: 5 PID: 2168 Comm: nfsd Not tainted 3.10.40 #1
Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014
 ffffffff81578eff ffff88045f887c48 ffffffff8136a29b ffff88045f887c88
 ffffffff8102be40 0000000000000000 ffff88045f887c98 ffff88042eee9780
 ffff88045fc5e000 ffff88042eee9770 00000000000000f8 ffff88045f887ce8
Call Trace:
 [<ffffffff8136a29b>] dump_stack+0x19/0x1b
 [<ffffffff8102be40>] warn_slowpath_common+0x62/0x7b
 [<ffffffff8102beed>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff811d052d>] __list_del_entry+0x8d/0x98
 [<ffffffffa0316521>] lru_put_end+0x1e/0x55 [nfsd]
 [<ffffffffa0316dcb>] nfsd_cache_update+0xca/0x104 [nfsd]
 [<ffffffff810f9ce4>] ? dput+0x2e/0x168
 [<ffffffffa030de2d>] nfsd_dispatch+0x142/0x155 [nfsd]
 [<ffffffffa02b1289>] svc_process_common+0x2e7/0x4bd [sunrpc]
 [<ffffffffa02b1666>] svc_process+0x109/0x123 [sunrpc]
 [<ffffffffa030d7f4>] nfsd+0xc1/0x11a [nfsd]
 [<ffffffffa030d733>] ? nfsd_destroy+0xba/0xba [nfsd]
 [<ffffffff81049d9b>] kthread+0xd0/0xd8
 [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff8136eaac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81049ccb>] ? __init_kthread_worker+0x55/0x55

Would you like me to check any other patches (or config options to get more info)?
Comment 15 Jeff Layton 2014-06-17 00:35:00 UTC
Ok, probably what we should do is go ahead and ask for a0ef5e19684f to be included in stable. Then, if you or anyone is still seeing problems we can take it up from there.

Bruce, sound ok? I guess what we do is send that patch to stable@vger and mention that it should go into v3.10-stable kernels?
Comment 16 bfields 2014-06-17 02:34:26 UTC
(In reply to Jeff Layton from comment #15)
> Bruce, sound ok?

That would be OK with me.

> I guess what we do is send that patch to stable@vger and
> mention that it should go into v3.10-stable kernels?

Yes.  But:

Kasparek, I'm not clear exactly which kernels you're reporting results from.

In comment 12, the reported "lockup suspected" warning is from a kernel (which one?) with a0ef5e19684f applied, but not any other patch?

And you haven't reported any results from kernels with both a0ef5e19684f and Jeff's patch from comment 7 applied?
Comment 17 Tomas Kasparek 2014-06-17 06:56:01 UTC
Original is from 3.10.40 without any patches, comment 12 and comment 14 are for 3.10.40 + a0ef5e19684f. Did not apply anything else according to comment 9. Right now, the machine is running 3.10.42-rc2 + a0ef5e19684f (security fix).
Comment 18 Tomas Kasparek 2014-06-20 13:04:48 UTC
Hello, I got a hang with 3.10.42-rc2 + a0ef5e19684f: no response from the keyboard, black screen.
After that I ran 3.10.44 + a0ef5e19684f, tried to load the machine enough, and got:

 ------------[ cut here ]------------
 WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
 list_del corruption, ffff88044f5e01d0->next is LIST_POISON1 (dead000000100100)
 Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid coretemp psmouse sg evdev rtc_cmos mperf processor thermal_sys lpc_ich mfd_core 3w_sas ehci_pci ehci_hcd ixgbe mdio igb hwmon i2c_algo_bit i2c_core ptp pps_core button usbcore usb_common unix
 CPU: 5 PID: 2225 Comm: nfsd Not tainted 3.10.44 #1
 Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014
 ffffffff8157935e ffff88045c4b9c48 ffffffff8136aafb ffff88045c4b9c88
 ffffffff8102be78 0000000000000000 ffff88045c4b9c98 ffff88044f5e01d0
 ffff88045c8ae000 ffff88044f5e01c0 0000000000000070 ffff88045c4b9ce8
 Call Trace:
 [<ffffffff8136aafb>] dump_stack+0x19/0x1b
 [<ffffffff8102be78>] warn_slowpath_common+0x62/0x7b
 [<ffffffff8102bf25>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff811d0c6d>] __list_del_entry+0x8d/0x98
 [<ffffffffa0315521>] lru_put_end+0x1e/0x55 [nfsd]
 [<ffffffffa0315dcb>] nfsd_cache_update+0xca/0x104 [nfsd]
 [<ffffffffa0319ae6>] ? nfs3svc_encode_readres+0xc9/0xc9 [nfsd]
 [<ffffffffa030ce2d>] nfsd_dispatch+0x142/0x155 [nfsd]
 [<ffffffffa02b0289>] svc_process_common+0x2e7/0x4bd [sunrpc]
 [<ffffffffa02b0666>] svc_process+0x109/0x123 [sunrpc]
 [<ffffffffa030c7f4>] nfsd+0xc1/0x11a [nfsd]
 [<ffffffffa030c733>] ? nfsd_destroy+0xba/0xba [nfsd]
 [<ffffffff81049e0b>] kthread+0xd0/0xd8
 [<ffffffff81049d3b>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff8136f2ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81049d3b>] ? __init_kthread_worker+0x55/0x55
---[ end trace de826c3b219809dc ]---
------------[ cut here ]------------
WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98()
list_del corruption. next->prev should be ffff8803e4feb080, but was ffff880467a0b4e0
Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid coretemp psmouse sg evdev rtc_cmos mperf processor thermal_sys lpc_ich mfd_core 3w_sas ehci_pci ehci_hcd ixgbe mdio igb hwmon i2c_algo_bit i2c_core ptp pps_core button usbcore usb_common unix
CPU: 0 PID: 2132 Comm: nfsd Tainted: G        W    3.10.44 #1
Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014
 ffffffff8157935e ffff880462409be8 ffffffff8136aafb ffff880462409c28
 ffffffff8102be78 0000000000000002 ffff880462409c38 ffff8803e4feb080
 00000000060c148a 0000000000000000 00000000ab49c61d ffff880462409c88
Call Trace:
 [<ffffffff8136aafb>] dump_stack+0x19/0x1b
 [<ffffffff8102be78>] warn_slowpath_common+0x62/0x7b
 [<ffffffff8102bf25>] warn_slowpath_fmt+0x41/0x43
 [<ffffffffa0315478>] ? nfsd_reply_cache_free_locked+0x28/0x84 [nfsd]
 [<ffffffff811d0c6d>] __list_del_entry+0x8d/0x98
 [<ffffffff811d0c86>] list_del+0xe/0x2e
 [<ffffffffa03154b4>] nfsd_reply_cache_free_locked+0x64/0x84 [nfsd]
 [<ffffffffa03155a9>] prune_cache_entries+0x51/0xa4 [nfsd]
 [<ffffffffa0315912>] nfsd_cache_lookup+0x170/0x55f [nfsd]
 [<ffffffffa030cd44>] nfsd_dispatch+0x59/0x155 [nfsd]
 [<ffffffffa02b0289>] svc_process_common+0x2e7/0x4bd [sunrpc]
 [<ffffffffa02b0666>] svc_process+0x109/0x123 [sunrpc]
 [<ffffffffa030c7f4>] nfsd+0xc1/0x11a [nfsd]
 [<ffffffffa030c733>] ? nfsd_destroy+0xba/0xba [nfsd]
 [<ffffffff81049e0b>] kthread+0xd0/0xd8
 [<ffffffff81049d3b>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff8136f2ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81049d3b>] ? __init_kthread_worker+0x55/0x55
---[ end trace de826c3b219809dd ]---
general protection fault: 0000 [#1] SMP
Modules linked in: nfsv4 nfsd nfs_acl auth_rpcgss oid_registry autofs4 nfsv3 nfs lockd sunrpc ipv6 binfmt_misc xfs exportfs libcrc32c dm_mod af_packet hid_generic usbhid hid coretemp psmouse sg evdev rtc_cmos mperf processor thermal_sys lpc_ich mfd_core 3w_sas ehci_pci ehci_hcd ixgbe mdio igb hwmon i2c_algo_bit i2c_core ptp pps_core button usbcore usb_common unix
CPU: 0 PID: 2160 Comm: nfsd Tainted: G        W    3.10.44 #1             
Hardware name: Supermicro X9DRW/X9DRW, BIOS 3.0c 03/24/2014
task: ffff880460e12af0 ti: ffff880460926000 task.ti: ffff880460926000
RIP: 0010:[<ffffffffa0315487>]  [<ffffffffa0315487>] nfsd_reply_cache_free_locked+0x37/0x84 [nfsd]
RSP: 0018:ffff880460927cc8  EFLAGS: 00010286
RAX: ffff8803e4feb700 RBX: ffff8803e4feb070 RCX: 000000018020001d
RDX: dead000000200200 RSI: ffffea000f2e4d00 RDI: ffff88046f003900
RBP: ffff880460927cd8 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffffa03158fb R11: 0000000000000000 R12: 00000000060c1490
R13: 0000000000000000 R14: 000000001d9695f6 R15: ffff880450ffbbd0
FS:  0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f980aa61020 CR3: 0000000347b64000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 dead0000001000f0 ffff880460927cf8 ffffffffa03155a9
 ffff880460e12af0 ffff880460e1c000 ffff880460927d58 ffffffffa0315912
 ffff880460927d38 ffffffff8104f93b 0000000760e1c000 0000000600000003
Call Trace:
 [<ffffffffa03155a9>] prune_cache_entries+0x51/0xa4 [nfsd]
 [<ffffffffa0315912>] nfsd_cache_lookup+0x170/0x55f [nfsd]
 [<ffffffff8104f93b>] ? groups_alloc+0x37/0xba
 [<ffffffffa030cd44>] nfsd_dispatch+0x59/0x155 [nfsd]
 [<ffffffffa02b0289>] svc_process_common+0x2e7/0x4bd [sunrpc]
 [<ffffffffa02b0666>] svc_process+0x109/0x123 [sunrpc]
 [<ffffffffa030c7f4>] nfsd+0xc1/0x11a [nfsd]
 [<ffffffffa030c733>] ? nfsd_destroy+0xba/0xba [nfsd]
 [<ffffffff81049e0b>] kthread+0xd0/0xd8
 [<ffffffff81049d3b>] ? __init_kthread_worker+0x55/0x55
 [<ffffffff8136f2ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81049d3b>] ? __init_kthread_worker+0x55/0x55
Code: 21 02 75 18 48 8b 7f 60 48 85 ff 74 0f 48 8b 43 68 29 05 9d 2c 02 00 e8 56 6c dc e0 48 8b 53 08 48 85 d2 74 2a 48 8b 03 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 be 00 01 10 00 00 00 ad de 48 bf
RIP  [<ffffffffa0315487>] nfsd_reply_cache_free_locked+0x37/0x84 [nfsd]
 RSP <ffff880460927cc8>
---[ end trace de826c3b219809de ]---

Should I try adding the patch from comment 7 to improve cache behavior? Or is there anything else I can try (kernel debug settings, etc.)?

Thanks
Comment 19 Tomas Kasparek 2014-06-20 13:32:50 UTC
I was finally able to set up a load that makes the system die in a short time, so I can better test any patches.
Comment 20 Jeff Layton 2014-06-20 14:12:49 UTC
Strange.

Clearly the cache entry attached to the rqstp got freed while it was being processed; I just don't see the bug that would make that happen.

You're certainly welcome to try the patch in comment #7. I doubt it'll make much difference, but it's unlikely to hurt anything.
Comment 21 Jeff Layton 2014-06-20 14:24:43 UTC
Can you attach this file from the sources you're building your kernel from?

    fs/nfsd/nfscache.c

...I'd like to take a look to see exactly how this file looks after your patching.
Comment 22 Tomas Kasparek 2014-06-21 04:02:10 UTC
(In reply to Jeff Layton from comment #20)
> You're certainly welcome to try the patch in comment #7. I doubt it'll make
> much difference, but it's unlikely to hurt anything.

OK, will try next week.
Comment 23 Tomas Kasparek 2014-06-21 04:02:56 UTC
Created attachment 140541
patched nfscache.c from 3.10.44
Comment 24 Tomas Kasparek 2014-06-21 04:03:54 UTC
Created attachment 140551
.config used for the kernel
Comment 25 Jeff Layton 2014-06-21 11:47:19 UTC
Ok, I think I may see the problem. Twisted logic (written by me, sadly) is to blame:

static inline bool
nfsd_cache_entry_expired(struct svc_cacherep *rp)
{
	return rp->c_state != RC_INPROG &&
	       time_after(jiffies, rp->c_timestamp + RC_EXPIRE);
}

/*
 * Walk the LRU list and prune off entries that are older than RC_EXPIRE.
 * Also prune the oldest ones when the total exceeds the max number of entries.
 */
static void
prune_cache_entries(void)
{
	struct svc_cacherep *rp, *tmp;

	list_for_each_entry_safe(rp, tmp, &lru_head, c_lru) {
		if (!nfsd_cache_entry_expired(rp) &&
		    num_drc_entries <= max_drc_entries)
			break;
		nfsd_reply_cache_free_locked(rp);
	}

...

Let's assume that there's an RC_INPROG entry at the head of the c_lru list and we're over the max_drc_entries limit. nfsd_cache_entry_expired returns false for it, but because there are still more entries than the max, the loop doesn't break, and the in-progress entry gets freed.

The patch in comment #7 should fix that, actually. With it, RC_INPROG entries are always skipped, so we should never free one now. Can you test with that and see if it helps?
Comment 26 Tomas Kasparek 2014-06-21 17:50:59 UTC
Thanks, will test it on Monday. Bye.
Comment 27 Tomas Kasparek 2014-06-25 05:50:55 UTC
I can confirm that with the patch from comment 7 the system has been running for quite a long time without any problem. Seems to be solved.

May I ask that both patches be submitted to stable (3.10 is what I need most)?

Thanks for all your help.
