Bug 68371

Summary: NFSv4.0 on RDMA kernel panic when decoding READDIR result
Product: File System
Component: NFS
Reporter: Chuck Lever (chucklever)
Assignee: Chuck Lever (chucklever)
Status: RESOLVED DUPLICATE
Severity: normal
Priority: P1
CC: jlayton, trondmy
Hardware: All
OS: Linux
Kernel Version: 3.12
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Photo of oops on console
Proposed fix

Description Chuck Lever 2014-01-09 19:34:58 UTC
When starting the cthon basic tests on a mount using "-o proto=rdma,vers=4",
the client panics while decoding a READDIR result: xdr_inline_decode calls
memcpy, which dereferences a NULL pointer. The same problem does not occur
with NFSv3.

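The call graph is nfs_readdir_page_filler -> nfs4_decode_dirent ->
xdr_inline_decode -> memcpy. (alloc_pages_current also appears in the trace,
but only as a "?" entry, most likely a stale return address on the stack.)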
Comment 1 Chuck Lever 2014-01-09 19:35:39 UTC
Created attachment 121421 [details]
Photo of oops on console
Comment 2 Chuck Lever 2014-01-13 16:30:46 UTC
I'm working on this issue first because it blocks the ability to run even basic tests.

The cthon basic tests fail immediately, with a hard system reboot (or several oopses ending in a kernel panic), after

commit aa9c2669626ca7e5e5bab28e6caeb583fd40099b
Author: David Quigley <dpquigl@davequigley.com>
Date:   Wed May 22 12:50:44 2013 -0400

    NFS: Client implementation of Labeled-NFS

which first appeared in 3.11-rc1 (identified with "git bisect").

There were several subsequent fixes in this area, including commits b4a2cf76, 4f3cc480, d7067b2d, and d204c5d2.  I will try 3.13-rc8.
Comment 3 Chuck Lever 2014-01-13 19:47:51 UTC
3.13-rc8 fails much the same way as 3.12 did, but this time the full oops output was captured in the kernel log:

Jan 13 14:40:45 manet kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
Jan 13 14:40:45 manet kernel: IP: [<ffffffff8129cc56>] memcpy+0x6/0x110
Jan 13 14:40:45 manet kernel: PGD cbe82067 PUD ccfad067 PMD 0 
Jan 13 14:40:45 manet kernel: Oops: 0000 [#1] SMP 
Jan 13 14:40:45 manet kernel: Modules linked in: nfsv4(F) xprtrdma(F) rdma_cm(F) iw_cm(F) ib_addr(F) nfs(F) fscache(F) lockd(F) ib_ipoib(F) ib_cm(F) ib_uverbs(F) ib_umad(F) mlx4_en(F) mlx4_ib(F) ib_sa(F) ib_mad(F) ib_core(F) mlx4_core(F) autofs4(F) rpcsec_gss_krb5(F) auth_rpcgss(F) sunrpc(F) cpufreq_ondemand(F) ipv6(F) xfs(F) libcrc32c(F) iTCO_wdt(F) iTCO_vendor_support(F) microcode(F) pcspkr(F) i2c_i801(F) sg(F) lpc_ich(F) mfd_core(F) r8169(F) mii(F) snd_hda_codec_hdmi(F) snd_hda_codec_realtek(F) snd_hda_intel(F) snd_hda_codec(F) snd_hwdep(F) snd_seq(F) snd_seq_device(F) snd_pcm(F) snd_timer(F) snd(F) soundcore(F) snd_page_alloc(F) acpi_cpufreq(F) ext4(F) jbd2(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) xhci_hcd(F) i915(F) drm_kms_helper(F) drm(F) i2c_algo_bit(F) i2c_core(F) video(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
Jan 13 14:40:45 manet kernel: CPU: 1 PID: 2979 Comm: rm Tainted: GF            3.13.0-rc8 #1
Jan 13 14:40:45 manet kernel: Hardware name: Shuttle SZ77/FZ77, BIOS 1.10 07/10/2012
Jan 13 14:40:45 manet kernel: task: ffff8800c46fe090 ti: ffff8800b8570000 task.ti: ffff8800b8570000
Jan 13 14:40:45 manet kernel: RIP: 0010:[<ffffffff8129cc56>]  [<ffffffff8129cc56>] memcpy+0x6/0x110
Jan 13 14:40:45 manet kernel: RSP: 0018:ffff8800b8571aa0  EFLAGS: 00010206
Jan 13 14:40:45 manet kernel: RAX: ffff8800b87ad000 RBX: ffff8800b8571ba8 RCX: 0000000000001000
Jan 13 14:40:45 manet kernel: RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8800b87ad000
Jan 13 14:40:45 manet kernel: RBP: ffff8800b8571ac8 R08: ffff8800b8571ba8 R09: 0000000000000000
Jan 13 14:40:45 manet kernel: R10: 000000000000177a R11: 0000000000000000 R12: 0000000000000004
Jan 13 14:40:45 manet kernel: R13: ffff8800b87ad000 R14: 0000000000001000 R15: ffff8800b9972000
Jan 13 14:40:45 manet kernel: FS:  00007f8a5ea4b700(0000) GS:ffff88021f280000(0000) knlGS:0000000000000000
Jan 13 14:40:45 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 14:40:45 manet kernel: CR2: 0000000000000000 CR3: 00000000cbf7c000 CR4: 00000000001407e0
Jan 13 14:40:45 manet kernel: Stack:
Jan 13 14:40:45 manet kernel: ffffffffa05dc1b1 ffff8800b8571c58 ffff8800b8571ba8 ffff880000000000
Jan 13 14:40:45 manet kernel: 6db6db6db6db6db7 ffff8800b8571b28 ffffffffa071f121 ffff8800b8571b28
Jan 13 14:40:45 manet kernel: ffffffff81178a02 0000000000000000 0000000000000000 0000000000000000
Jan 13 14:40:45 manet kernel: Call Trace:
Jan 13 14:40:45 manet kernel: [<ffffffffa05dc1b1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc]
Jan 13 14:40:45 manet kernel: [<ffffffffa071f121>] nfs4_decode_dirent+0x31/0x190 [nfsv4]
Jan 13 14:40:45 manet kernel: [<ffffffff81178a02>] ? alloc_pages_current+0xb2/0x170
Jan 13 14:40:45 manet kernel: [<ffffffffa06a1225>] nfs_readdir_page_filler+0xe5/0x2c0 [nfs]
Jan 13 14:40:45 manet kernel: [<ffffffffa06a1622>] nfs_readdir_xdr_to_array+0x222/0x2e0 [nfs]
Jan 13 14:40:45 manet kernel: [<ffffffffa06a1702>] nfs_readdir_filler+0x22/0x90 [nfs]
Jan 13 14:40:45 manet kernel: [<ffffffff8112f975>] ? add_to_page_cache_lru+0x35/0x50
Jan 13 14:40:45 manet kernel: [<ffffffff8112faee>] __read_cache_page+0x7e/0xe0
Jan 13 14:40:45 manet kernel: [<ffffffffa06a16e0>] ? nfs_readdir_xdr_to_array+0x2e0/0x2e0 [nfs]
Jan 13 14:40:45 manet kernel: [<ffffffffa06a16e0>] ? nfs_readdir_xdr_to_array+0x2e0/0x2e0 [nfs]
Jan 13 14:40:45 manet kernel: [<ffffffff8113079c>] do_read_cache_page+0x3c/0x110
Jan 13 14:40:45 manet kernel: [<ffffffff811308b9>] read_cache_page_async+0x19/0x20
Jan 13 14:40:45 manet kernel: [<ffffffff811308ce>] read_cache_page+0xe/0x20
Jan 13 14:40:45 manet kernel: [<ffffffffa06a1c1e>] nfs_readdir+0x14e/0x3d0 [nfs]
Jan 13 14:40:45 manet kernel: [<ffffffffa071f0f0>] ? decode_getfattr_attrs+0x7b0/0x7b0 [nfsv4]
Jan 13 14:40:45 manet kernel: [<ffffffff811a811d>] iterate_dir+0xad/0xd0
Jan 13 14:40:45 manet kernel: [<ffffffff811a71ca>] ? do_fcntl+0x28a/0x370
Jan 13 14:40:45 manet kernel: [<ffffffff811a82d5>] SyS_getdents+0x95/0x100
Jan 13 14:40:45 manet kernel: [<ffffffff811a83e0>] ? SyS_old_readdir+0xa0/0xa0
Jan 13 14:40:45 manet kernel: [<ffffffff815a7752>] system_call_fastpath+0x16/0x1b
Jan 13 14:40:45 manet kernel: Code: 0f 94 c0 48 83 c4 08 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 6b f8 ff ff 80 7b 25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 <f3> a4 c3 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 
Jan 13 14:40:45 manet kernel: RIP  [<ffffffff8129cc56>] memcpy+0x6/0x110
Jan 13 14:40:45 manet kernel: RSP <ffff8800b8571aa0>
Jan 13 14:40:45 manet kernel: CR2: 0000000000000000
Jan 13 14:40:45 manet kernel: ---[ end trace 60338b53565ae174 ]---
Comment 4 Chuck Lever 2014-01-14 21:15:59 UTC
Created attachment 122121 [details]
Proposed fix
Comment 5 Chuck Lever 2014-04-11 14:28:33 UTC

*** This bug has been marked as a duplicate of bug 68391 ***