Bug 14249

Summary: BUG: oops in gss_validate on 2.6.31
Product: File System
Reporter: Rafael J. Wysocki (rjw)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX
Severity: normal
CC: brian, harri, rico
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 13615
Attachments: NFSv4: Fix two unbalanced put_rpccred() issues.

Description Rafael J. Wysocki 2009-09-29 21:10:18 UTC
Subject    : BUG: oops in gss_validate on 2.6.31
Submitter  : Bastian Blank <bastian@waldi.eu.org>
Date       : 2009-09-16 10:29
References : http://marc.info/?l=linux-kernel&m=125309700417283&w=4
Handled-By : Trond Myklebust <trond.myklebust@fys.uio.no>

This entry is being used for tracking a regression from 2.6.30.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Brian J. Murrell 2009-10-23 17:23:18 UTC
I think I have the same thing going on here:

[253207.745918] BUG: unable to handle kernel NULL pointer dereference at 00000010
[253207.749013] IP: [<fb27d24b>] gss_validate+0x7b/0x1d0 [auth_rpcgss]
[253207.753994] *pde = 94fb8067 
[253207.753994] Oops: 0000 [#1] SMP 
[253207.753994] last sysfs file: /sys/devices/pci0000:00/0000:00:0b.1/usb1/1-3/1-3:1.0/host6/target6:0:0/6:0:0:0/block/sde/sde1/stat
[253207.753994] Modules linked in: xt_multiport binfmt_misc bridge stp bnep vboxnetflt vboxdrv tun des_generic cbc autofs4 video output rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc nf_conntrack_ipv6 xt_hl ipt_LOG xt_limit ipt_REJECT xt_tcpudp x
[253207.842462] 
[253207.842462] Pid: 4036, comm: rpciod/1 Tainted: P           (2.6.31-14-generic #48-Ubuntu) System Product Name
[253207.842462] EIP: 0060:[<fb27d24b>] EFLAGS: 00010296 CPU: 1
[253207.842462] EIP is at gss_validate+0x7b/0x1d0 [auth_rpcgss]
[253207.842462] EAX: 00000004 EBX: 00000000 ECX: f6abde80 EDX: f28128e4
[253207.842462] ESI: 00000025 EDI: ec7b6fc4 EBP: f6abdea4 ESP: f6abde40
[253207.842462]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[253207.842462] Process rpciod/1 (pid: 4036, ti=f6abc000 task=f6a33ed0 task.ti=f6abc000)
[253207.842462] Stack:
[253207.842462]  f6abde58 c049ca59 00000001 00000001 f28128e4 e43210c0 f6abde94 00000004
[253207.842462] <0> 00000000 00000000 f6abde8c c0121270 00000000 02020202 00000004 00000004
[253207.842462] <0> 00000025 f28128e4 f6abde94 00000004 00000100 85030000 ec7b6fc4 e43210c0
[253207.842462] Call Trace:
[253207.842462]  [<c049ca59>] ? net_tx_action+0x59/0x130
[253207.842462]  [<c0121270>] ? ack_apic_level+0x60/0x230
[253207.842462]  [<fb242bf2>] ? rpcauth_checkverf+0x22/0x60 [sunrpc]
[253207.842462]  [<c014b60f>] ? irq_exit+0x2f/0x70
[253207.842462]  [<c0104f10>] ? do_IRQ+0x50/0xc0
[253207.842462]  [<fb23b2df>] ? rpc_verify_header+0x1af/0x5c0 [sunrpc]
[253207.842462]  [<c01039b0>] ? common_interrupt+0x30/0x40
[253207.842462]  [<fb23b807>] ? call_decode+0x117/0x220 [sunrpc]
[253207.842462]  [<fb33dfd0>] ? nfs4_xdr_dec_read+0x0/0x60 [nfs]
[253207.842462]  [<fb242022>] ? __rpc_execute+0x92/0x1f0 [sunrpc]
[253207.842462]  [<fb2421ab>] ? rpc_async_schedule+0xb/0x10 [sunrpc]
[253207.842462]  [<c0157a7e>] ? run_workqueue+0x6e/0x140
[253207.842462]  [<fb2421a0>] ? rpc_async_schedule+0x0/0x10 [sunrpc]
[253207.842462]  [<c0157bd8>] ? worker_thread+0x88/0xe0
[253207.842462]  [<c015c280>] ? autoremove_wake_function+0x0/0x40
[253207.842462]  [<c0157b50>] ? worker_thread+0x0/0xe0
[253207.842462]  [<c015bf8c>] ? kthread+0x7c/0x90
[253207.842462]  [<c015bf10>] ? kthread+0x0/0x90
[253207.842462]  [<c0104007>] ? kernel_thread_helper+0x7/0x10
[253207.842462] Code: 55 b4 8b 40 64 0f c8 89 45 f0 8d 45 f0 89 45 e4 8d 45 e4 c7 45 e8 04 00 00 00 e8 31 cf fc ff 8b 55 ac 8d 4d dc 89 75 dc 89 55 e0 <8b> 43 10 8d 55 b4 e8 2a 11 00 00 3d 00 00 0c 00 74 6b 85 c0 75 
[253207.842462] EIP: [<fb27d24b>] gss_validate+0x7b/0x1d0 [auth_rpcgss] SS:ESP 0068:f6abde40
[253207.842462] CR2: 0000000000000010
[253207.845072] ---[ end trace ad285e035a384c5f ]---
[253208.107509] BUG: unable to handle kernel NULL pointer dereference at 00000010
[253208.107518] IP: [<fb27d24b>] gss_validate+0x7b/0x1d0 [auth_rpcgss]
[253208.107534] *pde = aee17067 
[253208.107537] Oops: 0000 [#2] SMP 
[253208.107540] last sysfs file: /sys/devices/pci0000:00/0000:00:0b.1/usb1/1-3/1-3:1.0/host6/target6:0:0/6:0:0:0/block/sde/sde1/stat
[253208.107544] Modules linked in: xt_multiport binfmt_misc bridge stp bnep vboxnetflt vboxdrv tun des_generic cbc autofs4 video output rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc nf_conntrack_ipv6 xt_hl ipt_LOG xt_limit ipt_REJECT xt_tcpudp x
[253208.107607] 
[253208.107611] Pid: 4033, comm: rpciod/0 Tainted: P      D    (2.6.31-14-generic #48-Ubuntu) System Product Name
[253208.107614] EIP: 0060:[<fb27d24b>] EFLAGS: 00010296 CPU: 0
[253208.107620] EIP is at gss_validate+0x7b/0x1d0 [auth_rpcgss]
[253208.107622] EAX: 00000004 EBX: 00000000 ECX: f64f7e80 EDX: d80a68e4
[253208.107625] ESI: 00000025 EDI: eb716c44 EBP: f64f7ea4 ESP: f64f7e40
[253208.107627]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[253208.107630] Process rpciod/0 (pid: 4033, ti=f64f6000 task=f6a34b60 task.ti=f64f6000)
[253208.107632] Stack:
[253208.107633]  c0127c38 f64f7e58 c05707da f7070000 d80a68e4 e43210c0 f64f7e94 00000004
[253208.107637] <0> 00000000 00000000 00000292 ecb2d204 00000000 c0150c2b 00000004 00000004
[253208.107641] <0> 00000025 d80a68e4 f64f7e94 00000004 2942dfc4 88030000 eb716c44 e43210c0
[253208.107646] Call Trace:
[253208.107655]  [<c0127c38>] ? default_spin_lock_flags+0x8/0x10
[253208.107660]  [<c05707da>] ? _spin_lock_irqsave+0x2a/0x40
[253208.107664]  [<c0150c2b>] ? mod_timer+0xcb/0x140
[253208.107695]  [<fb242bf2>] ? rpcauth_checkverf+0x22/0x60 [sunrpc]
[253208.107709]  [<fb23b2df>] ? rpc_verify_header+0x1af/0x5c0 [sunrpc]
[253208.107723]  [<fb23b807>] ? call_decode+0x117/0x220 [sunrpc]
[253208.107756]  [<fb33dfd0>] ? nfs4_xdr_dec_read+0x0/0x60 [nfs]
[253208.107772]  [<fb242022>] ? __rpc_execute+0x92/0x1f0 [sunrpc]
[253208.107806]  [<fb2421ab>] ? rpc_async_schedule+0xb/0x10 [sunrpc]
[253208.107811]  [<c0157a7e>] ? run_workqueue+0x6e/0x140
[253208.107836]  [<fb2421a0>] ? rpc_async_schedule+0x0/0x10 [sunrpc]
[253208.107849]  [<c0157bd8>] ? worker_thread+0x88/0xe0
[253208.107858]  [<c015c280>] ? autoremove_wake_function+0x0/0x40
[253208.107867]  [<c0157b50>] ? worker_thread+0x0/0xe0
[253208.107870]  [<c015bf8c>] ? kthread+0x7c/0x90
[253208.107873]  [<c015bf10>] ? kthread+0x0/0x90
[253208.107877]  [<c0104007>] ? kernel_thread_helper+0x7/0x10
[253208.107878] Code: 55 b4 8b 40 64 0f c8 89 45 f0 8d 45 f0 89 45 e4 8d 45 e4 c7 45 e8 04 00 00 00 e8 31 cf fc ff 8b 55 ac 8d 4d dc 89 75 dc 89 55 e0 <8b> 43 10 8d 55 b4 e8 2a 11 00 00 3d 00 00 0c 00 74 6b 85 c0 75 
[253208.107898] EIP: [<fb27d24b>] gss_validate+0x7b/0x1d0 [auth_rpcgss] SS:ESP 0068:f64f7e40
[253208.107906] CR2: 0000000000000010
[253208.107909] ---[ end trace ad285e035a384c60 ]---

Any progress on this issue?
Comment 2 Trond Myklebust 2009-10-23 19:15:50 UTC
*** Bug 14453 has been marked as a duplicate of this bug. ***
Comment 3 Trond Myklebust 2009-10-23 20:02:20 UTC
Is this only happening for NFSv4 mounted filesystems?

If so does the following patch help?
Comment 4 Trond Myklebust 2009-10-23 20:03:01 UTC
Created attachment 23508
NFSv4: Fix two unbalanced put_rpccred() issues.

Commits 29fba38b (nfs41: lease renewal) and fc01cea9 (nfs41: sequence
operation) introduce a couple of put_rpccred() calls on credentials for
which there is no corresponding get_rpccred().
Comment 5 Brian J. Murrell 2009-10-24 02:07:57 UTC
(In reply to comment #3)
> Is this only happening for NFSv4 mounted filesystems?

I'm not positive, as it has only happened to me once (today was the first and only occurrence), and I rolled back to a 2.6.30 kernel afterwards.

> If so does the following patch help?

I wonder if Ubuntu would be willing to add that patch as a "sauce" patch for their 9.10 RC users to test.
Comment 6 Trond Myklebust 2009-10-24 14:35:00 UTC
If it fixes the bug, I'm planning on sending it to Greg for the stable series. Ubuntu and all other distros can pick it up from there.

I do need it tested first, though. Hopefully, at least one of you can try it out and see if it suffices to prevent the bug from reoccurring.
Comment 7 Rico Rommel 2009-10-24 20:44:35 UTC
The patch is working fine here (NFSv4).
Comment 8 Harald Dunkel 2009-10-26 10:01:28 UTC
Does this problem affect the NFS clients only, or is there a chance for a similar problem for NFS servers?
Comment 9 Trond Myklebust 2009-10-26 11:56:14 UTC
It won't affect a server that isn't also running as a client. However, it will affect anything that is running an NFSv4 client.
Comment 10 Harald Dunkel 2009-10-30 09:37:36 UTC
(In reply to comment #6)
> If it fixes the bug, I'm planning on sending it to Greg for the stable
> series. Ubuntu, and all other distros can pick it up from there.

Works for me too (on top of 2.6.31.5), but of course the problem came up just by chance. 

Will this patch be included in 2.6.31.x?
Comment 11 Trond Myklebust 2009-10-30 11:48:26 UTC
Yes. I've already sent it to the stable kernel maintainers for consideration.
Comment 12 Brian J. Murrell 2009-11-11 13:20:20 UTC
(In reply to comment #11)
> Yes. I've already sent it to the stable kernel maintainers for consideration.

How do we know when/if it gets included in the stable kernel?  Will this bug get automatically updated when/if that happens?
Comment 13 Trond Myklebust 2009-11-11 22:17:49 UTC
The patch has been merged into 2.6.31.6, so I'm closing this bug...