Bug 30322

Summary: BUG in xs_tcp_setup_socket
Product: File System
Component: NFS
Reporter: Ben Hutchings (bhutchings)
Assignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
CC: florian
Hardware: All
OS: Linux
Kernel Version: 2.6.38-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 21782
Attachments: SUNRPC: Fix a bug in xs_create_sock()

Description Ben Hutchings 2011-03-01 20:34:04 UTC
[Previously posted at http://article.gmane.org/gmane.linux.nfs/38949 including a patch]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05b5e08>] xs_tcp_setup_socket+0x348/0x4a0 [sunrpc]
PGD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:00.0/irq
CPU 0 
Modules linked in: netconsole configfs nfs lockd fscache nfs_acl auth_rpcgss ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 sunrpc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext2 dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm_intel kvm uinput bnx2 sg dcdbas serio_raw pcspkr iTCO_wdt iTCO_vendor_support i5k_amb i5000_edac edac_core ioatdma dca sfc mtd mdio shpchp ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core [last unloaded: speedstep_lib]
Pid: 10, comm: kworker/0:1 Not tainted 2.6.38-rc5 #3 Dell Inc. PowerEdge 2950/0CX396
RIP: 0010:[<ffffffffa05b5e08>]  [<ffffffffa05b5e08>] xs_tcp_setup_socket+0x348/0x4a0 [sunrpc]
RSP: 0018:ffff880126c11da0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801220ee000 RCX: 0000000100016909
RDX: 000000000000001e RSI: ffff880123065a80 RDI: 0000000000000000
RBP: ffff880126c11df0 R08: f018000000000000 R09: febef9edc98abe03
R10: 0000000000000480 R11: 0000000000000000 R12: ffff8801220ee680
R13: ffffe8ffffc0dd00 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8800cf800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 000000012191f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 10, threadinfo ffff880126c10000, task ffff880126c0f560)
Stack:
 0000000000014e80 ffff880126c0f560 ffff880126c0faf0 00000000c17d7567
 ffff880126c0faf8 ffff8801273cbc40 ffff8800cf811040 ffffe8ffffc0dd00
 ffffffffa05b5ac0 0000000000000000 ffff880126c11e50 ffffffff8107b884
Call Trace:
 [<ffffffffa05b5ac0>] ? xs_tcp_setup_socket+0x0/0x4a0 [sunrpc]
 [<ffffffff8107b884>] process_one_work+0x124/0x430
 [<ffffffff8107e1d1>] worker_thread+0x181/0x3c0
 [<ffffffff8107e050>] ? worker_thread+0x0/0x3c0
 [<ffffffff810828c6>] kthread+0x96/0xa0
 [<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81082830>] ? kthread+0x0/0xa0
 [<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
Code: 0f 1f 00 0f 84 3a ff ff ff e9 4b fe ff ff 0f 1f 44 00 00 41 83 fd 91 0f 85 3c fe ff ff 66 0f 1f 44 00 00 e9 1b ff ff ff 0f 1f 00 <4d> 8b 6e 20 4d 8d bd 68 01 00 00 4c 89 ff e8 45 99 f0 e0 49 8b 
RIP  [<ffffffffa05b5e08>] xs_tcp_setup_socket+0x348/0x4a0 [sunrpc]
 RSP <ffff880126c11da0>
CR2: 0000000000000020
---[ end trace 6efc43bb9b1264f8 ]---

The code dump alone is pretty useless as the IP is at the start of a
block, but having disassembled the entire sunrpc module it appears that
it corresponds to:

static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
{
        struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);

        if (!transport->inet) {
                struct sock *sk = sock->sk; /* <-- this line */

                write_lock_bh(&sk->sk_callback_lock);

I think the bug is that xs_create_sock() returns 0 if xs_bind() fails.

This bug appears to have been introduced in 2.6.37 by:

commit b65c0310611af73569f94c526a1e2323d99b380a
Author: Pavel Emelyanov <xemul@parallels.com>
Date:   Mon Oct 4 16:53:46 2010 +0400

    sunrpc: Factor out udp sockets creation

commit 22f793268de3b4dff8abfcd873ba7afc1f34224f
Author: Pavel Emelyanov <xemul@parallels.com>
Date:   Mon Oct 4 16:54:26 2010 +0400

    sunrpc: Factor out v4 sockets creation

commit 22d44a7d8a03456aa6d0a047c051aa28728e6ecd
Author: Pavel Emelyanov <xemul@parallels.com>
Date:   Mon Oct 4 16:54:55 2010 +0400

    sunrpc: Factor out v6 sockets creation
Comment 1 Trond Myklebust 2011-03-01 23:34:37 UTC
Created attachment 49842
SUNRPC: Fix a bug in xs_create_sock()
Comment 2 Trond Myklebust 2011-03-01 23:35:53 UTC
Does the above patch suffice to fix the Oops?
Comment 3 Ben Hutchings 2011-03-01 23:45:47 UTC
(In reply to comment #2)
> Does the above patch suffice to fix the Oops?

WTF, you take a week to respond and then send back a patch I already wrote?
Comment 4 Ben Hutchings 2011-03-01 23:48:14 UTC
Note, this bug report was copied from mail purely so that it can be tracked as a regression.
Comment 5 Trond Myklebust 2011-03-02 01:04:41 UTC
Sorry I missed the deadline.  I've been traveling for 2 weeks.

That patch was one I wrote a week ago for a different bug report. If you already had a patch, then why didn't you attach it to the bug report?
Comment 6 Ben Hutchings 2011-03-02 01:08:42 UTC
(In reply to comment #5)
> Sorry I missed the deadline.  I've been traveling for 2 weeks.
> 
> That patch was one I wrote a week ago for a different bug report. If you
> already had a patch, then why didn't you attach it to the bug report?

Because I've been told (I forget who by) that patches shouldn't be posted on Bz if they have already been posted on the appropriate mailing list.
Comment 7 Trond Myklebust 2011-03-02 04:23:03 UTC
As concerns the mailing list, I won't start scouring that for patches until
I get back home, which won't be for another few days.

Posting just part of a report wastes everybody's time. If you have a fix, then
attach the damned thing, so that I don't have to go looking for it.
Comment 8 Florian Mickler 2011-03-05 01:33:26 UTC
Patch: http://article.gmane.org/gmane.linux.nfs/38949
Comment 9 Ben Hutchings 2011-03-10 01:07:32 UTC
*** Bug 30222 has been marked as a duplicate of this bug. ***
Comment 10 Ben Hutchings 2011-03-10 01:07:50 UTC
*** Bug 30232 has been marked as a duplicate of this bug. ***
Comment 11 Ben Hutchings 2011-03-10 01:08:03 UTC
*** Bug 30252 has been marked as a duplicate of this bug. ***
Comment 12 Ben Hutchings 2011-03-10 01:08:17 UTC
*** Bug 30272 has been marked as a duplicate of this bug. ***
Comment 13 Ben Hutchings 2011-03-10 01:08:33 UTC
*** Bug 30292 has been marked as a duplicate of this bug. ***
Comment 14 Florian Mickler 2011-03-30 20:37:25 UTC
The fix has been merged in mainline for v2.6.38:

commit 4cea288aaf0e11647880cc487350b1dc45d9febc
Author: Ben Hutchings <bhutchings@solarflare.com>
Date:   Tue Feb 22 21:54:34 2011 +0000

    sunrpc: Propagate errors from xs_bind() through xs_create_sock()