Bug 209213 - ib_srp: Already connected to target port with
Summary: ib_srp: Already connected to target port with
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Infiniband/RDMA
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_infiniband-rdma
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-09 17:21 UTC by Yuri K
Modified: 2020-09-11 11:08 UTC

See Also:
Kernel Version: 3.10.0-1127.19.1.el7.x86_64
Subsystem:
Regression: No
Bisected commit-id:



Description Yuri K 2020-09-09 17:21:22 UTC
Hi.

Not sure where to report this, but I'm being annoyed by these messages in my dmesg/kernel log:

[88003.088015] scsi host1425: ib_srp: Already connected to target port with id_ext=c0fec0fec0fec0fe;ioc_guid=c0fec0fec0fec0fe;initiator_ext=0000000000000000
[88065.090095] scsi host1426: ib_srp: Already connected to target port with id_ext=c0fec0fec0fec0fe;ioc_guid=c0fec0fec0fec0fe;initiator_ext=0000000000000000
[88127.148561] scsi host1427: ib_srp: Already connected to target port with id_ext=c0fec0fec0fec0fe;ioc_guid=c0fec0fec0fec0fe;initiator_ext=0000000000000000
and so on.

No idea why it keeps trying to connect to c0fe again and again. I'm running EL7, but I guess SRP development is so old and stable that it doesn't matter at this point. As you may have noticed, I changed `srpt_service_guid` on the target to be more fun.

This is what SRP daemon args look like:
/usr/sbin/srp_daemon --systemd -e -c -j mlx4_0:1 -R 60

# ibsrpdm -v
Using device mlx4_0 port 1
Device mlx4_0 was found
CQ was created with 10 CQEs
CQ was created with 1 CQEs
MR was created with addr=0xa7db50, lkey=0x18010300,
QP was created, QP number=0x22e
QPs were modified to RTS
Advanced SM, performing a capability query
discover Targets for P_key ffff (index 0)
enter do_port
IO Unit Info:
    port LID:        0005
    port GID:        fe800000000000000002c90300fb9c31
    change ID:       0001
    max controllers: 0x10

    controller[  1]
        GUID:      c0fec0fec0fec0fe
        vendor ID: 000002
        device ID: 001003
        IO class : 0100
        ID:        Linux SRP target
        service entries: 1
            service[  0]: c0fec0fec0fec0fe / SRP.T10:c0fec0fec0fec0fe
Found an SRP target with id_ext c0fec0fec0fec0fe - check if it is already connected

discover Targets for P_key ffff (index 0)
enter do_port
IO Unit Info:
    port LID:        0003
    port GID:        fe8000000000000024be05ffffb2a031
    change ID:       0001
    max controllers: 0x10

    controller[  1]
        GUID:      c0fec0fec0fec0fe
        vendor ID: 000002
        device ID: 001003
        IO class : 0100
        ID:        Linux SRP target
        service entries: 1
            service[  0]: c0fec0fec0fec0fe / SRP.T10:c0fec0fec0fec0fe
Found an SRP target with id_ext c0fec0fec0fec0fe - check if it is already connected

discover Targets for P_key ffff (index 0)
enter do_port
IO Unit Info:
    port LID:        0013
    port GID:        fe80000000000000e0071bffff81f7c1
    change ID:       0001
    max controllers: 0x10

    controller[  1]
        GUID:      e0071bffff81f7c0
        vendor ID: 000002
        device ID: 001007
        IO class : 0100
        ID:        Linux SRP target
        service entries: 1
            service[  0]: e0071bffff81f7c0 / SRP.T10:e0071bffff81f7c0
Found an SRP target with id_ext e0071bffff81f7c0 - check if it is already connected

--
Thanks.
Comment 1 Bart Van Assche 2020-09-10 02:43:34 UTC
The ib_srp kernel driver checks whether a connection to an SRP target already exists as follows:

static bool srp_conn_unique(struct srp_host *host,
			    struct srp_target_port *target)
{
	struct srp_target_port *t;
	bool ret = false;

	if (target->state == SRP_TARGET_REMOVED)
		goto out;

	ret = true;

	spin_lock(&host->target_lock);
	list_for_each_entry(t, &host->target_list, list) {
		if (t != target &&
		    target->id_ext == t->id_ext &&
		    target->ioc_guid == t->ioc_guid &&
		    target->initiator_ext == t->initiator_ext) {
			ret = false;
			break;
		}
	}
	spin_unlock(&host->target_lock);

out:
	return ret;
}

The source code of ibsrpdm is available at https://github.com/linux-rdma/rdma-core/blob/master/srp_daemon/srp_daemon.c. Your report probably means that there is a mismatch between the code in add_non_exist_target() and the code in the kernel. Please use the kernel code as the reference, since it implements the following requirement from the SRP specification (see also http://www.t10.org/cgi-bin/ac.pl?t=f&f=srp2r06.pdf):

The MULTICHANNEL ACTION field (see table 15) indicates how an SRP target port handles existing RDMA channels associated with the same I_T nexus. [ ... ] The INITIATOR PORT IDENTIFIER field and the TARGET PORT IDENTIFIER field specify the I_T nexus that shall be associated with this RDMA channel.
Comment 2 Yuri K 2020-09-10 12:15:51 UTC
I don't think I understand. The target (ib_srpt) uses 2 cards, with 2 ports in them, with different node GUIDs and port GUIDs. ib_srpt exposes the service over 2 ports to the IB switch for reliability.

Initiators are connected to IB switch and have the following config in /etc/srp_daemon.conf:
a       ioc_guid=c0fec0fec0fec0fe
d

I'd like to get rid of those messages in dmesg (and, as I understand it, the endless srp_daemon attempts to connect to something that is already connected).

All this SCSI stuff is hard for me.

Thanks.
Comment 3 Bart Van Assche 2020-09-10 23:55:10 UTC
In your message I read "connection attempts". That may be a misunderstanding. What I think is happening is that srp_daemon periodically tells the ib_srp driver to log in again (through a different IB port than the one it is currently logged in through) and that the ib_srp driver reports "already logged in" (through the other port). I don't think that any network communication is triggered when ib_srp reports "Already connected".
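
To make the behaviour described in this comment concrete, here is a minimal userspace sketch. It is not kernel code. `struct target_id`, `struct login_table`, `conn_unique()` and `try_login()` are hypothetical names made up for this illustration; only the three compared fields come from srp_conn_unique() as quoted in comment 1. It assumes that a repeated login request carries the same id_ext, ioc_guid and initiator_ext and differs only in the port the target was discovered on, which is not part of the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct srp_target_port:
 * only the three fields that the uniqueness check compares. */
struct target_id {
	uint64_t id_ext;
	uint64_t ioc_guid;
	uint64_t initiator_ext;
};

/* Hypothetical fixed-size table standing in for host->target_list. */
#define MAX_TARGETS 8

struct login_table {
	struct target_id entries[MAX_TARGETS];
	size_t count;
};

/* Same comparison as srp_conn_unique(): a candidate is unique only if no
 * existing login has the same (id_ext, ioc_guid, initiator_ext) triple.
 * The port on which the target was discovered is not part of the key. */
bool conn_unique(const struct login_table *tbl, const struct target_id *cand)
{
	assert(tbl != NULL && cand != NULL);
	for (size_t i = 0; i < tbl->count; i++) {
		const struct target_id *t = &tbl->entries[i];

		if (t->id_ext == cand->id_ext &&
		    t->ioc_guid == cand->ioc_guid &&
		    t->initiator_ext == cand->initiator_ext)
			return false;
	}
	return true;
}

/* Returns true if the login was recorded, false if it was rejected as
 * "already connected" or the table is full. */
bool try_login(struct login_table *tbl, const struct target_id *cand)
{
	if (tbl->count == MAX_TARGETS || !conn_unique(tbl, cand))
		return false;
	tbl->entries[tbl->count++] = *cand;
	return true;
}
```

In this model, a first login with id_ext=c0fec0fec0fec0fe, ioc_guid=c0fec0fec0fec0fe and initiator_ext=0 is accepted. Every later request with the same three values is rejected without touching the table, whichever target port it was discovered on. A target with different identifiers, such as e0071bffff81f7c0, would not collide.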
Comment 4 Yuri K 2020-09-11 11:08:58 UTC
Why would it tell ib_srp to log in again if it's already logged in? I'm an end user, so I don't care about the internals; I just need to get rid of those messages. Also, the scsi host number keeps increasing, so I assume it consumes resources of some sort and might run out of numbers. This is clearly a bug, whether in srp_daemon or in the ib_srp module.

Thanks.
