Bug 4829

Summary: Problem with sym53c8xx_2 and target devices that doesn't support SYNC/PPR
Product: SCSI Drivers Reporter: Daniel Forsgren (daniel.forsgren)
Component: sym53c8xxAssignee: Matthew Wilcox (matthew)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, matthew
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.12.1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Patch to avoid endless sequence of PPRs

Description Daniel Forsgren 2005-07-01 15:18:22 UTC
I've encountered a problem when using the sym53c8xx_2 driver with a
scsi target device that doesn't support synchronous transfer (sync),
nor PPR:

In response to an inquiry command, my scsi device will return inquiry
data where the SYNC bit (of byte 7) is cleared. Thus, the "sdtr" flag
(in struct scsi_device) will never be set (see
drivers/scsi/scsi_scan.c: scsi_add_lun()).
The "sdtr" flag is later on checked by the spi_support_sync()
function, in sym_check_goals() (see sym53c8xx_2/sym_hipd.c).

This is where the root of the problem is, I think:

As the target device doesn't support sync, the sym_check_goals()
function will clear all fields within the sym_trans structure
(including goal->period).

As a result, the sym_prepare_nego() function (also in sym_hipd.c) will
try to use PPR (parallel protocol request) for negotiation (as
goal->period < 0).

Now, my target device doesn't support PPR (extended message 0x04)
either. So it will reject this message. Thus we will end up in
sym_nego_default() (also in sym_hipd.c).

The sym_nego_default() function will basically set
tp->tgoal.check_nego = 1, thus restarting the negotiation (with the
comment that "A target that understands a PPR message should never
reject it...").

But - a target that *doesn't* understand PPR will reject it, and in
that case it doesn't make sense to just restart negotiation. It will
eventually end up in sym_prepare_nego() again, and as the target still
doesn't understand SYNC, it will go for PPR again. And again. And
again.

The result being that my target device receives an endless sequence of
PPR messages.

So, to summarize:

In the current implementation, the sym_prepare_nego() function will
always go for PPR, as soon as the SYNC bit is cleared (which is a bit
odd).

And it also looks odd to me to try PPR all over again (and again) with
a target device that already has rejected it once. A target that
understands PPR will of course not reject it. But a target device that
*doesn't* understand it will. And if that target device doesn't
support SYNC either, then we will basically end up in an endless loop
of PPRs here.
Comment 1 Daniel Forsgren 2005-07-01 15:27:39 UTC
Typo:

   > As a result, the sym_prepare_nego() function (also in sym_hipd.c) will
   > try to use PPR (parallel protocol request) for negotiation (as
   > goal->period < 0).

should read:

    As a result, the sym_prepare_nego() function (also in sym_hipd.c) will
    try to use PPR (parallel protocol request) for negotiation (as
    goal->period < 0xa).
Comment 2 Andrew Morton 2005-07-28 23:12:27 UTC
Has there been any movement on this?  Are
you able to test 2.6.13-rc4?

Thanks.
Comment 3 Daniel Forsgren 2005-08-12 04:32:56 UTC
I tried this with 2.6.13-rc6, and as far as I can see, the problem
remains.

I didn't step through the code to see exactly what happens in rc6, but
the symptoms are the same (a never ending sequence of PPRs).
Comment 4 Daniel Forsgren 2005-08-12 06:51:01 UTC
Created attachment 5615 [details]
Patch to avoid endless sequence of PPRs

I would propose the attached patch (though I haven't thought about it
/ tested it extensively, so there are probably a lot of things that
I've overlooked. It all boils down to this:

As described above, it will always go for PPR as soon as period is
less than 0xa0. And that makes sense. If I'm not mistaken, SPI-5 says
that:

    "PPR messages shall be supported by ports supporting transfer
     period factors less than 0Ah or supporting any of the protocol
     options."

But, we shouldn't set period = 0 just because spi_support_sync()
returns false. (not supporting sync doesn't necessarily means that
we're supporting PPR).
Comment 5 Anonymous Emailer 2005-08-12 07:38:09 UTC
Reply-To: matthew@wil.cx

On Fri, Aug 12, 2005 at 06:51:02AM -0700, bugme-daemon@kernel-bugs.osdl.org wrote:
> Created an attachment (id=5615)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=5615&action=view)
> Patch to avoid endless sequence of PPRs
> 
> I would propose the attached patch (though I haven't thought about it
> / tested it extensively, so there are probably a lot of things that
> I've overlooked. It all boils down to this:

I think you've found the problem!  Could you try setting it to 0xff
instead of leaving it unchanged, please?  I'll incorporate that into
the next sym2 release if it works.

Comment 6 Daniel Forsgren 2005-08-12 09:00:56 UTC
I will try out period = 0xFF. But I just came to think of another
issue (given the implementation of the sym_prepare_nego() function):

Setting goal->period = 0xFF means that we will probably end up with
nego = NS_SYNC, right? That is, we will send a SDTR message (extended
message 0x01) to a target that (in this case) already (in its inquiry
page) has declared that it doesn't support SYNC.

It shouldn't really cause any problems, because the target will simply
reject it, and we don't risk ending up with an endless sequence of
SDTRs (in the same way as PPRs before), as for SDTRs,
sym_nego_default() doesn't restart negotiation (as it does for PPRs).

But, still, we will be sending out an SDTR (with period = 0xFF) to a
target that we know doesn't support sync. (And period = 0xFF would in
this case mean a request for Fast-5, right?. And it's not necessarily
the case that we wanted to run Fast-5, even if the target *had*
supported SYNC (which it, in this case, didn't).

Wouldn't it make sense (in sym_prepare_nego) to add a check so that we
don't go for NS_SYNC with targets that don't support it?
Comment 7 Marc Collin 2005-09-06 13:19:15 UTC
does this bug can do some reset on the bus (scsi)? 
Comment 8 Andrew Morton 2005-09-14 22:51:42 UTC
Matthew, Daniel's patch remains unmerged and afaik the
bug remains in 2.6.14-rc1.

What to do?
Comment 9 Daniel Forsgren 2005-09-15 00:40:17 UTC
Yes, 2.6.14-rc1 shows the same problem. I tried out period = 0xFF (comment #5
above), and as a result, I seem to get the sync requests described in #6.
Comment 10 Daniel Forsgren 2005-09-15 00:55:07 UTC
In response to #7: No, I can't see how this could cause a bus reset.
Comment 11 Matthew Wilcox 2005-12-15 04:07:45 UTC
This patch is now in Linus' git tree and will be in the next release after
2.6.15-rc5.