Bug 6948

Summary: Initio 2430 sbp2 controller not properly handled
Product: Drivers
Reporter: piergiorgio.sartor
Component: IEEE1394
Assignee: Stefan Richter (stefanr)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.17.7
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: [RFT PATCH 2.6.17.x] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"
[PATCH 2.6.18-rc4-mm1 2/8] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"

Description piergiorgio.sartor 2006-08-02 12:34:24 UTC
Most recent kernel where this bug did not occur:
Distribution: Fedora Core 5 (with fedora and vanilla kernels)
Hardware Environment: P3-S 1400MHz, 512MB, intel chipset
02:09.0 FireWire (IEEE 1394): Texas Instruments TSB12LV23 IEEE-1394 Controller
(prog-if 10 [OHCI])
        Subsystem: Texas Instruments Unknown device 8010
        Flags: bus master, medium devsel, latency 32, IRQ 21
        Memory at ec800000 (32-bit, non-prefetchable) [size=2K]
        Memory at ec000000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 1

Problem Description:
The Initio 2430 is a FW800 sbp2/3 controller, backward compatible
with FW100/200/400. It is currently connected in FW400 mode, with
a FW400<->FW800 cable (unit date 2005).
When the device is plugged in, the following output is produced:

scsi3 : SBP-2 IEEE-1394
ieee1394: sbp2: Workarounds for node 0-00:1023: 0x2 (firmware_revision 0x000242,
vendor_id 0x001010, model_id 0x000000)
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: Node 0-00:1023: Max speed [S400] - Max payload [2048]
  Vendor: Initio    Model: SP2014N           Rev: 2.42
  Type:   Direct-Access                      ANSI SCSI revision: 00
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
sda: Mode Sense: 86 0b 00 02
sda: missing header in MODE_SENSE response
SCSI device sda: drive cache: write back
SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
sda: Write Protect is off
sda: Mode Sense: 86 0b 00 02
sda: missing header in MODE_SENSE response
SCSI device sda: drive cache: write back
 sda: unknown partition table
sd 3:0:0:0: Attached scsi disk sda

Note the "missing header in MODE_SENSE response", which does
not sound too good.
Anyway, the problem occurs when transferring data to the device:
sometimes the copy locks up with:

ieee1394: sbp2: sbp2util_node_write_no_wait failed.

ieee1394: sbp2: aborting sbp2 command
sd 3:0:0:0:
        command: cdb[0]=0x2a: 2a 00 11 35 18 d8 00 00 f8 00
ieee1394: sbp2: sbp2util_node_write_no_wait failed.

The copy then restarts and, it seems, there is no data corruption.
From what I have read, this seems to be "common" and is sometimes
blamed on the firmware of the unit.
Serializing or not makes little difference; the errors only appear
in sequence instead of overlapped.
Disabling workarounds does not help.
Initio does not seem to have firmware updates for this unit.
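
For reference, the aborted command in the log above can be decoded by hand.
The following is a minimal sketch, assuming the standard 10-byte SCSI
READ(10)/WRITE(10) CDB layout (opcode, flags, 32-bit big-endian LBA, group,
16-bit big-endian transfer length, control). The function name decode_rw10
is made up for this illustration and is not part of any kernel or library API.

```python
import struct


def decode_rw10(cdb: bytes) -> dict:
    """Decode a 10-byte SCSI READ(10)/WRITE(10) command descriptor block."""
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    # Big-endian: opcode, flags, LBA (4 bytes), group, length (2 bytes), control
    opcode, flags, lba, group, length, control = struct.unpack(">BBIBHB", cdb)
    names = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if opcode not in names:
        raise ValueError("not a READ(10)/WRITE(10) opcode: 0x%02x" % opcode)
    return {
        "command": names[opcode],
        "lba": lba,          # first logical block of the transfer
        "blocks": length,    # number of logical blocks to transfer
    }


# CDB bytes copied from the "aborting sbp2 command" log entry above
cdb = bytes.fromhex("2a 00 11 35 18 d8 00 00 f8 00")
info = decode_rw10(cdb)
print(info)
```

Under these assumptions the aborted command is a WRITE(10) of 248 blocks
(124 KiB with the 512-byte sectors reported above) starting at LBA 288692440,
which is consistent with the failure showing up during a copy to the device.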

Together with the other, I'm planning to test this thing
under Windows and see what it says...

Steps to reproduce:
cp some_files /mnt/initio2430, dmesg...
Comment 1 Stefan Richter 2006-08-02 16:28:10 UTC
This is a known problem in sbp2. Whenever "sbp2util_node_write_no_wait failed"
appears in the log, it almost certainly means that sbp2 was unable to acquire a
free transaction label. I plan to rework sbp2's routine which emits that message
so that it can sleep until a transaction label becomes available. This will take
some time but I'm on it. Until then you could check whether a different
filesystem makes this condition less likely.
http://sourceforge.net/mailarchive/forum.php?thread_id=25299507&forum_id=5389
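
To illustrate the difference between the current behaviour and the planned
one, here is a minimal toy model, not the actual sbp2 or ieee1394 code. It
assumes a fixed pool of 64 labels (the IEEE 1394 transaction label field is
6 bits wide); the class and method names are invented for this sketch.
try_acquire() models a non-blocking request that simply fails when every
label is in flight, and acquire() models a request that sleeps until a label
is released.

```python
import threading

NUM_TLABELS = 64  # IEEE 1394 transaction labels are 6 bits wide


class TlabelPool:
    """Toy model of a pool of transaction labels (hypothetical names)."""

    def __init__(self, size=NUM_TLABELS):
        self._free = list(range(size))
        self._cond = threading.Condition()

    def try_acquire(self):
        """Non-blocking: return a label, or None if all are in flight."""
        with self._cond:
            return self._free.pop() if self._free else None

    def acquire(self, timeout=None):
        """Blocking: sleep until a label is released or the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._free), timeout):
                return None  # timed out, still no free label
            return self._free.pop()

    def release(self, label):
        """Return a label to the pool and wake one waiting caller."""
        with self._cond:
            self._free.append(label)
            self._cond.notify()


pool = TlabelPool(size=2)          # tiny pool so exhaustion is easy to show
first = pool.try_acquire()
second = pool.try_acquire()
exhausted = pool.try_acquire()     # None: the "write_no_wait failed" case

# Another thread completes a transaction shortly afterwards
timer = threading.Timer(0.05, pool.release, args=(first,))
timer.start()
waited = pool.acquire(timeout=2.0)  # sleeps until the label comes back
timer.join()
```

In this model the non-blocking call has to give up as soon as the pool is
empty, while the blocking call succeeds once an outstanding transaction
completes. The blocking variant is only usable from a context that is allowed
to sleep.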

The "missing header in MODE_SENSE response" business is just a sign of firmware
flaws of the Initio bridge. I have an INIC-2430 based disk too which has the same
flaw. The Linux SCSI core was hardened to work around that, so it's no problem
anymore. And it's unrelated to the "sbp2util_node_write_no_wait failed" bug.
Comment 2 Stefan Richter 2006-08-10 03:25:58 UTC
Side note. As reported in http://bugzilla.kernel.org/show_bug.cgi?id=6947#c13 an
INIC-1530 is affected too, and comment #1 refers to an OXFW922 bridge.

My plan as stated in the mailarchive may take more time, therefore I will try to
come up with a simpler temporary solution that is quicker to implement and can
be merged sooner.
Comment 3 Stefan Richter 2006-08-12 03:55:30 UTC
Created attachment 8766
[RFT PATCH 2.6.17.x] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"
Comment 4 Stefan Richter 2006-08-14 11:28:49 UTC
Created attachment 8787
[PATCH 2.6.18-rc4-mm1 2/8] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"

This improved update has been posted to LKML. I will try to get it, together
with the patches it directly depends on, into Linux 2.6.18(-rcX).
Comment 5 Stefan Richter 2006-10-01 15:39:47 UTC
The fix went into Linux 2.6.18-git16.