Bug 1872 - sbp2 serialize_io=0 buggy (was: data corruption on ipod using sbp2 module)
Summary: sbp2 serialize_io=0 buggy (was: data corruption on ipod using sbp2 module)
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: i386 Linux
Importance: P2 low
Assignee: Stefan Richter
URL:
Keywords:
Duplicates: 6093
Depends on:
Blocks: 10046
 
Reported: 2004-01-15 05:27 UTC by Thomas Margraf
Modified: 2008-02-19 12:00 UTC
CC List: 3 users

See Also:
Kernel Version: all
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg output log (14.98 KB, text/plain)
2004-03-09 14:01 UTC, elias
From syslog: Linux 2.6.11.6, Audigy, Oxford 911 ATA adapter, nastiness (419.22 KB, text/plain)
2005-04-21 11:31 UTC, R

Description Thomas Margraf 2004-01-15 05:27:10 UTC
Distribution: gentoo 
Hardware Environment: athlon-tbird 1.3ghz 
Software Environment: kernel 2.6.1 
Problem Description: 
dmesg output:  
sbp2: $Rev$ Ben Collins <bcollins@debian.org> 
scsi1 : SCSI emulation for IEEE-1394 SBP-2 Devices 
ieee1394: sbp2: Logged into SBP-2 device 
ieee1394: sbp2: Node 0-00:1023: Max speed [S400] - Max payload [2048] 
  Vendor: Apple     Model: iPod              Rev: 1.21 
  Type:   Direct-Access                      ANSI SCSI revision: 02 
SCSI device sda: 19531260 512-byte hdwr sectors (10000 MB) 
sda: test WP failed, assume Write Enabled 
sda: asking for cache data failed 
sda: assuming drive cache: write through 
 /dev/scsi/host1/bus0/target0/lun0: p1 p2 
Attached scsi removable disk sda at scsi1, channel 0, id 0, lun 0 
Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0,  type 0 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0b f2 00 00 80 00 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0c 72 00 00 80 00 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0c f2 00 00 80 00 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0d 72 00 00 80 00 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0d f2 00 00 80 00 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0e 72 00 00 80 00 
ieee1394: sbp2: aborting sbp2 command 
0x2a 00 00 02 0e f2 00 00 80 00 
 
 
Steps to reproduce:
    modprobe sbp2
    mount /dev/sda2 /mnt/ipod
    copy files to /mnt/ipod
 
The above worked flawlessly on a 2.4.18 kernel.
Comment 1 elias 2004-03-09 14:01:33 UTC
Created attachment 2305
dmesg output log

I had this problem with 2.6.1 too. With 2.6.3 I can now connect to the iPod, but
the connection breaks after a short time and the iPod is reconnected as sdc
while it was sdb before, and so on. This leaves my gtkpod app waiting forever,
since I can't manage to umount the lost sdb2 connection to remount sdc2 on
/mnt/ipod.
I have compiled ieee1394, ohci1394, sbp2 and sd_mod directly into the kernel. If
you need the kernel config or something else, just ask.
By the way, all this was working in 2.4.20+, but getting the device connected was
far more work, so thanks for that part of your work!
Comment 2 R 2005-04-21 11:31:54 UTC
Created attachment 4968
From syslog: Linux 2.6.11.6, Audigy, Oxford 911 ATA adapter, nastiness

Messages from the current session of 2.6.11.6, playing with Firewire drive and
Audigy.
Comment 3 R 2005-04-21 11:41:13 UTC
Comment on attachment 4968
From syslog: Linux 2.6.11.6, Audigy, Oxford 911 ATA adapter, nastiness

I'm finding very similar problems with my Alpha, Audigy, Oxford 911 ATA
adapter and Linux 2.6.11.6, except that I get a slightly different set of
errors.

In this case, too, Linux 2.4 had no discernible problem. Also, the same drive
has no problem with my brother's Athlon with a more common FireWire chip and
Linux 2.6.10.
Comment 4 Stefan Richter 2005-07-24 03:17:37 UTC
Thomas and "R", please update to Linux 2.6.12 and/or try the suggestions at
http://www.linux1394.org/faq.php#sbp2abort

The problem reported by Elias (bug 2278) is different; its cause is a
fast succession of bus resets.
Comment 5 Stefan Richter 2005-10-01 01:55:35 UTC
FYI, sbp2 was switched to a safer default mode in Linux 2.6.14-rc3. But the
underlying cause of the problem class "works in 2.4 but not 2.6" has not been
identified yet AFAIK.
Comment 6 Stefan Richter 2006-04-23 10:46:23 UTC
*** Bug 6093 has been marked as a duplicate of this bug. ***
Comment 7 Stefan Richter 2006-08-16 00:48:04 UTC
Some relevant patches have been recently submitted to Andrew Morton's -mm
patchset. I made them also available as patches for released kernels at
http://me.in-berlin.de/~s5r6/linux1394/updates/. Some of these fixes may
be released with Linux 2.6.18.

Code inspection has shown that there are more changes needed. Also, one of my
SBP-2 devices (a TI StorageLynx based HDD) still shows command abortions if sbp2
is loaded with serialize_io=0. (serialize_io=1 is the default since Linux
2.6.14.) All other HDDs and CD/DVD-RWs I have access to work OK.

I hope to get the missing changes implemented during the next weeks.
Comment 8 Stefan Richter 2006-08-16 00:54:06 UTC
PS: The patches I referred to reside in "v146_experimental" or later versions at
http://me.in-berlin.de/~s5r6/linux1394/updates/.
Comment 9 Stefan Richter 2006-11-01 14:41:27 UTC
I believe the following needs to be done to resolve this (assuming that the
devices and their firmware are bug-free):
 - If the device has the "ordered" flag set in its Logical_Unit_Number ROM
entry, completion of a task means completion of all previous tasks.
 - If src==1 in a status block, the related ORB must not be DMA-unmapped and
reused until status for a subsequent ORB is received.
 - I suspect all of the protocol handling should be moved out of atomic context
into a kernel thread (using the kthread API or workqueue API), so that all
usages of sbp2util_node_write_no_wait() can be replaced by true transactions.
This would be a continuation of the solution to bug 6948.
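
To make the first two requirements concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not sbp2 code: the names (struct logical_unit, lu_submit, lu_status) are hypothetical, ORBs are tracked by index in submission order, "DMA-unmapped" is only a flag, each ORB is assumed to receive at most one status block, and ORBs completed implicitly through the "ordered" flag are assumed safe to unmap at once. The third item (moving protocol handling into a kernel thread) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in sbp2.c. */
#define MAX_ORBS 16

struct orb {
    bool completed;  /* status has been reported to the SCSI layer */
    bool dma_mapped; /* ORB memory still mapped, device may read it */
    bool held;       /* kept mapped because its status had src==1 */
};

struct logical_unit {
    bool ordered;              /* "ordered" flag from the Logical_Unit_Number entry */
    struct orb orbs[MAX_ORBS]; /* ORBs indexed in submission order */
    int count;                 /* number of ORBs submitted so far */
};

static void lu_init(struct logical_unit *lu, bool ordered)
{
    lu->ordered = ordered;
    lu->count = 0;
}

/* Append a new ORB; it stays mapped until a status block releases it. */
static int lu_submit(struct logical_unit *lu)
{
    assert(lu->count < MAX_ORBS);
    lu->orbs[lu->count] = (struct orb){ .completed = false,
                                        .dma_mapped = true,
                                        .held = false };
    return lu->count++;
}

/* Handle a status block for ORB i carrying the given src value. */
static void lu_status(struct logical_unit *lu, int i, int src)
{
    assert(i >= 0 && i < lu->count);
    assert(!lu->orbs[i].completed); /* model assumption: one status per ORB */

    for (int j = 0; j < i; j++) {
        /* Rule 2 (release): this is status for a subsequent ORB, so an
         * earlier ORB held back by src==1 may now be unmapped. */
        if (lu->orbs[j].held) {
            lu->orbs[j].held = false;
            lu->orbs[j].dma_mapped = false;
        }
        /* Rule 1: on an ordered LU, completing ORB i completes all earlier
         * ORBs. Assumption: those may be unmapped immediately. */
        if (lu->ordered && !lu->orbs[j].completed) {
            lu->orbs[j].completed = true;
            lu->orbs[j].dma_mapped = false;
        }
    }

    lu->orbs[i].completed = true;

    /* Rule 2 (hold): src==1 keeps ORB i mapped; otherwise unmap it now. */
    if (src == 1)
        lu->orbs[i].held = true;
    else
        lu->orbs[i].dma_mapped = false;
}
```

For example, on an ordered unit with three ORBs queued, a status for the second one completes the first as well, while a src==1 status for the third keeps it mapped until a status for a fourth ORB arrives. On an unordered unit, the same status for the second ORB leaves the first one pending.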
Comment 10 Stefan Richter 2006-12-05 12:06:08 UTC
setting "Severity=low" to reflect the diminishing practical impact of this bug
since Linux 2.6.14
Comment 11 Natalie Protasevich 2007-09-04 00:51:34 UTC
Thomas and others,
Is the problem still present with current kernel (hopefully containing patches that Stefan mentioned in #7-8)?
Thanks.
Comment 12 Stefan Richter 2007-09-04 01:42:41 UTC
Natalie, thanks for the heads up.  I don't have an iPod but there weren't any bug reports of this kind anymore for a year or so, because sbp2 was changed to safer defaults in Linux 2.6.14.  For a non-default mode of operation though (sbp2 module loaded with parameter serialize_io=0), the following remarks from comment #9 still need to be addressed:
   - If the device has the "ordered" flag set in its Logical_Unit_Number
     ROM entry, completion of a task means completion of all previous
     tasks.
   - If src==1 in a status block, the related ORB must not be
     DMA-unmapped and reused until status for a subsequent ORB is received.
See also the "TODO" comment at the top of drivers/ieee1394/sbp2.c:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/ieee1394/sbp2.c;h=a81ba8fca0db168314a2ce21aee0cecb0fd4f567#l38
Comment 13 Stefan Richter 2007-09-04 01:45:39 UTC
PS:  Patches mentioned in earlier comments have been merged in mainline Linux soon after the comments.
Comment 14 Stefan Richter 2008-02-19 12:00:44 UTC
The problem does not occur if sbp2 is used with default parameters.  Also, the alternative firewire-sbp2 driver does not feature this bug.  There are currently no resources to fix sbp2 when used with the non-default parameter serialize_io=0.
