Bug 12725

Summary: login into firewire disk fails with 64K PAGE_SIZE kernel on ppc64
Product: Drivers Reporter: Olaf Hering (olaf)
Component: IEEE1394Assignee: drivers_ieee1394
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, benh, stefanr
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29-rc5 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 10046    

Description Olaf Hering 2009-02-17 09:30:56 UTC
Distribution: openSuSE 11.1-ppc
Hardware Environment: PowerMac
Problem Description:
spb2 fails to login into a Maxtor OneTouch with Error logging into target on an
Apple PowerMac G5.

works with 4K PAGE_SIZE test kernel.
fails with 64K PAGE_SIZE kernel.

ohci1394 0001:03:0e.0: enabling device (0000 -> 0002)
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[40]  MMIO=[80200000-802007ff] 
Max Packet=[4096]  IR/IT contexts=[8/8]
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[000a95fffebab0e8]
ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Node changed: 0-00:1023 -> 0-01:1023


...
motherboard     : PowerMac7,2 MacRISC4 Power Macintosh
detected as     : 336 (PowerMac G5)

tested with SLES11 kernel, and with kernel-vanilla 2.6.27 and 2.6.29-rc5
ftp://ftp.suse.com/pub/projects/kernel/kotd/SLE11_BRANCH/ppc64/
Comment 1 Anonymous Emailer 2009-02-17 10:01:41 UTC
Reply-To: stefanr@s5r6.in-berlin.de

(Olaf, if you reply by mail, please Cc
linux1394-devel@lists.sourceforge.net.)

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12725
...
> spb2 fails to login into a Maxtor OneTouch with Error logging into target on
> an
> Apple PowerMac G5.
> 
> works with 4K PAGE_SIZE test kernel.
> fails with 64K PAGE_SIZE kernel.

Interesting failure mode.

> ohci1394 0001:03:0e.0: enabling device (0000 -> 0002)
> ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[40]  MMIO=[80200000-802007ff] 
> Max Packet=[4096]  IR/IT contexts=[8/8]
> ieee1394: Host added: ID:BUS[0-00:1023]  GUID[000a95fffebab0e8]
> ieee1394: Error parsing configrom for node 0-00:1023
> ieee1394: Node changed: 0-00:1023 -> 0-01:1023

Actually it looks not that sbp2 login fails but that ieee1394 is unable
to read the configuration ROM of the device.  Otherwise there would be a
"Node added: ..." message for the disk eventually.  Please confirm this by
$ ls /sys/bus/ieee1394/devices/

And the cause of this could for example be a programming issue in
ohci1394.  Could very well be that some assumptions in that driver are
violated if PAGE_SIZE != 4k.

Please also test asynchronous I/O with either gscanbus or firecontrol.
In gscanbus, just check if gscanbus is able to detect the node type and
to show the GUID etc. when you click on the icon.  Or if you can't
easily obtain gscanbus for openSUSE 11, start firecontrol instead and
issue the command
r . 0 0xfffff0000404 4
The result of this should be
31 33 39 34
This is assuming that the disk has the physical ID 0 like in your dmesg.
To test reading from the local node which is physical ID 1, use
r . 1 0xfffff0000404 4
or to address the local node independently of its actual physical ID:
r . . 0xfffff0000404 4
This would merely serve as a verification that you are using firecontrol
correctly. :-)

(I/O to and from the local node almost never touches the controller
hardware and hence works despite any potential ohci1394 bug.)

And lastly, I would be interested in learning how firewire-sbp2 --
firewire-core -- firewire-ohci behave on PPC64 + PAGE_SIZE = whatever.
Just recompile the kernel modules with CONFIG_FIREWIRE(_...) switched
on, unload ohci1394, load firewire-ohci, and enjoy.
Comment 2 Olaf Hering 2009-02-18 01:05:03 UTC
this is with 2.6.27.17-SLE11_BRANCH_20090216205308_cb6936d0-ppc64:

firecontrol 0.2
Copyright (C) 2002-2007 by Manfred Weihs <mweihs@users.sourceforge.net>
This software comes with absolutely no warranty.

No adapter specified!
successfully got handle
current generation number (driver): 1
1 card(s) found
  nodes on bus:  2, card name: ohci1394
using adapter 0
found: 2 nodes on bus, local ID is 1, IRM is 1
current generation number (adapter): 3

entering command mode
Type 'help' for more information!
Command: help
available commands:
help                                ... display this help
br                                  ... reset the bus
br short                            ... reset the bus (short 1394a bus reset)
r bus node offset size              ... read from a node
w bus node offset size data         ... write from a node
l bus node offset extTCode data arg ... lock a node
p data                              ... send phy packet
i                                   ... print information (nr. of nodes etc.)
v                                   ... print version information
e                                   ... exit

For bus and node arguments '.' can be used as local bus or local node.

Command: r . 0 0xfffff0000404 4
reading from node 0, bus 1023, offset 0XFFFFF0000404 4 bytes
read failed.
Timeout
Command: r . 1 0xfffff0000404 4
reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes
read succeeded. Data follows (hex):
31 33 39 34
Ack code: pending; Response code: complete
Command: r . . 0xfffff0000404 4
reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes
read succeeded. Data follows (hex):
31 33 39 34
Ack code: pending; Response code: complete
Command:
Comment 3 Olaf Hering 2009-02-18 02:19:58 UTC
2.6.29-rc5-20090218_bug476479_3f6963f1-vanilla with 64K

sharlyn:~ # dmesg | grep fire
firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002)
firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0
firewire_core: created device fw0: GUID 000a95fffebab0e8, S800
firewire_core: created device fw1: GUID 0010b92100519fad, S400
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries)
firewire_sbp2: fw1.1: sbp2_scsi_abort
sharlyn:~ # lsscsi 
[0:0:0:0]    disk    ATA      ST3500320AS      SD15  /dev/sda
[4:0:0:0]    disk    Maxtor   OneTouch II      023g  -       
Comment 4 Olaf Hering 2009-02-18 02:45:03 UTC
same kernel with 4K 

sharlyn:~ # dmesg | grep fire
firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002)
firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0
firewire_core: created device fw0: GUID 000a95fffebab0e8, S800
firewire_core: created device fw1: GUID 0010b92100519fad, S400
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries)
sharlyn:~ # lsscsi 
[0:0:0:0]    disk    ATA      ST3500320AS      SD15  /dev/sda
[4:0:0:0]    disk    Maxtor   OneTouch II      023g  /dev/sdb
sharlyn:~ # 
Comment 5 Anonymous Emailer 2009-02-18 06:38:16 UTC
Reply-To: stefanr@s5r6.in-berlin.de

(quoting in full for linux1394-devel)

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12725
> 
> 
> ------- Comment #2 from olaf@aepfle.de  2009-02-18 01:05 -------
> this is with 2.6.27.17-SLE11_BRANCH_20090216205308_cb6936d0-ppc64:
> 
> firecontrol 0.2
> Copyright (C) 2002-2007 by Manfred Weihs <mweihs@users.sourceforge.net>
> This software comes with absolutely no warranty.
> 
> No adapter specified!
> successfully got handle
> current generation number (driver): 1
> 1 card(s) found
>   nodes on bus:  2, card name: ohci1394
> using adapter 0
> found: 2 nodes on bus, local ID is 1, IRM is 1
> current generation number (adapter): 3
> 
> entering command mode
> Type 'help' for more information!
> Command: help
> available commands:
> help                                ... display this help
> br                                  ... reset the bus
> br short                            ... reset the bus (short 1394a bus reset)
> r bus node offset size              ... read from a node
> w bus node offset size data         ... write from a node
> l bus node offset extTCode data arg ... lock a node
> p data                              ... send phy packet
> i                                   ... print information (nr. of nodes etc.)
> v                                   ... print version information
> e                                   ... exit
> 
> For bus and node arguments '.' can be used as local bus or local node.
> 
> Command: r . 0 0xfffff0000404 4
> reading from node 0, bus 1023, offset 0XFFFFF0000404 4 bytes
> read failed.
> Timeout
> Command: r . 1 0xfffff0000404 4
> reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes
> read succeeded. Data follows (hex):
> 31 33 39 34
> Ack code: pending; Response code: complete
> Command: r . . 0xfffff0000404 4
> reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes
> read succeeded. Data follows (hex):
> 31 33 39 34
> Ack code: pending; Response code: complete
> Command:

OK, that's what I expected.  not only ieee1394's nodemgr but also
firecontrol (via raw1394) is unable to perform transactions with the
external node.


> ------- Comment #3 from olaf@aepfle.de  2009-02-18 02:19 -------
> 2.6.29-rc5-20090218_bug476479_3f6963f1-vanilla with 64K
> 
> sharlyn:~ # dmesg | grep fire
> firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002)
> firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0
> firewire_core: created device fw0: GUID 000a95fffebab0e8, S800
> firewire_core: created device fw1: GUID 0010b92100519fad, S400

OK, firewire-core's equivalent to ieee1394's nodemgr is successful.

> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries)
> firewire_sbp2: fw1.1: sbp2_scsi_abort
> sharlyn:~ # lsscsi 
> [0:0:0:0]    disk    ATA      ST3500320AS      SD15  /dev/sda
> [4:0:0:0]    disk    Maxtor   OneTouch II      023g  -       

Is this sbp2_scsi_abort part of an unrecoverable SCSI I/O failure? Seems
to be, since there is no /dev/sdb listed.

If so, did you check that it was repeatable, not a random one-off failure?


> ------- Comment #4 from olaf@aepfle.de  2009-02-18 02:45 -------
> same kernel with 4K 
> 
> sharlyn:~ # dmesg | grep fire
> firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002)
> firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0
> firewire_core: created device fw0: GUID 000a95fffebab0e8, S800
> firewire_core: created device fw1: GUID 0010b92100519fad, S400
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries)
> sharlyn:~ # lsscsi 
> [0:0:0:0]    disk    ATA      ST3500320AS      SD15  /dev/sda
> [4:0:0:0]    disk    Maxtor   OneTouch II      023g  /dev/sdb
> sharlyn:~ # 

Looks better, though this also worked in the old drivers.

(Unrelated side note:  Funny that Apple configure their FireWire 800
controllers as OHCI 1.0 controllers rather than OHCI 1.1.  Evidently
they stopped to care about details.)
Comment 6 Olaf Hering 2009-02-19 03:22:00 UTC
Ben,

do you have an idea why there are PAGE_SIZE related issues on PowerMac?
Comment 7 Anonymous Emailer 2009-02-19 04:23:05 UTC
Reply-To: stefanr@s5r6.in-berlin.de

What's known so far:

old driver stack)
Quadlet-sized outbound read transactions time out.
ohci1394's AT-req or AR-resp DMA context programming doesn't work.

new driver stack)
Everything succeeds until at least the SCSI inquiry command inclusive.
This means the physical/ AT-req/ AR-resp/ AR-req DMA units of the
FireWire controller are working to some degree (for quadlet-sized and
block-sized inbound and outbound transactions).  Comment 3 looks like
further SCSI commands which are issued by sd_mod's probe failed; but why
is not clear from the provided diagnostics so far.
Comment 8 Stefan Richter 2009-06-04 13:05:21 UTC
old driver stack)
See last comment -- likely a bug in ohci1394.

new driver stack)
The messages look like a bug at the SCSI level, and more likely a firmware bug rather than a kernel bug.

Perhaps the 64k kernel issues SCSI requests which cause the OneTouch II's firmware to lock up, while the 4k kernel does not issue such requests (or issues them with other parameters).  Especially near-the-end-of-disk read requests are famous for revealing all sorts of different firmware bugs, mostly with USB disks, but also with some other disks, e.g. FireWire.

Also see here a report of a different problem with disk accesses ("attempt to access beyond end of device", "sdb3: rw=0, want=31664120, limit=31664115", on as yet unmerged linux-next code) which only happen on 64k PAGE_SIZE kernels, not on 4k PAGE_SIZE kernels:  http://marc.info/?l=linux-kernel&m=124406549620554
Comment 9 Stefan Richter 2009-12-30 11:18:33 UTC
Sorry that there was no concrete progress yet...

old driver stack)
Nobody is going to look at this in ohci1394 anymore.  The newer firewire driver stack are now (in 2.6.3X kernels) functional enough that all mainstream distributors should switch to it.  Hence I link this bug entry to meta bug 10046 "won't fix in ieee1394 (fixed or to be fixed in firewire)".

new driver stack)
I wrote:
>> Everything succeeds until at least the SCSI inquiry command inclusive.
>> This means the physical/ AT-req/ AR-resp/ AR-req DMA units of the
>> FireWire controller are working to some degree (for quadlet-sized and
>> block-sized inbound and outbound transactions).  Comment 3 looks like
>> further SCSI commands which are issued by sd_mod's probe failed;
>> but why is not clear from the provided diagnostics so far.
> The messages look like a bug at the SCSI level, and more likely a
> firmware bug rather than a kernel bug.

On the other hand, it could still be that the problem is related to the handling of request buffers.  The SCSI Inquiry does already involve a buffer though.  It is only a small buffer without SBP-2 scatter-gather list, but the other requests which happen during SCSI device probe do not look like that they would require large buffers.  Still, firewire-sbp2's DMA mapping should be checked for potential issues with page sizes that differ from 4k.
Comment 10 Alan 2012-10-30 15:51:40 UTC
If this is still seen on modern kernels then please re-open/update