Distribution: openSuSE 11.1-ppc Hardware Environment: PowerMac Problem Description: spb2 fails to login into a Maxtor OneTouch with Error logging into target on an Apple PowerMac G5. works with 4K PAGE_SIZE test kernel. fails with 64K PAGE_SIZE kernel. ohci1394 0001:03:0e.0: enabling device (0000 -> 0002) ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[40] MMIO=[80200000-802007ff] Max Packet=[4096] IR/IT contexts=[8/8] ieee1394: Host added: ID:BUS[0-00:1023] GUID[000a95fffebab0e8] ieee1394: Error parsing configrom for node 0-00:1023 ieee1394: Node changed: 0-00:1023 -> 0-01:1023 ... motherboard : PowerMac7,2 MacRISC4 Power Macintosh detected as : 336 (PowerMac G5) tested with SLES11 kernel, and with kernel-vanilla 2.6.27 and 2.6.29-rc5 ftp://ftp.suse.com/pub/projects/kernel/kotd/SLE11_BRANCH/ppc64/
Reply-To: stefanr@s5r6.in-berlin.de (Olaf, if you reply by mail, please Cc linux1394-devel@lists.sourceforge.net.) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12725 ... > spb2 fails to login into a Maxtor OneTouch with Error logging into target on > an > Apple PowerMac G5. > > works with 4K PAGE_SIZE test kernel. > fails with 64K PAGE_SIZE kernel. Interesting failure mode. > ohci1394 0001:03:0e.0: enabling device (0000 -> 0002) > ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[40] MMIO=[80200000-802007ff] > Max Packet=[4096] IR/IT contexts=[8/8] > ieee1394: Host added: ID:BUS[0-00:1023] GUID[000a95fffebab0e8] > ieee1394: Error parsing configrom for node 0-00:1023 > ieee1394: Node changed: 0-00:1023 -> 0-01:1023 Actually it looks not that sbp2 login fails but that ieee1394 is unable to read the configuration ROM of the device. Otherwise there would be a "Node added: ..." message for the disk eventually. Please confirm this by $ ls /sys/bus/ieee1394/devices/ And the cause of this could for example be a programming issue in ohci1394. Could very well be that some assumptions in that driver are violated if PAGE_SIZE != 4k. Please also test asynchronous I/O with either gscanbus or firecontrol. In gscanbus, just check if gscanbus is able to detect the node type and to show the GUID etc. when you click on the icon. Or if you can't easily obtain gscanbus for openSUSE 11, start firecontrol instead and issue the command r . 0 0xfffff0000404 4 The result of this should be 31 33 39 34 This is assuming that the disk has the physical ID 0 like in your dmesg. To test reading from the local node which is physical ID 1, use r . 1 0xfffff0000404 4 or to address the local node independently of its actual physical ID: r . . 0xfffff0000404 4 This would merely serve as a verification that you are using firecontrol correctly. :-) (I/O to and from the local node almost never touches the controller hardware and hence works despite any potential ohci1394 bug.) And lastly, I would be interested in learning how firewire-sbp2 -- firewire-core -- firewire-ohci behave on PPC64 + PAGE_SIZE = whatever. Just recompile the kernel modules with CONFIG_FIREWIRE(_...) switched on, unload ohci1394, load firewire-ohci, and enjoy.
this is with 2.6.27.17-SLE11_BRANCH_20090216205308_cb6936d0-ppc64: firecontrol 0.2 Copyright (C) 2002-2007 by Manfred Weihs <mweihs@users.sourceforge.net> This software comes with absolutely no warranty. No adapter specified! successfully got handle current generation number (driver): 1 1 card(s) found nodes on bus: 2, card name: ohci1394 using adapter 0 found: 2 nodes on bus, local ID is 1, IRM is 1 current generation number (adapter): 3 entering command mode Type 'help' for more information! Command: help available commands: help ... display this help br ... reset the bus br short ... reset the bus (short 1394a bus reset) r bus node offset size ... read from a node w bus node offset size data ... write from a node l bus node offset extTCode data arg ... lock a node p data ... send phy packet i ... print information (nr. of nodes etc.) v ... print version information e ... exit For bus and node arguments '.' can be used as local bus or local node. Command: r . 0 0xfffff0000404 4 reading from node 0, bus 1023, offset 0XFFFFF0000404 4 bytes read failed. Timeout Command: r . 1 0xfffff0000404 4 reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes read succeeded. Data follows (hex): 31 33 39 34 Ack code: pending; Response code: complete Command: r . . 0xfffff0000404 4 reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes read succeeded. Data follows (hex): 31 33 39 34 Ack code: pending; Response code: complete Command:
2.6.29-rc5-20090218_bug476479_3f6963f1-vanilla with 64K sharlyn:~ # dmesg | grep fire firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002) firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0 firewire_core: created device fw0: GUID 000a95fffebab0e8, S800 firewire_core: created device fw1: GUID 0010b92100519fad, S400 firewire_core: phy config: card 0, new root=ffc1, gap_count=5 firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries) firewire_sbp2: fw1.1: sbp2_scsi_abort sharlyn:~ # lsscsi [0:0:0:0] disk ATA ST3500320AS SD15 /dev/sda [4:0:0:0] disk Maxtor OneTouch II 023g -
same kernel with 4K sharlyn:~ # dmesg | grep fire firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002) firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0 firewire_core: created device fw0: GUID 000a95fffebab0e8, S800 firewire_core: created device fw1: GUID 0010b92100519fad, S400 firewire_core: phy config: card 0, new root=ffc1, gap_count=5 firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries) sharlyn:~ # lsscsi [0:0:0:0] disk ATA ST3500320AS SD15 /dev/sda [4:0:0:0] disk Maxtor OneTouch II 023g /dev/sdb sharlyn:~ #
Reply-To: stefanr@s5r6.in-berlin.de (quoting in full for linux1394-devel) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12725 > > > ------- Comment #2 from olaf@aepfle.de 2009-02-18 01:05 ------- > this is with 2.6.27.17-SLE11_BRANCH_20090216205308_cb6936d0-ppc64: > > firecontrol 0.2 > Copyright (C) 2002-2007 by Manfred Weihs <mweihs@users.sourceforge.net> > This software comes with absolutely no warranty. > > No adapter specified! > successfully got handle > current generation number (driver): 1 > 1 card(s) found > nodes on bus: 2, card name: ohci1394 > using adapter 0 > found: 2 nodes on bus, local ID is 1, IRM is 1 > current generation number (adapter): 3 > > entering command mode > Type 'help' for more information! > Command: help > available commands: > help ... display this help > br ... reset the bus > br short ... reset the bus (short 1394a bus reset) > r bus node offset size ... read from a node > w bus node offset size data ... write from a node > l bus node offset extTCode data arg ... lock a node > p data ... send phy packet > i ... print information (nr. of nodes etc.) > v ... print version information > e ... exit > > For bus and node arguments '.' can be used as local bus or local node. > > Command: r . 0 0xfffff0000404 4 > reading from node 0, bus 1023, offset 0XFFFFF0000404 4 bytes > read failed. > Timeout > Command: r . 1 0xfffff0000404 4 > reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes > read succeeded. Data follows (hex): > 31 33 39 34 > Ack code: pending; Response code: complete > Command: r . . 0xfffff0000404 4 > reading from node 1, bus 1023, offset 0XFFFFF0000404 4 bytes > read succeeded. Data follows (hex): > 31 33 39 34 > Ack code: pending; Response code: complete > Command: OK, that's what I expected. not only ieee1394's nodemgr but also firecontrol (via raw1394) is unable to perform transactions with the external node. > ------- Comment #3 from olaf@aepfle.de 2009-02-18 02:19 ------- > 2.6.29-rc5-20090218_bug476479_3f6963f1-vanilla with 64K > > sharlyn:~ # dmesg | grep fire > firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002) > firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0 > firewire_core: created device fw0: GUID 000a95fffebab0e8, S800 > firewire_core: created device fw1: GUID 0010b92100519fad, S400 OK, firewire-core's equivalent to ieee1394's nodemgr is successful. > firewire_core: phy config: card 0, new root=ffc1, gap_count=5 > firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries) > firewire_sbp2: fw1.1: sbp2_scsi_abort > sharlyn:~ # lsscsi > [0:0:0:0] disk ATA ST3500320AS SD15 /dev/sda > [4:0:0:0] disk Maxtor OneTouch II 023g - Is this sbp2_scsi_abort part of an unrecoverable SCSI I/O failure? Seems to be, since there is no /dev/sdb listed. If so, did you check that it was repeatable, not a random one-off failure? > ------- Comment #4 from olaf@aepfle.de 2009-02-18 02:45 ------- > same kernel with 4K > > sharlyn:~ # dmesg | grep fire > firewire_ohci 0001:03:0e.0: enabling device (0000 -> 0002) > firewire_ohci: Added fw-ohci device 0001:03:0e.0, OHCI version 1.0 > firewire_core: created device fw0: GUID 000a95fffebab0e8, S800 > firewire_core: created device fw1: GUID 0010b92100519fad, S400 > firewire_core: phy config: card 0, new root=ffc1, gap_count=5 > firewire_sbp2: fw1.1: logged in to LUN 0000 (0 retries) > sharlyn:~ # lsscsi > [0:0:0:0] disk ATA ST3500320AS SD15 /dev/sda > [4:0:0:0] disk Maxtor OneTouch II 023g /dev/sdb > sharlyn:~ # Looks better, though this also worked in the old drivers. (Unrelated side note: Funny that Apple configure their FireWire 800 controllers as OHCI 1.0 controllers rather than OHCI 1.1. Evidently they stopped to care about details.)
Ben, do you have an idea why there are PAGE_SIZE related issues on PowerMac?
Reply-To: stefanr@s5r6.in-berlin.de What's known so far: old driver stack) Quadlet-sized outbound read transactions time out. ohci1394's AT-req or AR-resp DMA context programming doesn't work. new driver stack) Everything succeeds until at least the SCSI inquiry command inclusive. This means the physical/ AT-req/ AR-resp/ AR-req DMA units of the FireWire controller are working to some degree (for quadlet-sized and block-sized inbound and outbound transactions). Comment 3 looks like further SCSI commands which are issued by sd_mod's probe failed; but why is not clear from the provided diagnostics so far.
old driver stack) See last comment -- likely a bug in ohci1394. new driver stack) The messages look like a bug at the SCSI level, and more likely a firmware bug rather than a kernel bug. Perhaps the 64k kernel issues SCSI requests which cause the OneTouch II's firmware to lock up, while the 4k kernel does not issue such requests (or issues them with other parameters). Especially near-the-end-of-disk read requests are famous for revealing all sorts of different firmware bugs, mostly with USB disks, but also with some other disks, e.g. FireWire. Also see here a report of a different problem with disk accesses ("attempt to access beyond end of device", "sdb3: rw=0, want=31664120, limit=31664115", on as yet unmerged linux-next code) which only happen on 64k PAGE_SIZE kernels, not on 4k PAGE_SIZE kernels: http://marc.info/?l=linux-kernel&m=124406549620554
Sorry that there was no concrete progress yet... old driver stack) Nobody is going to look at this in ohci1394 anymore. The newer firewire driver stack are now (in 2.6.3X kernels) functional enough that all mainstream distributors should switch to it. Hence I link this bug entry to meta bug 10046 "won't fix in ieee1394 (fixed or to be fixed in firewire)". new driver stack) I wrote: >> Everything succeeds until at least the SCSI inquiry command inclusive. >> This means the physical/ AT-req/ AR-resp/ AR-req DMA units of the >> FireWire controller are working to some degree (for quadlet-sized and >> block-sized inbound and outbound transactions). Comment 3 looks like >> further SCSI commands which are issued by sd_mod's probe failed; >> but why is not clear from the provided diagnostics so far. > The messages look like a bug at the SCSI level, and more likely a > firmware bug rather than a kernel bug. On the other hand, it could still be that the problem is related to the handling of request buffers. The SCSI Inquiry does already involve a buffer though. It is only a small buffer without SBP-2 scatter-gather list, but the other requests which happen during SCSI device probe do not look like that they would require large buffers. Still, firewire-sbp2's DMA mapping should be checked for potential issues with page sizes that differ from 4k.
If this is still seen on modern kernels then please re-open/update