Bug 10935 - fw-ohci: ALi M52xx unsupported
Summary: fw-ohci: ALi M52xx unsupported
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-18 13:24 UTC by Stefan Richter
Modified: 2015-06-17 20:19 UTC

See Also:
Kernel Version: 2.6.22 and later
Subsystem:
Regression: No
Bisected commit-id:



Description Stefan Richter 2008-06-18 13:24:28 UTC
Tested with 2.6.26-rc6 and firewire patches du jour,
Belkin F50508 PCI card with ALi M5271 OHCI 1.1 controller:

fw-ohci gets the first self ID complete IRQ event with a valid self ID buffer.  After that, no further IRQ events except for cycle64Seconds are received anymore. I.e. no further self ID receive interrupts when something is plugged in or out, no AT-req interrupts when a phy packet or a read request packet is sent.

"modprobe firewire-ohci debug=-1", which also unmasks bus reset IRQ events, causes an endless stream of bus reset IRQ events to be logged.

The very same card works OK with ohci1394.  Tested with a HDD + sbp2 and with a miniDV camcorder + raw1394/kino.
Comment 1 Naveed Hasan 2008-07-02 14:05:59 UTC
I have an 'ALi Corporation M5253 P1394 OHCI 1.1 Controller' and it does not work with the new firewire stack either. Here are some details about it from a downstream bug.

[ lspci -vvv  ] https://bugzilla.redhat.com/show_bug.cgi?id=444694#c19
[ cold boot 1 ] https://bugzilla.redhat.com/show_bug.cgi?id=444694#c20
[ cold boot 2 ] https://bugzilla.redhat.com/show_bug.cgi?id=444694#c21
Comment 2 Stefan Richter 2008-07-02 14:29:11 UTC
These results are consistent with what I get from my card.  If nobody else is quicker, I will attempt to fix firewire-ohci for these cards eventually.  Right now I am caught up in other activities though.

As mentioned above, the controller gets stuck in a state where only intEvent.busReset is on and no events like selfIDComplete or RQPkt are happening anymore.  ohci1394 does not have this issue even though it does not contain any special ALi targeted workaround.
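To make the stuck state easier to recognise in a debug log, here is a minimal sketch that names the IntEvent bits discussed in this report. The bit values follow the OHCI 1.1 register layout; the helper names `decode_int_event()` and `only_bus_reset()` are hypothetical and not part of firewire-ohci.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IntEvent bits as laid out in the OHCI 1.1 specification. */
#define EVT_REQ_TX_COMPLETE  0x00000001u /* AT request context finished */
#define EVT_RESP_TX_COMPLETE 0x00000002u /* AT response context finished */
#define EVT_RQPKT            0x00000010u /* AR request packet received */
#define EVT_RSPKT            0x00000020u /* AR response packet received */
#define EVT_SELFID_COMPLETE  0x00010000u
#define EVT_BUS_RESET        0x00020000u
#define EVT_CYCLE_64_SECONDS 0x00200000u

/* Hypothetical helper: write the names of all known set bits into buf. */
static const char *decode_int_event(uint32_t evt, char *buf, size_t len)
{
	static const struct { uint32_t bit; const char *name; } tbl[] = {
		{ EVT_REQ_TX_COMPLETE,  "reqTxComplete"  },
		{ EVT_RESP_TX_COMPLETE, "respTxComplete" },
		{ EVT_RQPKT,            "RQPkt"          },
		{ EVT_RSPKT,            "RSPkt"          },
		{ EVT_SELFID_COMPLETE,  "selfIDComplete" },
		{ EVT_BUS_RESET,        "busReset"       },
		{ EVT_CYCLE_64_SECONDS, "cycle64Seconds" },
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (!(evt & tbl[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
	return buf;
}

/* The pattern described above: busReset set, but no selfIDComplete or RQPkt. */
static int only_bus_reset(uint32_t evt)
{
	uint32_t mask = EVT_BUS_RESET | EVT_SELFID_COMPLETE | EVT_RQPKT;

	return (evt & mask) == EVT_BUS_RESET;
}
```

With these definitions, an IntEvent value of 0x00030000 decodes to selfIDComplete plus busReset, while 0x00020000 alone matches the stuck pattern.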
Comment 3 Anonymous Emailer 2009-11-29 18:07:18 UTC
Reply-To: stefanr@s5r6.in-berlin.de

Today I did a few more tests with my ALi M5271 based Belkin PCI card.
The issue is _not_ that we wouldn't clear the busReset IRQ event bit.
This bit /is/ cleared as intended when the very first selfID complete
event is handled.

Rather, the problem is that after this first selfID complete event,
neither the AR response DMA nor the selfID receive DMA do anything
anymore.  The AR context control register still looks OK though.  Also,
a bus reset because something is plugged in or out does switch
IntEventSet.busReset back on --- it's just that no AR or selfID events
come in anymore, nor are the contents of the selfID receive DMA buffer
updated.

I then plugged another PC into the card to see how the ALi node looks
from a remote node's perspective.  (At first this crashed the local PC
because the Belkin card is obviously very cheaply wired, so that
the remote bus power provider feeds into one of the PC's 12 V power
rails.  I worked around that by putting a 6-pin + 4-pin node in the
middle to cut off bus power.)  The remote PC receives a selfID packet
from the ALi M5271 which looks good.  It goes on to attempt to read the
ALi's config ROM.  All of the remote node's read requests to
0xffff'f000'0400 are going out with ack_pending --- but not seen by
firewire-ohci on the local node, hence not answered by any response.

Vice versa, the remote node does not receive requests from the ALi node,
which means that the AT DMA on the ALi is obviously dead too.  However,
I checked that the contextControl.run bit is indeed switched on when
firewire-ohci queues requests for the AT context.

So the next task now is to try to find out more about why AT, AR and
selfID receive DMA get inoperable after the first selfID complete event.
It is probably a chip bug which for an as yet unknown reason is never
triggered by ohci1394, only by firewire-ohci.  Past testing showed one
or two incidents --- among hundreds if not thousands attempts --- where
the DMAs did *not* die for a few bus generations.  I.e. there seems to
be a race condition in the hardware.
Comment 4 Stefan Richter 2009-11-29 18:09:46 UTC
PS: cycle64Seconds interrupt events still come in at the expected intervals.
Comment 5 Anonymous Emailer 2010-01-12 16:31:10 UTC
Reply-To: stefanr@s5r6.in-berlin.de

On 29 Nov, Stefan Richter wrote:
> Today I did a few more tests with my ALi M5271 based Belkin PCI card.
> The issue is _not_ that we wouldn't clear the busReset IRQ event bit.
> This bit /is/ cleared as intended when the very first selfID complete
> event is handled.
> 
> Rather, the problem is that after this first selfID complete event,
> neither the AR response DMA nor the selfID receive DMA do anything
> anymore.  The AR context control register still looks OK though.  Also,
> a bus reset because something is plugged in or out does switch
> IntEventSet.busReset back on --- it's just that no AR or selfID events
> come in anymore, nor is the contents of the selfID receive DMA buffer
> updated.
[...]
> The remote PC receives a selfID packet
> from the ALi M5271 which looks good.  It goes on to attempt to read the
> ALi's config ROM.  All of the remote node's read requests to
> 0xffff'f000'0400 are going out with ack_pending --- but not seen by
> firewire-ohci on the local node, hence not answered by any response.
> 
> Vice versa, the remote node does not receive requests from the ALi node,
> which means that the AT DMA on the ALi is obviously dead too.  However,
> I checked that the contextControl.run bit is indeed switched on when
> firewire-ohci queues requests for the AT context.

Just for the record, at that time I also tried the following addition to
firewire-ohci which fiddles with PHY port registers like ohci1394 does.
However, this did *not* help with the ALi card.
---
 drivers/firewire/ohci.c |   47 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

Index: linux-2.6.32-rc7/drivers/firewire/ohci.c
===================================================================
--- linux-2.6.32-rc7.orig/drivers/firewire/ohci.c
+++ linux-2.6.32-rc7/drivers/firewire/ohci.c
@@ -483,6 +483,52 @@ static int ohci_update_phy_reg(struct fw
 	return 0;
 }
 
+static int get_phy_reg(struct fw_ohci *ohci, int addr)
+{
+	int val;
+
+	reg_write(ohci, OHCI1394_PhyControl, OHCI1394_PhyControl_Read(addr));
+	flush_writes(ohci);
+	msleep(2);
+	val = reg_read(ohci, OHCI1394_PhyControl);
+	if ((val & OHCI1394_PhyControl_ReadDone) == 0) {
+		fw_error("failed to get phy reg bits\n");
+		return -EBUSY;
+	}
+
+	return OHCI1394_PhyControl_ReadData(val);
+}
+
+static int set_phy_reg(struct fw_ohci *ohci, int addr, int val)
+{
+	reg_write(ohci, OHCI1394_PhyControl,
+		  OHCI1394_PhyControl_Write(addr, val));
+	flush_writes(ohci);
+	msleep(2);
+	val = reg_read(ohci, OHCI1394_PhyControl);
+	if ((val & OHCI1394_PhyControl_WriteDone) != 0) {
+		fw_error("failed to set phy reg bits\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static void enable_phy_ports(struct fw_ohci *ohci)
+{
+	int i, num_ports, status;
+
+	num_ports = get_phy_reg(ohci, 2) & 0xf;
+	for (i = 0; i < num_ports; i++) {
+		set_phy_reg(ohci, 7, i);
+		status = get_phy_reg(ohci, 8);
+		if (status < 0)
+			break;
+		if (status & 0x20)
+			set_phy_reg(ohci, 8, status & ~1);
+	}
+}
+
 static int ar_context_add_page(struct ar_context *ctx)
 {
 	struct device *dev = ctx->ohci->card.device;
@@ -1653,6 +1699,7 @@ static int ohci_enable(struct fw_card *c
 		  OHCI1394_HCControl_linkEnable |
 		  OHCI1394_HCControl_BIBimageValid);
 	flush_writes(ohci);
+	enable_phy_ports(ohci);
 
 	/*
 	 * We are ready to go, initiate bus reset to finish the
Comment 6 Anonymous Emailer 2010-01-12 18:42:56 UTC
Reply-To: stefanr@s5r6.in-berlin.de

And the following patch which makes PHY register writes block until the
link issued the write to the PHY (like ohci1394 does) does *not* help
either.
---
 drivers/firewire/ohci.c |   54 ++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 14 deletions(-)

Index: linux-2.6.32.2/drivers/firewire/ohci.c
===================================================================
--- linux-2.6.32.2.orig/drivers/firewire/ohci.c
+++ linux-2.6.32.2/drivers/firewire/ohci.c
@@ -452,27 +452,53 @@ static inline void flush_writes(const st
 	reg_read(ohci, OHCI1394_Version);
 }
 
-static int ohci_update_phy_reg(struct fw_card *card, int addr,
-			       int clear_bits, int set_bits)
+static int phy_reg_read(const struct fw_ohci *ohci, int addr)
 {
-	struct fw_ohci *ohci = fw_ohci(card);
-	u32 val, old;
+	u32 val;
+	int i;
 
 	reg_write(ohci, OHCI1394_PhyControl, OHCI1394_PhyControl_Read(addr));
-	flush_writes(ohci);
-	msleep(2);
-	val = reg_read(ohci, OHCI1394_PhyControl);
-	if ((val & OHCI1394_PhyControl_ReadDone) == 0) {
-		fw_error("failed to set phy reg bits.\n");
-		return -EBUSY;
+	for (i = 0; i < 10; i++) {
+		val = reg_read(ohci, OHCI1394_PhyControl);
+		if (val & OHCI1394_PhyControl_ReadDone)
+			return OHCI1394_PhyControl_ReadData(val);
+
+		msleep(1);
 	}
+	fw_error("failed to get phy reg\n");
+
+	return -EBUSY;
+}
+
+static int phy_reg_write(const struct fw_ohci *ohci, int addr, u32 val)
+{
+	int i;
 
-	old = OHCI1394_PhyControl_ReadData(val);
-	old = (old & ~clear_bits) | set_bits;
 	reg_write(ohci, OHCI1394_PhyControl,
-		  OHCI1394_PhyControl_Write(addr, old));
+		  OHCI1394_PhyControl_Write(addr, val));
+	for (i = 0; i < 100; i++) {
+		val = reg_read(ohci, OHCI1394_PhyControl);
+		if (val & OHCI1394_PhyControl_WriteDone)
+			return 0;
 
-	return 0;
+		msleep(1);
+	}
+	fw_error("failed to set phy reg\n");
+
+	return -EBUSY;
+}
+
+static int ohci_update_phy_reg(struct fw_card *card, int addr,
+			       int clear_bits, int set_bits)
+{
+	struct fw_ohci *ohci = fw_ohci(card);
+	int ret;
+
+	ret = phy_reg_read(ohci, addr);
+	if (ret < 0)
+		return ret;
+
+	return phy_reg_write(ohci, addr, (ret & ~clear_bits) | set_bits);
 }
 
 static int ar_context_add_page(struct ar_context *ctx)
Comment 7 Stefan Richter 2010-03-29 18:54:26 UTC
Another downstream report:
https://bugzilla.redhat.com/show_bug.cgi?id=514839#c4 ... #c7
Comment 8 Stefan Richter 2010-03-29 19:48:30 UTC
Downstream bug moved to
https://bugzilla.redhat.com/show_bug.cgi?id=577937
Comment 9 Stefan Richter 2010-04-02 13:44:25 UTC
Stefan Richter wrote on 2010-01-12:
> And the following patch which makes PHY register writes block until the
> link issued the write to the PHY (like ohci1394 does) does *not* help
> either.
> ---
>  drivers/firewire/ohci.c |   54 ++++++++++++++++++++++++++++++----------
>  1 file changed, 40 insertions(+), 14 deletions(-)
[...]
> +static int phy_reg_write(const struct fw_ohci *ohci, int addr, u32 val)
> +{
> +     int i;
>  
> -     old = OHCI1394_PhyControl_ReadData(val);
> -     old = (old & ~clear_bits) | set_bits;
>       reg_write(ohci, OHCI1394_PhyControl,
> -               OHCI1394_PhyControl_Write(addr, old));
> +               OHCI1394_PhyControl_Write(addr, val));
> +     for (i = 0; i < 100; i++) {
> +             val = reg_read(ohci, OHCI1394_PhyControl);
> +             if (val & OHCI1394_PhyControl_WriteDone)
> +                     return 0;
[...]

There was a mistake here because OHCI1394_PhyControl_WriteDone =
0x00004000 is misnamed.  The test should have been

		if (!(val & 0x00004000))
			return 0;

I retried with this one fixed, but my M5271 card still does not work.
IOW I am still in the dark about the cause of this bug.
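The corrected polarity can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: `struct fake_phy_control` and `fake_reg_read()` are hypothetical stand-ins for the real MMIO register and reg_read(), and the loop counts polls instead of sleeping. The point is that bit 0x00004000 (wrReg in OHCI terms) stays set while the write is pending and is cleared by the hardware once it is done.

```c
#include <assert.h>
#include <stdint.h>

/* The bit the driver calls "WriteDone"; it is really "write pending". */
#define PHY_CONTROL_WR_REG 0x00004000u

/*
 * Assumption: a simulated PhyControl register. The wrReg bit stays set
 * for `busy_reads` more reads and is cleared after that.
 */
struct fake_phy_control {
	uint32_t value;
	int busy_reads;
};

static uint32_t fake_reg_read(struct fake_phy_control *r)
{
	if (r->busy_reads > 0)
		r->busy_reads--;
	else
		r->value &= ~PHY_CONTROL_WR_REG;
	return r->value;
}

/*
 * Poll until the write-pending bit clears. Returns the number of polls
 * that saw the bit still set, or -1 if it never cleared.
 */
static int wait_phy_write(struct fake_phy_control *r, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		uint32_t val = fake_reg_read(r);

		if (!(val & PHY_CONTROL_WR_REG))
			return i;
		/* The real driver would msleep(1) here. */
	}
	return -1;
}
```

A register that never drops the bit within the poll budget makes `wait_phy_write()` return -1, which corresponds to the -EBUSY path of the patch.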
Comment 10 Stefan Richter 2010-04-24 20:18:40 UTC
Today, firewire-ohci's probe failed with "Failed to reset ohci card.", i.e. the HCControl.softReset bit did not go off within the 500 milliseconds retry loop of the initial software_reset() call.
Comment 11 Stefan Richter 2010-04-25 23:12:11 UTC
For an unrelated reason, I activated the old driver stack today (haven't used it for a while).  To my surprise, ohci1394 was unable to initialize the ALi controller:

>>>
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: OHCI-1394 165.165 (PCI): IRQ=[23]  MMIO=[b3000-b37ff]  Max Packet=[65536]  IR/IT contexts=[32/32]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Serial EEPROM has suspicious values, attempting to set max_packet_size to 512 bytes
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
<<<

Evidently, ohci1394 gets ~0 from several MMIO reads.  Either the card's EEPROM gave out or something else happened to the card.
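A short sketch of why all-ones reads produce exactly these timeouts, assuming the second number in the log message is the register value masked with 0x00004000: a read of ~0 always has the write-pending bit set, so the wait loop can never finish. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PHY_CONTROL_WR_REG 0x00004000u /* set while a PHY write is pending */

/* A read of all ones usually means the device did not answer at all. */
static int mmio_read_looks_dead(uint32_t val)
{
	return val == 0xffffffffu;
}

/* The condition a PHY register write loop waits on. */
static int phy_write_pending(uint32_t phy_control)
{
	return (phy_control & PHY_CONTROL_WR_REG) != 0;
}
```

Checking for ~0 before entering such a loop would let a driver report a dead device once instead of timing out on every PHY access.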
Comment 12 Stefan Richter 2010-04-27 19:13:43 UTC
On this Belkin card, the ALi M5271 link is attached to a Texas Instruments TSB41AB3 phy.  Only known erratum:  "Errata For the 1394 Physical Layer Devices", http://www.ti.com/litv/pdf/sllz012.  Does not look like a candidate cause for this ohci1394->firewire-ohci regression.
Comment 13 Stefan Richter 2010-05-23 18:24:35 UTC
At http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=545112#108, Ben Hutchings reports that an ALi M5253 link + Agere FW802C phy based card works at least some of the time.
Comment 14 Stefan Richter 2011-07-04 00:00:46 UTC
The problem is still there as of Linux 3.0-rc4.

Example of a startup with an external node connected:

Jul  4 01:32:01 stein kernel: firewire_ohci 0000:0c:07.4: PCI INT C -> GSI 23 (level, low) -> IRQ 23
Jul  4 01:32:01 stein kernel: firewire_ohci: Added fw-ohci device 0000:0c:07.4, OHCI v1.10, 4 IR + 8 IT contexts, quirks 0x1
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020010 AR_req busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00030000 selfID busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: AR evt_bus_reset, generation 1
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset
Jul  4 01:32:01 stein kernel: firewire_ohci: 2 selfIDs, generation 1, local node ID ffc0
Jul  4 01:32:01 stein kernel: firewire_ohci: selfID 0: 807f8c96, phy 0 [p--] S400 gc=63 -3W Lci
Jul  4 01:32:01 stein kernel: firewire_ohci: selfID 0: 817f8470, phy 1 [-c.] S400 gc=63 -3W L
Jul  4 01:32:01 stein kernel: firewire_core: created device fw6: GUID 0030bd051800064f, S400

After this proper self ID reception, the quadlet read request that core-device.c issues to the remote node never causes an AT-req interrupt.  But there aren't any further busReset IRQ events either, also not when the external node is being unplugged.  Only cycle64Seconds IRQ events happen.
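For reference, the self ID quadlets in this log can be decoded by hand. The following is a minimal sketch based on the IEEE 1394a self-ID packet zero layout; `struct self_id` and `decode_self_id()` are hypothetical names, and the port characters follow the convention of the log above ('p' parent, 'c' child, '-' not connected, '.' not present).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a self-ID packet zero (first quadlet only). */
struct self_id {
	unsigned phy_id;
	unsigned link_active;     /* "L" in the log */
	unsigned gap_count;
	unsigned speed;           /* 0 = S100, 1 = S200, 2 = S400 */
	unsigned contender;       /* "c" in the log */
	unsigned power_class;     /* 4 is printed as "-3W" */
	unsigned initiated_reset; /* "i" in the log */
	unsigned more_packets;
	char ports[4];            /* p0..p2 plus terminating NUL */
};

static struct self_id decode_self_id(uint32_t q)
{
	static const char port_char[] = { '.', '-', 'p', 'c' };
	struct self_id s;

	s.phy_id          = (q >> 24) & 0x3f;
	s.link_active     = (q >> 22) & 1;
	s.gap_count       = (q >> 16) & 0x3f;
	s.speed           = (q >> 14) & 3;
	s.contender       = (q >> 11) & 1;
	s.power_class     = (q >> 8) & 7;
	s.ports[0]        = port_char[(q >> 6) & 3];
	s.ports[1]        = port_char[(q >> 4) & 3];
	s.ports[2]        = port_char[(q >> 2) & 3];
	s.ports[3]        = '\0';
	s.initiated_reset = (q >> 1) & 1;
	s.more_packets    = q & 1;
	return s;
}
```

Applied to 807f8c96 this yields phy 0 with ports "p--", gap count 63, S400, contender and initiated-reset set, which matches the first selfID line of the log.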
Comment 15 Stefan Richter 2012-05-01 11:09:12 UTC
After a long while of using the PCI slots for other cards, I put the Belkin F50508 back in yesterday in order to get back to this bug again.

-------------------------------------------------------------------------
boot
-------------------------------------------------------------------------

Apr 30 20:38:16 stein kernel: pci 0000:0c:07.0: [10b9:5237] type 00 class 0x0c0310
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.0: reg 10: [mem 0xfb7ff000-0xfb7fffff]
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.0: PME# supported from D0 D1 D3hot D3cold
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.1: [10b9:5237] type 00 class 0x0c0310
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.1: reg 10: [mem 0xfb7fe000-0xfb7fefff]
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.1: PME# supported from D0 D1 D3hot D3cold
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.3: [10b9:5239] type 00 class 0x0c0320
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.3: reg 10: [mem 0xfb7fdc00-0xfb7fdcff]
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.3: PME# supported from D0 D3hot D3cold
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: [10b9:5253] type 00 class 0x0c0010
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: reg 10: [mem 0xfb7fd000-0xfb7fd7ff]
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: reg 30: [mem 0xfb7e0000-0xfb7effff pref]
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: supports D1 D2
Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: PME# supported from D1 D2 D3hot
Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: EHCI Host Controller
Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: new USB bus registered, assigned bus number 3
Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: debug port 1
Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: irq 21, io mem 0xfb7fdc00
Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: USB 2.0 started, EHCI 1.00
Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.0: OHCI Host Controller
Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.0: new USB bus registered, assigned bus number 9
Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.0: irq 22, io mem 0xfb7ff000
Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.1: OHCI Host Controller
Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.1: new USB bus registered, assigned bus number 10
Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.1: irq 22, io mem 0xfb7fe000

-------------------------------------------------------------------------
modprobe firewire-ohci debug=2
-------------------------------------------------------------------------

Apr 30 20:51:49 stein kernel: firewire_ohci 0000:0c:07.4: added OHCI v1.10 device as card 4, 4 IR + 8 IT contexts, quirks 0x1
Apr 30 20:51:49 stein kernel: firewire_ohci 0000:0c:07.4: 1 selfIDs, generation 1, local node ID ffc0
Apr 30 20:51:49 stein kernel: firewire_ohci 0000:0c:07.4: selfID 0: 807f8c56, phy 0 [---] S400 gc=63 -3W Lci
Apr 30 20:51:50 stein kernel: firewire_core 0000:0c:07.4: created device fw7: GUID 0030bd051800064f, S400

-------------------------------------------------------------------------
plug in a bus-powered camera
-------------------------------------------------------------------------

May  1 12:40:41 stein kernel: ohci_hcd 0000:0c:07.1: HC died; cleaning up

-------------------------------------------------------------------------
unplug the camera again
-------------------------------------------------------------------------

results in bus reset on two other FireWire controllers (on the same PSU power rail), and the video screen goes off, showing just black + vertical stripes

-------------------------------------------------------------------------

So this card is evidently seriously buggy or defective at least WRT bus power supply.  I now removed it from my PC and do not intend to experiment with it further.
Comment 16 Andreas Mohr 2014-12-28 20:23:39 UTC
"Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset" https://lkml.org/lkml/2014/12/26/30
(and especially the discussion at the link within the patch)
may very well be able to shed some light on this seemingly "lost" case, which seems closely related to the "bus reset" behavior mentioned several times above, especially since http://www.spinics.net/lists/linux-pci/msg35902.html
also contains these very relevant lines:

"When doing the echo 1 > reset, the shell doesn't come
back again and the blinking of the cursor gets immediately slower.
Getting slower means: it takes some more time until it is on / off again
again. This way, it "blinks" another not exceeding 2 times until it's
finally dead.
It looks like the machine would have suddenly extremely high load (there
are 8 cores!) - but this seems to be not true, because the cpu fan stays
silent - the rpm isn't changed at all.


- Most of the time, I'm doing tests which fail, I'm having problems
after the hang with USB (it's the Etron device). Problem means: initrd
isn't able to communicate with the device (but bios and grub2 didn't had
any problem, because keyboard worked fine, which is connected via USB
3). At this point, it is necessary to disconnect the mains completely
and wait half a minute until the problem disappears.

Seldom, I too had this problem even on bios stage: the keyboard couldn't
be seen even by the bios any more.


- Sometimes (really seldom - now happened about 3 times), it gets
extremely hard to return to normal operation after that hang. This
means: Since a few weeks, I'm running kernel 3.12.28-3-desktop out of
the box (= as provided by openSUSE). Sometimes now, I got (apparently)
the same problems (= PCIe passthrough hangs the complete machine) w/
3.12.28 as I'm having with stock >= 3.14 after testing. It's even
useless then to reconnect the mains (I experienced this 2 times in
series after one hang yesterday). At this point, I have to run kernel
3.10.x (which runs pretty fine as usual) and only after that, 3.12 works
again as expected (as appeared once yesterday while tests w/ disabled
USB 3 devices via bios).


- I think there is a relationship between how long the hang is active
and the consecutive problems coming up. If the hang is immediately (max
about 1s) reset w/ the reset knob, it is possible, that there is no USB
problem after reboot and the machine works completely fine with 3.12.x
again.


Conclusion (from my point of view):
The broken reset seems to do something really _extreme ugly_ w/ the
hardware, which has the potential to break the hardware "lasting" or the
consecutive software isn't able at all to correctly reconfigure the
system again - even after reconnecting the mains.
Fortunately I'm having an old kernel version (3.10.x), which seems to be
able to "repair" the hardware again. But I have to emphasis that the
situation is really highly questionable and I'm meanwhile fearing to
break my board finally, which is working really _extremely_ stable
besides that."


Since on my existing hardware combo I had ugly hardware effects very similar to what Stefan notes above and what this link describes, too, I took a note to possibly revisit this case, but no promises...
Comment 17 Andreas Mohr 2015-06-17 20:19:33 UTC
So I decided to revisit this recently, and it turned out that things are a lot brighter than they used to be: while this card caused an awful amount of issues (PC lockup when loading USB EHCI driver, lockup when loading firewire-ohci, instant reboot on suspend or shutdown, ...), I now strongly suspect that at least the instant-reboot issue is an "exceedingly boringly normal" thing (plus..... further candy items below :):

http://www.tonymacx86.com/general-help/65531-gigabyte-uefi-bios-startech-firewire-pcie-card-sleep-wake-shutdown-restart-issues-thread-8.html
documents (in very detailed form, yet with no final solution there) a very large number of issues with various Firewire cards and instant-re-wakeup of PC hardware.

HOWEVER, once I found that I could disable AWARD BIOS v6.00PG "Power Management Settings" -> "IRQ/Event Activity Detect" -> "PCI PowerOn wakeup", there was *NO* instant automatic re-wakeup occurring any more, which meant that I could finally keep this card installed permanently, for much more productive analysis.

For this particular issue, I now strongly suspect that re-enabling PCI PowerOn wakeup in BIOS and then manually fiddling with relevant bits in PCI CAP_PM range (this seems best described in ohci1206.pdf, interestingly - see PME_EN bit probably!) at one or more PCI sub devices of my combo card ought to be able to successfully keep my box from doing wakeup despite PCI PME wakeup in general (i.e., for all other cards!) being activated in BIOS (I really ought to have this verified ASAP). Then it's possibly only a matter of adding a PCI quirk for certain devices on this card, but perhaps for certain environment conditions only (e.g.: weird BIOS with ACPI PCI PME wakeup config issues?).
Quite possibly ditto for many other PCI cards with PME wakeup issues.
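A rough sketch of the bit fiddling meant here, assuming the standard PCI Power Management capability layout in which the PMCSR word carries PME_En in bit 8 and a write-1-to-clear PME_Status in bit 15. The function name is hypothetical; a real quirk would read and write the word through the PCI config space accessors rather than operate on a plain value.

```c
#include <assert.h>
#include <stdint.h>

/* PMCSR bits of the PCI Power Management capability. */
#define PM_CTRL_PME_ENABLE 0x0100u
#define PM_CTRL_PME_STATUS 0x8000u /* write 1 to clear */

/*
 * Compute the value to write back to PMCSR so that the function stops
 * asserting PME#: acknowledge a pending event and clear PME_En, leaving
 * the power state bits untouched.
 */
static uint16_t pmcsr_disable_pme(uint16_t pmcsr)
{
	return (uint16_t)((pmcsr | PM_CTRL_PME_STATUS) & ~PM_CTRL_PME_ENABLE);
}
```

Whether this alone keeps the box from waking up with "PCI PowerOn wakeup" enabled in the BIOS is untested, as noted above.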

keywords: PCI PME wakeup instant suspend resume poweron powerup.


Also, after many hours of debugging, I accidentally (IIRC) figured out that skipping BIBimageValid.Set (i.e., ignoring config ROM stuff) will suddenly make the FireWire part of this ALi M5271 card work, i.e. have AT events (more or less... operating HDD via SBP2, etc.) properly, without getting stuck in busReset screaming IRQ storm lockups (keywords, anyone?).
Since there were several reports that it worked with old ieee1394 driver, I just did an investigation and found that ieee1394 as of 2.6.36 did not even contain a define for BIBimageValid at all!!! (one, two, three: "blech!!").
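The idea can be sketched as follows. The HCControl bit values are those of the OHCI register layout, and ohci_enable() in the patches above sets linkEnable together with BIBimageValid. The quirk flag name and its value are purely hypothetical and are not taken from any released driver.

```c
#include <assert.h>
#include <stdint.h>

/* HCControl bits per the OHCI register layout. */
#define HCCONTROL_LINK_ENABLE     0x00020000u
#define HCCONTROL_BIB_IMAGE_VALID 0x80000000u

/* Hypothetical quirk flag; name and value are arbitrary for this sketch. */
#define QUIRK_SKIP_BIB_IMAGE_VALID 0x100u

/*
 * Value to write to HCControlSet when enabling the link: leave out
 * BIBimageValid on controllers that carry the quirk.
 */
static uint32_t hccontrol_enable_bits(unsigned int quirks)
{
	uint32_t bits = HCCONTROL_LINK_ENABLE;

	if (!(quirks & QUIRK_SKIP_BIB_IMAGE_VALID))
		bits |= HCCONTROL_BIB_IMAGE_VALID;
	return bits;
}
```

Keying such a flag on the PCI vendor and device ID would confine the change to the affected ALi controllers and leave all other cards on the existing code path.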


This very moment I'm in the process of submitting a simple and precise quirk-based hotfix patch for this card. If anyone could revisit inoperative cards (prime candidate: nForce2) with this BIBimageValid-disabled one-trick pony, that would be useful, one would think.

Big thanks also go to the very informative discussion over at
"[firewire] plugging device into nForce2 controller hangs system" https://bugzilla.redhat.com/show_bug.cgi?id=244576
