Bug 10935
Summary: | fw-ohci: ALi M52xx unsupported | ||
---|---|---|---|
Product: | Drivers | Reporter: | Stefan Richter (stefanr) |
Component: | IEEE1394 | Assignee: | drivers_ieee1394 |
Status: | CLOSED INVALID | ||
Severity: | normal | CC: | andi, jarod, naveed |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22 and later | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Stefan Richter
2008-06-18 13:24:28 UTC
I have an 'ALi Corporation M5253 P1394 OHCI 1.1 Controller' and it does not work with the new firewire stack either. Here are some details about it from a downstream bug.

[ lspci -vvv ] https://bugzilla.redhat.com/show_bug.cgi?id=444694#c19
[ cold boot 1 ] https://bugzilla.redhat.com/show_bug.cgi?id=444694#c20
[ cold boot 2 ] https://bugzilla.redhat.com/show_bug.cgi?id=444694#c21

These results are consistent with what I get from my card. If nobody else is quicker, I will attempt to fix firewire-ohci for these cards eventually. Right now I am caught up in other activities though.

As mentioned above, the controller gets stuck in a state where only intEvent.busReset is on and no events like selfIDComplete or RQPkt are happening anymore. ohci1394 does not have this issue even though it does not contain any special ALi targeted workaround.

Reply-To: stefanr@s5r6.in-berlin.de

Today I did a few more tests with my ALi M5271 based Belkin PCI card. The issue is _not_ that we wouldn't clear the busReset IRQ event bit. This bit /is/ cleared as intended when the very first selfID complete event is handled.

Rather, the problem is that after this first selfID complete event, neither the AR response DMA nor the selfID receive DMA do anything anymore. The AR context control register still looks OK though. Also, a bus reset because something is plugged in or out does switch IntEventSet.busReset back on --- it's just that no AR or selfID events come in anymore, nor is the contents of the selfID receive DMA buffer updated.

I then plugged another PC into the card to see how the ALi node looks from a remote node's perspective. (At first this crashed the local PC because the Belkin card is obviously very cheaply wired, so that the remote bus power provider feeds into one of the PC's 12 V power rails. I worked around that by putting a 6-pin + 4-pin node in the middle to cut off bus power.)

The remote PC receives a selfID packet from the ALi M5271 which looks good.
It goes on to attempt to read the ALi's config ROM. All of the remote node's read requests to 0xffff'f000'0400 are going out with ack_pending --- but not seen by firewire-ohci on the local node, hence not answered by any response.

Vice versa, the remote node does not receive requests from the ALi node, which means that the AT DMA on the ALi is obviously dead too. However, I checked that the contextControl.run bit is indeed switched on when firewire-ohci queues requests for the AT context.

So the next task now is to try to find out more about why AT, AR and selfID receive DMA get inoperable after the first selfID complete event. It is probably a chip bug which for an as yet unknown reason is never triggered by ohci1394, only by firewire-ohci.

Past testing showed one or two incidents --- among hundreds if not thousands attempts --- where the DMAs did *not* die for a few bus generations. I.e. there seems to be a race condition in the hardware.

PS: cycle64Seconds interrupt events still come in at the expected intervals.

Reply-To: stefanr@s5r6.in-berlin.de

On 29 Nov, Stefan Richter wrote:
> Today I did a few more tests with my ALi M5271 based Belkin PCI card.
> The issue is _not_ that we wouldn't clear the busReset IRQ event bit.
> This bit /is/ cleared as intended when the very first selfID complete
> event is handled.
>
> Rather, the problem is that after this first selfID complete event,
> neither the AR response DMA nor the selfID receive DMA do anything
> anymore. The AR context control register still looks OK though. Also,
> a bus reset because something is plugged in or out does switch
> IntEventSet.busReset back on --- it's just that no AR or selfID events
> come in anymore, nor is the contents of the selfID receive DMA buffer
> updated.
[...]
> The remote PC receives a selfID packet
> from the ALi M5271 which looks good. It goes on to attempt to read the
> ALi's config ROM. All of the remote node's read requests to
> 0xffff'f000'0400 are going out with ack_pending --- but not seen by
> firewire-ohci on the local node, hence not answered by any response.
>
> Vice versa, the remote node does not receive requests from the ALi node,
> which means that the AT DMA on the ALi is obviously dead too. However,
> I checked that the contextControl.run bit is indeed switched on when
> firewire-ohci queues requests for the AT context.

Just for the record, at that time I also tried the following addition to firewire-ohci which fiddles with PHY port registers like ohci1394 does. However, this did *not* help with the ALi card.

---
 drivers/firewire/ohci.c |   47 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

Index: linux-2.6.32-rc7/drivers/firewire/ohci.c
===================================================================
--- linux-2.6.32-rc7.orig/drivers/firewire/ohci.c
+++ linux-2.6.32-rc7/drivers/firewire/ohci.c
@@ -483,6 +483,52 @@ static int ohci_update_phy_reg(struct fw
 	return 0;
 }
 
+static int get_phy_reg(struct fw_ohci *ohci, int addr)
+{
+	int val;
+
+	reg_write(ohci, OHCI1394_PhyControl, OHCI1394_PhyControl_Read(addr));
+	flush_writes(ohci);
+	msleep(2);
+	val = reg_read(ohci, OHCI1394_PhyControl);
+	if ((val & OHCI1394_PhyControl_ReadDone) == 0) {
+		fw_error("failed to get phy reg bits\n");
+		return -EBUSY;
+	}
+
+	return OHCI1394_PhyControl_ReadData(val);
+}
+
+static int set_phy_reg(struct fw_ohci *ohci, int addr, int val)
+{
+	reg_write(ohci, OHCI1394_PhyControl,
+		  OHCI1394_PhyControl_Write(addr, val));
+	flush_writes(ohci);
+	msleep(2);
+	val = reg_read(ohci, OHCI1394_PhyControl);
+	if ((val & OHCI1394_PhyControl_WriteDone) != 0) {
+		fw_error("failed to set phy reg bits\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static void enable_phy_ports(struct fw_ohci *ohci)
+{
+	int i, num_ports, status;
+
+	num_ports = get_phy_reg(ohci, 2) & 0xf;
+	for (i = 0; i < num_ports; i++) {
+		set_phy_reg(ohci, 7, i);
+		status = get_phy_reg(ohci, 8);
+		if (status < 0)
+			break;
+		if (status & 0x20)
+			set_phy_reg(ohci, 8, status & ~1);
+	}
+}
+
 static int ar_context_add_page(struct ar_context *ctx)
 {
 	struct device *dev = ctx->ohci->card.device;
@@ -1653,6 +1699,7 @@ static int ohci_enable(struct fw_card *c
 		  OHCI1394_HCControl_linkEnable |
 		  OHCI1394_HCControl_BIBimageValid);
 	flush_writes(ohci);
+	enable_phy_ports(ohci);
 
 	/*
 	 * We are ready to go, initiate bus reset to finish the

Reply-To: stefanr@s5r6.in-berlin.de

And the following patch which makes PHY register writes block until the link issued the write to the PHY (like ohci1394 does) does *not* help either.

---
 drivers/firewire/ohci.c |   54 ++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 14 deletions(-)

Index: linux-2.6.32.2/drivers/firewire/ohci.c
===================================================================
--- linux-2.6.32.2.orig/drivers/firewire/ohci.c
+++ linux-2.6.32.2/drivers/firewire/ohci.c
@@ -452,27 +452,53 @@ static inline void flush_writes(const st
 	reg_read(ohci, OHCI1394_Version);
 }
 
-static int ohci_update_phy_reg(struct fw_card *card, int addr,
-			       int clear_bits, int set_bits)
+static int phy_reg_read(const struct fw_ohci *ohci, int addr)
 {
-	struct fw_ohci *ohci = fw_ohci(card);
-	u32 val, old;
+	u32 val;
+	int i;
 
 	reg_write(ohci, OHCI1394_PhyControl, OHCI1394_PhyControl_Read(addr));
-	flush_writes(ohci);
-	msleep(2);
-	val = reg_read(ohci, OHCI1394_PhyControl);
-	if ((val & OHCI1394_PhyControl_ReadDone) == 0) {
-		fw_error("failed to set phy reg bits.\n");
-		return -EBUSY;
+	for (i = 0; i < 10; i++) {
+		val = reg_read(ohci, OHCI1394_PhyControl);
+		if (val & OHCI1394_PhyControl_ReadDone)
+			return OHCI1394_PhyControl_ReadData(val);
+
+		msleep(1);
 	}
+	fw_error("failed to get phy reg\n");
+
+	return -EBUSY;
+}
+
+static int phy_reg_write(const struct fw_ohci *ohci, int addr, u32 val)
+{
+	int i;
 
-	old = OHCI1394_PhyControl_ReadData(val);
-	old = (old & ~clear_bits) | set_bits;
 	reg_write(ohci, OHCI1394_PhyControl,
-		  OHCI1394_PhyControl_Write(addr, old));
+		  OHCI1394_PhyControl_Write(addr, val));
+	for (i = 0; i < 100; i++) {
+		val = reg_read(ohci, OHCI1394_PhyControl);
+		if (val & OHCI1394_PhyControl_WriteDone)
+			return 0;
 
-	return 0;
+		msleep(1);
+	}
+	fw_error("failed to set phy reg\n");
+
+	return -EBUSY;
+}
+
+static int ohci_update_phy_reg(struct fw_card *card, int addr,
+			       int clear_bits, int set_bits)
+{
+	struct fw_ohci *ohci = fw_ohci(card);
+	int ret;
+
+	ret = phy_reg_read(ohci, addr);
+	if (ret < 0)
+		return ret;
+
+	return phy_reg_write(ohci, addr, (ret & ~clear_bits) | set_bits);
 }
 
 static int ar_context_add_page(struct ar_context *ctx)

Another downstream report: https://bugzilla.redhat.com/show_bug.cgi?id=514839#c4 ... #c7

Downstream bug moved to https://bugzilla.redhat.com/show_bug.cgi?id=577937

Stefan Richter wrote on 2010-01-12:
> And the following patch which makes PHY register writes block until the
> link issued the write to the PHY (like ohci1394 does) does *not* help
> either.
> ---
>  drivers/firewire/ohci.c |   54 ++++++++++++++++++++++++++++++----------
>  1 file changed, 40 insertions(+), 14 deletions(-)
[...]
> +static int phy_reg_write(const struct fw_ohci *ohci, int addr, u32 val)
> +{
> +	int i;
>
> -	old = OHCI1394_PhyControl_ReadData(val);
> -	old = (old & ~clear_bits) | set_bits;
>  	reg_write(ohci, OHCI1394_PhyControl,
> -		  OHCI1394_PhyControl_Write(addr, old));
> +		  OHCI1394_PhyControl_Write(addr, val));
> +	for (i = 0; i < 100; i++) {
> +		val = reg_read(ohci, OHCI1394_PhyControl);
> +		if (val & OHCI1394_PhyControl_WriteDone)
> +			return 0;
[...]

There was a mistake here because OHCI1394_PhyControl_WriteDone = 0x00004000 is misnamed. The test should have been

	if (!(val & 0x00004000))
		return 0;

I retried with this one fixed, but my M5271 card still does not work. IOW I am still in the dark about the cause of this bug.

Today, firewire-ohci's probe failed with "Failed to reset ohci card.", i.e.
the HCControl.softReset bit did not go off within the 500 milliseconds retry loop of the initial software_reset() call.

For an unrelated reason, I activated the old driver stack today (haven't used it for a while). To my surprise, ohci1394 was unable to initialize the ALi controller:
>>>
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: Runaway loop while stopping context: ...
ohci1394: fw-host4: OHCI-1394 165.165 (PCI): IRQ=[23] MMIO=[b3000-b37ff] Max Packet=[65536] IR/IT contexts=[32/32]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
ohci1394: fw-host4: Serial EEPROM has suspicious values, attempting to set max_packet_size to 512 bytes
ohci1394: fw-host4: Set PHY Reg timeout [0xffffffff/0x00004000/100]
<<<
Evidently, ohci1394 gets ~0 from several MMIO reads. Either the card's EEPROM gave out or something else happened to the card.
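The two observations above (the misnamed 0x00004000 bit and the 0xffffffff values in the log) can be combined in a small user-space sketch. It assumes that 0x00004000 in PhyControl is a "write pending" bit which the controller clears once the PHY write has gone out, and that a register read returning all ones means the device no longer answers MMIO reads. The names phy_write_wait, phy_read_fn, fake_phy and the enum values are hypothetical and exist only for this illustration; none of this is driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHY_CONTROL_WRITE_PENDING 0x00004000u /* mask printed in the log above */
#define MMIO_ALL_ONES             0xffffffffu /* what a vanished device returns */

enum phy_wait_result { PHY_WRITE_OK, PHY_WRITE_TIMEOUT, PHY_DEVICE_GONE };

/* Hypothetical stand-in for reg_read(ohci, OHCI1394_PhyControl). */
typedef uint32_t (*phy_read_fn)(void *ctx);

/* Poll until the pending bit clears, the device vanishes, or we give up. */
enum phy_wait_result phy_write_wait(phy_read_fn read_reg, void *ctx, int max_tries)
{
	assert(read_reg != NULL);

	for (int i = 0; i < max_tries; i++) {
		uint32_t val = read_reg(ctx);

		/* All ones has the pending bit set too, so check it first. */
		if (val == MMIO_ALL_ONES)
			return PHY_DEVICE_GONE;
		/* Bit cleared means the write completed (the corrected test). */
		if (!(val & PHY_CONTROL_WRITE_PENDING))
			return PHY_WRITE_OK;
	}
	return PHY_WRITE_TIMEOUT;
}

/* Simulated register: reports "pending" for a given number of reads. */
struct fake_phy {
	int reads_until_done;
};

uint32_t fake_phy_read(void *ctx)
{
	struct fake_phy *phy = ctx;

	if (phy->reads_until_done > 0) {
		phy->reads_until_done--;
		return PHY_CONTROL_WRITE_PENDING;
	}
	return 0;
}

/* Simulated dead device: every read returns all ones. */
uint32_t dead_phy_read(void *ctx)
{
	(void)ctx;
	return MMIO_ALL_ONES;
}
```

With a check like this, a log such as the one above would presumably show one "device gone" message instead of a long run of identical timeouts, which makes the ~0 reads easier to spot.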
On this Belkin card, the ALi M5271 link is attached to a Texas Instruments TSB41AB3 phy. Only known erratum: "Errata For the 1394 Physical Layer Devices", http://www.ti.com/litv/pdf/sllz012. Does not look like a candidate cause for this ohci1394->firewire-ohci regression. At http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=545112#108, Ben Hutchings reports that an ALi M5253 link + Agere FW802C phy based card works at least some of the time. The problem is still there as of Linux 3.0-rc4. Example of a startup with an external node connected: Jul 4 01:32:01 stein kernel: firewire_ohci 0000:0c:07.4: PCI INT C -> GSI 23 (level, low) -> IRQ 23 Jul 4 01:32:01 stein kernel: firewire_ohci: Added fw-ohci device 0000:0c:07.4, OHCI v1.10, 4 IR + 8 IT contexts, quirks 0x1 Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020010 AR_req busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00030000 selfID busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: AR evt_bus_reset, generation 1 Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 stein kernel: firewire_ohci: IRQ 00020000 busReset Jul 4 01:32:01 
stein kernel: firewire_ohci: 2 selfIDs, generation 1, local node ID ffc0 Jul 4 01:32:01 stein kernel: firewire_ohci: selfID 0: 807f8c96, phy 0 [p--] S400 gc=63 -3W Lci Jul 4 01:32:01 stein kernel: firewire_ohci: selfID 0: 817f8470, phy 1 [-c.] S400 gc=63 -3W L Jul 4 01:32:01 stein kernel: firewire_core: created device fw6: GUID 0030bd051800064f, S400 After this proper self ID reception, the quadlet read request that core-device.c issues to the remote node never causes an AT-req interrupt. But there aren't any further busReset IRQ events either, also not when the external node is being unplugged. Only cycle64Seconds IRQ events happen. After a long while of using the PCI slots for other cards, I put the Belkin F50508 back in yesterday in order to get back to this bug again. ------------------------------------------------------------------------- boot ------------------------------------------------------------------------- Apr 30 20:38:16 stein kernel: pci 0000:0c:07.0: [10b9:5237] type 00 class 0x0c0310 Apr 30 20:38:16 stein kernel: pci 0000:0c:07.0: reg 10: [mem 0xfb7ff000-0xfb7fffff] Apr 30 20:38:16 stein kernel: pci 0000:0c:07.0: PME# supported from D0 D1 D3hot D3cold Apr 30 20:38:16 stein kernel: pci 0000:0c:07.1: [10b9:5237] type 00 class 0x0c0310 Apr 30 20:38:16 stein kernel: pci 0000:0c:07.1: reg 10: [mem 0xfb7fe000-0xfb7fefff] Apr 30 20:38:16 stein kernel: pci 0000:0c:07.1: PME# supported from D0 D1 D3hot D3cold Apr 30 20:38:16 stein kernel: pci 0000:0c:07.3: [10b9:5239] type 00 class 0x0c0320 Apr 30 20:38:16 stein kernel: pci 0000:0c:07.3: reg 10: [mem 0xfb7fdc00-0xfb7fdcff] Apr 30 20:38:16 stein kernel: pci 0000:0c:07.3: PME# supported from D0 D3hot D3cold Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: [10b9:5253] type 00 class 0x0c0010 Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: reg 10: [mem 0xfb7fd000-0xfb7fd7ff] Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: reg 30: [mem 0xfb7e0000-0xfb7effff pref] Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: 
supports D1 D2 Apr 30 20:38:16 stein kernel: pci 0000:0c:07.4: PME# supported from D1 D2 D3hot Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: EHCI Host Controller Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: new USB bus registered, assigned bus number 3 Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: debug port 1 Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: irq 21, io mem 0xfb7fdc00 Apr 30 20:38:16 stein kernel: ehci_hcd 0000:0c:07.3: USB 2.0 started, EHCI 1.00 Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.0: OHCI Host Controller Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.0: new USB bus registered, assigned bus number 9 Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.0: irq 22, io mem 0xfb7ff000 Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.1: OHCI Host Controller Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.1: new USB bus registered, assigned bus number 10 Apr 30 20:38:16 stein kernel: ohci_hcd 0000:0c:07.1: irq 22, io mem 0xfb7fe000 ------------------------------------------------------------------------- modprobe firewire-ohci debug=2 ------------------------------------------------------------------------- Apr 30 20:51:49 stein kernel: firewire_ohci 0000:0c:07.4: added OHCI v1.10 device as card 4, 4 IR + 8 IT contexts, quirks 0x1 Apr 30 20:51:49 stein kernel: firewire_ohci 0000:0c:07.4: 1 selfIDs, generation 1, local node ID ffc0 Apr 30 20:51:49 stein kernel: firewire_ohci 0000:0c:07.4: selfID 0: 807f8c56, phy 0 [---] S400 gc=63 -3W Lci Apr 30 20:51:50 stein kernel: firewire_core 0000:0c:07.4: created device fw7: GUID 0030bd051800064f, S400 ------------------------------------------------------------------------- plug in a bus-powered camera ------------------------------------------------------------------------- May 1 12:40:41 stein kernel: ohci_hcd 0000:0c:07.1: HC died; cleaning up ------------------------------------------------------------------------- plug the camera out again 
------------------------------------------------------------------------- results in bus reset on two other FireWire controllers (on the same PSU power rail), and the video screen goes off, showing just black + vertical stripes ------------------------------------------------------------------------- So this card is evidently seriously buggy or defective at least WRT bus power supply. I now removed it from my PC and do not intend to experiment with it further. "Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset" https://lkml.org/lkml/2014/12/26/30 (and especially the discussion at the link within the patch) may very well be able to shed some hope on this seemingly "lost" case, which seems to be very related to "bus reset" stuff (mentioned several times above!), especially since http://www.spinics.net/lists/linux-pci/msg35902.html also contains these very relevant lines: "When doing the echo 1 > reset, the shell doesn't come back again and the blinking of the cursor gets immediately slower. Getting slower means: it takes some more time until it is on / off again again. This way, it "blinks" another not exceeding 2 times until it's finally dead. It looks like the machine would have suddenly extremely high load (there are 8 cores!) - but this seems to be not true, because the cpu fan stays silent - the rpm isn't changed at all. - Most of the time, I'm doing tests which fail, I'm having problems after the hang with USB (it's the Etron device). Problem means: initrd isn't able to communicate with the device (but bios and grub2 didn't had any problem, because keyboard worked fine, which is connected via USB 3). At this point, it is necessary to disconnect the mains completely and wait half a minute until the problem disappears. Seldom, I too had this problem even on bios stage: the keyboard couldn't be seen even by the bios any more. - Sometimes (really seldom - now happened about 3 times), it gets extremely hard to return to normal operation after that hang. 
This means: Since a few weeks, I'm running kernel 3.12.28-3-desktop out of the box (= as provided by openSUSE). Sometimes now, I got (apparently) the same problems (= PCIe passthrough hangs the complete machine) w/ 3.12.28 as I'm having with stock >= 3.14 after testing. It's even useless then to reconnect the mains (I experienced this 2 times in series after one hang yesterday). At this point, I have to run kernel 3.10.x (which runs pretty fine as usual) and only after that, 3.12 works again as expected (as appeared once yesterday while tests w/ disabled USB 3 devices via bios). - I think there is a relationship between how long the hang is active and the consecutive problems coming up. If the hang is immediately (max about 1s) reset w/ the reset knob, it is possible, that there is no USB problem after reboot and the machine works completely fine with 3.12.x again. Conclusion (from my point of view): The broken reset seems to do something really _extreme ugly_ w/ the hardware, which has the potential to break the hardware "lasting" or the consecutive software isn't able at all to correctly reconfigure the system again - even after reconnecting the mains. Fortunately I'm having an old kernel version (3.10.x), which seems to be able to "repair" the hardware again. But I have to emphasis that the situation is really highly questionable and I'm meanwhile fearing to break my board finally, which is working really _extremely_ stable besides that." Since on my existing hardware combo I had ugly hardware effects very similar to what Stefan notes above and what this link describes, too, I took a note to possibly revisit this case, but no promises... 
So I decided to revisit this recently, and it turned out that things are a lot brighter than they used to be: while this card caused an awful amount of issues (PC lockup when loading USB EHCI driver, lockup when loading firewire-ohci, instant reboot on suspend or shutdown, ...), I now strongly suspect that at least the instant-reboot issue is an "exceedingly boringly normal" thing (plus..... further candy items below :):

http://www.tonymacx86.com/general-help/65531-gigabyte-uefi-bios-startech-firewire-pcie-card-sleep-wake-shutdown-restart-issues-thread-8.html documents (in very detailed form, yet with no final solution there) a very large number of issues with various Firewire cards and instant-re-wakeup of PC hardware.

HOWEVER, once I found that I could disable AWARD BIOS v6.00PG "Power Management Settings" -> "IRQ/Event Activity Detect" -> "PCI PowerOn wakeup", there was *NO* instant automatic re-wakeup occurring any more, which meant that I could finally keep this card installed permanently, for much more productive analysis.

For this particular issue, I now strongly suspect that re-enabling PCI PowerOn wakeup in BIOS and then manually fiddling with relevant bits in PCI CAP_PM range (this seems best described in ohci1206.pdf, interestingly - see PME_EN bit probably!) at one or more PCI sub devices of my combo card ought to be able to successfully keep my box from doing wakeup despite PCI PME wakeup in general (i.e., for all other cards!) being activated in BIOS (I really ought to have this verified ASAP). Then it's possibly only a matter of adding a PCI quirk for certain devices on this card, but perhaps for certain environment conditions only (e.g.: weird BIOS with ACPI PCI PME wakeup config issues?). Quite possibly ditto for many other PCI cards with PME wakeup issues.

keywords: PCI PME wakeup instant suspend resume poweron powerup.
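To make the "fiddling with relevant bits in PCI CAP_PM range" idea concrete, here is a minimal sketch that works on a plain 256-byte copy of a PCI configuration space rather than on real hardware. It walks the capability list, finds the Power Management capability (ID 0x01), and clears PME_En (bit 8 of PMCSR, at capability offset 4). The register offsets and bit masks follow the PCI power management layout; the function names find_pm_cap and disable_pme are hypothetical. Whether clearing PME_En actually stops the instant re-wakeup on this card is the unverified suspicion stated above, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS              0x06   /* low byte of the Status register */
#define PCI_STATUS_CAP_LIST     0x10   /* device implements a capability list */
#define PCI_CAPABILITY_LIST     0x34   /* pointer to the first capability */
#define PCI_CAP_ID_PM           0x01   /* Power Management capability ID */
#define PCI_PM_CTRL             4      /* PMCSR offset inside the capability */
#define PCI_PM_CTRL_PME_ENABLE  0x0100 /* PME_En */
#define PCI_PM_CTRL_PME_STATUS  0x8000 /* PME_Status */

/* Return the config-space offset of the PM capability, or -1 if absent. */
int find_pm_cap(const uint8_t cfg[256])
{
	assert(cfg != NULL);

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return -1;

	uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;

	/* Bounded walk so that a corrupt, looping list cannot hang us. */
	for (int ttl = 48; ttl > 0 && pos >= 0x40; ttl--) {
		if (cfg[pos] == PCI_CAP_ID_PM)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return -1;
}

/* Clear PME_En and PME_Status in the simulated PMCSR; 0 on success. */
int disable_pme(uint8_t cfg[256])
{
	int pos = find_pm_cap(cfg);

	if (pos < 0 || pos + PCI_PM_CTRL + 1 > 255)
		return -1;

	uint16_t pmcsr = (uint16_t)(cfg[pos + PCI_PM_CTRL] |
				    (cfg[pos + PCI_PM_CTRL + 1] << 8));

	pmcsr = (uint16_t)(pmcsr & ~PCI_PM_CTRL_PME_ENABLE);
	/*
	 * On real hardware PME_Status is cleared by writing 1 to it.
	 * This array is ordinary memory, so the bit is cleared directly.
	 */
	pmcsr = (uint16_t)(pmcsr & ~PCI_PM_CTRL_PME_STATUS);

	cfg[pos + PCI_PM_CTRL] = (uint8_t)(pmcsr & 0xff);
	cfg[pos + PCI_PM_CTRL + 1] = (uint8_t)(pmcsr >> 8);
	return 0;
}
```

On a live system the same read-modify-write would go through the kernel's or setpci's config accessors for each function of the combo card; the byte array here only stands in for those accesses.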
Also, after many hours of debugging, I accidentally (IIRC) figured out that skipping BIBimageValid.Set (i.e., ignoring config ROM stuff) will suddenly make the FireWire part of this ALi M5271 card work, i.e. have AT events (more or less... operating HDD via SBP2, etc.) properly without getting stuck in busReset screaming IRQ storm lockups (keywords, anyone?).

Since there were several reports that it worked with old ieee1394 driver, I just did an investigation and found that ieee1394 as of 2.6.36 did not even contain a define for BIBimageValid at all!!! (one, two, three: "blech!!").

This very moment I'm in the process of submitting a simple and precise quirk-based hotfix patch for this card.

If anyone could revisit inoperative cards (prime candidate: nForce2) with this BIBimageValid disabled one-trick pony, then that would be useful one would think.

Big thanks also go to the very informative discussion over at "[firewire] plugging device into nForce2 controller hangs system" https://bugzilla.redhat.com/show_bug.cgi?id=244576
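A quirk-based approach of the kind described here could look roughly like the following sketch. It is not the submitted patch: the flag QUIRK_SKIP_BIB_IMAGE_VALID, its value, the table layout and both helper functions are hypothetical. The PCI IDs 10b9:5253 are the ones the boot log in this report shows for function 0000:0c:07.4, and the HCControl bit values (linkEnable 0x00020000, BIBimageValid 0x80000000) are assumed from the OHCI register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCCONTROL_LINK_ENABLE      0x00020000u
#define HCCONTROL_BIB_IMAGE_VALID  0x80000000u

/* Hypothetical quirk flag, not an existing firewire-ohci quirk. */
#define QUIRK_SKIP_BIB_IMAGE_VALID 0x100u

struct quirk_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned int flags;
};

/* ALi M5253 as listed in the boot log of this report: [10b9:5253]. */
static const struct quirk_entry quirk_table[] = {
	{ 0x10b9, 0x5253, QUIRK_SKIP_BIB_IMAGE_VALID },
};

#define QUIRK_TABLE_LEN (sizeof(quirk_table) / sizeof(quirk_table[0]))

/* Return the quirk flags for a PCI vendor/device pair, 0 if none match. */
unsigned int lookup_quirks(const struct quirk_entry *table, size_t len,
			   uint16_t vendor, uint16_t device)
{
	assert(table != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		if (table[i].vendor == vendor && table[i].device == device)
			return table[i].flags;
	}
	return 0;
}

/* Bits to write to HCControlSet when enabling the link. */
uint32_t hccontrol_enable_bits(unsigned int quirks)
{
	uint32_t bits = HCCONTROL_LINK_ENABLE;

	/* Quirked controllers are enabled without BIBimageValid. */
	if (!(quirks & QUIRK_SKIP_BIB_IMAGE_VALID))
		bits |= HCCONTROL_BIB_IMAGE_VALID;
	return bits;
}
```

Keying the behaviour on a table entry keeps the change limited to the affected controller, so other cards would still get BIBimageValid set as before.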