Distribution: Gentoo
Hardware Environment: IBM Thinkpad T42, HP Omnibook 6100
Software Environment: Gnome
Problem Description: When I plug in a digital video camera (JVC GR-DVL9800) it appears on the bus when using a 2.4 kernel, but not in 2.6 (with neither devfs nor udev). I get an error in /var/log/messages:

Feb 4 16:32:24 localhost ohci1394: fw-host0: SelfID received, but NodeID invalid (probably new bus reset occurred): 0000FFC0

I recorded all sorts of information around that, and I'll attach it as a file. The kernel seems to recognize my FW card all right (from firewiredirect) but not the camera. I'd suspect the camera itself, but in 2.4 it works perfectly, at least as root. I can use kino, dvcont, or whatever there. I tried the stock kernel 2.6.10 to make sure it's not a Gentoo issue.
Steps to reproduce:
Created attachment 4521 [details] various outputs of /var/log/messages
Code was recently added to ieee1394-2.6 to auto-load protocol modules for DV cameras. It might appear in kernel 2.6.11. Most likely the camera is being recognized and is active on the bus despite the warning. On kernel 2.6 there is no /proc/bus/ieee1394/devices; look in /sys/bus/ieee1394 instead. Only root has read/write permissions to /dev/raw1394, which is perhaps why dvcont fails as a normal user. You should look at how to configure permissions for udev; hopefully, distros will start taking care of this. The Oops is not cool, but I do not have a way to test card hotplugging.
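To make the permission point concrete, here is a minimal sketch of a check that can be run as the normal user before trying dvcont. The function name check_dev_access is made up for illustration, and the default path /dev/raw1394 is an assumption (under udev the node may live elsewhere, e.g. /dev/raw/raw1394).

```shell
# check_dev_access [PATH]
# Hypothetical helper: report whether the current user can open PATH
# for reading and writing. PATH defaults to /dev/raw1394 (assumption).
# Returns 0 if accessible, 1 if present but not accessible, 2 if missing.
check_dev_access() {
    dev="${1:-/dev/raw1394}"
    if [ ! -e "$dev" ]; then
        echo "$dev: missing (is raw1394 loaded?)"
        return 2
    fi
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "$dev: read/write OK for $(id -un)"
        return 0
    fi
    echo "$dev: no read/write access for $(id -un)"
    ls -l "$dev"    # show owner, group and mode for the report
    return 1
}

# Example (not run here): check_dev_access /dev/raw1394
```

Note that root passes this check regardless of the file mode, so it only says something useful when run as the user who will actually start dvcont or kino.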
Thanks for the update. I snooped around /sys/bus/ but didn't find anything. (I'm in 2.6.8-gentoo devfs at the moment.) I tried: grep -r 008088 ieee1394/ and it came up empty. 008088 seems to be JVC's ID. I do find entries this way for the fw card itself, by grepping for 0015601. Is there something else I should do? or should I just sit tight and wait for 2.6.11?
There is no reason to wait for 2.6.11. When I wrote "It might appear in 2.6.11" I was referring to the new auto-loading code, which is really just pure convenience, not to support for your camera! I am disappointed you did not respond about permissions to /dev/raw1394. I want you to make sure raw1394 is loaded. Then, as root, run 'dvcont status' and tell me what happens. Please also send the output of 'ls /sys/bus/ieee1394/devices'.
Sorry. I'm in devfs, not udev. Of course I've been trying all this as root. Could there be a console permission problem? I'm working as a user in X, su'd to root.

localhost bus # uname -a
Linux localhost 2.6.8-gentoo-r3 #1 Wed Feb 2 21:51:05 IST 2005 i686 Intel(R) Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux
localhost bus # whoami
root
localhost bus # lsmod |grep 1394
dv1394 18508 0
raw1394 26668 0
video1394 15692 0
ohci1394 30724 2 dv1394,video1394
ieee1394 93684 5 dv1394,raw1394,video1394,ohci1394,sbp2
localhost bus # ls -l /dev/raw1394
crw------- 1 root root 171, 0 Jan 1 1970 /dev/raw1394
localhost bus # ls /sys/bus/ieee1394/devices/
00015601000002c1 fw-host0
localhost bus # dvcont status
Could not find any AV/C devices on the 1394 bus.
OK, thank you, I am now convinced you have a genuine problem internal to ieee1394 and not operator error. One more thing. Please show me:

cat /sys/bus/ieee1394/devices/fw-host0/node_count
cat /sys/bus/ieee1394/devices/fw-host0/nodes_active

Do you feel like trying an updated ieee1394? It is simple to do if you are already compiling your own kernel.

cd /path/to/src/linux/drivers
wget http://www.linux1394.org/viewcvs/ieee1394/trunk.tar.gz?view=tar
mv ieee1394 ieee1394-orig
tar xvzf trunk.tar.gz\?view\=tar
mv trunk ieee1394
cd ..
make
make modules_install
reboot
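The two cat commands requested above can be wrapped so that every host controller is reported at once. This is only a sketch: the function name fw_host_summary is hypothetical, and it assumes the ieee1394 sysfs layout named in this thread (fw-host* entries that each contain node_count and nodes_active).

```shell
# fw_host_summary [SYSFS_DIR]
# Hypothetical helper: print node_count and nodes_active for each
# fw-host* entry. SYSFS_DIR defaults to /sys/bus/ieee1394/devices,
# the path used in this thread.
fw_host_summary() {
    dir="${1:-/sys/bus/ieee1394/devices}"
    found=0
    for host in "$dir"/fw-host*; do
        [ -d "$host" ] || continue      # skips the unexpanded glob too
        found=1
        count=$(cat "$host/node_count" 2>/dev/null || echo "?")
        active=$(cat "$host/nodes_active" 2>/dev/null || echo "?")
        echo "$(basename "$host"): node_count=$count nodes_active=$active"
    done
    [ "$found" -eq 1 ] || echo "no fw-host* entries under $dir"
}

# Example (not run here): fw_host_summary
```

Running it once with only the card inserted and once with the camera attached makes it easy to see whether the counts change at all.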
Here's the output. I have the camera turned on and plugged into the pcmcia card. I'll try the updated 1394 tomorrow. (It's midnight here.) Does it matter on which kernel (.8, .9, .10)? I'm still using 2.6.8 because I've lost suspend to disk in the more recent ones, and the usb disk-on-key in 2.6.10. Those are separate bugs, though. Thanks for your help.

localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/node_count
0
localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/nodes_active
1
Bad news... the make fails, with output:

LD drivers/ieee1394/built-in.o
CC [M] drivers/ieee1394/ieee1394_core.o
CC [M] drivers/ieee1394/ieee1394_transactions.o
CC [M] drivers/ieee1394/hosts.o
CC [M] drivers/ieee1394/highlevel.o
drivers/ieee1394/highlevel.c:46: warning: type defaults to `int' in declaration of `DEFINE_RWLOCK'
drivers/ieee1394/highlevel.c:46: warning: parameter names (without types) in function declaration
drivers/ieee1394/highlevel.c:48: warning: type defaults to `int' in declaration of `DEFINE_RWLOCK'
drivers/ieee1394/highlevel.c:48: warning: parameter names (without types) in function declaration
drivers/ieee1394/highlevel.c: In function `hpsb_register_highlevel':
drivers/ieee1394/highlevel.c:224: error: `hl_irqs_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c:224: error: (Each undeclared identifier is reported only once
drivers/ieee1394/highlevel.c:224: error: for each function it appears in.)
drivers/ieee1394/highlevel.c: In function `__unregister_host':
drivers/ieee1394/highlevel.c:253: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_unregister_highlevel':
drivers/ieee1394/highlevel.c:287: error: `hl_irqs_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_allocate_and_register_addrspace':
drivers/ieee1394/highlevel.c:340: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_register_addrspace':
drivers/ieee1394/highlevel.c:399: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_unregister_addrspace':
drivers/ieee1394/highlevel.c:433: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_host_reset':
drivers/ieee1394/highlevel.c:526: error: `hl_irqs_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_iso_receive':
drivers/ieee1394/highlevel.c:539: error: `hl_irqs_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_fcp_request':
drivers/ieee1394/highlevel.c:553: error: `hl_irqs_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_read':
drivers/ieee1394/highlevel.c:569: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_write':
drivers/ieee1394/highlevel.c:611: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_lock':
drivers/ieee1394/highlevel.c:653: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_lock64':
drivers/ieee1394/highlevel.c:682: error: `addr_space_lock' undeclared (first use in this function)
drivers/ieee1394/highlevel.c: At top level:
drivers/ieee1394/highlevel.c:48: warning: `DEFINE_RWLOCK' declared `static' but never defined
make[2]: *** [drivers/ieee1394/highlevel.o] Error 1
make[1]: *** [drivers/ieee1394] Error 2
make: *** [drivers] Error 2

I tried this under the Gentoo 2.6.9-r13 and stock 2.6.10 kernel sources, and also tried make mrproper when it didn't work the first time.
I started looking for the answer to your question about version of kernel needed for ieee1394 trunk. Then, I was interrupted and delayed. Sorry, it requires 2.6.11. I understand if you prefer to wait for a general release, but I still find the problem very unusual. It is not a general problem shared by DV camera users.
I will try it in 2.6.11 over the next couple days. I'd blame the camera itself, except that it works fine in 2.4 kernels! it also works in windows but not in OS X (10.2 at least).
Hello, I have the same problem with an Apple isight on my amd64 system (Debian). Kernel is 2.6.11-rc3 and I applied the above mentioned trunk version of the ieee1394 drivers. Now a device /dev/video1394-0 is created. It should be /dev/video1394/0, shouldn't it? :) Coriander detects the card well (it has done it before). video1394 and raw1394 modules are now loaded automatically by hotplug.
Marko, how do you know you have a problem; what is the symptom? You said hotplug took care of loading raw1394 and video1394, so your camera must be recognized. Not having your device detected is a very different problem from missing sysfs and hotplug support. Regarding your question about the video1394 dev file naming, that is the new scheme based upon the 2.6 driver model. You must use a udev rule: KERNEL="video1394*", NAME="video1394/%n" to rename it to the scheme expected by libdc1394 and coriander.
Oh sorry, I did not tell the whole story: With kernel 2.6.10 or 2.6.9 the isight was recognized on the ieee1394 bus, but neither the video1394 nor the raw1394 module was loaded by hotplug when I plugged in the camera. With your latest driver version this now works perfectly. So in the end I might have had a different problem, but it seems to be fixed now.
Sorry. I'm still stuck. I brought the 2.6.11-rc2 kernel (from Gentoo's vanilla sources) and compiled it with the new ieee1394 drivers as you suggested. I'm using Gentoo's udev baselayout, which fills up /dev automatically. I get the same NodeID invalid error when I plug in the camera. It complains a bit about the fw card as well this time. Here is all the output I could think might be relevant. The only major difference I saw was that the node_count is 1 now, not 0. I could try to get it back into devfs if you think that might help.

localhost root # uname -a
Linux localhost 2.6.11-rc2 #1 Mon Feb 14 22:52:59 IST 2005 i686 Intel(R) Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux
localhost bus # whoami
root

tail /var/log/messages:

plug in FW card (firewiredirect.com)
Feb 15 22:37:55 localhost ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[11] MMIO=[40c04000-40c047ff] Max Packet=[2048]
Feb 15 22:37:55 localhost ieee1394.agent[13624]: ... no drivers for IEEE1394 product 0x/0x/0x
Feb 15 22:37:56 localhost ieee1394: Host added: ID:BUS[0-00:1023] GUID[00015601000002c1]
Feb 15 22:37:56 localhost ieee1394.agent[13646]: ... no drivers for IEEE1394 product 0x/0x/0x
Feb 15 22:38:00 localhost wait_for_sysfs[13601]: either wait_for_sysfs (udev 045) needs an update to handle the device '/class/ieee1394_protocol/video1394-0' properly (no device symlink) or the sysfs-support of your device's driver needs to be fixed, please report to <linux-hotplug-devel@lists.sourceforge.net>
Feb 15 22:38:00 localhost wait_for_sysfs[13603]: either wait_for_sysfs (udev 045) needs an update to handle the device '/class/ieee1394_protocol/dv1394-0' properly (no device symlink) or the sysfs-support of your device's driver needs to be fixed, please report to <linux-hotplug-devel@lists.sourceforge.net>

plug in DV camera:
Feb 15 22:38:29 localhost ohci1394: fw-host0: SelfID received, but NodeID invalid (probably new bus reset occurred): 0000FFC0

localhost bus # lsmod |grep 1394
dv1394 18956 0
raw1394 27372 0
video1394 16460 0
ohci1394 30852 2 dv1394,video1394
ieee1394 90548 5 dv1394,raw1394,video1394,ohci1394,sbp2
localhost bus # ls -l /dev/raw/raw1394
crw-rw---- 1 root disk 171, 0 Feb 16 2005 /dev/raw/raw1394
localhost grub # ls -l /dev/dv1394-0
crw-rw---- 1 root root 171, 32 Feb 15 22:38 /dev/dv1394-0
localhost bus # ls /sys/bus/ieee1394/devices/
00015601000002c1 fw-host0
localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/node_count
1
localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/nodes_active
1
On first look it appears like a problem in ohci1394. However, the code with the NodeID check did not change from Linux 2.4 to 2.6, so there must be something interfering from a higher level (e.g. bus reset from ieee1394) or lower level (controller setup or PCI setup, interrupt handling...). It seems the 'host->in_bus_reset' flag is switched differently in Linux 2.4 and 2.6. --- A problem with similar symptoms (but perhaps with a different cause) was reported for Linux 2.4.29 at linux1394-user in July 2005: http://marc.theaimsgroup.com/?t=112030364200001 For this one, a simple workaround was found: The camera has to be switched on and connected before ohci1394 is loaded. --- Michael, open a separate bug for the other problem you saw if it is not resolved yet: > if I pull the whole card out, with the camera still plugged in, I get ... > Unable to handle kernel NULL pointer dereference at virtual address 00000000
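For readers who want to see what the value printed with the "NodeID invalid" message contains, the sketch below splits it into fields. It assumes the OHCI NodeID register layout as I understand it from the OHCI 1.1 specification (bit 31 iDValid, bit 30 root, bits 15-6 busNumber, bits 5-0 nodeNumber); the function name decode_nodeid is made up for illustration.

```shell
# decode_nodeid HEX
# Hypothetical helper: split an OHCI NodeID register value (8 hex digits,
# no 0x prefix) into its fields. Bit layout is an assumption taken from
# the OHCI 1.1 specification, not from this thread.
decode_nodeid() {
    val=$((0x$1))
    id_valid=$(( (val >> 31) & 1 ))     # 1 = node ID is valid
    root=$(( (val >> 30) & 1 ))         # 1 = this node is bus root
    bus=$(( (val >> 6) & 0x3ff ))       # 10-bit bus number
    node=$(( val & 0x3f ))              # 6-bit physical node number
    echo "iDValid=$id_valid root=$root busNumber=$bus nodeNumber=$node"
}

decode_nodeid 0000FFC0
# prints: iDValid=0 root=0 busNumber=1023 nodeNumber=0
```

So the logged 0000FFC0 is bus 1023 (the local bus), node 0, with the valid bit clear, which is exactly the condition the driver complains about.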
I did some more checking but the problem remains. I tried a 2.6.12 kernel (Gentoo -r6) as well as .8 and .10. I read the bug you referenced and tried all possible orders of booting without modules, plugging the camera in running or not, and loading ohci1394. Amusingly, or not, when hald is installed but stopped, ohci1394 and ieee1394 are still loaded automatically when I plug in the card. (I took them out of modules.autoconf.d/.) raw1394 does not come in automatically with hald running or not, probably because the system doesn't recognize the device as a video camera.

There might be a useful hint from hal about the bus reset. When hald and dbus daemons are running and I unplug the camera (not the fw card) the usb bus also resets. The fw is through pcmcia while the usb is built in.

Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -75 received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84 received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84 received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84 received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84 received
Jul 30 23:44:17 localhost hub 3-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
Jul 30 23:44:17 localhost usb 3-2: USB disconnect, address 2
Jul 30 23:44:17 localhost hal.hotplug[15991]: DEVPATH is not set
Jul 30 23:44:17 localhost usb 3-2: new low speed USB device using uhci_hcd and address 3
Jul 30 23:44:18 localhost input: USB HID v1.00 Mouse [Logitech USB-PS/2 Trackball] on usb-0000:00:1d.1-2
Jul 30 23:44:18 localhost hal.hotplug[16030]: DEVPATH is not set
Jul 30 23:44:19 localhost ohci1394: fw-host0: SelfID received, but NodeID invalid (probably new bus reset occurred): 0000FFC0

At the end, I get back to the NodeID invalid. Updating udev (to -058) didn't help. I'll send another bug about the unplugging problem, as you request.
I need to recheck all the conditions precisely. In the meantime I still reboot to 2.4 to see the camera. Is there something else to try? I don't know where to look for the 'host->in_bus_reset' flag.
> Amusingly, or not, when hald is installed but stopped, ohci1394
> and ieee1394 are still loaded automatically when I plug in the
> card.

I am not familiar with hald. CardBus cards are treated like PCI cards. Card insertion triggers a hotplug event, and /etc/hotplug/pci.agent may have been called.

> When hald and dbus daemons are running and I unplug the camera
> (not the fw card) the usb bus also resets. The fw is through
> pcmcia while the usb is built in.

Do the CardBus bridge and USB controller share an interrupt? What is in /proc/interrupts?

> I don't know where to look for the 'host->in_bus_reset' flag.

I was just thinking out loud. It is a state variable in the ieee1394 drivers.
Yes, you're right. They're on the same interrupt, [11]. So the whole thing resets.

localhost root # more /proc/interrupts
           CPU0
  0:    8403973          XT-PIC  timer
  1:       9100          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  7:          2          XT-PIC  parport0
  9:       2524          XT-PIC  acpi
 11:     458933          XT-PIC  yenta, yenta, ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd, ipw2200, Intel 82801DB-ICH4, eth0
 12:       1234          XT-PIC  i8042
 14:       9986          XT-PIC  ide0
 15:         11          XT-PIC  ide1
NMI:          0
ERR:          0
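Spotting shared interrupt lines by eye gets tedious on larger listings, so here is a rough sketch that filters /proc/interrupts for a given handler. The function name irq_sharers is hypothetical, and the filter simply assumes that a comma in the handler column means more than one handler is registered on that IRQ.

```shell
# irq_sharers NAME [FILE]
# Hypothetical helper: print the /proc/interrupts lines on which handler
# NAME appears together with at least one other handler. A comma in the
# line is taken as the sign of sharing (assumption about the format).
# Returns non-zero if no such line exists.
irq_sharers() {
    name="$1"
    file="${2:-/proc/interrupts}"
    grep -w -- "$name" "$file" | grep ','
}

# Example (not run here): irq_sharers yenta
```

One caveat: two instances of the same driver on one line (such as "yenta, yenta" for a two-slot CardBus bridge) also count as sharing under this simple test.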
Then it seems certain interrupt signals from the FireWire controller are always routed to the wrong interrupt handler. There is nothing that the FireWire drivers can do about this. It has to be fixed somewhere in the PCI subsystem or below. IRQ routing is obviously more successful under Linux 2.4 than under 2.6 on your machine.
PS: Shared interrupts are usually no problem; it puzzles me that they may still be handled wrong in current Linux versions.
It's also puzzling that the same symptom appears on two very different machines, HP Omnibook 6100 and IBM Thinkpad T42. Both are high-end laptops of their day. The /proc/interrupts was from the IBM. The hard disk died Friday on the HP so I'll check it again as soon as the replacement arrives. Perhaps I can fix the problem in the BIOS (suggestions welcome!!!), and I didn't think to try it in Knoppix.
a little progress... I moved as much as I could off of interrupt 11. Also I made a link manually from the new udev style /dev/raw/raw1394 to /dev/raw1394. (Without that dvcont doesn't see raw1394 loaded!) Then once, and only once, it actually worked. I was using the 2.6.11_rc2 kernel. I had ieee1394, ohci1394, and raw1394 loaded by hand. I was so pleased that I rebooted to try again... Nothing could restore the success. I tried with the link to raw1394, with mknod, with the camera on and off and all permutations of the order of loading modules. dvcont always replied 'Could not find...' Since once it did work, that's enough to prove the possibility and maybe grounds for some optimism. Supposing that I'll get it to work again, what other output would help in the diagnosis?
Created attachment 5786 [details] details from 2.6.12
Created attachment 5787 [details] details from 2.4.28
a little more "progress": I found a procedure that seems to work, but only once per boot! After that it refuses. I reboot, and it works again, once. I suspect that the trouble is in the new hotplug agent, ieee.hotplug via udevsend, rather than hal.hotplug in 2.4. The trick is NOT to load the 1394 modules, or to unload them specifically, and then to modprobe video1394. That loads all the other modules and seems to work much of the time. Killing udevd just beforehand seems to help too. As far as I could see ieee1394 and raw1394 don't get in the way, and the problem centers on ohci1394. Even in 2.4, if I unplug or turn off the camera and want it to start again, I have to rmmod all the modules (except ieee1394) beforehand. I've attached 2 little records with /var/log/messages output. does this help the diagnosis?
Your logs show that it worked fine in Linux 2.4.20 but there are problems in Linux 2.4.28. Is this perhaps a VIA VT6306 rev43 controller? (lspci) We have a report that this controller worked with 2.4.20 but not with 2.4.21 and later. http://marc.theaimsgroup.com/?l=linux1394-devel&t=114641862900002
*** Bug 6945 has been marked as a duplicate of this bug. ***
Hi Stefan, how are you? How is it going with this bug? As you already know, I have a similar problem with an SBP2 device (Bug 6945, marked as a dup of this one), so I was wondering if there is any update or when we can expect one. Just curious, nothing more, since 2.6.19 is out with a lot of 1394 updates, but the problem still seems to be there. Was it supposed to be fixed? Thanks, bye
I suspect that bug 6070 may play a role in this bug. I am currently slowly working on bug 6070.
Re comment #28: Kristian H
Michael, piergiorgio, do you still have the hardware, and can you give an update? If possible, test with firewire-ohci in Linux 2.6.22 or (preferably) a later kernel and post whether a file for the controller and two files for the camera or two files for the disk are created in /sys/bus/firewire/devices/. I ask because firewire-ohci handles bus resets very differently from ohci1394, as mentioned.
(In reply to comment #31)
> Michael, piergiorgio,
> do you still have the hardware and give an update? If possible, test with
> firewire-ohci in Linux 2.6.22 or (preferrably) a later kernel and post whether
> a file for the controller and two files for the camera or two files for the
> disk are created in /sys/bus/firewire/devices/. I ask because firewire-ohci
> handles bus resets very differently from ohci1394, as mentioned.

To be honest, I do not remember exactly what this bug was... Anyway, I have right now two PCs with firewire, both with Fedora 8 up-to-date. So, a quick check is with the Fedora kernel, namely 2.6.23.15-137.fc8. One PC is NVIDIA based, with TI OHCI 1.1 firewire; the other is an old Intel based one (P-III), with a TI PCI card, OHCI 1.0.

The OHCI 1.1 has two new files for the camera, plus the controller one, but, as per the Fedora bug, it seems impossible to capture anything, while the control works. On the OHCI 1.0, it seems nothing happens, as if the camera were not connected. I'll have to double check whether any new file appears in /sys/bus/firewire/devices.

SBP2 devices seem to work fine on the OHCI 1.1 PC (except for the libvolume_id story...); the other I'll have to check.

Do you recommend testing with vanilla kernels? Or are the Fedora ones enough?

pg
Fedora 8 kernels should be fine for purposes of this bug here, i.e. "SelfID received, but NodeID invalid", unless we would get to a point where there would be patches for you to test. So this bug does at least not hit on your TI OHCI 1.1 + firewire-ohci. On the other hand, you reported bug 6945 for the TI OHCI 1.0 controller. I would like to know about that one: - messages in dmesg when firewire-ohci is loaded, and when the camera is plugged in/ switched on (and whether the sysfs files = actually symlinks appear); - whether the SBP-2 device works on the OHCI 1.0 controller; if not, same questions as about the camera. Thanks.
(In reply to comment #33)
> Fedora 8 kernels should be fine for purposes of this bug here, i.e. "SelfID
> received, but NodeID invalid", unless we would get to a point where there would
> be patches for you to test.
>
> So this bug does at least not hit on your TI OHCI 1.1 + firewire-ohci. On the
> other hand, you reported bug 6945 for the TI OHCI 1.0 controller. I would like
> to know about that one:
> - messages in dmesg when firewire-ohci is loaded, and when the camera is
> plugged in/ switched on (and whether the sysfs files = actually symlinks
> appear);
> - whether the SBP-2 device works on the OHCI 1.0 controller; if not, same
> questions as about the camera.

Here we go:

1) modprobe firewire-ohci (after rmmod, since the module is auto loaded)

ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
firewire_core: created device fw0: GUID 08002856000008da, S400

2) camera on, the two new files appear in /sys/..., but:

firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: created device fw1: GUID 08004601029441d8, S100
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress

... and this goes on forever...

3) SBP2 HD plugged:

firewire_core: giving up on config rom for node id ffc0
firewire_ohci: node ID not valid, new bus reset in progress

This is stuck here, i.e. only these lines are printed and no new files appear in /sys/...; the device does not show up in any way.

Following is the lspci -vv of the 1394 for this PC:

02:09.0 FireWire (IEEE 1394): Texas Instruments TSB12LV23 IEEE-1394 Controller (prog-if 10 [OHCI])
        Subsystem: Texas Instruments Unknown device 8010
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (750ns min, 1000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at ed800000 (32-bit, non-prefetchable) [size=2K]
        Region 1: Memory at ed000000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 1
                Flags: PMEClk- DSI- D1- D2+ AuxCurrent=0mA PME(D0-,D1-,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci

Final note: a couple of times, a hotplug of a firewire device resulted in a complete PC lock-up (with both PCs), without any log reported and with only a physical reset possible (no sysrq).

Hope this helps.

pg
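To put a number on "this goes on forever", the messages can simply be counted in a saved log. The sketch below matches both wordings seen in this thread (the ohci1394 one and the firewire_ohci one); the function name count_nodeid_errors and the default log path are assumptions.

```shell
# count_nodeid_errors [LOGFILE]
# Hypothetical helper: count lines carrying either form of the message
# quoted in this thread:
#   ohci1394:      "SelfID received, but NodeID invalid ..."
#   firewire_ohci: "node ID not valid, new bus reset in progress"
# LOGFILE defaults to /var/log/messages (assumption; a file written with
# 'dmesg > file' works as well). Prints the count; exit status is
# non-zero when the count is 0, as with grep -c.
count_nodeid_errors() {
    log="${1:-/var/log/messages}"
    grep -c -E 'NodeID invalid|node ID not valid' "$log"
}

# Example (not run here):
#   dmesg > /tmp/fw.log && count_nodeid_errors /tmp/fw.log
```

Taking the count twice, a few seconds apart, shows whether the bus reset loop is still running or has stalled.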
OK, then the new bus reset handling scheme does not work, plus there is no dependency on bug 6070.
On the other hand, there is at least a brief window in which the bus is in so far operational that multiple transactions to the camera are successfully completed, leading to "created device fw1: GUID 08004601029441d8, S100". With the HDD, that window does perhaps exist too but is evidently too short to complete reading the HDD's configuration ROM.
Note to self: Try long reset instead of short reset in the bus manager code? The ieee1394 driver already uses a long reset in the IRM code though, so that might be futile.
Note to self: NodeID is a register in the SCLK domain. See http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11929/focus=11940
Piergiorgio, please check with one of the following kernels (whichever is easiest for you to install) whether the "node ID not valid" problem persists: - http://koji.fedoraproject.org/packages/kernel/2.6.24.4/63.fc8/ - vanilla sources plus patchkit from http://me.in-berlin.de/~s5r6/linux1394/updates/ - git cloned sources plus git pull from master branch of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git
(In reply to comment #39)
> please check with one of the following kernels (whichever is easiest for you to
> install) whether the "node ID not valid" problem persists:
> - http://koji.fedoraproject.org/packages/kernel/2.6.24.4/63.fc8/

I went for this one. We are talking about the OHCI 1.0 machine, I guess.

For the camera, after removing firewire-ohci and re-inserting it, I see the following:

ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
firewire_core: created device fw0: GUID 08002856000008da, S400
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: giving up on config ROM for node id ffc0 (returned 17)

No file creation in /sys/, that is, only fw0 is there.

With the SBP2:

ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
firewire_core: created device fw0: GUID 08002856000008da, S400
firewire_ohci: node ID not valid, new bus reset in progress

and it stays here, without going any further. No file in /sys/, except fw0.

Hope this helps,

pg
One thing I forgot (as usual), do you need the output of /var/log/messages or dmesg is enough? pg
> do you need the output of /var/log/messages or dmesg is enough? In this case, either one is OK.
Reply-To: stefanr@s5r6.in-berlin.de

(Cc'ing linux1394-devel)

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=4172
>
> ------- Comment #40 from piergiorgio.sartor@nexgo.de 2008-03-29 02:24 -------
> (In reply to comment #39)
>> please check with one of the following kernels (whichever is easiest for you to
>> install) whether the "node ID not valid" problem persists:
>> - http://koji.fedoraproject.org/packages/kernel/2.6.24.4/63.fc8/
>
> I went for this one.
> We are talking about the OHCI 1.0 machine, I guess.

I.e. TSB12LV23

> For the camera, after removing firewire-ohci and
> re-inserting it, I see the following:
>
> ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
> firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
> firewire_core: created device fw0: GUID 08002856000008da, S400
> firewire_ohci: node ID not valid, new bus reset in progress
> firewire_core: phy config: card 1, new root=ffc1, gap_count=5
> firewire_ohci: node ID not valid, new bus reset in progress
> firewire_core: giving up on config ROM for node id ffc0 (returned 17)
>
> No file creation in /sys/, that is only fw0 is there.
>
> With the SBP2:
>
> ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
> firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
> firewire_core: created device fw0: GUID 08002856000008da, S400
> firewire_ohci: node ID not valid, new bus reset in progress
>
> and it stays here, without going any further.
>
> No file in /sys/, except fw0.
>
> Hope this helps,

Please also provide kernel messages from the following:

# echo -1 > /sys/module/firewire_ohci/parameters/debug

Then plug in the camera, then the disk (only one at a time).

Thanks,
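Since the debug output is noisy, it can be handy to remember the previous setting and put it back after the test. The following is only a sketch built around the echo command above; the function name set_fw_debug is made up, and nothing beyond the parameter path and the value -1 comes from this thread.

```shell
# set_fw_debug VALUE [PARAM_FILE]
# Hypothetical helper: write VALUE to the firewire_ohci debug parameter
# and print the value it had before, so it can be restored afterwards.
# PARAM_FILE defaults to the sysfs path quoted in this thread.
set_fw_debug() {
    value="$1"
    param="${2:-/sys/module/firewire_ohci/parameters/debug}"
    if [ ! -w "$param" ]; then
        echo "cannot write $param (module not loaded, or not root?)" >&2
        return 1
    fi
    old=$(cat "$param")
    printf '%s\n' "$value" > "$param"
    printf '%s\n' "$old"
}

# Typical session, as root (not run here):
#   old=$(set_fw_debug -1)          # -1 enables all debug output
#   ... plug in one device, then:   dmesg | tail -n 50
#   set_fw_debug "$old" >/dev/null  # restore the previous setting
```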
(In reply to comment #43)
> > We are talking about the OHCI 1.0 machine, I guess.
>
> I.e. TSB12LV23

Yep, exactly.

> Please also provide kernel messages from the following:
>
> # echo -1 > /sys/module/firewire_ohci/parameters/debug
>
> Then plug in the camera, then the disk (only one at a time).

For the camera I can see:

AR evt_bus_reset, link internal
AR evt_bus_reset, link internal
firewire_ohci: node ID not valid, new bus reset in progress
AR evt_bus_reset, link internal
firewire_ohci: 2 selfIDs, generation 4
selfID 0: 807f0882, phy 0 [p..] S100 gc=63 +0W Lci
selfID 0: 817f8c74, phy 1 [-c-] S400 gc=63 -3W Lc
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
AT ack_complete, phy config packet, 01c50000
AR evt_bus_reset, link internal
AR evt_bus_reset, link internal
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: giving up on config ROM for node id ffc0 (returned 17)

Detaching the camera:

AR evt_bus_reset, link internal
firewire_ohci: 1 selfIDs, generation 7
selfID 0: 807f8c56, phy 0 [---] S400 gc=63 -3W Lci

Attaching the SBP2 (this is the Symbios one; I also have 2 from Initio which I have not tested in a long time):

AR evt_bus_reset, link internal
AR evt_bus_reset, link internal
firewire_ohci: node ID not valid, new bus reset in progress

That's it, here it stops.

pg
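The selfID lines in this debug output can be cross-checked by decoding the raw quadlet by hand. The sketch below does that for self-ID packet zero, assuming the field layout as I understand it from IEEE 1394a (phy_ID in bits 29-24, L in bit 22, gap_cnt in bits 21-16, sp in bits 15-14, c in bit 11, pwr in bits 10-8, three 2-bit port fields in bits 7-2, i in bit 1). The function name decode_selfid and the output format are made up for illustration.

```shell
# decode_selfid HEX
# Hypothetical helper: decode self-ID packet zero (8 hex digits, no 0x
# prefix). The bit layout is an assumption based on IEEE 1394a.
decode_selfid() {
    q=$((0x$1))
    phy=$(( (q >> 24) & 0x3f ))
    link=$(( (q >> 22) & 1 ))           # L: link layer active
    gap=$(( (q >> 16) & 0x3f ))
    case $(( (q >> 14) & 3 )) in
        0) speed=S100 ;;
        1) speed=S200 ;;
        2) speed=S400 ;;
        *) speed=other ;;
    esac
    contender=$(( (q >> 11) & 1 ))      # c: bus manager contender
    pwr=$(( (q >> 8) & 7 ))             # raw power class code
    ports=""
    for bitpos in 6 4 2; do             # port 0, port 1, port 2
        case $(( (q >> bitpos) & 3 )) in
            0) ports="${ports}." ;;     # port not present
            1) ports="${ports}-" ;;     # present, not connected
            2) ports="${ports}p" ;;     # connected to parent
            3) ports="${ports}c" ;;     # connected to child
        esac
    done
    initiator=$(( (q >> 1) & 1 ))       # i: initiated the bus reset
    echo "phy=$phy L=$link gap_count=$gap speed=$speed c=$contender pwr=$pwr ports=[$ports] i=$initiator"
}

decode_selfid 807f0882
# prints: phy=0 L=1 gap_count=63 speed=S100 c=1 pwr=0 ports=[p..] i=1
decode_selfid 817f8c74
# prints: phy=1 L=1 gap_count=63 speed=S400 c=1 pwr=4 ports=[-c-] i=0
```

Both results agree with the driver's own rendering in the log above ("[p..] S100 gc=63 ... Lci" and "[-c-] S400 gc=63 ... Lc"), so the self-ID data itself looks sane; the failure is in what follows the self-ID phase.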
Reply-To: stefanr@s5r6.in-berlin.de

(quoting in full for linux1394-devel)

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=4172
> ------- Comment #44 from piergiorgio.sartor@nexgo.de 2008-03-29 04:53 -------
> (In reply to comment #43)
>
>>> We are talking about the OHCI 1.0 machine, I guess.
>> I.e. TSB12LV23
>
> Yep, exactly.
>
>> Please also provide kernel messages from the following:
>>
>> # echo -1 > /sys/module/firewire_ohci/parameters/debug
>>
>> Then plug in the camera, then the disk (only one at a time).
>
> For the camera I can see:
>
> AR evt_bus_reset, link internal
> AR evt_bus_reset, link internal
> firewire_ohci: node ID not valid, new bus reset in progress
> AR evt_bus_reset, link internal
> firewire_ohci: 2 selfIDs, generation 4
> selfID 0: 807f0882, phy 0 [p..] S100 gc=63 +0W Lci
> selfID 0: 817f8c74, phy 1 [-c-] S400 gc=63 -3W Lc
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> AT ack_complete, phy config packet, 01c50000
> AR evt_bus_reset, link internal
> AR evt_bus_reset, link internal
> firewire_ohci: node ID not valid, new bus reset in progress
> firewire_core: giving up on config ROM for node id ffc0 (returned 17)
>
> Detaching the camera:
>
> AR evt_bus_reset, link internal
> firewire_ohci: 1 selfIDs, generation 7
> selfID 0: 807f8c56, phy 0 [---] S400 gc=63 -3W Lci
>
> Attaching the SBP2 (this is the Symbios one, I've also 2 from Initio I did not
> tested since long time):
>
> AR evt_bus_reset, link internal
> AR evt_bus_reset, link internal
> firewire_ohci: node ID not valid, new bus reset in progress
>
> That's it, here it stops.

Thanks. Strange that
- no further self ID complete event follows (or at least, none is detected by the driver) after the "node ID not valid" condition,
- it happens after the 2nd or the 1st "node ID not valid" condition, depending on whether the camera or the disk is connected.

The disk is perhaps one of the kind which internally has two PHYs, so this may be a possible cause for the difference to the camera. There are always(?) two evt_bus_reset before "node ID not valid", so we may have perhaps missed a proper self ID complete event due to some timing(?) issue.
Re comment 45: We surely didn't miss events. Rather, the chip seems to have a quirk regarding the NodeID.iDValid bit.

Re previous comments: We totally forgot to ask Michael for the lspci output of the card. The FireWireDirect cards which are listed in linux1394.org's database are TI TSB12LV26 based though, which is the successor to TSB12LV23. Newer TI 1394 controller generations (TSB43AB22, TSB82AA2...) all have another bug related to how they report bus resets to software (bus_reset_packet_quirk flag in fw-ohci.c), so it'd be no surprise if TSB12LV23 and TSB12LV26 got it even less right.

Since these chips seemed to work under Linux 2.4, there may be hope. Perhaps the 2.4 kernel had higher scheduling latency of ohci1394's self ID complete event handler. So one idea to try would be to re-schedule the tasklet for a number of retries in the hope that iDValid eventually lights up.

piergiorgio, would it make sense if I prepared such a patch for you?
Ping, Piergiorgio!
Oops, I think I missed some emails here, thanks for the reminder. Anyway, it seems to me I was able to use dvgrab successfully, maybe with the new PC. I was hit by the 4GiB HW bug of the chipset, but since this was fixed, everything was quite smooth. So, I guess a patch for this issue is of no use to me. Thanks again, anyway, bye, pg