Bug 4172 - firewire: video camera not recognized in 2.6; works in 2.4
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: i386 Linux
Importance: P2 normal
Assignee: Stefan Richter
URL:
Keywords:
Duplicates: 6945
Depends on:
Blocks:
 
Reported: 2005-02-05 14:46 UTC by Michael Elbaum
Modified: 2010-01-06 21:49 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
various outputs of /var/log/messages (7.92 KB, text/plain)
2005-02-05 14:46 UTC, Michael Elbaum
details from 2.6.12 (4.73 KB, text/plain)
2005-08-27 14:18 UTC, Michael Elbaum
details from 2.4.28 (2.25 KB, text/plain)
2005-08-27 14:19 UTC, Michael Elbaum

Description Michael Elbaum 2005-02-05 14:46:02 UTC
Distribution: Gentoo
Hardware Environment: IBM Thinkpad T42, HP Omnibook 6100
Software Environment: Gnome
Problem Description: 

When I plug in a digital video camera (JVC GR-DVL9800) it appears on the bus
when using a 2.4 kernel, but not in 2.6 (neither devfs nor udev). I get an error
in /var/log/messages:
Feb  4 16:32:24 localhost ohci1394: fw-host0: SelfID received, but NodeID
invalid (probably new bus reset occurred): 0000FFC0

I recorded all sorts of information around that, and I'll attach it as a file.
The kernel seems to recognize my FW card alright (from firewiredirect) but not
the camera. I'd suspect the camera itself, but in 2.4 it works perfectly, at
least as root. I can use kino, dvcont, or whatever there.

I tried the stock kernel 2.6.10 to make sure it's not a Gentoo issue.


Steps to reproduce:
Comment 1 Michael Elbaum 2005-02-05 14:46:59 UTC
Created attachment 4521
various outputs of /var/log/messages
Comment 2 Dan Dennedy 2005-02-11 07:56:58 UTC
Code was recently added to ieee1394-2.6 to auto-load protocol modules for DV
cameras. It might appear in kernel 2.6.11. 

Most likely the camera is being recognized and active on the bus despite the
warning. On kernel 2.6, there is no /proc/bus/ieee1394/devices; you can look in
/sys/bus/ieee1394. Only root has read/write permissions to /dev/raw1394 which is
perhaps why dvcont fails as a normal user. You should look at how to configure
permissions for udev; hopefully, distros will start taking care of this.
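
The permission point can be checked quickly. Below is a minimal sketch in Python, assuming the device node is at /dev/raw1394 (udev setups may put it elsewhere, e.g. /dev/raw/raw1394). The helper name `describe_access` is made up for illustration and is not part of any 1394 tool.

```python
import os
import stat


def describe_access(path):
    """Return a one-line summary of the mode/owner of `path` and whether
    the current user may open it for both reading and writing."""
    st = os.stat(path)
    mode = stat.filemode(st.st_mode)  # e.g. 'crw-------' for a root-only char device
    usable = os.access(path, os.R_OK | os.W_OK)
    return f"{path}: {mode} uid={st.st_uid} gid={st.st_gid} usable_by_me={usable}"


if __name__ == "__main__":
    # Both paths are assumptions; adjust to wherever devfs/udev creates the node.
    for node in ("/dev/raw1394", "/dev/raw/raw1394"):
        if os.path.exists(node):
            print(describe_access(node))
        else:
            print(f"{node}: not present (is raw1394 loaded?)")
```

If `usable_by_me` is False for a normal user but True for root, tools such as dvcont would be expected to work only as root until the udev permissions are adjusted.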

The Oops is not cool, but I do not have a way to test card hotplugging.
Comment 3 Michael Elbaum 2005-02-11 12:50:57 UTC
Thanks for the update. I snooped around /sys/bus/ but didn't find anything. (I'm
in 2.6.8-gentoo devfs at the moment.) I tried:
grep -r 008088 ieee1394/
and it came up empty. 008088 seems to be JVC's ID. I do find entries this way
for the fw card itself, by grepping for 0015601.
Is there something else I should do? 
or should I just sit tight and wait for 2.6.11?
Comment 4 Dan Dennedy 2005-02-11 13:24:49 UTC
There is no reason to wait for 2.6.11. When I wrote "It might appear in 2.6.11"
I was referring to the new code, which is really just pure convenience, not your
camera!

I am disappointed you did not respond about permissions to /dev/raw1394. I want
you to make sure raw1394 is loaded. Then, as root, run 'dvcont status' and tell
me what happens.

Please send the output of 'ls /sys/bus/ieee1394/devices'
Comment 5 Michael Elbaum 2005-02-11 13:33:25 UTC
Sorry. I'm in devfs, not udev. Of course I've been trying all this as root.
Could there be a console permission problem? I'm working as a user in X, su'd to
root.

localhost bus # uname -a
Linux localhost 2.6.8-gentoo-r3 #1 Wed Feb 2 21:51:05 IST 2005 i686 Intel(R)
Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux
localhost bus # whoami
root
localhost bus # lsmod |grep 1394
dv1394                 18508  0
raw1394                26668  0
video1394              15692  0
ohci1394               30724  2 dv1394,video1394
ieee1394               93684  5 dv1394,raw1394,video1394,ohci1394,sbp2
localhost bus # ls -l /dev/raw1394
crw-------  1 root root 171, 0 Jan  1  1970 /dev/raw1394
localhost bus # ls /sys/bus/ieee1394/devices/
00015601000002c1  fw-host0
localhost bus # dvcont status
Could not find any AV/C devices on the 1394 bus.
Comment 6 Dan Dennedy 2005-02-11 13:49:01 UTC
OK, thank you, I am now convinced you have a genuine problem internal to
ieee1394 and not operator error.

One more thing. Please show me:
cat /sys/bus/ieee1394/devices/fw-host0/node_count
cat /sys/bus/ieee1394/devices/fw-host0/nodes_active

Do you feel like trying an updated ieee1394? It is simple to do if you are already
compiling your own kernel.

cd /path/to/src/linux/drivers
wget http://www.linux1394.org/viewcvs/ieee1394/trunk.tar.gz?view=tar
mv ieee1394 ieee1394-orig
tar xvzf trunk.tar.gz\?view\=tar
mv trunk ieee1394
cd ..
make
make modules_install
reboot
Comment 7 Michael Elbaum 2005-02-11 13:59:28 UTC
Here's the output. I have the camera turned on and plugged into the pcmcia card.
I'll try the updated 1394 tomorrow. (it's midnight here) Does it matter on
which kernel (.8,.9,.10)? I'm still using 2.6.8 because I've lost suspend to
disk in the more recent ones, and the usb disk-on-key in 2.6.10. Those are
separate bugs, though.

Thanks for your help.


localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/node_count
0
localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/nodes_active
1
Comment 8 Michael Elbaum 2005-02-12 12:52:51 UTC
bad news... the make fails, with output:


  LD      drivers/ieee1394/built-in.o
  CC [M]  drivers/ieee1394/ieee1394_core.o
  CC [M]  drivers/ieee1394/ieee1394_transactions.o
  CC [M]  drivers/ieee1394/hosts.o
  CC [M]  drivers/ieee1394/highlevel.o
drivers/ieee1394/highlevel.c:46: warning: type defaults to `int' in declaration
of `DEFINE_RWLOCK'
drivers/ieee1394/highlevel.c:46: warning: parameter names (without types) in
function declaration
drivers/ieee1394/highlevel.c:48: warning: type defaults to `int' in declaration
of `DEFINE_RWLOCK'
drivers/ieee1394/highlevel.c:48: warning: parameter names (without types) in
function declaration
drivers/ieee1394/highlevel.c: In function `hpsb_register_highlevel':
drivers/ieee1394/highlevel.c:224: error: `hl_irqs_lock' undeclared (first use in
this function)
drivers/ieee1394/highlevel.c:224: error: (Each undeclared identifier is reported
only once
drivers/ieee1394/highlevel.c:224: error: for each function it appears in.)
drivers/ieee1394/highlevel.c: In function `__unregister_host':
drivers/ieee1394/highlevel.c:253: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_unregister_highlevel':
drivers/ieee1394/highlevel.c:287: error: `hl_irqs_lock' undeclared (first use in
this function)
drivers/ieee1394/highlevel.c: In function `hpsb_allocate_and_register_addrspace':
drivers/ieee1394/highlevel.c:340: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_register_addrspace':
drivers/ieee1394/highlevel.c:399: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `hpsb_unregister_addrspace':
drivers/ieee1394/highlevel.c:433: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_host_reset':
drivers/ieee1394/highlevel.c:526: error: `hl_irqs_lock' undeclared (first use in
this function)
drivers/ieee1394/highlevel.c: In function `highlevel_iso_receive':
drivers/ieee1394/highlevel.c:539: error: `hl_irqs_lock' undeclared (first use in
this function)
drivers/ieee1394/highlevel.c: In function `highlevel_fcp_request':
drivers/ieee1394/highlevel.c:553: error: `hl_irqs_lock' undeclared (first use in
this function)
drivers/ieee1394/highlevel.c: In function `highlevel_read':
drivers/ieee1394/highlevel.c:569: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_write':
drivers/ieee1394/highlevel.c:611: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_lock':
drivers/ieee1394/highlevel.c:653: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: In function `highlevel_lock64':
drivers/ieee1394/highlevel.c:682: error: `addr_space_lock' undeclared (first use
in this function)
drivers/ieee1394/highlevel.c: At top level:
drivers/ieee1394/highlevel.c:48: warning: `DEFINE_RWLOCK' declared `static' but
never defined
make[2]: *** [drivers/ieee1394/highlevel.o] Error 1
make[1]: *** [drivers/ieee1394] Error 2
make: *** [drivers] Error 2


I tried this under the Gentoo 2.6.9-r13 and stock 2.6.10 kernel sources, and
also tried make mrproper when it didn't work the first time.
Comment 9 Dan Dennedy 2005-02-12 15:12:40 UTC
I started looking for the answer to your question about version of kernel needed
for ieee1394 trunk. Then, I was interrupted and delayed. Sorry, it requires
2.6.11. I understand if you prefer to wait for a general release, but I still
find the problem very unusual. It is not a general problem shared by DV camera
users. 
Comment 10 Michael Elbaum 2005-02-13 12:19:49 UTC
I will try it in 2.6.11 over the next couple days. I'd blame the camera itself,
except that it works fine in 2.4 kernels! It also works in Windows but not in OS
X (10.2 at least).
Comment 11 Marko Kaiser 2005-02-14 00:52:30 UTC
Hello,

I have the same problem with an Apple isight on my amd64 system (Debian).
Kernel is 2.6.11-rc3 and I applied the above mentioned trunk version of the 
ieee1394 drivers.

Now a device /dev/video1394-0 is created. It should be /dev/video1394/0, 
shouldn't it? :) Coriander detects the card well (it has done it before).
video1394 and raw1394 modules are now loaded automatically by hotplug.
Comment 12 Dan Dennedy 2005-02-14 04:59:02 UTC
Marko, how do you know you have a problem; what is the symptom? You said hotplug
took care of loading raw1394 and video1394, so your camera must be recognized.
Not having your device detected is a very different problem than proper sysfs
and hotplug support.

Regarding your question about the video1394 dev file naming, that is the new
scheme based upon the 2.6 driver model. You must use a udev rule:
KERNEL="video1394*", NAME="video1394/%n"
to rename it to the scheme expected by libdc1394 and coriander.
Comment 13 Marko Kaiser 2005-02-14 12:26:42 UTC
Oh sorry I did not tell the whole story:
With kernel 2.6.10 or 2.6.9 the isight was recognized on the ieee1394 bus but
neither video1394 nor raw1394 was loaded by hotplug when I plugged in the camera.
With your latest driver version this now works perfectly. So in the end I might
have had a different problem but it seems to be fixed now.
Comment 14 Michael Elbaum 2005-02-15 13:07:22 UTC
Sorry. I'm still stuck. I brought the 2.6.11_rc2 kernel (from Gentoo's vanilla
sources) and compiled it with the new ieee1394 drivers as you suggested. I'm
using Gentoo's udev baselayout, which fills up /dev automatically. I get the
same NodeID invalid error when I plug in the camera. It complains a bit about
the fw card as well this time. Here is all the output I could think might be
relevant. The only major difference I saw was that the node_count is 1 now, not
0. I could try to get it back into devfs if you think that might help.


localhost root # uname -a
Linux localhost 2.6.11-rc2 #1 Mon Feb 14 22:52:59 IST 2005 i686 Intel(R)
Pentium(R) M processor 1.70GHz GenuineIntel GNU/Linux

localhost bus # whoami
root


tail /var/log/messages:

plug in FW card (firewiredirect.com)

Feb 15 22:37:55 localhost ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[11] 
MMIO=[40c04000-40c047ff]  Max Packet=[2048]
Feb 15 22:37:55 localhost ieee1394.agent[13624]: ... no drivers for IEEE1394
product 0x/0x/0x
Feb 15 22:37:56 localhost ieee1394: Host added: ID:BUS[0-00:1023] 
GUID[00015601000002c1]
Feb 15 22:37:56 localhost ieee1394.agent[13646]: ... no drivers for IEEE1394
product 0x/0x/0x
Feb 15 22:38:00 localhost wait_for_sysfs[13601]: either wait_for_sysfs (udev
045) needs an update to handle the device '/class/ieee1394_protocol/video1394-0'
properly (no device symlink) or the sysfs-support of your device's driver needs
to be fixed, please report to <linux-hotplug-devel@lists.sourceforge.net>
Feb 15 22:38:00 localhost wait_for_sysfs[13603]: either wait_for_sysfs (udev
045) needs an update to handle the device '/class/ieee1394_protocol/dv1394-0'
properly (no device symlink) or the sysfs-support of your device's driver needs
to be fixed, please report to <linux-hotplug-devel@lists.sourceforge.net>

plug in DV camera:

Feb 15 22:38:29 localhost ohci1394: fw-host0: SelfID received, but NodeID
invalid (probably new bus reset occurred): 0000FFC0


localhost bus # lsmod |grep 1394
dv1394                 18956  0
raw1394                27372  0
video1394              16460  0
ohci1394               30852  2 dv1394,video1394
ieee1394               90548  5 dv1394,raw1394,video1394,ohci1394,sbp2

localhost bus # ls -l /dev/raw/raw1394
crw-rw----  1 root disk 171, 0 Feb 16  2005 /dev/raw/raw1394
localhost grub # ls -l /dev/dv1394-0
crw-rw----  1 root root 171, 32 Feb 15 22:38 /dev/dv1394-0

localhost bus # ls /sys/bus/ieee1394/devices/
00015601000002c1  fw-host0

localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/node_count
1
localhost bus # cat /sys/bus/ieee1394/devices/fw-host0/nodes_active
1
Comment 15 Stefan Richter 2005-07-24 02:12:51 UTC
On first look it appears like a problem in ohci1394. However, the code with the
NodeID check did not change from Linux 2.4 to 2.6, so there must be something
interfering from a higher level (e.g. bus reset from ieee1394) or lower level
(controller setup or PCI setup, interrupt handling...).

It seems the 'host->in_bus_reset' flag is switched differently in Linux 2.4 and 2.6.
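
To make the warning more concrete: the hex value at the end of the "NodeID invalid" message is the controller's Node ID register. Here is a minimal sketch of decoding it, assuming the field layout from the OHCI 1394 specification (iDValid in bit 31, root in bit 30, busNumber in bits 15:6, nodeNumber in bits 5:0). The function name is illustrative and not taken from the driver.

```python
def decode_node_id(reg):
    """Split a 32-bit OHCI Node ID register value into its main fields."""
    return {
        "id_valid": bool(reg & 0x80000000),  # bit 31: node ID is valid
        "root": bool(reg & 0x40000000),      # bit 30: this node is root
        "bus_number": (reg >> 6) & 0x3FF,    # bits 15:6
        "node_number": reg & 0x3F,           # bits 5:0
    }


if __name__ == "__main__":
    # Value reported in the log line of this bug.
    print(decode_node_id(0x0000FFC0))
```

For 0000FFC0 this gives iDValid clear, bus number 1023 and node number 0, i.e. the register had not yet been updated with a valid node ID when the SelfID-complete event was handled.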

---

A problem with similar symptoms (but perhaps with a different cause) was
reported for Linux 2.4.29 at linux1394-user in July 2005:
http://marc.theaimsgroup.com/?t=112030364200001
For this one, a simple workaround was found: The camera has to be switched on
and connected before ohci1394 is loaded.

---

Michael, open a separate bug for the other problem you saw if it is not resolved
yet:
> if I pull the whole card out, with the camera still plugged in, I get
...
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
Comment 16 Michael Elbaum 2005-07-30 14:23:52 UTC
I did some more checking but the problem remains. I tried a 2.6.12 kernel
(Gentoo -r6) as well as .8 and .10. I read the bug you referenced and tried all
possible orders of booting without modules, plugging the camera in running or
not, and loading ohci1394. Amusingly, or not, when hald is installed but
stopped, ohci1394 and ieee1394 are still loaded automatically when I plug in the
card. (I took them out of modules.autoconf.d/.) raw1394 does not come in
automatically with hald running or not, probably because the system doesn't
recognize the device as a video camera. 

There might be a useful hint from hal about the bus reset. When hald and dbus
daemons are running and I unplug the camera (not the fw card) the usb bus also
resets. The fw is through pcmcia while the usb is built in.


Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -75
received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84
received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84
received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84
received
Jul 30 23:44:17 localhost drivers/usb/input/hid-core.c: input irq status -84
received
Jul 30 23:44:17 localhost hub 3-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
Jul 30 23:44:17 localhost usb 3-2: USB disconnect, address 2
Jul 30 23:44:17 localhost hal.hotplug[15991]: DEVPATH is not set
Jul 30 23:44:17 localhost usb 3-2: new low speed USB device using uhci_hcd and
address 3
Jul 30 23:44:18 localhost input: USB HID v1.00 Mouse [Logitech USB-PS/2
Trackball] on usb-0000:00:1d.1-2
Jul 30 23:44:18 localhost hal.hotplug[16030]: DEVPATH is not set
Jul 30 23:44:19 localhost ohci1394: fw-host0: SelfID received, but NodeID
invalid (probably new bus reset occurred): 0000FFC0


At the end, I get back to the NodeID invalid. Updating udev (to -058) didn't help. 

I'll send another bug about the unplugging problem, as you request. I need to
recheck all the conditions precisely. In the meantime I still reboot to 2.4 to
see the camera. Is there something else to try? I don't know where to look for
the 'host->in_bus_reset' flag.
Comment 17 Stefan Richter 2005-07-30 15:07:35 UTC
> Amusingly, or not, when hald is installed but stopped, ohci1394
> and ieee1394 are still loaded automatically when I plug in the
> card.

I am not familiar with hald. CardBus cards are treated like PCI
cards. Card insertion triggers a hotplug event, and
/etc/hotplug/pci.agent may have been called.

> When hald and dbus daemons are running and I unplug the camera
> (not the fw card) the usb bus also resets. The fw is through
> pcmcia while the usb is built in.

Do the CardBus bridge and USB controller share an interrupt?
What is in /proc/interrupts?

> I don't know where to look for the 'host->in_bus_reset' flag.

I was just thinking out loud. It is a state variable in the
ieee1394 drivers.
Comment 18 Michael Elbaum 2005-07-31 10:02:36 UTC
Yes, you're right. They're on the same interrupt, [11]. So the whole thing resets.

localhost root # more /proc/interrupts
           CPU0
  0:    8403973          XT-PIC  timer
  1:       9100          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  7:          2          XT-PIC  parport0
  9:       2524          XT-PIC  acpi
 11:     458933          XT-PIC  yenta, yenta, ehci_hcd, uhci_hcd, uhci_hcd,
uhci_hcd, ipw2200, Intel 82801DB-ICH4, eth0
 12:       1234          XT-PIC  i8042
 14:       9986          XT-PIC  ide0
 15:         11          XT-PIC  ide1
NMI:          0
ERR:          0
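
For anyone repeating this check, shared IRQ lines can be picked out of /proc/interrupts mechanically. The sketch below is a rough illustration that assumes the XT-PIC style layout shown in this report (one counter column per CPU, a single-token controller name, then a comma-separated handler list). Other layouts, such as IO-APIC with multi-word controller fields, would need a different split. `shared_irqs` is a hypothetical helper name.

```python
def shared_irqs(text):
    """Map IRQ label -> handler names, for IRQs with more than one handler.

    Assumed line layout: '<irq>: <count per CPU>... <controller> <name>[, <name>...]'.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue  # NMI/ERR style rows carry no handler names
        names = [n.strip() for n in fields[-1].split(",") if n.strip()]
        if len(names) > 1:
            shared[label.strip()] = names
    return shared


# Two rows taken from the output in this report (IRQ 11 joined onto one line).
SAMPLE = """           CPU0
  9:       2524          XT-PIC  acpi
 11:     458933          XT-PIC  yenta, yenta, ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd, ipw2200, Intel 82801DB-ICH4, eth0
NMI:          0
"""

if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(irq, "shared by", ", ".join(names))
```

On a live system the same function could be fed the contents of /proc/interrupts instead of `SAMPLE`.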
Comment 19 Stefan Richter 2005-07-31 10:40:48 UTC
Then it seems certain interrupt signals from the FireWire controller are always
routed to the wrong interrupt handler. There is nothing that the FireWire
drivers can do about this. It has to be fixed somewhere in the PCI subsystem or
below.

IRQ routing is obviously more successful under Linux 2.4 than under 2.6 on your
machine.
Comment 20 Stefan Richter 2005-07-31 10:43:45 UTC
PS: Shared interrupts are usually no problem; it puzzles me that they may still
be handled wrong in current Linux versions.
Comment 21 Michael Elbaum 2005-07-31 11:53:54 UTC
It's also puzzling that the same symptom appears on two very different machines,
HP Omnibook 6100 and IBM Thinkpad T42. Both are high-end laptops of their day.
The /proc/interrupts was from the IBM. The hard disk died Friday on the HP so
I'll check it again as soon as the replacement arrives. Perhaps I can fix the
problem in the BIOS (suggestions welcome!!!), and I didn't think to try it in
Knoppix. 
Comment 22 Michael Elbaum 2005-07-31 23:24:08 UTC
a little progress...

I moved as much as I could off of interrupt 11. Also I made a link manually from
the new udev style /dev/raw/raw1394 to /dev/raw1394. (Without that dvcont
doesn't see raw1394 loaded!) Then once, and only once, it actually worked. I was
using the 2.6.11_rc2 kernel. I had ieee1394, ohci1394, and raw1394 loaded by
hand. I was so pleased that I rebooted to try again... Nothing could restore the
success. I tried with the link to raw1394, with mknod, with the camera on and
off and all permutations of the order of loading modules. dvcont always replied
'Could not find...' Since once it did work, that's enough to prove the
possibility and maybe grounds for some optimism. Supposing that I'll get it to
work again, what other output would help in the diagnosis? 
Comment 23 Michael Elbaum 2005-08-27 14:18:42 UTC
Created attachment 5786
details from 2.6.12
Comment 24 Michael Elbaum 2005-08-27 14:19:54 UTC
Created attachment 5787
details from 2.4.28
Comment 25 Michael Elbaum 2005-08-27 14:21:17 UTC
a little more "progress": I found a procedure that seems to work, but only once
per boot! After that it refuses. I reboot, and it works again, once. I suspect
that the trouble is in the new hotplug agent, ieee.hotplug via udevsend, rather
than hal.hotplug in 2.4.

The trick is NOT to load the 1394 modules, or to unload them specifically, and
then to modprobe video1394. That loads all the other modules and seems to work
much of the time. Killing udevd just beforehand seems to help too. As far as I
could see ieee1394 and raw1394 don't get in the way, and the problem centers on
ohci1394.

Even in 2.4, if I unplug or turn off the camera and want it to start again, I
have to rmmod all the modules (except ieee1394) beforehand.

I've attached 2 little records with /var/log/messages output.

does this help the diagnosis?

Comment 26 Stefan Richter 2006-05-04 11:15:21 UTC
Your logs show that it worked fine in Linux 2.4.20 but there are problems in
Linux 2.4.28. Is this perhaps a VIA VT6306 rev43 controller? (lspci)
We have a report that this controller worked with 2.4.20 but not with 2.4.21 and
later. http://marc.theaimsgroup.com/?l=linux1394-devel&t=114641862900002
Comment 27 Stefan Richter 2006-08-02 16:40:23 UTC
*** Bug 6945 has been marked as a duplicate of this bug. ***
Comment 28 piergiorgio.sartor 2006-11-30 13:10:55 UTC
Hi Stefan,
how are you?
How is it going with this bug?
As you already know, I have a similar problem with an SBP2 device (Bug 6945, marked
as dup of this one), so I was wondering if there is any update or when we can
expect one.
Just curious, nothing more, since 2.6.19 is out, with a lot of 1394 updates, but
the problem seems to be still there.
Was it supposed to be fixed?

Thanks,

bye
Comment 29 Stefan Richter 2006-12-01 01:35:35 UTC
I suspect that bug 6070 may play a role in this bug. I am currently slowly
working on bug 6070. 
Comment 30 Stefan Richter 2007-01-08 10:45:33 UTC
Re comment #28: Kristian H
Comment 31 Stefan Richter 2008-02-19 11:44:16 UTC
Michael, piergiorgio,
do you still have the hardware, and can you give an update?  If possible, test with firewire-ohci in Linux 2.6.22 or (preferably) a later kernel and post whether a file for the controller and two files for the camera or two files for the disk are created in /sys/bus/firewire/devices/.  I ask because firewire-ohci handles bus resets very differently from ohci1394, as mentioned.
Comment 32 piergiorgio.sartor 2008-02-19 12:42:44 UTC
(In reply to comment #31)
> Michael, piergiorgio,
> do you still have the hardware, and can you give an update?  If possible, test with
> firewire-ohci in Linux 2.6.22 or (preferably) a later kernel and post
> whether
> a file for the controller and two files for the camera or two files for the
> disk are created in /sys/bus/firewire/devices/.  I ask because firewire-ohci
> handles bus resets very differently from ohci1394, as mentioned.
> 

To be honest, I do not remember exactly what this bug was...
Anyway, I've right now two PCs with firewire, both with Fedora 8 up-to-date.
So, a quick check is with the Fedora kernel, 2.6.23.15-137.fc8 namely.

One PC is NVIDIA based, with TI OHCI 1.1 firewire, the other is an old intel based (P-III), with a TI PCI card, OHCI 1.0.

The OHCI 1.1 has two new files for the camera, plus the controller one, but, as per Fedora bug, it seems impossible to capture anything, while the control works.

On the OHCI 1.0, it seems nothing happens, like the camera was not connected. I'll have to double check this, anyway, if any new file appears in /sys/bus/firewire/devices.

SBP2 devices seem to work fine on the OHCI 1.1 PC (except for the libvolume_id story...), the other I'll have to check.

Do you recommend testing with vanilla kernels?
Or are the Fedora ones enough?

pg
Comment 33 Stefan Richter 2008-02-19 12:58:06 UTC
Fedora 8 kernels should be fine for purposes of this bug here, i.e. "SelfID received, but NodeID invalid", unless we would get to a point where there would be patches for you to test.

So this bug does at least not hit on your TI OHCI 1.1 + firewire-ohci.  On the other hand, you reported bug 6945 for the TI OHCI 1.0 controller.  I would like to know about that one:
  - messages in dmesg when firewire-ohci is loaded, and when the camera is plugged in/ switched on (and whether the sysfs files = actually symlinks appear);
  - whether the SBP-2 device works on the OHCI 1.0 controller; if not, same questions as about the camera.

Thanks.
Comment 34 piergiorgio.sartor 2008-02-19 13:54:28 UTC
(In reply to comment #33)
> Fedora 8 kernels should be fine for purposes of this bug here, i.e. "SelfID
> received, but NodeID invalid", unless we would get to a point where there
> would
> be patches for you to test.
> 
> So this bug does at least not hit on your TI OHCI 1.1 + firewire-ohci.  On
> the
> other hand, you reported bug 6945 for the TI OHCI 1.0 controller.  I would
> like
> to know about that one:
>   - messages in dmesg when firewire-ohci is loaded, and when the camera is
> plugged in/ switched on (and whether the sysfs files = actually symlinks
> appear);
>   - whether the SBP-2 device works on the OHCI 1.0 controller; if not, same
> questions as about the camera.

Here we go:

1) modprobe firewire-ohci (after rmmod, since the module is auto loaded)

ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
firewire_core: created device fw0: GUID 08002856000008da, S400

2) camera on, the two new files appear in /sys/..., but:

firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: created device fw1: GUID 08004601029441d8, S100
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
...

and this goes on forever...

3) SBP2 HD plugged:

firewire_core: giving up on config rom for node id ffc0
firewire_ohci: node ID not valid, new bus reset in progress

this is stuck here, i.e. only these lines are printed and no new files appear in /sys/..., the device does not show up in any way.


Following is the lspci -vv of the 1394 for this PC:

02:09.0 FireWire (IEEE 1394): Texas Instruments TSB12LV23 IEEE-1394 Controller (prog-if 10 [OHCI])
        Subsystem: Texas Instruments Unknown device 8010
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (750ns min, 1000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at ed800000 (32-bit, non-prefetchable) [size=2K]
        Region 1: Memory at ed000000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 1
                Flags: PMEClk- DSI- D1- D2+ AuxCurrent=0mA PME(D0-,D1-,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci


Final note. A couple of times, a hotplug of a firewire device resulted in a complete PC lock-up (with both PCs), without any log reported and with only a physical reset possible (no sysrq).

Hope this helps.

pg
Comment 35 Stefan Richter 2008-02-19 14:08:51 UTC
OK, then the new bus reset handling scheme does not work, plus there is no dependency on bug 6070.
Comment 36 Stefan Richter 2008-02-19 14:21:28 UTC
On the other hand, there is at least a brief window in which the bus is operational to the extent that multiple transactions to the camera complete successfully, leading to "created device fw1: GUID 08004601029441d8, S100".

With the HDD, that window does perhaps exist too but is evidently too short to complete reading the HDD's configuration ROM.
Comment 37 Stefan Richter 2008-02-19 14:26:24 UTC
Note to self:  Try long reset instead of short reset in the bus manager code?  The ieee1394 driver already uses a long reset in the IRM code though, so that might be futile.
Comment 38 Stefan Richter 2008-03-28 11:54:38 UTC
Note to self:  NodeID is a register in the SCLK domain.  See http://thread.gmane.org/gmane.linux.kernel.firewire.devel/11929/focus=11940
Comment 39 Stefan Richter 2008-03-28 13:30:04 UTC
Piergiorgio,
please check with one of the following kernels (whichever is easiest for you to install) whether the "node ID not valid" problem persists:
  - http://koji.fedoraproject.org/packages/kernel/2.6.24.4/63.fc8/
  - vanilla sources plus patchkit from
    http://me.in-berlin.de/~s5r6/linux1394/updates/
  - git cloned sources plus git pull from master branch of
    git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git
Comment 40 piergiorgio.sartor 2008-03-29 02:24:47 UTC
(In reply to comment #39)
> please check with one of the following kernels (whichever is easiest for you
> to
> install) whether the "node ID not valid" problem persists:
>   - http://koji.fedoraproject.org/packages/kernel/2.6.24.4/63.fc8/

I went for this one.
We are talking about the OHCI 1.0 machine, I guess.

For the camera, after removing firewire-ohci and
re-inserting it, I see the following:

ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
firewire_core: created device fw0: GUID 08002856000008da, S400
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: phy config: card 1, new root=ffc1, gap_count=5
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: giving up on config ROM for node id ffc0 (returned 17)

No file creation in /sys/, that is only fw0 is there.

With the SBP2:

ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
firewire_core: created device fw0: GUID 08002856000008da, S400
firewire_ohci: node ID not valid, new bus reset in progress

and it stays here, without going any further.

No file in /sys/, except fw0.

Hope this helps,

pg
Comment 41 piergiorgio.sartor 2008-03-29 02:26:15 UTC
One thing I forgot (as usual), do you need the output of /var/log/messages or is dmesg enough?

pg
Comment 42 Stefan Richter 2008-03-29 02:35:30 UTC
> do you need the output of /var/log/messages or is dmesg enough?

In this case, either one is OK.
Comment 43 Anonymous Emailer 2008-03-29 02:43:20 UTC
Reply-To: stefanr@s5r6.in-berlin.de

(Cc'ing linux1394-devel)

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=4172
> 
> ------- Comment #40 from piergiorgio.sartor@nexgo.de  2008-03-29 02:24
> -------
> (In reply to comment #39)
>> please check with one of the following kernels (whichever is easiest for you
>> to
>> install) whether the "node ID not valid" problem persists:
>>   - http://koji.fedoraproject.org/packages/kernel/2.6.24.4/63.fc8/
> 
> I went for this one.
> We are talking about the OHCI 1.0 machine, I guess.

I.e. TSB12LV23

> For the camera, after removing firewire-ohci and
> re-inserting it, I see the following:
> 
> ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
> firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
> firewire_core: created device fw0: GUID 08002856000008da, S400
> firewire_ohci: node ID not valid, new bus reset in progress
> firewire_core: phy config: card 1, new root=ffc1, gap_count=5
> firewire_ohci: node ID not valid, new bus reset in progress
> firewire_core: giving up on config ROM for node id ffc0 (returned 17)
> 
> No file creation in /sys/, that is only fw0 is there.
> 
> With the SBP2:
> 
> ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 21 (level, low) -> IRQ 16
> firewire_ohci: Added fw-ohci device 0000:02:09.0, OHCI version 1.0
> firewire_core: created device fw0: GUID 08002856000008da, S400
> firewire_ohci: node ID not valid, new bus reset in progress
> 
> and it stays here, without going any further.
> 
> No file in /sys/, except fw0.
> 
> Hope this helps,

Please also provide kernel messages from the following:

# echo -1 > /sys/module/firewire_ohci/parameters/debug

Then plug in the camera, then the disk (only one at a time).
Thanks,
Comment 44 piergiorgio.sartor 2008-03-29 04:53:29 UTC
(In reply to comment #43)

> > We are talking about the OHCI 1.0 machine, I guess.
> 
> I.e. TSB12LV23

Yep, exactly.

> Please also provide kernel messages from the following:
> 
> # echo -1 > /sys/module/firewire_ohci/parameters/debug
> 
> Then plug in the camera, then the disk (only one at a time).

For the camera I can see:

AR evt_bus_reset, link internal
AR evt_bus_reset, link internal
firewire_ohci: node ID not valid, new bus reset in progress
AR evt_bus_reset, link internal
firewire_ohci: 2 selfIDs, generation 4
selfID 0: 807f0882, phy 0 [p..] S100 gc=63 +0W Lci
selfID 0: 817f8c74, phy 1 [-c-] S400 gc=63 -3W Lc
firewire_core: phy config: card 0, new root=ffc1, gap_count=5
AT ack_complete, phy config packet, 01c50000
AR evt_bus_reset, link internal
AR evt_bus_reset, link internal
firewire_ohci: node ID not valid, new bus reset in progress
firewire_core: giving up on config ROM for node id ffc0 (returned 17)

Detaching the camera:

AR evt_bus_reset, link internal
firewire_ohci: 1 selfIDs, generation 7
selfID 0: 807f8c56, phy 0 [---] S400 gc=63 -3W Lci

Attaching the SBP2 (this is the Symbios one; I also have 2 from Initio that I have not tested in a long time):

AR evt_bus_reset, link internal
AR evt_bus_reset, link internal
firewire_ohci: node ID not valid, new bus reset in progress

That's it, here it stops.

pgs
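
For readers who want to check the `selfID` lines in the debug output above by hand, here is a minimal sketch of how a self-ID packet zero quadlet breaks down into the fields shown in the log. The bit layout follows the IEEE 1394 self-ID packet zero format. The function names and label tables are hypothetical and only chosen to mirror the strings in the log. This is not the driver's own code.

```python
# Label tables chosen to match the strings in the log above (an assumption,
# not copied from the driver source).
SPEED = ["S100", "S200", "S400", "beta"]
POWER = ["+0W", "+15W", "+30W", "+45W", "-3W", " ?W", "-3..-6W", "-3..-10W"]
PORT = [".", "-", "p", "c"]  # not present, not connected, parent, child


def decode_self_id(quadlet):
    """Split a self-ID packet zero quadlet into its named fields."""
    return {
        "phy": (quadlet >> 24) & 0x3F,              # bits 29..24: PHY ID
        "link_active": bool(quadlet & (1 << 22)),   # bit 22: L
        "gap_count": (quadlet >> 16) & 0x3F,        # bits 21..16
        "speed": SPEED[(quadlet >> 14) & 0x3],      # bits 15..14
        "contender": bool(quadlet & (1 << 11)),     # bit 11: c
        "power": POWER[(quadlet >> 8) & 0x7],       # bits 10..8
        # Ports p0, p1, p2 occupy bits 7..6, 5..4 and 3..2.
        "ports": "".join(PORT[(quadlet >> s) & 0x3] for s in (6, 4, 2)),
        "initiated_reset": bool(quadlet & (1 << 1)),  # bit 1: i
    }


def format_self_id(quadlet):
    """Render the fields in the same style as the log lines above."""
    f = decode_self_id(quadlet)
    flags = (("L" if f["link_active"] else "")
             + ("c" if f["contender"] else "")
             + ("i" if f["initiated_reset"] else ""))
    return "%08x, phy %d [%s] %s gc=%d %s %s" % (
        quadlet, f["phy"], f["ports"], f["speed"],
        f["gap_count"], f["power"], flags)


if __name__ == "__main__":
    for q in (0x807F0882, 0x817F8C74, 0x807F8C56):
        print("selfID 0:", format_self_id(q))
```

Read this way, the camera (phy 0, S100) sees the controller as its parent, and the controller (phy 1, S400) sees the camera as a child on its second port.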
Comment 45 Anonymous Emailer 2008-03-29 06:05:43 UTC
Reply-To: stefanr@s5r6.in-berlin.de

(quoting in full for linux1394-devel)

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=4172
> ------- Comment #44 from piergiorgio.sartor@nexgo.de  2008-03-29 04:53
> -------
> (In reply to comment #43)
> 
>>> We are talking about the OHCI 1.0 machine, I guess.
>> I.e. TSB12LV23
> 
> Yep, exactly.
> 
>> Please also provide kernel messages from the following:
>>
>> # echo -1 > /sys/module/firewire_ohci/parameters/debug
>>
>> Then plug in the camera, then the disk (only one at a time).
> 
> For the camera I can see:
> 
> AR evt_bus_reset, link internal
> AR evt_bus_reset, link internal
> firewire_ohci: node ID not valid, new bus reset in progress
> AR evt_bus_reset, link internal
> firewire_ohci: 2 selfIDs, generation 4
> selfID 0: 807f0882, phy 0 [p..] S100 gc=63 +0W Lci
> selfID 0: 817f8c74, phy 1 [-c-] S400 gc=63 -3W Lc
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> AT ack_complete, phy config packet, 01c50000
> AR evt_bus_reset, link internal
> AR evt_bus_reset, link internal
> firewire_ohci: node ID not valid, new bus reset in progress
> firewire_core: giving up on config ROM for node id ffc0 (returned 17)
> 
> Detaching the camera:
> 
> AR evt_bus_reset, link internal
> firewire_ohci: 1 selfIDs, generation 7
> selfID 0: 807f8c56, phy 0 [---] S400 gc=63 -3W Lci
> 
> Attaching the SBP2 (this is the Symbios one, I've also 2 from Initio I did
> not
> tested since long time):
> 
> AR evt_bus_reset, link internal
> AR evt_bus_reset, link internal
> firewire_ohci: node ID not valid, new bus reset in progress
> 
> That's it, here it stops.

Thanks.  Strange that

   - no further self ID complete event follows (or at least, is not
     detected by the driver) after the "node ID not valid" condition,

   - the stall happens after the 2nd or the 1st "node ID not valid"
     condition, depending on whether the camera or the disk is
     connected.  The disk is perhaps one of the kind which internally
     has two PHYs, which may explain the difference from the camera.

There are always(?) two evt_bus_reset events before "node ID not valid",
so we may have missed a proper self ID complete event due to some
timing(?) issue.
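
As a side note on the quoted log, the `phy config packet, 01c50000` line and the `ffc0`/`ffc1` node IDs can be unpacked with a few shifts. The sketch below assumes the IEEE 1394 PHY configuration packet layout (root ID in bits 29..24, force-root bit 23, gap-count-valid bit 22, gap count in bits 21..16) and the usual 10-bit bus / 6-bit node split of a 16-bit node ID. The function names are hypothetical.

```python
from typing import Tuple


def decode_phy_config(quadlet: int) -> dict:
    """Split the first quadlet of a PHY configuration packet into fields."""
    return {
        "root_id": (quadlet >> 24) & 0x3F,
        "force_root": bool(quadlet & (1 << 23)),
        "set_gap_count": bool(quadlet & (1 << 22)),
        "gap_count": (quadlet >> 16) & 0x3F,
    }


def split_node_id(node_id: int) -> Tuple[int, int]:
    """Return (bus number, node number) for a 16-bit node ID."""
    return (node_id >> 6) & 0x3FF, node_id & 0x3F


if __name__ == "__main__":
    print(decode_phy_config(0x01C50000))
    print(split_node_id(0xFFC1), split_node_id(0xFFC0))
```

With these assumptions, `01c50000` asks node 1 to become root with a gap count of 5, which agrees with the `new root=ffc1, gap_count=5` message. `ffc0` is node 0 on the local bus (bus number 0x3ff), which according to the self-ID lines is the camera.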
Comment 46 Stefan Richter 2009-02-09 15:34:38 UTC
Re comment 45:  We surely didn't miss events.  Rather, the chip seems to have a quirk regarding the NodeID.iDValid bit.

Re previous comments:  We totally forgot to ask Michael for the lspci output of the card.  The FireWireDirect cards listed in linux1394.org's database are based on the TI TSB12LV26, though, which is the successor to the TSB12LV23.

Newer TI 1394 controller generations (TSB43AB22, TSB82AA2...) all have another bug related to how they report bus resets to software (bus_reset_packet_quirk flag in fw-ohci.c), so it'd be no surprise if TSB12LV23 and TSB12LV26 got it even less right.

Since these chips seemed to work under Linux 2.4, there may be hope.  Perhaps the 2.4 kernel had a higher scheduling latency for ohci1394's self ID complete event handler.  So one idea to try would be to re-schedule the tasklet for a number of retries in the hope that iDValid eventually lights up.  piergiorgio, would it make sense if I prepared such a patch for you?
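
To illustrate the retry idea, here is a minimal user-space sketch of the control flow only. It is not the proposed kernel patch and does not use any kernel API: `read_node_id` is a hypothetical callable standing in for a read of the OHCI NodeID register, and the retry count and delay are arbitrary assumptions. The only hardware detail relied on is that iDValid is bit 31 of that register.

```python
import time
from typing import Callable, Optional

NODE_ID_VALID = 0x80000000  # iDValid, bit 31 of the OHCI NodeID register


def wait_for_valid_node_id(read_node_id: Callable[[], int],
                           retries: int = 5,
                           delay_s: float = 0.001) -> Optional[int]:
    """Poll the register until iDValid is set or the retries run out.

    Returns the 16-bit node ID on success, or None if iDValid never
    appeared within the initial attempt plus `retries` further attempts.
    """
    for attempt in range(retries + 1):
        reg = read_node_id()
        if reg & NODE_ID_VALID:
            return reg & 0xFFFF
        if attempt < retries:
            time.sleep(delay_s)  # stand-in for re-scheduling the handler
    return None


if __name__ == "__main__":
    # Simulated chip: reports an invalid node ID twice, then a valid one.
    readings = iter([0x0000FFC0, 0x0000FFC0, 0x8000FFC0])
    node_id = wait_for_valid_node_id(lambda: next(readings), retries=3)
    print("gave up" if node_id is None else "node ID %04x" % node_id)
```

In a real driver the wait would be a re-scheduled tasklet or work item rather than a sleep, and the number of retries would need to be bounded so that a genuinely new bus reset is still handled promptly.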
Comment 47 Erik Andr 2010-01-05 20:36:51 UTC
Ping, Piergiorgio!
Comment 48 piergiorgio.sartor 2010-01-06 17:41:35 UTC
Oops, I think I missed some emails here; thanks for the reminder.

Anyway, it seems to me I was able to use dvgrab successfully, maybe with the new PC.

I was hit by the 4GiB HW bug of the chipset, but since this was fixed, everything was quite smooth.

So, I guess a patch for this issue is of no use to me.

Thanks again, anyway,

bye,

pg
