Bug 13808

Summary: "dv-grab" firewire camera causes kernel crashes in 64 bit kernels
Product: Drivers Reporter: Paul Johnson (paul)
Component: IEEE1394 Assignee: drivers_ieee1394
Status: CLOSED CODE_FIX    
Severity: high CC: stefanr
Priority: P1    
Hardware: All   
OS: Linux   
URL: https://bugzilla.redhat.com/show_bug.cgi?id=471708
Kernel Version: 2.6.29.5-191.fc11.x86_64 #1 SMP Subsystem:
Regression: No Bisected commit-id:
Attachments: lsmod output from working Ubuntu box
lspci output from the working Ubuntu box
lsmod output from failing Fedora box
lspci output from the failing Fedora box

Description Paul Johnson 2009-07-22 08:25:38 UTC
I've had this problem for a while, and the history is documented in Fedora bug 471708.  When I try to download digital video from my DV camera using FireWire I get dropped data.  Not long after I get kernel oops or my computer locks up (it's normally very stable).

I'm running Fedora 64 bit on a quad Core 2 machine.  However another machine running 32 bit Ubuntu (dual P4) works fine.  Both machines are using the same TSB43AB23 chip, so I suspect the problem is in the 64 bit compilation of the driver for that chip.  Hence I'm posting the bug report here.
Comment 1 Stefan Richter 2009-07-22 18:13:24 UTC
> Not long after I get kernel oops or my computer locks up

Acquire the panic message and attach it here.

> I'm running Fedora 64 bit on a quad Core 2 machine.  However another machine
> running 32 bit Ubuntu (dual P4) works fine.  Both machines are using the same
> TSB43AB23 chip,

Is ohci1394 or firewire-ohci running on the Ubuntu box?
Comment 2 Paul Johnson 2009-07-23 14:37:37 UTC
Created attachment 22471 [details]
lsmod output from working Ubuntu box

This shows the Ubuntu box is running ohci1394.
Comment 3 Paul Johnson 2009-07-23 14:38:33 UTC
Created attachment 22472 [details]
lspci output from the working Ubuntu box
Comment 4 Paul Johnson 2009-07-23 14:42:48 UTC
Created attachment 22473 [details]
lsmod output from failing Fedora box

This shows that the Fedora box is running firewire-ohci.
Comment 5 Paul Johnson 2009-07-23 14:43:18 UTC
Created attachment 22474 [details]
lspci output from the failing Fedora box
Comment 6 Stefan Richter 2009-12-30 11:34:46 UTC
Sorry for catching up on this so late.

Would you be able to rebuild the Fedora kernel from source, after first patching it or editing the source code?  We have a patch pending for mainline kernel 2.6.33-rc* which may rid you of the crash as a side effect:
http://git.kernel.org/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=patch;h=090699c0530ae5380a9b8511d76f656cc437bb6e

In somewhat older kernels, the affected source file is named drivers/firewire/fw-ohci.c instead of drivers/firewire/ohci.c and the surrounding code may differ.  In any case, just remove the line
        ohci->use_dualbuffer = version >= OHCI_VERSION_1_1;
which is unique in the driver source.
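
To make the effect of that edit concrete, here is a minimal user-space sketch, not the driver code. It assumes the per-controller state is zero-initialised, so that once the assignment is gone the flag simply stays false and dual-buffer mode is never selected. The struct, both function names, and the numeric value given for OHCI_VERSION_1_1 are illustrative assumptions and are not taken from this report.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding: major version in bits 23..16, minor in bits 7..0. */
#define OHCI_VERSION_1_1 0x010010u

/* Hypothetical stand-in for the driver's per-controller state. */
struct sketch_ohci {
    bool use_dualbuffer;
};

/* With the line present: every OHCI 1.1 or later controller
 * is switched to dual-buffer mode. */
void sketch_probe_with_line(struct sketch_ohci *ohci, uint32_t version)
{
    memset(ohci, 0, sizeof(*ohci));   /* assumed zeroed allocation */
    ohci->use_dualbuffer = version >= OHCI_VERSION_1_1;
}

/* With the line removed: the flag keeps its zero-initialised value,
 * so dual-buffer mode stays off regardless of the reported version. */
void sketch_probe_without_line(struct sketch_ohci *ohci, uint32_t version)
{
    (void)version;                    /* version no longer consulted */
    memset(ohci, 0, sizeof(*ohci));
}
```

Calling sketch_probe_with_line() with a version of 0x010010 sets the flag, while sketch_probe_without_line() leaves it clear for the same input, which is the behaviour the suggested one-line removal is meant to produce.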

There is a guide on how to customize Fedora kernels:
http://fedoraproject.org/wiki/Building_a_custom_kernel
Comment 7 Stefan Richter 2010-01-24 18:08:42 UTC
Another reporter of this issue found out for me that TSB43AB23 is affected by the same issue as TSB43AB22/A.  We already have a workaround implemented for the latter.  I am now waiting either for that reporter to test a 2.6.33 kernel or for Paul to respond.  (Paul, instead of following my comment 6, you can also test a current kernel package from Rawhide, which is now at a 2.6.33 pre-release version.)

The other report:
http://marc.info/?l=linux1394-user&m=126154279004083
And the corresponding confirmation of the type of hardware bug:
http://marc.info/?l=linux1394-user&m=126435541629095
Comment 8 Stefan Richter 2010-01-24 18:26:18 UTC
Likely another report of this issue:
https://bugzilla.redhat.com/show_bug.cgi?id=552142
Comment 9 Stefan Richter 2010-01-25 08:44:15 UTC
Another report, against Arch Linux 2.6.32 x86-64:
http://marc.info/?l=linux1394-user&m=126432246128386
Reporter confirmed that mem=2G works around the issue:
http://marc.info/?l=linux1394-user&m=126434369715731
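
The mem=2G result suggests that the failure depends on where buffers sit in physical memory. One hypothesis consistent with it (an assumption here, not something established in this thread) is that the controller mishandles DMA addresses at or above 2 GiB, which a 64-bit machine with more RAM can hand out and a machine limited to 2 GB cannot. The check below only sketches that boundary test; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2 GiB = 2^31; the first address that needs bit 31 set. */
#define SKETCH_2G_LIMIT 0x80000000ull

/* Returns true if the buffer [addr, addr + len) lies entirely below
 * 2 GiB. Written as a subtraction so that addr + len cannot overflow. */
bool sketch_below_2g(uint64_t addr, uint64_t len)
{
    return addr < SKETCH_2G_LIMIT && len <= SKETCH_2G_LIMIT - addr;
}
```

Under this assumption, a 4 KiB buffer starting at 0x7ffff000 passes the check, while one that starts at or crosses 0x80000000 does not.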
Comment 10 Stefan Richter 2010-01-26 20:43:35 UTC
Proposed patch: http://lkml.org/lkml/2010/1/26/284
Comment 11 Stefan Richter 2010-02-09 17:11:28 UTC
Patch was merged in 2.6.32.8.  Kernel 2.6.33 should have been immune already since 2.6.33-rc1.  Please re-open the bug if I was wrong and the problem still exists in either 2.6.32.8 or later or 2.6.33-rc6 or later.