Bug 31962

Summary: cx88-blackbird broken (since 2.6.37 ?)
Product: v4l-dvb Reporter: Andi Huber (hobrom)
Component: cx88 Assignee: v4l-dvb_cx88
Status: CLOSED CODE_FIX    
Severity: high CC: andrew.walker27, damoxc, florian, jrnieder, mchehab, whiteulver
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38 Subsystem:
Regression: No Bisected commit-id:
Attachments: patch that fixes cx88_blackbird driver lock issues
patch from http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/31187

Description Andi Huber 2011-03-27 16:09:39 UTC
[Symptom]
Processes that try to open a cx88-blackbird driven MPEG device will hang up.

[Cause]
Nested mutex_locks (which are not allowed) result in a deadlock.

[Details]
There has been recent work on removing BKL (Big Kernel Lock) calls from kernel code (see http://kernelnewbies.org/BigKernelLock). This was not done properly for the cx88-blackbird driver:

Source-File: drivers/media/video/cx88/cx88-blackbird.c
Function: int mpeg_open(struct file *file)
Problem: the calls to drv->request_acquire(drv); and drv->request_release(drv); will hang because they try to lock a mutex that has already been locked by a previous call to mutex_lock(&dev->core->lock) ...

static int mpeg_open(struct file *file)
{
[...]
        mutex_lock(&dev->core->lock);         // MUTEX LOCKED !!!!!!!!!!!!!!!!

        /* Make sure we can acquire the hardware */
        drv = cx8802_get_driver(dev, CX88_MPEG_BLACKBIRD);
        if (drv) {
                err = drv->request_acquire(drv);  // HANGS !!!!!!!!!!!!!!!!!!!
                if(err != 0) {
                        dprintk(1,"%s: Unable to acquire hardware, %d\n", __func__, err);
                        mutex_unlock(&dev->core->lock);;
                        return err;
                }
        }
[...]
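
For illustration only, the nested-lock pattern can be reproduced in user space. The sketch below is not driver code: it assumes a POSIX system and uses an error-checking pthread mutex as a stand-in for dev->core->lock, so the second lock attempt by the same thread reports EDEADLK instead of hanging the way the kernel mutex does. The names core_lock, request_acquire_sketch and nested_lock_result are hypothetical.

```c
#define _XOPEN_SOURCE 700       /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for dev->core->lock. */
static pthread_mutex_t core_lock;

/* Stand-in for a callback that takes the lock itself. */
static int request_acquire_sketch(void)
{
        int err = pthread_mutex_lock(&core_lock);

        if (err)
                return err;     /* EDEADLK: this thread already holds the lock */
        /* ... hardware would be acquired here ... */
        pthread_mutex_unlock(&core_lock);
        return 0;
}

/* Mirrors the shape of mpeg_open(): take the lock, then call the callback. */
int nested_lock_result(void)
{
        pthread_mutexattr_t attr;
        int err;

        err = pthread_mutexattr_init(&attr);
        assert(err == 0);
        /* Error-checking type: a relock by the owner fails instead of hanging. */
        err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        assert(err == 0);
        err = pthread_mutex_init(&core_lock, &attr);
        assert(err == 0);
        pthread_mutexattr_destroy(&attr);

        pthread_mutex_lock(&core_lock);         /* outer lock */
        err = request_acquire_sketch();         /* inner lock attempt */
        pthread_mutex_unlock(&core_lock);

        pthread_mutex_destroy(&core_lock);
        return err;
}
```

Calling nested_lock_result() from a small main() and comparing the return value with EDEADLK shows the inner lock attempt being rejected. With a default (non-checking) mutex the same call sequence would be expected to block forever, which matches the hung task in the log below.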

Here's the relevant kernel log extract (Linux version 2.6.38-1-amd64 (Debian 2.6.38-1)) ...

Mar 24 21:25:10 xen kernel: [  241.472067] INFO: task v4l_id:1000 blocked for more than 120 seconds.
Mar 24 21:25:10 xen kernel: [  241.478845] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 24 21:25:10 xen kernel: [  241.482412] v4l_id          D ffff88006bcb6540     0  1000      1 0x00000000
Mar 24 21:25:10 xen kernel: [  241.486031]  ffff88006bcb6540 0000000000000086 ffff880000000001 ffff88006981c380
Mar 24 21:25:10 xen kernel: [  241.489694]  0000000000013700 ffff88006be5bfd8 ffff88006be5bfd8 0000000000013700
Mar 24 21:25:10 xen kernel: [  241.493301]  ffff88006bcb6540 ffff88006be5a010 ffff88006bcb6540 000000016be5a000
Mar 24 21:25:10 xen kernel: [  241.496766] Call Trace:
Mar 24 21:25:10 xen kernel: [  241.500145]  [<ffffffff81321c4a>] ? __mutex_lock_common+0x127/0x193
Mar 24 21:25:10 xen kernel: [  241.503630]  [<ffffffff81321d82>] ? mutex_lock+0x1a/0x33
Mar 24 21:25:10 xen kernel: [  241.507145]  [<ffffffffa09dd155>] ? cx8802_request_acquire+0x66/0xc6 [cx8802]
Mar 24 21:25:10 xen kernel: [  241.510699]  [<ffffffffa0aab7f2>] ? mpeg_open+0x7a/0x1fc [cx88_blackbird]
Mar 24 21:25:10 xen kernel: [  241.514279]  [<ffffffff8123bfb6>] ? kobj_lookup+0x139/0x173
Mar 24 21:25:10 xen kernel: [  241.517856]  [<ffffffffa062d5fd>] ? v4l2_open+0xb3/0xdf [videodev]
Comment 1 Jonathan Nieder 2011-03-27 16:24:43 UTC
Thanks.  LKML thread: http://thread.gmane.org/gmane.linux.kernel/1118815
Comment 2 Andi Huber 2011-04-01 06:11:22 UTC
Created attachment 52902 [details]
patch that fixes cx88_blackbird driver lock issues

Ben Hutchings provided me with a patch that solved the deadlock during mpeg_open() but left some other lock issues unresolved. 

I was able to do the remaining work and have fixed all the issues I have had since kernel 2.6.37. (Tested on a PC with 2 Hauppauge HVR1300 TV cards.) 

Everything works fine for me now.

The new patch (cx88-2.6.38-fix-driver-deadlocks.patch) is attached ...
Comment 3 andrew.walker27 2011-04-01 23:23:27 UTC
I know this isn't a help forum but I wanted to see if this patch works but I'm not sure how to use the patch.
I tried this

patch cx88-blackbird.c /home/fred/cx88-2.6.38-fix-driver-deadlocks.patch

but it didn't work. Am I doing this wrong?
Comment 4 Andi Huber 2011-04-02 00:02:55 UTC
(In reply to comment #3)
> I know this isn't a help forum but I wanted to see if this patch works but
> I'm
> not sure how to use the patch.
> I tried this
> 
> patch cx88-blackbird.c /home/fred/cx88-2.6.38-fix-driver-deadlocks.patch
> 
> but it didn't work. Am I doing this wrong?

Hi Andrew, you need to patch the kernel sources. If you have extracted them into a folder, let's say /usr/src/linux-source-2.6.38/, change into this folder and execute 

patch -p1 < /home/fred/cx88-2.6.38-fix-driver-deadlocks.patch

This should work.

Andi.
Comment 5 andrew.walker27 2011-04-04 19:24:47 UTC
I tried patching but I get an error saying it was previously patched. I'm running Gentoo, so maybe the patch is already installed. I'm wondering if I actually have the same bug as everyone here. The problem I have is that no channels actually get detected on either my HVR-1300 or WinTV Nova-T, although it goes through the motions without any errors. Is this associated with this bug or a different issue completely?
Comment 6 Damien Churchill 2011-04-06 21:30:59 UTC
You could try the patch in comment #147 from https://bugs.launchpad.net/mythtv/+bug/439163 with regards to that.
Comment 7 Jonathan Nieder 2011-04-06 21:46:23 UTC
Hi Damien,

Damien Churchill wrote:

> You could try the patch in comment #147 from
> https://bugs.launchpad.net/mythtv/+bug/439163 with regards to that.

The launchpad bug you reference seems to be about something else.

 (1) It is from 2009-09-29, way before the BKL conversion
 (2) It is about a card being recognized incorrectly rather than
     hangs.

Are you sure you have the right bug?  If so, have you tried the
patches at [1] (which seem to work ok)?  Naturally I'd be very
interested in problems with the patch series (regressions or
locking problems the patch missed).

[1] http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/31187
Comment 8 Damien Churchill 2011-04-06 22:04:43 UTC
(In reply to comment #7)
> Are you sure you have the right bug?  If so, have you tried the
> patches at [1] (which seem to work ok)?  Naturally I'd be very
> interested in problems with the patch series (regressions or
> locking problems the patch missed).
> 

Sorry, I was replying to the person above me; I should have made that clearer. He has the same card as I do and seems to experience a similar issue to me (no channels being found), and I discovered that patch whilst hunting around earlier.

I was also experiencing this bug until I applied the attached patch, and it has successfully fixed it, so I can confirm the patch works well on a Hauppauge HVR1300.
Comment 9 Jonathan Nieder 2011-04-06 22:26:44 UTC
Created attachment 53722 [details]
patch from http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/31187

Ah, thanks, Damien. I'm attaching a patch that squashes together the fixes from http://thread.gmane.org/gmane.linux.drivers.video-input-infrastructure/31187 for convenience. It's almost the same as Andi's patch but closes a few more races and keeps the reference count as a reference count to avoid breaking when multiple dvb frontends try concurrently to access the device.

The no-channels-found problem is probably somehow related to <https://bugzilla.kernel.org/show_bug.cgi?id=26962>.
Comment 10 Damien Churchill 2011-04-06 22:46:45 UTC
Excellent, I'll have to rebuild and give your patch a try.

I can say that removing the 4 lines as suggested in the Launchpad link (comment 147) does actually allow w_scan, scan and gstreamer all to work with my card now which is a definite improvement. Not having looked at the code I have no idea if it's a bad fix however.
Comment 11 Andi Huber 2011-04-06 22:51:23 UTC
Comment on attachment 52902 [details]
patch that fixes cx88_blackbird driver lock issues

premature and replaced by
https://bugzilla.kernel.org/attachment.cgi?id=53722
Comment 12 Jonathan Nieder 2011-05-25 02:20:50 UTC
This and related problems should be fixed by

 - 8a317a87 ([media] cx88: protect per-device driver list with device lock)
 - 1fe70e96 ([media] cx88: fix locking of sub-driver operations)
 - 1d6213ab ([media] cx88: hold device lock during sub-driver initialization)
 - 344d6c6b ([media] cx88: protect cx8802_devlist with a mutex)
 - 579b2b45 ([media] cx88: gracefully reject attempts to use unregistered
            cx88-blackbird driver)
 - f4bd4be8 ([media] cx88: don't use atomic_t for core->mpeg_users)

which have hit mainline (hoorah!). For the sake of people using old kernels: the regression was probably introduced in v2.6.37-rc1~64^2~350 (V4L/DVB: cx88: Remove BKL, 2010-09-15) or thereabouts (BKL removal). Anything older than that should be okay.
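
As a rough user-space illustration of the locking convention that the commit titles above point at (a plain user count that is only touched with the device lock held, and sub-driver operations that expect the caller to hold that lock), consider the sketch below. It is an assumption-laden miniature, not the merged kernel code: struct core_sketch, acquire_locked, release_locked, open_sketch and close_sketch are all hypothetical names, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical miniature of the shared per-device state. */
struct core_sketch {
        pthread_mutex_t lock;
        int mpeg_users;         /* plain int; only touched with lock held */
};

/* Caller must hold core->lock; this function never takes it. */
static int acquire_locked(struct core_sketch *core)
{
        if (core->mpeg_users++ == 0) {
                /* first user: hardware setup would go here */
        }
        return 0;
}

/* Caller must hold core->lock; this function never takes it. */
static void release_locked(struct core_sketch *core)
{
        assert(core->mpeg_users > 0);
        if (--core->mpeg_users == 0) {
                /* last user: hardware teardown would go here */
        }
}

/* The open path takes the lock exactly once around the callback. */
int open_sketch(struct core_sketch *core)
{
        int err;

        pthread_mutex_lock(&core->lock);
        err = acquire_locked(core);
        pthread_mutex_unlock(&core->lock);
        return err;
}

/* The close path follows the same rule. */
void close_sketch(struct core_sketch *core)
{
        pthread_mutex_lock(&core->lock);
        release_locked(core);
        pthread_mutex_unlock(&core->lock);
}
```

Because only the outermost caller takes the lock, the callbacks cannot deadlock against it, and the counter needs no atomic type since every access is already serialized by the lock.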
Comment 13 Andi Huber 2011-05-25 16:43:59 UTC
Many thanks to Jonathan! I'm closing this bug report.
Comment 14 Florian Mickler 2011-05-30 07:20:52 UTC
A patch referencing this bug report has been merged in v3.0-rc1:

commit 1fe70e963028f34ba5e32488a7870ff4b410b19b
Author: Jonathan Nieder <jrnieder@gmail.com>
Date:   Sun May 1 06:29:37 2011 -0300

    [media] cx88: fix locking of sub-driver operations
Comment 15 whiteulver 2011-06-03 01:19:11 UTC
So in which kernel version will this be fixed? 2.6.39?
Comment 16 Mauro Carvalho Chehab 2011-06-04 09:39:10 UTC
(In reply to comment #15)
> So in which kernel version will this be fixed? 2.6.39?

Both 2.6.38 and 2.6.39 got the fix:

http://lwn.net/Articles/445974/
http://lwn.net/Articles/445972/