Bug 12802
Summary: | General protection fault on rmmod cx8800 | ||
---|---|---|---|
Product: | v4l-dvb | Reporter: | Jean Delvare (jdelvare) |
Component: | cx88 | Assignee: | Jean Delvare (jdelvare) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | marcin.slusarz |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.28.5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
fix
Tested fix |
Description
Jean Delvare
2009-03-02 04:38:50 UTC
Created attachment 20416 [details]
fix
does attached patch fix this bug? In fact I already have a tested fix for this bug, I will attach it. I can test your fix as well, it is simpler, but I suspect it is still racy: ir->work could be run between del_timer_sync(&ir->timer) and cancel_work_sync(&ir->work), rearming the timer, couldn't it? Created attachment 20417 [details]
Tested fix
if I understand comment of cancel_work_sync, my fix should be ok too /** * cancel_work_sync - block until a work_struct's callback has terminated * @work: the work which is to be flushed * * Returns true if @work was pending. * * cancel_work_sync() will cancel the work if it is queued. If the work's * callback appears to be running, cancel_work_sync() will block until it * has completed. * * It is possible to use this function if the work re-queues itself. It can * cancel the work even if it migrates to another workqueue, however in that * case it only guarantees that work->func() has completed on the last queued * workqueue. * * cancel_work_sync(&delayed_work->work) should be used only if ->timer is not * pending, otherwise it goes into a busy-wait loop until the timer expires. * * The caller must ensure that workqueue_struct on which this work was last * queued can't be destroyed before this function returns. */ note that cancel_delayed_work_sync has to deal with the same problem... and it does: /** * cancel_delayed_work_sync - reliably kill off a delayed work. * @dwork: the delayed work struct * * Returns true if @dwork was pending. * * It is possible to use this function if @dwork rearms itself via queue_work() * or queue_delayed_work(). See also the comment for cancel_work_sync(). */ I tested your patch and it worked fine for me. However I am running a single-CPU system, so I can't guarantee that it would work as well on an SMP system. Just because I did not hit the race doesn't mean it's not there... The reasons why I am skeptical are: 1* While cancel_work_sync() is guaranteed to succeed on a work which re-queues itself, this isn't exactly what the original code (nor yours) does. Instead, the work arms a timer, and the timer in turn re-arms the work. I suspect the guarantee doesn't apply to this case. 2* Looking at __cancel_work_timer(), I see that it loops over trying to delete the timer and grab the pending work. This suggests that the first try could fail, while your own code does not include any such loop. So my implementation looks safer. Given that it has the added benefit of making the code more compact and simple, I think we should go with it. I agree. Shouldn't schedule_delayed_work (in cx88_ir_start) be called with second parameter set to 0? It's not a bug, but change in behavior. Good point. I don't think it makes any difference in practice, but 0 is equivalent to the original code. I'll do that. hmmm, lots of drivers use del_timer_sync() && flush_scheduled_work(); or del_timer_sync() && cancel_work_sync(); lots of them are probably broken... I already have patches for cx88, ir-kbd-i2c, em28xx and saa6588. I hope to have them merged in the v4l-dvb tree soon, and from there pushed to Linus. If there are other drivers out there doing the same then they indeed probably need a similar fix. Patch went upstream in kernel 2.6.30: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=569b7ec73abf576f9a9e4070d213aadf2cce73cb Also in stable kernels 2.6.29.2 and 2.6.27.25. |