Bug 14255
Summary: | WARNING: at drivers/char/tty_io.c:1267 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Rafael J. Wysocki (rjw) |
Component: | Other | Assignee: | drivers_other |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 13615 |
Description
Rafael J. Wysocki
2009-09-29 21:44:13 UTC
On Friday 02 October 2009, Linus Torvalds wrote:
>
> On Thu, 1 Oct 2009, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14255
> > Subject : WARNING: at drivers/char/tty_io.c:1267
> > Submitter : Heinz Diehl <htd@fancy-poultry.org>
> > Date : 2009-09-20 11:37 (12 days old)
> > References : http://marc.info/?l=linux-kernel&m=125344629506309&w=4
> > http://lkml.org/lkml/2009/9/8/393
> > Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
>
> So the real problem here is that horrible workqueue deadlock, but it turns
> out that I think that we should be able to safely do a
>
> cancel_delayed_work_sync(&tty->buf.work);
>
> in tty_ldisc_halt(), because cancel_delayed_work_sync() should never wait
> for any other work than the exact work in question. And the buf.work thing
> is flush_to_ldisc(), so waiting for _that_ is safe - the problematic thing
> was always waiting for the (unrelated) tty->hangup_work, which can (and
> does) take the semaphore for do_tty_hangup.
>
> So doing that synchronous version of the delayed work cancel means that we
> can now rest easy after tty_ldisc_halt(), and we don't need to worry about
> buf.work still being pending. We _do_ in general need to worry about
> hangup_work, which will call do_tty_hangup, which will call
> tty_ldisc_hangup, but that's actually the routine we are in right now, so
> for the _very_ special case of tty_ldisc_hangup that is a non-issue too.
>
> Did that sound subtle to you?
>
> It should.
>
> It's subtle as hell, and I don't like it, but I think that the two subtle
> rules above means that the following two-liner patch is safe - it can't
> cause any new deadlocks, and getting rid of a the flush_scheduled_work is
> safe in this one case.
>
> So please give it a whirl. I'm not happy about the subtlety, but I also
> hope that we'll get rid of that in the long run, so as a short-term hack
> this looks acceptable.
>
> To recap:
>
> - tty_ldisc_halt() _can_ be called under the ldisc_mutex, because while
> it waits for the work, it never waits for _other_ work, and buf.work
> itself doesn't need the ldisc_mutex. So no deadlock.
>
> - The flush_scheduled_work() after tty_ldisc_halt() is normally needed to
> not just flush the buf.work (which is now done by tty_ldisc_halt()
> itself), but to also make sure that there isn't any hangup work
> pending.
>
> So we can't remove that in general, and the other cases will still need
> to flush all scheduled work (and worry about deadlocks with
> ldisc_mutex). HOWEVER, in the special case of tty_ldisc_hangup() we
> know that we are inside the hangup work, and thus don't need to wait
> for ourselves, so we can just get rid of it there - just nowhere else.
>
> - The other cases of dropping the ldisc_mutex in the middle are
> long-standing, and have that TTY_LDISC_CHANGING vs TTY_HUPPED hackery
> to take care of the races that it opens. I'd love to get rid of that
> too, but they all seem to work. And they have never apparently
> triggered the WARN_ON in this bugzilla.
>
> I'm not proud of this patch, and I'm not signing off on it until the
> people who have seen this warning have tried it and report that it seems
> to work..
>
> Linus
>
> ---
> drivers/char/tty_ldisc.c | 7 ++-----
> 1 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> index aafdbae..feb5507 100644
> --- a/drivers/char/tty_ldisc.c
> +++ b/drivers/char/tty_ldisc.c
> @@ -518,7 +518,7 @@ static void tty_ldisc_restore(struct tty_struct *tty,
> struct tty_ldisc *old)
> static int tty_ldisc_halt(struct tty_struct *tty)
> {
> clear_bit(TTY_LDISC, &tty->flags);
> - return cancel_delayed_work(&tty->buf.work);
> + return cancel_delayed_work_sync(&tty->buf.work);
> }
>
> /**
> @@ -756,12 +756,9 @@ void tty_ldisc_hangup(struct tty_struct *tty)
> * N_TTY.
> */
> if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
> - /* Make sure the old ldisc is quiescent */
> - tty_ldisc_halt(tty);
> - flush_scheduled_work();
> -
> /* Avoid racing set_ldisc or tty_ldisc_release */
> mutex_lock(&tty->ldisc_mutex);
> + tty_ldisc_halt(tty);
> if (tty->ldisc) { /* Not yet closed */
> /* Switch back to N_TTY */
> tty_ldisc_reinit(tty);
>
>
|