Bug 11234 - kernel is deadlocked due to tty->termios_mutex locked twice in one procedure
Summary: kernel is deadlocked due to tty->termios_mutex locked twice in one procedure
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Console/Framebuffers (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Alan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-02 12:07 UTC by kouyu
Modified: 2008-09-16 08:44 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.26
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Proposed fix (1.39 KB, patch)
2008-08-04 04:24 UTC, Alan
Details | Diff

Description kouyu 2008-08-02 12:07:58 UTC
Latest working kernel version:2.6.25
Earliest failing kernel version:2.6.26
Distribution:Gentoo
Hardware Environment:
Software Environment:
Problem Description:
Function tiocswinsz in drivers/char/tty_io.c calls mutex_lock(&tty->termios_mutex), 
then tiocswinsz calls vc_lock_resize with tty->driver_data as first argument when tty->termios_mutex
is locking.

The latter(vc_lock_resize in drivers/char/vt.c) then calls vc_resize which calls mutex_lock(&vc->vc_tty->termios_mutex).
vc is its first argument which is tty->driver_data. vc->vc_tty is the tty of tty->driver_data, is also
the tty of tty->termios_mutex locked in tiocswinsz.

So the procedure is like this: tiocswinsz locks a mutex, and then calls another function(vc_resize finally), the latter locks the same mutex, and, oops, kernel is deadlocked then!

I saw the code vt.c under version 2.6.25, there was no mutex_lock(&vc->vc_tty->termios_mutex) in vc_resize.
This is new in 2.6.26.


Steps to reproduce:
int fd = open ("/dev/tty0", O_RDWR);
strunct winsize ws;
ws.ws_col = col; /*different from current col*/
ws.ws_row = row; /*different from current row*/
ioctl(fd, TIOCSWINSZ, &ws);
Comment 1 kouyu 2008-08-02 12:15:59 UTC
Now I remove mutex_lock(&vc->vc_tty->termios_mutex) and mutex_unlock in vc_resize in drivers/char/vt.c. And the programs, which need set a different window size in vt, work correctly.
Comment 2 Andrew Morton 2008-08-02 12:20:09 UTC
err, Alan?
Comment 3 Alan 2008-08-03 06:01:01 UTC
Erk everyone must be using tools that exercise the VT_RESIZE path only.

Will sort that out Monday
Comment 4 Alan 2008-08-03 12:18:27 UTC
Oh my brain hurts

It doesn't deadlock for me although we clearly take the mutex twice. I've got another report where someone is seeing races that can't occur here and the real fix is going to be foul.

Seems we have a lack of sanity check on mutexes so that recursive mutex_lock *works* (but unlocks wrongly) if CONFIG_PREEMPT is set.

So that explains why nobody found the bug and how to work around it for the moment.
Comment 5 kouyu 2008-08-04 02:02:31 UTC
You mean setting CONFIG_PREEMPT to avoid this problem is a temporary method, don't you?

I checked my kernel config just right now. That's right, I set CONFIG_PREEMPT_VOLUNTARY, not CONFIG_PREEMPT.

I'll try that *method*, and tell you the result.
Comment 6 Alan 2008-08-04 04:23:01 UTC
Proposed patch follows. Its very ugly but will do for the short term and the code gets cleaned up in some further patches in testing for .28
Comment 7 Alan 2008-08-04 04:24:13 UTC
Created attachment 17075 [details]
Proposed fix
Comment 8 Alan 2008-08-04 13:15:42 UTC
Full fix turns out to be ghastly - sent Linus a partial revert to go back to the race condition. I've got a rework of this area lined up for -next and .28.
Comment 9 kouyu 2008-08-04 18:31:42 UTC
Thank you Alan for response.

I just tried set CONFIG_PREEMPT, and recompiled kernel, and the problem still existed.

You mean this bug will be fixed in .28, and now I have to use that proposed patch only, don't you?
Comment 10 Alan 2008-08-15 01:53:24 UTC
Under discussion. Linus has said he'll look at it for 2.6.27
Comment 11 Alan 2008-09-16 08:44:14 UTC
Short term fixes pushed upstream, longer term ones for 2.6.28

Note You need to log in before you can comment on or make changes to this bug.