Latest working kernel version:2.6.25 Earliest failing kernel version:2.6.26 Distribution:Gentoo Hardware Environment: Software Environment: Problem Description: Function tiocswinsz in drivers/char/tty_io.c calls mutex_lock(&tty->termios_mutex), then tiocswinsz calls vc_lock_resize with tty->driver_data as first argument when tty->termios_mutex is locking. The latter(vc_lock_resize in drivers/char/vt.c) then calls vc_resize which calls mutex_lock(&vc->vc_tty->termios_mutex). vc is its first argument which is tty->driver_data. vc->vc_tty is the tty of tty->driver_data, is also the tty of tty->termios_mutex locked in tiocswinsz. So the procedure is like this: tiocswinsz locks a mutex, and then calls another function(vc_resize finally), the latter locks the same mutex, and, oops, kernel is deadlocked then! I saw the code vt.c under version 2.6.25, there was no mutex_lock(&vc->vc_tty->termios_mutex) in vc_resize. This is new in 2.6.26. Steps to reproduce: int fd = open ("/dev/tty0", O_RDWR); strunct winsize ws; ws.ws_col = col; /*different from current col*/ ws.ws_row = row; /*different from current row*/ ioctl(fd, TIOCSWINSZ, &ws);
Now I remove mutex_lock(&vc->vc_tty->termios_mutex) and mutex_unlock in vc_resize in drivers/char/vt.c. And the programs, which need set a different window size in vt, work correctly.
err, Alan?
Erk everyone must be using tools that exercise the VT_RESIZE path only. Will sort that out Monday
Oh my brain hurts It doesn't deadlock for me although we clearly take the mutex twice. I've got another report where someone is seeing races that can't occur here and the real fix is going to be foul. Seems we have a lack of sanity check on mutexes so that recursive mutex_lock *works* (but unlocks wrongly) if CONFIG_PREEMPT is set. So that explains why nobody found the bug and how to work around it for the moment.
You mean setting CONFIG_PREEMPT to avoid this problem is a temporary method, don't you? I checked my kernel config just right now. That's right, I set CONFIG_PREEMPT_VOLUNTARY, not CONFIG_PREEMPT. I'll try that *method*, and tell you the result.
Proposed patch follows. Its very ugly but will do for the short term and the code gets cleaned up in some further patches in testing for .28
Created attachment 17075 [details] Proposed fix
Full fix turns out to be ghastly - sent Linus a partial revert to go back to the race condition. I've got a rework of this area lined up for -next and .28.
Thank you Alan for response. I just tried set CONFIG_PREEMPT, and recompiled kernel, and the problem still existed. You mean this bug will be fixed in .28, and now I have to use that proposed patch only, don't you?
Under discussion. Linus has said he'll look at it for 2.6.27
Short term fixes pushed upstream, longer term ones for 2.6.28