Latest working kernel version: 2.6.27 Earliest failing kernel version: 2.6.28-rc1 Reported by Johannes Weiner in http://lkml.org/lkml/2008/10/23/192: - raw1394_ioctl() locks fi->state_mutex, then calls copy_from/to_user() which may lock mmap_sem due to page fault. - raw1394_mmap() is called with mmap_sem held, then locks fi->state_mutex. fi->state_mutex was introduced while Big Kernel Lock usage was removed from raw1394 by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=10963ea1bd966ba46a46178c4d6abcdf3c23538d There are two related follow-up changes to raw1394: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ddfb908d3f905dbb5964d6fbf783e69c417eb13e http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f22e52b89e036fd12b9374212da8b5d4a447bd1e This bug is unlikely to hit because all raw1394 users (i.e. libraw1394 users) have been trained to use a file handle only in a single thread. Before above mentioned commits, raw1394 would have corrupted its internal state anyway if accessed by multiple threads. Nevertheless, it's a regression which I need to fix ASAP.
Reply-To: stefanr@s5r6.in-berlin.de For the list to know, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11824 > > Summary: raw1394: possible deadlock if accessed by multithreaded > app > Product: Drivers > Version: 2.5 > KernelVersion: 2.6.28-rc1 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IEEE1394 > AssignedTo: drivers_ieee1394@kernel-bugs.osdl.org > ReportedBy: stefan-r-bz@s5r6.in-berlin.de > OtherBugsDependingO 11808 > nThis: > Regression: 1 > > > Latest working kernel version: 2.6.27 > Earliest failing kernel version: 2.6.28-rc1 > > Reported by Johannes Weiner in http://lkml.org/lkml/2008/10/23/192: > > - raw1394_ioctl() locks fi->state_mutex, then calls copy_from/to_user() > which may lock mmap_sem due to page fault. > > - raw1394_mmap() is called with mmap_sem held, then locks fi->state_mutex. > > fi->state_mutex was introduced while Big Kernel Lock usage was removed from > raw1394 by > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=10963ea1bd966ba46a46178c4d6abcdf3c23538d > > There are two related follow-up changes to raw1394: > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ddfb908d3f905dbb5964d6fbf783e69c417eb13e > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f22e52b89e036fd12b9374212da8b5d4a447bd1e > > This bug is unlikely to hit because all raw1394 users (i.e. libraw1394 users) > have been trained to use a file handle only in a single thread. Before above > mentioned commits, raw1394 would have corrupted its internal state anyway if > accessed by multiple threads. Nevertheless, it's a regression which I need > to > fix ASAP. (It is possible to comment simultaneously on the list and in bugzilla by replying with Cc: bugme-daemon@bugzilla.kernel.org and keeping [Bug 11823] in the subject.) So far my ideas of a fix are: - Use mutex_trylock() in raw1394_mmap() [and possibly --- but not necessarily --- anywhere else where fi->state_mutex is taken], and exit with -EAGAIN if the mutex cannot be acquired. A simultaneous raw1394_ioctl() would then become unstuck as soon as mmap() finished as failed. - Same effect as replacing all mutex_lock() by mutex_trylock(): Access the state variables by atomic cmpxchg(). Every code site which needs to change the state or has a critical region during whose execution the state must not be changed will thus ensure exclusive control over the state or will fail with -EAGAIN. Since multithreaded use of a raw1394 file descriptor was never working and never supported before, it should have little consequence whether fi->state and fi->iso_state related critical regions are serialized in a blocking or non-blocking fashion.
Reply-To: stefanr@s5r6.in-berlin.de I wrote: > (It is possible to comment simultaneously on the list and in bugzilla by > replying with Cc: bugme-daemon@bugzilla.kernel.org and keeping [Bug > 11823] in the subject.) [Bug 11824] of course. > So far my ideas of a fix are: > > - Use mutex_trylock() in raw1394_mmap() [and possibly --- but not > necessarily --- anywhere else where fi->state_mutex is taken], > and exit with -EAGAIN if the mutex cannot be acquired. A > simultaneous raw1394_ioctl() would then become unstuck as soon > as mmap() finished as failed. Similarly, mutex_trylock() could be used in dv1394_mmap() to fix bug 11823. It would change previous behavior, but I suspect the effect of that change is irrelevant in practice. > - Same effect as replacing all mutex_lock() by mutex_trylock(): > Access the state variables by atomic cmpxchg(). Every code site > which needs to change the state or has a critical region during > whose execution the state must not be changed will thus ensure > exclusive control over the state or will fail with -EAGAIN. The effect of this would actually be like using mutex_trylock() everywhere /and/ using two different mutexes for write() on one hand and mmap() and ioctl() on the other hand.
Reply-To: stefanr@s5r6.in-berlin.de Regression in 2.6.28-rc1: When I added the new state_mutex which prevents corruption of raw1394's internal state when accessed by multithreaded client applications, the following possible though highly unlikely deadlock slipped in: Thread A: Thread B: - acquire mmap_sem - raw1394_write() or raw1394_ioctl() - raw1394_mmap() - acquire state_mutex - acquire state_mutex - copy_to/from_user(), possible page fault: acquire mmap_sem The simplest fix is to use mutex_trylock() instead of mutex_lock() in raw1394_mmap(). This changes the behavior under contention in a way which is visible to userspace clients. However, since multithreaded access was entirely buggy before state_mutex was added and libraw1394's documentation advised application programmers to use a handle only in a single thread, this change in behaviour should not be an issue in practice at all. Since we have to use mutex_trylock() in raw1394_mmap() regardless whether /dev/raw1394 was opened with O_NONBLOCK or not, we now use mutex_trylock() unconditionally everywhere for state_mutex, just to have consistent behavior. Reported-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- Background: That new state_mutex went only in because raw1394_ioctl() already head some weak protection by the Big Kernel Lock, which I removed for the general reasons pro BKL removal (get better performance with local locks; make the locking clearer, easier to debug, more reliable). drivers/ieee1394/raw1394.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) Index: linux/drivers/ieee1394/raw1394.c =================================================================== --- linux.orig/drivers/ieee1394/raw1394.c +++ linux/drivers/ieee1394/raw1394.c @@ -2268,7 +2268,8 @@ static ssize_t raw1394_write(struct file return -EFAULT; } - mutex_lock(&fi->state_mutex); + if (!mutex_trylock(&fi->state_mutex)) + return -EAGAIN; switch (fi->state) { case opened: @@ -2548,7 +2549,8 @@ static int raw1394_mmap(struct file *fil struct file_info *fi = file->private_data; int ret; - mutex_lock(&fi->state_mutex); + if (!mutex_trylock(&fi->state_mutex)) + return -EAGAIN; if (fi->iso_state == RAW1394_ISO_INACTIVE) ret = -EINVAL; @@ -2669,7 +2671,8 @@ static long raw1394_ioctl(struct file *f break; } - mutex_lock(&fi->state_mutex); + if (!mutex_trylock(&fi->state_mutex)) + return -EAGAIN; switch (fi->iso_state) { case RAW1394_ISO_INACTIVE:
Handled-By : Stefan Richter <stefanr@s5r6.in-berlin.de> Patch : http://bugzilla.kernel.org/show_bug.cgi?id=11824#c3
* Stefan Richter <stefanr@s5r6.in-berlin.de> wrote: > Regression in 2.6.28-rc1: When I added the new state_mutex which > prevents corruption of raw1394's internal state when accessed by > multithreaded client applications, the following possible though > highly unlikely deadlock slipped in: > > Thread A: Thread B: > - acquire mmap_sem - raw1394_write() or raw1394_ioctl() > - raw1394_mmap() - acquire state_mutex > - acquire state_mutex - copy_to/from_user(), possible page fault: > acquire mmap_sem > > The simplest fix is to use mutex_trylock() instead of mutex_lock() in > raw1394_mmap(). This changes the behavior under contention in a way > which is visible to userspace clients. However, since multithreaded > access was entirely buggy before state_mutex was added and libraw1394's > documentation advised application programmers to use a handle only in a > single thread, this change in behaviour should not be an issue in > practice at all. > > Since we have to use mutex_trylock() in raw1394_mmap() regardless > whether /dev/raw1394 was opened with O_NONBLOCK or not, we now use > mutex_trylock() unconditionally everywhere for state_mutex, just to have > consistent behavior. > > Reported-by: Johannes Weiner <hannes@saeurebad.de> > Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> > --- > > Background: That new state_mutex went only in because raw1394_ioctl() > already head some weak protection by the Big Kernel Lock, which I > removed for the general reasons pro BKL removal (get better performance > with local locks; make the locking clearer, easier to debug, more > reliable). > > drivers/ieee1394/raw1394.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > Index: linux/drivers/ieee1394/raw1394.c > =================================================================== > --- linux.orig/drivers/ieee1394/raw1394.c > +++ linux/drivers/ieee1394/raw1394.c > @@ -2268,7 +2268,8 @@ static ssize_t raw1394_write(struct file > return -EFAULT; > } > > - mutex_lock(&fi->state_mutex); > + if (!mutex_trylock(&fi->state_mutex)) > + return -EAGAIN; > > switch (fi->state) { > case opened: > @@ -2548,7 +2549,8 @@ static int raw1394_mmap(struct file *fil > struct file_info *fi = file->private_data; > int ret; > > - mutex_lock(&fi->state_mutex); > + if (!mutex_trylock(&fi->state_mutex)) > + return -EAGAIN; > > if (fi->iso_state == RAW1394_ISO_INACTIVE) > ret = -EINVAL; > @@ -2669,7 +2671,8 @@ static long raw1394_ioctl(struct file *f > break; > } > > - mutex_lock(&fi->state_mutex); > + if (!mutex_trylock(&fi->state_mutex)) > + return -EAGAIN; So we can return a spurious -EAGAIN to user-space, if the state_mutex is taken briefly by some other context? Ingo
Reply-To: stefanr@s5r6.in-berlin.de Ingo Molnar wrote: > * Stefan Richter <stefanr@s5r6.in-berlin.de> wrote: > >> Regression in 2.6.28-rc1: When I added the new state_mutex which >> prevents corruption of raw1394's internal state when accessed by >> multithreaded client applications, the following possible though >> highly unlikely deadlock slipped in: >> >> Thread A: Thread B: >> - acquire mmap_sem - raw1394_write() or raw1394_ioctl() >> - raw1394_mmap() - acquire state_mutex >> - acquire state_mutex - copy_to/from_user(), possible page fault: >> acquire mmap_sem >> >> The simplest fix is to use mutex_trylock() instead of mutex_lock() in >> raw1394_mmap(). [...] >> Since we have to use mutex_trylock() in raw1394_mmap() regardless >> whether /dev/raw1394 was opened with O_NONBLOCK or not, we now use >> mutex_trylock() unconditionally everywhere for state_mutex, just to have >> consistent behavior. [...] >> Background: That new state_mutex went only in because raw1394_ioctl() >> already head some weak protection by the Big Kernel Lock, which I >> removed for the general reasons pro BKL removal (get better performance >> with local locks; make the locking clearer, easier to debug, more >> reliable). >> >> drivers/ieee1394/raw1394.c | 9 ++++++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) >> >> Index: linux/drivers/ieee1394/raw1394.c >> =================================================================== >> --- linux.orig/drivers/ieee1394/raw1394.c >> +++ linux/drivers/ieee1394/raw1394.c >> @@ -2268,7 +2268,8 @@ static ssize_t raw1394_write(struct file >> return -EFAULT; >> } >> >> - mutex_lock(&fi->state_mutex); >> + if (!mutex_trylock(&fi->state_mutex)) >> + return -EAGAIN; >> >> switch (fi->state) { >> case opened: >> @@ -2548,7 +2549,8 @@ static int raw1394_mmap(struct file *fil >> struct file_info *fi = file->private_data; >> int ret; >> >> - mutex_lock(&fi->state_mutex); >> + if (!mutex_trylock(&fi->state_mutex)) >> + return -EAGAIN; >> >> if (fi->iso_state == RAW1394_ISO_INACTIVE) >> ret = -EINVAL; >> @@ -2669,7 +2671,8 @@ static long raw1394_ioctl(struct file *f >> break; >> } >> >> - mutex_lock(&fi->state_mutex); >> + if (!mutex_trylock(&fi->state_mutex)) >> + return -EAGAIN; > > So we can return a spurious -EAGAIN to user-space, if the state_mutex > is taken briefly by some other context? .write() and .mmap() were not serialized against each other and against .ioctl() at all in raw1394 before 2.6.28-rc1. Only .ioctl() was (somewhat) serialized against itself due to traditionally being implemented as .ioctl() instead of .unlocked_ioctl(). If there were ever concurrent userspace contexts accessing the same file handle, they would have corrupted raw1394's internal state eventually. Application developers have been advised all the time to access the file handle only from a single thread, ever. So, before: Concurrent access resulted in hard to debug driver malfunction, without clear error indication. After: Concurrent access results in -EAGAIN. I.e. since concurrent access was so broken before that hopefully no raw1394/libraw1394 client exists which did use a single handle in more than one thread, we are quite free how we fix that. Or at least that's how I understand the situation. Would be good to hear opinions from those who are more involved with libraw1394.
Reply-To: stefanr@s5r6.in-berlin.de I wrote: > .write() and .mmap() were not serialized against each other and against > .ioctl() at all in raw1394 before 2.6.28-rc1. PS: There is a need for serialization to some degree because the client registers itself with a controller via .write() (among many other things that are implemented through .write()), manages isochronous I/O contexts on this controller via .ioctl() and maps DMA buffers for isochronous I/O via .mmap(). The raw1394 driver tracks respective state by means of two state variables and some other variables, and accesses of the state variables is not reentrant within one opener of /dev/raw1394. AFAICS the issue exists between .write() and .write(), and independently of that between .ioctl() and .ioctl() and between .ioctl() and .mmap(). Local mutex protection is the simplest way to fix that --- except that there is this obscure issue of locking order between the driver's mutex and the mmap semaphore outside the driver.
fixed by commit 638570b54346f140bc09b986d93e76025d35180f