Bug 12515 - possible circular locking #0: (sk_lock-AF_PACKET){--..}, at: [<c1279838>] sock_setsockopt+0x12b/0x4a4
Summary: possible circular locking #0: (sk_lock-AF_PACKET){--..}, at: [<c1279838>] so...
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4 (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Vegard Nossum
URL:
Keywords:
Depends on:
Blocks: 12398
  Show dependency tree
 
Reported: 2009-01-20 10:16 UTC by Martin Mokrejs
Modified: 2009-02-01 02:05 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.29-rc1-git4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg (32.85 KB, text/plain)
2009-01-20 10:18 UTC, Martin Mokrejs
Details
[PATCH] net: fix setsockopt() locking errors (5.14 KB, patch)
2009-01-24 13:56 UTC, Vegard Nossum
Details | Diff
dmesg-2.6.29-rc2-git1-patched (27.49 KB, text/plain)
2009-01-26 08:38 UTC, Martin Mokrejs
Details
[PATCH] net: fix setsockopt() locking errors #2 (8.65 KB, patch)
2009-01-27 14:38 UTC, Vegard Nossum
Details | Diff
dmesg-2.6.29-rc2-git1-patched (29.52 KB, application/octet-stream)
2009-01-27 17:13 UTC, Martin Mokrejs
Details
packet: Avoid lock_sock in mmap handler (1.28 KB, patch)
2009-01-29 22:51 UTC, Herbert Xu
Details | Diff

Description Martin Mokrejs 2009-01-20 10:16:37 UTC
Latest working kernel version:
Earliest failing kernel version:
Distribution: Gentoo Linux
Hardware Environment: ASUS L3C/S laptop
Software Environment:
Problem Description:

Steps to reproduce: I have been reinstalling some apps while started tcpdump. It immediately hit the bug.
Comment 1 Martin Mokrejs 2009-01-20 10:18:28 UTC
Created attachment 19907 [details]
dmesg
Comment 2 Martin Mokrejs 2009-01-20 12:11:18 UTC
device eth0 entered promiscuous mode

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-rc2-git1 #1
-------------------------------------------------------
tcpdump/3734 is trying to acquire lock:
 (&mm->mmap_sem){----}, at: [<c1053294>] might_fault+0x30/0x6b

but task is already holding lock:
 (sk_lock-AF_PACKET){--..}, at: [<c12798c8>] sock_setsockopt+0x12b/0x4a4

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_PACKET){--..}:
       [<c1033a0d>] add_lock_to_list+0x5e/0x8b
       [<c10365ae>] __lock_acquire+0xfb8/0x12e5
       [<c12cc63c>] packet_mmap+0x2e/0xf4
       [<c12cc63c>] packet_mmap+0x2e/0xf4
       [<c1036922>] lock_acquire+0x47/0x5e
       [<c12cc63c>] packet_mmap+0x2e/0xf4
       [<c1278548>] lock_sock_nested+0xac/0xb9
       [<c12cc63c>] packet_mmap+0x2e/0xf4
       [<c12cc63c>] packet_mmap+0x2e/0xf4
       [<c1034e94>] trace_hardirqs_on_caller+0x104/0x124
       [<c12759fa>] sock_mmap+0xc/0xe
       [<c10588f9>] mmap_region+0x27e/0x4b4
       [<c1058d1e>] do_mmap_pgoff+0x1ef/0x239
       [<c10057cf>] sys_mmap2+0x58/0x76
       [<c1002cd5>] sysenter_do_call+0x12/0x35
       [<ffffffff>] 0xffffffff

-> #0 (&mm->mmap_sem){----}:
       [<c103507d>] print_circular_bug_entry+0x36/0x3d
       [<c10362d1>] __lock_acquire+0xcdb/0x12e5
       [<c1034cf3>] mark_held_locks+0x50/0x66
       [<c101ff01>] local_bh_enable+0x97/0x9a
       [<c1036922>] lock_acquire+0x47/0x5e
       [<c1053294>] might_fault+0x30/0x6b
       [<c10532b1>] might_fault+0x4d/0x6b
       [<c1053294>] might_fault+0x30/0x6b
       [<c1137907>] copy_from_user+0x27/0x10e
       [<c1279bcb>] sock_setsockopt+0x42e/0x4a4
       [<c10532b4>] might_fault+0x50/0x6b
       [<c127636a>] sys_setsockopt+0x4c/0x7d
       [<c1277957>] sys_socketcall+0x135/0x18c
       [<c1002cd5>] sysenter_do_call+0x12/0x35
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by tcpdump/3734:
 #0:  (sk_lock-AF_PACKET){--..}, at: [<c12798c8>] sock_setsockopt+0x12b/0x4a4

stack backtrace:
Pid: 3734, comm: tcpdump Not tainted 2.6.29-rc2-git1 #1
Call Trace:
 [<c10352c1>] print_circular_bug_tail+0xa5/0xaf
 [<c103507d>] print_circular_bug_entry+0x36/0x3d
 [<c10362d1>] __lock_acquire+0xcdb/0x12e5
 [<c1034cf3>] mark_held_locks+0x50/0x66
 [<c101ff01>] local_bh_enable+0x97/0x9a
 [<c1036922>] lock_acquire+0x47/0x5e
 [<c1053294>] might_fault+0x30/0x6b
 [<c10532b1>] might_fault+0x4d/0x6b
 [<c1053294>] might_fault+0x30/0x6b
 [<c1137907>] copy_from_user+0x27/0x10e
 [<c1279bcb>] sock_setsockopt+0x42e/0x4a4
 [<c10532b4>] might_fault+0x50/0x6b
 [<c127636a>] sys_setsockopt+0x4c/0x7d
 [<c1277957>] sys_socketcall+0x135/0x18c
 [<c1002cd5>] sysenter_do_call+0x12/0x35
device eth0 left promiscuous mode
Comment 3 Vegard Nossum 2009-01-24 13:25:32 UTC
Thanks for info, it seems that locking in sock_setsockopt() is wrong (it could deadlock ABBA-style). I will submit a patch to clean up this function. Is that ok with Stephen Hemminger (it looks like this bug was auto-assigned)?
Comment 4 Vegard Nossum 2009-01-24 13:56:25 UTC
Created attachment 19980 [details]
[PATCH] net: fix setsockopt() locking errors

If the problem was reproducible, can you see if this patch makes a difference? I compile-tested it, will also give it a bootup test.
Comment 5 Martin Mokrejs 2009-01-26 08:38:42 UTC
Created attachment 20000 [details]
dmesg-2.6.29-rc2-git1-patched

No, the patch did not help. Don't know what bootup tests you meant but I don't see any errors except the one caused by tcpdump.
Comment 6 Anonymous Emailer 2009-01-27 13:44:37 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 20 Jan 2009 10:16:38 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12515
> 
>            Summary: possible circular locking #0:  (sk_lock-AF_PACKET){--
>                     ..}, at: [<c1279838>] sock_setsockopt+0x12b/0x4a4
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.29-rc1-git4
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: mmokrejs@ribosome.natur.cuni.cz
> 
> 
> Latest working kernel version:
> Earliest failing kernel version:
> Distribution: Gentoo Linux
> Hardware Environment: ASUS L3C/S laptop
> Software Environment:
> Problem Description:
> 
> Steps to reproduce: I have been reinstalling some apps while started tcpdump.
> It immediately hit the bug.
> 

More info at the link.  Vegard did some analysis, had a shot at fixing
it, but it seems that he missed.
Comment 7 Martin Mokrejs 2009-01-27 13:54:13 UTC
There has been some discussion under subject
"Re: [PATCH] net: fix setsockopt() locking errors"
and I hope it is archived somewhere via netdev@vger.kernel.org .
If not, Jarek Poplawski and Peter Zijlstra have some clues
what to do and I am waiting for Vegard Nossum to give me
another patch to test.


Here is some part of the discussion ,hopefully the most important
part:

On Tue, Jan 27, 2009 at 09:52:49AM +0100, Peter Zijlstra wrote:
> > On Tue, 2009-01-27 at 08:45 +0000, Jarek Poplawski wrote:
>> > > On Mon, Jan 26, 2009 at 10:30:30PM +0100, Martin MOKREJŠ wrote:
>>> > > > The patch really did not help:
>>> > > > http://bugzilla.kernel.org/show_bug.cgi?id=12515#c5
>>> > > > Martin
>> > > 
>> > > Actually, there is a little change: the warning triggerd in another
>> > > place (sock_setsockopt() -> sk_attach_filter()). So we could go deeper
>> > > with these changes, but I'm not sure this is the right way to fix.
>> > > 
>> > > It looks like the scenario is very old, but probably wasn't reported
>> > > (maybe there is some lockdep improvement):
> > 
> > Yes, they likely are very old, and yes we added a lockdep annotation to
> > copy_to/from_user() to catch these.
> > 
>> > > A) sys_mmap2() -> mm->mmap_sem -> packet_mmap() -> sk_lock
>> > > B) sock_setsockopt() -> sk_lock -> copy_from_user() -> mm->mmap_sem
>> > > 
>> > > packet_mmap() (net/packet/af_packet.c) seems to be the only place in
>> > > net to implement mmap method, and using this lock order btw. On the
>> > > other hand copy_from_user() could be more popular under sk_lock, and
>> > > I'm not sure these changes are necessary.
>> > > 
>> > > Since I don't know enough neither sock/packet nor sys_mmap, I guess
>> > > some advice would be precious. It looks like Peter Zijlstra solved
>> > > similar problems in nfs, so I CC him.
> > 
> > The NFS/sunrpc case was special in that it did copy_to/from_kernel, that
> > is, it never actually touched user memory -- we taught the might_fault()
> > annotation about that.
> > 
> > Can't you simply do the copy_from_user() before you take the sk_lock?
> > 

Since it's really needed, and Vegard started doing it like this, I
guess he will try to add the missing pieces.

Thanks again,
Jarek P.





Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 20 Jan 2009 10:16:38 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=12515
>>
>>            Summary: possible circular locking #0:  (sk_lock-AF_PACKET){--
>>                     ..}, at: [<c1279838>] sock_setsockopt+0x12b/0x4a4
>>            Product: Networking
>>            Version: 2.5
>>      KernelVersion: 2.6.29-rc1-git4
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: IPV4
>>         AssignedTo: shemminger@linux-foundation.org
>>         ReportedBy: mmokrejs@ribosome.natur.cuni.cz
>>
>>
>> Latest working kernel version:
>> Earliest failing kernel version:
>> Distribution: Gentoo Linux
>> Hardware Environment: ASUS L3C/S laptop
>> Software Environment:
>> Problem Description:
>>
>> Steps to reproduce: I have been reinstalling some apps while started
>> tcpdump.
>> It immediately hit the bug.
>>
> 
> More info at the link.  Vegard did some analysis, had a shot at fixing
> it, but it seems that he missed.
Comment 8 Vegard Nossum 2009-01-27 14:38:23 UTC
Created attachment 20018 [details]
[PATCH] net: fix setsockopt() locking errors #2

Andrew, I just missed a case because the actual call to copy_from_user was in a different file altogether.

There might be more, but I didn't really have the time to look for everything yet.

In the meantime, I give you this patch, which includes the SO_ATTACH_FILTER socket option, which is what was reported to fail with the first patch applied.

I also don't REALLY think this is a regression, it's just lockdep that got smarter (or the insertion of a might_fault() something something made it more likely to show up -- I think this was info from Peter, but I don't remember accurately).
Comment 9 Martin Mokrejs 2009-01-27 17:13:44 UTC
Created attachment 20021 [details]
dmesg-2.6.29-rc2-git1-patched

Cannot reproduce anymore. Looks you're on the right way.
Comment 10 Martin Mokrejs 2009-01-27 17:16:30 UTC

Martin MOKREJŠ wrote:
> There has been some discussion under subject
> "Re: [PATCH] net: fix setsockopt() locking errors"
> and I hope it is archived somewhere via netdev@vger.kernel.org .
> If not, Jarek Poplawski and Peter Zijlstra have some clues
> what to do and I am waiting for Vegard Nossum to give me
> another patch to test.

I have tested the patch below in bugzilla and cannot reproduce anymore.
dmesg(1) output in bugzilla.

Martin


------- Comment #8 from vegard.nossum@gmail.com  2009-01-27 14:38 -------
Created an attachment (id=20018)
 --> (http://bugzilla.kernel.org/attachment.cgi?id=20018&action=view)
[PATCH] net: fix setsockopt() locking errors #2

Andrew, I just missed a case because the actual call to copy_from_user was in a
different file altogether.

There might be more, but I didn't really have the time to look for everything
yet.

In the meantime, I give you this patch, which includes the SO_ATTACH_FILTER
socket option, which is what was reported to fail with the first patch applied.

I also don't REALLY think this is a regression, it's just lockdep that got
smarter (or the insertion of a might_fault() something something made it more
likely to show up -- I think this was info from Peter, but I don't remember
accurately).



> 
> 
> Here is some part of the discussion ,hopefully the most important
> part:
> 
> On Tue, Jan 27, 2009 at 09:52:49AM +0100, Peter Zijlstra wrote:
>>> On Tue, 2009-01-27 at 08:45 +0000, Jarek Poplawski wrote:
>>>>> On Mon, Jan 26, 2009 at 10:30:30PM +0100, Martin MOKREJŠ wrote:
>>>>>>> The patch really did not help:
>>>>>>> http://bugzilla.kernel.org/show_bug.cgi?id=12515#c5
>>>>>>> Martin
>>>>> Actually, there is a little change: the warning triggerd in another
>>>>> place (sock_setsockopt() -> sk_attach_filter()). So we could go deeper
>>>>> with these changes, but I'm not sure this is the right way to fix.
>>>>>
>>>>> It looks like the scenario is very old, but probably wasn't reported
>>>>> (maybe there is some lockdep improvement):
>>> Yes, they likely are very old, and yes we added a lockdep annotation to
>>> copy_to/from_user() to catch these.
>>>
>>>>> A) sys_mmap2() -> mm->mmap_sem -> packet_mmap() -> sk_lock
>>>>> B) sock_setsockopt() -> sk_lock -> copy_from_user() -> mm->mmap_sem
>>>>>
>>>>> packet_mmap() (net/packet/af_packet.c) seems to be the only place in
>>>>> net to implement mmap method, and using this lock order btw. On the
>>>>> other hand copy_from_user() could be more popular under sk_lock, and
>>>>> I'm not sure these changes are necessary.
>>>>>
>>>>> Since I don't know enough neither sock/packet nor sys_mmap, I guess
>>>>> some advice would be precious. It looks like Peter Zijlstra solved
>>>>> similar problems in nfs, so I CC him.
>>> The NFS/sunrpc case was special in that it did copy_to/from_kernel, that
>>> is, it never actually touched user memory -- we taught the might_fault()
>>> annotation about that.
>>>
>>> Can't you simply do the copy_from_user() before you take the sk_lock?
>>>
> 
> Since it's really needed, and Vegard started doing it like this, I
> guess he will try to add the missing pieces.
> 
> Thanks again,
> Jarek P.
> 
> 
> 
> 
> 
> Andrew Morton wrote:
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Tue, 20 Jan 2009 10:16:38 -0800 (PST)
>> bugme-daemon@bugzilla.kernel.org wrote:
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=12515
>>>
>>>            Summary: possible circular locking #0:  (sk_lock-AF_PACKET){--
>>>                     ..}, at: [<c1279838>] sock_setsockopt+0x12b/0x4a4
>>>            Product: Networking
>>>            Version: 2.5
>>>      KernelVersion: 2.6.29-rc1-git4
>>>           Platform: All
>>>         OS/Version: Linux
>>>               Tree: Mainline
>>>             Status: NEW
>>>           Severity: normal
>>>           Priority: P1
>>>          Component: IPV4
>>>         AssignedTo: shemminger@linux-foundation.org
>>>         ReportedBy: mmokrejs@ribosome.natur.cuni.cz
>>>
>>>
>>> Latest working kernel version:
>>> Earliest failing kernel version:
>>> Distribution: Gentoo Linux
>>> Hardware Environment: ASUS L3C/S laptop
>>> Software Environment:
>>> Problem Description:
>>>
>>> Steps to reproduce: I have been reinstalling some apps while started
>>> tcpdump.
>>> It immediately hit the bug.
>>>
>> More info at the link.  Vegard did some analysis, had a shot at fixing
>> it, but it seems that he missed.
Comment 11 Herbert Xu 2009-01-29 22:51:31 UTC
Created attachment 20044 [details]
packet: Avoid lock_sock in mmap handler 

Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>                                                                                                                                                                                                                                             
> Can't you simply do the copy_from_user() before you take the sk_lock?                                                                                                                                                                       

Well, doing the copy under sk_lock is pretty common through all
protocols.  So I think it'd be safer to change the other path,
which is doing the odd thing here, i.e., ->mmap() grabbing the
socket lock while holding mmap_sem.

In fact, it would appear that we don't really need the socket lock
in ->mmap() since it only needs to ensure that pg_vec* doesn't
get yanked or changed.  So this patch should work:

packet: Avoid lock_sock in mmap handler

As the mmap handler gets called under mmap_sem, and we may grab
mmap_sem elsewhere under the socket lock to access user data, we
should avoid grabbing the socket lock in the mmap handler.

Since the only thing we care about in the mmap handler is for
pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
can achieve this by simply using sk_receive_queue.lock.

I resisted the temptation to create a new spin lock because the
mmap path isn't exactly common.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Comment 12 Herbert Xu 2009-01-30 04:50:05 UTC
On Fri, Jan 30, 2009 at 05:12:50PM +1100, Herbert Xu wrote:
> 
> Well, doing the copy under sk_lock is pretty common through all
> protocols.  So I think it'd be safer to change the other path,
> which is doing the odd thing here, i.e., ->mmap() grabbing the
> socket lock while holding mmap_sem.
> 
> In fact, it would appear that we don't really need the socket lock
> in ->mmap() since it only needs to ensure that pg_vec* doesn't
> get yanked or changed.  So this patch should work:
> 
> packet: Avoid lock_sock in mmap handler

Dave pointed out that a spin lock is illegal for this purpose
as vm_insert_page can do a GFP_KERNEL allocation.  So I've added
a mutex for this.

I've also widened the critical section in packet_set_ring since
we need the mapped check to be within it.

packet: Avoid lock_sock in mmap handler

As the mmap handler gets called under mmap_sem, and we may grab
mmap_sem elsewhere under the socket lock to access user data, we
should avoid grabbing the socket lock in the mmap handler.

Since the only thing we care about in the mmap handler is for
pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
can achieve this by simply using sk_receive_queue.lock.

I resisted the temptation to create a new spin lock because the
mmap path isn't exactly common.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5f94db2..9454d4a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -77,6 +77,7 @@
 #include <linux/poll.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/mutex.h>
 
 #ifdef CONFIG_INET
 #include <net/inet_common.h>
@@ -175,6 +176,7 @@ struct packet_sock {
 #endif
 	struct packet_type	prot_hook;
 	spinlock_t		bind_lock;
+	struct mutex		pg_vec_lock;
 	unsigned int		running:1,	/* prot_hook is attached*/
 				auxdata:1,
 				origdev:1;
@@ -1069,6 +1071,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol)
 	 */
 
 	spin_lock_init(&po->bind_lock);
+	mutex_init(&po->pg_vec_lock);
 	po->prot_hook.func = packet_rcv;
 
 	if (sock->type == SOCK_PACKET)
@@ -1865,6 +1868,7 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req, int closing
 	synchronize_net();
 
 	err = -EBUSY;
+	mutex_lock(&po->pg_vec_lock);
 	if (closing || atomic_read(&po->mapped) == 0) {
 		err = 0;
 #define XC(a, b) ({ __typeof__ ((a)) __t; __t = (a); (a) = (b); __t; })
@@ -1886,6 +1890,7 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req, int closing
 		if (atomic_read(&po->mapped))
 			printk(KERN_DEBUG "packet_mmap: vma is busy: %d\n", atomic_read(&po->mapped));
 	}
+	mutex_unlock(&po->pg_vec_lock);
 
 	spin_lock(&po->bind_lock);
 	if (was_running && !po->running) {
@@ -1918,7 +1923,7 @@ static int packet_mmap(struct file *file, struct socket *sock, struct vm_area_st
 
 	size = vma->vm_end - vma->vm_start;
 
-	lock_sock(sk);
+	mutex_lock(&po->pg_vec_lock);
 	if (po->pg_vec == NULL)
 		goto out;
 	if (size != po->pg_vec_len*po->pg_vec_pages*PAGE_SIZE)
@@ -1941,7 +1946,7 @@ static int packet_mmap(struct file *file, struct socket *sock, struct vm_area_st
 	err = 0;
 
 out:
-	release_sock(sk);
+	mutex_unlock(&po->pg_vec_lock);
 	return err;
 }
 #endif

Cheers,
Comment 13 Jarek Poplawski 2009-01-30 05:56:56 UTC
On Fri, Jan 30, 2009 at 11:49:47PM +1100, Herbert Xu wrote:
...
> Dave pointed out that a spin lock is illegal for this purpose
> as vm_insert_page can do a GFP_KERNEL allocation.  So I've added
> a mutex for this.
...
> I resisted the temptation to create a new spin lock because the
> mmap path isn't exactly common.

So, Dave is stronger than the temptation...

Jarek P.
Comment 14 Martin Mokrejs 2009-01-30 09:18:30 UTC
This patch works for me.
Thanks
M.

Herbert Xu wrote:
> On Fri, Jan 30, 2009 at 05:12:50PM +1100, Herbert Xu wrote:
>> Well, doing the copy under sk_lock is pretty common through all
>> protocols.  So I think it'd be safer to change the other path,
>> which is doing the odd thing here, i.e., ->mmap() grabbing the
>> socket lock while holding mmap_sem.
>>
>> In fact, it would appear that we don't really need the socket lock
>> in ->mmap() since it only needs to ensure that pg_vec* doesn't
>> get yanked or changed.  So this patch should work:
>>
>> packet: Avoid lock_sock in mmap handler
> 
> Dave pointed out that a spin lock is illegal for this purpose
> as vm_insert_page can do a GFP_KERNEL allocation.  So I've added
> a mutex for this.
> 
> I've also widened the critical section in packet_set_ring since
> we need the mapped check to be within it.
> 
> packet: Avoid lock_sock in mmap handler
> 
> As the mmap handler gets called under mmap_sem, and we may grab
> mmap_sem elsewhere under the socket lock to access user data, we
> should avoid grabbing the socket lock in the mmap handler.
> 
> Since the only thing we care about in the mmap handler is for
> pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
> can achieve this by simply using sk_receive_queue.lock.
> 
> I resisted the temptation to create a new spin lock because the
> mmap path isn't exactly common.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 5f94db2..9454d4a 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -77,6 +77,7 @@
>  #include <linux/poll.h>
>  #include <linux/module.h>
>  #include <linux/init.h>
> +#include <linux/mutex.h>
>  
>  #ifdef CONFIG_INET
>  #include <net/inet_common.h>
> @@ -175,6 +176,7 @@ struct packet_sock {
>  #endif
>       struct packet_type      prot_hook;
>       spinlock_t              bind_lock;
> +     struct mutex            pg_vec_lock;
>       unsigned int            running:1,      /* prot_hook is attached*/
>                               auxdata:1,
>                               origdev:1;
> @@ -1069,6 +1071,7 @@ static int packet_create(struct net *net, struct socket
> *sock, int protocol)
>        */
>  
>       spin_lock_init(&po->bind_lock);
> +     mutex_init(&po->pg_vec_lock);
>       po->prot_hook.func = packet_rcv;
>  
>       if (sock->type == SOCK_PACKET)
> @@ -1865,6 +1868,7 @@ static int packet_set_ring(struct sock *sk, struct
> tpacket_req *req, int closing
>       synchronize_net();
>  
>       err = -EBUSY;
> +     mutex_lock(&po->pg_vec_lock);
>       if (closing || atomic_read(&po->mapped) == 0) {
>               err = 0;
>  #define XC(a, b) ({ __typeof__ ((a)) __t; __t = (a); (a) = (b); __t; })
> @@ -1886,6 +1890,7 @@ static int packet_set_ring(struct sock *sk, struct
> tpacket_req *req, int closing
>               if (atomic_read(&po->mapped))
>                       printk(KERN_DEBUG "packet_mmap: vma is busy: %d\n",
>  atomic_read(&po->mapped));
>       }
> +     mutex_unlock(&po->pg_vec_lock);
>  
>       spin_lock(&po->bind_lock);
>       if (was_running && !po->running) {
> @@ -1918,7 +1923,7 @@ static int packet_mmap(struct file *file, struct socket
> *sock, struct vm_area_st
>  
>       size = vma->vm_end - vma->vm_start;
>  
> -     lock_sock(sk);
> +     mutex_lock(&po->pg_vec_lock);
>       if (po->pg_vec == NULL)
>               goto out;
>       if (size != po->pg_vec_len*po->pg_vec_pages*PAGE_SIZE)
> @@ -1941,7 +1946,7 @@ static int packet_mmap(struct file *file, struct socket
> *sock, struct vm_area_st
>       err = 0;
>  
>  out:
> -     release_sock(sk);
> +     mutex_unlock(&po->pg_vec_lock);
>       return err;
>  }
>  #endif
Comment 15 Herbert Xu 2009-01-30 14:01:47 UTC
On Fri, Jan 30, 2009 at 01:56:17PM +0000, Jarek Poplawski wrote:
>
> > I resisted the temptation to create a new spin lock because the
> > mmap path isn't exactly common.
> 
> So, Dave is stronger than the temptation...

Heh, that sentence should be removed from the description.

Cheers,
Comment 16 David S. Miller 2009-01-30 14:05:59 UTC
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 31 Jan 2009 09:01:28 +1100

> On Fri, Jan 30, 2009 at 01:56:17PM +0000, Jarek Poplawski wrote:
> >
> > > I resisted the temptation to create a new spin lock because the
> > > mmap path isn't exactly common.
> > 
> > So, Dave is stronger than the temptation...
> 
> Heh, that sentence should be removed from the description.

I might leave it in there for laughs :-)
Comment 17 David S. Miller 2009-01-30 14:13:32 UTC
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 30 Jan 2009 23:49:47 +1100

> packet: Avoid lock_sock in mmap handler
> 
> As the mmap handler gets called under mmap_sem, and we may grab
> mmap_sem elsewhere under the socket lock to access user data, we
> should avoid grabbing the socket lock in the mmap handler.
> 
> Since the only thing we care about in the mmap handler is for
> pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
> can achieve this by simply using sk_receive_queue.lock.
> 
> I resisted the temptation to create a new spin lock because the
> mmap path isn't exactly common.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Looks good, applied.

I rewrote the commit message as follows so that it actually
matches what happens in the fix ;-)

--------------------
packet: Avoid lock_sock in mmap handler

As the mmap handler gets called under mmap_sem, and we may grab
mmap_sem elsewhere under the socket lock to access user data, we
should avoid grabbing the socket lock in the mmap handler.

Since the only thing we care about in the mmap handler is for
pg_vec* to be invariant, i.e., to exclude packet_set_ring, we
can achieve this by simply using a new mutex.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Comment 18 Vegard Nossum 2009-02-01 02:05:24 UTC
Fixed in mainline as of commit 905db44087855e3c1709f538ecdc22fd149cadd8

Note You need to log in before you can comment on or make changes to this bug.