public inbox for netdev@vger.kernel.org 
* [PATCH net-next] packet: Protect packet sk list with mutex (v2)
@ 2012-08-21 11:06 Pavel Emelyanov
From: Pavel Emelyanov @ 2012-08-21 11:06 UTC
  To: Eric Dumazet, David Miller, Linux Netdev List

Change since v1:

* Fixed inuse counters access spotted by Eric

In commit eea68e2f (packet: Report socket mclist info via diag module) I
introduced a "scheduling while atomic" problem in the packet diag module: the
socket list is traversed under rcu_read_lock(), while the sk mclist access
performed under it requires the rtnl lock (i.e. a mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af
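
To make the failure mode concrete, here is a toy userspace model in C of why the
old diag walk trips the check. It is not kernel code: toy_preempt_count,
toy_might_sleep() and the other toy_* names are hypothetical stand-ins for
preempt_count, might_sleep() and the real locks. It assumes a kernel where
rcu_read_lock() disables preemption (non-preemptible RCU); sleeping inside an
RCU read-side section is illegal either way.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Toy userspace model, NOT kernel code. toy_preempt_count stands in
 * for the kernel's preempt_count; all toy_* names are hypothetical.
 * Assumption: rcu_read_lock() disables preemption (non-preemptible
 * RCU), so it raises the count the way a spinlock would.
 */
static int toy_preempt_count;
static int toy_bugs;	/* number of "scheduling while atomic" hits */

static void toy_rcu_read_lock(void)   { toy_preempt_count++; }
static void toy_rcu_read_unlock(void) { toy_preempt_count--; }

/* Stand-in for might_sleep(): sleeping with a raised count is a bug. */
static void toy_might_sleep(const char *where)
{
	if (toy_preempt_count > 0) {
		toy_bugs++;
		printf("BUG: scheduling while atomic (%s)\n", where);
	}
}

/* A sleeping lock (mutex, rtnl) checks the context before it may
 * block; the blocking itself is not modelled here. */
static void toy_sleeping_lock(const char *name)
{
	toy_might_sleep(name);
}

/* Old shape: RCU-protected list walk, sleeping lock taken per socket. */
static void toy_dump_rcu(int nr_socks)
{
	toy_rcu_read_lock();
	for (int i = 0; i < nr_socks; i++)
		toy_sleeping_lock("rtnl_lock");	/* mclist access */
	toy_rcu_read_unlock();
}

/* New shape: the walk itself holds a sleeping lock (sklist_lock),
 * so nested sleeping locks are legal. */
static void toy_dump_mutex(int nr_socks)
{
	toy_sleeping_lock("sklist_lock");
	for (int i = 0; i < nr_socks; i++)
		toy_sleeping_lock("rtnl_lock");
}
```

Under that assumption every sleeping-lock acquisition inside the RCU section is
flagged, while the same per-socket work nested under a mutex is not.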

A similar problem was then reintroduced by further packet diag patches (the
fanout mutex and the pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from a spinlock to a mutex. This lock currently protects
modifications of two things:

* sklist
* prot inuse counters

The sklist modifications can simply be protected by the mutex instead, since
they already occur in a sleeping context. The inuse counter modifications are
trickier: sock_prot_inuse_add() uses __this_cpu_*() operations internally, so
the caller has to make the context safe itself. Since packet sockets' counters
are modified in only two places (packet_create and packet_release), neither of
which runs in BH context, we only need to protect against preemption. BH
disabling is not required in this case.
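
The split described above (list changes under the mutex, counter update outside
it) can be sketched in userspace C. This is an analog under stated assumptions,
not the af_packet code: toy_net, toy_create() and toy_release() are
hypothetical, a pthread mutex stands in for sklist_lock, and because userspace
has no preempt_disable(), a C11 atomic counter is swapped in for the per-CPU
inuse counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Userspace analog of this patch's locking split (hypothetical names):
 *  - the socket list is guarded by a sleeping lock (pthread mutex in
 *    place of net->packet.sklist_lock);
 *  - the inuse counter is updated outside that lock. The kernel uses a
 *    per-CPU counter with preemption disabled; an atomic is swapped in.
 */
struct toy_sock {
	struct toy_sock *next;
	int id;
};

struct toy_net {
	pthread_mutex_t sklist_lock;
	struct toy_sock *sklist;
	atomic_long inuse;
};

static void toy_net_init(struct toy_net *net)
{
	pthread_mutex_init(&net->sklist_lock, NULL);
	net->sklist = NULL;
	atomic_init(&net->inuse, 0);
}

/* Mirrors packet_create(): link under the mutex, then count. */
static void toy_create(struct toy_net *net, struct toy_sock *sk)
{
	pthread_mutex_lock(&net->sklist_lock);
	sk->next = net->sklist;
	net->sklist = sk;
	pthread_mutex_unlock(&net->sklist_lock);

	atomic_fetch_add(&net->inuse, 1);
}

/* Mirrors packet_release(): unlink under the mutex, then count. */
static void toy_release(struct toy_net *net, struct toy_sock *sk)
{
	pthread_mutex_lock(&net->sklist_lock);
	for (struct toy_sock **pp = &net->sklist; *pp; pp = &(*pp)->next) {
		if (*pp == sk) {
			*pp = sk->next;
			break;
		}
	}
	pthread_mutex_unlock(&net->sklist_lock);

	atomic_fetch_sub(&net->inuse, 1);
}
```

Updating the counter outside the list lock means the list length and the
counter can briefly disagree; that seems acceptable because the counter only
feeds usage statistics and is never used to walk the list.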

Signed-off-by: Pavel Emelyanov <xemul@parallels•com>

---

diff --git a/include/net/netns/packet.h b/include/net/netns/packet.h
index cb4e894..4780b08 100644
--- a/include/net/netns/packet.h
+++ b/include/net/netns/packet.h
@@ -8,7 +8,7 @@
 #include <linux/spinlock.h>
 
 struct netns_packet {
-	spinlock_t		sklist_lock;
+	struct mutex		sklist_lock;
 	struct hlist_head	sklist;
 };
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 226b2cd..79bc69c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2308,10 +2308,13 @@ static int packet_release(struct socket *sock)
 	net = sock_net(sk);
 	po = pkt_sk(sk);
 
-	spin_lock_bh(&net->packet.sklist_lock);
+	mutex_lock(&net->packet.sklist_lock);
 	sk_del_node_init_rcu(sk);
+	mutex_unlock(&net->packet.sklist_lock);
+
+	preempt_disable();
 	sock_prot_inuse_add(net, sk->sk_prot, -1);
-	spin_unlock_bh(&net->packet.sklist_lock);
+	preempt_enable();
 
 	spin_lock(&po->bind_lock);
 	unregister_prot_hook(sk, false);
@@ -2510,10 +2513,13 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 		register_prot_hook(sk);
 	}
 
-	spin_lock_bh(&net->packet.sklist_lock);
+	mutex_lock(&net->packet.sklist_lock);
 	sk_add_node_rcu(sk, &net->packet.sklist);
+	mutex_unlock(&net->packet.sklist_lock);
+
+	preempt_disable();
 	sock_prot_inuse_add(net, &packet_proto, 1);
-	spin_unlock_bh(&net->packet.sklist_lock);
+	preempt_enable();
 
 	return 0;
 out:
@@ -3766,7 +3772,7 @@ static const struct file_operations packet_seq_fops = {
 
 static int __net_init packet_net_init(struct net *net)
 {
-	spin_lock_init(&net->packet.sklist_lock);
+	mutex_init(&net->packet.sklist_lock);
 	INIT_HLIST_HEAD(&net->packet.sklist);
 
 	if (!proc_net_fops_create(net, "packet", 0, &packet_seq_fops))
diff --git a/net/packet/diag.c b/net/packet/diag.c
index bc33fbe..39bce0d 100644
--- a/net/packet/diag.c
+++ b/net/packet/diag.c
@@ -177,8 +177,8 @@ static int packet_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	net = sock_net(skb->sk);
 	req = nlmsg_data(cb->nlh);
 
-	rcu_read_lock();
-	sk_for_each_rcu(sk, node, &net->packet.sklist) {
+	mutex_lock(&net->packet.sklist_lock);
+	sk_for_each(sk, node, &net->packet.sklist) {
 		if (!net_eq(sock_net(sk), net))
 			continue;
 		if (num < s_num)
@@ -192,7 +192,7 @@ next:
 		num++;
 	}
 done:
-	rcu_read_unlock();
+	mutex_unlock(&net->packet.sklist_lock);
 	cb->args[0] = num;
 
 	return skb->len;
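
The diag.c hunk keeps the existing resumable-dump shape: entries before the
saved position (s_num, restored from cb->args[0]) are skipped, and the position
reached is stored back in cb->args[0] for the next call; only the protection of
the walk changes. A toy userspace model of that shape follows, with a pthread
mutex held across the whole walk. toy_list, toy_dump() and the fixed-capacity
out[] buffer (standing in for a full skb) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Toy model of the resumable walk in packet_diag_dump(): the whole walk
 * holds the list mutex (so per-entry work may sleep), entries before the
 * saved cursor are skipped, and the walk stops when the output buffer
 * is full. "cursor" plays the role of cb->args[0]. Names hypothetical.
 */
struct toy_node {
	struct toy_node *next;
	int id;
};

struct toy_list {
	pthread_mutex_t lock;
	struct toy_node *head;
};

/* Copies up to cap ids into out; returns how many were written. */
static size_t toy_dump(struct toy_list *l, long *cursor, int *out, size_t cap)
{
	long num = 0, s_num = *cursor;
	size_t n = 0;

	pthread_mutex_lock(&l->lock);
	for (struct toy_node *nd = l->head; nd; nd = nd->next) {
		if (num < s_num)
			goto next;	/* already sent in an earlier call */
		if (n == cap)
			goto done;	/* buffer full: resume here next time */
		out[n++] = nd->id;	/* per-entry work; may sleep under a mutex */
next:
		num++;
	}
done:
	pthread_mutex_unlock(&l->lock);
	*cursor = num;
	return n;
}
```

Because the mutex is held for the whole walk, the per-entry step may sleep,
which the mclist, fanout and ring reporting need; the cost is that socket
creation and release now wait while a dump is in progress.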

* Re: [PATCH net-next] packet: Protect packet sk list with mutex (v2)
@ 2012-08-22  6:59 ` Eric Dumazet
From: Eric Dumazet @ 2012-08-22  6:59 UTC
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List

On Tue, 2012-08-21 at 15:06 +0400, Pavel Emelyanov wrote:
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 226b2cd..79bc69c 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2308,10 +2308,13 @@ static int packet_release(struct socket *sock)
>  	net = sock_net(sk);
>  	po = pkt_sk(sk);
>  
> -	spin_lock_bh(&net->packet.sklist_lock);
> +	mutex_lock(&net->packet.sklist_lock);
>  	sk_del_node_init_rcu(sk);
> +	mutex_unlock(&net->packet.sklist_lock);

I am still a bit uncomfortable: are we allowed to sleep in a release()
handler?

It seems yes, so:

Acked-by: Eric Dumazet <edumazet@google•com>

* Re: [PATCH net-next] packet: Protect packet sk list with mutex (v2)
@ 2012-08-23  5:59   ` David Miller
From: David Miller @ 2012-08-23  5:59 UTC
  To: eric.dumazet; +Cc: xemul, netdev

From: Eric Dumazet <eric.dumazet@gmail•com>
Date: Wed, 22 Aug 2012 08:59:17 +0200

> On Tue, 2012-08-21 at 15:06 +0400, Pavel Emelyanov wrote:
 ...
> Acked-by: Eric Dumazet <edumazet@google•com>

Applied, thanks.
