public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail•com>
To: Eric Dumazet <eric.dumazet@gmail•com>
Cc: xiyou.wangcong@gmail•com, davem@davemloft•net, jhs@mojatatu•com,
	netdev@vger•kernel.org, paulmck@linux•vnet.ibm.com,
	brouer@redhat•com
Subject: Re: [net-next PATCH v3 09/15] net: sched: make cls_u32 lockless
Date: Tue, 09 Sep 2014 09:35:23 -0700	[thread overview]
Message-ID: <540F2C4B.6060404@gmail.com> (raw)
In-Reply-To: <1410268807.11872.177.camel@edumazet-glaptop2.roam.corp.google.com>

On 09/09/2014 06:20 AM, Eric Dumazet wrote:
> On Mon, 2014-09-08 at 22:58 -0700, John Fastabend wrote:
>> Make cls_u32 classifier safe to run without holding lock. This patch
>> converts statistics that are kept in read section u32_classify into
>> per cpu counters.
>>
>> This patch was tested with a tight u32 filter add/delete loop while
>> generating traffic with pktgen. By running pktgen on vlan devices
>> created on top of a physical device we can hit the qdisc layer
>> correctly. For ingress qdisc's a loopback cable was used.
>>
>> for i in {1..100}; do
>>          q=`echo $i%8|bc`;
>>          echo -n "u32 tos: iteration $i on queue $q";
>>          tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
>>                    action skbedit queue_mapping $q;
>>          sleep 1;
>>          tc filter del dev p3p2 prio $i;
>>
>>          echo -n "u32 tos hash table: iteration $i on queue $q";
>>          tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
>>          tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
>>                  match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
>>          tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
>>                  ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
>>          sleep 2;
>>          tc filter del dev p3p2 prio $i
>>          sleep 1;
>> done
>>
>
>
> Note it might be easier to split this patch in 2 parts.
>
> (The percpu stuff could be done in a first step, then rcu conversion)

Sure, I'll split it up.

>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel•com>
>> ---

[...]

>>
>>   static inline unsigned int u32_hash_fold(__be32 key,
>> @@ -96,7 +103,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
>>   		unsigned int	  off;
>>   	} stack[TC_U32_MAXDEPTH];
>>
>> -	struct tc_u_hnode *ht = tp->root;
>> +	struct tc_u_hnode *ht = rcu_dereference_bh(tp->root);
>>   	unsigned int off = skb_network_offset(skb);
>>   	struct tc_u_knode *n;
>>   	int sdepth = 0;
>> @@ -108,23 +115,23 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
>>   	int i, r;
>>
>>   next_ht:
>> -	n = ht->ht[sel];
>> +	n = rcu_dereference_bh(ht->ht[sel]);
>>
>>   next_knode:
>>   	if (n) {
>>   		struct tc_u32_key *key = n->sel.keys;
>>
>>   #ifdef CONFIG_CLS_U32_PERF
>> -		n->pf->rcnt += 1;
>> +		this_cpu_inc(n->pf->rcnt);
>
> As we run in BH, we are not preemptable, we can use instead :
> __this_cpu_inc()  (on all occurrences)
>
> Using a macro would also reduce the #ifdef mess of this file
>

I'll go ahead and do the macro conversion.

>>   		j = 0;
>>   #endif
>>
>>   #ifdef CONFIG_CLS_U32_MARK
>> -		if ((skb->mark & n->mark.mask) != n->mark.val) {
>> -			n = n->next;
>> +		if ((skb->mark & n->mask) != n->val) {
>> +			n = rcu_dereference_bh(n->next);
>>   			goto next_knode;
>>   		} else {
>> -			n->mark.success++;
>> +			this_cpu_inc(*n->pcpu_success);
>>   		}
>>   #endif
>>

[...]


>> @@ -326,11 +342,11 @@ static int u32_init(struct tcf_proto *tp)
>>   	}
>>
>>   	tp_c->refcnt++;
>> -	root_ht->next = tp_c->hlist;
>> -	tp_c->hlist = root_ht;
>> +	rcu_assign_pointer(root_ht->next, tp_c->hlist);
>
> 	RCU_INIT_POINTER() or root_ht->next = tp_c->hlist;
>

I'll fix all these as well.

>> +	rcu_assign_pointer(tp_c->hlist, root_ht);
>>   	root_ht->tp_c = tp_c;
>>
>> -	tp->root = root_ht;
>> +	rcu_assign_pointer(tp->root, root_ht);
>>   	tp->data = tp_c;
>>   	return 0;
>>   }
>> @@ -342,25 +358,33 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n)
>>   	if (n->ht_down)
>>   		n->ht_down->refcnt--;
>>   #ifdef CONFIG_CLS_U32_PERF
>> -	kfree(n->pf);
>> +	free_percpu(n->pf);
>>   #endif
>>   	kfree(n);
>>   	return 0;
>>   }
>>
>> +void __u32_delete_key(struct rcu_head *rcu)
>
> Can we consistently use _rcu, as in u32_delete_key_rcu() ?

Yes, I'll rename these calls.

Thanks,
John

-- 
John Fastabend         Intel Corporation

  reply	other threads:[~2014-09-09 16:35 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-09  5:53 [net-next PATCH v3 00/15] net/sched use rcu filters John Fastabend
2014-09-09  5:54 ` [net-next PATCH v3 01/15] net: qdisc: use rcu prefix and silence sparse warnings John Fastabend
2014-09-09 12:05   ` Eric Dumazet
2014-09-09  5:54 ` [net-next PATCH v3 02/15] net: rcu-ify tcf_proto John Fastabend
2014-09-09 12:20   ` Eric Dumazet
2014-09-09 15:15     ` John Fastabend
2014-09-09 15:20       ` Eric Dumazet
2014-09-09  5:55 ` [net-next PATCH v3 03/15] net: sched: cls_basic use RCU John Fastabend
2014-09-09 12:29   ` Eric Dumazet
2014-09-09  5:55 ` [net-next PATCH v3 04/15] net: sched: cls_cgroup " John Fastabend
2014-09-09 12:34   ` Eric Dumazet
2014-09-09  5:56 ` [net-next PATCH v3 05/15] net: sched: cls_flow " John Fastabend
2014-09-09 12:41   ` Eric Dumazet
2014-09-09 16:09     ` John Fastabend
2014-09-09 16:20       ` Eric Dumazet
2014-09-09  5:56 ` [net-next PATCH v3 06/15] net: sched: fw " John Fastabend
2014-09-09 12:48   ` Eric Dumazet
2014-09-09  5:57 ` [net-next PATCH v3 07/15] net: sched: RCU cls_route John Fastabend
2014-09-09 12:59   ` Eric Dumazet
2014-09-09 16:24     ` John Fastabend
2014-09-09  5:57 ` [net-next PATCH v3 08/15] net: sched: RCU cls_tcindex John Fastabend
2014-09-09  5:58 ` [net-next PATCH v3 09/15] net: sched: make cls_u32 lockless John Fastabend
2014-09-09 13:20   ` Eric Dumazet
2014-09-09 16:35     ` John Fastabend [this message]
2014-09-09 18:02   ` Cong Wang
2014-09-10  4:53     ` John Fastabend
2014-09-09  5:58 ` [net-next PATCH v3 10/15] net: sched: rcu'ify cls_rsvp John Fastabend
2014-09-09 13:27   ` Eric Dumazet
2014-09-09 16:46     ` John Fastabend
2014-09-09  5:59 ` [net-next PATCH v3 11/15] net: sched: rcu'ify cls_bpf John Fastabend
2014-09-09  5:59 ` [net-next PATCH v3 12/15] net: sched: make tc_action safe to walk under RCU John Fastabend
2014-09-09  6:00 ` [net-next PATCH v3 13/15] net: sched: make bstats per cpu and estimator RCU safe John Fastabend
2014-09-09  6:00 ` [net-next PATCH v3 14/15] net: sched: make qstats per cpu John Fastabend
2014-09-09  6:01 ` [net-next PATCH v3 15/15] net: sched: drop ingress qdisc lock John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=540F2C4B.6060404@gmail.com \
    --to=john.fastabend@gmail$(echo .)com \
    --cc=brouer@redhat$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=eric.dumazet@gmail$(echo .)com \
    --cc=jhs@mojatatu$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=paulmck@linux$(echo .)vnet.ibm.com \
    --cc=xiyou.wangcong@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox