From: Jiri Pirko <jiri@resnulli•us>
To: Vlad Buslov <vladbu@mellanox•com>
Cc: "netdev@vger•kernel.org" <netdev@vger•kernel.org>,
"pablo@netfilter•org" <pablo@netfilter•org>,
"xiyou.wangcong@gmail•com" <xiyou.wangcong@gmail•com>,
"jhs@mojatatu•com" <jhs@mojatatu•com>, mlxsw <mlxsw@mellanox•com>,
Alex Kushnarov <alexanderk@mellanox•com>,
"pabeni@redhat•com" <pabeni@redhat•com>
Subject: Re: tc tp creation performance degratation since kernel 5.1
Date: Thu, 13 Jun 2019 13:11:35 +0200 [thread overview]
Message-ID: <20190613111135.GA2201@nanopsycho.orion> (raw)
In-Reply-To: <vbfblz123vt.fsf@mellanox.com>
Thu, Jun 13, 2019 at 12:09:32PM CEST, vladbu@mellanox•com wrote:
>On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli•us> wrote:
>> I made a mistake during measurements, sorry about that.
>>
>> This is the correct script:
>> -----------------------------------------------------------------------
>> #!/bin/bash
>>
>> dev=testdummy
>> ip link add name $dev type dummy
>> ip link set dev $dev up
>> tc qdisc add dev $dev ingress
>>
>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>> pref_id=1
>>
>> while [ $pref_id -lt 20000 ]
>> do
>> echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
>> #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>> ((pref_id++))
>> done
>>
>> start=$(date +"%s")
>> tc -b $tmp_file_name
>> stop=$(date +"%s")
>> echo "Insertion duration: $(($stop - $start)) sec"
>> rm -f $tmp_file_name
>>
>> ip link del dev $dev
>> -----------------------------------------------------------------------
>>
>> Note the commented out matchall. I don't see the regression with
>> matchall. However, I see that with flower:
>> kernel 5.1
>> Insertion duration: 4 sec
>> kernel 5.2
>> Insertion duration: 163 sec
>>
>> I don't see any significant difference in perf:
>> kernel 5.1
>> 77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
>> 1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
>> 1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
>> 0.93% tc [kernel.vmlinux] [k] idr_get_free
>> 0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
>> 0.69% tc [kernel.vmlinux] [k] finish_task_switch
>> 0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>> 0.49% tc [kernel.vmlinux] [k] __memset
>> 0.36% tc_pref_scale_o libc-2.28.so [.] malloc
>> 0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
>> 0.24% tc [kernel.vmlinux] [k] __memcpy
>> 0.23% tc [cls_flower] [k] fl_change
>> 0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
>> 0.22% tc [kernel.vmlinux] [k] __slab_alloc
>>
>>
>> 75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
>> 2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>> 1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
>> 0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
>> 0.86% ip [kernel.kallsyms] [k] finish_task_switch
>> 0.67% tc [kernel.kallsyms] [k] memset
>> 0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
>> 0.52% tc_pref_scale_o libc-2.28.so [.] malloc
>> 0.48% tc [kernel.kallsyms] [k] idr_get_free
>> 0.46% tc [kernel.kallsyms] [k] fl_change
>> 0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
>> 0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
>> 0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
>> 0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
>>
>> Any ideas?
>
>Thanks for providing reproduction script!
>
>I've investigate the problem and found the root cause. First of all I
>noticed that CPU utilization during problematic tc run is quite low
>(<10%), so I decided to investigate why tc sleeps so much. I've used bcc
>and obtained following off-CPU trace (uninteresting traces are omitted
>for brevity):
>
>~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
>Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
>...
> finish_task_switch
> __sched_text_start
> schedule
> schedule_timeout
> wait_for_completion
> __wait_rcu_gp
> synchronize_rcu
> fl_change
> tc_new_tfilter
> rtnetlink_rcv_msg
> netlink_rcv_skb
> netlink_unicast
> netlink_sendmsg
> sock_sendmsg
> ___sys_sendmsg
> __sys_sendmsg
> do_syscall_64
> entry_SYSCALL_64_after_hwframe
> - tc (2069)
> 142284953
>
>As you can see 142 seconds are spent sleeping in synchronize_rcu(). The
>code is in fl_create_new_mask() function:
>
> err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
> &newmask->ht_node, mask_ht_params);
> if (err)
> goto errout_destroy;
>
> /* Wait until any potential concurrent users of mask are finished */
> synchronize_rcu();
>
>The justification for this is described in comment in
>fl_check_assign_mask() (user of fl_create_new_mask()):
>
> /* Insert mask as temporary node to prevent concurrent creation of mask
> * with same key. Any concurrent lookups with same key will return
> * -EAGAIN because mask's refcnt is zero. It is safe to insert
> * stack-allocated 'mask' to masks hash table because we call
> * synchronize_rcu() before returning from this function (either in case
> * of error or after replacing it with heap-allocated mask in
> * fl_create_new_mask()).
> */
> fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
> &mask->ht_node,
> mask_ht_params);
>
>The offending commit is part of my series that implements unlocked
>flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
>insertion")
>
>The justification presented in it is no longer relevant since Ivan
>Vecera changed mask to be dynamically allocated in commit 2cddd2014782
>("net/sched: cls_flower: allocate mask dynamically in fl_change()").
>With this we can just change fl_change() to deallocate temporary mask
>with rcu grace period and remove offending syncrhonize_rcu() call.
>
>Any other suggestions?
So basically you just change synchronize_rcu() to kfree_rcu(mask),
correct?
next prev parent reply other threads:[~2019-06-13 15:30 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-12 12:03 tc tp creation performance degratation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
2019-06-13 4:50 ` Jiri Pirko
2019-06-12 12:34 ` Vlad Buslov
2019-06-13 5:49 ` Jiri Pirko
2019-06-13 8:11 ` Jiri Pirko
2019-06-13 10:09 ` Vlad Buslov
2019-06-13 11:11 ` Jiri Pirko [this message]
2019-06-13 11:26 ` Vlad Buslov
2019-06-13 14:18 ` Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190613111135.GA2201@nanopsycho.orion \
--to=jiri@resnulli$(echo .)us \
--cc=alexanderk@mellanox$(echo .)com \
--cc=jhs@mojatatu$(echo .)com \
--cc=mlxsw@mellanox$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=pabeni@redhat$(echo .)com \
--cc=pablo@netfilter$(echo .)org \
--cc=vladbu@mellanox$(echo .)com \
--cc=xiyou.wangcong@gmail$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox