public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@mellanox•com>
To: Jiri Pirko <jiri@resnulli•us>
Cc: Vlad Buslov <vladbu@mellanox•com>,
	"netdev@vger•kernel.org" <netdev@vger•kernel.org>,
	"pablo@netfilter•org" <pablo@netfilter•org>,
	"xiyou.wangcong@gmail•com" <xiyou.wangcong@gmail•com>,
	"jhs@mojatatu•com" <jhs@mojatatu•com>, mlxsw <mlxsw@mellanox•com>,
	Alex Kushnarov <alexanderk@mellanox•com>,
	"pabeni@redhat•com" <pabeni@redhat•com>
Subject: Re: tc tp creation performance degratation since kernel 5.1
Date: Thu, 13 Jun 2019 11:26:17 +0000	[thread overview]
Message-ID: <vbfa7el20ff.fsf@mellanox.com> (raw)
In-Reply-To: <20190613111135.GA2201@nanopsycho.orion>


On Thu 13 Jun 2019 at 14:11, Jiri Pirko <jiri@resnulli•us> wrote:
> Thu, Jun 13, 2019 at 12:09:32PM CEST, vladbu@mellanox•com wrote:
>>On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli•us> wrote:
>>> I made a mistake during measurements, sorry about that.
>>>
>>> This is the correct script:
>>> -----------------------------------------------------------------------
>>> #!/bin/bash
>>>
>>> dev=testdummy
>>> ip link add name $dev type dummy
>>> ip link set dev $dev up
>>> tc qdisc add dev $dev ingress
>>>
>>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>>> pref_id=1
>>>
>>> while [ $pref_id -lt 20000 ]
>>> do
>>>         echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
>>>         #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>>>         ((pref_id++))
>>> done
>>>
>>> start=$(date +"%s")
>>> tc -b $tmp_file_name
>>> stop=$(date +"%s")
>>> echo "Insertion duration: $(($stop - $start)) sec"
>>> rm -f $tmp_file_name
>>>
>>> ip link del dev $dev
>>> -----------------------------------------------------------------------
>>>
>>> Note the commented out matchall. I don't see the regression with
>>> matchall. However, I see that with flower:
>>> kernel 5.1
>>> Insertion duration: 4 sec
>>> kernel 5.2
>>> Insertion duration: 163 sec
>>>
>>> I don't see any significant difference in perf:
>>> kernel 5.1
>>>     77.24%  tc               [kernel.vmlinux]    [k] tcf_chain_tp_find
>>>      1.67%  tc               [kernel.vmlinux]    [k] mutex_spin_on_owner
>>>      1.44%  tc               [kernel.vmlinux]    [k] _raw_spin_unlock_irqrestore
>>>      0.93%  tc               [kernel.vmlinux]    [k] idr_get_free
>>>      0.79%  tc_pref_scale_o  [kernel.vmlinux]    [k] do_syscall_64
>>>      0.69%  tc               [kernel.vmlinux]    [k] finish_task_switch
>>>      0.53%  tc               libc-2.28.so        [.] __memset_sse2_unaligned_erms
>>>      0.49%  tc               [kernel.vmlinux]    [k] __memset
>>>      0.36%  tc_pref_scale_o  libc-2.28.so        [.] malloc
>>>      0.30%  tc_pref_scale_o  libc-2.28.so        [.] _int_free
>>>      0.24%  tc               [kernel.vmlinux]    [k] __memcpy
>>>      0.23%  tc               [cls_flower]        [k] fl_change
>>>      0.23%  tc               [kernel.vmlinux]    [k] __nla_validate_parse
>>>      0.22%  tc               [kernel.vmlinux]    [k] __slab_alloc
>>>
>>>
>>>     75.57%  tc               [kernel.kallsyms]  [k] tcf_chain_tp_find
>>>      2.70%  tc               [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>>      1.13%  tc_pref_scale_o  [kernel.kallsyms]  [k] do_syscall_64
>>>      0.87%  tc               libc-2.28.so       [.] __memset_sse2_unaligned_erms
>>>      0.86%  ip               [kernel.kallsyms]  [k] finish_task_switch
>>>      0.67%  tc               [kernel.kallsyms]  [k] memset
>>>      0.63%  tc               [kernel.kallsyms]  [k] mutex_spin_on_owner
>>>      0.52%  tc_pref_scale_o  libc-2.28.so       [.] malloc
>>>      0.48%  tc               [kernel.kallsyms]  [k] idr_get_free
>>>      0.46%  tc               [kernel.kallsyms]  [k] fl_change
>>>      0.42%  tc_pref_scale_o  libc-2.28.so       [.] _int_free
>>>      0.35%  tc_pref_scale_o  libc-2.28.so       [.] __GI___strlen_sse2
>>>      0.35%  tc_pref_scale_o  libc-2.28.so       [.] __mbrtowc
>>>      0.34%  tc_pref_scale_o  libc-2.28.so       [.] __fcntl64_nocancel_adjusted
>>>
>>> Any ideas?
>>
>>Thanks for providing reproduction script!
>>
>>I've investigate the problem and found the root cause. First of all I
>>noticed that CPU utilization during problematic tc run is quite low
>>(<10%), so I decided to investigate why tc sleeps so much. I've used bcc
>>and obtained following off-CPU trace (uninteresting traces are omitted
>>for brevity):
>>
>>~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
>>Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
>>...
>>    finish_task_switch
>>    __sched_text_start
>>    schedule
>>    schedule_timeout
>>    wait_for_completion
>>    __wait_rcu_gp
>>    synchronize_rcu
>>    fl_change
>>    tc_new_tfilter
>>    rtnetlink_rcv_msg
>>    netlink_rcv_skb
>>    netlink_unicast
>>    netlink_sendmsg
>>    sock_sendmsg
>>    ___sys_sendmsg
>>    __sys_sendmsg
>>    do_syscall_64
>>    entry_SYSCALL_64_after_hwframe
>>    -                tc (2069)
>>        142284953
>>
>>As you can see 142 seconds are spent sleeping in synchronize_rcu(). The
>>code is in fl_create_new_mask() function:
>>
>>	err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
>>				      &newmask->ht_node, mask_ht_params);
>>	if (err)
>>		goto errout_destroy;
>>
>>	/* Wait until any potential concurrent users of mask are finished */
>>	synchronize_rcu();
>>
>>The justification for this is described in comment in
>>fl_check_assign_mask() (user of fl_create_new_mask()):
>>
>>	/* Insert mask as temporary node to prevent concurrent creation of mask
>>	 * with same key. Any concurrent lookups with same key will return
>>	 * -EAGAIN because mask's refcnt is zero. It is safe to insert
>>	 * stack-allocated 'mask' to masks hash table because we call
>>	 * synchronize_rcu() before returning from this function (either in case
>>	 * of error or after replacing it with heap-allocated mask in
>>	 * fl_create_new_mask()).
>>	 */
>>	fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
>>						       &mask->ht_node,
>>						       mask_ht_params);
>>
>>The offending commit is part of my series that implements unlocked
>>flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
>>insertion")
>>
>>The justification presented in it is no longer relevant since Ivan
>>Vecera changed mask to be dynamically allocated in commit 2cddd2014782
>>("net/sched: cls_flower: allocate mask dynamically in fl_change()").
>>With this we can just change fl_change() to deallocate temporary mask
>>with rcu grace period and remove offending syncrhonize_rcu() call.
>>
>>Any other suggestions?
>
> So basically you just change synchronize_rcu() to kfree_rcu(mask),
>    correct?

Not really. I remove synchronize_rcu() and change all kfree(mask) in
fl_change() to tcf_queue_work(&mask->rwork, fl_mask_free_work) which
uses queue_rcu_work() internally. This would allow us to deallocate
fl_flow_mask in same manner on all code paths and doesn't require any
extensions in fl_flow_mask struct (kfree_rcu would require extending it
with rcu_head).

  reply	other threads:[~2019-06-13 15:29 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-12 12:03 tc tp creation performance degratation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
2019-06-13  4:50   ` Jiri Pirko
2019-06-12 12:34 ` Vlad Buslov
2019-06-13  5:49   ` Jiri Pirko
2019-06-13  8:11 ` Jiri Pirko
2019-06-13 10:09   ` Vlad Buslov
2019-06-13 11:11     ` Jiri Pirko
2019-06-13 11:26       ` Vlad Buslov [this message]
2019-06-13 14:18         ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vbfa7el20ff.fsf@mellanox.com \
    --to=vladbu@mellanox$(echo .)com \
    --cc=alexanderk@mellanox$(echo .)com \
    --cc=jhs@mojatatu$(echo .)com \
    --cc=jiri@resnulli$(echo .)us \
    --cc=mlxsw@mellanox$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=pabeni@redhat$(echo .)com \
    --cc=pablo@netfilter$(echo .)org \
    --cc=xiyou.wangcong@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox