From: Andrew <nitr0@seti•kr.ua>
To: Michael Ma <make0818@gmail•com>,
Jesper Dangaard Brouer <brouer@redhat•com>
Cc: netdev@vger•kernel.org
Subject: Re: qdisc spin lock
Date: Sat, 16 Apr 2016 11:52:05 +0300
Message-ID: <5711FD35.90108@seti.kr.ua>
In-Reply-To: <CAAmHdhwpVOCv=4Y+pb9PfGKWV0ooqnr7eC58ZYfRTtYjC35EFw@mail.gmail.com>
I don't think that is a good solution - unless you can bind a specific
host (src/dst) to a specific txq. Traffic is usually spread across txqs by
a hash of src+dst IP (or even IP:port), so one host's traffic ends up on
all of the device's MQ queues and the bandwidth limit is wrong (N*bandwidth
under multi-session load such as p2p/server traffic)...
People say that the hfsc shaper has better performance, but I haven't
tested it.
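
To illustrate the N*bandwidth effect, here is a minimal Python sketch. It
is a toy model, not kernel code: zlib.crc32 is only a stand-in for the
kernel's flow hash, the names pick_txq and host_ceiling are hypothetical,
and the addresses, ports and rates are made up. It assumes every txq
carries its own shaper configured with the intended per-host rate.

```python
import zlib


def pick_txq(src, dst, sport, dport, num_txqs):
    """Map a flow 4-tuple to a tx queue (crc32 stands in for the real flow hash)."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return zlib.crc32(key) % num_txqs


def host_ceiling(flows, num_txqs, per_queue_rate):
    """Upper bound on one host's aggregate rate when each txq is shaped separately."""
    queues_hit = {pick_txq(*flow, num_txqs) for flow in flows}
    return len(queues_hit) * per_queue_rate


NUM_TXQS = 8
RATE_MBIT = 100  # intended per-host limit, applied independently on every txq

# One host with a single session: the flow lands on exactly one queue.
single = [("10.0.0.5", "192.0.2.1", 40000, 80)]

# The same host with 64 sessions (p2p-like): flows scatter over the queues.
many = [("10.0.0.5", f"192.0.2.{i}", 40000 + i, 6881) for i in range(1, 65)]

print("single session ceiling:", host_ceiling(single, NUM_TXQS, RATE_MBIT), "Mbit")
print("multi-session ceiling: ", host_ceiling(many, NUM_TXQS, RATE_MBIT), "Mbit")
```

With a single session the ceiling equals the configured rate. With many
sessions the host typically hits several queues, so its ceiling grows
towards NUM_TXQS * RATE_MBIT unless the host is pinned to one txq.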
On 01.04.2016 02:41, Michael Ma wrote:
> Thanks for the suggestion - I'll try the MQ solution out. It seems to
> be able to solve the problem well with the assumption that bandwidth
> can be statically partitioned.
>
> 2016-03-31 12:18 GMT-07:00 Jesper Dangaard Brouer <brouer@redhat•com>:
>> On Wed, 30 Mar 2016 00:20:03 -0700 Michael Ma <make0818@gmail•com> wrote:
>>
>>> I know this might be an old topic so bear with me – what we are facing
>>> is that applications are sending small packets using hundreds of
>>> threads so the contention on spin lock in __dev_xmit_skb increases the
>>> latency of dev_queue_xmit significantly. We’re building a network QoS
>>> solution to avoid interference of different applications using HTB.
>> Yes, as you have noticed with HTB there is a single qdisc lock, and
>> congestion obviously happens :-)
>>
>> It is possible with different tricks to make it scale. I believe
>> Google is using a variant of HTB, and it scales for them. They have
>> not open-sourced their modifications to HTB (which likely also involves
>> a great deal of setup tricks).
>>
>> If your purpose is to limit traffic/bandwidth per "cloud" instance,
>> then you can just use another TC setup structure. Like using MQ and
>> assigning a HTB per MQ queue (where the MQ queues are bound to each
>> CPU/HW queue)... But you have to figure out this setup yourself...
>>
>>
>>> But in this case when some applications send massive small packets in
>>> parallel, the application to be protected will get its throughput
>>> affected (because it’s doing synchronous network communication using
>>> multiple threads and throughput is sensitive to the increased latency)
>>>
>>> Here is the profiling from perf:
>>>
>>> - 67.57% iperf [kernel.kallsyms] [k] _spin_lock
>>> - 99.94% dev_queue_xmit
>>> - 96.91% _spin_lock
>>> - 2.62% __qdisc_run
>>> - 98.98% sch_direct_xmit
>>> - 99.98% _spin_lock
>>>
>>> As far as I understand the design of TC is to simplify locking schema
>>> and minimize the work in __qdisc_run so that throughput won’t be
>>> affected, especially with large packets. However if the scenario is
>>> that multiple classes in the queueing discipline only have the shaping
>>> limit, there isn’t really a necessary correlation between different
>>> classes. The only synchronization point should be when the packet is
>>> dequeued from the qdisc queue and enqueued to the transmit queue of
>>> the device. My question is – is it worth investing in avoiding the
>>> locking contention by partitioning the queue/lock so that this
>>> scenario is addressed with relatively smaller latency?
>> Yes, there is a lot to gain, but it is not easy ;-)
>>
>>> I must have oversimplified a lot of details since I’m not familiar
>>> with the TC implementation at this point – just want to get your input
>>> in terms of whether this is a worthwhile effort or there is something
>>> fundamental that I’m not aware of. If this is just a matter of quite
>>> some additional work, would also appreciate helping to outline the
>>> required work here.
>>>
>>> Also would appreciate if there is any information about the latest
>>> status of this work http://www.ijcset.com/docs/IJCSET13-04-04-113.pdf
>> This article seems to be very low quality... spelling errors, only 5
>> pages, no real code, etc.
>>
>> --
>> Best regards,
>> Jesper Dangaard Brouer
>> MSc.CS, Principal Kernel Engineer at Red Hat
>> Author of http://www.iptv-analyzer.org
>> LinkedIn: http://www.linkedin.com/in/brouer