From: Jesper Dangaard Brouer <brouer@redhat•com>
To: Hannes Frederic Sowa <hannes@stressinduktion•org>
Cc: brouer@redhat•com, Tom Herbert <tom@herbertland•com>,
Florian Westphal <fw@strlen•de>,
Linux Kernel Network Developers <netdev@vger•kernel.org>,
Alexander Duyck <alexander.duyck@gmail•com>,
John Fastabend <john.fastabend@gmail•com>,
linux-mm <linux-mm@kvack•org>
Subject: Re: Initial thoughts on TXDP
Date: Fri, 2 Dec 2016 14:01:02 +0100
Message-ID: <20161202140102.1d515e0b@redhat.com>
In-Reply-To: <859a0c99-f427-1db8-d260-1297777792fb@stressinduktion.org>
On Thu, 1 Dec 2016 23:47:44 +0100
Hannes Frederic Sowa <hannes@stressinduktion•org> wrote:
> Side note:
>
> On 01.12.2016 20:51, Tom Herbert wrote:
> >> > E.g. "mini-skb": Even if we assume that this provides a speedup
> >> > (where does that come from? should make no difference if a 32 or
> >> > 320 byte buffer gets allocated).
Yes, the size of the allocation from the SLUB allocator does not change
the base cost much (at least for small objects, i.e. < 1024 bytes).
Do note that the base SLUB alloc+free cost is fairly high compared to a
201-cycle budget, especially for networking, where the free side is very
likely to hit a slow path. The SLUB fast path costs 53 cycles and the
slow path around 100 cycles (data from [1]). I've tried to address this
with the kmem_cache bulk APIs, which reduce the cost to approx 30 cycles.
(Something we have not fully reaped the benefit from yet!)
[1] https://git.kernel.org/torvalds/c/ca257195511
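To put those numbers in proportion, here is a small sketch that expresses
each alloc+free cost as a share of the per-packet budget. The cycle counts
are the ones quoted above; the function names are hypothetical, and the
201-cycle budget is taken as given (I assume it is the usual 10 Gbit/s
minimum-frame figure at 3 GHz, which this mail does not spell out).

```c
#include <assert.h>
#include <stdio.h>

/* Cycle costs as quoted in the mail; the names are illustrative only. */
enum {
	CYCLE_BUDGET  = 201, /* per-packet budget mentioned above */
	SLUB_FASTPATH = 53,  /* alloc+free, fast path */
	SLUB_SLOWPATH = 100, /* alloc+free, slow path (approximate) */
	SLUB_BULK     = 30,  /* with kmem_cache bulk APIs (approximate) */
};

/* Share of the per-packet budget consumed by a cost, in percent. */
static double budget_share(int cost_cycles, int budget_cycles)
{
	assert(budget_cycles > 0);
	return 100.0 * cost_cycles / budget_cycles;
}

/* Print one line per allocation path. */
static void print_budget_shares(void)
{
	printf("fast path: %.1f%%\n", budget_share(SLUB_FASTPATH, CYCLE_BUDGET));
	printf("slow path: %.1f%%\n", budget_share(SLUB_SLOWPATH, CYCLE_BUDGET));
	printf("bulk API:  %.1f%%\n", budget_share(SLUB_BULK, CYCLE_BUDGET));
}
```

Calling print_budget_shares() shows that the slow path alone eats roughly
half of the budget, while the bulk figure is closer to one seventh.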
> >> >
> > It's the zero'ing of three cache lines. I believe we talked about that
> > as netdev.
Actually 4 cache-lines, but with some cleanup I believe we can get down
to clearing 192 bytes, i.e. 3 cache-lines.
>
> Jesper and me played with that again very recently:
>
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c#L590
>
> In micro-benchmarks we saw a pretty good speed up not using the rep
> stosb generated by gcc builtin but plain movq's. Probably the cost model
> for __builtin_memset in gcc is wrong?
Yes, I believe so.
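For readers who want to see the two strategies side by side, below is a
user-space sketch: one variant clears 192 bytes via memset(), the other via
explicit 8-byte stores. This is NOT the code from time_bench_memset.c (that
is kernel benchmark code); the function names are made up for illustration.
Whether gcc turns the memset() into rep stos depends on compiler version,
flags and target, so treat this only as a starting point for experiments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLEAR_BYTES 192                               /* 3 cache-lines of 64 bytes */
#define CLEAR_WORDS (CLEAR_BYTES / sizeof(uint64_t))  /* 24 words */

/*
 * Clear with explicit 8-byte stores. The volatile qualifier forces the
 * compiler to emit each store individually instead of folding the loop
 * back into a memset call.
 */
static void clear_u64_stores(uint64_t *buf, size_t nwords)
{
	volatile uint64_t *p = buf;
	size_t i;

	for (i = 0; i < nwords; i++)
		p[i] = 0;
}

/* Clear via the library/builtin memset. */
static void clear_memset(uint64_t *buf, size_t nwords)
{
	memset(buf, 0, nwords * sizeof(*buf));
}

/* Return 1 if both variants leave their buffer all-zero, else 0. */
static int clear_variants_agree(void)
{
	uint64_t a[CLEAR_WORDS], b[CLEAR_WORDS];
	size_t i;

	memset(a, 0xff, sizeof(a));	/* poison both buffers first */
	memset(b, 0xff, sizeof(b));
	clear_u64_stores(a, CLEAR_WORDS);
	clear_memset(b, CLEAR_WORDS);

	for (i = 0; i < CLEAR_WORDS; i++)
		if (a[i] != 0 || b[i] != 0)
			return 0;
	return 1;
}
```

The sketch only checks that both variants are functionally equivalent; it
does not measure anything. Timing them needs a proper harness (warm caches,
many iterations, cycle counter), which is left out here on purpose.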
> When Jesper is free we wanted to benchmark this and maybe come up with a
> arch specific way of cleaning if it turns out to really improve throughput.
>
> SIMD instructions seem even faster but the kernel_fpu_begin/end() kill
> all the benefits.
One strange thing was that on my Skylake CPU (i7-6700K @ 4.00GHz),
Hannes's hand-optimized MOVQ ASM code didn't go past 8 bytes per cycle,
i.e. 32 cycles for 256 bytes.
After talking to Alex and John during netdev, and reading up on the Intel
arch, I thought that this CPU should be able to do 16 bytes per cycle.
The CPU can do it, as rep stos shows once the size gets large enough.
On this CPU the memset rep stos starts to win around 512 bytes
(size in bytes / cost in cycles):
  192/35   =  5.5 bytes/cycle
  256/36   =  7.1 bytes/cycle
  512/40   = 12.8 bytes/cycle
  768/46   = 16.7 bytes/cycle
 1024/52   = 19.7 bytes/cycle
 2048/84   = 24.4 bytes/cycle
 4096/148  = 27.7 bytes/cycle
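The crossover can be recomputed from the measurements above with a few
lines of C. This sketch assumes the MOVQ variant stays at its observed
ceiling of 8 bytes/cycle for every size, which was only stated for the
256 byte case; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One rep stos measurement: bytes cleared and cycles spent. */
struct sample {
	unsigned int bytes;
	unsigned int cycles;
};

/* Measurements as listed in this mail (i7-6700K). */
static const struct sample rep_stos[] = {
	{  192,  35 }, {  256,  36 }, {  512,  40 }, { 768, 46 },
	{ 1024,  52 }, { 2048,  84 }, { 4096, 148 },
};

static double bytes_per_cycle(const struct sample *s)
{
	assert(s->cycles > 0);
	return (double)s->bytes / s->cycles;
}

/*
 * Smallest measured size at which rep stos exceeds the given throughput
 * ceiling (in bytes/cycle); returns 0 if no measured size does.
 */
static unsigned int rep_stos_crossover(double ceiling)
{
	size_t i;

	for (i = 0; i < sizeof(rep_stos) / sizeof(rep_stos[0]); i++)
		if (bytes_per_cycle(&rep_stos[i]) > ceiling)
			return rep_stos[i].bytes;
	return 0;
}
```

With a ceiling of 8.0 the function returns 512, matching the observation
above; with 16.0 it returns 768, the first size where rep stos goes past
the 16 bytes per cycle I expected from this CPU.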
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer