From: Thomas Petazzoni <thomas.petazzoni@free-electrons•com>
To: Willy Tarreau <w@1wt•eu>
Cc: netdev@vger•kernel.org,
Gregory CLEMENT <gregory.clement@free-electrons•com>,
Eric Dumazet <edumazet@google•com>
Subject: Re: [PATCH] net: convert mvneta to build_skb()
Date: Fri, 5 Jul 2013 09:50:38 +0200 [thread overview]
Message-ID: <20130705095038.018de378@skate> (raw)
In-Reply-To: <20130705074330.GB25188@1wt.eu>
Dear Willy Tarreau,
On Fri, 5 Jul 2013 09:43:30 +0200, Willy Tarreau wrote:
> > Thanks Willy. Sorry for asking such a stupid question, but I'm not very
> > familiar with how this mechanism works. Can you explain why a single
> > 'frag_size' field per port is sufficient? My concern is that this
> > frag_size seems to be a per-packet information, but we have potentially
> > multiple packets being received, and multiple RX queues. Is one single
> > 'frag_size' per network interface sufficient?
>
> I had the exact same question when Eric sent me an experimental patch to
> do this on mv64xxx_eth a few months ago. Then I checked how the frag_size
> is computed. As you can see below, it does not depend on a per-packet size
> but on a per-port size which is in fact the MTU ("pkt_size" is the misleading
> part here) :
>
> skb_size = SKB_DATA_ALIGN(pp->pkt_size + MVNETA_MH_SIZE + NET_SKB_PAD) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> So if skb_size depends solely on pp (struct mvneta_port), then it makes
> sense to have frag_size stored at the same place.
>
> In practice we don't really need to store the frag_size in the struct, we
> just need to know if the data were allocated using netdev_alloc_frag()
> or kmalloc() to know how to free them. So a single bit would be enough,
> and I thought about just doing a +1 on the pointer when we need to free
> using kmalloc(). But that would add unneeded extra work in the fast path
> to fix the pointers. And since we need to pass frag_size to build_skb()
> it was a reasonable solution in my opinion.
Aah, okay makes sense. So now, the question that comes up is why this
skb_size calculation is done in every call of mvneta_rx_refill() ? It
should be computed once, at the same time pkt_size is calculated, and
stored in the mvneta_port structure? Then you just need to test whether
it is smaller or larger than PAGE_SIZE to decide whether to use
netdev_alloc_frag() vs. kmalloc().
I.e, I would turn all the:
if (pp->frag_size)
...
else
...
into:
if (pp->skb_size <= PAGE_SIZE)
...
else
...
Of course, you can always hide this test behind a small macro or inline
function, to make it something like:
if (mvneta_uses_small_skbs(pp))
...
else
...
A better name than mvneta_uses_small_skbs() would of course be useful.
> > For example, in mvneta_rx_refill(), you store the skb_size in
> > pp->frag_size, and then you later re-use it in mvneta_rxq_drop_pkts.
> > What guarantees you that mvneta_rx_refill() hasn't be called in the
> > mean time for a different packet, in a different rxq, for the same
> > network interface, and the value of pp->frag_size has been overridden?
>
> It's not a problem since the refill() applies pp->pkt_size which doesn't
> change between calls. It's only changed in mvneta_change_mtu() which
> first stops the device. So I think it's safe.
Yeah, sure, now that I see that pp->frag_size is constant for a given
MTU configuration, it looks safe. But I find the current code that
retests and reassigns pp->frag_size at every call of rxq_refill() to be
very confusing.
Thanks!
Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
next prev parent reply other threads:[~2013-07-05 7:50 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-04 17:35 [PATCH] net: convert mvneta to build_skb() Willy Tarreau
2013-07-04 21:31 ` David Miller
2013-07-04 22:12 ` Willy Tarreau
2013-07-05 7:17 ` Thomas Petazzoni
2013-07-05 7:43 ` Willy Tarreau
2013-07-05 7:50 ` Thomas Petazzoni [this message]
2013-07-05 8:09 ` Willy Tarreau
2013-07-15 14:34 ` Thomas Petazzoni
2013-07-15 15:12 ` Willy Tarreau
2013-07-15 15:23 ` Thomas Petazzoni
2013-07-15 15:30 ` Willy Tarreau
2013-07-15 15:35 ` Thomas Petazzoni
2013-07-15 15:52 ` Florian Fainelli
2013-07-15 17:01 ` Willy Tarreau
2013-07-15 19:44 ` Thomas Petazzoni
2013-07-15 23:02 ` Florian Fainelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130705095038.018de378@skate \
--to=thomas.petazzoni@free-electrons$(echo .)com \
--cc=edumazet@google$(echo .)com \
--cc=gregory.clement@free-electrons$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=w@1wt$(echo .)eu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox