From: David Miller <davem@davemloft•net>
To: rusty@rustcorp•com.au
Cc: kuznet@ms2•inr.ac.ru, johnpol@2ka•mipt.ru, netdev@vger•kernel.org
Subject: Re: Netchannles: first stage has been completed. Further ideas.
Date: Mon, 31 Jul 2006 21:47:29 -0700 (PDT)
Message-ID: <20060731.214729.35358981.davem@davemloft.net>
In-Reply-To: <1154066044.5159.130.camel@localhost.localdomain>
From: Rusty Russell <rusty@rustcorp•com.au>
Date: Fri, 28 Jul 2006 15:54:04 +1000
> (1) I am imagining some Grand Unified Flow Cache (Olsson trie?) that
> holds (some subset of?) flows. A successful lookup immediately after
> packet comes off NIC gives destiny for packet: what route, (optionally)
> what socket, what filtering, what connection tracking (& what NAT), etc?
> I don't know if this should be a general array of fn & data ptrs, or
> specialized fields for each one, or a mix. Maybe there's a "too hard,
> do slow path" bit, or maybe hard cases just never get put in the cache.
> Perhaps we need a separate one for locally-generated packets, a-la
> ip_route_output(). Anyway, we trade slightly more expensive flow setup
> for faster packet processing within flows.
So, specifically, one of the methods you are thinking about might
be implemented by adding:
	void (*input)(struct sk_buff *, void *);
	void *input_data;
to "struct flow_cache_entry" or whatever replaces it?
This way we don't need some kind of "type" information in
the flow cache entry, since the input handler knows the type.
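To illustrate the point about not needing a type field, here is a minimal userspace sketch. Only the two fields above come from the proposal; the stub sk_buff, demo_sock, demo_route, and flow_deliver() are made up for illustration and do not correspond to existing kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct sk_buff; only what the sketch needs. */
struct sk_buff {
	unsigned int len;
};

/* Hypothetical flow cache entry carrying the two proposed fields. */
struct flow_cache_entry {
	void (*input)(struct sk_buff *, void *);
	void *input_data;
};

/* Hypothetical per-socket state (illustration only). */
struct demo_sock {
	unsigned long rx_bytes;
};

/* Socket handler: it alone knows input_data is a struct demo_sock. */
static void demo_sock_input(struct sk_buff *skb, void *data)
{
	struct demo_sock *sk = data;

	sk->rx_bytes += skb->len;
}

/* Hypothetical per-route state (illustration only). */
struct demo_route {
	unsigned long forwarded;
};

/* Forwarding handler: it alone knows input_data is a struct demo_route. */
static void demo_route_input(struct sk_buff *skb, void *data)
{
	struct demo_route *rt = data;

	(void)skb;
	rt->forwarded++;
}

/* After a cache hit there is no type switch, just an indirect call. */
static void flow_deliver(const struct flow_cache_entry *fle,
			 struct sk_buff *skb)
{
	assert(fle->input != NULL);
	fle->input(skb, fle->input_data);
}
```

Two entries, one pointing at demo_sock_input and one at demo_route_input, go through the same flow_deliver() call, and each handler casts input_data back to its own type.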
> One way to do this is to add a "have_interest" callback into the
> hook_ops, which takes each about-to-be-inserted GUFC entry and adds any
> destinies this hook cares about. In the case of packet filtering this
> would do a traversal and append a fn/data ptr to the entry for each rule
> which could affect it.
Can you give a concrete example of how the GUFC might make use
of this? Just some small abstract code snippets will do.
> The other way is to have the hooks register what they are interested in
> into a general data structure which GUFC entry creation then looks up
> itself. This general data structure will need to support wildcards
> though.
My gut reaction is that imposing a global data structure on all object
classes is not prudent. When we take a GUFC miss, it seems better that we
call into the subsystems to resolve things. Each subsystem can then implement
whatever slow path lookup algorithm is most appropriate for its data.
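A rough userspace sketch of that miss path follows, assuming each subsystem registers one resolver callback. Every name here (gufc_register, gufc_resolve_miss, the toy route and filter resolvers, the flow_key layout) is hypothetical and only meant to show the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DESTINIES 4
#define MAX_SUBSYS    4

/* Hypothetical flow key; a real one would carry whatever the cache hashes on. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

/* One "destiny": a handler plus the private data it understands. */
struct destiny {
	void (*fn)(void *data);
	void *data;
};

struct gufc_entry {
	struct flow_key key;
	struct destiny  dest[MAX_DESTINIES];
	int             ndest;
};

/* Per-subsystem slow path. Returns 0 on success, -1 for "do not cache". */
typedef int (*gufc_resolve_t)(const struct flow_key *, struct gufc_entry *);

static gufc_resolve_t resolvers[MAX_SUBSYS];
static int nresolvers;

static void gufc_register(gufc_resolve_t fn)
{
	assert(nresolvers < MAX_SUBSYS);
	resolvers[nresolvers++] = fn;
}

static int gufc_add_destiny(struct gufc_entry *e, void (*fn)(void *),
			    void *data)
{
	if (e->ndest == MAX_DESTINIES)
		return -1;	/* too many destinies: leave flow on slow path */
	e->dest[e->ndest].fn = fn;
	e->dest[e->ndest].data = data;
	e->ndest++;
	return 0;
}

/* Miss path: no shared wildcard table, just ask each subsystem in turn. */
static int gufc_resolve_miss(const struct flow_key *key, struct gufc_entry *e)
{
	int i;

	e->key = *key;
	e->ndest = 0;
	for (i = 0; i < nresolvers; i++)
		if (resolvers[i](key, e) < 0)
			return -1;
	return 0;
}

/* Toy subsystems, each with its own private lookup logic. */
static unsigned long routed, filtered;

static void count_hit(void *data)
{
	++*(unsigned long *)data;
}

static int route_resolve(const struct flow_key *key, struct gufc_entry *e)
{
	(void)key;		/* every flow gets a route in this toy */
	return gufc_add_destiny(e, count_hit, &routed);
}

static int filter_resolve(const struct flow_key *key, struct gufc_entry *e)
{
	if (key->proto != 6 || key->dport != 80)
		return 0;	/* no rule can match: add nothing */
	return gufc_add_destiny(e, count_hit, &filtered);
}
```

The cache itself never learns how routing or filtering store their rules; it only collects whatever destinies the resolvers hand back.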
> We also need efficient ways of reflecting rule changes into the GUFC.
> We can be pretty slack with conntrack timeouts, but we either need to
> flush or handle callbacks from GUFC on timed-out entries. Packet
> filtering changes need to be synchronous, definitely.
This, I will remind you, is similar to the problem of doing RCU locking
of the TCP hash tables.
> (3) Smart NICs that do some flowid work themselves can accelerate lookup
> implicitly (same flow goes to same CPU/thread) or explicitly (each
> CPU/thread maintains only part of GUFC which it needs, or even NIC
> returns flow cookie which is pointer to GUFC entry or subtree?). AFAICT
> this will magnify the payoff from the GUFC.
I want to warn you about HW issues that I mentioned to Alexey the
other week. If we are not careful, we can run into the same issues
TOE cards run into, performance-wise.
Namely, it is important to be careful about how the GUFC table entries
get updated in the card. If you add them synchronously, your
connection rates will deteriorate dramatically.
I had the idea of a lazy scheme. When we create a GUFC entry, we
tack it onto a DMA'able linked list the card uses. We do not
notify the card, we just append the update to the list.
Then, if the card misses its on-chip GUFC table on an incoming
packet, it checks the DMA update list by reading it in from memory.
It updates its GUFC table with whatever entries are found on this
list, then it retries classifying the packet.
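In simulation form the lazy scheme might look roughly like the following. This is only a userspace model under simplifying assumptions: ordinary memory stands in for the DMA'able list, a bare flow id stands in for a full GUFC entry, and all names (host_add_flow, card_classify, and so on) are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_TABLE_SIZE 8

/* One queued update; flow_id is a stand-in for a full GUFC entry. */
struct gufc_update {
	uint32_t flow_id;
	struct gufc_update *next;
};

/* Host-side list the card would read via DMA; plain memory here. */
struct update_list {
	struct gufc_update *head, *tail;
};

struct card {
	uint32_t table[CARD_TABLE_SIZE];	/* on-chip flow table */
	int used;
	unsigned int list_reads;		/* times the card fetched the list */
};

/* Host: append to the list and return. The card is not notified. */
static void host_add_flow(struct update_list *l, struct gufc_update *u)
{
	assert(u != NULL);
	u->next = NULL;
	if (l->tail)
		l->tail->next = u;
	else
		l->head = u;
	l->tail = u;
}

static int card_table_lookup(const struct card *c, uint32_t flow_id)
{
	int i;

	for (i = 0; i < c->used; i++)
		if (c->table[i] == flow_id)
			return 1;
	return 0;
}

/* Card: pull queued updates into the on-chip table while there is room. */
static void card_drain_updates(struct card *c, struct update_list *l)
{
	c->list_reads++;
	while (l->head && c->used < CARD_TABLE_SIZE) {
		c->table[c->used++] = l->head->flow_id;
		l->head = l->head->next;
	}
	if (!l->head)
		l->tail = NULL;
}

/* Card: returns 1 on a hit, 0 if the packet must take the host slow path. */
static int card_classify(struct card *c, struct update_list *l,
			 uint32_t flow_id)
{
	if (card_table_lookup(c, flow_id))
		return 1;
	card_drain_updates(c, l);	/* miss: read the list, then retry */
	return card_table_lookup(c, flow_id);
}
```

The cost of reading the list is paid only on a miss, so connection setup on the host never waits for the card.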
This seems like a potentially good solution until we try to address GUFC
entry deletion, which unfortunately cannot be done in a lazy
fashion. It must be synchronous. This is because if, for example, we
just killed off a TCP socket we must make sure we don't hit the GUFC
entry for the TCP identity of that socket any longer.
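A small userspace model of the asymmetry, under the assumption that a delete has to purge both the not-yet-read update list and the card's own table before it returns. The names (gufc_hw, gufc_hw_del, flow_set) are hypothetical, and sync_cmds merely counts the operations a real host would have to wait on.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 8

/* Tiny unordered set of flow ids; stands in for both tables below. */
struct flow_set {
	uint32_t ids[TABLE_SIZE];
	int used;
};

static int set_find(const struct flow_set *s, uint32_t id)
{
	int i;

	for (i = 0; i < s->used; i++)
		if (s->ids[i] == id)
			return i;
	return -1;
}

static void set_add(struct flow_set *s, uint32_t id)
{
	assert(s->used < TABLE_SIZE);
	s->ids[s->used++] = id;
}

static void set_remove(struct flow_set *s, uint32_t id)
{
	int i = set_find(s, id);

	if (i >= 0)
		s->ids[i] = s->ids[--s->used];	/* move last entry into the hole */
}

struct gufc_hw {
	struct flow_set pending;	/* lazy add list in host memory */
	struct flow_set onchip;		/* the card's own table */
	unsigned int sync_cmds;		/* operations the host had to wait for */
};

/* Adds stay lazy: queue and return. */
static void gufc_hw_add(struct gufc_hw *hw, uint32_t id)
{
	set_add(&hw->pending, id);
}

/* Card side: on a miss, pull in pending adds and retry. */
static int gufc_hw_classify(struct gufc_hw *hw, uint32_t id)
{
	if (set_find(&hw->onchip, id) >= 0)
		return 1;
	while (hw->pending.used > 0 && hw->onchip.used < TABLE_SIZE)
		set_add(&hw->onchip, hw->pending.ids[--hw->pending.used]);
	return set_find(&hw->onchip, id) >= 0;
}

/*
 * Delete purges both places and only then returns. If only the pending
 * list were cleaned, a copy already on the card would keep matching
 * packets for the dead socket.
 */
static void gufc_hw_del(struct gufc_hw *hw, uint32_t id)
{
	set_remove(&hw->pending, id);
	set_remove(&hw->onchip, id);
	hw->sync_cmds++;
}
```

Every delete bumps sync_cmds while adds never do, which is where the connection teardown rate would take the hit.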
Just something to think about, when considering how to translate these
ideas into hardware.