From: Evgeniy Polyakov <johnpol@2ka•mipt.ru>
To: David Miller <davem@davemloft•net>
Cc: rdreier@cisco•com, ak@suse•de, tom@opengridcomputing•com,
netdev@vger•kernel.org, akpm@osdl•org
Subject: Re: RDMA will be reverted
Date: Tue, 25 Jul 2006 09:51:28 +0400
Message-ID: <20060725055127.GA5103@2ka.mipt.ru>
In-Reply-To: <20060724.150613.54186472.davem@davemloft.net>
On Mon, Jul 24, 2006 at 03:06:13PM -0700, David Miller (davem@davemloft•net) wrote:
> Don't get too excited about VJ netchannels, more and more roadblocks
> to their practicality are being found every day.
>
> For example, my idea to allow ESTABLISHED TCP socket demux to be done
> before netfilter is flawed. Connection tracking and NAT can change
> the packet ID and loop it back to us to hit exactly an ESTABLISHED TCP
> socket, therefore we must always hit netfilter first.
There is no problem with netfilter and process-context processing. When an
skb is removed from the hardware list/array and is processed by netfilter
in a netchannel (or in process context in general), it is no problem if
the modified skb is then rerouted into a different queue and state.
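A minimal sketch of that rerouting, assuming a hypothetical per-socket
channel table keyed by the TCP 4-tuple. All names here (flow_key, channel,
channel_lookup, fake_dnat, requeue_after_netfilter) are illustrative
assumptions, not existing kernel or netchannel APIs. The point is only that
the demux decision is re-evaluated after netfilter has run, so a packet whose
tuple was rewritten by NAT ends up on the channel that matches its new tuple.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical flow key: the 4-tuple as seen by the demux. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Hypothetical netchannel: one queue per established socket. */
struct channel {
    struct flow_key key;
    int id;
};

static int key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Linear lookup for illustration only; a real demux would hash. */
static struct channel *channel_lookup(struct channel *tab, size_t n,
                                      const struct flow_key *k)
{
    for (size_t i = 0; i < n; i++)
        if (key_eq(&tab[i].key, k))
            return &tab[i];
    return NULL;
}

/* Hypothetical stand-in for a DNAT rule rewriting the destination. */
static void fake_dnat(struct flow_key *k, uint32_t new_daddr,
                      uint16_t new_dport)
{
    k->daddr = new_daddr;
    k->dport = new_dport;
}

/*
 * Runs in process context after the skb has left the hardware queue and
 * gone through netfilter. If the tuple is unchanged, keep the current
 * channel; otherwise look the packet up again and move it.
 */
static struct channel *requeue_after_netfilter(struct channel *tab, size_t n,
                                               struct channel *cur,
                                               const struct flow_key *k)
{
    if (cur && key_eq(&cur->key, k))
        return cur;
    return channel_lookup(tab, n, k);
}
```

Because the lookup is repeated only when netfilter actually changed the
tuple, the common unchanged case costs a single comparison.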
> All the original costs of route, netfilter, TCP socket lookup all
> reappear as we make VJ netchannels fit all the rules of real practical
> systems, eliminating their gains entirely. I will also note in
> passing that papers on related ideas, such as the Exokernel stuff, are
> very careful to not address the issue of how practical 1) their demux
> engine is and 2) the negative side effects of userspace TCP
> implementations. For an example of the latter, if you have some 1GB
> JAVA process you do not want to wake that monster up just to do some
> ACK processing or TCP window updates, yet if you don't you violate
> TCP's rules and risk spurious unnecessary retransmits.
I still plan to continue the userspace implementation.
If the gigantic-java-monster (tm) is going to read some data, it has
already been awakened, so it is in memory (with the TCP library linked
in), and there is zero extra overhead.
> Furthermore, the VJ netchannel gains can be partially obtained from
> generic stateless facilities that we are going to get anyways.
> Networking chips supporting multiple MSI-X vectors, chosen by hashing
> the flow ID, can move TCP processing to "end nodes" which are cpu
> threads in this case, by having each such MSI-X vector target a
> different cpu thread.
And what if that CPU is very busy?
Linux would need some way to tell the NIC which CPUs are valid targets
right now (a CPU that is fine now may not be a second later), so the
scheduler must be tightly bound to the network internals.
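A minimal sketch of what that coupling could look like, assuming an
RSS-style indirection table (hash bucket -> MSI-X vector -> CPU) that the
OS is allowed to rewrite. The hash below is FNV-1a, swapped in for
simplicity; real NICs typically use a Toeplitz hash. The table size, CPU
count and the cpu_mark_busy() hook are hypothetical, not an existing driver
or scheduler interface.

```c
#include <stdint.h>
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count */
#define INDIR_SIZE     8   /* hypothetical indirection table size */

/* FNV-1a over the 4-tuple; stands in for the NIC's flow hash. */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    uint32_t words[3] = { saddr, daddr, ((uint32_t)sport << 16) | dport };
    uint32_t h = 2166136261u;

    for (int w = 0; w < 3; w++)
        for (int b = 0; b < 4; b++) {
            h ^= (words[w] >> (8 * b)) & 0xff;
            h *= 16777619u;
        }
    return h;
}

/* Hash bucket -> CPU that services the corresponding MSI-X vector. */
static int indir[INDIR_SIZE];

static void indir_init(void)
{
    for (int i = 0; i < INDIR_SIZE; i++)
        indir[i] = i % NR_CPUS_SKETCH;
}

/* Stateless steering: the same flow always lands on the same CPU. */
static int steer(uint32_t hash)
{
    return indir[hash % INDIR_SIZE];
}

/*
 * Hypothetical scheduler hook: a CPU is currently too busy, so every
 * bucket pointing at it is retargeted to a fallback CPU.
 */
static void cpu_mark_busy(int busy, int fallback)
{
    for (int i = 0; i < INDIR_SIZE; i++)
        if (indir[i] == busy)
            indir[i] = fallback;
}
```

Note that retargeting a bucket while packets for it are still queued on the
old CPU can reorder a flow, which is one reason this cannot be a purely
NIC-side decision.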
Just my 2 cents.
--
Evgeniy Polyakov