Re: [Patch net-next] net: make neigh tables per netns


From: ebiederm@xmission•com (Eric W. Biederman)
To: Cong Wang <xiyou.wangcong@gmail•com>
Cc: David Miller <davem@davemloft•net>,
	Linux Kernel Network Developers <netdev@vger•kernel.org>,
	Patrick McHardy <kaber@trash•net>,
	Stephen Hemminger <stephen@networkplumber•org>,
	Cong Wang <cwang@twopensource•com>,
	Stefan Bader <stefan.bader@canonical•com>,
	stephane.graber@canonical•com, chris.j.arges@canonical•com,
	Serge Hallyn <serge.hallyn@canonical•com>
Subject: Re: [Patch net-next] net: make neigh tables per netns
Date: Fri, 27 Jun 2014 22:12:52 -0700
Message-ID: <87vbrl8vmz.fsf@x220.int.ebiederm.org>
In-Reply-To: <CAM_iQpX7iPR=32kog=QJm4G1tgRfg80Hr8Y=BOgUiGnym5EmKw@mail.gmail.com> (Cong Wang's message of "Fri, 27 Jun 2014 17:09:15 -0700")

Cong Wang <xiyou.wangcong@gmail•com> writes:

> On Thu, Jun 26, 2014 at 3:44 PM, David Miller <davem@davemloft•net> wrote:
>>
>> First of all it is clear that once you start creating containers on the
>> order of half the global neigh limit, yes you will run into problems as
>> it's easy to have 2 or more outputs in flight.
>>
>> So it would perhaps be wise to scale the limits (in some way) based
>> upon the number of namespaces, but still keep it a global limit.
>>
>> These entries consume a global resource (memory) and benefit from
>> global sharing, so I am still convinced that making the tables
>> themselves per-ns does not make any sense.
>>
>> Secondly, if there are things holding onto neighbour entries for real
>> we should find this out.  One could audit neigh_lookup*() invocations
>> to see where that might be happening.  Also neigh_create() calls with
>> 'want_ref' set to true.
>>
>
> Hmm, I did overlook the potential DoS problem. But hold on, don't
> IP fragments have the same problem? The fragment queues are per
> netns, and the threshold is per netns as well, so we will eventually
> run into memory pressure there too.

Interesting.  It does look like IP fragments are susceptible in the same way.
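
To make the concern concrete: if every netns can fill its own fragment
queues up to its own threshold, the worst case grows linearly with the
number of namespaces.  A rough sketch in Python, assuming the IPv4
threshold is the ipfrag_high_thresh sysctl as seen from the current
netns.  The helper names and the namespace count of 1000 are
hypothetical, picked only for illustration:

```python
from pathlib import Path

# Per-netns IPv4 reassembly memory threshold, in bytes (assumed location).
IPFRAG_HIGH_THRESH = Path("/proc/sys/net/ipv4/ipfrag_high_thresh")


def read_frag_thresh(path=IPFRAG_HIGH_THRESH):
    """Return the threshold visible in this netns, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def worst_case_frag_bytes(per_netns_thresh, netns_count):
    """Upper bound if every netns fills its queues to its own threshold."""
    return per_netns_thresh * netns_count


if __name__ == "__main__":
    thresh = read_frag_thresh()
    if thresh is not None:
        # 1000 namespaces is an arbitrary illustrative count.
        total = worst_case_frag_bytes(thresh, 1000)
        print(f"{thresh} bytes per netns, up to {total} bytes across 1000")
```

This is only an upper bound on what the per-netns thresholds permit, not
a measurement of what the fragment queues actually hold.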

Sorting out limits is something that is still quite rough in the
code today.

Limits serve two basic purposes.
- Basic sanity limits, so that a buggy application can be
  killed/stopped, hopefully before it takes down the entire machine.

  Think of the file descriptor limit.

- Machine hogging limits, to prevent one application from interfering
  with other applications.  This is what the kernel memory limit of
  the memory cgroup tries to implement.

These purposes aren't entirely distinct, so it is a bit of a challenge
to separate them.

Basic sanity limits are the easiest to comprehend, as the reasoning is
all local.  You just have to say that any application that uses more
than X amount of a resource is clearly buggy, and provide a
sysctl/rlimit knob to handle those rare applications that legitimately
need more than X.
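
The file descriptor limit shows what such a knob looks like from
userspace.  A minimal sketch in Python using the standard resource
module, which wraps getrlimit(2)/setrlimit(2).  The helper names are
hypothetical, and the sketch assumes an unprivileged process that may
only raise its soft limit up to the hard limit:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) file descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(wanted):
    """Raise the soft RLIMIT_NOFILE toward `wanted`; return the new soft limit.

    The soft limit is never lowered and never pushed past the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # clamp to the hard ceiling
    if wanted <= soft:
        return soft  # request does not exceed the current soft limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("soft limit now:", raise_soft_nofile(4096))
```

An application that legitimately needs more than X asks for it
explicitly, while a buggy one that leaks descriptors hits the soft
limit and fails locally instead of exhausting the machine.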

Machine hogging limits are very different, as they actually require
looking at how global state is used.  I would like to say that the
memory cgroup successfully tackles that problem, but last I looked it
had some nasty deadlock potential when dealing with kernel memory.

I wish I had a clear recipe I could point people at to get all of these
issues sorted correctly.  Unfortunately all I have is a little bit of
clarity as to what the problems actually are.

Eric


Thread overview: 14+ messages
2014-06-23 22:09 [Patch net-next] net: make neigh tables per netns Cong Wang
2014-06-25 23:33 ` David Miller
2014-06-26  0:04 ` Eric W. Biederman
2014-06-26  0:22   ` Cong Wang
2014-06-26  1:17     ` Eric W. Biederman
2014-06-26  6:14       ` Michal Kubecek
2014-06-26 12:10         ` Eric W. Biederman
2014-06-26 20:43       ` David Miller
     [not found]         ` <87egybibh5.fsf@x220.int.ebiederm.org>
2014-06-26 22:44           ` David Miller
2014-06-28  0:09             ` Cong Wang
2014-06-28  5:12               ` Eric W. Biederman [this message]
2014-06-30 18:15                 ` Jesper Dangaard Brouer
2014-06-30 18:54                   ` Hannes Frederic Sowa
2014-11-04 15:49                     ` Stéphane Graber