Re: [PATCH 2/6] C/R: Basic support for network namespaces and devices (v5)


From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public•gmane.org>
To: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public•gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public•gmane.org,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w@public•gmane.org,
	containers-qjLDD68F18O7TbgM5vRIOg@public•gmane.org,
	den-GEFAQzZX7r8dnm+yROfE0A@public•gmane.org,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q@public•gmane.org,
	benjamin.thery-6ktuUTfB/bM@public•gmane.org
Subject: Re: [PATCH 2/6] C/R: Basic support for network namespaces and devices (v5)
Date: Sat, 06 Mar 2010 17:21:40 -0500
Message-ID: <4B92D574.1090006@cs.columbia.edu>
In-Reply-To: <87bpf1idic.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>



Dan Smith wrote:
> OL> What about leak detection ?
> OL> Aren't we missing {netns,netdev}_users()?
> 
> This is something I need to give more thought to, but it's not as easy
> as it sounds.  Network devices aren't released at the last put() like
> a lot of other things, and my initial attempts to reconcile the
> refcount after a checkpoint operation have not been successful.
> 
> However, I'm not sure that it's as important here, because AFAIK, a
> network device can only exist in one network namespace at a time.  If
> we're checkpointing a netdev, it's because we are checkpointing the
> namespace that it lives in.  Making sure the netns isn't leaked out of
> the process tree would be much easier and just as effective, no?

We should guarantee that neither netns nor netdev leaks outside
the container; currently neither is guaranteed. If a netdev can
only belong to a single netns, then it suffices to care only
about netns.
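
To make the leak check concrete, here is a minimal user-space sketch of
the idea: an object has leaked if it holds more references system-wide
than the checkpoint found inside the container. The struct and function
names are hypothetical and are not taken from the patch; a real
netns_users() would have to supply the "users" count.

```c
#include <stddef.h>

/* Hypothetical record for one collected object (e.g. a netns). */
struct collected_obj {
	int users;	/* references the object holds system-wide */
	int seen;	/* references found while walking the container */
};

/* Return the index of the first leaked object, or -1 if none leaked. */
static int find_leak(const struct collected_obj *objs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		/* More users than we accounted for: someone outside holds it. */
		if (objs[i].users > objs[i].seen)
			return (int)i;
	}
	return -1;
}
```

With this model, checking only the netns is enough as long as every
netdev really lives in exactly one netns.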

> 
>>> +config CHECKPOINT_NETNS
>>> +       bool
>>> +       default y if NET && NET_NS && CHECKPOINT
>>> +
> 
> OL> Did you mean this to be visible (settable) by the user ?
> 
> No, it was specifically supposed to enable itself when those other
> items are enabled, but not be a user adjustable toggle.  I had a
> discussion with Serge about it and we came to this as a solution,
> although I don't remember what the problem we started with was.  I'll
> dig through my IRC logs to see if I can figure it out.

Duh.. my bad, I misinterpreted the code. That's fine.

BTW, there is a similar SYSVIPC_CHECKPOINT - we should decide
if we do X_CHECKPOINT or CHECKPOINT_X for a subsystem X, and
stick to that convention. I prefer the latter - what you did...

> 
>>> + retry:
>>> +	if (++pages > 4) {
>>> +		addrs = -E2BIG;
>>> +		goto out;
>>> +	}
> 
> OL> Why 4 ?
> 
> It's just a sanity limit.

Hmm... let me be more explicit: why not keep trying until the
realloc fails, or switch to vmalloc() at some point?
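
A minimal user-space sketch of the "keep trying until realloc fails"
alternative, for illustration only. It uses realloc() in place of the
kernel allocator, and fill_fn, fill_growing and fill_pattern are
hypothetical names that do not appear in the patch. The only hard limit
is allocation failure (or size_t overflow), not a fixed page count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical producer: writes at most cap bytes into buf and returns the
 * total number of bytes it needs (may exceed cap), or a negative errno.
 */
typedef long (*fill_fn)(void *buf, size_t cap, void *arg);

/*
 * Retry with a doubling buffer until the data fits or allocation fails.
 * On success returns the byte count and stores the buffer in *out (the
 * caller frees it). On failure returns a negative errno, *out stays NULL.
 */
static long fill_growing(fill_fn fill, void *arg, size_t start_cap, void **out)
{
	size_t cap = start_cap ? start_cap : 64;
	void *buf = NULL;

	assert(fill != NULL && out != NULL);
	*out = NULL;

	for (;;) {
		void *tmp = realloc(buf, cap);
		long need;

		if (!tmp) {
			free(buf);
			return -ENOMEM;
		}
		buf = tmp;

		need = fill(buf, cap, arg);
		if (need < 0) {
			free(buf);
			return need;
		}
		if ((size_t)need <= cap) {
			*out = buf;
			return need;
		}
		if (cap > (size_t)-1 / 2) {	/* doubling would overflow */
			free(buf);
			return -E2BIG;
		}
		cap *= 2;
	}
}

/* Example producer: wants *(size_t *)arg bytes, each set to 0xAB. */
static long fill_pattern(void *buf, size_t cap, void *arg)
{
	size_t want = *(size_t *)arg;
	size_t n = want < cap ? want : cap;

	memset(buf, 0xAB, n);
	return (long)want;
}
```

Calling fill_growing(fill_pattern, &want, 16, &buf) with want = 1000
grows the buffer from 16 bytes by doubling until the data fits.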

> 
> OL> Do we really need this special case ?  I'd be happy with a ckpt_err()
> OL> for any error - and the actual error number would be useful to tell
> OL> which case it was.
> 
> Unless I'm missing something, you asked for this specifically:
> 
> https://lists.linux-foundation.org/pipermail/containers/2010-February/022844.html

Lol .. that was me :o  But looking at the code it feels wrong,
because the errno already reveals the type of the problem.

I'm thinking - wouldn't it make sense to do error reporting
in checkpoint_netdev() if the call to ->ndo_checkpoint() fails ?

> 
> OL> Isn't this check redundant ?  I expect it to fail promptly in
> OL> checkpoint_netdev() above.
> 
> No, as I've said a couple of times previously, this isn't the only way
> we can arrive at a netdev for checkpointing.  This case is the one
> where we're marching through the netns and find a netdev that is not
> supported.  The other is where we arrive at a device as a peer of
> another device, so the other check may come into play at times where
> this one doesn't and vice versa.

I'm confused: in checkpoint_ns() inside the for_each_netdev()
loop you first test for dev->netdev_ops->ndo_checkpoint and
then call checkpoint_obj(... CKPT_OBJ_NETDEV) - which in turn
will call checkpoint_netdev(), which will again test for
dev->netdev_ops->ndo_checkpoint ...  am I reading it wrongly ?

> 
> OL> This may be a bit simpler if you move the first deferqueue_add()
> OL> forward to just before the other one. Or better: change dq_netdev
> OL> to have two pointers, dev and peer (if any is null, the cleanup
> OL> function will skip).
> 
> The reason it is this messy is because of the way network devices are
> deallocated.  Since they don't destroy themselves on the final put(),
> we have to explicitly call unregister_netdev() on them when we know
> they're no longer used (else we block).  Once we've added them to the
> deferqueue, we can no longer destroy them here because a reference is
> held and the deferqueue will run afterwards.
> 
> The ordering of this is a result of me injecting failures at each step
> and working it out until I got it to not block on unregistering either
> of the devices in all of the error paths.  That's not to say it's the
> best way, but there is a reason it's ordered the way it is.
> 

How about this - to me it feels simpler:

	dev = rtnl_newlink(veth_new_link_msg, &veth, this_name);
	if (IS_ERR(dev))
		return dev;

	peer = dev_get_by_name(current->nsproxy->net_ns, peer_name);
	if (!peer) {
		ret = -EINVAL;
		goto err_dev;
	}
	ret = ckpt_obj_insert(ctx, peer, h->veth.peer_ref,
			      CKPT_OBJ_NETDEV);
	if (ret < 0)
		goto err_peer;

	dev_put(peer);

	dq.dev = dev;
	dq.peer = peer;
	ret = deferqueue_add(ctx->deferqueue, &dq, sizeof(dq),
			     netdev_noop, netdev_cleanup);
	if (ret)
		goto err_peer;

(yes, you need to adjust struct dq_netdev and netdev_cleanup).
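
A rough user-space model of what the adjusted struct dq_netdev and
netdev_cleanup could look like, with a single entry carrying both
pointers and the cleanup skipping whichever is NULL. This is a sketch
under assumptions: fake_netdev and fake_unregister stand in for struct
net_device and unregister_netdev(), and the callback signature
(int (*)(void *)) is assumed rather than copied from the deferqueue code.

```c
#include <stddef.h>

/* Stand-in for struct net_device; only what the sketch needs. */
struct fake_netdev {
	const char *name;
	int unregistered;
};

/* Hypothetical two-pointer deferqueue entry. */
struct dq_netdev {
	struct fake_netdev *dev;
	struct fake_netdev *peer;
};

/* Stand-in for unregister_netdev(). */
static void fake_unregister(struct fake_netdev *d)
{
	d->unregistered = 1;
}

/* Cleanup callback: tear down whichever of dev/peer is set. */
static int netdev_cleanup(void *data)
{
	struct dq_netdev *dq = data;

	if (dq->dev)
		fake_unregister(dq->dev);
	if (dq->peer)
		fake_unregister(dq->peer);
	return 0;
}
```

With one entry per device pair, the error paths need only a single
deferqueue_add() call to reason about.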

BTW, the variable "didreg" should disappear from restore_veth().

Oren.
