From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public•gmane.org>
To: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public•gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public•gmane.org,
ebiederm-aS9lmoZGLiVWk0Htik3J/w@public•gmane.org,
containers-qjLDD68F18O7TbgM5vRIOg@public•gmane.org,
den-GEFAQzZX7r8dnm+yROfE0A@public•gmane.org,
davem-fT/PcQaiUtIeIZ0/mPfg9Q@public•gmane.org,
benjamin.thery-6ktuUTfB/bM@public•gmane.org
Subject: Re: [PATCH 2/6] C/R: Basic support for network namespaces and devices (v5)
Date: Sat, 06 Mar 2010 17:21:40 -0500 [thread overview]
Message-ID: <4B92D574.1090006@cs.columbia.edu> (raw)
In-Reply-To: <87bpf1idic.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
Dan Smith wrote:
> OL> What about leak detection ?
> OL> Aren't we missing {netns,netdev}_users()?
>
> This is something I need to give more thought to, but it's not as easy
> as it sounds. Network devices aren't released at the last put() like
> a lot of other things, and my initial attempts to reconcile the
> refcount after a checkpoint operation have not been successful.
>
> However, I'm not sure that it's as important here, because AFAIK, a
> network device can only exist in one network namespace at a time. If
> we're checkpointing a netdev, it's because we are checkpointing the
> namespace that it lives in. Making sure the netns isn't leaked out of
> the process tree would be much easier and just as effective, no?
We should guarantee that neither netns nor netdev leaks outside
the container; currently none is. If a netdev can only belong to
a single netns, then it suffices to only care about netns.
>
>>> +config CHECKPOINT_NETNS
>>> + bool
>>> + default y if NET && NET_NS && CHECKPOINT
>>> +
>
> OL> Did you mean this to be visible (settable) by the user ?
>
> No, it was specifically supposed to enable itself when those other
> items are enabled, but not be a user adjustable toggle. I had a
> discussion with Serge about it and we came to this as a solution,
> although I don't remember what the problem we started with was. I'll
> dig through my IRC logs to see if I can figure it out.
Duh.. my bad, I misinterpreted the code. That's fine.
BTW, there is a similar SYSVIPC_CHECKPOINT - we should decide
if we do X_CHECKPOINT or CHECKPOINT_X for a subsystem X, and
stick to that convention. I prefer the latter - what you did...
>
>>> + retry:
>>> + if (++pages > 4) {
>>> + addrs = -E2BIG;
>>> + goto out;
>>> + }
>
> OL> Why 4 ?
>
> It's just a sanity limit.
Hmm... let me be more explicit: why not keep trying until it
realloc fails ? or switch to vmalloc() at some point ?
>
> OL> Do we really need this special case ? I'd be happy with a ckpt_err()
> OL> for any error - and the actual error number would be useful to tell
> OL> which case it was.
>
> Unless I'm missing something, you asked for this specifically:
>
> https://lists.linux-foundation.org/pipermail/containers/2010-February/022844.html
Lol .. that was me :o But looking at the code it feels wrong,
because the errno already reveals the type of the problem.
I'm thinking - wouldn't it make sense to do error reporting
in checkpoint_netdev() if the call to ->ndo_checkpoint() fails ?
>
> OL> Isn't this check redundant ? I expect it to fail promptly in
> OL> checkpoint_netdev() above.
>
> No, as I've said a couple of times previously, this isn't the only way
> we can arrive at a netdev for checkpointing. This case is the one
> where we're marching through the netns and find a netdev that is not
> supported. The other is where we arrive at a device as a peer of
> another device, so the other check may come into play at times where
> this one doesn't and vice versa.
I'm confused: in checkpoint_ns() inside the for_each_netdev()
loop you first test for dev->netdev_ops->ndo_checkpoint and
then call checkpoint_obj(... CKPT_OBJ_NETDEV) - which in turn
will call checkpoint_netdev(), which will again test for
dev->netdev_ops->ndo_checkpoint ... am I reading it wrongly ?
>
> OL> This may be a bit simpler if you move the first deferqueue_add()
> OL> forward to just before the other one. Or better: change dq_netdev
> OL> to have two pointers, dev and peer (if any is null, the cleanup
> OL> function will skip).
>
> The reason it is this messy is because of the way network devices are
> deallocated. Since they don't destroy themselves on the final put(),
> we have to explicitly call unregister_netdev() on them when we know
> they're no longer used (else we block). Once we've added them to the
> deferqueue, we can no longer destroy them here because a reference is
> held and the deferqueue will run afterwards.
>
> The ordering of this is a result of me injecting failures at each step
> and working it out until I got it to not block on unregistering either
> of the devices in all of the error paths. That's not to say it's the
> best way, but there is a reason it's ordered the way it is.
>
How about this - to me it feels simpler:
dev = rtnl_newlink(veth_new_link_msg, &veth, this_name);
if (IS_ERR(dev))
return dev;
peer = dev_get_by_name(current->nsproxy->net_ns, peer_name);
if (!peer) {
ret = -EINVAL;
goto err_dev;
}
ret = ckpt_obj_insert(ctx, peer, h->veth.peer_ref,
CKPT_OBJ_NETDEV);
if (ret < 0)
goto err_peer;
dev_put(peer);
dq.dev = dev;
dq.peer = peer;
ret = deferqueue_add(ctx->deferqueue, &dq, sizeof(dq),
netdev_noop, netdev_cleanup);
if (ret)
goto err_peer;
(yes, you need to adjust struct dq_netdev and netdev_cleanup).
BTW, the variable "didreg" should disappear from restore_veth().
Oren.
next prev parent reply other threads:[~2010-03-06 22:21 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-25 20:43 C/R: Checkpoint and restore network namespaces and devices Dan Smith
2010-02-25 20:43 ` [PATCH 1/6] C/R: Add checkpoint and collect hooks to net_device_ops Dan Smith
2010-02-26 12:08 ` David Miller
2010-02-25 20:43 ` [PATCH 2/6] C/R: Basic support for network namespaces and devices (v5) Dan Smith
2010-02-26 12:08 ` David Miller
2010-02-26 14:56 ` Dan Smith
2010-03-06 3:55 ` Oren Laadan
[not found] ` <4B91D234.2020003-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-06 17:09 ` Dan Smith
2010-03-06 3:53 ` Oren Laadan
[not found] ` <4B91D1A3.9030404-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-03-06 17:08 ` Dan Smith
[not found] ` <87bpf1idic.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2010-03-06 22:21 ` Oren Laadan [this message]
2010-03-08 17:36 ` Dan Smith
2010-03-08 17:53 ` Eric W. Biederman
2010-03-08 18:07 ` Dan Smith
2010-03-08 18:36 ` Oren Laadan
2010-02-25 20:43 ` [PATCH 3/6] C/R: Add checkpoint support for veth devices (v2) Dan Smith
2010-02-26 12:09 ` David Miller
2010-02-25 20:43 ` [PATCH 4/6] C/R: Add loopback checkpoint support (v2) Dan Smith
2010-02-26 12:09 ` David Miller
2010-02-25 20:43 ` [PATCH 5/6] C/R: Add a checkpoint handler to the 'sit' device Dan Smith
2010-02-26 12:09 ` David Miller
2010-02-25 20:43 ` [PATCH 6/6] C/R: Add checkpoint support to macvlan driver Dan Smith
2010-02-26 12:09 ` David Miller
2010-03-15 2:49 ` C/R: Checkpoint and restore network namespaces and devices Oren Laadan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B92D574.1090006@cs.columbia.edu \
--to=orenl-eqauephvms7envbuuze7ea@public$(echo .)gmane.org \
--cc=benjamin.thery-6ktuUTfB/bM@public$(echo .)gmane.org \
--cc=containers-qjLDD68F18O7TbgM5vRIOg@public$(echo .)gmane.org \
--cc=danms-r/Jw6+rmf7HQT0dZR+AlfA@public$(echo .)gmane.org \
--cc=davem-fT/PcQaiUtIeIZ0/mPfg9Q@public$(echo .)gmane.org \
--cc=den-GEFAQzZX7r8dnm+yROfE0A@public$(echo .)gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public$(echo .)gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public$(echo .)gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox