From: Glauber Costa <glommer@parallels•com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp•fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail•com>,
Greg Thelen <gthelen@google•com>, <linux-kernel@vger•kernel.org>,
<paul@paulmenage•org>, <lizf@cn•fujitsu.com>,
<ebiederm@xmission•com>, <davem@davemloft•net>,
<netdev@vger•kernel.org>, <linux-mm@kvack•org>,
<kirill@shutemov•name>
Subject: Re: [PATCH v3 2/7] socket: initial cgroup code.
Date: Tue, 27 Sep 2011 17:43:29 -0300
Message-ID: <4E823571.6060001@parallels.com>
In-Reply-To: <20110926195213.12da87b4.kamezawa.hiroyu@jp.fujitsu.com>
On 09/26/2011 07:52 AM, KAMEZAWA Hiroyuki wrote:
> On Sat, 24 Sep 2011 11:45:04 -0300
> Glauber Costa<glommer@parallels•com> wrote:
>
>> On 09/22/2011 12:09 PM, Balbir Singh wrote:
>>> On Thu, Sep 22, 2011 at 11:30 AM, Greg Thelen<gthelen@google•com> wrote:
>>>> On Wed, Sep 21, 2011 at 11:59 AM, Glauber Costa<glommer@parallels•com> wrote:
>>>>> Right now I am working under the assumption that tasks are long-lived inside
>>>>> the cgroup. Migration potentially introduces some nasty locking problems in
>>>>> the mem_schedule path.
>>>>>
>>>>> Also, unless I am missing something, the memcg already has the policy of
>>>>> not carrying charges around, probably because of this very same complexity.
>>>>>
>>>>> True, at least it won't EBUSY you... But I think this is at least a way
>>>>> to guarantee that the cgroup won't disappear from under our noses in the
>>>>> middle of our allocations.
>>>>
>>>> Here's the memcg user page behavior using the same pattern:
>>>>
>>>> 1. user page P is allocated by task T in memcg M1
>>>> 2. T is moved to memcg M2. P's charge is left behind, still charged
>>>> to M1, if memory.move_charge_at_immigrate=0; or the charge is moved to
>>>> M2 if memory.move_charge_at_immigrate=1.
>>>> 3. rmdir M1 will try to reclaim P (if P was left in M1). If unable to
>>>> reclaim, then P is recharged to parent(M1).
>>>>
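To make the three steps above concrete, here is a minimal user-space sketch in C. It is a hypothetical, flat, single-page model rather than memcg code: `struct memcg_model`, `model_alloc_page()`, `model_move_task()` and `model_rmdir()` are illustrative names, usage is not propagated up the hierarchy, and the move_charge_at_immigrate flag is read from the destination group, which is where the real memory.move_charge_at_immigrate file is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat model: usage is NOT propagated to ancestors. */
struct memcg_model {
	struct memcg_model *parent;
	long usage;                    /* bytes charged to this group */
	bool move_charge_at_immigrate; /* stands in for the control file */
};

struct page_model {
	struct memcg_model *charged_to; /* group paying for the page, or NULL */
	long size;
};

struct task_model {
	struct memcg_model *cg;
};

/* Transfer a page's charge from its current group to another one. */
static void model_move_charge(struct page_model *p, struct memcg_model *to)
{
	assert(p->charged_to->usage >= p->size);
	p->charged_to->usage -= p->size;
	to->usage += p->size;
	p->charged_to = to;
}

/* Step 1: the page is charged to the allocating task's group. */
static void model_alloc_page(struct task_model *t, struct page_model *p,
			     long size)
{
	p->size = size;
	p->charged_to = t->cg;
	t->cg->usage += size;
}

/* Step 2: the charge follows the task only if the destination group
 * has move_charge_at_immigrate set; otherwise it stays behind. */
static void model_move_task(struct task_model *t, struct page_model *p,
			    struct memcg_model *to)
{
	if (to->move_charge_at_immigrate && p->charged_to == t->cg)
		model_move_charge(p, to);
	t->cg = to;
}

/* Step 3: rmdir reclaims the page if it can, otherwise recharges it to
 * the parent.  Returns true when the group ends up with no usage. */
static bool model_rmdir(struct memcg_model *cg, struct page_model *p,
			bool reclaimable)
{
	if (p->charged_to == cg) {
		if (reclaimable) {
			cg->usage -= p->size;
			p->charged_to = NULL;
		} else if (cg->parent != NULL) {
			model_move_charge(p, cg->parent);
		} else {
			return false;
		}
	}
	return cg->usage == 0;
}
```

Step 3 is what keeps rmdir from failing for user pages: a charge that cannot be reclaimed is pushed to the parent instead of pinning M1.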
>>>
>>> We also have some magic in page_referenced() to remove pages
>>> referenced from different containers. What we do is try not to
>>> penalize a cgroup if another cgroup is referencing this page and the
>>> page under consideration is being reclaimed from the cgroup that
>>> touched it.
>>>
>>> Balbir Singh
>> Do you see this as a showstopper for merging this series, or can
>> we just leave it as a TODO?
>>
>
> In my experience, "I can't rmdir the cgroup" is always an important and
> difficult problem. Users cannot tell where the accounting is leaking beyond
> what kmem.usage_in_bytes or memory.usage_in_bytes show, and cannot fix the
> issue.
>
> Please add EXPERIMENTAL to Kconfig until this is fixed.
>
>> I can push a proposal for it, but it would be done in a separate patch
>> anyway. Also, we may be in a better position to fix this once the slab
>> part is merged, since it will likely have the same problems...
>>
>
> Yes. Considering that sockets can be shared between tasks (and thus
> between cgroups), you'll eventually need
> - an owner task for each socket
> - a callback that moves the accounting
>
> Or disallow moving a task once its sockets have been accounted.
>
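The first option in the list above (an owner task per socket plus an accounting-move callback) could look roughly like the following sketch. It is hypothetical user-space C, not a proposed kernel interface: `struct sock_model`, `charge_hierarchy()`, `sock_charge()` and `task_move_callback()` are illustrative names, and a charge is applied to a group and all of its ancestors. It deliberately ignores concurrency; doing this transfer safely against allocations in the socket memory scheduling path is exactly the locking problem discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a charge is applied to a group and all ancestors. */
struct memcg_model {
	struct memcg_model *parent;
	long tcp_allocated;
};

struct task_model {
	struct memcg_model *cg;
};

struct sock_model {
	struct task_model *owner; /* the "owner task of socket" */
	long charged;             /* bytes currently charged for this socket */
};

static void charge_hierarchy(struct memcg_model *cg, long amt)
{
	for (; cg != NULL; cg = cg->parent) {
		cg->tcp_allocated += amt;
		assert(cg->tcp_allocated >= 0); /* never uncharge below zero */
	}
}

/* Charge a socket's allocation to its owner's current hierarchy. */
static void sock_charge(struct sock_model *sk, long amt)
{
	charge_hierarchy(sk->owner->cg, amt);
	sk->charged += amt;
}

/* The "account moving callback": when task t migrates, move the charges
 * of every socket it owns; sockets owned by other tasks stay put.
 * No locking is modeled here. */
static void task_move_callback(struct task_model *t, struct memcg_model *to,
			       struct sock_model *socks, size_t nsocks)
{
	for (size_t i = 0; i < nsocks; i++) {
		struct sock_model *sk = &socks[i];

		if (sk->owner != t)
			continue;
		charge_hierarchy(t->cg, -sk->charged);
		charge_hierarchy(to, sk->charged);
	}
	t->cg = to;
}
```

A socket shared by two tasks is charged only through its owner here, which is why the owner is needed: without it, there is no single answer to which group a moved socket should follow.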
So, I tried to come up with proper task charge moving here, and the
locking easily gets quite complicated. (But I have the feeling I am
overlooking something...) So I think I'll really need more time for that.
What do you think of the following patch, plus EXPERIMENTAL in Kconfig?
[-- Attachment #2: foo.patch --]
[-- Type: text/plain, Size: 3232 bytes --]
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f784cb7..684c090 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -257,6 +257,7 @@ struct mem_cgroup;
 struct tcp_memcontrol {
 	/* per-cgroup tcp memory pressure knobs */
 	int tcp_max_memory;
+	atomic_t refcnt;
 	atomic_long_t tcp_memory_allocated;
 	struct percpu_counter tcp_sockets_allocated;
 	/* those two are read-mostly, leave them at the end */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6937f20..b594a9a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -361,34 +361,21 @@ static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 
 void sock_update_memcg(struct sock *sk)
 {
-	/* right now a socket spends its whole life in the same cgroup */
-	BUG_ON(sk->sk_cgrp);
-
 	rcu_read_lock();
 	sk->sk_cgrp = mem_cgroup_from_task(current);
-
-	/*
-	 * We don't need to protect against anything task-related, because
-	 * we are basically stuck with the sock pointer that won't change,
-	 * even if the task that originated the socket changes cgroups.
-	 *
-	 * What we do have to guarantee, is that the chain leading us to
-	 * the top level won't change under our noses. Incrementing the
-	 * reference count via cgroup_exclude_rmdir guarantees that.
-	 */
-	cgroup_exclude_rmdir(mem_cgroup_css(sk->sk_cgrp));
 	rcu_read_unlock();
 }
 
 void sock_release_memcg(struct sock *sk)
 {
-	cgroup_release_and_wakeup_rmdir(mem_cgroup_css(sk->sk_cgrp));
 }
 
 void memcg_sock_mem_alloc(struct mem_cgroup *mem, struct proto *prot,
 			  int amt, int *parent_failure)
 {
+	atomic_inc(&mem->tcp.refcnt);
 	mem = parent_mem_cgroup(mem);
+
 	for (; mem != NULL; mem = parent_mem_cgroup(mem)) {
 		long alloc;
 		long *prot_mem = prot->prot_mem(mem);
@@ -406,9 +393,12 @@ EXPORT_SYMBOL(memcg_sock_mem_alloc);
 
 void memcg_sock_mem_free(struct mem_cgroup *mem, struct proto *prot, int amt)
 {
-	mem = parent_mem_cgroup(mem);
-	for (; mem != NULL; mem = parent_mem_cgroup(mem))
-		atomic_long_sub(amt, prot->memory_allocated(mem));
+	struct mem_cgroup *parent;
+	parent = parent_mem_cgroup(mem);
+	for (; parent != NULL; parent = parent_mem_cgroup(parent))
+		atomic_long_sub(amt, prot->memory_allocated(parent));
+
+	atomic_dec(&mem->tcp.refcnt);
 }
 EXPORT_SYMBOL(memcg_sock_mem_free);
 
@@ -541,6 +531,7 @@ int tcp_init_cgroup(struct proto *prot, struct cgroup *cgrp,
 
 	cg->tcp.tcp_memory_pressure = 0;
 	atomic_long_set(&cg->tcp.tcp_memory_allocated, 0);
+	atomic_set(&cg->tcp.refcnt, 0);
 	percpu_counter_init(&cg->tcp.tcp_sockets_allocated, 0);
 
 	limit = nr_free_buffer_pages() / 8;
@@ -5787,6 +5778,9 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 	int ret = 0;
 	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
+	if (atomic_read(&mem->tcp.refcnt))
+		return -EBUSY;
+
 	if (mem->move_charge_at_immigrate) {
 		struct mm_struct *mm;
 		struct mem_cgroup *from = mem_cgroup_from_task(p);
@@ -5957,6 +5951,11 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 				 struct cgroup *cgroup,
 				 struct task_struct *p)
 {
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
+
+	if (atomic_read(&mem->tcp.refcnt))
+		return -EBUSY;
+
 	return 0;
 }
 static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
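The refcount idea in the patch can be modeled in user space with C11 atomics. In this hypothetical sketch, `struct tcp_memcontrol_model` and the `model_*` functions are illustrative stand-ins, not kernel APIs: every allocation takes a reference, every free drops one, and the attach check refuses the move by returning a negative errno (-EBUSY here), as cgroup can_attach callbacks conventionally do, while any reference is outstanding.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct tcp_memcontrol; only the new
 * reference count is modeled. */
struct tcp_memcontrol_model {
	atomic_int refcnt;
};

/* Like the atomic_set() added to tcp_init_cgroup(). */
static void model_tcp_init(struct tcp_memcontrol_model *tcp)
{
	atomic_init(&tcp->refcnt, 0);
}

/* Like the atomic_inc() added to memcg_sock_mem_alloc(). */
static void model_sock_mem_alloc(struct tcp_memcontrol_model *tcp)
{
	atomic_fetch_add(&tcp->refcnt, 1);
}

/* Like the atomic_dec() added to memcg_sock_mem_free(). */
static void model_sock_mem_free(struct tcp_memcontrol_model *tcp)
{
	int old = atomic_fetch_sub(&tcp->refcnt, 1);

	assert(old > 0); /* every free must pair with an earlier alloc */
}

/* Like the check added to mem_cgroup_can_attach(): refuse the move
 * while any allocation is outstanding. */
static int model_can_attach(struct tcp_memcontrol_model *tcp)
{
	if (atomic_load(&tcp->refcnt) != 0)
		return -EBUSY;
	return 0;
}
```

As in the patch, the counter tracks calls to the alloc and free paths rather than sockets, so the group stays pinned until every outstanding allocation has been freed.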