From: Catalin Marinas <catalin.marinas@arm•com>
To: Anshuman Khandual <anshuman.khandual@arm•com>
Cc: mark.rutland@arm•com, mhocko@suse•com, david@redhat•com,
linux-mm@kvack•org, arunks@codeaurora•org,
cpandya@codeaurora•org, ira.weiny@intel•com, will@kernel•org,
steven.price@arm•com, valentin.schneider@arm•com,
suzuki.poulose@arm•com, Robin.Murphy@arm•com, broonie@kernel•org,
cai@lca•pw, ard.biesheuvel@arm•com, dan.j.williams@intel•com,
linux-arm-kernel@lists•infradead.org, osalvador@suse•de,
steve.capper@arm•com, logang@deltatee•com,
linux-kernel@vger•kernel.org, akpm@linux-foundation•org,
mgorman@techsingularity•net
Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove
Date: Thu, 12 Sep 2019 21:15:17 +0100 [thread overview]
Message-ID: <20190912201517.GB1068@C02TF0J2HF1T.local> (raw)
In-Reply-To: <1567503958-25831-4-git-send-email-anshuman.khandual@arm.com>
Hi Anshuman,
Thanks for the details on the need for removing the page tables and
vmemmap backing. Some comments on the code below.
On Tue, Sep 03, 2019 at 03:15:58PM +0530, Anshuman Khandual wrote:
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -60,6 +60,14 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>
> static DEFINE_SPINLOCK(swapper_pgdir_lock);
>
> +/*
> + * This represents if vmalloc and vmemmap address range overlap with
> + * each other on an intermediate level kernel page table entry which
> + * in turn helps in deciding whether empty kernel page table pages
> + * if any can be freed during memory hotplug operation.
> + */
> +static bool vmalloc_vmemmap_overlap;
I'd say just move the static find_vmalloc_vmemmap_overlap() function
here, the compiler should be sufficiently smart enough to figure out
that it's just a build-time constant.
> @@ -770,6 +1022,28 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> void vmemmap_free(unsigned long start, unsigned long end,
> struct vmem_altmap *altmap)
> {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + /*
> + * FIXME: We should have called remove_pagetable(start, end, true).
> + * vmemmap and vmalloc virtual range might share intermediate kernel
> + * page table entries. Removing vmemmap range page table pages here
> + * can potentially conflict with a concurrent vmalloc() allocation.
> + *
> + * This is primarily because vmalloc() does not take init_mm ptl for
> + * the entire page table walk and it's modification. Instead it just
> + * takes the lock while allocating and installing page table pages
> + * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table
> + * entry via memory hot remove can cause vmalloc() kernel page table
> + * walk pointers to be invalid on the fly which can cause corruption
> + * or worst, a crash.
> + *
> + * So free_empty_tables() gets called where vmalloc and vmemmap range
> + * do not overlap at any intermediate level kernel page table entry.
> + */
> + unmap_hotplug_range(start, end, true);
> + if (!vmalloc_vmemmap_overlap)
> + free_empty_tables(start, end);
> +#endif
> }
So, I see the risk with overlapping and I guess for some kernel
configurations (PAGE_SIZE == 64K) we may not be able to avoid it. If we
can, that's great, otherwise could we rewrite the above functions to
handle floor and ceiling similar to free_pgd_range()? (I wonder how this
function works if you called it on init_mm and kernel address range). By
having the vmemmap start/end information it avoids freeing partially
filled page table pages.
Another question: could we do the page table and the actual vmemmap
pages freeing in a single pass (sorry if this has been discussed
before)?
> @@ -1048,10 +1322,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
> }
>
> #ifdef CONFIG_MEMORY_HOTPLUG
> +static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
> +{
> + unsigned long end = start + size;
> +
> + WARN_ON(pgdir != init_mm.pgd);
> + remove_pagetable(start, end, false);
> +}
I think the point I've made previously still stands: you only call
remove_pagetable() with sparse_vmap == false in this patch. Just remove
the extra argument and call unmap_hotplug_range() with sparse_vmap ==
false directly in remove_pagetable().
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists•infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2019-09-12 20:15 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-03 9:45 [PATCH V7 0/3] arm64/mm: Enable memory hot remove Anshuman Khandual
2019-09-03 9:45 ` [PATCH V7 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory() Anshuman Khandual
2019-09-04 8:16 ` David Hildenbrand
2019-09-05 4:27 ` Anshuman Khandual
2019-09-16 1:44 ` Balbir Singh
2019-09-18 9:28 ` Anshuman Khandual
2019-09-03 9:45 ` [PATCH V7 2/3] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump Anshuman Khandual
2019-09-15 2:35 ` Balbir Singh
2019-09-18 9:12 ` Anshuman Khandual
2019-09-03 9:45 ` [PATCH V7 3/3] arm64/mm: Enable memory hot remove Anshuman Khandual
2019-09-10 16:17 ` Catalin Marinas
2019-09-11 10:31 ` David Hildenbrand
2019-09-12 4:28 ` Anshuman Khandual
2019-09-12 8:37 ` Anshuman Khandual
2019-09-12 20:15 ` Catalin Marinas [this message]
2019-09-13 5:58 ` Anshuman Khandual
2019-09-13 10:09 ` Catalin Marinas
2019-09-17 4:36 ` Anshuman Khandual
2019-09-17 15:08 ` Catalin Marinas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190912201517.GB1068@C02TF0J2HF1T.local \
--to=catalin.marinas@arm$(echo .)com \
--cc=Robin.Murphy@arm$(echo .)com \
--cc=akpm@linux-foundation$(echo .)org \
--cc=anshuman.khandual@arm$(echo .)com \
--cc=ard.biesheuvel@arm$(echo .)com \
--cc=arunks@codeaurora$(echo .)org \
--cc=broonie@kernel$(echo .)org \
--cc=cai@lca$(echo .)pw \
--cc=cpandya@codeaurora$(echo .)org \
--cc=dan.j.williams@intel$(echo .)com \
--cc=david@redhat$(echo .)com \
--cc=ira.weiny@intel$(echo .)com \
--cc=linux-arm-kernel@lists$(echo .)infradead.org \
--cc=linux-kernel@vger$(echo .)kernel.org \
--cc=linux-mm@kvack$(echo .)org \
--cc=logang@deltatee$(echo .)com \
--cc=mark.rutland@arm$(echo .)com \
--cc=mgorman@techsingularity$(echo .)net \
--cc=mhocko@suse$(echo .)com \
--cc=osalvador@suse$(echo .)de \
--cc=steve.capper@arm$(echo .)com \
--cc=steven.price@arm$(echo .)com \
--cc=suzuki.poulose@arm$(echo .)com \
--cc=valentin.schneider@arm$(echo .)com \
--cc=will@kernel$(echo .)org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox