public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm•com>
To: Anshuman Khandual <anshuman.khandual@arm•com>
Cc: mark.rutland@arm•com, mhocko@suse•com, david@redhat•com,
	linux-mm@kvack•org, arunks@codeaurora•org,
	cpandya@codeaurora•org, ira.weiny@intel•com, will@kernel•org,
	steven.price@arm•com, valentin.schneider@arm•com,
	suzuki.poulose@arm•com, Robin.Murphy@arm•com, broonie@kernel•org,
	cai@lca•pw, ard.biesheuvel@arm•com, dan.j.williams@intel•com,
	linux-arm-kernel@lists•infradead.org, osalvador@suse•de,
	steve.capper@arm•com, logang@deltatee•com,
	linux-kernel@vger•kernel.org, akpm@linux-foundation•org,
	mgorman@techsingularity•net
Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove
Date: Thu, 12 Sep 2019 21:15:17 +0100	[thread overview]
Message-ID: <20190912201517.GB1068@C02TF0J2HF1T.local> (raw)
In-Reply-To: <1567503958-25831-4-git-send-email-anshuman.khandual@arm.com>

Hi Anshuman,

Thanks for the details on the need for removing the page tables and
vmemmap backing. Some comments on the code below.

On Tue, Sep 03, 2019 at 03:15:58PM +0530, Anshuman Khandual wrote:
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -60,6 +60,14 @@ static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>  
>  static DEFINE_SPINLOCK(swapper_pgdir_lock);
>  
> +/*
> + * This represents if vmalloc and vmemmap address range overlap with
> + * each other on an intermediate level kernel page table entry which
> + * in turn helps in deciding whether empty kernel page table pages
> + * if any can be freed during memory hotplug operation.
> + */
> +static bool vmalloc_vmemmap_overlap;

I'd say just move the static find_vmalloc_vmemmap_overlap() function
here, the compiler should be sufficiently smart enough to figure out
that it's just a build-time constant.

> @@ -770,6 +1022,28 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  void vmemmap_free(unsigned long start, unsigned long end,
>  		struct vmem_altmap *altmap)
>  {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +	/*
> +	 * FIXME: We should have called remove_pagetable(start, end, true).
> +	 * vmemmap and vmalloc virtual range might share intermediate kernel
> +	 * page table entries. Removing vmemmap range page table pages here
> +	 * can potentially conflict with a concurrent vmalloc() allocation.
> +	 *
> +	 * This is primarily because vmalloc() does not take init_mm ptl for
> +	 * the entire page table walk and it's modification. Instead it just
> +	 * takes the lock while allocating and installing page table pages
> +	 * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table
> +	 * entry via memory hot remove can cause vmalloc() kernel page table
> +	 * walk pointers to be invalid on the fly which can cause corruption
> +	 * or worst, a crash.
> +	 *
> +	 * So free_empty_tables() gets called where vmalloc and vmemmap range
> +	 * do not overlap at any intermediate level kernel page table entry.
> +	 */
> +	unmap_hotplug_range(start, end, true);
> +	if (!vmalloc_vmemmap_overlap)
> +		free_empty_tables(start, end);
> +#endif
>  }

So, I see the risk with overlapping and I guess for some kernel
configurations (PAGE_SIZE == 64K) we may not be able to avoid it. If we
can, that's great, otherwise could we rewrite the above functions to
handle floor and ceiling similar to free_pgd_range()? (I wonder how this
function works if you called it on init_mm and kernel address range). By
having the vmemmap start/end information it avoids freeing partially
filled page table pages.

Another question: could we do the page table and the actual vmemmap
pages freeing in a single pass (sorry if this has been discussed
before)?

> @@ -1048,10 +1322,18 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
> +{
> +	unsigned long end = start + size;
> +
> +	WARN_ON(pgdir != init_mm.pgd);
> +	remove_pagetable(start, end, false);
> +}

I think the point I've made previously still stands: you only call
remove_pagetable() with sparse_vmap == false in this patch. Just remove
the extra argument and call unmap_hotplug_range() with sparse_vmap ==
false directly in remove_pagetable().

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists•infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2019-09-12 20:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03  9:45 [PATCH V7 0/3] arm64/mm: Enable memory hot remove Anshuman Khandual
2019-09-03  9:45 ` [PATCH V7 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory() Anshuman Khandual
2019-09-04  8:16   ` David Hildenbrand
2019-09-05  4:27     ` Anshuman Khandual
2019-09-16  1:44   ` Balbir Singh
2019-09-18  9:28     ` Anshuman Khandual
2019-09-03  9:45 ` [PATCH V7 2/3] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump Anshuman Khandual
2019-09-15  2:35   ` Balbir Singh
2019-09-18  9:12     ` Anshuman Khandual
2019-09-03  9:45 ` [PATCH V7 3/3] arm64/mm: Enable memory hot remove Anshuman Khandual
2019-09-10 16:17   ` Catalin Marinas
2019-09-11 10:31     ` David Hildenbrand
2019-09-12  4:28     ` Anshuman Khandual
2019-09-12  8:37       ` Anshuman Khandual
2019-09-12 20:15   ` Catalin Marinas [this message]
2019-09-13  5:58     ` Anshuman Khandual
2019-09-13 10:09       ` Catalin Marinas
2019-09-17  4:36         ` Anshuman Khandual
2019-09-17 15:08           ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912201517.GB1068@C02TF0J2HF1T.local \
    --to=catalin.marinas@arm$(echo .)com \
    --cc=Robin.Murphy@arm$(echo .)com \
    --cc=akpm@linux-foundation$(echo .)org \
    --cc=anshuman.khandual@arm$(echo .)com \
    --cc=ard.biesheuvel@arm$(echo .)com \
    --cc=arunks@codeaurora$(echo .)org \
    --cc=broonie@kernel$(echo .)org \
    --cc=cai@lca$(echo .)pw \
    --cc=cpandya@codeaurora$(echo .)org \
    --cc=dan.j.williams@intel$(echo .)com \
    --cc=david@redhat$(echo .)com \
    --cc=ira.weiny@intel$(echo .)com \
    --cc=linux-arm-kernel@lists$(echo .)infradead.org \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=linux-mm@kvack$(echo .)org \
    --cc=logang@deltatee$(echo .)com \
    --cc=mark.rutland@arm$(echo .)com \
    --cc=mgorman@techsingularity$(echo .)net \
    --cc=mhocko@suse$(echo .)com \
    --cc=osalvador@suse$(echo .)de \
    --cc=steve.capper@arm$(echo .)com \
    --cc=steven.price@arm$(echo .)com \
    --cc=suzuki.poulose@arm$(echo .)com \
    --cc=valentin.schneider@arm$(echo .)com \
    --cc=will@kernel$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox