Re: [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible

public inbox for linux-arm-kernel@lists.infradead.org 

From: Dev Jain <dev.jain@arm•com>
To: Wen Jiang <jiangwenxiaomi@gmail•com>
Cc: linux-mm@kvack•org, linux-arm-kernel@lists•infradead.org,
	catalin.marinas@arm•com, will@kernel•org,
	akpm@linux-foundation•org, urezki@gmail•com, baohua@kernel•org,
	Xueyuan.chen21@gmail•com, rppt@kernel•org, david@kernel•org,
	ryan.roberts@arm•com, anshuman.khandual@arm•com,
	ajd@linux•ibm.com, linux-kernel@vger•kernel.org,
	jiangwen6@xiaomi•com
Subject: Re: [PATCH v3 5/6] mm/vmalloc: map contiguous pages in batches for vmap() if possible
Date: Fri, 29 May 2026 11:27:02 +0530
Message-ID: <176c83fb-edf0-47b3-9823-21a92ea8b4c7@arm.com>
In-Reply-To: <CAHKocdHOT13DHdz+DsT2BaP0EJSKdoWU2NO-wjFwB9+--CqsHw@mail.gmail.com>



On 28/05/26 9:12 am, Wen Jiang wrote:
> On Wed, 27 May 2026 at 16:28, Dev Jain <dev.jain@arm•com> wrote:
>>
>>
>>
>> On 22/05/26 11:01 am, Wen Jiang wrote:
>>> From: "Barry Song (Xiaomi)" <baohua@kernel•org>
>>>
>>> In many cases, the pages passed to vmap() may include high-order
>>> pages. For example, the systemheap often allocates pages in descending
>>> order: order 8, then 4, then 0. Currently, vmap() iterates over every
>>> page individually—even pages inside a high-order block are handled
>>> one by one.
>>>
>>> This patch detects physically contiguous pages (regardless of whether
>>> they are compound or non-compound) by scanning with
>>> num_pages_contiguous(), and maps them as a single contiguous block
>>> whenever possible. The first page's pfn must be aligned to the
>>> mapping order for the batched mapping to be used.
>>>
>>> Pages with the same page_shift are coalesced and mapped via
>>> vmap_pages_range_noflush_walk() to avoid page table rewalk.
>>>
>>> As users typically allocate memory in descending orders (e.g.
>>> 8 → 4 → 0), once an order-0 page is encountered, we stop scanning
>>> for contiguous pages since subsequent pages are likely order-0 as well.
>>>
>>> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel•org>
>>> Co-developed-by: Dev Jain <dev.jain@arm•com>
>>> Signed-off-by: Dev Jain <dev.jain@arm•com>
>>> Signed-off-by: Wen Jiang <jiangwen6@xiaomi•com>
>>> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail•com>
>>> ---
>>>  mm/vmalloc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 80 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index deb764abc0571..50642246f4d40 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -3542,6 +3542,84 @@ void vunmap(const void *addr)
>>>  }
>>>  EXPORT_SYMBOL(vunmap);
>>>
>>> +static inline int get_vmap_batch_order(struct page **pages,
>>> +             unsigned int max_steps, unsigned int idx)
>>> +{
>>> +     unsigned int nr_contig;
>>> +     int order;
>>> +
>>> +     if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) ||
>>> +                     ioremap_max_page_shift == PAGE_SHIFT)
>>
>>
>> Why bail out on ioremap_max_page_shift == PAGE_SHIFT? The code
>> path for ioremap is different from vmap right?
>>
>>
> 
> ioremap_max_page_shift is under CONFIG_HAVE_ARCH_HUGE_VMAP which
> controls both ioremap and vmap huge mappings.

I don't get it. With this patch, passing nohugeiomap on the kernel
cmdline also disables huge mappings for vmap(). That does not sound
correct. Currently ioremap_max_page_shift plays no role in the normal
vmap code path; it is only involved in ioremap_page_range().
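
To make the coupling concrete, here is a small userspace model of the first
check in the hunk above. It is only a sketch: MODEL_PAGE_SHIFT,
MODEL_PMD_SHIFT and batching_allowed_as_posted() are made-up names, 4K pages
are assumed, and the only kernel behaviour it relies on is that nohugeiomap
lowers ioremap_max_page_shift to PAGE_SHIFT.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12u	/* assumption: 4K base pages */
#define MODEL_PMD_SHIFT  21u	/* assumption: 2M PMD mappings */

/*
 * Mirrors the first check in get_vmap_batch_order() as posted:
 * batching is refused when huge vmap is not configured, or when
 * ioremap_max_page_shift has been lowered to PAGE_SHIFT.
 */
static bool batching_allowed_as_posted(bool have_arch_huge_vmap,
				       unsigned int ioremap_max_page_shift)
{
	if (!have_arch_huge_vmap ||
	    ioremap_max_page_shift == MODEL_PAGE_SHIFT)
		return false;
	return true;
}
```

With a larger ioremap limit such as MODEL_PMD_SHIFT the function returns
true. Once the limit is MODEL_PAGE_SHIFT it returns false, so in this model
an ioremap-only switch also turns off vmap() batching.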


> 
>>> +             return 0;
>>> +
>>> +     nr_contig = num_pages_contiguous(&pages[idx], max_steps);
>>> +     if (nr_contig < 2)
>>> +             return 0;
>>> +
>>> +     order = fls(nr_contig) - 1;
>>> +
>>> +     if (arch_vmap_pte_supported_shift(PAGE_SIZE << order) == PAGE_SHIFT)
>>> +             return 0;

Also, for arches where this function does not do anything special
(i.e. it just returns PAGE_SHIFT), we will effectively not do any huge
mappings at all.
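
A rough userspace model of that effect, assuming 4K pages.
generic_pte_supported_shift() stands in for the generic fallback of
arch_vmap_pte_supported_shift(), which returns PAGE_SHIFT.
contpte_supported_shift() is a made-up arch override for comparison. Every
name below is hypothetical.

```c
#define SKETCH_PAGE_SHIFT 12			/* assumption: 4K base pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Generic fallback: no special PTE-level mapping sizes. */
static int generic_pte_supported_shift(unsigned long size)
{
	(void)size;
	return SKETCH_PAGE_SHIFT;
}

/* Made-up arch override: 64K contiguous-PTE mappings when the size allows. */
static int contpte_supported_shift(unsigned long size)
{
	return size >= (1UL << 16) ? 16 : SKETCH_PAGE_SHIFT;
}

/*
 * Mirrors the posted check: the order is dropped to 0 whenever the
 * PTE-level helper answers PAGE_SHIFT, whatever the order was.
 */
static int order_after_pte_check(int order, int (*pte_shift)(unsigned long))
{
	if (pte_shift(SKETCH_PAGE_SIZE << order) == SKETCH_PAGE_SHIFT)
		return 0;
	return order;
}
```

With the generic helper, order_after_pte_check(9, generic_pte_supported_shift)
is 0 even though order 9 is PMD-sized with 4K pages. With the made-up
override the same call returns 9.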


>>> +
>>> +     /* Ensure the first page's pfn is aligned to the order */
>>> +     if (!IS_ALIGNED(page_to_pfn(pages[idx]), 1 << order))
>>> +             return 0;

This condition is a bit fragile. We may have, say, 2^8 contiguous
pages whose start is aligned to only 2^4; the check then returns 0
instead of falling back to a smaller order. We are operating
on a page array and have no idea whether the caller has passed some
random subrange of the array.
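
A userspace sketch of the all-or-nothing behaviour, for illustration only.
model_fls() imitates the kernel's fls(), strict_batch_order() is a
hypothetical name for the order selection in the hunk, and the arch checks
are left out.

```c
/* 1-based index of the highest set bit, 0 when x == 0 (like fls()). */
static int model_fls(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Order selection as in the posted hunk, minus the arch checks:
 * take the largest power of two that fits in nr_contig and give up
 * completely if the first pfn is not aligned to it.
 */
static int strict_batch_order(unsigned long pfn, unsigned int nr_contig)
{
	int order;

	if (nr_contig < 2)
		return 0;

	order = model_fls(nr_contig) - 1;
	if (pfn & ((1UL << order) - 1))
		return 0;

	return order;
}
```

For pfn 0x10010 (aligned to 16 pages only) and 256 contiguous pages this
returns 0, while the same run starting at pfn 0x10100 returns 8.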

I think the purpose of these checks is to bail out early
if the arch does not support huge mappings or the alignment is not correct,
instead of finding this out deep inside vmap_pages_range_noflush_walk().

So you could do something like (completely untested and may miss some edge cases):

order = ilog2(nr_contig);

order = min(order, __ffs(page_to_pfn(pages[idx])));

order = vm_shift(PAGE_SIZE << order) - PAGE_SHIFT;

Where vm_shift() is the helper I had used in my patch.
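
The three lines above can be tried out in userspace roughly as follows. This
is a sketch under loud assumptions: model_vm_shift() is a hypothetical
stand-in for vm_shift() that picks the largest of three assumed mapping
sizes (4K, 64K, 2M) not exceeding the request, and the real helper may
behave differently. The pfn == 0 guard is an addition, since __ffs() is
undefined for 0.

```c
#define DEMO_PAGE_SHIFT 12u			/* assumption: 4K base pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* floor(log2(x)) for x > 0, like ilog2(). */
static unsigned int floor_log2(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* 0-based index of the lowest set bit, x must be non-zero (like __ffs()). */
static unsigned int lowest_set_bit(unsigned long x)
{
	unsigned int r = 0;

	while (!(x & 1)) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Hypothetical vm_shift(): assumed supported shifts are 12, 16 and 21. */
static unsigned int model_vm_shift(unsigned long size)
{
	if (size >= (1UL << 21))
		return 21;
	if (size >= (1UL << 16))
		return 16;
	return DEMO_PAGE_SHIFT;
}

/* Clamp by run length, then by pfn alignment, then by supported sizes. */
static unsigned int clamped_batch_order(unsigned long pfn,
					unsigned int nr_contig)
{
	unsigned int order, align;

	if (nr_contig < 2)
		return 0;

	order = floor_log2(nr_contig);
	if (pfn) {	/* pfn 0 is aligned to every order */
		align = lowest_set_bit(pfn);
		if (align < order)
			order = align;
	}

	return model_vm_shift(DEMO_PAGE_SIZE << order) - DEMO_PAGE_SHIFT;
}
```

With these assumed sizes, 256 contiguous pages starting at pfn 0x10010 give
order 4 rather than 0, and 512 pages starting at pfn 0x10200 give order 9.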

>>> +
>>> +     return order;
>>> +}
>>> +
>>> +static int vmap_batched(unsigned long addr, unsigned long end,
>>> +             pgprot_t prot, struct page **pages)
>>> +{
>>> +     unsigned int count = (end - addr) >> PAGE_SHIFT;
>>> +     unsigned int prev_shift = 0, idx = 0;
>>> +     unsigned long start = addr, map_addr = addr;
>>> +     int err;
>>> +
>>> +     err = kmsan_vmap_pages_range_noflush(addr, end, prot, pages,
>>> +                                             PAGE_SHIFT, GFP_KERNEL);
>>> +     if (err)
>>> +             goto out;
>>> +
>>> +     for (unsigned int i = 0; i < count; ) {
>>> +             unsigned int shift = PAGE_SHIFT +
>>> +                     get_vmap_batch_order(pages, count - i, i);
>>> +
>>> +             if (!i)
>>> +                     prev_shift = shift;
>>> +
>>> +             if (shift != prev_shift) {
>>> +                     err = vmap_pages_range_noflush_walk(map_addr, addr,
>>
>> It would be worth documenting vmap_pages_range_noflush_walk() that
>> it can take an array of pages which are not all contiguous, but it
>> may have contiguous chunks, as hinted by page_shift.
>>
>> Otherwise this looks good.
>>
>>> +                                     prot, pages + idx,
>>> +                                     min(prev_shift, PMD_SHIFT));
>>> +                     if (err)
>>> +                             goto out;
>>> +                     prev_shift = shift;
>>> +                     map_addr = addr;
>>> +                     idx = i;
>>> +             }
>>> +
>>> +             /*
>>> +              * Once small pages are encountered, the remaining pages
>>> +              * are likely small as well.
>>> +              */
>>> +             if (shift == PAGE_SHIFT)
>>> +                     break;
>>> +
>>> +             addr += 1UL << shift;
>>> +             i += 1U << (shift - PAGE_SHIFT);
>>> +     }
>>> +
>>> +     /* Remaining */
>>> +     if (map_addr < end)
>>> +             err = vmap_pages_range_noflush_walk(map_addr, end,
>>> +                             prot, pages + idx, min(prev_shift, PMD_SHIFT));
>>> +
>>> +out:
>>> +     flush_cache_vmap(start, end);
>>> +     return err;
>>> +}
>>> +
>>>  /**
>>>   * vmap - map an array of pages into virtually contiguous space
>>>   * @pages: array of page pointers
>>> @@ -3585,8 +3663,8 @@ void *vmap(struct page **pages, unsigned int count,
>>>               return NULL;
>>>
>>>       addr = (unsigned long)area->addr;
>>> -     if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
>>> -                             pages, PAGE_SHIFT) < 0) {
>>> +     if (vmap_batched(addr, addr + size, pgprot_nx(prot),
>>> +                             pages) < 0) {
>>>               vunmap(area->addr);
>>>               return NULL;
>>>       }
>>
