From: Ryan Roberts <ryan.roberts@arm•com>
To: Matthew Wilcox <willy@infradead•org>
Cc: Mark Rutland <mark.rutland@arm•com>,
Kefeng Wang <wangkefeng.wang@huawei•com>,
x86@kernel•org, David Hildenbrand <david@redhat•com>,
Catalin Marinas <catalin.marinas@arm•com>,
Yang Shi <shy828301@gmail•com>,
Dave Hansen <dave.hansen@linux•intel.com>,
linux-kernel@vger•kernel.org, linux-mm@kvack•org,
Andrey Ryabinin <ryabinin.a.a@gmail•com>,
"H. Peter Anvin" <hpa@zytor•com>, Will Deacon <will@kernel•org>,
Ard Biesheuvel <ardb@kernel•org>, Marc Zyngier <maz@kernel•org>,
Alistair Popple <apopple@nvidia•com>,
Barry Song <21cnbao@gmail•com>, Ingo Molnar <mingo@redhat•com>,
Zi Yan <ziy@nvidia•com>, John Hubbard <jhubbard@nvidia•com>,
Borislav Petkov <bp@alien8•de>,
Baolin Wang <baolin.wang@linux•alibaba.com>,
Thomas Gleixner <tglx@linutronix•de>,
linux-arm-kernel@lists•infradead.org,
"Yin, Fengwei" <fengwei.yin@intel•com>,
James Morse <james.morse@arm•com>,
Andrew Morton <akpm@linux-foundation•org>,
linuxppc-dev@lists•ozlabs.org
Subject: Re: [PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings
Date: Tue, 25 Jun 2024 15:45:54 +0100
Message-ID: <de83daf9-e899-4415-bf85-5e7d69f9693e@arm.com>
In-Reply-To: <ZnrO4clYoEH_67Ur@casper.infradead.org>
On 25/06/2024 15:06, Matthew Wilcox wrote:
> On Tue, Jun 25, 2024 at 02:41:18PM +0100, Ryan Roberts wrote:
>> On 25/06/2024 14:06, Matthew Wilcox wrote:
>>> On Tue, Jun 25, 2024 at 01:41:02PM +0100, Ryan Roberts wrote:
>>>> On 25/06/2024 13:37, Baolin Wang wrote:
>>>>
>>>> [...]
>>>>
>>>>>>> For other filesystems, like ext4, I did not find the logic to determine what
>>>>>>> size of folio to allocate in writable mmap() path
>>>>>>
>>>>>> Yes I'd be keen to understand this too. When I was doing contpte, page cache
>>>>>> would only allocate large folios for readahead. So that's why I wouldn't have
>>>>>
>>>>> You mean non-large folios, right?
>>>>
>>>> No I mean that at the time I wrote contpte, the policy was to allocate an
>>>> order-0 folio for any writes that missed in the page cache, and allocate large
>>>> folios only when doing readahead from storage into page cache. The test that is
>>>> regressing is doing writes.
>>>
>>> mmap() faults also use readahead.
>>>
>>> filemap_fault():
>>>
>>>         folio = filemap_get_folio(mapping, index);
>>>         if (likely(!IS_ERR(folio))) {
>>>                 if (!(vmf->flags & FAULT_FLAG_TRIED))
>>>                         fpin = do_async_mmap_readahead(vmf, folio);
>>> which does:
>>>         if (folio_test_readahead(folio)) {
>>>                 fpin = maybe_unlock_mmap_for_io(vmf, fpin);
>>>                 page_cache_async_ra(&ractl, folio, ra->ra_pages);
>>>
>>> which has been there in one form or another since 2007 (3ea89ee86a82).
>>
>> OK sounds like I'm probably misremembering something I read on LWN... You're
>> saying that it's been the case for a while that if we take a write fault for a
>> portion of a file, then we will still end up taking the readahead path and
>> allocating a large folio (filesystem permitting)? Does that apply in the case
>> where the file has never been touched but only ftruncate'd, as is happening in
>> this test? There is obviously no need for IO in that case, but have we always
>> taken a path where a large folio may be allocated for it? I thought that bit was
>> newer for some reason.
>
> The pagecache doesn't know whether the file contains data or holes.
> It allocates folios and then invites the filesystem to fill them; the
> filesystem checks its data structures and then either issues reads if
> there's data on media or calls memset if the records indicate there's
> a hole.
>
> Whether it chooses to allocate large folios or not is going to depend
> on the access pattern; a sequential write pattern will use large folios
> and a random write pattern won't.
>
> Now, I've oversimplified things a bit by talking about filemap_fault.
> Before we call filemap_fault, we call filemap_map_pages() which looks
> for any suitable folios in the page cache between start and end, and
> maps those.
OK that all makes sense, thanks. I guess it just means I don't have an excuse
for the perf regression. :)
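
For anyone wanting to poke at the scenario discussed above (write faults
through a shared mapping of a file that has only been ftruncate'd), here is a
minimal userspace sketch. It is not the regressing test itself: the helper
name touch_sparse_file(), the /tmp template and the one-byte-per-page write
are all assumptions made for illustration. It touches pages in ascending
order, i.e. the sequential pattern; whether the kernel ends up using large
folios still depends on the filesystem and kernel version.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create a temporary file,
 * extend it with ftruncate() so it consists only of a hole, map it
 * MAP_SHARED and writable, then write one byte to each page in ascending
 * order so that every page takes a write fault.
 *
 * Returns the number of pages touched, or -1 on error.
 */
static long touch_sparse_file(size_t size)
{
	char path[] = "/tmp/contpte-sketch-XXXXXX";	/* assumed location */
	long page = sysconf(_SC_PAGESIZE);
	long touched = 0;
	char *map;
	int fd;

	if (page <= 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	/* No data is ever written via write(); the file is all hole. */
	if (ftruncate(fd, (off_t)size) < 0)
		goto err;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto err;

	/* Sequential write faults, one per page. */
	for (size_t off = 0; off < size; off += (size_t)page) {
		map[off] = 1;
		touched++;
	}

	munmap(map, size);
	close(fd);
	unlink(path);
	return touched;

err:
	close(fd);
	unlink(path);
	return -1;
}
```

Calling touch_sparse_file(1UL << 20) from a small main() and timing it (or
watching it with perf) gives a rough way to compare kernels; replacing the
ascending loop with a shuffled page order would exercise the random pattern
instead.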