From: Yang Shi <yang@os.amperecomputing.com>
To: Will Deacon <will@kernel.org>
Cc: catalin.marinas@arm.com, ryan.roberts@arm.com, cl@gentwo.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v7 PATCH] arm64: mm: show direct mapping use in /proc/meminfo
Date: Tue, 2 Jun 2026 13:38:48 -0700
Message-ID: <1252b93e-0c6b-4e33-9bf9-e42cde5ba0d2@os.amperecomputing.com>
In-Reply-To: <ah7uCj2V11MnAlaR@willie-the-truck>



On 6/2/26 7:51 AM, Will Deacon wrote:
> On Tue, May 19, 2026 at 09:36:57AM -0700, Yang Shi wrote:
>> Since commit a166563e7ec3 ("arm64: mm: support large block mapping when
>> rodata=full"), the direct mapping may be split on some machines instead
>> keeping static since boot. It makes more sense to show the direct mapping
>> use in /proc/meminfo than before.
>> This patch will make /proc/meminfo show the direct mapping use like the
>> below (4K base page size):
>> DirectMap4K:       94792 kB
>> DirectMap64K:     134208 kB
>> DirectMap2M:     1173504 kB
>> DirectMap32M:    5636096 kB
>> DirectMap1G:    529530880 kB
>>
>> Although just the machines which support BBML2_NOABORT can split the
>> direct mapping, show it on all machines regardless of BBML2_NOABORT so
>> that the users have consistent view in order to avoid confusion.
>>
>> Although ptdump also can tell the direct map use, but it needs to dump
>> the whole kernel page table. It is costly and overkilling. It is also
>> in debugfs which may not be enabled by all distros. So showing direct
>> map use in /proc/meminfo seems more convenient and has less overhead.
>>
>> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
>> ---
>>   arch/arm64/mm/mmu.c | 192 +++++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 171 insertions(+), 21 deletions(-)
>>
>> v7: * Rebased to v7.1-rc4
>>      * Changed "dm" to "lm" to follow ARM convention per Will
>>      * Used __is_lm_alias() instead of reinventing a new helper per Will
> Thanks, but Sashiko has pointed out a few nasty issues:
>
> https://sashiko.dev/#/patchset/20260519163657.1259416-1-yang@os.amperecomputing.com
>
> In particular, the potential for races updating the shared counters and
> double-accounting of entries due to permission changes look like
> interesting things to check.

Hi Will,

Thanks for the reminder about the Sashiko review. Please see my responses below.

#1
> When features like memfd_secret remove pages from the direct map by calling
> set_direct_map_invalid_noflush(), it clears the PTE_VALID bit via the pageattr
> infrastructure.
> Since this patch doesn't hook into those callbacks or set_pte() to invoke
> lm_meminfo_sub(), could the unmapped memory continue to be accounted for,
> leading to a persistent mismatch between the actual direct map size and the
> reported statistics?

If I read the x86 code correctly, the direct mapping counters in 
/proc/meminfo don't treat an invalid direct mapping differently: a range 
is counted as long as it is still mapped by the page table, regardless of 
whether the entry is valid. The counters are only updated when a mapping 
is created, removed (for example, at boot or on hot plug/unplug), split, 
or collapsed. A transiently invalid direct mapping does not seem to be 
considered "removed", and that does not seem to bother anyone.

I think we should follow the same semantics and stay consistent. We can 
definitely make changes in the future if it turns out to be a real problem.
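
The semantics described above can be sketched as a small userspace model. 
This is not kernel code: MODEL_PTE_VALID, lm_4k_kb and the model_*() 
functions are hypothetical names chosen for illustration, and the model 
only assumes that accounting follows entry creation and teardown, not the 
valid bit.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PTE_VALID 0x1ULL   /* hypothetical valid bit for this model */

static unsigned long lm_4k_kb;   /* stands in for DirectMap4K, in kB */

/* Creating a mapping adds its size to the counter. */
static void model_map_page(uint64_t *ptep, uint64_t pa)
{
	*ptep = pa | MODEL_PTE_VALID;
	lm_4k_kb += 4;
}

/* Clearing the valid bit leaves the entry in place: no accounting change. */
static void model_invalidate_page(uint64_t *ptep)
{
	*ptep &= ~MODEL_PTE_VALID;
}

/* Tearing the entry down (e.g. on hot-unplug) subtracts its size. */
static void model_unmap_page(uint64_t *ptep)
{
	assert(lm_4k_kb >= 4);   /* guard against underflow in the model */
	*ptep = 0;
	lm_4k_kb -= 4;
}
```

In this model a page that is mapped and then invalidated still 
contributes 4 kB to the counter; only model_unmap_page() takes it out.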


#2
> Could concurrent calls to functions like split_pmd() (e.g., via
> set_memory_ro() or set_memory_rw() during BPF JIT allocations or module
> loading) cause data races and lost updates here?
> These updates use non-atomic read-modify-write operations, and
> apply_to_page_range() only locks individual page tables rather than using a
> global lock.

This sounds like a false positive. Splitting of the direct mapping page 
table is serialized by pgtable_split_lock.
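
As an illustration of why a single lock around the split path makes plain 
(non-atomic) counter updates safe, here is a minimal userspace sketch. It 
is an assumption-laden model, not the kernel implementation: 
model_split_lock only plays the role of pgtable_split_lock, and 
model_split_pmd(), lm_2m_kb and lm_4k_kb are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for DirectMap2M and DirectMap4K, in kB. */
static unsigned long lm_2m_kb = 4096;
static unsigned long lm_4k_kb;

/* Plays the role of pgtable_split_lock: one lock around every split. */
static pthread_mutex_t model_split_lock = PTHREAD_MUTEX_INITIALIZER;

/* Split one 2M block into 4K pages and move its size between counters. */
static void model_split_pmd(void)
{
	pthread_mutex_lock(&model_split_lock);
	assert(lm_2m_kb >= 2048);   /* a 2M block must exist to be split */
	lm_2m_kb -= 2048;           /* read-modify-write, done under the lock */
	lm_4k_kb += 2048;
	pthread_mutex_unlock(&model_split_lock);
}
```

Because both read-modify-write updates happen while the lock is held, two 
callers cannot interleave them, so no update is lost in this model.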


#3
> Does this code, as well as similar sections in init_pmd() and
> alloc_init_pud(), double-count memory regions when updating permissions
> of existing direct mappings?
> When functions like update_mapping_prot() change the permissions of existing
> valid direct map entries, this will unconditionally add the size again without
> checking if the entry was already present (e.g., checking pte_none() first)
> or subtracting the old size.

Yes, it may double count when updating permissions, because the 
permission update reuses the same functions. This does not sound hard to 
address: we can check whether the PUD/PMD/PTE is none. If it is none, the 
kernel is creating a new page table entry, so we count it. If it is not 
none, we skip the accounting.
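
The idea can be sketched in a few lines of userspace C. This is only a 
model under stated assumptions, not the eventual patch: model_pte_t, 
model_pte_none(), model_set_pte() and lm_4k_kb are hypothetical names, 
and a zero entry stands in for "none".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry type for this model; 0 means "none". */
typedef uint64_t model_pte_t;

static unsigned long lm_4k_kb;   /* stands in for DirectMap4K, in kB */

static bool model_pte_none(model_pte_t pte)
{
	return pte == 0;
}

/*
 * Install or update one 4K entry. The size is accounted only when the
 * slot was empty, so a later permission change does not count it again.
 */
static void model_set_pte(model_pte_t *ptep, model_pte_t new_pte)
{
	assert(!model_pte_none(new_pte));   /* the model never installs "none" */
	if (model_pte_none(*ptep))
		lm_4k_kb += 4;
	*ptep = new_pte;
}
```

Calling model_set_pte() twice on the same slot, first to create the entry 
and then to change its permission bits, leaves the counter at 4 kB. The 
same check would apply at the PMD and PUD levels with the corresponding 
block sizes.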


#4
> If a portion of a contiguous block is hot-removed, could this multiple
> subtraction underflow the counter?
> On ARM64, the memory hotplug block size can be smaller than a contiguous block
> size (e.g., CONT_PMD_SIZE is 16GB with 64K base pages). If a partial chunk is
> removed, this subtracts the full CONT_PMD_SIZE. However, it leaves the
> PTE_CONT bit intact on the remaining valid PMDs.
> A subsequent removal in the same contiguous block would see the PTE_CONT
> bit again and subtract the full CONT_PMD_SIZE a second time, underflowing the
> counter.

This sounds like a false positive to me. This case should not happen at 
all: we can't unplug just a portion of a DIMM, right?

For example, say we plug a 16G DIMM into the machine. We can offline a 
portion of it (at section granularity), but we can't unplug a portion of 
it; we can only unplug the whole DIMM. The counters are updated at unplug 
time.

Thanks,
Yang

> Will



Thread overview: 4+ messages
2026-05-19 16:36 [v7 PATCH] arm64: mm: show direct mapping use in /proc/meminfo Yang Shi
2026-05-22 16:22 ` Christoph Lameter (Ampere)
2026-06-02 14:51 ` Will Deacon
2026-06-02 20:38   ` Yang Shi [this message]
