From: FF <figure1802@126•com>
To: "Catalin Marinas" <catalin.marinas@arm•com>, runninglinuxkernel@126•com
Cc: mark.rutland@arm•com, steve.capper@arm•com, will.deacon@arm•com,
runninglinuxkernel@126•com, julien.grall@arm•com,
linux-arm-kernel <linux-arm-kernel@lists•infradead.org>
Subject: Re:Re: about the ptep_set_access_flags() for hardware AF/DBM
Date: Tue, 29 Oct 2019 08:54:38 +0800 (CST)
Message-ID: <1b0920d5.c4b.16e1501ef37.Coremail.figure1802@126.com>
In-Reply-To: <20191028184303.GB6619@arrakis.emea.arm.com>
At 2019-10-29 02:43:03, "Catalin Marinas" <catalin.marinas@arm•com> wrote:
>On Sun, Oct 27, 2019 at 05:56:24PM +0800, FF wrote:
>> i see a patch, commit id: 66dbd6e61a52 "arm64: Implement ptep_set_access_flags() for hardware AF/DBM"
>> in this patch, the author show a insteresting case of the racy of hardware AF/DBM.
>>
>> Here is the scenario:
>> A more complex situation is possible when all CPUs support hardware
>> AF/DBM:
>>
>> a) Initial state: shareable + writable vma and pte_none(pte)
>> b) Read fault taken by two threads of the same process on different
>> CPUs
>> c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
>> eventually reaches do_set_pte() which sets a writable + clean pte.
>> CPU0 releases the mmap_sem
>> d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
>> pte entry it reads is present, writable and clean and it continues
>> to pte_mkyoung()
>> e) CPU1 calls ptep_set_access_flags()
>>
>> If between (d) and (e) the hardware (another CPU) updates the dirty
>> state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
>> marking the entry clean again.
>>
>> my question is:
>> 1. in step a, it say, the initial state vma is : sharable + writable +
>> pte_none. let suppose this is a anon mapping by mmap() API.
>
>What I had in mind at the time was a file mapping rather than anonymous
>one (vma_is_anonymous() is false for shared mappings).
>
>> so the vma->vm_page_prot should be : VM_READ | VM_WRITE | VM_SHARED
>> in vm_get_page_prot(), it will change to pte attribute,in linux
>> kernel it has a protection_map[] array. in that case, it should be
>> __S011 (PAGE_SHARED). for PAGE_SHARED, the pte attribute will set
>> PTE_WRITE,so PTE_DBM is set, but the PTE_RDONLY should be zero,
>> right?
>
>PAGE_SHARED is indeed writable but how it ends up in the pte depends on
>the mapping. For a shared memory mapping, I think you do get a writable
>entry on a read fault.
>
>For file mappings, the writable attribute is cleared from vm_page_prot
>via the vma_set_page_prot() function because vma_wants_writenotify() is
>true. Filesystem normally want to track which pages have been dirtied to
>write back.
>
>> in step c, CPU0 trigger read fault and handle the page fault, it will
>> call do_anonymous_page(), and using system_zero_page. i don't what is
>> a clean pte? but currently, the PTE_RDONLY is zero, it means this
>> pte is writable.
>
>Note that we can't invoke do_anonymous_page() for VM_SHARED mappings.
>This is only for private mappings. If you look at mmap_region(), the vma
>is not set up as anonymous if MAP_SHARED is given but as a shmem.
>
>> when the CPU2 write this memory, it will update the dirty state like
>> clear PTE_RDONLY, but my questions, the PTE_RDONLY is still zero, in
>> step a~d, so why CPU1 will override RT_RDONLY bit and marking the
>> entry clean again.
>
>As I said above, this scenario is for shared file mappings where you do
>get a PTE_RDONLY set for clean mappings.
>
>--
>Catalin
Hi Catalin,
Thanks for pointing that out.
I would like to elaborate on the scenario. The first patch to fix ptep_set_access_flags() for hardware AF/DBM went into Linux 4.7-rc1:
commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
So I think the issue exists in Linux 4.6; let's assume we are looking at the Linux 4.6 source code.
1. Initial phase: we create a shareable + writable file mapping with the mmap() API; the filesystem is ext4.
In do_mmap(), vm_flags is set to VM_READ | VM_WRITE | VM_SHARED.
In mmap_region()->vma_set_page_prot(), some shared mappings want their pages marked read-only so that write events can be tracked,
so VM_SHARED is cleared before the lookup and the pte attributes taken from protection_map[] are __P011.
In Linux 4.6, __P011 is PAGE_COPY:
#define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
For PAGE_COPY, both PTE_RDONLY and PTE_WRITE (DBM) are zero.
So the vm_flags used for the lookup are: VM_READ | VM_WRITE
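To make step 1 concrete, here is a small user-space sketch (not kernel code) of the protection_map[] lookup as I understand it. The helpers prot_index() and prot_name() are hypothetical names of mine, and wants_writenotify simply stands in for the result of vma_wants_writenotify().

```c
/* vm_flags bits, with the values used in include/linux/mm.h. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* Names of the 16 protection_map[] slots, in index order. */
static const char *const prot_names[16] = {
    "__P000", "__P001", "__P010", "__P011",
    "__P100", "__P101", "__P110", "__P111",
    "__S000", "__S001", "__S010", "__S011",
    "__S100", "__S101", "__S110", "__S111",
};

/*
 * Hypothetical helper: index into protection_map[] for a vma.
 * wants_writenotify models vma_wants_writenotify(vma) being true.
 */
static unsigned int prot_index(unsigned long vm_flags, int wants_writenotify)
{
    if (wants_writenotify)
        vm_flags &= ~VM_SHARED;   /* drop VM_SHARED for the lookup only */
    return (unsigned int)(vm_flags &
                          (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED));
}

/* Hypothetical helper: name of the slot that the lookup would select. */
static const char *prot_name(unsigned long vm_flags, int wants_writenotify)
{
    return prot_names[prot_index(vm_flags, wants_writenotify)];
}
```

With VM_READ | VM_WRITE | VM_SHARED, prot_name() selects __S011 when write-notify is not wanted and __P011 when it is, which is the case I describe above.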
2. Thread 1 on CPU0 wants to write to this page, so a page fault is triggered.
In handle_pte_fault()->do_fault()->do_shared_fault(), a new page cache page is allocated, and do_set_pte()
calls "maybe_mkwrite(pte_mkdirty(entry), vma)" to set up the pte entry.
So the pte attributes should be: PTE_DIRTY | PTE_WRITE.
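The effect of step 2 on the pte bits can be modelled in user space as below. This is only a sketch: the bit positions are what I read in the arm64 headers and should be treated as assumptions, the model_* functions are hypothetical stand-ins for the kernel helpers, and I assume pte_mkwrite() only sets PTE_WRITE.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Bit positions as I read them in the arm64 headers (assumptions). */
#define PTE_VALID  (1ULL << 0)
#define PTE_RDONLY (1ULL << 7)    /* AP[2] */
#define PTE_DBM    (1ULL << 51)   /* hardware dirty bit management */
#define PTE_DIRTY  (1ULL << 55)   /* software dirty bit */
#define PTE_WRITE  PTE_DBM        /* PTE_WRITE shares the DBM bit */

#define VM_WRITE   0x2UL

static pteval_t model_pte_mkdirty(pteval_t pte) { return pte | PTE_DIRTY; }
static pteval_t model_pte_mkwrite(pteval_t pte) { return pte | PTE_WRITE; }

/* Models maybe_mkwrite(): make writable only if the vma allows writes. */
static pteval_t model_maybe_mkwrite(pteval_t pte, unsigned long vm_flags)
{
    if (vm_flags & VM_WRITE)
        pte = model_pte_mkwrite(pte);
    return pte;
}

/* Models the part of do_set_pte() that matters for a write fault. */
static pteval_t model_do_set_pte(pteval_t prot, unsigned long vm_flags,
                                 int write)
{
    pteval_t entry = prot | PTE_VALID;

    if (write)
        entry = model_maybe_mkwrite(model_pte_mkdirty(entry), vm_flags);
    return entry;
}
```

For a write fault on a vma with VM_WRITE, the model returns an entry with both PTE_DIRTY and PTE_WRITE set, and PTE_RDONLY is still clear at this point because the starting protection has it clear.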
3. Thread 2 on CPU1 wants to read the same page. The pte had not yet been created by Thread 1 at that point, so a page fault happens on CPU1 as well.
After pte_offset_map(), handle_pte_fault() finds that the pte has meanwhile been created by Thread 1, so it directly calls:
entry = pte_mkyoung(entry);
ptep_set_access_flags()
ptep_set_access_flags() calls set_pte_at() to set the pte,
but the set_pte_at() function contains:
    if (pte_present(pte)) {
        if (pte_sw_dirty(pte) && pte_write(pte))
            pte_val(pte) &= ~PTE_RDONLY;
        else
            pte_val(pte) |= PTE_RDONLY;
        if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
            __sync_icache_dcache(pte, addr);
    }
This clears the PTE_RDONLY bit, because both PTE_DIRTY and PTE_WRITE are set in our scenario.
The only other case is that someone clears the PTE_DIRTY bit, but who would clear it?
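The PTE_RDONLY logic of set_pte_at() quoted above can be checked with a small user-space model. Again this is a sketch under assumptions: the bit positions are my reading of the arm64 headers, model_set_pte_at() is a hypothetical stand-in that returns the value that would be written, and the icache/dcache sync is left out.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Bit positions as I read them in the arm64 headers (assumptions). */
#define PTE_VALID     (1ULL << 0)
#define PTE_RDONLY    (1ULL << 7)    /* AP[2] */
#define PTE_AF        (1ULL << 10)   /* access flag */
#define PTE_WRITE     (1ULL << 51)   /* same bit as DBM */
#define PTE_DIRTY     (1ULL << 55)   /* software dirty bit */
#define PTE_PROT_NONE (1ULL << 58)

static int model_pte_present(pteval_t pte)
{
    return (pte & (PTE_VALID | PTE_PROT_NONE)) != 0;
}

static int model_pte_sw_dirty(pteval_t pte) { return (pte & PTE_DIRTY) != 0; }
static int model_pte_write(pteval_t pte)    { return (pte & PTE_WRITE) != 0; }

/* Returns the pte value that the quoted set_pte_at() logic would store. */
static pteval_t model_set_pte_at(pteval_t pte)
{
    if (model_pte_present(pte)) {
        if (model_pte_sw_dirty(pte) && model_pte_write(pte))
            pte &= ~PTE_RDONLY;   /* dirty and writable: allow writes */
        else
            pte |= PTE_RDONLY;    /* otherwise: store as read-only */
    }
    return pte;
}
```

In this model an entry with PTE_DIRTY | PTE_WRITE always comes out with PTE_RDONLY clear, while a writable entry without PTE_DIRTY comes out with PTE_RDONLY set.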
So I am confused by the scenario in the commit log of "arm64: Implement ptep_set_access_flags() for hardware AF/DBM".
Could you point out what I am missing?
Best
Ben
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists•infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel