[PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching

public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed

From: cyril@ti•com (Cyril Chemparathy)
To: linux-arm-kernel@lists•infradead.org
Subject: [PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching
Date: Sun, 12 Aug 2012 19:27:49 -0400	[thread overview]
Message-ID: <50283BF5.1050908@ti.com> (raw)
In-Reply-To: <alpine.LFD.2.02.1208112316560.5231@xanadu.home>

On 08/11/12 23:39, Nicolas Pitre wrote:
> On Fri, 10 Aug 2012, Cyril Chemparathy wrote:
>
>> This patch adds support for 64-bit physical addresses in virt_to_phys()
>> patching.  This does not do real 64-bit add/sub, but instead patches in the
>> upper 32-bits of the phys_offset directly into the output of virt_to_phys.
>>
>> There is no corresponding change on the phys_to_virt() side, because
>> computations on the upper 32-bits would be discarded anyway.
>>
>> Signed-off-by: Cyril Chemparathy <cyril@ti•com>
>> ---
>>   arch/arm/include/asm/memory.h |   22 ++++++++++++++++++----
>>   arch/arm/kernel/head.S        |    4 ++++
>>   arch/arm/kernel/setup.c       |    2 +-
>>   3 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
>> index 81e1714..dc5fbf3 100644
>> --- a/arch/arm/include/asm/memory.h
>> +++ b/arch/arm/include/asm/memory.h
>> @@ -154,14 +154,28 @@
>>   #ifdef CONFIG_ARM_PATCH_PHYS_VIRT
>>
>>   extern unsigned long	__pv_offset;
>> -extern unsigned long	__pv_phys_offset;
>> +extern phys_addr_t	__pv_phys_offset;
>>   #define PHYS_OFFSET	__virt_to_phys(PAGE_OFFSET)
>>
>>   static inline phys_addr_t __virt_to_phys(unsigned long x)
>>   {
>> -	unsigned long t;
>> -	early_patch_imm8("add", t, x, __pv_offset, 0);
>> -	return t;
>> +	unsigned long tlo, thi;
>> +
>> +	early_patch_imm8("add", tlo, x, __pv_offset, 0);
>> +
>> +#ifdef CONFIG_ARM_LPAE
>> +	/*
>> +	 * On LPAE, we do not _need_ to do 64-bit arithmetic because the high
>> +	 * order 32 bits are never changed by the phys-virt offset.  We simply
>> +	 * patch in the high order physical address bits instead.
>> +	 */
>> +#ifdef __ARMEB__
>> +	early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
>> +#else
>> +	early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
>> +#endif
>> +#endif
>> +	return (u64)tlo | (u64)thi << 32;
>>   }
>
> Hmmm...  I'm afraid this is going to be suboptimal when LPAE is not
> selected.
>

I understand your concern, but I don't see the sub-optimality.  I tested 
the following function with GCC versions 4.3.3 and 4.7.  This is after 
the other changes that I mentioned in my previous email, but with the 
__virt_to_phys() code itself unchanged:

phys_addr_t ____test_virt_to_phys(unsigned long addr)
{
         return __virt_to_phys(addr);
}

The resulting code in both cases looks like:

<____test_virt_to_phys>:
	b       c04f0528
	bx      lr

[the branch of course gets patched to an add]

> First of all, you do not need to cast tlo to a u64 in the return value.
>

True enough.

> Then, I'm not sure if the compiler is smart enough to see that the
> returned value is a phys_addr_t which can be a u32, and in this case the
> (u64)thi << 32 is going to be truncated right away, and therefore there
> is no point in emiting the corresponding instructions.
>

In this case, it appears to be smart enough.  However, I agree that 
relying on compiler smarts is probably not the best thing for us to do.

> Furthermore, if LPAE is not defined, then thi never gets initialized and
> should produce a warning. Did you test compilation of the code with LPAE
> turned off?
>

Sure.  One of our test platforms is non-LPAE.  The compiler does not 
produce warnings on this, and this is consistent across both compiler 
versions.

> I'd prefer something like this where more stuff is validated by the
> compiler:
>
> static inline phys_addr_t __virt_to_phys(unsigned long x)
> {
> 	unsigned long tlo, thi;
> 	phys_addr_t ret;
>
> 	early_patch_imm8("add", tlo, x, __pv_offset, 0);
> 	ret = tlo;
>
> 	if (sizeof(phys_addr_t) > 4) {
> #ifdef __ARMEB__
> 		early_patch_imm8_mov("mov", thi, __pv_phys_offset, 0);
> #else
> 		early_patch_imm8_mov("mov", thi, __pv_phys_offset, 4);
> #endif
> 		ret |= ((u64)thi) << 32;
> 	}
>
> 	return ret);
> }
>
> This should let the compiler optimize things whether LPAE is enabledor
> not while validating both cases.
>

Agreed on the principal, but more below...


I've meanwhile been chasing down another problem - the code generated 
for the LPAE case.  The original code resulted in the following:

<____test_virt_to_phys>:
         mov     r2, #0
         b       c01bc800	# patch: add r1, r0, __pv_offset
         b       c01bc810	# patch: mov r0, __phys_offset_high
         orr     r2, r2, r1
         mov     r3, r0
         mov     r1, r3
         mov     r0, r2
         bx      lr

Yikes! This code does a bunch of futile register shuffling and a 
pointless or, all in the name of generating the result in a 64-bit 
register-pair from the 32-bit halves.

In order to get past this, I tried adding operand qualifiers (R = upper 
32-bits, Q = lower 32-bits) in the patch macros, in the hope that 
treating these as native 64-bit register pairs would eliminate the need 
to shuffle them around after the inline assembly blocks.  This then 
allows us to implement __virt_to_phys() as follows:

static inline phys_addr_t __virt_to_phys(unsigned long x)
{
         phys_addr_t t;

         if (sizeof(t) == 4) {
		t = x;
                 early_patch_imm8("add", t, "", t, __pv_offset, 0);
                 return t;
         }

         /*
          * On LPAE, we do not _need_ to do 64-bit arithmetic because
          * the high order 32 bits are never changed by the phys-virt
          * offset.  We simply patch in the high order physical address
          * bits instead.
          *
          * Note: the mov _must_ be first here.  From the compiler's
          * perspective, this is the initializer for the variable. The
          * mov itself initializes only the upper half. The subsequent
          * add treats t as a read/write operand and initializes the
          * lower half.
          */
#ifdef __ARMEB__
         early_patch_imm8_mov("mov", t, "R", __pv_phys_offset, 0);
#else
         early_patch_imm8_mov("mov", t, "R", __pv_phys_offset, 4);
#endif
         early_patch_imm8("add", t, "Q", x, __pv_offset, 0);

         return t;
}

With this, we get significantly better looking generated code:

<____test_virt_to_phys>:
	b       c01d519c	# patch: mov r3, __phys_offset_high
	b       c01d51ac	# patch: add r2, r0, __phys_offset_high
	mov     r0, r2
	mov     r1, r3
	bx      lr

This is about as far along as I've been able to proceed.  I still 
haven't figured out a way to get it to patch in place without an extra 
register pair.

Overall, this is still a bit too kludgy for my liking.  In particular, 
the read/write operand forces add/sub/... users to initialize the result 
variable.  I am currently leaning towards adding native support for 
64-bit operations in the runtime patch code, instead of having to hack 
around it with 32-bit primitives.  Better ideas, any one?

Thanks
-- Cyril.

next prev parent reply	other threads:[~2012-08-12 23:27 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-11  1:24 [PATCH v2 00/22] Introducing the TI Keystone platform Cyril Chemparathy
2012-08-11  1:24 ` [PATCH v2 01/22] ARM: add mechanism for late code patching Cyril Chemparathy
2012-08-12  2:22   ` Nicolas Pitre
2012-08-12 18:13     ` Cyril Chemparathy
2012-08-11  1:24 ` [PATCH v2 02/22] ARM: add self test for runtime patch mechanism Cyril Chemparathy
2012-08-12  2:35   ` Nicolas Pitre
2012-08-12 16:32     ` Cyril Chemparathy
2012-08-13  3:19       ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 03/22] ARM: use late patch framework for phys-virt patching Cyril Chemparathy
2012-08-12  3:03   ` Nicolas Pitre
2012-08-12 17:34     ` Cyril Chemparathy
2012-08-13  3:32       ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 04/22] ARM: LPAE: use phys_addr_t on virt <--> phys conversion Cyril Chemparathy
2012-08-12  3:04   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 05/22] ARM: LPAE: support 64-bit virt_to_phys patching Cyril Chemparathy
2012-08-12  3:39   ` Nicolas Pitre
2012-08-12 23:27     ` Cyril Chemparathy [this message]
2012-08-13  4:03       ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 06/22] ARM: LPAE: use signed arithmetic for mask definitions Cyril Chemparathy
2012-08-12  3:57   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 07/22] ARM: LPAE: use phys_addr_t in alloc_init_pud() Cyril Chemparathy
2012-08-11  1:24 ` [PATCH v2 08/22] ARM: LPAE: use phys_addr_t in free_memmap() Cyril Chemparathy
2012-08-11  1:24 ` [PATCH v2 09/22] ARM: LPAE: use phys_addr_t for initrd location and size Cyril Chemparathy
2012-08-12  3:58   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 10/22] ARM: LPAE: use phys_addr_t in switch_mm() Cyril Chemparathy
2012-08-12  4:04   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 11/22] ARM: LPAE: use 64-bit accessors for TTBR registers Cyril Chemparathy
2012-08-12  4:11   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 12/22] ARM: LPAE: define ARCH_LOW_ADDRESS_LIMIT for bootmem Cyril Chemparathy
2012-08-11  1:24 ` [PATCH v2 13/22] ARM: LPAE: factor out T1SZ and TTBR1 computations Cyril Chemparathy
2012-08-12  4:19   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 14/22] ARM: LPAE: accomodate >32-bit addresses for page table base Cyril Chemparathy
2012-08-11  1:24 ` [PATCH v2 15/22] ARM: mm: use physical addresses in highmem sanity checks Cyril Chemparathy
2012-08-12  4:29   ` Nicolas Pitre
2012-08-11  1:24 ` [PATCH v2 16/22] ARM: mm: cleanup checks for membank overlap with vmalloc area Cyril Chemparathy
2012-08-12  4:36   ` Nicolas Pitre
2012-09-10 17:43     ` Cyril Chemparathy
2012-09-10 18:07       ` Nicolas Pitre
2012-08-11  1:25 ` [PATCH v2 17/22] ARM: mm: clean up membank size limit checks Cyril Chemparathy
2012-08-11  1:25 ` [PATCH v2 18/22] ARM: add virt_to_idmap for interconnect aliasing Cyril Chemparathy
2012-08-11  1:25 ` [PATCH v2 19/22] ARM: recreate kernel mappings in early_paging_init() Cyril Chemparathy
2012-08-11  1:25 ` [RFC v2 21/22] ARM: keystone: enable SMP on Keystone machines Cyril Chemparathy
2012-08-11  1:25 ` [RFC v2 22/22] ARM: keystone: add switch over to high physical address range Cyril Chemparathy
2012-08-11  1:26 ` [RFC v2 20/22] ARM: keystone: introducing TI Keystone platform Cyril Chemparathy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50283BF5.1050908@ti.com \
    --to=cyril@ti$(echo .)com \
    --cc=linux-arm-kernel@lists$(echo .)infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox