From: Alexey Kardashevskiy <aik@ozlabs•ru>
To: Benjamin Herrenschmidt <benh@kernel•crashing.org>
Cc: linuxppc-dev@lists•ozlabs.org,
	Alex Williamson <alex.williamson@redhat•com>,
	Paul Mackerras <paulus@samba•org>,
	David Gibson <david@gibson•dropbear.id.au>
Subject: Re: [PATCH] powerpc-powernv: align BARs to PAGE_SIZE on powernv platform
Date: Wed, 05 Sep 2012 15:27:13 +1000
Message-ID: <5046E2B1.8010805@ozlabs.ru>
In-Reply-To: <1346822276.2257.46.camel@pasglop>

On 05/09/12 15:17, Benjamin Herrenschmidt wrote:
> On Tue, 2012-09-04 at 22:57 -0600, Alex Williamson wrote:
>
>> Do we need an extra region info field, or is it sufficient that we
>> define a region to be mmap'able with getpagesize() pages when the MMAP
>> flag is set and simply offset the region within the device fd?  ex.
>
> Alexey ? You mentioned you had ways to get at the offset with the
> existing interfaces ?


Yes: the VFIO_DEVICE_GET_REGION_INFO ioctl of the vfio-pci host driver
fills in an "info" struct which has an "offset" field.
I just do not have anywhere to use it in QEMU right now, as the guest
happens to do the same allocation as the host does (by accident).
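
A rough sketch of how that "offset" could be consumed if it carried the
sub-page offset, as in the "BAR1: 0x21000" example from this thread. This
is not QEMU code. It assumes the struct vfio_region_info layout from
<linux/vfio.h> (argsz, flags, index, one reserved word, then 64-bit size
and offset); split_region_offset and the hand-packed buffer are made up
for the example, and the ioctl call itself is left out.

```python
import struct

# Assumed layout of struct vfio_region_info (<linux/vfio.h>):
# __u32 argsz, flags, index, reserved; __u64 size, offset.
REGION_INFO = struct.Struct("=IIIIQQ")
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2


def split_region_offset(info_bytes, page_size):
    """Return (mmap_offset, offset_in_page, mmapable) for one region.

    info_bytes stands for the buffer filled in by
    VFIO_DEVICE_GET_REGION_INFO.
    """
    _argsz, flags, _index, _resv, _size, offset = REGION_INFO.unpack(info_bytes)
    mmap_offset = offset & ~(page_size - 1)    # page-aligned offset for mmap()
    offset_in_page = offset & (page_size - 1)  # where the BAR starts in that page
    mmapable = bool(flags & VFIO_REGION_INFO_FLAG_MMAP)
    return mmap_offset, offset_in_page, mmapable


# Hand-packed stand-in for what the kernel might return for "BAR1: 0x21000".
example = REGION_INFO.pack(REGION_INFO.size, VFIO_REGION_INFO_FLAG_MMAP,
                           1, 0, 0x1000, 0x21000)
```

With 64K pages, split_region_offset(example, 0x10000) splits 0x21000 into
a mapping at 0x20000 and a 0x1000 offset inside it.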


>> BAR0: 0x10000 /* no offset */
>> BAR1: 0x21000 /* 4k offset */
>> BAR2: 0x32000 /* 8k offset */
>>
>> A second level optimization might make these 0x10000, 0x11000, 0x12000.
>>
>> This will obviously require some arch hooks w/in vfio as we can't do
>> this on x86 since we can't guarantee that whatever lives in the
>> overflow/gaps is in the same group and power is going to need to make
>> sure we don't accidentally allow msix table mapping... in fact hiding
>> the msix table might be a lot more troublesome on 64k page hosts.
>
> Fortunately, our guests don't access the msix table directly anyway, at
> least most of the time :-)


Not at all in our case. It took me some time to push a QEMU patch which
changes the msix table :)


> There's a paravirt API for it, and our iommu
> makes sure that if for some reason the guest still accesses it and does
> the wrong thing to it, the side effects will be contained to the guest.

>>> Now the main problem here is going to be that the guest itself might
>>> reallocate the BAR and move it around (well, its version of the BAR
>>> which isn't the real thing), and so we cannot create a direct MMU
>>> mapping between -that- and the real BAR.
>>>
>>> IE. We can only allow that direct mapping if the guest BAR mapping has
>>> the same "offset within page" as the host BAR mapping.
>>
>> Euw...
>
> Yeah sucks :-) Basically, let's say page size is 64K. Host side BAR
> (real BAR) is at 0xf0001000.
>
> qemu maps 0xf0000000..0xf000ffff to a virtual address inside QEMU,
> itself 64k aligned, let's say 0x80000000 and knows that the BAR is at
> offset 0x1000 in there.
>
> However, the KVM "MR" API is such that we can only map PAGE_SIZE regions
> into the guest as well, so if the guest assigns a value ADDR to the
> guest BAR, let's say 0x40002000, all KVM can do is an MR that maps
> 0x40000000 (guest physical) to 0x80000000 (qemu). Any access within that
> 64K page will have the low bits transferred directly from guest to HW.
>
> So the guest will end up having that 0x2000 offset instead of the 0x1000
> needed to actually access the BAR. FAIL.
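
To spell out the arithmetic with the numbers above (a small sketch only;
the helper names are invented and are not anything in KVM or QEMU):

```python
PAGE_SIZE = 0x10000  # 64K host pages, as in the example above


def offset_in_page(addr, page_size=PAGE_SIZE):
    """Low bits of addr, i.e. its position inside its page."""
    return addr & (page_size - 1)


def hw_address_seen(guest_addr, host_bar, page_size=PAGE_SIZE):
    """Hardware address hit when the guest touches guest_addr.

    Assumes the guest page holding guest_addr is mapped page-for-page
    onto the host page holding host_bar, so the low bits pass through
    unchanged from guest to hardware.
    """
    host_page = host_bar & ~(page_size - 1)
    return host_page | offset_in_page(guest_addr, page_size)


def can_map_directly(guest_bar, host_bar, page_size=PAGE_SIZE):
    """A direct mapping only works if both BARs share the in-page offset."""
    return offset_in_page(guest_bar, page_size) == offset_in_page(host_bar, page_size)
```

With guest BAR 0x40002000 and host BAR 0xf0001000, the first guest access
lands on 0xf0002000 instead of 0xf0001000, so can_map_directly() is false;
a guest BAR at 0x40001000 would be fine.
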
>
> There are ways to fix that but all are nasty.
>
>   - In theory, we have the capability (and use it today) to restrict IO
> mappings in the guest to 4K HW pages, so knowing that, KVM could use a
> "special" MR that plays tricks here... but that would break all sorts of
> generic code both in qemu and kvm and generally be very nasty.
>
>   - The best approach is to rely on the fact that our guest kernels don't
> do BAR assignment, they rely on FW to do it (ie not at all, unlike x86,
> we can't even fixup because in the general case, the hypervisor won't
> let us anyway). So we could move our guest BAR allocation code out of
> our guest firmware (SLOF) back into qemu (where we had it very early
> on), which allows us to make sure that the guest BAR values we assign
> have the same "offset within the page" as the host side values. This
> would also allow us to avoid messing up too many MRs (this can have a
> performance impact with KVM) and eventually handle our "group" regions
> instead of individual BARs for mappings. We might need to do that anyway
> in the long run for hotplug as our hotplug hypervisor APIs also rely on
> the "new" hotplugged devices to have the BARs pre-assigned when they get
> handed out to the guest.
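
If the allocation did move into qemu, picking a guest BAR address with a
matching offset is cheap. A minimal sketch, assuming a hypothetical
pick_guest_bar helper and a single MMIO window start; it ignores BAR size
and overlap with other BARs, which a real allocator has to handle:

```python
def pick_guest_bar(host_bar, window_start, page_size=0x10000):
    """Lowest address >= window_start whose offset within the page
    matches that of host_bar."""
    want = host_bar & (page_size - 1)        # in-page offset to preserve
    base = window_start & ~(page_size - 1)   # page containing window_start
    candidate = base | want
    if candidate < window_start:
        # The matching slot in this page is already behind us,
        # so take the same offset in the next page.
        candidate += page_size
    return candidate
```

For the host BAR at 0xf0001000 and a guest window starting at 0x40000000
this gives 0x40001000; starting at 0x40002000 it gives 0x40011000.
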
>
>>> Our guests don't mess with BARs but SLOF does ... it's really tempting
>>> to look into bringing the whole BAR allocation back into qemu and out of
>>> SLOF :-( (We might have to if we ever do hotplug anyway). That way qemu
>>> could set offsets that match appropriately.
>>
>> BTW, as I mentioned elsewhere, I'm on vacation this week, but I'll try
>> to keep up as much as I have time for.
>
> No worries,


-- 
Alexey
