public inbox for linux-arm-kernel@lists.infradead.org 
 help / color / mirror / Atom feed
* [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio
       [not found]   ` <CA+55aFzKU0GWqpQo2AzAawoVm=qPaJiwKTcifeYApyZP_Hj3nQ@mail.gmail.com>
@ 2011-12-23 11:35     ` Ohad Ben-Cohen
  2011-12-24  3:01       ` Rusty Russell
  2011-12-29  4:31       ` Rusty Russell
  0 siblings, 2 replies; 4+ messages in thread
From: Ohad Ben-Cohen @ 2011-12-23 11:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Dec 23, 2011 at 11:35 AM, Linus Torvalds
<torvalds@linux-foundation•org> wrote:
> On x86, there really is never any reason to use the heavy memory
> barriers unless you are talking to a real device. And last I saw,
> "virtio" was still about virtual IO.

I reported this originally, so maybe I should describe our use case a
bit here (it's not virtio-mmio). It's a bit long, so apologizes in
advance.

Almost every SoC today have several additional cores (DSP or whatnot)
which usually employ some hardware multimedia accelerators and are
used to offload cpu-intensive tasks from the main application
processor. These other cores are used in an asymmetric multiprocessing
configurations, i.e. they run their own instance of operating system
(which can be some flavor of RTOS, or Linux, or whatnot. anything
goes).

Virtually every SoC vendor have this (lots of ARM vendors, but it's
definitely not limited to ARM), and they all have their own way of
controlling, and communicating with, those remote cores. And it's
usually rather big (tens of thousands loc), out-of-tree and very
vendor-specific code.

So we're trying to fix this by introducing some generic code that'd
control those remote cores and let drivers talk to them, which all
vendors could then use. I'll be sending you a 3.3 pull request for
this, but you can already take a look in linux-next at drivers/rpmsg
(inter-processor communication bus) and drivers/remoteproc (framework
for booting a remote core).

And rpmsg is using virtio to avoid implementing another shared memory
"wire" protocol. And of course to be able to reuse all the existing
virtio drivers (e.g. net, block, console) with a remote core backend.

Which leads me to the specific issue we have. On OMAP4, the virtio
kick is implemented using a memory-mapped mailbox device. After
updating a vring (which is mapped using ARM's Normal memory) and
before kicking the remote core (using the mailbox device which is
mapped using ARM's Device memory) we must use a "heavy" memory barrier
(i.e. ARM's DSB). Otherwise, if only an smp memory barrier is used
(i.e. DMB on ARM), the kick might jump ahead before the remote core
has observed the updates to the vrings. And then bad things happen.

We didn't want to inflict the performance degradation on the
virtualization use cases (which can run concurrently with the remote
core scenarios), hence the dynamic "IO or SMP barrier" thingy.

(Btw we don't have enough information about other
setups/configurations as other vendors just begin using virtio for
these kind of scenarios, but we guess this is probably isn't limited
to OMAP. Fixing it at the transport layer sounds reasonable, although
there are other ways to do this too).

Hope this makes sense,

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio
  2011-12-23 11:35     ` [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio Ohad Ben-Cohen
@ 2011-12-24  3:01       ` Rusty Russell
  2011-12-29  4:31       ` Rusty Russell
  1 sibling, 0 replies; 4+ messages in thread
From: Rusty Russell @ 2011-12-24  3:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 23 Dec 2011 13:35:26 +0200, Ohad Ben-Cohen <ohad@wizery•com> wrote:
> On Fri, Dec 23, 2011 at 11:35 AM, Linus Torvalds
> <torvalds@linux-foundation•org> wrote:
> > On x86, there really is never any reason to use the heavy memory
> > barriers unless you are talking to a real device. And last I saw,
> > "virtio" was still about virtual IO.

Not any more, as the commit message describes.

But I missed that they're using rpmsg, which isn't merged yet.

Sorry for the noise,
Rusty.
PS. Switching barriers by bus type is too crude anyway, let's come up
    with something better...

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio
  2011-12-23 11:35     ` [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio Ohad Ben-Cohen
  2011-12-24  3:01       ` Rusty Russell
@ 2011-12-29  4:31       ` Rusty Russell
  2011-12-29  6:53         ` Ohad Ben-Cohen
  1 sibling, 1 reply; 4+ messages in thread
From: Rusty Russell @ 2011-12-29  4:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 23 Dec 2011 13:35:26 +0200, Ohad Ben-Cohen <ohad@wizery•com> wrote:
> On Fri, Dec 23, 2011 at 11:35 AM, Linus Torvalds
> And rpmsg is using virtio to avoid implementing another shared memory
> "wire" protocol. And of course to be able to reuse all the existing
> virtio drivers (e.g. net, block, console) with a remote core backend.

OK, I've applied this patch; you might want to carry it in your tree
too.  You'll need to fix up your call to vring_new_virtqueue()....

From: Rusty Russell <rusty@rustcorp•com.au>
Subject: virtio: harsher barriers for rpmsg.

We were cheating with our barriers; using the smp ones rather than the
real device ones.  That was fine, until rpmsg came along, which is
used to talk to a real device (a non-SMP CPU).

Unfortunately, just putting back the real barriers (reverting
d57ed95d) causes a performance regression on virtio-pci.  In
particular, Amos reports netbench's TCP_RR over virtio_net CPU
utilization increased up to 35% while throughput went down by up to
14%.

By comparison, this branch is in the noise.

Reference: https://lkml.org/lkml/2011/12/11/22

Signed-off-by: Rusty Russell <rusty@rustcorp•com.au>
---
 drivers/lguest/lguest_device.c |    8 +++++---
 drivers/s390/kvm/kvm_virtio.c  |    2 +-
 drivers/virtio/virtio_mmio.c   |    4 ++--
 drivers/virtio/virtio_pci.c    |    4 ++--
 drivers/virtio/virtio_ring.c   |   34 +++++++++++++++++++++-------------
 include/linux/virtio_ring.h    |    1 +
 tools/virtio/linux/virtio.h    |    1 +
 tools/virtio/virtio_test.c     |    3 ++-
 8 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -292,10 +292,12 @@ static struct virtqueue *lg_find_vq(stru
 
 	/*
 	 * OK, tell virtio_ring.c to set up a virtqueue now we know its size
-	 * and we've got a pointer to its pages.
+	 * and we've got a pointer to its pages.  Note that we set weak_barriers
+	 * to 'true': the host just a(nother) SMP CPU, so we only need inter-cpu
+	 * barriers.
 	 */
-	vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN,
-				 vdev, lvq->pages, lg_notify, callback, name);
+	vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN, vdev,
+				 true, lvq->pages, lg_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto unmap;
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -198,7 +198,7 @@ static struct virtqueue *kvm_find_vq(str
 		goto out;
 
 	vq = vring_new_virtqueue(config->num, KVM_S390_VIRTIO_RING_ALIGN,
-				 vdev, (void *) config->address,
+				 vdev, true, (void *) config->address,
 				 kvm_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -310,8 +310,8 @@ static struct virtqueue *vm_setup_vq(str
 			vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
 
 	/* Create the vring */
-	vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN,
-				 vdev, info->queue, vm_notify, callback, name);
+	vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
+				 true, info->queue, vm_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto error_new_virtqueue;
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -414,8 +414,8 @@ static struct virtqueue *setup_vq(struct
 		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
 	/* create the vring */
-	vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN,
-				 vdev, info->queue, vp_notify, callback, name);
+	vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
+				 true, info->queue, vp_notify, callback, name);
 	if (!vq) {
 		err = -ENOMEM;
 		goto out_activate_queue;
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -28,17 +28,20 @@
 #ifdef CONFIG_SMP
 /* Where possible, use SMP barriers which are more lightweight than mandatory
  * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio does not use). */
-#define virtio_mb() smp_mb()
-#define virtio_rmb() smp_rmb()
-#define virtio_wmb() smp_wmb()
+ * through relaxed memory I/O windows (which virtio-pci does not use). */
+#define virtio_mb(vq) \
+	do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
+#define virtio_rmb(vq) \
+	do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
+#define virtio_wmb(vq) \
+	do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
 #else
 /* We must force memory ordering even if guest is UP since host could be
  * running on another CPU, but SMP barriers are defined to barrier() in that
  * configuration. So fall back to mandatory barriers instead. */
-#define virtio_mb() mb()
-#define virtio_rmb() rmb()
-#define virtio_wmb() wmb()
+#define virtio_mb(vq) mb()
+#define virtio_rmb(vq) rmb()
+#define virtio_wmb(vq) wmb()
 #endif
 
 #ifdef DEBUG
@@ -77,6 +80,9 @@ struct vring_virtqueue
 	/* Actual memory layout for this queue */
 	struct vring vring;
 
+	/* Can we use weak barriers? */
+	bool weak_barriers;
+
 	/* Other side has made a mess, don't try any more. */
 	bool broken;
 
@@ -245,14 +251,14 @@ void virtqueue_kick(struct virtqueue *_v
 	START_USE(vq);
 	/* Descriptors and available array need to be set before we expose the
 	 * new available array entries. */
-	virtio_wmb();
+	virtio_wmb(vq);
 
 	old = vq->vring.avail->idx;
 	new = vq->vring.avail->idx = old + vq->num_added;
 	vq->num_added = 0;
 
 	/* Need to update avail index before checking if we should notify */
-	virtio_mb();
+	virtio_mb(vq);
 
 	if (vq->event ?
 	    vring_need_event(vring_avail_event(&vq->vring), new, old) :
@@ -314,7 +320,7 @@ void *virtqueue_get_buf(struct virtqueue
 	}
 
 	/* Only get used array entries after they have been exposed by host. */
-	virtio_rmb();
+	virtio_rmb(vq);
 
 	i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
 	*len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
@@ -337,7 +343,7 @@ void *virtqueue_get_buf(struct virtqueue
 	 * the read in the next get_buf call. */
 	if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vring_used_event(&vq->vring) = vq->last_used_idx;
-		virtio_mb();
+		virtio_mb(vq);
 	}
 
 	END_USE(vq);
@@ -366,7 +372,7 @@ bool virtqueue_enable_cb(struct virtqueu
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	vring_used_event(&vq->vring) = vq->last_used_idx;
-	virtio_mb();
+	virtio_mb(vq);
 	if (unlikely(more_used(vq))) {
 		END_USE(vq);
 		return false;
@@ -393,7 +399,7 @@ bool virtqueue_enable_cb_delayed(struct 
 	/* TODO: tune this threshold */
 	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
 	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
-	virtio_mb();
+	virtio_mb(vq);
 	if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;
@@ -453,6 +459,7 @@ EXPORT_SYMBOL_GPL(vring_interrupt);
 struct virtqueue *vring_new_virtqueue(unsigned int num,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
+				      bool weak_barriers,
 				      void *pages,
 				      void (*notify)(struct virtqueue *),
 				      void (*callback)(struct virtqueue *),
@@ -476,6 +483,7 @@ struct virtqueue *vring_new_virtqueue(un
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
 	vq->notify = notify;
+	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
 	vq->num_added = 0;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -168,6 +168,7 @@ struct virtqueue;
 struct virtqueue *vring_new_virtqueue(unsigned int num,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
+				      bool weak_barriers,
 				      void *pages,
 				      void (*notify)(struct virtqueue *vq),
 				      void (*callback)(struct virtqueue *vq),
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -214,6 +214,7 @@ void *virtqueue_detach_unused_buf(struct
 struct virtqueue *vring_new_virtqueue(unsigned int num,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
+				      bool weak_barriers,
 				      void *pages,
 				      void (*notify)(struct virtqueue *vq),
 				      void (*callback)(struct virtqueue *vq),
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -92,7 +92,8 @@ static void vq_info_add(struct vdev_info
 	assert(r >= 0);
 	memset(info->ring, 0, vring_size(num, 4096));
 	vring_init(&info->vring, num, info->ring, 4096);
-	info->vq = vring_new_virtqueue(info->vring.num, 4096, &dev->vdev, info->ring,
+	info->vq = vring_new_virtqueue(info->vring.num, 4096, &dev->vdev,
+				       true, info->ring,
 				       vq_notify, vq_callback, "test");
 	assert(info->vq);
 	info->vq->priv = info;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio
  2011-12-29  4:31       ` Rusty Russell
@ 2011-12-29  6:53         ` Ohad Ben-Cohen
  0 siblings, 0 replies; 4+ messages in thread
From: Ohad Ben-Cohen @ 2011-12-29  6:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 29, 2011 at 6:31 AM, Rusty Russell <rusty@rustcorp•com.au> wrote:
> OK, I've applied this patch; you might want to carry it in your tree
> too. ?You'll need to fix up your call to vring_new_virtqueue()....

Will do. Thanks !

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-12-29  6:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <8739cbkgv2.fsf@rustcorp.com.au>
     [not found] ` <CA+55aFyvMkAtXPRxiFR+UpZKGHMhp8Vd9aYg1dkaoKMwD5MLcw@mail.gmail.com>
     [not found]   ` <CA+55aFzKU0GWqpQo2AzAawoVm=qPaJiwKTcifeYApyZP_Hj3nQ@mail.gmail.com>
2011-12-23 11:35     ` [RESENDx2] [PULL] virtio: fix barriers for virtio-mmio Ohad Ben-Cohen
2011-12-24  3:01       ` Rusty Russell
2011-12-29  4:31       ` Rusty Russell
2011-12-29  6:53         ` Ohad Ben-Cohen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox