From: Jesper Dangaard Brouer <brouer@redhat•com>
To: Eric Dumazet <eric.dumazet@gmail•com>
Cc: Rick Jones <rick.jones2@hpe•com>,
	netdev@vger•kernel.org, Saeed Mahameed <saeedm@mellanox•com>,
	Tariq Toukan <tariqt@mellanox•com>,
	Achiad Shochat <achiad@mellanox•com>,
	brouer@redhat•com
Subject: Re: [WIP] net+mlx4: auto doorbell
Date: Wed, 30 Nov 2016 23:30:15 +0100
Message-ID: <20161130233015.3de95356@redhat.com>
In-Reply-To: <1480534200.18162.203.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 30 Nov 2016 11:30:00 -0800
Eric Dumazet <eric.dumazet@gmail•com> wrote:

> On Wed, 2016-11-30 at 20:17 +0100, Jesper Dangaard Brouer wrote:
> 
> > Don't take it as critique, Eric.  I was hoping your patch would have
> > solved this issue of being sensitive to TX completion adjustments.  You
> > usually have good solutions for difficult issues. I basically rejected
> > Achiad's approach/patch because it was too sensitive to these kinds of
> > adjustments.
> 
> Well, this patch can hurt latencies, because a doorbell can be delayed,
> and softirqs can be delayed by many hundreds of usec in some cases.
> 
> I would not enable this behavior by default.

What about another scheme, where dev_hard_start_xmit() can return an
indication that the driver chose not to flush (based on TX queue depth),
thereby requesting that the stack call flush at a later point?

Would that introduce fewer latency issues?
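
To make the scheme concrete, here is a minimal userspace toy model of it.
This is a sketch under stated assumptions, not kernel code: every toy_*
name, the TOY_DEFER_MAX threshold, and the "defer until N descriptors are
unflushed" policy are hypothetical and only illustrate a driver returning a
"flush me later" code while the stack issues one flush at the end of its
run.

```c
#include <stddef.h>

/* Hypothetical return codes, loosely modeled on enum netdev_tx. */
enum toy_tx { TOY_TX_OK = 0x00, TOY_TX_FLUSHME = 0x04 };

/* Assumed policy: the driver rings by itself once this many are unflushed. */
#define TOY_DEFER_MAX 4

struct toy_txq {
	unsigned int posted;    /* descriptors written to the TX ring */
	unsigned int rung;      /* descriptors the NIC has been told about */
	unsigned int doorbells; /* number of (expensive) doorbell writes */
};

struct toy_qdisc {
	struct toy_txq *flush_txq; /* txq with a pending flush, or NULL */
};

/* Ring the doorbell only if there is something the NIC has not seen. */
static void toy_flush(struct toy_txq *txq)
{
	if (txq->rung != txq->posted) {
		txq->rung = txq->posted;
		txq->doorbells++;
	}
}

/* Driver side: post one descriptor and decide whether to defer the flush. */
static enum toy_tx toy_xmit(struct toy_txq *txq)
{
	txq->posted++;
	if (txq->posted - txq->rung >= TOY_DEFER_MAX) {
		toy_flush(txq);
		return TOY_TX_OK;
	}
	return TOY_TX_FLUSHME; /* ask the stack to flush later */
}

/* Stack side: send a burst, then flush once at the end of the run. */
static unsigned int toy_qdisc_run(struct toy_qdisc *q, struct toy_txq *txq,
				  unsigned int npkts)
{
	unsigned int i;

	for (i = 0; i < npkts; i++)
		if (toy_xmit(txq) == TOY_TX_FLUSHME)
			q->flush_txq = txq;

	if (q->flush_txq) {
		struct toy_txq *pending = q->flush_txq;

		q->flush_txq = NULL;
		toy_flush(pending);
	}
	return txq->doorbells;
}
```

In this model, a burst of 10 packets on zero-initialized structures costs
3 doorbell writes instead of 10, and nothing is left unflushed when the run
ends. The latency concern is visible too: a packet that returns
TOY_TX_FLUSHME is not seen by the NIC until a later packet or the end of
the run triggers the flush.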


Patch mockup (not even compile tested):

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ffcd874cc20..d7d15e4e6766 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -109,6 +109,7 @@ enum netdev_tx {
 	__NETDEV_TX_MIN	 = INT_MIN,	/* make sure enum is signed */
 	NETDEV_TX_OK	 = 0x00,	/* driver took care of packet */
 	NETDEV_TX_BUSY	 = 0x10,	/* driver tx path was busy*/
+	NETDEV_TX_FLUSHME= 0x04,	/* driver request doorbell/flush later */
 };
 typedef enum netdev_tx netdev_tx_t;
 
@@ -536,6 +537,8 @@ enum netdev_queue_state_t {
 	__QUEUE_STATE_DRV_XOFF,
 	__QUEUE_STATE_STACK_XOFF,
 	__QUEUE_STATE_FROZEN,
+	// __QUEUE_STATE_NEED_FLUSH
+	// is it better to store in txq state?
 };
 
 #define QUEUE_STATE_DRV_XOFF	(1 << __QUEUE_STATE_DRV_XOFF)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672..7480e44c5a50 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -75,6 +75,7 @@ struct Qdisc {
 	void			*u32_node;
 
 	struct netdev_queue	*dev_queue;
+	struct netdev_queue	*flush_dev_queue; // store txq to flush here?
 
 	struct gnet_stats_rate_est64	rate_est;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
@@ -98,6 +99,20 @@ struct Qdisc {
 	spinlock_t		busylock ____cacheline_aligned_in_smp;
 };
 
+static inline void qdisc_request_txq_flush(struct Qdisc *qdisc,
+					   struct netdev_queue *txq)
+{
+	struct net_device *dev = qdisc_dev(qdisc);
+
+	if (qdisc->flush_dev_queue) {
+		if (likely(qdisc->flush_dev_queue == txq))
+			return;
+		/* Flush existing txq before reassignment */
+		dev_flush_xmit(dev, qdisc->flush_dev_queue);
+	}
+	qdisc->flush_dev_queue = txq;
+}
+
 static inline bool qdisc_is_running(const struct Qdisc *qdisc)
 {
 	return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
@@ -117,6 +132,19 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
 {
+	/* flush device txq here, if needed */
+	if (qdisc->flush_dev_queue) {
+		struct netdev_queue *txq = qdisc->flush_dev_queue;
+		struct net_device *dev = qdisc_dev(qdisc);
+
+		qdisc->flush_dev_queue = NULL;
+		dev_flush_xmit(dev, txq);
+		/*
+		 * DISCUSS: is it too soon to flush here? What about
+		 * rescheduling a NAPI poll cycle for this device,
+		 * before calling flush?
+		 */
+	}
 	write_seqcount_end(&qdisc->running);
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 048b46b7c92a..70339c267f33 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2880,6 +2880,15 @@ netdev_features_t netif_skb_features(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_skb_features);
 
+static void dev_flush_xmit(struct net_device *dev,
+			   struct netdev_queue *txq)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	ops->ndo_flush_xmit(dev, txq);
+	// Oh oh, do we need to take HARD_TX_LOCK ??
+}
+
 static int xmit_one(struct sk_buff *skb, struct net_device *dev,
 		    struct netdev_queue *txq, bool more)
 {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6cfb6e9038c2..55c01b6f6311 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -190,6 +190,13 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */
+		if (ret == NETDEV_TX_FLUSHME) {
+			/* Driver chose to skip the TX doorbell MMIO write.
+			 * This made taking the qdisc root_lock less expensive.
+			 */
+			qdisc_request_txq_flush(q, txq);
+			// Flush happens later in qdisc_run_end()
+		}
 		ret = qdisc_qlen(q);
 	} else {
 		/* Driver returned NETDEV_TX_BUSY - requeue skb */

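The hunk above only reaches the NETDEV_TX_FLUSHME check because 0x04 counts
as "skb consumed". The sketch below models that check in userspace, assuming
dev_xmit_complete() compares the return code against NET_XMIT_MASK (0x0f);
the toy_* names are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdbool.h>

/* Assumed to mirror NET_XMIT_MASK: codes below it mean "skb consumed". */
#define TOY_NET_XMIT_MASK 0x0f

/* Hypothetical mirror of enum netdev_tx with the proposed new value. */
enum toy_netdev_tx {
	TOY_NETDEV_TX_OK      = 0x00, /* driver took care of packet */
	TOY_NETDEV_TX_FLUSHME = 0x04, /* driver requests doorbell/flush later */
	TOY_NETDEV_TX_BUSY    = 0x10, /* driver tx path was busy */
};

/* Modeled on dev_xmit_complete(): true when the skb must not be requeued. */
static bool toy_xmit_complete(int rc)
{
	return rc < TOY_NET_XMIT_MASK;
}
```

Under that assumption, toy_xmit_complete() is true for both
TOY_NETDEV_TX_OK and TOY_NETDEV_TX_FLUSHME and false for TOY_NETDEV_TX_BUSY,
so a deferred-flush return is never requeued. Any other caller that compares
the driver's return value directly against NETDEV_TX_OK would need auditing
before such a code could be introduced.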

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
