public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Amir Vadai <amirv@mellanox•com>
To: John Fastabend <john.r.fastabend@intel•com>
Cc: "David S. Miller" <davem@davemloft•net>, <netdev@vger•kernel.org>,
	Roland Dreier <roland@purestorage•com>,
	Oren Duer <oren@mellanox•com>, Amir Vadai <amirv@mellanox•co.il>,
	Jeff Kirsher <jeffrey.t.kirsher@intel•com>,
	Eilon Greenstein <eilong@broadcom•com>
Subject: Re: [PATCH 7/8] net: support tx_ring per UP in HW based QoS mechanism
Date: Wed, 14 Mar 2012 12:09:30 +0200	[thread overview]
Message-ID: <4F606E5A.6080407@mellanox.com> (raw)
In-Reply-To: <4F5F9087.7090607@intel.com>

On 03/13/2012 08:23 PM, John Fastabend wrote:
> On 3/13/2012 10:22 AM, Amir Vadai wrote:
>> From: Amir Vadai<amirv@mellanox•co.il>
>>
>> The Current HW based QoS mechanism which was introduced in commit 4f57c087de9
>> "net: implement mechanism for HW based QOS" is in orientation to ETS traffic
>> class. This patch introduces an approach which allow to use this mechanism also
>> with hardware who has queues per user priority (UP). After the change,
>> __skb_tx_hash() will direct a flow to a tx ring from a range of tx rings. This
>> range is defined by the caller function by the specific HW. If TC based queues,
>> the range is by TC number and for UP based queues, the range is by UP.
>>
> ETS is one specific use case for mqprio it can easily be used with other
> hardware transmission selection algorithms 802.1Q std based or otherwise.
>
> The mapping is really just an skb->priority to group of queues (qoffset and
> qcount). I happened to call the queue grouping a traffic class because that
> aligns with 802.1Q but it _really_ is just a queue grouping.
This is good for untagged traffic, but for tagged traffic I can see 2 
problems with this approach:
1. mqprio mapping could contradict egress map or 8021Qaz mapping (UP <=> 
TC). This could be solved (not very elegantly) by forcing the mappings 
to be synced.
2. egress map is per vlan, and mqprio mapping is one global mapping.
>
> In your case what would it mean to change the map and num_tc see 'tc':
>
> [root@jf-dev1-dcblab netperf]# tc qdisc add dev eth3 root mqprio help
> Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]
>                    [queues count1@offset1 count2@offset2 ...] [hw 1|0]
>
> For example setting 'num_tc 8 map 0 1 2 3 0 1 2 3' looks like it might
> not work correctly. Would you end up with an skb tagged with priority
> 4,5,6,7 being sent out UP queues 0,1,2,3? My quess is that won't work
> with PFC or your ETS correctly.
I don't see a problem here. For example, skb tagged with priority 5 is 
mapped to UP 1. And sent through one of the tx rings of UP 1. All the 
rings of UP 1 share the same transmission queue (schedule queue) which 
is controlled by PFC and ETS by the HW. What is the problem here?
>
> In the canonical iSCSI case we put iscsid in a net_prio cgroup to get the
> priority set then use the priority to steer the skb to the correct queue
> groupings. In your case I think you can just fail any num_tc != 8 and keep
> the dflt map 1:1 then this should work. What did I miss? It looks like you
> already fail the num_tc != 8 case so why do we need this change?
>
> At most maybe we need a flag to indicate the mqprio map can't be changed in
> some cases.
What you suggest is that the priority in net_prio cgroup will be the 
User Priority, and not just the skb priority?
And also, for tagged traffic, how could it be forced to be synced with 
egress map?
>
>
>> CC: John Fastabend<john.r.fastabend@intel•com>
>> CC: Jeff Kirsher<jeffrey.t.kirsher@intel•com>
>> CC: Eilon Greenstein<eilong@broadcom•com>
>> Signed-off-by: Amir Vadai<amirv@mellanox•co.il>
>> ---
>>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |   11 ++++++++++-
>>   drivers/net/ethernet/mellanox/mlx4/en_tx.c      |    9 +++------
>>   include/linux/netdevice.h                       |   12 +++++++++++-
>>   include/linux/skbuff.h                          |    3 ++-
>>   net/core/dev.c                                  |   10 +---------
>>   5 files changed, 27 insertions(+), 18 deletions(-)
>>
> [...]
>
>>
>>   void bnx2x_set_num_queues(struct bnx2x *bp)
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> index 7a49830..d0d96e3 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> @@ -570,18 +570,15 @@ static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, struct sk_buff *sk
>>
>>   u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
>>   {
>> -	struct mlx4_en_priv *priv = netdev_priv(dev);
>> -	int up = -1;
>> +	int up = 0;
>>
>>   	if (vlan_tx_tag_present(skb))
>>   		up = (vlan_tx_tag_get(skb)>>  13);
> I was trying to avoid logic like this in select_queue().
Why?
>
> Can we get the same behavior by keeping the egress map and mqprio
> map in sync?
As I said above, if we force egress map to be synced to mqprio mapping, 
we loose it's power - mqprio is global, and egress map is per vlan.
>
>>   	else if (dev->num_tc)
>>   		up = netdev_get_prio_tc_map(dev, skb->priority);
>>
>> -	if (up>= 0)
>> -		return MLX4_EN_NUM_TX_RINGS + up;
>> -
>> -	return __skb_tx_hash(dev, skb, MLX4_EN_NUM_TX_RINGS);
>> +	return __skb_tx_hash(dev, skb, MLX4_EN_NUM_TX_RINGS,
>> +			MLX4_EN_NUM_TX_RINGS + up, 1);
>>   }
>>
> Thanks,
> John
Thanks,
Amir

  reply	other threads:[~2012-03-14 10:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-13 17:21 [PATCH 0/8] net/mlx4_en: DCB QoS support Amir Vadai
2012-03-13 17:21 ` [PATCH 1/8] net/mlx4_en: Force user priority by QP attribute Amir Vadai
2012-03-13 17:21 ` [PATCH 2/8] net/mlx4_core: set port QoS attributes Amir Vadai
2012-03-13 17:21 ` [PATCH 3/8] net/mlx4_en: DCB QoS support Amir Vadai
2012-03-13 17:21 ` [PATCH 4/8] net/mlx4_en: Set max rate-limit for a TC Amir Vadai
2012-03-13 18:26   ` John Fastabend
2012-03-14 10:31     ` Amir Vadai
2012-03-13 19:16   ` Dave Taht
2012-03-14 10:42     ` Amir Vadai
2012-03-13 17:22 ` [PATCH 5/8] net/mlx4_en: sk_prio <=> UP for untagged traffic Amir Vadai
2012-03-13 17:22 ` [PATCH 6/8] IB/rdma_cm: TOS <=> UP mapping for IBoE Amir Vadai
2012-03-13 17:22 ` [PATCH 7/8] net: support tx_ring per UP in HW based QoS mechanism Amir Vadai
2012-03-13 18:23   ` John Fastabend
2012-03-14 10:09     ` Amir Vadai [this message]
2012-03-14 21:36       ` John Fastabend
2012-03-15 10:05         ` Amir Vadai
2012-03-16  7:16           ` John Fastabend
2012-03-13 17:22 ` [PATCH 8/8] net/mlx4_en: num cores tx rings for every UP Amir Vadai
2012-03-20 11:29 ` [PATCH 0/8] net/mlx4_en: DCB QoS support Amir Vadai
2012-03-20 19:58   ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F606E5A.6080407@mellanox.com \
    --to=amirv@mellanox$(echo .)com \
    --cc=amirv@mellanox$(echo .)co.il \
    --cc=davem@davemloft$(echo .)net \
    --cc=eilong@broadcom$(echo .)com \
    --cc=jeffrey.t.kirsher@intel$(echo .)com \
    --cc=john.r.fastabend@intel$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=oren@mellanox$(echo .)com \
    --cc=roland@purestorage$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox