From: Jiri Pirko <jiri@resnulli•us>
To: John Fastabend <john.fastabend@gmail•com>
Cc: netdev@vger•kernel.org, davem@davemloft•net,
	nhorman@tuxdriver•com, andy@greyhouse•net, tgraf@suug•ch,
	dborkman@redhat•com, ogerlitz@mellanox•com, jesse@nicira•com,
	pshelar@nicira•com, azhou@nicira•com, ben@decadent•org.uk,
	stephen@networkplumber•org, jeffrey.t.kirsher@intel•com,
	vyasevic@redhat•com, xiyou.wangcong@gmail•com,
	john.r.fastabend@intel•com, edumazet@google•com,
	jhs@mojatatu•com, sfeldma@gmail•com, f.fainelli@gmail•com,
	roopa@cumulusnetworks•com, linville@tuxdriver•com,
	jasowang@redhat•com, ebiederm@xmission•com,
	nicolas.dichtel@6wind•com, ryazanov.s.a@gmail•com,
	buytenh@wantstofly•org, aviadr@mellanox•com, nbd@openwrt•org,
	alexei.starovoitov@gmail•com, Neil.Jerram@metaswitch•com,
	ronye@mellanox•com, simon.horman@netronome•com,
	alexander.h.duyck@redhat•com, john.ronciak@intel•com,
	mleitner@redhat•com, shrijeet@gmail•com,
	gospo@cumulusnetworks•com, bcrl@kvac
Subject: Re: [patch net-next v2 02/10] net: introduce generic switch devices support
Date: Tue, 11 Nov 2014 16:11:26 +0100
Message-ID: <20141111151126.GE1825@nanopsycho.lan>
In-Reply-To: <5461354A.3020906@gmail.com>

Mon, Nov 10, 2014 at 10:59:38PM CET, john.fastabend@gmail•com wrote:
>On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- one for getting the physical switch id.
>>
>>Note that the user can use an arbitrary port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli•us>
>>---
>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>>  MAINTAINERS                            |  7 ++++
>>  include/linux/netdevice.h              | 10 ++++++
>>  include/net/switchdev.h                | 30 +++++++++++++++++
>>  net/Kconfig                            |  1 +
>>  net/Makefile                           |  3 ++
>>  net/switchdev/Kconfig                  | 13 ++++++++
>>  net/switchdev/Makefile                 |  5 +++
>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>>  9 files changed, 161 insertions(+)
>>  create mode 100644 Documentation/networking/switchdev.txt
>>  create mode 100644 include/net/switchdev.h
>>  create mode 100644 net/switchdev/Kconfig
>>  create mode 100644 net/switchdev/Makefile
>>  create mode 100644 net/switchdev/switchdev.c
>>
>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>new file mode 100644
>>index 0000000..98be76c
>>--- /dev/null
>>+++ b/Documentation/networking/switchdev.txt
>>@@ -0,0 +1,59 @@
>>+Switch (and switch-ish) device drivers HOWTO
>>+===========================
>>+
>>+Please note that the word "switch" is used here in a very generic sense.
>>+This includes devices supporting L2/L3 but also various flow offloading chips,
>>+including switches embedded into SR-IOV NICs.
>>+
>>+Let's describe a topology a bit. Imagine the following example:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+In this example, there are two independent lines between the switch silicon
>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>+separate from the switch driver. SOME switch chip is managed by a driver
>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>>+connected to some other type of bus.
>>+
>>+Now, for the previous example, here is the representation in the kernel:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  eth0 eth1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+Let's call the example switch driver for SOME switch chip "SOMEswitch". This
>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>+created for each port of a switch. These netdevices are instances
>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>+
>>+The only difference between a switch-port netdevice and an ordinary netdevice
>>+is that it implements a couple more NDOs:
>>+
>>+	ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>>+			       of the same physical switch chip. This is
>>+			       mandatory to be implemented by all switch drivers
>>+			       and serves the caller for recognition of a port
>>+			       netdevice.
>
>What is the connection between ndo_sw_parent_get_id() and
>ndo_get_phys_port_id()? I'm having a bit of trouble teasing
>this out.
>
>For example here is my ascii art for a SR-IOV NIC,
>
>       eth0     eth1     eth2
>        |         |        |
>        |         |        |
>        PF        VF       VF
>   +----+---------+--------+----+
>   |       embedded bridge      |
>   +-------------+--------------+
>                 |
>                port
>
>that can do switching between the various uplinks and downlinks.
>In IEEE 802.1Q language the embedded bridge acts like an edge
>relay. At least that seems to be the current state of the art
>for SR-IOV. Edge relay just means it has a single uplink port
>to the network and multiple downlinks and also isn't required
>to do learning and run loop detection protocols (STP et al.).
>
>Also there are multi-function devices that look the same except
>replace the VFs with PFs. It seems to be a common mode for NICs
>that do the iSCSI offloads with storage functions.
>
>When something is an embedded bridge vs a SOME switch chip is
>not entirely clear.
>
>My understanding is: use ndo_sw_parent_get_id() when you have
>multiple physical ports all connected to a single switch object.
>When you have a single port connected to multiple PCIE functions
>or queues representing a netdev (e.g. macvlan offload) use the
>ndo_get_phys_port_id(). Just want to be sure we are on the
>same page here.

Nod. You described that right.
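
To spell the rule out, here is a rough user-space sketch of how a tool
could tell the two cases apart. It is not taken from the patchset:
struct item_id, struct port_view and the helper names are invented for
illustration, and the IDs are assumed to be opaque byte strings that were
already fetched from the kernel. Two netdevs sit on the same switch when
their switch IDs match; they share one physical port when their phys
port IDs match.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ID_MAX_LEN 32

/* Hypothetical opaque ID; id_len == 0 means "driver does not report it". */
struct item_id {
	unsigned char id[ID_MAX_LEN];
	unsigned char id_len;
};

/* Hypothetical user-space view of one netdev, not a kernel structure. */
struct port_view {
	const char *name;
	struct item_id switch_id;	/* what ndo_sw_parent_get_id would report */
	struct item_id phys_port_id;	/* what ndo_get_phys_port_id would report */
};

/* Build an ID from raw bytes; len == 0 yields an "absent" ID. */
struct item_id make_id(const unsigned char *bytes, size_t len)
{
	struct item_id out = { .id_len = 0 };

	assert(len <= ID_MAX_LEN);
	memcpy(out.id, bytes, len);
	out.id_len = (unsigned char)len;
	return out;
}

/* Absent IDs never compare equal, not even to each other. */
bool id_equal(const struct item_id *a, const struct item_id *b)
{
	return a->id_len != 0 && a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}

/* Multiple physical ports behind one switch object. */
bool same_switch(const struct port_view *a, const struct port_view *b)
{
	return id_equal(&a->switch_id, &b->switch_id);
}

/* Multiple netdevs (PF/VF, macvlan offload) behind one physical port. */
bool same_phys_port(const struct port_view *a, const struct port_view *b)
{
	return id_equal(&a->phys_port_id, &b->phys_port_id);
}
```

With sw0p0 and sw0p1 carrying the same switch ID, same_switch() is true
for them; for the PF/VF netdevs in your SR-IOV picture only
same_phys_port() would be.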


>
>Otherwise patch looks good. I think we can clear the above up
>with an addition to the documentation. Could go in after the
>initial set and be OK with me.
>
>IMO this patch is needed otherwise user space is at a complete
>loss on trying to figure out how netdevs map to switch silicon.
>You could have reused ndo_get_phys_port_id() perhaps but then
>I think user space may get confused by SR-IOV/VMDQ/etc ports
>attached to a switch silicon. For my $.02, having a new distinct
>identifier is cleaner.

It most definitely is. Therefore I went that way.


>
>
>>+	ndo_sw_parent_* - Functions that serve for a manipulation of the switch
>>+			  chip itself (it can be thought of as a "parent" of the
>>+			  port, therefore the name). They are not port-specific.
>>+			  Caller might use arbitrary port netdevice of the same
>>+			  switch and it will make no difference.
>>+	ndo_sw_port_* - Functions that serve for a port-specific manipulation.
>
>[...]
>
>Thanks,
>John
>
>
>-- 
>John Fastabend         Intel Corporation
