From: Roopa Prabhu <roopa@cumulusnetworks•com>
To: Jamal Hadi Salim <jhs@mojatatu•com>
Cc: John Fastabend <john.fastabend@gmail•com>,
	Hubert Sokolowski <h.sokolowski@wit•edu.pl>,
	"netdev@vger•kernel.org" <netdev@vger•kernel.org>,
	Vlad Yasevich <vyasevic@redhat•com>
Subject: Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
Date: Tue, 16 Dec 2014 21:51:05 -0800
Message-ID: <549119C9.3080504@cumulusnetworks.com>
In-Reply-To: <54902E5E.2070405@mojatatu.com>

On 12/16/14, 5:06 AM, Jamal Hadi Salim wrote:
> On 12/15/14 19:45, John Fastabend wrote:
>> On 12/15/2014 06:29 AM, Jamal Hadi Salim wrote:
>
>>
>> hmm good question. When I implemented this on the host nics with SR-IOV,
>> VMDQ, etc., the multi/unicast addresses were propagated into the FDB by
>> the driver.
>
> So if i understand correctly, this is a NIC with an FDB. And there is no
> concept of a bridge to which it is attached. To the point of
> classical uni/multicast addresses on a netdev abstraction; these
> are typically stored in *much simpler tables* (used to be IO
> registers back in the day)
> Do these NICs not have such a concept?
> An fdb entry has an egress port column; I have seen cases where the
> port is labeled as "Cpu port" which would mean it belongs to the host;
> but in this case it just seems there is no such concept and as Or
> brought up in another email - what does "VLANid" mean in such a case?
> If we go with a CPU port concept,
> We could then use the concept of a vlan filter on a port basis
> but then what happens when you don't have an fdb (majority of cases)?
>
>> My logic was if some netdev ethx has a set of MAC addresses
>> above it well then any virtual function or virtual device also behind
>> the hardware shouldn't be sending those addresses out the egress switch
>> facing port. Otherwise the switch will see packets it knows are behind
>> that port and drop them. Or flood them if it hasn't learned the address
>> yet. Either way they will never get to the right netdev.
>>
>> Admittedly I wasn't thinking about switches with many ports at the time.
>>
>
> I often struggle with trying to "box" SRIOV into some concept of a
> switch abstraction and sometimes i am puzzled.
> Would exposing the SRIOV underlay as a switch not have solved this
> problem? Then the virtual ports essentially are bridge ports.
> Maybe what we need is a concept of a "edge relay" extended netdev?
> These things would have an fdb as well down and uplink relay ports that
> can be attached to them.
>
>
>>> Some of these drivers may be just doing the LinuxWay(aka cutnpaste what
>>> the other driver did).
>>
>> My original thinking here was... if it didn't implement fdb_add, fdb_del
>> and fdb_dump then if you wanted to think of it as having a forwarding
>> database that was fine but it was really just a two port mac relay. In
>> which case just dump all the mac addresses it knows about. In this case
>> if it was something more fancy it could do its own dump like vxlan or
>> macvlan.
>>
>
> The challenge here is lack of separation between a NICs uni/multicast
> ports which it owns - which is a traditional operation regardless of
> what capabilities the NIC has; vs an fdb which may have many
> other capabilities. Probably all NICs capable of many MACs implement
> fdbs?
>
>> For a host nic ucast/multicast and fdb are the same, I think? The
>> code we had was just short-hand to allow the common case a host nic
>> to work. Notice vxlan and bridge drivers didn't dump their addr lists
>> from fdb_dump until your patch.
>>
>> Perhaps my implementation of macvlan fdb_{add|del|dump} is buggy. And
>> I shouldn't overload the addr lists.
>>
>
> Not just those - I am wondering about the general utility of what
> Hubert was trying to do if all the driver does is call the default
> dumper based on some flags presence and the default dumper
> does a dump of uni/multicast host entries. Those are not really fdb
> entries in the traditional sense.
> Is there no way to get the unicast/multicast mac addresses for such
> a driver?
> I think that would help bring clarity to my confusion.
>
>
>>
>> I'm interested to see what Vlad says as well. But the current situation
>> is previously some drivers dumped their addr lists others didn't.
>> Specifically, the more switch like devices (bridge, vxlan) didn't. Now
>> every device will dump the addr lists. I'm not entirely convinced that
>> is correct.
>>
>
> I am glad this happened ;-> Otherwise we wouldn't be having this
> discussion. When Vlad was asking me I was in a rush to get the patch
> out and didn't question because i thought this was something some crazy
> virtualization people needed.
> If Vlad's use case goes away, then Hubert's little restoration is fine.
>
>
>> It works OK for host nics (NICS that can't forward between ports) and
>> seems at best confusing for real switch asics.
>
> So if these NICs have fdb entries and i programmed it (meaning setting
> which port a given MAC should be sent to), would it not work?
>
>> On a related question do
>> you expect the switch asic to trap any packets with MAC addresses in
>> the multi/unicast address lists and send them to the correct netdev? Or
>> will the switch forward them using normal FDB tables?
>>
>
> I think there would be a separate table for that. Roopa, can you check
> with the ASICs you guys work on? 
Jamal, yes, AFAICS, we do have a separate table where we add some static
entries indicating send to CPU (for example, IPv4 and IPv6 link-local
multicast), and such packets are sent to the correct netdev.
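
To make the lookup order concrete, here is a rough Python model of the
behaviour described above: a static trap table is consulted first, and only
on a miss does the normal FDB decide the egress port. This is a sketch, not
taken from any ASIC SDK or driver. The names TRAP_TABLE, CPU_PORT and
classify() are hypothetical, and the three entries are just well-known
link-local multicast MACs chosen as examples.

```python
from typing import Dict, Optional

CPU_PORT = "cpu"  # hypothetical label meaning "deliver to the host"

# Static trap entries (example contents only): destination MAC -> reason.
TRAP_TABLE: Dict[str, str] = {
    "01:00:5e:00:00:01": "IPv4 all-hosts (224.0.0.1)",
    "33:33:00:00:00:01": "IPv6 all-nodes (ff02::1)",
    "33:33:00:00:00:02": "IPv6 all-routers (ff02::2)",
}


def classify(dst_mac: str, fdb: Dict[str, str]) -> Optional[str]:
    """Return the egress port for dst_mac, or None if the frame would flood.

    The trap table wins over the FDB, so trapped addresses always reach
    the host regardless of what has been learned or programmed.
    """
    mac = dst_mac.lower()
    if mac in TRAP_TABLE:
        return CPU_PORT
    return fdb.get(mac)
```

With an FDB of {"00:11:22:33:44:55": "swp1"}, classify() sends
33:33:00:00:00:01 to the CPU port, 00:11:22:33:44:55 to swp1, and returns
None for an unknown unicast address.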

> The point i was trying to make above
> is today there is a uni/multicast list or table of sorts that all NICs
> expose.
> There's always the hack of a "cpu port". I have also seen the "cpu port"
> being conceptualized in L3 tables to imply "next hop is cpu" where you
> have an IP address owned by the host; so maybe we need a concept of a
> cpu port or again the revival of TheThing class device.
>
> cheers,
> jamal
>
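
For reference, the dump rule named in the subject line, together with
John's description of the default dumper (a device without its own
fdb_dump just reports the MAC addresses it knows about), can be modelled
roughly as below. This is a Python sketch of the dispatch only, not kernel
code: the Netdev class, its uc/mc lists and both functions are hypothetical
stand-ins for the netdev address lists and the ndo_fdb_dump /
ndo_dflt_fdb_dump callbacks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Netdev:
    """Hypothetical stand-in for a netdev with unicast/multicast lists."""
    name: str
    uc: List[str] = field(default_factory=list)
    mc: List[str] = field(default_factory=list)
    # Optional driver-specific dumper, standing in for ndo_fdb_dump.
    fdb_dump: Optional[Callable[["Netdev"], List[str]]] = None


def dflt_fdb_dump(dev: Netdev) -> List[str]:
    """Default dumper: report the device's own uni/multicast addresses."""
    return dev.uc + dev.mc


def fdb_dump(dev: Netdev) -> List[str]:
    """Use the driver's dumper if it has one, otherwise the default."""
    if dev.fdb_dump is not None:
        return dev.fdb_dump(dev)
    return dflt_fdb_dump(dev)
```

Under this rule a plain host NIC still reports its address lists, while a
switch-like device that supplies its own dumper reports only what that
dumper returns.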
