public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: roopa <roopa@cumulusnetworks•com>
To: Scott Feldman <sfeldma@gmail•com>
Cc: "stephen@networkplumber•org" <stephen@networkplumber•org>,
	"David S. Miller" <davem@davemloft•net>,
	"Jamal Hadi Salim" <jhs@mojatatu•com>,
	"Jiří Pírko" <jiri@resnulli•us>,
	"Arad, Ronen" <ronen.arad@intel•com>,
	"Thomas Graf" <tgraf@suug•ch>,
	"john fastabend" <john.fastabend@gmail•com>,
	"vyasevic@redhat•com" <vyasevic@redhat•com>,
	Netdev <netdev@vger•kernel.org>,
	"Wilson Kok" <wkok@cumulusnetworks•com>,
	"Andy Gospodarek" <gospo@cumulusnetworks•com>
Subject: Re: [RFC PATCH net-next] bridge: ability to disable forwarding on a port
Date: Sun, 18 Jan 2015 01:10:12 -0800	[thread overview]
Message-ID: <54BB7874.90201@cumulusnetworks.com> (raw)
In-Reply-To: <CAE4R7bBeNojtj3Zyd6+zSJFGRjG4vejBObnS1XUFAfJDZJZYow@mail.gmail.com>

On 1/17/15, 5:05 PM, Scott Feldman wrote:
> On Fri, Jan 16, 2015 at 11:32 PM,  <roopa@cumulusnetworks•com> wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks•com>
>>
>> On a Linux bridge with bridge forwarding offloaded to a switch ASIC,
>> there is a need to not re-forward the frames that come up to the
>> kernel in software.
>>
>> Typically these are broadcast or multicast packets forwarded by the
>> hardware to multiple destination ports including sending a copy of
>> the packet to the kernel (e.g. an arp broadcast).
>> The bridge driver will try to forward the packet again, resulting in
>> two copies of the same packet.
>>
>> These packets can also come up to the kernel for logging when they hit
>> a LOG acl in hardware.
>>
>> This patch makes forwarding a flag on the port similar to
>> learn and flood and drops the packet just before forwarding.
>> (The forwarding disable on a bridge is tested to work on our boxes.
>> The bridge port flag addition is only compile tested.
>> This will need to be further refined to cover cases where a non-switch port
>> is bridged to a switch port etc. We will submit more patches to cover
>> all cases if we agree on this approach).
> Good topic to bring up, thanks for proposing a patch.  There is indeed
> duplicate pkts sent out in the case where both the bridge and the
> offloaded device are flooding these non-unicast pkts, such as ARP
> requests.  We do have per-port control today over unicast flooding
> using BR_FLOOD (IFLA_BRPORT_UNICAST_FLOOD).
>
> As you point out, this doesn't solve the case for non-offloaded ports
> bridged with switch ports.  If this port setting is enabled on an
> offloaded switch port, for example, the non-offloaded port can't get
> an ARP request resolved, if the MAC is behind the offloaded switch
> port.  But do we care?  Is there a use-case for this one, mixing
> offloaded and non-offloaded ports in a bridge?

Not sure. I don't know the use case, but I think I might have heard that 
there could be a case
  where a switch port could be bridged with a vm's port running on the 
switch. (?)
>
>> Other ways to solve the same problem could be to:
>> - use the offload feature flag on these switch ports to avoid the
>> re-forward:
>> https://www.marc.info/?l=linux-netdev&m=141820235010603&w=2
>>
>> - Or the switch driver can mark or set a flag in the skb, which the bridge
>> driver can use to avoid a re-forward.
>>
>> Signed-off-by: Wilson Kok <wkok@cumulusnetworks•com>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks•com>
>> ---
>>   include/linux/if_bridge.h    |    3 ++-
>>   include/uapi/linux/if_link.h |    1 +
>>   net/bridge/br_forward.c      |   13 +++++++++++++
>>   net/bridge/br_if.c           |    2 +-
>>   net/bridge/br_netlink.c      |    4 +++-
>>   net/bridge/br_sysfs_if.c     |    1 +
>>   net/core/rtnetlink.c         |    4 +++-
>>   7 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
>> index 0a8ce76..c79f4eb 100644
>> --- a/include/linux/if_bridge.h
>> +++ b/include/linux/if_bridge.h
>> @@ -40,10 +40,11 @@ struct br_ip_list {
>>   #define BR_ADMIN_COST          BIT(4)
>>   #define BR_LEARNING            BIT(5)
>>   #define BR_FLOOD               BIT(6)
>> -#define BR_AUTO_MASK           (BR_FLOOD | BR_LEARNING)
>>   #define BR_PROMISC             BIT(7)
>>   #define BR_PROXYARP            BIT(8)
>>   #define BR_LEARNING_SYNC       BIT(9)
>> +#define BR_FORWARD             BIT(10)
> The name BR_FORWARD might confuse people thinking this is related to
> STP FORWARDING state.
yes, that thought crossed my mind too. I had BR_FORWARDING initially and 
to make it sound less like
STP changed it to BR_FORWARD. :)

> We have BR_FLOOD for unknown unicast flooding.
> How about renaming BR_FLOOD to BR_FLOOD_UNICAST and add
> BR_FLOOD_BROADCAST?  So you would have:
>
>    IFLA_BRPORT_UNICAST_FLOOD           BR_FLOOD_UNICAST        /* flood
> unknown unicast traffic to port */
>    IFLA_BRPORT_BROADCAST_FLOOD    BR_FLOOD_BROADCAST  /* flood
> bcast/mcast traffic to port */

That's a good idea. So, unknown unicast and broadcast will be covered 
with that.
Am afraid there might be other packets, like the acl LOG packet hitting 
the CPU/kernel (?)
I will check if there are others.
>
>> +#define BR_AUTO_MASK           (BR_FLOOD | BR_LEARNING | BR_FORWARD)
>>
>>   extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
>>
>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> index f7d0d2d..d394625 100644
>> --- a/include/uapi/linux/if_link.h
>> +++ b/include/uapi/linux/if_link.h
>> @@ -246,6 +246,7 @@ enum {
>>          IFLA_BRPORT_UNICAST_FLOOD, /* flood unicast traffic */
>>          IFLA_BRPORT_PROXYARP,   /* proxy ARP */
>>          IFLA_BRPORT_LEARNING_SYNC, /* mac learning sync from device */
>> +       IFLA_BRPORT_FORWARD,    /* enable forwarding on a device */
>>          __IFLA_BRPORT_MAX
>>   };
>>   #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
>> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
>> index f96933a..98c41c8 100644
>> --- a/net/bridge/br_forward.c
>> +++ b/net/bridge/br_forward.c
>> @@ -81,10 +81,23 @@ static void __br_deliver(const struct net_bridge_port *to, struct sk_buff *skb)
>>                  br_forward_finish);
>>   }
>>
>> +int br_hw_forward_finish(struct sk_buff *skb)
>> +{
>> +       kfree_skb(skb);
>> +
>> +       return 0;
>> +}
>> +
>>   static void __br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
>>   {
>>          struct net_device *indev;
>>
>> +       if (!(to->flags & BR_FORWARD)) {
>> +               NF_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD, skb, skb->dev, to->dev,
>> +                       br_hw_forward_finish);
>> +               return;
>> +       }
>> +
> Seems you should make the (flags & BR_FORWARD) check earlier, before
> skb cloning, in br_flood(), alongside the (flags & BR_FLOOD) check.
This patch strategically places it in __br_forward to catch all forwards 
(due to floods or no floods ..with direct call to br_foward)
with minimal code changes in mind. Will see if the clone can be avoided.
>
> Also, the above code is skipping some vlan checks (br_handle_vlan).
The br_handle_vlan checks seemed not necessary for a packet getting dropped.
But, will check and fix if its needed while the packet traverses the 
netfilter hook and before it gets dropped.

Thanks,
Roopa

  reply	other threads:[~2015-01-18  9:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-17  7:32 [RFC PATCH net-next] bridge: ability to disable forwarding on a port roopa
2015-01-17 21:14 ` Arad, Ronen
2015-01-18  8:48   ` roopa
2015-01-18  1:05 ` Scott Feldman
2015-01-18  9:10   ` roopa [this message]
2015-01-18 20:55     ` roopa
2015-01-19  7:37       ` Scott Feldman
2015-01-20  6:20         ` roopa
2015-01-20  7:09           ` Samudrala, Sridhar
2015-01-20 13:59             ` roopa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54BB7874.90201@cumulusnetworks.com \
    --to=roopa@cumulusnetworks$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=gospo@cumulusnetworks$(echo .)com \
    --cc=jhs@mojatatu$(echo .)com \
    --cc=jiri@resnulli$(echo .)us \
    --cc=john.fastabend@gmail$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=ronen.arad@intel$(echo .)com \
    --cc=sfeldma@gmail$(echo .)com \
    --cc=stephen@networkplumber$(echo .)org \
    --cc=tgraf@suug$(echo .)ch \
    --cc=vyasevic@redhat$(echo .)com \
    --cc=wkok@cumulusnetworks$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox