From: Jamal Hadi Salim <jhs@mojatatu•com>
To: davem@davemloft•net, stephen@networkplumber•org
Cc: netdev@vger•kernel.org, vyasevic@redhat•com,
sfeldma@cumulusnetworks•com, john.r.fastabend@intel•com,
roopa@cumulusnetworks•com
Subject: Re: [net-next PATCH 2/2] bridge netlink dump interface at par with brctl Actually better than brctl showmacs because we can filter by bridge port in the kernel
Date: Sun, 01 Jun 2014 08:16:41 -0400
Message-ID: <538B19A9.4050607@mojatatu.com>
In-Reply-To: <1401623780-4297-2-git-send-email-jhs@emojatatu.com>
This is mostly to you, Vlad, since you brought it up earlier.
I ended up using ifm instead of ndm. Currently there is a lack of
symmetry: we send requests with ifm and get responses with
ndms. Unfortunately, after spending 2-3 hours I came to the
conclusion that I can't change it without breaking old iproute2
versions that expect this behavior. What we have here is an order
of magnitude better filtering, but we could have done slightly
better if we were able to use an ndm. A little acrobatics later on
to filter by vlans may work.
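For reference, a rough userspace sketch of the request side as the patch
parses it: a struct ifinfomsg carrying the bridge port in ifi_index,
optionally followed by a u32 IFLA_MASTER attribute naming the bridge. The
helper name build_fdb_dump_req() is made up for illustration and is not
iproute2 code; RTM_GETNEIGH on AF_BRIDGE is assumed to be the message type
that reaches rtnl_fdb_dump(). The replies would still have to be parsed as
struct ndmsg, which is the asymmetry described above.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper (not from iproute2): fill buf with an fdb dump
 * request. The payload is a struct ifinfomsg, optionally followed by an
 * IFLA_MASTER attribute. Returns the total netlink message length.
 */
static size_t build_fdb_dump_req(void *buf, size_t buflen,
				 int brport_ifindex, unsigned int br_ifindex)
{
	struct nlmsghdr *nlh = buf;
	struct ifinfomsg *ifm;

	/* Room for the header, the ifinfomsg and one u32 attribute. */
	assert(buflen >= NLMSG_LENGTH(sizeof(*ifm)) +
			 RTA_SPACE(sizeof(br_ifindex)));
	memset(buf, 0, buflen);

	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifm));
	nlh->nlmsg_type = RTM_GETNEIGH;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;

	ifm = NLMSG_DATA(nlh);
	ifm->ifi_family = AF_BRIDGE;
	ifm->ifi_index = brport_ifindex;	/* 0: no filtering by port */

	if (br_ifindex) {
		/* Append IFLA_MASTER right after the aligned ifinfomsg. */
		struct rtattr *rta = (struct rtattr *)
			((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));

		rta->rta_type = IFLA_MASTER;
		rta->rta_len = RTA_LENGTH(sizeof(br_ifindex));
		memcpy(RTA_DATA(rta), &br_ifindex, sizeof(br_ifindex));
		nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) +
				 RTA_ALIGN(rta->rta_len);
	}

	return nlh->nlmsg_len;
}
```

Passing 0 for both indexes gives the unfiltered dump request; the buffer
would then be sent over a NETLINK_ROUTE socket in the usual way.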
cheers,
jamal
On 06/01/14 07:56, Jamal Hadi Salim wrote:
> From: Jamal Hadi Salim <jhs@mojatatu•com>
>
> The current bridge netlink interface doesn't scale when you have many bridges, each
> with large fdbs, or even bridges with many bridge ports.
>
> Example usage:
>
> Let's start with two bridges, each with a port...
>
> root@moja-mojo:bridge# ./bridge link
> 8: eth1 state DOWN : <BROADCAST,MULTICAST> mtu 1500 master br0 state disabled priority 32 cost 19
> 17: sw1-p1 state DOWN : <BROADCAST,NOARP> mtu 1500 master sw1 state disabled priority 32 cost 100
>
> show all...
> root@moja-mojo:bridge# ./bridge fdb show
> 33:33:00:00:00:01 dev bond0 self permanent
> 33:33:00:00:00:01 dev dummy0 self permanent
> 33:33:00:00:00:01 dev ifb0 self permanent
> 33:33:00:00:00:01 dev ifb1 self permanent
> 33:33:00:00:00:01 dev eth0 self permanent
> 01:00:5e:00:00:01 dev eth0 self permanent
> 33:33:ff:22:01:01 dev eth0 self permanent
> 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
> 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
> 00:17:42:8a:b4:07 dev eth1 self permanent
> 33:33:00:00:00:01 dev eth1 self permanent
> 33:33:00:00:00:01 dev gretap0 self permanent
> 33:33:00:00:00:01 dev br0 self permanent
> 33:33:00:00:00:01 dev sw1 self permanent
> a2:fb:21:4c:47:25 dev sw1-p1 vlan 0 master sw1 permanent
> 33:33:00:00:00:01 dev sw1-p1 self permanent
>
> Let's see a port that is not attached to a bridge
> root@moja-mojo:bridge# ./bridge fdb show brport eth0
> 33:33:00:00:00:01 self permanent
> 01:00:5e:00:00:01 self permanent
> 33:33:ff:22:01:01 self permanent
>
> Let's see a port that is attached to a bridge
> root@moja-mojo:bridge# ./bridge fdb show brport eth1
> 02:00:00:12:01:02 vlan 0 master br0 permanent
> 00:17:42:8a:b4:05 vlan 0 master br0 permanent
> 00:17:42:8a:b4:07 self permanent
> 33:33:00:00:00:01 self permanent
>
> Specify the correct bridge and you get good stuff
> root@moja-mojo:bridge# ./bridge fdb show brport eth1 br br0
> 02:00:00:12:01:02 vlan 0 master br0 permanent
> 00:17:42:8a:b4:05 vlan 0 master br0 permanent
> 00:17:42:8a:b4:07 self permanent
> 33:33:00:00:00:01 self permanent
>
> Specify the wrong bridge and you get good nada
> root@moja-mojo:bridge# ./bridge fdb show brport eth1 br sw1
>
> dump only br0
> root@moja-mojo:bridge# ./bridge fdb show br br0
> 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
> 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
> 00:17:42:8a:b4:07 dev eth1 self permanent
> 33:33:00:00:00:01 dev eth1 self permanent
>
> Let's move a port from one bridge to another for shits-and-giggles
> (as they say in New Brunswick)
> root@moja-mojo:bridge# ip link set sw1-p1 master br0
>
> Now dump again br0
> root@moja-mojo:bridge# ./bridge fdb show br br0
> 02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
> 00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
> 00:17:42:8a:b4:07 dev eth1 self permanent
> 33:33:00:00:00:01 dev eth1 self permanent
> a2:fb:21:4c:47:25 dev sw1-p1 vlan 0 master br0 permanent
> 33:33:00:00:00:01 dev sw1-p1 self permanent
>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu•com>
> ---
> net/core/rtnetlink.c | 68 +++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 56 insertions(+), 12 deletions(-)
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 064418e..71e6bc8 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2508,26 +2508,70 @@ EXPORT_SYMBOL(ndo_dflt_fdb_dump);
>
> static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
> {
> - int idx = 0;
> - struct net *net = sock_net(skb->sk);
> struct net_device *dev;
> + struct net_device *br_dev;
> + struct nlattr *tb[IFLA_MAX+1];
> + const struct net_device_ops *ops;
> + struct ifinfomsg *ifm = nlmsg_data(cb->nlh);
> + struct net *net = sock_net(skb->sk);
> + int brport_idx = 0;
> + int br_idx = 0;
> + int idx = 0;
> +
> + if (nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
> + ifla_policy) == 0) {
> + if (tb[IFLA_MASTER])
> + br_idx = nla_get_u32(tb[IFLA_MASTER]);
> + }
> +
> + brport_idx = ifm->ifi_index;
>
> rcu_read_lock();
> for_each_netdev_rcu(net, dev) {
> - if (dev->priv_flags & IFF_BRIDGE_PORT) {
> - struct net_device *br_dev;
> - const struct net_device_ops *ops;
>
> - br_dev = netdev_master_upper_dev_get(dev);
> + if (brport_idx && (dev->ifindex != brport_idx))
> + continue;
> +
> + if (!br_idx) {
> + if (dev->priv_flags & IFF_BRIDGE_PORT) {
> + br_dev = netdev_master_upper_dev_get(dev);
> + ops = br_dev->netdev_ops;
> + if (ops->ndo_fdb_dump)
> + idx = ops->ndo_fdb_dump(skb, cb, br_dev,
> + dev, idx);
> + }
> +
> + /* all of bridge fdb entries are dumped via brports fdb
> + * therefore only allow for selfies for bridges
> + */
> + if (!(dev->priv_flags & IFF_EBRIDGE) &&
> + dev->netdev_ops->ndo_fdb_dump)
> + idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, dev,
> + NULL, idx);
> + else
> + idx = ndo_dflt_fdb_dump(skb, cb, dev, NULL, idx);
> +
> + } else {
> + if (!(dev->priv_flags & IFF_BRIDGE_PORT))
> + continue;
> +
> + br_dev = __dev_get_by_index(net, br_idx);
> + if (!br_dev)
> + return -ENODEV;
> +
> + if (br_dev != netdev_master_upper_dev_get(dev))
> + continue;
> +
> ops = br_dev->netdev_ops;
> if (ops->ndo_fdb_dump)
> - idx = ops->ndo_fdb_dump(skb, cb, dev, NULL, idx);
> - }
> + idx = ops->ndo_fdb_dump(skb, cb, br_dev, dev, idx);
>
> - if (dev->netdev_ops->ndo_fdb_dump)
> - idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, dev, NULL, idx);
> - else
> - idx = ndo_dflt_fdb_dump(skb, cb, dev, NULL, idx);
> + if (dev->netdev_ops->ndo_fdb_dump)
> + idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, dev,
> + NULL, idx);
> + else
> + idx = ndo_dflt_fdb_dump(skb, cb, dev, NULL, idx);
> + }
> }
> rcu_read_unlock();
>
>
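The selection rules in the loop above can be modelled in plain userspace C
as a rough sketch. struct model_dev and fdb_dump_selects() are invented
stand-ins for illustration and are not kernel code; the model only answers
"does this device contribute entries to the dump?" and ignores which
ndo_fdb_dump callbacks then get called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few net_device fields the loop reads. */
struct model_dev {
	int ifindex;
	bool is_bridge_port;	/* models IFF_BRIDGE_PORT */
	int master_ifindex;	/* 0 when the device has no master */
};

/* brport_idx models ifm->ifi_index, br_idx models IFLA_MASTER;
 * 0 means "filter not given" for either one.
 */
static bool fdb_dump_selects(const struct model_dev *dev,
			     int brport_idx, int br_idx)
{
	assert(dev);

	/* A port filter skips every other device. */
	if (brport_idx && dev->ifindex != brport_idx)
		return false;

	/* Without a bridge filter every remaining device is dumped. */
	if (!br_idx)
		return true;

	/* With a bridge filter only ports enslaved to that bridge pass. */
	return dev->is_bridge_port && dev->master_ifindex == br_idx;
}
```

With eth1 (ifindex 8) enslaved to br0, asking for "brport eth1 br sw1"
selects nothing in this model, which matches the empty output in the
example session.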
Thread overview: 8+ messages
2014-06-01 11:56 [net-next PATCH 1/2] bridge fdb dumping takes a filter device Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port Jamal Hadi Salim
2014-06-01 11:56 ` [net-next PATCH 2/2] bridge netlink dump interface at par with brctl Actually better than brctl showmacs because we can filter by bridge port in the kernel Jamal Hadi Salim
2014-06-01 12:16 ` Jamal Hadi Salim [this message]
2014-06-01 12:24 ` Jamal Hadi Salim
2014-06-02 15:34 ` Vlad Yasevich
2014-06-02 22:17 ` Jamal Hadi Salim
2014-06-05 7:15 ` [net-next PATCH 1/2] bridge fdb dumping takes a filter device Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port David Miller
2014-06-07 12:41 ` Jamal Hadi Salim