public inbox for netdev@vger.kernel.org 
From: Roopa Prabhu <roopa@cumulusnetworks.com>
To: David Ahern <dsa@cumulusnetworks.com>
Cc: netdev@vger.kernel.org, ddutt@cumulusnetworks.com
Subject: Re: [PATCH net-next v2 1/3] net: ipv6: Allow shorthand delete of all nexthops in multipath route
Date: Mon, 16 Jan 2017 07:48:55 -0800
Message-ID: <587CEB67.3040807@cumulusnetworks.com>
In-Reply-To: <1484510826-2723-2-git-send-email-dsa@cumulusnetworks.com>

On 1/15/17, 12:07 PM, David Ahern wrote:
> IPv4 allows multipath routes to be deleted using just the prefix and
> length. For example:
>     $ ip ro ls vrf red
>     unreachable default metric 8192
>     1.1.1.0/24
>         nexthop via 10.100.1.254  dev eth1 weight 1
>         nexthop via 10.11.200.2  dev eth11.200 weight 1
>     10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
>     10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
>
>     $ ip ro del 1.1.1.0/24 vrf red
>
>     $ ip ro ls vrf red
>     unreachable default metric 8192
>     10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
>     10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
>
> The same notation does not work with IPv6 because of how multipath routes
> are implemented for IPv6. For IPv6 only the first nexthop of a multipath
> route is deleted if the request contains only a prefix and length. This
> leads to unnecessary complexity in userspace dealing with IPv6 multipath
> routes.
>
> This patch allows all nexthops to be deleted without specifying each one
> in the delete request by passing a new flag, RTM_F_ALL_NEXTHOPS, in
> rtm_flags. Internally, this is done by walking the sibling list of the
> route matching the specifications given (prefix, length, metric, protocol,
> etc).
>
> With this patch (and an updated iproute2 command):
>     $  ip -6 ro ls vrf red
>     2001:db8::/120 via 2001:db8:1::62 dev eth1 metric 256  pref medium
>     2001:db8::/120 via 2001:db8:1::61 dev eth1 metric 256  pref medium
>     2001:db8::/120 via 2001:db8:1::60 dev eth1 metric 256  pref medium
>     2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
>     ...
>
>     $ ip -6 ro del vrf red 2001:db8::/120
>     $ ip -6 ro ls vrf red
>     2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
>     ...
>
> The flag is added to fib6_config by converting fc_type to a u16 (as
> noted fc_type only uses 8 bits). The new u16 hole is a bitmap with
> fc_delete_all_nexthop as the first bit.
>
> Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v2
> - switched example to rfc 3849 documentation address per request
> - changed delete loop to explicitly look at siblings list for
>   first route matching specs given (metric, protocol, etc)
>
>  include/net/ip6_fib.h          |  4 +++-
>  include/uapi/linux/rtnetlink.h |  1 +
>  net/ipv6/route.c               | 28 +++++++++++++++++++++++++---
>  3 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> index a74e2aa40ef4..11ab99e87c5f 100644
> --- a/include/net/ip6_fib.h
> +++ b/include/net/ip6_fib.h
> @@ -37,7 +37,9 @@ struct fib6_config {
>  	int		fc_ifindex;
>  	u32		fc_flags;
>  	u32		fc_protocol;
> -	u32		fc_type;	/* only 8 bits are used */
> +	u16		fc_type;	/* only 8 bits are used */
> +	u16		fc_delete_all_nexthop : 1,
> +			__unused : 15;
>  
>  	struct in6_addr	fc_dst;
>  	struct in6_addr	fc_src;
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 8c93ad1ef9ab..7fb206bc42f9 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -276,6 +276,7 @@ enum rt_scope_t {
>  #define RTM_F_EQUALIZE		0x400	/* Multipath equalizer: NI	*/
>  #define RTM_F_PREFIX		0x800	/* Prefix addresses		*/
>  #define RTM_F_LOOKUP_TABLE	0x1000	/* set rtm_table to FIB lookup result */
> +#define RTM_F_ALL_NEXTHOPS	0x2000	/* delete all nexthops (IPv6) */
>  
Do we really need the flag? It seems like a delete with just the prefix
should delete all the nexthops in a multipath route by default. (I
understand that you have it there to preserve existing behavior for
people who may be relying on it.) But this seems more like a bug fix;
route replace went through a few such bug fixes, e.g. "ipv6: fix ECMP
route replacement". I am ok with either approach.
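
To make the two behaviors under discussion concrete, here is a minimal
userspace sketch, not the kernel code from the patch. Only the name and
value of RTM_F_ALL_NEXTHOPS come from the quoted diff; struct nexthop,
struct del_config, config_from_flags() and delete_route() are
hypothetical names made up for illustration. It assumes a multipath route
is a singly linked list of siblings and that "deleting" just marks an
entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the quoted rtnetlink.h hunk. */
#define RTM_F_ALL_NEXTHOPS 0x2000

/* Hypothetical stand-in for one nexthop of a multipath route. */
struct nexthop {
	const char *gateway;
	int deleted;             /* set once the delete reaches this entry */
	struct nexthop *sibling; /* next nexthop of the same route, or NULL */
};

/* Hypothetical, trimmed-down analogue of the new fib6_config bit. */
struct del_config {
	unsigned int delete_all_nexthop : 1;
};

/* Assumed translation of the request flags into the config bit. */
struct del_config config_from_flags(uint32_t rtm_flags)
{
	struct del_config cfg = { 0 };

	if (rtm_flags & RTM_F_ALL_NEXTHOPS)
		cfg.delete_all_nexthop = 1;
	return cfg;
}

/*
 * Delete starting at the first matching nexthop. Without the bit set,
 * stop after that one entry (the existing IPv6 behavior described in
 * the commit message); with it, walk the whole sibling list.
 * Returns the number of nexthops removed.
 */
int delete_route(struct nexthop *first, const struct del_config *cfg)
{
	int removed = 0;
	struct nexthop *nh = first;

	assert(cfg != NULL);
	while (nh) {
		/* Save the link first; a real delete would free nh. */
		struct nexthop *next = nh->sibling;

		nh->deleted = 1;
		removed++;
		if (!cfg->delete_all_nexthop)
			break;
		nh = next;
	}
	return removed;
}
```

Dropping the flag and always walking the list would amount to removing
the early break in delete_route(), which is the "treat it as a bug fix"
option mentioned above.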


Thread overview: 11+ messages
2017-01-15 20:07 [PATCH net-next 0/3] net: ipv6: Improve user experience with multipath routes David Ahern
2017-01-15 20:07 ` [PATCH net-next v2 1/3] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
2017-01-16 15:48   ` Roopa Prabhu [this message]
2017-01-16 15:58     ` David Ahern
2017-01-17  0:51   ` David Miller
2017-01-17  1:27     ` David Ahern
2017-01-17  1:37       ` David Miller
2017-01-17  1:38         ` David Ahern
2017-01-15 20:07 ` [PATCH net-next 2/3] net: ipv6: remove nowait arg to rt6_fill_node David Ahern
2017-01-15 20:07 ` [PATCH net-next 3/3] net: ipv6: Add option to dump multipath routes via RTA_MULTIPATH attribute David Ahern
2017-01-16 17:40   ` David Ahern
