* [PATCH v2 net-next] net: ipv4: Consider unreachable nexthops in multipath routes
@ 2016-03-31 22:24 David Ahern
2016-04-01 8:09 ` Julian Anastasov
0 siblings, 1 reply; 3+ messages in thread
From: David Ahern @ 2016-03-31 22:24 UTC (permalink / raw)
To: netdev; +Cc: ja, David Ahern
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.
Example:
[h2] [h3] 15.0.0.5
| |
3| 3|
[SP1] [SP2]--+
1 2 1 2
| | /-------------+ |
| \ / |
| X |
| / \ |
| / \---------------\ |
1 2 1 2
12.0.0.2 [TOR1] 3-----------------3 [TOR2] 12.0.0.3
4 4
\ /
\ /
\ /
-------| |-----/
1 2
[TOR3]
3|
|
[h1] 12.0.0.1
host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:
root@h1:~# ip ro ls
...
12.0.0.0/24 dev swp1 proto kernel scope link src 12.0.0.1
15.0.0.0/16
nexthop via 12.0.0.2 dev swp1 weight 1
nexthop via 12.0.0.3 dev swp1 weight 1
...
If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:
root@h1:~# ip neigh show
...
12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
12.0.0.2 dev swp1 FAILED
...
The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.
To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.
Signed-off-by: David Ahern <dsa@cumulusnetworks•com>
---
v2
- use rcu locking to avoid refcnts per Eric's suggestion
- only consider neighbor info for nh_scope == RT_SCOPE_LINK per Julian's
comment
- drop the 'state == NUD_REACHABLE' from the state check since it is
part of NUD_VALID (comment from Julian)
- wrapped the use of the neigh in a sysctl
Documentation/networking/ip-sysctl.txt | 10 ++++++++++
include/net/netns/ipv4.h | 3 +++
net/ipv4/fib_semantics.c | 32 ++++++++++++++++++++++++++++++--
net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++
4 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index b183e2b606c8..5b316d33a23f 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
fwmark of the packet they are replying to.
Default: 0
+fib_multipath_use_neigh - BOOLEAN
+ Use status of existing neighbor entry when determining nexthop for
+ multipath routes. If disabled neighbor information is not used then
+ packets could be directed to a dead nexthop. Only valid for kernels
+ built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+ Default: 0 (disabled)
+ Possible values:
+ 0 - disabled
+ 1 - enabled
+
route/max_size - INTEGER
Maximum number of routes allowed in the kernel. Increase
this when using large numbers of interfaces and/or routes.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index a69cde3ce460..d061ffeb1e71 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -133,6 +133,9 @@ struct netns_ipv4 {
struct fib_rules_ops *mr_rules_ops;
#endif
#endif
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ int sysctl_fib_multipath_use_neigh;
+#endif
atomic_t rt_genid;
};
#endif
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index d97268e8ff10..6d423faff0ce 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1559,17 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
}
#ifdef CONFIG_IP_ROUTE_MULTIPATH
+static bool fib_good_nh(const struct fib_nh *nh, struct net_device *dev)
+{
+ struct neighbour *n = NULL;
+ int state = NUD_NONE;
+
+ if (nh->nh_scope == RT_SCOPE_LINK) {
+ rcu_read_lock_bh();
+
+ n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, dev);
+ if (n)
+ state = n->nud_state;
+
+ rcu_read_unlock_bh();
+ }
+
+ /* outside of rcu locking using n only as a boolean
+ * on whether a neighbor entry existed
+ */
+ if (!n || (state & NUD_VALID))
+ return true;
+
+ return false;
+}
void fib_select_multipath(struct fib_result *res, int hash)
{
struct fib_info *fi = res->fi;
+ struct net_device *dev = fi->fib_dev;
+ struct net *net = fi->fib_net;
for_nexthops(fi) {
if (hash > atomic_read(&nh->nh_upper_bound))
continue;
- res->nh_sel = nhsel;
- return;
+ if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
+ fib_good_nh(nh, dev)) {
+ res->nh_sel = nhsel;
+ return;
+ }
} endfor_nexthops(fi);
/* Race condition: route has just become dead. */
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1e1fe6086dd9..bb0419582b8d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -960,6 +960,17 @@ static struct ctl_table ipv4_net_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+ {
+ .procname = "fib_multipath_use_neigh",
+ .data = &init_net.ipv4.sysctl_fib_multipath_use_neigh,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+#endif
{ }
};
--
1.9.1
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH v2 net-next] net: ipv4: Consider unreachable nexthops in multipath routes
2016-03-31 22:24 [PATCH v2 net-next] net: ipv4: Consider unreachable nexthops in multipath routes David Ahern
@ 2016-04-01 8:09 ` Julian Anastasov
2016-04-01 14:39 ` David Ahern
0 siblings, 1 reply; 3+ messages in thread
From: Julian Anastasov @ 2016-04-01 8:09 UTC (permalink / raw)
To: David Ahern; +Cc: netdev
Hello,
On Thu, 31 Mar 2016, David Ahern wrote:
> To maintain backward compatibility use of the neighbor information is
> based on a new sysctl, fib_multipath_use_neigh.
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks•com>
> ---
> v2
> - use rcu locking to avoid refcnts per Eric's suggestion
> - only consider neighbor info for nh_scope == RT_SCOPE_LINK per Julian's
> comment
> - drop the 'state == NUD_REACHABLE' from the state check since it is
> part of NUD_VALID (comment from Julian)
> - wrapped the use of the neigh in a sysctl
>
> Documentation/networking/ip-sysctl.txt | 10 ++++++++++
> include/net/netns/ipv4.h | 3 +++
> net/ipv4/fib_semantics.c | 32 ++++++++++++++++++++++++++++++--
> net/ipv4/sysctl_net_ipv4.c | 11 +++++++++++
> 4 files changed, 54 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index b183e2b606c8..5b316d33a23f 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -63,6 +63,16 @@ fwmark_reflect - BOOLEAN
> fwmark of the packet they are replying to.
> Default: 0
>
> +fib_multipath_use_neigh - BOOLEAN
> + Use status of existing neighbor entry when determining nexthop for
> + multipath routes. If disabled neighbor information is not used then
> + packets could be directed to a dead nexthop. Only valid for kernels
Some may associate "dead" with RTNH_F_DEAD while
in context of nexthop "failed" or "unreachable" can be
more suitable? Can we use something like this?:
If disabled<COMMA_ALLOWED_HERE?> neighbor information is not used
and packets could be directed to a failed nexthop.
> + built with CONFIG_IP_ROUTE_MULTIPATH enabled.
> + Default: 0 (disabled)
> + Possible values:
> + 0 - disabled
> + 1 - enabled
> +
> route/max_size - INTEGER
> Maximum number of routes allowed in the kernel. Increase
> this when using large numbers of interfaces and/or routes.
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index a69cde3ce460..d061ffeb1e71 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -133,6 +133,9 @@ struct netns_ipv4 {
> struct fib_rules_ops *mr_rules_ops;
> #endif
> #endif
> +#ifdef CONFIG_IP_ROUTE_MULTIPATH
> + int sysctl_fib_multipath_use_neigh;
> +#endif
> atomic_t rt_genid;
> };
> #endif
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index d97268e8ff10..6d423faff0ce 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1559,17 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
> }
>
> #ifdef CONFIG_IP_ROUTE_MULTIPATH
> +static bool fib_good_nh(const struct fib_nh *nh, struct net_device *dev)
> +{
> + struct neighbour *n = NULL;
> + int state = NUD_NONE;
Looks like we can do it even better.
If we use NUD_REACHABLE here...
> +
> + if (nh->nh_scope == RT_SCOPE_LINK) {
> + rcu_read_lock_bh();
> +
> + n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, dev);
> + if (n)
> + state = n->nud_state;
> +
> + rcu_read_unlock_bh();
> + }
> +
> + /* outside of rcu locking using n only as a boolean
> + * on whether a neighbor entry existed
> + */
> + if (!n || (state & NUD_VALID))
then check for '!n' is not needed. For bool type
'return state & NUD_VALID;' should work.
> + return true;
> +
> + return false;
> +}
>
> void fib_select_multipath(struct fib_result *res, int hash)
> {
> struct fib_info *fi = res->fi;
> + struct net_device *dev = fi->fib_dev;
> + struct net *net = fi->fib_net;
>
> for_nexthops(fi) {
> if (hash > atomic_read(&nh->nh_upper_bound))
> continue;
>
> - res->nh_sel = nhsel;
> - return;
> + if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
> + fib_good_nh(nh, dev)) {
This dev is from first nexthop. Better fib_good_nh
to use nh->nh_dev instead, it is present for all nexthops
in multipath route. We should not copy the bugs from
fib_detect_death.
> + res->nh_sel = nhsel;
> + return;
> + }
> } endfor_nexthops(fi);
So, you dropped the idea to give full chance for
fallback? Now if last nexthop fails we do not fallback at all.
We promised to prefer reachable nexthops.
Regards
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH v2 net-next] net: ipv4: Consider unreachable nexthops in multipath routes
2016-04-01 8:09 ` Julian Anastasov
@ 2016-04-01 14:39 ` David Ahern
0 siblings, 0 replies; 3+ messages in thread
From: David Ahern @ 2016-04-01 14:39 UTC (permalink / raw)
To: Julian Anastasov; +Cc: netdev
On 4/1/16 2:09 AM, Julian Anastasov wrote:
>> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
>> index d97268e8ff10..6d423faff0ce 100644
>> --- a/net/ipv4/fib_semantics.c
>> +++ b/net/ipv4/fib_semantics.c
>> @@ -1559,17 +1559,45 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
>> }
>>
>> #ifdef CONFIG_IP_ROUTE_MULTIPATH
>> +static bool fib_good_nh(const struct fib_nh *nh, struct net_device *dev)
>> +{
>> + struct neighbour *n = NULL;
>> + int state = NUD_NONE;
>
> Looks like we can do it even better.
> If we use NUD_REACHABLE here...
>
>> +
>> + if (nh->nh_scope == RT_SCOPE_LINK) {
>> + rcu_read_lock_bh();
>> +
>> + n = __neigh_lookup_noref(&arp_tbl, &nh->nh_gw, dev);
>> + if (n)
>> + state = n->nud_state;
>> +
>> + rcu_read_unlock_bh();
>> + }
>> +
>> + /* outside of rcu locking using n only as a boolean
>> + * on whether a neighbor entry existed
>> + */
>> + if (!n || (state & NUD_VALID))
>
> then check for '!n' is not needed. For bool type
> 'return state & NUD_VALID;' should work.
good simplification.
>
>> + return true;
>> +
>> + return false;
>> +}
>>
>> void fib_select_multipath(struct fib_result *res, int hash)
>> {
>> struct fib_info *fi = res->fi;
>> + struct net_device *dev = fi->fib_dev;
>> + struct net *net = fi->fib_net;
>>
>> for_nexthops(fi) {
>> if (hash > atomic_read(&nh->nh_upper_bound))
>> continue;
>>
>> - res->nh_sel = nhsel;
>> - return;
>> + if (!net->ipv4.sysctl_fib_multipath_use_neigh ||
>> + fib_good_nh(nh, dev)) {
>
> This dev is from first nexthop. Better fib_good_nh
> to use nh->nh_dev instead, it is present for all nexthops
> in multipath route. We should not copy the bugs from
> fib_detect_death.
ack. Will fix.
>
>> + res->nh_sel = nhsel;
>> + return;
>> + }
>> } endfor_nexthops(fi);
>
> So, you dropped the idea to give full chance for
> fallback? Now if last nexthop fails we do not fallback at all.
> We promised to prefer reachable nexthops.
I'll save the first nhsel not selected and if all fails return it. That
keeps the fallback inline with what happens today.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-04-01 14:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-31 22:24 [PATCH v2 net-next] net: ipv4: Consider unreachable nexthops in multipath routes David Ahern
2016-04-01 8:09 ` Julian Anastasov
2016-04-01 14:39 ` David Ahern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox