public inbox for netdev@vger.kernel.org 
From: Nicolas Dichtel <nicolas.dichtel@6wind•com>
To: sowmini varadhan <sowmini05@gmail•com>,
	Duan Jiong <duanj.fnst@cn•fujitsu.com>,
	David Miller <davem@davemloft•net>,
	netdev@vger•kernel.org,
	Hannes Frederic Sowa <hannes@stressinduktion•org>
Subject: Re: ipv6: a question about ECMP
Date: Fri, 08 Nov 2013 11:09:59 +0100
Message-ID: <527CB877.6050800@6wind.com>
In-Reply-To: <CACP96tRh345x4D1BqpOK2eWfoeVbqMnu-+jGfF7eyFC1dbQoug@mail.gmail.com>

On 07/11/2013 19:32, sowmini varadhan wrote:
> On Thu, Nov 7, 2013 at 7:16 AM, Hannes Frederic Sowa
> <hannes@stressinduktion•org> wrote:
>> Hi Duan!
>>
>> On Thu, Nov 07, 2013 at 06:33:20PM +0800, Duan Jiong wrote:
>>>    After reading ip6_pol_route(), I have a question about ECMP: why do we call
>>> rt6_multipath_select() after calling rt6_select()?
>>>    In my opinion, the route returned by rt6_select() has the highest score, but the route
>>> returned by rt6_multipath_select() may have a lower score than the former, because
>>> ECMP doesn't take the route preference into consideration. That means the kernel will
>>> choose a less desirable route.
>>
>> ECMP routes only differ in the gateway they specify, so I doubt there will be
>> any change in the score they would receive. rt6_multipath_select merely
>> makes sure we don't select the same route again and again.
>
>   rt6_multipath_select() -> rt6_score_route() seems to require that the
> interface *must* match, which is consistent with your assertion above that
> "ECMP routes differ in gw only".
In fact, ECMP routes have the same metric/weight and destination, but not the
same next hop (i.e. gw + oif).
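To make that criterion concrete, here is a minimal userspace sketch in C. It assumes a simplified, hypothetical `struct route6` and helper names (`make_route`, `same_nexthop`, `ecmp_siblings`) that do not mirror the kernel's `struct rt6_info`: two routes count as ECMP siblings when destination prefix and metric match but the next hop (gateway plus outgoing interface) differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical, simplified route record; NOT the kernel's struct rt6_info. */
struct route6 {
	struct in6_addr dst;  /* destination prefix */
	int plen;             /* prefix length */
	unsigned int metric;  /* route metric/weight */
	struct in6_addr gw;   /* gateway of the next hop */
	int oif;              /* outgoing interface index of the next hop */
};

/* Hypothetical helper: build a route from textual addresses, abort on bad input. */
static struct route6 make_route(const char *dst, int plen, unsigned int metric,
				const char *gw, int oif)
{
	struct route6 r = { .plen = plen, .metric = metric, .oif = oif };

	if (inet_pton(AF_INET6, dst, &r.dst) != 1 ||
	    inet_pton(AF_INET6, gw, &r.gw) != 1)
		abort();
	return r;
}

/* A next hop is the (gateway, oif) pair, not the gateway alone. */
static bool same_nexthop(const struct route6 *a, const struct route6 *b)
{
	return a->oif == b->oif &&
	       memcmp(&a->gw, &b->gw, sizeof(a->gw)) == 0;
}

/* ECMP siblings: same destination and metric, different next hop. */
static bool ecmp_siblings(const struct route6 *a, const struct route6 *b)
{
	return a->plen == b->plen &&
	       memcmp(&a->dst, &b->dst, sizeof(a->dst)) == 0 &&
	       a->metric == b->metric &&
	       !same_nexthop(a, b);
}
```

For example, a route to 2001:db8:1::/48 via fe80::1 on ifindex 2 and the same route via fe80::1 on ifindex 3 are siblings in this model, while a copy with a different metric is not.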

>
> But for IPv6, the gw addr is a link-local address, which is only required to be
> unique on the link. Thus, e.g., you can have fe80::1 as the gw on both eth0 and
> eth1.
Yes, oif can be different.
Note that gw can also be a global address.
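Because a link-local gateway is only unique on its link, the outgoing interface has to be part of the next-hop identity. A minimal C sketch, using a hypothetical `struct nexthop6` and helpers `gw_needs_oif()` / `nexthop_equal()` (illustrative names, not kernel APIs): fe80::1 on two interfaces yields two distinct next hops, while a global gateway address is unambiguous by itself (the route still carries an oif).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical next-hop identity: gateway address plus outgoing interface. */
struct nexthop6 {
	struct in6_addr gw;
	int oif;  /* interface index */
};

/*
 * True for fe80::/10 (the same test as IN6_IS_ADDR_LINKLOCAL): such a gateway
 * is only unique on its link, so it cannot name a next hop without an oif.
 */
static bool gw_needs_oif(const struct in6_addr *gw)
{
	return gw->s6_addr[0] == 0xfe && (gw->s6_addr[1] & 0xc0) == 0x80;
}

/* Two next hops are equal only if both the gateway and the oif match. */
static bool nexthop_equal(const struct nexthop6 *a, const struct nexthop6 *b)
{
	return a->oif == b->oif &&
	       memcmp(&a->gw, &b->gw, sizeof(a->gw)) == 0;
}
```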

>
> What is the assumption around "cost" for ECMP here: are we assuming some
> form of link bundling (Section 6 of RFC 2991)? Or is the "multiple parallel
> links" case handled somewhere else that I am missing?
rt6_score_route() is called to check the requested oif (see 52bd4c0c1551 "ipv6: fix
ecmp lookup when oif is specified").
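The select-then-spread sequence discussed in this thread can be modeled roughly as follows. This is a simplified userspace sketch, not the kernel code: `routes[0]` stands for the route chosen by the preference-based selection, the remaining entries for its ECMP siblings, and a flow hash picks one of them; if an oif was requested and the picked sibling does not use it, the originally selected route is kept. The FNV-1a hash, `struct flow6`, and `multipath_select()` are illustrative assumptions, not the kernel's hash or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified route: only what the selection step needs. */
struct route6 {
	int id;   /* label for illustration */
	int oif;  /* outgoing interface index */
};

/* Hypothetical flow key; the kernel hashes its own flow fields. */
struct flow6 {
	uint8_t saddr[16];
	uint8_t daddr[16];
	uint32_t flowlabel;
};

/* One FNV-1a step over a single byte. */
static uint32_t fnv1a_byte(uint32_t h, uint8_t b)
{
	return (h ^ b) * 16777619u;
}

/* Stand-in flow hash (FNV-1a over addresses and flow label). */
static uint32_t flow_hash(const struct flow6 *fl)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 16; i++)
		h = fnv1a_byte(h, fl->saddr[i]);
	for (size_t i = 0; i < 16; i++)
		h = fnv1a_byte(h, fl->daddr[i]);
	for (int shift = 0; shift < 32; shift += 8)
		h = fnv1a_byte(h, (uint8_t)(fl->flowlabel >> shift));
	return h;
}

/*
 * routes[0]: route picked by preference; routes[1..n-1]: its ECMP siblings.
 * req_oif == 0 means no outgoing interface was requested.
 */
static const struct route6 *multipath_select(const struct route6 *routes,
					     size_t n, const struct flow6 *fl,
					     int req_oif)
{
	size_t pick;

	assert(n > 0);
	pick = flow_hash(fl) % n;
	if (pick == 0)
		return &routes[0];
	/* A sibling on the wrong interface is rejected; keep the original. */
	if (req_oif != 0 && routes[pick].oif != req_oif)
		return &routes[0];
	return &routes[pick];
}
```

In this model the same flow always maps to the same route, while different flows can land on different siblings, so the result does not depend on the order in which the siblings were inserted.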

Regards,
Nicolas

>
> --Sowmini
>
>>
>> Please note, the rt6_info's siblings fields were added for the sole purpose
>> of ECMP, and insertion only updates the siblings list if the above criteria
>> hold. They make sure the routes looked up differ on each lookup, so it
>> actually does multipath and does not depend on the order in which the routes
>> were inserted.
>>
>> Hope that helps,
>>
>>    Hannes
>>

Thread overview: 4+ messages
2013-11-07 10:33 ipv6: a question about ECMP Duan Jiong
2013-11-07 12:16 ` Hannes Frederic Sowa
2013-11-07 18:32   ` sowmini varadhan
2013-11-08 10:09     ` Nicolas Dichtel [this message]
