public inbox for netdev@vger.kernel.org 
From: Robert Shearman <rshearma@brocade•com>
To: roopa <roopa@cumulusnetworks•com>
Cc: <netdev@vger•kernel.org>,
	"Eric W. Biederman" <ebiederm@xmission•com>,
	Thomas Graf <tgraf@suug•ch>,
	Dinesh Dutt <ddutt@cumulusnetworks•com>,
	"Vivek Venkatraman" <vivek@cumulusnetworks•com>
Subject: Re: [RFC net-next 2/3] ipv4: storing and retrieval of per-nexthop encap
Date: Tue, 2 Jun 2015 17:35:18 +0100
Message-ID: <556DDB46.9010701@brocade.com>
In-Reply-To: <556DD368.8070806@cumulusnetworks.com>

On 02/06/15 17:01, roopa wrote:
> On 6/1/15, 9:46 AM, Robert Shearman wrote:
>> Parse RTA_ENCAP attribute for one path and multipath routes. The encap
>> length is stored in a newly added field to fib_nh, nh_encap_len,
>> although this is added to a padding hole in the structure so that it
>> doesn't increase the size at all. The encap data itself is stored at
>> the end of the array of nexthops. Whilst this means that retrieval
>> isn't optimal, especially if there are multiple nexthops, this avoids
>> the memory cost of an extra pointer, as well as any potential change
>> to the cache or instruction layout that could cause a performance
>> impact.
>>
>> Currently, the dst structure allocated to represent the destination of
>> the packet and used for retrieving the encap by the encap-type
>> interface has been grown through the addition of the rt_encap_len and
>> rt_encap fields. This isn't desirable and could be fixed by defining a
>> new destination type with operations copied from the normal case,
>> other than the addition of the get_encap operation.
>>
>> Signed-off-by: Robert Shearman <rshearma@brocade•com>
...
>> @@ -434,6 +445,83 @@ static int fib_detect_death(struct fib_info *fi, int order,
>>       return 1;
>>   }
>> +static int fib_total_encap(struct fib_config *cfg)
>> +{
>> +    struct net *net = cfg->fc_nlinfo.nl_net;
>> +    int total_encap_len = 0;
>> +
>> +    if (cfg->fc_mp) {
>> +        int remaining = cfg->fc_mp_len;
>> +        struct rtnexthop *rtnh = cfg->fc_mp;
>> +
>> +        while (rtnh_ok(rtnh, remaining)) {
>> +            struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
>> +            int attrlen;
>> +
>> +            attrlen = rtnh_attrlen(rtnh);
>> +            nla = nla_find(attrs, attrlen, RTA_ENCAP);
>> +            if (nla) {
>> +                struct net_device *dev;
>> +                int len;
>> +
>> +                dev = __dev_get_by_index(net,
>> +                             rtnh->rtnh_ifindex);
>> +                if (!dev)
>> +                    return -EINVAL;
>> +
>> +                /* Determine space required */
>> +                len = rtnl_parse_encap(dev, nla, NULL);
>> +                if (len < 0)
>> +                    return len;
>> +
>> +                total_encap_len += len;
>> +            }
>> +
>> +            rtnh = rtnh_next(rtnh, &remaining);
>> +        }
>> +    } else {
>> +        if (cfg->fc_encap) {
>> +            struct net_device *dev;
>> +            int len;
>> +
>> +            dev = __dev_get_by_index(net, cfg->fc_oif);
>> +            if (!dev)
>> +                return -EINVAL;
>> +
>> +            /* Determine space required */
>> +            len = rtnl_parse_encap(dev, cfg->fc_encap, NULL);
>> +            if (len < 0)
>> +                return len;
>> +
>> +            total_encap_len += len;
>> +        }
>> +    }
>> +
>> +    return total_encap_len;
>> +}
> We could avoid parsing and finding this device twice if fib_nh just
> held a pointer to the encap_info (or tunnel info). The
> encap_info/tun_info could also be refcounted and shared between
> nexthops. In my implementation I have just a pointer to the parsed
> encap state in fib_nh.

Right - I took this approach to avoid any extra memory use when encap
isn't in use for the nexthop/route, and to avoid any potential
performance penalty from extra cache misses. Holding a pointer would
certainly make things simpler if those weren't concerns. I'd appreciate
input from the community on this.
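
To make the trade-off concrete, here is a rough userspace sketch of the
layout described in the commit message: a per-nexthop length field, with
the encap bytes packed after the array of nexthops. All names
(nh_model, fi_model, fi_get_nh_encap and so on) are made up for
illustration; this is not the code from the patch, only a model of the
idea under the assumption that encap blobs are stored in nexthop order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for fib_nh and fib_info. */
struct nh_model {
	int oif;
	unsigned char encap_len;	/* 0 when the nexthop has no encap */
};

struct fi_model {
	int nhs;
	struct nh_model nh[];		/* encap bytes follow nh[nhs - 1] */
};

/* Allocate room for nhs nexthops plus total_encap_len trailing bytes. */
static struct fi_model *fi_alloc(int nhs, size_t total_encap_len)
{
	struct fi_model *fi;

	fi = calloc(1, sizeof(*fi) + nhs * sizeof(fi->nh[0]) + total_encap_len);
	if (fi)
		fi->nhs = nhs;
	return fi;
}

/* Start of the encap area, immediately after the nexthop array. */
static unsigned char *fi_encap_base(struct fi_model *fi)
{
	return (unsigned char *)&fi->nh[fi->nhs];
}

/*
 * Find the encap for one nexthop by skipping the encap of every earlier
 * nexthop. This walk is the non-optimal retrieval cost for multipath.
 */
static unsigned char *fi_get_nh_encap(struct fi_model *fi, int idx)
{
	unsigned char *p = fi_encap_base(fi);
	int i;

	assert(idx >= 0 && idx < fi->nhs);
	if (!fi->nh[idx].encap_len)
		return NULL;
	for (i = 0; i < idx; i++)
		p += fi->nh[i].encap_len;
	return p;
}

/* Store encap for nexthop idx; nexthops must be filled in index order. */
static void fi_set_nh_encap(struct fi_model *fi, int idx,
			    const void *data, unsigned char len)
{
	fi->nh[idx].encap_len = len;
	if (len)
		memcpy(fi_get_nh_encap(fi, idx), data, len);
}
```

The total size has to be known before the allocation, which is why a
sizing pass like fib_total_encap() runs first and the attributes end up
being parsed twice. A pointer in each nexthop removes both the walk and
the second parse, at the cost of one pointer per nexthop even when no
encap is configured.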

>> @@ -1403,6 +1418,15 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
>>   #ifdef CONFIG_IP_ROUTE_CLASSID
>>           rt->dst.tclassid = nh->nh_tclassid;
>>   #endif
>> +
>> +        nh_encap = fib_get_nh_encap(fi, nh);
>> +        if (unlikely(nh_encap)) {
>> +            rt->rt_encap = kmemdup(nh_encap, nh->nh_encap_len,
>> +                           GFP_KERNEL);
>> +            if (rt->rt_encap)
>> +                rt->rt_encap_len = nh->nh_encap_len;
>> +        }
>> +
>
> And you could make the rtable point to the same encap info.

Ack.
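
For reference, a minimal userspace sketch of what sharing could look
like: the nexthop owns a refcounted encap object and the route takes a
reference instead of duplicating the bytes with kmemdup(). The names
(encap_info as laid out here, encap_hold, encap_put, nh_ptr_model,
rt_model) are hypothetical, and a plain counter stands in for the atomic
reference counting that kernel code would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted encap state; not a structure from the patch. */
struct encap_info {
	unsigned int refcnt;	/* plain counter; kernel code would use an atomic */
	size_t len;
	unsigned char data[];
};

/* Create encap state holding a copy of data, with one reference. */
static struct encap_info *encap_alloc(const void *data, size_t len)
{
	struct encap_info *e = malloc(sizeof(*e) + len);

	if (!e)
		return NULL;
	e->refcnt = 1;
	e->len = len;
	if (len)
		memcpy(e->data, data, len);
	return e;
}

/* Take an extra reference; tolerates NULL for nexthops without encap. */
static struct encap_info *encap_hold(struct encap_info *e)
{
	if (e)
		e->refcnt++;
	return e;
}

/* Drop a reference; returns 1 if this was the last one and e was freed. */
static int encap_put(struct encap_info *e)
{
	if (!e)
		return 0;
	assert(e->refcnt > 0);
	if (--e->refcnt)
		return 0;
	free(e);
	return 1;
}

/* Simplified stand-ins for fib_nh and rtable. */
struct nh_ptr_model {
	struct encap_info *encap;
};

struct rt_model {
	struct encap_info *encap;
};

/* The route shares the nexthop's encap state instead of copying it. */
static void rt_set_nexthop_model(struct rt_model *rt,
				 const struct nh_ptr_model *nh)
{
	rt->encap = encap_hold(nh->encap);
}
```

With this shape there is no allocation when a route is set up from a
nexthop, and the encap state is released once the last user drops its
reference.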

Thanks,
Rob
