From: Bob Gilligan <gilligan@aristanetworks•com>
To: David Miller <davem@davemloft•net>
Cc: netdev@vger•kernel.org
Subject: Re: [PATCH 1/2] ipv4: Improve the scaling of the ARP cache for multicast destinations.
Date: Fri, 31 Aug 2012 12:21:28 -0700
Message-ID: <50410EB8.3040603@aristanetworks.com>
In-Reply-To: <20120830.210628.365120808137655227.davem@davemloft.net>

On 8/30/12 6:06 PM, David Miller wrote:
> From: Bob Gilligan <gilligan@aristanetworks•com>
> Date: Thu, 30 Aug 2012 17:55:04 -0700
>
>> The mapping from multicast IPv4 address to MAC address can just as
>> easily be done at the time a packet is to be sent. With this change,
>> we maintain one ARP cache entry for each interface that has at least
>> one multicast group member. All routes to IPv4 multicast destinations
>> via a particular interface use the same ARP cache entry. This entry
>> does not store the MAC address to use. Instead, packets for multicast
>> destinations go to a new output function that maps the destination
>> IPv4 multicast address into the MAC address and forms the MAC header.
>
> Doing an ARP MC mapping on every packet is much more expensive than
> doing a copy of the hard header cache.
>
> I do not believe the memory consumption issue you use to justify this
> change is a real issue.
>
> If you are talking to that many multicast groups actively, you do want
> that many neighbour cache entries. This is not different from talking
> to nearly every IP address on a local /8 subnet. You'll have a huge
> number of neighbour table entries in that case as well.
>
> If your actual steady state number of active groups being spoken
> to is smaller, you can tune the neighbour cache thresholds to collect
> old less used entries more quickly.
>
> And this today is trivial, since routes no longer hold a reference
> to neighbour entries. Therefore any neighbour entry whatsoever can
> be immediately reclaimed at any moment.

The scaling is N-squared: the number of neighbor cache entries
required for your multicast traffic is interfaces * groups. 100
interfaces and 100 groups could generate 10,000 entries. 1,000
interfaces and 1,000 groups could generate a million entries.
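
To make the arithmetic concrete, here is a minimal sketch of the
worst-case count, assuming every group is active on every interface.
The helper name is hypothetical and is not part of the patch:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: worst-case number of
 * multicast neighbor cache entries when each of `groups` multicast
 * groups is in use on each of `interfaces` interfaces. */
static uint64_t worst_case_mc_entries(uint64_t interfaces, uint64_t groups)
{
	return interfaces * groups;
}
```

With 100 interfaces and 100 groups this gives 10,000; with 1,000 of
each it gives 1,000,000.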

But the number of groups is hard to predict: it depends on the
applications in use and the multicast traffic they generate. So, it
is hard to come up with a "budget" for multicast entries in the
neighbor cache for a multicast router.

If you pick a gc_thresh3 that is less than your working set, you'll
end up thrashing the neighbor cache. And calls to neigh_forced_gc()
are expensive: each call performs a linear search of the entire neighbor
cache. Also, calls to neigh_forced_gc() triggered by a large number of
multicast entries will hurt the unicast entries sharing the
neighbor cache: each call frees any unreferenced but resolved unicast
entries. Any subsequent packets for those destinations will trigger a
re-ARP. Unnecessary re-ARPing is generally undesirable in a router.

The user who wants to avoid these problems is left with the
alternative of setting gc_thresh3 to a very large number based on a
worst-case estimate of the number of unicast plus multicast entries
required.

It just seems simpler and more efficient to keep the multicast entries
out of the neighbor cache entirely.
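
For reference, the per-packet mapping in question is small. Below is a
user-space sketch of the standard IPv4-multicast-to-Ethernet mapping
(01:00:5e followed by the low 23 bits of the group address). It is an
illustration, not the code from the patch; the function name is
hypothetical and the address is assumed to be in host byte order:

```c
#include <stdint.h>

/* Illustrative only: map an IPv4 multicast group address (host byte
 * order) to its Ethernet multicast MAC address. The MAC is the fixed
 * prefix 01:00:5e followed by the low 23 bits of the group address. */
static void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of this octet is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

For example, 224.0.0.1 maps to 01:00:5e:00:00:01. The kernel has its
own helper for this mapping; the sketch only shows how little work is
involved per packet.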

Bob.

>
> I'm not fond of these patches, and adding yet more special cases to
> the neighbour layer, and therefore will not apply them.
>