From: Michael Haggerty <mhagger@alum•mit.edu>
To: Jeff King <peff@peff•net>
Cc: Martin Fick <mfick@codeaurora•org>,
git discussion list <git@vger•kernel.org>,
Junio C Hamano <gitster@pobox•com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Tue, 22 May 2012 09:15:55 +0200 [thread overview]
Message-ID: <4FBB3D2B.4010300@alum.mit.edu> (raw)
In-Reply-To: <20120522041123.GA9972@sigill.intra.peff.net>
On 05/22/2012 06:11 AM, Jeff King wrote:
> On Tue, May 22, 2012 at 05:59:29AM +0200, Michael Haggerty wrote:
>
>>> Try doing "git fetch . refs/*:refs/*" in a repository with a large
>>> number of refs (e.g., 400K). With git v1.7.10, this takes about 9.5s on
>>> my machine. But using the version of git in "next" takes about 16.5s.
>>> Bisection points to your 432ad41 (refs: store references hierarchically,
>>> 2012-04-10). Perf shows sort_ref_dir and msort_with_tmp as hot spots.
>>
>> I'm looking into this.
>>
>> For your test, were the 400k references all in one "directory", or
>> were they sharded somehow?
>
> Sharded. This was the rails network repo, so it basically looks like
> this:
>
> refs/remotes/XXXXXXX/heads/master
> refs/remotes/XXXXXXX/tags/v1.0
> refs/remotes/XXXXXXX/tags/v1.1
> ...
> refs/remotes/XXXXXXX/tags/v3.5
> refs/remotes/YYYYYYY/heads/master
> refs/remotes/YYYYYYY/tags/v1.0
> refs/remotes/YYYYYYY/tags/v1.1
> ...
>
> Basically the same 90 or so refs you would have in a normal repository
> (a branch or two, and a bunch of tags), repeated over and over under
> numbered remote hierarchies (i.e., each remote is basically a copy of
> the refs from somebody's fork of the repo).
>
> I can provide you with the exact repo if you want, but it is about 100M.
>
> Interestingly, I don't seem to get nearly as much slow-down in my "fake"
> many-refs repo, which has all 100K entries directly in refs/heads.
That could explain why I cannot reproduce your result. I have done all
of my testing in fake repos (with sharded and non-sharded variants) with
very little file content.
If it is not too much trouble, please let me know where I can obtain
your test repo and what commands you used to get your result. (Was the
local repo already a full clone of the remote, including all 400k
references? How was the remote set up--sharing data or not, local or
remote? Warm or cold disk cache?)
Thanks,
Michael
--
Michael Haggerty
mhagger@alum•mit.edu
http://softwareswirl.blogspot.com/
next prev parent reply other threads:[~2012-05-22 7:23 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-21 8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21 9:09 ` Junio C Hamano
2012-05-21 9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14 ` Jeff King
2012-05-21 22:17 ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08 ` Junio C Hamano
2012-05-22 20:23 ` Jeff King
2012-05-24 6:04 ` Jeff King
2012-05-21 22:17 ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19 ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19 ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23 ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16 ` Junio C Hamano
2012-05-21 23:52 ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22 0:07 ` Jeff King
2012-05-22 3:59 ` Michael Haggerty
2012-05-22 4:11 ` Jeff King
2012-05-22 7:15 ` Michael Haggerty [this message]
2012-05-22 7:37 ` Jeff King
2012-05-22 13:28 ` Michael Haggerty
2012-05-22 17:33 ` Jeff King
2012-05-24 12:05 ` Michael Haggerty
2012-05-25 0:17 ` Martin Fick
2012-05-25 0:39 ` Jeff King
2012-05-25 0:54 ` Martin Fick
2012-05-25 1:04 ` Jeff King
2012-05-25 1:32 ` Martin Fick
2012-05-25 6:50 ` Jeff King
2012-05-22 12:18 ` Nguyen Thai Ngoc Duy
2012-05-22 13:35 ` Michael Haggerty
2012-05-22 17:01 ` Jeff King
2012-05-22 17:35 ` Junio C Hamano
2012-05-22 17:46 ` Jeff King
2012-05-24 4:54 ` Michael Haggerty
2012-05-23 1:20 ` Nguyen Thai Ngoc Duy
2012-05-22 16:16 ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41 ` Jeff King
2012-05-21 22:08 ` Junio C Hamano
2012-05-21 22:24 ` Jeff King
2012-05-22 5:51 ` Martin Fick
2012-05-22 18:21 ` Jeff King
2012-05-22 22:19 ` Martin Fick
2012-05-22 23:23 ` Junio C Hamano
2012-05-23 0:46 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FBB3D2B.4010300@alum.mit.edu \
--to=mhagger@alum$(echo .)mit.edu \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=mfick@codeaurora$(echo .)org \
--cc=peff@peff$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox