From: Thomas Rast <trast@inf•ethz.ch>
To: "Alex Bennée" <kernel-hacker@bennee•com>
Cc: Ramkumar Ramachandra <artagnon@gmail•com>,
Git Mailing List <git@vger•kernel.org>
Subject: Re: Poor performance of git describe in big repos
Date: Thu, 30 May 2013 18:21:55 +0200
Message-ID: <87ip20bfq4.fsf@linux-k42r.v.cablecom.net>
In-Reply-To: <CAJ-05NOjVhb+3Cab7uQE8K3VE0Q2GhqR3FE=WzJZvSn8Djt6tw@mail.gmail.com> (Alex Bennée's message of "Thu, 30 May 2013 17:01:58 +0100")

Alex Bennée <kernel-hacker@bennee•com> writes:
> On 30 May 2013 16:33, Thomas Rast <trast@inf•ethz.ch> wrote:
>> Alex Bennée <kernel-hacker@bennee•com> writes:
>>
>>> 41.58% git libcrypto.so.1.0.0 [.] sha1_block_data_order_ssse3
>>> 33.62% git libz.so.1.2.3.4 [.] inflate_fast
>>> 10.39% git libz.so.1.2.3.4 [.] adler32
>>> 2.03% git [kernel.kallsyms] [k] clear_page_c
>>
>> Do you have any large blobs in the repo that are referenced directly by
>> a tag?
>
> Most probably. I've certainly done a bunch of releases (which are tagged) where
> the last thing that was updated was an FPGA image.
[...]
>> git-describe should probably be fixed to avoid loading blobs, though I'm
>> not sure off hand if we have any infrastructure to infer the type of a
>> loose object without inflating it. (This could probably be added by
>> inflating only the first block.) We do have this for packed objects, so
>> at least for packed repos there's a speedup to be had.
>
> Will it be loading the blob for every commit it traverses or just ones that hit
> a tag? Why does it need to load the blob at all? Surely the commit tree state
> doesn't need to be walked down?

No, my theory is that you tagged *the blobs*. Git supports this.
git-describe needs to look at the commit (if any) obtained by peeling
each tag (i.e. dereferencing tags until it reaches a non-tag). To do
that, it resolves the tag's referent and loads it. Usually this will be
a commit, in which case it is marked as reached by the tag.
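
To make the peeling step concrete, here is a toy sketch in C. It assumes
a deliberately simplified in-memory object model; the names (struct obj,
peel, peel_to_commit) are hypothetical and are not git's actual
internals. The point is only that every hop through a tag means reaching
the referent.

```c
#include <stddef.h>

/* Hypothetical, simplified object model for illustration only;
 * git's real struct object and struct tag look different. */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct obj {
	enum obj_type type;
	const char *name;
	struct obj *target;	/* referent; only meaningful for OBJ_TAG */
};

/* Dereference tags until a non-tag is reached. In git, each hop
 * means reading the referent from the object store. */
static struct obj *peel(struct obj *o)
{
	while (o && o->type == OBJ_TAG)
		o = o->target;
	return o;
}

/* describe only cares about commits: return the peeled commit, or NULL. */
static struct obj *peel_to_commit(struct obj *o)
{
	struct obj *p = peel(o);
	return (p && p->type == OBJ_COMMIT) ? p : NULL;
}
```

With this model, a tag of a tag of a commit peels to that commit, while a
tag pointing at a blob peels to no commit at all, even though the blob
still had to be reached to find that out.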

As my example shows, it also resolves tags' referents when they are
non-commits; in particular, it will decompress large blobs that are
(directly) referenced by a tag.
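
Avoiding the blob loads could look roughly like the following sketch. It
assumes, hypothetically, that an object's type can be learned without
reading its payload (as is already possible for packed objects);
load_object and peel_to_commit_gently are made-up names, and
bytes_inflated merely counts how much payload would have been
decompressed.

```c
#include <stddef.h>

/* Simplified stand-ins for git's object types (illustration only). */
enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct stored_obj {
	enum obj_type type;		/* assumed cheap to learn, no inflate */
	size_t size;			/* payload size in bytes */
	struct stored_obj *target;	/* referent, for tags only */
};

/* Total payload bytes that would have been decompressed. */
static size_t bytes_inflated;

/* Hypothetical stand-in for fully reading an object: charges its size. */
static struct stored_obj *load_object(struct stored_obj *o)
{
	bytes_inflated += o->size;
	return o;
}

/* Peel to a commit, fully loading only objects that can lead to one. */
static struct stored_obj *peel_to_commit_gently(struct stored_obj *o)
{
	while (o) {
		if (o->type != OBJ_TAG && o->type != OBJ_COMMIT)
			return NULL;	/* blob or tree: payload never read */
		o = load_object(o);
		if (o->type == OBJ_COMMIT)
			return o;
		o = o->target;
	}
	return NULL;
}
```

A tag pointing at a large blob then costs only the size of the tag object
itself, rather than the whole blob.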

Note that while annotated tags record the referent's type themselves,
e.g.

    $ git cat-file tag junio-gpg-pub
    object fe113d3f96636710600c6b02d5fd421fa7e87dd6
    type blob
    tag junio-gpg-pub
    [...]

unannotated (lightweight) tags are simply refs pointing straight at the
object, so there is no tag object whose referent type we could look at.
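
For annotated tags, the referent's type can be read straight from the tag
object's header lines, as in the cat-file output above. A minimal sketch
of that parse, assuming the tag body is already in memory as a
NUL-terminated string; tag_referent_type is a hypothetical helper, not a
git function. It does nothing for lightweight tags, which have no tag
object.

```c
#include <string.h>

/* Copy the value of the "type" header of an annotated tag body into out
 * (at most outlen-1 chars). Returns 0 on success, -1 if not found.
 * Headers end at the first empty line; the tag message follows it. */
static int tag_referent_type(const char *buf, char *out, size_t outlen)
{
	const char *line = buf;

	while (*line && *line != '\n') {	/* empty line ends headers */
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len > 5 && !strncmp(line, "type ", 5)) {
			size_t vlen = len - 5;
			if (vlen >= outlen)
				return -1;	/* value too long for out */
			memcpy(out, line + 5, vlen);
			out[vlen] = '\0';
			return 0;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```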

I had a brief look around sha1_file.c, in particular sha1_object_info,
and it turns out we lack the "inflate only the early part" logic, as I
suspected. So that'll have to be fixed first. After that I *think* it
should automatically carry over into the tag readers.
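
A loose object is a zlib stream whose inflated form begins with a
"<type> <size>" header terminated by a NUL byte, so its type can be
learned from the first few inflated bytes. The sketch below covers only
the parsing half and assumes the caller has inflated just an initial
chunk (for instance by calling zlib's inflate() with a small output
buffer, omitted here); parse_loose_header is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "<type> <size>\0" from the first len inflated bytes of a loose
 * object. Returns 0 and fills type/size on success, or -1 if the header
 * is malformed or not yet complete within those len bytes. */
static int parse_loose_header(const unsigned char *buf, size_t len,
			      char *type, size_t typelen, unsigned long *size)
{
	const unsigned char *nul = memchr(buf, '\0', len);
	const unsigned char *sp;
	char *end;

	if (!nul)
		return -1;	/* header not complete: inflate more */
	sp = memchr(buf, ' ', (size_t)(nul - buf));
	if (!sp || (size_t)(sp - buf) >= typelen || sp + 1 == nul)
		return -1;	/* no separator, type too long, or no size */
	memcpy(type, buf, (size_t)(sp - buf));
	type[sp - buf] = '\0';
	*size = strtoul((const char *)sp + 1, &end, 10);
	if (end != (const char *)nul)
		return -1;	/* size must consist of digits up to the NUL */
	return 0;
}
```

The function returns -1 both for a malformed header and when the NUL has
not appeared in the chunk yet; a real implementation would want to tell
the two apart and inflate a little more in the latter case.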
--
Thomas Rast
trast@{inf,student}.ethz.ch