public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Thomas Rast <trast@inf•ethz.ch>
To: "Alex Bennée" <kernel-hacker@bennee•com>
Cc: Ramkumar Ramachandra <artagnon@gmail•com>,
	Git Mailing List <git@vger•kernel.org>
Subject: Re: Poor performance of git describe in big repos
Date: Thu, 30 May 2013 18:21:55 +0200	[thread overview]
Message-ID: <87ip20bfq4.fsf@linux-k42r.v.cablecom.net> (raw)
In-Reply-To: <CAJ-05NOjVhb+3Cab7uQE8K3VE0Q2GhqR3FE=WzJZvSn8Djt6tw@mail.gmail.com> ("Alex \=\?utf-8\?Q\?Benn\=C3\=A9e\=22's\?\= message of "Thu, 30 May 2013 17:01:58 +0100")

Alex Bennée <kernel-hacker@bennee•com> writes:

> On 30 May 2013 16:33, Thomas Rast <trast@inf•ethz.ch> wrote:
>> Alex Bennée <kernel-hacker@bennee•com> writes:
>>
>>>  41.58%   git  libcrypto.so.1.0.0  [.] sha1_block_data_order_ssse3
>>>  33.62%   git  libz.so.1.2.3.4     [.] inflate_fast
>>>  10.39%   git  libz.so.1.2.3.4     [.] adler32
>>>   2.03%   git  [kernel.kallsyms]   [k] clear_page_c
>>
>> Do you have any large blobs in the repo that are referenced directly by
>> a tag?
>
> Most probably. I've certainly done a bunch of releases (which are tagged) were
> the last thing that was updated was an FPGA image.
[...]
>> git-describe should probably be fixed to avoid loading blobs, though I'm
>> not sure off hand if we have any infrastructure to infer the type of a
>> loose object without inflating it.  (This could probably be added by
>> inflating only the first block.)  We do have this for packed objects, so
>> at least for packed repos there's a speedup to be had.
>
> Will it be loading the blob for every commit it traverses or just ones that hit
> a tag? Why does it need to load the blob at all? Surely the commit
> tree state doesn't
> need to be walked down?

No, my theory is that you tagged *the blobs*.  Git supports this.

git-describe needs to look at the commit (if any) obtained by peeling
each tag (i.e. dereferencing tags until it reaches a non-tag).  So to do
that, it resolves the tag's referent and loads it.  Usually this will be
a commit, in which case it is marked as reached by the tag.

As my example shows, it also resolves tags' referents if they refer to
non-commits, in particular, it will decompress large blobs that are
(directly) referenced by a tag.

Note that while annotated tags provide the type information themselves,
e.g.

  $ git cat-file tag junio-gpg-pub
  object fe113d3f96636710600c6b02d5fd421fa7e87dd6
  type blob
  tag junio-gpg-pub
  [...]

unannotated tags are simply refs, so it is not enough to just look at
the tag objects' referent type.

I had a brief look around sha1_file.c, in particular sha1_object_info,
and it turns out we lack the "deflate only early part" logic as I
suspected.  So that'll have to be fixed first.  After that I *think* it
should automatically carry over into the tag readers.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  reply	other threads:[~2013-05-30 16:22 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-30 10:38 Poor performance of git describe in big repos Alex Bennée
2013-05-30 11:33 ` Ramkumar Ramachandra
2013-05-30 13:09   ` Alex Bennée
2013-05-30 14:32     ` Ramkumar Ramachandra
2013-05-30 15:01       ` Alex Bennée
2013-05-30 15:17         ` Ramkumar Ramachandra
2013-05-30 15:33     ` Thomas Rast
2013-05-30 16:01       ` Alex Bennée
2013-05-30 16:21         ` Thomas Rast [this message]
2013-05-30 16:44           ` Thomas Rast
2013-05-30 19:01             ` Antoine Pelisse
2013-05-30 20:00             ` [PATCH 1/2] sha1_file: silence sha1_loose_object_info Thomas Rast
2013-05-30 20:00               ` [PATCH 2/2] lookup_commit_reference_gently: do not read non-{tag,commit} Thomas Rast
2013-05-30 21:22                 ` Jeff King
2013-05-31  0:52                   ` Duy Nguyen
2013-05-31  8:08                   ` Thomas Rast
2013-05-31 16:00                     ` Jeff King
2013-05-31  6:43                 ` Ramkumar Ramachandra
2013-05-31  8:16                   ` Thomas Rast
2013-05-30 19:30           ` Poor performance of git describe in big repos John Keeping
2013-05-31  8:14             ` Alex Bennée
2013-05-31  8:24               ` Thomas Rast
2013-05-31  8:40                 ` Alex Bennée
2013-05-31  8:46                   ` Thomas Rast
2013-05-31  9:57                     ` Alex Bennée
2013-06-03  8:02                       ` Alex Bennée
2013-06-03 16:32                         ` Junio C Hamano
2013-06-03 17:48                           ` Junio C Hamano
2013-05-31 10:27                     ` Thomas Rast
2013-05-31 16:17                       ` Jeff King
2013-06-03  8:39                         ` Alex Bennée
2013-06-03 14:49                           ` Jeff King
2013-05-31  8:32               ` John Keeping
2013-05-31  8:49                 ` Alex Bennée
2013-05-31  8:59                   ` John Keeping
2013-05-30 11:48 ` John Keeping
2013-05-30 12:29   ` Alex Bennée
2013-05-30 13:20     ` Duy Nguyen
     [not found]       ` <CAJ-05NPacjAEC99Ntd9eMnTD9_PMMYFob-_tAx5CeSB79TkRSg@mail.gmail.com>
2013-05-30 13:45         ` Duy Nguyen
2013-05-30 14:02           ` Alex Bennée
2013-05-30 13:16   ` Alex Bennée

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ip20bfq4.fsf@linux-k42r.v.cablecom.net \
    --to=trast@inf$(echo .)ethz.ch \
    --cc=artagnon@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=kernel-hacker@bennee$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox