From: Junio C Hamano <gitster@pobox•com>
To: Michael Haggerty <mhagger@alum•mit.edu>
Cc: "Jeff King" <peff@peff•net>, "Brodie Rao" <brodie@sf•io>,
git@vger•kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail•com>
Subject: Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
Date: Thu, 09 Jan 2014 10:25:44 -0800 [thread overview]
Message-ID: <xmqq61ptau6v.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <52CD7835.2020708@alum.mit.edu> (Michael Haggerty's message of "Wed, 08 Jan 2014 17:09:25 +0100")
Michael Haggerty <mhagger@alum•mit.edu> writes:
> It's not only racy WRT other processes. If the current git process
> would create a new reference, it wouldn't be reflected in the cache.
>
> It's true that the main ref_cache doesn't invalidate itself
> automatically either when a new reference is created, so it's not really
> a fair complaint. However, as we add places where the cache is
> invalidated, it is easy to overlook this cache that is stuck in static
> variables within a function definition and it is impossible to
> invalidate it. Might it not be better to attach the cache to the
> ref_cache structure instead, and couple its lifetime to that object?
>
> Alternatively, the cache could be created and managed on the caller
> side, since the caller would know when the cache would have to be
> invalidated. Also, different callers are likely to have different
> performance characteristics. It is unlikely that the time to initialize
> the cache will be amortized in most cases; in fact, "rev-list --stdin"
> might be the *only* plausible use case.
True.
> Regarding the overall strategy: you gather all refnames that could be
> confused with an SHA-1 into a sha1_array, then later look up SHA-1s in
> the array to see if they are ambiguous. This is a very special-case
> optimization for SHA-1s.
>
> I wonder whether another approach would gain almost the same amount of
> performance but be more general. We could change dwim_ref() (or a
> version of it?) to read its data out of a ref_cache instead of going to
> disk every time. Then, at the cost of populating the relevant parts of
> the ref_cache once, we would have fast dwim_ref() calls for all strings.
If opendir-readdir to grab only the names (but not values) of many
refs is a lot faster than stat-open-read a handful of dwim-ref
locations for a given name, that optimization might be worthwhile,
but I think that requires an update to read_loose_refs() not to
read_ref_full() and the users of refs API to instead lazily resolve
the refs, no?
If I ask for five names (say 'maint', 'master', 'next', 'pu',
'jch'), the current code will do 5 dwim_ref()s, each of which will
consult 6 locations with resolve_ref_unsafe(), totalling 30 calls to
resolve_ref_unsafe(), each of which in turn is essentially an open
followed by either an return on ENOENT or a read. So 30 opens and 5
reads in total.
With your lazy ref_cache scheme, instead we would enumerate all the
loose ones in the same 6 directories (e.g. refs/tags/, refs/heads),
so 6 opendir()s with as many readdir()s as I have loose refs, plus
we open-read them in read_loose_refs() called from get_ref_dir()
with the current ref_cache code. For me, "find .git/refs/heads"
gives 500+ lines of output, which suggests that using the ref_cache
mechanism for dwim_ref() may not be a huge win, unless it is updated
to be extremely lazy, and readdir()s turns out to be extremely less
heavier than open-read. Also it is unlikely that the cost to
initialize the cache is amortized to be a net win unless we are
dealing with tons of dwim_ref()s.
next prev parent reply other threads:[~2014-01-09 18:25 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 3:32 [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Brodie Rao
2014-01-07 3:35 ` Brodie Rao
2014-01-07 17:13 ` Jeff King
2014-01-07 17:51 ` Junio C Hamano
2014-01-07 17:52 ` Jeff King
2014-01-07 19:38 ` Junio C Hamano
2014-01-07 19:58 ` Jeff King
2014-01-07 20:31 ` Junio C Hamano
2014-01-07 22:08 ` Jeff King
2014-01-07 22:10 ` [PATCH 1/4] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 22:10 ` [PATCH 2/4] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 22:10 ` [PATCH 3/4] cat-file: restore ambiguity warning flag " Jeff King
2014-01-07 22:11 ` [PATCH 4/4] revision: turn off object/refname ambiguity check for --stdin Jeff King
2014-01-07 23:56 ` [PATCH v2] speeding up 40-hex ambiguity check Jeff King
2014-01-07 23:57 ` [PATCH v2 1/5] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 23:57 ` [PATCH v2 2/5] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 23:58 ` [PATCH v2 3/5] refs: teach for_each_ref a flag to avoid recursion Jeff King
2014-01-08 3:47 ` [PATCH v3 " Jeff King
2014-01-08 10:23 ` Jeff King
2014-01-08 11:29 ` Michael Haggerty
2014-01-09 21:49 ` Jeff King
2014-01-10 8:59 ` Michael Haggerty
2014-01-10 9:15 ` Jeff King
2014-01-09 17:51 ` Junio C Hamano
2014-01-09 21:55 ` Jeff King
2014-01-07 23:59 ` [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test Jeff King
2014-01-08 16:09 ` Michael Haggerty
2014-01-09 18:25 ` Junio C Hamano [this message]
2014-01-10 9:41 ` Jeff King
2014-01-14 9:50 ` Jeff King
2014-01-14 11:34 ` Michael Haggerty
2014-01-08 0:00 ` [PATCH v2 5/5] get_sha1: drop object/refname ambiguity flag Jeff King
2014-01-08 16:34 ` Michael Haggerty
2014-01-07 6:45 ` [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Duy Nguyen
2014-01-07 17:24 ` Junio C Hamano
2014-01-07 19:23 ` Brodie Rao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq61ptau6v.fsf@gitster.dls.corp.google.com \
--to=gitster@pobox$(echo .)com \
--cc=brodie@sf$(echo .)io \
--cc=git@vger$(echo .)kernel.org \
--cc=mhagger@alum$(echo .)mit.edu \
--cc=pclouds@gmail$(echo .)com \
--cc=peff@peff$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox