Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test

public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed

From: Junio C Hamano <gitster@pobox•com>
To: Michael Haggerty <mhagger@alum•mit.edu>
Cc: "Jeff King" <peff@peff•net>, "Brodie Rao" <brodie@sf•io>,
	git@vger•kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail•com>
Subject: Re: [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test
Date: Thu, 09 Jan 2014 10:25:44 -0800	[thread overview]
Message-ID: <xmqq61ptau6v.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <52CD7835.2020708@alum.mit.edu> (Michael Haggerty's message of "Wed, 08 Jan 2014 17:09:25 +0100")

Michael Haggerty <mhagger@alum•mit.edu> writes:

> It's not only racy WRT other processes.  If the current git process
> would create a new reference, it wouldn't be reflected in the cache.
>
> It's true that the main ref_cache doesn't invalidate itself
> automatically either when a new reference is created, so it's not really
> a fair complaint.  However, as we add places where the cache is
> invalidated, it is easy to overlook this cache that is stuck in static
> variables within a function definition and it is impossible to
> invalidate it.  Might it not be better to attach the cache to the
> ref_cache structure instead, and couple its lifetime to that object?
>
> Alternatively, the cache could be created and managed on the caller
> side, since the caller would know when the cache would have to be
> invalidated.  Also, different callers are likely to have different
> performance characteristics.  It is unlikely that the time to initialize
> the cache will be amortized in most cases; in fact, "rev-list --stdin"
> might be the *only* plausible use case.

True.

> Regarding the overall strategy: you gather all refnames that could be
> confused with an SHA-1 into a sha1_array, then later look up SHA-1s in
> the array to see if they are ambiguous.  This is a very special-case
> optimization for SHA-1s.
>
> I wonder whether another approach would gain almost the same amount of
> performance but be more general.  We could change dwim_ref() (or a
> version of it?) to read its data out of a ref_cache instead of going to
> disk every time.  Then, at the cost of populating the relevant parts of
> the ref_cache once, we would have fast dwim_ref() calls for all strings.

If opendir-readdir to grab only the names (but not values) of many
refs is a lot faster than stat-open-read a handful of dwim-ref
locations for a given name, that optimization might be worthwhile,
but I think that requires an update to read_loose_refs() not to
read_ref_full() and the users of refs API to instead lazily resolve
the refs, no?

If I ask for five names (say 'maint', 'master', 'next', 'pu',
'jch'), the current code will do 5 dwim_ref()s, each of which will
consult 6 locations with resolve_ref_unsafe(), totalling 30 calls to
resolve_ref_unsafe(), each of which in turn is essentially an open
followed by either an return on ENOENT or a read.  So 30 opens and 5
reads in total.

With your lazy ref_cache scheme, instead we would enumerate all the
loose ones in the same 6 directories (e.g. refs/tags/, refs/heads),
so 6 opendir()s with as many readdir()s as I have loose refs, plus
we open-read them in read_loose_refs() called from get_ref_dir()
with the current ref_cache code.  For me, "find .git/refs/heads"
gives 500+ lines of output, which suggests that using the ref_cache
mechanism for dwim_ref() may not be a huge win, unless it is updated
to be extremely lazy, and readdir()s turns out to be extremely less
heavier than open-read.  Also it is unlikely that the cost to
initialize the cache is amortized to be a net win unless we are
dealing with tons of dwim_ref()s.

next prev parent reply	other threads:[~2014-01-09 18:25 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-07  3:32 [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Brodie Rao
2014-01-07  3:35 ` Brodie Rao
2014-01-07 17:13   ` Jeff King
2014-01-07 17:51     ` Junio C Hamano
2014-01-07 17:52       ` Jeff King
2014-01-07 19:38         ` Junio C Hamano
2014-01-07 19:58           ` Jeff King
2014-01-07 20:31             ` Junio C Hamano
2014-01-07 22:08               ` Jeff King
2014-01-07 22:10                 ` [PATCH 1/4] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 22:10                 ` [PATCH 2/4] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 22:10                 ` [PATCH 3/4] cat-file: restore ambiguity warning flag " Jeff King
2014-01-07 22:11                 ` [PATCH 4/4] revision: turn off object/refname ambiguity check for --stdin Jeff King
2014-01-07 23:56                 ` [PATCH v2] speeding up 40-hex ambiguity check Jeff King
2014-01-07 23:57                   ` [PATCH v2 1/5] cat-file: refactor error handling of batch_objects Jeff King
2014-01-07 23:57                   ` [PATCH v2 2/5] cat-file: fix a minor memory leak in batch_objects Jeff King
2014-01-07 23:58                   ` [PATCH v2 3/5] refs: teach for_each_ref a flag to avoid recursion Jeff King
2014-01-08  3:47                     ` [PATCH v3 " Jeff King
2014-01-08 10:23                       ` Jeff King
2014-01-08 11:29                       ` Michael Haggerty
2014-01-09 21:49                         ` Jeff King
2014-01-10  8:59                           ` Michael Haggerty
2014-01-10  9:15                             ` Jeff King
2014-01-09 17:51                       ` Junio C Hamano
2014-01-09 21:55                         ` Jeff King
2014-01-07 23:59                   ` [PATCH v2 4/5] get_sha1: speed up ambiguous 40-hex test Jeff King
2014-01-08 16:09                     ` Michael Haggerty
2014-01-09 18:25                       ` Junio C Hamano [this message]
2014-01-10  9:41                       ` Jeff King
2014-01-14  9:50                         ` Jeff King
2014-01-14 11:34                           ` Michael Haggerty
2014-01-08  0:00                   ` [PATCH v2 5/5] get_sha1: drop object/refname ambiguity flag Jeff King
2014-01-08 16:34                     ` Michael Haggerty
2014-01-07  6:45 ` [PATCH] sha1_name: don't resolve refs when core.warnambiguousrefs is false Duy Nguyen
2014-01-07 17:24 ` Junio C Hamano
2014-01-07 19:23   ` Brodie Rao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq61ptau6v.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox$(echo .)com \
    --cc=brodie@sf$(echo .)io \
    --cc=git@vger$(echo .)kernel.org \
    --cc=mhagger@alum$(echo .)mit.edu \
    --cc=pclouds@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox