From: Junio C Hamano <gitster@pobox•com>
To: Jeff King <peff@peff•net>
Cc: Andreas Krey <a.krey@gmx•de>,
Michael Haggerty <mhagger@alum•mit.edu>,
Git Mailing List <git@vger•kernel.org>
Subject: Re: [PATCH] refs.c: get_ref_cache: use a bucket hash
Date: Mon, 16 Mar 2015 22:35:18 -0700 [thread overview]
Message-ID: <xmqqd248p4o9.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20150317024005.GA26313@peff.net> (Jeff King's message of "Mon, 16 Mar 2015 22:40:05 -0400")
Jeff King <peff@peff•net> writes:
> The get_ref_cache code was designed to scale to the actual number of
> submodules. I do not mind seeing it become a hash if people really do
> have a large number of submodules, but that is not what is happening
> here.
> ...
> So git-clean speculatively asks "what is HEAD in this maybe-submodule?". The
> right solution is probably one of:
>
> 1. In remove_dirs, find out if we have an actual submodule before calling
> resolve_gitlink_ref.
>
> 2. Teach get_ref_cache a "read-only" mode that will not auto-vivify the cache
> if it does not already exist.
>
> Of the two, I think (1) is probably cleaner (I think the way the ref
> code is structured, we have to create the submodule ref_cache in order
> to start looking things up in it).
Thanks for a great analysis. I too wondered if we should be growing
the per-submodule ref-cache when we are only probing.
> It looks like we don't even really care about the value of HEAD. We just
> want to know "is it a git directory?". I think in other places (like
> "git add"), we just do an existence check for "$dir/.git". That would
> not catch a bare repository, but I do not think the current check does
> either (it is looking for submodules, which always have a .git).
If we wanted to be consistent, perhaps we should be reusing the "is
this a git repository?" check used by the auto-discovery codepath
(setup.c:is_git_directory(), perhaps?), but the idea looks simple
enough and sounds sensible.
> Maybe something like (largely untested):
>
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 98c103f..e2cc47b 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -148,6 +148,32 @@ static int exclude_cb(const struct option *opt, const char *arg, int unset)
> return 0;
> }
>
> +static int dir_is_repo(struct strbuf *path)
> +{
> + size_t orig = path->len;
> + int ret;
> +
> + strbuf_addstr(path, "/.git");
> + if (!access(path->buf, F_OK))
> + ret = 1; /* definitely */
> + else if (errno == ENOENT)
> + ret = 0; /* definitely not */
> + else {
> + /*
> + * We couldn't tell. It would probably be safer to err
> + * on the side of saying "yes" here, because we are
> + * deciding what to delete, and are more likely to keep
> + * a sub-repo. But it would probably also create annoying
> + * false positives, where a directory we do not have
> + * permission to read would say something misleading
> + * like "not deleting sub-repo foo..."
> + */
> + ret = 0;
> + }
> + strbuf_setlen(path, orig);
> + return ret;
> +}
> +
> static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
> int dry_run, int quiet, int *dir_gone)
> {
> @@ -155,13 +181,11 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
> struct strbuf quoted = STRBUF_INIT;
> struct dirent *e;
> int res = 0, ret = 0, gone = 1, original_len = path->len, len;
> - unsigned char submodule_head[20];
> struct string_list dels = STRING_LIST_INIT_DUP;
>
> *dir_gone = 1;
>
> - if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) &&
> - !resolve_gitlink_ref(path->buf, "HEAD", submodule_head)) {
> + if ((force_flag & REMOVE_DIR_KEEP_NESTED_GIT) && dir_is_repo(path)) {
> if (!quiet) {
> quote_path_relative(path->buf, prefix, "ed);
> printf(dry_run ? _(msg_would_skip_git_dir) : _(msg_skip_git_dir),
next prev parent reply other threads:[~2015-03-17 5:35 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-16 14:20 [PATCH] refs.c: get_ref_cache: use a bucket hash Andreas Krey
2015-03-16 17:19 ` Thomas Gummerer
2015-03-16 17:23 ` Junio C Hamano
2015-03-16 18:40 ` Andreas Krey
2015-03-17 2:40 ` Jeff King
2015-03-17 5:35 ` Junio C Hamano [this message]
2015-03-17 5:48 ` Jeff King
2015-11-13 15:29 ` Andreas Krey
2015-11-14 0:01 ` Jeff King
2015-11-14 13:22 ` Andreas Krey
2015-11-14 13:35 ` Andreas Krey
2015-11-16 16:31 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqd248p4o9.fsf@gitster.dls.corp.google.com \
--to=gitster@pobox$(echo .)com \
--cc=a.krey@gmx$(echo .)de \
--cc=git@vger$(echo .)kernel.org \
--cc=mhagger@alum$(echo .)mit.edu \
--cc=peff@peff$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox