public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: "René Scharfe" <l.s.r@web•de>
Cc: "Nikolay Avdeev" <avdeev@math•vsu.ru>,
	git@vger•kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail•com>
Subject: Re: Bug report about symlinks
Date: Sun, 03 Aug 2014 10:19:21 -0700	[thread overview]
Message-ID: <xmqqk36ptrs6.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <53DCF14D.8040705@web.de> ("René Scharfe"'s message of "Sat, 02 Aug 2014 16:10:21 +0200")

René Scharfe <l.s.r@web•de> writes:

> How about the patch below?  Before it checks if an index entry exists
> in the work tree, it checks if its path includes a symlink.

Honestly, I didn't expect the fix to be in the refresh-index code
path, but doing this there sort of makes sense.

I think you, who dug to find out where to add the check, already
know this, and I am writing this mainly for myself and for the list
archive, but when the knee-jerk "has-syjmlink-leading-path missing?"
reaction came to me, two obvious optimizations also came to my mind:

 - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c"
   is a symbolic link, we mark it as ~CE_VALID, but at the same
   time, we learn "a/b/c/any/thing" are also ~CE_VALID with that
   check, so we _could_, after the has_symlink_leading_path once
   returns true, so there may be an opportunity to fast-forward the
   scan, marking all paths under "a/b/c" as ~CE_VALID.

 - After finding out "a/b/c" is a symbolic link to cause anything
   underneath marked as ~CE_VALID, we also know "a/" and "a/b/"
   exists as real directories, so a later check for "a/b/some/thing"
   can start checking at "a/b/some/" without checking "a/" and "a/b/".

The latter is more important as it is a much more common case that
the shape of the working tree not to change.

But I think the implementation of has_symlink_leading_path() already
has these optimizations built inside around the (unfortunately named)
"struct cache_def", so it probably would not give us much boost to
implement such a fast-forwarding of the scan.

> And do we need to use the threaded_ variant of the function here?

Hmmm, this is a tangent, but you comment made me wonder if we also
need to adjust preload_thread() in preload-index.c somehow, but we
do not touch CE_UPTODATE there, so it probably is not necessary.

The caller of refresh_cache_ent() is walking an array of sorted
pathnames aka istate->cache[] in a single-threaded fashion, possibly
with a pathspec to limit the scan.  Do you mean that we may want to
make istate->cache[] scanned by multiple threads?  I am not sure.

> diff --git a/read-cache.c b/read-cache.c
> index 5d3c8bd..6f0057f 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
>  		return ce;
>  	}
>  
> +	if (has_symlink_leading_path(ce->name, ce_namelen(ce))) {
> +		if (ignore_missing)
> +			return ce;
> +		if (err)
> +			*err = ENOENT;
> +		return NULL;
> +	}
> +
>  	if (lstat(ce->name, &st) < 0) {
>  		if (ignore_missing && errno == ENOENT)
>  			return ce;

  reply	other threads:[~2014-08-03 17:19 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-31 19:50 Bug report about symlinks Nikolay Avdeev
2014-07-31 22:04 ` René Scharfe
2014-08-01 16:23   ` Junio C Hamano
2014-08-02 14:10     ` René Scharfe
2014-08-03 17:19       ` Junio C Hamano [this message]
2014-08-03 22:59         ` René Scharfe
2014-08-04 16:34           ` Junio C Hamano
2014-08-09 17:43             ` [PATCH] read-cache: check for leading symlinks when refreshing index René Scharfe
2014-08-04 11:03         ` Bug report about symlinks Duy Nguyen
2014-08-04 16:36           ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2014-07-30 11:30 NickKolok
2014-08-01 19:10 ` Dennis Kaarsemaker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqk36ptrs6.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox$(echo .)com \
    --cc=avdeev@math$(echo .)vsu.ru \
    --cc=git@vger$(echo .)kernel.org \
    --cc=l.s.r@web$(echo .)de \
    --cc=pclouds@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox