public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jacob Keller <jacob.e.keller@intel•com>
To: Johannes Schindelin <Johannes.Schindelin@gmx•de>
Cc: <git@vger•kernel.org>, Junio C Hamano <gitster@pobox•com>,
	Jacob Keller <jacob.keller@gmail•com>
Subject: Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
Date: Tue, 23 Sep 2025 15:48:41 -0700	[thread overview]
Message-ID: <a9cecd57-e683-4efd-9c79-5618000319f3@intel.com> (raw)
In-Reply-To: <c75ec5f9-407a-6555-d4fb-bb629d54ec61@gmx.de>


[-- Attachment #1.1: Type: text/plain, Size: 7045 bytes --]



On 9/23/2025 7:57 AM, Johannes Schindelin wrote:
> Hi Jacob,
> 
> I know this is about a patch that you contributed four months ago, and
> the usual feedback required sweeping changes, including this one that was
> introduced in v4:
> 

Hello,

(Replying from my work address since its easier today than trying to
load my personal email up).

It's been quite some time since I did this so my memory is a bit hazy on
the logic. It took me a while to get things working, and its very likely
I missed corner cases.

> On Wed, 21 May 2025, Jacob Keller wrote:
> 
>> diff --git a/diff-no-index.c b/diff-no-index.c
>> index 9739b2b268b9..4aeeb98cfa8f 100644
>> --- a/diff-no-index.c
>> +++ b/diff-no-index.c
>> @@ -15,20 +15,45 @@
>>  #include "gettext.h"
>>  #include "revision.h"
>>  #include "parse-options.h"
>> +#include "pathspec.h"
>>  #include "string-list.h"
>>  #include "dir.h"
>>  
>> -static int read_directory_contents(const char *path, struct string_list *list)
>> +static int read_directory_contents(const char *path, struct string_list *list,
>> +				   const struct pathspec *pathspec,
>> +				   int skip)
>>  {
>> +	struct strbuf match = STRBUF_INIT;
>> +	int len;
>>  	DIR *dir;
>>  	struct dirent *e;
>>  
>>  	if (!(dir = opendir(path)))
>>  		return error("Could not open directory %s", path);
>>  
>> -	while ((e = readdir_skip_dot_and_dotdot(dir)))
>> -		string_list_insert(list, e->d_name);
>> +	if (pathspec) {
>> +		strbuf_addstr(&match, path);
>> +		strbuf_complete(&match, '/');
>> +		strbuf_remove(&match, 0, skip);
> 
> Okay, so here the `read_directory_contents()` function learns to
> optionally skip `skip` bytes from the `path` variable, after potentially
> appending a `/`.
> 
>>  [...]
>> @@ -337,7 +369,23 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
>>  		paths[i] = p;
>>  	}
>>  
>> -	fixup_paths(paths, &replacement);
>> +	if (fixup_paths(paths, &replacement)) {
>> +		parse_pathspec(&pathspec, PATHSPEC_FROMTOP | PATHSPEC_ATTR,
>> +			       PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY,
>> +			       NULL, &argv[2]);
>> +		if (pathspec.nr)
>> +			ps = &pathspec;
>> +
>> +		skip1 = strlen(paths[0]);
>> +		skip1 += paths[0][skip1] == '/' ? 0 : 1;
> 
> Since `skip1` is defined as the length of `path[0]`, I would expect
> `paths[0][skip1]` to always evaluate to NUL, and therefore the `== '/'`
> condition to always evaluate to `false`. Did I miss anything?
> 

You're correct that obviously makes no sense.. :D

>> +		skip2 = strlen(paths[1]);
>> +		skip2 += paths[1][skip2] == '/' ? 0 : 1;
> 
> Same here, `paths[1][skip2]` should always return `NUL`.
> 
> This has ramifications where `skip1` and `skip2` are each one larger than
> the length of `paths[0]` and `paths[1]`, respectively, and hence the code
> in `read_directory_contents()` will now try to remove one more than the
> length of the path, after potentially appending a slash.
> 
> But what if there is already a slash? The answer is:
> 
>   $ git diff --no-index -- /tmp/ /tmp/ ':!'
>   fatal: `pos + len' is too far after the end of the buffer
> 
> This has been reported (with Windows paths, don't let that distract) in
> https://github.com/git-for-windows/git/issues/5836.
> 
> I _think_ that what the patch should have done instead was:
> 
> 	if (skip1 > 0 && paths[0][skip1 - 1] == '/')
> 		skip1--;
> 
> and likewise
> 
> 	if (skip2 > 0 && paths[1][skip2 - 1] == '/')
> 		skip2--;
> 
> Focusing on the lines' correctness (which I don't think was the primary
> concern in the review of your patch), that would be what I would suggest.
> 

Perhaps. I need to dig into this code to see what the whole point was. I
think the idea is that I'm trying to skip past the common prefix?

> However, this makes me wonder whether the logic itself is sound? It is not
> immediately obvious to me why the `paths[0]` and `paths[1]` values aren't
> matched against the pathspec yet their entirety is seemingly skipped in
> `read_directory_contents()`?

I recall fiddling a lot to try and get this working. The idea here is
that fixup_paths does some conversions to handle the DWIM logic where a
"diff D F" becomes "diff D/F F". It returns true if both paths are
directories, so we only enter this block when both paths are
directories. (Which is required because we only support pathspec
limiting for directory differences).

We are calculating the skip length which is then passed into the
queue_diff function.

We pass the skip value into the queue_diff function, and this is the
length of the prefix of each path to skip. Essentially, we're skipping
everything from start of the path up to the root of what you pass into
the function.

The queue_diff then descends into each directory, and creates new paths
which are all the files and directories recursively under the target
directory. Each of these needs to have its root skipped (everything the
user supplied) in order to allow path spec matching to work, since we
apply pathspec matches as if you're at the root of the two trees being
diffed.

Crucially, we pass the skip value in to the first call of queue_diff,
but it remains unchanged as we descend further in, so we keep cutting
off only the first part of the path structure as we compare.

You're correct that the logic for calculating this incorrectly, as it
apparently doesn't work right if the paths end in a slash already. I'd
have to dig farther to figure out if this proposed patch is correct. I'm
not certain.

Basically, read_directory_contents is given the path to current
directory + the name of a file under the directory. We convert "path" to
be "dir/", then we remove the prefix of everything that was passed by
the user, so that we only check against parts of the dir path which are
recursively below the original passed directory. But we need to be
careful that we strip out the slash as well as the overall path, so that
we don't end up comparing subdirectories as if they're absolute paths.

Basically, if path does contain a slash, then the strlen alone is
sufficient, but if it doesn't contain a slash, then we need to avoid
adding 1. It happens that we incorrectly checked for this, which results
in us always adding 1, so when a string has a slash we skip too much.

It should be pretty easy to add a test case for this, and I think the
correct check is actually something like:

skip1 = strlen(paths[0])
if (!skip1 || paths[0][skip1 - 1] == '/')
  skip1++

Basically, we actually want one to be the length of the string plus a
trailing slash. If the string doesn't include a trailing slash, we have
to add 1. If it does include one, we don't add one, since the length
already accounts for the slash. Note that if the string is 0 length,
there's no slash so skip1 should be 1, hence the !skip1 part of the check.

Let me see if I can hook up a test case and confirm this.
>
> Ciao,
> Johannes


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

  reply	other threads:[~2025-09-23 22:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
2025-05-21 23:29 ` [PATCH v4 1/3] pathspec: add match_leading_pathspec variant Jacob Keller
2025-05-21 23:29 ` [PATCH v4 2/3] pathspec: add flag to indicate operation without repository Jacob Keller
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
2025-06-04  2:37   ` Ben Knoble
2025-06-04 17:22     ` Jacob Keller
2025-06-04 18:27     ` Jacob Keller
2025-06-04 20:19       ` Junio C Hamano
2025-06-04 21:05         ` Jacob Keller
2025-06-04 21:36           ` D. Ben Knoble
2025-06-04 23:22             ` Junio C Hamano
2025-09-23 14:57   ` Johannes Schindelin
2025-09-23 22:48     ` Jacob Keller [this message]
2025-09-24 11:19       ` Johannes Schindelin
2025-09-24 18:19         ` Jacob Keller
2025-09-24 18:23           ` Jacob Keller
2025-05-22 21:37 ` [PATCH v4 0/3] diff: add pathspec support to --no-index Junio C Hamano
2025-05-22 21:50   ` Jacob Keller
2025-05-22 22:04     ` Junio C Hamano
2025-06-03 21:12 ` Junio C Hamano
2025-06-04  2:32   ` Ben Knoble
2025-06-05 15:34   ` Phillip Wood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a9cecd57-e683-4efd-9c79-5618000319f3@intel.com \
    --to=jacob.e.keller@intel$(echo .)com \
    --cc=Johannes.Schindelin@gmx$(echo .)de \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=jacob.keller@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox