From: Ben Knoble <ben.knoble@gmail•com>
To: Jacob Keller <jacob.e.keller@intel•com>
Cc: git@vger•kernel.org, Junio C Hamano <gitster@pobox•com>,
Jacob Keller <jacob.keller@gmail•com>
Subject: Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
Date: Tue, 3 Jun 2025 22:37:28 -0400 [thread overview]
Message-ID: <374BC043-8FB8-4052-BDE7-6BAE7F182994@gmail.com> (raw)
In-Reply-To: <20250521232917.2333291-4-jacob.e.keller@intel.com>
Actually, one comment :)
> Le 21 mai 2025 à 19:29, Jacob Keller <jacob.e.keller@intel•com> a écrit :
>
> From: Jacob Keller <jacob.keller@gmail•com>
>
> The --no-index option of git-diff enables using the diff machinery from
> git while operating outside of a repository. This mode of git diff is
> able to compare directories and produce a diff of their contents.
>
> When operating git diff in a repository, git has the notion of
> "pathspecs" which can specify which files to compare. In particular,
> when using git to diff two trees, you might invoke:
>
> $ git diff-tree -r <treeish1> <treeish2>.
I do find it slightly confusing that this series and in particular this patch is all about git-diff(1), but the only example is about git-diff-tree(1). It’s not the best example to me, esp. since it doesn’t actually use the pathspec machinery (deferring that to prose only). But I get the gist, so not really an issue.
Rereading a bit, it seems this message goes to lengths to teach readers about pathspecs for git-diff here; perhaps we can simplify those parts and assume the reader is familiar enough with the details to understand the implications of « no-index mode doesn’t support pathspecs to limit comparison »?
Nit: Should the diff-tree command end with a period?
> where the treeish could point to a subdirectory of the repository.
>
> When invoked this way, users can limit the selected paths of the tree by
> using a pathspec. Either by providing some list of paths to accept, or
> by removing paths via a negative refspec.
>
> The git diff --no-index mode does not support pathspecs, and cannot
> limit the diff output in this way. Other diff programs such as GNU
> difftools have options for excluding paths based on a pattern match.
> However, using git diff as a diff replacement has several advantages
> over many popular diff tools, including coloring moved lines, rename
> detections, and similar.
>
> Teach git diff --no-index how to handle pathspecs to limit the
> comparisons. This will only be supported if both provided paths are
> directories.
>
> For comparisons where one path isn't a directory, the --no-index mode
> already has some DWIM shortcuts implemented in the fixup_paths()
> function.
>
> Modify the fixup_paths function to return 1 if both paths are
> directories. If this is the case, interpret any extra arguments to git
> diff as pathspecs via parse_pathspec.
>
> Use parse_pathspec to load the remaining arguments (if any) to git diff
> --no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
> not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
> since we do not have a repository root. All pathspecs are treated as
> rooted at the provided comparison paths.
>
> After loading the pathspec data, calculate skip offsets for skipping
> past the root portion of the paths. This is required to ensure that
> pathspecs start matching from the provided path, rather than matching
> from the absolute path. We could instead pass the paths as prefix values
> to parse_pathspec. This is slightly problematic because the paths come
> from the command line and don't necessarily have the proper trailing
> slash. Additionally, that would require parsing pathspecs multiple
> times.
>
> Pass the pathspec object and the skip offsets into queue_diff, which
> in-turn must pass them along to read_directory_contents.
>
> Modify read_directory_contents to check against the pathspecs when
> scanning the directory. Use the skip offset to skip past the initial
> root of the path, and only match against portions that are below the
> intended directory structure being compared.
>
> The search algorithm for finding paths is recursive with read_dir. To
> make pathspec matching work properly, we must set both
> DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.
>
> Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
> pathspecs like "a/b/c". This is usually achieved by setting the is_dir
> parameter of match_pathspec.
>
> Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
> against pathspecs like "a/b/c/d". This is crucial because we recursively
> iterate down the directories. We could simply avoid checking pathspecs
> at subdirectories, but this would force recursion down directories
> which would simply be skipped.
>
> If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
> incorrectly match in certain cases such as matching 'a/c' against
> ':(glob)**/d'. The match logic will see that a matches the leading part
> of the **/ and accept this even tho c doesn't match.
>
> To avoid this, use the match_leading_pathspec() variant recently
> introduced. This sets both flags when is_dir is set, but leaves them
> both cleared when is_dir is 0.
>
> Add test cases and documentation covering the new functionality. Note
> for the documentation I opted not to move the placement of '--' which is
> sometimes used to disambiguate arguments. The diff --no-index mode
> requires exactly 2 arguments determining what to compare. Any additional
> arguments are interpreted as pathspecs and must come afterwards. Use of
> '--' would not actually disambiguate anything, since there will never be
> ambiguity over which arguments represent paths or pathspecs.
>
> Signed-off-by: Jacob Keller <jacob.keller@gmail•com>
> ---
> builtin/diff.c | 2 +-
> diff-no-index.c | 85 +++++++++++++++++++++++++++++--------
> Documentation/git-diff.adoc | 10 +++--
> t/t4053-diff-no-index.sh | 75 ++++++++++++++++++++++++++++++++
> 4 files changed, 151 insertions(+), 21 deletions(-)
>
> diff --git a/builtin/diff.c b/builtin/diff.c
> index fa963808c318..c6231edce4e8 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -35,7 +35,7 @@ static const char builtin_diff_usage[] =
> " or: git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]\n"
> " or: git diff [<options>] <commit>...<commit> [--] [<path>...]\n"
> " or: git diff [<options>] <blob> <blob>\n"
> -" or: git diff [<options>] --no-index [--] <path> <path>"
> +" or: git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]"
> "\n"
> COMMON_DIFF_OPTIONS_HELP;
>
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 9739b2b268b9..4aeeb98cfa8f 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -15,20 +15,45 @@
> #include "gettext.h"
> #include "revision.h"
> #include "parse-options.h"
> +#include "pathspec.h"
> #include "string-list.h"
> #include "dir.h"
>
> -static int read_directory_contents(const char *path, struct string_list *list)
> +static int read_directory_contents(const char *path, struct string_list *list,
> + const struct pathspec *pathspec,
> + int skip)
> {
> + struct strbuf match = STRBUF_INIT;
> + int len;
> DIR *dir;
> struct dirent *e;
>
> if (!(dir = opendir(path)))
> return error("Could not open directory %s", path);
>
> - while ((e = readdir_skip_dot_and_dotdot(dir)))
> - string_list_insert(list, e->d_name);
> + if (pathspec) {
> + strbuf_addstr(&match, path);
> + strbuf_complete(&match, '/');
> + strbuf_remove(&match, 0, skip);
>
> + len = match.len;
> + }
> +
> + while ((e = readdir_skip_dot_and_dotdot(dir))) {
> + if (pathspec) {
> + strbuf_setlen(&match, len);
> + strbuf_addstr(&match, e->d_name);
> +
> + if (!match_leading_pathspec(NULL, pathspec,
> + match.buf, match.len,
> + 0, NULL, e->d_type == DT_DIR ? 1 : 0))
> + continue;
> + }
> +
> + string_list_insert(list, e->d_name);
> + }
> +
> + strbuf_release(&match);
> closedir(dir);
> return 0;
> }
> @@ -131,7 +156,8 @@ static struct diff_filespec *noindex_filespec(const struct git_hash_algo *algop,
> }
>
> static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
> - const char *name1, const char *name2, int recursing)
> + const char *name1, const char *name2, int recursing,
> + const struct pathspec *ps, int skip1, int skip2)
> {
> int mode1 = 0, mode2 = 0;
> enum special special1 = SPECIAL_NONE, special2 = SPECIAL_NONE;
> @@ -171,9 +197,9 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
> int i1, i2, ret = 0;
> size_t len1 = 0, len2 = 0;
>
> - if (name1 && read_directory_contents(name1, &p1))
> + if (name1 && read_directory_contents(name1, &p1, ps, skip1))
> return -1;
> - if (name2 && read_directory_contents(name2, &p2)) {
> + if (name2 && read_directory_contents(name2, &p2, ps, skip2)) {
> string_list_clear(&p1, 0);
> return -1;
> }
> @@ -218,7 +244,7 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
> n2 = buffer2.buf;
> }
>
> - ret = queue_diff(o, algop, n1, n2, 1);
> + ret = queue_diff(o, algop, n1, n2, 1, ps, skip1, skip2);
> }
> string_list_clear(&p1, 0);
> string_list_clear(&p2, 0);
> @@ -258,8 +284,10 @@ static void append_basename(struct strbuf *path, const char *dir, const char *fi
> * DWIM "diff D F" into "diff D/F F" and "diff F D" into "diff F D/F"
> * Note that we append the basename of F to D/, so "diff a/b/file D"
> * becomes "diff a/b/file D/file", not "diff a/b/file D/a/b/file".
> + *
> + * Return 1 if both paths are directories, 0 otherwise.
> */
> -static void fixup_paths(const char **path, struct strbuf *replacement)
> +static int fixup_paths(const char **path, struct strbuf *replacement)
> {
> struct stat st;
> unsigned int isdir0 = 0, isdir1 = 0;
> @@ -282,26 +310,31 @@ static void fixup_paths(const char **path, struct strbuf *replacement)
> if ((isdir0 && ispipe1) || (ispipe0 && isdir1))
> die(_("cannot compare a named pipe to a directory"));
>
> - if (isdir0 == isdir1)
> - return;
> + /* if both paths are directories, we will enable pathspecs */
> + if (isdir0 && isdir1)
> + return 1;
> +
> if (isdir0) {
> append_basename(replacement, path[0], path[1]);
> path[0] = replacement->buf;
> - } else {
> + } else if (isdir1) {
> append_basename(replacement, path[1], path[0]);
> path[1] = replacement->buf;
> }
> +
> + return 0;
> }
>
> static const char * const diff_no_index_usage[] = {
> - N_("git diff --no-index [<options>] <path> <path>"),
> + N_("git diff --no-index [<options>] <path> <path> [<pathspec>...]"),
> NULL
> };
>
> int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> int implicit_no_index, int argc, const char **argv)
> {
> - int i, no_index;
> + struct pathspec pathspec, *ps = NULL;
> + int i, no_index, skip1 = 0, skip2 = 0;
> int ret = 1;
> const char *paths[2];
> char *to_free[ARRAY_SIZE(paths)] = { 0 };
> @@ -317,13 +350,12 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> options = add_diff_options(no_index_options, &revs->diffopt);
> argc = parse_options(argc, argv, revs->prefix, options,
> diff_no_index_usage, 0);
> - if (argc != 2) {
> + if (argc < 2) {
> if (implicit_no_index)
> warning(_("Not a git repository. Use --no-index to "
> "compare two paths outside a working tree"));
> usage_with_options(diff_no_index_usage, options);
> }
> - FREE_AND_NULL(options);
> for (i = 0; i < 2; i++) {
> const char *p = argv[i];
> if (!strcmp(p, "-"))
> @@ -337,7 +369,23 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> paths[i] = p;
> }
>
> - fixup_paths(paths, &replacement);
> + if (fixup_paths(paths, &replacement)) {
> + parse_pathspec(&pathspec, PATHSPEC_FROMTOP | PATHSPEC_ATTR,
> + PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY,
> + NULL, &argv[2]);
> + if (pathspec.nr)
> + ps = &pathspec;
> +
> + skip1 = strlen(paths[0]);
> + skip1 += paths[0][skip1] == '/' ? 0 : 1;
> + skip2 = strlen(paths[1]);
> + skip2 += paths[1][skip2] == '/' ? 0 : 1;
> + } else if (argc > 2) {
> + warning(_("Limiting comparison with pathspecs is only "
> + "supported if both paths are directories."));
> + usage_with_options(diff_no_index_usage, options);
> + }
> + FREE_AND_NULL(options);
>
> revs->diffopt.skip_stat_unmatch = 1;
> if (!revs->diffopt.output_format)
> @@ -354,7 +402,8 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> setup_diff_pager(&revs->diffopt);
> revs->diffopt.flags.exit_with_status = 1;
>
> - if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0))
> + if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0, ps,
> + skip1, skip2))
> goto out;
> diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/");
> diffcore_std(&revs->diffopt);
> @@ -370,5 +419,7 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> for (i = 0; i < ARRAY_SIZE(to_free); i++)
> free(to_free[i]);
> strbuf_release(&replacement);
> + if (ps)
> + clear_pathspec(ps);
> return ret;
> }
> diff --git a/Documentation/git-diff.adoc b/Documentation/git-diff.adoc
> index dec173a34517..272331afbaec 100644
> --- a/Documentation/git-diff.adoc
> +++ b/Documentation/git-diff.adoc
> @@ -14,7 +14,7 @@ git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]
> git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]
> git diff [<options>] <commit>...<commit> [--] [<path>...]
> git diff [<options>] <blob> <blob>
> -git diff [<options>] --no-index [--] <path> <path>
> +git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]
>
> DESCRIPTION
> -----------
> @@ -31,14 +31,18 @@ files on disk.
> further add to the index but you still haven't. You can
> stage these changes by using linkgit:git-add[1].
>
> -`git diff [<options>] --no-index [--] <path> <path>`::
> +`git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]`::
>
> This form is to compare the given two paths on the
> filesystem. You can omit the `--no-index` option when
> running the command in a working tree controlled by Git and
> at least one of the paths points outside the working tree,
> or when running the command outside a working tree
> - controlled by Git. This form implies `--exit-code`.
> + controlled by Git. This form implies `--exit-code`. If both
> + paths point to directories, additional pathspecs may be
> + provided. These will limit the files included in the
> + difference. All such pathspecs must be relative as they
> + apply to both sides of the diff.
>
> `git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]`::
>
> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 5e5bad61ca1e..01db9243abfe 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -295,4 +295,79 @@ test_expect_success PIPE,SYMLINKS 'diff --no-index reads from pipes' '
> test_cmp expect actual
> '
>
> +test_expect_success 'diff --no-index F F rejects pathspecs' '
> + test_must_fail git diff --no-index -- a/1 a/2 a 2>actual.err &&
> + test_grep "usage: git diff --no-index" actual.err
> +'
> +
> +test_expect_success 'diff --no-index D F rejects pathspecs' '
> + test_must_fail git diff --no-index -- a a/2 a 2>actual.err &&
> + test_grep "usage: git diff --no-index" actual.err
> +'
> +
> +test_expect_success 'diff --no-index F D rejects pathspecs' '
> + test_must_fail git diff --no-index -- a/1 b b 2>actual.err &&
> + test_grep "usage: git diff --no-index" actual.err
> +'
> +
> +test_expect_success 'diff --no-index rejects absolute pathspec' '
> + test_must_fail git diff --no-index -- a b $(pwd)/a/1
> +'
> +
> +test_expect_success 'diff --no-index with pathspec' '
> + test_expect_code 1 git diff --name-status --no-index a b 1 >actual &&
> + cat >expect <<-EOF &&
> + D a/1
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'diff --no-index with pathspec no matches' '
> + test_expect_code 0 git diff --name-status --no-index a b missing
> +'
> +
> +test_expect_success 'diff --no-index with negative pathspec' '
> + test_expect_code 1 git diff --name-status --no-index a b ":!2" >actual &&
> + cat >expect <<-EOF &&
> + D a/1
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'setup nested' '
> + mkdir -p c/1/2 &&
> + mkdir -p d/1/2 &&
> + echo 1 >c/1/2/a &&
> + echo 2 >c/1/2/b
> +'
> +
> +test_expect_success 'diff --no-index with pathspec nested negative pathspec' '
> + test_expect_code 0 git diff --no-index c d ":!1"
> +'
> +
> +test_expect_success 'diff --no-index with pathspec nested pathspec' '
> + test_expect_code 1 git diff --name-status --no-index c d 1/2 >actual &&
> + cat >expect <<-EOF &&
> + D c/1/2/a
> + D c/1/2/b
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'diff --no-index with pathspec glob' '
> + test_expect_code 1 git diff --name-status --no-index c d ":(glob)**/a" >actual &&
> + cat >expect <<-EOF &&
> + D c/1/2/a
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'diff --no-index with pathspec glob and exclude' '
> + test_expect_code 1 git diff --name-status --no-index c d ":(glob,exclude)**/a" >actual &&
> + cat >expect <<-EOF &&
> + D c/1/2/b
> + EOF
> + test_cmp expect actual
> +'
> +
> test_done
> --
> 2.48.1.397.gec9d649cc640
next prev parent reply other threads:[~2025-06-04 2:37 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
2025-05-21 23:29 ` [PATCH v4 1/3] pathspec: add match_leading_pathspec variant Jacob Keller
2025-05-21 23:29 ` [PATCH v4 2/3] pathspec: add flag to indicate operation without repository Jacob Keller
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
2025-06-04 2:37 ` Ben Knoble [this message]
2025-06-04 17:22 ` Jacob Keller
2025-06-04 18:27 ` Jacob Keller
2025-06-04 20:19 ` Junio C Hamano
2025-06-04 21:05 ` Jacob Keller
2025-06-04 21:36 ` D. Ben Knoble
2025-06-04 23:22 ` Junio C Hamano
2025-09-23 14:57 ` Johannes Schindelin
2025-09-23 22:48 ` Jacob Keller
2025-09-24 11:19 ` Johannes Schindelin
2025-09-24 18:19 ` Jacob Keller
2025-09-24 18:23 ` Jacob Keller
2025-05-22 21:37 ` [PATCH v4 0/3] diff: add pathspec support to --no-index Junio C Hamano
2025-05-22 21:50 ` Jacob Keller
2025-05-22 22:04 ` Junio C Hamano
2025-06-03 21:12 ` Junio C Hamano
2025-06-04 2:32 ` Ben Knoble
2025-06-05 15:34 ` Phillip Wood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=374BC043-8FB8-4052-BDE7-6BAE7F182994@gmail.com \
--to=ben.knoble@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=jacob.e.keller@intel$(echo .)com \
--cc=jacob.keller@gmail$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox