From: Toon Claes <toon@iotcl•com>
To: Patrick Steinhardt <ps@pks•im>
Cc: git@vger•kernel.org, Jeff King <peff@peff•net>,
Justin Tobler <jltobler@gmail•com>
Subject: Re: [PATCH 3/3] diff: teach tree-diff a max-depth parameter
Date: Wed, 06 Aug 2025 16:49:30 +0200 [thread overview]
Message-ID: <87jz3gtshx.fsf@iotcl.com> (raw)
In-Reply-To: <aIm5x4kah8608Ba5@pks.im>
Patrick Steinhardt <ps@pks•im> writes:
> On Tue, Jul 29, 2025 at 08:57:44PM +0200, Toon Claes wrote:
>> From: Jeff King <peff@peff•net>
>>
>> When you are doing a tree-diff, there are basically two options: do not
>> recurse into subtrees at all, or recurse indefinitely. While most
>> callers would want to always recurse and see full pathnames, some may
>> want the efficiency of looking only at a particular level of the tree.
>> This is currently easy to do for the top-level (just turn off
>> recursion), but you cannot say "show me what changed in subdir/, but do
>> not recurse".
>>
>> This patch adds a max-depth parameter which is measured from the closest
>> pathspec match, so that you can do:
>>
>> git log --raw --max-depth=1 -- a/b/c
>>
>> and see the raw output for a/b/c/, but not those of a/b/c/d/
>> (instead of the raw output you would see for a/b/c/d).
>
> Hm, okay. So what happens if I pass both "a/b" and "a/b/c"? Would I see
> the contents of both trees?
Yes, the max-depth applies to both pathspecs and entries that match
either of them are shown.
>
>> diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
>> index f3a35d8141..46e6ed2d67 100644
>> --- a/Documentation/diff-options.adoc
>> +++ b/Documentation/diff-options.adoc
>> @@ -893,5 +893,33 @@ endif::git-format-patch[]
>> reverted with `--ita-visible-in-index`. Both options are
>> experimental and could be removed in future.
>>
>> +--max-depth=<n>::
>> +
>> + Limit diff recursion to `<n>` levels (implies `-r`). The depth
>> + is measured from the closest pathspec. Given a tree containing
>> + `foo/bar/baz`, the following list shows the matches generated by
>> + each set of options:
>> ++
>> +--
>> + - `--max-depth=0 -- foo`: `foo`
>> +
>> + - `--max-depth=1 -- foo`: `foo/bar`
>> +
>> + - `--max-depth=1 -- foo/bar`: `foo/bar/baz`
>> +
>> + - `--max-depth=1 -- foo foo/bar`: `foo/bar/baz`
>
> This partially answers my question, as we talk about the scenario where
> we have "foo/bar/baz", but no "foo/qux". If we had the latter, would
> that also be displayed here because it's 1 level deep from the closest
> matching pathspec ("foo")?
"foo/qux" is indeed within 1 level deep from "foo", so yes "foo/qux"
will be shown. "foo/qux/duck" not obviously.
>
>> + - `--max-depth=2 -- foo`: `foo/bar/baz`
>> +--
>> ++
>> +If no pathspec is given, the depth is measured as if all
>> +top-level entries were specified. Note that this is different
>> +than measuring from the root, in that `--max-depth=0` would
>> +still return `foo`. This allows you to still limit depth while
>> +asking for a subset of the top-level entries.
>> ++
>> +Note that this option is only supported for diffs between tree objects,
>> +not against the index or working tree.
>
> Let's also note that wildcard pathspecs aren't supported.
Ah yes, good point. Will add.
>> For more detailed explanation on these common options, see also
>> linkgit:gitdiffcore[7].
>> diff --git a/diff-lib.c b/diff-lib.c
>> index 244468dd1a..fa7c24c267 100644
>> --- a/diff-lib.c
>> +++ b/diff-lib.c
>> @@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>> uint64_t start = getnanotime();
>> struct index_state *istate = revs->diffopt.repo->index;
>>
>> + if (revs->diffopt.max_depth_valid)
>> + die("max-depth is not supported for worktree diffs");
>
> This and the following error messages should be made translatable.
True. Will do.
>> diff --git a/diff.c b/diff.c
>> index dca87e164f..c03a59ac3b 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -5622,6 +5625,18 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns
>> return 0;
>> }
>>
>> +static int diff_opt_max_depth(const struct option *opt,
>> + const char *arg, int unset)
>> +{
>> + struct diff_options *options = opt->value;
>> +
>> + BUG_ON_OPT_NEG(unset);
>> + options->flags.recursive = 1;
>> + options->max_depth = strtol(arg, NULL, 10);
>
> We're missing error handling in case the argument is not an integer. We
> should probably use `git_parse_unsigned()` instead.
Absolutely. Well, TIL!
>
>> @@ -5894,6 +5909,10 @@ struct option *add_diff_options(const struct option *opts,
>> OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
>> N_("select files by diff type"),
>> PARSE_OPT_NONEG, diff_opt_diff_filter),
>> + OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
>> + N_("maximum tree depth to recurse"),
>> + PARSE_OPT_NONEG, diff_opt_max_depth),
>> +
>> {
>> .type = OPTION_CALLBACK,
>> .long_name = "output",
>
> Okay. We don't use `OPT_UNSIGNED()` because we also want to impliy the
> `recursive` flag. Wouldn't it be simpler though to use `OPT_UNSIGNED()`
> and then set the flag in `diff_setup_done()` like we already do for a
> couple of other options?
But how would you determine `max_depth_valid`?
And, without realizing, you touch a good point here. Looking at the
git-grep(1) docs, it says:
For each <pathspec> given on command line, descend at most <depth>
levels of directories. A value of -1 means no limit.
And:
-r::
--recursive::
Same as `--max-depth=-1`; this is the default.
We should handle -1 the same, that's currently not the case.
>> diff --git a/diff.h b/diff.h
>> index 62e5768a9a..e3767df237 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -404,6 +404,15 @@ struct diff_options {
>> struct strmap *additional_path_headers;
>>
>> int no_free;
>> +
>> + /*
>> + * The extra "valid" flag is a slight hack. The value "0" is perfectly
>> + * valid for max-depth. We would normally use -1 to indicate "not set",
>> + * but there are many code paths which assume that assume that just
>
> Double "assume that".
Thanks!
>> + * zero-ing out a diff_options is enough to initialize it.
>> + */
>> + int max_depth;
>> + int max_depth_valid;
>
> So in that case, shouldn't we convert `max_depth` to be unsigned and
> `max_depth_valid` to be a boolean?
As I mentioned above, -1 should mean "No limit", but should enable
recursion. Now, that doesn't mean we can't make the changes you suggest,
we just have to take that into account.
>> diff --git a/tree-diff.c b/tree-diff.c
>> index e00fc2f450..acd302500f 100644
>> --- a/tree-diff.c
>> +++ b/tree-diff.c
>> @@ -48,6 +49,73 @@
>> free((x)); \
>> } while(0)
>>
>> +/* Returns true if and only if "dir" is a leading directory of "path" */
>> +static int is_dir_prefix(const char *path, const char *dir, int dirlen)
>> +{
>> + return !strncmp(path, dir, dirlen) &&
>> + (!path[dirlen] || path[dirlen] == '/');
>> +}
>> +
>> +static int check_recursion_depth(struct strbuf *name,
>
> Let's mark the `name` parameter as `const` both here and in
> `should_recurse()` so that it becomes clear that it shouldn't be
> modified.
:+1:
>> + const struct pathspec *ps,
>> + int max_depth)
>> +{
>> + int i;
>> +
>> + if (!ps->nr)
>> + return within_depth(name->buf, name->len, 1, max_depth);
>> +
>> + /*
>> + * We look through the pathspecs in reverse-sorted order, because we
>> + * want to find the longest match first (e.g., "a/b" is better for
>> + * checking depth than "a/b/c").
>> + */
>> + for (i = ps->nr - 1; i >= 0; i--) {
>
> `nr` is of type `int` indeed, so the loop index is correct even though
> it feels wrong. But that's nothing we have to fix as part of this patch
> series. We could inline the declaration though and make it loop-local.
Ack.
>> + const struct pathspec_item *item = ps->items+i;
>> +
>> + /*
>> + * If the name to match is longer than the pathspec, then we
>> + * are only interested if the pathspec matches and we are
>> + * within the allowed depth.
>> + */
>> + if (name->len >= item->len) {
>> + if (!is_dir_prefix(name->buf, item->match, item->len))
>> + continue;
>> + return within_depth(name->buf + item->len,
>> + name->len - item->len,
>> + 1, max_depth);
>> + }
>> +
>> + /*
>> + * Otherwise, our name is shorter than the pathspec. We need to
>> + * check if it is a prefix of the pathspec; if so, we must
>> + * always recurse in order to process further (the resulting
>> + * paths we find might or might not match our pathspec, but we
>> + * cannot know until we recurse).
>> + */
>> + if (is_dir_prefix(item->match, name->buf, name->len))
>> + return 1;
>> + }
>> + return 0;
>> +}
>> +
>> +static int should_recurse(struct strbuf *name, struct diff_options *opt)
>> +{
>> + if (!opt->flags.recursive)
>> + return 0;
>> + if (!opt->max_depth_valid)
>> + return 1;
>> +
>> + /*
>> + * We catch this during diff_setup_done, but let's double-check
>> + * against any internal munging.
>> + */
>> + if (opt->pathspec.has_wildcard)
>> + die("BUG: wildcard pathspecs are incompatible with max-depth");
>
> Let's use `BUG()` instead. This patch series must be old, the function
> has been introduced 8 years ago in d8193743e0 (usage.c: add BUG()
> function, 2017-05-12).
Good catch. Even looking at your suggestion, at first I didn't realize
it wasn't using BUG(). ;)
--
Cheers,
Toon
next prev parent reply other threads:[~2025-08-06 14:49 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
2025-07-30 6:20 ` Patrick Steinhardt
2025-08-06 14:30 ` Toon Claes
2025-08-07 6:15 ` Patrick Steinhardt
2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-07-30 6:20 ` Patrick Steinhardt
2025-08-06 14:49 ` Toon Claes [this message]
2025-08-07 5:55 ` Patrick Steinhardt
2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-08-07 20:52 ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-08-07 20:52 ` [PATCH v2 2/3] within_depth: fix return for empty path Toon Claes
2025-08-07 20:52 ` [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-08-14 15:15 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
2025-08-15 5:17 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87jz3gtshx.fsf@iotcl.com \
--to=toon@iotcl$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=jltobler@gmail$(echo .)com \
--cc=peff@peff$(echo .)net \
--cc=ps@pks$(echo .)im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox