From: "René Scharfe" <l.s.r@web•de>
To: Junio C Hamano <gitster@pobox•com>
Cc: Git List <git@vger•kernel.org>,
"D. Ben Knoble" <ben.knoble@gmail•com>, Jeff King <peff@peff•net>,
Phillip Wood <phillip.wood@dunelm•org.uk>
Subject: Re: [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change()
Date: Wed, 3 Dec 2025 16:06:46 +0100 [thread overview]
Message-ID: <5581a8d8-afe3-4097-8a27-7e52c7e915ce@web.de> (raw)
In-Reply-To: <9a253514-376f-49fd-99fe-f076ecb180b6@web.de>
On 12/2/25 11:07 PM, René Scharfe wrote:
> On 11/30/25 7:02 PM, Junio C Hamano wrote:
>> René Scharfe <l.s.r@web•de> writes:
>>
>>> Add a new streamlined function for queuing unchanged filepairs and
>>> use it in show_modified(), which is called by diff_cache() via
>>> oneway_diff() and do_oneway_diff(). It allocates only a single filespec
>>> for each filepair and uses it twice with reference counting. This has a
>>> measurable effect if there are a lot of them, like in the Linux repo:
>>>
>>> Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
>>> Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms]
>>> Range (min … max): 31.5 ms … 32.3 ms 85 runs
>>>
>>> Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder
>>> Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms]
>>> Range (min … max): 23.5 ms … 24.4 ms 111 runs
>>>
>>> Summary
>>> ./git -C ../linux diff --cached --find-copies-harder ran
>>> 1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
>>
>> Nice. Is this technique only applicable to diff-index among the
>> three diff plumbing siblings?
>
>> [...] it would apply to diff-tree, wouldn't it?
> Yes, but its diff_change() call is behind two layers of callbacks, which
> complicates things.
>
> And I don't know how to avoid adding an object ID comparison. Do we
> perhaps have that bit somewhere in tree-diff.c already and can pass it
> along the pathchange call?
Yes, new patch below. Not sure if it's better, though.
> Benchmark 1: ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1
> Time (mean ± σ): 78.3 ms ± 0.2 ms [User: 57.4 ms, System: 19.8 ms]
> Range (min … max): 77.9 ms … 78.7 ms 36 runs
>
> Benchmark 2: ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1
> Time (mean ± σ): 78.8 ms ± 0.2 ms [User: 57.9 ms, System: 19.8 ms]
> Range (min … max): 78.4 ms … 79.2 ms 36 runs
>
> Summary
> ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 ran
> 1.01 ± 0.00 times faster than ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1
Benchmark 1: ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1
Time (mean ± σ): 75.6 ms ± 0.6 ms [User: 57.4 ms, System: 17.2 ms]
Range (min … max): 75.0 ms … 78.0 ms 37 runs
Benchmark 2: ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1
Time (mean ± σ): 76.0 ms ± 0.2 ms [User: 57.9 ms, System: 17.1 ms]
Range (min … max): 75.6 ms … 76.4 ms 37 runs
Summary
./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 ran
1.00 ± 0.01 times faster than ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1
Hmm, I probably ran some background task when I measured yesterday and got
worse results for the git_v2.52.0 baseline than today.
René
---
builtin/reset.c | 1 +
diff.c | 1 +
diff.h | 9 +++++++--
diffcore.h | 4 ++--
tree-diff.c | 48 +++++++++++++++++++++++++++---------------------
5 files changed, 38 insertions(+), 25 deletions(-)
diff --git a/builtin/reset.c b/builtin/reset.c
index ed35802af1..ec674694dd 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -210,6 +210,7 @@ static int read_from_tree(const struct pathspec *pathspec,
opt.repo = the_repository;
opt.change = diff_change;
opt.add_remove = diff_addremove;
+ opt.keep = diff_same;
if (pathspec->nr && pathspec_needs_expanded_index(the_repository->index, pathspec))
ensure_full_index(the_repository->index);
diff --git a/diff.c b/diff.c
index 436da250eb..9671524d2b 100644
--- a/diff.c
+++ b/diff.c
@@ -4847,6 +4847,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
/* pathchange left =NULL by default */
options->change = diff_change;
options->add_remove = diff_addremove;
+ options->keep = diff_same;
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
options->xdl_opts |= diff_algorithm;
diff --git a/diff.h b/diff.h
index 7eb84aadf4..2e3a5ac04a 100644
--- a/diff.h
+++ b/diff.h
@@ -43,7 +43,8 @@ struct oidset;
* set_default in diff_options can be used to tweak this more.
*
* - As you find different pairs of files, call `diff_change()` to feed
- * modified files, `diff_addremove()` to feed created or deleted files, or
+ * modified files, `diff_addremove()` to feed created or deleted files,
+ * `diff_same()` to feed unmodified files if needed for copy detection, or
* `diff_unmerge()` to feed a file whose state is 'unmerged' to the API.
* These are thin wrappers to a lower-level `diff_queue()` function that is
* flexible enough to record any of these kinds of changes.
@@ -76,7 +77,7 @@ struct rev_info;
struct userdiff_driver;
typedef int (*pathchange_fn_t)(struct diff_options *options,
- struct combine_diff_path *path);
+ struct combine_diff_path *path, bool is_change);
typedef void (*change_fn_t)(struct diff_options *options,
unsigned old_mode, unsigned new_mode,
@@ -92,6 +93,9 @@ typedef void (*add_remove_fn_t)(struct diff_options *options,
int oid_valid,
const char *fullpath, unsigned dirty_submodule);
+typedef void (*keep_fn_t)(struct diff_options *options, unsigned mode,
+ const struct object_id *oid, const char *fullpath);
+
typedef void (*diff_format_fn_t)(struct diff_queue_struct *q,
struct diff_options *options, void *data);
@@ -384,6 +388,7 @@ struct diff_options {
pathchange_fn_t pathchange;
change_fn_t change;
add_remove_fn_t add_remove;
+ keep_fn_t keep;
void *change_fn_data;
diff_format_fn_t format_callback;
void *format_callback_data;
diff --git a/diffcore.h b/diffcore.h
index 9c0a0e7aaf..64b419b33f 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -35,8 +35,8 @@ struct userdiff_driver;
/**
* the internal representation for a single file (blob). It records the blob
* object name (if known -- for a work tree file it typically is a NUL SHA-1),
- * filemode and pathname. This is what the `diff_addremove()`, `diff_change()`
- * and `diff_unmerge()` synthesize and feed `diff_queue()` function with.
+ * filemode and pathname. This is what `diff_addremove()`, `diff_change()`,
+ * `diff_same()` and `diff_unmerge()` synthesize and feed `diff_queue()`.
*/
struct diff_filespec {
struct object_id oid;
diff --git a/tree-diff.c b/tree-diff.c
index 5988148b60..e711456766 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -163,12 +163,17 @@ static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
* emits diff to first parent only, and tells diff tree-walker that we are done
* with p and it can be freed.
*/
-static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p)
+static int emit_diff_first_parent_only(struct diff_options *opt,
+ struct combine_diff_path *p,
+ bool is_change)
{
struct combine_diff_parent *p0 = &p->parent[0];
if (p->mode && p0->mode) {
- opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid,
- 1, 1, p->path, 0, 0);
+ if (is_change)
+ opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid,
+ 1, 1, p->path, 0, 0);
+ else if (opt->keep)
+ opt->keep(opt, p->mode, &p->oid, p->path);
}
else {
const struct object_id *oid;
@@ -205,7 +210,7 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_
static void emit_path(struct combine_diff_path ***tail,
struct strbuf *base, struct diff_options *opt,
int nparent, struct tree_desc *t, struct tree_desc *tp,
- int imin, int depth)
+ int imin, int depth, bool is_change)
{
unsigned short mode;
const char *path;
@@ -288,7 +293,7 @@ static void emit_path(struct combine_diff_path ***tail,
keep = 1;
if (opt->pathchange)
- keep = opt->pathchange(opt, p);
+ keep = opt->pathchange(opt, p, is_change);
if (keep) {
**tail = p;
@@ -518,26 +523,27 @@ static void ll_diff_tree_paths(
/* t = p[imin] */
if (cmp == 0) {
/* are either pi > p[imin] or diff(t,pi) != ø ? */
- if (!opt->flags.find_copies_harder) {
- for (i = 0; i < nparent; ++i) {
- /* p[i] > p[imin] */
- if (tp[i].entry.mode & S_IFXMIN_NEQ)
- continue;
+ bool is_change = true;
- /* diff(t,pi) != ø */
- if (!oideq(&t.entry.oid, &tp[i].entry.oid) ||
- (t.entry.mode != tp[i].entry.mode))
- continue;
+ for (i = 0; i < nparent; ++i) {
+ /* p[i] > p[imin] */
+ if (tp[i].entry.mode & S_IFXMIN_NEQ)
+ continue;
- goto skip_emit_t_tp;
- }
+ /* diff(t,pi) != ø */
+ if (!oideq(&t.entry.oid, &tp[i].entry.oid) ||
+ (t.entry.mode != tp[i].entry.mode))
+ continue;
+
+ is_change = false;
+ break;
}
/* D += {δ(t,pi) if pi=p[imin]; "+a" if pi > p[imin]} */
- emit_path(tail, base, opt, nparent,
- &t, tp, imin, depth);
+ if (is_change || opt->flags.find_copies_harder)
+ emit_path(tail, base, opt, nparent,
+ &t, tp, imin, depth, is_change);
- skip_emit_t_tp:
/* t↓, ∀ pi=p[imin] pi↓ */
update_tree_entry(&t);
update_tp_entries(tp, nparent);
@@ -547,7 +553,7 @@ static void ll_diff_tree_paths(
else if (cmp < 0) {
/* D += "+t" */
emit_path(tail, base, opt, nparent,
- &t, /*tp=*/NULL, -1, depth);
+ &t, /*tp=*/NULL, -1, depth, true);
/* t↓ */
update_tree_entry(&t);
@@ -563,7 +569,7 @@ static void ll_diff_tree_paths(
}
emit_path(tail, base, opt, nparent,
- /*t=*/NULL, tp, imin, depth);
+ /*t=*/NULL, tp, imin, depth, true);
skip_emit_tp:
/* ∀ pi=p[imin] pi↓ */
--
2.52.0
prev parent reply other threads:[~2025-12-03 15:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-30 11:47 [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() René Scharfe
2025-11-30 18:02 ` Junio C Hamano
2025-12-02 21:16 ` René Scharfe
2025-12-02 22:07 ` René Scharfe
2025-12-03 15:06 ` René Scharfe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5581a8d8-afe3-4097-8a27-7e52c7e915ce@web.de \
--to=l.s.r@web$(echo .)de \
--cc=ben.knoble@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=peff@peff$(echo .)net \
--cc=phillip.wood@dunelm$(echo .)org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox