From: Phillip Wood <phillip.wood123@gmail•com>
To: Niels Glodny <n.glodny@campus•lmu.de>, git@vger•kernel.org
Cc: johannes.schindelin@gmx•de, peff@peff•net, phillip.wood@dunelm•org.uk
Subject: Re: [PATCH] xdiff: disable cleanup_records heuristic with --minimal
Date: Sun, 27 Apr 2025 16:04:10 +0100 [thread overview]
Message-ID: <23078e29-8f1f-4eb3-be71-7ef419252bab@gmail.com> (raw)
In-Reply-To: <20250425155951.1227700-1-n.glodny@campus.lmu.de>
Hi Niels
On 25/04/2025 16:59, Niels Glodny wrote:
> The cleanup_records function marks some lines as changed
> before running the actual diff algorithm. For most lines,
> this is a good performance optimization, but it also marks
> lines that are surrounded by many changed lines as changed
> as well. This can cause redundant changes and longer-than-
> necessary diffs.
Nicely explained. We normally wrap commit messages at 72 characters
> Whether this results in better-looking diffs is subjective.
> However, the --minimal flag explicitly requests the shortest
> possible diff.
Looking at the diff for 2ff58dec493 (refs: introduce function to batch
refname availability checks, 2025-03-12) I'd say it is definitely less
readable with this change but some of the changes in 320f2061b63 (t0602:
use subshell to ensure working directory unchanged, 2025-02-28) are more
readable. Anyway as you say if we promise to find the minimal diff then
that is what we should do.
> The performance impact of this change is negligible, and it
> results in shorter diffs in about 1.3% of diffs in Git's
> history.
Have you got any numbers for the performance change?
I think the premise of this patch is sound, I've left a couple of
comments below as I think we can simplify the code changes.
> diff --git a/t/t4071-diff-minimal.sh b/t/t4071-diff-minimal.sh
> new file mode 100755
> index 0000000000..3ad759dab4
> --- /dev/null
> +++ b/t/t4071-diff-minimal.sh
> @@ -0,0 +1,16 @@
> +#!/bin/sh
> +
> +test_description='minimal diff algorithm'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'minimal diff should not mark changes between changed lines' '
> + printf "x\nx\nx\nx\n" >pre &&
> + printf "x\nx\nx\nA\nB\nC\nD\nx\nE\nF\nG\n" >post &&
> + test_must_fail git diff --no-index \
> + --minimal pre post >diff &&
> + ! grep "+x" diff &&
> + ! grep "-x" diff
> +'
Thanks for taking the trouble to add a new test. We have a few test
helper functions which we could use here.
test_write_lines x x x x >pre &&
test_write_lines x x x A B C D x E F G >post &&
test_expect_code 1 git diff --no-index --minimal pre post >diff &&
test_grep ! ^[-+]x diff
test_expect_code ensures that the non-zero exit code is from the files
not matching rather than another error like an invalid option or missing
file. test_grep will display the file if there are matches which helps
when debugging test failures.
> [...]> -static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t
*xdf1, xdfile_t *xdf2) {
> +static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1,
> + xdfile_t *xdf2, int need_min) {
> long i, nm, nreff, mlim;
> xrecord_t **recs;
> xdlclass_t *rcrec;
> @@ -379,7 +383,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
xdlclassifier_t carries a copy of the flag that were interested in so I
think at the start of xdl_cleanup_records we can add
int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
and then we don't need to change the signature.
> for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
> rcrec = cf->rcrecs[(*recs)->ha];
> nm = rcrec ? rcrec->len2 : 0;
> - dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
> + dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
If we want a minimal diff then we force dis1[i] to be 1 so that this
line will never be marked as changed before searching for the longest
common sequence. Well spotted.
Best Wishes
Phillip
> }
>
> if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> @@ -387,7 +391,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
> for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
> rcrec = cf->rcrecs[(*recs)->ha];
> nm = rcrec ? rcrec->len1 : 0;
> - dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
> + dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
> }
>
> for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
> @@ -449,10 +453,10 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
> }
>
>
> -static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
> -
> +static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1,
> + xdfile_t *xdf2, int need_min) {
> if (xdl_trim_ends(xdf1, xdf2) < 0 ||
> - xdl_cleanup_records(cf, xdf1, xdf2) < 0) {
> + xdl_cleanup_records(cf, xdf1, xdf2, need_min) < 0) {
>
> return -1;
> }
>
> base-commit: f65182a99e545d2f2bc22e6c1c2da192133b16a3
next prev parent reply other threads:[~2025-04-27 15:04 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-25 15:59 [PATCH] xdiff: disable cleanup_records heuristic with --minimal Niels Glodny
2025-04-27 15:04 ` Phillip Wood [this message]
2025-04-27 21:44 ` Niels Glodny
2025-04-28 17:05 ` Junio C Hamano
2025-04-27 22:06 ` [PATCH v2] " Niels Glodny
2025-04-29 9:00 ` Phillip Wood
2025-04-29 14:09 ` [PATCH v3] " Niels Glodny
2025-05-06 13:21 ` Phillip Wood
2025-05-06 22:50 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=23078e29-8f1f-4eb3-be71-7ef419252bab@gmail.com \
--to=phillip.wood123@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=johannes.schindelin@gmx$(echo .)de \
--cc=n.glodny@campus$(echo .)lmu.de \
--cc=peff@peff$(echo .)net \
--cc=phillip.wood@dunelm$(echo .)org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox