From: Justin Tobler <jltobler@gmail•com>
To: git@vger•kernel.org
Cc: peff@peff•net, Justin Tobler <jltobler@gmail•com>
Subject: [PATCH v2 2/3] builtin: introduce diff-pairs command
Date: Tue, 11 Feb 2025 22:18:24 -0600
Message-ID: <20250212041825.2455031-3-jltobler@gmail.com>
In-Reply-To: <20250212041825.2455031-1-jltobler@gmail.com>
Through git-diff(1), a single diff can be generated from a pair of blob
revisions directly. Unfortunately, there is no mechanism to compute
batches of specific file pair diffs in a single process. Such a feature
is particularly useful on the server side, where diffing a large set of
changes all at once is not feasible due to timeout concerns.

To facilitate this, introduce git-diff-pairs(1), which takes the
NUL-terminated raw diff format as input on stdin and produces diffs in
other formats. As the raw diff format already contains the necessary
metadata, it becomes possible to progressively generate batches of diffs
without having to recompute rename detection or retrieve object context.
Something like the following:

	git diff-tree -r -z -M $old $new |
	git diff-pairs -p

should generate the same output as `git diff-tree -p -M`. Furthermore,
each raw diff record in the input can also be fed individually to a
separate git-diff-pairs(1) process and still produce the same output.

Based-on-patch-by: Jeff King <peff@peff•net>
Signed-off-by: Justin Tobler <jltobler@gmail•com>
---
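Note for readers: the input consumed on stdin is a sequence of
NUL-terminated fields. Each record starts with a metadata field of the
form ":<mode_a> <mode_b> <oid_a> <oid_b> <status>[<score>]", followed by
one path field, and by a second (destination) path field for renames and
copies. The following is a minimal standalone sketch of that layout, not
the code in this patch: `struct raw_meta`, `take_hex()`,
`parse_raw_meta()` and `record_field_count()` are hypothetical names
invented for illustration, and the sketch does not check object ID
length the way the real command does via parse_oid_hex_algop().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration type; not part of git. */
struct raw_meta {
	unsigned mode_a, mode_b;
	char oid_a[65], oid_b[65];	/* hex digits, room for SHA-256 */
	char status;			/* 'A', 'D', 'M', 'T', 'R', 'C', ... */
	unsigned score;			/* percentage, only after R/C; else 0 */
};

/* Copy a run of lowercase hex digits that is followed by one space. */
static const char *take_hex(const char *p, char *out, size_t outsz)
{
	size_t n = strspn(p, "0123456789abcdef");

	if (!n || n >= outsz || p[n] != ' ')
		return NULL;
	memcpy(out, p, n);
	out[n] = '\0';
	return p + n + 1;
}

/*
 * Parse one metadata field (without its trailing NUL terminator).
 * Returns 0 on success and -1 on malformed input.
 */
static int parse_raw_meta(const char *line, struct raw_meta *m)
{
	const char *p = line;
	char *end;

	assert(line && m);
	if (*p++ != ':')
		return -1;

	/* Modes are printed in octal, each followed by a space. */
	m->mode_a = strtoul(p, &end, 8);
	if (end == p || *end != ' ')
		return -1;
	p = end + 1;
	m->mode_b = strtoul(p, &end, 8);
	if (end == p || *end != ' ')
		return -1;
	p = end + 1;

	if (!(p = take_hex(p, m->oid_a, sizeof(m->oid_a))))
		return -1;
	if (!(p = take_hex(p, m->oid_b, sizeof(m->oid_b))))
		return -1;

	if (!*p)
		return -1;
	m->status = *p++;

	/* Renames and copies carry a decimal similarity score. */
	m->score = 0;
	if (*p) {
		m->score = strtoul(p, &end, 10);
		if (end == p || *end)
			return -1;
	}
	return 0;
}

/* Number of NUL-terminated fields in a record: metadata plus path(s). */
static int record_field_count(char status)
{
	return (status == 'R' || status == 'C') ? 3 : 2;
}
```

A splitter that hands records to separate diff-pairs processes would
call something like record_field_count() on the parsed status to decide
whether to read one or two path fields before cutting the stream, which
is the same decision the Perl helper in the test script makes with its
/[RC]\d+/ match.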
.gitignore | 1 +
Documentation/git-diff-pairs.adoc | 62 +++++++++++
Documentation/meson.build | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/diff-pairs.c | 178 ++++++++++++++++++++++++++++++
command-list.txt | 1 +
git.c | 1 +
meson.build | 1 +
t/meson.build | 1 +
t/t4070-diff-pairs.sh | 80 ++++++++++++++
11 files changed, 328 insertions(+)
create mode 100644 Documentation/git-diff-pairs.adoc
create mode 100644 builtin/diff-pairs.c
create mode 100755 t/t4070-diff-pairs.sh
diff --git a/.gitignore b/.gitignore
index e82aa19df0..03448c076a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -54,6 +54,7 @@
/git-diff
/git-diff-files
/git-diff-index
+/git-diff-pairs
/git-diff-tree
/git-difftool
/git-difftool--helper
diff --git a/Documentation/git-diff-pairs.adoc b/Documentation/git-diff-pairs.adoc
new file mode 100644
index 0000000000..e9ef4a6615
--- /dev/null
+++ b/Documentation/git-diff-pairs.adoc
@@ -0,0 +1,62 @@
+git-diff-pairs(1)
+=================
+
+NAME
+----
+git-diff-pairs - Compare blob pairs generated by `diff-tree --raw`
+
+SYNOPSIS
+--------
+[verse]
+'git diff-pairs' [diff-options]
+
+DESCRIPTION
+-----------
+
+Given the output of `diff-tree -z` on its stdin, `diff-pairs` will
+reformat that output into whatever format is requested on its command
+line. For example:
+
+-----------------------------
+git diff-tree -z -M $a $b |
+git diff-pairs -p
+-----------------------------
+
+will compute the tree diff in one step (including renames), and then
+`diff-pairs` will compute and format the blob-level diffs for each pair.
+This can be used to modify the raw diff in the middle (without having to
+parse or re-create more complicated formats like `--patch`), or to
+compute diffs progressively over the course of multiple invocations of
+`diff-pairs`.
+
+Each blob pair is queued individually in the diff machinery, and the
+output is flushed when stdin reaches EOF.
+
+OPTIONS
+-------
+
+include::diff-options.adoc[]
+
+include::diff-generate-patch.adoc[]
+
+NOTES
+----
+
+`diff-pairs` should handle any input generated by `diff-tree --raw -z`.
+It may choke or otherwise misbehave on output from `diff-files`, etc.
+
+Here's an incomplete list of things that `diff-pairs` could do, but
+doesn't (mostly in the name of simplicity):
+
+ - Only `-z` input is accepted, not normal `--raw` input.
+
+ - Abbreviated sha1s are rejected in the input from `diff-tree`; if you
+ want to abbreviate the output, you can pass `--abbrev` to
+ `diff-pairs`.
+
+ - Pathspecs are not handled by `diff-pairs`; you can limit the diff via
+ the initial `diff-tree` invocation.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/meson.build b/Documentation/meson.build
index ead8e48213..e5ee177022 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -41,6 +41,7 @@ manpages = {
'git-diagnose.adoc' : 1,
'git-diff-files.adoc' : 1,
'git-diff-index.adoc' : 1,
+ 'git-diff-pairs.adoc' : 1,
'git-difftool.adoc' : 1,
'git-diff-tree.adoc' : 1,
'git-diff.adoc' : 1,
diff --git a/Makefile b/Makefile
index 896d02339e..3b8e1ad15e 100644
--- a/Makefile
+++ b/Makefile
@@ -1232,6 +1232,7 @@ BUILTIN_OBJS += builtin/describe.o
BUILTIN_OBJS += builtin/diagnose.o
BUILTIN_OBJS += builtin/diff-files.o
BUILTIN_OBJS += builtin/diff-index.o
+BUILTIN_OBJS += builtin/diff-pairs.o
BUILTIN_OBJS += builtin/diff-tree.o
BUILTIN_OBJS += builtin/diff.o
BUILTIN_OBJS += builtin/difftool.o
diff --git a/builtin.h b/builtin.h
index f7b166b334..b2d2e9eb07 100644
--- a/builtin.h
+++ b/builtin.h
@@ -152,6 +152,7 @@ int cmd_diagnose(int argc, const char **argv, const char *prefix, struct reposit
int cmd_diff_files(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_diff_index(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_diff(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_diff_pairs(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_diff_tree(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_difftool(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_env__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
new file mode 100644
index 0000000000..08f3ee81e5
--- /dev/null
+++ b/builtin/diff-pairs.c
@@ -0,0 +1,178 @@
+#include "builtin.h"
+#include "commit.h"
+#include "config.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "gettext.h"
+#include "hex.h"
+#include "object.h"
+#include "parse-options.h"
+#include "revision.h"
+#include "strbuf.h"
+
+static unsigned parse_mode_or_die(const char *mode, const char **endp)
+{
+ uint16_t ret;
+
+ *endp = parse_mode(mode, &ret);
+ if (!*endp)
+ die("unable to parse mode: %s", mode);
+ return ret;
+}
+
+static void parse_oid(const char *p, struct object_id *oid, const char **endp,
+ const struct git_hash_algo *algop)
+{
+ if (parse_oid_hex_algop(p, oid, endp, algop) || *(*endp)++ != ' ')
+ die("unable to parse object id: %s", p);
+}
+
+static unsigned short parse_score(const char *score)
+{
+ unsigned long ret;
+ char *endp;
+
+ errno = 0;
+ ret = strtoul(score, &endp, 10);
+ ret *= MAX_SCORE / 100;
+ if (errno || endp == score || *endp || (unsigned short)ret != ret)
+ die("unable to parse rename/copy score: %s", score);
+ return ret;
+}
+
+static void flush_diff_queue(struct diff_options *options)
+{
+ /*
+ * If rename detection is not requested, use rename information from the
+ * raw diff formatted input. Setting found_follow ensures diffcore_std()
+ * does not mess with rename information already present in queued
+ * filepairs.
+ */
+ if (!options->detect_rename)
+ options->found_follow = 1;
+ diffcore_std(options);
+ diff_flush(options);
+}
+
+int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
+ struct repository *repo)
+{
+ struct strbuf path_dst = STRBUF_INIT;
+ struct strbuf path = STRBUF_INIT;
+ struct strbuf meta = STRBUF_INIT;
+ struct rev_info revs;
+ int ret;
+
+ const char * const usage[] = {
+ N_("git diff-pairs [diff-options]"),
+ NULL
+ };
+ struct option options[] = {
+ OPT_END()
+ };
+
+ show_usage_with_options_if_asked(argc, argv, usage, options);
+
+ repo_init_revisions(repo, &revs, prefix);
+ repo_config(repo, git_diff_basic_config, NULL);
+ revs.disable_stdin = 1;
+ revs.abbrev = 0;
+ revs.diff = 1;
+
+ argc = setup_revisions(argc, argv, &revs, NULL);
+
+ /* Don't allow pathspecs at all. */
+ if (revs.prune_data.nr)
+ usage_with_options(usage, options);
+
+ if (!revs.diffopt.output_format)
+ revs.diffopt.output_format = DIFF_FORMAT_RAW;
+
+ while (1) {
+ struct object_id oid_a, oid_b;
+ struct diff_filepair *pair;
+ unsigned mode_a, mode_b;
+ const char *p;
+ char status;
+
+ if (strbuf_getline_nul(&meta, stdin) == EOF)
+ break;
+
+ p = meta.buf;
+ if (*p != ':')
+ die("invalid raw diff input");
+ p++;
+
+ mode_a = parse_mode_or_die(p, &p);
+ mode_b = parse_mode_or_die(p, &p);
+
+ parse_oid(p, &oid_a, &p, repo->hash_algo);
+ parse_oid(p, &oid_b, &p, repo->hash_algo);
+
+ status = *p++;
+
+ if (strbuf_getline_nul(&path, stdin) == EOF)
+ die("got EOF while reading path");
+
+ switch (status) {
+ case DIFF_STATUS_ADDED:
+ pair = diff_filepair_addremove(&revs.diffopt, '+',
+ mode_b, &oid_b,
+ 1, path.buf, 0);
+ if (pair)
+ pair->status = status;
+ break;
+
+ case DIFF_STATUS_DELETED:
+ pair = diff_filepair_addremove(&revs.diffopt, '-',
+ mode_a, &oid_a,
+ 1, path.buf, 0);
+ if (pair)
+ pair->status = status;
+ break;
+
+ case DIFF_STATUS_TYPE_CHANGED:
+ case DIFF_STATUS_MODIFIED:
+ pair = diff_filepair_change(&revs.diffopt,
+ mode_a, mode_b,
+ &oid_a, &oid_b, 1, 1,
+ path.buf, 0, 0);
+ if (pair)
+ pair->status = status;
+ break;
+
+ case DIFF_STATUS_RENAMED:
+ case DIFF_STATUS_COPIED:
+ {
+ struct diff_filespec *a, *b;
+
+ if (strbuf_getline_nul(&path_dst, stdin) == EOF)
+ die("got EOF while reading destination path");
+
+ a = alloc_filespec(path.buf);
+ b = alloc_filespec(path_dst.buf);
+ fill_filespec(a, &oid_a, 1, mode_a);
+ fill_filespec(b, &oid_b, 1, mode_b);
+
+ pair = diff_queue(&diff_queued_diff, a, b);
+ pair->status = status;
+ pair->score = parse_score(p);
+ pair->renamed_pair = 1;
+ }
+ break;
+
+ default:
+ die("unknown diff status: %c", status);
+ }
+ }
+
+ flush_diff_queue(&revs.diffopt);
+ ret = diff_result_code(&revs);
+
+ strbuf_release(&path_dst);
+ strbuf_release(&path);
+ strbuf_release(&meta);
+ release_revisions(&revs);
+
+ return ret;
+}
diff --git a/command-list.txt b/command-list.txt
index e0bb87b3b5..bb8acd51d8 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -95,6 +95,7 @@ git-diagnose ancillaryinterrogators
git-diff mainporcelain info
git-diff-files plumbinginterrogators
git-diff-index plumbinginterrogators
+git-diff-pairs plumbinginterrogators
git-diff-tree plumbinginterrogators
git-difftool ancillaryinterrogators complete
git-fast-export ancillarymanipulators
diff --git a/git.c b/git.c
index b23761480f..12bba872bb 100644
--- a/git.c
+++ b/git.c
@@ -540,6 +540,7 @@ static struct cmd_struct commands[] = {
{ "diff", cmd_diff, NO_PARSEOPT },
{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
+ { "diff-pairs", cmd_diff_pairs, RUN_SETUP | NO_PARSEOPT },
{ "diff-tree", cmd_diff_tree, RUN_SETUP | NO_PARSEOPT },
{ "difftool", cmd_difftool, RUN_SETUP_GENTLY },
{ "fast-export", cmd_fast_export, RUN_SETUP },
diff --git a/meson.build b/meson.build
index fbb8105d96..66ce3326e8 100644
--- a/meson.build
+++ b/meson.build
@@ -537,6 +537,7 @@ builtin_sources = [
'builtin/diagnose.c',
'builtin/diff-files.c',
'builtin/diff-index.c',
+ 'builtin/diff-pairs.c',
'builtin/diff-tree.c',
'builtin/diff.c',
'builtin/difftool.c',
diff --git a/t/meson.build b/t/meson.build
index 4574280590..7ff17c6d29 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -500,6 +500,7 @@ integration_tests = [
't4067-diff-partial-clone.sh',
't4068-diff-symmetric-merge-base.sh',
't4069-remerge-diff.sh',
+ 't4070-diff-pairs.sh',
't4100-apply-stat.sh',
't4101-apply-nonl.sh',
't4102-apply-rename.sh',
diff --git a/t/t4070-diff-pairs.sh b/t/t4070-diff-pairs.sh
new file mode 100755
index 0000000000..e0a8e6f0a0
--- /dev/null
+++ b/t/t4070-diff-pairs.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+
+test_description='basic diff-pairs tests'
+. ./test-lib.sh
+
+# This creates a diff with added, modified, deleted, renamed, copied, and
+# typechange entries. That includes one in a subdirectory for non-recursive
+# tests, and both exact and inexact similarity scores.
+test_expect_success 'create commit with various diffs' '
+ echo to-be-gone >deleted &&
+ echo original >modified &&
+ echo now-a-file >symlink &&
+ test_seq 200 >two-hundred &&
+ test_seq 201 500 >five-hundred &&
+ git add . &&
+ test_tick &&
+ git commit -m base &&
+ git tag base &&
+
+ echo now-here >added &&
+ echo new >modified &&
+ rm deleted &&
+ mkdir subdir &&
+ echo content >subdir/file &&
+ mv two-hundred renamed &&
+ test_seq 201 500 | sed s/300/modified/ >copied &&
+ rm symlink &&
+ git add -A . &&
+ test_ln_s_add dest symlink &&
+ test_tick &&
+ git commit -m new &&
+ git tag new
+'
+
+test_expect_success 'diff-pairs recreates --raw' '
+ git diff-tree -r -M -C -C base new >expect &&
+ git diff-tree -r -M -C -C -z base new |
+ git diff-pairs >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'diff-pairs can create -p output' '
+ git diff-tree -p -M -C -C base new >expect &&
+ git diff-tree -r -M -C -C -z base new |
+ git diff-pairs -p >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'non-recursive --raw retains tree entry' '
+ git diff-tree base new >expect &&
+ git diff-tree -z base new |
+ git diff-pairs >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'split input across multiple diff-pairs' '
+ write_script split-raw-diff "$PERL_PATH" <<-\EOF &&
+ $/ = "\0";
+ while (<>) {
+ my $meta = $_;
+ my $path = <>;
+ # renames have an extra path
+ my $path2 = <> if $meta =~ /[RC]\d+/;
+
+ open(my $fh, ">", sprintf "diff%03d", $.);
+ print $fh $meta, $path, $path2;
+ }
+ EOF
+
+ git diff-tree -p -M -C -C base new >expect &&
+
+ git diff-tree -r -z -M -C -C base new |
+ ./split-raw-diff &&
+ for i in diff*; do
+ git diff-pairs -p <$i || return 1
+ done >actual &&
+ test_cmp expect actual
+'
+
+test_done
--
2.48.1