From: Phillip Wood <phillip.wood123@gmail•com>
To: Ezekiel Newren via GitGitGadget <gitgitgadget@gmail•com>,
git@vger•kernel.org
Cc: Elijah Newren <newren@gmail•com>,
Ezekiel Newren <ezekielnewren@gmail•com>,
"brian m. carlson" <sandals@crustytoothpaste•net>,
Taylor Blau <me@ttaylorr•com>
Subject: Re: [PATCH 5/7] xdiff: separate parsing lines from hashing them
Date: Fri, 18 Jul 2025 14:34:50 +0100 [thread overview]
Message-ID: <ec15f63f-1ddd-4b10-9c7d-c21d81ead953@gmail.com> (raw)
In-Reply-To: <2db30cc739efadf8383bd9dc1b7825ce863e8f5a.1752784344.git.gitgitgadget@gmail.com>
Hi Ezekiel
On 17/07/2025 21:32, Ezekiel Newren via GitGitGadget wrote:
> From: Ezekiel Newren <ezekielnewren@gmail•com>
>
> We want to use xxhash for faster hashing. To facilitate that
> and to simplify the code. Separate the concerns of parsing
> and hashing into discrete steps. This makes swapping the hash
> function much easier. Since xdl_hash_record() both parses and
> hashses lines, this requires some slight code restructuring.
That makes sense though unfortunately we seem to have lost some error
handling in the restructuring. How much does this extra pass over the
input data slow down the cases that don't end up using xxhash?
> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail•com>
> ---
> xdiff/xprepare.c | 75 ++++++++++++++++++++++++++++--------------------
> 1 file changed, 44 insertions(+), 31 deletions(-)
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 747268e4fdf7..c44005e9bbb8 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -129,13 +129,39 @@ static int xdl_classify_record(unsigned int pass, xdlclassifier_t *cf, xrecord_t
> }
>
>
> +static void xdl_parse_lines(mmfile_t *mf, long narec, xdfile_t *xdf) {
> + u8 const* ptr = (u8 const*) mf->ptr;
> + usize len = (usize) mf->size;
> +
> + xdf->recs = NULL;
> + xdf->nrec = 0;
> + XDL_ALLOC_ARRAY(xdf->recs, narec);
This should return error if the allocation fails like the original code.
Although that does not make any difference for git a number of other
projects such as libgit2 carry a copy of our xdiff code and want to be
able to handle allocation failures.
> + while (len > 0) {
> + xrecord_t *rec = NULL;
> + usize length;
> + u8 const* result = memchr(ptr, '\n', len);
> + if (result) {
> + length = result - ptr + 1;
> + } else {
> + length = len;
> + }
> + if (XDL_ALLOC_GROW(xdf->recs, xdf->nrec + 1, narec))
> + die("XDL_ALLOC_GROW failed");
We should return an error rather than dying here
> + rec = xdl_cha_alloc(&xdf->rcha);
We should return an error if the call fails like the original code
> + rec->ptr = ptr;
> + rec->size = length;
> + rec->ha = 0;
> + xdf->recs[xdf->nrec++] = rec;
> + ptr += length;
> + len -= length;
> + }
> +
> +}
> +
> +
> static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp,
> xdlclassifier_t *cf, xdfile_t *xdf) {
> - long nrec, bsize;
> - unsigned long hav;
> - char const *blk, *cur, *top, *prev;
> - xrecord_t *crec;
> - xrecord_t **recs;
> unsigned long *ha;
> char *rchg;
> long *rindex;
> @@ -143,50 +169,37 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> ha = NULL;
> rindex = NULL;
> rchg = NULL;
> - recs = NULL;
>
> if (xdl_cha_init(&xdf->rcha, sizeof(xrecord_t), narec / 4 + 1) < 0)
> goto abort;
> - if (!XDL_ALLOC_ARRAY(recs, narec))
> - goto abort;
>
> - nrec = 0;
> - if ((cur = blk = xdl_mmfile_first(mf, &bsize))) {
> - for (top = blk + bsize; cur < top; ) {
> - prev = cur;
> - hav = xdl_hash_record(&cur, top, xpp->flags);
> - if (XDL_ALLOC_GROW(recs, nrec + 1, narec))
> - goto abort;
> - if (!(crec = xdl_cha_alloc(&xdf->rcha)))
> - goto abort;
> - crec->ptr = (u8 const*) prev;
> - crec->size = (long) (cur - prev);
> - crec->ha = hav;
> - recs[nrec++] = crec;
> - if (xdl_classify_record(pass, cf, crec) < 0)
> - goto abort;
> - }
> + xdl_parse_lines(mf, narec, xdf);
> +
> + for (usize i = 0; i < (usize) xdf->nrec; i++) {
> + xrecord_t *rec = xdf->recs[i];
> + char const* dump = (char const*) rec->ptr;
> + rec->ha = xdl_hash_record(&dump, (char const*) (rec->ptr + rec->size), xpp->flags);
I think we should update xdl_hash_record() to stop updating dump and use
the length from xdl_parse_lines(). Now that we parse the lines before
hashing we should use that length in the hash function so we have a
single definition of line length.
> + xdl_classify_record(pass, cf, rec);
We should return an error if this call fails like the original code
Thanks
Phillip
> }
>
> - if (!XDL_CALLOC_ARRAY(rchg, nrec + 2))
> +
> + if (!XDL_CALLOC_ARRAY(rchg, xdf->nrec + 2))
> goto abort;
>
> if ((XDF_DIFF_ALG(xpp->flags) != XDF_PATIENCE_DIFF) &&
> (XDF_DIFF_ALG(xpp->flags) != XDF_HISTOGRAM_DIFF)) {
> - if (!XDL_ALLOC_ARRAY(rindex, nrec + 1))
> + if (!XDL_ALLOC_ARRAY(rindex, xdf->nrec + 1))
> goto abort;
> - if (!XDL_ALLOC_ARRAY(ha, nrec + 1))
> + if (!XDL_ALLOC_ARRAY(ha, xdf->nrec + 1))
> goto abort;
> }
>
> - xdf->nrec = nrec;
> - xdf->recs = recs;
> xdf->rchg = rchg + 1;
> xdf->rindex = rindex;
> xdf->nreff = 0;
> xdf->ha = ha;
> xdf->dstart = 0;
> - xdf->dend = nrec - 1;
> + xdf->dend = xdf->nrec - 1;
>
> return 0;
>
> @@ -194,7 +207,7 @@ abort:
> xdl_free(ha);
> xdl_free(rindex);
> xdl_free(rchg);
> - xdl_free(recs);
> + xdl_free(xdf->recs);
> xdl_cha_free(&xdf->rcha);
> return -1;
> }
next prev parent reply other threads:[~2025-07-18 13:34 UTC|newest]
Thread overview: 204+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-17 20:32 [PATCH 0/7] RFC: Accelerate xdiff and begin its rustification Ezekiel Newren via GitGitGadget
2025-07-17 20:32 ` [PATCH 1/7] xdiff: introduce rust Ezekiel Newren via GitGitGadget
2025-07-17 21:30 ` brian m. carlson
2025-07-17 21:54 ` Junio C Hamano
2025-07-17 22:39 ` Taylor Blau
2025-07-18 23:15 ` Ezekiel Newren
2025-07-23 21:57 ` brian m. carlson
2025-07-23 22:26 ` Junio C Hamano
2025-07-28 19:11 ` Ezekiel Newren
2025-07-31 22:37 ` brian m. carlson
2025-07-22 22:02 ` Mike Hommey
2025-07-22 23:52 ` brian m. carlson
2025-07-17 22:38 ` Taylor Blau
2025-07-17 20:32 ` [PATCH 2/7] xdiff/xprepare: remove superfluous forward declarations Ezekiel Newren via GitGitGadget
2025-07-17 22:41 ` Taylor Blau
2025-07-17 20:32 ` [PATCH 3/7] xdiff: delete unnecessary fields from xrecord_t and xdfile_t Ezekiel Newren via GitGitGadget
2025-07-17 20:32 ` [PATCH 4/7] xdiff: make fields of xrecord_t Rust friendly Ezekiel Newren via GitGitGadget
2025-07-17 22:46 ` Taylor Blau
2025-07-17 23:13 ` brian m. carlson
2025-07-17 23:37 ` Elijah Newren
2025-07-18 0:23 ` Taylor Blau
2025-07-18 0:21 ` Taylor Blau
2025-07-18 13:35 ` Phillip Wood
2025-07-28 19:34 ` Ezekiel Newren
2025-07-28 19:52 ` Phillip Wood
2025-07-28 20:14 ` Ezekiel Newren
2025-07-31 14:20 ` Phillip Wood
2025-07-31 20:58 ` Ezekiel Newren
2025-08-01 9:14 ` Phillip Wood
2025-07-28 20:53 ` Junio C Hamano
2025-07-28 20:00 ` Collin Funk
2025-07-20 1:39 ` Johannes Schindelin
2025-07-17 20:32 ` [PATCH 5/7] xdiff: separate parsing lines from hashing them Ezekiel Newren via GitGitGadget
2025-07-17 22:59 ` Taylor Blau
2025-07-18 13:34 ` Phillip Wood [this message]
2025-07-17 20:32 ` [PATCH 6/7] xdiff: conditionally use Rust's implementation of xxhash Ezekiel Newren via GitGitGadget
2025-07-17 23:29 ` Taylor Blau
2025-07-18 19:00 ` Junio C Hamano
2025-07-31 21:13 ` Ezekiel Newren
2025-08-02 7:53 ` Matthias Aßhauer
2025-07-19 21:53 ` Johannes Schindelin
2025-07-20 10:14 ` Phillip Wood
2025-09-23 9:57 ` gitoxide-compatible licensing of Git's Rust code, was " Johannes Schindelin
2025-09-23 17:48 ` Jeff King
2025-09-24 13:48 ` Phillip Wood
2025-09-25 2:25 ` Jeff King
2025-09-25 5:42 ` Patrick Steinhardt
2025-09-26 10:06 ` Phillip Wood
2025-10-03 3:18 ` Jeff King
2025-10-03 9:51 ` Phillip Wood
2025-10-07 9:11 ` Patrick Steinhardt
2025-11-17 13:37 ` Johannes Schindelin
2025-10-05 5:32 ` Yee Cheng Chin
2025-07-17 20:32 ` [PATCH 7/7] github_workflows: install rust Ezekiel Newren via GitGitGadget
2025-07-17 21:23 ` brian m. carlson
2025-07-18 23:01 ` Ezekiel Newren
2025-07-25 23:56 ` Ben Knoble
2025-07-19 21:54 ` Johannes Schindelin
2025-07-17 21:51 ` [PATCH 0/7] RFC: Accelerate xdiff and begin its rustification brian m. carlson
2025-07-17 22:25 ` Taylor Blau
2025-07-18 0:29 ` brian m. carlson
2025-07-22 12:21 ` Patrick Steinhardt
2025-07-22 15:56 ` Junio C Hamano
2025-07-22 16:03 ` Sam James
2025-07-22 21:37 ` Elijah Newren
2025-07-22 21:55 ` Sam James
2025-07-22 22:08 ` Collin Funk
2025-07-18 9:23 ` Christian Brabandt
2025-07-18 16:26 ` Junio C Hamano
2025-07-19 0:32 ` Elijah Newren
2025-07-18 13:34 ` Phillip Wood
2025-07-18 21:25 ` Eli Schwartz
2025-07-19 0:48 ` Haelwenn (lanodan) Monnier
2025-07-22 12:21 ` Patrick Steinhardt
2025-07-22 14:24 ` Patrick Steinhardt
2025-07-22 15:14 ` Eli Schwartz
2025-07-22 15:56 ` Sam James
2025-07-23 4:32 ` Patrick Steinhardt
2025-07-24 9:01 ` Pierre-Emmanuel Patry
2025-07-24 10:00 ` Patrick Steinhardt
2025-07-28 9:06 ` Pierre-Emmanuel Patry
2025-07-18 14:38 ` Junio C Hamano
2025-07-18 21:56 ` Ezekiel Newren
2025-07-21 10:14 ` Phillip Wood
2025-07-21 18:33 ` Junio C Hamano
2025-07-19 21:53 ` Johannes Schindelin
2025-07-20 8:45 ` Matthias Aßhauer
2025-08-15 1:22 ` [PATCH v2 00/17] " Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 01/17] doc: add a policy for using Rust brian m. carlson via GitGitGadget
2025-08-15 17:03 ` Matthias Aßhauer
2025-08-15 21:31 ` Junio C Hamano
2025-08-16 8:06 ` Matthias Aßhauer
2025-08-19 2:06 ` Ezekiel Newren
2025-08-15 1:22 ` [PATCH v2 02/17] xdiff: introduce rust Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 03/17] xdiff/xprepare: remove superfluous forward declarations Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 04/17] xdiff: delete unnecessary fields from xrecord_t and xdfile_t Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 05/17] xdiff: make fields of xrecord_t Rust friendly Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 06/17] xdiff: separate parsing lines from hashing them Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 07/17] xdiff: conditionally use Rust's implementation of xxhash Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 08/17] github workflows: install rust Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 09/17] Do support Windows again after requiring Rust Johannes Schindelin via GitGitGadget
2025-08-15 17:12 ` Matthias Aßhauer
2025-08-15 21:48 ` Junio C Hamano
2025-08-15 22:11 ` Johannes Schindelin
2025-08-15 23:37 ` Junio C Hamano
2025-08-15 23:37 ` Junio C Hamano
2025-08-16 8:53 ` Matthias Aßhauer
2025-08-17 15:57 ` Junio C Hamano
2025-08-19 2:22 ` Ezekiel Newren
2025-08-15 1:22 ` [PATCH v2 10/17] win+Meson: allow for xdiff to be compiled with MSVC Johannes Schindelin via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 11/17] win+Meson: do allow linking with the Rust-built xdiff Johannes Schindelin via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 12/17] github workflows: define rust versions and targets in the same place Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 13/17] github workflows: upload Cargo.lock Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 14/17] xdiff: implement a white space iterator in Rust Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 15/17] xdiff: create line_hash() and line_equal() Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 16/17] xdiff: optimize case where --ignore-cr-at-eol is the only whitespace flag Ezekiel Newren via GitGitGadget
2025-08-15 1:22 ` [PATCH v2 17/17] xdiff: use rust's version of whitespace processing Ezekiel Newren via GitGitGadget
2025-08-15 15:07 ` [-SPAM-] [PATCH v2 00/17] RFC: Accelerate xdiff and begin its rustification Ramsay Jones
2025-08-19 2:00 ` Elijah Newren
2025-08-24 16:52 ` Patrick Steinhardt
2025-08-18 22:31 ` Junio C Hamano
2025-08-18 23:52 ` Ben Knoble
2025-08-19 1:52 ` Elijah Newren
2025-08-19 9:47 ` Junio C Hamano
2025-08-23 3:55 ` [PATCH v3 00/15] RFC: Cleanup " Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 01/15] doc: add a policy for using Rust brian m. carlson via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 02/15] xdiff: introduce rust Ezekiel Newren via GitGitGadget
2025-08-23 13:43 ` rsbecker
2025-08-23 14:26 ` Kristoffer Haugsbakk
2025-08-23 15:06 ` rsbecker
2025-08-23 18:30 ` Elijah Newren
2025-08-23 19:24 ` brian m. carlson
2025-08-23 20:04 ` rsbecker
2025-08-23 20:36 ` Sam James
2025-08-23 21:17 ` Haelwenn (lanodan) Monnier
2025-08-27 1:57 ` Taylor Blau
2025-08-27 14:39 ` rsbecker
2025-08-27 17:06 ` Junio C Hamano
2025-08-27 17:15 ` rsbecker
2025-08-27 20:12 ` Taylor Blau
2025-08-27 20:22 ` Junio C Hamano
2025-09-02 11:16 ` Patrick Steinhardt
2025-09-02 11:30 ` Sam James
2025-09-02 17:27 ` brian m. carlson
2025-09-02 18:47 ` Sam James
2025-09-03 18:22 ` Collin Funk
2025-09-03 5:40 ` Patrick Steinhardt
2025-09-03 16:22 ` Ramsay Jones
2025-09-03 22:10 ` Junio C Hamano
2025-09-03 22:48 ` Josh Steadmon
2025-09-04 11:10 ` Patrick Steinhardt
2025-09-04 15:45 ` Junio C Hamano
2025-09-05 8:23 ` Patrick Steinhardt
2025-09-04 0:57 ` brian m. carlson
2025-09-04 11:39 ` Patrick Steinhardt
2025-09-04 13:53 ` Sam James
2025-09-05 3:55 ` Elijah Newren
2025-09-04 23:17 ` Ezekiel Newren
2025-09-05 3:54 ` Elijah Newren
2025-09-05 6:50 ` Patrick Steinhardt
2025-09-07 4:10 ` Elijah Newren
2025-09-07 16:09 ` rsbecker
2025-09-08 10:12 ` Phillip Wood
2025-09-08 15:32 ` rsbecker
2025-09-08 15:10 ` Ezekiel Newren
2025-09-08 15:41 ` rsbecker
2025-09-08 15:31 ` Elijah Newren
2025-09-08 15:36 ` rsbecker
2025-09-08 16:13 ` Elijah Newren
2025-09-08 17:01 ` rsbecker
2025-09-08 6:40 ` Patrick Steinhardt
2025-09-05 10:31 ` Phillip Wood
2025-09-05 11:32 ` Sam James
2025-09-05 13:14 ` Phillip Wood
2025-09-05 13:23 ` Patrick Steinhardt
2025-09-05 15:37 ` Junio C Hamano
2025-09-08 6:40 ` Patrick Steinhardt
2025-08-23 14:29 ` Ezekiel Newren
2025-08-23 3:55 ` [PATCH v3 03/15] github workflows: install rust Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 04/15] win+Meson: do allow linking with the Rust-built xdiff Johannes Schindelin via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 05/15] github workflows: upload Cargo.lock Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 06/15] ivec: create a vector type that is interoperable between C and Rust Ezekiel Newren via GitGitGadget
2025-08-23 8:12 ` Kristoffer Haugsbakk
2025-08-23 9:29 ` Ezekiel Newren
2025-08-23 16:14 ` Junio C Hamano
2025-08-23 16:37 ` Ezekiel Newren
2025-08-23 18:05 ` Junio C Hamano
2025-08-23 20:29 ` Ezekiel Newren
2025-08-25 19:16 ` Elijah Newren
2025-08-26 5:40 ` Junio C Hamano
2025-08-24 13:31 ` Ben Knoble
2025-08-25 20:40 ` Ezekiel Newren
2025-08-26 13:30 ` D. Ben Knoble
2025-08-26 18:47 ` Ezekiel Newren
2025-08-26 22:01 ` brian m. carlson
2025-08-23 3:55 ` [PATCH v3 07/15] xdiff/xprepare: remove superfluous forward declarations Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 08/15] xdiff: delete unnecessary fields from xrecord_t and xdfile_t Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 09/15] xdiff: make fields of xrecord_t Rust friendly Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 10/15] xdiff: use one definition for freeing xdfile_t Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 11/15] xdiff: replace chastore with an ivec in xdfile_t Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 12/15] xdiff: delete nrec field from xdfile_t Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 13/15] xdiff: delete recs " Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 14/15] xdiff: make xdfile_t more rust friendly Ezekiel Newren via GitGitGadget
2025-08-23 3:55 ` [PATCH v3 15/15] xdiff: implement xdl_trim_ends() in Rust Ezekiel Newren via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ec15f63f-1ddd-4b10-9c7d-c21d81ead953@gmail.com \
--to=phillip.wood123@gmail$(echo .)com \
--cc=ezekielnewren@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitgitgadget@gmail$(echo .)com \
--cc=me@ttaylorr$(echo .)com \
--cc=newren@gmail$(echo .)com \
--cc=sandals@crustytoothpaste$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox