From: Taylor Blau <me@ttaylorr•com>
To: Toon Claes <toon@iotcl•com>
Cc: git@vger•kernel.org, Karthik Nayak <karthik.188@gmail•com>,
Justin Tobler <jltobler@gmail•com>,
Derrick Stolee <stolee@gmail•com>
Subject: Re: [PATCH] last-modified: implement faster algorithm
Date: Thu, 16 Oct 2025 19:38:36 -0400 [thread overview]
Message-ID: <aPGB/FJtjDmyNLvG@nand.local> (raw)
In-Reply-To: <20251016-b4-toon-last-modified-faster-v1-1-85dca8a29e5c@iotcl.com>
On Thu, Oct 16, 2025 at 10:39:25AM +0200, Toon Claes wrote:
> [...]
>
> To avoid computing many first-parent diffs, add another trick on top of
> this and check if all paths active in 'c' are DEFINITELY NOT in c's
> Bloom filter. Since the commit-graph only stores first-parent diffs in
> the Bloom filters, we can only apply this trick to first-parent diffs.
OK, up to this point this is the same as the commit message that I wrote
with Stolee back in 2020.
> Comparing the performance of this new algorithm shows about a 2.6x
> improvement on git.git:
>
> Benchmark 1: master
> Time (mean ± σ): 3.077 s ± 0.055 s [User: 3.017 s, System: 0.051 s]
> Range (min … max): 2.947 s … 3.127 s 10 runs
>
> Benchmark 2: HEAD
> Time (mean ± σ): 1.181 s ± 0.010 s [User: 1.139 s, System: 0.038 s]
> Range (min … max): 1.169 s … 1.194 s 10 runs
>
> Summary
> HEAD ran
> 2.60 ± 0.05 times faster than master
>
> But when comparing a more extreme example of
> `git last-modified -- COPYING t`, the difference is a lot bigger:
>
> Benchmark 1: master
> Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
> Range (min … max): 4.308 s … 4.509 s 10 runs
>
> Benchmark 2: HEAD
> Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
> Range (min … max): 810.6 ms … 881.2 ms 10 runs
>
> Summary
> HEAD ran
> 5.29 ± 0.16 times faster than master
These benchmarks are different than the ones that I provided, which is
good, since we should be measuring modern Git, not dragging forward
stale benchmarks ;-).
I imagine that you are just doing a straight last-modified run here in
both instances. In the original patch, I timed this both with and
without changed-path Bloom filters, which helped illustrate their impact
on the changes here.
I'd suggest including those benchmarks as well, and potentially running
them on linux.git, or another comparably larger open-source repository.
git.git is large enough to show some interesting behavior, but I always
have found it useful to compare the results against a larger repository
as well.
> As an added benefit, this implementation gives more correct results. For
> example implementation in 'master' gives:
>
> $ git log --max-count=1 --format=%H -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de
>
> $ git last-modified -- pkt-line.h
> 15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
>
> $ git last-modified | grep pkt-line.h
> 5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
>
> With the changes in this patch the results of git-last-modified(1)
> always match those of `git log --max-count=1`.
>
> One thing to note though, the results might be outputted in a different
> order than before. This is not considerd to be an issue because nowhere
> is documented the order is guaranteed.
>
> Based-on-patches-by: Taylor Blau <me@ttaylorr•com>
Stolee and I wrote these patches together many years ago, so he should
be credited here as well. Since this patch appears to be substantially
based on the original work, I think it is appropriate to include my
S-o-b once the patch is ready.
> ---
> This series revives those changes. I did more thorough deep dive through
> the code and the algorithm and got the code working a lot faster. The
> benchmark results can be found in the commit message.
>
> Some changes compared to GitHub's version include:
>
> * Use of `struct bitmap` from "ewah/ewok.h", instead of self-defined
> `struct commit_active_paths`.
>
> * Removed shortcut code that handled the case when commit and parent
> are fully treesame, and instead always checked 'active_c' whether the
> next parent is worth looking at.
>
> * Modified comments and commit message to make the algorithm more
> clear (at least to me).
>
> * Mentioned the use of PARENT1 and PARENT2 in object.h.
>
> * Removed the use of any global variables.
>
> * Less conditions are checked in mark_path() because the hashmap of
> 'paths' is considered the single-source of truth.
>
> * pass_to_parent() doesn't pass on when the path isn't in the 'paths'
> hashmap no more.
Thanks for clearly showing what the changes on top are. When I applied
this locally and ran it, it pretty quickly segfaulted for me:
expecting success of 8020.3 'last-modified non-recursive':
check_last_modified <<-\EOF
3 a
1 file
EOF
+ check_last_modified
+ local indir=
+ test 0 != 0
+ cat
+ git last-modified
Segmentation fault
error: last command exited with $?=139
not ok 3 - last-modified non-recursive
#
# check_last_modified <<-\EOF
# 3 a
# 1 file
# EOF
#
1..3
Looking through the backtrace, it looks like someone is calling
mark_path() with a NULL oid, like so:
(gdb) bt
#0 __memcmp_evex_movbe ()
at ../sysdeps/x86_64/multiarch/memcmp-evex-movbe.S:132
#1 0x00005555555f2c32 in oideq (oid1=0x0, oid2=0x555555a5eeb0)
at ./hash.h:408
#2 0x00005555555f3523 in mark_path (path=0x555555a5eee8 "a", oid=0x0,
data=0x7fffffffd650) at builtin/last-modified.c:179
, which makes sense, since at the end of the main loop we call
mark_path() on all remaining active paths to indicate that they were
modified by whatever commit we just popped off the queue.
Something like this on top (which matches the original patch that I sent
from GitHub's fork) fixes the tests:
--- 8< ---
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 40e520ba18..c8f66633a7 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -176,7 +176,7 @@ static void mark_path(const char *path, const struct object_id *oid,
* Is it arriving at a version of interest, or is it from a side branch
* which did not contribute to the final state?
*/
- if (!oideq(oid, &ent->oid))
+ if (oid && !oideq(oid, &ent->oid))
return;
last_modified_emit(data->lm, path, data->commit);
--- >8 ---
> +/* Remember to update object flag allocation in object.h */
> +#define PARENT1 (1u<<16) /* used instead of SEEN */
> +#define PARENT2 (1u<<17) /* used instead of BOTTOM, BOUNDARY */
> +
> struct last_modified_entry {
> struct hashmap_entry hashent;
> struct object_id oid;
> struct bloom_key key;
> + size_t diff_idx;
> const char path[FLEX_ARRAY];
> };
>
> @@ -37,13 +43,35 @@ static int last_modified_entry_hashcmp(const void *unused UNUSED,
> return strcmp(ent1->path, path ? path : ent2->path);
> }
>
> +/*
> + * Hold a bitmap for each commit we're working with. Each bit represents a path
> + * in `lm->all_paths`. Active bit means the path still needs to be dealt with.
> + */
> +define_commit_slab(commit_bitmaps, struct bitmap *);
> +
Nice, I am glad to see that we are using a bitmap here rather than the
hacky 'char *' that we had originally written. I seem to remember that
there was a tiny slow-down when using bitmaps, but can't find the
discussion anymore. (It wasn't in the internal PR that I originally
opened, and I no longer can read messages that far back in history.)
It might be worth benchmarking here to see if using a 'char *' is
faster. Of course, that's 8x worse in terms of memory usage, but not a
huge deal given both the magnitude and typical number of directory
elements (you'd need 1024^2 entries in a single tree to occupy even a
single MiB of heap).
Regardless of how you handle the above, I think that the commit slab
name here is a little generic. I guess it's OK since this is only
visible within this compilation unit, but perhaps something like
"active_paths_bitmap" would be more descriptive.
Likewise, I wonder if we should have elemtype here be just 'struct
bitmap'. Unfortunately I don't think the EWAH code has a function like:
void bitmap_init(struct bitmap *);
and only has ones that allocate for us. So we may consider adding one,
or creating a dummy bitmap and copying its contents, or otherwise.
> struct last_modified {
> struct hashmap paths;
> struct rev_info rev;
> bool recursive;
> bool show_trees;
> +
> + const char **all_paths;
> + size_t all_paths_nr;
I wonder if all_paths should be a strvec here? I think that this code
was all written when the type was called argv_array (hilariously, that
change took place towards the end of July, 2020, and the --go-faster
code where this patch came from was written just a couple of weeks
earlier.)
> @@ -196,7 +226,36 @@ static void last_modified_diff(struct diff_queue_struct *q,
> }
> }
>
> -static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
> +static size_t path_idx(struct last_modified *lm, char *path)
> +{
> + struct last_modified_entry *ent;
> + ent = hashmap_get_entry_from_hash(&lm->paths, strhash(path), path,
> + struct last_modified_entry, hashent);
> +
> + return ent ? ent->diff_idx : -1;
> +}
> +
> +static void pass_to_parent(struct last_modified *lm,
> + struct bitmap *c,
> + struct bitmap *p,
> + size_t pos)
> +{
> + struct last_modified_entry *ent;
> + struct hashmap_iter iter;
> +
> + bitmap_unset(c, pos);
> +
> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
> + if (ent->diff_idx == pos) {
> + bitmap_set(p, pos);
> + break;
> + }
> + }
> +}
This one I'm not quite following. The original implementation does
something like:
c->active[i] = 0;
c->nr--;
p->active[i] = 1;
p->nr++;
, where 'i' is an index into the all_paths array. It looks like you are
effectively doing the first part of that with the bitmap_unset() call,
but I'm confused why you're iterating over paths here.
The caller in process_parents() is iterating over all entries that are
treesame to the parent, bit by bit. So I think you can just bitmap_set()
in the parent directly here, but let me know if I am missing something.
> @@ -220,42 +282,197 @@ static bool maybe_changed_path(struct last_modified *lm, struct commit *origin)
> return false;
> }
>
> +static void process_parent(struct last_modified *lm,
> + struct prio_queue *queue,
> + struct commit *c, struct bitmap *active_c,
> + struct commit *parent, int parent_i)
> +{
> + size_t i;
> + struct bitmap *active_p;
> +
> + repo_parse_commit(lm->rev.repo, parent);
> + active_p = get_bitmap(lm, parent);
> +
> + /*
> + * The first time entering this function for this commit (i.e. first parent)
> + * see if Bloom filters will tell us it's worth to do the diff.
> + */
> + if (parent_i || maybe_changed_path(lm, c, active_c)) {
> + diff_tree_oid(&parent->object.oid,
> + &c->object.oid, "", &lm->rev.diffopt);
> + diffcore_std(&lm->rev.diffopt);
> + }
> +
> + /*
> + * Otherwise, test each path for TREESAME-ness against the parent. If
This "otherwise" is referencing a piece of the patch that doesn't appear
to be here directly, which is how we handle the special case of having
nothing in the diff queue, meaning we are treesame at the root.
In the GitHub version of this patch, we pass all active paths to the
parent, assign the PARENT1 flag if it doesn't already have it, and put
it in the queue as well.
In your version, we'd skip past the next for-loop, and do the same
pass-to-parent dance below, along with inserting the parent into the
prio queue.
So I think that this is all functionally equivalent, but I had to work
through a little bit of the details here, mostly since I haven't looked
at or thought about this code in many years ;-).
> static int last_modified_run(struct last_modified *lm)
> {
> + int max_count, queue_popped = 0;
> + struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> + struct prio_queue not_queue = { compare_commits_by_gen_then_commit_date };
> + struct commit_list *list;
> struct last_modified_callback_data data = { .lm = lm };
>
> lm->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> lm->rev.diffopt.format_callback = last_modified_diff;
> lm->rev.diffopt.format_callback_data = &data;
> + lm->rev.no_walk = 1;
This one is new relative to the original patch. Why set no_walk here?
>
> prepare_revision_walk(&lm->rev);
>
> - while (hashmap_get_size(&lm->paths)) {
> - data.commit = get_revision(&lm->rev);
> - if (!data.commit)
> - BUG("paths remaining beyond boundary in last-modified");
> + max_count = lm->rev.max_count;
> +
> + init_commit_bitmaps(&lm->commit_bitmaps);
> + lm->scratch = bitmap_word_alloc(lm->all_paths_nr);
> +
> + /*
> + * lm->rev.commits holds the set of boundary commits for our walk.
> + *
> + * Loop through each such commit, and place it in the appropriate queue.
> + */
> + for (list = lm->rev.commits; list; list = list->next) {
Hmm. In the original patch, we look at rev.pending, not rev.commits. The
rest of the patch looks good to me and looks like a faithful
representation of the original patch from GitHub's fork. Thanks for
working on this and making the new last-modified builtin faster ;-).
Thanks,
Taylor
next prev parent reply other threads:[~2025-10-16 23:38 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-16 8:39 [PATCH] last-modified: implement faster algorithm Toon Claes
2025-10-16 18:51 ` Justin Tobler
2025-10-17 10:38 ` Toon Claes
2025-10-16 20:48 ` D. Ben Knoble
2025-10-17 10:45 ` Toon Claes
2025-10-16 23:38 ` Taylor Blau [this message]
2025-10-17 6:30 ` Jeff King
2025-10-17 14:54 ` Taylor Blau
2025-10-21 8:20 ` Jeff King
2025-10-17 12:07 ` Toon Claes
2025-10-21 9:04 ` Toon Claes
2025-10-23 23:59 ` Taylor Blau
2025-10-21 13:00 ` Toon Claes
2025-10-23 23:56 ` Taylor Blau
2025-10-27 15:48 ` Toon Claes
2025-10-17 6:37 ` Jeff King
2025-10-17 10:47 ` Toon Claes
2025-10-21 12:56 ` [PATCH v2] " Toon Claes
2025-10-21 17:52 ` Junio C Hamano
2025-10-22 0:26 ` Taylor Blau
2025-10-22 0:28 ` Taylor Blau
2025-10-22 3:48 ` Junio C Hamano
2025-10-24 0:01 ` Taylor Blau
2025-10-24 0:37 ` Junio C Hamano
2025-10-27 19:22 ` Taylor Blau
2025-10-29 13:01 ` Toon Claes
2025-10-23 8:01 ` Toon Claes
2025-10-23 7:50 ` [PATCH v3] " Toon Claes
2025-10-24 0:03 ` Taylor Blau
2025-10-27 7:03 ` Toon Claes
2025-11-03 15:47 ` [PATCH v4] " Toon Claes
2025-11-03 16:44 ` Junio C Hamano
2025-11-04 15:08 ` Toon Claes
2025-11-19 11:34 ` t8020-last-modified.sh failure on s390x (Re: [PATCH v4] last-modified: implement faster algorithm) Anders Kaseorg
2025-11-19 13:49 ` Kristoffer Haugsbakk
2025-11-19 20:06 ` Anders Kaseorg
2025-11-20 8:16 ` Jeff King
2025-11-28 16:45 ` Toon Claes
2025-11-28 17:35 ` Kristoffer Haugsbakk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPGB/FJtjDmyNLvG@nand.local \
--to=me@ttaylorr$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=jltobler@gmail$(echo .)com \
--cc=karthik.188@gmail$(echo .)com \
--cc=stolee@gmail$(echo .)com \
--cc=toon@iotcl$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox