From: Derrick Stolee <stolee@gmail•com>
To: "René Scharfe" <l.s.r@web•de>, "Git List" <git@vger•kernel.org>
Subject: Re: [PATCH] show-branch: use prio_queue
Date: Mon, 29 Dec 2025 13:09:07 -0500 [thread overview]
Message-ID: <01d09293-4b60-4a47-9350-73b1ff796c9a@gmail.com> (raw)
In-Reply-To: <70ed751e-fc3c-4cb4-a4fd-26094a9f622e@web.de>
On 12/26/2025 2:44 AM, René Scharfe wrote:
> Building a list using commit_list_insert_by_date() has quadratic worst
> case complexity. Avoid it by using prio_queue.
Excellent idea.
> That number is not limited, though, and in pathological cases like the
> one in p6010 we see a sizable improvement:
>
> Test v2.52.0 HEAD
> ------------------------------------------------------------------
> 6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
I love to see improvements like this, even if the construction is
unlikely to exist in reality. I do think it's likely to be valuable
for some large repos with many parallel branches.
Indeed, I tested this patch against a monorepo with lots of merges
with hyperfine, getting this output:
Benchmark 1: old
Time (mean ± σ): 3.303 s ± 0.146 s [User: 0.058 s, System: 0.069 s]
Range (min … max): 3.162 s … 3.631 s 10 runs
Benchmark 2: new
Time (mean ± σ): 141.7 ms ± 3.2 ms [User: 30.5 ms, System: 93.1 ms]
Range (min … max): 137.5 ms … 149.4 ms 19 runs
Summary
new ran
23.31 ± 1.15 times faster than old
> -static struct commit *interesting(struct commit_list *list)
> +static struct commit *interesting(struct prio_queue *queue)
> {
> - while (list) {
> - struct commit *commit = list->item;
> - list = list->next;
> + for (size_t i = 0; i < queue->nr; i++) {
> + struct commit *commit = queue->array[i].data;
...
> -static void join_revs(struct commit_list **list_p,
> +static void join_revs(struct prio_queue *queue,
> struct commit_list **seen_p,
> int num_rev, int extra)
> {
> int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
> int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
>
> - while (*list_p) {
> + while (queue->nr) {
> struct commit_list *parents;
> - int still_interesting = !!interesting(*list_p);
> - struct commit *commit = pop_commit(list_p);
> + int still_interesting = !!interesting(queue);
> + struct commit *commit = prio_queue_peek(queue);
Most of the changes are obvious replacements.
> + bool get_pending = true;
But this is a new variable. Let's see how it's used.
)
> @@ -253,8 +254,14 @@ static void join_revs(struct commit_list **list_p,
> if (mark_seen(p, seen_p) && !still_interesting)
> extra--;
> p->object.flags |= flags;
> - commit_list_insert_by_date(p, list_p);
> + if (get_pending)
> + prio_queue_replace(queue, p);
> + else
> + prio_queue_put(queue, p);
> + get_pending = false;
> }
> + if (get_pending)
> + prio_queue_get(queue);
What's missing from this context is the loop iterating over
the commit's parents. Here's the full context here:
while (queue->nr) {
struct commit_list *parents;
int still_interesting = !!interesting(queue);
struct commit *commit = prio_queue_peek(queue);
bool get_pending = true;
int flags = commit->object.flags & all_mask;
if (!still_interesting && extra <= 0)
break;
mark_seen(commit, seen_p);
if ((flags & all_revs) == all_revs)
flags |= UNINTERESTING;
parents = commit->parents;
while (parents) {
struct commit *p = parents->item;
int this_flag = p->object.flags;
parents = parents->next;
if ((this_flag & flags) == flags)
continue;
repo_parse_commit(the_repository, p);
if (mark_seen(p, seen_p) && !still_interesting)
extra--;
p->object.flags |= flags;
if (get_pending)
prio_queue_replace(queue, p);
else
prio_queue_put(queue, p);
get_pending = false;
}
if (get_pending)
prio_queue_get(queue);
}
The important thing here is that we are _peeking_ at the
current commit and then doing the following:
1. Replace the current top of the queue with the first parent.
2. Insert any later parents into the queue as new elements.
3. If no parents exist, then remove the current top.
This replacement of the first parent is like a removal and a put,
but avoids a double-sift. That's a small optimization, but likely
worth the complexity you're using here.
> @@ -639,7 +646,8 @@ int cmd_show_branch(int ac,
> {
> struct commit *rev[MAX_REVS], *commit;
> char *reflog_msg[MAX_REVS] = {0};
> - struct commit_list *list = NULL, *seen = NULL;
> + struct commit_list *seen = NULL;
> + struct prio_queue queue = { compare_commits_by_commit_date };
This confirms that the queue sorts by date instead of acting like
a stack (if there was no sort specified).
> - commit_list_insert_by_date(commit, &list);
> + prio_queue_put(&queue, commit);
...
> - join_revs(&list, &seen, num_rev, extra);
> + join_revs(&queue, &seen, num_rev, extra);
...
> - free_commit_list(list);
> + clear_prio_queue(&queue);
More standard replacements. Good.
> diff --git a/t/perf/p6010-merge-base.sh b/t/perf/p6010-merge-base.sh
> index 54f52fa23e..08212dd037 100755
> --- a/t/perf/p6010-merge-base.sh
> +++ b/t/perf/p6010-merge-base.sh
> @@ -83,9 +83,9 @@ build_history2 () {
> test_expect_success 'setup' '
> max_level=15 &&
> build_history $max_level | git fast-import --export-marks=marks &&
> - git tag one &&
> + git branch one &&
> build_history2 $max_level | git fast-import --import-marks=marks --force &&
> - git tag two &&
> + git branch two &&
These replacements of tags with branches does not impede any
other tests that use 'one' or 'two', but is necessary for the
functionality of 'git show-branch'. OK.
> +test_perf 'git show-branch' '
> + git show-branch one two
> +'
Thanks for expanding the performance tests.
This patch LGTM.
-Stolee
next prev parent reply other threads:[~2025-12-29 18:09 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-26 7:44 [PATCH] show-branch: use prio_queue René Scharfe
2025-12-29 18:09 ` Derrick Stolee [this message]
2025-12-29 20:01 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=01d09293-4b60-4a47-9350-73b1ff796c9a@gmail.com \
--to=stolee@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=l.s.r@web$(echo .)de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox