* [PATCH] show-branch: use prio_queue
@ 2025-12-26 7:44 René Scharfe
2025-12-29 18:09 ` Derrick Stolee
0 siblings, 1 reply; 3+ messages in thread
From: René Scharfe @ 2025-12-26 7:44 UTC (permalink / raw)
To: Git List
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by using prio_queue.
Use prio_queue_peek()+prio_queue_replace() instead of prio_queue_get()+
prio_queue_put() if possible, as the former only rebalance the
prio_queue heap once instead of twice.
In sane repositories this won't make much of a difference because the
number of items in the list or queue won't be very high:
Benchmark 1: ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 538.2 ms ± 0.8 ms [User: 527.6 ms, System: 9.6 ms]
Range (min … max): 537.0 ms … 539.2 ms 10 runs
Benchmark 2: ./git show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 530.6 ms ± 0.4 ms [User: 519.8 ms, System: 9.8 ms]
Range (min … max): 530.1 ms … 531.3 ms 10 runs
Summary
./git show-branch origin/main origin/next origin/seen origin/todo ran
1.01 ± 0.00 times faster than ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
That number is not limited, though, and in pathological cases like the
one in p6010 we see a sizable improvement:
Test v2.52.0 HEAD
------------------------------------------------------------------
6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
Signed-off-by: René Scharfe <l.s.r@web•de>
---
builtin/show-branch.c | 34 +++++++++++++++++++++-------------
t/perf/p6010-merge-base.sh | 8 ++++++--
2 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 10475a6b5e..f3ebc1d4ea 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -18,6 +18,7 @@
#include "commit-slab.h"
#include "date.h"
#include "wildmatch.h"
+#include "prio-queue.h"
static const char*const show_branch_usage[] = {
N_("git show-branch [-a | --all] [-r | --remotes] [--topo-order | --date-order]\n"
@@ -59,11 +60,10 @@ static const char *get_color_reset_code(void)
return "";
}
-static struct commit *interesting(struct commit_list *list)
+static struct commit *interesting(struct prio_queue *queue)
{
- while (list) {
- struct commit *commit = list->item;
- list = list->next;
+ for (size_t i = 0; i < queue->nr; i++) {
+ struct commit *commit = queue->array[i].data;
if (commit->object.flags & UNINTERESTING)
continue;
return commit;
@@ -222,17 +222,18 @@ static int mark_seen(struct commit *commit, struct commit_list **seen_p)
return 0;
}
-static void join_revs(struct commit_list **list_p,
+static void join_revs(struct prio_queue *queue,
struct commit_list **seen_p,
int num_rev, int extra)
{
int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
- while (*list_p) {
+ while (queue->nr) {
struct commit_list *parents;
- int still_interesting = !!interesting(*list_p);
- struct commit *commit = pop_commit(list_p);
+ int still_interesting = !!interesting(queue);
+ struct commit *commit = prio_queue_peek(queue);
+ bool get_pending = true;
int flags = commit->object.flags & all_mask;
if (!still_interesting && extra <= 0)
@@ -253,8 +254,14 @@ static void join_revs(struct commit_list **list_p,
if (mark_seen(p, seen_p) && !still_interesting)
extra--;
p->object.flags |= flags;
- commit_list_insert_by_date(p, list_p);
+ if (get_pending)
+ prio_queue_replace(queue, p);
+ else
+ prio_queue_put(queue, p);
+ get_pending = false;
}
+ if (get_pending)
+ prio_queue_get(queue);
}
/*
@@ -639,7 +646,8 @@ int cmd_show_branch(int ac,
{
struct commit *rev[MAX_REVS], *commit;
char *reflog_msg[MAX_REVS] = {0};
- struct commit_list *list = NULL, *seen = NULL;
+ struct commit_list *seen = NULL;
+ struct prio_queue queue = { compare_commits_by_commit_date };
unsigned int rev_mask[MAX_REVS];
int num_rev, i, extra = 0;
int all_heads = 0, all_remotes = 0;
@@ -883,14 +891,14 @@ int cmd_show_branch(int ac,
*/
commit->object.flags |= flag;
if (commit->object.flags == flag)
- commit_list_insert_by_date(commit, &list);
+ prio_queue_put(&queue, commit);
rev[num_rev] = commit;
}
for (i = 0; i < num_rev; i++)
rev_mask[i] = rev[i]->object.flags;
if (0 <= extra)
- join_revs(&list, &seen, num_rev, extra);
+ join_revs(&queue, &seen, num_rev, extra);
commit_list_sort_by_date(&seen);
@@ -1001,7 +1009,7 @@ int cmd_show_branch(int ac,
for (size_t i = 0; i < ARRAY_SIZE(reflog_msg); i++)
free(reflog_msg[i]);
free_commit_list(seen);
- free_commit_list(list);
+ clear_prio_queue(&queue);
free(args_copy);
free(head);
return ret;
diff --git a/t/perf/p6010-merge-base.sh b/t/perf/p6010-merge-base.sh
index 54f52fa23e..08212dd037 100755
--- a/t/perf/p6010-merge-base.sh
+++ b/t/perf/p6010-merge-base.sh
@@ -83,9 +83,9 @@ build_history2 () {
test_expect_success 'setup' '
max_level=15 &&
build_history $max_level | git fast-import --export-marks=marks &&
- git tag one &&
+ git branch one &&
build_history2 $max_level | git fast-import --import-marks=marks --force &&
- git tag two &&
+ git branch two &&
git gc &&
git log --format=%H --no-merges >expect
'
@@ -98,4 +98,8 @@ test_expect_success 'verify result' '
test_cmp expect actual
'
+test_perf 'git show-branch' '
+ git show-branch one two
+'
+
test_done
--
2.52.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] show-branch: use prio_queue
2025-12-26 7:44 [PATCH] show-branch: use prio_queue René Scharfe
@ 2025-12-29 18:09 ` Derrick Stolee
2025-12-29 20:01 ` René Scharfe
0 siblings, 1 reply; 3+ messages in thread
From: Derrick Stolee @ 2025-12-29 18:09 UTC (permalink / raw)
To: René Scharfe, Git List
On 12/26/2025 2:44 AM, René Scharfe wrote:
> Building a list using commit_list_insert_by_date() has quadratic worst
> case complexity. Avoid it by using prio_queue.
Excellent idea.
> That number is not limited, though, and in pathological cases like the
> one in p6010 we see a sizable improvement:
>
> Test v2.52.0 HEAD
> ------------------------------------------------------------------
> 6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
I love to see improvements like this, even if the construction is
unlikely to exist in reality. I do think it's likely to be valuable
for some large repos with many parallel branches.
Indeed, I tested this patch against a monorepo with lots of merges
with hyperfine, getting this output:
Benchmark 1: old
Time (mean ± σ): 3.303 s ± 0.146 s [User: 0.058 s, System: 0.069 s]
Range (min … max): 3.162 s … 3.631 s 10 runs
Benchmark 2: new
Time (mean ± σ): 141.7 ms ± 3.2 ms [User: 30.5 ms, System: 93.1 ms]
Range (min … max): 137.5 ms … 149.4 ms 19 runs
Summary
new ran
23.31 ± 1.15 times faster than old
> -static struct commit *interesting(struct commit_list *list)
> +static struct commit *interesting(struct prio_queue *queue)
> {
> - while (list) {
> - struct commit *commit = list->item;
> - list = list->next;
> + for (size_t i = 0; i < queue->nr; i++) {
> + struct commit *commit = queue->array[i].data;
...
> -static void join_revs(struct commit_list **list_p,
> +static void join_revs(struct prio_queue *queue,
> struct commit_list **seen_p,
> int num_rev, int extra)
> {
> int all_mask = ((1u << (REV_SHIFT + num_rev)) - 1);
> int all_revs = all_mask & ~((1u << REV_SHIFT) - 1);
>
> - while (*list_p) {
> + while (queue->nr) {
> struct commit_list *parents;
> - int still_interesting = !!interesting(*list_p);
> - struct commit *commit = pop_commit(list_p);
> + int still_interesting = !!interesting(queue);
> + struct commit *commit = prio_queue_peek(queue);
Most of the changes are obvious replacements.
> + bool get_pending = true;
But this is a new variable. Let's see how it's used.
)
> @@ -253,8 +254,14 @@ static void join_revs(struct commit_list **list_p,
> if (mark_seen(p, seen_p) && !still_interesting)
> extra--;
> p->object.flags |= flags;
> - commit_list_insert_by_date(p, list_p);
> + if (get_pending)
> + prio_queue_replace(queue, p);
> + else
> + prio_queue_put(queue, p);
> + get_pending = false;
> }
> + if (get_pending)
> + prio_queue_get(queue);
What's missing from this context is the loop iterating over
the commit's parents. Here's the full context here:
while (queue->nr) {
struct commit_list *parents;
int still_interesting = !!interesting(queue);
struct commit *commit = prio_queue_peek(queue);
bool get_pending = true;
int flags = commit->object.flags & all_mask;
if (!still_interesting && extra <= 0)
break;
mark_seen(commit, seen_p);
if ((flags & all_revs) == all_revs)
flags |= UNINTERESTING;
parents = commit->parents;
while (parents) {
struct commit *p = parents->item;
int this_flag = p->object.flags;
parents = parents->next;
if ((this_flag & flags) == flags)
continue;
repo_parse_commit(the_repository, p);
if (mark_seen(p, seen_p) && !still_interesting)
extra--;
p->object.flags |= flags;
if (get_pending)
prio_queue_replace(queue, p);
else
prio_queue_put(queue, p);
get_pending = false;
}
if (get_pending)
prio_queue_get(queue);
}
The important thing here is that we are _peeking_ at the
current commit and then doing the following:
1. Replace the current top of the queue with the first parent.
2. Insert any later parents into the queue as new elements.
3. If no parents exist, then remove the current top.
This replacement of the first parent is like a removal and a put,
but avoids a double-sift. That's a small optimization, but likely
worth the complexity you're using here.
> @@ -639,7 +646,8 @@ int cmd_show_branch(int ac,
> {
> struct commit *rev[MAX_REVS], *commit;
> char *reflog_msg[MAX_REVS] = {0};
> - struct commit_list *list = NULL, *seen = NULL;
> + struct commit_list *seen = NULL;
> + struct prio_queue queue = { compare_commits_by_commit_date };
This confirms that the queue sorts by date instead of acting like
a stack (if there was no sort specified).
> - commit_list_insert_by_date(commit, &list);
> + prio_queue_put(&queue, commit);
...
> - join_revs(&list, &seen, num_rev, extra);
> + join_revs(&queue, &seen, num_rev, extra);
...
> - free_commit_list(list);
> + clear_prio_queue(&queue);
More standard replacements. Good.
> diff --git a/t/perf/p6010-merge-base.sh b/t/perf/p6010-merge-base.sh
> index 54f52fa23e..08212dd037 100755
> --- a/t/perf/p6010-merge-base.sh
> +++ b/t/perf/p6010-merge-base.sh
> @@ -83,9 +83,9 @@ build_history2 () {
> test_expect_success 'setup' '
> max_level=15 &&
> build_history $max_level | git fast-import --export-marks=marks &&
> - git tag one &&
> + git branch one &&
> build_history2 $max_level | git fast-import --import-marks=marks --force &&
> - git tag two &&
> + git branch two &&
These replacements of tags with branches does not impede any
other tests that use 'one' or 'two', but is necessary for the
functionality of 'git show-branch'. OK.
> +test_perf 'git show-branch' '
> + git show-branch one two
> +'
Thanks for expanding the performance tests.
This patch LGTM.
-Stolee
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] show-branch: use prio_queue
2025-12-29 18:09 ` Derrick Stolee
@ 2025-12-29 20:01 ` René Scharfe
0 siblings, 0 replies; 3+ messages in thread
From: René Scharfe @ 2025-12-29 20:01 UTC (permalink / raw)
To: Derrick Stolee, Git List
On 12/29/25 7:09 PM, Derrick Stolee wrote:
> On 12/26/2025 2:44 AM, René Scharfe wrote:
>
>> That number is not limited, though, and in pathological cases like the
>> one in p6010 we see a sizable improvement:
>>
>> Test v2.52.0 HEAD
>> ------------------------------------------------------------------
>> 6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
>
> I love to see improvements like this, even if the construction is
> unlikely to exist in reality. I do think it's likely to be valuable
> for some large repos with many parallel branches.
>
> Indeed, I tested this patch against a monorepo with lots of merges
> with hyperfine, getting this output:
>
> Benchmark 1: old
> Time (mean ± σ): 3.303 s ± 0.146 s [User: 0.058 s, System: 0.069 s]
> Range (min … max): 3.162 s … 3.631 s 10 runs
>
> Benchmark 2: new
> Time (mean ± σ): 141.7 ms ± 3.2 ms [User: 30.5 ms, System: 93.1 ms]
> Range (min … max): 137.5 ms … 149.4 ms 19 runs
>
> Summary
> new ran
> 23.31 ± 1.15 times faster than old
Woah, the perf test gets a speedup by factor 46 in a repository
purpose-built to highlight this very difference, and here you get half
of that in the wild! Interesting to see that there are real commit
histories out there with such a taxing topology.
And thanks for your review!
René
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-12-29 20:02 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-26 7:44 [PATCH] show-branch: use prio_queue René Scharfe
2025-12-29 18:09 ` Derrick Stolee
2025-12-29 20:01 ` René Scharfe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox