public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: Taylor Blau <me@ttaylorr•com>
Cc: git@vger•kernel.org, Junio C Hamano <gitster@pobox•com>,
	Elijah Newren <newren@gmail•com>,
	Derrick Stolee <stolee@gmail•com>
Subject: Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps
Date: Wed, 27 May 2026 06:25:34 -0400	[thread overview]
Message-ID: <20260527102534.GH981444@coredump.intra.peff.net> (raw)
In-Reply-To: <30ce254312cfee2a2a82f08246c3a2546ae32578.1779207127.git.me@ttaylorr.com>

On Tue, May 19, 2026 at 12:12:55PM -0400, Taylor Blau wrote:

> Each selected commit starts with one commit_mask bit in its "commit
> mask" bitmap. Then, we walk the first-parent history in topological
> order and OR each commit's mask into its (first) parent. Whenever that
> OR results in the parent having more bits set, the child is deemed to be
> non-maximal, and the frontier is pushed further back along the first
> parent history.
> 
> That approach works extremely well for ordinary selected commits, whose
> first-parent histories often describe real sharing between the bitmaps
> we are going to write.
> 
> It struggles, however, to efficiently generate pseudo-merge bitmaps.
> Unlike ordinary commits for which the above algorithm is designed,
> pseudo-merges don't represent any "real" commit in history, just a
> grouping of non-bitmapped reference tips. In that sense, their first
> parent is just a part of a larger set, and treating them like ordinary
> selected commits imposes a significant slow-down when generating bitmaps
> with pseudo-merges enabled.

This is a great explanation of the problem, and especially this:

> In other words, we pay a nearly ~5 minute penalty to generate
> pseudo-merge bitmaps, but only save ~50 seconds during traversal.

makes it clear that we're doing something sub-optimal. And it points us
in the right direction, since that traversal should be able to generate
the pseudo-merge bitmap we need in the first place! So that should be
our goal to work towards.

> Instead, build the regular selected commit bitmaps first, considering
> only non-pseudo-merge commits in `bitmap_builder_init()`. Once those
> bitmaps have been stored, build each pseudo-merge bitmap separately and
> attach its parent and object bitmaps to the corresponding pseudo-merge
> entry before writing the extension.

And then this solution follows naturally from the earlier explanations.
Good.

In some ways this goes back to the pre-v2.31 way of generating bitmaps,
which is to just traverse for each bitmap independently. But as you
note, the whole idea of pseudo-merge bitmaps is that they aren't
overlapping in any meaningful way. So doing one fill-in traversal per
pseudo-merge makes sense, and hopefully we hit enough real bitmaps that
it's not too costly.

> As a result, the overhead cost for generating pseudo-merges in the above
> configuration is much smaller:
> 
>     +------------------+-----------------+---------------+-------------------+
>     |                  | no pseudo-merge | pseudo-merges | Delta             |
>     |                  |                 | (HEAD)        |                   |
>     +------------------+-----------------+---------------+-------------------+
>     | elapsed          |   294.1 s       |   328.4 s     |  +34.3 s (+11.7%) |
>     | cycles           | 1,365.5 B       | 1,529.3 B     | +163.7 B (+12.0%) |
>     | instructions     | 1,389.8 B       | 1,552.8 B     | +163.0 B (+11.7%) |
>     | CPI              |     0.983       |     0.985     |  +0.002   (+0.2%) |
>     +------------------+-----------------+---------------+-------------------+

Nice. The time savings are going to depend on how many pseudo-merges we
generate, I think. And I'd guess that the numbers above come from making
one big pseudo-merge bitmap, per the config you showed earlier. But you
probably only want a handful of them in any repo, so hopefully it
doesn't scale _too_ badly.

> Recall that at the start of this series, generating reachability bitmaps
> took 612.5 seconds *without* pseudo-merges. With this commit, it is
> still ~46.38% *faster* to generate reachability bitmaps *with*
> pseudo-merges than it was to generate bitmaps wihtout them at the
> beginning of this series.

Sure, though 612.5 seconds is all in the distant past. We only care
about 294.1 seconds now. ;)

More seriously, I do think the interesting question here is how the time
scales for various pseudo-merge configurations. I don't know if we have
any real operational experience with them yet. The original idea is that
you might slice up the ref space into a few chunks. I'd guess that the
old code performed badly-ish overall, but the time did not grow all that
much as you increased the number of chunks. But with the new code, I
suspect that the cost grows more linearly with number of chunks. That's
just a guess, though.

The other thing we hope for with pseudo-merges is that the chunks are
selected such that most of the chunks don't change (because they are
composed of old, stable refs). So in subsequent bitmap generations, we
can either reuse them either verbatim or as a starting point (if there
were only additions). But all of that is going to be heuristic and
depend on your config, the changes the repo sees over time, and so on.

So I don't know if we'd really have good numbers on that.

> Now that we have decoupled how we generate pseudo-merges from their
> representation, the following commits will improve the API around
> specifying pseudo-merge groupings during bitmap generation.

I think we're at patch 8/8 here. I guess you have more to come
eventually, but for now this part is just misleading. ;)

>  pack-bitmap-write.c | 210 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 174 insertions(+), 36 deletions(-)

The patch looks reasonable, though I'm not all that familiar with the
ins and outs of the pseudo-merge code. I'd trust the tests here more
than my review.

> @@ -696,12 +700,32 @@ static int fill_bitmap_commit(struct bitmap_writer *writer,
>  		 * walk ensures we cover all parents.
>  		 */
>  		if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) {
> +			struct tree *tree;
> +
> +			if (from_pseudo_merge && !c->object.parsed) {
> +				/*
> +				 * Commits reachable from selected
> +				 * non-pseudo-merges are already parsed
> +				 * by the regular bitmap build.
> +				 *
> +				 * However, pseudo-merge fills can also
> +				 * reach commits that were not covered
> +				 * there, so parse any such leftovers
> +				 * before reading their tree or parents.
> +				 */
> +				if (repo_parse_commit(writer->repo, c))
> +					return -1;
> +			}

Makes sense. This should be a quick noop for the regular bitmap build,
since we'll have the parsed flag set. And it should even allow use of
the commit-graph if it's available.

-Peff

  reply	other threads:[~2026-05-27 10:25 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19 16:12 [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Taylor Blau
2026-05-19 16:12 ` [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27  8:57   ` Jeff King
2026-05-27 14:36     ` Taylor Blau
2026-05-19 16:12 ` [PATCH 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27  9:03   ` Jeff King
2026-05-27 14:36     ` Taylor Blau
2026-05-19 16:12 ` [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27  9:24   ` Jeff King
2026-05-27 14:40     ` Taylor Blau
2026-05-29  6:00       ` Jeff King
2026-05-19 16:12 ` [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-20 14:42   ` SZEDER Gábor
2026-05-20 17:12     ` Taylor Blau
2026-05-27  9:27   ` Jeff King
2026-05-19 16:12 ` [PATCH 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27  9:45   ` Jeff King
2026-05-27 14:46     ` Taylor Blau
2026-05-19 16:12 ` [PATCH 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 10:04   ` Jeff King
2026-05-27 16:56     ` Taylor Blau
2026-05-29  8:26       ` Jeff King
2026-05-19 16:12 ` [PATCH 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-19 16:12 ` [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-27 10:25   ` Jeff King [this message]
2026-05-27 19:24     ` Taylor Blau
2026-05-29  8:33       ` Jeff King
2026-05-27 10:27 ` [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Jeff King
2026-05-27 19:55 ` [PATCH v2 " Taylor Blau
2026-05-27 19:55   ` [PATCH v2 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27 19:55   ` [PATCH v2 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27 19:55   ` [PATCH v2 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27 19:55   ` [PATCH v2 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-27 19:56   ` [PATCH v2 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27 19:56   ` [PATCH v2 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 19:56   ` [PATCH v2 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-27 19:56   ` [PATCH v2 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-29  8:34   ` [PATCH v2 0/8] pack-bitmap-write: speed up bitmap generation Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260527102534.GH981444@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=me@ttaylorr$(echo .)com \
    --cc=newren@gmail$(echo .)com \
    --cc=stolee@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox